The 4-Bit Nibbler CPU and Bitscope for Layer 1 Logic Analysis Fun

After a closer look at how to program the 4-Bit Nibbler CPU and how to cope with its intentional limitations, it was time to go one step further and look at how the hardware works one layer further down. To see the signals and how they propagate through the system, a logic analyzer is needed with as many inputs as possible to trace digital signals at different places. Unfortunately, most logic analyzers I found are not cheap, costing several hundred euros even for entry-level models. An alternative would have been to buy a cheap hardware clone from China and use it with the original vendor's software. As I don't think that's fair, I never considered that approach. But thanks to the January 2016 edition of the Linux Voice magazine I stumbled across the Bitscope Micro, a low-cost 2-channel oscilloscope and 8-channel logic analyzer that costs around 120 euros, tax and shipping included. That's well within the range of what I was willing to spend. In addition, the software is available for Windows, the Mac and Linux. In other words, a perfect match for my needs.

The logic analyzer can sample digital signals at a rate of up to 40 MSamples/s, enough for a decent resolution on my Nibbler board running at 2.4 MHz. Any channel can be used as a trigger, on a rising or falling edge or on a high or low signal level, so it's possible to capture signals at a specific moment. The picture and the commented screenshot on the left show the Bitscope Micro connected to the Nibbler and how instructions are read from the ROM and put into the instruction register two clock cycles later. For the screenshot I used the blinking LED program that uses just 5 instructions to switch the LED on and off again and then jumps back to the beginning of the program. In total that is exactly 10 clock cycles in which the instructions repeat over and over. This way it's easy to find the beginning and the end of the loop when looking at the signal levels. I spent many hours analyzing traces of signals from many different parts to confirm my theoretical knowledge of how the control unit, clock and phase make the system "come alive". A wonderful exercise during which I once again learnt a lot about what makes a computer really tick.
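To get a feeling for what "decent resolution" means here, the numbers can be checked with a little arithmetic. The Python sketch below uses only the figures given above (40 MSamples/s, 2.4 MHz, 10 clock cycles per loop); the constant and function names are made up for illustration.

```python
SAMPLE_RATE_HZ = 40_000_000  # maximum sample rate of the logic analyzer
CLOCK_HZ = 2_400_000         # Nibbler clock frequency
LOOP_CYCLES = 10             # 5 instructions x 2 clock cycles each


def samples_per_clock_cycle(sample_rate_hz, clock_hz):
    """Number of samples the analyzer takes during one CPU clock period."""
    return sample_rate_hz / clock_hz


def loop_duration_us(loop_cycles, clock_hz):
    """Duration of one pass through the program loop in microseconds."""
    return loop_cycles / clock_hz * 1_000_000


if __name__ == "__main__":
    per_cycle = samples_per_clock_cycle(SAMPLE_RATE_HZ, CLOCK_HZ)
    print(f"samples per clock cycle: {per_cycle:.2f}")
    print(f"loop duration: {loop_duration_us(LOOP_CYCLES, CLOCK_HZ):.2f} us")
    print(f"samples per loop: {per_cycle * LOOP_CYCLES:.1f}")
```

Roughly 16 to 17 samples per clock period is enough to tell clearly on which clock edge a signal changes.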

VoLTE Roaming – S8HR in 3GPP TR 23.749

In October 2015, NTT Docomo was the first network operator to launch VoLTE Roaming based on a pre-standard implementation of the S8 Home Routing (S8HR) concept. I wrote about it at the time and speculated how they might have implemented the service, as there was little official technical documentation available. In the meantime, 3GPP has continued its investigation of what is necessary to fully standardize the S8HR concept, and the details can be found in Technical Report (which is not a specification, it's just a report) TR 23.749 for 3GPP Release 14 (!). In December 2015, version 1.0.0 of the document was published. It contains a number of interesting insights into what is already available and what is still missing in the specifications, depending on which features an operator wants to use for VoLTE S8HR Roaming.

Dedicated Bearers for voice packets possible: At the time I wrote that if a network operator wants to handle things purely on his own, VoLTE Roaming can be deployed without any interaction with the visited network operator, except of course for the need to have a general LTE roaming agreement in place. If some more cooperation between home and visited network is put in place, it is even possible for the IMS system in the home network to assign a dedicated bearer for voice packets, so that they are prioritized in the network and especially on the air interface.

As shown in TR 23.749 in figure 4-2.1, this works without any additional specification work, as the PCRF (Policy and Charging Rules Function) in the home network is told by the VoLTE IMS system in the home network to establish a dedicated bearer. The PCRF then talks to the PDN Gateway (P-GW) in the home network, which in turn forwards the request to the Serving Gateway (S-GW) in the visited network. From there it goes to the LTE base station (eNodeB) and on to the mobile device. In principle it is the same message flow as in a non-roaming scenario, but the roaming interconnect has to let such configuration messages pass between P-GW and S-GW. The big advantage of the approach is that the VoLTE client on the mobile device does not have to behave differently when roaming abroad, i.e. there is no need for it to decide whether it has to wait for a dedicated bearer for the voice packets before proceeding with the call.
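The path of the bearer request described above can be written down as a small data structure to see where exactly the visited network gets involved. This is only an illustrative Python sketch: the node names come from the text, while the `Node` class, the `network` labels and the helper function are assumptions made up for the example and not part of any 3GPP interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    network: str  # "home" or "visited" (where the node is located)


# Order in which the dedicated bearer request travels, as described above.
BEARER_REQUEST_PATH = [
    Node("IMS", "home"),
    Node("PCRF", "home"),
    Node("P-GW", "home"),
    Node("S-GW", "visited"),
    Node("eNodeB", "visited"),
    Node("mobile device", "visited"),
]


def interconnect_hops(path):
    """Return the (sender, receiver) pairs that cross between the two networks."""
    return [
        (sender.name, receiver.name)
        for sender, receiver in zip(path, path[1:])
        if sender.network != receiver.network
    ]


if __name__ == "__main__":
    print(interconnect_hops(BEARER_REQUEST_PATH))
```

Only the hop from P-GW to S-GW crosses the roaming interconnect, which is why that is the one place where the two operators have to cooperate.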

Single Radio Voice Call Continuity: One of the main drawbacks of S8HR VoLTE roaming is that it is more difficult to put a mechanism in place to switch over from an IP-based LTE bearer to a circuit-switched 2G or 3G channel at the edge of LTE coverage. TR 23.749 points out that from an architectural point of view, no additional specifications are necessary for SR-VCC to work across borders in a Release 8 SR-VCC configuration. I presume, however, that quite a number of network operators have already moved to a post-Release 8 SR-VCC implementation in their network, which has a number of additional components to speed up the process. For this setup, SR-VCC across borders is not possible with the current specification. Interestingly enough, the 3GPP technical report excludes discussions of potential solutions from the document. In other words, it seems to be complicated enough to be put into a separate TR. It would be interesting to know if Docomo has actually implemented SR-VCC for their S8HR VoLTE roaming with Korea, or if they just went ahead without it, as LTE is said to be deployed pretty much everywhere in Korea by now, so there's no need for it in the first place.

Two other topics are discussed in the technical report: how emergency calls could be made properly and how best to signal to the VoLTE IMS system that the device is not at home but roaming in another country. Falling back to 2G or 3G cellular for emergency calls is a quick and working solution, but the report also gives some details of how it could be done over VoLTE. If you are interested, have a look at the document.

While the document had a few positive surprises for me it by and large confirms that S8HR VoLTE roaming should not be too difficult to implement and I wonder how long it will take for other network operators to follow Docomo's lead!?

Ubuntu – Have Two Windows Share the Screen

Tip of the day: Sometimes simple things can massively improve productivity! Recently I saw a friend resizing two windows on a Mac with a key combination so they each occupied exactly half of the screen. That's very helpful in many situations I thought and immediately wondered if and how that would work on Ubuntu as well. And indeed there are key combinations for resizing windows this way as well:

To make a window occupy the left half of the screen, use this shortcut:

Ctrl + Super + ←

And for the right half:

Ctrl + Super + →

(Super = the W*ndows key…)

The Nibbler 4-Bit CPU Project – Learn to Love What You Don’t Have – And Work Around It

The Nibbler 4-Bit CPU board is optimized for exactly one thing: Creating a fully functional computer with a CPU split into its components with as few chips as possible to make it easy to build it and to actually understand what is going on. It does an excellent job at this and one can even learn a lot about CPU architecture design from stuff that has been left out of the design on purpose. Here are a couple of examples and how to work around the missing parts:

No stack: Whenever writing a program that does more than just switching an LED on and off, it's almost certain that the program will be split up into subroutines or that it uses a library of routines built by others. To be able to jump to a subroutine from several places and return from it, the current content of the program counter is put on the stack to serve as a return address. In addition, all input variables to be used by the subroutine are pushed onto the stack as well. The subroutine then retrieves (“pops”) the variables from the stack, does whatever it needs to do and then executes a 'Return from Subroutine' CPU instruction. The CPU then puts the return address from the stack back into the program counter, which effectively returns to the main program thread. As the Nibbler does not have a stack, it's not possible to push the program counter and variables onto the stack and return from a subroutine to various places with a single instruction at the end of the subroutine. The way to work around this is to implement a jump cascade at the end of the subroutine. Which jump to take is written into a memory location before jumping to the subroutine, with a different value used at each call location. In other words, if the subroutine is used in 8 places in the program, there is a cascade of cmp/jz instructions at its end with one return target per call location, and whenever the subroutine is called from an additional location, the cascade has to be extended with a new return target. Not elegant at all but the only way to have subroutines without a stack. To pass variables to the subroutine, they have to be put in memory (the 'heap') at predefined locations. I'm pretty sure that somebody who has never heard of the 'stack' concept wouldn't take long to come up with it, as it's just a pain to do just about anything without one.
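The jump cascade can be illustrated with a small simulation. The following Python sketch is not Nibbler code; it models jumps as label changes in a loop and uses a dictionary as RAM. The label names, the `return_selector` cell and the example subroutine (doubling a nibble) are all invented for illustration.

```python
def run_program():
    """Simulate a stack-less subroutine call with a return-selector cascade."""
    memory = {"arg": 0, "result": 0, "return_selector": 0}
    visited = []
    label = "main_a"
    while label != "halt":
        visited.append(label)
        if label == "main_a":
            # First call site: store argument and selector, then jump.
            memory["arg"] = 3
            memory["return_selector"] = 0
            label = "double"
        elif label == "double":
            # Subroutine body: result = arg * 2, truncated to 4 bits.
            memory["result"] = (memory["arg"] * 2) & 0xF
            # Jump cascade: one compare-and-jump per call site.
            if memory["return_selector"] == 0:
                label = "after_a"
            elif memory["return_selector"] == 1:
                label = "after_b"
            else:
                raise ValueError("unknown return selector")
        elif label == "after_a":
            memory["first"] = memory["result"]
            # Second call site uses a different selector value.
            memory["arg"] = 5
            memory["return_selector"] = 1
            label = "double"
        elif label == "after_b":
            memory["second"] = memory["result"]
            label = "halt"
    return memory, visited


if __name__ == "__main__":
    memory, visited = run_program()
    print(visited)
    print(memory["first"], memory["second"])
```

Adding a third call site means adding a third selector value and a third branch inside `double`, which is exactly the maintenance burden described above.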

No indexed addressing: One thing computers are good at is doing simple things quickly over and over again. For example, in many cases it's required to make the same calculation on consecutive input data and to put the results back into memory one after another. Another repetitive task is writing into a buffer, e.g. writing a string to be sent to the LCD display into a buffer, one byte (or nibble in this case) at a time. An elegant way to do this is to do repetitive things in a loop, using one index variable to point to the current input parameters and another that points to where the next output can be written in a buffer. The way this is done on machine instruction level is called indexed addressing. An instruction to write into memory is given a base address to which the content of an index register is added. After writing to memory, the index register is increased by one and the next loop iteration begins. Thing is, there is no index register on the Nibbler and therefore no indexed addressing, again for the purpose of making the hardware as simple as possible. The only way to work around this is to do repetitive things one after another rather than in a loop. If an action needs to be repeated 20 times, no loop can be used. Instead, the same instructions have to be repeated 20 times in the code with different source and destination addresses. Like the return cascades above, the missing functionality produces very ugly code and makes more complicated stuff that requires many iterations over different input and/or output data difficult to implement on the Nibbler.
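The difference between the two styles is easy to see side by side. The Python sketch below copies four nibbles inside a 16-cell RAM, once with an index and once fully unrolled. The function names and addresses are made up for the example; on the real machine the unrolled version would be a sequence of load/store instructions with hard-coded addresses.

```python
def copy_with_index(ram, src_base, dst_base, count):
    """What indexed addressing allows: one loop body, address = base + index."""
    for index in range(count):
        ram[dst_base + index] = ram[src_base + index]


def copy_unrolled_4(ram):
    """What remains without an index register: every address is spelled out."""
    ram[8] = ram[0]
    ram[9] = ram[1]
    ram[10] = ram[2]
    ram[11] = ram[3]


if __name__ == "__main__":
    ram_a = [1, 2, 3, 4] + [0] * 12
    ram_b = list(ram_a)
    copy_with_index(ram_a, src_base=0, dst_base=8, count=4)
    copy_unrolled_4(ram_b)
    print(ram_a == ram_b, ram_b[8:12])
```

Both functions produce the same result, but the unrolled one grows by one line per element and cannot handle a length that is only known at run time.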

No hardware interrupts: A great way of checking for external events is to use hardware interrupts. When the CPU notices an interrupt bit being set, it suspends normal program execution and automatically sets the program counter to the beginning of a service routine for that interrupt. This makes it easy, for example, to check for the user pressing a key and to react to it without delay. On PCs, hardware (and software) interrupts are used for many things, such as peripheral devices indicating that data has become available for processing. It should come as no surprise that the Nibbler does not have interrupts. Checking for key input on the Nibbler thus requires polling the single 4-bit input register to detect when a bit connected to one of the input keys changes its state. This has to be done frequently, as otherwise there will be a noticeable delay between the user pressing a key and the computer reacting. In programs that use delay loops between activities, checking for key input must be done in the delay loops to avoid this lag.
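Polling inside a delay loop can be sketched as follows. This is a Python illustration, not Nibbler code: `read_input_port` stands in for reading the 4-bit input register, and the idle value `0xF` (all bits high while no key is pressed) is an assumption made for the example.

```python
def delay_with_polling(read_input_port, iterations, idle_value=0xF):
    """Busy-wait for `iterations` steps, sampling the input port on every step.

    Returns (step, value) for the first sample that differs from the idle
    value, or None if the delay ran to completion without any key activity.
    """
    for step in range(iterations):
        value = read_input_port() & 0xF  # only 4 input bits exist
        if value != idle_value:
            return step, value
    return None


def make_fake_port(samples):
    """Hypothetical test helper: returns the given samples one per read."""
    iterator = iter(samples)
    return lambda: next(iterator)


if __name__ == "__main__":
    port = make_fake_port([0xF, 0xF, 0xE, 0xF])
    print(delay_with_polling(port, iterations=4))
```

The delay loop doubles as the input check, so the reaction time is bounded by one loop iteration rather than by the full delay.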

No add with carry: Another thing that very much simplifies the hardware design but makes life difficult on the software side is that there is no add with carry instruction. Therefore, adding up integers that consist of more than one nibble requires saving the carry flag in a variable and checking for it when using the add command on the next nibble. In practice even more work has to be done, because a carry bit can result from adding one value to another or from adding the carry bit to a value. Together with the lack of an index register, this makes the whole affair quite complicated in practice.
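The two possible sources of a carry become clearer in a worked example. The Python sketch below adds multi-nibble numbers using only a plain 4-bit add that reports its carry, and handles the saved carry by hand. The function names are invented, and the `for` loop is a convenience of the sketch: on the Nibbler each nibble position would have to be written out separately, since there is no indexed addressing.

```python
def add_nibble(a, b):
    """Plain 4-bit addition without carry-in: returns (sum, carry_out)."""
    total = a + b
    return total & 0xF, total >> 4


def add_multi_nibble(x, y):
    """Add two equal-length nibble lists, least significant nibble first."""
    result = []
    carry = 0
    for a, b in zip(x, y):
        # Step 1: add the carry saved from the previous nibble.
        partial, carry_from_carry = add_nibble(a, carry)
        # Step 2: add the second operand.
        partial, carry_from_add = add_nibble(partial, b)
        result.append(partial)
        # At most one of the two steps can overflow, so OR is sufficient.
        carry = carry_from_carry | carry_from_add
    return result, carry


if __name__ == "__main__":
    # 0x9F + 0x01 and 0xFF + 0xFF, least significant nibble first
    print(add_multi_nibble([0xF, 0x9], [0x1, 0x0]))
    print(add_multi_nibble([0xF, 0xF], [0xF, 0xF]))
```

Step 1 can only overflow when the nibble is 0xF and the saved carry is 1, which leaves 0 as the partial sum, so step 2 cannot overflow as well in that case.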

Every instruction executed in two clock cycles: One of the brilliant design choices that significantly reduces hardware complexity is to execute every instruction in exactly two clock cycles. In the first clock cycle the instruction is loaded into the FETCH register, and it is then executed during the second clock cycle. As the first clock cycle has also advanced the program counter, the next 8 bits from the program ROM are already available and are used in one of two ways. If the program counter is not increased during the second clock cycle of the current instruction, they become the next instruction. Otherwise they serve as the lower 8 bits of an address that, combined with another 4 bits of the current instruction, is put on the address bus to use the content of a RAM cell as the second operand of an operation. Very clever, but that obviously also limits the complexity of a task that can be done with a single instruction. That's why more complex CPUs use a variable number of steps per instruction and a more generic addressing scheme. Needless to say, more hardware would be required for that. On the other hand, there are only 16 instructions anyway, so there's little opportunity for making some of them more complex.

Lots of technical detail in this post, perhaps better understood when looking directly at the source code. I've put one of the programs I've written for the Nibbler on Github, which goes into the details of all the topics mentioned above. It can be compiled and run in the Nibbler simulator or, of course, on the real hardware.

VDSL Speed Upgrade and All-IP – My ISDN Days Are Over

I'm a bit nostalgic today because my ISDN telephony days are over. A few days ago, I was “upgraded” to an all-IP line at home because my network operator of choice wants to decommission its ISDN public telephone network, offer VDSL vectoring (instead of fiber connectivity, yeah, right…) and migrate everyone to Voice Over IP. For me an era comes to an end.

Back in the days at the end of the 1990s, when 56 kbit/s modems for analog lines were the hype of the day, I switched to an ISDN line at home so I could make phone calls and be connected to the Internet at the same time. Another plus was being able to bundle the two 64 kbit/s ISDN channels for a blazing Internet speed of 128 kbit/s. Back then that was not only considered ultra-fast, it actually felt like it, as web pages and downloads were tiny compared to the multi-megabyte downloads required for a single web page these days with all the ads included (if you don't have an ad blocker installed). Even when I switched from ISDN to DSL for Internet access, I kept my ISDN line to benefit from several phone numbers, immediate call forwarding to other destinations on some of them and other 'digital' features that were not so easy to get on analog lines.

Now after almost 20 years, ISDN has gone. The picture on the left shows my decommissioned ISDN equipment: ISDN base phone with a DECT unit, a DECT cordless phone, DSL/ISDN splitter and an NTBA (ISDN network terminator). But to sweeten things up I got four very worthwhile things as part of the “upgrade”.

First, in anticipation of the switch, I bought a new HD-Voice capable fixed-line cordless DECT (or CAT-iq as it's called today) phone a few months ago that can be connected to both ISDN and a VoIP core network. Not only do I get much better voice quality on calls to other VoIP fixed-line phones in the country, but there's also an HD-Voice gateway between the VoIP fixed-line network and the GSM and UMTS network of my mobile network operator of choice that converts the 12.65 kbit/s WB-AMR codec used in mobile networks (G.722.2) into the 64 kbit/s wideband codec used in fixed-line networks (G.722). Works great and the audio quality is much improved.

Second, my VDSL line was upgraded from 25 Mbit/s in the downlink direction and 5 Mbit/s in the uplink direction to 50 down and 10 up. I fail to be really impressed by that as my fiber line in Paris gives me 264 Mbit/s in the downlink and 48 Mbit/s in the uplink. But every bit/s counts and I did notice the increased speed immediately when I downloaded a Linux image the other day. Also, my VPN server and Owncloud server that I host at home very much benefit from the 10 Mbit/s in the uplink direction.

Third, my VDSL line is now IPv6 enabled so I will finally be able to connect to my servers over IPv6 while out and about, at least while I'm in my home country, as my mobile network operator of choice has introduced IPv4v6 connectivity this summer. Also, it will help me to better understand the IPv6 firewall features of the mobile network and my VDSL router at home. More about that in a future post.

And finally, the overall package now only costs about half of what I paid before. I'm the conservative type when it comes to connectivity so I hadn't changed my fixed line subscription in 6 years. Never change a running system…

Running the Nibbler on a 2 Hz Clock – Who Needs 2 MHz Anyway?

Now that the Nibbler hardware is up and running I can go ahead and modify the hardware a bit. It's cool to have the board running at 2.4 MHz, as everything written in assembler for a 4-bit CPU just runs at a breathtaking speed. Who needs GHz on such a system? While speed is cool, it has the slight disadvantage that doing something as benign as letting an LED blink once a second requires massive delay loops. With 4-bit counters it actually requires 4 nested delay loops. And no, the board has no interrupts that one could work with, as the focus was on reducing the hardware as much as possible while still having a real computer to work with.

As everything about the Nibbler is static, it's possible to turn down the clock rate as low as 0 Hz. For educational purposes I decided to replace the 2.4 MHz clock generator on the board with a 2 Hz clock I assembled out of two Not (Inverter) gates, a capacitor and a resistor. The extra LED and resistor shown in the image next to the Not-gates IC are not really needed as they are just for showing the clock impulses.

At two Hertz, each assembly instruction of the Nibbler takes exactly two clock pulses, or one second. At that rate it's actually possible to count instructions and visualize where the program currently executes. For this purpose I've written a short program with 5 instructions that switches the on-board LED on and off:

; x-2hz-clock-led.asm

; OUT ports
#define OUT_PORT_LED $E ; 1110 – bit 0 is low
 
; =================================================
 
led_blink_loop:
    ; LED on
    lit #0
    out #OUT_PORT_LED

    ; LED off
    lit #4
    out #OUT_PORT_LED

    jmp led_blink_loop

At 2.4 MHz the only thing that can be observed is a constantly glowing LED, though not as bright as it could be, as it's switched on less than half of the time. At 2 Hz, however, the LED blinks visibly, with one full on/off period taking 10 clock cycles or 5 seconds. When the program starts and the LED is off, it takes exactly 4 clock pulses before the LED is switched on, because the output port for the LED is only pulled to ground in the 2nd cycle of the second instruction. 4 cycles later the LED is switched off again. It then takes 6 cycles before the LED is turned on again, because the jump instruction at the end of the program takes 2 cycles in addition to the instruction that loads the value to be written to the output port into the accumulator (lit #0 = load immediate) and the output command itself that writes the content of the accumulator to the output port.
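The cycle counts above can be reproduced with a tiny simulation. The Python sketch below is a simplified model, not an emulator: it assumes every instruction takes two clock cycles and that an `out` takes effect at the end of its second cycle, as described in the text. The instruction names in `PROGRAM` are shorthand invented for the example.

```python
CYCLES_PER_INSTRUCTION = 2
CLOCK_HZ = 2

# Shorthand for the five instructions of the blink loop.
PROGRAM = ["lit_on", "out", "lit_off", "out", "jmp"]


def led_transitions(total_cycles):
    """Return (cycle, state) pairs for each LED change; cycles count from 1."""
    led_on = False    # LED is off when the program starts
    pending = None    # value currently held in the accumulator
    events = []
    cycle = 0
    pc = 0
    while True:
        op = PROGRAM[pc]
        cycle += CYCLES_PER_INSTRUCTION  # instruction completes in its 2nd cycle
        if cycle > total_cycles:
            break
        if op == "lit_on":
            pending = True
        elif op == "lit_off":
            pending = False
        elif op == "out" and pending != led_on:
            led_on = pending
            events.append((cycle, "on" if led_on else "off"))
        pc = 0 if op == "jmp" else pc + 1
    return events


if __name__ == "__main__":
    events = led_transitions(20)
    print(events)
    period_cycles = events[2][0] - events[0][0]
    print(f"period: {period_cycles / CLOCK_HZ} s")
```

The first LED change happens in cycle 4, the LED then stays on for 4 cycles and off for 6, giving a period of 10 cycles.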

Counting machine instruction executions with your fingers, when's the last time you did that? 🙂

LTE dual-SIM, dual standby, GSM-only for the second SIM

Three and a half years ago I had a closer look at how a dual-SIM 3G mobile worked in practice and how both SIM cards can be used simultaneously, or not. To this day, the two articles (see here and here) remain among the most viewed ones, so I'm not alone in my interest. These days, there are also dual-SIM LTE phones available, not only in the mid- and low-range market but also in the high-end sector. Time to have a look at how these work in practice and whether two networks can be used simultaneously.

By and large, the behavior of the dual-SIM LTE phone I had is pretty much identical to the dual-SIM 3G phone from three and a half years back. The phone can listen to (i.e. receive) two networks simultaneously but can only be active (i.e. transmit and receive) in one at a time. One can, for example, browse the Internet via one network (used with the first SIM card) while the device keeps listening for incoming voice calls and SMS messages on the other network (with the second SIM card). When a voice call comes in on the second SIM card, the mobile interrupts the communication with the first network for the duration of the phone call. In other words, it's not possible to access the Internet via one network and have a phone call over the other network at the same time. That means that, like three and a half years ago, it's still a dual-standby approach.

Also, like the device three and a half years ago, one transceiver chain is limited to GSM while the other chain is capable of GSM, UMTS and LTE. SIM cards can be assigned to one of those chains via the menu, so it's possible to switch SIM cards to and from the LTE chain for data transfers when necessary. This is useful, for example, when using one SIM card for Internet access in the home country and another SIM card for Internet access when traveling abroad. To get an idea of what that looks like in practice, click on the links above. The user interface looks a bit different now but the steps to switch and select SIM cards are still the same.

The Nibbler 4-Bit CPU Project – First Run

If you haven’t seen my previous posts on the Nibbler, have a look here for what happened so far.

It’s a November evening, which means it’s dark and cold outside, and I’m looking out of my window at a steady stream of car headlights. I’m glad I’m back home. Earlier today I bought the missing chip for my Nibbler board, and after all the effort put into understanding the concept and assembling the hardware it’s time to find out whether it will actually work. Adrenaline is flowing freely now, not only because my progress was slowed down by a pre-scheduled visit to the dentist. I was close to canceling it, I had a good enough ‘technical’ reason, but it would have been just that, an excuse. So one dentist appointment later I finally sit at my desk and insert the remaining chip into the waiting socket on the Nibbler board. Once more I verify that all sockets contain the correct chip and come away satisfied.

Time to attach the board to the power supply. If all goes well, “press any button” should show up on the display. I could have changed the text to “hello world” but I decided to go ahead with a binary from the author rather than something I had written myself. More time for playing around with the software later, it’s a hardware thing today. I connect the board to the 5V battery I normally use for recharging my phone, as I don’t even have a regulated power supply. I intend to run it on a 5V USB mobile phone charger later, but as I’m not quite sure the 5V delivered by a charger is smooth enough for the Nibbler I decided on the battery instead.

I flip the master switch on the board and the green power supply LED turns on instantly – Apart from that – NOTHING happens on the display. What!?

Pressing the reset button a couple of times I refuse to believe that something could be seriously wrong. But the display remains dark. I then press the up/down/left/right keys and the piezo speaker starts making noises every time I press a button. Hope returns as the program I flashed into the ROM is supposed to do that. So the program must be running! Yay! But why is there nothing on the display, is the display or the output port chip broken? Then comes the flash of insight – I soldered a potentiometer onto the board to control the LCD module’s contrast. During the assembly phase I put it into a middle setting to ensure that I would at least see something when I first power-up the board. Perhaps a middle setting is not good enough? So I change the setting with a screwdriver first in one direction, resulting in nothing, then in the other direction and suddenly “press any button” shows up on the display. HURRAY – it’s only the contrast setting! As you can imagine, I’m overjoyed!

For the next half hour I run a number of programs Steve Chamberlin has put together for the Nibbler, all in a single ROM and accessible via different jumper settings, a cool idea from William Bucholz, the creator of the PCB. Everything works as it should. Wonderful! Now that the hardware is running I can further explore it in ways that are just not possible with a simulator. But before that some sleep is in order to get the adrenaline out of the system, both from the dentist appointment and from those seconds between power-on and realizing that the contrast level has to be adjusted to see something on the display.

To be continued…

The Nibbler 4-Bit CPU Project – The Missing Chip

The circuit board is soldered and the microcode and program ROMs are flashed, so the final step before switching on my 4-bit CPU board is to put the chips into the sockets. I'm glad I took extra care when doing that because, quite to my dismay, one bag contained the wrong IC for the data bus driver. Instead of a 74HC244 2x 4-bit buffer with 3-state outputs, which is required to select either the ALU or the FETCH register for output on the 4-bit data bus, a 74HC574 was delivered, which contains D-type flip-flops. Apart from having a completely different functionality, the input and output pins on that chip are arranged differently than on the 244.

If I hadn't caught the mistake, a number of chips would probably have been fried at first power-on. I could hardly believe it, as the invoice correctly showed a 74HC244 and the bag for the chip also had a 74HC244 sticker on it. I'm glad I didn't trust the bag labeling and checked the number on the chip once more after having inserted it on the board.

It's quite frustrating to sit in front of a fully completed board and not be able to power it on because of a single missing component that is worth only a couple of cents. Fortunately, a local supplier had a 74HC244 in stock, so instead of waiting for it to be delivered I went to pick it up in the shop during lunch break the next day. The second picture shows the chips I picked up in the local store. Joy for less than 2 euros! Almost showtime now!