A little while ago I started working on a 16 bit virtual machine running in node. I’d been watching these amazing series of videos from Ben Eater about how he build a fully functional 8 bit computer from scratch, and wanted to see if I could put into practice some of the things I’d learned. So from the beginning I knew I wanted:

• to design an assembly language
• to build an assembler that compiled `*.asm` files to a binary format
• to build a VM that would simulate memory, a CPU, and some form of I/O

The choice of 16 bit was a bit arbitrary, but I wanted to allow for a non trivial amount of computation to take place, and it seemed like a good balance between too small to be useful and too complex to manage. Keeping the complexity low was actually a really big concern for me – it’s very easy with this kind of thing to get stuck with the really tiny details, optimising all kinds of stuff, which in my opinion take away from the core concepts. Because of this I ended up making some tradeoffs in design which forced more creative approaches.

So before we go any further consider this the big github code link where you can find the entire project.

1.0 Virtual Hardware

1.1 Memory

A good place to start is the virtual hardware, since this is what everything runs on top of. I made extensive use javascript’s built in specialised array type Uint16Array to simulate both main memory (RAM) and stack memory. If you haven’t bumped into it before, the Uint part refers to the fact that array can only handle unsigned integers (i.e. no negative numbers), and the 16 refers to 16-bit (i.e. numbers between 0 and 65535). These kind of arrays need to be initialised with a size, which in this case reflects how large the memory will be. The final amount of memory is also the maximum size of the 16 bit integer, and the stack is 1024. A nice thing about Uint typed arrays is that values automatically wrap around (overflow and underflow) as you would expect in hardware. So in memory, 65535 + 1 = 0, and 0 – 1 = 65535.

1.2 CPU

Then we have the virtual CPU. A CPU has some built in memory areas called registers which are used to hold values while the CPU operates on them. Registers can be either general purpose or specialised. General purpose registers can have values loaded into/out of them at will and be used for calculations. Special registers perform specific functions within the CPU, such as the IP (instruction pointer) register, which stores the memory address of the next instruction to be executed, or the SP (stack pointer) which stores the memory address of the current item in the stack. In my VM there are 4 general purpose registers (A, B, C, and D) and two specialised registers (IP, SP).

So the actual fundamental workings of a CPU are quite simple: it’s basically just a 3 step cycle:

1. Fetch the next instruction from memory
2. Decode the instruction from it’s binary representation
3. Execute the instruction

A virtual CPU is no different. It retrieves the instruction that the IP register points to in memory, works out what that instruction is supposed to do, and then does it. The “it” here is quite a narrow range of stuff. Basically it’s going to one of these:

• put a value in a register
• put the value of a register in memory
• get something from memory and put it in a register
• do some math
• make a system call (in this case for either reading input or writing output)

Pretty straightforward when you think about it. Strange to think that all of modern computer architecture rests on these really simple operations!

2.0 Limitation

From the very beginning of this project, I made a decision in the design to reduce complexity, and that was that every individual thing this computer can do (i.e assembly instruction) would fit into a single 16 bit number. That means that for any math, any moving values, reading, writing – all the information of which instruction is it, which registers are involved, which values or memory addresses – all has to fit in 16 bits. You might think that’s a ridiculous restriction, and in some ways it is, but it really allows for a simpler model in assembling each instruction into a binary representation, reading it back, cycling the CPU and keeping track of addresses etc.

You could – and almost every other assembly language does – just encode into a larger number like 32 bits, that would give enough space to easily describe an operation, 2 registers and a memory address covering 16 bits all together. You could get even smarter and encode into a variable number of bits based on exactly what that operation actually needs in terms of space, so maybe you’d have some 6 bit instructions, some 8, some 32. But I kept it to 16 for simplicity.

3.0 Assembly language

The instruction set has gone through multiple iterations, from my first naive attempt to a much more condensed and compact representation of operations. Unlike something like x86 where the language allows to use the same instruction to manipulate a value  or a register:

`mov eax, 5`

or

`mov eax, ebx`

My assembly language cannot operate arbitrarily operate on different kinds of data, and needs separate instructions to deal with registers and values, due to the 16 bit limitation simply not allowing enough information to be generally encoded.

3.1 Instruction set: First attempt

So I decided to reserve the first 4 bits of the 16 to represent the opcode (the part that describes what to do) , which gives 16 possibilities, though initially only 14 were used. Note, by first I mean the rightmost 4 bits, as is conventional in binary.

Ins Args Description Representation
NOP No operation `0000000000000000`
MOV d, s Copy a value from the source register to the destination register `XXXXXXXXSSDD0001`
LDV d, v Load a value into the destination register `VVVVVVVVXXDD0010`
LDA d, m Load the value in the memory address into the destination register `MMMMMMMMXXDD0011`
LDR d, a Load the value in the memory address pointed to by the source register into the destination register `XXXXXXXXSSDD0100`
LDM s, m Load the value in the source register into memory `MMMMMMMMSSXX0101`
ADD d, s Add the source and the destination register, and store the result in the destination register `XXXXXXXXSSDD1001`
SUB d, s Subtract the destination from the source register, and store the result in the destination register `XXXXXXXXSSDD1101`
MUL d, s Multiply the destination by the source register, and store the result in the destination register `XXXXXXXXSSDD0110`
DIV d, s Divide the destination by the source register, and store the result in the destination register `XXXXXXXXSSDD1010`
JMP m Jump to instruction `MMMMMMMMXXXX1110`
JLT d, s, m Jump to instruction in the destination register is less than the source register `MMMMMMMMSSDD0111`
OUT s Output the contents of the source register `XXXXXXXXSSDD1011`
HLT Program halt `XXXXXXXXXXXX1111`

There is quite a lot of waste here, and not a lot of utility. Those `X`s refer to a binary that has no effect on the instruction (i.e it’s wasted). Something like `NOP` (no operation) is really the last instruction you should add if you are this limited. I’d also only reserved 8 bits for addresses, so the maximum size of memory might as well just have been 256 (and was at this stage). `MOV`, `LDV`, `LDR`, and `LDA` all deal with moving data around from memory and registers. `ADD`, `SUB`, `MUL`, and `DIV` are obviously arithmetic instructions. `JMP` allows the instruction pointer to be moved to a given memory location, and `JLT` is a condition jump if one register is less than another. Finally `OUT` is used to output a given register’s value to stdout, and `HLT` is the “we’re done” instruction.

3.2 Labels

Labels let you make human readable references to certain points in the instructions, which are replaced with real values during assembly. A label might be used something like this:

`JMP some_place: ... some_place: LDV A, 0`

If you’re spitting out your coffee and saying to yourself right now “Is that a GOTO?!”, the answer is yes. In assembly, that’s the level of abstraction you get unfortunately¯\_(ツ)_/¯.

3.3 Stack

There’s something quite crucial missing here, which is the stack. The stack is a kind of data structure where data can be “stacked” up, and is described as LIFO – last in first out. If you push a thing A onto the stack, then push a thing B, when you come to take something off you’re going to first get B, then A.

A stack let’s you easily have the concept of “functions”, because you can be executing code instruction by instruction, and then hit a function somewhere else in the memory. When that happens you push the current instruction pointer onto the stack, jump to that function execute there. At the end of the function, there is a `RET` instruction which tells the CPU “pop the previous address off the stack and jump back to it”.

You can of course push the values of registers or memory onto the stack (in this language, only registers) as well, so you can essentially store the state of the registers at the beginning of a function, and at the end place them all back at the end. This makes computations a lot less tedious.

In order to have a functioning stack, I needed to make room for four new instructions:

• `CAL` Call a function in memory. Essentially push instruction pointer and then jump to memory address
• `RET` Return from function
• `PSH` Push the value of a register onto the stack
• `POP` Pop the last value into a register

To make room, `LDA` and `NOP` were cut.

3.4 Getting a bit smarter

You can imagine grouping a bunch of instructions together under a name – like a kind of macro – which could act like a single instruction. For instance with the conditional instruction `JLT`, labels and the `JMP` instruction, you can work out if a number is greater than or equal to instead. From that you can work out if a number is equal to, and from there you can work out if it is not equal to.

So I added a step in the preprocessing stage of the assembler which let’s you use some of these “macros, or “pseudo instructions” as I called them in the code (https://github.com/francisrstokes/16bitjs/tree/master/src/assembler/preprocessor/stages/pseudo-instructions/expanders), before binary encoding.

The pseudo instructions were a bit of a revelation, and shortly after I realised you can group all the math operations into a single instruction, and then create pseudo instructions for `ADD`, `SUB`, `MUL`, `DIV` all under the single `ATH` (arithmetic) instruction. Actually that also handles bitwise math like `AND`, `OR`, `NOT`, `XOR`, `LSF` (left shift), and `RSF` (right shift).

As a side note here, working out the computation to perform these expanded operations like the conditionals based only on `JLT`, and also without disturbing the state of the machine was quite challenging! But this kind of challenge was what the project was all about – much more than just writing a greater than or equal to instruction which just handles this in javascript. It gets to what the core of computation actually is at this level – pure logic and state modelling.

3.5 Input/Output

A basic instruction for output was in the instruction set from the beginning, but it wasn’t really mimicking anything a real set would do. I didn’t mind until I was trying to come up with a nice way to take input from stdin, at which point I thought maybe I should try to handle both in a better way.

Normally an operating system provides some specific way to read/write files, access stdin and stdout, create forked processes, call an external function etc. This is usually through system calls. The VM “os” supports calls for reading a single character from the stdin buffer, and for numerous output modes in stdout. For instance you can print a single register as decimal, hex, binary, or as an interpreted character code (depending on mode), or you can provide a memory address in a register and print each value from memory sequentially until a zero is reached. This is useful when strings are stored in the data section of the binary, or when you’ve read X characters in to memory and want to display those back out later.

The implementation for stdin is also uses a Uint16Array, but of length 1. NodeJs’s `process.stdin` interface is to get key events as they come in. Those key events have their data placed into the Uint16 buffer, and the buffer is copied to a register when the system call comes in. Every time the VM’s stdin is read through a system call, the buffer is set back to zero, so that it’s possible to see if a new key was pressed on every CPU cycle.

3.6 .data and .text

It would be really tedious to have to write instructions to place values in memory that you use over and over again, or to have to have magic numbers of memory addresses, places you store memory offset etc. Other assembly languages have a solution for this, and that is that you can have a special kind of header directive that tells the assembly “Hey, after all the instructions are done, put this data in the binary for me as well. And while you’re at it, whenever I use this name in my file, what I really mean is use the memory address it ends up in instead”.

This is the `.data` directive, for declaring and initialising named data regions in memory.

`.data .first string "Hello\n" .second 0xFF .third size 100`

• A null terminated (0) string in memory, with the name `.first` which is later replaced with the address it starts at
• The single value 0xFF, with the name `.second`
• 100 uninitialised regions in memory, the first of is pointed to by `.third`

There’s another one, the `.text` directive, that is everything else: the code, as well as a line which says where the entry point of the application is.

These directives are required for the program to be valid, and are typically placed as at the top.

3.7 Final instruction set

3.7.1 Real Instructions

Instruction Arguments 16 bit representation Description
`MOV` `D, S` `XXXXXXXXSSDD0000` Move value at source register to destination register
`LDV` `D, V` `VVVVVVVVVVDD0001` Load a value into destination register.
`LDA` `D, M` `MMMMMMMMMMDD1110` Load a value from memory into destination register
`LDM` `D, M` `MMMMMMMMMMDD0011` Load the value in destination register into memory
`LDR` `D, S` `XXXXXXXXSSDD0010` Load the value from memory pointed at by the source register into the destination register
`LDP` `D, S` `XXXXXXXXSSDD1111` Load the value in source register into the memory address pointed to by destination register
`ATH` `D, S, O, M, B` `BBBMOOOOSSDD0100` Perform an arithmetic operation on the source and destination registers. O specifies the operation (listed below) and M is the mode, where 0 = place result in destination register and 1 = place result in source register. If the instruction is right or left shift then B specifies the shifting value
`CAL` `D` `XXXXXXXXXXDD0101` Call a function in memory pointed at by the destination register
`RET` `XXXXXXXXXXXX0110` Return from function
`JLT` `D, S` `XXXXXXXXSSDD0111` Jump to memory address pointed at by the source register, if value in the A register is less than value in destination register
`PSH` `S` `XXXXXXXXSSXX1000` Push the value in source register onto the stack
`POP` `D` `XXXXXXXXXXDD1001` Pop the stack into the destination register
`SYS` `XXXXXXXXXXXX1010` Perform a system call. This is described below in more detail.
`HLT` `XXXXXXXXXXXX1011` Program halt
`JMP` `M` `MMMMMMMMMMXX1100` Jump to address in memory. Can only reference memory up to 0x3FF.
`JMR` `S` `XXXXXXXXSSXX1101` Jump to the address pointed at by the source register

3.7.2 Pseudo instructions

Instruction Arguments Expanded length Description
`ADD` `D, S` 1 Add destination to source and store the result in destination
`ADDS` `D, S` 1 Add destination to source and store the result in source
`SUB` `D, S` 1 Subract destination from source and store the result in destination
`SUBS` `D, S` 1 Subract destination from source and store the result in source
`MUL` `D, S` 1 Multiply destination with source and store the result in destination
`MULS` `D, S` 1 Multiply destination with source and store the result in source
`DIV` `D, S` 1 Divide destination by source and store the result in destination
`DIVS` `D, S` 1 Divide destination by source and store the result in source
`INC` `D` 1 Add one to the destination register
`DEC` `D` 1 Subtract one from the destination register
`LSF` `D, A` 1 Binary shift left the destination register by amount A (max 7)
`LSR` `D, A` 1 Binary shift right the destination register by amount A (max 7)
`AND` `D, S` 1 Binary and the destination and source, and store the result in the destination
`OR` `D, S` 1 Binary or the destination and source, and store the result in the destination
`XOR` `D, S` 1 Binary exclusive-or the destination and source, and store the result in the destination
`NOT` `D` 1 Binary not (invert) the destination
`LDV16` `D, V` 6 Load a full 16-bit value into destination
`SWP` `D, S` 3 Swap the values in the source and destination registers
`JGE` `D, A` 12 Jump to address A if value in destination register is greater than or equal to the A register.
`JEQ` `D, A` 63 Jump to address A if value in destination register is equal to the A register.
`JNE` `D, A` 74 Jump to address A if value in destination register is equal to the A register.

4.0 Assembler

So now there’s a whole language defined, you still need a program to turn that language into something more low level. That’s the job of the the assembler. Basically it’s separated out into 4 main stages:

• preprocessing those instructions
• assembling into a binary representation
• writing that binary to a file

The reading and validating is quite straightforward. Preprocessing I touched on briefly with regard to pseudo instructions, but this step also removes all the comments, performs analysis and extraction on labels, and evaluates any expressions generated by the pseudo instructions. On top of this it takes information declared after the `.data` directive and creates a data table with labels, addresses, values and size, that can be referenced within the code. This data table is also used to insert any declared data into the assembled binary during the assembly stage. Finally there is also a `.text` directive that:

• holds a `.global` directive, pointing to the entry point (label)
• tells the preprocessor that everything every between this and any other directives or the end of the file is an instruction.

4.2 Assembling

The Assembling takes all the information of each instruction and creates a single 16 bit binary number from it using some bitwise operations (shifting and binary or). These are added to a `Uint16Array` that will later be converted to a buffer. Numerical representations for the opcodes and registers were chosen pretty arbitrarily. I’ll go into a bit of detail as to exactly how instruction encoding and decoding works a bit later on. Assembly also takes the data table generated in the processing step and inserts the values into the same `Uint16Array` after all the instructions, at precalculated addresses that match those associated with their corresponding data labels.

4.3 Writing the binary

Finally the binary writing stage just takes all those 16 bit numbers and and writes them out as raw data to the specified file. This file is what would be considered the executable. If you come from a windows background that’s the `.exe` file, and for unix based systems they generally don’t get a special extension. The VM doesn’t care what extension you use or if you even use one, but I’ve tended to use `.bin` while testing.

There’s nothing fancy going on with this file. No header or magic number, just a long list of 1s and 0s. It would of course be nice to have some metadata information encoded into the file, maybe some kind of checksum to provide assurance that it was properly generated, but this adds complexity that’s not necessary so it was out.

5.0 Debugger

I realised writing a debugger so you can actually see what’s going on during execution was probably going to be really helpful in order to develop anything non trivial. It’s pretty basic, but it can give a complete readout of memory and the stack (paginated), values in the registers, and the current instruction being executed, and you can step through the program as it executes. There are no breakpoints, label references, breakdown of function areas or anything fancy like that. The most glamorous feature is that the current instruction being executed is highlighted in yellow in the memory display, so you can keep track more easily.

Development on the debugger could go really far: show functions, data regions, better instruction breakdown, smartly recognising when a block of code was generated from a pseudo instruction, etc. One massive drawback now is that since input for the debugger comes through the terminal, you cannot get real input into the running program. For this reason it would be great to put this in a webpage and control it over sockets. I’m open to discussion and pull requests for improving the logic and interface, so don’t hesitate to get in contact!

6.0 Further goals

Having gotten this far, I think the next logical step is writing a C compiler that can generate my assembly, or at the very least a really slimmed down version of C.

I have plans to at least allow for breakpoints in the debugger. It would also be great to also do some analysis on the instructions and colour regions of memory that refer to functions or other specific kinds of data. Perhaps best would be to have the debugger render into some kind of Web interface where all the control happens so that interfacing with the debugger doesn’t effect stdin, since debugging now completely gets in the way of that.

I’d also love to go back and remove the restriction that enforces 16 bit instructions so that the assembly can be a little more expressive and allow for the same instruction to handle values, registers, labels etc.

7.0 Appendix: Encoding and decoding with binary math

Two parts of this project are like 2 sides of the same coin: the encode instruction step in the assembler, and the decode step during a CPU cycle. If you’ve ever used C then you’re probably already familiar with this kind of thing, but otherwise here’s a a short overview of how it works.

7.1 Bit operations

Bitwise math is mostly the implementation of logical operations (AND, OR, NOT, XOR etc) on binary numbers. For example,  the AND operator:

`0111 & 1101 === 0101`

Because there is a 1 in both the first AND the second number, in the 1st and 3rd positions. Another example, this time with OR:

`0101 | 1010 === 1111`

Because for every bit, there is a 1 in the first number OR the second number.

Another type of bitwise math is bit shifting. This is an operation that takes a binary number and moves every bit either right or left by a specified amount. For example:

`0001 << 1 === 0010`

Shifts every bit to the left by 1. The `LSF` and `RSF` pseudo instructions allows for this kind of manipulation with the assembly language.

With these operations we already have all we need to build an instruction encoder and decoder, but there are a couple more that are still useful.

Not flips every bit in a number.

`~110011 === 001100`

Finally, Xor (Exclusive or) will result in a 1 if there is a one in the place of one number or the other, but not both.

```0 ^ 0 === 0
1 ^ 0 === 1
0 ^ 1 === 1
1 ^ 1 === 0```

7.2 Encoding

The encoding process takes each instruction in the program, consisting of an opcode (4 bits)  and some possible arguments – like source register, destination register (both 2 bits), and memory location or literal value (8-10 bits) – and concatenates them into a single 16 bit number. In order to create this number each part is shifted to the position it would start in, and then ORd together. Let’s take the example of `LDV`:

`LDV A, 0x155`

The encoder will break up this instruction into it’s components:

• opcode: 0010
• destination register: 00
• value: 0101010101

and then shift all these different numbers to their correct positions, ORing the whole thing together:

```instruction = (0010) | (00 << 4) | (0101010101 << 6)
= 0101010101000010```

Voila, a single number that represents the operation of loading a specific value into a specific register.

In the end, the binary format for executables is just an extremely simple list of every encoded instruction. It would be better to have some kind of header, where the first X bytes of the program described what it was and maybe some kind of checksum, but that adds complexity and the goal was to keep complexity at a minimum.

The actual encoding process uses named constants for the opcodes, registers, and shifting amounts which makes it a little easier to read.

7.3 Decoding

Decoding is only a little more complicated, since it also entails involving the AND operator to isolate parts of the instruction. In order to extract any part of this number, it can be ANDed with a bitmask that has a 1 in all the positions we want to isolate, and then shift the result to the right by it’s position (the reverse shift of the encoding process). So to extract the destination register of the `LDV` example above:

`destination register = (0101010101000010 & 0000000000110000) >> 4`

Just like in the encoder, there are constants and helper functions that make all this a little bit more human accessible.

If you made it all the way here, you might as well go and check out the code first hand. It’s hosted on github. If you have questions or want to contribute, that’s probably a good place to do it.