Levels of abstraction

I promise, I can explain. It started out rather simply, then got a little out of hand. A week or so later, I’m still having an immense amount of fun.

It all started when Google gave me an awesome Christmas present: An HTC Dream. It’s a very shiny mobile phone, and what’s more, it’s an unlocked developer edition. It’s hacking time!

This is where things get a bit complicated. Lemme take you through the reasoning.

My first idea when I saw this beast was to try to get emulators running on it. A phone is nice, but a phone that can play vintage games is even better. I decided on playing with the Sega Genesis first, as I have rather fond memories of Sonic the Hedgehog.

First obstacle: Android (the open source OS that Google develops) currently can run only Java code. There is currently no open source Genesis emulator written in Java. Most of them are written in C, or in extreme cases, even in x86 assembler. There is currently no official way to execute native code on the android phone. I’d like to make this software available to everyone.

I therefore need to write a Genesis emulator in Java.

Okay, that should be simple. The Genesis is a relatively old console, so it can’t be too elaborate. I mean, it’s no PS3. All I need is an emulator for the CPU, a decoder for the ROM format, and some audio and graphics hookups within Android, and I should be good to go.

Well, first, the Genesis has three processors. A Motorola 68000 CPU, a Z80 sound processor, and a custom made graphics processor. Let’s start with the 68k CPU. Apparently, there is no well known open source Java emulator for the 68k.

I therefore need to write a Motorola 68k emulator in Java.

But Java sucks. I mean, it’s obviously a successful language, but I find no pleasure at all programming in Java. It is pure pain without an IDE on the level of Eclipse or Netbeans, and those IDEs aggravate me in various ways. The the the language language language is is is way way way too too too verbose verbose verbose (verbose verbose).

Plus, after consulting the 68k specs and sampling a few C implementations, it looks like an extremely repetitive task: most opcodes have around 20 variants, depending on addressing modes and various flag bits. It would be very tedious to implement this by hand, not only because it’d be in Java, but because it’d be even more mind numbing and uninteresting. However, the kinds of variants that are needed are quite amenable to be described at a high level, leaving the repetitive task of actual implementation to a program. And I can use a language that I enjoy for that, say, Python.

I therefore need to write a Motorola 68k emulator generator in Python.

After a bit of prototyping, I came to the conclusion that implementing this in Python would also be rather tedious, for a variety of reasons. First, I started off badly by writing a generator that goes straight from high level description language to a Java source code string, mushing several levels of abstraction together. Second, the description needed to generate the variants quickly lead me to combinatorial explosions, or to independent components that began interacting with each other in hilarious ways. Not good.

Plus, one day, when Android does have a supported way of running native code, I’d probably want an emulator in C or C++, running on the CPU directly, instead of under the Dalvik virtual machine. At which point all my work will have been for nothing.

I therefore need something of an emulator compiler, that parses the high level description into an execution tree for the opcode implementations, which a code generator then translates into a variety of output languages, such as Java, C++ or Brainfuck.

Python is nice, but I don’t think that writing compilers is one of its fortes, despite what the PyPy folks appear to think. Manipulating the code representations is cumbersome at best, and the divide between the living Python code and the dead data it manipulates is rather wide. Manipulating code as data and vice versa is one of the often described merits of the Lisp family of programming languages. I’ve been wanting to get back to Common Lisp as a language and poke around with it more, and it feels like the ideal language in which to build a compiler. What could be more code-as-data-as-code than a program that takes apart a description of a program and puts it back together again in another form?

I therefore need to write an m68k emulator compiler in Common Lisp, at first targeting the Java programming language, and later possibly other languages.

And that is how, a week after Google gives me a mobile phone, I find myself writing Common Lisp code, for a compiler that compiles a lisp-like language into Java code, that will be compiled into Dalvik VM bytecode, running on an ARM-based embedded system, which when executed will emulate a Motorola 68000 CPU.

I feel like I’ve just had a Wikipedia attack. You know, that thing where you go to Wikipedia to look up something very specific, say how Tesla coils can be used to play music, and end up three hours later reading through an analysis of 14th century persian battle tactics, with no idea how you got there. That’s kinda how I felt when I came up for air yesterday and looked back. “So, I got a phone… And now I’m writing a compiler… I’m pretty sure I have a good reason…”

Oh, and the emulator compiler is starting to work. I was very rusty in Common Lisp, but in a couple of days of hacking and prototyping, I’m starting to get somewhere. I can already generate the implementation of the simplest variant of the 68k [tt]ADD[/tt] opcode. The “source” looks like this, with comments added

(instruction
   ;; Instruction name, with variant information.
   "add_dreg_to_dreg"

   ;; The description of the meaning of the 16 bits of the opcode.
   ((:literal 4 #b1101)      ; 4 constant bits, with the given binary value
    (output-register 3 dest) ; The output register, whose number is coded
                             ; over 3 bits. Its value is available in the
                             ; 'dest' variable.
    (:literal 6 #010000)     ; More constant bits, describing the addressing mode.
    (input-register 3 src))  ; Input register, similar declaration to dest.

   ;; How to perform this operation, in a subset of Common Lisp.
   (setf dest (+ src dest)))

The above instruction definition produces an opcode object that contains two things: information for the instruction decoder, so that it can identify this instruction, and the intermediate representation of the implementation of that instruction:

;; The constant bit values in the opcode, and the mask to test them,
;; for the instruction decoder.
(debug-print-opcode-mask the-above-opcode-object)

--> Output: 1101---010000---

;; The intermediate representation of the implementation.
(opcode-ast the-above-opcode-object)

--> (LET ((SRC (REGISTER-VALUE :DATA
                               (VM-OPCODE-BITS 3 0)))
          (DEST (WRITABLE-REGISTER-VALUE :DATA
                                         (VM-OPCODE-BITS 3 9))))
      (SETF DEST (+ DEST SRC)))

This opcode can then be fed to the Java code generator, to produce the output implementation:

public static void op_add_dreg_to_dreg(unsigned short opcode) {
    unsigned long src = mDataRegisters[(opcode >> 0) & 0x7];
    unsigned long dest = mDataRegisters[(opcode >> 9) & 0x7];
    dest = dest + src;
    mDataRegisters[(opcode >> 9) & 0x7] = dest;
}

There is still a lot to be done. For one, the ADD opcode is supposed to update the CPU’s state flags with information about the result of the addition. After that, the addressing modes other than to/from a numbered register must be supported. Implementing more opcodes will surely bring more things that need to be implemented.

Once a solid base is laid, a higher-still level of description must be layered on, so that all the variants of an instruction are produced from a single implementation definition. Once all that is done, a C++ backend would be nice. And why not attempt to generalize the compiler infrastructure, so as to support the compilation of emulators other than m68k CPUs?

By the time I get anywhere near that, I suspect that it will have become possible to easily write and release native code for Android, making all of my efforts unnecessary. But I don’t care, this is damn fun!

Some people are of the opinion that I should get my head examined.