Compiling Quake 3 Virtual Machines To JavaScript

Lately, I've spent most of my free time working on the QuakeJS project, which is a port of ioquake3 to the browser.

To accomplish the majority of the port, Emscripten is used to compile the actual C source directly to JavaScript.

For the dynamic libraries that ioquake3 links to (libc, OpenGL, SDL, etc.), Emscripten contains versions hand-written in JavaScript which are linked in and responsible for translating methods to their HTML5 counterparts (e.g. OpenGL calls -> WebGL calls).

Quake 3, like its predecessors, enabled users to easily alter and modify the code of the game. This could be used to make minor changes to gameplay, or to completely alter the face of the game in the case of total conversions such as Urban Terror:

Urban Teror

To enable these modifications, Quake 3 decoupled the core gameplay and UI logic from the engine into external libraries. These external libraries were compiled to QVM (Quake Virtual Machine) files which were dynamically loaded and executed at runtime inside of a managed environment.

For QuakeJS, I wanted to load these QVMs in order to support playing the numerous mods that've been created over the years.

Why Virtual Machines

Previously, Quake 2 shipped with the gamecode split out into libraries that were compiled down to native shared objects (e.g. .dll files on Windows or .so on Unix-like systems).

For Quake 3, virtual machines were instead used for two reasons: portability and security.

By compiling to a common byte code, mod authors would have to release one binary for their mod, not a binary for each platform they intended to support.

Further, by compiling to a common byte code that's executed in a managed environment, security measures can be enforced such as restricting memory access outside of a private address space, and disallowing potentially malicious calls to external libraries (e.g. to access the local file system).

These measures made it easy and safe for users to download and play mods without concern.

Compiling the byte code

Let's lay out a few definitions before moving on:

  • assembly - instructions represented as text-based assembler mnemonics and operands that offer a higher level representation of byte or machine code
  • byte code - instructions represented by octet data designed to be executed by an interpreter
  • machine code - instructions represented by octet data that are executed directly by the CPU

C -> LCC assembly

Quake 3's build process first uses the LCC compiler to compile each C file to LCC's intermediate representation, a stack-based assembly language.

As an example, let's look at the following C program which returns the binary complement of -14:

int main() {
  int a = -14;
  return ~a;
}

which in LCC assembly looks like:

code                ; directive for the assembler to start emitting instructions to the code segment
proc main 4 0       ; allocate 4 bytes of storage on the call stack
ADDRLP4 0           ; push address of first local to the operation stack
CNSTI4 -14          ; push constant -14 to the op stack
ASGNI4              ; pops op stack first for the value
                    ; pops op stack a second time for the address and assigns the value to it
ADDRLP4 0           ; push address of the first local to op stack
INDIRI4             ; pop op stack for an address
                    ; load 4-byte int from address
                    ; push loaded value back to the op stack
BCOMI4              ; pop op stack for a 4-byte int
                    ; take binary complement of popped value, push back to op stack
RETI4               ; pop value from stack, used as return value
endproc main 4 0    ; shrink call stack by 4 bytes, push return value to stack

LCC assembly -> QVM byte code

QVM

Afterwards, the custom q3asm assembler is used to compile the LCC assembly to the QVM object file.

During this process, assembler mnemonics (e.g. ASGNI4) are converted to opcodes, operands which are symbol references (e.g. ADDRGP4 bg_itemlist) will be resolved to absolute addresses and the results will be written to the appropriate segments.

This process is fairly straight forward, and when simplified looks something like this:

for (token in input) {
  if (token == "code" || token == "data" || token == "bss") {
    segment = token;
    continue;
  }

  int opcode = GetOpcode(token);                // translate the text-based mnemonic to
                                                // the appropriate byte value
  int[] operands = GetOperands(opcode, input);  // parse the operands for the current
                                                // opcode from the input. NOTE: if symbols
                                                // are encountered, they will be resolved
                                                // to their absolute memory address

  EmitByte1(segment, opcode);                   // emit a single byte for the opcode to
                                                // the current segment
  for (op in operands) {
    EmitByte4(segment, op);                     // emit a 4 byte value for each operand
                                                // to the current segment
  }
}

WriteSegment("code");
WriteSegment("data");
WriteSegment("bss");

If you're interested to read a complete overview of the QVM format, check out the work done by PhaethonH.

Executing the byte code

Originally, Quake 3 intended to run this byte code through an interpreter. An interpreter emulates a processor in an extremely straight-forward fashion:

  1. Get the instruction at the current instruction pointer
  2. Decode the instruction
  3. Execute the instruction
  4. Update the instruction pointer

Interpreters are easy to implement, but they're often much slower than the equivalent machine code.

Some work was done to ease the load on the interpreter by moving computationally expensive code to the engine. In the end however, it wasn't fast enough and Quake 3 changed to generating machine code from the byte code at runtime.

Code generators were created for several architectures including the x86, PowerPC and SPARC. For architectures where a code generator wasn't available, it would fallback to interpreting. According to John Carmack's .plan file, the performance of the generated machine code was fairly close to the performance of libraries generated by an actual compiler.

Running the original Quake 3 QVMs through the interpreter in QuakeJS actually produced pretty good results on the default timedemo:

1260 frames 13.4 seconds 94.1 fps 3.0/10.6/146.0/6.3 ms

However, for Quake 3 Fortress (a popular mod) who's gameplay code is much more complex, the results were much worse:

1858 frames 86.2 seconds 21.6 fps 22.0/46.4/241.0/30.3 ms

After seeing Quake 3 Fortress's poor performance, I wanted to try following convention by compiling the byte code ahead of time to the lowest level possible for our architecture, JavaScript.

Generating JavaScript

As I mentioned, Quake 3 already contains code generators for several architectures, so it's fairly easy to plug a new compiler into it:

QVM loading

The engine will handle parsing the QVM, allocating and copying the data, lit and bss segments into the heap and allocating space for the VM's call stack. It will then call into our custom VM_Compile and VM_CallCompiled with a pointer to the vm_t structure containing this info.

For our custom VM_Compile the general idea is to enumerate the instructions translating them to a giant string of JavaScript, eval it, and the return value of the eval will provide an entry point into the module for VM_CallCompiled to use.

There are a few caveats with this, as JavaScript doesn't provide conventions for some operations used by low-level byte code (e.g. jump instructions).

Handling the operation stack

The QVM byte code relies on an operation stack at runtime to handle moving data between operations. All instructions pop operands from this stack, and push their results to it. However, when converting to a high-level language such as JavaScript, the operation stack can be emulated at compile time as JavaScript can express multiple expressions in a single statement.

As an example, here is what adding a value with a constant and storing it back into a local variable looks like in LCC assembly:

ADDRLP4 4  ; push address of second local to op stack
ADDRLP4 0  ; push address of first local to op stack
INDIRI4    ; pop op stack, load value for address, push value to op stack
CNSTI4 15  ; push constant to op stack
ADDI4      ; pop two values from op stack, push result back to stack
ASGNI4     ; pop value from stack, pop address from stack, store value at address

When converting to JavaScript, this can be expressed as:

HEAP32[STACKTOP+4] = HEAP32[STACKTOP] + 15;

Handling Jumps

The byte code expects to be able to jump to arbitrary instructions within a function. Let's look at the following C code:

int a = 1;
int main() {
  if (a) {
    return 13;
  }
  return 0;
}

translated to LCC assembly:

proc main 0 0     ; instruction 0
ADDRGP4 a
INDIRI4           ; push the value from a to op stack
CNSTI4 0          ; push the constant 0 to op stack
EQI4 $2           ; pop two values from the op stack, if they're equal, jump to $2
CNSTI4 13
RETI4
ADDRGP4 $1
JUMPV
LABELV $2         ; instruction 9, target of above jump
CNSTI4 0
RETI4
endproc main 0 0

Note, once compiled to QVM byte code, the label symbols are converted to absolute instruction addresses. So instead of EQI4 $2, we'd actually see EQI4 9.

JavaScript doesn't have support for performing jumps to labeled instructions (e.g. C's goto) as required by EQI4, so we have to be a little clever. Borrowing the idea from Emscripten, the following JavaScript would be output:

// functions are named after their instruction offset
function fn0() {
  var label = 1;
  while (1) switch (label) {
    case 1:
      if (HEAP32[1024 /* address of a */] == 0) {
        label = 9;
        break;
      }
      return 13;

    case 9:
      return 0;
  }
}

The local variable label stores the jump destination, and the while(1) switch (label) {} mechanism is used to branch to it.

Handling The Call Stack

Currently, a single typed array buffer is used to store call stack information for the VM.

All function locals and parameters are stored to and loaded from this array. It'd be nice to rely more on the JavaScript engine's call stack for this information, but it's currently not possible for two reasons:

  1. You can't pass the address of a variable in JavaScript. However, by using a single typed array for locals, the index into the array can be used to address the variable.
  2. Currently, it's a requirement to be able to suspend / resume the VM. By maintaining our own call stack, this is very easy to do. If the JavaScript engine's call stack was relied on, there'd be no easy way to save this information upon suspension. NOTE: it may be possible to do this with the new yield operator

To illustrate this, I'll provide annotated output at each step, but let's start with the original C code:

int foobar(int a) {
  int b = 5;
  return a + b;
}

int main() {
  return foobar(7);
}

compiled to LCC assembly:

proc foobar 4 0         ; declare foobar with a 4 byte frame size
ADDRLP4 0               ; push address of first local to op stack
CNSTI4 5                ; push 5 to op stack
ASGNI4                  ; pop value from op stack, pop address from op stack, store value
ADDRFP4 0               ; push address of first argument to op stack
INDIRI4                 ; pop address from op stack, load, push result to op stack
ADDRLP4 0               ; push address of first local to op stack
INDIRI4                 ; pop address from op stack, load, push result to op stack
ADDI4                   ; pop two values from the op stack, add, push result to op stack
RETI4                   ; pop value from op stack, shrink frame size, push result to op stack
endproc foobar 4 0

proc main 4 4           ; declare main with a 8 byte frame size
CNSTI4 7                ; push 7 to op stack
ARGI4                   ; pop op value from stack, store value as first argument
ADDRLP4 0               ; push address of first local to op stack
ADDRGP4 foobar          ; push address of foobar to op stack
CALLI4                  ; pop address from call stack, call
ASGNI4                  ; pop value from op stack (return value), pop address from op stack, store value
ADDRLP4 0               ; push address of first local to op stack
INDIRI4                 ; pop address from op stack, load, push result to op stack
RETI4                   ; pop value from op stack, shrink frame size, push result to op stack
endproc main 4 4

and finally, once compiled to JavaScript:

var buffer = new ArrayBuffer(1024);
var HEAP32 = new Int32Array(buffer);
var STACKTOP = 1024;

// NOTE the stack here grows down

// foobar
function fn0() {
  var label = 0;
  while (1) switch (label) {
    case 0:
      // increase the stack size by the frame size (+8 bytes padding for return info)
      STACKTOP -= 12;
      // store 5 into local 0
      HEAP32[STACKTOP+8>>2]=5;
      // return, shrinking the stack and leaving the result of arg 0 + local 0 on top of the stack
      HEAP32[STACKTOP+8>>2]=HEAP32[STACKTOP+20>>2])+(HEAP32[STACKTOP+8>>2];
      STACKTOP += 12;
      return;
  }
}

// main
function fn10() {
  var label = 10;
  while (1) switch (label) {
    case 10:
      STACKTOP -= 16;
      // return information (current function and the next instruction)
      // used for suspend / resume functionality
      HEAP32[STACKTOP+0>>2]=10;
      HEAP32[STACKTOP+4>>2]=16;
      // store 8 in arg 0
      HEAP32[STACKTOP+8>>2]=8;
      fn0();
    case 16:
      // store return value of fn0 into local 0
      HEAP32[STACKTOP+12>>2]=HEAP32[STACKTOP-4>>2];
      // return, shrinking the stack frame and leaving local 0 on top of the stack
      HEAP32[STACKTOP+12>>2]=HEAP32[STACKTOP+12>>2];
      STACKTOP += 16;
      return;
  }
}

Performance Results

Original Quake 3

interpreter  1260 frames 13.4 seconds 94.1 fps 3.0/10.6/146.0/6.3 ms
compiled     1260 frames 11.4 seconds 110.1 fps 4.0/9.1/129.0/4.8 ms

The compiled code here runs about ~15% faster. It's not a huge improvement, but the original Quake 3 game code was pretty far removed from the heavy lifting.

Quake 3 Fortress

interpreter  1858 frames 86.2 seconds 21.6 fps 22.0/46.4/241.0/30.3 ms
compiled     1858 frames 21.9 seconds 84.9 fps 7.0/11.8/145.0/5.7 ms

When benching the Quake 3 Fortress mod however, the compiled code starts to shine. Here, the compiled code runs nearly 400% faster.

In its current state, the generated code is far from perfect, and there are perhaps still some major gains that could come from making more use of the JavaScript engine's call stack as discussed above.

However, even as it is now, the results of compiling the code are worth the added complexity of the runtime compiler.

You can check out the compiler's source code on GitHub.


blog comments powered by Disqus