RTS4 XScript Engine

This blog post is going to be pretty boring, no pretty pictures, and its late, and I’m very tired while typing it, but it underpins a lot of the features and functionality available in AoM.

XScript powers a significant part of the data-driven systems in the BANG! engine. As mentioned earlier, random maps are created in this language, but the engine also uses XScripts to describe other routines, such as the start-up sequence, victory conditions, and artificial intelligence. The language and frameworks available are very object-oriented c-style, using functions to create and manipulate objects by passing their ID as the first parameter.

Three annoying issues I’ve found so far with the language is the lack of support for classes, the strong typing, and the strange scoping. Objects in the BANG! engine are created by calling the appropriate createX() function, and storing the ID returned. The object can be interacted with by passing this ID as the first parameter of other functions. Dynamic, or even non-primitive types are not supported, and blocks do not indicate scope; a new scope is only created inside functions.

Parsing

The simplicity of this language makes parsing relatively easily, though there were some strange bits of behavior that had to be accounted for.

  • for loops must iterate one integer variable, this variable must not be declared and is implicitly declared by the for statement
  • it uses the strange syntax where “void” can be specified in the arguments of a function declaration when there are no arguments.

Originally I tokenised the code, then built an abstract syntax tree, ready to be passed to the compiler to select instructions, however after building this system, I realised there was a lot of data duplication; the set of valid tokens which may appear after a given token was required both in the tokenisation pass and in the AST generation. Removing data duplication is one programming rule I always stick with, as I find it significantly improves maintainability and reduces bugs in code, so the system was rewritten to do this tokenisation/AST generation in a single pass.

To achieve this in a flexible and reliable way, a list of token types is fed into the compiler (things like number, string, literal, controlFlow, type), and set of grammar rules (statement -> if | for | flow | funcDeclr…, if -> “if” & “(” & condition & “)”…, condition -> expression…). These grammar rules contain in the possible tokens which may appear after a given token, and are able to take into consideration the context of the parse (ie. ‘-‘ can be treated as a unary operator or subtraction, “if” can be treated as a control flow instruction, or a variable). The parser starts by attempting to parse the given source code as a block. This method is then called recursively to process statements in the source code bit-by-bit. Each rule used is then pushed to a list before processing child-rules, and popped off when parsing of the given rule is complete.

Generating pseudo-instructions

For each rule, the system calls a special piece of code to gather required input from the source code and add instructions to another list representing the required behavior. The instructions generated by sub-rules is passed to parent rules when they begin running their special routine, so that they can organise sub instructions (ie. push both literals before running the addition instruction) and extract required meta-information (ie. where the condition of an if statement ends, and the code begins)

Generating VM-instructions

After a list of instructions has been generated, their types must be resolved, and the appropriate VM instruction selected. This is done by creating a virtual stack, and pushing or popping types required by instructions as iterating over each instruction. Doing this is not flexible and most would consider ugly and unreliable, though for the given cases, it works well, and most of the time, offers the expected types on the stack, if the code were to be run, by abusing the fact that code is typically sequential. A set of VM instructions is looked up by name based on the current instructions name, and of the matches, the best VM instruction to match the types expected on the stack is selected.

As this process treats the instructions as if they were always executed sequentially, it has problems past forks and jump instructions, however fortunately, almost all of the time, the stack will be balanced before and after a conditional jump section.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s