Object File Format

This format describes a single object file. Each file must begin with a header containing the following values at the given byte offsets. The first value is a "magic number" that identifies the file as an object file; it is defined in util.Version.objMagicNumber.

Byte offset                                   Value
0                                             Magic number
4                                             Symbol table size (symTabSize)
8                                             Reference table size (refTabSize)
12                                            Data segment size (dataSegSize)
16                                            Text segment size (textSegSize)
20                                            Symbol table
20 + symTabSize                               Reference table
20 + symTabSize + refTabSize                  Data segment
20 + symTabSize + refTabSize + dataSegSize    Text segment
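
The offsets in this layout can be turned into a small reader. The following is a minimal sketch, not the Cebollita implementation: it assumes each header field is a 4-byte unsigned integer (as the 4-byte spacing of the offsets suggests), that sizes are counted in bytes, and that values are stored big-endian, which the format description does not state. The name split_object_file is hypothetical.

```python
import struct

# Assumption: five 4-byte unsigned integers, big-endian. The byte order is
# not given in the format description and may need to be changed to "<5I".
OBJ_HEADER = struct.Struct(">5I")


def split_object_file(blob: bytes) -> dict:
    """Slice an object file image into its sections using the header sizes."""
    magic, sym_size, ref_size, data_size, text_size = OBJ_HEADER.unpack_from(blob, 0)
    sections = {"magic": magic}
    offset = OBJ_HEADER.size  # 20: the symbol table follows the header directly
    for name, size in (
        ("symtab", sym_size),
        ("reftab", ref_size),
        ("data", data_size),
        ("text", text_size),
    ):
        sections[name] = blob[offset:offset + size]
        offset += size
    if offset > len(blob):
        raise ValueError("file is shorter than its header claims")
    return sections
```

A caller would compare the returned "magic" value against util.Version.objMagicNumber before trusting the remaining fields.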

Symbol Table

The symbol table contains only the global symbols defined in this module, represented as an ASCII string such as "foo 16 bar 128 baz 32". It maps each symbol to an offset into the data segment. (We should actually put all symbols in here.)
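
As an illustration, the symbol table string can be decoded into a mapping. This sketch assumes that tokens are separated by whitespace and that offsets are written in decimal, as in the example string; parse_symbol_table is a hypothetical name.

```python
def parse_symbol_table(raw: bytes) -> dict:
    """Decode an ASCII symbol table into a {name: data segment offset} mapping."""
    tokens = raw.decode("ascii").split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected alternating name and offset tokens")
    # Even positions hold names, odd positions hold their offsets.
    return {name: int(offset) for name, offset in zip(tokens[0::2], tokens[1::2])}


print(parse_symbol_table(b"foo 16 bar 128 baz 32"))
# {'foo': 16, 'bar': 128, 'baz': 32}
```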

Reference Table

The reference table is another symbol table, but it maps each symbol to a list of offsets into the text segment where references to that symbol occur. For example: "print 256 260 alloc 60 globalVariable 8"
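
Because each symbol is followed by a variable number of offsets, a reader has to tell names and offsets apart. The sketch below assumes whitespace-separated tokens, decimal offsets, and symbol names that never consist solely of digits; none of this is stated by the format, and parse_reference_table is a hypothetical name.

```python
def parse_reference_table(raw: bytes) -> dict:
    """Decode an ASCII reference table into a {name: [text segment offsets]} mapping."""
    refs = {}
    current = None
    for token in raw.decode("ascii").split():
        if token.isdigit():
            # A number is one more offset for the most recently seen symbol.
            if current is None:
                raise ValueError("offset appears before any symbol name")
            refs[current].append(int(token))
        else:
            # Anything else starts (or resumes) the offset list of a symbol.
            current = token
            refs.setdefault(current, [])
    return refs


print(parse_reference_table(b"print 256 260 alloc 60 globalVariable 8"))
# {'print': [256, 260], 'alloc': [60], 'globalVariable': [8]}
```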

Data Segment

The data segment consists of binary data representing the static data of this module.

Text Segment

The text segment consists of binary data representing the text (instructions) in this module. Some instructions contain offsets that have yet to be resolved, for example references to global variables; the linker fills these in.

Executable File Format

Cebollita executable files have the following format:

Byte offset         Value
0                   Magic number
4                   Text segment size
8                   Data segment size
12                  Stack size
16                  Heap size
20                  Entry point (first instruction to execute)
24                  Text segment
24 + textSegSize    Data segment

The first value is a "magic number", used to assert that this file is indeed an executable file. It is defined in util.Version.exeMagicNumber. The next five values are integers that describe the executable: the text and data segment sizes give the sizes of the respective segments. The stack and heap sizes are the requested maximum sizes of these two dynamic memory regions. The entry point is the offset into the text segment of the first instruction to execute.

The text segment holds the binary instructions, with all offsets fully resolved. Finally, the data segment holds the program's initialized static/global data.
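
The executable layout can be read in the same spirit as the object file header. This is a minimal sketch under stated assumptions: six 4-byte unsigned header fields, sizes counted in bytes, and big-endian byte order, which the format description does not specify. The name read_executable is hypothetical.

```python
import struct

# Assumption: six 4-byte unsigned integers, big-endian. The byte order is
# not given in the format description and may need to be changed to "<6I".
EXE_HEADER = struct.Struct(">6I")


def read_executable(blob: bytes) -> dict:
    """Split an executable image into header fields, text segment and data segment."""
    magic, text_size, data_size, stack_size, heap_size, entry = EXE_HEADER.unpack_from(blob, 0)
    text_start = EXE_HEADER.size         # 24: the text segment follows the header
    data_start = text_start + text_size  # 24 + textSegSize
    if data_start + data_size > len(blob):
        raise ValueError("file is shorter than its header claims")
    return {
        "magic": magic,
        "stack_size": stack_size,
        "heap_size": heap_size,
        "entry": entry,  # offset into the text segment, not into the file
        "text": blob[text_start:data_start],
        "data": blob[data_start:data_start + data_size],
    }
```

A loader built on this would check "magic" against util.Version.exeMagicNumber and begin execution at the instruction located "entry" bytes into the text segment.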