Cebollita C-- Language Specification

C-- is a restricted subset of C. It is similar to C, although it has a weaker type system (if that is possible) and a restricted set of built-in types, operators, and control constructs. Its goal is to be simple enough that a student can understand and modify the compiler, while illustrating how high-level language features map to assembly language. It should, however, also be expressive enough to write reasonably interesting benchmark programs.

Below are the major features/bugs/differences from C:

Comments

Only // single-line comments are accepted. Currently, comments may not contain quoted strings (BUG).

Control Constructs

Only if and while are supported. Body clauses must be surrounded by curly braces, even when the body is a single statement. The following is not legal:
if (x == 10)          // syntax error!
  y = x;

while (x != 0)        // syntax error!
  x = x - 1; 
The above would need to be rewritten as:
if (x == 10) {
  y = x;
}

while (x != 0) {
  x = x - 1; 
}
Note that the commonly used else-if chain is also illegal in C--, since the else clause is followed directly by another if statement rather than by a body in curly braces:
if (foo) {
  // do one thing
}
else if (bar) {
  // do another thing
}
else {
  // do yet another thing
}
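One way to express such a chain legally is to nest the second if inside a braced else body. The following is a minimal sketch, not taken from the Cebollita sources: the function name classify and its result codes are hypothetical, and the explicit int return type is optional in C-- (it is written out only so the snippet also compiles as ordinary C).

```c
// Hypothetical example: returns 1 for x below zero, 2 for zero, 3 otherwise.
int classify(int x) {
  int kind = 0;
  if (x < 0) {
    kind = 1;
  }
  else {
    // the second test lives inside the braces of the else body
    if (x == 0) {
      kind = 2;
    }
    else {
      kind = 3;
    }
  }
  return kind;
}
```

Each additional branch adds one level of nesting, which is the price of the rule that every body is brace-enclosed.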

Datatypes

The only supported types are: char, int, char*, int*, arrays of chars, and arrays of ints. Typedefs and composite types (structures) are not allowed.
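As an illustration of working within these types, the sketch below sums an array of ints passed as an int*. It assumes that [] may be applied to a pointer parameter as it can in C, which this specification does not state explicitly; the name sumInts is hypothetical, and the int return type is optional in C--.

```c
// Hypothetical helper: adds up the first n elements of a.
int sumInts(int* a, int n) {
  int total = 0;
  int i = 0;
  while (i < n) {
    total = total + a[i];   // array reference is the only way elements are read here
    i = i + 1;
  }
  return total;
}
```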

Global variables

Global variables are supported. A variable that is defined globally in another file must be declared extern in any file that references it (as in regular C programs). Initializers are limited to literal data of the allowed types (just ints and strings, see below). Initializers may not be arbitrary expressions, only literal data. Here are some examples:
int i = -10;              // initialize i to -10
int j = 0x1234abcd;       // initialize j to 0x1234abcd
char newline = 10;        // a single character
char* msg = "hello";      // an initialized message
int* foobar = 24;         // a pointer to address 24
char buffer[20];          // a buffer of 20 chars...
int myInts[10];           // an array of 10 ints
extern int* intArray;     // an array defined elsewhere
extern char gCH;          // a character defined elsewhere
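Because an initializer must be a literal, a global whose starting value depends on other values has to be assigned at run time. A minimal sketch under that reading, with hypothetical names (gWidth, gHeight, gArea, initGlobals) and an optional int return type:

```c
int gWidth = 8;      // literal initializer: allowed
int gHeight = 4;     // literal initializer: allowed
int gArea = 0;       // an initializer of gWidth * gHeight would be an expression, so start at 0

// Hypothetical setup function, to be called once before gArea is used.
int initGlobals() {
  gArea = gWidth * gHeight;
  return gArea;
}
```

A program would typically call initGlobals() at the start of main.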

Literal Data

Only integers and strings may be expressed as literal data in C--. Integers may be signed decimal or hexadecimal (e.g. 0x0000FFA1). Strings are (as in C) text surrounded by double quotes ("foobar"). The only valid escape sequences in strings are \n, \t, and \0. Strings are allocated globally, and are of type char*.
main() {
  char* msg = "hello there\n";
  printString(msg);
  printInt(strlen(msg));
}
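Since there are no character literals, code that inspects a string compares its elements against integer codes, in the same way that the newline character was written as 10 earlier. The sketch below is a hypothetical string-length routine (myStrlen is an illustrative name, not the library's strlen); it assumes char* values can be indexed with [] and that strings end with a zero byte as in C.

```c
// Hypothetical helper: counts characters up to the terminating zero byte.
int myStrlen(char* s) {
  int n = 0;
  while (s[n] != 0) {   // 0 is the integer code of the NUL terminator
    n = n + 1;
  }
  return n;
}
```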

Functions

All functions are assumed to return values of type int. Function definitions do not require a return type. In fact, if one is given, it will be ignored.

Function calls are not type checked: the number and types of the arguments do not need to match the number and types of the parameters. For this reason, while external functions are allowed, they need not (and should not) be declared extern. This lack of inter-module checking obviates the need for header files. Cebollita neither requires nor supports them.

To return a value from a function, the reserved word return should be used.
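A small sketch of a function that returns a value follows. The name maxOf is hypothetical, and the example assumes that return may appear inside an if body as well as at the end of the function. The int return type could be dropped in C--, where it is ignored; it is kept so the snippet is also valid C.

```c
// Hypothetical helper: returns the larger of a and b.
int maxOf(int a, int b) {
  if (a > b) {
    return a;
  }
  return b;
}
```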

Operators

The following operators are supported: +, -, *, /, %, <<, >>, |, &, ^, ==, !=, >=, <=, >, <, [] (array reference).

NO OTHER OPERATORS (e.g. ++, --) are supported.
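Operators missing from the list can usually be expressed with the ones that are present. The sketch below shows two common substitutions, assuming that the logical operators && and ! are among those excluded (they do not appear in the supported list): a conjunction becomes a nested if, and an increment becomes an explicit addition. The names inRange and countInRange are hypothetical.

```c
// Hypothetical helper: 1 if lo <= x <= hi, else 0. Replaces (x >= lo && x <= hi).
int inRange(int x, int lo, int hi) {
  int result = 0;
  if (x >= lo) {
    if (x <= hi) {
      result = 1;
    }
  }
  return result;
}

// Hypothetical helper: counts the elements of a[0..n-1] that lie in [lo, hi].
int countInRange(int* a, int n, int lo, int hi) {
  int count = 0;
  int i = 0;
  while (i < n) {
    count = count + inRange(a[i], lo, hi);
    i = i + 1;            // stands in for i++
  }
  return count;
}
```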

Reserved words

The following words are reserved: if, else, while, return, extern.

A Simple Program

fact(int n) {
  int result = 1;
  while (n != 0) {
    result = result * n;
    n = n - 1;
  }  
  return result;
}

main() {
  int n = 9;
  int result = fact(n);
  printString("The factorial is: ");
  printInt(result);
}
The above program would need to be linked against a library that provides the I/O functions printString and printInt.