Compiler Sniffer

Overview
This software helps you write C compilers. Instead of starting from scratch, run this program on the target architecture using an existing (possibly commercial) compiler. It gives you a good starting point by showing how the existing compiler performs some of its basic translation (how it adds two integers together, etc.). The compiler sniffer is most useful with lcc, a retargetable compiler, because there is a one-to-one mapping between 'lcc node types' and 'compiler sniffer code snippets'.
The compiler sniffer works in three stages. In the first stage, it generates a large C file that tests a broad group of compiler functions, mostly arithmetic, with all possible data types. In the second stage, it runs a user-supplied compiler on the resulting C source file (using the "no optimization" and "output to assembly" options). In the third stage, it analyzes the resulting assembly file and formats the results neatly as a text or HTML file.
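The second stage can be sketched roughly as follows. This is a minimal illustration, not the sniffer's actual code: the class name CompilerRunner and the methods toArgs and run are hypothetical, and the sketch assumes the compiler command contains no quoted arguments or paths with spaces. The gcc command line is the one used in the inputs shown later on this page.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; names are illustrative and not taken from the sniffer source.
class CompilerRunner {

    // Split a command line such as "gcc -O0 -S sniff.c" on whitespace.
    // Assumption: no quoted arguments and no paths containing spaces.
    static List<String> toArgs(String commandLine) {
        return Arrays.asList(commandLine.trim().split("\\s+"));
    }

    // Run the compiler in workDir and return its exit code.
    static int run(String commandLine, File workDir)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(toArgs(commandLine));
        pb.directory(workDir);
        pb.inheritIO(); // pass compiler diagnostics through to our own stdout/stderr
        return pb.start().waitFor();
    }

    public static void main(String[] args) throws InterruptedException {
        try {
            int exit = run("gcc -O0 -S sniff.c", new File("."));
            System.out.println("compiler exit code: " + exit);
        } catch (IOException e) {
            // Typically means the compiler executable could not be started.
            System.err.println("could not start compiler: " + e.getMessage());
        }
    }
}
```

A non-zero exit code would suggest that the compiler rejected part of the generated file, in which case there may be no assembly output to analyze in the third stage.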
Investigate / Download
Here is an example of the raw C file (from stage 1) - sniff.c
Here are some analysis examples for several architectures and compilers (from stage 3) - [Sparc/CC] [Sparc/gcc] [Intel x86/gcc]
The complete source code. This is organized as a JBuilder project. You can download a free version of JBuilder from the JBuilder website - sniffer.zip
More Information
The compiler sniffer is written in Java. You supply the size (in bytes) of each of the C types (an int is 4 bytes, etc.) as well as a few other options. The array of sizes corresponds to the following types, in order: "char", "short", "int", "long", "long long", "float", "double", "long double", "int *". Here is an example of the inputs supplied to the program. Just to clarify, this sizes[] array means: a "char" is one byte, a "short" is two bytes, an "int" is four bytes, a "long" is four bytes, a "long long" is eight bytes, etc.
// inputs
public static int sizes[] = { 1, 2, 4, 4, 8, 4, 8, 16, 8 };
public static final String nodeDelimiter = "rasta";
public static File sourceFile = new File("./sniff.c");
public static File assembledFile = new File("./sniff.s");
public static String compilerExecutionString = "gcc -O0 -S sniff.c";
First, the program generates a large C source file that contains code snippets for each of the lcc node types. For example, here are the source snippets for the 2-argument ADD arithmetic node types:
=== snip =====================================
{
float op1;
float op2;
float result;
rasta();
result = op1 + op2;
rasta();
}
{
double op1;
double op2;
double result;
rasta();
result = op1 + op2;
rasta();
}
{
long double op1;
long double op2;
long double result;
rasta();
result = op1 + op2;
rasta();
}
{
int op1;
int op2;
int result;
rasta();
result = op1 + op2;
rasta();
}
{
long long op1;
long long op2;
long long result;
rasta();
result = op1 + op2;
rasta();
}
{
int * op1;
int * op2;
int * result;
rasta();
result = op1 + op2;
rasta();
}
{
int op1;
int op2;
int result;
rasta();
result = op1 + op2;
rasta();
}
{
long long op1;
long long op2;
long long result;
rasta();
result = op1 + op2;
rasta();
}
=== snip =====================================
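Blocks like the ones above could be produced by a small generator along these lines. This is only a sketch under stated assumptions: the class SnippetGenerator and its methods binarySnippet and nodeName are hypothetical names, not the sniffer's real code. The node name format (operation, then a type letter such as F, I, U or P, then the size in bytes) follows lcc's naming, as in "ADDF8".

```java
// Hypothetical generator; names are illustrative and not taken from the sniffer source.
class SnippetGenerator {

    // Same delimiter as the nodeDelimiter input shown earlier.
    static final String NODE_DELIMITER = "rasta";

    // Emit one braced test block: declare the operands, then bracket the
    // statement under test with calls to the delimiter function so it can
    // be located again in the assembly output.
    static String binarySnippet(String cType, String operator) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\n");
        sb.append(cType).append(" op1;\n");
        sb.append(cType).append(" op2;\n");
        sb.append(cType).append(" result;\n");
        sb.append(NODE_DELIMITER).append("();\n");
        sb.append("result = op1 ").append(operator).append(" op2;\n");
        sb.append(NODE_DELIMITER).append("();\n");
        sb.append("}\n");
        return sb.toString();
    }

    // Build an lcc-style node name: operation + type letter + size in bytes.
    static String nodeName(String op, char typeLetter, int sizeInBytes) {
        return op + typeLetter + sizeInBytes;
    }

    public static void main(String[] args) {
        // A "double" is 8 bytes in the sizes[] example, giving node ADDF8.
        System.out.println("/* " + nodeName("ADD", 'F', 8) + " */");
        System.out.print(binarySnippet("double", "+"));
    }
}
```

Keeping the delimiter calls immediately around the single statement under test means that, with optimization turned off, the assembly between the two calls should correspond to that statement alone.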
The program then compiles the source file to assembly language and roots through the assembly file, collecting snippets of assembly that map back to known lcc node types. All of this information is stored in a large data structure (a hash map that maps node type strings like "ADDF8" to the corresponding assembly language snippet). For example:
System.out.println( outputHash.get("ADDF4") );
would pump out:
ld [%fp-20], %f2
ld [%fp-24], %f3
fadds %f2, %f3, %f2
st %f2, [%fp-28]
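The collection step behind that hash map might look roughly like the sketch below. It is an illustration only, with several assumptions: the class AssemblySplitter and its split method are hypothetical names; any assembly line that mentions the delimiter (for example a call to rasta) is treated as a marker; the delimiter function is not itself defined in the compiled file; and the node names are supplied in the same order in which the snippets were generated. On targets with branch delay slots, the instruction following the opening call would also be captured and would need extra filtering.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical splitter; names are illustrative and not taken from the sniffer source.
class AssemblySplitter {

    // Collect the lines between each pair of delimiter markers and store
    // them under the next node name, in generation order.
    static Map<String, String> split(List<String> asmLines,
                                     List<String> nodeNames,
                                     String delimiter) {
        Map<String, String> out = new LinkedHashMap<>();
        StringBuilder current = null; // non-null while inside a delimited region
        int next = 0;                 // index of the node name for the next region
        for (String line : asmLines) {
            if (line.contains(delimiter)) {
                if (current == null) {
                    current = new StringBuilder();   // opening marker
                } else {
                    if (next < nodeNames.size()) {
                        out.put(nodeNames.get(next++), current.toString());
                    }
                    current = null;                  // closing marker
                }
            } else if (current != null) {
                current.append(line.trim()).append('\n');
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> asm = Arrays.asList(
            "\tcall rasta",
            "\tld [%fp-20], %f2",
            "\tld [%fp-24], %f3",
            "\tfadds %f2, %f3, %f2",
            "\tst %f2, [%fp-28]",
            "\tcall rasta");
        Map<String, String> outputHash = split(asm, Arrays.asList("ADDF4"), "rasta");
        System.out.print(outputHash.get("ADDF4"));
    }
}
```

Pairing snippets with node names by position keeps the sketch short, but it relies on every generated block surviving compilation; a more robust approach would need some way to identify each region independently.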