K31 Compilers | |||||||||||||||||||||||||||
Spring Semester 2015 | |||||||||||||||||||||||||||
|
Course ProjectDesign and implementation of a compiler for the MiniJava language (a small subset of Java) To implement the compiler you will use the tools JavaCC and JTB The implementation for phases 2 and 3 of the project will be done in Java utilizing the visitor pattern
Homework 4 - Static Analysis and OptimizationsIn your final project, you will implement static program analysis algorithms over the Spiglet intermediate language program. The analysis will compute all the information needed to perform optimizations, but you do not need to actually optimize the Spiglet code. Specifically, you will implement 5 analyses:
You do not have to implement your own data-flow framework to express these analyses. Instead, you will use an implementation of the Datalog language called IRIS. This is probably the easiest way to express general fixpoint algorithms without having to worry about the details of their evaluation. You can see the IRIS Datalog syntax here. You can also use the IRIS online demo and test your Datalog programs there. That is, your visitors over the Spiglet code will emit tuples for the following relations:
For instance: instruction("A_foo", 1, "MOVE TEMP 2 HALLOCATE 12"). instruction("A_foo", 2, "MOVE TEMP 3 TEMP 2"). instruction("A_foo", 3, "MOVE TEMP 4 10").
For instance: var("A_foo", "TEMP 2"). var("A_foo", "TEMP 3"). var("A_foo", "TEMP 4").
For instance: next("A_foo", 1, 2).
For instance: varMove("A_foo", 2, "TEMP 3", "TEMP 2").
For instance: constMove("A_foo", 3, "TEMP 4", 10).
For instance: varUse("A_foo", 2, "TEMP 2").
For instance: varDef("A_foo", 1, "TEMP 2"). varDef("A_foo", 2, "TEMP 3"). varDef("A_foo", 3, "TEMP 4"). If you see a need, you can modify the aforementioned schema in order to express additional information regarding labels e.g., you can add additional fields to existing relations or introduce new relations to the schema. But every such change should be well-justified. The current schema should be sufficient for all your analyses. The above relations are the input to the Datalog code that computes your analyses. The Datalog code should be just a few rules per analysis, but you need to think about them carefully to ensure they are correct. Example Datalog computations: cJumpInstruction(?m, ?i) :- next(?m, ?i, ?j), next(?m, ?i, ?k), ?i+1 = ?k, ?j != ?k. I.e., an instruction is a (non-trivial) CJUMP iff its possible next instructions are its immediate next in program order and a different instruction. jumpInstruction(?m, ?i) :- next(?m, ?i, ?j), ?i+1 = ?k, ?j != ?k, !next(?m, ?i, ?k). I.e., an instruction is a (non-trivial) JUMP iff it has a next instruction that is not its immediate next in program order. (Note that this elides trivial jumps, which jump to the immediately next instruction. These are not real jumps for program analysis purposes.) Bonus part (20% extra credit): As a bonus part to this project you can apply the optimizations identified by your analyses to actual Spiglet code. That is, given an input Spiglet file, your program should apply the necessary transformations and produce an optimized Spiglet file based on the optimization opportunities you discovered. You will receive more examples of Datalog analyses in lecture and lab. Homework 3 – MiniJava Generating intermediate code
In this part of the project you have to write visitors that convert MiniJava code into an intermediate language which we are going to call Spiglet. You can see most of the elements of this language by reading its grammar (BNF form, JavaCC form). Below is some explanation and elaboration:
If you do not remember or haven't seen how a virtual table (v-table) is constructed, essentially it forms a table structure pointed by the first 4 bytes of an object and defines an offset for each function. Consider a function foo in position 0 and bar in position 1 of the table (with actual offset 4). If a method is overridden, the overriding version is inserted in the same location of the virtual table as the overridden version. Virtual calls are implemented by finding the address of the function to call through the virtual table. If we wanted to depict this in C, imagine that object obj is located at location x and we are calling foo that is in the 3rd position of the v-table. The function that is going to be called is in memory location x[0][12]. Many things that this description may lack can be found by reading the grammar of Spiglet. It is small enough and understandable. You can also understand the form of Spiglet by examining the examples given here (corresponding to the earlier MiniJava examples from HW2). You can use this program in order to "pretty-print" the Spiglet code you produce, for readability(*). To ensure that your output is legal Spiglet code, you must use this parser by typing "java -jar spp.jar < [Input-program]". If the program is syntactically legal Spiglet, you will receive the message "Program parsed successfully". Additionally you can use this interpreter to "run" your Spiglet code and see if it produces correct results. (The interpreter is more permissive than the parser: it misses several restrictions of Spiglet.) The correct results can be seen by comparing the output that javac and java produce with the output you get from executing this Spiglet interpreter on your output code. Your program should run as follows: java [MainClassName] [file1.java] [file2.java] ... [fileN.java] That is, your program must compile to Spiglet all .java files given as arguments. Moreover, the outputs must be stored in files named file1.spg, file2.spg, ... fileN.spg respectively. (*) run: java -jar pretty-printer.jar [file1.spg] [file2.spg] ... [fileN.spg] This command stores the outputs at file1-pretty.spg, file2-pretty.spg, etc. s S Homework 2 – MiniJava Static Checking (Semantic Analysis)This homework introduces your semester project, which consists of building a compiler for MiniJava, a subset of Java. MiniJava is designed so that its programs can be compiled by a full Java compiler like javac. Here is a partial, textual description of the language. Much of it can be safely ignored (most things are well defined in the grammar or derived from the requirement that each MiniJava program is also a Java program):
The MiniJava grammar in BNF can be downloaded here. You can make small changes to grammar, but you must accept everything that MiniJava accepts and reject anything that is rejected by the full Java language. Making changes is not recommended because it will make your job harder in subsequent homework assignments. Normally you won't need to touch the grammar. The MiniJava grammar in JavaCC form is here. You will use the JTB tool to convert it into a grammar that produces class hierarchies. Then you will write one or more visitors who will take control over the MiniJava input file and will tell whether it is semantically correct, or will print an error message. It isn’t necessary for the compiler to report precisely what error it encountered and compilation can end at the first error. But you should not miss errors or report errors in correct programs. The visitors you will build should be subclasses of the visitors generated by JTB, but they may also contain methods and fields to hold information during static checking, to transfer information from one visitor to the next, etc. In the end, you will have a Main class that runs the semantic analysis initiating the parser that was produced by JavaCC and executing the visitors you wrote. You will turn in your grammar file, if you have made changes, otherwise just the code produced by JavaCC and JTB alongside your own classes that implement the visitors, etc. and a Main. The Main should parse and statically check all the MiniJava files that are given as arguments. There will be a tutorial for JavaCC and JTB. You can use these files as MiniJava examples and to test your program. Obviously you are free to make up your own files, however the homework will be graded purely on how your compiler performs on all the files we will test it against (both the above sample files and others). You can share ideas and test files, but obviously you are not allowed to share code. Your program should run as follows: java [MainClassName] [file1] [file2] ... [fileN] That is, your program must perform semantic analysis on all files given as arguments. May the Force be with you! Homework 1 - Calculator grammarPart 1For the first part of this homework you should implement a simple calculator. The calculator should accept expressions with addition, subtraction, multiplication, and division operators, as well as parentheses. The grammar (for single-digit numbers) is summarized in: exp -> num | exp op exp | (exp) op -> + | - | * | / num -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 You need to change this grammar to support priority between addition and multiplication, to remove the left recursion for LL parsing, etc. This part of the homework is divided in two parts:
Part 2In the second part of this homework you will implement a parser and translator for a language supporting string operations. The language supports the concatanation operator over strings, function definitions and calls, conditionals (if-else i.e, every if must be followed by an else), and the following logical expressions:
All values in the language are strings. Your parser, based on a context-free grammar, will translate the input language into S-expressions, i.e., fully parenthesized expressions with the operator in the first position. You will use JavaCUP for the generation of the parser combined either with a hand-written lexer or a generated-one (e.g., using JFlex, which is encouraged). You will infer the desired syntax of the input and output languages from the examples below. The output language is a subset of Scheme so it can be interpreted by mit-scheme or online scheme interpreters like this, if you want to test your output. There is no need to perform type checking for the argument types or a check for the number of function arguments. You can assume that the program input will always be semantically correct. Note that extra parentheses are fine in the input syntax but not in the S-expression language. For instance "x" is not the same as "(x)" in the S-expression language: the latter means a call to function "x". Example #1Input: name() { "John" } name() surname() { "Doe" } fullname(first_name, sep, last_name) { first_name + sep + last_name } surname() fullname(name(), " ", surname()) Output (S-expressions): (define (name) "John") // function name is defined. This is a comment and not part of the output (name) // function name is called (define (surname) "Doe") (define (fullname first_name sep last_name) (string-append (string-append first_name sep) last_name)) (surname) (fullname (name) " " (surname)) Example #2Input: name() { "John" } repeat(x) { x + x } cond_repeat(c, x) { if (c = "yes") repeat(x) else x } cond_repeat("yes", name()) cond_repeat("no", "Jane") Output (S-expressions): (define (name) "John") (define (repeat x) (string-append x x)) (define (cond_repeat c x) (if (equal? c "yes") (repeat x) x)) (cond_repeat "yes" (name)) (cond_repeat "no" "Jane") Example #3Input: may_be_gerund(x) { if ("ing" in x) "yes" else "no" } repeat(x) { x + x } cond_repeat_gerund(c, x) { if (c = "yes") if (may_be_gerund(x) = "yes") repeat(x) else x else x } cond_repeat_gerund("yes", "running") cond_repeat_gerund("yes", "run") cond_repeat_gerund("no", "running") Output (S-expressions): (define (may_be_gerund x) (if (substring? "ing" x) "yes" "no")) (define (repeat x) (string-append x x)) (define (cond_repeat_gerund c x) (if (equal? c "yes") (if (equal? (may_be_gerund x) "yes") (repeat x) x) x)) (cond_repeat_gerund "yes" "running") (cond_repeat_gerund "yes" "run") (cond_repeat_gerund "no" "running") You can use the following (inefficient but elegant) definition of substring? to test the s-expression code generated for the in conditional operator: (define (substring? string1 string2) (let ((len1 (string-length string1)) (len2 (string-length string2))) (cond ((> len1 len2) #f) ((string=? string1 (substring string2 0 len1)) #t) (else (substring? string1 (substring string2 1 len2)))))) |