Numbers, UnOps, Vars¶
- Let’s write:
a compiler that translates a string into assembly
a runtime to execute it
Adder-1¶
We have numbers!
The Runtime¶
#include <stdio.h>
extern int our_code() asm("our_code_label");
int main (int argc, char** argv) {
int result = our_code();
printf("%d\n", result);
return 0;
}
The main function calls our_code and prints its return value, where the return value is
the value stored in register EAX (per C calling convention) after running the assembly starting at label
our_code_label.
Testing the Runtime
A really simple example: Given the source program 42, we want to compile an assembly file fourty_two.s that
looks like:
section .text
global our_code_label
our_code_label:
mov eax, 42
ret
Assembling
$ nasm -f macho -o fourty_two.o fourty_two.s
$ clang -g -m32 -o fourty_two.run fourty_two.o main.c
$ fourty_two.run
42
The Compiler¶
Step 1: Types¶
Text -Parse> AST -CodeGen> ASM (.s)
The first step is to model the problem domain using types.
Let’s call the input text Text, the AST Expr, and the ASM its own type.
A Text is a string.
An Expr is a tree structure:
data Expr = Number Int
-- we'll add more later!
ASM
- An
Asmprogram is a list of instructions which can: create a
Labelmove an
Arginto aRegisterReturnto the runtime.
type Asm = [Instruction]
data Instruction
= ILabel Text
| IMov Arg Arg
| IRet
-- and:
data Register
= EAX
data Arg
= Const Int -- literal number
| Reg Register -- register
Step 2: Transforms¶
parse :: Text -> Expr
parse = parseWith expr -- built in to parser.hs
where
expr = integer
compile :: Expr -> Asm
compile (Number n) =
[ IMov (Reg EAX) (Const n)
, IRet
]
asm :: Asm -> Text
asm is = L.intercalate
"\n" [instr i | i <- is]
-- where: (stringifiers)
instr :: Instruction -> Text
instr (IMov a1 a2) =
printf "mov %s, %s" -- String -> String
(arg a1) (arg a2)
arg :: Arg -> Text
arg (Const n) = printf "%d" n
arg (Reg r) = reg r
reg :: Register -> Text
reg EAX = "eax"
Note
We have 4 functions that crunch types to the text representation of x86:
asm :: Asm -> Text
instr :: Instruction -> Text
arg :: Arg -> Text
reg :: Register -> Text
Let’s write an overloaded function using typeclasses:
class ToX86 a where
asm :: a -> Text
instance ToX86 Asm where
asm is = L.intercalate "\n" [instr i | i <- is]
instance ToX86 Instruction where
asm (IMov a1 a2) = printf "mov %s, %s" (asm a1) (asm a2)
asm IRet = "ret"
instance ToX86 Arg where
asm (Const n) = printf "%d" n
asm (Reg r) = reg r
instance ToX86 Register where
asm EAX = "eax"
Adder-2¶
Let’s add incrementing!
add1(7), add1(add1(42)), etc.
Examples¶
add1(7)
- In English:
move
7intoeaxadd
1toeax
in ASM:
mov eax, 7
add eax, 1
Now we have to handle the add instruction!
add1(add1(42))
mov eax, 42
add eax, 1
add eax, 1
Note
We have to write the compiler in a compositional manner:
generate
Asmfor each subexpression independentlygenerate
Asmfor each superexpression assuming the value of the subexpression is ineax
Types¶
data Expr
= Number Int
| Add1 Expr
data Instruction
= ILabel Text
| IMov Arg Arg
| IRet
| IAdd Arg Arg
Now our examples are:
src1 = "add1(7)"
exp1 = Add1 (Number 7)
asm1 = [ IMov (EAX) (Const 7)
, IAdd (EAX) (Const 1)
]
src2 = "add1(add1(42))"
exp2 = Add1 (Add1 (Number 42))
asm2 = [ IMov (EAX) (Const 42)
, IAdd (EAX) (Const 1)
, IAdd (EAX) (Const 1)
]
Parser¶
parse :: Text -> Expr
parse = parseWith expr
expr :: Parser Expr
expr = try primExpr
<|> integer
primExpr :: Parser Expr
primExpr = Add1 <$> rWord "Add1" -- something, missing notes
Transformer¶
instance ToX86 Instruction where
asm (IMov a1 a2) = printf "mov %s, %s" (asm a1) (asm a2)
asm (IAdd a1 a2) = printf "add %s, %s" (asm a1) (asm a2)
asm IRet = "ret"
Compile¶
compile :: Expr -> Asm
compile (Number n) =
[IMov (Reg EAX) (Const n)]
compile (Add1 e)
= compile e
++ [IAdd (Reg EAX) (Const 1)]
Adder-4¶
Numbers, Increment, Decrement, Local Vars
e.g. let x = add1(7), y = add1(x) in add1(y)
Examples¶
let x = 10 in x
Store 1 variable - x
let x = 10, y = add1(x), z = add1(y) in add1(z)
store 3 variables: x, y, z
let x = 10
, c = let b = add1(a)
in add1(b)
in add1(c)
We need to handle N values in limited registers!
Let’s look at memory!
Memory¶
As the stack gets higher, memory addresses get lower
We get a bunch of 4-byte slots on the stack at offsets from the stack pointer:
EBP - 4 * 1, EBP - 4 * 2, etc
The i th stack variable lives at EBP - 4 * i, the previous stack frame lives at EBP, and the return address lives at EBP + 4.
EBP points to the bottom of the stack frame, ESP points to the top of the stack
So we need a mapping from source variables to stack positions.
So we maintain an Env that maps Id |-> StackPosition, so
let x = e1 in e2 adds x |-> i to Env where i is the current stack height.
Quiz: http://tiny.cc/cse110a-stackvar-ind -> B
Example:
let a = 1, -- []
c = -- [a |-> 1]
let b = add1(a)
in add1(b) -- [b |-> 2, a |-> 1]
in -- b is popped from stack and c assigned
add1(c) -- [c |-> 2, a |-> 1]
Strategy¶
Variable Use
To compile x given env, move [ebp - 4 * i] into eax
Variable Defn
- To compile
let x = e1 in e2, compile
e1usingenv(the result is stored ineax)move
eaxinto[ebp - 4 * i]push
xintoenvat positionicompile
e2usingenv'
Example
let x = 10
in add1(x)
-- ==
mov eax, 10
mov [ebp-4*1], eax
mov eax, [ebp-4*1]
add eax, 1
Types¶
Let’s extend the types!
type Id = Text
data Expr =
-- ...
| Let Id Expr Expr
| Var Id
data Arg =
-- ...
| RegOffset Int Reg -- [ebp - 4 * i] == RegOffset i (Reg EBP)
Environments¶
data Env = [(Id, Int)]
lookupEnv :: Env -> Id -> Maybe Int
lookupEnv [] x = Nothing
lookupEnv ((y, n):rest) x = if x == y
then Just n
else lookupEnv rest x
pushEnv :: Env -> Id -> (Int, Env)
pushEnv env x = (xn, env')
where
env' = (x, xn):env
xn = 1 + length env