Numbers, UnOps, Vars¶
- Let’s write:
a compiler that translates a string into assembly
a runtime to execute it
Adder-1¶
We have numbers!
The Runtime¶
#include <stdio.h>
extern int our_code() asm("our_code_label");
int main (int argc, char** argv) {
int result = our_code();
printf("%d\n", result);
return 0;
}
The main function calls our_code
and prints its return value, where the return value is
the value stored in register EAX
(per C calling convention) after running the assembly starting at label
our_code_label
.
Testing the Runtime
A really simple example: Given the source program 42
, we want to compile an assembly file fourty_two.s
that
looks like:
section .text
global our_code_label
our_code_label:
mov eax, 42
ret
Assembling
$ nasm -f macho -o fourty_two.o fourty_two.s
$ clang -g -m32 -o fourty_two.run fourty_two.o main.c
$ fourty_two.run
42
The Compiler¶
Step 1: Types¶
Text -Parse> AST -CodeGen> ASM (.s)
The first step is to model the problem domain using types.
Let’s call the input text Text
, the AST Expr
, and the ASM its own type.
A Text
is a string.
An Expr
is a tree structure:
data Expr = Number Int
-- we'll add more later!
ASM
- An
Asm
program is a list of instructions which can: create a
Label
move an
Arg
into aRegister
Return
to the runtime.
type Asm = [Instruction]
data Instruction
= ILabel Text
| IMov Arg Arg
| IRet
-- and:
data Register
= EAX
data Arg
= Const Int -- literal number
| Reg Register -- register
Step 2: Transforms¶
parse :: Text -> Expr
parse = parseWith expr -- built in to parser.hs
where
expr = integer
compile :: Expr -> Asm
compile (Number n) =
[ IMov (Reg EAX) (Const n)
, IRet
]
asm :: Asm -> Text
asm is = L.intercalate
"\n" [instr i | i <- is]
-- where: (stringifiers)
instr :: Instruction -> Text
instr (IMov a1 a2) =
printf "mov %s, %s" -- String -> String
(arg a1) (arg a2)
arg :: Arg -> Text
arg (Const n) = printf "%d" n
arg (Reg r) = reg r
reg :: Register -> Text
reg EAX = "eax"
Note
We have 4 functions that crunch types to the text representation of x86:
asm :: Asm -> Text
instr :: Instruction -> Text
arg :: Arg -> Text
reg :: Register -> Text
Let’s write an overloaded function using typeclasses:
class ToX86 a where
asm :: a -> Text
instance ToX86 Asm where
asm is = L.intercalate "\n" [instr i | i <- is]
instance ToX86 Instruction where
asm (IMov a1 a2) = printf "mov %s, %s" (asm a1) (asm a2)
asm IRet = "ret"
instance ToX86 Arg where
asm (Const n) = printf "%d" n
asm (Reg r) = reg r
instance ToX86 Register where
asm EAX = "eax"
Adder-2¶
Let’s add incrementing!
add1(7)
, add1(add1(42))
, etc.
Examples¶
add1(7)
- In English:
move
7
intoeax
add
1
toeax
in ASM:
mov eax, 7
add eax, 1
Now we have to handle the add
instruction!
add1(add1(42))
mov eax, 42
add eax, 1
add eax, 1
Note
We have to write the compiler in a compositional manner:
generate
Asm
for each subexpression independentlygenerate
Asm
for each superexpression assuming the value of the subexpression is ineax
Types¶
data Expr
= Number Int
| Add1 Expr
data Instruction
= ILabel Text
| IMov Arg Arg
| IRet
| IAdd Arg Arg
Now our examples are:
src1 = "add1(7)"
exp1 = Add1 (Number 7)
asm1 = [ IMov (EAX) (Const 7)
, IAdd (EAX) (Const 1)
]
src2 = "add1(add1(42))"
exp2 = Add1 (Add1 (Number 42))
asm2 = [ IMov (EAX) (Const 42)
, IAdd (EAX) (Const 1)
, IAdd (EAX) (Const 1)
]
Parser¶
parse :: Text -> Expr
parse = parseWith expr
expr :: Parser Expr
expr = try primExpr
<|> integer
primExpr :: Parser Expr
primExpr = Add1 <$> rWord "Add1" -- something, missing notes
Transformer¶
instance ToX86 Instruction where
asm (IMov a1 a2) = printf "mov %s, %s" (asm a1) (asm a2)
asm (IAdd a1 a2) = printf "add %s, %s" (asm a1) (asm a2)
asm IRet = "ret"
Compile¶
compile :: Expr -> Asm
compile (Number n) =
[IMov (Reg EAX) (Const n)]
compile (Add1 e)
= compile e
++ [IAdd (Reg EAX) (Const 1)]
Adder-4¶
Numbers, Increment, Decrement, Local Vars
e.g. let x = add1(7), y = add1(x) in add1(y)
Examples¶
let x = 10 in x
Store 1 variable - x
let x = 10, y = add1(x), z = add1(y) in add1(z)
store 3 variables: x, y, z
let x = 10
, c = let b = add1(a)
in add1(b)
in add1(c)
We need to handle N values in limited registers!
Let’s look at memory!
Memory¶
As the stack gets higher, memory addresses get lower
We get a bunch of 4-byte slots on the stack at offsets from the stack pointer:
EBP - 4 * 1, EBP - 4 * 2
, etc
The i
th stack variable lives at EBP - 4 * i
, the previous stack frame lives at EBP
, and the return address lives at EBP + 4
.
EBP
points to the bottom of the stack frame, ESP
points to the top of the stack
So we need a mapping from source variables to stack positions.
So we maintain an Env
that maps Id |-> StackPosition
, so
let x = e1 in e2
adds x |-> i
to Env
where i
is the current stack height.
Quiz: http://tiny.cc/cse110a-stackvar-ind -> B
Example:
let a = 1, -- []
c = -- [a |-> 1]
let b = add1(a)
in add1(b) -- [b |-> 2, a |-> 1]
in -- b is popped from stack and c assigned
add1(c) -- [c |-> 2, a |-> 1]
Strategy¶
Variable Use
To compile x
given env
, move [ebp - 4 * i]
into eax
Variable Defn
- To compile
let x = e1 in e2
, compile
e1
usingenv
(the result is stored ineax
)move
eax
into[ebp - 4 * i]
push
x
intoenv
at positioni
compile
e2
usingenv'
Example
let x = 10
in add1(x)
-- ==
mov eax, 10
mov [ebp-4*1], eax
mov eax, [ebp-4*1]
add eax, 1
Types¶
Let’s extend the types!
type Id = Text
data Expr =
-- ...
| Let Id Expr Expr
| Var Id
data Arg =
-- ...
| RegOffset Int Reg -- [ebp - 4 * i] == RegOffset i (Reg EBP)
Environments¶
data Env = [(Id, Int)]
lookupEnv :: Env -> Id -> Maybe Int
lookupEnv [] x = Nothing
lookupEnv ((y, n):rest) x = if x == y
then Just n
else lookupEnv rest x
pushEnv :: Env -> Id -> (Int, Env)
pushEnv env x = (xn, env')
where
env' = (x, xn):env
xn = 1 + length env