Numbers, UnOps, Vars

Slides

Let’s write:
  • a compiler that translates a string into assembly

  • a runtime to execute it

Adder-1

We have numbers!

The Runtime

#include <stdio.h>

extern int our_code() asm("our_code_label");

int main (int argc, char** argv) {
    int result = our_code();
    printf("%d\n", result);
    return 0;
}

The main function calls our_code and prints its return value, where the return value is the value stored in register EAX (per C calling convention) after running the assembly starting at label our_code_label.

Testing the Runtime

A really simple example: Given the source program 42, we want to compile an assembly file fourty_two.s that looks like:

section .text
global our_code_label
our_code_label:
    mov eax, 42
    ret

Assembling

$ nasm -f macho -o fourty_two.o fourty_two.s
$ clang -g -m32 -o fourty_two.run fourty_two.o main.c
$ fourty_two.run
42

The Compiler

Step 1: Types

Text -Parse> AST -CodeGen> ASM (.s)

The first step is to model the problem domain using types.

Let’s call the input text Text, the AST Expr, and the ASM its own type.

A Text is a string.

An Expr is a tree structure:

data Expr = Number Int
-- we'll add more later!

ASM

An Asm program is a list of instructions which can:
  • create a Label

  • move an Arg into a Register

  • Return to the runtime.

type Asm = [Instruction]

data Instruction
    = ILabel Text
    | IMov   Arg Arg
    | IRet

-- and:
data Register
    = EAX

data Arg
    = Const Int  -- literal number
    | Reg Register  -- register

Step 2: Transforms

parse :: Text -> Expr
parse = parseWith expr  -- built in to parser.hs
    where
        expr = integer

compile :: Expr -> Asm
compile (Number n) =
    [ IMov (Reg EAX) (Const n)
    , IRet
    ]

asm :: Asm -> Text
asm is = L.intercalate
    "\n" [instr i | i <- is]

-- where: (stringifiers)
instr :: Instruction -> Text
instr (IMov a1 a2) =
    printf "mov %s, %s"  -- String -> String
        (arg a1) (arg a2)

arg :: Arg -> Text
arg (Const n) = printf "%d" n
arg (Reg r)   = reg r

reg :: Register -> Text
reg EAX = "eax"

Note

We have 4 functions that crunch types to the text representation of x86:

asm :: Asm -> Text
instr :: Instruction -> Text
arg :: Arg -> Text
reg :: Register -> Text

Let’s write an overloaded function using typeclasses:

class ToX86 a where
    asm :: a -> Text

instance ToX86 Asm where
    asm is = L.intercalate "\n" [instr i | i <- is]

instance ToX86 Instruction where
    asm (IMov a1 a2) = printf "mov %s, %s" (asm a1) (asm a2)
    asm IRet         = "ret"

instance ToX86 Arg where
    asm (Const n) = printf "%d" n
    asm (Reg r)   = reg r

instance ToX86 Register where
    asm EAX = "eax"

Adder-2

Let’s add incrementing!

add1(7), add1(add1(42)), etc.

Examples

add1(7)

In English:
  • move 7 into eax

  • add 1 to eax

in ASM:

mov eax, 7
add eax, 1

Now we have to handle the add instruction!

add1(add1(42))

mov eax, 42
add eax, 1
add eax, 1

Note

We have to write the compiler in a compositional manner:

  • generate Asm for each subexpression independently

  • generate Asm for each superexpression assuming the value of the subexpression is in eax

Types

data Expr
    = Number Int
    | Add1   Expr

data Instruction
    = ILabel Text
    | IMov   Arg Arg
    | IRet
    | IAdd   Arg Arg

Now our examples are:

src1 = "add1(7)"
exp1 = Add1 (Number 7)
asm1 = [ IMov (EAX) (Const 7)
       , IAdd (EAX) (Const 1)
       ]

src2 = "add1(add1(42))"
exp2 = Add1 (Add1 (Number 42))
asm2 = [ IMov (EAX) (Const 42)
       , IAdd (EAX) (Const 1)
       , IAdd (EAX) (Const 1)
       ]

Parser

parse :: Text -> Expr
parse = parseWith expr

expr :: Parser Expr
expr = try primExpr
     <|> integer

primExpr :: Parser Expr
primExpr = Add1 <$> rWord "Add1" -- something, missing notes

Transformer

instance ToX86 Instruction where
    asm (IMov a1 a2) = printf "mov %s, %s" (asm a1) (asm a2)
    asm (IAdd a1 a2) = printf "add %s, %s" (asm a1) (asm a2)
    asm IRet         = "ret"

Compile

compile :: Expr -> Asm
compile (Number n) =
    [IMov (Reg EAX) (Const n)]
compile (Add1 e)
    = compile e
    ++ [IAdd (Reg EAX) (Const 1)]

Adder-3

Add the twice() function that doubles the internal function!

… in homework!

Adder-4

Numbers, Increment, Decrement, Local Vars

e.g. let x = add1(7), y = add1(x) in add1(y)

Examples

let x = 10 in x

Store 1 variable - x

let x = 10, y = add1(x), z = add1(y) in add1(z)

store 3 variables: x, y, z

let x = 10
    , c = let b = add1(a)
            in add1(b)
    in add1(c)

We need to handle N values in limited registers!

Let’s look at memory!

Memory

As the stack gets higher, memory addresses get lower

We get a bunch of 4-byte slots on the stack at offsets from the stack pointer:

EBP - 4 * 1, EBP - 4 * 2, etc

The i th stack variable lives at EBP - 4 * i, the previous stack frame lives at EBP, and the return address lives at EBP + 4.

EBP points to the bottom of the stack frame, ESP points to the top of the stack

So we need a mapping from source variables to stack positions.

So we maintain an Env that maps Id |-> StackPosition, so

let x = e1 in e2 adds x |-> i to Env where i is the current stack height.

Quiz: http://tiny.cc/cse110a-stackvar-ind -> B

Example:

let a = 1,               -- []
    c =                  -- [a |-> 1]
        let b = add1(a)
        in add1(b)       -- [b |-> 2, a |-> 1]
in                       -- b is popped from stack and c assigned
    add1(c)              -- [c |-> 2, a |-> 1]

Strategy

Variable Use

To compile x given env, move [ebp - 4 * i] into eax

Variable Defn

To compile let x = e1 in e2,
  • compile e1 using env (the result is stored in eax)

  • move eax into [ebp - 4 * i]

  • push x into env at position i

  • compile e2 using env'

Example

let x = 10
in add1(x)

-- ==

mov eax, 10
mov [ebp-4*1], eax
mov eax, [ebp-4*1]
add eax, 1

Quiz: http://tiny.cc/cse110a-let-ind

Types

Let’s extend the types!

type Id = Text

data Expr =
    -- ...
    | Let Id Expr Expr
    | Var Id

data Arg =
    -- ...
    | RegOffset Int Reg  -- [ebp - 4 * i] == RegOffset i (Reg EBP)

Environments

data Env = [(Id, Int)]

lookupEnv :: Env -> Id -> Maybe Int
lookupEnv [] x = Nothing
lookupEnv ((y, n):rest) x = if x == y
                                then Just n
                                else lookupEnv rest x

pushEnv :: Env -> Id -> (Int, Env)
pushEnv env x = (xn, env')
    where
        env' = (x, xn):env
        xn   = 1 + length env