Branches and Binary Operators ============================= Let's add branches and binops! Branches -------- .. code-block:: haskell data Expr = -- ... | EIf Expr Expr Expr Examples ^^^^^^^^ Treat 0 as false, non-zero as true .. code-block:: haskell if 10: 22 else: sub1(0) -- 22 if sub1(1): 22 else: sub1(0) -- -1 **Quiz**: https://tiny.cc/cse110a-if3-ind -> 0 Assembly Control Flow ^^^^^^^^^^^^^^^^^^^^^ To compile branches, we will use assembly labels - landmarks from which execution can be started from or skipped to Also, comparisons: ``cmp a1, a2`` - performs a numeric comparison between the 2 args - stores the result in a special processor flag And jumps: ``jmp LABEL``, ``je LABEL``, ``jne LABEL`` - uses the result of the flag set by the most recent ``cmp`` **Quiz**: https://tiny.cc/cse110a-control-ind -> A Example: .. code-block:: haskell if 10: 22 else: 33 -- compiles to mov eax, 10 cmp eax, 0 je if_false if_true: mov eax, 22 jmp if_exit if_false: mov eax, 33 if_exit: Strategy ^^^^^^^^ to compile: .. code-block:: haskell if eCond: eThen else: eElse we will: 1. compile eCond 2. compare the result in eax to 0 3. jump if the result is zero to a special false label 4. otherwise continue into eTrue, then jump to exit label **What about multiple if-statements?** We need a way to generate unique labels for each branch, since assembly can't reuse labels Types ^^^^^ .. code-block:: haskell data Expr a = -- ... | EIf (Expr a) (Expr a) (Expr a) a We add tags of type ``a`` for each subexpression (e.g. source-position info for error messages) Let's define a name for ``Tag``: ``type Tag = Int`` and we now use: .. code-block:: haskell type BareE = Expr () type TagE = Expr Tag Now, extend the assembly: .. code-block:: haskell data Label = BranchFalse Tag | BranchExit Tag data Instruction = ... | ICmp Arg Arg | ILabel Label -- Create a label | IJmp Label -- Jump always | IJe Label -- Jump if equal | IJne Label -- Jump if not-equal Transforms ^^^^^^^^^^ Tags """" We don't want the programmer to put in tags, so let's add a tag step to our pipeline! .. code-block:: haskell > let e = parseStr "if 1: 22 else: 33" > e If (Number 1 ()) (Number 22 ()) (Number 33 ()) () > label e If (Number 1 ((),0)) (Number 22 ((),1)) (Number 33 ((),2)) ((),3) The key work is done by ``doTag i e``: 1. recursively walk over the ``BareE`` (e) starting tagging at counter i 2. return a pair ``(i', e')`` of the updated counter and tagged expr **Quiz**: https://tiny.cc/cse110a-tag-ind -> D We can now tag the entire program by calling doTag with the initial counter, and throwing away the final counter. CodeGen """"""" .. code-block:: haskell compile env (If eCond eTrue eFalse i) = compile env eCond ++ -- start with the input [ICmp (Reg EAX) (Const 0), -- test if it's false IJe (BranchFalse i)] ++ -- if it is, jump to the false label compile env eTrue ++ -- otherwise keep doing true [IJmp (BranchExit i)] ++ -- then leave [ILabel (BranchFalse i)] ++ -- the false label compile env eFalse ++ -- do the false [ILabel (BranchExit i)] -- the leave label Binary Operations ----------------- **Quiz**: http://tiny.cc/cse110a-sub-ind -> B Examples ^^^^^^^^ Numbers are easy: .. code-block:: haskell -- 33 - 10 mov eax, 33 sub eax, 10 -- 2 * 2 mov eax, 2 mul eax, 2 If either operand is a number, just use it as a literal But what about: .. code-block:: haskell let x = 10, y = 20, z = 30 in x + (y * z) -- == let x = 10, y = 20, z = 30, tmp = y * z in x + tmp -- == mov eax, 10 mov [ebp-4*1], eax -- x = 10 mov eax, 20 mov [ebp-4*2], eax -- y = 20 mov eax, 30 mov [ebp-4*3], eax -- z = 30 mov eax, [ebp-4*2] -- y mul eax, [ebp-4*3] -- y * z mov [ebp-4*4], eax -- tmp = y * z mov eax, [ebp-4*1] -- x add eax, [ebp-4*4] -- x + tmp So, if either operand is a variable, just grab it from the stack **Quiz**: https://tiny.cc/cse110a-nest-ind -> C But then what about ``(1 + 2) * (3 + 4)``? Both ``1`` and ``x`` (numbers and variables) are immediate operations - the operand values don't require any computation Administrative Normal Form (ANF) """""""""""""""""""""""""""""""""" An expression is in ANF if all primitive operations have immediate arguments **Quiz**: https://tiny.cc/cse110a-anf-ind -> C Not in ANF: ``(1 + 2) * (3 + 4)`` in ANF: .. code-block:: haskell let t1 = 1 + 2, t2 = 3 + 4 in t1 * t2 Strategy ^^^^^^^^ We can convert any expression to ANF by adding temporary variables for subexpressions, so.. we add a Normalize step to our code pipeline to make things ANF before we get to codegen ``Text =Parse> AST =Norm> ANF =Tag> ANF-Tag =CodeGen> ASM`` Types ^^^^^ .. code-block:: haskell data Prim2 = -- called prim2 since it takes 2 args Plus | Minus | Times data Expr a = ... | Prim2 Prim2 (Expr a) (Expr a) a -- so, 2+3 becomes Prim2 Plus (Number 2 ()) (Number 3 ()) () -- assembly: data Instruction = ... | IAdd Arg Arg | ISub Arg Arg | IMul Arg Arg ANF """ We can define a separate type for ANF, but that's boring so let's write a function that describes immediate code: .. code-block:: haskell isImm :: Expr a -> Bool isImm (Number _ _) = True isImm (Var _ _) = True isImm _ = False -- and isANF: isAnf :: Expr a -> Bool isAnf (Number _ _) = True isAnf (Var _ _) = True isAnf (Prim2 _ e1 e2 _) = isImm e1 && isImm e2 isAnf (If e1 e2 e3 _) = isAnf e2 && isAnf e3 isAnf (Let x e1 e2 _) = isAnf e2 Now, we define an **ANF expression** any ``Expr`` s.t. ``isAnf expr -> True``. .. code-block:: haskell type BareE = Expr () type AnfE = Expr () type AnfTagE = Expr Tag type ImmTagE = Expr Tag Compiling ^^^^^^^^^ ANF -> ASM """""""""" Going from ``AnfTagE`` to ``Asm`` is easy, since both operands for every binop will be an immediate. .. code-block:: haskell compile :: Env -> TagE -> Asm compile env (Prim2 o v1 v2) = [ IMov (Reg EAX) (immArg env v1), (prim2 o) (Reg EAX) (immArg env v2) ] prim2 :: Prim2 -> Arg -> Arg -> Instruction prim2 Plus = IAdd prim2 Minus = ISub prim2 Times = IMul immArg :: Env -> ImmTag -> Arg immArg _ (Number n _) = Const n immArg env (Var x _) = RegOffset ESP i where i = fromMaybe err (lookup x env) err = error "unbound variable" **Quiz**: https://tiny.cc/cse110a-anf2-ind -> E Bare -> ANF """"""""""" AKA **A-Normalization** The base cases are easy: .. code-block:: haskell anf (Number n) = Number n anf (Var x) = Var x But... .. note:: **Example 1**: ``1 + 2 + 3`` needs to become: .. code-block:: haskell let t1 = 1 + 2 in t1 + 3 **Example 2**: ``((1 + 2) + 3) + 4`` becomes .. code-block:: haskell let t1 = 1 + 2, t2 = t1 + 3 in t2 + 4 **Example 3**: ``((1 + 2) + 3) + ((4 + 5) + 6)`` becomes .. code-block:: haskell let t1 = 1 + 2, t2 = t1 + 3, t3 = 4 + 5, t4 = t3 + 6 in t2 + t4 So, let's write a helper function ``imm :: BareE -> ([(Id, AnfE)], ImmE)``, which returns: - a pair containing: - a list of pairs of ``ti, ai`` containing new temp vars bound to ANF exprs - an immediate value ``v`` - such that: .. code-block:: haskell let t1 = a1, ... in v Then, we can: 1. invoke ``imm`` on both the operands 2. concat the let bindings 3. apply the binop to the immediates .. code-block:: haskell anf (Prim2 o e1 e2) = lets (b1s ++ b2s) (Prim2 o (Var v1) (Var v2)) where (b1s, v1) = imm e1 (b2s, v2) = imm e2 lets :: [(Id, AnfE)] -> AnfE -> AnfE lets [] e = e lets ((x,e):bs) e' = Let x e (lets bs e') -- and lets and ifs are just recursive anf (Let x e1 e2) = Let x e1' e2' where e1' = anf e1 e2' = anf e2 anf (If e1 e2 e3) = If e1' e2' e3' where e1' = anf e1 e2' = anf e2 e3' = anf e3 -- imm imm :: BareE -> ([(Id, AnfE)], ImmE) imm (Number n l) = ( [], Number n l ) imm (Id x l) = ( [], Id x l ) imm (Prim2 o e1 e2) = ( b1s ++ b2s ++ [(t, Prim2 o v1 v2)] , Id t ) where t = makeFreshVar () (b1s, v1) = imm e1 (b2s, v2) = imm e2 imm e@(If _ _ _) = immExp e imm e@(Let _ _ _) = immExp e immExp :: AnfE -> ([(Id, AnfE)], ImmE) immExp e = ([(t, e')], t) where e' = anf e t = makeFreshVar () So what's up about ``makeFreshVar``? Well... we're going to have to pass around a counter. .. code-block:: haskell anf :: Int -> BareE -> (Int, AnfE) imm :: Int -> AnfE -> (Int, [(Id, AnfE)], ImmE) -- code not included. Check out the slides, page 20. fresh :: Int -> (Id, Int) fresh n = (n+1, "t" ++ show n)