Heap Data ========= Let's add support for data structures! For this, we need heap allocation We already have support for two: .. code-block:: haskell data Ty = TNumber | TBoolean and we could add things like floats or chars, etc But these are all fixed-length structures! Unbounded Structures -------------------- Let's look at unbounded data structures like lists or trees - we need to put things on the heap Pairs ----- Let's add pairs. Constructing pairs: ``pair := (e0, e1)`` Accessing pairs: ``p[0], p[1]`` So how do we: - represent pairs in memory - construct pairs in assembly - access pairs in assembly? Representation ^^^^^^^^^^^^^^ For all our previous data types, they were always a single word on the stack or in a register. This doesn't work for pairs though. Pointers """""""" We represent a pair as a pointer to a block of 2 adjacent words of memory. For example, the structure ``(1, (2, 3))`` is stored on the heap as: .. code-block:: text literal 1 pointer a a: literal 2 literal 3 How do we tell the difference between a number and a pointer? We'll tag the last 3 bits. - if the LSB is 0, it's a number - if the 3 LSBs are 111, it's a bool - if the 3 LSBs are 001, it's a pointer This only leaves 29 bits for the address, but if we always allocate in 8-byte chunks, the 3 LSBs don't matter! Construction ^^^^^^^^^^^^ To construct a pair, we need to: - allocate a 2-word block and get the starting address from eax - copy the first value into ``[eax]``, and the second ``[eax+4]`` - tag the last bit of eax with 1 The resulting eax is the value of the pair (a pointer). Also, both elements of a pair need to be immediate. And pointers are immediates! Allocation ^^^^^^^^^^ The global register ``esi`` keeps track of where the next free block on the heap is so when we need a new block, we can: - copy ``esi`` into ``eax`` - set the last bit to 1 - increment ``esi`` by 8 (note: some padding might be required!) But this means we have no garbage collection! Example: .. code-block:: haskell let p = (3, (4, 5)), x = p[0], y = p[1][0], z = p[1][1] in x + y + z -- becomes, after ANFing let anf0 = (4, 5), p = (3, anf0), x = p[0] anf1 = p[1], y = anf1[0], z = anf1[1], anf2 = x + y in anf2 + z -- and the heap becomes 0: 0x8 -- literal 4 4: 0xA -- literal 5 8: 0x6 -- literal 3 c: 0x1 -- pointer to 0x0 -- and the stack (in the in clause) 18: 0xE -- anf2: literal 7 14: 0xA -- z: literal 5 10: 0x8 -- y: literal 4 0c: 0x1 -- anf1: pointer to 0x0 08: 0x6 -- x: literal 3 04: 0x9 -- p: pointer to 0x8 00: 0x1 -- anf0: pointer to 0x0 Accessing ^^^^^^^^^ To access pair elements (i.e. compile expressions like ``e[0]``): - check that ``e`` is an immediate that is a pointer - load ``e`` into ``eax`` - remove the tag bit from ``eax`` - copy the value in ``[eax]`` (or ``[eax+4]`` for snd) into ``eax`` Implementation ^^^^^^^^^^^^^^ First, in the C runtime, we need to allocate heap space and pass a heap pointer to the assembly. And also update ``print()`` to print pairs. .. code-block:: c int main(int argc, char** argv) { int* HEAP = calloc(HEAP_SIZE, sizeof(int)); int result = our_code_starts_here(HEAP); // find this param and put it in esi! print(result); return 0; } // for printing, we need to recursively look until we find a primitive int isPair(int p) { return (p & 0x7) == 0x1; } void print(int val) { if (val & 0x00000001 ^ 0x00000001) { // val is a number printf("%d", val >> 1); } else if (val == 0xFFFFFFFF) { // val is true printf("true"); } else if (val == 0x7FFFFFFF) { // val is false printf("false"); } else if (isPair(val)) { int* valp = (int*) (val - 1); // extract address printf("("); print(*valp); // print first element printf(", "); print(*(valp + 1)); // print second element printf(")"); } else { printf("Unknown value: %#010x", val); } } Types """"" .. code-block:: haskell data Expr a = ... | Pair (Expr a) (Expr a) a | GetItem (Expr a) Field a data Field = First | Second -- dynamic types data Ty = TNumber | TBoolean | TPair -- register data Register = ... | ESI Transforms """""""""" Now, our code has to: - initialize ``esi`` - construct pairs - access pairs The latter two are handled by the AST. **Initialize ESI** .. code-block:: haskell prelude :: [Instruction] prelude = [ IMov (Reg ESI) (RegOffset 4 ESP), -- copy param off stack IAdd (Reg ESI) (Const 8), -- adjust to ensure 8-byte aligned IAnd (Reg ESI) (HexConst 0xfffffff8) ] **Construct** .. code-block:: haskell type Asm = [Instruction] compileExpr env (Pair v1 v2) = pairAlloc -- 1. allocate pair ++ pairCopy First (immArg env v1) -- 2. copy values ++ pairCopy Second (immArg env v2) ++ setTag EAX TPair -- 3. tag pointer pairAlloc :: Asm pairAlloc = [ IMov (Reg EAX) (Reg ESI), -- copy free addr into eax IAdd (Reg ESI) (Const 8) ] -- increment esi by 8 pairCopy :: Field -> Arg -> Asm pairCopy fld a = [ IMov (Reg EBX) a, IMov (pairAddr fld) (Reg EBX) ] -- the field's slot is either [eax] or [eax+4] depending on whether it's the fst or snd field pairAddr :: Field -> Arg pairAddr fld = Sized DWordPtr (RegOffset (4 * fieldOffset) EAX) fieldOffset :: Field -> Int fieldOffset First = 0 fieldOffset Second = 1 setTag :: Register -> Ty -> Asm setTag r ty = [ IAdd (Reg r) (typeTag ty) ] typeTag :: Ty -> Arg typeTag TNumber = HexConst 0x0 typeTag TBoolean = HexConst 0x7 typeTag TPair = HexConst 0x1 **Access** .. code-block:: haskell compileExpr env (GetItem e fld) = assertType env e TPair -- 1. check that e is a (pair) pointer ++ [ IMov (Reg EAX) (immArg env e) ] -- 2. load pointer into eax ++ unsetTag EAX TPair -- 3. remove tag bit to get address ++ [ IMov (Reg EAX) (pairAddr fld) ] -- 4. copy value from resp. slot to eax unsetTag :: Register -> Ty -> Asm unsetTag r ty = [ ISub (Reg r) (typeTag ty) ] N-ary Tuples ------------ How do we generalize pairs to allow for tuples of arbitrary size? Well, we can store them like Haskell lists ``(first, rest)`` Construction ^^^^^^^^^^^^ .. code-block:: haskell def tup3(x1, x2, x3): (x1, (x2, x3)) -- and so on... -- although, we should really encode them like Haskell lists and not like this. Accessing ^^^^^^^^^ .. code-block:: haskell -- get.egg def get(t, i): if i == 0: t[0] else: if istuple(t[1]): get(t[1], i-1) else: false -- or some error Constructing Lists ^^^^^^^^^^^^^^^^^^ .. code-block:: haskell def empty(): false def cons(h, t): (h, t) -- so, [1,2,3,4] becomes cons(1, cons(2, cons(3, cons(4, empty())))) Accessing Lists ^^^^^^^^^^^^^^^ .. code-block:: haskell def isEmpty(l): l == empty() def head(l): l[0] def tail(l): l[1] Now, we can do things like: .. code-block:: haskell def range(i, j): if i < j: cons(i, range(i + 1, j)) else: empty() def sum(xs): if isEmpty(xs): 0 else: head(xs) + sum(tail(xs))