Mapping High-Level Constructs to LLVM IR

Table of Contents

Introduction

A Quick Primer

Some Useful LLVM Tools

Mapping Basic Constructs to LLVM IR

Global Variables

Local Variables

Constants

Constant Expressions

Size-Of Computations

Function Prototypes

Function Definitions

Simple Public Functions

Simple Private Functions

Functions with a Variable Number of Parameters

Exception-Aware Functions

Function Pointers

Casts

Bitwise Casts

Zero-Extending Casts (Unsigned Upcasts)

Sign-Extending Casts (Signed Upcasts)

Truncating Casts (Signed and Unsigned Downcasts)

Floating-Point Extending Casts (Float Upcasts)

Floating-Point Truncating Casts (Float Downcasts)

Pointer-to-Integer Casts

Integer-to-Pointer Casts

Address-Space Casts (Pointer Casts)

Incomplete Structure Types

Structures

Nested Structures

Unions

Structure Expressions

Getting a Pointer to a Structure Member

Mapping Control Structures to LLVM IR

Mapping Advanced Constructs to LLVM IR

Lambda Functions

Closures

Generators

Mapping Exception Handling to LLVM IR

Exception Handling by Propagated Return Value

Setjmp/Longjmp Exception Handling

Zero Cost Exception Handling

Resources

Mapping Object-Oriented Constructs to LLVM IR

Classes

Virtual Methods

Single Inheritance

Multiple Inheritance

Virtual Inheritance

Interfaces

Boxing and Unboxing

Class Equivalence Test

Class Inheritance Test

The New Operator

The Instance New Operator

The Array New Operator

Interoperating with a Runtime Library

Interfacing to the Operating System

How to Interface to POSIX Operating Systems

Sample POSIX "Hello World" Application

How to Interface to the Windows Operating System

Sample Windows "Hello World" Application

Resources

Epilogue

Appendix A: How to Implement a String Type in LLVM

Appendix B: Task List

Appendix C: Thank You

Introduction

In this document we will take a look at how to map various classic high-level programming language constructs to LLVM IR. The purpose of the document is to make the learning curve less steep for aspiring LLVM users.

For the sake of simplicity, we'll be working with a 32-bit target machine so that pointers and word-sized operands are 32-bits.

Also, for the sake of readability we do not mangle (encode) names. Rather, they are given simple, easy-to-read names that reflect their purpose. A production compiler for any language that supports overloading would generally need to mangle the names so as to avoid conflicts between symbols.

A Quick Primer

Here are a few things that you should know before reading this document:

  1. LLVM IR is not machine code, but sort of the step just above assembly.
  2. LLVM IR is highly typed so expect to be told when you do something wrong.
  3. LLVM IR does not differentiate between signed and unsigned integers.
  4. LLVM IR assumes two's complement signed integers so that say trunc works equally well on signed and unsigned integers.
  5. Global symbols begin with an at sign (@).
  6. Local symbols begin with a percent symbol (%).
  7. All symbols must be declared or defined.
  8. Don't worry that the LLVM IR at times can seem somewhat lengthy when it comes to expressing something; the optimizer will ensure the output is well optimized and you'll often see two or three LLVM IR instructions be coalesced into a single machine code instruction.
  9. If in doubt, consult the Language Reference. If there is a conflict between the Language Reference and this document, this document is wrong!
  10. All LLVM IR examples are presented without a data layout and without a target triple. You need to add those yourself, if you want to actually build and run the samples. Get them from Clang for your platform.

Some Useful LLVM Tools

The most important LLVM tools for use with this article are as follows:

Name Function Reads Writes Arguments
clang C Compiler .c .ll-c -emit-llvm -S
clang++ C++ Compiler.cpp .ll-c -emit-llvm -S
llvm-disDisassembler.bc .ll
opt Optimizer .bc/.llsame
llc IR Compiler .ll .s

While you are playing around with generating or writing LLVM IR, you may want to add the option -fsanitize=undefined to Clang/Clang++ insofar you use either of those. This option makes Clang/Clang++ insert run-time checks in places where it would normally output an ud2 instruction. This will likely save you some trouble if you happen to generate undefined LLVM IR. Please notice that this option only works for C and C++ compiles.

Mapping Basic Constructs to LLVM IR

In this chapter, we'll look at the most basic and simple constructs that are part of nearly all imperative/OOP languages out there.

Global Variables

Global varibles are trivial to implement in LLVM IR:

int variable = 14;

int main()
{
    return variable;
}

Becomes:

@variable = global i32 14

define i32 @main() nounwind {
    %1 = load i32* @variable
    ret i32 %1
}

Please notice that LLVM views global variables as pointers; so you must explicitly dereference the global variable using the load instruction when accessing its value, likewise you must explicitly store the value of a global variable using the store instruction.

Local Variables

There are two kinds of local variables in LLVM:

  1. Register-allocated local variables (temporaries).
  2. Stack-allocated local variables.

The former is created by introducing a new symbol for the variable:

%1 = some computation

The latter is created by allocating the variable on the stack:

%2 = alloca i32

Please notice that alloca yields a pointer to the allocated type. As is generally the case in LLVM, you must explicitly use a load or store instruction to read or write the value respectively.

The use of alloca allows for a neat trick that can simplify your code generator in some cases. The trick is to explicitly allocate all mutable variables, including arguments, on the stack, initialize them with the appropriate initial value and then operate on the stack as if that was your end goal. The trick is to run the "memory to register promotion" pass on your code as part of the optimization phase. This will make LLVM store as many of the stack variables in registers as it possibly can. That way you don't have to ensure that the generated program is in SSA form but can generate code without having to worry about this aspect of the code generation.

This trick is also described in chapter 7.4, Mutable Variables in Kaleidoscope, in the OCaml tutorial on the LLVM website.

Constants

There are two different kinds of constants:

  1. Constants that do not occupy allocated memory.
  2. Constants that do occupy allocated memory.

The former are always expanded inline by the compiler as there is no LLVM IR equivalent of those. In other words, the compiler simply inserts the constant value wherever it is being used in a computation:

%1 = add i32 %0, 17     ; 17 is an inlined constant

Constants that do occupy memory are defined using the constant keyword:

@hello = internal constant [6 x i8] c"hello\00"
%struct = type { i32, i8 }
@struct_constant = internal constant %struct { i32 16, i8 4 }

Such a constant is really a global variable whose visibility can be limited with private or internal so that it is invisible outside the current module.

Constant Expressions

TODO: Document the various forms of constant expressions that exist and .. how they can be very useful. For instance, getelementptr constant expressions are almost unavoidable in all but the simplest programs.

Size-Of Computations

Even though the compiler ought to know the exact size of everything in use (for statically checked languages), it can at times be convenient to ask LLVM to figure out the size of a structure for you. This is done with the following little snippet of code:

%Struct = type { i8, i32, i8* }
@Struct_size = constant i32 ptrtoint (%Struct* getelementptr (%Struct* null, i32 1)) to i32

@Struct_size will now contain the size of the structure %Struct. The trick is to compute the offset of the second element in the zero-based array starting at null and that way get the size of the structure.

Function Prototypes

A function prototype, aka a profile, is translated into an equivalent declare declaration in LLVM IR:

int Bar(int value);

Becomes:

Or you can leave out the descriptive parameter name:

Function Definitions

The translation of function definitions depends on a range of factors, ranging from the calling convention in use, whether the function is exception-aware or not, and if the function is to be publicly available outside the module.

Simple Public Functions

The most basic model is:

int Bar(void)
{
    return 17;
}

Becomes:

define i32 @Bar() nounwind {
    ret i32 17
}

Simple Private Functions

A static function is a function private to a module that cannot be referenced from outside of the defining module:

Functions with a Variable Number of Parameters

To call a so-called vararg function, you first need to define or declare it using the elipsis (...) and then you need to make use of a special syntax for function calls that allows you to explictly list the types of the parameters of the function that is being called. This "hack" exists to allow overriding a call to a function such as a function with variable parameters. Please notice that you only need to specify the return type once, not twice as you'd have to do if it was a true cast:


declare i32 @printf(i8*, ...) nounwind

@.text = internal constant [20 x i8] c"Argument count: %d\0A\00"

define i32 @main(i32 %argc, i8** %argv) nounwind {
    ; printf("Argument count: %d\n", argc)
    %1 = call i32 (i8*, ...)* @printf(i8* getelementptr([20 x i8]* @.text, i32 0, i32 0), i32 %argc)
    ret i32 0
}

Exception-Aware Functions

A function that is aware of being part of a larger scheme of exception-handling is called an exception-aware function. Depending upon the type of exception handling being employed, the function may either return a pointer to an exception instance, create a setjmp/longjmp frame, or simply specify the uwtable (for UnWind Table) attribute. These cases will all be covered in great detail in the chapter on Exception Handling below.

Function Pointers

Function pointers are expressed almost like in C and C++:

int (*Function)(char *buffer);

Becomes:

Casts

There are nine different types of casts:

  1. Bitwise casts (type casts).
  2. Zero-extending casts (unsigned upcasts).
  3. Sign-extending casts (signed upcasts).
  4. Truncating casts (signed and unsigned downcasts).
  5. Floating-point extending casts (float upcasts).
  6. Floating-point truncating casts (float downcasts).
  7. Pointer-to-integer casts (todo: Document pointer-to-integer casts).
  8. Integer-to-pointer casts (todo: Document integer-to-pointer casts).
  9. Address-space casts (pointer casts).

Bitwise Casts

A bitwise cast (bitcast) reinterprets a given bit pattern without changing any bits in the operand. For instance, you could make a bitcast of a pointer to byte into a pointer to some structure as follows:

typedef struct
{
    int a;
} Foo;

extern void *malloc(size_t size);
extern void free(void *value);

void allocate()
{
    Foo *foo = (Foo *) malloc(sizeof(Foo));
    foo.a = 12;
    free(foo);
}

Becomes:

%Foo = type { i32 }

declare i8* @malloc(i32)
declare void @free(i8*)

define void @allocate() nounwind {
    %1 = call i8* @malloc(i32 4)
    %foo = bitcast i8* %1 to %Foo*
    %2 = getelementptr %Foo* %foo, i32 0, i32 0
    store i32 12, i32* %2
    call void @free(i8* %1)
    ret void
}

Zero-Extending Casts (Unsigned Upcasts)

To upcast an unsigned value like in the example below:

uint8 byte = 117;
uint32 word;

void main()
{
    /* The compiler automatically upcasts the byte to a word. */
    word = byte;
}

You use the zext instruction:

@byte = global i8 117
@word = global i32 0

define void @main() nounwind {
    %1 = load i8* @byte
    %2 = zext i8 %1 to i32
    store i32 %2, i32* @word
    ret void
}

Sign-Extending Casts (Signed Upcasts)

To upcast a signed value, you replace the zext instruction with the sext instruction and everything else works just like in the previous section:

@char = global i8 -17
@int  = global i32 0

define void @main() nounwind {
    %1 = load i8* @char
    %2 = sext i8 %1 to i32
    store i32 %2, i32* @int
    ret void
}

Truncating Casts (Signed and Unsigned Downcasts)

Both signed and unsigned integers use the same instruction, trunc, to reduce the size of the number in question. This is because LLVM IR assumes that all signed integer values are in two's complement format for which reason trunc is sufficient to handle both cases:

@int = global i32 -1
@char = global i8 0

define void @main() nounwind {
    %1 = load i32* @int
    %2 = trunc i32 %1 to i8
    store i8 %2, i8* @char
    ret void
}

Floating-Point Extending Casts (Float Upcasts)

Floating points numbers can be extended using the fpext instruction:

float small = 1.25;
double large;

void main()
{
    /* The compiler inserts an implicit float upcast. */
    large = small;
}

Becomes:

@small = global float 1.25
@large = global double 0.0

define void @main() nounwind {
    %1 = load float* @small
    %2 = fpext float %1 to double
    store double %2, double* @large
    ret void
}

Floating-Point Truncating Casts (Float Downcasts)

Likewise, a floating point number can be truncated to a smaller size:

@large = global double 1.25
@small = global float 0.0

define void @main() nounwind {
    %1 = load double* @large
    %2 = fptrunc double %1 to float
    store float %2, float* @small
    ret void
}

Pointer-to-Integer Casts

todo: Document pointer-to-integer casts.

Integer-to-Pointer Casts

todo: Document integer-to-pointer casts.

Address-Space Casts (Pointer Casts)

todo: Find a useful example of an address-space casts, using the addrspacecast instruction, to be included here.

Incomplete Structure Types

Incomplete types are very useful for hiding the details of what fields a given structure has. A well-designed C interface can be made so that no details of the structure are revealed to the client, so that the client cannot inspect or modify private members inside the structure:

void Bar(struct Foo *);

Becomes:

%Foo = type opaque
declare void @Bar(%Foo)

Structures

LLVM IR already includes the concept of structures so there isn't much to do:

struct Foo
{
  size_t _length;
};

It is only a matter of discarding the actual field names and then index with numerals starting from zero:

%Foo = type { i32 }

Nested Structures

Nested structures are straightforward:

%Object = type {
    %Object*,      ; 0: above; the parent pointer
    i32            ; 1: value; the value of the node
}

Unions

Unions are getting more and more rare as the years have shown that they are quite dangerous to use; especially the C variant that does not have a selector field to indicate which of the union's variants are valid. Some may still have a legacy reason to use unions. In fact, LLVM does not support unions at all:


union Foo
{
    int a;
    char *b;
    double c;
};
Foo Union;

Becomes this when run through Clang++:

%union.Foo = type { double }
@Union = %union.Foo { 0.0 }

What happened here? Where did the other union members go? The answer is that in LLVM there are no unions; there are only structs that can be cast into whichever type the front-end want to cast the struct into. So to access the above union from LLVM IR, you'd use the bitcast instruction to cast a pointer to the "union" into whatever pointer you'd want it to be:

%1 = bitcast %union.Foo* @Union to i32*
store i32 1, i32* %1
%2 = bitcast %union.Foo* @Union to i8**
store i8* null, i8** %2

This may seem strange, but the truth is that a union is nothing more than a piece of memory that is being accessed using different implicit pointer casts.

If you want to support unions in your front-end language, you should simply allocate the total size of the union (i.e. the size of the largest member) and then generate code to reinterpret the allocated memory as needed.

The cleanest approach might be to simply allocate a range of bytes (i8), possibly with alignment padding at the end, and then cast whenever you access the structure. That way you'd be sure you did everything properly all the time.

Structure Expressions

As already told, structure members are referenced by index rather than by name in LLVM IR. And at no point do you need to, or should you, compute the offset of a given structure member yourself. The getelementptr instruction is available to compute a pointer to any structure member with no overhead (the getelementptr instruction is typically coascaled into the actual load or store instruction).

Getting a Pointer to a Structure Member

The C++ code below illustrates various things you might want to do:

struct Foo
{
    int a;
    char *b;
    double c;
};

int main(void)
{
    Foo foo;
    char **bptr = &foo.b;

    Foo bar[100];
    bar[17].c = 0.0;

    return 0;
}

Becomes:

%Foo = type {
    i32,        ; 0: a
    i8*,        ; 1: b
    double      ; 2: c
}

define i32 @main() nounwind {
    ; Foo foo
    %foo = alloca %Foo
    ; char **bptr = &foo.b
    %1 = getelementptr %Foo* %foo, i32 0, i32 1

    ; Foo bar[100]
    %bar = alloca %Foo, i32 100
    ; bar[17].c = 0.0
    %2 = getelementptr %Foo* %bar, i32 17, i32 2
    store double 0.0, double* %2

    ret i32 0
}

todo: Document the extractvalue and insertvalue instructions.

Mapping Control Structures to LLVM IR

todo: Add common control structures such as if, for, switch, and while.

todo: Explain the purpose of the phi instruction; show how it becomes obvious that you need it as soon as you encounter multiple blocks that contribute a value through different temporaries. In a way, phi ought to have been called "join" as it sort of joins up subexpressions from different basic blocks.

Mapping Advanced Constructs to LLVM IR

In this chapter, we'll look at various non-OOP constructs that are highly useful and are becoming more and more widespread in use.

Lambda Functions

A lambda function is an anonymous function with the added spice that it may freely refer to the local variables (including argument variables) in the containing function. Lambdas are implemented just like Pascal's nested functions, except the compiler is responsible for generating an internal name for the lambda function. There are a few different ways of implementing lambda functions (see `Wikipedia on nested functions Wikipedia on Nested Functions for more information).

int foo(int a)
{
    auto function = [a](int x) { return x + a; }
    return function(10);
}

Here the "problem" is that the lambda function references a local variable of the caller, namely a, even though the lambda function is a function of its own. This can be solved easily by passing the local variable in as an implicit argument to the lambda function:

define internal i32 @lambda(i32 %a, i32 %x) alwaysinline nounwind {
    %1 = add i32 %a, %x
    ret i32 %1
}

define i32 @foo(i32 %a) nounwind {
    %1 = call i32 @lambda(i32 %a, i32 10)
    ret i32 %1
}

Alternatively, if the lambda function uses more than a few variables, you can wrap them up in a structure which you pass in a pointer to the lambda function:

int foo(int a, int b)
{
    int c = integer_parse();
    auto function = [a, b, c](int x) { return (a + b - c) * x; }
    return function(10);
}

Becomes:

%Lambda_Arguments = type {
    i32,        ; 0: a (argument)
    i32,        ; 1: b (argument)
    i32         ; 2: c (local)
}

define i32 @lambda(%Lambda_Arguments* %args, i32 %x) nounwind {
    %1 = getelementptr %Lambda_Arguments* %args, i32 0, i32 0
    %a = load i32* %1
    %2 = getelementptr %Lambda_Arguments* %args, i32 0, i32 1
    %b = load i32* %2
    %3 = getelementptr %Lambda_Arguments* %args, i32 0, i32 2
    %c = load i32* %3
    %4 = add i32 %a, %b
    %5 = sub i32 %4, %c
    %6 = mul i32 %5, %x
    ret i32 %6
}

declare i32 @integer_parse()

define i32 @foo(i32 %a, i32 %b) nounwind {
    %args = alloca %Lambda_Arguments
    %1 = getelementptr %Lambda_Arguments* %args, i32 0, i32 0
    store i32 %a, i32* %1
    %2 = getelementptr %Lambda_Arguments* %args, i32 0, i32 1
    store i32 %b, i32* %2
    %c = call i32 @integer_parse()
    %3 = getelementptr %Lambda_Arguments* %args, i32 0, i32 2
    store i32 %c, i32* %3
    %4 = call i32 @lambda(%Lambda_Arguments* %args, i32 10)
    ret i32 %4
}

Obviously there are some possible variations over this theme:

  1. You could pass all implicit as explicit arguments as arguments.
  2. You could pass all implicit as explicit arguments in the structure.
  3. You could pass in a pointer to the frame of the caller and let the lambda function extract the arguments and locals from the input frame.

Closures

todo: Describe closures.

Generators

A generator is a function that repeatedly yields a value in such a way that the function's state is preserved across the repeated calls of the function; this includes the function's local offset at the point it yielded a value.

The most straigthforward way to implement a generator is by wrapping all of its state variables (arguments, local variables, and return values) up into an ad-hoc structure and then pass the address of that structure to the generator.

Somehow, you need to keep track of which block of the generator you are doing on each call. This can be done in various ways; the way we show here is by using LLVM's blockaddress instruction to save the address of the next local block of code that should be executed. Other implementations use a simple state variable and then do a switch-like dispatch according to the value of the state variable. In both cases, the end result is the same: A different block of code is executed for each local block in the generator.

The important thing is to think of iterators as a sort of micro-thread that is resumed whenever the iterator is called again. In other words, we need to save the address of how far the iterator got on each pass through so that it can resume as if a microscopic thread switch had occured. So we save the address of the instruction after the return instruction so that we can resume running as if we never had returned in the first place.

I resort to pseudo-C++ because C++ does not directly support generators. First we look at a very simple case then we advance on to a slightly more complex case:

#include <stdio.h>

generator int foo()
{
    yield 1;
    yield 2;
    yield 3;
}

int main()
{
    foreach (int i in foo())
        printf("Value: %d\n", i);

    return 0;
}

This becomes:

%foo_context = type {
    i8*,      ; 0: block (state)
    i32       ; 1: value (result)
}

define void @foo_setup(%foo_context* %context) nounwind {
    ; set up 'block'
    %1 = getelementptr %foo_context* %context, i32 0, i32 0
    store i8* blockaddress(@foo_yield, %.yield1), i8** %1

    ret void
}

; The boolean returned indicates if a result was available or not.
; Once no more results are available, the caller is expected to not call
; the iterator again.
define i1 @foo_yield(%foo_context* %context) nounwind {
    ; dispatch to the active generator block
    %1 = getelementptr %foo_context* %context, i32 0, i32 0
    %2 = load i8** %1
    indirectbr i8* %2, [ label %.yield1, label %.yield2, label %.yield3, label %.done ]

.yield1:
    ; store the result value (1)
    %3 = getelementptr %foo_context* %context, i32 0, i32 1
    store i32 1, i32* %3

    ; make 'block' point to next block to execute
    %4 = getelementptr %foo_context* %context, i32 0, i32 0
    store i8* blockaddress(@foo_yield, %.yield2), i8** %4

    ret i1 1

.yield2:
    ; store the result value (2)
    %5 = getelementptr %foo_context* %context, i32 0, i32 1
    store i32 2, i32* %5

    ; make 'block' point to next block to execute
    %6 = getelementptr %foo_context* %context, i32 0, i32 0
    store i8* blockaddress(@foo_yield, %.yield3), i8** %6

    ret i1 1

.yield3:
    ; store the result value (3)
    %7 = getelementptr %foo_context* %context, i32 0, i32 1
    store i32 3, i32* %7

    ; make 'block' point to next block to execute
    %8 = getelementptr %foo_context* %context, i32 0, i32 0
    store i8* blockaddress(@foo_yield, %.done), i8** %8

    ret i1 1

.done:
    ret i1 0
}

declare i32 @printf(i8*, ...) nounwind

@.string = internal constant [11 x i8] c"Value: %d\0A\00"

define void @main() nounwind {
    ; allocate and initialize generator context structure
    %context = alloca %foo_context
    call void @foo_setup(%foo_context* %context)
    br label %.head

.head:
    ; foreach (int i in foo())
    %1 = call i1 @foo_yield(%foo_context* %context)
    br i1 %1, label %.body, label %.tail

.body:
    %2 = getelementptr %foo_context* %context, i32 0, i32 1
    %3 = load i32* %2
    %4 = call i32 (i8*, ...)* @printf(i8* getelementptr([11 x i8]* @.string, i32 0, i32 0), i32 %3)
    br label %.head

.tail:
    ret void
}

And now for a slightly more complex example that involves local variables:

#include <stdio.h>

generator int foo(int start, int after)
{
    for (int index = start; index < after; index++)
    {
        if (index % 2 == 0)
            yield index + 1;
        else
            yield index - 1;
    }
}

int main(void)
{
    foreach (int i in foo(0, 5))
        printf("Value: %d\n", i);

    return 0;
}

This becomes something like this:

%foo_context = type {
    i8*,      ; 0: block (state)
    i32,      ; 1: start (argument)
    i32,      ; 2: after (argument)
    i32,      ; 3: index (local)
    i32       ; 4: value (result)
}

define void @foo_setup(%foo_context* %context, i32 %start, i32 %after) nounwind {
    ; set up 'block'
    %1 = getelementptr %foo_context* %context, i32 0, i32 0
    store i8* blockaddress(@foo_yield, %.init), i8** %1

    ; set up 'start'
    %2 = getelementptr %foo_context* %context, i32 0, i32 1
    store i32 %start, i32* %2

    ; set up 'after'
    %3 = getelementptr %foo_context* %context, i32 0, i32 2
    store i32 %after, i32* %3

    ret void
}

define i1 @foo_yield(%foo_context* %context) nounwind {
    ; dispatch to the active generator block
    %1 = getelementptr %foo_context* %context, i32 0, i32 0
    %2 = load i8** %1
   indirectbr i8* %2, [ label %.init, label %.loop_close, label %.end ]

.init:
    ; copy argument 'start' to the local variable 'index'
    %3 = getelementptr %foo_context* %context, i32 0, i32 1
    %start = load i32* %3
    %4 = getelementptr %foo_context* %context, i32 0, i32 3
    store i32 %start, i32* %4
    br label %.head

.head:
    ; for (; index < after; )
    %5 = getelementptr %foo_context* %context, i32 0, i32 3
    %index = load i32* %5
    %6 = getelementptr %foo_context* %context, i32 0, i32 2
    %after = load i32* %6
    %again = icmp slt i32 %index, %after
    br i1 %again, label %.loop_begin, label %.exit

.loop_begin:
    %7 = srem i32 %index, 2
    %8 = icmp eq i32 %7, 0
    br i1 %8, label %.even, label %.odd

.even:
    ; store 'index + 1' in 'value'
    %9 = add i32 %index, 1
    %10 = getelementptr %foo_context* %context, i32 0, i32 4
    store i32 %9, i32* %10

    ; make 'block' point to the end of the loop (after the yield)
    %11 = getelementptr %foo_context* %context, i32 0, i32 0
    store i8* blockaddress(@foo_yield, %.loop_close), i8** %11

    ret i1 1

.odd:
    ; store 'index - 1' in value
    %12 = sub i32 %index, 1
    %13 = getelementptr %foo_context* %context, i32 0, i32 4
    store i32 %12, i32* %13

    ; make 'block' point to the end of the loop (after the yield)
    %14 = getelementptr %foo_context* %context, i32 0, i32 0
    store i8* blockaddress(@foo_yield, %.loop_close), i8** %14

    ret i1 1

.loop_close:
    ; increment 'index'
    %15 = getelementptr %foo_context* %context, i32 0, i32 3
    %16 = load i32* %15
    %17 = add i32 %16, 1
    store i32 %17, i32* %15
    br label %.head

.exit:
    ; make 'block' point to the %.end label
    %x = getelementptr %foo_context* %context, i32 0, i32 0
    store i8* blockaddress(@foo_yield, %.end), i8** %x
    br label %.end

.end:
    ret i1 0
}

declare i32 @printf(i8*, ...) nounwind

@.string = internal constant [11 x i8] c"Value: %d\0A\00"

define i32 @main() nounwind {
    ; allocate and initialize generator context structure
    %context = alloca %foo_context
    call void @foo_setup(%foo_context* %context, i32 0, i32 5)
    br label %.head

.head:
    ; foreach (int i in foo(0, 5))
    %1 = call i1 @foo_yield(%foo_context* %context)
    br i1 %1, label %.body, label %.tail

.body:
    %2 = getelementptr %foo_context* %context, i32 0, i32 4
    %3 = load i32* %2
    %4 = call i32 (i8*, ...)* @printf(i8* getelementptr([11 x i8]* @.string, i32 0, i32 0), i32 %3)
    br label %.head

.tail:
    ret i32 0
}

Another possible way of doing the above would be to generate an LLVM IR function for each state and then store a function pointer in the context structure, which is updated whenever a new state/function needs to be invoked.

Mapping Exception Handling to LLVM IR

Exceptions can be implemented in one of three ways:

  1. The simple way, by using a propagated return value.
  2. The bulky way, by using setjmp and longjmp.
  3. The efficient way, by using a zero-cost exception ABI.

Please notice that many compiler developers with respect for themselves won't accept the first method as a proper way of handling exceptions. However, it is unbeatable in terms of simplicity and can likely help people to understand that implementing exceptions does not need to be very difficult.

The second method is used by some production compilers, but it has large overhead both in terms of code bloat and the cost of a try-catch statement (because all CPU registers are saved using setjmp whenever a try statement is encountered).

The third method is very advanced but in return does not add any cost to execution paths where no exceptions are being thrown. This method is the de-facto "right" way of implementing exceptions, whether you like it or not. LLVM directly supports this kind of exception handling.

In the three sections below, we'll be using this sample and transform it:

#include <stdio.h>
#include <stdlib.h>

class Object
{
public:
    virtual ~Object()
    {
    }
};

class Exception: public Object
{
public:
    Exception(const char *text):
        _text(text)
    {
    }

    const char *GetText(const)
    {
        return _text;
    }

private:
    const char *_text;
}

class Foo
{
public:
    int GetLength() const
    {
        return _length;
    }

    void SetLength(int value)
    {
        _length = value;
    }

private:
    int _length;
};

int Bar(bool fail)
{
    Foo foo;
    foo.SetLength(17);
    if (fail)
        throw new Exception("Exception requested by caller");
    foo.SetLength(24);
    return foo.GetLength();
}

int main(int argc, const char *argv[])
{
    int result;

    try
    {
        /* The program throws an exception if an argument is specified. */
        bool fail = (argc >= 2);

        /* Let callee decide if an exception is thrown. */
        int value = Bar(fail);

        result = EXIT_SUCCESS;
    }
    catch (Exception *that)
    {
        printf("Error: %s\n", that->GetText());
        result = EXIT_FAILURE;
    }
    catch (...)
    {
        puts("Internal error: Unhandled exception detected");
        result = EXIT_FAILURE;
    }

    return result;
}

Exception Handling by Propagated Return Value

This method is a compiler-generated way of implicitly checking each function's return value. Its main advantage is that it is simple - at the cost of many mostly unproductive checks of return values. The great thing about this method is that it readily interfaces with a host of languages and environments - it is all a matter of returning a pointer to an exception.

The C++ sample above maps to the following code:

;********************* External and Utility functions *********************

declare i8* @malloc(i32) nounwind
declare void @free(i8*) nounwind
declare i32 @printf(i8* noalias nocapture, ...) nounwind
declare i32 @puts(i8* noalias nocapture) nounwind


;***************************** Object class *******************************

%Object_vtable_type = type {
    %Object_vtable_type*,         ; 0: above: parent class vtable pointer
    i8*                           ; 1: class: class name (usually mangled)
    ; virtual methods would follow here
}

@.Object_class_name = private constant [7 x i8] c"Object\00"

@.Object_vtable = private constant %Object_vtable_type {
    %Object_vtable_type* null,    ; This is the root object of the object hierarchy
    i8* getelementptr([7 x i8]* @.Object_class_name, i32 0, i32 0)
}

%Object = type {
    %Object_vtable_type*          ; 0: vtable: class vtable pointer (always non-null)
    ; class data members would follow here
}

; returns true if the specified object is identical to or derived from the
; class with the specified name.
define i1 @Object_IsA(%Object* %object, i8* %name) nounwind {
.init:
    ; if (object == null) return false
    %0 = icmp ne %Object* %object, null
    br i1 %0, label %.once, label %.exit_false

.once:
    %1 = getelementptr %Object* %object, i32 0, i32 0
    br label %.body

.body:
    ; if (vtable->class == name)
    %2 = phi %Object_vtable_type** [ %1, %.once ], [ %7, %.next]
    %3 = load %Object_vtable_type** %2
    %4 = getelementptr %Object_vtable_type* %3, i32 0, i32 1
    %5 = load i8** %4
    %6 = icmp eq i8* %5, %name
    br i1 %6, label %.exit_true, label %.next

.next:
    ; object = object->above
    %7 = getelementptr %Object_vtable_type* %3, i32 0, i32 0

    ; while (object != null)
    %8 = icmp ne %Object_vtable_type* %3, null
    br i1 %8, label %.body, label %.exit_false

.exit_true:
    ret i1 true

.exit_false:
    ret i1 false
}


;*************************** Exception class ******************************

%Exception_vtable_type = type {
    %Object_vtable_type*,                        ; 0: parent class vtable pointer
    i8*                                          ; 1: class name
    ; virtual methods would follow here.
}

@.Exception_class_name = private constant [10 x i8] c"Exception\00"

@.Exception_vtable = private constant %Exception_vtable_type {
    %Object_vtable_type* @.Object_vtable,        ; the parent of this class is the Object class
    i8* getelementptr([10 x i8]* @.Exception_class_name, i32 0, i32 0)
}

%Exception = type {
    %Exception_vtable_type*,                     ; 0: the vtable pointer
    i8*                                          ; 1: the _text member
}

define void @Exception_Create_String(%Exception* %this, i8* %text) nounwind {
    ; set up vtable
    %1 = getelementptr %Exception* %this, i32 0, i32 0
    store %Exception_vtable_type* @.Exception_vtable, %Exception_vtable_type** %1

    ; save input text string into _text
    %2 = getelementptr %Exception* %this, i32 0, i32 1
    store i8* %text, i8** %2

    ret void
}

define i8* @Exception_GetText(%Exception* %this) nounwind {
    %1 = getelementptr %Exception* %this, i32 0, i32 1
    %2 = load i8** %1
    ret i8* %2
}


;******************************* Foo class ********************************

%Foo = type { i32 }

define void @Foo_Create_Default(%Foo* %this) nounwind {
    %1 = getelementptr %Foo* %this, i32 0, i32 0
    store i32 0, i32* %1
    ret void
}

define i32 @Foo_GetLength(%Foo* %this) nounwind {
    %1 = getelementptr %Foo* %this, i32 0, i32 0
    %2 = load i32* %1
    ret i32 %2
}

define void @Foo_SetLength(%Foo* %this, i32 %value) nounwind {
    %1 = getelementptr %Foo* %this, i32 0, i32 0
    store i32 %value, i32* %1
    ret void
}


;********************************* Foo function ***************************

@.message1 = internal constant [30 x i8] c"Exception requested by caller\00"

define %Exception* @Bar(i1 %fail, i32* %result) nounwind {
    ; Allocate Foo instance
    %foo = alloca %Foo
    call void @Foo_Create_Default(%Foo* %foo)

    call void @Foo_SetLength(%Foo* %foo, i32 17)

    ; if (fail)
    %1 = icmp eq i1 %fail, true
    br i1 %1, label %.if_begin, label %.if_close

.if_begin:
    ; throw new Exception(...)
    %2 = call i8* @malloc(i32 8)
    %3 = bitcast i8* %2 to %Exception*
    %4 = getelementptr [30 x i8]* @.message1, i32 0, i32 0
    call void @Exception_Create_String(%Exception* %3, i8* %4)
    ret %Exception* %3

.if_close:
    ; foo.SetLength(24)
    call void @Foo_SetLength(%Foo* %foo, i32 24)
    %5 = call i32 @Foo_GetLength(%Foo* %foo)
    store i32 %5, i32* %result
    ret %Exception* null
}


;********************************* Main program ***************************

@.message2 = internal constant [11 x i8] c"Error: %s\0A\00"
@.message3 = internal constant [44 x i8] c"Internal error: Unhandled exception detectd\00"

define i32 @main(i32 %argc, i8** %argv) nounwind {
    ; "try" keyword expands to nothing.

    ; Body of try block.

    ; fail = (argc >= 2)
    %fail = icmp uge i32 %argc, 2

    ; Function call.
    %1 = alloca i32
    %2 = call %Exception* @Bar(i1 %fail, i32* %1)
    %3 = icmp ne %Exception* %2, null
    br i1 %3, label %.catch_block, label %.exit

.catch_block:
    %4 = bitcast %Exception* %2 to %Object*
    %5 = getelementptr [10 x i8]* @.Exception_class_name, i32 0, i32 0
    %6 = call i1 @Object_IsA(%Object* %4, i8* %5)
    br i1 %6, label %.catch_exception, label %.catch_all

.catch_exception:
    %7 = getelementptr [11 x i8]* @.message2, i32 0, i32 0
    %8 = call i8* @Exception_GetText(%Exception* %2)
    %9 = call i32 (i8*, ...)* @printf(i8* %7, i8* %8)
    br label %.exit

.catch_all:
    %10 = getelementptr [44 x i8]* @.message3, i32 0, i32 0
    %11 = call i32 @puts(i8* %10)
    br label %.exit

.exit:
    %result = phi i32 [ 0, %0 ], [ 1, %.catch_exception ], [ 1, %.catch_all ]
    ret i32 %result
}

Setjmp/Longjmp Exception Handling

The basic idea behind the setjmp and longjmp exception handling scheme is that you save the CPU state whenever you encounter a try keyword and then do a longjmp whenever you throw an exception. If there are few try blocks in the program, as is typically the case, the cost of this method is not as high as it might seem. However, often there are implicit exception handlers due to the need to release local resources such as class instances allocated on the stack and then the cost can become quite high.

Setjmp/longjmp exception handling is often abbreviated SjLj for SetJmp/LongJmp.

The sample translates into something like this:

; jmp_buf is very platform specific, this is for illustration only...
%jmp_buf = type { i32 }
declare i32 @setjmp(%jmp_buf* %env)
declare void @longjmp(%jmp_buf* %env, i32 %val)

;********************* External and Utility functions *********************

declare i8* @malloc(i32) nounwind
declare void @free(i8*) nounwind
declare i32 @printf(i8* noalias nocapture, ...) nounwind
declare i32 @puts(i8* noalias nocapture) nounwind


;***************************** Object class *******************************

%Object_vtable_type = type {
    %Object_vtable_type*,         ; 0: above: parent class vtable pointer
    i8*                           ; 1: class: class name (usually mangled)
    ; virtual methods would follow here
}

@.Object_class_name = private constant [7 x i8] c"Object\00"

@.Object_vtable = private constant %Object_vtable_type {
    %Object_vtable_type* null,    ; This is the root object of the object hierarchy
    i8* getelementptr([7 x i8]* @.Object_class_name, i32 0, i32 0)
}

%Object = type {
    %Object_vtable_type*          ; 0: vtable: class vtable pointer (always non-null)
    ; class data members would follow here
}

; returns true if the specified object is identical to or derived from the
; class with the specified name.
define i1 @Object_IsA(%Object* %object, i8* %name) nounwind {
.init:
    ; if (object == null) return false
    %0 = icmp ne %Object* %object, null
    br i1 %0, label %.once, label %.exit_false

.once:
    %1 = getelementptr %Object* %object, i32 0, i32 0
    br label %.body

.body:
    ; if (vtable->class == name)
    %2 = phi %Object_vtable_type** [ %1, %.once ], [ %7, %.next]
    %3 = load %Object_vtable_type** %2
    %4 = getelementptr %Object_vtable_type* %3, i32 0, i32 1
    %5 = load i8** %4
    %6 = icmp eq i8* %5, %name
    br i1 %6, label %.exit_true, label %.next

.next:
    ; object = object->above
    %7 = getelementptr %Object_vtable_type* %3, i32 0, i32 0

    ; while (object != null)
    %8 = icmp ne %Object_vtable_type* %3, null
    br i1 %8, label %.body, label %.exit_false

.exit_true:
    ret i1 true

.exit_false:
    ret i1 false
}


;*************************** Exception class ******************************

%Exception_vtable_type = type {
    %Object_vtable_type*,                        ; 0: parent class vtable pointer
    i8*                                          ; 1: class name
    ; virtual methods would follow here.
}

@.Exception_class_name = private constant [10 x i8] c"Exception\00"

@.Exception_vtable = private constant %Exception_vtable_type {
    %Object_vtable_type* @.Object_vtable,        ; the parent of this class is the Object class
    i8* getelementptr([10 x i8]* @.Exception_class_name, i32 0, i32 0)
}

%Exception = type {
    %Exception_vtable_type*,                     ; 0: the vtable pointer
    i8*                                          ; 1: the _text member
}

define void @Exception_Create_String(%Exception* %this, i8* %text) nounwind {
    ; set up vtable
    %1 = getelementptr %Exception* %this, i32 0, i32 0
    store %Exception_vtable_type* @.Exception_vtable, %Exception_vtable_type** %1

    ; save input text string into _text
    %2 = getelementptr %Exception* %this, i32 0, i32 1
    store i8* %text, i8** %2

    ret void
}

define i8* @Exception_GetText(%Exception* %this) nounwind {
    %1 = getelementptr %Exception* %this, i32 0, i32 1
    %2 = load i8** %1
    ret i8* %2
}


;******************************* Foo class ********************************

%Foo = type { i32 }

define void @Foo_Create_Default(%Foo* %this) nounwind {
    %1 = getelementptr %Foo* %this, i32 0, i32 0
    store i32 0, i32* %1
    ret void
}

define i32 @Foo_GetLength(%Foo* %this) nounwind {
    %1 = getelementptr %Foo* %this, i32 0, i32 0
    %2 = load i32* %1
    ret i32 %2
}

define void @Foo_SetLength(%Foo* %this, i32 %value) nounwind {
    %1 = getelementptr %Foo* %this, i32 0, i32 0
    store i32 %value, i32* %1
    ret void
}


;********************************* Foo function ***************************

@.message1 = internal constant [30 x i8] c"Exception requested by caller\00"

define i32 @Bar(%jmp_buf* %throw, i1 %fail) nounwind {
    ; Allocate Foo instance
    %foo = alloca %Foo
    call void @Foo_Create_Default(%Foo* %foo)

    call void @Foo_SetLength(%Foo* %foo, i32 17)

    ; if (fail)
    %1 = icmp eq i1 %fail, true
    br i1 %1, label %.if_begin, label %.if_close

.if_begin:
    ; throw new Exception(...)
    %2 = call i8* @malloc(i32 8)
    %3 = bitcast i8* %2 to %Exception*
    %4 = getelementptr [30 x i8]* @.message1, i32 0, i32 0
    call void @Exception_Create_String(%Exception* %3, i8* %4)
    %5 = ptrtoint %Exception* %3 to i32
    call void @longjmp(%jmp_buf* %throw, i32 %5)
    ; we never get here
    br label %.if_close

.if_close:
    ; foo.SetLength(24)
    call void @Foo_SetLength(%Foo* %foo, i32 24)
    %6 = call i32 @Foo_GetLength(%Foo* %foo)
    ret i32 %6
}


;********************************* Main program ***************************

@.message2 = internal constant [11 x i8] c"Error: %s\0A\00"
@.message3 = internal constant [44 x i8] c"Internal error: Unhandled exception detectd\00"

define i32 @main(i32 %argc, i8** %argv) nounwind {
    ; "try" keyword expands to a call to @setjmp
    %env = alloca %jmp_buf
    %status = call i32 @setjmp(%jmp_buf* %env)
    %1 = icmp eq i32 %status, 0
    br i1 %1, label %.body, label %.catch_block

.body:
    ; Body of try block.
    ; fail = (argc >= 2)
    %fail = icmp uge i32 %argc, 2

    ; Function call.
    %2 = call i32 @Bar(%jmp_buf* %env, i1 %fail)
    br label %.exit

.catch_block:
    %3 = inttoptr i32 %status to %Object*
    %4 = getelementptr [10 x i8]* @.Exception_class_name, i32 0, i32 0
    %5 = call i1 @Object_IsA(%Object* %3, i8* %4)
    br i1 %5, label %.catch_exception, label %.catch_all

.catch_exception:
    %6 = inttoptr i32 %status to %Exception*
    %7 = getelementptr [11 x i8]* @.message2, i32 0, i32 0
    %8 = call i8* @Exception_GetText(%Exception* %6)
    %9 = call i32 (i8*, ...)* @printf(i8* %7, i8* %8)
    br label %.exit

.catch_all:
    %10 = getelementptr [44 x i8]* @.message3, i32 0, i32 0
    %11 = call i32 @puts(i8* %10)
    br label %.exit

.exit:
    %result = phi i32 [ 0, %.body ], [ 1, %.catch_exception ], [ 1, %.catch_all ]
    ret i32 %result
}

Zero Cost Exception Handling

todo: Explain how to implement exception handling using zero cost exception handling.

Resources

  1. Compiler Internals - Exception Handling.
  2. Exception Handling in C without C++.
  3. How a C++ Compiler Implements Exception Handling.
  4. DWARF standard - Exception Handling.
  5. Itanium C++ ABI.

Mapping Object-Oriented Constructs to LLVM IR

In this chapter we'll look at various object-oriented constructs and see how they can be mapped to LLVM IR.

Classes

A class is nothing more than a structure with an associated set of functions that take an implicit first parameter, namely a pointer to the structure. Therefore, is is very trivial to map a class to LLVM IR:

#include <stddef.h>

class Foo
{
public:
    Foo()
    {
        _length = 0;
    }

    size_t GetLength() const
    {
        return _length;
    }

    void SetLength(size_t value)
    {
        _length = value;
    }

private:
    size_t _length;
};

We first transform this code into two separate pieces:

  1. . The structure definition.
  2. . The list of methods, including the constructor.
; The structure definition for class Foo.
%Foo = type { i32 }

; The default constructor for class Foo.
define void @Foo_Create_Default(%Foo* %this) nounwind {
    %1 = getelementptr %Foo* %this, i32 0, i32 0
    store i32 0, i32* %1
    ret void
}

; The Foo::GetLength() method.
define i32 @Foo_GetLength(%Foo* %this) nounwind {
    %1 = getelementptr %Foo* %this, i32 0, i32 0
    %2 = load i32* %this
    ret i32 %2
}

; The Foo::SetLength() method.
define void @Foo_SetLength(%Foo* %this, i32 %value) nounwind {
    %1 = getelementptr %Foo* %this, i32 0, i32 0
    store i32 %value, i32* %1
    ret void
}

Then we make sure that the constructor (``Foo_Create_Default``) is invoked whenever an instance of the structure is created:

Foo foo;
%foo = alloca %Foo
call void @Foo_Create_Default(%Foo* %foo)

Virtual Methods

A virtual method is no more than a compiler-controlled function pointer. Each virtual method is recorded in the vtable, which is a structure of all the function pointers needed by a given class:

class Foo
{
public:
    virtual int GetLengthTimesTwo() const
    {
        return _length * 2;
    }

    void SetLength(size_t value)
    {
        _length = value;
    }

private:
    int _length;
};

int main()
{
    Foo foo;
    foo.SetLength(4);
    return foo.GetLengthTimesTwo();
}

This becomes:

%Foo_vtable_type = type { i32(%Foo*)* }

%Foo = type { %Foo_vtable_type*, i32 }

define i32 @Foo_GetLengthTimesTwo(%Foo* %this) nounwind {
    %1 = getelementptr %Foo* %this, i32 0, i32 1
    %2 = load i32* %1
    %3 = mul i32 %2, 2
    ret i32 %3
}

@Foo_vtable_data = global %Foo_vtable_type {
    i32(%Foo*)* @Foo_GetLengthTimesTwo
}

define void @Foo_Create_Default(%Foo* %this) nounwind {
    %1 = getelementptr %Foo* %this, i32 0, i32 0
    store %Foo_vtable_type* @Foo_vtable_data, %Foo_vtable_type** %1
    %2 = getelementptr %Foo* %this, i32 0, i32 1
    store i32 0, i32* %2
    ret void
}

define void @Foo_SetLength(%Foo* %this, i32 %value) nounwind {
    %1 = getelementptr %Foo* %this, i32 0, i32 1
    store i32 %value, i32* %1
    ret void
}

define i32 @main(i32 %argc, i8** %argv) nounwind {
    %foo = alloca %Foo
    call void @Foo_Create_Default(%Foo* %foo)
    call void @Foo_SetLength(%Foo* %foo, i32 4)
    %1 = getelementptr %Foo* %foo, i32 0, i32 0
    %2 = load %Foo_vtable_type** %1
    %3 = getelementptr %Foo_vtable_type* %2, i32 0, i32 0
    %4 = load i32(%Foo*)** %3
    %5 = call i32 %4(%Foo* %foo)
    ret i32 %5
}

Please notice that some C++ compilers store _vtable at a negative offset into the structure so that things like memcpy(this, 0, sizeof(*this)) work, even though such commands should always be avoided in an OOP context.

Single Inheritance

Single inheritance is very straightforward: Each "structure" (class) is laid out in memory after one another in declaration order.

class Base
{
public:
    void SetA(int value)
    {
        _a = value;
    }

private:
    int _a;
};

class Derived: public Base
{
public:
    void SetB(int value)
    {
        SetA(value);
        _b = value;
    }

protected:
    int _b;
}

Here, a and b will be laid out to follow one another in memory so that inheriting from a class is simply a matter of declaring a the base class as a first member in the inheriting class:

%Base = type {
    i32         ; '_a' in class Base
}

define void @Base_SetA(%Base* %this, i32 %value) nounwind {
    %1 = getelementptr %Base* %this, i32 0, i32 0
    store i32 %value, i32* %1
    ret void
}

%Derived = type {
    i32,        ; '_a' from class Base
    i32         ; '_b' from class Derived
}

define void @Derived_SetB(%Derived* %this, i32 %value) nounwind {
    %1 = bitcast %Derived* %this to %Base*
    call void @Base_SetA(%Base* %1, i32 %value)
    %2 = getelementptr %Derived* %this, i32 0, i32 1
    store i32 %value, i32* %2
    ret void
}

So the base class simply becomes plain members of the type declaration for the derived class.

And then the compiler must insert appropriate type casts whenever the derived class is being referenced as its base class as shown above with the bitcast operator.

Multiple Inheritance

Multiple inheritance is not that difficult, either, it is merely a question of laying out the multiply inherited "structures" in order inside each derived class.

class BaseA
{
public:
    void SetA(int value)
    {
        _a = value;
    }

private:
    int _a;
};

class BaseB: public BaseA
{
public:
    void SetB(int value)
    {
        SetA(value);
        _b = value;
    }

private:
    int _b;
};

class Derived:
    public BaseA,
    public BaseB
{
public:
    void SetC(int value)
    {
        SetB(value);
        _c = value;
    }

private:
    int _c;
};

This is equivalent to the following LLVM IR:

%BaseA = type {
    i32         ; '_a' from BaseA
}

define void @BaseA_SetA(%BaseA* %this, i32 %value) nounwind {
    %1 = getelementptr %BaseA* %this, i32 0, i32 0
    store i32 %value, i32* %1
    ret void
}

%BaseB = type {
    i32,        ; '_a' from BaseA
    i32         ; '_b' from BaseB
}

define void @BaseB_SetB(%BaseB* %this, i32 %value) nounwind {
    %1 = bitcast %BaseB* %this to %BaseA*
    call void @BaseA_SetA(%BaseA* %1, i32 %value)
    %2 = getelementptr %BaseB* %this, i32 0, i32 1
    store i32 %value, i32* %2
    ret void
}

%Derived = type {
    i32,        ; '_a' from BaseA
    i32,        ; '_b' from BaseB
    i32         ; '_c' from Derived
}

define void @Derived_SetC(%Derived* %this, i32 %value) nounwind {
    %1 = bitcast %Derived* %this to %BaseB*
    call void @BaseB_SetB(%BaseB* %1, i32 %value)
    %2 = getelementptr %Derived* %this, i32 0, i32 2
    store i32 %value, i32* %2
    ret void
}

And the compiler then supplies the needed type casts and pointer arithmentic whenever baseB is being referenced as an instance of BaseB. Please notice that all it takes is a bitcast from one class to another as well as an adjustment of the last argument to getelementptr.

Virtual Inheritance

Virtual inheritance is actually quite simple as it dictates that identical base classes are to be merged into a single occurence. For instance, given this:

class BaseA
{
public:
    int a;
};

class BaseB: public BaseA
{
public:
    int b;
};

class BaseC: public BaseA
{
public:
    int c;
};

class Derived:
    public virtual BaseB,
    public virtual BaseC
{
    int d;
};

Derived will only contain a single instance of BaseA even if its inheritance graph dictates that it should have two instances. The result looks something like this:

class Derived
{
public:
    int a;
    int b;
    int c;
    int d;
};

So the second instance of a is silently ignored because it would cause multiple instances of BaseA to exist in Derived, which clearly would cause lots of confusion and ambiguities.

Interfaces

An interface is nothing more than a base class with no data members, where all the methods are pure virtual (i.e. has no body).

As such, we've already described how to convert an interface to LLVM IR - it is done precisely the same way that you convert a virtual member function to LLVM IR.

Boxing and Unboxing

Boxing is the process of converting a non-object primitive value into an object. It is as easy as it sounds. You create a wrapper class which you instantiate and initialize with the non-object value:

Unboxing is the reverse of boxing: You downgrade a full object to a mere scalar value by retrieving the boxed value from the box object.

It is important to notice that changes to the boxed value does not affect the original value and vice verse. The code below illustrates both steps:


@Boxee = global i32 17

%Integer = type { i32 }

define void @Integer_Create(%Integer* %this, i32 %value) nounwind {
    ; you might set up a vtable and associated virtual methods here
    %1 = getelementptr %Integer* %this, i32 0, i32 0
    store i32 %value, i32* %1
    ret void
}

define i32 @Integer_GetValue(%Integer* %this) nounwind {
    %1 = getelementptr %Integer* %this, i32 0, i32 0
    %2 = load i32* %1
    ret i32 %2
}

define i32 @main() nounwind {
    ; box @Boxee in an instance of %Integer
    %1 = load i32* @Boxee
    %2 = alloca %Integer
    call void @Integer_Create(%Integer* %2, i32 %1)

    ; unbox @Boxee from an instance of %Integer
    %3 = call i32 @Integer_GetValue(%Integer* %2)

    ret i32 0
}

Class Equivalence Test

There are two ways of doing this:

  1. If you can guarantee that each class a unique vtable, you can simply compare the pointers to the vtable.
  2. If you cannot guarantee that each class has a unique vtable (because different vtables may have been merged by the linker), you need to add a unique field to the vtable so that you can compare that instead.

The first variant goes roughly as follows (assuming identical strings aren't merged by the compiler, something that they are most of the time):

bool equal = (typeid(first) == typeid(other));

todo: Finish up class equivalence test sample.

As far as I know, RTTI is simply done by adding two fields to the _vtable structure: parent and signature. The former is a pointer to the vtable of the parent class and the latter is the mangled (encoded) name of the class. To see if a given class is another class, you simply compare the signature fields. To see if a given class is a derived class of some other class, you simply walk the chain of parent fields, while checking if you have found a matching signature.

Class Inheritance Test

A class inheritance test is a question of the form:

Is class X identical to or derived from class Y?

To answer that question, we can use one of two methods:

  1. The naive implementation where we search upwards in the chain of parents.
  2. The faster implementation where we search a preallocated list of parents.

The naive implementation is documented in the first two exception handling examples as the Object_IsA function.

todo: Document the faster class inheritance test implementation.

The New Operator

The new operator is generally nothing more than a type-safe version of the C malloc function - in some implementations of C++, they may even be called interchangeably without causing unseen or unwanted side-effects.

The Instance New Operator

All calls of the form new X are mapped into:

declare i8* @malloc(i32) nounwind

%X = type { i8 }

define void @X_Create_Default(%X* %this) nounwind {
    %1 = getelementptr %X* %this, i32 0, i32 0
    store i8 0, i8* %1
    ret void
}

define void @main() nounwind {
    %1 = call i8* @malloc(i32 1)
    %2 = bitcast i8* %1 to %X*
    call void @X_Create_Default(%X* %2)
    ret void
}

Calls of the form new X(Y, Z) are the same, except Y and Z are passed into the constructor as arguments.

The Array New Operator

New operations involving arrays are equally simple. The code new X[100] is mapped into a loop that initializes each array element in turn:

declare i8* @malloc(i32) nounwind

%X = type { i32 }

define void @X_Create_Default(%X* %this) nounwind {
    %1 = getelementptr %X* %this, i32 0, i32 0
    store i32 0, i32* %1
    ret void
}

define void @main() nounwind {
    %n = alloca i32                  ; %n = ptr to the number of elements in the array
    store i32 100, i32* %n

    %i = alloca i32                  ; %i = ptr to the loop index into the array
    store i32 0, i32* %i

    %1 = load i32* %n                ; %1 = *%n
    %2 = mul i32 %1, 4               ; %2 = %1 * sizeof(X)
    %3 = call i8* @malloc(i32 %2)    ; %3 = malloc(100 * sizeof(X))
    %4 = bitcast i8* %3 to %X*       ; %4 = (X*) %3
    br label %.loop_head

.loop_head:                         ; for (; %i < %n; %i++)
    %5 = load i32* %i
    %6 = load i32* %n
    %7 = icmp slt i32 %5, %6
    br i1 %7, label %.loop_body, label %.loop_tail

.loop_body:
    %8 = getelementptr %X* %4, i32 %5
    call void @X_Create_Default(%X* %8)

    %9 = add i32 %5, 1
    store i32 %9 i32* %i

    br label %.loop_head

.loop_tail:
    ret void
}

Interoperating with a Runtime Library

It is common to provide a set of run-time support functions that are written in another language than LLVM IR and it is trivially easy to interface to such a run-time library. The use of malloc and free in the examples in this document are examples of such use of externally defined run-time functions.

The advantages of a custom, non-IR run-time library function is that it can be optimized by hand to provide the best possible performance under certain criteria. Also a custom non-IR run-time library function can make explicit use of native instructions that are foreign to the LLVM infrastructure.

The advantages of IR run-time library functions is that they can be run through the optimizer and thereby also be inlined automatically.

Interfacing to the Operating System

I'll divide this chapter into two sections:

  1. How to Interface to POSIX Operating Systems.
  2. How to Interface to the Windows Operating System.

How to Interface to POSIX Operating Systems

On POSIX, the presence of the C run-time library is an unavoidable fact for which reason it makes a lot of sense to directly call such C run-time functions.

Sample POSIX "Hello World" Application

On POSIX, it is really very easy to create the Hello world program:

declare i32 @puts(i8* nocapture) nounwind

@.hello = private unnamed_addr constant [13 x i8] c"hello world\0A\00"

define i32 @main(i32 %argc, i8** %argv) {
    %1 = getelementptr [13 x i8]* @.hello, i32 0, i32 0
    call i32 @puts(i8* %1)
    ret i32 0
}

How to Interface to the Windows Operating System

On Windows, the C run-time library is mostly considered of relevance to the C and C++ languages only, so you have a plethora (thousands) of standard system interfaces that any client application may use.

Sample Windows "Hello World" Application

Hello world on Windows is nowhere as straightforward as on POSIX:

target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-f80:128:128-v64:64:64-v128:128:128-a0:0:64-f80:32:32-n8:16:32-S32"
target triple = "i686-pc-win32"

%struct._OVERLAPPED = type { i32, i32, %union.anon, i8* }
%union.anon = type { %struct.anon }
%struct.anon = type { i32, i32 }

declare dllimport x86_stdcallcc i8* @"\01_GetStdHandle@4"(i32) #1

declare dllimport x86_stdcallcc i32 @"\01_WriteFile@20"(i8*, i8*, i32, i32*, %struct._OVERLAPPED*) #1

@hello = internal constant [13 x i8] c"Hello world\0A\00"

define i32 @main(i32 %argc, i8** %argv) nounwind {
    %1 = call i8* @"\01_GetStdHandle@4"(i32 -11)    ; -11 = STD_OUTPUT_HANDLE
    %2 = getelementptr [13 x i8]* @hello, i32 0, i32 0
    %3 = call i32 @"\01_WriteFile@20"(i8* %1, i8* %2, i32 12, i32* null, %struct._OVERLAPPED* null)
    ; todo: Check that %4 is not equal to -1 (INVALID_HANDLE_VALUE)
    ret i32 0
}

attributes #1 = { "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf"
"no-infs-fp-math"="fa lse" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false"
"use-soft-float"="false"
}

Resources

This chapter lists some resources that may be of interest to the reader:

  1. Modern Compiler Implementation in Java, 2nd Edition.
  2. Alex Darby's series of articles on low-level stuff.

Epilogue

If you discover any errors in this document or you need more information than given here, please write to the author at Mikael Lyngvig.

Please also remember that you can learn a lot by using the -emit-llvm option to the clang++ compiler. This gives you a chance to see a live production compiler in action and precisely how it does things.

Appendix A: How to Implement a String Type in LLVM

There are two ways to implement a string type in LLVM:

  1. To write the implementation in LLVM IR.
  2. To write the implementation in a higher-level language that generates IR.

I'd personally much prefer to use the second method, but for the sake of the example, I'll go ahead and illustrate a simple but useful string type in LLVM IR. It assumes a 32-bit architecture, so please replace all occurences of i32 with i64 if you are targetting a 64-bit architecture.

We'll be making a dynamic, mutable string type that can be appended to and could also be inserted into, converted to lower case, and so on, depending on which support functions are defined to operate on the string type.

It all boils down to making a suitable type definition for the class and then define a rich set of functions to operate on the type definition:

; The actual type definition for our 'String' type.
%String = type {
    i8*,     ; 0: buffer; pointer to the character buffer
    i32,     ; 1: length; the number of chars in the buffer
    i32,     ; 2: maxlen; the maximum number of chars in the buffer
    i32      ; 3: factor; the number of chars to preallocate when growing
}

define fastcc void @String_Create_Default(%String* %this) nounwind {
    ; Initialize 'buffer'.
    %1 = getelementptr %String* %this, i32 0, i32 0
    store i8* null, i8** %1

    ; Initialize 'length'.
    %2 = getelementptr %String* %this, i32 0, i32 1
    store i32 0, i32* %2

    ; Initialize 'maxlen'.
    %3 = getelementptr %String* %this, i32 0, i32 2
    store i32 0, i32* %3

    ; Initialize 'factor'.
    %4 = getelementptr %String* %this, i32 0, i32 3
    store i32 16, i32* %4

    ret void
}

declare i8* @malloc(i32)
declare void @free(i8*)
declare i8* @memcpy(i8*, i8*, i32)

define fastcc void @String_Delete(%String* %this) nounwind {
    ; Check if we need to call 'free'.
    %1 = getelementptr %String* %this, i32 0, i32 0
    %2 = load i8** %1
    %3 = icmp ne i8* %2, null
    br i1 %3, label %free_begin, label %free_close

free_begin:
    call void @free(i8* %2)
    br label %free_close

free_close:
    ret void
}

define fastcc void @String_Resize(%String* %this, i32 %value) {
    ; %output = malloc(%value)
    %output = call i8* @malloc(i32 %value)

    ; todo: check return value

    ; %buffer = this->buffer
    %1 = getelementptr %String* %this, i32 0, i32 0
    %buffer = load i8** %1

    ; %length = this->length
    %2 = getelementptr %String* %this, i32 0, i32 1
    %length = load i32* %2

    ; memcpy(%output, %buffer, %length)
    %3 = call i8* @memcpy(i8* %output, i8* %buffer, i32 %length)

    ; free(%buffer)
    call void @free(i8* %buffer)

    ; this->buffer = %output
    store i8* %output, i8** %1

    ret void
}

define fastcc void @String_Add_Char(%String* %this, i8 %value) {
    ; Check if we need to grow the string.
    %1 = getelementptr %String* %this, i32 0, i32 1
    %length = load i32* %1
    %2 = getelementptr %String* %this, i32 0, i32 2
    %maxlen = load i32* %2
    ; if length == maxlen:
    %3 = icmp eq i32 %length, %maxlen
    br i1 %3, label %grow_begin, label %grow_close

grow_begin:
    %4 = getelementptr %String* %this, i32 0, i32 3
    %factor = load i32* %4
    %5 = add i32 %maxlen, %factor
    call void @String_Resize(%String* %this, i32 %5)
    br label %grow_close

grow_close:
    %6 = getelementptr %String* %this, i32 0, i32 0
    %buffer = load i8** %6
    %7 = getelementptr i8* %buffer, i32 %length
    store i8 %value, i8* %7
    %8 = add i32 %length, 1
    store i32 %8, i32* %1

    ret void
}

Appendix B: Task List

This chapter serves as an informal task list and is to be updated as new to-do items are completed:

  1. How to enable debug information? (Line and Function, Variable)
  2. How to interface with a garbage collector? (link to existing docs)
  3. How to express a custom calling convention? (link to existing docs)
  4. Representing constructors, destructors, finalization
  5. How to examine the stack at runtime? How to modify it? (i.e. reflection, interjection)
  6. Representing subtyping checks (with full alias info), TBAA, struct-path TBAA.
  7. How to exploit inlining (external, vs within LLVM)?
  8. How to express array bounds checks for best optimization?
  9. How to express null pointer checks?
  10. How to express domain specific optimizations? (i.e. lock elision, or matrix math simplification) (link to existing docs)
  11. How to optimize call dispatch or field access in dynamic languages? (ref new patchpoint intrinsics for inline call caching and field access caching)

todo: Ask various front-end implementors (Rust, Haskell (GHC), Rubinius, and more) to review and/or contribute so as to make the document great.

Appendix C: Thank You

A brief list of all people who've contributed to the document (thank you all very much!):

Should your name be here? If so, contribute :-)