October 22 2016
by ScrimpyCat

Simple Lock-Free Epoch Memory Reclamation

Epoch-based memory reclamation, or EBR, is an approach that was outlined in this paper. It describes that all you need to successfully manage the delayed reclamation of memory used by your lock-free algorithms is to separate references into 3 groups. These groups are the epochs. At any given time, the oldest epoch that can be safely reclaimed is the one 2 epochs behind the most recent.

This means that when a thread enters a new epoch it will be placed 2 places ahead of the current global epoch, and will proceed to add any of its memory that needs reclaiming to that epoch. Then, once no threads are still accessing the current global (stale) epoch because they have all moved 2 places ahead, that old memory can safely be reclaimed, and the global epoch is advanced by one. This allows for the continued delayed reclamation of memory, while ensuring that only memory which could potentially still be accessed is kept around.

The Approach

Unfortunately EBR seems to be one of the less frequently reviewed strategies for memory reclamation in lock-free algorithms, which made it quite difficult to come across other implementations. However, in the implementations I was able to find, I noticed that when it came time to decide whether they could reclaim the stale epoch or not, they typically iterated over all the threads registered with the system, which required a lock. To address this, it occurred to me that you could instead retain references to each epoch that is currently being used. This does come at the expense of introducing additional CAS operations, but it allows the implementation to be lock-free and quite straightforward to understand.

Implementation

We will implement this technique using C11 (atomics and threads), though for simplicity we will not bother with overriding the default memory order.

To start off we will make our definitions.

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <stdatomic.h>
#include <threads.h>

typedef uint64_t epoch_t;

typedef struct entry_t {
    struct entry_t *next;
    void *item;
    void (*reclaimer)(void*);
} entry_t;

typedef struct {
    entry_t *list;
    uint32_t refCount;
} managed_list_t;

typedef struct {
    entry_t *head;
    entry_t *tail;
    epoch_t epoch;
} local_t;

typedef struct {
    _Atomic(managed_list_t) managed[3];
    _Atomic(epoch_t) epoch;
    tss_t key;
} gc_t;

gc_t *Create(void);
void Destroy(gc_t *GC);
void Reclaim(gc_t *GC, entry_t *Node, epoch_t Epoch);
void Start(gc_t *GC);
void Manage(gc_t *GC, void *Item, void (*Reclaimer)(void*));
void Stop(gc_t *GC);

entry_t represents a reclaimable item that is managed by our system, managed_list_t is the global epoch reference, local_t is our thread local data, and gc_t is our garbage collector/EBR system reference.

The first functions we will implement are the creation and destruction functions for the system. These are relatively straightforward: they initialise the system with its initial state, and clean up the system when destroying it.

gc_t *Create(void)
{
    gc_t *GC = malloc(sizeof(gc_t));
    if (GC)
    {
        if (tss_create(&GC->key, free) != thrd_success)
        {
            free(GC);
            return NULL;
        }

        atomic_init(&GC->managed[0], (managed_list_t){ .list = NULL, .refCount = 0 });
        atomic_init(&GC->managed[1], (managed_list_t){ .list = NULL, .refCount = 0 });
        atomic_init(&GC->managed[2], (managed_list_t){ .list = NULL, .refCount = 0 });
        atomic_init(&GC->epoch, 0);
    }

    return GC;
}

void Destroy(gc_t *GC)
{
    tss_delete(GC->key);

    for (int Loop = 0; Loop < 3; Loop++)
    {
        managed_list_t Managed = atomic_load(&GC->managed[Loop]);
        Reclaim(GC, Managed.list, atomic_load(&GC->epoch));
    }

    free(GC);
}

Next will be the Reclaim function. This function is used when a stale epoch can be reclaimed: it reclaims the items that belong to that epoch, frees our entry_t nodes, and increments the global epoch. One improvement that could be made is to use a pool-based allocator for the entry_t nodes, which would alleviate some of the overhead of allocating/deallocating those nodes.

void Reclaim(gc_t *GC, entry_t *Node, epoch_t Epoch)
{
    while (Node)
    {
        Node->reclaimer(Node->item); //Reclaim entries

        entry_t *Temp = Node;
        Node = Node->next;
        free(Temp); //Free node
    }

    atomic_compare_exchange_strong(&GC->epoch, &Epoch, Epoch + 1); //Increment the global epoch only once for the given epoch (as multiple threads may reach this point, though only one will have a non-NULL Node)
}

Next will be the Start function. This function is used by our lock-free algorithms to indicate when a memory-managed section begins (adding items to be managed or reading managed items). It first gets the local state for the thread, creating it if there is none. The local epoch is then moved 2 places ahead of the global epoch, so it is no longer referencing the global (stale) epoch, and we retain a reference to the epoch we will be using.

void Start(gc_t *GC)
{
    local_t *Local = tss_get(GC->key);
    if (!Local)
    {
        Local = malloc(sizeof(local_t));
        if (!Local) return; //Handle failure however you want

        Local->epoch = 0;
        tss_set(GC->key, Local);
    }

    //Repeat until we successfully retain a reference to an epoch +2 ahead of the global
    for (epoch_t GlobalEpoch = atomic_load(&GC->epoch); ; )
    {
        epoch_t Epoch = (GlobalEpoch + 2) % 3; //Set current thread's epoch to +2 above global epoch
        *Local = (local_t){ .head = NULL, .tail = NULL, .epoch = Epoch };

        //Retain a reference to the current epoch
        managed_list_t Managed;
        do {
            Managed = atomic_load(&GC->managed[Epoch]);
        } while (!atomic_compare_exchange_weak(&GC->managed[Epoch], &Managed, ((managed_list_t){ .list = Managed.list, .refCount = Managed.refCount + 1 })));

        //Check to see global epoch has not changed since retaining the local epoch reference
        epoch_t NewGlobalEpoch = atomic_load(&GC->epoch);
        if (GlobalEpoch == NewGlobalEpoch) break;

        GlobalEpoch = NewGlobalEpoch;

        //Release the reference to the current epoch
        if (!atomic_compare_exchange_strong(&GC->managed[Epoch], &Managed, ((managed_list_t){ .list = Managed.list, .refCount = Managed.refCount - 1 })))
        {
            do {
                Managed = atomic_load(&GC->managed[Epoch]);
            } while (!atomic_compare_exchange_weak(&GC->managed[Epoch], &Managed, ((managed_list_t){ .list = Managed.list, .refCount = Managed.refCount - 1 })));
        }
    }
}

Next we will implement the Manage function. This function is used to add any items that we wish to have managed by the system. We create a new entry and then store it in the local thread state. The reason we store it in the thread's local state here is to avoid an additional CAS-loop that would otherwise be required. Instead we keep a list of entries from that thread (tracking head and tail nodes), which is merged only once the managed section has finished.

void Manage(gc_t *GC, void *Item, void (*Reclaimer)(void*))
{
    entry_t *Entry = malloc(sizeof(entry_t));
    if (!Entry) return; //Handle failure however you want

    //Add the item to the current local list of managed items
    local_t *Local = tss_get(GC->key);
    *Entry = (entry_t){ .next = Local->head, .item = Item, .reclaimer = Reclaimer };
    Local->head = Entry;

    if (!Local->tail) Local->tail = Local->head;
}

Finally, all that is left is to implement our Stop function. This function is used by our lock-free algorithms to indicate when a memory-managed section ends (no more items to be managed or managed items to be read). It first gets the global epoch, which will become our stale epoch (if it meets the right conditions). We then need to decrement our reference to our current epoch and merge our entries into it; both can be achieved with a single CAS-loop. Finally, to work out whether the global epoch is stale and can be reclaimed, we need to check that there are no longer any references to it, and none to the epoch after it (all references are now 2 places ahead). If those conditions are met we can safely reclaim it.

void Stop(gc_t *GC)
{
    local_t *Local = tss_get(GC->key);

    const epoch_t GlobalEpoch = atomic_load(&GC->epoch);
    const epoch_t Epoch = Local->epoch;

    managed_list_t Managed;
    if (Local->head)
    {
        //Release a reference to the current epoch and add local entries to global entries for this epoch
        do {
            Managed = atomic_load(&GC->managed[Epoch]);
            Local->tail->next = Managed.list;
        } while (!atomic_compare_exchange_weak(&GC->managed[Epoch], &Managed, ((managed_list_t){ .list = Local->head, .refCount = Managed.refCount - 1 })));
    }

    else
    {
        //Release a reference to the current epoch
        do {
            Managed = atomic_load(&GC->managed[Epoch]);
        } while (!atomic_compare_exchange_weak(&GC->managed[Epoch], &Managed, ((managed_list_t){ .list = Managed.list, .refCount = Managed.refCount - 1 })));
    }

    const epoch_t StaleEpoch = GlobalEpoch % 3;
    Managed = atomic_load(&GC->managed[StaleEpoch]);
    if (Managed.refCount == 0) //Check if the stale epoch is no longer referenced
    {
        managed_list_t NextManaged = atomic_load(&GC->managed[(StaleEpoch + 1) % 3]);
        if ((NextManaged.refCount == 0) && (atomic_compare_exchange_strong(&GC->managed[StaleEpoch], &Managed, ((managed_list_t){ .list = NULL, .refCount = 0 })))) //Check if the stale epoch can be reclaimed
        {
            if (GlobalEpoch == atomic_load(&GC->epoch)) Reclaim(GC, Managed.list, GlobalEpoch); //Before reclaiming double check another thread hasn't beaten us to it
            else if (Managed.list)
            {
                //If another thread has beaten us and there were nodes attached, proceed to add them back
                entry_t *List = Managed.list, *Tail = Managed.list;
                while (Tail->next) Tail = Tail->next;

                for (epoch_t GlobalEpoch = atomic_load(&GC->epoch); ; )
                {
                    //Retain a reference to an epoch +2 places ahead of global
                    epoch_t Epoch = (GlobalEpoch + 2) % 3;
                    managed_list_t Managed;
                    do {
                        Managed = atomic_load(&GC->managed[Epoch]);
                    } while (!atomic_compare_exchange_weak(&GC->managed[Epoch], &Managed, ((managed_list_t){ .list = Managed.list, .refCount = Managed.refCount + 1 })));

                    //Check to see global epoch has not changed since retaining the local epoch reference
                    epoch_t NewGlobalEpoch = atomic_load(&GC->epoch);
                    if (GlobalEpoch == NewGlobalEpoch)
                    {
                        //Release a reference to the current epoch and add local entries to global entries for this epoch
                        do {
                            Managed = atomic_load(&GC->managed[Epoch]);
                            Tail->next = Managed.list;
                        } while (!atomic_compare_exchange_weak(&GC->managed[Epoch], &Managed, ((managed_list_t){ .list = List, .refCount = Managed.refCount - 1 })));

                        break;
                    }

                    GlobalEpoch = NewGlobalEpoch;

                    //Release the reference to the current epoch
                    if (!atomic_compare_exchange_strong(&GC->managed[Epoch], &Managed, ((managed_list_t){ .list = Managed.list, .refCount = Managed.refCount - 1 })))
                    {
                        do {
                            Managed = atomic_load(&GC->managed[Epoch]);
                        } while (!atomic_compare_exchange_weak(&GC->managed[Epoch], &Managed, ((managed_list_t){ .list = Managed.list, .refCount = Managed.refCount - 1 })));
                    }
                }
            }
        }
    }
}

Caveats

CAS-loops are expensive, and unfortunately this implementation requires a number of them; in the best case it needs 2 of them plus 2 strong CAS operations. Other memory reclamation strategies avoid them: lock-based EBR, hazard pointers, QSBR, etc. It may be possible to replace them somehow, but I haven't given much thought to how that might be achieved.

The other problem is thread failure, as brought up in the paper. If a thread fails inside the critical section, its reference will remain forever and memory will start to "leak", as the global epoch will no longer be able to advance and so that memory won't be reclaimed. A strategy to avoid this might be to watch for non-recoverable thread failure, replace the GC the other threads are using, and destroy the old one.

Concluding

To see this in practice, the implementation can be found here. One interesting idea that can be taken from this is to simply use the retained references for critical sections and avoid the epochs altogether. That strategy would only be appropriate in situations with very low contention, as memory could only be reclaimed when no threads are currently in the critical section, but for some use cases that may be acceptable.

Lastly, as always, lock-free programming is hard. Hopefully there are no errors, but use at your own risk.

Updates

Major changes have been made to this article. The details of those changes are outlined below.

12th May 2018

As it would turn out, lock-free programming certainly is hard and misleading. The original implementation presented in this article (which can be seen here) did end up having some bugs. Amusingly, I had been using an implementation like this for years and had never experienced a crash related to memory reclamation.

It was only recently, when I went to stress test a new lock-free algorithm I was working on, that I started noticing incorrect results (unfortunately stress testing my other lock-free algorithms never happened to bring up this issue). For ages I thought something must've been wrong with this new algorithm, but I couldn't work out what. Eventually I managed to track it down to the epoch memory reclaimer implementation I was using, which is what led me to making the above fixes.

The changes made were:

August 14 2016
by ScrimpyCat

Engine Dev: Better Scripting

Scripting in the engine was feeling very much like a hacked-on solution, which it actually was. The biggest concern was that now the language was slowly getting bigger and bigger (more and more API exposed to it), it was becoming more obvious that down the road the language would become more difficult to use without having to frequently look up the documentation.

Editors

Up to that point I had been using my text editor and simply marking the source as Scheme. While this provided some nice syntax highlighting, it missed out on exposing any of the APIs or potentially helpful auto-completions for the language. To solve this I decided to create a better editing environment. This meant having helpful docs, syntax highlighting designed for the language, and context/API-aware auto-completions.

So I went about creating an Atom package that would provide these things. To accommodate a changing API I structured it by making the auto-completer look up API spec definitions. These are JSON files specifying the definition of the API available (functions, options, state, enums, inputs). An example of these files can be found here. This not only made it simple to add or change API definitions over time, but it also opens up the possibility for non-Engine API to be added (users adding their own custom API spec definitions).

auto-complete

The auto-completion portion of the package provided support for some other useful behaviours, such as providing suggestions for locally defined state or enums, in addition to inferring their type definitions and using those to provide more useful auto-completions.

type-inference

Now this probably seems fairly obvious, however I still think this approach is worth presenting. Even if you're developing a game with a standard scripting language (e.g. Lua, Javascript, Python, Scheme, etc.), you're often exposing additional API to those languages from your engine's core language. This means that unless you're using something like FFI bindings, or something else your editor may be aware of (so it can infer these hidden API declarations), you're most likely going to find that the functionality you're exposing to the language is not presented to you in your editor.

While this might not feel like a huge problem, I've found that going the extra lengths to enable these capabilities is well worth the effort. In a previous game engine I worked on, I sorely regret not doing something like this. At the time I was complacent with simply referring to my docs, and using the standard auto-completions the editor provided for me because I was using Javascript. Looking back, I realize just how much time I wasted frequenting the docs.

Improvements

Aside from the editing experience, other improvements were made to the language. These consisted of defining strict naming conventions, and optimizing the execution of the scripts. However, the aim with all of these improvements was to not come at the cost of making the scripting language itself more complicated.

Naming Conventions

Due to the language being very flexible, it didn't take long before I ran into the problem of naming collisions: state using the same name as a function, an optional argument in a function having the same name as some state, etc. The simplest way to get around this was to define some strict naming rules to indicate what type of atom is being used. The new naming rules defined:

The additional benefit of applying these strict naming rules is that we can now optimize the evaluation of these expressions, as we no longer need to find out what type an atom is; the type can instead be inferred directly from the prefix/suffix being used. In other words, a state reference now simply looks up its state, and the evaluator doesn't try to test whether it is a function.

Lookups

The original implementation went about looking up functions and state very naively (string comparison in an array), and this was one thing I knew would have a detrimental effect on performance. The time arrived for it to be optimized, and it became one of the key areas to focus on.

The most obvious improvement to make was to use hash maps for the actual lookup operation, especially as function registration is very infrequent while lookup happens all the time. While, as expected, this had a noticeable impact, there was still much room for improvement. The next glaring issue turned out to be the hashing of strings.

As most strings are small and so take the form of one of the tagged string variants, this created the problem of not being able to cache the hash value. To alleviate this I implemented a global cache for tagged string hashes. This had the benefit of improving not only scripting performance but the performance of strings throughout the engine.

The next improvement was to remove the need for repeated lookups. This was done by making the tagged atom expression for a function type actually be an index reference to its function, so a lookup became simply a shift and an array index. This could've been taken a step further by completely eliminating the need for any kind of lookup, storing the function pointer itself in the tagged type rather than an index to that function pointer. However, that would have meant making guarantees about where in memory functions can reside, which I didn't want to make.

Constant Lists

One redundant operation that was identified was evaluating lists. This was a problem because every time a list was evaluated, it would return a copy of that list. To fix this, there is now a check to determine whether a list is constant (contains only constant elements), which can be determined at parse time as well as at runtime. If it is, we only need to return a reference to that list, instead of copying it and evaluating all of its elements.

State Invalidators

While all the improvements helped, there was still one underlying flaw with the biggest performance implication. State values would often contain a lot of logic instead of being constant values (all the quoted expressions you'd see; this essentially allows you to achieve a reactive flow). The reason this was so bad was that if one state depended on the value of another state, and that state depended on the value of two other states, and so on, you ended up with huge amounts of state being re-evaluated every time another state was referenced.

This was such a big problem in fact that rendering only 25 GUI sliders was causing a ~75% performance hit on my slow machine. While the solution was obvious (the unnecessary repeated evaluations of state needed to stop), coming up with a fix that didn't hurt the clarity of the scripts, nor require a big programming effort on my part, was not so easy.

Thinking about when state actually needed to be re-evaluated (every reference, once per frame, on window resize, etc.) led me to the solution of associating invalidators with state. These invalidators are expressions that are evaluated every time a state is referenced, to determine whether the state's value needs to be re-evaluated or not. This does move the actual problem over to the invalidators, since there will now be repeated redundant invalidator evaluation, but it is still loads better than what it was.

This is now what the current state of the scripting looks like:

(gui "gui-button"
    (enum!
        "&normal"
        "&highlighted"
        "&pushed"
    )

    (state! ".colour" (255 211 73) (invalidate: (quote (frame-changed?))))
    (state! ".radius" (quote (/ 8.0 (max .width .height))) (invalidate: (quote (frame-changed?))))
    (state! ".outline" 0 (invalidate: (quote (frame-changed?))))
    (state! ".outline-colour" (quote (darken .colour 5)) (invalidate: (quote (frame-changed?))))
    (state! ".status" &normal)
    (state! ".label" (quote (
        (quote (render-rect .rect .colour (radius: .radius) (outline: .outline .outline-colour)))  ; &normal
        (quote (render-rect .rect (lighten .colour 5) (radius: .radius) (outline: .outline (lighten .outline-colour 5)))) ; &highlighted
        (quote (render-rect .rect (darken .colour 5) (radius: .radius) (outline: .outline (darken .outline-colour 5))))  ; &pushed
    )) (invalidate: (quote (frame-changed?))))
    (state! ".on-click" (quote (print "Clicked!")))
    (state! ".inside" #f)
    (state! ".dragged" #f)

    (render:
        (unquote (get .status .label))
    )

    (control:
        (on (click: :left .rect)
            (if @press
                ((.status! &pushed) (.inside! #t) (.dragged! #t))
                ((.status! &highlighted) (if .inside .on-click) (.inside! #f) (.dragged! #f))
            )
            ((.status! &normal) (.inside! #f) (.dragged! @press))
        )
        (if .dragged
            (if .inside (on (cursor: .rect) (.status! &pushed) (.status! &highlighted)))
            (on (cursor: .rect) (.status! &highlighted) (.status! &normal))
        )
    )
)
April 2 2016
by ScrimpyCat

Elixir: Implementing Protocols For Specific Values

I recently came across a problem: I wanted to use protocols to provide overrideable functionality for my data. The data consists of basic Elixir terms such as strings, tuples, and lists. The ideal solution I was looking for would allow me to provide functions for the different variations of these datatypes I was expecting, while giving me the ability to override any specific one of these variations (or add a new variation) without having to redefine all the other functions for the other variations just because they're of the same type.

defprotocol Styler do
    @fallback_to_any true

    @type colour :: :default | :red | :green | :blue

    @spec format(any) :: { colour, any }
    def format(input)
end

defimpl Styler, for: Any do
    def format(input), do: { :default, input }
end

defimpl Styler, for: Integer do
    def format(input) when input < 0, do: { :red, input }
    def format(input), do: { :default, input }
end

#Somewhere else we decide we want: 0 => { :blue, 0 }
defimpl Styler, for: Integer do
    def format(0), do: { :blue, 0 }
end

As Elixir only allows implementations to set the behaviour for a single type, this means the re-implementation for the Integer type causes the previous definitions to be lost. So calling either Styler.format(1) or Styler.format(-1) will not have a matching function and will raise an error. This behaviour is fine for typical usage, but when we want more specific control it becomes a problem, since to make it work we would need to add the previously defined functions to this new implementation. The only type Elixir allows more specific control over is structs, as you're able to create a new implementation for each struct without impacting the implementations of other structs.

Structs To The Rescue!

So let's do that: let's use different structs in place of these specific values, so we can customize the behaviour for specific values without affecting those already defined.

defmodule Styler.Type.Integer.Zero do
    defstruct input: 0
end

defmodule Styler.Type.Integer.Positive do
    defstruct input: 1
end

#...

Ok, that isn't going to work if we have lots of values we want to make structs for. What if we wanted to access every individual integer separately? Yeah, it's definitely not the right way to go about it.

Because of the metaprogramming capabilities in Elixir, one possible solution would be to generate the structs for the expected inputs. This could work fine, but we're now polluting the VM with potentially lots of modules that we only need for this hack. That seems kind of ugly. Hmm, well structs are maps, what if we do something with that?

Maps To The Rescue!

As you're probably aware, structs are just maps with an additional member, __struct__, which contains the struct type (module). Defining a struct does do a few additional things, such as creating a module (the module that defines the struct) and creating a struct function for default values and lookup. But we don't want any of that other stuff; we only need a map that looks like a struct so our protocol implementations can be applied separately.

defimpl Styler, for: Any do
    def format(%{ __struct__: _, input: input }), do: { :default, input }
    def format(input), do: { :default, input }
end

defimpl Styler, for: Integer do
    def format(0), do: Styler.format(%{ __struct__: String.to_atom("Elixir.Styler.Type.Integer.0"), input: 0 })
    def format(input) when input < 0, do: Styler.format(%{ __struct__: String.to_atom("Elixir.Styler.Type.Integer.Negative"), input: input })
    def format(input), do: Styler.format(%{ __struct__: String.to_atom("Elixir.Styler.Type.Integer.Positive"), input: input })
end

defimpl Styler, for: Styler.Type.Integer.Positive do
    def format(%{ input: input }), do: Styler.format(%{ __struct__: String.to_atom("Elixir.Styler.Type.Integer.Positive." <> to_string(input)), input: input })
end

defimpl Styler, for: Styler.Type.Integer.Negative do
    def format(%{ input: input }), do: Styler.format(%{ __struct__: String.to_atom("Elixir.Styler.Type.Integer.Negative." <> to_string(input)), input: input })
end

#Override the behaviours
defimpl Styler, for: Styler.Type.Integer.Negative do
    def format(%{ input: input }), do: { :red, input }
end

defimpl Styler, for: :"Elixir.Styler.Type.Integer.0" do
    def format(_), do: { :blue, 0 }
end

defimpl Styler, for: :"Elixir.Styler.Type.Integer.Positive.1" do
    def format(_), do: { :green, 1 }
end

# iex(1)> Styler.format -2
# {:red, -2}
# iex(2)> Styler.format -1
# {:red, -1}
# iex(3)> Styler.format 0 
# {:blue, 0}
# iex(4)> Styler.format 1
# {:green, 1}
# iex(5)> Styler.format 2
# {:default, 2}

This technique can be applied in a variety of ways depending on how you want to separate values. There are some gotchas to this approach, however.

March 13 2016
by ScrimpyCat

Elixir Metaprogramming

The metaprogramming and macroing capabilities of Elixir were one of the key features that originally drew me to the language. As someone coming from Erlang, many of the attractive features of Elixir were actually features of Erlang. I also didn't mind Erlang's syntax (though the punctuation can be a little off-putting when beginning), and actually prefer some of its syntactical choices over Elixir's, such as the syntax used for messaging, functions, and list comprehensions. But luckily Elixir offered many great reasons to pick it up: the macroing system being one of them, the tooling, a more sensible standard library, etc. In this post I thought I'd discuss some of the possible capabilities for metaprogramming within Elixir.

AST

As Elixir gives you access to the AST (abstract syntax tree), which is conveniently represented using its own data types, it allows for some very powerful metaprogramming capabilities. The AST is structured like so:

:atom #Atoms
1 #Integers
1.0 #Floats
[1, 2] #Lists
"string" #Strings
{ key, value } #Two element tuples
{ atom | tuple, list, list | atom } #Everything else

These ASTs can be created by quoting expressions (quote do: 1 + 2), by manually creating the terms ({ :+, [], [1, 2] }), or by using the convenience functions provided in the Macro module.

While the ASTs that have the exact same representation as the type they're representing (quoted literals) are easy to understand, the more complex catch-all representation is a little more involved. Its fields include the operation type, the context data (context, module, line number), and the arguments.

Quoting, Unquoting, Evaluation

The core operations around working with these ASTs come down to quoting and unquoting expressions, and evaluating those ASTs.

Quoting an expression, as mentioned above, converts a block into its AST representation. This is done using the quote function defined in Kernel.SpecialForms. Options may optionally be passed to the function to change its behaviour.

iex(1)> quote do: 1 + 2 * 3
{:+, [context: Elixir, import: Kernel],
 [1, {:*, [context: Elixir, import: Kernel], [2, 3]}]}

iex(1)> quote do
...> IO.puts "hello"
...> IO.puts "world"
...> end
{:__block__, [],
 [{{:., [], [{:__aliases__, [alias: false], [:IO]}, :puts]}, [], ["hello"]},
  {{:., [], [{:__aliases__, [alias: false], [:IO]}, :puts]}, [], ["world"]}]}

iex(1)> quote do: a + b * c
{:+, [context: Elixir, import: Kernel],
 [{:a, [], Elixir},
  {:*, [context: Elixir, import: Kernel],
   [{:b, [], Elixir}, {:c, [], Elixir}]}]}

Unquoting an expression allows you to evaluate the current expression and inject its result into the quoted expression. This is done using the unquote function defined in Kernel.SpecialForms.

iex(1)> a = 1
1
iex(2)> b = 2
2
iex(3)> c = 3
3
iex(4)> quote do: unquote(a) + unquote(b) * unquote(c)
{:+, [context: Elixir, import: Kernel],
 [1, {:*, [context: Elixir, import: Kernel], [2, 3]}]}

iex(1)> quote do: quote do: a
{:quote, [], [[do: {:a, [], Elixir}]]}
iex(2)> quote do: unquote(quote do: a)
{:a, [], Elixir}

Evaluating an AST can be done in a couple of different ways. We can have the AST compiled with our code (the AST will be injected into our code), or we can evaluate the AST at runtime and retrieve the result. The former can be achieved using macros, or, where the AST represents a valid module, using Code.compile_quoted defined in Code. The latter can be achieved using Code.eval_quoted defined in Code, where variable bindings and additional options can be provided.

iex(1)> Code.compile_quoted(quote do
...> defmodule Example do
...> def hello, do: IO.puts "hello"
...> end
...> end)
[{Example,
  <<70, 79, 82, 49, 0, 0, 4, 224, 66, 69, 65, 77, 69, 120, 68, 99, 0, 0, 0, 128, 131, 104, 2, 100, 0, 14, 101, 108, 105, 120, 105, 114, 95, 100, 111, 99, 115, 95, 118, 49, 108, 0, 0, 0, 4, 104, 2, 100, 0, ...>>}]
iex(2)> Example.hello
hello
:ok

iex(1)> Code.eval_quoted(quote do: IO.puts "hello")
hello
{:ok, []}

Macros

Macros allow for an AST to be injected into the code of the caller. How does this differ from functions? In two ways: macros let us inject code at compile-time (this could be new code or constant expressions), and they enable metaprogramming.

Macros are created in much the same way functions are, with the exception that we use defmacro or defmacrop instead.

defmodule MacroExample do
    defmacro add(x, y) do #Return the AST to add two values together
        quote do
            unquote(x) + unquote(y)
        end
    end

    defp add_next(0), do: 0
    defp add_next(i) when i > 0 do
        quote do: unquote(add_next(i - 1)) + unquote(i)
    end

    defmacro sum(count) do #Return the AST to sum n values together
        add_next(count)
    end

    defmacro sum_constant(count) do #Return the constant value result for the sum
        { v, _ } = Code.eval_quoted(add_next(count))
        v
    end
end

In the above example we have defined three different macros: add/2, sum/1, and sum_constant/1. The first two return the AST of the code representing the operation, while the last evaluates that expression and returns the result. The difference is that after compilation, the first two will only work out the value at runtime, whereas the third compiles with the value already computed. To have a look at the AST being injected when using these macros, we can use the function Macro.expand_once defined in Macro.

iex(1)> Macro.expand_once(quote(do: MacroExample.add(1,2)), __ENV__)
{:+, [context: MacroExample, import: Kernel], [1, 2]}
iex(2)> Macro.expand_once(quote(do: MacroExample.add(1,2)), __ENV__) |> Macro.to_string
"1 + 2"

iex(1)> Macro.expand_once(quote(do: MacroExample.sum(4)), __ENV__)
{:+, [context: MacroExample, import: Kernel],
 [{:+, [context: MacroExample, import: Kernel],
   [{:+, [context: MacroExample, import: Kernel],
     [{:+, [context: MacroExample, import: Kernel], [0, 1]}, 2]}, 3]}, 4]}
iex(2)> Macro.expand_once(quote(do: MacroExample.sum(4)), __ENV__) |> Macro.to_string
"0 + 1 + 2 + 3 + 4"

iex(1)> Macro.expand_once(quote(do: MacroExample.sum_constant(4)), __ENV__)
10
iex(2)> Macro.expand_once(quote(do: MacroExample.sum_constant(4)), __ENV__) |> Macro.to_string
"10"

Manipulating The AST

One of the important features that allows for powerful metaprogramming capabilities is the ability to modify the AST directly. This is showcased somewhat in the previous examples, which create ASTs from inputs. But it can be taken even further, as we can traverse and manipulate the AST however we wish. A simple example could be something like the following:

iex(1)> {:+,ctx,[v1,v2]} = quote do: 1 + 2
{:+, [context: Elixir, import: Kernel], [1, 2]}
iex(2)> Code.eval_quoted({:+,ctx,[v1 * 10, v2 * 5]})
{20, []}

More complicated examples could be a restructuring pipeline, or an optimization step. There are many possibilities here, as you're given the flexibility to manipulate the AST however you see fit to solve your current problem.

Techniques For Metaprogramming

There are a few different strategies that can be used for metaprogramming. One is using macros literally, where they return the AST and inject it into the code using them (as shown in the example above). Another is generating an AST from an external file or non-Elixir terms. The last is to use macros to configure behaviours that the AST will later be generated from.

An example for the file/non-valid elixir terms can be seen with how Elixir supports Unicode characters. This is where you may have data in a format other than standard Elixir terms, and use that information to generate the code.

defmacro conditions(input, list) do
    quote do
        cond do
            unquote(Enum.map(list, fn { result, match } -> 
                [condition] = quote do
                    unquote(input) =~ unquote(match) -> unquote(result)
                end
                condition
            end))
        end
    end
end

@input """
integer: ^[+-]?\\d*$
float:   ^[+-]?\\d*\\.?\\d*$
string:  ^".*"$
"""
def get_type(string) when is_bitstring(string) do
    conditions(string, unquote(Macro.escape(String.split(@input, "\n", trim: true) |> Enum.map(fn s ->
        [type, match] = String.split(s, ":", parts: 2)
        { String.to_atom(String.strip(type)), Regex.compile!(String.strip(match)) }
    end))))
end

#Generates this function:
#def get_type(string) when is_bitstring(string) do
#    cond do
#        string =~ ~r/^[+-]?\d*$/ -> :integer
#        string =~ ~r/^[+-]?\d*\.?\d*$/ -> :float
#        string =~ ~r/^".*"$/ -> :string
#    end
#end

Using macros for configuration is another powerful technique, as it allows the code generation step to be deferred until after all of the inputs have been initialized. There are a few important features in achieving this: module attributes, the __using__/1 macro described in Kernel, and the __before_compile__/1 macro described in Module. An example of using macros for configuration can be seen in the Plug library.

defmodule Matcher do
    defmacro __using__(_options) do
        quote do
            import Matcher

            @before_compile unquote(__MODULE__)
            @matches []
        end
    end

    defmacro __before_compile__(env) do
        quote do
            def get_type(string) when is_bitstring(string) do
                cond do
                    unquote(Enum.map(Enum.reverse(Module.get_attribute(env.module, :matches)), fn { result, match } -> 
                        [condition] = quote do
                            string =~ unquote(match) -> unquote(result)
                        end
                        condition
                    end))
                end
            end
        end
    end

    defmacro match([condition]) do
        quote do
            @matches [unquote(Macro.escape(condition))|@matches]
        end
    end
end

defmodule TypeMatcher do
    use Matcher
    
    match integer: ~r/^[+-]?\d*$/
    match float: ~r/^[+-]?\d*\.?\d*$/
    match string: ~r/^".*"$/
end

This last technique is very convenient for building DSLs within Elixir. A more fleshed out example can be seen in my binary parser/loader DSL Tonic. Below is an example of the DSL.

#Example PNG loader (parses a few different chunks)
defmodule PNG do
    use Tonic, optimize: true

    endian :big
    repeat :magic, 8, :uint8
    repeat :chunks do
        uint32 :length
        string :type, length: 4
        chunk get(:length) do
            on get(:type) do
                "IHDR" ->
                    uint32 :width
                    uint32 :height
                    uint8 :bit_depth
                    uint8 :colour_type
                    uint8 :compression_type
                    uint8 :filter_method
                    uint8 :interlace_method
                "gAMA" ->
                    uint32 :gamma, fn { name, value } -> { name, value / 100000 } end
                "cHRM" ->
                    group :white_point do
                        uint32 :x, fn { name, value } -> { name, value / 100000 } end
                        uint32 :y, fn { name, value } -> { name, value / 100000 } end
                    end
                    group :red do
                        uint32 :x, fn { name, value } -> { name, value / 100000 } end
                        uint32 :y, fn { name, value } -> { name, value / 100000 } end
                    end
                    group :green do
                        uint32 :x, fn { name, value } -> { name, value / 100000 } end
                        uint32 :y, fn { name, value } -> { name, value / 100000 } end
                    end
                    group :blue do
                        uint32 :x, fn { name, value } -> { name, value / 100000 } end
                        uint32 :y, fn { name, value } -> { name, value / 100000 } end
                    end
                "iTXt" ->
                    string :keyword, ?\0
                    string :text
                _ -> repeat :uint8
            end
        end
        uint32 :crc
    end
end

#Example load result:
#{{:magic, [137, 80, 78, 71, 13, 10, 26, 10]},
# {:chunks,
#  [{{:length, 13}, {:type, "IHDR"}, {:width, 48}, {:height, 40},
#    {:bit_depth, 8}, {:colour_type, 6}, {:compression_type, 0},
#    {:filter_method, 0}, {:interlace_method, 0}, {:crc, 3095886193}},
#   {{:length, 4}, {:type, "gAMA"}, {:gamma, 0.45455}, {:crc, 201089285}},
#   {{:length, 32}, {:type, "cHRM"}, {:white_point, {:x, 0.3127}, {:y, 0.329}},
#    {:red, {:x, 0.64}, {:y, 0.33}}, {:green, {:x, 0.3}, {:y, 0.6}},
#    {:blue, {:x, 0.15}, {:y, 0.06}}, {:crc, 2629456188}},
#   {{:length, 345}, {:type, "iTXt"}, {:keyword, "XML:com.adobe.xmp"},
#    {:text,
#     <<0, 0, 0, 0, 60, 120, 58, 120, 109, 112, 109, 101, 116, 97, 32, 120, 109, 108, 110, 115, 58, 120, 61, 34, 97, 100, 111, 98, 101, 58, 110, 115, 58, 109, 101, 116, 97, 47, 34, ...>>},
#    {:crc, 1287792473}},
#   {{:length, 1638}, {:type, "IDAT"},
#    [88, 9, 237, 216, 73, 143, 85, 69, 24, 198, 241, 11, 125, 26, 68, 148, 25,
#     109, 4, 154, 102, 114, 192, 149, 70, 137, 137, 209, 152, 152, 24, 19, 190,
#     131, 75, 22, 234, 55, 224, 59, ...], {:crc, 2269121590}},
#   {{:length, 0}, {:type, "IEND"}, [], {:crc, 2923585666}}]}}

The code generated by the above example will be:

(
  def(load(currently_loaded, data, name, endian)) do
    loaded = {}
    scope = []
    endian = :big
    {value, data} = repeater(&load_repeat___tonic_anon___41_11/4, 8, [scope | currently_loaded], data, :magic, endian, fn {name, value} -> {name, Enum.map(value, fn {i} -> i end)} end)
    loaded = :erlang.append_element(loaded, value)
    scope = [var_entry(:magic, value) | scope]
    {value, data} = repeater(&load_repeat_chunks_6_1/4, fn _ -> false end, [scope | currently_loaded], data, :chunks, endian)
    loaded = :erlang.append_element(loaded, value)
    scope = [var_entry(:chunks, value) | scope]
    {loaded, data}
  end
  def(load_group_blue_34_8(currently_loaded, data, name, endian)) do
    loaded = {name}
    scope = []
    {value, data} = callback(uint32([scope | currently_loaded], data, :x, endian), fn {name, value} -> {name, value / 100000} end)
    loaded = :erlang.append_element(loaded, value)
    scope = [var_entry(:x, value) | scope]
    {value, data} = callback(uint32([scope | currently_loaded], data, :y, endian), fn {name, value} -> {name, value / 100000} end)
    loaded = :erlang.append_element(loaded, value)
    scope = [var_entry(:y, value) | scope]
    {loaded, data}
  end
  def(load_repeat___tonic_anon___41_11(currently_loaded, data, name, endian)) do
    loaded = {}
    scope = []
    {value, data} = callback(uint8([scope | currently_loaded], data, nil, endian), fn {_, value} -> value end)
    loaded = :erlang.append_element(loaded, value)
    scope = [var_entry(nil, value) | scope]
    {loaded, data}
  end
  def(load_repeat_chunks_6_1(currently_loaded, data, name, endian)) do
    loaded = {}
    scope = []
    {value, data} = uint32([scope | currently_loaded], data, :length, endian)
    loaded = :erlang.append_element(loaded, value)
    scope = [var_entry(:length, value) | scope]
    {value, data} = repeater(&load_repeat___tonic_anon___41_11/4, 4, [scope | currently_loaded], data, :type, endian, &convert_to_string/1)
    loaded = :erlang.append_element(loaded, value)
    scope = [var_entry(:type, value) | scope]
    (
      size = get_value([scope | currently_loaded], :length)
      next_data3 = binary_part(data, size, byte_size(data) - size)
      data = binary_part(data, 0, size)
    )
    (
      case(get_value([scope | currently_loaded], :type)) do
        "IHDR" ->
          {value, data} = uint32([scope | currently_loaded], data, :width, endian)
          loaded = :erlang.append_element(loaded, value)
          scope = [var_entry(:width, value) | scope]
          {value, data} = uint32([scope | currently_loaded], data, :height, endian)
          loaded = :erlang.append_element(loaded, value)
          scope = [var_entry(:height, value) | scope]
          {value, data} = uint8([scope | currently_loaded], data, :bit_depth, endian)
          loaded = :erlang.append_element(loaded, value)
          scope = [var_entry(:bit_depth, value) | scope]
          {value, data} = uint8([scope | currently_loaded], data, :colour_type, endian)
          loaded = :erlang.append_element(loaded, value)
          scope = [var_entry(:colour_type, value) | scope]
          {value, data} = uint8([scope | currently_loaded], data, :compression_type, endian)
          loaded = :erlang.append_element(loaded, value)
          scope = [var_entry(:compression_type, value) | scope]
          {value, data} = uint8([scope | currently_loaded], data, :filter_method, endian)
          loaded = :erlang.append_element(loaded, value)
          scope = [var_entry(:filter_method, value) | scope]
          {value, data} = uint8([scope | currently_loaded], data, :interlace_method, endian)
          loaded = :erlang.append_element(loaded, value)
          scope = [var_entry(:interlace_method, value) | scope]
        "gAMA" ->
          {value, data} = callback(uint32([scope | currently_loaded], data, :gamma, endian), fn {name, value} -> {name, value / 100000} end)
          loaded = :erlang.append_element(loaded, value)
          scope = [var_entry(:gamma, value) | scope]
        "cHRM" ->
          {value, data} = load_group_blue_34_8([scope | currently_loaded], data, :white_point, endian)
          loaded = :erlang.append_element(loaded, value)
          scope = [var_entry(:white_point, value) | scope]
          {value, data} = load_group_blue_34_8([scope | currently_loaded], data, :red, endian)
          loaded = :erlang.append_element(loaded, value)
          scope = [var_entry(:red, value) | scope]
          {value, data} = load_group_blue_34_8([scope | currently_loaded], data, :green, endian)
          loaded = :erlang.append_element(loaded, value)
          scope = [var_entry(:green, value) | scope]
          {value, data} = load_group_blue_34_8([scope | currently_loaded], data, :blue, endian)
          loaded = :erlang.append_element(loaded, value)
          scope = [var_entry(:blue, value) | scope]
        "iTXt" ->
          {value, data} = repeater(&load_repeat___tonic_anon___41_11/4, fn [{c} | _] -> c == 0 end, [scope | currently_loaded], data, :keyword, endian, &convert_to_string_without_last_byte/1)
          loaded = :erlang.append_element(loaded, value)
          scope = [var_entry(:keyword, value) | scope]
          {value, data} = repeater(&load_repeat___tonic_anon___41_11/4, fn _ -> false end, [scope | currently_loaded], data, :text, endian, &convert_to_string/1)
          loaded = :erlang.append_element(loaded, value)
          scope = [var_entry(:text, value) | scope]
        _ ->
          {value, data} = repeater(&load_repeat___tonic_anon___41_11/4, fn _ -> false end, [scope | currently_loaded], data, :__tonic_anon__, endian, fn {_, value} -> Enum.map(value, fn {i} -> i end) end)
          loaded = :erlang.append_element(loaded, value)
          scope = [var_entry(:__tonic_anon__, value) | scope]
      end
      {loaded, scope, data}
    )
    data = next_data3
    {value, data} = uint32([scope | currently_loaded], data, :crc, endian)
    loaded = :erlang.append_element(loaded, value)
    scope = [var_entry(:crc, value) | scope]
    {loaded, data}
  end
)

Possibilities

As can be seen, these different constructs allow for some very powerful metaprogramming capabilities within Elixir. Some of the possibilities include reducing large amounts of boilerplate code, restructuring pipelines, post-optimizations, and easily creating custom DSLs, either within the Elixir syntax itself or by taking new syntax and converting it to Elixir.

February 26 2016
by ScrimpyCat

Engine Dev: Scripting - Decisions & Tagged Pointers

The engine code focused on in this article can be found here: https://github.com/ScrimpyCat/CommonGameKit/tree/master/src/scripting.

Offering scripting in an engine isn't a must, but for an engine written in a language such as C, a scripting language can provide a lot of benefit: it easily allows for things like hot code reloading, simplifies certain actions, offers a more user friendly way to interface with the engine, etc.

Selecting The Language

In the past I've used Javascript, JSCocoa (extended with some of my own modifications), and a visual node-based language. While these were all fine options, I wanted something a little different out of this scripting language.

I had been spending a lot of time with Elixir, and a key interest of mine with the language (coming from Erlang) was its macro system: being able to access the AST and manipulate it however you like, in the language itself. This led me to learning about homoiconicity (Elixir isn't homoiconic however), where the language itself is also its data, as with languages like Lisp or Scheme. While homoiconicity for the purpose of metaprogramming wasn't a concern for how I saw its use in the engine, the fact that the language is also the structure of its data is what made it an attractive concept.

One Syntax For All The Things!

The thought of being able to use the same syntax for custom data formats (in-place of where you might have used XML, some other markup language, or even rolled some custom structure in the past) and for the scripts, was a tempting concept.

Now there are languages that have a data format, plus extra syntactical features for representing operations/manipulation and program structure which the data format does not include. Examples are Javascript with JSON, or Erlang/Elixir with its data types (terms can be parsed using io:read), etc. But the languages with a Lisp-like representation won me over, as there was something so elegant about the exact relation between data structure and code.

; Example of current scripting and what I had in mind at the time.
(library "mylib" ; A shader library with fixed sources
    (if @opengl (
        (source "vert" vertex (dir "myshader.vs"))
        (source "frag" fragment (dir "myshader.fs"))
    ))
)

(library "core" ; A shader library that uses all the .fs/.vs files in the current directory
    (if @opengl (
        (loop "file" (search @cd ("*.fs" "*.vs"))
            (source
                (replace "." "-" (filename file))
                (if (suffix "fs" file) fragment vertex)
                (dir file)
            )
        )
    ))
)

(gui "gui-rect" ; Creating a new UI object type
    (state! "colour" (1.0 0 0))
    (render
        (render-rect rect colour)
    )
)

(gui-rect ; Using the UI
    (enabled! 0)
    (rect! 0 0 (quote (super (percent-width 50))) (quote (super (percent-height 50))))
    (children
        (gui-rect
            (colour! (0 0 1.0))
            (rect! (quote ((super (percent-width 25)) (super (percent-height 25)) (super (percent-width 50)) (super (percent-height 50)))))
        )
    )
)

Scheme

So it was decided: Scheme was my favourite choice of language to use. I started looking for implementations which would be appropriate; some obvious options were Chibi Scheme, s7, and TinyScheme, with some recommended options being Guile and CHICKEN.

The Story

I ended up going with Chibi Scheme, as its benchmarks looked good (faster than vanilla Lua) and the C API did not look too complicated. However I had trouble getting it to work as a library; evaluating expressions would just return an exception. I was definitely messing something up, though I wasn't sure what, and the limited docs (from what I could find) didn't help much either.

At the time I didn't think much of it, figuring I'd revisit the issue when I needed to. But I did want to start introducing some configuration files to the engine. So the obvious solution was to roll my own, right? Right? As I only needed it for some simple data representation, all that was required was some basic parsing (a parser for a Lisp-like language is one of the simplest parsers you can write). So I decided to do just that.

The implementation is different from your usual implementation, as I wasn't expecting to use it for anything more (only as a temporary solution). So I didn't go with s-expressions, but rather an expression or list (array) of expressions. Nor did I do much in the way of designing evaluation to be efficient; I went with the lazy option of just making it copy heavy (expressions are copied when they're passed around).

In Too Deep

I kept pushing off using a proper implementation, and kept hacking away at my own, adding features as needed. I did get to a point where I saw that if I went any further with it, I'd probably just stick with it. So I decided to try out some of the other implementations; s7 was very easy to set up and use, and I gave serious consideration to using it, but I ultimately decided the API I had created for my own language was simpler to use. If I ever reach a problem point with it though, I'll likely switch.

So I kept this odd hack of a language. Now one lingering thought was performance; I knew I'd have to optimize it at some point. One approach would be to just redesign it, but there were obvious optimizations I could make in the meantime for a reasonable performance gain. They might even be enough, as I'm not planning on making scripts very logic heavy anyway.

Two of the clear areas to be optimized are function lookup and copying expressions. Function lookups are currently just looping over an array of strings looking for a match; an obvious optimization would be to convert it to a hash map. Copying expressions is the other big problem, as every copy is an allocation. There are two solutions to this (both can be used together). One is introducing reference counting; as expressions are immutable this is a good approach. The other is to introduce tagged types (tagged pointers), to allow the creation of types that don't actually need to allocate data but can store the necessary data in the pointer itself.

Tagged Pointers

Tagged pointers are a common concept among language developers/compiler hackers. It essentially involves storing additional data in the pointer itself. This works because on many platforms different types have different alignments, and many malloc implementations return pointers at word boundaries. If that is true for a particular platform, it means the low bits of a valid pointer will never be used. Alternatively you can enforce alignment yourself, so you'll know you have the number of bits required free.
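To make the idea concrete, here's a minimal sketch (not the engine's actual code; the function names are hypothetical) of stashing a tag in the low bits of an 8 byte aligned pointer:

```c
#include <assert.h>
#include <stdint.h>

#define TAG_MASK ((uintptr_t)0x7) //Low 3 bits are free if pointers are 8 byte aligned

void *TagPointer(void *Ptr, uintptr_t Tag) //Store a 3 bit tag in the pointer
{
    assert(((uintptr_t)Ptr & TAG_MASK) == 0); //Relies on the allocator's alignment
    return (void*)((uintptr_t)Ptr | (Tag & TAG_MASK));
}

uintptr_t PointerTag(const void *Ptr) //Retrieve the tag
{
    return (uintptr_t)Ptr & TAG_MASK;
}

void *UntagPointer(void *Ptr) //Recover the original pointer
{
    return (void*)((uintptr_t)Ptr & ~TAG_MASK);
}
```

Anything reading the pointer must strip the tag before dereferencing, which is why tagging works best behind a small accessor API like this.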

Now I was using it as a way to alleviate the cost of copying expressions, by forgoing allocation whenever possible. My expressions have the following types: atoms (string), strings, integers (int32), floats (float/binary32), lists, and custom data. Obvious choices to apply this concept to are strings, integers, floats, and atoms.

Strings

With strings I decided to leverage the tagged variety my string implementation already supports. It does this by reserving the last 2 bits of the pointer to indicate what type of string it is. There are essentially 2 kinds of string (well 3, counting a constant string, but that's not included in the tagged pointer): a regular string (pointer to a structure containing string data), and a tagged string (the pointer is the string itself).

The tagged strings act essentially as an array of character map indices shifted together. They're limited to a particular max length, but the character maps can be customized for the app's particular use case, and 3 different maps of differing bit ranges are supported to extend the length of the strings. Those 2 bit values mentioned previously indicate one of the three different versions of tagged strings.

xxxxxxxx xxxxxxxx xxxxxxxx xxxxxx00 -> pointer to allocated string
xxxxxxxx xxxxxxxx xxxxxxxx xxxxxx01 -> tagged string (CCStringMapSet127)
xxxxxxxx xxxxxxxx xxxxxxxx xxxxxx10 -> tagged string (CCStringMapSet63)
xxxxxxxx xxxxxxxx xxxxxxxx xxxxxx11 -> tagged string (CCStringMapSet31)

With 2 bits used up from the pointer, this gives us 30 bits to spare on 32 bit platforms, and 62 bits to spare on 64 bit platforms. The 3 character maps use bit ranges of 5 (31 character lookup), 6 (63 character lookup), and 7 (127 character lookup). So the 5 bit range (smallest lookup) allows a string of length 6 (on a 32 bit platform) or 12 (on a 64 bit platform).

32 bit
66666555 55444443 33332222 211111tt : max 6 characters (CCStringMapSet31)
55555544 44443333 33222222 111111tt : max 5 characters (CCStringMapSet63)
xx444444 43333333 22222221 111111tt : max 4 characters (CCStringMapSet127)

64 bit
xxcccccb bbbbaaaa a9999988 88877777 66666555 55444443 33332222 211111tt : max 12 characters (CCStringMapSet31)
xxaaaaaa 99999988 88887777 77666666 55555544 44443333 33222222 111111tt : max 10 characters (CCStringMapSet63)
xxxxxx88 88888777 77776666 66655555 55444444 43333333 22222221 111111tt : max 8 characters (CCStringMapSet127)
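A sketch of how the 5 bit variant might pack and unpack, assuming a hypothetical character map of just 'a'-'z' with index 0 reserved as a terminator (the real CCString maps are configurable per app, and these function names are made up):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

uintptr_t TagString31(const char *String) //Returns 0 if not representable
{
    const size_t Max = ((sizeof(uintptr_t) * 8) - 2) / 5; //6 chars on 32 bit, 12 on 64 bit
    const size_t Length = strlen(String);
    if (Length > Max) return 0;

    uintptr_t Tagged = 3; //Low 2 bits = 11: the 5 bit map in the layout above
    for (size_t Loop = 0; Loop < Length; Loop++)
    {
        if ((String[Loop] < 'a') || (String[Loop] > 'z')) return 0; //Not in the hypothetical map
        Tagged |= (uintptr_t)(String[Loop] - 'a' + 1) << (2 + (Loop * 5));
    }

    return Tagged;
}

void UntagString31(uintptr_t Tagged, char *Out) //Out must hold Max + 1 chars
{
    for (Tagged >>= 2; Tagged & 31; Tagged >>= 5) *Out++ = (char)('a' + (Tagged & 31) - 1);
    *Out = 0;
}
```

Comparing two such strings is then just a pointer comparison, which is part of what makes the technique attractive.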

Because of this, I decided to preserve the last 2 bits of a tagged expression to indicate a string (a tagged string), as this saves me from having to add any extra complication to using strings in this manner. It also means any following tagged expressions will need to use bits above the 2 bit range (those will have to remain empty). So I went with a 3rd bit to indicate that it's one of the other tagged expression types, and then require only 2 more bits to select the type. The reason I went with this is it only requires pointers to be 8 byte aligned, since testing whether an expression is a tagged expression only requires checking if any of the first 3 bits are set.

xxxxxxxx xxxxxxxx xxxxxxxx xxxxx000 -> pointer to allocated expression
xxxxxxxx xxxxxxxx xxxxxxxx xxxxxx01 -> tagged expression (string)
xxxxxxxx xxxxxxxx xxxxxxxx xxxxxx10 -> tagged expression (string)
xxxxxxxx xxxxxxxx xxxxxxxx xxxxxx11 -> tagged expression (string)
xxxxxxxx xxxxxxxx xxxxxxxx xxxxx100 -> tagged expression

xxxxxxxx xxxxxxxx xxxxxxxx xxx00100 -> tagged expression (atom)
xxxxxxxx xxxxxxxx xxxxxxxx xxx01100 -> tagged expression (integer)
xxxxxxxx xxxxxxxx xxxxxxxx xxx10100 -> tagged expression (float)
xxxxxxxx xxxxxxxx xxxxxxxx xxx11100 -> tagged expression (unused)
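Decoding these tag bits might look like the following sketch (the enum names are hypothetical, but the bit patterns follow the tables above):

```c
#include <stdint.h>

typedef enum {
    ExpressionTypePointer, //Allocated expression
    ExpressionTypeString,
    ExpressionTypeAtom,
    ExpressionTypeInteger,
    ExpressionTypeFloat,
    ExpressionTypeUnused
} ExpressionType;

ExpressionType GetExpressionType(uintptr_t Expression)
{
    if (!(Expression & 7)) return ExpressionTypePointer; //xxx000: 8 byte aligned pointer
    if (Expression & 3) return ExpressionTypeString; //xxxx01 / xxxx10 / xxxx11: tagged string

    switch ((Expression >> 3) & 3) //xxx100: the next 2 bits select the type
    {
        case 0: return ExpressionTypeAtom;    //00100
        case 1: return ExpressionTypeInteger; //01100
        case 2: return ExpressionTypeFloat;   //10100
        default: return ExpressionTypeUnused; //11100
    }
}
```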

Integers

The other tagged expression types are left with 27 unused bits on a 32 bit platform, and 59 unused bits on a 64 bit platform. This means that on a 64 bit platform, as the language only uses 32 bit integers, we can store the entire integer in a tagged expression. A 32 bit platform however can only store 27 bits worth.

To more effectively make use of the storage on a 32 bit platform, I assumed the most common integer values that will be used will be around the 0 range, rather than being close to INT32_MIN or INT32_MAX. So an even distribution (including the signed bit) using 27 bits would allow for a value range of -67108864 to 67108863.

32 bit (only between range -67108864 to 67108863)
vvvvvvvv vvvvvvvv vvvvvvvv vvs01100 : s = signed bit, v = value
vvvvvvvv vvvvvvvv vvvvvvvv vv001100 : value << 5 (positive)
vvvvvvvv vvvvvvvv vvvvvvvv vv101100 : ~(-1 + (value + 1)) << 5 (negative)

64 bit (full range INT32_MIN to INT32_MAX)
xxxxxxxx xxxxxxxx xxxxxxxx xxxvvvvv vvvvvvvv vvvvvvvv vvvvvvvv vvv01100 : value << 5
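A sketch of the 32 bit encode/decode (names are hypothetical): storing the value as a shifted two's complement number gives the same -67108864 to 67108863 range as the explicit sign bit layout above, so this is an equivalent encoding rather than necessarily the exact one described.

```c
#include <stdint.h>

uint32_t TagInteger(int32_t Value) //Caller must ensure the value is in range
{
    return ((uint32_t)Value << 5) | 0xC; //Value in the top 27 bits, above the 01100 tag
}

int32_t UntagInteger(uint32_t Tagged)
{
    return (int32_t)Tagged >> 5; //Arithmetic shift (as on mainstream compilers) restores the sign
}
```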

Floats

Like integers, with floats being binary32s, the 64 bit implementation is trivial. However deciding how to handle floats on 32 bit platforms becomes a little trickier. I just took a naive approach of reducing the precision of the mantissa, so tagged floats are any that can be represented with 1 sign bit, 8 exponent bits, and only 18 (instead of 23) mantissa bits. While this is trivial and fast, it leaves a lot to be desired, as it is not an effective distribution of values.
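The naive truncation above can be sketched like this (names are hypothetical): a float fits in a tagged expression only if its low 5 mantissa bits are zero, leaving 1 sign + 8 exponent + 18 mantissa bits above the 10100 tag, and repairing the float is just clearing the tag bits.

```c
#include <stdint.h>
#include <string.h>

int TagFloat(float Value, uint32_t *Tagged) //Returns 0 if precision would be lost
{
    uint32_t Bits;
    memcpy(&Bits, &Value, sizeof(Bits));

    if (Bits & 0x1f) return 0; //Low mantissa bits in use: can't be tagged
    *Tagged = Bits | 0x14; //Replace them with the 10100 tag

    return 1;
}

float UntagFloat(uint32_t Tagged)
{
    const uint32_t Bits = Tagged & ~(uint32_t)0x1f; //Repair the float: clear the tag
    float Value;
    memcpy(&Value, &Bits, sizeof(Value));

    return Value;
}
```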

One approach could be reducing the precision of the exponent; however with 5 bits used up, reducing the exponent down to 3 bits does not allow for a very big range of values. One obvious range might be to use it around 127 to 134 (127 plus the 3 bit value). This would give us full binary32 precision from 1.0e+00 up to 2.56e+02 (and their negatives). Or if wanting to include some of the lower range, 123 to 130 might be a good choice. This would give the full range of precision for values from 6.25e-02 up to 1.6e+01 (and their negatives). But trying to include more of the lower range starts to become problematic, as the precision starts to grow.

Another approach is storing it in an entirely different format that can be better optimized for its specific use. An example might be a format that only needs 5 decimal places of precision, and ranges from 1.0 up to the highest value possible (including negatives) with the 27 bits available. One possible way to represent this could be as an integer multiple of 100000 (so it has the 5 decimal places), where 0 equals 1.0, with the highest bit reserved as the sign bit.

#include <stdint.h>

typedef union {
    float f;
    uint32_t u;
    int32_t i;
    struct {
        uint32_t mantissa : 23;
        uint32_t exponent : 8;
        uint32_t sign : 1;
    } internal; //Requires a compiler that will guarantee this bit order
} binary32_t;

float UInt32ToFloat(uint32_t v) //This is a redundant function, as it could simply cast it to a float. But this is more interesting :P
{
    //Find the highest set bit
    uint32_t e = v;
    e |= e >> 1;
    e |= e >> 2;
    e |= e >> 4;
    e |= e >> 8;
    e |= e >> 16;
    e ^= (e >> 1);

    //Get mask for lowest unset bits, and store the non-power of 2 value in m
    uint32_t m = (e = ~-e) & v;

    //Count set bits
    e -= ((e >> 1) & 0x55555555);
    e = (e & 0x33333333) + ((e >> 2) & 0x33333333);
    e = (e + (e >> 4)) & 0xf0f0f0f;
    e = (e * 0x1010101) >> 24;

    return (binary32_t){ .internal = { .exponent = e + 127, .mantissa = m << (23 - e) } }.f;
}

float IntRepToFloat(uint32_t v)
{
    _Bool s = v & 0x80000000; //Get signed bit
    v = (v & 0x7fffffe0) >> 5; //Get value (removing signed bit and tagged mask)
    v += 100000; //Adjust the value so it begins from 1.0
    binary32_t f = (binary32_t){ .f = UInt32ToFloat(v / 100000) }; //Convert the whole number to a float
    f.internal.sign = s; //Set the signed bit of the float

    //Get the fractional portion of the value (no need to worry about rounding as the expected input would be originally created from a valid float anyway)
    for (int Index = 0; (Index < 22) && (v); Index++)
    {
        v = (v % 100000) * 2;
        f.internal.mantissa |= ((v >= 100000) << (22 - Index)) >> (f.internal.exponent - 127);
    }

    return f.f;
}

/*
 IntRepToFloat(0) -> 1.00000
 IntRepToFloat(100000 << 5) -> 2.00000
 IntRepToFloat(125000 << 5) -> 2.25000
 IntRepToFloat((125000 << 5) | 0x80000000) -> -2.25000
 IntRepToFloat(0x7fffffe0) -> 672.08862 (max)
 IntRepToFloat(0xffffffe0) -> -672.08862 (min)
*/

The caveats of this approach however are that it requires additional processing, whereas the other approaches only involve a few operations to repair the float; and, in this particular example, it doesn't make effective use of all the bits we have available, as the 5 decimal places of precision become less effective the larger the value becomes. In other words, not all the bit patterns will represent a unique floating point value.

Which approach I'll change over to is still to be determined, as I'm not too sure what values will be used the most. There are some likely attributes I'd want: definitely support for a range from 0.0 to 1.0 (including negatives), as well as larger values, probably capping off around the 140 exponent value. Some things that are unlikely to be needed (or won't be a common occurrence) are infinite values, NaNs, denormals, and as mentioned, very large values.

Atoms

Implementing a tagged type for atoms has not been done yet, but the general idea is as follows. Atoms are mostly used as function references, state references, and as unique words (options). The best outcome is one that tells you the type of atom it is, has a reference (or can obtain one) to the function or state in as few steps as possible, and makes comparing atoms quick. One current caveat is that new state could be created at any time, and new functions could be added at any time.

The current idea is to use some bits to denote the type of atom it is. This will essentially act as a cache, as the system will need to re-evaluate it if any breaking changes have been made. All the atom strings will be stored in a lookup table, with some bits of the tagged atom reserved as an index reference to their string, so code that uses atom strings still functions correctly. And lastly, depending on the atom type, if it is a function some bits will be reserved for an index to the function. Ideally similar information could be used for state references too, but the current implementation of expression state makes it a little tricky.

Concluding

In conclusion, this was a look at what led me to the decisions I made with the scripting language, and one of the optimization techniques used. I chose not to go into detail about the language itself, as it's still changing too much and is far from complete, so that will be saved for a future post.

February 26 2016
by ScrimpyCat

My Programming Backstory

How I got my start in programming is a bit different from the usual story; it seems to share more similarities with how people used to get into it than with how most find their interest in it nowadays.

It all started in 2008 when I got my first computer (at age 16). Prior to this I wasn't an avid computer user and often found myself not sure what I was doing half the time, though I was an avid gamer (console, and occasionally some PC games at school). After getting my first computer, I was at first just using it like any typical teenager (entertainment, messaging, school work, etc.). After a while I took a particular interest in an MMORPG I was playing at the time, and decided to look into how people went about using cheats/hacks in it.

First Exposure

At the time people were hacking it using a tool designed to simplify editing the data.pak file. With this we could change things such as disabling an enemy's aggressiveness or attack, the number of enemies you could lure, etc. The other alternative was grabbing a hex editor and making the changes to the data.pak manually; this was needed if we wanted to change things the tool didn't handle, such as the player's speed.

I played around with editing the data.pak for a little while, but it wasn't until some time later, when someone worked out how to unpack the data.pak, that my interest and involvement were really piqued.

Data Files Exposed!

Someone released a tool that could unpack the data.pak (expose its directory structure and files), and worked out that you could get the game to run using those unpacked files by adding a launch option. This was a big thing, especially for me, as it meant all the data was now structured logically and in readable/useful formats (such as .csv's).

As a result a friend and I took right to it. We made some alternate characters to hack on and just dived into the files, seeing what could be changed and what happens if you change it. We figured out a number of hacks, and started getting a pretty solid understanding of how things worked in the game.

It wasn't until quite some time later (well into 2009) that I found out you could modify the executable itself.

You Can Do What?!

I had no idea that was even possible. But now I knew that you could actually take a look at all the executable's code (albeit as disassembly) and modify whatever you wanted.

So I grabbed OllyDbg (a Windows debugger and disassembler for x86), and started stepping through the instructions. At first I had no idea what I was doing, but over time, by seeing what changes when you step over an instruction and changing instructions to something else, I started to get a feel for how assembly worked. Yes, I really was crazy (naive?) enough to learn assembly without a resource (it's not a path I recommend others take, although I do strongly recommend learning asm and I think it's a really good first language, but please for your sake use a book, haha; and as a side note to any potential future employers, don't worry, I have since gone through the Intel manuals).

After familiarising myself with assembly and OllyDbg, I then found an instruction reference sheet, and learnt about a tool called Cheat Engine. From here on out I started reversing routines in the game, finding which routines modify particular data, and writing Cheat Engine auto-assembler scripts. At some stage I moved onto MASM32 as well, and started writing separate Windows programs. One of the most amusing hacks during this period: the game gave special (game master) abilities to characters whose names contained a certain prefix. While most of these abilities were handled by the server, the only thing stopping a player from creating a character with that prefix was a clientside check (as you can guess, you'd just nop that sucker out).

Watch Out, I'll Hack You

Don't worry, I never actually became like that. But I did begin to get interested in the general hacking side. I started getting involved in the reverse engineering scene (solving crackmes and keygenmes), as well as learning about exploitation.

This also tied back in with what I was doing in the game hacking scene. These ranged from gaining admin access to a test version of their website, to finding a buffer overflow in the chat protocol for one game (I never went through with exploiting this vulnerability however), to discovering a SQL exploit in a survey service a company used to give game currency (essentially in-app currency) for survey completions (I was able to trick the service into thinking I had completed a survey and so get as much currency as required). But after some time of hacking games and delving further into the more general reversing and exploitation side, I eventually began shifting over to the private server scene, as the private server files for one game I was involved in got leaked and people started running their own servers.

Introduction To Game Development

While I wouldn't really call this game development, at least not what one usually thinks of when talking about game development, it was what led to my interest in it. With numerous people running their own servers, there was a need to find out what was possible with the current files (how could new stuff be added), and how to keep these old files up to date with the official game. That's where my involvement took place. One of the more interesting discoveries was that the game's client (the same one everybody has) actually still had the editing tools (map editor, character/monster editor, visual effects debugging, UI editor, etc.) inside it; the only thing they removed was the code to launch them. So, by adding that back, I was able to launch into the map editor or other editing modes. During this period I also briefly got interested in emulator development.

This all eventually led to me gaining an interest in actually making my own games sometime in 2010. The rest has kind of been history since then. I've moved into other areas that have taken my interest, but I've still kept up my interest in game development.