Monday, May 13, 2013

LLVM 3.2 and Caching

We've been relatively quiet for a long time (since last October), so I just wanted to post an update on some of the big changes that were checked in last week.

First of all, the default branch now uses LLVM 3.2.  Arno Rehn did the port on a branch some time ago, but I had caching work happening on another branch and didn't have the bandwidth to do the merge at the time.  The introduction of LLVM 3.2 doesn't change very much, but it does keep us in sync with LLVM and the major distros.  It will also help us when we port to 3.3, which currently has a release candidate in testing.

The most exciting thing happening is that after a period of over 6 months of development, caching finally (mostly) works.  That is to say, we can run all but one of the non-bootstrapped tests with the -C option (which enables module caching), both before the modules have been persisted to the cache and afterwards, when they are loaded back from it.

For the uninitiated, module caching has been an important planned feature for Crack, so much so that we've made it an essential feature for the 1.0 release.  With caching enabled, when we compile a module we store the products of the compile (meta-data and an LLVM bit-code file) to a persistent cache.  The next time you run something that needs that module, it gets loaded from the cache, saving the executor from having to compile it and substantially reducing startup time.  This is similar to what Python does with its .pyc files, though it's harder to implement in a more static language like Crack.
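To make the idea concrete, here is a minimal sketch of a compile cache, written in Python rather than Crack.  It is not how Crack's cache is implemented: the class name, the file layout, the use of a source digest to detect stale entries and the fake_compile stand-in are all assumptions made for illustration.

```python
import hashlib
import json
import os


def digest(source):
    """Fingerprint of the source text, used here to detect stale entries."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


class ModuleCache:
    """Toy persistent cache: one meta-data file and one 'bit-code' file per module."""

    def __init__(self, cache_dir):
        self.cache_dir = cache_dir
        os.makedirs(cache_dir, exist_ok=True)

    def _paths(self, name):
        base = os.path.join(self.cache_dir, name)
        return base + ".meta.json", base + ".bc"

    def load(self, name, source):
        """Return the cached bit-code, or None if it is missing or stale."""
        meta_path, bc_path = self._paths(name)
        if not (os.path.exists(meta_path) and os.path.exists(bc_path)):
            return None
        with open(meta_path) as f:
            meta = json.load(f)
        if meta["source_digest"] != digest(source):
            return None  # the source changed since the entry was written
        with open(bc_path, "rb") as f:
            return f.read()

    def store(self, name, source, bitcode):
        """Persist the products of a compile."""
        meta_path, bc_path = self._paths(name)
        with open(bc_path, "wb") as f:
            f.write(bitcode)
        with open(meta_path, "w") as f:
            json.dump({"module": name, "source_digest": digest(source)}, f)


def fake_compile(source):
    """Hypothetical stand-in for the real compiler."""
    return source.upper().encode("utf-8")


def get_module(cache, name, source, stats):
    """Load a module from the cache if possible, otherwise compile and store it."""
    cached = cache.load(name, source)
    if cached is not None:
        stats["hits"] += 1
        return cached
    stats["misses"] += 1
    bitcode = fake_compile(source)
    cache.store(name, source, bitcode)
    return bitcode
```

Calling get_module twice with the same module name and source compiles once and then reads from disk, even if the second call uses a fresh ModuleCache pointed at the same directory.  That second case is the one that matters for startup time.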

Over the past month, I've finally managed to solve the hardest problem of caching, which is that of cyclic ephemeral modules.  When generics are instantiated, an "ephemeral module" is created for the generic.  An ephemeral module is a module that does not directly correspond to its own source file.  So if module "x" defines generic A, when we instantiate A[int], we create the ephemeral module x.A[int].  When caching, this allows the same instantiation of a generic to be referenced by all of the modules that use it.  However, this feature also opens the possibility of dependency cycles: if A[int] uses symbols in x, and x uses A[int], then x depends on x.A[int] and x.A[int] also depends on x.  This breaks a basic assumption about how modules are loaded and has implications at many levels of the system.  Solving these problems was the last major hurdle to having a working caching system, and it looks like we're finally there.
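The cycle itself is easy to reproduce in a toy model.  The Python sketch below is only an illustration and not Crack's loader: the DEPS table, both function names and the depth limit are made up.  It shows why a "load my dependencies first" strategy never finishes on x and x.A[int], and one generic way of coping, which is to mark a module as in progress before visiting its dependencies.  It does not address the harder question of how to handle a module whose dependency is only partially loaded.

```python
# Hypothetical dependency table mirroring the example in the text:
# x uses A[int], and the instantiation x.A[int] uses symbols from x.
DEPS = {
    "main": ["x"],
    "x": ["x.A[int]"],
    "x.A[int]": ["x"],
}


def load_naive(name, loaded, depth=0):
    """Load every dependency before the module itself; assumes no cycles."""
    if name in loaded:
        return
    if depth > 50:
        # Without this guard the recursion would simply never end.
        raise RecursionError("dependency cycle reached " + name)
    for dep in DEPS[name]:
        load_naive(dep, loaded, depth + 1)
    loaded.append(name)


def load_cycle_aware(name, loaded, in_progress=None):
    """Same traversal, but skip modules that are already being loaded."""
    if in_progress is None:
        in_progress = set()
    if name in loaded or name in in_progress:
        return
    in_progress.add(name)
    for dep in DEPS[name]:
        load_cycle_aware(dep, loaded, in_progress)
    in_progress.discard(name)
    loaded.append(name)
```

With this table, load_naive("main", []) raises RecursionError, while load_cycle_aware("main", order) terminates and leaves order holding x.A[int], then x, then main.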

There are a few things that are still broken in caching.  I'm currently working on fixing the "import from a non-extension shared library" feature.  I also might end up essentially rewriting the LLVM linker to correctly deal with referencing isomorphic types.  But most of the big plumbing changes are in, and I've merged the changes to the default branch.

So please try out the new caching functionality; it noticeably speeds up startup time.  Feel free to post bugs for whatever problems you find.  Happy cracking!