Thursday, August 7, 2014

Fast

A long-standing goal of the Crack language has been to produce a system that is fast.  Specifically, we wanted to minimize the code-test cycle.  The language has thus far fallen short in this respect: startup times for anything but very small programs have tended to exceed five seconds.

We were never able to obtain any solid conclusions from profiling the code, but we knew there was a lot to be gained from caching.  So over two years ago we started work on caching.  Needless to say, this was a harder problem than we imagined.

Caching is simply storing the artifacts of a successful compile so that we don't have to produce them again.  In theory, the work involved in fully parsing and validating the source, as well as generating and optimizing the LLVM IR code, is greater than the effort involved in simply deserializing these structures.

Getting it to work right has been very difficult, largely because of a number of shortcuts and bad design choices that I made while producing the original compiler.  My first reaction to this was, of course, to try to adapt caching to the existing code.  However, when these attempts failed I did finally do the right thing and make the big changes to the design that were necessary to get caching to work.

And now, finally, it does.  I can run code, make changes, and run again with the most complicated programs that I have and everything just works.  The modules for the changed code and all of their dependencies get rebuilt, everything else gets loaded from the cache.  Sweet!  (I'm sure there are still some bugs in there somewhere...)

So trying this out against the most complicated program that we have -- the "wurld" game engine demo in 'examples' -- without any cached files, takes 12 seconds to build on my laptop.  After caching, it takes ... 9 seconds.

9 seconds?  Really?  A 25% reduction after all of that work?  That's depressing.  So I did some more benchmarking.

One of the results of the restructuring work required for caching is that all of the jitting of a cached program now happens at the end, just before the program is run and after all of the modules are loaded from the cache.  Originally we jitted every module when we finished loading it.  The new structure makes it much easier to see how much time we're spending on jitting versus everything else.

It turns out, we were spending most of that time, 7-8 seconds of it, on jitting.

We were storing the addresses of all of our functions before running everything.  We need function addresses in order to generate source file and function names when reporting stack traces.  But in the course of looking all of these addresses up, we were also generating all of the functions in the program, whether we needed them or not.

LLVM's ExecutionEngine lets you watch for notifications for a number of different internal events.  In particular, it lets you get a notification when a function gets jitted. So I replaced the registration loop with a notification callback.  Now instead of function registration driving jitting, jitting drives function registration, and only for the functions that are used by the program.  This got us down to about 5 seconds total runtime.

The ExecutionEngine also lets you enable "lazy jitting."  By default, when you request a function address, that function and all of its dependencies get jitted.  By contrast, lazy jitting defers the jitting of a function until it is actually called.  Enabling this feature brought the total run time down to under 2.5 seconds.  The hacked version of "wurld" I was using for benchmarking loads everything and does initialization but then immediately terminates, so much of the code is never called anyway.

I'm not sure if there's anything more we can do to improve on this, but 2.5 seconds is well in the realm of tolerable.

It's somewhat embarrassing that the huge effort of caching yielded only a modest share of the initial gains, while an hour or two of investigation and simple tweaking had such a big impact.  But on the other hand, the simple tweaking couldn't have worked without some of the changes we put in for caching.  And as it turns out, post-tweaks, caching ends up saving us 60% of the startup time, which is huge.

Going forward, after we release 1.0, I still want to start to experiment with alternate backends.  libjit, in particular, looks like a promising alternative to LLVM for our JIT backend.  But as it stands, in terms of performance, I think we're in pretty good shape for a 1.0 release.

Tuesday, February 18, 2014

Crack 0.9 Released

We are pleased to announce the release of Crack 0.9.  The only major user-visible change in this release is the introduction of support for threads.  We've also done a great deal of work towards caching, but unfortunately this feature is still not ready for general use.

Other, lesser features include an IRC module (crack.net.irc) and a protobuf module (crack.protobuf).

Please note that the new tarball is not available on the Google Code download page: new uploads have been disabled by Google.  Our downloads are now available at http://crack-lang.org/downloads/

Enjoy!

Thursday, January 23, 2014

Threads

I'm happy to say that Crack now supports threading.  Thread support required two basic ingredients: a pthread extension (wrapped in a high-level Crack module) and the atomic_int type for reference counting.

Normal integer operations are not sufficient to support reference counting in a multithreaded environment.  Compilers and CPUs perform all kinds of tricks in order to improve performance, and this can result in two different threads seeing two different values for the same variable at any given point in time.  Atomic operations solve this problem.  They prevent the compiler from doing any special tricks to elide or reorder the operation, and require the CPU to lock the bus and synchronize the operation across all views of memory.

To expose atomic operations in the language, I've created an atomic_int type which supports a limited set of integer operations (mainly += and -=) and synchronized conversion to plain integer types.  The reference count of crack.lang.Object is now of type atomic_int, giving us atomic reference counts.

For actual thread creation and more general synchronization, I've implemented the crack.threads module.  This provides Crack classes encapsulating functions from a new pthread extension, so we have Thread, Mutex and Condition classes.  There is also a higher level Queue generic which provides one of the most common modes of inter-thread communication.

The new threading code is currently available only from the repository, but we're about due for a release, so it should be available soon in version 0.9 of the crack SDK.

Tuesday, October 8, 2013

Crack 0.8 Released

We're pleased to announce the release of Crack 0.8.  The new release includes
a number of bug-fixes and code cleanups as well as a number of new extension
modules, and brings us up to date with LLVM 3.3.

It's been almost a year since our last release, which is far longer than usual
and far longer than we'd like.  The reason for the delay has been the
ongoing work on module caching.  Module caching provides a substantial
speed-up of program loading by allowing the executor to reuse the
artifacts of previous compiles instead of recompiling everything from source.
Unfortunately, it has proven to be very difficult to get this feature working
perfectly, and it still doesn't work quite right for complicated generics. 
Users are encouraged to experiment with it by adding the -C option.

The complete release notes for 0.8 follow:

    -   Lots of bug fixes and small features.
    -   Lots of refactors and code cleanups.
    -   Implemented most of the code for caching (which is still way too buggy
        for anything other than experimentation).
    -   Added a flag-driven tracing facility.
    -   Lots of new modules and extension modules:
        -   OpenSSL
        -   libpng
        -   alsa, midi, jack, and fluidsynth
        -   curl
        -   SHA1 digests
        -   base64 encoding
        -   MongoDB and LMDB
        -   netCDF
        -   Mersenne Twister PRNG
    -   Enhanced the test framework to support composite tests.
    -   Added the Wurld OpenGL example program.
    -   Ported to LLVM 3.3.
    -   Changed annotation-in-generic semantics.

Tuesday, September 10, 2013

Annotation Semantics Changing for Generics

Annotations are a useful feature that essentially allows you to program the Crack compiler at compile time.  They provide the functionality of C macros and a whole lot more: you can actually use them to implement your own domain-specific languages, if you like.

Annotations are stored in a "compile time namespace." This is separate from the normal namespace that functions, variables and classes live in, but it follows the same resolution rules: annotations are scoped to the lexical scope in which they are defined.

Until now, annotations used in generics were no different from annotations in any other context.  When the generic was instantiated with new parameters, it was essentially replayed into the same compile time context (including the same compile-time namespace).

Unfortunately, this approach doesn't work with caching.  When a generic is cached, the original compile context is gone.  We have to restore parts of that context (notably, the normal namespace) but the contents of the compile time namespace cannot be persisted or restored: it can contain arbitrary objects created by the annotation system, and, in fact, it routinely contains pointers to primitive functions.

The only way to restore the compile-time namespace of a generic is to replay the original source file it was defined in.  This would be contrary to the purpose of caching, and it would also require us to create a dummy environment so as not to actually regenerate existing code and representational data structures.

Rather than try to go down this path and further delay the 1.0 release, I've decided to impose some limitations on the way that annotations can be used with generics.  This was a painful decision: I don't like having non-uniform semantics in the language.  Annotations should work the same for generics as for any other code in a module, but something had to give, and after discussing it with the team I feel this is the best compromise.

As of this morning's check-in, generics now preserve only imported annotations.  So for example, the following code will no longer compile:

@import crack.ann define;
@define my_func() { void f() {} }
class A[T] {  @my_func }

It could be rewritten like this:

@import crack.ann define;
class A[T] {
  @define my_func() { void f() {} }
  @my_func
}

Note that the symbols defined with @import can be reused -- it's possible for us to replay imports safely.  But macros defined with @define must be defined (or redefined) within the scope of the generic itself.


Monday, May 13, 2013

LLVM 3.2 and Caching

We've been relatively quiet for a long time (since last October), so I just wanted to post an update on some of the big changes that were checked in last week.

First of all, the default branch now uses LLVM 3.2.  Arno Rehn did the port on a branch some time ago, but I had caching work happening on another branch and didn't have the bandwidth to do the merge at the time.  The introduction of LLVM 3.2 doesn't change very much, but it does keep us in sync with LLVM and the major distros.  It will also help us when we port to 3.3, which currently has a release candidate in testing.

The most exciting thing happening is that after a period of over 6 months of development, caching finally (mostly) works.  That is to say, we can run all but one of the non-bootstrapped tests with the -C option (which enables module caching) before and after persistence.

For the uninitiated, module caching has been an important planned feature for Crack, so much so that we've made it an essential feature for the 1.0 release.  With caching enabled, when we compile a module we store the products of the compile (meta-data and an LLVM bitcode file) to a persistent cache, so the next time you run something that needs that module it gets loaded from the cache, saving the executor from having to compile it and substantially reducing startup time.  This is similar to what Python does with its .pyc files, though it's harder to implement in a more static language like Crack.

Over the past month, I've finally managed to solve the hardest problem of caching, which is that of cyclic ephemeral modules.  When generics are instantiated, an "ephemeral module" is created for the generic.  An ephemeral module is a module that does not directly correspond to its own source file.  So if module "x" defines generic A, when we instantiate A[int], we create the ephemeral module x.A[int].  When caching, this allows the same instantiation of a generic to be referenced by all of the modules that use it.  However, this feature also opens the possibility of dependency cycles: if A[int] uses symbols in x, and x uses A[int], then x depends on x.A[int] and x.A[int] also depends on x.  This breaks a basic assumption about how modules are loaded and has implications at many levels of the system.  Solving these problems was the last major hurdle to having a working caching system, and it looks like we're finally there.

There are a few things that are still broken in caching.  I'm currently working on fixing the "import from a non-extension shared library" feature.  I also might end up essentially rewriting the LLVM linker to correctly deal with referencing isomorphic types.  But most of the big plumbing changes are in, and I've merged the changes to the default branch.

So please try out the new caching functionality; it does noticeably speed up start-up time.  Feel free to post bugs for whatever problems you find.  Happy cracking!

Thursday, October 11, 2012

Crack 0.7.1 released

I'm happy to announce the release of Crack 0.7.1.  This is a minor bugfix release containing the following:
  • Updated the manual for version 0.7.
  • Fixed crack.process to actually use the pipe flags we're passing in.
  • Fixed file handle cleanup on directory iterators.
  • Fixed a few internal naming and ownership bugs.
In the interest of keeping the 0.7 branch as a stable development platform, I'm going to try to continue to merge bug fixes into the branch and release from it a little more frequently than we have been for the major releases.  We're going to continue to shoot for a 1.0 release around the end of the year.