Sunday, March 13, 2011

Platform Dependent Versus Universal Numeric Types

Implementing numeric types in a programming language is surprisingly difficult.  At this time, Crack supports Universal Numeric Types (UNTs - for example int32, uint64, byte) and Platform Dependent Numeric Types (PDNTs - int, uint, float).  The PDNT names are just aliases for the UNTs that match the corresponding types in the C++ compiler used to build Crack.  So for example, if that compiler has a 32 bit int, Crack's "int" will be an alias for "int32".  Additionally, the philosophy behind implicit type conversions is to allow them only if they do not result in loss of precision.

This overall approach is not without its problems:

  • It makes Crack code platform-dependent because there are expressions that will work on one platform but will result in a compile-time error on another platform.  For example, "int32 i = int(v);" works on platforms with 32 bit integers, but breaks on platforms with 64 bit integers.
  • You end up writing a lot of explicit type conversions in places where you really don't care that much (like when using a signed integer value for a function argument of type "uint").  For a scripting language valuing terse syntax, this is kind of lame.
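
To make the first problem concrete, here is a minimal sketch of the "no loss of precision" rule, written in C++ rather than Crack since the PDNTs mirror the C++ compiler's types.  The helper name is_lossless is made up for illustration and is not part of Crack, and the sketch only covers integer types.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical helper, not part of Crack: true when every value of the
// integer type From can be represented in the integer type To.
template <typename From, typename To>
constexpr bool is_lossless() {
    // A signed source never fits an unsigned target (negative values are lost).
    // Otherwise compare value bits; "digits" excludes the sign bit.
    return !(std::is_signed<From>::value && !std::is_signed<To>::value) &&
           std::numeric_limits<From>::digits <= std::numeric_limits<To>::digits;
}

// The result for the platform's int depends on how wide that int is,
// which is exactly the compile-time portability problem described above.
constexpr bool int_to_int32_ok = is_lossless<int, std::int32_t>();

// Signed to unsigned is rejected regardless of width.
constexpr bool int_to_uint_ok = is_lossless<int, unsigned int>();
```

With a 32 bit int, int_to_int32_ok is true; with a 64 bit int it is false, so the same source line is accepted on one platform and rejected on another.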

So after some discussion on IRC, we've decided to change our approach a little bit.  The general philosophy now is that you should use a PDNT in situations where you care about performance or interoperability with C/C++ code and you should use a UNT in situations where you care about precision.  The manifestations of this decision are:

  • PDNTs will be promiscuous: any numeric type will implicitly convert to any PDNT.  So "int i = float64(v);" will be perfectly legal on any platform.
  • There will be max-size, min-size assumptions about PDNTs.  In particular, they will all be at least 32 bits but no more than 64 bits in size.  Like everything else in the language, these assumptions are subject to change across major versions of the language.
  • UNTs will continue to apply the strict conversion rules.  However, because of the min-size/max-size assumptions, certain conversions from PDNTs will always be legal.  An example of this is "int64 i = int(v)".

There's still a platform dependency problem here because expressions like "int i = int64(v);" will vary in behavior at runtime depending on the size of an integer on the platform.  So we've essentially converted a compile-time portability issue into a runtime portability issue :-/.
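
The runtime dependency can be sketched in C++, again as a stand-in for Crack.  The function survives_as_int is a hypothetical name used only for this illustration; it mimics "int i = int64(v);" and reports whether the value came through intact.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: mimic "int i = int64(v);" and report whether the
// value survived the trip through the platform's int.
inline bool survives_as_int(std::int64_t v) {
    // For out-of-range values the result of this cast is implementation-defined
    // before C++20 (in practice the high bits are dropped).
    int i = static_cast<int>(v);
    return static_cast<std::int64_t>(i) == v;
}

// Usage sketch: a small value fits in any int of at least 32 bits.
inline void demo_small_value() {
    assert(survives_as_int(42));
}
```

A value such as 5000000000 needs more than 32 bits, so whether it survives depends entirely on the width of int on the machine running the code.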

To mitigate this effect, there will be a warning flag that allows you to identify the places where you could potentially lose precision with something like this.  We are also considering allowing the generation of a runtime check that would throw an exception if specific values will be truncated.
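
As a rough idea of what such a runtime check might look like, here is a C++ sketch.  The function name checked_to_int and the choice of std::range_error are assumptions made for this example; the proposal does not specify how Crack would implement or report the check.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical helper: convert an int64 to the platform's int, throwing
// instead of silently truncating when the value does not fit.
inline int checked_to_int(std::int64_t v) {
    if (v < std::numeric_limits<int>::min() ||
        v > std::numeric_limits<int>::max()) {
        throw std::range_error("value does not fit in the platform's int");
    }
    return static_cast<int>(v);
}

// Usage sketch: in-range values pass through unchanged.
inline void demo_checked() {
    assert(checked_to_int(7) == 7);
}
```

On a platform where int is already 64 bits wide the check can never fail, so the cost only matters where truncation is actually possible.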

This change will probably go into the language in Crack 0.5.  For anyone interested, the formal proposal is at http://code.google.com/p/crack-language/wiki/PlatformDependentNumericTypes

Tuesday, March 8, 2011

AOT for Crack 0.4

One of the trademarks of a scripting language is, of course, instant results. You edit your script and execute it immediately, skipping the traditional compile step with its associated build files, etc.

True to scripting language ideals, Crack executes scripts immediately, using LLVM's excellent Just In Time (JIT) compiler to handle the heavy lifting of converting the compiled Crack code to native instructions so that it can run both immediately and as fast as possible.


weyrick@mozek:~/crack$ cat hello.crk 
import crack.io cout;
cout `hello JIT\n`;
weyrick@mozek:~/crack$ crack hello.crk 
hello JIT


But what if you find yourself wishing for a native binary, just this once? You know, something like this instead:


weyrick@mozek:~/crack$ crackc hello.crk 
weyrick@mozek:~/crack$ ./hello 
hello JIT


What's a scripter to do? Well, with Crack at least, you're in luck. Crack 0.4 will include an Ahead Of Time (AOT) mode which will create native binaries just like the example above. All imported Crack modules will be included in the binary (something like a static link of the Crack modules, although the binary itself is not statically linked). The annotation system (and macros) work for AOT binaries. We also have plans for full DWARF debugging information, which will allow source level debugging with tools like gdb.

Crack 0.4 is tentatively scheduled for release about the same time as LLVM 2.9 (beginning of April).