» Friday, 28 July A.D. 2006
high level languages
David Chisnall's recent article Debunking the Myth of High-level Languages, purports to show how programming in a low-level language can make the compiler's job harder. I suppose he also implies that programming in a high-level language can make the compiler's job easier--an assertion which seems dubious at best. Let's take a look at some of his statements to see if he defends his point well.
His first point compares C and Fortran in their suitability for dealing with vectors, specifically the ease of mapping operations on vectors to the SIMD units found on many modern microprocessors. (Aside: where is it written that all computer languages developed prior to 1970 or so are required to have their names in ALL CAPS? It's Lisp, not LISP; it's Fortran, not FORTRAN.) As he points out, C has no vector datatype per se, so you have to do vector operations with arrays, which C doesn't really have either, so you have to fake it with loops and pointers. To add two vectors, then, requires a loop:
void
add_double_vectors(double *a, double *b, double *c, int n)
{
int i;
for (i = 0; i < n; ++i) {
c[i] = a[i] + b[i];
}
}Now, to vectorize this loop, Chisnall claims the compiler has to detect that there are no dependences between loop iterations and that the operations done in the loop can be mapped to vector operations. The first part is “non-trivial,” which is correct; volumes have been written about this subject. But then he goes on to talk about Fortran, which has real arrays and therefore real vectors, wherein he claims that all the compiler has to do for Fortran is “split the vectors into chunks of the correct length.”
Let's look at some Fortran code.
subroutine foo(a, n) double a dimension a(n) a(2:n) = a(1:n-1) end subroutine foo
This routine is a simple copy loop; it would be done with memmove in C. Please note that the semantics of memmove and the Fortran written above are equivalent: conceptually, the reads are all done before the writes. Since we're using Fortran here, according to Chisnall, all we have to do is split a into chunks of the correct length. We'll have to wrap a loop around the chunked operation(s), too.
subroutine foo(a, n)
double a
dimension a(n)
! assume we don't need loop epilogues to keep things simple
do i = 1, n-1, 2
! use [] to denote actual simd operations
a[i+1:i+2] = a[i:i+1]
end do
end subroutine fooSimple enough, eh? But this version is not semantically equal to the original subroutine. On the first iteration, we read a(1) and a(2), writing them into a(2) and a(3), respectively. On the second iteration, we read a(3) and a(4)...oops! We already overwrote a(3) on the previous iteration, so the original value is no longer available. We have incorrectly vectorized the original code. (I leave it as an exercise for the reader to show how the code can be vectorized.)
This example exists to show one simple fact: you need dependence analysis in Fortran just as much as you need it in C if you're going to do a proper job of vectorization. In this respect, Fortran is no easier to vectorize than C. However, it is true that dependence analysis in Fortran is generally considered to be easier than in C, because in Fortran, it is illegal for the formal parameters of a function to be aliased to one another. That is, in the above C function add_double_vectors, a, b, and c could address overlapping chunks of memory. If you translated the example to Fortran, the language spec says the compiler can assume that the parameters address different, non-overlapping areas of memory.
Determining that the pointers don't address overlapping chunks of memory is known as alias analysis and is extremely difficult for languages like C. And the accuracy of dependence analysis in C relies on the quality of the alias analysis that you do. If you don't have good information about the dependences between references, then key optimizations that reorder loads and stores, such as software pipelining or loop unrolling (with associated instruction scheduling) must be presumed illegal. The Fortran semantics make alias analysis (and thereby dependence analysis) much easier. (New features in the C standard such as the restrict keyword are there to give things Fortran-like semantics and therefore make the compiler's job easier.)
Remember, though, that a C compiler could do an excellent job of dependence and alias analysis and be as fast as Fortran, vectorization and all. The Fortran specification simply makes the compiler writer's job easier with respect to alias analysis, not with respect to vectorization. Having true vectors and arrays in your programming language, then, doesn't make advanced techniques like vectorization any simpler. (Recent Fortran specs have started to include pointers, but if you examine their semantics, they are carefully restricted to avoid the problems that C pointers introduce.)
The next heading in the article deals with “Virtual Machines and Other Overhead.” The author properly separates the specification of Java (bytecodes) from the actual implementation of the language (native code just-in-time compiler). Not keeping this distinction in mind is one of my pet peeves when it comes to discussing languages. Granted, it can be difficult to separate the two, since it's easy to say, “Well, this language isn't slow; the compilers for it just aren't smart enough.” Which is easy to say, but not so convincing when you're trying to explain how your simple 10-line program runs five times slower in your favorite high-level programming language than C. And the person whom you're trying to convince is unlikely to wait around for you to make the compiler smart enough. Nevertheless, it's an important distinction to be aware of; if you're going to compare the performance of two languages, what you're really doing is comparing two implementations.
Seeing that the author is mindful of this difference makes it all the more frustrating that he doesn't keep it in mind during the rest of the essay. Take for instance the statement, “Due to the way C works, it's impossible for the compiler to inline a function defined in another source file.” In a narrow sense, this sentence is true: the compiler can't do anything about this. However, when you consider “compiler” as shorthand for “the entire toolchain that takes your source code and converts it into a binary,” a different picture emerges. (Chisnall implicitly assumes a similar broad view of the world when touting the advantages of high level languages.) A sufficiently smart linker can inline the function at link time. Doing this link-time optimization is not conceptually more difficult than doing the equivalent optimization in a just-in-time compiler and it's just as effective. Is it hard to write a linker that's sufficiently smart? You betcha. Does the C standard restrict you from doing so? No.
The author also botches the specification/implementation distinction when talking about array bounds checking in Java. The author claims that arrays are a very low-level concept and that Java attempts to endow them with “high-level semantics” by specifying that array bounds checking be done on array accesses. I will leave the debate about whether arrays are high-level or low-level constructs aside.
Note, though, that arrays in Java and similar languages are really nothing more than a specialization of maps or associative arrays that only permit small integers as keys. Seems pretty high-level to me. Also note that when you want to use an array, using a tree or a hash table, as Chisnall suggests, would almost certainly be slower no matter what fancy runtime tricks you did. Finally, I find it amusing that the author lauds (high level!) support for array and vectors in Fortran and bashes the concept of (low level!) arrays in Java later on.
What I will focus on is the problem of array bounds checking. Blockquoth the author:
Unlike arrays in C, arrays in Java are bound-checked. Every time you access an element in the array, you have to check that the index is in-bounds, and then perform the access. A slightly higher-level abstraction could allow much more of this bounds-checking to be done at compile time. Some research at IBM uses set theoretical concepts to do this, and provides a significant performance boost. The problem, isn't that Java is a high-level language; it's that it chose the wrong abstractions.
This paragraph is a perfect example of the specification/implementation distinction. The Java language specification does say that array accesses are checked at run time. However, if the compiler can prove at compile time that certain accesses will never be out of bounds, then you don't have to check the bounds at runtime. Ever. (I am assuming a native code compiler of some flavor here.)
This fact doesn't seem to be common knowledge, so I'll say it again for emphasis: just because the language specifies array bounds checking doesn't mean that the compiler always needs to insert the bounds checking code for each array access. The techniques for doing this have been around since at least the early 80's and the three scalar compilers textbooks with which I am familiar (Appel's Modern Compiler Implementation, Muchnick's Advanced Compiler Implementation and Cooper and Torczon's Engineering a Compiler) all reference work in this area and/or spend several pages talking about how it's done. Even if you have to check the bounds at runtime (or you subscribe to a strict reading of the Java specification), you can hoist the checking code out of loops, which drastically reduces the cost of doing the checking.
The funny part is that I know IBM's research Java compilers do the necessary legwork to eliminate array bounds checks when possible and I would not be surprised if their production Java implementations do so as well. I don't know whether the “research at IBM” refers to techniques for a different language; it may very well refer to techniques for eliminating array bounds checks in Java, although the author certainly doesn't make it sound like that's the case. Is it hard to eliminate these bounds checks in Java and similar languages? Sure, it requires a decent amount of thought and some non-trivial symbolic analysis of loop bounds and array accesses. Is it forbidden? No. Do you need a “slightly higher-level abstraction” to do so? No.
Finally, we come to the author's closing statements.
As compilers improved, one fact became clear; the more information you can give to your optimizer, the better the job it can do. When you program in a low-level language, you throw away a lot of the semantics before you get to the compilation stage, making it much harder for the compiler to do its job.
A strong start to the closing paragraph and one I overwhelmingly agree with: the more the compiler knows about your program, the better. This idea drives a lot of compiler analyses, such as dependence analysis, alias analysis, and the analysis to eliminate redundant bounds checks. However, the idea that you throw away information about your code in a low-level language seems laughable. In C, if x and y are both declared int, it's pretty obvious what x + y does. Compare the Lisp expression (+ x y) (sans declarations, of course). How do you compile this? Well, as opposed to the C code, this expression will probably be compiled to a function call (!), since the compiler doesn't know the types of the arguments. (Assuming that the compiler has not done some clever type inference or something.) You have thrown away a lot of information in the high level language, not the other way around.
For the most part, a really good high level language compiler (we'll also consider the language runtime in here as well, since Chisnall talks about the runtime doing clever optimizations as our programs run) will be at least as complicated as a low level language compiler, since a lot of the code generation and optimization issues are the same, but the high level language has more things that need to be optimized.
Don't get me wrong: I am a huge fan of high level programming languages. One of the reasons I work on SBCL is because I believe that implementations of high level programming languages can produce code that competes with code from C and its ilk. The increased expressiveness and elimination of low-level bookkeeping, not to mention safety features like array bounds checks, are huge wins in high level programming languages. But to claim that such a language makes the compiler's job easier makes me think that the author of such an assertion hasn't actually looked at implementing a high level language before.
posted by Nate @ 11:05AM