That gooey stuff in the middle
It’s common in the computer field to get two levels of answer to a question about how something works. One level is the extreme close-up, where the explainer starts talking about ones and zeroes, logic gates, instruction pointers and memory registers. The other level is the ultimate abstraction: either the pure pseudo-code of algorithms and theory or the follow-these-steps-and-don’t-ask-questions how-to guide. I find both of these answers unsatisfying, because I still want to know what’s in the middle. I can (now) boil pseudo-code down into C if I need to, but even C is still abstracted to a pretty high degree. As an undergraduate I took a hardware-architecture course which explained the low-level stuff, and I believe I wrote assembly code to print out a perpetual calendar, but that’s not much of an answer, either; it’s like waving your hands around in a Home Depot and saying, “Yep, everything you need to build a house, it’s all in here.” You still don’t know how to build a house; you just know where all the tools are.
The Parallel Computing course I took this past semester was a bit like that. The textbook spent some time early on explaining the architecture issues, essentially pointing out that splitting up a program among multiple processes usually also requires the processes to communicate with each other, and that there are a lot of different approaches to this problem. Then came a quick hand-wavy transition: the MPI library, we were told, would allow us to write programs which handle all this interprocess communication, and then poof, no more discussion, just MPI functions.
I suppose this is fine, if you’re a programmer, but two of my current projects for MPOW involve installing various permutations of the MPI libraries. (It turns out that you can pick your MPI: we’re working mostly with LAM-MPI, but that’s becoming OpenMPI, which is also what Xgrid plays most nicely with.)
(An aside for non-programmers: “Libraries” of code are files of generic functions which programmers can call in order to avoid reprogramming a certain operation. If you “include” a library in your program, you gain access to all those functions. For example, I could write my own function to calculate the square root of a number, but it’s about a thousand times easier to include the C math library and use the sqrt() function it provides.
The MPI libraries, then, are “simply” a large quantity of pre-written code which handles all the interprocess communication issues of parallel computing. There are multiple versions of the MPI libraries because MPI itself (which stands for “Message Passing Interface,” by the way) is only a standard, and there are many different ways to write code which meets the standard.)
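For a taste of the kind of plumbing a message-passing library hides, here is a minimal sketch in C. To be clear about the assumptions: this is not MPI, and I am not claiming any MPI implementation works this way. It is just two processes on one POSIX machine passing a single message through a pipe, and the names sum_range and two_process_sum are made up for illustration. The parent sums the first half of an array while a forked child sums the second half and sends its partial result back.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Sum n integers starting at data. */
static long sum_range(const int *data, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += data[i];
    return total;
}

/*
 * Toy two-process sum (hypothetical helper, not part of any library).
 * The parent sums the first half of the array; a forked child sums the
 * second half and sends its partial sum back through a pipe.
 * Returns 0 on success and stores the total in *result, or -1 on failure.
 */
int two_process_sum(const int *data, size_t n, long *result)
{
    assert(data != NULL && result != NULL);

    int fds[2]; /* fds[0] is the read end, fds[1] is the write end */
    if (pipe(fds) == -1)
        return -1;

    size_t half = n / 2;
    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: compute its share, send one message, and exit. */
        close(fds[0]);
        long partial = sum_range(data + half, n - half);
        ssize_t written = write(fds[1], &partial, sizeof partial);
        close(fds[1]);
        _exit(written == (ssize_t)sizeof partial ? 0 : 1);
    }

    /* Parent: compute its own share, then receive the child's message. */
    close(fds[1]);
    long mine = sum_range(data, half);
    long theirs = 0;
    ssize_t got = read(fds[0], &theirs, sizeof theirs);
    close(fds[0]);

    int status = 0;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    if (got != (ssize_t)sizeof theirs ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    *result = mine + theirs;
    return 0;
}
```

Called on an array holding 1 through 7, two_process_sum should return 0 and leave 28 in *result. Even this toy has to worry about which end of the pipe to close, what happens if fork fails, and whether the whole message arrived, and that is with two processes on a single machine. Multiply that by many machines and different interconnects and you get a feel for how much bookkeeping a real message-passing library has to take off the programmer's hands.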
More so than most other libraries, MPI has to wrestle with a lot of system-specific issues. How on earth, for example, does the same MPI library deal with both our research cluster and an Xgrid cluster? Judging from the mailing-list archives that turn up when I search for Xgrid information, the development team is actively grappling with those questions.
I feel like there’s a lot of cool stuff going on in that gap between the close-up view and the big abstraction, and it makes me curious.