A million lines of Lisp

People rave about Lisp, and one reason is that you can use macros to make new kinds of abstractions. Sure, in other languages you can write functions to define new procedures, but in Lisp you can write macros to define new control flow constructs (for starters). You can write functions that write functions and programs that write programs, a power which is pretty much unimaginable in most other languages. In other languages you can only stack your abstractions so high, but in Lisp, the sky's the limit. Because of macros, and its other features, Lisp is sometimes called the programmable programming language.

I suspect that because of macros, Lisp programs grow in a way that is qualitatively different from programs in other languages. With every line of additional code in Lisp, you can do more, more quickly; in ordinary languages, the return on additional lines of code is constant if not decreasing. Various people have tried to estimate the advantage that Lisp has over C++ (in the ratio of the number of lines needed to do a particular task). Whatever the figure, I believe that Lisp's advantage should increase for larger and larger programs.

In pretty much any language, a program with a million lines of code is regarded with considerable respect. So if Lisp can do so much more with so much less, what would a million-line Lisp program look like? Could we even imagine it? (Would it be sentient, or what?)

It turns out that such a program exists, and is widely available.

As of right now (23 June 2008), a checkout of GNU Emacs has 1,112,341 lines of Lisp as well as 346,822 lines of C.

It is somewhat astonishing that Emacs, a program with more than 30,000 built-in functions in a single namespace, can keep from falling apart under its own weight! In Emacs, activating minor or major modes, or setting variables, can have effects on how particular buffers look, what is done when Emacs saves a file, and how different keys (or other events) are interpreted, among other things. Unlike operating systems, which are broken down into programs (which only talk to each other in limited ways), Emacs has many parts which all have the potential to interact with each other.

In some cases it is necessary to add special-case code to say what should happen for particular pairs of interacting features, but a robust system should be able to make the right thing happen most of the time even for features which are totally oblivious of each other. One way that Emacs tames this complexity is with the use of hooks.

Hooks (sometimes referred to elsewhere as listeners) are a way of telling Emacs to run a particular piece of code whenever a certain situation happens (e.g. when a file is opened). Many Emacs modules add hooks in order to do their thing. For example, VC checks every time I open a file whether it is in a version-controlled directory, so that it can report the file's version and status. Another example: Emacs activates follow-mode, if I have set a particular option, whenever I open a file. The variable find-file-hook is a list containing, among others, two functions responsible for performing the tasks described above; each function in that list is executed whenever Emacs opens a new file. If I were to add a function of my own to that list, then Emacs would happily run it, too.
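
To make the mechanism concrete, here is a minimal sketch of a hook registry in Python. It is not how Emacs implements hooks; the names add_hook, run_hooks, find_file, and the two example functions are hypothetical stand-ins for the VC and follow-mode behaviors described above.

```python
from collections import defaultdict

# Hypothetical registry: maps a hook name to an ordered list of functions.
hooks = defaultdict(list)

def add_hook(name, fn):
    """Register fn on the named hook, ignoring duplicate registrations."""
    if fn not in hooks[name]:
        hooks[name].append(fn)

def run_hooks(name, *args):
    """Call every function registered on the named hook, in order."""
    for fn in list(hooks[name]):
        fn(*args)

log = []  # records what each hook function did, for illustration only

def report_version_control(path):
    # Stand-in for the VC check described in the text.
    log.append("vc: checked " + path)

def maybe_follow(path):
    # Stand-in for the optional follow-mode activation.
    log.append("follow: checked " + path)

def find_file(path):
    """Pretend to open a file, then let every interested party react."""
    run_hooks("find-file-hook", path)

add_hook("find-file-hook", report_version_control)
add_hook("find-file-hook", maybe_follow)

find_file("notes.txt")
```

Neither hook function knows the other exists, and find_file knows about neither; adding a third behavior is one more add_hook call.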

As an alternative, you might consider implementing such an extensible system using polymorphism and inheritance: e.g. to add behaviors, you would create a subclass of a particular class which implemented some method differently. The chief advantage of using hooks over that approach is that with hooks, changing behavior is very obviously additive: if you can call for either A, B, or C to happen in a particular situation, you can also easily call for any combination of those things, while A, B, and C can each be implemented completely independently, a key requirement for systems of a million lines or more.
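
The contrast can be sketched in Python as well. Everything here is hypothetical and only illustrates the argument: with subclassing, each combination of behaviors needs a class that knows about its parts, whereas with a hook any combination is just a list.

```python
from itertools import combinations

# Inheritance approach: each behavior overrides the same method.
class Opener:
    def on_open(self, events):
        pass

class OpenerA(Opener):
    def on_open(self, events):
        events.append("A")

class OpenerC(Opener):
    def on_open(self, events):
        events.append("C")

# Wanting both A and C means writing a new class that names both parents.
class OpenerAC(OpenerA, OpenerC):
    def on_open(self, events):
        OpenerA.on_open(self, events)
        OpenerC.on_open(self, events)

# Hook approach: behaviors are independent functions.
def behavior_a(events):
    events.append("A")

def behavior_b(events):
    events.append("B")

def behavior_c(events):
    events.append("C")

def run_hook(hook):
    """Run every function in the hook list and return what happened."""
    events = []
    for fn in hook:
        fn(events)
    return events

# Every non-empty combination of A, B, and C is available without new code.
all_behaviors = [behavior_a, behavior_b, behavior_c]
all_hooks = [
    list(combo)
    for size in range(1, len(all_behaviors) + 1)
    for combo in combinations(all_behaviors, size)
]
```

With three behaviors there are seven non-empty combinations; the hook version gets all of them from three functions, while the subclass version would need a hand-written class for each one.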