» on boggle solver performance
The other day somebody wrote a blog post concerning the performance of Boggle solvers written in C and in Common Lisp. Humorously titled “I want to believe”, the article (predictably) made its way to reddit, where it was met with all manner of comments. The dominant theme was that the C implementation wasn't much to write home about, the Common Lisp code was possible worse, and it was likely that the performance of the Common Lisp version could be improved. Several people even outlined plausible ways in which this could be done. The author of the original article wrote a “yeah, the burden of proof is on you and you have nothing” response to the comments generated.
This post is the response the author has been looking for. I will say up front that my modifications were nearly exactly what the commenters on Reddit suggested, so I'm not bringing much new to the table, just showing that they were right all along.
So, the numbers. My machine is a 1.6 GHz Core Duo with 2GB of RAM, running 32-bit Ubuntu. My Common Lisp implementation is SBCL, version 220.127.116.11, compiled with Unicode support and without thread support; the C compiler is “gcc (Ubuntu 4.3.2-1ubuntu12) 4.3.2”. Timings for the C program, on my machine:
@bishop:~/boggle-master/src/c$ time ./boggle --test --dict ../../dict/english_270k.txt < \ ../../output/sample-1000x1000.txt > w ; tail -n 40 w *** stack smashing detected ***: ./boggle terminated ======= Backtrace: ========= /lib/tls/i686/cmov/libc.so.6(__fortify_fail+0x48)[0xb7e7d6d8] /lib/tls/i686/cmov/libc.so.6(__fortify_fail+0x0)[0xb7e7d690] ./boggle[0x804b117] ======= Memory map: ======== 08048000-0804c000 r-xp 00000000 08:04 1922146 /home/froydnj/boggle-master/src/c/boggle 0804c000-0804d000 r--p 00003000 08:04 1922146 /home/froydnj/boggle-master/src/c/boggle 0804d000-0804e000 rw-p 00004000 08:04 1922146 /home/froydnj/boggle-master/src/c/boggle 08cfb000-1b137000 rw-p 08cfb000 00:00 0 [heap] b6562000-b7503000 rw-p b7d55000 00:00 0 b7c8d000-b7d83000 rw-p b7c8d000 00:00 0 b7d83000-b7edb000 r-xp 00000000 08:04 903780 /lib/tls/i686/cmov/libc-2.8.90.so b7edb000-b7edd000 r--p 00158000 08:04 903780 /lib/tls/i686/cmov/libc-2.8.90.so b7edd000-b7ede000 rw-p 0015a000 08:04 903780 /lib/tls/i686/cmov/libc-2.8.90.so b7ede000-b7ee1000 rw-p b7ede000 00:00 0 b7ee7000-b7ef4000 r-xp 00000000 08:04 128135 /lib/libgcc_s.so.1 b7ef4000-b7ef5000 r--p 0000c000 08:04 128135 /lib/libgcc_s.so.1 b7ef5000-b7ef6000 rw-p 0000d000 08:04 128135 /lib/libgcc_s.so.1 b7ef6000-b7efa000 rw-p b7ef6000 00:00 0 b7efa000-b7f14000 r-xp 00000000 08:04 189007 /lib/ld-2.8.90.so b7f14000-b7f15000 r-xp b7f14000 00:00 0 [vdso] b7f15000-b7f16000 r--p 0001a000 08:04 189007 /lib/ld-2.8.90.so b7f16000-b7f17000 rw-p 0001b000 08:04 189007 /lib/ld-2.8.90.so bfa02000-bfa17000 rw-p bffeb000 00:00 0 [stack] Aborted real 0m57.012s user 0m54.023s sys 0m2.916s
Yes, that's out of the box, no modifications to the C program. Lovely, eh? The performance of the original Lisp program:
@bishop:~/boggle-master$ ./solve-boards.sh < output/sample-1000x1000.txt > w; tail -n 40 w [...lots of output scrubbed...] Evaluation took: 248.506 seconds of real time 248.331519 seconds of total run time (240.755046 user, 7.576473 system) [ Run times consist of 10.492 seconds GC time, and 237.840 seconds non-GC time. ] 99.93% CPU 413,140,979,480 processor cycles 534,709,752 bytes consed
The performance of my optimized version:
@bishop:~/boggle-master$ ./solve-boards.sh < output/sample-1000x1000.txt > w; tail -n 40 w [...lots of output scrubbed...] Evaluation took: 68.062 seconds of real time 68.000250 seconds of total run time (64.704044 user, 3.296206 system) [ Run times consist of 1.908 seconds GC time, and 66.093 seconds non-GC time. ] 99.91% CPU 113,153,248,150 processor cycles 50,750,728 bytes consed
Not quite four times faster and around 10x less consing. Not bad. Common Lisp is within roughly 20% of the peformance of C for this particular program. I'm fairly certain that all my changes to the Common Lisp code are “non-algorithmic”, so modifying the C implementation is a moot point. I'm also a little leery of modifying a program that falls over in the first place; I have better things to do with my time than debug C programs. At least when the O'Caml or Haskell people come around with performance benchmarks, their test programs work perfectly the first time.
The patches against git version 02a0a24257260c825f13da764fa54bdbc01dd14a (the project does not have a publicly clone-able URL) are located at http://method-combination.net/lisp/files/boggle-solver-patches/, if you want to see how things work. I admit to having a large number of non-performance enhancing cleanup patches in there; rather difficult to code if you're constantly fighting some non-idiomatic coding style.
Performance notes to take away from this lesson:
If you want to write fast Common Lisp code, you absolutely need to understand how array types work in Common Lisp. If you aren't declaring things as simple-array or simple-vector, you're probably doing something wrong. And that goes doubly so if you are doing anything involving strings on an implementation that supports Unicode (where base-char and character are probably different). You also need to have some idea of what upgraded-array-element-type does and why saying something is of type (VECTOR (VECTOR STRING *) *) is probably not going to perform the type-checking/inference that you think it might.
Likewise for declaring numeric types. If you have variables that you aren't declaring as fixnum (or subtypes thereof), single-float, double-float, or complex variants of the float types, then you're probably doing something wrong. That is, declaring things as integer, or worse, number is Not Helpful for performance.
Listen to the compiler. The compiler tells you a lot of helpful stuff. The original code of the benchmark had some #+sbcl bits scattered about protecting slightly stricter type declarations than their #+ccl counterparts. A comment like “ccl doesn't barf on null pointers” usually lurked close by. First of all, “null pointer” is a category error in Common Lisp. Second of all, SBCL complaining about things (whether in the compiler or a runtime with (SAFETY 0))...was correct. (I've had the same sort of experience with Lispworks.) If the compiler complains at you, fix the problem. Only turn things off or tune them out if you're absolutely sure you understand why the compiler is whinging at you and if there's nothing you can do about it.
format ~A is useful, but expensive. One of the biggest wins was converting the printing of the words to use write-string. Do use FORMAT for your everyday printing needs, but if you need some performance, look at Common Lisp's other IO functions, too.
Sprinkle optimization declarations judiciously. The original code contained a global (declaim (optimize (speed 3) (space 0) (safety 0) (debug 0) (compilation-speed 0))). But most of the runtime (95% in the final optimized version) was spent in two functions. Putting the declarations in just those functions would have achieved the same effect. As an added bonus, the compiler will probably just tell you about ways those particular functions can be improved, rather than pointing out the hundred different ways some utility function you barely use can be improved.
Don't compile with (SAFETY 0). Just don't. You can get surprisingly far, performance-wise, with type-checking and array bounds checking turned on. If you feel like you must live dangerously, make absolutely sure that everything works and that you are not lying to the compiler about types (see above note about listening to the compiler) before flipping the switch.
Make your code runnable from the REPL. Life is too short to spend constantly waiting for your shell script to compile and/or load all your files before it gets to the business of running your program. You should be able to invoke your benchmark with a single form typed in at the REPL.
Don't needlessly allocate things inside loops. This should go without saying.
Finally, this post by Juho Snellman sums up benchmarking efforts for Common Lisp (or most any other high-level language, I'd imagine) quite well.