» on boggle solver performance

Wednesday, 6 May A.D. 2009 @ 9:05 PM

The other day somebody wrote a blog post concerning the performance of Boggle solvers written in C and in Common Lisp. Humorously titled “I want to believe”, the article (predictably) made its way to reddit, where it was met with all manner of comments. The dominant theme was that the C implementation wasn't much to write home about, the Common Lisp code was possible worse, and it was likely that the performance of the Common Lisp version could be improved. Several people even outlined plausible ways in which this could be done. The author of the original article wrote a “yeah, the burden of proof is on you and you have nothingresponse to the comments generated.

This post is the response the author has been looking for. I will say up front that my modifications were nearly exactly what the commenters on Reddit suggested, so I'm not bringing much new to the table, just showing that they were right all along.

So, the numbers. My machine is a 1.6 GHz Core Duo with 2GB of RAM, running 32-bit Ubuntu. My Common Lisp implementation is SBCL, version 1.0.28.7, compiled with Unicode support and without thread support; the C compiler is “gcc (Ubuntu 4.3.2-1ubuntu12) 4.3.2”. Timings for the C program, on my machine:

@bishop:~/boggle-master/src/c$ time ./boggle --test --dict ../../dict/english_270k.txt < \
../../output/sample-1000x1000.txt > w ; tail -n 40 w
*** stack smashing detected ***: ./boggle terminated
======= Backtrace: =========
/lib/tls/i686/cmov/libc.so.6(__fortify_fail+0x48)[0xb7e7d6d8]
/lib/tls/i686/cmov/libc.so.6(__fortify_fail+0x0)[0xb7e7d690]
./boggle[0x804b117]
======= Memory map: ========
08048000-0804c000 r-xp 00000000 08:04 1922146    /home/froydnj/boggle-master/src/c/boggle
0804c000-0804d000 r--p 00003000 08:04 1922146    /home/froydnj/boggle-master/src/c/boggle
0804d000-0804e000 rw-p 00004000 08:04 1922146    /home/froydnj/boggle-master/src/c/boggle
08cfb000-1b137000 rw-p 08cfb000 00:00 0          [heap]
b6562000-b7503000 rw-p b7d55000 00:00 0
b7c8d000-b7d83000 rw-p b7c8d000 00:00 0
b7d83000-b7edb000 r-xp 00000000 08:04 903780     /lib/tls/i686/cmov/libc-2.8.90.so
b7edb000-b7edd000 r--p 00158000 08:04 903780     /lib/tls/i686/cmov/libc-2.8.90.so
b7edd000-b7ede000 rw-p 0015a000 08:04 903780     /lib/tls/i686/cmov/libc-2.8.90.so
b7ede000-b7ee1000 rw-p b7ede000 00:00 0
b7ee7000-b7ef4000 r-xp 00000000 08:04 128135     /lib/libgcc_s.so.1
b7ef4000-b7ef5000 r--p 0000c000 08:04 128135     /lib/libgcc_s.so.1
b7ef5000-b7ef6000 rw-p 0000d000 08:04 128135     /lib/libgcc_s.so.1
b7ef6000-b7efa000 rw-p b7ef6000 00:00 0
b7efa000-b7f14000 r-xp 00000000 08:04 189007     /lib/ld-2.8.90.so
b7f14000-b7f15000 r-xp b7f14000 00:00 0          [vdso]
b7f15000-b7f16000 r--p 0001a000 08:04 189007     /lib/ld-2.8.90.so
b7f16000-b7f17000 rw-p 0001b000 08:04 189007     /lib/ld-2.8.90.so
bfa02000-bfa17000 rw-p bffeb000 00:00 0          [stack]
Aborted

real    0m57.012s
user    0m54.023s
sys     0m2.916s

Yes, that's out of the box, no modifications to the C program. Lovely, eh? The performance of the original Lisp program:

@bishop:~/boggle-master$ ./solve-boards.sh < output/sample-1000x1000.txt > w; tail -n 40 w
[...lots of output scrubbed...]
Evaluation took:
  248.506 seconds of real time
  248.331519 seconds of total run time (240.755046 user, 7.576473 system)
  [ Run times consist of 10.492 seconds GC time, and 237.840 seconds non-GC time. ]
  99.93% CPU
  413,140,979,480 processor cycles
  534,709,752 bytes consed

The performance of my optimized version:

@bishop:~/boggle-master$ ./solve-boards.sh < output/sample-1000x1000.txt > w; tail -n 40 w
[...lots of output scrubbed...]
Evaluation took:
  68.062 seconds of real time
  68.000250 seconds of total run time (64.704044 user, 3.296206 system)
  [ Run times consist of 1.908 seconds GC time, and 66.093 seconds non-GC time. ]
  99.91% CPU
  113,153,248,150 processor cycles
  50,750,728 bytes consed

Not quite four times faster and around 10x less consing. Not bad. Common Lisp is within roughly 20% of the peformance of C for this particular program. I'm fairly certain that all my changes to the Common Lisp code are “non-algorithmic”, so modifying the C implementation is a moot point. I'm also a little leery of modifying a program that falls over in the first place; I have better things to do with my time than debug C programs. At least when the O'Caml or Haskell people come around with performance benchmarks, their test programs work perfectly the first time.

The patches against git version 02a0a24257260c825f13da764fa54bdbc01dd14a (the project does not have a publicly clone-able URL) are located at http://method-combination.net/lisp/files/boggle-solver-patches/, if you want to see how things work. I admit to having a large number of non-performance enhancing cleanup patches in there; rather difficult to code if you're constantly fighting some non-idiomatic coding style.

Performance notes to take away from this lesson:

Finally, this post by Juho Snellman sums up benchmarking efforts for Common Lisp (or most any other high-level language, I'd imagine) quite well.