» Saturday, 29 September A.D. 2007

wide finder in lisp

Tim Bray has done a series of posts recently about The Wide Finder Project--take a short little Ruby program and see how well it does in a “language of the future” like Erlang with easy parallel processing and whatnot. I was curious what the function would look like in Common Lisp and how fast it would run compared to the Ruby:

(defun run (&rest logs)
  (let ((counts (make-hash-table :test #'equal)))
    (dolist (filename logs)
      (with-open-file (stream filename :direction :input
                              :external-format :latin-1)
        (loop for line = (read-line stream nil stream)
           until (eq line stream)
           do (cl-ppcre:register-groups-bind (match)
                  ("GET /ongoing/When/\\d{3}x/(\\d{4}/\\d{2}/\\d{2}/[^ .]+) " line)
                (incf (gethash match counts 0))))))
    (loop for key being the hash-keys of counts
       collect key into keys
       finally (map nil #'(lambda (x)
                            (format t "~D: ~A~%" (gethash x counts) x))
                    (subseq (sort keys #'>
                                  :key #'(lambda (x) (gethash x counts))) 0 10)))))

The :external-format :latin-1 is an efficiency hack; SBCL normally reads files in whatever encoding your current locale dictates. Mine is a UTF-8 locale, so it defaults to reading in UTF-8. Leaving out this bit makes the program run about 50% slower. Adding this bit in doesn't bother me when I'm comparing against a language implementation that doesn't care about encodings--might as well level the playing field as much as possible.

On my 1.66 GHz Core Duo, the Ruby and Common Lisp (SBCL) versions take about the same amount of time. Profiling the Common Lisp indicates that ~50% or more of the time is taken up with I/O-related functions. I think there are some efficiency gains to be had in read-line and SBCL's stream implementation generally, though.

posted by Nate @ 3:56PM