» Wednesday, 29 September A.D. 2010

optimizing pixel conversion

In the previous post a previous post, I described a framework for doing pixel conversion where the separate steps were kept at arm's length: pixel reading, conversion proper, pixel writing, and the loop over the pixels in question. All of these steps were tucked away behind generic functions and closures, making it difficult for a static compiler to glom everything together and generate high-performance code. Runtime compilation might also have problems, but at least there are more ways to attack the separation.

This post is about a way to open up optimization possibilities for a static compiler and keep most of the abstraction.

The core idea behind the last proposal was that generic functions could specify how to read, write, and convert pixels. The core idea here is that we need to extend the protocol to talk about how to loop over the pixels: we need to provide a way to specialize operations at the pixel buffer level.

With that in mind, our entry point looks like:

(defun convert-pixels (dst-format src-format dst-buffer dst-start
                       src-buffer src-start n-pixels)
  (funcall (find-bulk-converter dst-format src-format)
           dst-buffer dst-start src-buffer src-start n-pixels))

find-bulk-converter can, by default, return the simple converter we already discussed.

(defun simple-converter (reader converter writer n-pixels)
  (loop repeat n-pixels
        do (multiple-value-call writer
             (multiple-value-call converter (funcall reader)))))

(defmethod find-bulk-converter (dst-format src-format)
  (let ((converter (find-pixel-converter dst-format src-format)))
    #'(lambda (dst-buffer dst-start src-buffer src-start n-pixels)
        (let ((reader (pixel-reader src-format src-buffer src-start))
              (writer (pixel-writer dst-format dst-buffer dst-start)))
          (simple-converter reader converter writer n-pixels)))))

(I've renamed find-converter from the first post into find-pixel-converter to more accurately capture what the function is supposed to do.)

Diversion: If you want to be obsessively micro-optimizing, you might rewrite the pixel-reader and pixel-writer functions slightly:

(defmethod pixel-reader ((format rgba))
  #'(lambda (buffer start)
    #'(lambda ()
        (multiple-value-prog1 (values (aref buffer start)
                                      (aref buffer (+ start 1))
                                      (aref buffer (+ start 2))
                                      (aref buffer (+ start 3)))
          (incf start 4)))))

(defmethod pixel-writer ((format rgba))
  #'(lambda (buffer start)
    #'(lambda (r g b a)
        (setf (aref buffer start) r
              (aref buffer (+ start 1)) g
              (aref buffer (+ start 2)) b
              (aref buffer (+ start 3)) a)
        (incf start 4)
        (values))))

Why would you do this? Well, because now you can move more work—specifically, generic function calls—out of the conversion function proper:

(defmethod find-bulk-converter (dst-format src-format)
  (let ((converter (find-pixel-converter dst-format src-format))
        (reader-gen (pixel-reader format))
        (writer-gen (pixel-writer format)))
    #'(lambda (dst-buffer dst-start src-buffer src-start n-pixels)
        (let ((reader (funcall reader-gen src-buffer src-start))
              (writer (funcall writer-gen dst-buffer dst-start)))
          (simple-converter reader converter writer n-pixels)))))

And because the double lambdas are pretty sweet. Of course, nobody reading this would do such things without profiling first, right? Good. Now, back to why we wanted to move to find-bulk-converter in the first place.

The advantage of this approach is that now you can write specialized conversion routines that deal with entire buffers at once. Need RGBA to CMYK conversion to go fast?

(defmethod find-bulk-converter ((dst-format cmyk) (src-format rgba))
  ;; Write whatever you like.  Maybe manually inlining the
  ;; read/convert/write routines is sufficient.  Maybe you can figure
  ;; out a clever way to share code between this routine and pixel
  ;; routines so the compiler does most of the heavy lifting for you.
  ;; Maybe write vectorized code if your implementation supports it.
  ;; Maybe call out to C if you really need to.
  ...)

Sure, you do have to duplicate all the stuff I mentioned above--pixel reading, pixel writing, looping logic, etc. But you only have to do it for the cases that really matter. And if you do something like vectorizing the code, chances are that the loop structure and such would look quite different between routines anyway. If it doesn't? Good, try to abstract things away there, too.

posted by Nate @ 5:07PM