» ironclad block cipher improvements

Sunday, 17 June A.D. 2012 @ 8:45 PM

I recently rewrote how Ironclad implements block cipher modes and I thought the evolution of that functionality was worth writing about. You may want to peruse Wikipedia's article on block cipher modes of operation for some useful background.

In the early days of Ironclad, I was impressed with how much typing macros could save you, so I used them as poor man's templates. A simple cipher definition like:

(defcipher :aes :block-length 16)

(This form and the following are intended as examples; the code that actually gets used in Ironclad is slightly more complicated, but the complexity is not relevant here. Also, while encryption is used for all the examples, the results apply equally well to decryption.)

was expanded to:

(defmethod encrypt-with-mode ((cipher aes) (mode ecb-mode) ...)
  (loop for i from in-start below in-end by 16
        for j from out-start by 16
        do (aes-encrypt-block cipher ...)))

(defmethod encrypt-with-mode ((cipher aes) (mode cbc-mode) ...)
  ...)

...and so on for each mode Ironclad implemented, plus a core dispatching function:

(defmethod encrypt (cipher ...)
  (encrypt-with-mode cipher (mode cipher) ...))
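
For illustration, here is a hedged sketch of the kind of macro I mean; the argument names and generated method body are simplified guesses, and Ironclad's actual DEFCIPHER is more involved:

;; Illustrative sketch of the "poor man's templates" idea: one macro
;; stamps out an ENCRYPT-WITH-MODE method per mode for the cipher being
;; defined.  This is not Ironclad's actual DEFCIPHER.
(defmacro defcipher (name &key block-length)
  (let* ((class (intern (string name)))
         (encrypt-block (intern (format nil "~A-ENCRYPT-BLOCK" class))))
    `(progn
       (defmethod encrypt-with-mode ((cipher ,class) (mode ecb-mode)
                                     in in-start in-end out out-start)
         (loop for i from in-start below in-end by ,block-length
               for j from out-start by ,block-length
               do (,encrypt-block cipher in i out j)))
       ;; ...plus one method like the above for each other mode
       ;; (CBC, CFB, OFB, ...).
       )))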

This all worked out fairly well, and I felt superior at being able to use multimethods to dispatch on two arguments simultaneously. But like templates, this implementation suffered from code bloat: we have NxM methods just for cipher modes, where N is the number of ciphers and M is the number of modes. It made compiled files somewhat bulky and also relatively slow to compile. People complained about bloat in Ironclad, so I looked for more compact ways to do the above.

Implementation 2.0, in Ironclad 0.20, noticed that what really mattered was the block size of the cipher; if you knew that and how to encrypt/decrypt blocks for a particular cipher, you could say something like:

(defmethod encrypt-with-mode ((cipher 16-byte-block-mixin) (mode ecb-mode) ...)
  (loop with encrypt-function = (encrypt-function cipher)
        for i from in-start below in-end by 16
        for j from out-start by 16
        do (funcall encrypt-function ...)))

Reformulating things this way allows the above code to be shared amongst all ciphers whose block size is 16 bytes (or 8 bytes, or 32, ...). (Having separate methods for each block size, rather than querying the cipher for the block size at runtime, allows good native code compilers to optimize the arithmetic for the loop indices.) The binding of ENCRYPT-FUNCTION is a (premature?) optimization to avoid extra generic function calls in the inner loop. Making this change reduced code size quite dramatically: compiled files shrank by about 25% on x86 SBCL, and I believe that's across all compiled files, not just the ciphers!
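
To make the sharing concrete, here is a minimal sketch of the class arrangement this implies; everything beyond the 16-BYTE-BLOCK-MIXIN name is illustrative rather than Ironclad's actual definitions:

;; Illustrative sketch: ciphers sharing a block size inherit a common
;; mixin, so a single ENCRYPT-WITH-MODE method per (block size, mode)
;; pair covers all of them.  Class and slot names are simplified here.
(defclass cipher ()
  ((encrypt-function :initarg :encrypt-function :reader encrypt-function)))

(defclass 16-byte-block-mixin () ())
(defclass 8-byte-block-mixin () ())

;; AES and Twofish both use 16-byte blocks, so they share the
;; 16-BYTE-BLOCK-MIXIN methods; Blowfish, with its 8-byte block, picks
;; up the 8-BYTE-BLOCK-MIXIN methods instead.
(defclass aes (16-byte-block-mixin cipher) ())
(defclass twofish (16-byte-block-mixin cipher) ())
(defclass blowfish (8-byte-block-mixin cipher) ())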

Ideas for version 3.0 began percolating when I played with simple benchmarks like:

(defun normal (...)
  (time
    (let (...)
      (dotimes (i n)
        (ironclad:encrypt cipher block block)))))

(defun internal (...)
  (time
    (let (... (f (ironclad::encrypt-function cipher)))
      (dotimes (i n)
        (funcall f cipher block block ...)))))

That is, NORMAL benchmarks the way an ordinary user of Ironclad would use the library, whereas INTERNAL benchmarks what happens if you bypass the generic function dispatch of ENCRYPT and ENCRYPT-WITH-MODE. (This example is, of course, only relevant for ECB mode, but the results should be applicable to all cipher modes.) I noticed that INTERNAL would be 1.5-2x faster than NORMAL, depending on the cipher, for a small number of blocks (~1-8 blocks per call to ENCRYPT or equivalent); profiling showed that Ironclad was spending a good bit of its time rummaging around in generic function dispatch. Of course that overhead goes down in relative terms if a user feeds ENCRYPT with longer chunks of data, but leaving that much performance on the table didn't seem like a winning strategy.
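
For concreteness, the NORMAL benchmark fleshed out against Ironclad's public API might look something like the following; the AES/ECB choice, the all-zero key, and the iteration count are placeholders, not the exact setup behind the numbers above:

;; Placeholder benchmark setup (AES in ECB mode, all-zero 128-bit key,
;; one 16-byte block per call); the actual measurements covered several
;; ciphers and block counts.
(defun normal-benchmark (&optional (n 1000000))
  (let ((cipher (ironclad:make-cipher :aes
                                      :key (make-array 16 :element-type '(unsigned-byte 8)
                                                          :initial-element 0)
                                      :mode :ecb))
        (block (make-array 16 :element-type '(unsigned-byte 8)
                              :initial-element 0)))
    (time
     (dotimes (i n)
       (ironclad:encrypt cipher block block)))))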

The new design notices that ENCRYPT-WITH-MODE is doing way too much work on each call; once we set up a cipher with a particular mode, it's never changing, so why are we doing the dispatching each time? A better approach would compute a “mode encryption function” once:

(defmethod compute-mode-encryption-function ((cipher 16-byte-block-mixin)
                                             (mode ecb-mode))
  (let ((f (encrypt-function cipher)))
    (lambda (...)
      (loop for i from in-start below in-end by 16
            for j from out-start by 16
            do (funcall f ...)))))

Now ENCRYPT can just call these functions directly, eliminating an extra level of generic function dispatch. There's also no reason for ENCRYPT to be a generic function anymore:

(defun encrypt (cipher ...)
  (funcall (encrypt-function (mode cipher)) cipher ...))
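
A minimal sketch of the wiring this implies, with slot and function names that are assumptions rather than Ironclad's internals: the mode object caches the closure once, and ENCRYPT just FUNCALLs it from then on.

;; Hedged sketch: compute the per-(cipher, mode) closure exactly once,
;; at setup time, and stash it where ENCRYPT can reach it.  Names here
;; are illustrative.
(defclass ecb-mode ()
  ((encrypt-function :accessor encrypt-function)))

(defun install-mode (cipher mode)
  (setf (encrypt-function mode)
        (compute-mode-encryption-function cipher mode))
  mode)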

Rewriting the code this way makes small-block encryption about 20% faster, bringing the overhead down to about 1.4x for most ciphers. This last implementation has been committed to the repository and will be in the next release. The measured overhead, however, is still a bit more than I'd like. There are two remaining things to try that I can think of:

The ciphers themselves are classes; accessing cipher-specific data therefore requires generic function calls. These accesses can be optimized, but as I understand it, doing so generally requires the enclosing definition to be a method rather than a standard function, and that is never the case for the cipher-specific block encryption functions. Making the ciphers structures would therefore eliminate those generic function calls and also permit inlining the accesses on some implementations.
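
As a sketch of what this first idea might look like for one cipher (the slot names and types are guesses, not Ironclad's actual layout):

;; Structure-based cipher state: AES-ROUND-KEYS and AES-N-ROUNDS are
;; ordinary functions that good compilers open-code, rather than
;; generic-function slot readers.
(defstruct (aes (:constructor %make-aes))
  (round-keys (make-array 60 :element-type '(unsigned-byte 32)
                             :initial-element 0)
              :type (simple-array (unsigned-byte 32) (*)))
  (n-rounds 14 :type (integer 10 14)))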

Idea number two is more interesting. Building on implementation 3.0, the cipher-specific block encryption functions could themselves be reorganized as closure-producing functions. That's quite a mouthful without some background on what's going on, so as an example, the current cipher-specific encryption functions look something like:

(defun $cipher-encrypt-block (cipher ...)
  (let ((s-boxes (s-boxes cipher))
        (other-data (other-data cipher)))
    ...))

S-BOXES and OTHER-DATA are computed in a SCHEDULE-KEY method:

(defmethod schedule-key ((cipher $cipher) key)
  ...
  (setf (s-boxes cipher) ... (other-data cipher) ...))

The above corresponds to how a low-level language might implement ciphers. But instead of having SCHEDULE-KEY compute the data, only for it to be accessed repeatedly from the cipher later on, SCHEDULE-KEY could simply return encryption/decryption functions (closures) for the cipher directly:

(defmethod schedule-key ((cipher $cipher) key)
  ...
  (values ($cipher-encrypt-function s-boxes other-data)
          ($cipher-decrypt-function s-boxes other-data)))

(defun $cipher-encrypt-function (s-boxes other-data)
  (lambda (...)
    ;; Use S-BOXES and OTHER-DATA as in $CIPHER-ENCRYPT-BLOCK
    ...))

Accesses to cipher-specific data thus no longer require generic function dispatch; this is the same benefit as the first idea, but a much Lispier way of getting there. This change makes the cipher objects themselves almost vestigial: useful for computing things at instantiation time and holding them for later, but not much else.
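
The wiring might then look something like this hypothetical constructor (not Ironclad's exported MAKE-CIPHER), assuming the cipher keeps writable ENCRYPT-FUNCTION and DECRYPT-FUNCTION slots:

;; Hypothetical constructor sketch: SCHEDULE-KEY hands back the two
;; closures, and the cipher object simply holds on to them for ENCRYPT
;; and DECRYPT to FUNCALL later.
(defun %construct-cipher (class key)
  (let ((cipher (make-instance class)))
    (multiple-value-bind (encryptor decryptor)
        (schedule-key cipher key)
      (setf (encrypt-function cipher) encryptor
            (decrypt-function cipher) decryptor))
    cipher))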

I don't have time to pursue this second idea before the next release (which should be sometime this month), but I'm looking forward to hacking it out at some point and seeing how much of a performance improvement it would be.