CLOS initialization protocol

Tuesday, 22 December A.D. 2009 @ 4:40 PM

Update: Tobias Rittweiler pointed out that I don't need &ALLOW-OTHER-KEYS in my method definitions; &ALLOW-OTHER-KEYS was already declared in the DEFGENERIC and isn't required in individual methods. Furthermore, putting &ALLOW-OTHER-KEYS in your method definitions when it's already been declared in DEFGENERIC inhibits useful argument checking. So don't do that. I've fixed the examples below.

In the last blog post, we saw that REINITIALIZE-INSTANCE is a useful way to reduce consing. You say, “well, sure, that's why I have RESET-FOO functions in the library/application that I'm writing.” I'd like to suggest that if you've written such functions, you ought to be writing REINITIALIZE-INSTANCE methods (if you're resetting structures) or writing :AFTER methods on the usual CLOS initialization protocol methods instead. Doing so is more Lispy than the usual reset functions. Writing to the existing protocol also encourages you to define more clearly what's initialization, what's reinitialization, and what can be shared between them.

“OK,” you say, “but what are these CLOS methods and how do I use them?” Glad you asked.

There are three major generic functions that participate in the protocol: INITIALIZE-INSTANCE, REINITIALIZE-INSTANCE, and SHARED-INITIALIZE.

As you might imagine, INITIALIZE-INSTANCE is called from MAKE-INSTANCE, and REINITIALIZE-INSTANCE is called, well, by you. SHARED-INITIALIZE is called from both INITIALIZE-INSTANCE and REINITIALIZE-INSTANCE and handles the work common to both. We'll focus on SHARED-INITIALIZE because it underpins the other two and because it is also called when an instance is updated after its class is redefined (via UPDATE-INSTANCE-FOR-REDEFINED-CLASS) and when the class of an instance is changed (via UPDATE-INSTANCE-FOR-DIFFERENT-CLASS). You get a lot of things for free when writing methods for SHARED-INITIALIZE.
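
Here is a minimal sketch of that call flow, in case seeing it helps. The PROBE class, the *CALLS* log, and the CALLS-DURING helper are all made up for this illustration; each :AFTER method just records the name of its generic function.

```lisp
(defvar *calls* '()
  "Names of the initialization generic functions that have run, newest first.")

(defclass probe ()
  ((x :initarg :x :initform 0 :accessor x)))

(defmethod initialize-instance :after ((o probe) &key)
  (push 'initialize-instance *calls*))

(defmethod reinitialize-instance :after ((o probe) &key)
  (push 'reinitialize-instance *calls*))

(defmethod shared-initialize :after ((o probe) slot-names &key)
  (declare (ignore slot-names))
  (push 'shared-initialize *calls*))

(defun calls-during (thunk)
  "Run THUNK and return the generic functions it triggered, oldest first."
  (setf *calls* '())
  (funcall thunk)
  (reverse *calls*))

(defvar *probe* (make-instance 'probe))

;; SHARED-INITIALIZE is logged before INITIALIZE-INSTANCE, because the
;; :AFTER method on INITIALIZE-INSTANCE only runs once the primary method
;; (which is what calls SHARED-INITIALIZE) has returned.
(calls-during (lambda () (make-instance 'probe :x 1)))

;; The same pattern holds for REINITIALIZE-INSTANCE.
(calls-during (lambda () (reinitialize-instance *probe* :x 2)))
```

Both calls go through SHARED-INITIALIZE, which is why a single :AFTER method there covers creation and reinitialization alike.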

First things first: you generally only want to write :AFTER methods on SHARED-INITIALIZE, viz:

(defmethod shared-initialize :after ((object my-class) slot-names &rest initargs &key)
  ...)

The reason is that the primary method on SHARED-INITIALIZE already does a lot of useful things, like initializing slots from their :INITFORMs and :INITARGs. (For the same reason, you should generally use :AFTER methods for INITIALIZE-INSTANCE and REINITIALIZE-INSTANCE.) You can, of course, write:

(defmethod shared-initialize ((object my-class) slot-names &rest initargs &key)
  (call-next-method)
  ...)

But that's generally not how things are done. I think writing :AFTER methods makes your intent clearer: you're doing extra work after the standard bits have been done. It's also easy to forget to CALL-NEXT-METHOD, which would lead to puzzling behavior and also might lead you to submit spurious bug reports like “CLOS initialization doesn't work with user-defined classes”. Also, method combination is often more useful and more declarative than trying to get CALL-NEXT-METHOD in the right place. That doesn't mean CALL-NEXT-METHOD doesn't have its place; it just means I think method combination should be preferred when possible.
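
To make the “puzzling behavior” concrete, here is a small sketch with two invented classes, FORGETFUL and CAREFUL (neither comes from any real code of mine). The first defines a primary method and forgets CALL-NEXT-METHOD; the second uses an :AFTER method.

```lisp
(defclass forgetful ()
  ((n :initarg :n :initform 0 :reader n)))

;; A primary method that forgets CALL-NEXT-METHOD. It shadows the standard
;; primary method, so nothing fills N from its :INITARG or :INITFORM.
(defmethod shared-initialize ((o forgetful) slot-names &key)
  (declare (ignore slot-names))
  o)

(defclass careful ()
  ((n :initarg :n :initform 0 :reader n)
   (doubled :reader doubled)))

;; An :AFTER method cannot skip the standard work: by the time it runs,
;; N has already been initialized.
(defmethod shared-initialize :after ((o careful) slot-names &key)
  (declare (ignore slot-names))
  (setf (slot-value o 'doubled) (* 2 (n o))))

;; N is left unbound even though :N 5 was passed.
(slot-boundp (make-instance 'forgetful :n 5) 'n)

;; DOUBLED is computed from the already-initialized N.
(doubled (make-instance 'careful :n 5))
```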

OK, so, onto initializing things. What's this SLOT-NAMES parameter? SLOT-NAMES is a (possibly empty) list of slot names that need to be initialized from their :INITFORMs, or T to stand in for “all slots”. For our purposes here, we are going to assume that SLOT-NAMES is only ever T (for when SHARED-INITIALIZE is called from INITIALIZE-INSTANCE) or NIL (for when SHARED-INITIALIZE is called from REINITIALIZE-INSTANCE). In the T case, the primary method on SHARED-INITIALIZE handles the initialization logic. So we don't have to worry too much about SLOT-NAMES.
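
If you want to see the two SLOT-NAMES values for yourself, a sketch along these lines should do it. WATCHER and *SEEN-SLOT-NAMES* are hypothetical names invented for the demonstration.

```lisp
(defvar *seen-slot-names* '()
  "SLOT-NAMES values passed to SHARED-INITIALIZE, newest first.")

(defclass watcher ()
  ((a :initarg :a :initform 1 :accessor watcher-a)
   (b :initarg :b :initform 2 :accessor watcher-b)))

(defmethod shared-initialize :after ((o watcher) slot-names &key)
  (push slot-names *seen-slot-names*))

;; Called via INITIALIZE-INSTANCE: SLOT-NAMES is T, so B gets its :INITFORM.
(defvar *w* (make-instance 'watcher :a 10))

(setf (watcher-b *w*) 99)

;; Called via REINITIALIZE-INSTANCE: SLOT-NAMES is NIL, so only the slot
;; named by an initarg (A) changes; B keeps 99 and is not reset to 2.
(reinitialize-instance *w* :a 20)
```

Note that reinitialization does not re-run :INITFORMs, which is exactly what NIL means here.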

You can also mostly ignore INITARGS. Most of the keyword-argument pairs in it will have been handled elsewhere by standard CLOS bits. Any bits that are interesting to you will be pulled out via separate &KEY arguments; we'll get to that use-case in a second.

OK, so what useful things can we do in SHARED-INITIALIZE? We can recompute slots that depend on the values of other slots and therefore can't be usefully initialized with :INITFORM or :INITARG:

(defclass triangle ()
  ((a :initarg :a :reader a)
   (b :initarg :b :reader b)
   (c :initarg :c :reader c)
   (area :reader area)))

(defmethod shared-initialize :after ((o triangle) slot-names &rest initargs &key)
  (let ((area (compute-area-of-triangle (a o) (b o) (c o))))
    (setf (slot-value o 'area) area)))

In cases like this, you might be tempted to write some sort of wrapper (with the appropriate modification to the definition of TRIANGLE):

(defun make-triangle (a b c)
  (let ((area (compute-area-of-triangle a b c)))
    (make-instance 'triangle :a a :b b :c c :area area)))

which is certainly doable (I'll leave the subject of whether to wrap MAKE-INSTANCE or expose it directly to users in your API for a future post). But this sort of wrapper doesn't easily enable reinitialization. Consider what your hand-crafted RESET-TRIANGLE function might look like to handle all the cases the SHARED-INITIALIZE method would handle for free:

(defun reset-triangle (triangle &key a b c)
  (let ((a (or a (a triangle)))
        (b (or b (b triangle)))
        (c (or c (c triangle))))
    (setf (slot-value triangle 'a) a
          (slot-value triangle 'b) b
          (slot-value triangle 'c) c
          (slot-value triangle 'area) (compute-area-of-triangle a b c))
    triangle))

Notice that you've duplicated slot setting (the CLOS internals handle that for you) and you're computing the area of the triangle in two different, but related (both part of initialization) places. You might say that you only want users resetting all three sides of a triangle, so the API should reflect that. Maybe. But in that case, you should be the one calling REINITIALIZE-INSTANCE, and you'd wind up going the SHARED-INITIALIZE route anyway. I think the SHARED-INITIALIZE method also better captures what it means to compute the area: the computation of the area is an integral part of initialization, rather than something that you do before creating the object.
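
For the curious, here is a runnable version of the SHARED-INITIALIZE route. I never defined COMPUTE-AREA-OF-TRIANGLE above, so this sketch assumes A, B, and C are side lengths and fills the helper in with Heron's formula; treat that part as an assumption rather than the one true definition.

```lisp
(defun compute-area-of-triangle (a b c)
  "Heron's formula. An assumed implementation of the undefined helper."
  (let ((s (/ (+ a b c) 2)))
    (sqrt (* s (- s a) (- s b) (- s c)))))

(defclass triangle ()
  ((a :initarg :a :reader a)
   (b :initarg :b :reader b)
   (c :initarg :c :reader c)
   (area :reader area)))

(defmethod shared-initialize :after ((o triangle) slot-names &key)
  (declare (ignore slot-names))
  (setf (slot-value o 'area)
        (compute-area-of-triangle (a o) (b o) (c o))))

(defvar *tri* (make-instance 'triangle :a 3 :b 4 :c 5))

;; Area of the 3-4-5 right triangle.
(area *tri*)

;; Only C is supplied; A and B keep their values, and the :AFTER method
;; recomputes the area for sides 3, 4, 3 with no RESET-TRIANGLE in sight.
(reinitialize-instance *tri* :c 3)
(area *tri*)
```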

For a less pedagogical example, let's say you have a cool encryption algorithm:

(defclass les () ; the LISP encryption standard
  ((round-keys :reader round-keys)
   (n-rounds :reader n-rounds)))

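(defmethod shared-initialize :after ((o les) slot-names &rest initargs &key (key nil key-p))
  ;; Only reschedule when a key was actually supplied, so that calling
  ;; REINITIALIZE-INSTANCE without :KEY leaves the existing schedule alone.
  (when key-p
    (multiple-value-bind (round-keys n-rounds) (schedule-key key)
      (setf (slot-value o 'round-keys) round-keys
            (slot-value o 'n-rounds) n-rounds))))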

You may have noticed that SHARED-INITIALIZE takes a KEY keyword argument above, even though there's no :INITARG :KEY in the DEFCLASS form. One cool thing about SHARED-INITIALIZE is that any &KEY arguments to it automatically become candidates for use with MAKE-INSTANCE and the other generic functions that participate in the initialization protocol. This feature enables you to pass keyword arguments to MAKE-INSTANCE that aren't directly connected with slots, but require massaging in some way to produce something suitable for slot values. So using the above, you say:

(defvar *l* (make-instance 'les :key #(#xde #xad #xbe #xef)))
...lots of code...
;; sometime later
(reinitialize-instance *l* :key #(#xca #xfe #xbe #xbe))

Again, you could use wrapper functions. But I think the same arguments cited above for clarity and for not repeating yourself apply here as well. (And if you're really serious about this encryption stuff, you also want to be able to reset the initialization vector/nonce for your encryption mode and possibly even to change the encryption mode entirely. Once you've done that, you've basically rewritten SHARED-INITIALIZE, and you have potentially duplicated logic between your MAKE-INSTANCE wrapper and your reset function.)
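
The keyword trick is easy to check in isolation. The following sketch uses an invented TEMPERATURE class and an invented ACCEPTS-INITARG-P helper, and it assumes your implementation checks initargs at MAKE-INSTANCE time the way the standard describes.

```lisp
(defclass temperature ()
  ((kelvin :initarg :kelvin :initform 0 :reader kelvin)))

;; :CELSIUS names no slot, but because it is a &KEY parameter of an
;; applicable SHARED-INITIALIZE method it is accepted as an initarg.
(defmethod shared-initialize :after ((o temperature) slot-names
                                     &key (celsius nil celsius-p))
  (declare (ignore slot-names))
  ;; Convert only when :CELSIUS was actually passed.
  (when celsius-p
    (setf (slot-value o 'kelvin) (+ celsius 273.15))))

(defun accepts-initarg-p (class-name &rest initargs)
  "Return T if MAKE-INSTANCE accepts INITARGS, NIL if it signals an error."
  (handler-case (progn (apply #'make-instance class-name initargs) t)
    (error () nil)))

;; Accepted: declared by the method's lambda list.
(accepts-initarg-p 'temperature :celsius 25)

;; Rejected: declared neither by a slot nor by any applicable method.
(accepts-initarg-p 'temperature :fahrenheit 77)
```

This is also where the &ALLOW-OTHER-KEYS advice bites: if the method's lambda list included &ALLOW-OTHER-KEYS, the validity check would be disabled and the :FAHRENHEIT typo would slip through silently.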

Another reason to have SHARED-INITIALIZE methods is that they naturally cooperate with subclassing. Let's say you're dealing with some packaging format and you're creating entries from octet vectors:

(defclass entry ()
  ...slots...)

(defun make-entry-from-buffer (buffer &key (start 0))
  (let (...parse out individual slots from BUFFER...)
    (make-instance 'entry ...initargs for slots...)))

Fine and dandy. Now let's say that your customers tell you about packaging format v2, which mostly retains the format of v1, but adds a way of specifying additional metadata as part of the entries. OK:

(defclass entry-v2 (entry)
  ...more slots...)

(defun make-entry-from-buffer (buffer &key (start 0))
  ;; A `2' at the start of the buffer indicates a version 2 entry.
  (if (= (aref buffer start) 2)
      (make-entry-v2-from-buffer buffer :start start)
      (make-entry-v1-from-buffer buffer :start start)))

Hm. We want to initialize the slots in an entry in a common place, so as to avoid code duplication:

(defun initialize-common-entries (entry buffer &key (start 0))
  (let (...parse out individual slots from BUFFER...)
    (setf ...lots of slots...)
    entry))

(defun make-entry-v2-from-buffer (buffer &key (start 0))
  (let ((entry (make-instance 'entry-v2)))
    (setf ...new slots for ENTRY-V2...)
    (initialize-common-entries entry buffer :start start)))

and so on. Possibly with slight adjustments because parsing the slots for ENTRY-V2 might depend on the values of one or more of the common slots in ENTRY. I claim that this is more elegantly handled by:

(defmethod shared-initialize :after ((o entry) slot-names &rest initargs &key buffer start)
  (let (...parse out individual slots from BUFFER...)
    (setf ...lots of slots...)
    o))

(defmethod shared-initialize :after ((o entry-v2) slot-names &rest initargs &key buffer start)
  (let (...parse out individual slots from BUFFER...)
    (setf ...slots for ENTRY-V2...)
    o))

(defun make-entry-from-buffer (buffer &key (start 0))
  ;; A `2' at the start of the buffer indicates a version 2 entry.
  (if (= (aref buffer start) 2)
      (make-instance 'entry-v2 :buffer buffer :start start)
      (make-instance 'entry :buffer buffer :start start)))

Since :AFTER methods run in least-specific-first order, the :AFTER method on ENTRY will be called first. The :AFTER method on ENTRY-V2 can therefore freely use the value of any slots from ENTRY. Using SHARED-INITIALIZE here not only enables REINITIALIZE-INSTANCE, it also does the right thing in handling common code from a software engineering perspective.
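
Here is a scaled-down, runnable sketch of the same shape. The PACKET and PACKET-V2 classes and their two-byte layout are invented for illustration and are not the packaging format discussed above.

```lisp
(defclass packet ()
  ((version :reader version)))

(defclass packet-v2 (packet)
  ((flags :reader flags)
   (summary :reader summary)))

(defmethod shared-initialize :after ((o packet) slot-names
                                     &key (buffer nil buffer-p) (start 0))
  (declare (ignore slot-names))
  (when buffer-p
    (setf (slot-value o 'version) (aref buffer start))))

(defmethod shared-initialize :after ((o packet-v2) slot-names
                                     &key (buffer nil buffer-p) (start 0))
  (declare (ignore slot-names))
  (when buffer-p
    ;; VERSION is already set here: the :AFTER method on PACKET ran first.
    (let ((flags (aref buffer (1+ start))))
      (setf (slot-value o 'flags) flags
            (slot-value o 'summary) (list :version (version o)
                                          :flags flags)))))

(defvar *p* (make-instance 'packet-v2 :buffer #(2 7) :start 0))
(summary *p*)

;; Reparse from a different buffer and offset without allocating a new object.
(reinitialize-instance *p* :buffer #(9 2 1) :start 1)
(summary *p*)
```

Neither method calls the other; the ordering comes entirely from standard method combination.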

Of course, if you have code that specifically needs to run at MAKE-INSTANCE time, or REINITIALIZE-INSTANCE time, then you can add :AFTER methods to INITIALIZE-INSTANCE or REINITIALIZE-INSTANCE as appropriate. For instance, if all of your objects need to have a unique ID, you surely don't want to assign a unique ID in SHARED-INITIALIZE, because the ID would then be reassigned on every reinitialization:

(defclass unique-id-mixin ()
  ((unique-id :reader unique-id)))

(defmethod initialize-instance :after ((o unique-id-mixin) &rest initargs &key)
  (setf (slot-value o 'unique-id) (get-unique-id-for-instance o)))

(You might, however, want to assign a different unique ID if somebody change-class'ed your object. That's a topic for future discussion; I only sort-of understand the protocol involved in CHANGE-CLASS and I have yet to see really pragmatic reasons for adding methods to the generic functions involved.)
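
A quick sketch showing that the ID survives reinitialization when it is assigned in INITIALIZE-INSTANCE. GET-UNIQUE-ID-FOR-INSTANCE is not defined anywhere above, so a simple counter stands in for it here; the WIDGET class and *NEXT-ID* are likewise made up.

```lisp
(defvar *next-id* 0
  "Counter backing the stand-in GET-UNIQUE-ID-FOR-INSTANCE below.")

(defun get-unique-id-for-instance (instance)
  "Return a fresh integer ID. INSTANCE is ignored in this sketch."
  (declare (ignore instance))
  (incf *next-id*))

(defclass unique-id-mixin ()
  ((unique-id :reader unique-id)))

(defmethod initialize-instance :after ((o unique-id-mixin) &key)
  (setf (slot-value o 'unique-id) (get-unique-id-for-instance o)))

(defclass widget (unique-id-mixin)
  ((label :initarg :label :accessor label)))

(defvar *widget* (make-instance 'widget :label "first"))
(defvar *id-before* (unique-id *widget*))

;; REINITIALIZE-INSTANCE never calls INITIALIZE-INSTANCE, so the label
;; changes but the ID assigned at creation time stays put.
(reinitialize-instance *widget* :label "renamed")
(unique-id *widget*)
```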

I know that my code doesn't always work according to the ideas I've laid out above; it's taken me a while to wrap my head around what goes where and why. But now that I understand, I'm slowly modifying my code to use these ideas. Where can your code benefit from these ideas?