Chipz

Chipz is a library for decompressing DEFLATE and BZIP2 data. DEFLATE data, defined in RFC 1951, forms the core of popular compression formats such as zlib (RFC 1950) and gzip (RFC 1952), so Chipz can also decompress data in those formats. BZIP2 is the format used by the popular compression tool bzip2.

Chipz is the reading complement to Salza.

Installation

Chipz can be downloaded at http://www.method-combination.net/lisp/files/chipz.tar.gz. The latest version is 0.8.

It comes with an ASDF system definition, so (ASDF:OOS 'ASDF:LOAD-OP :CHIPZ) should be all that you need to get started.
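On current ASDF versions the same load is more commonly spelled with asdf:load-system. A minimal sketch, assuming ASDF is available in your Lisp image and can already locate the chipz system:

```lisp
;; Assumption: ASDF is installed and the chipz system is somewhere
;; ASDF searches (for example, a configured source registry).
(require :asdf)
(asdf:load-system :chipz)
```

After this form returns, the CHIPZ package and its exported symbols should be available.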

License

Chipz is released under an MIT-like license; you can do pretty much anything you want to with the code except claim that you wrote it.

Using the library

The main function of the library is decompress:

decompress output state input &key &allow-other-keys => output

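Five distinct use cases are covered by this single function: decompressing a vector to a fresh vector, decompressing a stream to a fresh vector, decompressing a vector to a user-specified buffer, decompressing a vector to a stream, and decompressing a stream to a stream.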

Note: Chipz does not provide for decompressing data from a stream to a user-specified buffer, as the buffer management involved cannot be done automatically by the library; the application must be involved in this case.

One-shot decompression

The first and second use cases above are intended to be convenient "one-shot" decompression methods. Although the methods described below take a decompression-state parameter, as returned by make-dstate, the usual way to use them is to provide a format argument instead. This format argument should be one of chipz:gzip, chipz:zlib, chipz:deflate, or chipz:bzip2.

The format argument can also be a keyword, such as :gzip, for backwards compatibility. Using symbols in the CHIPZ package is preferred, however.

Most applications will use chipz:gzip or chipz:bzip2, a few applications will use chipz:zlib, and uses of chipz:deflate will probably be few and far between.

All the method signatures described below also accept a format argument in lieu of a decompression-state argument.

The signatures of the first two methods are as follows.

decompress (output null) (state decompression-state) (input vector) &key (input-start 0) input-end buffer-size => output
decompress (output null) (state decompression-state) (input stream) &key buffer-size => output

A simple function to retrieve the contents of a gzip-compressed file, then, might be:

(defun gzip-contents (pathname)
  (with-open-file (stream pathname :direction :input
                                   :element-type '(unsigned-byte 8))
    (chipz:decompress nil 'chipz:gzip stream)))

These one-shot methods also support a :buffer-size argument as a hint of the size of decompressed data. The library uses this to pre-allocate the output buffer to the hinted size. Therefore, if you know the size of the decompressed data or have a good estimate, fewer allocations will be done, leading to slightly better performance. If :buffer-size is not provided or proves to be too small, the library will of course grow the output buffer as necessary.
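The vector form of one-shot decompression, with a size hint, can be sketched as follows. The function name inflate-octets and the sample data are illustrative only; the sample is a hand-built raw DEFLATE stream consisting of a single stored block, and the input is assumed to need to be a simple octet vector.

```lisp
(require :asdf)
(asdf:load-system :chipz)

;; Hand-built raw DEFLATE data: one final "stored" (uncompressed) block
;; with LEN = 2, NLEN = #xFFFD, and the two octets 104 and 105.
(defparameter *stored-block-sample*
  (coerce #(1 2 0 #xFD #xFF 104 105)
          '(simple-array (unsigned-byte 8) (*))))

(defun inflate-octets (octets &optional (size-hint 64))
  "Decompress raw DEFLATE data in OCTETS to a fresh vector.
SIZE-HINT is passed to Chipz as the :BUFFER-SIZE hint."
  (chipz:decompress nil 'chipz:deflate octets :buffer-size size-hint))
```

Calling (inflate-octets *stored-block-sample*) should return a vector holding the two decompressed octets. A hint that is too small only costs extra allocations, since the library grows the buffer as needed.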

Decompressing to a vector

An alternate way to deal with compressed data is to read in a buffer's worth of data, decompress the buffer, and then deal with any remaining input and the produced output, looping to read and process more data as appropriate. This scheme is the third use case described above and is handled in zlib with the inflate function. In Chipz, it is just another method of decompress.

decompress (output vector) (state decompression-state) (input vector) &key (input-start 0) input-end (output-start 0) output-end => n-bytes-consumed, n-bytes-produced

This method decompresses the data from input between input-start and input-end and places the uncompressed data in output, limited by output-start and output-end. Please note that it is possible to consume some or all of the input without producing any output and to produce some or all of the output without consuming any input.

As above, you can use a format argument instead of a decompression-state. You will usually not want to do this unless you know exactly how large the decompressed data is going to be; otherwise, you will only decompress a portion of the data and any intermediate state required to decompress the remainder of the data will be thrown away.
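The buffer-at-a-time scheme can be sketched as a loop that keeps one decompression-state alive across calls. The function name inflate-in-chunks is hypothetical, and stopping when a call makes no progress in either direction is an assumption made for this sketch, not a rule stated by the library.

```lisp
(require :asdf)
(asdf:load-system :chipz)

(defun inflate-in-chunks (input &optional (chunk-size 4096))
  "Decompress raw DEFLATE data in INPUT through a fixed-size buffer.
INPUT is assumed to be a (simple-array (unsigned-byte 8) (*))."
  (let ((state (chipz:make-dstate 'chipz:deflate))
        (buffer (make-array chunk-size :element-type '(unsigned-byte 8)))
        (result (make-array 0 :element-type '(unsigned-byte 8)
                              :adjustable t :fill-pointer 0))
        (start 0))
    (loop
      (multiple-value-bind (consumed produced)
          (chipz:decompress buffer state input :input-start start)
        (incf start consumed)
        ;; Only the first PRODUCED octets of BUFFER are valid output.
        (loop for i below produced
              do (vector-push-extend (aref buffer i) result))
        ;; Assumption: a call that neither consumes nor produces
        ;; anything means there is nothing more to do.
        (when (and (zerop consumed) (zerop produced))
          (return))))
    ;; Error check: signals if the compressed data was incomplete.
    (chipz:finish-dstate state)
    result))
```

A real application would typically refill input from a file or socket inside the loop instead of holding all compressed data in one vector.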

Decompressing to a stream

Finally, decompress can also be used to write the decompressed data directly to a stream, enabling a poor man's gunzip function:

(defun gunzip (gzip-filename output-filename)
  (with-open-file (gzstream gzip-filename :direction :input
                            :element-type '(unsigned-byte 8))
    (with-open-file (stream output-filename :direction :output
                            :element-type '(unsigned-byte 8)
                            :if-exists :supersede)
      (chipz:decompress stream 'chipz:gzip gzstream)
      output-filename)))

The relevant methods in this case are:

decompress (output stream) (state decompression-state) (input vector) &key (input-start 0) input-end => stream
decompress (output stream) (state decompression-state) (input stream) => stream

Both return the output stream.
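The vector-input method can be used the same way when the compressed data is already in memory. A minimal sketch, where inflate-to-file is a hypothetical helper and the input is assumed to be raw DEFLATE data in a simple octet vector:

```lisp
(require :asdf)
(asdf:load-system :chipz)

(defun inflate-to-file (octets output-filename)
  "Write the decompressed form of raw DEFLATE data in OCTETS
to OUTPUT-FILENAME and return the filename."
  (with-open-file (out output-filename :direction :output
                                       :element-type '(unsigned-byte 8)
                                       :if-exists :supersede)
    ;; OUT is a binary stream, so Chipz writes octets directly to it.
    (chipz:decompress out 'chipz:deflate octets))
  output-filename)
```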

Creating decompression-state objects

The core data structure of Chipz is a decompression-state, which stores the internal state of an ongoing decompression process. You create a decompression-state with make-dstate.

make-dstate format => dstate

Return a decompression-state object suitable for uncompressing data in format. format should be one of the symbols accepted by decompress: chipz:gzip, chipz:zlib, chipz:deflate, or chipz:bzip2.

As with decompress, you can use keywords instead, but doing so is deprecated.

Prior to adding bzip2 support, Chipz supported only deflate-based formats. make-inflate-state was the primary interface then; it is now deprecated, but kept around for backwards compatibility.

make-inflate-state format => inflate-state

make-inflate-state supports the same format arguments as make-dstate does, with the obvious exception of chipz:bzip2. The inflate-state object returned is a decompression-state, so it can be passed to decompress and finish-dstate.

Once you are done with a decompression-state object, you must call finish-dstate on it. finish-dstate checks that the given state decompressed all the data in a given stream. It does not dispose of any resources associated with state; it is meant purely as an error-checking construct. Therefore, it is inappropriate to call it from, say, the cleanup forms of UNWIND-PROTECT. The cleanup forms may be run when an error is signaled during decompression, and of course the stream will only be partially decompressed at that point.

finish-dstate state => t
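The error-checking role of finish-dstate can be illustrated with a deliberately truncated input. This is a sketch under stated assumptions: complete-deflate-p is a hypothetical helper, the data is assumed to expand to at most 256 octets, and premature-end-of-stream is assumed to be exported from the CHIPZ package like the other documented condition names.

```lisp
(require :asdf)
(asdf:load-system :chipz)

(defun complete-deflate-p (octets)
  "Return true if OCTETS holds one complete raw DEFLATE stream.
Assumes the decompressed data fits in a 256-octet scratch buffer."
  (let ((state (chipz:make-dstate 'chipz:deflate))
        (buffer (make-array 256 :element-type '(unsigned-byte 8))))
    (chipz:decompress buffer state octets)
    ;; FINISH-DSTATE runs on the normal path only, not from an
    ;; UNWIND-PROTECT cleanup form.
    (handler-case (chipz:finish-dstate state)
      (chipz:premature-end-of-stream () nil))))
```

With a stored block whose last data octet is missing, decompress consumes what is there and finish-dstate then reports that the stream never reached its end.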

finish-inflate-state does the same thing, but only for inflate-state. Its use, like that of make-inflate-state, is deprecated.

finish-inflate-state state => t

Gray streams

Chipz includes support for creating Gray streams to wrap streams containing compressed data and read the uncompressed data from those streams. SBCL, Allegro, Lispworks, CMUCL, and OpenMCL are supported at this time.

make-decompressing-stream format stream => decompressing-stream

Return a stream that provides transparent decompression of the data from stream in format. That is, read-byte and read-sequence will decompress the data read from stream and return portions of the decompressed data as requested. format is as in the one-shot decompression methods.
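Reading through a decompressing stream might look like the following sketch. The helper name read-decompressed-octets is hypothetical, and collecting into a list is only sensible for small files; it is used here to keep the example short.

```lisp
(require :asdf)
(asdf:load-system :chipz)

(defun read-decompressed-octets (pathname format)
  "Return the decompressed contents of PATHNAME as a list of octets.
FORMAT is a Chipz format symbol such as CHIPZ:GZIP."
  (with-open-file (raw pathname :direction :input
                                :element-type '(unsigned-byte 8))
    (let ((in (chipz:make-decompressing-stream format raw)))
      ;; With EOF-ERROR-P NIL, READ-BYTE returns NIL at end of data.
      (loop for octet = (read-byte in nil nil)
            while octet
            collect octet))))
```

For example, (read-decompressed-octets "data.gz" 'chipz:gzip) would read a gzip file, where "data.gz" is a placeholder filename.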

Conditions

chipz-error

All errors signaled by Chipz are of this type.

invalid-format-error

This error is signaled when the format argument to decompress or make-dstate is not one of the symbols specified for make-dstate. This error is also signaled in make-inflate-state if the format argument is not valid for that function.

decompression-error

All errors signaled during decompression are of this type.

invalid-checksum-error

The zlib, gzip, and bzip2 formats all contain checksums to verify the integrity of the uncompressed data; this error is signaled when the stored checksum is found to be inconsistent with the checksum computed by Chipz. It indicates that the compressed data has probably been corrupted in some fashion (or there is an error in Chipz).

premature-end-of-stream

This error is signaled when finish-dstate is called on a decompression-state that has not finished processing an entire compressed data stream.

inflate-error

All errors signaled while decompressing deflate-based formats are of this type.

invalid-zlib-header-error

This error is signaled when an invalid zlib header is read.

invalid-gzip-header-error

This error is signaled when an invalid gzip header is read.

reserved-block-type-error

This error is signaled when a deflate block is read whose type is 3. This type is reserved for future expansion and should not be found in the wild.

invalid-stored-block-length-error

This error is signaled when the length of a deflate stored block is found to be invalid.

bzip2-error

All errors signaled while decompressing bzip2-based formats are of this type.

invalid-bzip2-data

This error is signaled when the compressed bzip2 data is found to be corrupt in some way that prevents further decompression.
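Because every condition listed above is a chipz-error, a single handler can cover them all. A minimal sketch, where try-inflate is a hypothetical helper operating on raw DEFLATE data in a simple octet vector:

```lisp
(require :asdf)
(asdf:load-system :chipz)

(defun try-inflate (octets)
  "Decompress raw DEFLATE data in OCTETS to a fresh vector.
If Chipz signals an error, return NIL and the condition instead."
  (handler-case (chipz:decompress nil 'chipz:deflate octets)
    (chipz:chipz-error (condition)
      (values nil condition))))
```

Handlers for more specific types, such as invalid-checksum-error, can be added as separate clauses when the application wants to react to them differently.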