OpenZL Compression System Outperforms Zstd and XZ in Speed and Compression Ratio on Structured Data

Meta has introduced OpenZL, a toolkit for data compression and decompression that achieves a higher compression ratio and speed than the Zstd and XZ formats. OpenZL is designed to efficiently compress structured datasets, such as those used in machine learning, as well as data stores whose fields hold different types of repeating information. The OpenZL code is written in C/C++ and released under the BSD license.

When compressing the SAO astronomical star catalog, OpenZL reduced the data size by a factor of 2.06, while zstd achieved a factor of 1.31 and XZ a factor of 1.64. In compression speed, OpenZL was almost twice as fast as zstd (203 MB/s versus 115 MB/s) and 65 times faster than XZ (203 MB/s versus 3.1 MB/s). Decompression with OpenZL was slightly slower than with zstd (822 MB/s versus 890 MB/s) but 27 times faster than with XZ.
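The relative figures can be checked from the quoted numbers. The short Python sketch below only redoes the arithmetic; the variable and function names are illustrative, and the XZ decompression speed is not quoted in the text, so the value computed for it is merely what the "27 times" claim would imply.

```python
# Figures as quoted above for the SAO star catalog benchmark.
RATIO = {"OpenZL": 2.06, "zstd": 1.31, "XZ": 1.64}
COMPRESS_MB_S = {"OpenZL": 203.0, "zstd": 115.0, "XZ": 3.1}
DECOMPRESS_MB_S = {"OpenZL": 822.0, "zstd": 890.0}  # XZ speed is not quoted


def speedup(table, a, b):
    """How many times faster `a` is than `b` according to `table`."""
    return table[a] / table[b]


def compressed_fraction(ratio):
    """Share of the original size left after compressing by `ratio`."""
    return 1.0 / ratio


compress_vs_zstd = speedup(COMPRESS_MB_S, "OpenZL", "zstd")
compress_vs_xz = speedup(COMPRESS_MB_S, "OpenZL", "XZ")
decompress_vs_zstd = speedup(DECOMPRESS_MB_S, "OpenZL", "zstd")

# Derived, not quoted: XZ decompression speed implied by "27 times faster".
implied_xz_decompress_mb_s = DECOMPRESS_MB_S["OpenZL"] / 27

print(f"compression vs zstd:   {compress_vs_zstd:.2f}x")
print(f"compression vs XZ:     {compress_vs_xz:.1f}x")
print(f"decompression vs zstd: {decompress_vs_zstd:.2f}x")
for name, ratio in RATIO.items():
    print(f"{name}: output is {compressed_fraction(ratio):.1%} of the input")
```

The compression speedup over zstd comes out somewhat below 2, which is why "almost twice" is the more accurate wording.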


OpenZL is not a general-purpose algorithm and only shows good results for data whose structure is known in advance.
OpenZL works by adaptively generating a compressor from a supplied description of the data. The result is compression code optimized for a specific data format. Decompression uses a single universal decompressor that is compatible with all generated compressors.
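Why knowledge of the structure helps can be shown with a small Python sketch. This is not OpenZL's actual pipeline: it is a hand-written illustration that uses zlib from the standard library as a stand-in backend, and the record layout (a uint32 id followed by a uint16 reading) and all function names are hypothetical. The records are split into per-field columns and the ids are delta-encoded before the generic compressor sees them.

```python
import struct
import zlib

# Hypothetical record layout: little-endian uint32 id + uint16 reading.
RECORD = struct.Struct("<IH")
MASK32 = 0xFFFFFFFF


def make_records(n):
    """Synthetic data: steadily increasing ids, cyclic small readings."""
    return b"".join(RECORD.pack(1_000_000 + 7 * i, i % 100) for i in range(n))


def encode_structured(blob):
    """Split records into columns, delta-encode ids, then apply zlib."""
    records = list(RECORD.iter_unpack(blob))
    ids = [r[0] for r in records]
    readings = [r[1] for r in records]
    # Delta against the previous id (0 before the first), modulo 2**32.
    deltas = [(cur - prev) & MASK32 for prev, cur in zip([0] + ids, ids)]
    n = len(ids)
    columns = struct.pack(f"<{n}I", *deltas) + struct.pack(f"<{n}H", *readings)
    return struct.pack("<I", n) + zlib.compress(columns, 9)


def decode_structured(payload):
    """Inverse of encode_structured: undo zlib, deltas and column split."""
    (n,) = struct.unpack_from("<I", payload)
    columns = zlib.decompress(payload[4:])
    deltas = struct.unpack_from(f"<{n}I", columns, 0)
    readings = struct.unpack_from(f"<{n}H", columns, 4 * n)
    ids, current = [], 0
    for d in deltas:
        current = (current + d) & MASK32
        ids.append(current)
    return b"".join(RECORD.pack(i, r) for i, r in zip(ids, readings))


blob = make_records(10_000)
generic = zlib.compress(blob, 9)   # compressor sees interleaved bytes
structured = encode_structured(blob)
print(len(blob), len(generic), len(structured))
```

On this synthetic input the structured encoding is smaller than plain zlib because the delta column is almost constant and the readings column is periodic, while in the interleaved layout the low byte of each id breaks up every repeated pattern. Real data will behave differently, and choosing such transformations per format is the step the text says OpenZL generates from the data description.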

Compression and decompression are performed with a single utility, “zli”, or with the libopenzl library. The data structure is described by profiles. A set of predefined profiles describing typical storage formats is already included, for example a profile for the CSV format or for data stored as an array of 64-bit numbers. To compress, choose a profile from the output of the “zli list-profiles” command and start compression with the “zli compress --profile profile_name” command. To decompress, just run “zli decompress”.
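For scripting, the commands above can be assembled from Python. The sketch below is a minimal, hedged example: only the subcommands and the profile option named in the text are used, while the profile name "csv", the file name and the placement of the input path as a trailing positional argument are assumptions that should be checked against the zli help output. The helper functions are hypothetical and run a command only if the binary is actually found on PATH.

```python
import shutil
import subprocess


def zli_list_profiles_cmd():
    """Command line for listing the predefined profiles."""
    return ["zli", "list-profiles"]


def zli_compress_cmd(profile, input_path):
    """Command line for compressing one file with a named profile.

    Assumption: the input file is passed as a trailing positional
    argument; the article does not show the full argument list.
    """
    return ["zli", "compress", "--profile", profile, input_path]


def run_if_available(cmd):
    """Run cmd and return the CompletedProcess, or None if the tool is missing."""
    if shutil.which(cmd[0]) is None:
        return None
    return subprocess.run(cmd, capture_output=True, text=True, check=False)


if __name__ == "__main__":
    result = run_if_available(zli_list_profiles_cmd())
    if result is None:
        print("zli not found on PATH")
    else:
        print(result.stdout)
    # "csv" and "data.csv" are placeholder names for illustration only.
    print(" ".join(zli_compress_cmd("csv", "data.csv")))
```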

For specific formats, you need to create your own profile with the “zli train” command, which identifies patterns in the data and generates a profile with the optimal compression ratio. With the “--pareto-frontier” option, the generated profile can be tuned to speed up compression or decompression at the cost of a lower compression ratio. To describe complex formats with nested structures and to define the layout of data within structures, the SDDL language is used.
