This is a pretty nice speed win.

The new strategy consists of stacking new candidates as if they formed a hash chain. Only when the chain actually needs to be consulted are they batch-updated, just before the match search itself starts. This is expected to be beneficial when positions are skipped, which happens a lot with the lazy strategy.

The baseline performance for btlazy2 on my laptop is:

15#calgary.tar : 3265536 -> 955985 (3.416), 7.06 MB/s , 618.0 MB/s
15#enwik7 : 10000000 -> 3067341 (3.260), 4.65 MB/s , 521.2 MB/s
15#silesia.tar : 211984896 -> 58095131 (3.649), 6.20 MB/s , 682.4 MB/s

(only level 15 remains for btlazy2, as this strategy is squeezed between lazy2 and btopt)

After this patch, and keeping all parameters identical, speed is increased by a pretty good margin (+30-50%), but compression ratio suffers a bit:

15#calgary.tar : 3265536 -> 958060 (3.408), 9.12 MB/s , 621.1 MB/s
15#enwik7 : 10000000 -> 3078318 (3.249), 6.37 MB/s , 525.1 MB/s
15#silesia.tar : 211984896 -> 58444111 (3.627), 9.89 MB/s , 680.4 MB/s

That's because I kept `1<<searchLog` as the maximum number of candidates to update. For a hash chain, this value represents the total number of candidates in the chain, while for the binary tree, it represents the maximum depth of searches. Keep in mind that a lot of candidates won't even be visited in the btree, since they are filtered out by the binary sort.

As a consequence, in the new implementation, the effective depth of the binary tree is substantially shorter. To compensate, it's enough to increase the `searchLog` value. Here is the result after adding just +1 to `searchLog` (level 15 setting in this patch):

15#calgary.tar : 3265536 -> 956311 (3.415), 8.32 MB/s , 611.4 MB/s
15#enwik7 : 10000000 -> 3067655 (3.260), 5.43 MB/s , 535.5 MB/s
15#silesia.tar : 211984896 -> 58113144 (3.648), 8.35 MB/s , 679.3 MB/s

In other words, almost the same compression ratio as before, but with a noticeable speed increase (+20-30%).

This modification makes btlazy2 more competitive.
A new round of paramgrill will be necessary to determine which levels are impacted and could adopt the new strategy.
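
To make the idea concrete, here is a minimal toy sketch in C of "stack candidates cheaply, sort them only when a search is needed". It is not the code of this patch: the `LazyTree` type, the `lt_*` functions, the fixed `MAX_POS` capacity and the use of plain integers as keys are all hypothetical names and simplifications chosen for illustration. The only element taken from the description above is the `1<<searchLog` budget on the number of candidates moved into the tree per batch update.

```c
#include <assert.h>

#define MAX_POS 64   /* toy capacity (assumption, for illustration only) */
#define NIL (-1)

typedef struct {
    int value[MAX_POS];      /* key at each position (stand-in for the input bytes) */
    int left[MAX_POS];       /* binary tree children, indexed by position */
    int right[MAX_POS];
    int chainPrev[MAX_POS];  /* link to the previous unsorted candidate */
    int root;                /* root of the sorted tree */
    int pendingHead;         /* most recent unsorted candidate */
    int nbPositions;
    int nbTreeInserts;       /* counts the (expensive) sorted insertions */
} LazyTree;

static void lt_init(LazyTree* t)
{
    t->root = NIL;
    t->pendingHead = NIL;
    t->nbPositions = 0;
    t->nbTreeInserts = 0;
}

/* O(1): stack a new candidate on the unsorted chain, no tree work. */
static int lt_push(LazyTree* t, int value)
{
    int const pos = t->nbPositions;
    if (pos >= MAX_POS) return NIL;
    t->value[pos] = value;
    t->left[pos] = t->right[pos] = NIL;
    t->chainPrev[pos] = t->pendingHead;
    t->pendingHead = pos;
    t->nbPositions++;
    return pos;
}

/* Plain binary-search-tree insertion of one position. */
static void lt_insertSorted(LazyTree* t, int pos)
{
    int* slot = &t->root;
    while (*slot != NIL)
        slot = (t->value[pos] < t->value[*slot]) ? &t->left[*slot] : &t->right[*slot];
    *slot = pos;
    t->nbTreeInserts++;
}

/* Batch update: move at most 1<<searchLog pending candidates, newest first,
 * into the tree. Older pending candidates are dropped (toy policy). */
static int lt_flush(LazyTree* t, int searchLog)
{
    int budget, inserted = 0;
    assert(searchLog >= 0 && searchLog < 30);
    budget = 1 << searchLog;
    while (t->pendingHead != NIL && inserted < budget) {
        int const pos = t->pendingHead;
        t->pendingHead = t->chainPrev[pos];
        lt_insertSorted(t, pos);
        inserted++;
    }
    t->pendingHead = NIL;
    return inserted;
}

/* The tree is only brought up to date when a search actually happens. */
static int lt_search(LazyTree* t, int value, int searchLog)
{
    int cur;
    lt_flush(t, searchLog);
    cur = t->root;
    while (cur != NIL && t->value[cur] != value)
        cur = (value < t->value[cur]) ? t->left[cur] : t->right[cur];
    return cur;   /* matching position, or NIL */
}
```

In this toy, any number of `lt_push()` calls performs no tree insertion at all; the sorting cost is paid once, inside the next `lt_search()`, and is capped by the budget. Raising `searchLog` by one doubles that cap, which mirrors the compensation described above.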
Zstandard library files
The lib directory is split into several sub-directories, in order to make it easier to select or exclude specific features.
Building
Makefile script is provided, supporting the standard set of commands,
directories, and variables (see https://www.gnu.org/prep/standards/html_node/Command-Variables.html).
- make: generates both static and dynamic libraries
- make install: install libraries in default system directories
API
Zstandard's stable API is exposed within lib/zstd.h.
Advanced API
Optional advanced features are exposed via :
- lib/common/zstd_errors.h: translates size_t function results into a ZSTD_ErrorCode, for accurate error handling.
- ZSTD_STATIC_LINKING_ONLY: if this macro is defined before including zstd.h, it unlocks access to the advanced experimental API, exposed in the second part of zstd.h. These APIs shall never be used with the dynamic library! They are not "stable", and their definition may change in the future. Only static linking is allowed.
Modular build
- Directory lib/common is always required, for all variants.
- Compression source code lies in lib/compress
- Decompression source code lies in lib/decompress
- It's possible to include only compress or only decompress, they don't depend on each other.
- lib/dictBuilder: makes it possible to generate dictionaries from a set of samples. The API is exposed in lib/dictBuilder/zdict.h. This module depends on both lib/common and lib/compress.
- lib/legacy: source code to decompress older zstd formats, starting from v0.1. This module depends on lib/common and lib/decompress. To enable this feature, it's necessary to define ZSTD_LEGACY_SUPPORT = 1 during compilation. Typically, with gcc, add argument -DZSTD_LEGACY_SUPPORT=1. Using a higher number limits the number of versions supported. For example, ZSTD_LEGACY_SUPPORT=2 means : "support legacy formats starting from v0.2+". The API is exposed in lib/legacy/zstd_legacy.h. Each version also provides a (dedicated) set of advanced API. For example, advanced API for version v0.4 is exposed in lib/legacy/zstd_v04.h.
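
The ZSTD_LEGACY_SUPPORT rule can be easy to misread, so here is a small sketch of how the value maps to supported versions. The helper names `legacyFormatEnabled` and `legacyFormatEnabledAtBuild` are hypothetical and not part of zstd; the sketch also assumes that leaving the macro undefined (modelled here as 0) means no legacy format is supported, and it does not model any upper bound on legacy versions.

```c
/* Assumption for this sketch: an undefined macro behaves like 0 (disabled). */
#ifndef ZSTD_LEGACY_SUPPORT
#  define ZSTD_LEGACY_SUPPORT 0
#endif

/* Hypothetical helper: returns 1 if legacy format v0.<minorVersion> would be
 * decodable when the library is built with ZSTD_LEGACY_SUPPORT=<legacySupport>.
 * Rule modelled: support starts at v0.<legacySupport> and covers later ones. */
static int legacyFormatEnabled(int legacySupport, int minorVersion)
{
    if (legacySupport < 1) return 0;        /* feature not enabled */
    return minorVersion >= legacySupport;   /* "starting from v0.N+" */
}

/* Same question, answered for the value this file is compiled with,
 * e.g. gcc -DZSTD_LEGACY_SUPPORT=2 ... */
static int legacyFormatEnabledAtBuild(int minorVersion)
{
    return legacyFormatEnabled(ZSTD_LEGACY_SUPPORT, minorVersion);
}
```

With this model, a build using -DZSTD_LEGACY_SUPPORT=2 would report v0.1 as unsupported and v0.4 as supported, which matches the example given in the list above.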
Multithreading support
Multithreading is disabled by default when building with make.
Enabling multithreading requires 2 conditions :
- set macro ZSTD_MULTITHREAD
- on POSIX systems : compile with pthread (-pthread compilation flag for gcc for example)
Both conditions are automatically triggered by invoking make lib-mt target.
Note that, when linking a POSIX program with a multithreaded version of libzstd,
it's necessary to pass the -pthread flag during the link stage as well.
Multithreading capabilities are exposed via :
- private API lib/compress/zstdmt_compress.h. Symbols defined in this header are currently exposed in libzstd, hence usable. Note however that this API is planned to be locked and remain strictly internal in the future.
- advanced API ZSTD_compress_generic(), defined in lib/zstd.h, experimental section. This API is still considered experimental, but is designed to be labelled "stable" at some point in the future. It's the recommended entry point for multi-threading operations.
Windows : using MinGW+MSYS to create DLL
DLL can be created using MinGW+MSYS with the make libzstd command.
This command creates dll\libzstd.dll and the import library dll\libzstd.lib.
The import library is only required with Visual C++.
The header file zstd.h and the dynamic library dll\libzstd.dll are required to
compile a project using gcc/MinGW.
The dynamic library has to be added to linking options.
It means that if a project that uses ZSTD consists of a single test-dll.c
file it should be linked with dll\libzstd.dll. For example:
gcc $(CFLAGS) -Iinclude/ test-dll.c -o test-dll dll\libzstd.dll
The compiled executable will require ZSTD DLL which is available at dll\libzstd.dll.
Deprecated API
Obsolete APIs on their way out are stored in directory lib/deprecated.
At this stage, it contains older streaming prototypes, in lib/deprecated/zbuff.h.
Presence in this directory is temporary.
These prototypes will be removed in some future version.
Consider migrating code towards supported streaming API exposed in zstd.h.
Miscellaneous
The other files are not source code. There are :
- LICENSE: contains the BSD license text
- Makefile: make script to build and install zstd library (static and dynamic)
- BUCK: support for buck build system (https://buckbuild.com/)
- libzstd.pc.in: for pkg-config (used in make install)
- README.md: this file