The following tests are included:
- Empty input scenario test.
- Workspace size and alignment tests.
- Symbol out-of-range tests.
- Cover multiple input sizes, vary permitted maximum symbol
values, and include diverse symbol distributions.
These tests verify count table correctness, maxSymbolValuePtr
updates, and error-handling paths. They also enable automated
regression testing of the core histogram logic.
When an error occurs in BMK_isSuccessful_runOutcome, the code
previously skipped the call to BMK_freeTimedFnState(tfs),
leaking the allocated tfs object.
Fixed by calling BMK_freeTimedFnState(tfs) before goto _cleanOut.
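A condensed sketch of the corrected error path (surrounding benchmark code elided; `outcome` and the timing parameters are assumed from context):
```
BMK_timedFnState_t* const tfs = BMK_createTimedFnState(total_ms, run_ms);
/* ... run the benchmarked function, producing `outcome` ... */
if (!BMK_isSuccessful_runOutcome(outcome)) {
    BMK_freeTimedFnState(tfs);   /* previously skipped: tfs leaked */
    goto _cleanOut;
}
```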
The existing scalar implementation uses a 4-way pipelined histogram
calculation which is very efficient on out-of-order CPUs. However,
this can be further accelerated using the SVE2 HISTSEG instructions -
which compute a histogram for 16 byte chunks in a vector register.
On a system with 128-bit vectors (VL128) we need 16 HISTSEG executions
to compute the histogram for the whole symbol space (0..255) of 16
bytes of input. However, we can only accumulate 15 such 16-byte strips
before possible overflow, so we need to widen and save the 8-bit
histogram accumulators to 16-bit after every 240-byte chunk of input.
To store all in registers we would need 32 128-bit registers. Longer
SVE2 vectors could help here, if such machines become available.
The maximum input block size in Zstd is 128 KiB, so 16-bit accumulators
would not be enough. However, an LZ pass precedes the histogram
calculation, so it is impossible (my assumption) to overflow the 16-bit
accumulators.
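For illustration, a portable scalar sketch of the widening cadence described above (the real implementation counts with SVE2 HISTSEG and keeps the accumulators in vector registers; all names here are illustrative):
```
#include <stddef.h>
#include <stdint.h>

/* Caller provides a zeroed hist16[256]. */
static void hist_widen(uint16_t hist16[256], const uint8_t* src, size_t len)
{
    while (len >= 240) {
        uint8_t hist8[256] = {0};
        int strip, i, s;
        /* 15 strips of 16 bytes: a symbol can occur at most 240 times
         * in this chunk, so the 8-bit counters cannot overflow */
        for (strip = 0; strip < 15; strip++, src += 16)
            for (i = 0; i < 16; i++)
                hist8[src[i]]++;
        for (s = 0; s < 256; s++)   /* widen into 16-bit accumulators */
            hist16[s] += hist8[s];
        len -= 240;
    }
    while (len--) hist16[*src++]++; /* scalar tail */
}
```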
The symbol distribution is also not uniform: the lower values are more
common, so we used a 3-pass algorithm to prevent stack spilling. In the
first pass we only compute histograms for 64 symbols (4-way SIMD) while
also computing the maximum symbol value. If we have symbol values of 64
or larger we start the second pass to compute the next 96 elements
of the histogram. The final pass calculates the remaining part of the
histogram (256 symbols in total) if needed. This split of histogram
generation gave the best overall results for performance.
This implementation is the best performing of a number of different
cache blocking schemes tested.
Compression uplifts on a Neoverse V2 system, using Zstd-1.5.8
(e26dde3d) as a baseline, compiled with "-O3 -march=armv8.2-a+sve2":
|               | Clang-20 | GCC-14  |
|---------------|----------|---------|
| 1#silesia.tar | +6.173%  | +5.987% |
| 2#silesia.tar | +5.200%  | +5.011% |
| 3#silesia.tar | +4.332%  | +5.031% |
| 4#silesia.tar | +2.789%  | +3.064% |
| 5#silesia.tar | +2.028%  | +1.838% |
| 6#silesia.tar | +1.562%  | +1.340% |
| 7#silesia.tar | +1.160%  | +0.959% |
- Split monolithic 235-line CMakeLists.txt into focused modules
- Main file reduced to 78 lines with clear section organization
- Created 5 specialized modules:
* ZstdVersion.cmake - CMake policies and version management
* ZstdOptions.cmake - Build options and platform configuration
* ZstdDependencies.cmake - External dependency management
* ZstdBuild.cmake - Build targets and validation
* ZstdPackage.cmake - Package configuration generation
Benefits:
- Improved readability and maintainability
- Better separation of concerns
- Easier debugging and modification
- Preserved 100% backward compatibility
- All existing build options and targets unchanged
The refactored build system passes all tests and maintains
identical functionality while being much easier to understand
and maintain.
- Create new .github/workflows/cmake-tests.yml with all cmake-related jobs
- Move cmake-build-and-test-check, cmake-source-directory-with-spaces, and cmake-visual-2022 jobs
- Remove cmake tests from dev-short-tests.yml to improve organization
- Maintain same trigger conditions and test configurations
- Add dedicated concurrency group for cmake tests
This separation allows cmake tests to run independently and makes
the CI configuration more modular and easier to maintain.
The FUZZ_malloc_rand() function was incorrectly always returning NULL for
zero-size allocations. The random offset generated by
FUZZ_dataProducer_int32Range() was not being added to the pointer variable,
causing the function to always return (void *)0.
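A hedged reconstruction of the fix (the real helper lives in the fuzzer sources; the exact signature, parameter order, and range are assumptions):
```
#include <stdint.h>
#include <stdlib.h>

void* FUZZ_malloc_rand(size_t size, FUZZ_dataProducer_t* producer)
{
    if (size > 0) return malloc(size);
    {   /* size == 0: return a random pointer value instead of NULL */
        uintptr_t ptr = 0;
        ptr += (uintptr_t)FUZZ_dataProducer_int32Range(producer, 0, INT32_MAX);
        /* the `ptr +=` is the fix: the offset used to be generated
         * but never added, so the function always returned (void*)0 */
        return (void*)ptr;
    }
}
```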
Replace direct returns in error-handling branches with a unified
cleanup block that frees allocated resources before returning,
improving code quality and robustness.
In main, resources were freed on the success path but not in the error path.
This change ensures all allocated resources are released before returning.
Building lz4 as root was causing `make clean` to fail with permission
errors.
We used to have to install lz4 from source back in Ubuntu 14.04, but
nowadays the installed lz4 is fine. Get rid of ancient helpers and
cruft!
There was no memory barrier between writing and reading `done`, which
would allow reordering to cause races. With so little data to handle
after each job completes, we might as well just join.
Previously, parallel_compression would only handle each job's results
after ALL jobs were successfully queued. This caused all src/dst
buffers to remain in memory until then!
It also polled to check whether a job completed, which is racy without
any memory barrier.
Now, we flush results as a side effect of completing a job. Completed
frames are placed in an ordered linked-list, and any eligible frames
are flushed. This may be zero or multiple frames, depending on the
order in which jobs finish.
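A minimal sketch of that ordered flush (types and names hypothetical):
```
#include <stdio.h>
#include <stdlib.h>

typedef struct Frame {
    size_t index;            /* position of this frame in the output */
    void* data; size_t size;
    struct Frame* next;      /* next completed frame, sorted by index */
} Frame;

/* Flush the contiguous run of completed frames starting at *nextIndex.
 * Depending on job completion order, this flushes zero or more frames. */
static void flushReady(Frame** list, size_t* nextIndex, FILE* out)
{
    while (*list && (*list)->index == *nextIndex) {
        Frame* const f = *list;
        fwrite(f->data, 1, f->size, out);
        *list = f->next;
        free(f->data); free(f);
        (*nextIndex)++;
    }
}
```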
This design also makes it simple to support streaming input, so that
is now available. Just pass `-` as the filename, and stdin/stdout will
be used for I/O.
Some of these examples are intended to be parallel, and don't make
sense to link against single-threaded libzstd.
The filenames of the mt and nomt libzstd are identical, so it's still
possible to link against the single-threaded one, just harder.
The pkg-config file has a License variable that allows you to set the license
for the software. This sets License to 'BSD-3-Clause OR GPL-2.0-only'.
Ref: https://github.com/pkgconf/pkgconf/blob/master/man/pc.5#L116
Signed-off-by: Nobuhiro Iwamatsu <iwamatsu@nigauri.org>
After the update to MacOS 15.4, the dynamic loader dyld treats duplicated LC_RPATH as an error.
The `FLAGS` variable already contains `LDFLAGS`, thus using both `FLAGS` and `LDFLAGS`
duplicates all `LDFLAGS`, including `-Wl,rpath` parameters.
The duplicate LC_RPATH causes errors like this:
```
dyld[29361]: Library not loaded: @loader_path/../lib/libzstd.1.dylib
Referenced from: <7131C877-3CF0-33AC-AA05-257BA4FDD770> /Users/foobar/...
Reason: tried: '/Users/foobar/..../lib/libzstd.1.dylib' (duplicate LC_RPATH '/usr/mypath.../lib')
```
Closes https://github.com/facebook/zstd/issues/4369
Signed-off-by: Etienne Cordonnier <ecordonnier@snap.com>
Otherwise the dev-python/zstandard build fails when compiling with
clang, as reported at https://bugs.gentoo.org/950259.
The root cause is pycparser, which remains unfixed since being
reported 2.5 years ago, :(
Signed-off-by: Z. Liu <zhixu.liu@gmail.com>
The row based match finder is slower without SIMD. We used to detect the
presence of SIMD to set the lower bound to 17, but that breaks
determinism. Instead, specifically opt into it for the kernel, because
it is one of the rare cases that doesn't have SIMD support.
checks that ZSTD_NBTHREADS triggers the expected verbose message
Also: checked that the new test script fails on current `dev` branch, and is fixed by this branch
D50949782 fixed a race condition updating `g_displayLevel` by disabling display.
Instead of disabling display, delete the global variable and always "capture" a local `displayLevel` variable.
This also fixes `DISPLAYUPDATE()` by requiring the user to pass in the last update time as the first parameter.
to reduce confusion with the term "block".
-B# remains supported for existing scripts,
but it's no longer documented, so it's effectively a hidden shortcut.
to reduce confusion with the concept of "blocks" inside a Zstandard frame.
We are now talking about "independent chunks" being produced by a `split` operation.
updated documentation accordingly.
Note: the old commands `-B#` and `--blocksize=#` remain supported,
to maintain compatibility with existing scripts.
this ABI is no longer supported by Ubuntu,
and there is a wider consensus that this ABI is on the way out,
with more and more distributions dropping it,
and lingering questions about support of x32 in the kernel.
the purpose of --ultra is to make the user explicitly opt in
to generating a very large window size (> 8 MB).
The agreement to generate a very large window size is already implicit
when selecting --long or --patch-from.
Consequently, `--ultra` is automatically enabled when `--long` or `--patch-from` is set.
this is a prototype definition error:
`_mm_storeu_si128()` should accept a `void*` pointer,
since it explicitly states that it accepts unaligned addresses
yet requiring a `__m128i*` tells otherwise, and requires the compiler to enforce this alignment.
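Concretely, this is the cast the prototype forces on every caller, even though the store itself is defined for unaligned addresses:
```
#include <emmintrin.h>

static void store16(void* dst, __m128i v)
{
    _mm_storeu_si128((__m128i*)dst, v);  /* cast demanded by the
                                            prototype, not the hardware */
}
```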
seems like a prototype interface error:
input parameter should have been `const void*`,
since the documentation is explicit that input doesn't have to be aligned,
but `const __m256i*` makes the compiler enforce it.
The `-Wl,-z,noexecstack` and `-Wa,--noexecstack` flags are already set for CMake, but not for Meson.
This brings the flags to the Meson build as well. Note that this maintains the discrepancy in behavior
between CMake and Meson when it comes to enabling ASM: on CMake, the ZSTD_HAS_NOEXECSTACK variable
is set and these flags added for GCC/Clang and MinGW. Then later, the ZSTD_HAS_NOEXECSTACK variable
is checked (along with some other conditions) to enable or disable ASM. However on Meson, this logic
is restricted to simply checking for GCC/Clang. This patch maintains this behavior; noexecstack is
dependent on GCC/Clang only.
Make the function ZSTD_compressSequencesAndLiterals() available in kernel
space. This will be used by Intel QAT driver.
Additionally, (1) expose the function ZSTD_CCtx_setParameter(), which is
required to set parameters before calling ZSTD_compressSequencesAndLiterals(),
(2) update the build process to include `compress/zstd_preSplit.o` and
(3) replace `asm/unaligned.h` with `linux/unaligned.h`.
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
technically equivalent to `size_t`,
but it's the proper type for underlying register representation.
This makes it possible to control register type, and therefore size, independently from `size_t`,
which can be useful on systems where `size_t` is 32-bit, while the architecture supports 64-bit registers.
this was previously not triggered in x86 32-bit mode,
due to a limitation in `bitstream.h` that was fixed in #4248.
Now, `bmi2` will be automatically detected and triggered
at compilation time, if the corresponding instruction set is enabled,
even in 32-bit mode.
Also: updated library documentation, to feature STATIC_BMI2 build variable
so that a human reading the test log can determine everything was fine without consulting the shell error code.
Also: made `make check` slightly shorter by moving one longer test to `make test`
do not solve the equation, even though some terms cancel each other;
this is done for clarity,
and we'll let the compiler do the resolution at compile time.
When two threads are using a WorkQueue and the reader thread exits due
to an error, it must call WorkQueue::finish() to wake up the writer
thread. Otherwise, if the queue is full and the writer thread is waiting
for a free slot, it could hang forever.
This can happen in practice when decompressing a large, corrupted file
that does not contain pzstd skippable frames.
Move towards a stronger guarantee of reproducibility by removing this small difference for machines without SSE2/Neon.
The SIMD behavior is now the default for all platforms.
The optimal parser with LDM enabled using minMatch > 3 could generate a match
length of 3 when minMatch >= 4. This is not allowed.
1. Fix the bug
2. Add validation logic to `ZSTD_buildSeqStore()` in debug mode for all block
compressors that checks we never generate too short a match. This way we don't
rely on the `generate_sequences` fuzzer to find this issue.
Credit to OSS-Fuzz
Some older Android libc implementations don't support `fseeko` or `ftello`.
This commit adds a new compile-time macro `LIBC_NO_FSEEKO` as well as a usage in CMake for old Android APIs.
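A sketch of the resulting compile-time dispatch (simplified; the macro name is from the commit, and the surrounding logic in the programs handles additional platforms):
```
#if defined(LIBC_NO_FSEEKO)
#  define LONG_SEEK fseek    /* old Android libc: fseeko is unavailable */
#else
#  define LONG_SEEK fseeko
#endif
```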
Summary:
Issue reported by @ryandesign and @MarcusCalhoun-Lopez.
CMake doesn't support spaces in flags. This caused older versions of gcc to
ignore the unknown flag "-z noexecstack" on MacOS since it was interpreted as a
single flag, not two separate flags. Then, during compilation it was treated as
"-z" "noexecstack", which was correctly forwarded to the linker. But the MacOS
linker does not support `-z noexecstack` so compilation failed.
The fix is to use `-Wl,-z,noexecstack`. This is never misinterpreted by a
compiler. However, not all compilers support this syntax to forward flags to the
linker. To fix this issue, we check if all the relevant `noexecstack` flags have
been successfully set, and if they haven't we disable assembly.
See also PR#4056 and PR#4061. I decided to go a different route because this is
simpler. It might not successfully set these flags on some compilers, but in that
case it also disables assembly, so they aren't required.
Test Plan:
```
mkdir build-cmake
cmake ../build/cmake/CMakeLists.txt
make -j
```
See that the linker flag is successfully detected & that assembly is enabled.
Run the same commands on MacOS which doesn't support `-Wl,-z,noexecstack` and see
that everything compiles and that `LD_FLAG_WL_Z_NOEXECSTACK` and
`ZSTD_HAS_NOEXECSTACK` are both false.
which creates more "realistic" scenarios than the former compressible noise.
The legacy data generator remains accessible,
it is triggered when requesting an explicit compressibility factor (-P#).
since it's a type name.
Note: in contrast with previous renames, this one is on the public API side,
so there is a #define, ensuring that existing programs using ZSTD_sequenceFormat_e still work.
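The resulting pattern (sketched; see zstd.h for the exact declaration):
```
typedef enum {
    ZSTD_sf_noBlockDelimiters = 0,
    ZSTD_sf_explicitBlockDelimiters = 1
} ZSTD_SequenceFormat_e;

#define ZSTD_sequenceFormat_e ZSTD_SequenceFormat_e   /* legacy spelling */
```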
Do some include shuffling for `**.h` files within lib, programs, tests, and zlibWrapper.
`lib/legacy` and `lib/deprecated` are untouched.
`#include`s within `extern "C"` blocks in .cpp files are untouched.
todo: shuffling for `xxhash.h`
* Change CLI to employ multithreading by default
* Document changes to benchmarking, print number of threads for display level >= 4, and add lower bound of 1 for the default number of threads
CMake warns on the current minimum requirement (3.5). Update to 3.10.
This means support is still available for the default on Ubuntu 18.04, which
exited LTS standard in April of 2023.
[draft]
following notification from @rainbowball.
fix #4186.
Note: there is currently no QNX compilation test in CI
so this is a "blind" fix,
and this target can be silently broken again in the future.
instead of just for blind split.
This is in anticipation of adversarial input,
that would intentionally target the sampling pattern of the split detector.
Note that, even without this protection, splitting can never expand beyond ZSTD_COMPRESSBOUND(),
because this upper limit uses a 1KB block size worst case scenario,
and splitting never creates blocks that small.
The protection is more to ensure that data is not expanded by more than 3 bytes per 128 KB full block,
which is a much stricter limit.
this helps make the streaming behavior more consistent,
since it no longer depends on having more data presented on the input.
suggested by @terrelln
so that it can be stored using standard alignment requirement (sizeof(void*)).
Distance function still requires 64-bit signed multiplication though,
so it won't change the issue regarding the bug in ubsan for clang 32-bit on github ci.
that samples 1 in 5 positions.
This variant is fast enough for lazy2 and btlazy2,
but it's less good in combination with post-splitter at higher levels (>= btopt).
update the man page in troff format,
and the README with latest `--help` content and complementary details about benchmark mode.
also: display level 0 when doing decompression benchmark
After a regrettable update,
the benchmark module ended up reloading sources for every compression level.
While the delay itself is likely tolerable,
the main issue is that the `--quiet` mode now also displays a loading summary between each compression line.
This wasn't the original intention, which is to produce a compact view of all compressions.
This is fixed in this version,
where sources are loaded only once, for all compression levels,
and loading summary is only displayed once.
The compression ratio benefits are small but consistent, i.e. always positive.
On `silesia.tar` corpus, this modification saves ~75 KB at level 3.
The measured speed cost is negligible, i.e. below noise level, between 0 and -1%.
The -Werror=missing-profile flag caused thread/zlib/lzma/lz4 detection
to fail when building with profile-use, so ZSTD_MULTITHREAD etc. were not
defined for profile-use builds. This produced many profile-mismatch messages
in the output, and the final binary sometimes fails at run time as below:
Error : ZSTD_CCtx_setParameter(ctx, ZSTD_c_nbWorkers, adv->nbWorkers) failed : Unsupported parameter
Signed-off-by: Xionghu Luo <xionghuluo@tencent.com>
Sanity checks on a few of the context parameters (i.e. workers and block size)
may prompt an early return from ZSTD_generateSequences.
Allocating the destination buffer past those return points avoids a potential
memory leak.
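The pattern in isolation (illustrative, not the literal zstd code):
```
#include <stdlib.h>

int generate(int nbWorkers, size_t blockSize, void** out)
{
    if (nbWorkers < 0) return -1;       /* early returns: nothing allocated yet */
    if (blockSize > 128 * 1024) return -1;
    *out = malloc(blockSize);           /* allocated only past the checks */
    return (*out != NULL) ? 0 : -1;
}
```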
This patch should fix issue #4112.
The NDK cross compiler declares the target as __linux (which is
not technically incorrect), which triggers the enablement of _GNU_SOURCE
in the newly added code that requires the presence of qsort_r() used
in the COVER dictionary code.
Even though the NDK uses llvm/libc, it doesn't declare qsort_r()
in the stdlib.h header.
The build fix is to only activate the _GNU_SOURCE macro if the OS is
*not* Android, as then we will fall back to the C90-compliant code.
This patch should solve the reported issue number #4103.
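A sketch of the guard (simplified):
```
#if defined(__linux__) && !defined(__ANDROID__)
#  define _GNU_SOURCE    /* exposes qsort_r() in <stdlib.h> */
#endif
#include <stdlib.h>
```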
- add option to force a specific block type
- add option to force a specific literal type
- add option to generate only frame headers
- add option to skip generating magic numbers
Co-authored-by: Maciej Dudek <mdudek@antmicro.com>
Co-authored-by: Pawel Czarnecki <pczarnecki@antmicro.com>
Co-authored-by: Robert Winkler <rwinkler@antmicro.com>
Co-authored-by: Roman Dobrodii <rdobrodii@antmicro.com>
Commit f4dbfce79cb2b82fb496fcd2518ecd3315051b7d ("define LIB_SRCDIR
and LIB_BINDIR") significantly reworked the build logic, but in its
introduction of LIB_BINDIR a typo was made.
It was introduced as such:
+LIB_SRCDIR ?= $(dir $(realpath $(lastword $(MAKEFILE_LIST))))
+LIB_BINDIR ?= $(LIBSRC_DIR)
But the definition of LIB_BINDIR has a typo: it should use
$(LIB_SRCDIR) not $(LIBSRC_DIR).
Due to this, $(LIB_BINDIR) is empty, therefore in programs/Makefile,
-L$(LIB_BINDIR) is expanded to just -L, and consequently when trying
to link the "zstd" binary with the libzstd library, it cannot find it:
host/lib/gcc/powerpc64-buildroot-linux-gnu/13.3.0/../../../../powerpc64-buildroot-linux-gnu/bin/ld: cannot find -lzstd: No such file or directory
This commit fixes the build by fixing this typo.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Using '--single-thread' with '--patch-from' on compression levels above 15 will lead to significantly worse compression ratios.
Corrected the man page to no longer suggest doing this.
The two main functions used for dictionary training using the COVER
algorithm require initialization of a COVER_ctx_t where a call
to qsort() is performed.
The issue is that the standard C99 qsort() function doesn't offer
a way to pass an extra parameter for the comparison function callback
(e.g. a pointer to a context) and currently zstd relies on a *global*
static variable to hold a pointer to a context needed to perform
the sort operation.
If a zstd library user invokes either ZDICT_trainFromBuffer_cover or
ZDICT_optimizeTrainFromBuffer_cover from multiple threads, the
global context may be overwritten before/during the call/execution of qsort()
in the initialization of the COVER_ctx_t, thus leading to crashes
and other bad things (TM) as reported on issue #4045.
Enter qsort_r(): it was designed to address precisely this situation.
To quote from the documentation [1]: "the comparison function does not need to
use global variables to pass through arbitrary arguments, and is therefore
reentrant and safe to use in threads."
It is available with small variations for multiple OSes (GNU, BSD[2],
Windows[3]), and the ISO C11 [4] standard features qsort_s() in Annex B.21
as part of <stdlib.h>. Let's hope that compilers eventually catch up
with it.
For now, we have to handle the small variations in function parameters
for each platform.
The current fix solves the problem by allowing each executing thread to
pass its own COVER_ctx_t instance to qsort_r(), removing the use of
a global pointer and allowing the code to be reentrant.
Unfortunately for *BSD, we cannot leverage qsort_r() given that its API
has changed on newer versions of FreeBSD (14.0) and the other BSD variants
(e.g. NetBSD, OpenBSD) don't implement it.
For such cases we provide a fallback that will work only requiring support
for compilers implementing support for C90.
[1] https://man7.org/linux/man-pages/man3/qsort_r.3.html
[2] https://man.freebsd.org/cgi/man.cgi?query=qsort_r
[3] https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/qsort-s?view=msvc-170
[4] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf
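For illustration, a sketch of the per-platform dispatch (simplified; the comparator functions are assumed to exist with the matching per-platform signatures):
```
static void sortWithCtx(void* base, size_t nmemb, size_t size, void* ctx)
{
#if defined(__GLIBC__)
    qsort_r(base, nmemb, size, cmp_gnu, ctx);   /* context passed last */
#elif defined(_MSC_VER)
    qsort_s(base, nmemb, size, cmp_msvc, ctx);  /* context last; cmp receives it first */
#elif defined(__APPLE__)
    qsort_r(base, nmemb, size, ctx, cmp_bsd);   /* context before cmp */
#else
    qsort(base, nmemb, size, cmp_plain);        /* C90 fallback, no context */
#endif
}
```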
- switched the pattern and input of $filter into the right places
- added pattern wildcards to MSYS_NT & CYGWIN_NT as they change with Windows versions
- correctly identify MSYS2, even in an env like MINGW64
As reported by Ben Hawkes in #4026, a failure to allocate a zstd context
would lead to a dereference of a NULL pointer due to a missing check
on the returned result of ZSTDv06_createDCtx().
This patch fixes the issue by adding a check for a valid returned pointer.
Meson always returns -pthread in dependency('threads') on non-MSVC
compilers. On Windows we use Windows threading primitives, so we don't
need this. Avoid adding -pthread to libzstd's link flags, either as a
Meson subproject or via pkg-config Libs.private, so the application
doesn't inadvertently depend on winpthreads.
Add a Meson MinGW cross-compile CI test that checks for this. It turns
out that pzstd fails to build in that environment, so have the test
skip building contrib for now.
The multi-threaded static library requires -pthread to link and work correctly.
The pkg-config file we provide, though, only works with the dynamic multi-threaded
library, and won't provide the correct libs and cflags values if lib-mt is used.
To handle this, introduce an env variable MT to permit advanced users to
install and generate a correct pkg-config file for lib-mt, or to detect if
the lib-mt target is called.
With the MT env variable set when calling make install-pc, libzstd.pc is
generated as a pkg-config file for a multi-threaded static library.
On calling make lib-mt, a libzstd.pc is generated for a multi-threaded
static library, since that is what the user explicitly asked for.
libzstd.pc is changed to PHONY to force its regeneration when calling
lib targets or install-pc, to handle the case where the same directory is
used for mixed compilation.
This was noticed while migrating from the meson to the make build system:
meson generates a correct .pc file while make doesn't.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Just after a clone I'm getting this:
~/zstd/zlibWrapper$ cc -c zstd_zlibwrapper.o gz*.c -lz -lzstd -DSTDC
gzwrite.c: In function ‘gz_write’:
gzwrite.c:226:43: error: ‘z_uInt’ undeclared (first use in this
function); did you mean ‘uInt’?
226 | state.state->strm.avail_in = (z_uInt)n;
| ^~~~~~
| uInt
gzwrite.c:226:43: note: each undeclared identifier is reported only
once for each function it appears in
gzwrite.c:226:50: error: expected ‘;’ before ‘n’
226 | state.state->strm.avail_in = (z_uInt)n;
| ^
| ;
z_uInt is never used directly, zconf.h redefines uInt to z_uInt under
the condition that Z_PREFIX is set. All examples use uInt, and the type
of avail_in is also uInt.
In this commit I modify the cast to refer to the same type as the type
of lvalue.
Arguably, the real fix here is to handle possible overflows, but that's
beyond the scope of this commit.
This was causing OSS-Fuzz errors, due to compiler differences.
* Fix the issue
* Also turn off -Werror so we don't fail fuzzer builds for warnings
* Turn on -Werror in our CI
This function was seriously flawed:
* It didn't do output bounds checks
* It produced invalid sequences when an uncompressed or RLE block was emitted
* It produced invalid sequences when the block splitter was enabled
* It produced invalid sequences when ZSTD_c_targetCBlockSize was enabled
I've attempted to fix these issues, but this function is just a bad idea,
so I've marked it as deprecated and unsafe. We should replace it with
`ZSTD_extractSequences()` which operates on a compressed frame.
Fails on errors when building fuzzers with `fuzz.py` (adds `Werror`).
Currently allows `declaration-after-statement`, `c++-compat` and
`deprecated` as they are abundant in code (some fixes to
`declaration-after-statement` are presented in this commit).
Fixes 2 issues in `simple_decompress.c`:
1. Wrong type used for storing the results of `ZSTD_findDecompressedSize` resulting in never matching to `ZSTD_CONTENTSIZE_ERROR` or `ZSTD_CONTENTSIZE_UNKNOWN`.
2. Experimental API is used (`ZSTD_findDecompressedSize`) without defining `ZSTD_STATIC_LINKING_ONLY`.
Document that the `ZSTD_BUILD_{SHARED,STATIC}` take precedence over `BUILD_SHARED_LIBS` when exactly one is ON.
Thanks to @teo-tsirpanis for pointing out the potentially confusing behavior.
Doing this check with a direct c++ snippet is prone to portability problems:
- \043 is not portable between shells: dash expands it to #,
bash does not;
- using # directly works with make 4.3 but does not with make 4.2.
Let's just use the c++ version that covers both the code and the gtest.
* Make a variable `PublicHeaders` for Zstd's public headers
* Add `PublicHeaders` to `Headers`, which was missing
* Only export `${LIBRARY_DIR}` publicly, not `common/`
* Switch the `target_include_directories()` to `INTERFACE` because zstd uses relative includes internally, so doesn't need any include directories to build
* Switch installation to use the `PublicHeaders` variable, and test that the right headers are installed
If both `ZSTD_BUILD_SHARED` and `ZSTD_BUILD_STATIC` are set, then cmake exports the libraries `libzstd_shared` and `libzstd_static` only.
It does not export `libzstd`, which is only exported when exactly one of `ZSTD_BUILD_SHARED` and `ZSTD_BUILD_STATIC` is set.
This PR exports `libzstd` in that case, based on the value of the standard CMake variable [`BUILD_SHARED_LIBS`](https://cmake.org/cmake/help/latest/variable/BUILD_SHARED_LIBS.html).
This ensures that `libzstd` can always be used to refer to the exported zstd library, since the build errors if neither `ZSTD_BUILD_SHARED` nor `ZSTD_BUILD_STATIC` are set.
I tested all the possible combinations of `ZSTD_BUILD_SHARED`, `ZSTD_BUILD_STATIC`, and `BUILD_SHARED_LIBS` and they always worked as expected:
* If only exactly one of `ZSTD_BUILD_SHARED` and `ZSTD_BUILD_STATIC` is set, that is used as `libzstd`.
* Otherwise, libzstd is set based on `BUILD_SHARED_LIBS`.
Fixes #3859.
This feature has demonstrated itself to be useful in web compression and we
want to encourage other folks to use it. But we currently make it difficult
to do so since it's locked away in the experimental API.
The API itself is really straightforward and I think it's fine to commit to
maintaining support / compatibility for this API even if in the future the
underlying implementation may continue to evolve.
Note that this commit changes its enum name and also its numeric value. Users
who respected the instructions of using the experimental API should be fine
with both of these changes, since they should only have referred to it by the
provided name.
Conceivably someone could have done bad feature detection of this capability
by doing `#ifdef ZSTD_c_targetCBlockSize` which will now return false since
it's no longer a macro... but I think that's an acceptable hypothetical
breakage.
Mark that `huf_decompress_amd64.S` supports BTI and PAC, which it trivially does because it is empty for aarch64.
The issue only requested BTI markings, but it also makes sense to mark PAC, which is the only other feature.
Also, add a test for this mode to the ARM64 QEMU test run. Before this PR it warns on `huf_decompress_amd64.S`, after it doesn't.
Fixes Issue #3841.
This doesn't affect most of the targets, but will help me sleep better at night knowing that future refactors won't break the legacy support.
Should have been included in https://github.com/facebook/zstd/pull/3943 but I noticed after that merged, so putting up a separate PR.
Fixes a bug in AsyncIO where we queue reads after opening a file so our queue will always be saturated (or as saturated as possible).
Previous code was looping up to `availableJobsCount`, not realizing `availableJobsCount` was also decreasing in each iteration, so instead of queueing 10 jobs we'd queue 5 (and instead of 2 we'd queue 1).
This PR fixes the loop to queue as long as `availableJobsCount` is not 0.
only disable `--rm` at end of command line parsing,
so that `-c` only disables `--rm` if it's effectively selected,
and not if it's overridden by a later `-o FILE` command.
is correctly detected as corrupted by the new version,
and is accepted (changed into offset==1) by older versions.
Updated documentation accordingly, with a hexadecimal representation.
in this new method, when an `offset==0` is detected,
it's converted into (size_t)(-1), instead of 1.
The logic is that (size_t)(-1) is effectively an extremely large positive number,
which will not pass the offset distance test at next stage (`execSequence()`).
Checked the source code, and offset is always checked (as it should),
using a formula which is not vulnerable to arithmetic overflow:
```
RETURN_ERROR_IF(sequence.offset > (size_t)(oLitEnd - virtualStart),
```
The benefit is that such a case (offset==0) is always detected as corrupted data
as opposed to relying on the checksum to detect the error.
LLU is a correct prefix according to C99 & C11 standards (but not C90).
However, older versions of Visual Studio do not work with it.
Replace by ULL, which doesn't have this issue.
Fixes https://github.com/facebook/zstd/issues/3647
and only defines a sub-block boundary when
it believes that it is compressible.
It's effectively an optimization,
avoiding a compression cycle to reach the same conclusion.
note that the size of individual compressed blocks will vary more wildly with this modification.
But it seems good enough for a first test, and fixes the speed regression issue.
Further refinements can be attempted later.
using real latin sentences from Cicero.
Compression ratio is lower again, closer to "real" text;
now level 6 is way better than level 4.
level 5 is still lower than level 4,
but at least it's now higher than level 3.
makes compression a bit worse,
hence a bit more comparable with real text (though still too easy to compress).
level 6 is now stronger than level 4, by a hair.
However, there is still a ratio dip at level 5.
While it's not always strictly a win,
it's a win for files that see a noticeable compression ratio increase,
while it's a very small noise for other files.
The downside is that this patch is less efficient for 32-bit arrays of integers
than the previous patch, which was introducing losses for other files;
but it's still a net improvement on this scenario.
helps when litlen==1 is cheaper than litlen==0.
Works great on pathological arr[u32] examples,
but doesn't generalize well on other files.
silesia/x-ray is among the most negatively affected ones.
this works great for 32-bit arrays,
notably the synthetic ones, with extreme regularity.
Unfortunately, it's not universal,
and in some cases, it's a loss.
Crucially, on average, it's a loss on silesia.
The most negatively impacted file is x-ray.
It deserves an investigation before suggesting it as an evolution.
note: we probably don't want to maintain VS2008 solution anymore.
Its successor VS2010 is > 10 years old,
which is more or less the limit after which we can stop supporting old compilers.
this generator replaces the statistical generator
for the general case when no statistic is requested.
Generated data features a compression level speed / ratio curve
which is more in line with expectation.
replaced `\+` by `*`.
`\+` means `[1-N]`,
while `*` means `[0-N]`,
so it's not strictly equivalent
but `\+` happens to be badly supported on some flavors of grep,
and for the purpose of these tests, `*` is good enough.
This avoids downloading -- and periodically bumping the checksum for --
a third-party action that isn't strictly required, and thus helps keep
down dependencies and reduce update churn.
* Add ZSTD_CCtxParams_registerSequenceProducer() to public API
* add unit test
* add docs to zstd.h
* nits
* Add ZSTDLIB_STATIC_API prefix
* Add asserts
It cannot be an alias, because it would lock the package to use either static or shared libraries at its build time. We want to decide this at the time `find_package` is called.
If the relevant allocation returns NULL, ZSTD_createCDict_advanced_internal()
will return NULL. But ZSTD_createCDict_advanced2() doesn't check for
this and attempts to use the returned pointer anyway, which leads to
a segfault.
The patch adds validation to ensure that the last field, which is
reserved, is all-zeroes in ZSTD_decodeSeqHeaders. This prevents
potential corruption from going undetected.
Fixes an issue where corrupted input could lead to undefined behavior
due to improper validation of reserved bits.
Signed-off-by: aimuz <mr.imuz@gmail.com>
This PR introduces no functional changes. It attempts to change all
macros currently using `{ }` or some variant of that to
`do { } while (0)`, and introduces trailing `;` where necessary.
There were no bugs found during this migration.
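An example of the migration, and the `if/else` hazard that motivates the idiom:
```
/* before: brace-only body; the `;` after the macro closes the `if`,
 * so a following `else` becomes orphaned and fails to compile */
#define CHECK_OLD(c) { if (!(c)) return 1; }

/* after: behaves like a single statement in every context */
#define CHECK_NEW(c) do { if (!(c)) return 1; } while (0)

int f(int x)
{
    if (x) CHECK_NEW(x > 0); else return 2;  /* invalid with CHECK_OLD */
    return 0;
}
```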
The Visual Studio bug of warning on this construct has been fixed since VS2015.
Additionally, we have several instances of `do { } while (0)` which have
been present for several releases, so we don't have to worry about
breaking people's builds.
Fixes Issue #3830.
`HUF_DecompressFastArgs_init()` was adding 0 to NULL. Fix it by exiting
early for empty outputs. This is no change in behavior, because the
function was already returning 0 in this case, just slightly later.
* Rename `ilimit` to `ilowest` and set it equal to `src` instead of
`src + 6 + 8`. This is safe because the fast decoding loops guarantee
to never read below `ilowest` already. This allows the fast decoder to
run for at least two more iterations, because it consumes at most 7
bytes per iteration.
* Continue the fast loop all the way until the number of safe iterations
is 0. Initially, I thought that when it got towards the end, the
computation of how many iterations are safe might become expensive. But
it ends up being slower to have to decode each of the 4 streams
individually, which makes sense.
This drastically speeds up the Huffman decoder on the `github` dataset
for the issue raised in #3762, measured with `zstd -b1e1r github/`.
| Decoder | Speed before | Speed after |
|----------|--------------|-------------|
| Fallback | 477 MB/s | 477 MB/s |
| Fast C | 384 MB/s | 492 MB/s |
| Assembly | 385 MB/s | 501 MB/s |
We can also look at the speed delta for different block sizes of silesia
using `zstd -b1e1r silesia.tar -B#`.
| Decoder | -B1K ∆ | -B2K ∆ | -B4K ∆ | -B8K ∆ | -B16K ∆ | -B32K ∆ | -B64K ∆ | -B128K ∆ |
|----------|--------|--------|--------|--------|---------|---------|---------|----------|
| Fast C | +11.2% | +8.2% | +6.1% | +4.4% | +2.7% | +1.5% | +0.6% | +0.2% |
| Assembly | +12.5% | +9.0% | +6.2% | +3.6% | +1.5% | +0.7% | +0.2% | +0.03% |
gcc in the linux kernel was not unrolling the inner loops of the Huffman
decoder, which was destroying decoding performance. The compiler was
generating crazy code with all sorts of branches. I suspect because of
Spectre mitigations, but I'm not certain. Once the loops were manually
unrolled, performance was restored.
Additionally, when gcc couldn't prove that the variable left shift in
the 4X2 decode loop wasn't greater than 63, it inserted checks to verify
it. To fix this, mask `entry.nbBits & 0x3F`, which allows gcc to elide
this check. This is a no-op, because `entry.nbBits` is guaranteed to be
less than 64.
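The masking trick in isolation:
```
#include <stdint.h>

/* With `& 0x3F` the compiler can prove the shift amount is < 64 and
 * drop its runtime guard; the result is unchanged because nbBits is
 * already guaranteed to be less than 64. */
static uint64_t shiftLeft(uint64_t bitContainer, uint32_t nbBits)
{
    return bitContainer << (nbBits & 0x3F);
}
```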
Lastly, introduce the `HUF_DISABLE_FAST_DECODE` macro to disable the
fast C loops for Issue #3762. So if even after this change, there is a
performance regression, users can opt-out at compile time.
ZSTD_resetDStream() is deprecated and replaced by ZSTD_DCtx_reset().
This removes deprecation warnings from the kernel build.
This change is a no-op, see the docs suggesting this replacement.
fcbf2fde9a/lib/zstd.h (L2655-L2663)
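The replacement, per the linked documentation:
```
/* before (deprecated): */
ZSTD_resetDStream(zds);
/* after (equivalent):  */
ZSTD_DCtx_reset(zds, ZSTD_reset_session_only);
```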
More recent versions of CMake emit the following warning:
CMake Deprecation Warning at cmake/CMakeLists.txt:10 (cmake_minimum_required):
Compatibility with CMake < 3.5 will be removed from a future version of
CMake.
Update the VERSION argument <min> value or use a ...<max> suffix to tell
CMake that the project does not need compatibility with older versions.
the flexArray in the structure FSE_DecompressWksp
is just a way to derive a pointer easily,
without risk/complexity of calculating it manually.
Not sure if this change is good enough to avoid ubsan warnings though.
within ZSTDMT_.
This pattern is flagged by less forgiving variants of ubsan
notably used during compilation of the Linux Kernel.
There are 2 other places in the code where this pattern is used.
This fixes just one of them.
* Remove all pointer-overflow suppressions from our UBSAN builds/tests.
* Add `ZSTD_ALLOW_POINTER_OVERFLOW_ATTR` macro to suppress
pointer-overflow at a per-function level. This is a superior approach
because it also applies to users who build zstd with UBSAN.
* Add `ZSTD_wrappedPtr{Diff,Add,Sub}()` that use these suppressions.
The end goal is to only tag these functions with
`ZSTD_ALLOW_POINTER_OVERFLOW`. But we can start by annotating functions
that rely on pointer overflow, and gradually transition to using
these.
* Add `ZSTD_maybeNullPtrAdd()` to simplify pointer addition when the
  pointer may be `NULL` (a sketch follows this list).
* Fix all the fuzzer issues that came up. I'm sure there will be a lot
more, but these are the ones that came up within a few minutes of
running the fuzzers, and while running GitHub CI.
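The sketch promised above for `ZSTD_maybeNullPtrAdd()` (simplified from the commit):
```
#include <stddef.h>

static void* ZSTD_maybeNullPtrAdd(void* ptr, ptrdiff_t add)
{
    /* only perform the addition when add > 0, so NULL + 0 is never computed */
    return add > 0 ? (void*)((char*)ptr + add) : ptr;
}
```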
To the best of my knowledge:
* `_WIN32` and `_WIN64` are defined by the compiler,
* `WIN32` and `WIN64` are defined by the user, to indicate whatever
the user chooses them to indicate. They mean 32-bit and 64-bit Windows
compilation by convention only.
See:
https://accu.org/journals/overload/24/132/wilson_2223/
Windows compilers in general, and MSVC in particular, have been defining
`_WIN32` and `_WIN64` for a long time, provably at least since Visual Studio
2015, and in practice as early as in the days of 16-bit Windows.
See:
https://learn.microsoft.com/en-us/cpp/preprocessor/predefined-macros?view=msvc-140
https://learn.microsoft.com/en-us/windows/win32/winprog64/the-tools
Tests used to be inconsistent, sometimes testing `_WIN32`, sometimes
`_WIN32` and `WIN32`. This brings consistency to Windows detection.
Refine the macro guards to define the functions exactly when they are
needed.
This fixes the chromium build with zstd.
Thanks to @GregTho for reporting!
The Huffman repeat mode checker assumed that the CTable was zeroed in the region `[maxSymbolValue + 1, 256)`.
This assumption didn't hold for tables built in the dictionaries, because it didn't go through the same codepath.
Since this code was originally written, we added a header to the CTable that specifies the `tableLog`.
Add `maxSymbolValue` to that header, and check that the table's `maxSymbolValue` is at least the block's `maxSymbolValue`.
This solution is cleaner because we write this header for every CTable we build, so it can't be missed in any code path.
Credit to OSS-Fuzz
Fix the following warnings reported by the compiler when
ZDICTLIB_STATIC_API is not defined to ZDICTLIB_API:
lib/dictBuilder/cover.c:1122:21: warning: redeclaration of 'ZDICT_optimizeTrainFromBuffer_cover' with different visibility (old visibility
preserved)
lib/dictBuilder/cover.c:736:21: warning: redeclaration of 'ZDICT_trainFromBuffer_cover' with different visibility (old visibility
preserved)
lib/dictBuilder/fastcover.c:549:1: warning: redeclaration of 'ZDICT_trainFromBuffer_fastCover' with different visibility (old visibility
preserved)
lib/dictBuilder/fastcover.c:618:1: warning: redeclaration of 'ZDICT_optimizeTrainFromBuffer_fastCover' with different visibility (old
visibility preserved)
We already have logic in our Huffman encoder to validate Huffman tables with missing symbols.
We use this for higher compression levels to re-use the previous block's statistics, or when the dictionary's table has zero-weighted symbols.
This check was leftover as an oversight from before we added validation for Huffman tables.
I validated that the `dictionary_loader` fuzzer has coverage of every line in the `ZSTD_loadCEntropy()` function to validate that it is correctly testing this function.
MSAN is hooked into the system malloc, but when the user provides a custom
allocator, it may not provide the same cleansing behavior. So if we leave
memory poisoned and return it to the user's allocator, where it is re-used
elsewhere, our poisoning can blow up in some other context.
They are Linux-like environments under Windows and have all the tools needed to support staged installation and testing.
Beware: this only affects the make build system.
For some reason when LTO is enabled, the compiler complains about the statbuf variable not being correctly initialized, even though the variable has an assert != NULL just a few lines below (FIO_getDictFileStat)
This is the fixed build failure:
x86_64-linux-gnu-gcc -g -O2 -ffile-prefix-map=/<<PKGBUILDDIR>>=. -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fdebug-prefix-map=/<<PKGBUILDDIR>>=/usr/src/libzstd-1.5.5+dfsg2-1 -Wall -Wextra -Wcast-qual -Wcast-align -Wshadow -Wstrict-aliasing=1 -Wswitch-enum -Wdeclaration-after-statement -Wstrict-prototypes -Wundef -Wpointer-arith -Wvla -Wformat=2 -Winit-self -Wfloat-equal -Wwrite-strings -Wredundant-decls -Wmissing-prototypes -Wc++-compat -g -Werror -Wa,--noexecstack -Wdate-time -D_FORTIFY_SOURCE=2 -DXXH_NAMESPACE=ZSTD_ -DDEBUGLEVEL=1 -DZSTD_LEGACY_SUPPORT=5 -DZSTD_MULTITHREAD -DZSTD_GZCOMPRESS -DZSTD_GZDECOMPRESS -DZSTD_LZMACOMPRESS -DZSTD_LZMADECOMPRESS -DZSTD_LZ4COMPRESS -DZSTD_LZ4DECOMPRESS -DZSTD_LEGACY_SUPPORT=5 -c -MT obj/conf_086c46a51a716b674719b8acb8484eb8/zstdcli_trace.o -MMD -MP -MF obj/conf_086c46a51a716b674719b8acb8484eb8/zstdcli_trace.d -o obj/conf_086c46a51a716b674719b8acb8484eb8/zstdcli_trace.o zstdcli_trace.c
In function ‘UTIL_isRegularFileStat’,
inlined from ‘UTIL_getFileSizeStat’ at util.c:524:10,
inlined from ‘FIO_createDResources’ at fileio.c:2230:30:
util.c:209:12: error: ‘statbuf.st_mode’ may be used uninitialized [-Werror=maybe-uninitialized]
209 | return S_ISREG(statbuf->st_mode) != 0;
| ^
fileio.c: In function ‘FIO_createDResources’:
fileio.c:2223:12: note: ‘statbuf’ declared here
2223 | stat_t statbuf;
| ^
lto1: all warnings being treated as errors
The sequence section starts with a number, which tells how many sequences are present in the section.
If this number is 0, the section automatically ends.
The number 0 can be represented using either the 1-byte or the 2-byte format,
because the 2-byte format fully overlaps the 1-byte format.
However, when 0 is represented using the 2-byte format,
the decoder was expecting the sequence section to continue,
and was looking for FSE tables, which is incorrect.
Fixed this behavior, in both the reference decoder and the educational decoder.
In practice, this behavior never happens,
because the encoder will always select the 1-byte format to represent 0,
since this is more efficient.
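For reference, a sketch of the sequence-count decoding per the specification (simplified, no bounds checks); the fix makes nbSeq == 0 end the section regardless of which format it arrived in:
```
#include <stddef.h>

/* returns the number of header bytes consumed */
static size_t decodeNbSeq(const unsigned char* ip, size_t* nbSeq)
{
    if (ip[0] < 128) { *nbSeq = ip[0]; return 1; }                     /* 1-byte format */
    if (ip[0] < 255) { *nbSeq = ((size_t)(ip[0] - 128) << 8) + ip[1]; return 2; } /* 2-byte */
    *nbSeq = ip[1] + ((size_t)ip[2] << 8) + 0x7F00; return 3;          /* 3-byte format */
}
```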
Completed the fix with a new golden sample for tests,
a clarification of the specification,
and a decoder errata paragraph.
and in `decodecorpus`:
the specific case `nbSeq=127` can be represented using the 1-byte format.
Note that both the 1-byte and the 2-byte formats are valid to represent this case,
so there was no "error": the produced data remains valid,
it's just that the 1-byte format is more efficient.
fix #3667
Credit to @ip7z for finding this issue.
When forcing the source file language to `C`, Xcode enforces
the file to be compiled as `C` by appending `-x c` to the
compiler command line.
For now try to limit the damage and only enforce the language
if the ASM and C compilers differ.
Reproducer (CMake `3.26.4`, Xcode `14.3`):
```
cmake -S build/cmake -B _b -GXcode -DCMAKE_OSX_ARCHITECTURES=x86_64
cmake --build _b
```
Fix: #3622
Reduces memory when blocks are guaranteed to be smaller than allowed by
the format. This is useful for streaming compression in conjunction with
ZSTD_c_maxBlockSize.
This PR saves 2 * (formatMaxBlockSize - paramMaxBlockSize) when streaming.
Once it is rebased on top of PR #3616 it will save
3 * (formatMaxBlockSize - paramMaxBlockSize).
The split literals buffer patch increased streaming decompression memory
by 64KB (shrunk lit buffer from 128KB to 64KB, and added 128KB). This
patch removes the added 128KB buffer, because it isn't necessary.
The buffer was there because the literals compression code didn't know
the true `blockSizeMax` of the frame, and always put split literals so
they ended 128KB - 32 from the beginning of the block. Instead, we can
pass down the true `blockSizeMax` and ensure that the split literals
end up at `blockSizeMax - 32` from the beginning of the block. We
already reserve a full `blockSizeMax` bytes in streaming mode, so we
won't be overwriting the extDict window.
for compressed blocks of size exactly 128 KB
which used to be disallowed by the spec
but have become allowed in more recent versions of the spec.
While this limitation is fixed in decoders v1.5.4+,
implementers should refrain from generating such block with their custom encoder
as they could be misclassified as corrupted by older decoder versions.
When `ZSTD_c_maxBlockSize` is set, we weren't computing the
decompression margin correctly, leading to `dstSize_tooSmall` errors.
Fix that computation.
This is just a bug in the fuzzer, not a bug in the library itself.
Credit to OSS-Fuzz
detected by @terrelln,
these issues could be triggered in specific scenarios,
namely decompression of certain invalid magic-less frames,
or requested properties from certain invalid skippable frames.
* add check for valid dest buffer and fuzz on random dest ptr when malloc 0
* add uptrval to linux-kernel
* remove bin files
* get rid of uptrval
* restrict max pointer value check to platforms where sizeof(size_t) == sizeof(void*)
decompression features automatic support of sparse files,
aka a form of "compression" where entire blocks consist only of zeroes.
This only works for some compatible file systems (like ext4),
others simply ignore it (like afs).
Triggering this feature relies on `fseek()`.
But `fseek()` is not compatible with non-seekable devices, such as pipes.
Therefore it's disabled for pipes.
However, there are other objects which are not compatible with `fseek()`, such as block devices.
Changed the logic, so that `fseek()` (and therefore sparse write) is only automatically enabled on regular files.
Note that this automatic behavior can always be overridden by explicit commands `--sparse` and `--no-sparse`.
fix #3583
This does the following:
1. Compress test data into multiple frames
2. Perform a series of small decompressions and seeks forward, checking
that compressed data wasn't reread unnecessarily.
3. Perform some seeks forward and backward to ensure correctness.
When decompressing a seekable file, if seeking forward within
a frame (by issuing multiple ZSTD_seekable_decompress calls
with a small gap between them), the frame will be unnecessarily
reread from the beginning. This patch makes it continue using
the current frame data and simply skip over the unneeded bytes.
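A minimal sketch of the skip-forward logic (all names hypothetical; the real change lives in the seekable-format code):
```
typedef struct {
    unsigned long long decodedOffset;  /* bytes decoded so far in the frame */
    /* ... codec state ... */
} DecodeState;

/* hypothetical: decode up to `len` bytes into `dst`,
 * returning the number of bytes produced (0 at end of frame) */
size_t decodeSome(DecodeState* ds, void* dst, size_t len);

static int skipForward(DecodeState* ds, unsigned long long targetOffset)
{
    char scratch[4096];
    while (ds->decodedOffset < targetOffset) {
        unsigned long long const remaining = targetOffset - ds->decodedOffset;
        size_t const toSkip = remaining < sizeof(scratch)
                            ? (size_t)remaining : sizeof(scratch);
        size_t const got = decodeSome(ds, scratch, toSkip);
        if (got == 0) return -1;   /* unexpected end of frame */
        ds->decodedOffset += got;
    }
    return 0;
}
```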
Looking at the __builtin_expect in ZSTD_decodeSequence:
{ size_t offset;
#if defined(__clang__)
if (LIKELY(ofBits > 1)) {
#else
if (ofBits > 1) {
#endif
ZSTD_STATIC_ASSERT(ZSTD_lo_isLongOffset == 1);
From profile-annotated assembly, the probability of ofBits > 1 is about 75%
(101k counts out of 135k counts). This is much smaller than the recommended
likelihood to use __builtin_expect which is 99%. As a result, clang moved the
else block further away which hurts cache locality. Removing this
__builtin_expect along with two others in ZSTD_decodeSequence gave better
performance when PGO is enabled. I suggest removing these branch hints and
relying on PGO, which leverages runtime profiles from actual workloads to
calculate branch probability instead.
Inlining `BIT_reloadDStream` provided >3% decompression speed improvement for
clang PGO-optimized zstd binary, measured using the Silesia corpus with
compression level 1. The win comes from improved register allocation which leads
to fewer spills and reloads. Take a look at this comparison of
profile-annotated hot assembly before and after this change:
https://www.diffchecker.com/UjDGIyLz/. The diff is a bit messy, but notice three
fewer moves after inlining.
In general LLVM's register allocator works better when it can see more code. For
example, when the register allocator sees a call instruction, it partitions the
registers into caller registers and callee registers, and it is not free to do
whatever it wants with all the registers for the current function. Inlining the
callee lets the register allocator access all registers and use them more
flexibly.
Rather than remove the flag entirely, as proposed in #3499, this commit uses
the newest C++ standard the compiler supports. This retains the selection of
using only standardized features (excluding GNU extensions) and keeps the
recency requirements of the codebase explicit.
Tested with various versions of `g++` and `clang++`.
This fixes compilation with clang-cl on Windows. There
is a bug in cmake so that check_linker_flag() doesn't give
the correct result when using link.exe/lld-link.exe.
Details in CMake's gitlab: https://gitlab.kitware.com/cmake/cmake/-/issues/22023
Fixes #3522
Every 256 bytes the lazy match finders process without finding a match,
they will increase their step size by 1. So for bytes [0, 256) they search
every position, for bytes [256, 512) they search every other position,
and so on. However, they currently still insert every position into
their hash tables. This is different from fast & dfast, which only
insert the positions they search.
This PR changes that, so now after we've searched 2KB without finding
any matches, at which point we'll only be searching one in 9 positions,
we'll stop inserting every position, and only insert the positions we
search. The exact cutoff of 2KB isn't terribly important, I've just
selected a cutoff that is reasonably large, to minimize the impact on
"normal" data.
This PR only adds skipping to greedy, lazy, and lazy2, but does not
touch btlazy2.
| Dataset | Level | Compiler | CSize ∆ | Speed ∆ |
|---------|-------|--------------|---------|---------|
| Random | 5 | clang-14.0.6 | 0.0% | +704% |
| Random | 5 | gcc-12.2.0 | 0.0% | +670% |
| Random | 7 | clang-14.0.6 | 0.0% | +679% |
| Random | 7 | gcc-12.2.0 | 0.0% | +657% |
| Random | 12 | clang-14.0.6 | 0.0% | +1355% |
| Random | 12 | gcc-12.2.0 | 0.0% | +1331% |
| Silesia | 5 | clang-14.0.6 | +0.002% | +0.35% |
| Silesia | 5 | gcc-12.2.0 | +0.002% | +2.45% |
| Silesia | 7 | clang-14.0.6 | +0.001% | -1.40% |
| Silesia | 7 | gcc-12.2.0 | +0.007% | +0.13% |
| Silesia | 12 | clang-14.0.6 | +0.011% | +22.70% |
| Silesia | 12 | gcc-12.2.0 | +0.011% | -6.68% |
| Enwik8 | 5 | clang-14.0.6 | 0.0% | -1.02% |
| Enwik8 | 5 | gcc-12.2.0 | 0.0% | +0.34% |
| Enwik8 | 7 | clang-14.0.6 | 0.0% | -1.22% |
| Enwik8 | 7 | gcc-12.2.0 | 0.0% | -0.72% |
| Enwik8 | 12 | clang-14.0.6 | 0.0% | +26.19% |
| Enwik8 | 12 | gcc-12.2.0 | 0.0% | -5.70% |
The speed difference for clang at level 12 is real, but is probably
caused by some sort of alignment or codegen issues. clang is
significantly slower than gcc before this PR, but gets up to parity with
it.
I also measured the ratio difference for the HC match finder, and it
looks basically the same as the row-based match finder. The speedup on
random data looks similar. And performance is about neutral, without the
big difference at level 12 for either clang or gcc.
In Python 3.x, a single element of a bytes array is returned as
an integer number. Thus, NEWLINE is an int variable, and attempting
to add it to the line array will fail with a type mismatch error
that may be demonstrated as follows:
[roam@straylight ~]$ python3 -c 'b"hello" + b"\n"[0]'
Traceback (most recent call last):
File "<string>", line 1, in <module>
TypeError: can't concat int to bytes
[roam@straylight ~]$
* Mark all bufferless and block level functions as deprecated
* Update documentation to suggest not using these functions
* Add `_deprecated()` wrappers for functions that we use internally and
call those instead
* patch-from speed optimization: only load portion of dictionary into normal matchfinders
* test regression for x8 multiplier
* fix off-by-one error for bit shift bound
* restrict patchfrom speed optimization to strategy < ZSTD_btultra
* update results.csv
* update regression test
Part 2 of #3528
Adds hash salt that helps to avoid regressions where consecutive compressions use the same tag space with similar data (running zstd -b5e7 enwik8 -B128K reproduces this regression).
- Adds memory type that is guaranteed to have been initialized at least once in the workspace's lifetime.
- Changes tag space in row hash to be based on init once memory.
#3543 decreases the size of the tagTable by a factor of 2, which requires using the first tag position in each row for head position instead of a tag.
Although position 0 stopped being a valid match, it still persisted in the mask calculation, resulting in the match loops possibly terminating before they should have. The fix skips position 0 to solve this problem.
Allocate half the memory for tag space, which means that we get one less slot for an actual tag (needs to be used for next position index).
The result is a slight loss in compression ratio (up to 0.2%) and some regressions/improvements to speed depending on level and sample. In turn, we get to save 16% of the hash table's space (5 bytes per entry instead of 6 bytes per entry).
As reported by @P-E-Meunier in https://github.com/facebook/zstd/issues/2662#issuecomment-1443836186,
seekable format ingestion speed can be particularly slow
when the selected `FRAME_SIZE` is very small,
especially in combination with the recent row_hash compression mode.
The specific scenario mentioned was `pijul`,
using frame sizes of 256 bytes and level 10.
This is improved in this PR,
by providing approximate parameter adaptation to the compression process.
Tested locally on a M1 laptop,
ingestion of `enwik8` using `pijul` parameters
went from 35sec. (before this PR) to 2.5sec (with this PR).
For the specific corner case of a file full of zeroes,
this is even more pronounced, going from 45sec. to 0.5sec.
These benefits are unrelated to (and come on top of) other improvement efforts currently being made by @yoniko for the row_hash compression method specifically.
The `seekable_compress` test program has been updated to allow setting the compression level,
in order to produce these performance results.
Current timeout is too small for some slower machines, e.g. most modern riscv64 boards,
where tests fail with the following diagnostics:
Traceback (most recent call last):
File "/usr/src/RPM/BUILD/zstd-1.5.4-alt2/tests/./cli-tests/run.py", line 734, in <module>
success = run_tests(tests, opts)
File "/usr/src/RPM/BUILD/zstd-1.5.4-alt2/tests/./cli-tests/run.py", line 601, in run_tests
tests[test_case.name] = test_case.run()
File "/usr/src/RPM/BUILD/zstd-1.5.4-alt2/tests/./cli-tests/run.py", line 285, in run
return self.analyze()
File "/usr/src/RPM/BUILD/zstd-1.5.4-alt2/tests/./cli-tests/run.py", line 275, in analyze
self._join_test()
File "/usr/src/RPM/BUILD/zstd-1.5.4-alt2/tests/./cli-tests/run.py", line 330, in _join_test
(stdout, stderr) = self._test_process.communicate(timeout=self._opts.timeout)
File "/usr/lib64/python3.10/subprocess.py", line 1154, in communicate
stdout, stderr = self._communicate(input, endtime, timeout)
File "/usr/lib64/python3.10/subprocess.py", line 2006, in _communicate
self._check_timeout(endtime, orig_timeout, stdout, stderr)
File "/usr/lib64/python3.10/subprocess.py", line 1198, in _check_timeout
raise TimeoutExpired(
subprocess.TimeoutExpired: Command '['/usr/src/RPM/BUILD/zstd-1.5.4-alt2/tests/cli-tests/compression/window-resize.sh']' timed out after 60 seconds
* Add ZSTD_setFParams() and ZSTD_setParams()
* Modify ZSTD_setCParams() to use ZSTD_setParameter() to avoid a second code path for setting parameters (usage sketched below)
* Add unit tests
* Update documentation to suggest using them in place of the deprecated functions
Fixes #3396.
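A hedged usage sketch of the single-path idea these helpers rely on: routing every field through `ZSTD_CCtx_setParameter()` keeps one validation path for compression and frame parameters alike.

```c
#include <zstd.h>

static void applyAllParams(ZSTD_CCtx* cctx)
{
    /* compression parameter */
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 5);
    /* frame parameters */
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_checksumFlag, 1);
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_contentSizeFlag, 1);
}
```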
- Initializes clevel in `ZSTD_CCtxParams_init`
- Adds CI workflow for msan fuzzers runs without optimization (`-O0`)
- Fixes Makefile to correctly pass on user-defined `MOREFLAGS` and `FUZZER_FLAGS` in cases where they have been overwritten
The block splitter confuses sequences with literal length == 65536 that use a
repeat offset code. It interprets this as literal length == 0 when deciding the
meaning of the repeat offset, and corrupts the repeat offset history. This is
benign, merely causing suboptimal compression performance, if the confused
history is flushed before the end of the block, e.g. if there are 3 consecutive
non-repeat code sequences after the mistake. It is also only triggered if the
block splitter decided to split the block.
All that to say: This is a rare bug, and requires quite a few conditions to
trigger. However, the good news is that if you have a way to validate that the
decompressed data is correct, e.g. you've enabled zstd's checksum or have a
checksum elsewhere, the original data is very likely recoverable. So if you were
affected by this bug please reach out.
The fix is to remind the block splitter that the literal length is actually 64K.
The test case is a bit tricky to set up, but I've managed to reproduce the issue.
Thanks to @danlark1 for alerting us to the issue and providing us a reproducer!
fix #3500
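For context, a hedged sketch of the repeat-offset convention involved (the helper is illustrative, not zstd's internal code): a literal length of 65536 truncated into a 16-bit field reads back as 0, which shifts the repcode interpretation exactly as described above.

```c
#include <stdint.h>

/* Offset codes 1-3 select from the recent-offset history; when the
 * literal length is 0 the indices shift by one, and code 3 means
 * rep[0] - 1 instead of rep[2]. */
static uint32_t resolveRepeatOffset(uint32_t offCode, uint32_t litLength,
                                    const uint32_t rep[3])
{
    uint32_t const idx = offCode - 1 + (litLength == 0);
    return (idx == 3) ? rep[0] - 1 : rep[idx];
}
```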
CMake 3.18 or later was required by #3392, because it uses
`CheckLinkerFlag`. But requiring CMake 3.18 or later is a bit
aggressive, since Ubuntu 20.04 LTS still ships CMake 3.16.3:
https://packages.ubuntu.com/search?keywords=cmake
This change disables the `-z noexecstack` check with old CMake. This will
not break any existing users, because users who need `-z noexecstack`
must already be using CMake 3.18 or later.
Implemented a CI workflow for testing compilation with and without external compressors. This serves as a sanity check against code dependencies on libraries that may not always be present. (Reference: #3497 for a related bug fix.)
`Bytef` and `uInt` are zlib types, and are not available when zlib is disabled
Fixes: 1598e6c634ac ("Async write for decompression")
Fixes: cc0657f27d81 ("AsyncIO compression part 2 - added async read and asyncio to compression code (#3022)")
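A hedged sketch of the portable pattern (the guard macro and type aliases are illustrative): confine zlib's typedefs to zlib-guarded code and use standard C types elsewhere, so the file still compiles when zlib is disabled.

```c
#ifdef HAVE_ZLIB                  /* illustrative guard name */
#  include <zlib.h>
typedef Bytef  IO_byte;           /* zlib available: alias its types */
typedef uInt   IO_uint;
#else
typedef unsigned char IO_byte;    /* portable fallbacks */
typedef unsigned int  IO_uint;
#endif
```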
* Fixes zstd-dll build (https://github.com/facebook/zstd/issues/3492):
- Adds pool.o and threading.o dependency to the zstd-dll target
- Moves custom allocation functions into a header to avoid needing a dependency on common.o (sketched after this list)
- Adds test target for zstd-dll
- Adds a github workflow that builds zstd-dll
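A hedged sketch of the header-based move (names hypothetical, not zstd's actual helpers): `static` definitions in a header give every translation unit its own copy, so the zstd-dll target no longer needs to link common.o just for these functions.

```c
/* alloc_sketch.h -- illustrative only */
#include <stdlib.h>

static void* Alloc_customMalloc(size_t size) { return malloc(size); }
static void  Alloc_customFree(void* ptr)     { free(ptr); }
```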
* fix and test MSVC AVX2 build
* treat msbuild warnings as errors
* fix incorrect MSVC 2019 compiler warning
* fix MSVC error D9035: option 'Gm' has been deprecated and will be removed in a future release
We need to run it for the tests, even if the programs are disabled. So if
they are disabled, create a build rule for the program, but don't
install it; just make it available for the test itself.
It depends on the zstd program being built, and passes it as an env
variable, just like datagen. But for datagen we explicitly depend on
it, while for zstd we assume it's built as part of "all".
This can be wrong in two cases:
- when running individual tests, meson can (re)build just what is needed
for that one test
- a later patch will handle building zstd, but not by default
Note that the `fd` is only valid while the file is still open, so we need to
move the setting calls to before we close the file. However, we cannot do so
with the `utime()` call (even though `futimens()` exists), because the
following `close()` call on the `fd` will reset the atime of the file. So it
seems the `utime()` call has to happen after the file is closed.
Somewhat surprisingly, calling `fchmod()` is non-trivially faster than calling
`chmod()`, and so on.
This commit introduces alternate variants of some common file util functions
that take an optional fd. If present, they call the `f`-variant of the
underlying function; otherwise, they fall back to the regular filename-taking
version of the function (sketched below).
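A hedged sketch of the pattern (names hypothetical, not zstd's actual helpers): prefer the fd-taking `f`-variant when an fd is available, since it skips pathname resolution, and fall back to the path-taking version otherwise.

```c
#include <sys/stat.h>
#include <sys/types.h>

static int UTIL_chmod_opt(const char* filename, int fd, mode_t mode)
{
    if (fd >= 0)
        return fchmod(fd, mode);   /* faster: no pathname resolution */
    return chmod(filename, mode);  /* fallback when no fd is open */
}
```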
| gpg --dearmor | sudo tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null
# add signed entry to apt sources and configure the APT client to use Intel repository:
echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list
$(MAKE) test CC=clang MOREFLAGS="-g -fsanitize=memory -fno-omit-frame-pointer -Werror" HAVE_LZMA=0  # datagen.c fails this test for no obvious reason
$(MAKE) test CC=clang MOREFLAGS="-g -fsanitize=memory -fno-omit-frame-pointer -Werror $(MOREFLAGS)" HAVE_LZMA=0  # datagen.c fails this test for no obvious reason
@@ -5,7 +5,7 @@ targeting real-time compression scenarios at zlib-level and better compression ratios
It's backed by a very fast entropy stage, provided by [Huff0 and FSE library](https://github.com/Cyan4973/FiniteStateEntropy).
Zstandard's format is stable and documented in [RFC8878](https://datatracker.ietf.org/doc/html/rfc8878). Multiple independent implementations are already available.
This repository represents the reference implementation, provided as an open-source dual [BSD](LICENSE) OR [GPLv2](COPYING) licensed **C** library,
and a command line utility producing and decoding `.zst`, `.gz`, `.xz` and `.lz4` files.
Should your project require another programming language,
a list of known ports and bindings is provided on [Zstandard homepage](https://facebook.github.io/zstd/#other-languages).
@@ -13,15 +13,12 @@ a list of known ports and bindings is provided on [Zstandard homepage](https://facebook.github.io/zstd/#other-languages)
**Development branch status:**
[![Build Status][travisDevBadge]][travisLink]
[![Build status][AppveyorDevBadge]][AppveyorLink]
[![Build status][CircleDevBadge]][CircleLink]
[![Build status][CirrusDevBadge]][CirrusLink]
[![Fuzzing Status][OSSFuzzBadge]][OSSFuzzLink]
[travisDevBadge]: https://api.travis-ci.com/facebook/zstd.svg?branch=dev "Continuous Integration test suite"
[travisLink]: https://travis-ci.com/facebook/zstd
[AppveyorDevBadge]: https://ci.appveyor.com/api/projects/status/xt38wbdxjk5mrbem/branch/dev?svg=true "Windows test suite"
The zstd Conan recipe is kept up to date by Conan maintainers and community contributors.
If the version is out of date, please [create an issue or pull request](https://github.com/conan-io/conan-center-index) on the ConanCenterIndex repository.
### Visual Studio (Windows)
Going into the `build` directory, you will find additional possibilities:
@@ -189,6 +208,10 @@ Going into `build` directory, you will find additional possibilities:
You can build the zstd binary via buck by executing: `buck build programs:zstd` from the root of the repo.
The output binary will be in `buck-out/gen/programs/`.
### Bazel
You can easily integrate zstd into your Bazel project by using the module hosted on the [Bazel Central Repository](https://registry.bazel.build/modules/zstd).
## Testing
You can run quick local smoke tests by running `make check`.
@@ -204,7 +227,7 @@ Zstandard is considered safe for production environments.
## License
Zstandard is dual-licensed under [BSD](LICENSE) OR [GPLv2](COPYING).
Please do not open GitHub issues or pull requests - this makes the problem immediately visible to everyone, including malicious actors. Security issues in this open source project can be safely reported via the Meta Bug Bounty program:
https://www.facebook.com/whitehat
Meta's security team will triage your report and determine whether or not it is eligible for a bounty under our program.
# Receiving Vulnerability Notifications
In the case that a significant security vulnerability is reported to us or discovered by us, without it being publicly known, we will, at our discretion, notify high-profile, high-exposure users of Zstandard ahead of our public disclosure of the issue and associated fix.
If you believe your project would benefit from inclusion in this list, please reach out to one of the maintainers.
<!-- Note to maintainers: this list is kept [here](https://fburl.com/wiki/cgc1l62x). -->
SET ADDITIONALPARAM=/p:LibraryPath="C:\Program Files\Microsoft SDKs\Windows\v7.1\lib\x64;c:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\lib\amd64;C:\Program Files (x86)\Microsoft Visual Studio 10.0\;C:\Program Files (x86)\Microsoft Visual Studio 10.0\lib\amd64;"
)
build_script:
- ECHO Building %COMPILER% %PLATFORM% %CONFIGURATION%
It's generally recommended to use a CMake version newer than 3.14 for [iOS-derived platforms](https://cmake.org/cmake/help/latest/manual/cmake-toolchains.7.html#id27).
[Looking for a 'cmake clean' command to clear up CMake output](https://stackoverflow.com/questions/9680420/looking-for-a-cmake-clean-command-to-clear-up-cmake-output)
message(WARNING "BUILD_SHARED_LIBS is OFF, but ZSTD_BUILD_SHARED is ON and ZSTD_BUILD_STATIC is OFF, which takes precedence, so libzstd is a shared library")
message(WARNING "BUILD_SHARED_LIBS is ON, but ZSTD_BUILD_SHARED is OFF and ZSTD_BUILD_STATIC is ON, which takes precedence, so libzstd is a static library")
**Example Frame**: see zstd/tests/golden-decompression/block-128k.zst
The zstd decoder incorrectly rejected blocks of type `Compressed_Block` when their size was exactly 128 KB.
Note that `128 KB - 1` was accepted, and `128 KB + 1` is forbidden by the spec.
This type of block was never generated by the reference compressor.
These blocks used to be disallowed by the spec up until spec version 0.3.2, when the restriction was lifted by [PR#1689](https://github.com/facebook/zstd/pull/1689).
> A Compressed_Block has the extra restriction that Block_Size is always strictly less than the decompressed size. If this condition cannot be respected, the block must be sent uncompressed instead (Raw_Block).
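A hedged sketch of the corrected bound (the helper is hypothetical; the real check lives in the decoder's block-header parsing): a `Compressed_Block` of exactly `ZSTD_BLOCKSIZE_MAX` must be accepted, and only strictly larger blocks rejected.

```c
#include <stddef.h>
#include <zstd.h>   /* for ZSTD_BLOCKSIZE_MAX (128 KiB) */

static int compressedBlockSizeIsValid(size_t cBlockSize)
{
    /* using '<' here reproduces the rejected-at-exactly-128KB bug */
    return cBlockSize <= ZSTD_BLOCKSIZE_MAX;
}
```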
Compressed block with 0 literals and 0 sequences
------------------------------------------------
@@ -31,6 +72,7 @@ Additionally, these blocks were disallowed by the spec up until spec version 0.3.2
> A Compressed_Block has the extra restriction that Block_Size is always strictly less than the decompressed size. If this condition cannot be respected, the block must be sent uncompressed instead (Raw_Block).
# macOS linker doesn't support -soname, and uses a different extension
# see : https://developer.apple.com/library/mac/documentation/DeveloperTools/Conceptual/DynamicLibraries/100-Articles/DynamicLibraryDesignGuidelines.html