The following tests are included:
- Empty input scenario test.
- Workspace size and alignment tests.
- Symbol out-of-range tests.
- Cover multiple input sizes, vary permitted maximum symbol
values, and include diverse symbol distributions.
These tests verify count table correctness, maxSymbolValuePtr
updates, and error-handling paths, enabling automated regression
testing of the core histogram logic.
When an error occurs in BMK_isSuccessful_runOutcome, the code
previously skipped the call to BMK_freeTimedFnState(tfs),
leaking the allocated tfs object.
Fixed by calling BMK_freeTimedFnState(tfs) before the goto _cleanOut.
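A minimal sketch of the corrected error branch (surrounding context assumed):

```
if (!BMK_isSuccessful_runOutcome(outcome)) {
    /* error path: report, then clean up */
    BMK_freeTimedFnState(tfs);   /* previously missing: tfs leaked on this path */
    goto _cleanOut;
}
```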
The existing scalar implementation uses a 4-way pipelined histogram
calculation which is very efficient on out-of-order CPUs. However,
this can be further accelerated using the SVE2 HISTSEG instructions,
which compute a histogram of a 16-byte chunk in a vector register.
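For reference, a minimal C sketch of the 4-way pipelined scalar approach (illustrative, not the actual implementation): four partial tables break the store-to-load dependency between consecutive bytes, which is what lets out-of-order CPUs overlap the updates.

```
#include <stddef.h>

static void hist_4way(unsigned count[256], const unsigned char* src, size_t size)
{
    unsigned c0[256] = {0}, c1[256] = {0}, c2[256] = {0}, c3[256] = {0};
    size_t i = 0;
    for (; i + 4 <= size; i += 4) {   /* 4 independent accumulators per iteration */
        c0[src[i+0]]++;
        c1[src[i+1]]++;
        c2[src[i+2]]++;
        c3[src[i+3]]++;
    }
    for (; i < size; i++) c0[src[i]]++;   /* tail */
    for (i = 0; i < 256; i++) count[i] = c0[i] + c1[i] + c2[i] + c3[i];
}
```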
On a system with 128-bit vectors (VL128), we need 16 HISTSEG executions
to compute the histogram of a 16-byte input chunk over the whole symbol
space (0..255). However, we can only accumulate 15 such 16-byte strips
before the 8-bit counters could overflow, so the 8-bit histogram
accumulators must be widened and saved to 16-bit after every 240-byte
chunk of input. To keep everything in registers we would need 32
128-bit registers. Longer SVE2 vectors could help here, if such
machines become available.
The maximum input block size in Zstd is 128 KiB, so 16-bit accumulators
would not be enough in general. However, an LZ pass precedes the
histogram calculation, so it is impossible (my assumption) to overflow
the 16-bit accumulators.
The symbol distribution is also not uniform: lower values are more
common. We therefore use a 3-pass algorithm to prevent stack spilling.
The first pass only computes histograms for 64 symbols (4-way SIMD)
while also computing the maximum symbol value. If there are symbol
values larger than 64, a second pass computes the next 96 elements of
the histogram, and a final pass calculates the remaining part of the
histogram (256 symbols in total) if needed. This split of histogram
generation gave the best overall results for performance.
This implementation is the best performing of a number of different
cache blocking schemes tested.
Compression uplifts on a Neoverse V2 system, using Zstd-1.5.8
(e26dde3d) as a baseline, compiled with "-O3 -march=armv8.2-a+sve2":
Clang-20 GCC-14
1#silesia.tar: +6.173% +5.987%
2#silesia.tar: +5.200% +5.011%
3#silesia.tar: +4.332% +5.031%
4#silesia.tar: +2.789% +3.064%
5#silesia.tar: +2.028% +1.838%
6#silesia.tar: +1.562% +1.340%
7#silesia.tar: +1.160% +0.959%
- Split monolithic 235-line CMakeLists.txt into focused modules
- Main file reduced to 78 lines with clear section organization
- Created 5 specialized modules:
* ZstdVersion.cmake - CMake policies and version management
* ZstdOptions.cmake - Build options and platform configuration
* ZstdDependencies.cmake - External dependency management
* ZstdBuild.cmake - Build targets and validation
* ZstdPackage.cmake - Package configuration generation
Benefits:
- Improved readability and maintainability
- Better separation of concerns
- Easier debugging and modification
- Preserved 100% backward compatibility
- All existing build options and targets unchanged
The refactored build system passes all tests and maintains
identical functionality while being much easier to understand
and maintain.
- Create new .github/workflows/cmake-tests.yml with all cmake-related jobs
- Move cmake-build-and-test-check, cmake-source-directory-with-spaces, and cmake-visual-2022 jobs
- Remove cmake tests from dev-short-tests.yml to improve organization
- Maintain same trigger conditions and test configurations
- Add dedicated concurrency group for cmake tests
This separation allows cmake tests to run independently and makes
the CI configuration more modular and easier to maintain.
The FUZZ_malloc_rand() function was incorrectly always returning NULL for
zero-size allocations. The random offset generated by
FUZZ_dataProducer_int32Range() was not being added to the pointer variable,
causing the function to always return (void *)0.
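A sketch of the intended behavior (simplified; the real helper may differ in details):

```
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>
#include "fuzz_helpers.h"   /* FUZZ_dataProducer_*, FUZZ_ASSERT */

void* FUZZ_malloc_rand(size_t size, FUZZ_dataProducer_t* producer)
{
    if (size > 0) {
        void* const mem = malloc(size);
        FUZZ_ASSERT(mem);
        return mem;
    }
    /* zero-size: return a random non-NULL pointer that must never be
     * dereferenced. The bug: the offset below was generated but never
     * added to ptr, so the function always returned (void*)0. */
    {   uintptr_t ptr = 0;
        ptr += (uintptr_t)FUZZ_dataProducer_int32Range(producer, 1, INT_MAX);
        return (void*)ptr;
    }
}
```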
Replace direct returns in error-handling branches with a unified
cleanup block that frees allocated resources before returning,
improving code quality and robustness.
In main, resources were freed on the success path but not in the error path.
This change ensures all allocated resources are released before returning.
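The pattern, as a standalone sketch (names hypothetical):

```
#include <stdio.h>
#include <stdlib.h>

static int process(const char* path)
{
    int ret = 1;                       /* assume failure until the end */
    FILE* f = NULL;
    char* buf = NULL;
    f = fopen(path, "rb");
    if (f == NULL) goto cleanup;
    buf = malloc(1 << 16);
    if (buf == NULL) goto cleanup;     /* previously: return 1; leaking f */
    /* ... do the actual work with f and buf ... */
    ret = 0;
cleanup:
    free(buf);                         /* free(NULL) is a no-op */
    if (f != NULL) fclose(f);
    return ret;
}
```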
Building lz4 as root was causing `make clean` to fail with permission
errors.
We used to have to install lz4 from source back in Ubuntu 14.04, but
nowadays the installed lz4 is fine. Get rid of ancient helpers and
cruft!
There was no memory barrier between writing and reading `done`, which
would allow reordering to cause races. With so little data to handle
after each job completes, we might as well just join.
Previously, parallel_compression would only handle each job's results
after ALL jobs were successfully queued. This caused all src/dst
buffers to remain in memory until then!
It also polled to check whether a job completed, which is racy without
any memory barrier.
Now, we flush results as a side effect of completing a job. Completed
frames are placed in an ordered linked-list, and any eligible frames
are flushed. This may be zero or multiple frames, depending on the
order in which jobs finish.
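A condensed sketch of the ordered flush (names hypothetical, error handling omitted):

```
#include <stdio.h>
#include <stdlib.h>

typedef struct frame_s { size_t idx; void* data; size_t size; struct frame_s* next; } frame_t;

static frame_t* done_list = NULL;     /* completed frames, sorted by idx */
static size_t   next_to_flush = 0;    /* index of the next frame owed to the output */

static void mark_done(frame_t* f)     /* insert in frame order */
{
    frame_t** p = &done_list;
    while (*p != NULL && (*p)->idx < f->idx) p = &(*p)->next;
    f->next = *p; *p = f;
}

static void flush_eligible(FILE* out) /* flushes zero or more frames */
{
    while (done_list != NULL && done_list->idx == next_to_flush) {
        frame_t* const f = done_list;
        fwrite(f->data, 1, f->size, out);
        done_list = f->next;
        free(f->data); free(f);
        next_to_flush++;
    }
}
```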
This design also makes it simple to support streaming input, so that
is now available. Just pass `-` as the filename, and stdin/stdout will
be used for I/O.
Some of these examples are intended to be parallel, and don't make
sense to link against single-threaded libzstd.
The filenames of the mt and nomt libzstd are identical, so it's still
possible to link against the single-threaded one, just harder.
The pkg-config file format has a License variable that allows setting the license of
the software. This sets License to 'BSD-3-Clause OR GPL-2.0-only'.
Ref: https://github.com/pkgconf/pkgconf/blob/master/man/pc.5#L116
Signed-off-by: Nobuhiro Iwamatsu <iwamatsu@nigauri.org>
After the update to MacOS 15.4, the dynamic loader dyld treats a duplicated LC_RPATH as an error.
The `FLAGS` variable already contains `LDFLAGS`, thus using both `FLAGS` and `LDFLAGS`
duplicates all `LDFLAGS`, including `-Wl,-rpath` parameters.
The duplicate LC_RPATH causes this kind of error:
```
dyld[29361]: Library not loaded: @loader_path/../lib/libzstd.1.dylib
Referenced from: <7131C877-3CF0-33AC-AA05-257BA4FDD770> /Users/foobar/...
Reason: tried: '/Users/foobar/..../lib/libzstd.1.dylib' (duplicate LC_RPATH '/usr/mypath.../lib')
```
Closes https://github.com/facebook/zstd/issues/4369
Signed-off-by: Etienne Cordonnier <ecordonnier@snap.com>
Otherwise the dev-python/zstandard build fails when compiling with
clang, as reported at https://bugs.gentoo.org/950259.
The root cause is pycparser, which remains unfixed 2.5 years after
being reported. :(
Signed-off-by: Z. Liu <zhixu.liu@gmail.com>
The row based match finder is slower without SIMD. We used to detect the
presence of SIMD to set the lower bound to 17, but that breaks
determinism. Instead, specifically opt into it for the kernel, because
it is one of the rare cases that doesn't have SIMD support.
checks that ZSTD_NBTHREADS triggers the expected verbose message
Also: checked that the new test script fails on current `dev` branch, and is fixed by this branch
D50949782 fixed a race condition updating `g_displayLevel` by disabling display.
Instead of disabling display, delete the global variable and always "capture" a local `displayLevel` variable.
This also fixes `DISPLAYUPDATE()` by requiring the user to pass in the last update time as the first parameter.
to reduce confusion with the term "block".
-B# remains supported for existing scripts,
but it's no longer documented, so it's effectively a hidden shortcut.
to reduce confusion with the concept of "blocks" inside a Zstandard frame.
We are now talking about "independent chunks" being produced by a `split` operation.
updated documentation accordingly.
Note: old commands `-B#` and `--blocksize=#` remain supported,
to maintain compatibility with existing scripts.
this ABI is no longer supported by Ubuntu,
and there is a wider consensus that this ABI is on the way out,
with more and more distributions dropping it,
and lingering questions about support of x32 in the kernel.
the purpose of --ultra is to make the user explicitly opt in
to generating very large window sizes (> 8 MB).
The agreement to generate very large window sizes is already implicit
when selecting --long or --patch-from.
Consequently, `--ultra` is automatically enabled when `--long` or `--patch-from` is set.
this is a prototype definition error:
`_mm_storeu_si128()` should accept a `void*` pointer,
since it explicitly states that it accepts unaligned addresses,
yet requiring a `__m128i*` suggests otherwise, and makes the compiler enforce this alignment.
seems like a prototype interface error:
the input parameter should have been `const void*`,
since the documentation is explicit that the input doesn't have to be aligned,
but `const __m256i*` makes the compiler enforce alignment.
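One possible workaround pattern (a sketch, not the code in this change): route the pointer through `void*` wrappers so call sites don't imply any alignment.

```
#include <emmintrin.h>
#include <immintrin.h>

/* hypothetical wrappers: the casts happen in exactly one place */
static void storeu_m128(void* dst, __m128i v)
{
    _mm_storeu_si128((__m128i*)dst, v);
}

static __m256i loadu_m256(const void* src)
{
    return _mm256_loadu_si256((const __m256i*)src);
}
```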
The `-Wl,-z,noexecstack` and `-Wa,--noexecstack` flags are already set for CMake, but not for Meson.
This brings the flags to the Meson build as well. Note that this maintains the discrepancy in behavior
between CMake and Meson when it comes to enabling ASM: on CMake, the ZSTD_HAS_NOEXECSTACK variable
is set and these flags added for GCC/Clang and MinGW. Then later, the ZSTD_HAS_NOEXECSTACK variable
is checked (along with some other conditions) to enable or disable ASM. However on Meson, this logic
is restricted to simply checking for GCC/Clang. This patch maintains this behavior; noexecstack is
dependent on GCC/Clang only.
Make the function ZSTD_compressSequencesAndLiterals() available in kernel
space. This will be used by the Intel QAT driver.
Additionally, (1) expose the function ZSTD_CCtx_setParameter(), which is
required to set parameters before calling ZSTD_compressSequencesAndLiterals(),
(2) update the build process to include `compress/zstd_preSplit.o` and
(3) replace `asm/unaligned.h` with `linux/unaligned.h`.
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
technically equivalent to `size_t`,
but it's the proper type for the underlying register representation.
This makes it possible to control register type, and therefore size, independently from `size_t`,
which can be useful on systems where `size_t` is 32-bit, while the architecture supports 64-bit registers.
this was previously not triggered in x86 32-bit mode,
due to a limitation in `bitstream.h`, which was fixed in #4248.
Now, `bmi2` will be automatically detected and triggered
at compilation time, if the corresponding instruction set is enabled,
even in 32-bit mode.
Also: updated library documentation, to feature STATIC_BMI2 build variable
so that a human reading the test log can determine everything was fine without consulting the shell error code.
Also: made `make check` slightly shorter by moving one longer test to `make test`
do not solve the equation, even though some terms cancel each other out;
this is done for clarity.
We'll let the compiler do the resolution at compile time.
When two threads are using a WorkQueue and the reader thread exits due
to an error, it must call WorkQueue::finish() to wake up the writer
thread. Otherwise, if the queue is full and the writer thread is waiting
for a free slot, it could hang forever.
This can happen in practice when decompressing a large, corrupted file
that does not contain pzstd skippable frames.
Move towards a stronger guarantee of reproducibility by removing this small difference for machines without SSE2/Neon.
The SIMD behavior is now the default for all platforms.
The optimal parser with LDM enabled and minMatch > 3 could generate a match
length of 3, which is not allowed when minMatch >= 4.
1. Fix the bug
2. Add validation logic to `ZSTD_buildSeqStore()` in debug mode for all block
compressors that checks we never generate too short a match (sketched below).
This way we don't rely on the `generate_sequences` fuzzer to find this issue.
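A sketch of the kind of debug-mode check added (names assumed, not the exact code):

```
#include <assert.h>
#include <stddef.h>

/* validate every sequence after block compression (debug builds only) */
static void validate_seqStore(const seqStore_t* seqStore, unsigned minMatch)
{
    const seqDef* seq = seqStore->sequencesStart;
    for (; seq < seqStore->sequences; seq++) {
        /* mlBase stores matchLength - MINMATCH (i.e. matchLength - 3) */
        assert((size_t)seq->mlBase + 3 >= minMatch);
    }
}
```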
Credit to OSS-Fuzz
Some older Android libc implementations don't support `fseeko` or `ftello`.
This commit adds a new compile-time macro `LIBC_NO_FSEEKO` as well as a usage in CMake for old Android APIs.
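A plausible shape for the gate (the actual macro usage may differ):

```
#include <stdio.h>

/* fall back to plain fseek/ftell when the libc lacks the 64-bit variants */
#if defined(LIBC_NO_FSEEKO)
#  define LONG_SEEK(f, offset, origin) fseek((f), (long)(offset), (origin))
#  define LONG_TELL(f)                 ((long long)ftell(f))
#else
#  define LONG_SEEK fseeko
#  define LONG_TELL ftello
#endif
```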
Summary:
Issue reported by @ryandesign and @MarcusCalhoun-Lopez.
CMake doesn't support spaces in flags. This caused older versions of gcc to
ignore the unknown flag "-z noexecstack" on MacOS since it was interpreted as a
single flag, not two separate flags. Then, during compilation it was treated as
"-z" "noexecstack", which was correctly forwarded to the linker. But the MacOS
linker does not support `-z noexecstack` so compilation failed.
The fix is to use `-Wl,-z,noexecstack`. This is never misinterpreted by a
compiler. However, not all compilers support this syntax to forward flags to the
linker. To fix this issue, we check if all the relevant `noexecstack` flags have
been successfully set, and if they haven't we disable assembly.
See also PR#4056 and PR#4061. I decided to go a different route because this is
simpler. It might not successfully set these flags on some compilers, but in that
case it also disables assembly, so they aren't required.
Test Plan:
```
mkdir build-cmake
cmake ../build/cmake/CMakeLists.txt
make -j
```
See that the linker flag is successfully detected & that assembly is enabled.
Run the same commands on MacOS which doesn't support `-Wl,-z,noexecstack` and see
that everything compiles and that `LD_FLAG_WL_Z_NOEXECSTACK` and
`ZSTD_HAS_NOEXECSTACK` are both false.
which creates more "realistic" scenarios than the former compressible noise.
The legacy data generator remains accessible,
it is triggered when requesting an explicit compressibility factor (-P#).
since it's a type name.
Note: in contrast with previous names, this one is on the Public API side.
A #define is therefore provided, so that existing programs using ZSTD_sequenceFormat_e still work.
Do some include shuffling for `**.h` files within lib, programs, tests, and zlibWrapper.
`lib/legacy` and `lib/deprecated` are untouched.
`#include`s within `extern "C"` blocks in .cpp files are untouched.
todo: shuffling for `xxhash.h`
* Change CLI to employ multithreading by default
* Document changes to benchmarking, print number of threads for display level >= 4, and add lower bound of 1 for the default number of threads
CMake warns on the current minimum requirement (3.5). Update to 3.10.
This keeps support for the default CMake on Ubuntu 18.04, which
exited standard LTS support in April of 2023.
[draft]
following notification from @rainbowball.
fix #4186.
Note: there is currently no QNX compilation test in CI
so this is a "blind" fix,
and this target can be silently broken again in the future.
instead of just for blind split.
This is in anticipation of adversarial input
that would intentionally target the sampling pattern of the split detector.
Note that, even without this protection, splitting can never expand beyond ZSTD_COMPRESSBOUND(),
because this upper limit assumes a worst case with 1 KB block size,
and splitting never creates blocks that small.
The protection is more to ensure that data is not expanded by more than 3 bytes per 128 KB full block,
which is a much stricter limit.
this helps make the streaming behavior more consistent,
since it no longer depends on having more data presented at the input.
suggested by @terrelln
so that it can be stored using standard alignment requirement (sizeof(void*)).
Distance function still requires 64-bit signed multiplication though,
so it won't change the issue regarding the bug in ubsan for clang 32-bit on github ci.
that samples 1 in 5 positions.
This variant is fast enough for lazy2 and btlazy2,
but it's less effective in combination with the post-splitter at higher levels (>= btopt).
update the man page in troff format,
and the README with latest `--help` content and complementary details about benchmark mode.
also: display level 0 when doing decompression benchmark
After a regrettable update,
the benchmark module ended up reloading sources for every compression level.
While the delay itself is likely tolerable,
the main issue is that the `--quiet` mode now also displays a loading summary between each compression line.
This wasn't the original intention, which is to produce a compact view of all compressions.
This is fixed in this version:
sources are loaded only once, for all compression levels,
and the loading summary is only displayed once.
The compression ratio benefits are small but consistent, i.e. always positive.
On `silesia.tar` corpus, this modification saves ~75 KB at level 3.
The measured speed cost is negligible, i.e. below noise level, between 0 and -1%.
The -Werror=missing-profile flag caused thread/zlib/lzma/lz4 detection
to fail when building with profile-use, so ZSTD_MULTITHREAD etc. were
not defined for profile-use builds. This printed a lot of profile-mismatch
information, and the final binary sometimes reported a runtime error such as:
Error : ZSTD_CCtx_setParameter(ctx, ZSTD_c_nbWorkers, adv->nbWorkers) failed : Unsupported parameter
Signed-off-by: Xionghu Luo <xionghuluo@tencent.com>
Sanity checks on a few of the context parameters (i.e. workers and block size)
may prompt an early return from ZSTD_generateSequences.
Allocating the destination buffer past those return points avoids a potential
memory leak.
This patch should fix issue #4112.
The NDK cross compiler declares the target as __linux (which is
not technically incorrect), which triggers the enablement of _GNU_SOURCE
in the newly added code that requires the presence of qsort_r() used
in the COVER dictionary code.
Even though the NDK uses llvm/libc, it doesn't declare qsort_r()
in the stdlib.h header.
The build fix is to only activate the _GNU_SOURCE macro if the OS is
*not* Android, as we will then fall back to the C90-compliant code.
This patch should solve the reported issue number #4103.
- add option to force a specific block type
- add option to force a specific literal type
- add option to generate only frame headers
- add option to skip generating magic numbers
Co-authored-by: Maciej Dudek <mdudek@antmicro.com>
Co-authored-by: Pawel Czarnecki <pczarnecki@antmicro.com>
Co-authored-by: Robert Winkler <rwinkler@antmicro.com>
Co-authored-by: Roman Dobrodii <rdobrodii@antmicro.com>
Commit f4dbfce79cb2b82fb496fcd2518ecd3315051b7d ("define LIB_SRCDIR
and LIB_BINDIR") significantly reworked the build logic, but in its
introduction of LIB_BINDIR a typo was made.
It was introduced as such:
+LIB_SRCDIR ?= $(dir $(realpath $(lastword $(MAKEFILE_LIST))))
+LIB_BINDIR ?= $(LIBSRC_DIR)
But the definition of LIB_BINDIR has a typo: it should use
$(LIB_SRCDIR) not $(LIBSRC_DIR).
Due to this, $(LIB_BINDIR) is empty, therefore in programs/Makefile,
-L$(LIB_BINDIR) is expanded to just -L, and consequently when trying
to link the "zstd" binary with the libzstd library, it cannot find it:
host/lib/gcc/powerpc64-buildroot-linux-gnu/13.3.0/../../../../powerpc64-buildroot-linux-gnu/bin/ld: cannot find -lzstd: No such file or directory
This commit fixes the build by fixing this typo.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Using '--single-thread' with '--patch-from' on compression levels above 15 will lead to significantly worse compression ratios.
Corrected the man page to no longer suggest doing this.
The two main functions used for dictionary training using the COVER
algorithm require initialization of a COVER_ctx_t where a call
to qsort() is performed.
The issue is that the standard C99 qsort() function doesn't offer
a way to pass an extra parameter for the comparison function callback
(e.g. a pointer to a context) and currently zstd relies on a *global*
static variable to hold a pointer to a context needed to perform
the sort operation.
If a zstd library user invokes either ZDICT_trainFromBuffer_cover or
ZDICT_optimizeTrainFromBuffer_cover from multiple threads, the
global context may be overwritten before/during the execution of qsort()
in the initialization of the COVER_ctx_t, thus yielding crashes
and other bad things (Tm) as reported in issue #4045.
Enter qsort_r(): it was designed to address precisely this situation.
To quote from the documentation [1]: "the comparison function does not need to
use global variables to pass through arbitrary arguments, and is therefore
reentrant and safe to use in threads."
It is available with small variations for multiple OSes (GNU, BSD[2],
Windows[3]), and the ISO C11 standard [4] features qsort_s() in
annex B-21 as part of <stdlib.h>. Let's hope that compilers eventually
catch up with it.
For now, we have to handle the small variations in function parameters
for each platform.
The current fix solves the problem by letting each executing thread
pass its own COVER_ctx_t instance to qsort_r(), removing the use of
a global pointer and allowing the code to be reentrant.
Unfortunately for *BSD, we cannot leverage qsort_r(), given that its API
has changed in newer versions of FreeBSD (14.0) and the other BSD variants
(e.g. NetBSD, OpenBSD) don't implement it.
For such cases we provide a fallback that only requires a compiler
supporting C90.
[1] https://man7.org/linux/man-pages/man3/qsort_r.3.html
[2] https://man.freebsd.org/cgi/man.cgi?query=qsort_r
[3] https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/qsort-s?view=msvc-170
[4] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf
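To illustrate, a minimal sketch of the GNU-style variant [1] (names illustrative; the BSD and Windows flavors reorder these parameters):

```
#define _GNU_SOURCE
#include <stdlib.h>

typedef struct { const unsigned* freqs; } sort_ctx;

/* GNU qsort_r passes the context as a third comparator argument,
 * so no global variable is needed and the sort is reentrant */
static int cmp_by_freq(const void* a, const void* b, void* opaque)
{
    const sort_ctx* const ctx = (const sort_ctx*)opaque;
    unsigned const fa = ctx->freqs[*(const unsigned*)a];
    unsigned const fb = ctx->freqs[*(const unsigned*)b];
    return (fa < fb) - (fa > fb);   /* descending by frequency */
}

/* usage: qsort_r(ids, n, sizeof *ids, cmp_by_freq, &ctx); */
```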
- switched the pattern and input of $filter into the right places
- added pattern wildcards to MSYS_NT & CYGWIN_NT, as they change with Windows versions
- correctly identify MSYS2, even in an env like MINGW64
As reported by Ben Hawkes in #4026, a failure to allocate a zstd context
would lead to a dereference of a NULL pointer, due to a missing check
on the result returned by ZSTDv06_createDCtx().
This patch fixes the issue by adding a check for a valid returned pointer.
Meson always returns -pthread in dependency('threads') on non-MSVC
compilers. On Windows we use Windows threading primitives, so we don't
need this. Avoid adding -pthread to libzstd's link flags, either as a
Meson subproject or via pkg-config Libs.private, so the application
doesn't inadvertently depend on winpthreads.
Add a Meson MinGW cross-compile CI test that checks for this. It turns
out that pzstd fails to build in that environment, so have the test
skip building contrib for now.
The multi-threaded static library requires -pthread to link and work correctly.
The pkg-config file we provide, though, only works with the dynamic multi-threaded
library, and won't provide the correct Libs and Cflags values if lib-mt is used.
To handle this, introduce an env variable MT that permits advanced users to
install and generate a correct pkg-config file for lib-mt, and detect when the
lib-mt target is called.
With MT set when calling make install-pc, the generated libzstd.pc is a
pkg-config file for a multi-threaded static library.
On calling make lib-mt, a libzstd.pc is generated for a multi-threaded
static library, as that is what the user explicitly requested.
libzstd.pc is changed to PHONY to force its regeneration on calling
lib targets or install-pc, to handle the case where the same directory is
used for mixed compilation.
This was noticed while migrating from the meson to the make build system:
meson generates a correct .pc file while make doesn't.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Just after a clone I'm getting this:
```
~/zstd/zlibWrapper$ cc -c zstd_zlibwrapper.o gz*.c -lz -lzstd -DSTDC
gzwrite.c: In function ‘gz_write’:
gzwrite.c:226:43: error: ‘z_uInt’ undeclared (first use in this
function); did you mean ‘uInt’?
  226 |         state.state->strm.avail_in = (z_uInt)n;
      |                                           ^~~~~~
      |                                           uInt
gzwrite.c:226:43: note: each undeclared identifier is reported only
once for each function it appears in
gzwrite.c:226:50: error: expected ‘;’ before ‘n’
  226 |         state.state->strm.avail_in = (z_uInt)n;
      |                                                  ^
      |                                                  ;
```
z_uInt is never used directly; zconf.h redefines uInt to z_uInt under
the condition that Z_PREFIX is set. All examples use uInt, and the type
of avail_in is also uInt.
In this commit I modify the cast to refer to the same type as the type
of lvalue.
Arguably, the real fix here is to handle possible overflows, but that's
beyond the scope of this commit.
This was causing OSS-Fuzz errors, due to compiler differences.
* Fix the issue
* Also turn off -Werror so we don't fail fuzzer builds for warnings
* Turn on -Werror in our CI
This function was seriously flawed:
* It didn't do output bounds checks
* It produced invalid sequences when an uncompressed or RLE block was emitted
* It produced invalid sequences when the block splitter was enabled
* It produced invalid sequences when ZSTD_c_targetCBlockSize was enabled
I've attempted to fix these issues, but this function is just a bad idea,
so I've marked it as deprecated and unsafe. We should replace it with
`ZSTD_extractSequences()` which operates on a compressed frame.
Fail on errors when building fuzzers with `fuzz.py` (adds `-Werror`).
Currently allows `declaration-after-statement`, `c++-compat` and
`deprecated`, as they are abundant in the code (some fixes to
`declaration-after-statement` are presented in this commit).
Fixes 2 issues in `simple_decompress.c`:
1. The wrong type was used to store the result of `ZSTD_findDecompressedSize`, so it could never match `ZSTD_CONTENTSIZE_ERROR` or `ZSTD_CONTENTSIZE_UNKNOWN` (see the sketch after this list).
2. Experimental API is used (`ZSTD_findDecompressedSize`) without defining `ZSTD_STATIC_LINKING_ONLY`.
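A sketch of the corrected usage (wrapper name illustrative):

```
#define ZSTD_STATIC_LINKING_ONLY   /* ZSTD_findDecompressedSize is experimental */
#include <zstd.h>

static unsigned long long checkedSize(const void* cBuff, size_t cSize)
{
    /* must be unsigned long long: a narrower type could never compare
     * equal to ZSTD_CONTENTSIZE_ERROR or ZSTD_CONTENTSIZE_UNKNOWN */
    unsigned long long const rSize = ZSTD_findDecompressedSize(cBuff, cSize);
    if (rSize == ZSTD_CONTENTSIZE_ERROR) return 0;    /* not a valid zstd frame */
    if (rSize == ZSTD_CONTENTSIZE_UNKNOWN) return 0;  /* size not recorded in header */
    return rSize;
}
```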
Document that the `ZSTD_BUILD_{SHARED,STATIC}` take precedence over `BUILD_SHARED_LIBS` when exactly one is ON.
Thanks to @teo-tsirpanis for pointing out the potentially confusing behavior.
Doing this check with a direct c++ snippet is prone to portability problems:
- \043 is not portable between shells: dash expands it to #,
bash does not;
- using # directly works with make 4.3 but does not with make 4.2.
Let's just use the c++ version that covers both the code and the gtest.
* Make a variable `PublicHeaders` for Zstd's public headers
* Add `PublicHeaders` to `Headers`, which was missing
* Only export `${LIBRARY_DIR}` publicly, not `common/`
* Switch the `target_include_directories()` to `INTERFACE` because zstd uses relative includes internally, so doesn't need any include directories to build
* Switch installation to use the `PublicHeaders` variable, and test that the right headers are installed
If both `ZSTD_BUILD_SHARED` and `ZSTD_BUILD_STATIC` are set, then cmake exports the libraries `libzstd_shared` and `libzstd_static` only.
It does not export `libzstd`, which is only exported when exactly one of `ZSTD_BUILD_SHARED` and `ZSTD_BUILD_STATIC` is set.
This PR exports `libzstd` in that case, based on the value of the standard CMake variable [`BUILD_SHARED_LIBS`](https://cmake.org/cmake/help/latest/variable/BUILD_SHARED_LIBS.html).
This ensures that `libzstd` can always be used to refer to the exported zstd library, since the build errors if neither `ZSTD_BUILD_SHARED` nor `ZSTD_BUILD_STATIC` are set.
I tested all the possible combinations of `ZSTD_BUILD_SHARED`, `ZSTD_BUILD_STATIC`, and `BUILD_SHARED_LIBS` and they always worked as expected:
* If only exactly one of `ZSTD_BUILD_SHARED` and `ZSTD_BUILD_STATIC` is set, that is used as `libzstd`.
* Otherwise, libzstd is set based on `BUILD_SHARED_LIBS`.
Fixes #3859.
This feature has demonstrated itself to be useful in web compression and we
want to encourage other folks to use it. But we currently make it difficult
to do so since it's locked away in the experimental API.
The API itself is really straightforward and I think it's fine to commit to
maintaining support / compatibility for this API even if in the future the
underlying implementation may continue to evolve.
Note that this commit changes its enum name and also its numeric value. Users
who respected the instructions of using the experimental API should be fine
with both of these changes, since they should only have referred to it by name.
Conceivably someone could have done bad feature detection of this capability
by doing `#ifdef ZSTD_c_targetCBlockSize`, which will now return false since
it's no longer a macro... but I think that's an acceptable hypothetical
breakage.
Mark that `huf_decompress_amd64.S` supports BTI and PAC, which it trivially does because it is empty for aarch64.
The issue only requested BTI markings, but it also makes sense to mark PAC, which is the only other feature.
Also add a test for this mode to the ARM64 QEMU test. Before this PR it warns on `huf_decompress_amd64.S`, after it doesn't.
Fixes Issue #3841.
This doesn't affect most of the targets, but will help me sleep better at night knowing that future refactors won't break the legacy support.
Should have been included in https://github.com/facebook/zstd/pull/3943 but I noticed after that merged, so putting up a separate PR.
Fixes a bug in AsyncIO where we queue reads after opening a file so our queue will always be saturated (or as saturated as possible).
Previous code was looping up to `availableJobsCount`, not realizing that `availableJobsCount` was also decreasing in each iteration, so instead of queueing 10 jobs we'd queue 5 (and instead of 2 we'd queue 1).
This PR fixes the loop to queue as long as `availableJobsCount` is not 0.
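The shape of the fix, as a standalone sketch (names hypothetical):

```
typedef struct { int availableJobsCount; /* ... */ } read_ctx;

static void enqueue_read(read_ctx* ctx)
{
    ctx->availableJobsCount--;   /* one fewer free slot */
    /* ... submit the read job ... */
}

static void saturate_queue(read_ctx* ctx)
{
    /* buggy version:
     *   for (int i = 0; i < ctx->availableJobsCount; ++i) enqueue_read(ctx);
     * the bound shrinks as jobs are queued, so only half the slots fill up */
    while (ctx->availableJobsCount > 0)
        enqueue_read(ctx);
}
```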
only disable `--rm` at end of command line parsing,
so that `-c` only disables `--rm` if it's effectively selected,
and not if it's overridden by a later `-o FILE` command.
is correctly detected as corrupted by the new version,
and is accepted (changed into offset==1) by older versions.
Updated documentation accordingly, with a hexadecimal representation.
in this new method, when an `offset==0` is detected,
it's converted into (size_t)(-1), instead of 1.
The logic is that (size_t)(-1) is effectively an extremely large positive number,
which will not pass the offset distance test at next stage (`execSequence()`).
Checked the source code, and offset is always checked (as it should),
using a formula which is not vulnerable to arithmetic overflow:
```
RETURN_ERROR_IF(sequence.offset > (size_t)(oLitEnd - virtualStart),
```
The benefit is that such a case (offset==0) is always detected as corrupted data
as opposed to relying on the checksum to detect the error.
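A minimal sketch of the conversion (context simplified, names hypothetical):

```
#include <stddef.h>

static size_t sanitizeOffset(size_t offset)
{
    /* offset==0 is invalid: map it to the largest size_t value, so the
     * distance check in execSequence() is guaranteed to reject it */
    return (offset == 0) ? (size_t)(-1) : offset;
}
```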
LLU is a correct prefix according to C99 & C11 standards (but not C90).
However, older versions of Visual Studio do not work with it.
Replace by ULL, which doesn't have this issue.
Fixes https://github.com/facebook/zstd/issues/3647
and only defines a sub-block boundary when
it believes that it is compressible.
It's effectively an optimization,
avoiding a compression cycle to reach the same conclusion.
note that the size of individual compressed blocks will vary more wildly with this modification.
But it seems good enough for a first test, and it fixes the speed regression issue.
Further refinements can be attempted later.
using real Latin sentences from Cicero.
The compression ratio is lower again, closer to "real" text;
level 6 is now way better than level 4.
Level 5 is still lower than level 4,
but at least it's now higher than level 3.
makes compression slightly worse,
hence a bit more comparable with real text (though still too easy to compress).
Level 6 is now stronger than level 4, by a hair.
However, there is still a ratio dip at level 5.
While it's not always strictly a win,
it's a win for files that see a noticeable compression ratio increase,
and only very small noise for other files.
The downside is that this patch is less efficient for 32-bit integer arrays
than the previous patch (which introduced losses for other files),
but it's still a net improvement in this scenario.
helps when litlen==1 is cheaper than litlen==0.
Works great on pathological arr[u32] examples,
but doesn't generalize well to other files;
silesia/x-ray is among the most negatively affected ones.
this works great for 32-bit arrays,
notably the synthetic ones with extreme regularity.
Unfortunately, it's not universal,
and in some cases it's a loss.
Crucially, on average, it's a loss on silesia.
The most negatively impacted file is x-ray.
It deserves an investigation before suggesting it as an evolution.
note: we probably don't want to maintain the VS2008 solution anymore.
Its successor VS2010 is > 10 years old,
which is more or less the limit after which we can stop supporting old compilers.
this generator replaces the statistical generator
for the general case when no statistic is requested.
Generated data features a compression level speed / ratio curve
which is more in line with expectation.
replaced `\+` by `*`.
`\+` means `[1-N]`,
while `*` means `[0-N]`,
so it's not strictly equivalent
but `\+` happens to be badly supported on some flavors of grep,
and for the purpose of these tests, `*` is good enough.
This avoids downloading -- and periodically bumping the checksum for --
a third-party action that isn't strictly required, and thus helps keep
down dependencies and reduce update churn.
* Add ZSTD_CCtxParams_registerSequenceProducer() to public API
* add unit test
* add docs to zstd.h
* nits
* Add ZSTDLIB_STATIC_API prefix
* Add asserts
It cannot be an alias, because it would lock the package to use either static or shared libraries at its build time. We want to decide this at the time `find_package` is called.
If the relevant allocation returns NULL, ZSTD_createCDict_advanced_internal()
will return NULL. But ZSTD_createCDict_advanced2() doesn't check for
this and attempts to use the returned pointer anyway, which leads to
a segfault.
The patch adds validation in ZSTD_decodeSeqHeaders to ensure that the last
field, which is reserved, is all-zeroes. This prevents
potential corruption from going undetected.
Fixes an issue where corrupted input could lead to undefined behavior
due to improper validation of reserved bits.
Signed-off-by: aimuz <mr.imuz@gmail.com>
This PR introduces no functional changes. It attempts to change all
macros currently using `{ }` or some variant of that to
`do { } while (0)`, and introduces trailing `;` where necessary.
There were no bugs found during this migration.
The bug in Visual Studio's warning on this has been fixed since VS2015.
Additionally, we have several instances of `do { } while (0)` which have
been present for several releases, so we don't have to worry about
breaking people's builds.
Fixes Issue #3830.
`HUF_DecompressFastArgs_init()` was adding 0 to NULL. Fix it by exiting
early for empty outputs. This is no change in behavior, because the
function was already returning 0 in this case, just slightly later.
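A sketch of the guard (function shape simplified, names illustrative):

```
#include <stddef.h>

static size_t init_fast_args(void* dst, size_t dstSize)
{
    if (dstSize == 0)
        return 0;   /* exit before computing dst + dstSize on a NULL dst (NULL + 0 is UB) */
    {   unsigned char* const oend = (unsigned char*)dst + dstSize;   /* now well-defined */
        (void)oend;
        /* ... the actual initialization ... */
    }
    return 1;
}
```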
* Rename `ilimit` to `ilowest` and set it equal to `src` instead of
`src + 6 + 8`. This is safe because the fast decoding loops guarantee
to never read below `ilowest` already. This allows the fast decoder to
run for at least two more iterations, because it consumes at most 7
bytes per iteration.
* Continue the fast loop all the way until the number of safe iterations
is 0. Initially, I thought that when it got towards the end, the
computation of how many iterations are safe might become expensive. But
it ends up being slower to have to decode each of the 4 streams
individually, which makes sense.
This drastically speeds up the Huffman decoder on the `github` dataset
for the issue raised in #3762, measured with `zstd -b1e1r github/`.
| Decoder | Speed before | Speed after |
|----------|--------------|-------------|
| Fallback | 477 MB/s | 477 MB/s |
| Fast C | 384 MB/s | 492 MB/s |
| Assembly | 385 MB/s | 501 MB/s |
We can also look at the speed delta for different block sizes of silesia
using `zstd -b1e1r silesia.tar -B#`.
| Decoder | -B1K ∆ | -B2K ∆ | -B4K ∆ | -B8K ∆ | -B16K ∆ | -B32K ∆ | -B64K ∆ | -B128K ∆ |
|----------|--------|--------|--------|--------|---------|---------|---------|----------|
| Fast C | +11.2% | +8.2% | +6.1% | +4.4% | +2.7% | +1.5% | +0.6% | +0.2% |
| Assembly | +12.5% | +9.0% | +6.2% | +3.6% | +1.5% | +0.7% | +0.2% | +0.03% |
gcc in the linux kernel was not unrolling the inner loops of the Huffman
decoder, which was destroying decoding performance. The compiler was
generating crazy code with all sorts of branches. I suspect because of
Spectre mitigations, but I'm not certain. Once the loops were manually
unrolled, performance was restored.
Additionally, when gcc couldn't prove that the variable left shift in
the 4X2 decode loop wasn't greater than 63, it inserted checks to verify
it. To fix this, mask `entry.nbBits & 0x3F`, which allows gcc to elide
this check. This is a no-op, because `entry.nbBits` is guaranteed to be
less than 64.
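The masking trick in isolation (a sketch, not the actual decode loop):

```
/* the mask proves to the compiler that the shift amount is < 64,
 * so no range check is inserted; semantically a no-op since nbBits < 64 */
static unsigned long long consumeBits(unsigned long long bitContainer, unsigned nbBits)
{
    return bitContainer << (nbBits & 0x3F);
}
```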
Lastly, introduce the `HUF_DISABLE_FAST_DECODE` macro to disable the
fast C loops for Issue #3762. So if even after this change, there is a
performance regression, users can opt-out at compile time.
ZSTD_resetDStream() is deprecated and replaced by ZSTD_DCtx_reset().
This removes deprecation warnings from the kernel build.
This change is a no-op, see the docs suggesting this replacement.
fcbf2fde9a/lib/zstd.h (L2655-L2663)
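The replacement, per the suggestion in zstd.h (wrapper name illustrative):

```
#include <zstd.h>

static void resetSession(ZSTD_DStream* zds)
{
    /* old (deprecated): ZSTD_resetDStream(zds); */
    ZSTD_DCtx_reset(zds, ZSTD_reset_session_only);   /* resets the session, keeps parameters */
}
```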
More recent versions of CMake emit the following warning:
CMake Deprecation Warning at cmake/CMakeLists.txt:10 (cmake_minimum_required):
Compatibility with CMake < 3.5 will be removed from a future version of
CMake.
Update the VERSION argument <min> value or use a ...<max> suffix to tell
CMake that the project does not need compatibility with older versions.
the flexArray in structure FSE_DecompressWksp
is just a way to derive a pointer easily,
without risk/complexity of calculating it manually.
Not sure if this change is good enough to avoid ubsan warnings though.
within ZSTDMT_.
This pattern is flagged by less forgiving variants of ubsan
notably used during compilation of the Linux Kernel.
There are 2 other places in the code where this pattern is used.
This fixes just one of them.
* Remove all pointer-overflow suppressions from our UBSAN builds/tests.
* Add `ZSTD_ALLOW_POINTER_OVERFLOW_ATTR` macro to suppress
pointer-overflow at a per-function level. This is a superior approach
because it also applies to users who build zstd with UBSAN.
* Add `ZSTD_wrappedPtr{Diff,Add,Sub}()` that use these suppressions
(sketched after this list).
The end goal is to only tag these functions with
`ZSTD_ALLOW_POINTER_OVERFLOW`. But we can start by annotating functions
that rely on pointer overflow, and gradually transition to using
these.
* Add `ZSTD_maybeNullPtrAdd()` to simplify pointer addition when the
pointer may be `NULL`.
* Fix all the fuzzer issues that came up. I'm sure there will be a lot
more, but these are the ones that came up within a few minutes of
running the fuzzers, and while running GitHub CI.
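A sketch of the attribute and one helper (the clang attribute spelling is real; the exact zstd definitions may differ):

```
#include <stddef.h>

#if defined(__clang__)
#  define ZSTD_ALLOW_POINTER_OVERFLOW_ATTR __attribute__((no_sanitize("pointer-overflow")))
#else
#  define ZSTD_ALLOW_POINTER_OVERFLOW_ATTR
#endif

/* UBSAN skips pointer-overflow checking for just this function,
 * instead of a blanket, build-wide suppression */
ZSTD_ALLOW_POINTER_OVERFLOW_ATTR
static const unsigned char* ZSTD_wrappedPtrAdd(const unsigned char* ptr, ptrdiff_t add)
{
    return ptr + add;
}
```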
To the best of my knowledge:
* `_WIN32` and `_WIN64` are defined by the compiler,
* `WIN32` and `WIN64` are defined by the user, to indicate whatever
the user chooses them to indicate. They mean 32-bit and 64-bit Windows
compilation by convention only.
See:
https://accu.org/journals/overload/24/132/wilson_2223/
Windows compilers in general, and MSVC in particular, have been defining
`_WIN32` and `_WIN64` for a long time, provably at least since Visual Studio
2015, and in practice as early as in the days of 16-bit Windows.
See:
https://learn.microsoft.com/en-us/cpp/preprocessor/predefined-macros?view=msvc-140
https://learn.microsoft.com/en-us/windows/win32/winprog64/the-tools
Tests used to be inconsistent, sometimes testing `_WIN32`, sometimes
`_WIN32` and `WIN32`. This brings consistency to Windows detection.
Refine the macro guards to define the functions exactly when they are
needed.
This fixes the chromium build with zstd.
Thanks to @GregTho for reporting!
The Huffman repeat mode checker assumed that the CTable was zeroed in the region `[maxSymbolValue + 1, 256)`.
This assumption didn't hold for tables built in the dictionaries, because it didn't go through the same codepath.
Since this code was originally written, we added a header to the CTable that specifies the `tableLog`.
Add `maxSymbolValue` to that header, and check that the table's `maxSymbolValue` is at least the block's `maxSymbolValue`.
This solution is cleaner because we write this header for every CTable we build, so it can't be missed in any code path.
Credit to OSS-Fuzz
Fix the following warnings reported by the compiler when
ZDICTLIB_STATIC_API is not defined to ZDICTLIB_API:
lib/dictBuilder/cover.c:1122:21: warning: redeclaration of 'ZDICT_optimizeTrainFromBuffer_cover' with different visibility (old visibility
preserved)
lib/dictBuilder/cover.c:736:21: warning: redeclaration of 'ZDICT_trainFromBuffer_cover' with different visibility (old visibility
preserved)
lib/dictBuilder/fastcover.c:549:1: warning: redeclaration of 'ZDICT_trainFromBuffer_fastCover' with different visibility (old visibility
preserved)
lib/dictBuilder/fastcover.c:618:1: warning: redeclaration of 'ZDICT_optimizeTrainFromBuffer_fastCover' with different visibility (old
visibility preserved)
We already have logic in our Huffman encoder to validate Huffman tables with missing symbols.
We use this for higher compression levels to re-use the previous block's statistics, or when the dictionary's table has zero-weighted symbols.
This check was leftover as an oversight from before we added validation for Huffman tables.
I validated that the `dictionary_loader` fuzzer has coverage of every line in the `ZSTD_loadCEntropy()` function to validate that it is correctly testing this function.
MSAN is hooked into the system malloc, but when the user provides a custom
allocator, it may not provide the same cleansing behavior. So if we leave
memory poisoned and return it to the user's allocator, where it is re-used
elsewhere, our poisoning can blow up in some other context.
They are Linux-like environments under Windows and have all the tools needed to support staged installation and testing.
Beware: this only affects the make build system.
For some reason, when LTO is enabled, the compiler complains about the statbuf variable not being correctly initialized, even though the variable has an assert != NULL just a few lines below (FIO_getDictFileStat).
This is the build failure being fixed:
x86_64-linux-gnu-gcc -g -O2 -ffile-prefix-map=/<<PKGBUILDDIR>>=. -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fdebug-prefix-map=/<<PKGBUILDDIR>>=/usr/src/libzstd-1.5.5+dfsg2-1 -Wall -Wextra -Wcast-qual -Wcast-align -Wshadow -Wstrict-aliasing=1 -Wswitch-enum -Wdeclaration-after-statement -Wstrict-prototypes -Wundef -Wpointer-arith -Wvla -Wformat=2 -Winit-self -Wfloat-equal -Wwrite-strings -Wredundant-decls -Wmissing-prototypes -Wc++-compat -g -Werror -Wa,--noexecstack -Wdate-time -D_FORTIFY_SOURCE=2 -DXXH_NAMESPACE=ZSTD_ -DDEBUGLEVEL=1 -DZSTD_LEGACY_SUPPORT=5 -DZSTD_MULTITHREAD -DZSTD_GZCOMPRESS -DZSTD_GZDECOMPRESS -DZSTD_LZMACOMPRESS -DZSTD_LZMADECOMPRESS -DZSTD_LZ4COMPRESS -DZSTD_LZ4DECOMPRESS -DZSTD_LEGACY_SUPPORT=5 -c -MT obj/conf_086c46a51a716b674719b8acb8484eb8/zstdcli_trace.o -MMD -MP -MF obj/conf_086c46a51a716b674719b8acb8484eb8/zstdcli_trace.d -o obj/conf_086c46a51a716b674719b8acb8484eb8/zstdcli_trace.o zstdcli_trace.c
In function ‘UTIL_isRegularFileStat’,
inlined from ‘UTIL_getFileSizeStat’ at util.c:524:10,
inlined from ‘FIO_createDResources’ at fileio.c:2230:30:
util.c:209:12: error: ‘statbuf.st_mode’ may be used uninitialized [-Werror=maybe-uninitialized]
209 | return S_ISREG(statbuf->st_mode) != 0;
| ^
fileio.c: In function ‘FIO_createDResources’:
fileio.c:2223:12: note: ‘statbuf’ declared here
2223 | stat_t statbuf;
| ^
lto1: all warnings being treated as errors
The sequence section starts with a number, which tells how many sequences are present in the section.
If this number is 0, the section ends immediately.
The number 0 can be represented using either the 1-byte or the 2-byte format,
because the 2-byte format fully overlaps the 1-byte format.
However, when 0 is represented using the 2-byte format,
the decoder was expecting the sequence section to continue,
and was looking for FSE tables, which is incorrect.
Fixed this behavior, in both the reference decoder and the educational decoder.
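For reference, a sketch of nbSeq decoding as the format specifies it (helper name illustrative); a decoded value of 0 ends the section regardless of which format encoded it:

```
#include <stddef.h>

/* returns the number of header bytes consumed; stores nbSeq */
static size_t decode_nbSeq(const unsigned char* ip, size_t* nbSeq)
{
    if (ip[0] < 128) { *nbSeq = ip[0]; return 1; }                                   /* 1-byte format */
    if (ip[0] < 255) { *nbSeq = ((size_t)(ip[0] - 0x80) << 8) + ip[1]; return 2; }   /* 2-byte format */
    *nbSeq = (size_t)ip[1] + ((size_t)ip[2] << 8) + 0x7F00; return 3;                /* 3-byte format */
}
```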
In practice, this behavior never happens,
because the encoder will always select the 1-byte format to represent 0,
since this is more efficient.
Completed the fix with a new golden sample for tests,
a clarification of the specification,
and a decoder errata paragraph.
and in `decodecorpus`:
the specific case `nbSeq=127` can be represented using the 1-byte format.
Note that both the 1-byte and the 2-byte formats are valid to represent this case,
so there was no "error"; the produced data remains valid,
it's just that the 1-byte format is more efficient.
fix #3667
Credit to @ip7z for finding this issue.
When forcing the source file language to `C`, Xcode enforces
the file to be compiled as `C` by appending `-x c` to the
compiler command line.
For now try to limit the damage and only enforce the language
if the ASM and C compilers differ.
Reproducer (CMake `3.26.4`, Xcode `14.3`):
```
cmake -S build/cmake -B _b -GXcode -DCMAKE_OSX_ARCHITECTURES=x86_64
cmake --build _b
```
Fix: #3622
Reduces memory when blocks are guaranteed to be smaller than allowed by
the format. This is useful for streaming compression in conjunction with
ZSTD_c_maxBlockSize.
This PR saves 2 * (formatMaxBlockSize - paramMaxBlockSize) when streaming.
Once it is rebased on top of PR #3616 it will save
3 * (formatMaxBlockSize - paramMaxBlockSize).
The split literals buffer patch increased streaming decompression memory
by 64KB (shrunk lit buffer from 128KB to 64KB, and added 128KB). This
patch removes the added 128KB buffer, because it isn't necessary.
The buffer was there because the literals compression code didn't know
the true `blockSizeMax` of the frame, and always put split literals so
they ended 128KB - 32 from the beginning of the block. Instead, we can
pass down the true `blockSizeMax` and ensure that the split literals
end up at `blockSizeMax - 32` from the beginning of the block. We
already reserve a full `blockSizeMax` bytes in streaming mode, so we
won't be overwriting the extDict window.
for compressed blocks of size exactly 128 KB,
which used to be disallowed by the spec,
but have become allowed in more recent versions of the spec.
While this limitation is fixed in decoders v1.5.4+,
implementers should refrain from generating such blocks with their custom encoders,
as they could be misclassified as corrupted by older decoder versions.
When `ZSTD_c_maxBlockSize` is set, we weren't computing the
decompression margin correctly, leading to `dstSize_tooSmall` errors.
Fix that computation.
This is just a bug in the fuzzer, not a bug in the library itself.
Credit to OSS-Fuzz
detected by @terrelln,
these issues could be triggered in specific scenarios,
namely decompression of certain invalid magic-less frames,
or requesting properties from certain invalid skippable frames.
* add check for valid dest buffer and fuzz on random dest ptr when malloc 0
* add uptrval to linux-kernel
* remove bin files
* get rid of uptrval
* restrict max pointer value check to platforms where sizeof(size_t) == sizeof(void*)
decompression features automatic support of sparse files,
aka a form of "compression" where entire blocks consist only of zeroes.
This only works for some compatible file systems (like ext4);
others simply ignore it (like afs).
Triggering this feature relies on `fseek()`.
But `fseek()` is not compatible with non-seekable devices, such as pipes,
therefore it's disabled for pipes.
However, there are other objects which are not compatible with `fseek()`, such as block devices.
Changed the logic, so that `fseek()` (and therefore sparse write) is only automatically enabled on regular files.
Note that this automatic behavior can always be overridden by the explicit commands `--sparse` and `--no-sparse`.
fix #3583
This does the following:
1. Compress test data into multiple frames
2. Perform a series of small decompressions and seeks forward, checking
that compressed data wasn't reread unnecessarily.
3. Perform some seeks forward and backward to ensure correctness.
When decompressing a seekable file, if seeking forward within
a frame (by issuing multiple ZSTD_seekable_decompress calls
with a small gap between them), the frame will be unnecessarily
reread from the beginning. This patch makes it continue using
the current frame data and simply skip over the unneeded bytes.
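A condensed sketch of the new skip path (names hypothetical; the real seekable API differs):

```
#include <stddef.h>

typedef struct {
    unsigned long long pos;   /* current decompressed position within the frame */
    /* ... decoder state ... */
} frame_dec;

/* hypothetical: decodes up to cap bytes, returns bytes produced (0 = end of frame) */
size_t decode_some(frame_dec* d, void* dst, size_t cap);

/* discard decompressed bytes until target, instead of rewinding the frame */
static int skip_forward(frame_dec* d, unsigned long long target)
{
    char scratch[4096];
    while (d->pos < target) {
        unsigned long long const remaining = target - d->pos;
        size_t const want = remaining < sizeof(scratch) ? (size_t)remaining : sizeof(scratch);
        size_t const got = decode_some(d, scratch, want);
        if (got == 0) return -1;   /* unexpected end of frame */
        d->pos += got;
    }
    return 0;
}
```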
Looking at the __builtin_expect in ZSTD_decodeSequence:
```
{   size_t offset;
#if defined(__clang__)
    if (LIKELY(ofBits > 1)) {
#else
    if (ofBits > 1) {
#endif
        ZSTD_STATIC_ASSERT(ZSTD_lo_isLongOffset == 1);
```
From profile-annotated assembly, the probability of ofBits > 1 is about 75%
(101k counts out of 135k counts). This is much smaller than the recommended
likelihood to use __builtin_expect which is 99%. As a result, clang moved the
else block further away, which hurts cache locality. Removing this
__builtin_expect along with two others in ZSTD_decodeSequence gave better
performance when PGO is enabled. I suggest removing these branch hints and
relying on PGO, which leverages runtime profiles from the actual workload to
calculate branch probabilities instead.
Inlining `BIT_reloadDStream` provided >3% decompression speed improvement for
clang PGO-optimized zstd binary, measured using the Silesia corpus with
compression level 1. The win comes from improved register allocation which leads
to fewer spills and reloads. Take a look at this comparison of
profile-annotated hot assembly before and after this change:
https://www.diffchecker.com/UjDGIyLz/. The diff is a bit messy, but notice three
fewer moves after inlining.
In general LLVM's register allocator works better when it can see more code. For
example, when the register allocator sees a call instruction, it partitions the
registers into caller registers and callee registers, and it is not free to do
whatever it wants with all the registers for the current function. Inlining the
callee lets the register allocator access all registers and use them more
flexibly.
Rather than remove the flag entirely, as proposed in #3499, this commit uses
the newest C++ standard the compiler supports. This retains the selection of
using only standardized features (excluding GNU extensions) and keeps the
recency requirements of the codebase explicit.
Tested with various versions of `g++` and `clang++`.
This fixes compilation with clang-cl on Windows. There
is a bug in cmake so that check_linker_flag() doesn't give
the correct result when using link.exe/lld-link.exe.
Details in CMake's gitlab: https://gitlab.kitware.com/cmake/cmake/-/issues/22023
Fixes #3522
Every 256 bytes the lazy match finders process without finding a match,
they will increase their step size by 1. So for bytes [0, 256) they search
every position, for bytes [256, 512) they search every other position,
and so on. However, they currently still insert every position into
their hash tables. This is different from fast & dfast, which only
insert the positions they search.
This PR changes that, so now after we've searched 2KB without finding
any matches, at which point we'll only be searching one in 9 positions,
we'll stop inserting every position, and only insert the positions we
search. The exact cutoff of 2KB isn't terribly important, I've just
selected a cutoff that is reasonably large, to minimize the impact on
"normal" data.
This PR only adds skipping to greedy, lazy, and lazy2, but does not
touch btlazy2.
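An illustrative sketch of the policy (names hypothetical, thresholds from the description above):

```
#include <stddef.h>

/* step grows by 1 for every 256 matchless bytes:
 * [0,256) -> every position, [256,512) -> every other position, ...
 * at 2KB this reaches one in 9 positions */
static size_t search_step(size_t bytesSinceLastMatch)
{
    return 1 + (bytesSinceLastMatch >> 8);
}

/* after 2KB without a match, stop inserting every position into the
 * hash table and only insert the positions actually searched */
static int insert_every_position(size_t bytesSinceLastMatch)
{
    return bytesSinceLastMatch < 2048;
}
```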
| Dataset | Level | Compiler | CSize ∆ | Speed ∆ |
|---------|-------|--------------|---------|---------|
| Random | 5 | clang-14.0.6 | 0.0% | +704% |
| Random | 5 | gcc-12.2.0 | 0.0% | +670% |
| Random | 7 | clang-14.0.6 | 0.0% | +679% |
| Random | 7 | gcc-12.2.0 | 0.0% | +657% |
| Random | 12 | clang-14.0.6 | 0.0% | +1355% |
| Random | 12 | gcc-12.2.0 | 0.0% | +1331% |
| Silesia | 5 | clang-14.0.6 | +0.002% | +0.35% |
| Silesia | 5 | gcc-12.2.0 | +0.002% | +2.45% |
| Silesia | 7 | clang-14.0.6 | +0.001% | -1.40% |
| Silesia | 7 | gcc-12.2.0 | +0.007% | +0.13% |
| Silesia | 12 | clang-14.0.6 | +0.011% | +22.70% |
| Silesia | 12 | gcc-12.2.0 | +0.011% | -6.68% |
| Enwik8 | 5 | clang-14.0.6 | 0.0% | -1.02% |
| Enwik8 | 5 | gcc-12.2.0 | 0.0% | +0.34% |
| Enwik8 | 7 | clang-14.0.6 | 0.0% | -1.22% |
| Enwik8 | 7 | gcc-12.2.0 | 0.0% | -0.72% |
| Enwik8 | 12 | clang-14.0.6 | 0.0% | +26.19% |
| Enwik8 | 12 | gcc-12.2.0 | 0.0% | -5.70% |
The speed difference for clang at level 12 is real, but is probably
caused by some sort of alignment or codegen issues. clang is
significantly slower than gcc before this PR, but gets up to parity with
it.
I also measured the ratio difference for the HC match finder, and it
looks basically the same as the row-based match finder. The speedup on
random data looks similar. And performance is about neutral, without the
big difference at level 12 for either clang or gcc.
In Python 3.x, a single element of a bytes array is returned as
an integer number. Thus, NEWLINE is an int variable, and attempting
to add it to the line array will fail with a type mismatch error
that may be demonstrated as follows:
[roam@straylight ~]$ python3 -c 'b"hello" + b"\n"[0]'
Traceback (most recent call last):
File "<string>", line 1, in <module>
TypeError: can't concat int to bytes
[roam@straylight ~]$
* Mark all bufferless and block level functions as deprecated
* Update documentation to suggest not using these functions
* Add `_deprecated()` wrappers for functions that we use internally and
call those instead
* patch-from speed optimization: only load portion of dictionary into normal matchfinders
* test regression for x8 multiplier
* fix off-by-one error for bit shift bound
* restrict patchfrom speed optimization to strategy < ZSTD_btultra
* update results.csv
* update regression test
Part 2 of #3528
Adds hash salt that helps to avoid regressions where consecutive compressions use the same tag space with similar data (running zstd -b5e7 enwik8 -B128K reproduces this regression).
- Adds memory type that is guaranteed to have been initialized at least once in the workspace's lifetime.
- Changes tag space in row hash to be based on init once memory.
#3543 decreases the size of the tagTable by a factor of 2, which requires using the first tag position in each row for head position instead of a tag.
Although position 0 stopped being a valid match, it still persisted in the mask calculation, resulting in the match loops possibly terminating before they should have. The fix skips position 0 to solve this problem.
Allocate half the memory for tag space, which means that we get one less slot for an actual tag (needs to be used for next position index).
The results is a slight loss in compression ratio (up to 0.2%) and some regressions/improvements to speed depending on level and sample. In turn, we get to save 16% of the hash table's space (5 bytes per entry instead of 6 bytes per entry).
As reported by @P-E-Meunier in https://github.com/facebook/zstd/issues/2662#issuecomment-1443836186,
seekable format ingestion speed can be particularly slow
when selected `FRAME_SIZE` is very small,
especially in combination with the recent row_hash compression mode.
The specific scenario mentioned was `pijul`,
using frame sizes of 256 bytes and level 10.
This is improved in this PR,
by providing approximate parameter adaptation to the compression process.
Tested locally on a M1 laptop,
ingestion of `enwik8` using `pijul` parameters
went from 35sec. (before this PR) to 2.5sec (with this PR).
For the specific corner case of a file full of zeroes,
this is even more pronounced, going from 45sec. to 0.5sec.
These benefits are unrelated to (and come on top of) other improvement efforts currently being made by @yoniko for the row_hash compression method specifically.
The `seekable_compress` test program has been updated to allow setting the compression level,
in order to produce these performance results.
The current timeout is too small for some slower machines, e.g. most modern riscv64 boards,
where tests fail with the following diagnostics:
Traceback (most recent call last):
File "/usr/src/RPM/BUILD/zstd-1.5.4-alt2/tests/./cli-tests/run.py", line 734, in <module>
success = run_tests(tests, opts)
File "/usr/src/RPM/BUILD/zstd-1.5.4-alt2/tests/./cli-tests/run.py", line 601, in run_tests
tests[test_case.name] = test_case.run()
File "/usr/src/RPM/BUILD/zstd-1.5.4-alt2/tests/./cli-tests/run.py", line 285, in run
return self.analyze()
File "/usr/src/RPM/BUILD/zstd-1.5.4-alt2/tests/./cli-tests/run.py", line 275, in analyze
self._join_test()
File "/usr/src/RPM/BUILD/zstd-1.5.4-alt2/tests/./cli-tests/run.py", line 330, in _join_test
(stdout, stderr) = self._test_process.communicate(timeout=self._opts.timeout)
File "/usr/lib64/python3.10/subprocess.py", line 1154, in communicate
stdout, stderr = self._communicate(input, endtime, timeout)
File "/usr/lib64/python3.10/subprocess.py", line 2006, in _communicate
self._check_timeout(endtime, orig_timeout, stdout, stderr)
File "/usr/lib64/python3.10/subprocess.py", line 1198, in _check_timeout
raise TimeoutExpired(
subprocess.TimeoutExpired: Command '['/usr/src/RPM/BUILD/zstd-1.5.4-alt2/tests/cli-tests/compression/window-resize.sh']' timed out after 60 seconds
* Add ZSTD_setFParams() and ZSTD_setParams() (usage sketch below)
* Modify ZSTD_setCParams() to use ZSTD_setParameter() to avoid a second path setting parameters
* Add unit tests
* Update documentation to suggest using them to replace deprecated functions
Fixes #3396.
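A hedged usage sketch, assuming the experimental-API spellings `ZSTD_CCtx_setParams()` and `ZSTD_getParams()` (the exact prototypes introduced by this PR may differ):
```
#define ZSTD_STATIC_LINKING_ONLY  /* assumed: the setters live in the experimental section */
#include <zstd.h>

/* Apply a whole ZSTD_parameters struct in one call, instead of a series of
 * individual ZSTD_CCtx_setParameter() calls. */
static size_t applyLevel19Params(ZSTD_CCtx* cctx, unsigned long long srcSize)
{
    ZSTD_parameters const params = ZSTD_getParams(19, srcSize, 0 /* no dict */);
    return ZSTD_CCtx_setParams(cctx, params);  /* sets cParams and fParams together */
}
```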
- Initializes clevel in `ZSTD_CCtxParams_init`
- Adds CI workflow for msan fuzzers runs without optimization (`-O0`)
- Fixes Makefile to correctly pass on user-defined `MOREFLAGS` and `FUZZER_FLAGS` in case they have been overwritten
The block splitter confuses sequences with literal length == 65536 that use a
repeat offset code. It interprets this as literal length == 0 when deciding the
meaning of the repeat offset, and corrupts the repeat offset history. This is
benign, merely causing suboptimal compression performance, if the confused
history is flushed before the end of the block, e.g. if there are 3 consecutive
non-repeat code sequences after the mistake. It is also only triggered if the
block splitter decided to split the block.
All that to say: This is a rare bug, and requires quite a few conditions to
trigger. However, the good news is that if you have a way to validate that the
decompressed data is correct, e.g. you've enabled zstd's checksum or have a
checksum elsewhere, the original data is very likely recoverable. So if you were
affected by this bug please reach out.
The fix is to remind the block splitter that the literal length is actually 64K.
The test case is a bit tricky to set up, but I've managed to reproduce the issue.
Thanks to @danlark1 for alerting us to the issue and providing us a reproducer!
fix #3500
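For reference, a sketch of repeat-offset resolution as defined by the zstd format specification (not the block splitter's actual code), showing why truncating a literal length of 65536 to 0 selects the wrong branch:
```
#include <stdint.h>

/* offCode 1..3 are repeat codes; their meaning shifts when litLength == 0. */
static uint32_t resolveRepeat(uint32_t offCode, uint32_t litLength,
                              const uint32_t rep[3])
{
    if (litLength != 0)   /* a 64K literal length must take this path */
        return rep[offCode - 1];
    /* litLength == 0: code 1 -> rep[1], code 2 -> rep[2], code 3 -> rep[0] - 1 */
    return (offCode == 3) ? rep[0] - 1 : rep[offCode];
}
```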
CMake 3.18 or later was required by #3392, because it uses
`CheckLinkerFlag`. But requiring CMake 3.18 or later is a bit
aggressive, since Ubuntu 20.04 LTS still ships CMake 3.16.3:
https://packages.ubuntu.com/search?keywords=cmake
This change disables the `-z noexecstack` check with old CMake. This will
not break any existing users, because users who need `-z noexecstack`
must already be using CMake 3.18 or later.
Implemented a CI workflow for testing compilation with external compressors and without them. This serves as a sanity check to avoid any code dependencies on libraries that may not always be present. (Reference: #3497 for a bug fix related to this issue.)
`Bytef` and `uInt` are zlib types, which are not available when zlib is disabled.
Fixes: 1598e6c634ac ("Async write for decompression")
Fixes: cc0657f27d81 ("AsyncIO compression part 2 - added async read and asyncio to compression code (#3022)")
* Fixes zstd-dll build (https://github.com/facebook/zstd/issues/3492):
- Adds pool.o and threading.o dependency to the zstd-dll target
- Moves custom allocation functions into header to avoid needing to add dependency on common.o
- Adds test target for zstd-dll
- Adds a github workflow that builds zstd-dll
* fix and test MSVC AVX2 build
* treat msbuild warnings as errors
* fix incorrect MSVC 2019 compiler warning
* fix MSVC error D9035: option 'Gm' has been deprecated and will be removed in a future release
We need to run it for the tests, even if programs are disabled. So if
they are disabled, create a build rule for the program, but don't
install it. Just make it available for the test itself.
It depends on the zstd program being built, and passes it as an env
variable. Just like datagen. But for datagen, we explicitly depend on
it, while for zstd, we assume it's built as part of "all".
This can be wrong in two cases:
- when running individual tests, meson can (re)build just what is needed
for that one test
- a later patch will handle building zstd but not by default
This frame is invalid because the `Window_Size = 0`, and the
`Block_Maximum_Size = min(128 KB, Window_Size) = 0`. But the empty
compressed block has a `Block_Content` size of 2, which is invalid.
The fix is to switch to using a `Window_Descriptor` instead of the
`Single_Segment_Flag`. This sets the `Window_Size = 1024`.
Hexdump before this PR: `28b5 2ffd 2000 1500 0000 00`
Hexdump after this PR: `28b5 2ffd 0000 1500 0000 00`
For issue #3482.
such scenario can happen, for example,
when trying a decompression-only benchmark on invalid data.
Other possibilities include an allocation error in an intermediate step.
So far, the benchmark would return immediately, but still return 0.
On the command line, this would be confusing, as the program appears successful (though it does not display any success message).
Now it returns !0, which can be interpreted as an error by the command line.
Note that the `fd` is only valid while the file is still open. So we need to
move the setting calls to before we close the file. However! We cannot do so
with the `utime()` call (even though `futimens()` exists) because the
following `close()` call to the `fd` will reset the atime of the file. So it
seems the `utime()` call has to happen after the file is closed.
Somewhat surprisingly, calling `fchmod()` is non-trivially faster than calling
`chmod()`, and so on.
This commit introduces alternate variants to some common file util functions
that take an optional fd. If present, they call the `f`-variant of the
underlying function. Otherwise, they fall back to the regular filename-taking
version of the function.
In 32-bit mode, ZSTD_getOffsetInfo() can be called when nbSeq == 0, and
in this case the offset table is uninitialized. The function should just
return 0 for both values, because there are no sequences.
Credit to OSS-Fuzz
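A minimal sketch of the guard, under an assumed signature:
```
/* With no sequences, the offset table was never initialized: report
 * "no long offsets, max offset 0" and return before touching the table. */
static void getOffsetInfo(unsigned* longOffsets, unsigned* maxOffset, int nbSeq)
{
    if (nbSeq == 0) { *longOffsets = 0; *maxOffset = 0; return; }
    /* ... otherwise scan the (now initialized) offset decoding table ... */
}
```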
The 32-bit decoder could corrupt the regenerated data by using regular
offset mode when there were actually long offsets. This is because we
were only considering the window size in the calculation, not the
dictionary size. So a large dictionary could allow longer offsets.
Fix this in two ways:
1. Instead of looking at the window size, look at the total referencable
bytes in the history buffer. Use this in the comparison instead of
the window size. Additionally, we were comparing against the wrong
value, it was too low. Fix that by computing exactly the maximum
offset for regular sequence decoding.
2. If it is possible that we have long offsets due to (1), then check
the offset code decoding table, and if the decoding table's maximum
number of additional bits is no more than STREAM_ACCUMULATOR_MIN,
then we can't have long offsets.
This gates us to be using the long offsets decoder only when we are very
likely to actually have long offsets.
Note that this bug only affects the decoding of the data, and the
original compressed data, if re-read with a patched decoder, will
correctly regenerate the original data. Except that the encoder also had
the same issue previously.
This fixes both the open OSS-Fuzz issues.
Credit to OSS-Fuzz
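A hedged sketch of the resulting gate (the constant's value and the names are assumed for illustration):
```
#include <stddef.h>

#define STREAM_ACCUMULATOR_MIN 25  /* assumed 32-bit value, for illustration */

/* Long-offset mode is needed only if the referencable history (window plus
 * dictionary) can exceed the accumulator, and the offset decoding table can
 * actually produce that many additional bits. */
static int useLongOffsets(size_t windowSize, size_t dictSize,
                          unsigned maxOfAdditionalBits)
{
    size_t const referencable = windowSize + dictSize;  /* not the window alone */
    if (referencable <= ((size_t)1 << STREAM_ACCUMULATOR_MIN)) return 0;
    return maxOfAdditionalBits > STREAM_ACCUMULATOR_MIN;
}
```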
The previous code had an issue when `bitsConsumed == 32` it would read 0
bits for the `ofBits` read, which violates the precondition of
`BIT_readBitsFast()`. This can happen when the stream is corrupted.
Fix this issue by always reading the maximum possible number of extra
bits. I've measured neutral decoding performance, likely because this
branch is unlikely, but this should be faster anyways. And if not, it is
only 32-bit decoding, so performance isn't as critical.
Credit to OSS-Fuzz
Delete all unused FSE functions, now that we are no longer syncing
to/from upstream.
This avoids confusion about Zstd's stack usage like in Issue #3453.
It also removes dead code, which is always a plus.
The input bounds checks were buggy because they were only breaking from
the inner loop, not the outer loop. The fuzzers found this immediately.
The fix is to use `goto _out` instead of `break`.
This condition can happen on corrupted inputs.
I've benchmarked before and after on x86-64 and there were small changes
in performance, some positive and some negative, and they end up roughly
balancing out.
Credit to OSS-Fuzz
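A minimal illustration of the control-flow fix, using a stand-in decoding loop rather than the actual one:
```
#include <stddef.h>

static size_t decodeAll(const unsigned char* ip, const unsigned char* iend)
{
    size_t decoded = 0;
    while (ip < iend) {                 /* outer loop */
        int n;
        for (n = 0; n < 4; n++) {       /* inner unrolled loop */
            if (ip >= iend) goto _out;  /* `break` would only exit this loop */
            decoded += *ip++;           /* stand-in for real decoding work */
        }
    }
_out:
    return decoded;
}
```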
Previously, cli-test would, by default, check that a stderr output is strictly identical to a saved outcome.
When there were no instructions on how to interpret stderr, it would default to requiring it to be empty.
There are many tests cases though where stderr content doesn't matter, and we are mainly interested in the return code of the cli.
For these cases, it was possible to set a .ignore document, which would instruct to ignore stderr content.
This PR updates the logic to make .ignore the default.
To check that stderr content is empty, one must now add an empty .strict file.
This will allow status messages to evolve without triggering many cli-tests errors.
This is especially important when some of these status include compression results, which may change as a result of compression optimizations.
It also makes it easier to add new tests which only care about the CLI's return code.
[Bugfix] CLI row hash flags set the wrong values
The `--[no-]row-match-finder` flags do the opposite of what they are supposed to.
In effect, the `no` option would activate the row hash match finder, while the other option would disable it.
This commit fixes the issue and changes the code to use the more readable enum values.
Before calling a dictionary good, make sure that it can compress an
input. If v0.7.3 rejects v0.7.3's dictionary, fall back to the v1.0
dictionary. It is not the job of the version test to test this, because
we cannot fix this code.
Add generic C versions of the fast decoding loops to serve architectures
that don't have an assembly implementation. Also allow selecting the C
decoding loop over the assembly decoding loop through a zstd
decompression parameter `ZSTD_d_disableHuffmanAssembly`.
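A hedged usage sketch of the new parameter (assuming it sits in the experimental API, hence `ZSTD_STATIC_LINKING_ONLY`):
```
#define ZSTD_STATIC_LINKING_ONLY
#include <zstd.h>

/* Force the portable C decoding loops instead of the assembly ones. */
static ZSTD_DCtx* createDCtxNoAsm(void)
{
    ZSTD_DCtx* const dctx = ZSTD_createDCtx();
    if (dctx != NULL)
        ZSTD_DCtx_setParameter(dctx, ZSTD_d_disableHuffmanAssembly, 1);
    return dctx;
}
```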
I benchmarked on my Intel i9-9900K and my Macbook Air with an M1 processor.
The benchmark command forces zstd to compress without any matches, using
only literals compression, and measures only Huffman decompression speed:
```
zstd -b1e1 --compress-literals --zstd=tlen=131072 silesia.tar
```
The new fast decoding loops outperform the previous implementation uniformly,
but don't beat the x86-64 assembly. Additionally, the fast C decoding loops suffer
from the same stability problems that we've seen in the past, where the assembly
version doesn't. So even though clang gets close to assembly on x86-64, it still
has stability issues.
| Arch | Function | Compiler | Default (MB/s) | Assembly (MB/s) | Fast (MB/s) |
|---------|----------------|--------------|----------------|-----------------|-------------|
| x86-64 | decompress 4X1 | gcc-12.2.0 | 1029.6 | 1308.1 | 1208.1 |
| x86-64 | decompress 4X1 | clang-14.0.6 | 1019.3 | 1305.6 | 1276.3 |
| x86-64 | decompress 4X2 | gcc-12.2.0 | 1348.5 | 1657.0 | 1374.1 |
| x86-64 | decompress 4X2 | clang-14.0.6 | 1027.6 | 1659.9 | 1468.1 |
| aarch64 | decompress 4X1 | clang-12.0.5 | 1081.0 | N/A | 1234.9 |
| aarch64 | decompress 4X2 | clang-12.0.5 | 1270.0 | N/A | 1516.6 |
Fix the following documentation bugs:
* Note that `ZSTD_estimate*` functions are not compatible with the external matchfinder API
* Note that `ZSTD_estimateCStreamSize_usingCCtxParams()` is not compatible with `nbWorkers >= 1`
* Remove incorrect warning that the legacy streaming API is incompatible with advanced parameters and/or dictionary compression
* Note that `ZSTD_initCStream()` is incompatible with dictionary compression
* Warn that
`zstd` CLI has progressively moved to the policy of
ignoring the `--rm` command when the output is `stdout`.
The primary driver is to offer a behavior more consistent with `gzip`,
where `--rm` is the default, but is also ignored when output is `stdout`.
Other policies are certainly possible, but would break from this `gzip` convention.
The new policy was inconsistently enforced, depending on the exact list of commands.
For example, it was possible to circumvent it by using `-c --rm` in this order,
which would re-establish source removal.
- Updated the CLI so that it necessarily catches these situations and ensures that `--rm` is always disabled when output is `stdout`.
- Added a warning message in this case (for verbosity 3 `-v`).
- Added an `assert()`, which checks that `--rm` is no longer active with `stdout`.
- Added tests, which check the behavior, even when `--rm` is added after `-c`.
- Removed some legacy code which was trying to apply a specific policy for the `stdout` + `--rm` case, which is no longer possible.
In GCC, we can add a couple more flags to give us confidence that the profile
data is actually being found and used.
Also, my system for example doesn't have a binary installed under the name
`llvm-profdata`, but it does have, e.g., `llvm-profdata-13`, etc. So this
commit adds a variable that can be overridden.
Older versions of zstandard have a bug in the dictionary builder, that
can cause dictionary building to fail. The process still exits 0, but
the dictionary is not created.
For reference, the bug is that it creates a dictionary that starts with
the zstd dictionary magic, in the process of writing the dictionary header,
but the header isn't fully written yet, and zstd fails compressions in
this case, because the dictionary is malformed. We fixed this later on
by trying to load the dictionary as a zstd dictionary, but if that fails
we fall back to content only (by default).
The fix is to:
1. Make the dictionary deterministic by sorting the input files.
Previously the bug would only sometimes occur, when the input files
were in a particular order.
2. If dictionary creation fails, fall back to the `head` dictionary.
* Cap shortCache chainLog to 24
* Cap row match finder hashLog so that rowLog <= 24
* Add unit tests to expose all cases. The row match finder unit tests
are only run in 64-bit mode, because they allocate ~1GB.
Fixes #3336
The dictionary source files were taken from the `dev` branch before this
commit, which could introduce non-determinism on PR jobs. Instead take
the sources from the PR checkout.
This PR also adds stderr logging, and verbose output for the jobs that
are failing, to help catch the failure if it occurs again.
* deprecate advanced streaming functions
* remove internal usage of the deprecated functions
* nit
* suppress warnings in tests/zstreamtest.c
* purge ZSTD_initDStream_usingDict
* nits
* c90 compat
* zstreamtest.c already disables deprecation warnings!
* fix initDStream() return value
* fix typo
* wasn't able to import private symbol properly, this commit works around that
* new strategy for zbuff
* undo zbuff deprecation warning changes
* move ZSTD_DISABLE_DEPRECATE_WARNINGS from .h to .c
The timer storage type is no longer dependent on OS.
This will make it possible to re-enable posix precise timers,
since the timer storage type will no longer be sensitive to #include order.
See #3168 for details of the problems with the previous interface.
Suggestion by @terrelln
* Add a function and macro ZSTD_decompressionMargin() that computes the
decompression margin for in-place decompression. The function computes
a tight margin that works in all cases, and the macro computes an upper
bound that will only work if flush isn't used (usage sketch after this list).
* When doing in-place decompression, make sure that our output buffer
doesn't overlap with the input buffer. This ensures that we don't
decide to use the portion of the output buffer that overlaps the input
buffer for temporary memory, like for literals.
* Add a simple unit test.
* Add in-place decompression to the simple_round_trip and
stream_round_trip fuzzers. This should help verify that our margin stays
correct.
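A hedged usage sketch of the margin function (buffer layout per the description above; error handling abbreviated, and the experimental-API placement is assumed):
```
#define ZSTD_STATIC_LINKING_ONLY
#include <zstd.h>
#include <stdlib.h>
#include <string.h>

/* In-place decompression: the compressed input sits at the tail of a buffer
 * sized originalSize + margin, and the output grows from the front. */
static size_t decompressInPlace(const void* src, size_t srcSize, size_t originalSize)
{
    size_t const margin = ZSTD_decompressionMargin(src, srcSize);
    if (ZSTD_isError(margin)) return margin;
    {   size_t const bufSize = originalSize + margin;
        unsigned char* const buf = (unsigned char*)malloc(bufSize);
        size_t dSize;
        if (buf == NULL) return (size_t)-1;             /* allocation failure */
        memcpy(buf + bufSize - srcSize, src, srcSize);  /* input at the tail */
        dSize = ZSTD_decompress(buf, originalSize,
                                buf + bufSize - srcSize, srcSize);
        /* real code would consume buf's contents before freeing */
        free(buf);
        return dSize;
    }
}
```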
A minor change in 5434de0 changed a `<=` into a `<`,
and as an indirect consequence allowed attempting literal compression when there are only 6 literals to compress
(the previous limit was effectively 7 literals).
This is not in itself a problem, as the threshold is merely a heuristic,
but it surfaced a bug that has always been there, and was just never triggered so far due to the previous limit.
This bug would make the literal compressor believe that all literals are the same symbol,
but for the exact case where nbLiterals==6, plus a pretty wild combination of other limit conditions,
this outcome could be false, resulting in data corruption.
Replaced the blind heuristic by an actual test for all limit cases,
so that even if the threshold is changed again in the future,
the detection of RLE mode will remain reliable.
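A minimal sketch of what such an exact test looks like (illustrative, not the actual patch):
```
#include <stddef.h>

/* Literals qualify for RLE mode only if every byte equals the first one. */
static int allLiteralsIdentical(const unsigned char* lit, size_t nbLits)
{
    size_t i;
    if (nbLits == 0) return 0;
    for (i = 1; i < nbLits; i++)
        if (lit[i] != lit[0]) return 0;
    return 1;
}
```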
At Google we fuzz zstd without ZSTD_MULTITHREAD, but we want inputs to be as reproducible as possible. This allows us to test new fuzzing methods internally for our fuzz team, and gives us more horsepower to find bugs.
Comparing level 19 to level 22 and expecting a strictly better result from level 22
is not guaranteed,
because levels 19 and 22 are very close to each other,
especially for small files,
so any noise in the final compression result
results in this test failing.
Level 22 could be compared to something much lower, like level 15.
But level 19 is required anyway, because there is a clamping test which depends on it.
Removed level 22, kept level 19
fix #3328
In situations where the alphabet size is very small,
the evaluation of literal costs from the Optimal Parser is initially incorrect.
It takes some time to converge, during which compression is less efficient.
This is especially important for small files,
because there will not be enough data to converge,
so most of the parsing is selected based on incorrect metrics.
After this patch, the scenario of #3328 gets fixed,
delivering the expected 29-byte compressed size (smallest known compressed size).
Inspired by #3395,
offer a new capability to set all parameters defined in a ZSTD_compressionParameters structure
with a single symbol invocation
to improve user code brevity.
Reported by @shulib:
the specification for 4-streams mode
doesn't work when the amount of literals to compress is 5 bytes.
Extending it, it also doesn't work for sizes 1 or 2.
This patch updates the specification and the implementation
to require a minimum of 6 literals to trigger or accept the 4-streams mode.
The impact is expected to be a no-op:
the 4-streams mode is never triggered for such a small quantity of literals anyway,
since it would be wasteful (it costs ~7.3 bytes more than single-stream mode).
An informal lower limit is set at ~256 bytes,
so the technical minimum is very far from this limit.
This is just meant for completeness of the specification.
Basic tests for (de)compressing in the following modes:
* file to file
* file to stdout
* stdin to file
* stdin to stdout
These are basic tests, and aren't testing more advanced scenarios, but
they add the groundwork for more complex tests as needed.
Fixes #3010.
Thanks to @eli-schwartz for pointing it out!
We should maybe consider adding a helper function for applying
`ZSTD_parameters` and `ZSTD_compressionParameters` to a context.
That would aid the transition to the new API in situations like this.
`./run.py --set-exact-output` will update `stdout.expect` and
`stderr.expect` to match the expected output. This doesn't apply to
outputs which use `.glob` or `.ignore`.
The one that isn't pinned is the OSS-Fuzz builder and runner. They don't
offer tagged releases. I could pin to the current master commit, but I'm not
sure how desirable that is.
Tell CMake to explicitly compile our assembly as C code, because we
require that it is compiled by a C compiler, and it is only enabled for
clang/gcc.
Fixes #3193.
```
for f in $(find . \( -path ./.git -o -path ./tests/fuzz/corpora -o -path ./tests/regression/data-cache -o -path ./tests/regression/cache \) -prune -o -type f);
do
sed -i 's/\-2021/-present/' $f;
done
g co HEAD -- .github/workflows/dev-short-tests.yml # fix bad match
```
```
for f in $(find . \( -path ./.git -o -path ./tests/fuzz/corpora \) -prune -o -type f);
do
sed -i 's/Facebook, Inc\./Meta Platforms, Inc. and affiliates./' $f;
done
```
* Add `Portability.h` to fix min/max issues.
* Fix conversion warnings
* Assert that windowLog <= 23, which is currently always the case.
This could be loosened, but we aren't looking to add new functionality.
Fixes on top of PR #3375 by @eli-schwartz, which added Windows CI for contrib & programs.
When spawning a Windows thread, we have a small worker wrapper function that translates
between the interfaces of Windows and POSIX threads.
This wrapper is given a pointer that might get stale before the worker starts running,
resulting in UB and crashes.
This commit adds synchronization so that we know the wrapper has finished reading the data
it needs before we allow the main thread to resume execution.
1. If threads are resized the threads' `ZSTD_pthread_t` might move
while the worker still holds a pointer into it (see more details in #3120).
2. The join operation was waiting for a thread and then returning its `thread.arg`
as a return value, but since the `ZSTD_pthread_t thread` was passed by value it
would have a stale `arg` that wouldn't match the thread's actual return value.
This fix changes the `ZSTD_pthread_join` API and removes support for returning
a value. This means that we are diverging from the `pthread_join` API and this
is no longer just an alias.
In the future, if needed, we could return a Windows thread's return value using
`GetExitCodeThread`, but as this path wouldn't be exercised in any case, it's
preferable to not add it right now.
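A minimal sketch of the start-synchronization pattern, shown with POSIX primitives for brevity (the actual fix targets the Windows wrapper):
```
#include <pthread.h>

typedef struct {
    void* (*start_routine)(void*);
    void* arg;
    pthread_mutex_t mut;
    pthread_cond_t cond;
    int initialized;
} ThreadParams;

/* The wrapper copies everything it needs out of `params`, then signals the
 * creating thread that `params` may be reused or freed. */
static void* workerWrapper(void* opaque)
{
    ThreadParams* const params = (ThreadParams*)opaque;
    void* (*start_routine)(void*) = params->start_routine;  /* copy first */
    void* const arg = params->arg;
    pthread_mutex_lock(&params->mut);
    params->initialized = 1;
    pthread_cond_signal(&params->cond);
    pthread_mutex_unlock(&params->mut);
    /* `params` may be stale from here on; only the local copies are used. */
    return start_routine(arg);
}
```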
Fix `zdict.h` static linking only section so if you include it twice it
still exposes the static linking only symbols. E.g. this pattern:
```
#include <zdict.h>
#define ZDICT_STATIC_LINKING_ONLY
#include <zdict.h>
```
This can easily happen when a header you include includes `zdict.h`.
`ZSTD_LIB_MINIFY` broke in 8bf699aa59372d7c2bb4216bcf8037cab7dae51e.
This commit fixes the macro and the static library shrinks from ~600K to 324K
with ZSTD_LIB_MINIFY set.
Fixes #3066.
1. Follow the scheme introduced in PR #2501 for both `zdict.h` and `zstd_errors.h`.
2. If the `*_VISIBLE` macro isn't set, but the `*_VISIBILITY` macro is, use that.
Also make this change for `zstd.h`, since we probably shouldn't have changed
that macro name without backward compatibility in the first place.
3. Change all references to `*_VISIBILITY` to `*_VISIBLE`.
Fixes #3359.
It seems like with the deletion of Travis CI we didn't successfully transfer the
version compatibility test. Attempt to enable the version compatibility test.
There are a couple of oddities here. We don't attempt to build e.g.
contrib, because that doesn't seem to work at the moment. Also notice
that each command is its own step. This happens because github actions
runs in powershell, which doesn't seem to let you abort on the first
failure.
playTests.sh has an option to run really slow tests. This is enabled by
default in Meson, but what we really want is to do as the Makefile does:
run the fast ones by default, with an option to run the slow
ones instead.
It's entirely possible some people don't have valgrind installed, but
still want to run the tests. If they don't have it installed, then they
probably don't intend to run those precise test targets anyway.
Also, this solves an error when running the tests in an automated
environment. The valgrind tests have a hard dependency on behavior such
as `./zstd` erroring out with the message "stdin is a console, aborting"
which does not work if the automated environment doesn't have a console.
As a rough heuristic, automated environments lacking a console will
*probably* also not have valgrind, so avoiding that test definition
neatly sidesteps the issue.
Also, valgrind is not easily installable on macOS, at least homebrew
says it isn't available there. This makes it needlessly hard to
enable the testsuite on macOS.
In commit 031de3c69ccbf3282ed02fb49369b476730aeca8 a feature of Meson
0.50.0 was added, but the minimum specified version of Meson is 0.48.0.
Meson therefore emitted a warning:
WARNING: Project targets '>=0.48.0' but uses feature introduced in '0.50.0': required arg in compiler.has_header.
And if anyone actually used Meson 0.48.0 to build with, it would error
out with mysterious claims that the build file itself is invalid, rather
than telling the user to install a newer version of Meson.
Solve this by bumping the minimum version to align with reality. This
e.g. drops support for Debian oldstable (buster)'s packaged version of
Meson, but still works if backports are enabled, or if the user can
`pip install` a newer version.
In commit 031de3c69ccbf3282ed02fb49369b476730aeca8 some code was added
that returned a boolean, but was treated as if it returned a dependency
object. This wasn't tested and could not work. Moreover, zstd no longer
built at all unless the entire programs directory was disabled and not
even evaluated.
Fix the return type checking.
It uses non-portable compiler options unconditionally. Elsewhere, we
check the compiler ID and only add the right ones, globally. Do the same
here.
NDEBUG can actually be handled by a core option, so while we are moving
things around, do so.
Unfortunately, this doesn't fix things entirely. The remaining issue is
not Meson's issue though -- MSVC simply does not like this source code
and somehow chokes on innocent code with the inscrutable "syntax error"
and "illegal token".
fixed #3323, reported by @nigeltao
Completed documentation around this risk
(which is largely theoretical,
I can't see that happening in any "real world" scenario,
but an erroneous @srcSize value could indeed trigger it).
Fix an off-by-one error in the compressor that emits corrupt blocks if:
* Zstd is compiled in 32-bit mode
* The windowLog == 25 exactly
* An offset of 2^25-3, 2^25-2, 2^25-1, or 2^25 is emitted
* The bitstream had 7 bits leftover before writing the offset
This bug has been present since before v1.0, but wasn't able to easily
be triggered, since until somewhat recently zstd wasn't able to find
matches that were within 128KB of the window size.
Add a test case, and fix 2 bugs in `ZSTD_compressSequences()`:
* The `ZSTD_isRLE()` check was incorrect. It wouldn't produce
corruption, but it could waste CPU and not emit RLE even if the block
was RLE
* One windowSize was `1 << windowLog`, not `1u << windowLog` (illustrated below)
Thanks to @tansy for finding the issue, and giving us a reproducer!
Fixes Issue #3350.
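A short illustration of the `1u << windowLog` item above (assuming windowLog can reach 31, where the signed shift overflows):
```
#include <stdio.h>

int main(void)
{
    unsigned const windowLog = 31;
    /* `1 << 31` overflows signed int: undefined behavior, typically INT_MIN,
     * which sign-extends when widened. `1u << 31` is well defined. */
    unsigned long long const bad  = (unsigned long long)(1  << windowLog);
    unsigned long long const good = (unsigned long long)(1u << windowLog);
    printf("signed: %llx  unsigned: %llx\n", bad, good);
    return 0;
}
```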
Delete unaligned memory access code from the legacy codebase by removing all the
non-memcpy functions. We don't care about speed at all for this codebase, only
simplicity.
Fix an instance of `NULL + 0` in `ZSTD_decompressStream()`. Also, improve our
`stream_decompress` fuzzer to pass `NULL` in/out buffers to
`ZSTD_decompressStream()`, and fix 2 issues that were immediately surfaced.
Fixes #3351
Split the logic for parameter adaption from the logic to update the display rate.
This decouples the two updates, so changes to display updates don't affect
parameter adaption.
Also add a test case that checks that parameter adaption actually happens.
This fixes Issue #3353, where --adapt is broken when --no-progress is passed.
Instead of using packed attribute hack, just use aligned attribute. It
improves code generation on armv6 and armv7, and slightly improves code
generation on aarch64. GCC generates identical code to regular aligned
access on ARMv6 for all versions between 4.5 and trunk, except GCC 5
which is buggy and generates the same (bad) code as packed access:
https://gcc.godbolt.org/z/hq37rz7sb
* Centralize the logic about whether to print the progress bar or not in
the `*_PROGRESS()` macros.
* Centralize the logic about whether to print the summary line or not in
`FIO_shouldDisplayFileSummary()` and
`FIO_shouldDisplayMultipleFileSummary()`.
* Make `--progress` work for non-zstd (de)compressors.
* Clean up several edge cases in compression and decompression progress
printing along the way. E.g. wrong log level, or missing summary line.
One thing I don't like about stdout mode, which sets the display level
to 1, is that warnings aren't displayed. After this PR, we could change
stdout mode from lowering the display level, to defaulting to implied
`--no-progress`. But, I think that deserves a separate PR.
We've been unable to effectively test cases where stdin/stdout/stderr
are consoles, because in our test cases they generally aren't. Allow the
command line flags `--fake-std{in,out,err}-is-console` to tell the CLI
to pretend that std{in,out,err} is a console.
Newer gcc versions were getting smart and omitting the `memset()`.
Get around this issue by outlining the `memset()` into a different
function. This test is still hacky, but it works...
Run the scraper command to establish the project version immediately,
rather than wait for the build to be configured. This simplifies the
code and ensures that project introspection works correctly.
Disable ASM in the kernel for now. It requires a few changes & setup to
get working. Instead of doing it in a zstd version update, I'd prefer to
package that change as a single patch, and propose it separately from
the version update. This makes the version update easier, and reduces
some risk.
The zstd_common module was added upstream in commit
637a642f5c.
But the kernel specific code was inlined into the library. This commit
switches it to use the out of line method that we use for the other
modules.
Use a switch statement to select the search function instead of an
indirect function call. This results in a sizable performance win.
This PR is a modification of the approach taken in PR #2828.
When I measured performance for that commit, it was neutral.
However, I now see a performance regression on gcc, but still
neutral on clang. I'm measuring on the same platform, but with
newer compilers. The new approach beats both the current dev
branch and the baseline before PR #2828 was merged.
This PR is necessary for Issue #3275, to update zstd in the kernel.
Without this PR there is a large regression in greedy - btlazy2
compression speed. With this PR it is about neutral.
gcc version: 12.2.0
clang version: 14.0.6
dataset: silesia.tar
| Compiler | Level | Dev Speed (MB/s) | PR Speed (MB/s) | Delta |
|----------|-------|------------------|-----------------|--------|
| gcc | 5 | 102.6 | 113.7 | +10.8% |
| gcc | 7 | 66.6 | 74.8 | +12.3% |
| gcc | 9 | 51.5 | 58.9 | +14.3% |
| gcc | 13 | 14.3 | 14.3 | +0.0% |
| clang | 5 | 108.1 | 114.8 | +6.2% |
| clang | 7 | 68.5 | 72.3 | +5.5% |
| clang | 9 | 53.2 | 56.2 | +5.6% |
| clang | 13 | 14.3 | 14.7 | +2.8% |
The binary size stays just about the same for clang and gcc, measured
using the `size` command:
| Compiler | Branch | Text | Data | BSS | Total |
|----------|--------|---------|------|-----|---------|
| gcc | dev | 1127950 | 3312 | 280 | 1131542 |
| gcc | PR | 1123422 | 2512 | 280 | 1126214 |
| clang | dev | 1046254 | 3256 | 216 | 1049726 |
| clang | PR | 1048198 | 2296 | 216 | 1050710 |
Add a `--spdx` option to the freestanding script to prefix
files with a line like (for `.c` files):
// SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause
or (for `.h` and `.S` files):
/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */
Given that the style of the line to be used depends on the extension,
a simple `sed` insert command would not work.
It also skips the file if an existing SPDX line is there,
raises an error if an unexpected SPDX line appears
anywhere else in the file, and likewise errors on unexpected
file extensions.
I double-checked that all currently generated files appear
to be licensed as expected with:
grep -LRF 'This source code is licensed under both the BSD-style license (found in the' linux/lib/zstd
grep -LRF 'LICENSE file in the root directory of this source tree) and the GPLv2 (found' linux/lib/zstd
but somebody knowledgeable on the licensing of the project should
double-check this is the intended case.
Fixes: https://github.com/facebook/zstd/issues/3293
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Currently this function actually reads the dict ID from the dictionary's
header, via `ZSTD_getDictID_fromDict()`. But during decompression the
decompressor actually compares the dict ID in the frame header with the dict
ID in the DDict. Now of course the dict ID in the dictionary contents and the
dict ID in the DDict struct *should* be the same. But in cases of memory
corruption, where they can drift out of sync, it's misleading for this
function to read it again from the dict buffer rather than return the dict ID
that will actually be used.
Also, doing it this way avoids rechecking the magic number and so on, so it
is a tiny bit more efficient.
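A hedged sketch of the change (field and type names are stand-ins, not zstd's internals):
```
typedef struct { unsigned dictID; /* ... other DDict fields ... */ } DDict;

/* Report the dict ID cached in the DDict, i.e. the value actually compared
 * against the frame header, instead of re-parsing the dictionary buffer. */
static unsigned getDictID_fromDDict(const DDict* ddict)
{
    return (ddict == NULL) ? 0 : ddict->dictID;
}
```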
Since your Makefile uses obj/$(HASH_DIR) for object files, this code does not work correctly for GCC, because profiles are saved in one directory but expected in another when reading.
`$(RM) zstd *.o` - this line doesn't delete the object files.
Clang stores profiles in the current directory, so the problem doesn't appear when compiling with Clang.
This code also works if BUILD_DIR is set.
Clang doesn't allow [[deprecated(...)]] attribute after __attribute__.
Move [[deprecated(...)]] before __attribute__ to fix C++14/C++17 uses
with Clang.
Fix#3250
* add checks for malformed numeric values for memory and memlimit parameters
Signed-off-by: Ly Cao <lycao@fb.com>
* changed errorMsg to a literal string instead of static string in main
* moved bogus numeric error to NEXT_UINT32 + add macro NEXT_TSIZE
Signed-off-by: Ly Cao <lycao@fb.com>
Signed-off-by: Ly Cao <lycao@fb.com>
Co-authored-by: Ly Cao <lycao@fb.com>
As mentioned in
https://github.com/facebook/zstd/pull/3252#issuecomment-1251733791 ,
this patch adds a CI job that builds and installs libzstd on the job
runner, and then compiles a sample binary linking against the installed
library; the needed build flags are passed by invoking pkg-config.
Compare 4 bytes instead of 1 byte in ZSTD_noDict
mode, to avoid cases that match at match[ml]
but mismatch at match[ml-3]..match[ml-1]. This
reduces the call count of ZSTD_count.
Signed-off-by: Jun He <jun.he@arm.com>
Change-Id: I3449ea423d5c8e8344f75341f19a2d1643c703f6
When creating a new `Makefile` target to build,
it's also necessary to update the `clean` target,
whose purpose is to remove built targets when they are present.
This process is simple, but it's also easy to forget,
since there is a large distance between the position in the `Makefile` where the new built target is added
and the place where the list of files to `clean` is defined.
Moreover, the list of files becomes pretty long over time,
hence it's difficult to visually ensure that all built targets are present there,
or that no stale target (no longer produced) remains in the list.
This PR tries to improve this process by adding a CLEAN variable.
Now, when a new built target is added to the `Makefile`,
it should be preceded by:
```
CLEAN += newTarget
newTarget:
<TAB> ...recipe...
```
This new requirement is somewhat similar to `.PHONY: newTarget` for non-built targets.
This new method offers the advantage of locality:
there is no separate place in the file to maintain a list of files to clean.
This makes maintenance of `make clean` easier.
With this patch the pkg-config generation when using the CMake build
system is improved in the following ways:
- Libs.private is now filled when needed
- The JoinPaths module is now used to join paths, leading to simpler
code
- The .pc file is always generated, regardless of the platform, as it
can also be consumed on Windows
Here's how the .pc file is affected by these changes, in comparison to
the one generated with the official Makefiles:
$ diff -s lib/libzstd.pc build/cmake/build-old/lib/libzstd.pc
15c15
< Libs.private: -pthread
---
> Libs.private:
$ diff -s lib/libzstd.pc build/cmake/build-new/lib/libzstd.pc
Files lib/libzstd.pc and build/cmake/build-new/lib/libzstd.pc are
identical
Adds documentation to help and man pages for legacy pass-through behavior
when force is set and destination is stdout. Documents --pass-through in
man pages
Fixes#3211.
Adds the `--[no-]pass-through` flag which enables/disables pass-through mode.
* `zstdcat`, `zcat`, and `gzcat` default to `--pass-through`.
Pass-through mode can be disabled by passing `--no-pass-through`.
* All other binaries default to not setting pass-through mode.
However, we preserve the legacy behavior of enabling pass-through
mode when writing to stdout with `-f` set, unless pass-through
mode is explicitly disabled with `--no-pass-through`.
Adds a new test for this behavior that should codify the behavior we want.
fileio_types.h cannot be parsed by itself
because it relies on basic types defined in `lib/common/mem.h`.
As for #3231, it likely wasn't detected because `mem.h` was probably included beforehand within target files.
But this is not proper.
A "easy" solution would be to add the missing include,
but each dependency should be considered "bad" by default,
and only allowed if it brings some tangible value.
In this case, since these types are only used to declare internal structure variables
which are effectively only flags,
I believe it's really not valuable to add a dependency on `mem.h` for this purpose
while the standard `int` type can do the same job.
I was expecting some compiler warnings following this change,
but it turns out we don't use `-Wconversion` by default on `zstd` source code,
so there is none.
Nevertheless, I enabled `-Wconversion` locally and proceeded to fix a few conversion warnings in the process.
Adding `-Wconversion` to the list of flags used for `zstd` is something I would be favorable over the long term,
but it cannot be done overnight,
because the number of places where this warning is triggered is daunting.
Better to progressively reduce the number of triggered `-Wconversion` warnings before enabling this flag by default.
Fixes #3212.
Long literal and match lengths had an off-by-one error in ZSTD_getSequenceLength.
Fix the off-by-one error, and add a golden compression test that catches the bug.
Also run all the golden tests in the cli-tests framework.
The discussion for this task is here: facebook/zstd#3128.
This task can probably be scoped to the first part: marking these functions deprecated.
We'll later look at removal when we roll out v1.6.0.
When the user passes arguments for both decompression and multi-threading, print a warning message
to indicate that multi-threaded decompression is not supported.
* Add warning when multi-thread decompression is requested
* add test case for multi-threaded decoding warning
The expectation is that for `-d -T0` we will not print any warning,
and that we will see a warning for any other `-d -T(>1)` inputs.
In zlib 1.2.12 the OF macro was changed to _Z_OF, breaking any
project that used zlibWrapper. To fix this, OF has been
changed to _Z_OF everywhere, and _Z_OF is defined as OF in
case it is not yet defined, for zlib 1.2.11 and older.
Fixes: https://github.com/facebook/zstd/issues/3216
* Fixing compiler warnings
* Replace the old -s flag with the -Wl,-s flag
* Fixing compiler warnings
* Fixing the linker strip flag and tests/code not working as expected on AIX
With statistics gathered from the silesia test data files,
the chance of a position beyond highThreshold is very
low (~1.3% at level 8 in most cases, all <2.5%), i.e. it is in the
"low probability area". Add a branch hint so the compiler can
generate better pipelined code.
With this change, an uplift of ~1% is observed on mozilla and
xml, and a slight (0.3%~0.8%) but consistent uplift on
the other files, on Arm N1.
Signed-off-by: Jun He <jun.he@arm.com>
Change-Id: Id9ba1d5c767e975290b5c1bf0ecce906544f4ade
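A hedged sketch of the hinted loop, modeled on FSE's table-spread step (macro and names are illustrative):
```
#if defined(__GNUC__)
#  define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#  define UNLIKELY(x) (x)
#endif

/* Positions beyond highThreshold are rare (<2.5% measured), so hint the
 * retry branch as unlikely to get better pipelined code. */
static unsigned nextSpreadPosition(unsigned position, unsigned step,
                                   unsigned tableMask, unsigned highThreshold)
{
    do {
        position = (position + step) & tableMask;
    } while (UNLIKELY(position > highThreshold));
    return position;
}
```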
match is used for the subsequent sequence copy. It is
only updated when extDict is needed, which is a
low-probability case. So it can be prefetched to
reduce cache misses.
The benchmarks on various Arm platforms showed
uplifts from 1% to 14% with gcc-11/clang-14.
Signed-off-by: Jun He <jun.he@arm.com>
Change-Id: If201af4799d2455d74c79f8387404439d7f684ae
Summary:
Freeing an uninitialized pointer is undefined behavior. This caused a segfault
when compiling the benchmark with Clang -O3 and benching decompression.
V2: always create compressInstructions but check if cctxParams is NULL before
setting CCtx params to avoid segfault.
Test Plan:
make and run
Summary:
Added an option -p# where -p0 (default) sets the aggregation method to fastest
speed while -p1 sets the aggregation method to median. Also added a new column
in the csv file to report this option's value.
Test Plan:
```
$ ./largeNbDicts -1 --nbDicts=1 -D ~/benchmarks/html/html_8_16K.32K.dict
~/benchmarks/html/html_8_16K/*
loading 7450 files...
created src buffer of size 83.4 MB
split input into 7450 blocks
loading dictionary /home/zhuhan/benchmarks/html/html_8_16K.32K.dict
compressing at level 1 without dictionary : Ratio=3.03 (28827863 bytes)
compressed using a 32768 bytes dictionary : Ratio=4.28 (20410262 bytes)
generating 1 dictionaries, using 0.1 MB of memory
Compression Speed : 306.0 MB/s
Fastest Speed : 310.6 MB/s
$ ./largeNbDicts -1 --nbDicts=1 -p1 -D ~/benchmarks/html/html_8_16K.32K.dict
~/benchmarks/html/html_8_16K/*
loading 7450 files...
created src buffer of size 83.4 MB
split input into 7450 blocks
loading dictionary /home/zhuhan/benchmarks/html/html_8_16K.32K.dict
compressing at level 1 without dictionary : Ratio=3.03 (28827863 bytes)
compressed using a 32768 bytes dictionary : Ratio=4.28 (20410262 bytes)
generating 1 dictionaries, using 0.1 MB of memory
Compression Speed : 306.9 MB/s
Median Speed : 298.4 MB/s
```
Summary:
Add column headers and data for whether it's a compression or a decompression
run, compression level, nbDicts and dictAttachPref, in addition to
compression/decompression speed.
Test Plan:
Example output:
```
./largeNbDicts
Compression/Decompression,Level,nbDicts,dictAttachPref,Speed
Compression,1,1,0,300.9
Compression,1,1,1,296.4
Compression,1,1,2,307.8
Compression,1,10,0,292.3
Compression,1,100,0,293.3
Compression,3,110,0,106.0
Decompression,-1,110,-1,155.6
Decompression,-1,110,-1,709.4
Decompression,-1,120,-1,709.1
Decompression,-1,120,-1,734.6
```
Benchmarking decompression results in a segfault in `createCompressInstructions`
because `cctxParams` is NULL. Skip running that function if we are not benching
compression.
* Initial commit to address #3090. Added support to decompress empty block
* Update zstd_decompress_block.c
Addressed review comments for the case of 'set_basic'
* Update lib/decompress/zstd_decompress_block.c
Co-authored-by: Nick Terrell <nickrterrell@gmail.com>
* Update lib/decompress/zstd_decompress_block.c
Co-authored-by: Nick Terrell <nickrterrell@gmail.com>
Co-authored-by: Nick Terrell <nickrterrell@gmail.com>
The discussion for this task is here: facebook/zstd#3128.
This task can probably be scoped to the first part: marking these functions deprecated.
We'll later look at removal when we roll out v1.6.0.
Streaming decompression used to wait for a minimum of 5 bytes before attempting decoding.
This meant that, in the case that only a few bytes (<5) were provided,
and assuming these bytes are incorrect,
there would be no error reported.
The streaming API would simply request more data, waiting for at least 5 bytes.
This PR makes it possible to detect incorrect Frame IDs as soon as the first byte is provided.
Fix #3169
* first attempt at fast DMS short cache
* significant wins for some scenarios
* fix all clang regressions
* nits
* fix 1.5% gcc11 regression on hot 110Kdict scenario
* fix CI
* nit
* Add tags to doublefast hash table
* use tags in doublefast DMS
* Fix CI
* Clean up some hardcoded logic / constants
* Switch forCCtx to an enum
* nit
* add short cache to ip+1 long search
* Move tag size into hashLog
* Minor nits
* Truncate dictionaries greater than 16MB in short cache mode
* Helper function for tag comparison
* Cap short cache hashLog at 24 to prevent overflow
* size_t dictTagsMatch -> int dictTagsMatch
* nit
* Clean up and comment dictionary truncation
* Move ZSTD_tableFillPurpose_e next to ZSTD_dictTableLoadMethod_e
* Comment and expand helper functions
* Asserts and documentation
* nit
This assert slows the loop down by 10x. We can get similar
coverage by asserting at the beginning & end of the loop.
We need this fix because Debian compiles zstd with asserts
enabled. Separately, we should ask them why, and if they would
consider disabling asserts in their builds, since we don't
optimize for assert-enabled builds.
Fixes Issue #3150.
ZSTD_seqSymbol is a structure that is 64 bits
wide in total. So it can be loaded in one operation,
and its fields extracted by simply shifting or
masking, on aarch64.
GCC doesn't recognize this and generates more
unnecessary ldr/ldrb/ldrh operations that cause
a performance drop.
With this change, a 2~4% uplift is observed on
silesia and 2.5~6% on cantrbry at level 8 on Arm N1.
Signed-off-by: Jun He <jun.he@arm.com>
Change-Id: I7748909204cf78a17eb9d4f2333692d53239daa8
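A hedged sketch of the single-load idea (the struct layout mirrors ZSTD_seqSymbol; the extraction macros assume little-endian byte order):
```
#include <stdint.h>
#include <string.h>

typedef struct {
    uint16_t nextState;
    uint8_t  nbAdditionalBits;
    uint8_t  nbBits;
    uint32_t baseValue;
} SeqSymbol;                       /* 64 bits in total */

/* Load the whole entry once, then extract fields by shifting, instead of
 * letting the compiler emit separate ldrh/ldrb/ldr loads. */
static uint64_t loadSeqSymbol(const SeqSymbol* s)
{
    uint64_t v;
    memcpy(&v, s, sizeof(v));      /* a single 64-bit load */
    return v;
}
#define SYM_NEXTSTATE(v) ((uint16_t)(v))
#define SYM_ADDBITS(v)   ((uint8_t)((v) >> 16))
#define SYM_NBBITS(v)    ((uint8_t)((v) >> 24))
#define SYM_BASEVALUE(v) ((uint32_t)((v) >> 32))
```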
On aarch64, ZSTD_wildcopy uses a simple loop to do
a 16-byte-based memory copy. There is an existing
optimized two-stage copy that can achieve better performance.
By applying this to aarch64, a ~1% uplift is also
observed on the silesia corpus.
Signed-off-by: Jun He <jun.he@arm.com>
Change-Id: Ic1253308e7a8a7df2d08963ba544e086c81ce8be
We found that movemask is not used properly or consumes too much CPU.
This effort helps to optimize the movemask emulation on ARM.
For levels 8-9 we saw 3-5% improvements. For level 10 we saw a 1.5%
improvement.
The key idea is not to use pure movemasks but to have groups of bits.
For rowEntries == 16, 32 we are going to have groups of size 4 and 2
respectively. It means that each bit will be duplicated within the group.
Then we do an AND so that only one bit is set in each group, so that iteration
with the lowest-bit step `a &= (a - 1)` works as well.
Also, aarch64 does not have rotate instructions for 16 bit, only for 32
and 64; that's why we see more improvements for levels 8-9.
The vshrn_n_u16 instruction is used to achieve that: it shifts every u16
right by 4 and narrows it to the 8 lower bits.
It's also used in
[Folly](c570259008/folly/container/detail/F14Table.h (L446)).
It also uses 2 cycles according to Neoverse-N{1,2} guidelines.
The 64-bit movemask is already well optimized. We have ongoing experiments,
but were not able to validate that other implementations work reliably faster.
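A hedged sketch of the vshrn-based group-mask for the 16-entry case (NEON intrinsics as described above; not the exact zstd code):
```
#include <arm_neon.h>
#include <stdint.h>

/* Pack 16 per-lane compare results (0x00/0xFF) into a 64-bit value where
 * each input lane contributes a 4-bit group: vshrn_n_u16 shifts every u16
 * right by 4 and narrows it to its 8 lower bits. */
static uint64_t groupMask16(uint8x16_t eqMask)
{
    uint16x8_t const as16   = vreinterpretq_u16_u8(eqMask);
    uint8x8_t  const packed = vshrn_n_u16(as16, 4);
    return vget_lane_u64(vreinterpret_u64_u8(packed), 0);
}
```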
This commit avoids checking whether a hashtable write is safe in two of the
three match-found paths in `ZSTD_compressBlock_fast_noDict_generic`. This
produces a ~0.5% speed-up in compression.
A comment in the code describes why we can skip this check in the other two
paths (the repcode check and the first match check in the unrolled loop).
A downside is that in the new position where we make this check, we have not
yet computed `mLength`. We therefore have to avoid writing *possibly* dangerous
positions, rather than the old check which only avoided writing *actually*
dangerous positions. This leads to a minuscule loss in ratio (remember that
this scenario can only be triggered in very negative levels or under
incompressibility acceleration).
Partial, Meson-only implementation of #2976 for non-MSVC builds.
Due to the prevalence of private symbol reuse, linking to a shared
library is utterly unreliable, but we still want to defer to the
shared library for installable applications. By linking to both, we can
share symbols where possible, and statically link where needed.
This means we no longer need to manually track every file that needs to
be extracted and reused.
The flip side is that MSVC completely does not support this, so for MSVC
builds we just link to a full static copy even where
-Ddefault_library=shared.
As a side benefit, by using library inclusion rather than including
extra explicit object files, the zstd program shrinks in size slightly
(~4kb).
These need to be explicitly included as we use their private symbols,
but we don't need to recompile them when we can reuse the existing
objects.
Minus 7 compile steps.
The poolTests program already linked to libzstd, and later to
libtestcommon with included libzstd objects. So this was redundant.
Minus 4 compile steps.
Adopt the more standard Usage: formatting style
List short and long options alongside where available
Print lists as a table
Use command style description
Newer `less` versions appear to have changed how stderr
and stdout show error messages. Hardcode the
expected behavior to make the tests pass with any `less` version.
Also set the locale to C so that the strings match.
- As referenced by Nick Terrell, the zstd maintainer in the Linux kernel, making zstd_reset_cstream() functionally identical to ZSTD_resetCStream() would be the perfect way to fix the warning without touching any core functions or breaking other parts of the code.
Suggested-by: Nick Terrell <terrelln@fb.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Set removeSrcFile back to false when -c or --stdout is used to improve
compatibility with gzip(1) behavior.
gzip(1) removes the original file on compression unless --stdout or
-c is used. zstd defaults to keeping the file unless --rm is used or
when it is called via a gzip symlink, in which case it removes by
default. Specifying -c/--stdout turns this behavior off.
- The previous patch throws the following warning:
../linux/lib/zstd/zstd_compress_module.c: In function ‘zstd_reset_cstream’:
../linux/lib/zstd/zstd_compress_module.c:136:34: error: enum conversion when passing argument 2 of ‘ZSTD_CCtx_reset’ is invalid in C++ [-Werror=c++-compat]
136 | return ZSTD_CCtx_reset(cstream, pledged_src_size);
| ^~~~~~~~~~~~~~~~
In file included from ../linux/include/linux/zstd.h:26,
from ../linux/lib/zstd/zstd_compress_module.c:15:
../linux/include/linux/zstd_lib.h:501:20: note: expected ‘ZSTD_ResetDirective’ {aka ‘enum <anonymous>’} but argument is of type ‘long long unsigned int’
501 | ZSTDLIB_API size_t ZSTD_CCtx_reset(ZSTD_CCtx* cctx, ZSTD_ResetDirective reset);
| ^~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
Since we have a choice to either use ZSTD_CCtx_reset or ZSTD_CCtx_setPledgedSrcSize instead of ZSTD_resetCStream, let's switch to ZSTD_CCtx_setPledgedSrcSize to not have any unnecessary warnings alongside the kernel build and CI test build.
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
- This fixes the below warning:
../lib/zstd/zstd_compress_module.c: In function 'zstd_reset_cstream':
../lib/zstd/zstd_compress_module.c:136:9: warning: 'ZSTD_resetCStream' is deprecated [-Wdeprecated-declarations]
136 | return ZSTD_resetCStream(cstream, pledged_src_size);
| ^~~~~~
In file included from ../include/linux/zstd.h:26,
from ../lib/zstd/zstd_compress_module.c:15:
../include/linux/zstd_lib.h:2277:8: note: declared here
2277 | size_t ZSTD_resetCStream(ZSTD_CStream* zcs, unsigned long long pledgedSrcSize);
| ^~~~~~~~~~~~~~~~~
ZSTD_resetCStream is deprecated, and ZSTD_CCtx_reset is suggested in its place, hence let's switch to it.
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
* fix the assertion in readLinesFromFile
When the file is not terminated by a newline, readLineFromFile will append
a '\0' for the last line. In this case pos + lineLength == dstCapacity.
* test: don't print very long text garbage
We already have BIT_getLowerBits, so use it. Benefits are twofold:
1) Somewhat cleaner code
2) We are now using bzhi instructions, when available. Performance
delta is too small for microbenchmarks, but avoiding the load still helps
larger applications by reducing data cache pressure.
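A minimal sketch of the shift-and-mask form that compilers can lower to `bzhi` when BMI2 is available (illustrative; BIT_getLowerBits itself may be implemented differently):
```
#include <stdint.h>

/* Keep the nbBits lowest bits of v; with BMI2 enabled, compilers can emit a
 * single bzhi instead of loading a mask from a table. Assumes nbBits < 64. */
static uint64_t getLowerBits(uint64_t v, unsigned nbBits)
{
    return v & ((1ULL << nbBits) - 1);
}
```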
The zstd symlinks, notably `zstdcat`, weren't working as expected
because only the `tests/cli-tests/bin/zstd` wrapper was symlinked. We
still invoked `zstd` with the name `zstd`. The fix is to create a
directory of zstd symlinks in `tests/cli-tests/bin/symlinks` for each
name that zstd recognizes. And when `tests/cli-tests/bin/zstd` is
invoked, it selects the correct symlink to call.
See the test `zstd-cli/zstdcat.sh` for an example of how it would work.
* playtests.sh: fix for a bug in macos' /bin/sh that persists temporary env vars when introduced before function calls
* cli-tests/run.py: Do not use existing ZSTD* envvars
The use of --long alters the window size internally in the underlying
library (lib/compress/zstd_compress.c:ZSTD_getCParamsFromCCtxParams),
which changes the memory required for decompression. This means that the
reported requirement from the zstd binary when -vv is specified is
incorrect.
A full fix for this would be to add an API call to be able to retrieve
the required decompression memory from the library, but as a
lighter-weight fix we can just take into account the fact that we've enabled
long mode and update our verbose output appropriately.
Fixes #2968
credit to oss-fuzz
This issue could happen when using the new Sequence Compression API in Explicit Delimiter Mode
with a too small dstCapacity.
In that case, there was one place where the buffer size wasn't checked.
Currently the build errors out with:
```
ERROR: This script does not work on Python 3.6 The minimum supported Python version is 3.7. Please use https://bootstrap.pypa.io/pip/3.6/get-pip.py instead.
```
While in theory this advice could be followed to get a better pip on
xenial, Meson has now deprecated python 3.6 support too, and the next
(unreleased) version requires python 3.7
There are a couple solutions to this:
- hold the version of pip, allow pip to only install 3.6-compatible
versions of meson (effectively freezing meson going forward)
- install python 3.7 on xenial
- update to a 2-year-old image instead of a 4-year-old one
Option 3 is the simplest.
While trying to raise an exception on failures, it instead raised an
exception for misusing the exception. CalledProcessError is only
supposed to be used when given a return code and a command, and it
prints:
Command '{cmd}' returned non-zero exit status {ret}
Passing an error message string instead, just errored out with:
TypeError: __init__() missing 1 required positional argument
Instead use the subprocess module's base error which does accept string
messages. Everything that used to error out, still errors out, but now
they do so with a slightly prettier console message.
libm is not guaranteed to exist. POSIX requires the math functions to
exist, but doesn't require to have it be a standalone library.
On platforms where libm exists as a standalone library, it will always
be found by meson -- it is shipped with libc.
If it is not found, then we can safely assume the linker will make the
math functions available by default.
See https://mesonbuild.com/howtox.html#add-math-library-lm-portably
Fixes building with bin_tests=true on Windows.
* It couldn't detect that `fastCoverParams` can't be NULL, since it was just an assertion.
* It thought we were accessing `wksp->dtable` beyond its bounds because we were using it to set the `workSpace` value. Instead, compute the workspace size used in a different way.
discovered by oss-fuzz
It's a bug in the test itself :
ZSTD_compressBound() as an upper bound of the compressed size
only works for data compressed "normally".
But in situations where many flushes are forcefully introduced,
this creates many more blocks,
each of which has a potential to increase the size by 3 bytes.
In extreme cases (lots of small incompressible blocks), the expansion can go beyond ZSTD_compressBound().
This situation is similar when using the CompressSequences() API
with Explicit Block Delimiters.
In that case, each explicit block acts like a deliberate flush.
When employed by a fuzzer, it's possible to generate scenarios like the one described above,
with tons of incompressible blocks of small sizes,
thus going beyond ZSTD_compressBound().
Fix: when using Explicit Block Delimiters, use a larger bound to account for this scenario.
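A hedged sketch of such an enlarged bound (the 3-bytes-per-block margin follows from the expansion described above; the exact formula used by the fix may differ):
```
#include <stddef.h>
#include <zstd.h>

/* With explicit block delimiters, every delimited block can expand by up to
 * 3 bytes (its block header) when emitted uncompressed. */
static size_t boundWithExplicitDelimiters(size_t srcSize, size_t nbBlocks)
{
    return ZSTD_compressBound(srcSize) + nbBlocks * 3;
}
```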
It's a bug in the test itself, in exceptional circumstances (no more space for additional sequence).
There should be enough room for all cases to work fine from now on,
and if not, we have an additional `assert()` to catch that situation.
Add cli-tests to `make test`. This adds a `python3` dependency to `make
test`, but not `make check`. We could make this dependency optional by
skipping the tests if `python3` is not present.
`datagen` was printing a `\n` even when it had no other output. Raise
the output level for the final `\n` to the minimum output level used.
This minor bug was caught by the new testing framework.
The option `--auto-threads` should still be accepted and parsed, even if
`ZSTD_MULTITHREAD` is not defined. It doesn't mean anything, but we
should still accept the option, since we want scripts to be able to work
generically.
This bug was caught by tests I added to the new testing framework.
Passing 0 inputs to `DiB_shuffle()` caused an assertion failure where
it should just return.
A test is added in a later commit, with the initial introduction of the
new testing framework.
Fixes #3007.
credit to oss-fuzz
In rare circumstances, the block-splitter might cut a block at the exact beginning of a repcode.
In that case, since litLength == 0, if the repcode expected 1+ literals in front, its meaning changes.
This scenario is controlled in ZSTD_seqStore_resolveOffCodes(),
and the repcode is transformed into a raw offset when its new meaning is incorrect.
In more complex scenarios, the previous block might be emitted as uncompressed after all,
thus modifying the expected repcode history.
In the case discovered by oss-fuzz, the first block is emitted as uncompressed,
so the repcode history remains at default values: 1,4,8.
But since the starting repcode is repcode3, and the literal length is == 0,
its meaning is : = repcode1 - 1.
Since repcode1==1, it results in an offset value of 0, which is invalid.
So that's what the `assert()` was verifying : the result of the repcode translation should be a valid offset.
But actually, it doesn't matter, because this result will then be compared to reality,
and since it's an invalid offset, it will necessarily be discarded if incorrect,
then the repcode will be replaced by a raw offset.
So the `assert()` is not useful.
Furthermore, it's incorrect, because it assumes this situation cannot happen, but it does, as described in above scenario.
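For illustration, a simplified model of the repcode convention described above (not zstd's actual source; `resolveRepcode` is a hypothetical helper):
```
/* rep[] is the repcode history (most recent first); repcode is 1..3 as stored
 * in the sequence. With litLength == 0, meanings shift by one, and repcode 3
 * becomes "first history entry minus 1", which yields the invalid offset 0
 * when rep[0] == 1 -- exactly the oss-fuzz case described above. */
static unsigned resolveRepcode(unsigned repcode, unsigned litLength,
                               const unsigned rep[3])
{
    if (litLength > 0) return rep[repcode - 1];
    if (repcode < 3)   return rep[repcode];   /* repcode1 -> rep[1], etc. */
    return rep[0] - 1;                        /* repcode3 + litLength==0 */
}
```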
Knowing the version of zlib/lz4/lzma we're linking against is very
useful for debugging issues with those libraries, so print it out in the
verbosity 4 version output.
Also print this information at the top of `playTests.sh`.
which properly represents the maximum bit size of compressed literals (11) as defined in the specification.
To be preferred over HUF_TABLELOG_DEFAULT, which represents the same value, but only by accident.
The name was selected to keep the same convention as the existing width definitions,
MLFSELog, LLFSELog and OffFSELog.
In rare cases, the default huffman depth selector is a bit too harsh,
requiring brutal adaptations to the tree,
resulting in some loss of compression ratio.
This new heuristic avoids the worst cases, favoring compression ratio.
As an example, compression of a specific distribution of 771 literals
is now improved to 441 bytes, from 601 bytes before.
-r on an empty directory resulted in zstd waiting for input from stdin. Now zstd exits without error and prints a warning message explaining why no processing happened (no files or directories to process).
* Refactored fileio.c:
- Extracted asyncio code to fileio_asyncio.c/.h
- Moved type definitions to fileio_types.h
- Moved common macro definitions needed by both fileio.c and fileio_asyncio.c to fileio_common.h
* Bugfix - rename fileio_asycio to fileio_asyncio
* Added copyrights & license to new files
* CR fixes
* Async IO decompression:
- Added --[no-]asyncio flag for CLI decompression.
- Replaced dstBuffer in decompression with a pool of write jobs.
- Added an ability to execute write jobs in a separate thread.
- Added an ability to wait (join) on all jobs in a thread pool (queued and running).
We previously triggered release artifact generation on release creation. We
sometimes observed that the action failed to run. I hypothesized that we were
hitting rate limiting or something. I just stumbled across
[this documentation](https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#release), which says:
> Note: Workflows are not triggered for the `created`, `edited`, or `deleted`
> activity types for draft releases. When you create your release through the
> GitHub browser UI, your release may automatically be saved as a draft.
This must have been what was happening. This commit therefore changes the
trigger to the `published` activity. This should be more reliable.
This does have the unfortunate side effect that artifacts won't be generated
or attached until *after* the release has been published, which is what I was
trying to avoid by using the `created` activity. Oh well.
Append -z cet-report=error to LDFLAGS if -fcf-protection is enabled by
default in compiler to catch the missing Intel CET marker:
```
compiling multi-threaded dynamic library 1.5.1
/usr/local/bin/ld: obj/conf_f408b4c825de923ffc88f7f21b6884b1/dynamic/huf_decompress_amd64.o: error: missing IBT and SHSTK properties
collect2: error: ld returned 1 exit status
...
LINK obj/conf_dbc0b41e36c44111bb0bb918e093d7c1/zstd
/usr/local/bin/ld: obj/conf_dbc0b41e36c44111bb0bb918e093d7c1/huf_decompress_amd64.o: error: missing IBT and SHSTK properties
collect2: error: ld returned 1 exit status
```
When testing amalgamation with other projects, two issues were found: invalid Unicode errors were tripping Python's text IO, and the header search order appears to differ from the shell version.
Intel Control-flow Enforcement Technology (CET):
https://en.wikipedia.org/wiki/Control-flow_integrity#Intel_Control-flow_Enforcement_Technology
requires that on Linux, all linker input files are marked as CET enabled
in .note.gnu.property section. For high-level language source codes,
.note.gnu.property section is added by compiler with the -fcf-protection
option. For assembly sources, include <cet.h> to add .note.gnu.property
section.
When decompressing with `-q` and an output file, the progress bar was mistakenly printed. This is a minimal fix, with a larger refactor to be stacked on top of it.
Fixes #2967.
oss-fuzz uncovered a scenario where we're evaluating the cost of litLength = 131072,
which can't be represented in the zstd format, so we accessed 1 beyond LL_bits.
Fix the issue by making it cost 1 bit more than litLength = 131071.
There are still follow ups:
1. This happened because literals_cost[0] = 0, so the optimal parser chose 36 literals
over a match. Should we bound literals_cost[literal] > 0, unless the block truly only
has one literal value?
2. When no matches are found, the cost model isn't updated. In this case no matches were
found for an entire block. So the literals cost model wasn't updated at all. That made
the optimal parser think literals_cost[0] = 0, where it is actually quite high, since
the block was entirely random noise.
Credit to OSS-Fuzz.
xxHash symbols are present in `libzstd.so`, but they are local and therefore
unavailable outside the lib. There are two possible solutions to the problem.
We could make those symbols global, or we could remove the dependency.
This commit chooses the latter approach. I suppose this comes at the cost of
code size / build time. I'm open to comments on whether this is a good thing
to do, especially since this will apply even when we are statically linking
everything.
This commit makes several changes:
1. It adds modules for the dictionary builder and errors headers.
2. It captures all of the macros that are used to configure these headers.
When the headers are imported as modules and one of these macros is defined,
the compiler issues a warning that it needs to be defined on the CLI.
3. It promotes the modulemap file into the root of the lib directory.
Experimentation shows that clang's `-fimplicit-module-maps` will find the
modulemap when placed here, but not when it's put in a subdirectory.
After merging #2951 I realized that we will want to explicitly disable
assembly when we aren't including the assembly source file. Otherwise,
if some non clang/gcc compiler does support assembly, it would fail to
build.
When re-using a compression state, across multiple successive compressions,
the state should minimize the amount of allocation and initialization required.
This mostly matters in situations where initialization is an overwhelming task
compared to compression itself.
This can happen when the amount to compress is small,
while the compression state was given the impression that it would be much larger,
aka, streaming mode without providing a srcSize hint.
This lean-initialization optimization was broken in 980f3bbf8354edec0ad32b4430800f330185de6a.
This commit fixes it, making this scenario once again on par with v1.4.9.
Note that this does not completely fix #2966,
since another heavy initialization, specific to row mode,
is also happening (and was not present in v1.4.9).
This will be fixed in a separate commit.
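For context, a sketch of the scenario described above (error handling elided): streaming compression reusing one context, with no srcSize hint, so the state is sized for the general case while each input may be small.
```
#include <zstd.h>

static size_t compressOneShotStreaming(ZSTD_CCtx* cctx,
                                       void* dst, size_t dstCapacity,
                                       const void* src, size_t srcSize)
{
    ZSTD_inBuffer  in  = { src, srcSize, 0 };
    ZSTD_outBuffer out = { dst, dstCapacity, 0 };
    /* session reset keeps parameters; lean re-initialization makes this cheap */
    ZSTD_CCtx_reset(cctx, ZSTD_reset_session_only);
    return ZSTD_compressStream2(cctx, &out, &in, ZSTD_e_end); /* 0 when done */
}
```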
Apparently, even when the assembly file is empty (because
`ZSTD_ENABLE_ASM_X86_64_BMI2` is false), it still is marked as possibly
needing an executable stack and so the whole library is marked as such. This
commit applies a simple patch for this problem by moving the noexecstack
indication outside the macro guard.
This commit builds on #2857.
This commit addresses #2963.
directly at ZSTD_storeSeq() interface.
In the process, remove ZSTD_REP_MOVE.
This makes it possible, in future commits,
to update and effectively simplify the naming scheme
to properly label the updated processing pipeline :
offset | repcode => offBase => offCode + offBits
the new contracts seem to make more sense:
updateRep() updates an array of repeat offsets _in place_,
while newRep() generates a new structure with the updated repeat-offset array.
Most callers are actually expecting the in-place variant,
and a limited sub-section, in `zstd_opt.c` mainly, prefer `newRep()`.
to act on values stored / expressed in the sumtype numeric representation required by `storedSeq()`.
This makes it possible to abstract away this representation by using the macros to extract these values.
First user : ZSTD_updateRep() .
this is meant to abstract the sumtype representation required
to transfer `offcode` to `ZSTD_storeSeq()`.
Unfortunately, the sumtype numeric representation is currently a leaky abstraction
that has permeated many other parts of the code,
especially within `zstd_lazy.c`, and also within `zstd_opt.c` and `zstd_compress.c`.
While this PR does a good job of transferring a large number of call sites
to the new macros, there are still a few sites where this transformation is more complex,
or where the numeric representation itself is used "as is".
One of the problematic areas is the decision to use the numeric format of the sumtype
within the match finders of `zstd_lazy`.
This commit doesn't change the behavior, it only introduces and employs the macros,
so the resulting code remains identical.
Ultimately, if the numeric representation of the sumtype can be completely abstracted away
and no other part of the code depends on it,
it will be possible to move it towards something slightly more efficient.
`CFLAGS=-O0 make`
will now use `-O0` instead of enforcing `-O3`,
which was the behavior before the introduction of `libzstd.mk`.
This should result in faster tests,
since a few tests depend on this capability for faster roundtrips.
since this is effectively what is stored in this field (== matchLength - MINMATCH).
This makes it clearer what needs to be done when reading from / writing to this field.
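As a small illustration of the convention (MINMATCH is 3 in zstd; the helper names are illustrative):
```
#define MINMATCH 3   /* zstd's minimum match length */

/* the field stores mlBase = matchLength - MINMATCH; converting both ways: */
static unsigned toMlBase(unsigned matchLength) { return matchLength - MINMATCH; }
static unsigned toMatchLength(unsigned mlBase) { return mlBase + MINMATCH; }
```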
Regression from commit a5f2c45528032ed20c33e0f8cd2c163a800a0017. It is
not possible to unconditionally add the asm sources, since not all
compilers understand the .s file extension.
Specifically for meson, only compilers inheriting from the GNU mixin
will allow a .s file at configure time.
zstd doesn't support asm for MSVC for the same basic reason; if/when
Windows asm support is added, it would involve preprocessing with nasm,
most likely.
the variable has only very limited usage,
being only used once at the beginning of the block for prefetching only,
hence the error had no impact on compression ratio.
This saves some 1.7 KB in the rodata section (x86_64, zstd tool),
while the assembler code stays the same except for
the type of a few load/extend instructions.
Should not have negative performance implications.
mostly for maintenance convenience.
Performance wise, there is very little change,
slightly faster for slog 3 & 4,
neutral or very slightly negative for slog 5 & 6.
I couldn't find a good way to spread `ip0` and `ip1` apart when we accelerate
due to incompressible inputs. (The methods I tried slowed things down quite a
bit.)
Since we aren't splaying ip0 and ip1 apart (which would be like `0_1_2_3_`, as
opposed to the `01__23__` we were actually doing), it's a bit ambitious to
increment `step` by 2. Instead, let's increment it by 1, which has the benefit
of sliiightly improving compression. Speed remains pretty much unchanged.
The position updates are rewritten from `ip[N] = ip[N-1] + step` to be
`ip[N] = ip[N-2] + step`. This lets us only deal with the asymmetric spacing
of gaps at setup and then we only have to keep a single `step` variable.
This seems to work quite well on GCC and Clang!
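An illustrative skeleton of the rewritten advance (not the real match finder; the search body is elided): ip0/ip1 are staggered once at setup, and afterwards each position derives from the one two steps back, so a single `step` variable suffices.
```
static void scanPositions(const unsigned char* istart,
                          const unsigned char* iend,
                          size_t step)   /* step >= 1 assumed */
{
    const unsigned char* ip0 = istart;
    const unsigned char* ip1 = istart + 1;   /* asymmetric gap, set up once */
    while (ip1 < iend) {
        /* ... hash and search at ip0 and ip1 here ... */
        const unsigned char* const next0 = ip0 + step;  /* ip[N] = ip[N-2] + step */
        const unsigned char* const next1 = ip1 + step;
        ip0 = next0;
        ip1 = next1;
    }
}
```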
This replicates the behavior of @terrelln's `ZSTD_fast` implementation. That
is, it always looks at adjacent pairs of positions, and only applies the
acceleration every other position. This produces a more fine-grained
acceleration.
no idea why visual + clang-cl + appveyor don't like them,
I've not been able to reproduce the issue locally,
but these static asserts are very unlikely to deliver a useful signal,
I can't imagine a situation where they will be wrong,
and if they are, then a ton of other things will be broken way before reaching that point.
I hadn't seen #2890, so I wrote my own version. I like this approach a little
better, since it does an explicit check for a regular file, rather than
passing a magic value.
Addresses #2874.
Move portability macros to `lib/common/portability_macros.h`. This file
only contains platform/feature detection (e.g. 0/1 macros). This file is
shared between C and ASM code, so it cannot include any C code.
Rename `HUF_` ASM macros to be `ZSTD_` prefixed, and move to the new
header.
Restrict `ZSTD_ASM_SUPPORTED` to `__GNUC__`, because we need the GAS
assembler.
Finally, only include the ASM code if we are actually going to use it.
This disables it on all Windows platforms, which should resolve the
problem brought up in Issue #2789.
Use the same trick as we did for zstd_lazy in PR #2828:
* Create one search function specialization for each (dictMode, mls).
* Select the search function pointer at the top of the match finder.
Additionally, we no longer inline `ZSTD_compressBlock_opt_generic` into
every function, since `dictMode` is no longer used as a template. Create
two specializations, for opt levels 0 and 2, and call one of the two
specializations.
Lastly, remove the hack that disabled inlining for zstd_opt for the
Linux Kernel, as we've gotten most of the benefit already.
Compilation time sees a ~4x reduction:
| Compiler | Flags | Dev Time (s) | PR Time (s) | Delta |
|----------|----------------------------------|--------------|-------------|-------|
| gcc | -O3 | 10.1 | 2.3 | -77% |
| gcc | -O3 -fsanitize=address,undefined | 61.1 | 10.2 | -83% |
| clang | -O3 | 9.0 | 2.1 | -76% |
| clang | -O3 -fsanitize=address,undefined | 33.5 | 5.1 | -84% |
Build size is reduced by 150KB - 200KB:
| Compiler | Dev libzstd.a Size (B) | PR libzstd.a Size (B) | Delta |
|----------|------------------------|-----------------------|-------|
| gcc | 1327476 | 1177108 | -11% |
| clang | 1378324 | 1167780 | -15% |
There is a <2% speed loss in all cases:
| Compiler | Level | Dev Speed (MB/s) | PR Speed (MB/s) | Delta |
|----------|-------|------------------|-----------------|--------|
| gcc | 16 | 4.78 | 4.72 | -1.25% |
| gcc | 17 | 3.49 | 3.46 | -0.85% |
| gcc | 18 | 2.92 | 2.86 | -2.04% |
| gcc | 19 | 2.61 | 2.61 | 0.00% |
| clang | 16 | 4.69 | 4.80 | 2.34% |
| clang | 17 | 3.53 | 3.49 | -1.13% |
| clang | 18 | 2.86 | 2.85 | -0.34% |
| clang | 19 | 2.61 | 2.61 | 0.00% |
Fixes Issue #2862.
`lib/deprecated` is no longer built by zstd's bundled build files. However,
users may try to build these files when they import the source tree into
their own build systems. And if they have `-Wdeprecated-declarations` on,
this can produce warnings.
This PR migrates these files away from using deprecated declarations.
This addresses #2767.
because mem.h is dropped in the Linux kernel.
Changed macro definition order (gcc/clang/msvc before c11)
due to a limitation in the kernel source builder.
Changed the fallback to sizeof(),
reverting to the previous behavior when no support for alignof() is detected.
short-tests-0 were silently failing, I think because of the `&& make clean` construction. Switch to `;` instead.
Also fix all the test failures that were exposed.
`make all` is failing on CircleCI because it is missing Docker. Move that test
to GitHub actions, and switch the pedantic CircleCI test to `make allmost`.
51 MB seems excessive for CI storage, especially considering the nature of the test.
(note : maybe we should consider using `/tmp` for files generated during tests,
as tmpfs is typically using RAM, thus preserving storage.)
gcc-5 didn't like the l-value overload for defaulted operator=. There is
no reason it needs to be l-value overloaded, so just remove it.
I'm not sure why the build broke for @mckaygerhard in Issue #2811, since
this code hasn't changed since it was added. But, there is no harm in
fixing it.
Fixes issue #2811.
Tests that libzstd.so doesn't have the exec-stack bit set using
readelf. If the stack is marked executable, systemd will refuse
to link against zstd. We now test that it isn't set on every PR.
Adds a test for PR #2857
Fixes Issue #2865
Allow the `dictContentSize` to be any size. The finalized dictionary
content size must be at least as large as the maximum repcode (8). So we
add zero bytes to the dictionary to ensure that we meet that
requirement.
I've removed this restriction because it's been causing us headaches when
people complain that dictionary training failed. It fails because there
isn't enough useful content to put in the dictionary. Either because
every sample is exactly the same and less than ZDICT_CONTENTSIZE_MIN bytes,
or there isn't enough content. Instead, we should succeed in creating
the dictionary, and it is up to the user to decide if it is worthwhile.
It is possible that the tables alone provide enough value.
NOTE: This allows us to produce dictionaries with finalized
`dictContentSize < ZDICT_CONTENTSIZE_MIN`. But, they are still valid
zstd dictionaries. We could remove the `ZDICT_CONTENTSIZE_MIN` macro,
but I've decided to leave that for now, so we don't break users.
* When dynamic dispatching to bmi2 add lzcnt and bmi to the
TARGET_ATTRIBUTE.
* Centralize the bmi2 TARGET_ATTRIBUTE definition to
BMI2_TARGET_ATTRIBUTE so we can change it in the future.
* Only enable bmi2 when both bmi1 & bmi2 are supported. There shouldn't
be any cases where bmi2 is supported but bmi1 isn't. But, since we are
using the instruction we should check bmi1 as well.
PR #2850 attempted to fix a determinism bug that was uncovered by OSS-Fuzz. It
succeeded in addressing that source of non-determinism, but introduced a new
one: it was possible, when index reduction occurred, to map indices in the
window to the reserved value, which would cause them to be zeroed, potentially
altering parsing of the input.
This PR addresses this issue. It makes sure that the bottom of the window is
always `>= ZSTD_WINDOW_START_INDEX`.
I'm not sure if this makes #2850 redundant. I think it's probably still
valuable to have that protection as well.
Credit to OSS-Fuzz for discovering this issue.
It is no longer necessary to get good performance, there is only a small
speed difference between -O2 and -O3, so just stick to the default of
-O2. I've measured neutral compression speed and a ~3% decompression
speed loss in userspace with clang & gcc. I've also measured neutral
compression speed and a ~1% decompression speed loss in the kernel
benchmarks.
This also fixes the stack space usage on parisc. The compiler was buggy
for -O3 and used ~3KB of stack space for several functions. With -O2 the
problem is completely resolved, and stack space is back to a few hundred
bytes.
Additionally, we get a large code size win on gcc:
| Compiler | Before (Bytes) | After (Bytes) | Delta (Bytes) |
|----------|----------------|---------------|---------------|
| gcc-11 | 952754 | 738954 | -213800 |
| clang-12 | 976290 | 938826 | -37464 |
The optimal parser is unlikely to be used in the linux kernel in
practice. There is no reason these functions should be force inlined,
since we aren't gaining anything, and are losing build size.
| Compiler | Before (Bytes) | After (Bytes) | Delta (Bytes) |
|----------|----------------|---------------|---------------|
| gcc-11 | 1142090 | 952754 | -189336 |
| clang-12 | 1228402 | 976290 | -252112 |
This is a temporary solution pending the resolution of PR #2862 in the
`dev` branch.
Take the same approach as in PR #2828 [0] to remove functions that force
inline many function bodies and `switch`. Instead, create one function per
"template" combination, and then switch between these functions. This
allows the compiler to break the large function into many small
functions, which generally helps codegen.
Also, in the `extDict` modes when there is no ext-dict, call the top
level function instead of the force inlined one, to save on code size.
I'm specifically doing this because gcc on the parisc architecture doesn't
handle the large function body well, and ends up using a lot of excess
stack space. Outlining these functions fixes it.
Putting stack marking into every assembly file is required to indicate
that the stack does not need to be executable.
An executable stack conflicts with some security measures, for example
systemd's MemoryDenyWriteExecute=yes.
Previously, if an index was equal to `reducerValue + 1`, it would get remapped
during index reduction to 1 i.e. `ZSTD_DUBT_UNSORTED_MARK`. This can affect the
parsing of the input slightly, by causing tree nodes to be nullified when they
otherwise wouldn't be. This hardly matters from a correctness or efficiency
perspective, but it does impact determinism.
So this commit changes index reduction to avoid mapping indices to collide with
`ZSTD_DUBT_UNSORTED_MARK`.
that's clearer than finding the tables somewhere in the middle of `compress.c`.
Also, down the line, it may potentially allow zstd to feature adjusted tables depending on the target cpu.
Speed up compilation times by moving each specialized search function
into its own function. This is faster because compilers can handle many
smaller functions much faster than one gigantic function. The previous
approach generated one giant function with `switch` statements and
inlining to select the implementation.
| Compiler | Flags | Dev Time (s) | PR Time (s) | Delta |
|----------|-------------------------------------|--------------|-------------|-------|
| gcc | -O3 | 16.5 | 5.6 | -66% |
| gcc | -O3 -g -fsanitize=address,undefined | 158.9 | 38.2 | -75% |
| clang | -O3 | 36.5 | 5.5 | -85% |
| clang | -O3 -g -fsanitize=address,undefined | 27.8 | 17.5 | -37% |
This also reduces the binary size because the search functions are no
longer inlined into the main body.
| Compiler | Dev libzstd.a Size (B) | PR libzstd.a Size (B) | Delta |
|----------|------------------------|-----------------------|-------|
| gcc | 1563868 | 1308844 | -16% |
| clang | 1924372 | 1376020 | -28% |
Finally, the performance is not impacted significantly by this change,
in fact we generally see a small speed boost.
| Compiler | Level | Dev Speed (MB/s) | PR Speed (MB/s) | Delta |
|----------|-------|------------------|-----------------|-------|
| gcc | 5 | 110.6 | 110.0 | -0.5% |
| gcc | 7 | 70.4 | 72.2 | +2.5% |
| gcc | 9 | 53.2 | 53.5 | +0.5% |
| gcc | 13 | 12.7 | 12.9 | +1.5% |
| clang | 5 | 113.9 | 110.4 | -3.0% |
| clang | 7 | 67.7 | 70.6 | +4.2% |
| clang | 9 | 51.9 | 52.2 | +0.5% |
| clang | 13 | 12.4 | 13.3 | +7.2% |
The compression strategy is unmodified in this PR, so the compressed size
should be exactly the same. I may have a follow up PR to slightly improve
the compression ratio, if it doesn't cost too much speed.
Fix underflow of `nbCompares` by switching to an `int` and comparing
`nbCompares > 0`. This is a minimal fix, because I don't want to change
the logic. These loops seem to be doing `nbCompares + 1` comparisons.
The bug was reported by Dan Carpenter and found by Smatch static
checker.
https://lore.kernel.org/all/20211008063704.GA5370@kili/
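A minimal illustration of the fix described above (names are illustrative, not the zstd source): an unsigned counter decremented at 0 wraps around to a huge value, while a signed counter tested with `> 0` terminates correctly.
```
#include <stddef.h>

static size_t countComparisons(unsigned initialNbCompares)
{
    size_t done = 0;
    int nbCompares = (int)initialNbCompares;
    for (; nbCompares > 0; nbCompares--)
        done++;          /* the loop body stands in for the actual tree search */
    return done;         /* equals initialNbCompares, even when it is 0 */
}
```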
There is no minimum value check, so the parameter could be negative.
Switch to the standard pattern of using `BOUNDCHECK()`.
The bug was reported by Dan Carpenter and found by Smatch static
checker.
https://lore.kernel.org/all/20211008063704.GA5370@kili/
Since we're now hashing the position ahead even if we find a long match and
don't search that next position, we can write it back into the hashtable even
in long matches. This seems to cost us no speed, and improves compression
ratio slightly!
Aside from maybe a latency win in the loop, this means that when we find a
short match, we've already done the hash we need to check the next long match.
* Limit training samples size to 2GB
* simplified DISPLAYLEVEL() macro to use a global variable instead of a local one.
* refactored training samples loading
* fixed compiler warning
* addressed comments from the pull request
* addressed @terrelln comments
* missed some fixes
* fixed type mismatch
* Fixed bug passing the estimated number of samples instead of the loaded number of samples.
Changed unit conversion not to use bit-shifts.
* fixed a declaration after code
* fixed type conversion compile errors
* fixed more type casting
* fixed more type mismatching
* changed sizes type to size_t
* move type casting
* more type cast fixes
PR #2784 introduced a bug in the decompressor that caused some valid
inputs to fail to decompress. The bitstream isn't reloaded after the 4X*
loop if the number of elements remaining is small enough, causing us to
read more bits than are available in the bitcontainer.
This was caught by the MSAN fuzzer in OSS-Fuzz because the assembly
implementation isn't used in the MSAN build.
Credit to OSS-Fuzz.
Multiple ZSTD_createDCtx* functions call other (public)
ZSTD_createDCtx* functions, this makes it harder for humans
and compilers to throw out code that is not used.
This farms out the logic into a static function, if a program
only uses a single ZSTD_createDCtx variant, all others can be easily
dropped and the remaining implementation can be specialized.
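An illustrative shape of that refactor (stand-in types and names, not the zstd source): the shared logic lives in one static function, and each public variant becomes a thin wrapper, so unused variants are easy for the linker to drop.
```
#include <stdlib.h>

typedef struct { int staticAlloc; } DCtx;   /* stand-in for ZSTD_DCtx */

static DCtx* createDCtx_internal(void)
{
    DCtx* const dctx = (DCtx*)malloc(sizeof(DCtx));
    if (dctx != NULL) dctx->staticAlloc = 0;
    return dctx;
}

DCtx* createDCtx(void)          { return createDCtx_internal(); }
DCtx* createDCtx_advanced(void) { return createDCtx_internal(); /* + extra setup */ }
```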
Commit d7ef97a013b5
("[build] Fix oss-fuzz build with the dataflow sanitizer") broke
build inside Linux-kernel after 'import', as it no longer can
conditionally remove ZSTD_MEMORY_SANITIZER definition from
the #if DEF_A || DEF_B block. This emits -Wundef warning which
can be treated as error.
Split this preprocessor condition into two separate conditions
to fix this.
Fixes: d7ef97a013b5 ("[build] Fix oss-fuzz build with the dataflow sanitizer")
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Test the kernel build with the standard warnings enabled so that we
don't miss issues like fixed in PR #2802.
Stacked on top of PR #2802 so CI passes, so it includes the fix. But, I
won't merge until it is merged, so @solbjorn gets the credit for the
fix.
Switch to a macro `ZSTD_FALLTHROUGH;` instead of a comment. On supported
compilers this uses an attribute, otherwise it becomes a comment.
This is necessary to be compatible with clang's `-Wimplicit-fallthrough`, and
gcc's stricter `-Wimplicit-fallthrough` levels, which don't accept comments. Without this the
linux build emits a bunch of warnings.
Also add a test to CI to ensure that we don't regress.
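A simplified sketch of the macro described above (zstd's real definition handles more compilers and C++): use the attribute when available, otherwise expand to nothing, since a macro cannot expand to a comment.
```
#ifndef __has_attribute
#  define __has_attribute(x) 0
#endif
#if __has_attribute(fallthrough)
#  define ZSTD_FALLTHROUGH __attribute__((fallthrough))
#else
#  define ZSTD_FALLTHROUGH
#endif

static int classify(int x)
{
    switch (x) {
    case 0:
        x += 10;
        ZSTD_FALLTHROUGH;   /* deliberate: case 0 continues into case 1 */
    case 1:
        return x + 1;
    default:
        return 0;
    }
}
```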
turns out, it's possible to constify the MatchState* parameter
in some parts of the binary tree algorithm,
making it a pure read-only parameter,
as opposed to a mutable state.
This is supposed to be helpful for both maintenance and the compiler.
The mentioned path is being created/used by the 'import' rule for
generating source files for Linux kernel.
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Commit a5f2c4552803 ("Huffman ASM") added a new ASM source file,
but it wasn't added to the kernel Makefile, even though it received
support for Huffman ASM according to the internal definitions. This
leads to undefined references, as huf_decompress.o now calls those
ASM functions.
Add it to the list of sources when building inside the kernel tree.
Kbuild can handle .S files just fine, so no additional rules are
needed.
Fixes: a5f2c4552803 ("Huffman ASM")
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Linux 5.15 introduces a new Kconfig option, CONFIG_WERROR, which
forces -Werror for the entire kernel.
Current in-kernel ZSTD implementation uses functions deprecated
in 1.5.0, and thus fails on -Wdeprecated-declarations.
Turn this particular error into a warning to be able to build the
kernel with CONFIG_WERROR. I'm not disabling them completely to
make sure they'll be visible and [hopefully] fixed sooner or later.
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Instead of calling `ZSTD_compress_advanced()` and
`ZSTD_initCStream_advanced()`, which each take a `ZSTD_parameters` by
value, use the new advanced API.
Stack usage went from 2024 -> 1944.
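The migration pattern, sketched (the parameter choice is illustrative; all calls shown are the real advanced API): rather than passing a ZSTD_parameters struct by value, set fields individually on the context, then compress.
```
#include <zstd.h>

static size_t compressWithNewApi(ZSTD_CCtx* cctx,
                                 void* dst, size_t dstCapacity,
                                 const void* src, size_t srcSize,
                                 int level, int windowLog)
{
    ZSTD_CCtx_reset(cctx, ZSTD_reset_session_and_parameters);
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, level);
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_windowLog, windowLog);
    return ZSTD_compress2(cctx, dst, dstCapacity, src, srcSize);
}
```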
* Extract out common portion of `lib/Makefile` into `lib/libzstd.mk`.
Most relevantly, the way we find library files.
* Use `lib/libzstd.mk` in the other Makefiles instead of repeating the
same code.
* Add a test `tests/test-variants.sh` that checks that the builds of
`make -C programs allVariants` are correct, and run it in Actions.
* Adds support for ASM files in the CMake build.
The Meson build is not updated because it lists every file in zstd,
and supports ASM off the bat, so the Huffman ASM commit will just add
the ASM file to the list.
The Visual Studio build is not updated because I'm not adding ASM
support to Visual Studio yet.
Test failures showed up on the daily cron job. They didn't show up
in CI because the condition is somewhat rare, and didn't trigger
during the CI tests.
This PR fixes up the logic in `findSynchronizationPoint()` to correctly
handle the edge case. It also un-comments an assert that helps catch the
issue, and verify that rsyncable mode is calculating the correct hash.
After the fix, the test that failed passes:
```
./zstreamtest --newapi -t1 --no-big-tests -s9680
```
In degenerate cases `--rsyncable` could create very small blocks (1
byte). This causes the compressed output to be larger than
`ZSTD_compressBound()`. Fix the issue by ensuring that rsyncable mode
never outputs blocks smaller than 128 KB.
The minimum job size is 512 KB, so we shouldn't lose many
synchronization points from skipping any that cause blocks smaller than
128 KB. And even if we do, that is fine, because we'll find the next
one.
This fixes the `raw_dictionary_round_trip` oss-fuzz assert.
Credit to OSS-Fuzz
better for large files, and sources with relatively "stable" entropy,
like silesia.tar.
slightly worse for files with rapidly changing entropy,
like Calgary.tar.
Updated small files tests in fuzzer
used to be necessary to counter-balance the fixed-weight frequency update
which has been recently changed for an adaptive rate (targeting stable starting frequency stats).
As a library, the default shouldn't be to write anything on console.
`cover` and `fastcover` have a `g_displayLevel` variable to control this behavior.
It's now set to 0 (no display) by default.
Setting notification to a higher level should be an explicit operation by a console application.
This new setup is slightly better on `silesia.tar`:
Ratio : 3.649 -> 3.655
Speed : 11.9 MB/s -> 12.2 MB/s
At the cost of more memory : 24 MB -> 32 MB
The new memory budget is a reasonable interpolation between neighboring levels 12 and 14:
level 12 : 24 MB
level 13 : 32 MB (increased from 24 MB)
level 14 : 48 MB
Window size remains unaffected (4 MB)
It's a bit strange, because this is hitting the dictionary special case where
the dictionary is contiguous with the input and still runs in the single-
segment path.
We should probably change that to hit the `extDict` path instead?
This removes the old `ZSTD_compressBlock_fast_generic()` and renames the new
`ZSTD_compressBlock_fast_generic_pipelined()` to replace it. This is
functionally a no-op.
Unrolling the loop to handle 2 positions in each iteration allows us to reduce
the frequency of some operations that don't need to happen at every position.
One such operation is the step calculation, which is a very rough heuristic
anyways. It's fine if we do this a position later. The other operation is the
repcode check. But since the repcode check already tries expanding back one
position, we're really not missing much of importance by only trying it every
other position.
This commit also slightly reorders some operations.
Amusingly, it seems to be a non-trivial performance hit to add in final
searches or even hash table insertions during cleanup. So let's not. It seems
to not make any meaningful difference in compression ratio.
This was ~30mn, by far the longest run on travisCI.
That's because it re-analyzes multiple times the same files (library files notably).
It also performs actions that make no sense for the static analyzer purpose,
such as building the single-file library.
Reduced time spent in this test by reducing its scope :
just build the CLI, and obviously the library along it.
These are the only ones that really deserve to be analyzed.
Unfortunately, it still results in a number of false positives when using newer versions of scanbuild
(each version of scanbuild generates a different list of false positives).
These will have to be fixed before transferring to GitHub Actions.
by allowing parallel build of units,
and reducing optimization levels.
Parallel build is only effective on "recent" versions of `zstd`,
as previously, the list of units was passed as a list of source files,
which is something neither `make` nor `gcc` can parallelize.
So it is only mildly effective (-20%).
Reducing optimization level to `-O1` makes compilation much faster.
It also makes runtime slower,
but in this test, compilation time dominates run time.
The savings are very significant (-50%).
On my test system, it reduces the length of this test from 13mn to 5mn.
This matches the Makefile build. Due to one private xxhash symbol in use
by the program, it recompiles a private copy of xxhash.
Due to the test binaries making extensive (?) use of private symbols, it
doesn't even attempt to link to shared libzstd, and instead, all of the
original object files are added to libtestcommon itself for private
linkage. This, too, matches the Makefile build.
Ref. #2261
meson prefers that project-level options for Wall/Wextra/pedantic be
used, rather than hardcoding raw flags in add_project_arguments. If you
do the latter anyway, it raises a meson warning.
Set the default options for the project to use all this.
Also move the -Werror comment to the project default options with
appropriate format, but leave it commented out since it does not work.
* Add a Huffman round trip fuzzer
* Fix two minor bugs in Huffman that aren't exposed in zstd
- Incorrect weight comparison (weights are allowed to be equal to
table log).
- HUF_compress1X_usingCTable_internal() can return compressed
size >= source size, so the assert that `cSize <= 65535` isn't
correct, and it needs to be checked instead.
This PR fixes an incorrect comparison in figuring out `minChain` in
`ZSTD_dedicatedDictSearch_lazy_loadDictionary()`. This incorrect comparison
had been masked by the fact that `idx` was always 1, until @terrelln changed
that in #2726.
Credit-to: OSS-Fuzz
The DUBT can be non-deterministic if an index is equal to
`ZSTD_DUBT_UNSORTED_MARK`. Ensure that never happens by starting the
indices at 2.
This bug was found by the OSS-Fuzz determinism fuzzer. With this change
the fuzzer test passes. And I've confirmed that this is the root cause,
not just hiding the problem.
Aside: This took me a long time to figure out, because I thought I had
tried this first thing. But, apparently I messed it up, because when I
was going through it again with @felixhandte, I was pointing out that it
wasn't the case, but it turns out it was.
Credit to: OSS-Fuzz
Add the libzstd.pc target to the lib target in lib/Makefile, which makes
it inherit LDFLAGS_DYNLIB from the lib-mt target. This allows us to add
a Libs.private field to libzstd.pc which gets conditionally populated
with '-pthread'.
The 1.5.0 release notes mention that the static library isn't
multi-threaded by default, due to concern for people building static
binaries with libzstd:
Now the dynamic library supports multi-threaded compression by
default. Note that this property is not extended to the static
library because doing so would have impacted the build script of
existing client applications (requiring them to add -pthread to their
recipe), thus potentially breaking their build.
To get closer to being able to enable multi-threading for all library
builds by default, this commit makes it so that any libzstd consumer
using pkg-config gets the correct flags.
We also fix the indentation of the rule for libzstd.pc and move it
outside the if/endif block for install rules (which uses a list of OSs
where the rules were validated), so the rule is available for all users
of the 'lib*' targets.
* The block splitter missed a bounds check, so when the buffer is too small it
passes an erroneously large size to `ZSTD_entropyCompressSeqStore()`, which
can then write the compressed data past the end of the buffer. This is a new
regression in v1.5.0 when the block splitter is enabled. It is either enabled
explicitly, or implicitly when using the optimal parser and `ZSTD_compress2()`
or `ZSTD_compressStream*()`.
* `HUF_writeCTable_wksp()` omits a bounds check when calling
`HUF_compressWeights()`. If it is called with `dstCapacity == 0` it will pass
an erroneously large size to `HUF_compressWeights()`, which can then write
past the end of the buffer. This bug has been present for ages. However, I
believe that zstd cannot trigger the bug, because it never calls
`HUF_compress*()` with `dstCapacity == 0` because of [this check][1].
Credit to: Oss-Fuzz
[1]: 89127e5ee2/lib/compress/zstd_compress_literals.c (L100)
* Flatten ZSTD_row_getMatchMask
* Remove the SIMD abstraction layer.
* Add big endian support.
* Align `hashTags` within `tagRow` to a 16-byte boundary.
* Switch SSE2 to use aligned reads.
* Optimize scalar path using SWAR.
* Optimize neon path for `n == 32`
* Work around minor clang issue for NEON (https://bugs.llvm.org/show_bug.cgi?id=49577)
* replace memcpy with MEM_readST
* silence alignment warnings
* fix neon casts
* Update zstd_lazy.c
* unify simd preprocessor detection (#3)
* remove duplicate asserts
* tweak rotates
* improve endian detection
* add cast
there is a fun little catch-22 with gcc: the result from pmovmskb has to be cast to uint32_t to avoid a zero-extension,
but must be uint16_t to get gcc to generate a rotate instruction.
* more casts
* fix casts
better work-around for the (bogus) warning: unary minus on unsigned
This patch is supposed to improve compatibility with less featured tar variants
"when the tar program used does not support historical options (without hyphen) nor the '-z' option."
Patch proposed by Antonio Diaz Diaz
Even with -fvisibility=hidden added to CFLAGS, any symbol which is
given a default visibility attribute ends up exported in the dynamic
library. This happens through zstd_internal.h which defines
..._STATIC_LINKING_ONLY before including various header files, and is
included for example in lib/common/pool.c.
To avoid this, this patch distinguishes static and non-static APIs, by
using ZSTDLIB_API only for the latter, and introducing
ZSTDLIB_STATIC_API for the former. For now, both are exported, but
non-static APIs can be hidden by overriding the definition
ZSTDLIB_STATIC_API. lib/Makefile is modified to allow this using
make CPPFLAGS_DYNLIB=-DZSTDLIB_STATIC_API=ZSTDLIB_HIDDEN
In addition, API declarations are dropped from zstd_compress.c (they
aren't needed there).
Signed-off-by: Stephen Kitt <steve@sk2.org>
Compress the input twice in the `simple_round_trip` and
`dictionary_round_trip` fuzzers with exactly the same parameters, but
reusing the context. Then ensure that the compressed output is
identical.
Call `ZSTD_enforceMaxDist()` before each block with the beginning of the
block. This ensures that `lowLimit` is updated to `dictLimit` whenever
the ext-dict is out of range, so we can use prefix mode for speed.
This can cause non-determinism because prefix mode and ext-dict mode
match finders can return different results. It can also hurt speed
because ext-dict match finders are slower.
The scenario is:
1. Compress large data with a dictionary.
2. The dictionary goes out of bounds, so we invalidate it.
3. However, we still have `lowLimit < dictLimit`, since it is
never updated.
4. We will call the ext-dict match finder instead of the prefix one.
The repcode checks disallowed repcodes that are equal to `windowLow`.
This is slightly inefficient, but isn't a problem on its own. Together
with the next commit, it causes non-determinism.
`ZSTD_insertBt1()` has a speed optimization that skips the prefix of
very long matches.
40def70387/lib/compress/zstd_opt.c (L476)
This optimization is based off the length of the longest match found. However,
when indices are reset, we only ensure that we can reference the whole
window starting from `ip`. If the previous block ended with a long match
then `nextToUpdate` could be much less than `ip`. It might be far enough
back that `nextToUpdate < maxDist`, so it doesn't have a full window of
data to reference. This can cause non-determinism bugs, because we may
find a match that is beyond `ip - maxDist`, and may sometimes be
un-referencable, and that match triggers the speed optimization.
The fix is to base the `windowLow` off of the `target` of
`ZSTD_updateTree_internal()`, because anything below that value will be
obsolete by the time `ZSTD_updateTree_internal()` completes.
With small enough input files, the inferred value of fileWindowLog could
be smaller than ZSTD_WINDOWLOG_MIN.
This can be reproduced like so:
$ echo abc > small
$ echo abcdef > small2
$ zstd --patch-from small small2 -o patch
Previously, this would fail with the error "zstd: error 11 : Parameter is out of bound".
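A minimal sketch of the clamp implied above, assuming the CLI derives the window log from the dictionary file size (helper name and derivation are illustrative; ZSTD_WINDOWLOG_MIN is 10 per zstd.h):
```
#define MIN_WINDOW_LOG 10   /* == ZSTD_WINDOWLOG_MIN */

static unsigned fileWindowLog(unsigned long long fileSize)
{
    unsigned wlog = 1;
    while ((1ULL << wlog) < fileSize) wlog++;   /* smallest log covering the file */
    return wlog < MIN_WINDOW_LOG ? MIN_WINDOW_LOG : wlog;
}
```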
and restored limit to 256 when in 64-bit mode
(it was reduced to 200 to give more room for 32-bit).
This should fix test instability issues
when using lots of threads in 32-bit environments.
When running armv6 userspace on armv8 hardware with a 64 bit Linux kernel,
the mode 2 caused SIGBUS (unaligned memory access).
Running all our arm builds in the build farm
only on armv8 simplifies administration a lot.
Depending on compiler and environment, this change might slow down
memory accesses (did not benchmark it). The original analysis is 6 years old.
Fixes #2632
the new alignment setting is better for gcc-9 and gcc-10
by about ~+5%.
Unfortunately, it's worse for essentially all other compilers.
Make the new alignment setting conditional to gcc-9+.
changed strategy,
now unconditionally prefetch the first 2 cache lines,
instead of cache lines corresponding to the first and last bytes of the match.
This better corresponds to cpu expectation,
which should auto-prefetch following cachelines on detecting the sequential nature of the read.
This is globally positive, by +5%,
though exact gains depend on compiler (from -2% to +15%).
The only negative counter-example is gcc-9.
Linearly back off the frequency of overflow correction based on the
number of times the `ZSTD_window_t` has been overflow corrected. This
will still allow the fuzzer to quickly find overflow correction bugs,
while also keeping good speed for larger inputs.
Additionally, the `nbOverflowCorrections` variable can be useful for
debugging coredumps, since we can inspect the `ZSTD_CCtx` to see if
overflow correction has happened yet.
I've verified this fixes the timeouts in OSS-Fuzz (176 seconds -> 6
seconds). I've also verified that the `fuzzer` and `zstreamtest` fuzzers
still catch the row-hash overflow correction bug.
The FAQ covers the questions asked in Issue #2566. It first covers why
you would want to use a dictionary, then what a dictionary is, and
finally it tells you how to train a dictionary, and clarifies some of
the parameters.
There is definitely more that could be said about some of the advanced
trainers, but this should be a good start.
This flag forces zstd to always load the prefix in ext-dict mode, even
if it happens to be contiguous, to force determinism. It also applies to
dictionaries that are re-processed.
A determinism test case is also added, which fails without
`ZSTD_c_deterministicRefPrefix` and passes with it set.
Question: Should this be the default behavior? It isn't in this PR.
benchmark results are not progressively displayed on Windows terminal.
For long benchmark sessions, nothing is displayed,
until the end, where everything is flushed.
Force display to be flushed after each update.
Updates happen roughly every second, or even less often,
so it's not a substantial workload.
* Take `params` by const reference in `ZSTD_resetCCtx_internal()`.
* Add `simpleApiParams` to the CCtx and use them in the simple API
functions, instead of creating those parameters on the stack.
I think this is a good direction to move in, because we shouldn't need
to worry about adding parameters to `ZSTD_CCtx_params`, since it should
always be on the heap (unless they become absolutely gigantic).
Some `ZSTD_CCtx_params` are still on the stack in the CDict functions,
but I've left them for now, because it was a little more complex, and we
don't use those functions in stack-constrained environments currently.
`open()`'s mode bits are only applied to files that are created by the call.
If the output file already exists, but is not readable, the `fopen()` would
fail, preventing us from removing it, which would mean that the file would
not end up with the correct permission bits.
It's not clear to me why the `fopen()` is there at all. `UTIL_isRegularFile()`
should be sufficient, AFAICT.
pipeline increased from 4 to 8 slots.
This change substantially improves decompression speed when there are long distance offsets.
example with enwik9 compressed at level 22 :
gcc-9 : 947 -> 1039 MB/s
clang-10: 884 -> 946 MB/s
I also checked the "cold dictionary" scenario,
and found a smaller benefit, around ~2%
(measurements are more noisy for this scenario).
* seekable_format: fix from-file reading (not in-memory)
It tries to check the buffer boundary, but there is no buffer for
from-file reading.
* seekable_decompression: break when ZSTD_seekable_decompress() returns zero
* seekable_decompression_mem: break when ZSTD_seekable_decompress() returns zero
* seekable_format: cap the offset+len up to the last dOffset
This will allow reading the whole file without getting a corruption error if
the offset is more than the data left in the file, i.e.:
$ ./seekable_compression seekable_compression.c 8192 | head
$ zstd -cdq seekable_compression.c.zst | wc -c
4737
Before this patch:
$ ./seekable_decompression seekable_compression.c.zst 0 10000000 | wc -c
ZSTD_seekable_decompress() error : Corrupted block detected
0
After:
$ ./seekable_decompression seekable_compression.c.zst 0 10000000 | wc -c
4737
Dictionaries larger than `ZSTD_CHUNKSIZE_MAX` used to have to be loaded
in multiple segments. Instead, when we detect large dictionaries, ensure
that we reset the context's indices. Then, for dictionaries larger than
`ZSTD_CURRENT_MAX - 1`, only load the suffix of the dictionary. Finally,
enable DDS for large dictionaries, since we no longer load in multiple
segments.
This simplifies the dictionary loading code, and reduces opportunities
for non-determinism to slip in.
previous lower limit was 1 MB.
Note : by default, the lowest job size is 2 MB, achieved at level 1.
Even lower job sizes can be achieved by manipulating this value directly,
or manually modifying window sizes to lower amounts.
Updated unit test to ensure that this new limit works fine
(test would fail with previous 1 MB limit).
LDM does especially poorly on repetitive data when that data's hash happens
to have `(hash & stopMask) == 0`. Either because the `stopMask == 0` or
random chance. Optimize this case by skipping over repetitive patterns.
The detection is very simplistic, but should catch most of the offending
cases.
```
head -c 1G /dev/zero | perf stat -- ./zstd -1 -o /dev/null -v --zstd=ldmHashRateLog=1 --long   # before
21.187881087 seconds time elapsed
head -c 1G /dev/zero | perf stat -- ./zstd -1 -o /dev/null -v --zstd=ldmHashRateLog=1 --long   # after
1.149707921 seconds time elapsed
```
Any stable API entry point introduced after v1.0
should be documented with its minimum version number.
This PR fixes this requirement,
updating mostly entry points added since v1.4.0,
as well as ones newly introduced for the upcoming v1.5.0.
Switch from `-T0` to the default `-T1` which significantly reduces
memory usage for level 19 when there are many cores. This fixes
32-bit issues of running out of address space.
Fixes #2603.
* Fix overflow correction when `windowLog < cycleLog`. Previously, we
got the correction wrong in this case, and our chain tables and binary
trees would be corrupted. Now, we work as long as `maxDist` is a power
of two, by adding `MAX(maxDist, cycleSize)` to our indices.
* When `ZSTD_WINDOW_OVERFLOW_CORRECT_FREQUENTLY` is defined to non-zero
run overflow correction as frequently as allowed without impacting
compression ratio.
* Enable `ZSTD_WINDOW_OVERFLOW_CORRECT_FREQUENTLY` in `fuzzer` and
`zstreamtest` as well as all the OSS-Fuzz fuzzers. This has a 5-10%
speed penalty at most, which seems reasonable.
`zstd_errors.h` and `zdict.h` are public headers, so they deserve to be
in the root `lib/` directory with `zstd.h`, not mixed in with our private
headers.
Replace kernel-style comments with regular comments.
E.g.
```
/** Before */
/* After */
/**
* Before
*/
/*
* After
*/
/***********************************
* Before
***********************************/
/* *********************************
* After
***********************************/
```
Instead of providing a default no-op implementation, check the symbols
for `NULL` before accessing them. Providing a default implementation
doesn't reliably work with dynamic linking. Depending on link order the
default implementations may not be overridden. By skipping the default
implementation, all link order issues are resolved. If the symbols
aren't provided the weak function will be `NULL`.
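The check-before-call pattern, in miniature (assuming GCC/clang weak-symbol semantics on ELF; the hook name is illustrative):
```
#include <stddef.h>

__attribute__((weak)) void traceEvent(int event);   /* may be left undefined */

static void emitTrace(int event)
{
    if (traceEvent != NULL)   /* weak symbol resolves to NULL when unlinked */
        traceEvent(event);
}
```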
If build_dir is set the zstd build complains about md5sum not being found.
Fix this by checking if build_dir is set before checking and using the hash tool
just like in lib/Makefile .
* Perform 64-byte alignment of wksp tables and aligneds internally
* Clean up cwskp_finalize() function to only do two allocs
* Refactor aligned/buffer reservation code, remove ASAN req for alignment reservations
* Change from allocating 128 bytes always to allocating only buffer space as needed for tables/aligned
* Back out aligned/table reservation order restriction
* Add stricter bounds for new/resized wksps, fix comment in zstd_cwksp.h
* Do not emit last partitions of blocks as RLE/uncompressed
* Fix repcode updates within block splitter
* Add an entropy-tables confirmation function; redo ZSTD_confirmRepcodesAndEntropyTables() for a better function signature
* Add a repcode updater to block splitter, no longer need to force emit compressed blocks
* Switch to yearless copyright per FB policy
* Fix up SPDX-License-Identifier lines in `contrib/linux-kernel` sources
* Add zstd copyright/license header to the `contrib/linux-kernel` sources
* Update the `tests/test-license.py` to check for yearless copyright
* Improvements to `tests/test-license.py`
* Check `contrib/linux-kernel` in `tests/test-license.py`
Following #2545,
I noticed that one field in `seq_t` is optional,
and only used in combination with prefetching.
(This may have contributed to static analyzer failure to detect correct initialization).
I then wondered if it would be possible to rewrite the code
so that this optional part is handled directly by the prefetching code
rather than delegated as an option into `ZSTD_decodeSequence()`.
This resulted into this refactoring exercise
where the prefetching responsibility is better isolated into its own function
and `ZSTD_decodeSequence()` is streamlined to contain strictly Sequence decoding operations.
Incidentally, due to better code locality,
it reduces the need to send information around,
leading to simplified interface, and smaller state structures.
* Move `counting` to a struct in `FSE_decompress_wksp_body()`
* Fix error code in `FSE_decompress_wksp_body()`
* Rename a variable in `HUF_ReadDTableX2_Workspace`
Expose the zstd headers in `include/linux` to avoid struct duplication.
This makes the member names not follow Kernel style guidelines, and
exposes the zstd symbols. But, the LKML reviewers are okay with that.
This commit introduces a GitHub action that is triggered on release creation,
which creates the release tarball, compresses it, hashes it, signs it, and
attaches all of those files to the release.
Fixes the update from PR #2508. I had accidentally forgotten to rebuild
the library, and the regression test suite isn't hooked up to the new
fancy build system yet.
I've double checked that the results are deterministic.
9f327c02fd17a5aad2b24ae06b85d8226add1f93 changed the compression method
for LDM, so the results are slightly different.
I've re-tested LDM on some larger inputs and everything seems fine.
These ratio changes just seem to be noise. There is generally a 0.01%
swing in ratio, sometimes better sometimes worse, but never large.
These are replaced by the corresponding context resets. When
converting resetCStream, CCtx_setPledgedSrcSize isn't called if the
source size is "unknown".
This helps reduce the reliance on "static only" symbols, as well as
reducing the use of deprecated functions.
Signed-off-by: Stephen Kitt <steve@sk2.org>
* Fix compiler version regex, which was broken for multi-digit
versions.
* Fix compiler detection for gcc.
* Disable `pointer-overflow` instead of `integer-overflow` for gcc
versions newer than 8.0.0.
This commit addresses #2491.
Note that a downside of this solution is that it is global: `umask()` affects
all file creation calls in the process. I believe this is safe since
`fileio.c` functions should only ever be used in the zstd binary, and these
are (almost) the only files ever created by zstd, and AIUI they're only
created in a single thread. So we can get away with messing with global state.
Note that this doesn't change the permissions of files created by `dibio.c`.
I'm not sure what those should be...
The simple compression functions are intended to ignore the advanced
parameters, but they were accidentally using them. All the
`ZSTD_parameters` were set correctly, but any extra parameters were
used as-is. E.g. `ZSTD_c_format`.
This PR makes all the simple single-pass functions listed below ignore
the advanced parameters, as intended.
* `ZSTD_compressCCtx()`
* `ZSTD_compress_usingDict()`
* `ZSTD_compress_usingCDict()`
* `ZSTD_compress_advanced()`
* `ZSTD_compress_usingCDict_advanced()`
It also adds a test case that ensures that each of these functions
ignore the advanced parameters.
Forward the correct compressionLevel to the appliedParams in all cases.
It was already correct for the advanced API, so only the old single-pass
functions needed to be fixed.
This compression level is unused by the library, but is set so that the
tracing framework can consume it.
The most common information that you want to track between begin() and
end() is the timestamp of the begin function, so you can measure the
duration of the (de)compression call. Allow the tracing library to put
this information inside the `ZSTD_TraceCtx`, so it doesn't need to keep
a global map in this case. If a single uint64_t is not enough, the
tracing library can return a unique identifier (like the context
pointer) instead, and use it as a key in a map.
This keeps the simple case simple.
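A sketch of that simple case (hook names and signatures are illustrative, loosely modeled on the description above; not zstd's actual tracing header):
```
#include <stdint.h>
#include <time.h>

typedef uint64_t TraceCtx;   /* stand-in for a single-uint64 trace context */

static uint64_t nowNs(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* begin() returns the start timestamp as the trace context; end() uses it to
 * compute the call's duration, with no global map required. */
TraceCtx trace_compress_begin(void) { return nowNs(); }

void trace_compress_end(TraceCtx beginNs)
{
    uint64_t const durationNs = nowNs() - beginNs;
    (void)durationNs;   /* report/log the duration here */
}
```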
This program takes a file with concatenated zstd frames and splits the
file up by frame. E.g. if `dir.zst` has 4 frames:
```
> ./recover_directory dir.zst recovery/file
Recovering 4 files...
Recovered recovery/file0
Recovered recovery/file1
Recovered recovery/file2
Recovered recovery/file3
Complete
```
Escaping in add_custom_target() seems to depend on the shell used in the cmake
generator and using Ninja on Windows, which uses cmd.exe, results in stray backslashes
in the .pc file.
Instead of going through escaping hell just use configure_file() with the existing
libzstd.pc.in file already used by the simple Makefile based build system.
This fixes the .pc file syntax when building zstd with CMake+Ninja+gcc on Windows.
ZSTD_cycleLog() is a very short function,
and creating a rather large dependency onto libzstd's internals just for it is overkill.
Prefer duplicating this 2-line function.
This PR makes the zstd CLI one step closer to being linkable to the dynamic library (see #2450)
More steps are still needed to reach this goal.
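For reference, the function in question is roughly the following (a sketch of zstd's internal ZSTD_cycleLog(), with an abbreviated strategy enum; verify against the actual source before copying):
```
typedef enum { STRAT_fast = 1, STRAT_btlazy2 = 6 } Strategy;

/* binary-tree strategies cycle with one less bit of table space */
static unsigned cycleLog(unsigned hashLog, Strategy strat)
{
    unsigned const btScale = ((unsigned)strat >= (unsigned)STRAT_btlazy2);
    return hashLog - btScale;
}
```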
Treat ZSTD_getCParams() and ZSTD_adjustCParams() in the same way
we treat streaming compression. Choose parameters based on the
dictionary size + source size, and assume the source size is small
if unknown. But, don't shrink the window log down in
ZSTD_adjustCParams_internal().
Fixes #2442.
1. When creating a dictionary keep the same behavior as before.
Assume the source size is 513 bytes when adjusting parameters.
2. When calling ZSTD_getCParams() or ZSTD_adjustCParams() keep
the same behavior as before.
3. When attaching a dictionary keep the same behavior of ignoring
the dictionary size. When streaming this will select the
largest parameters and not adjust them down. But, the CDict
will use the correctly sized parameters, which seems like the
right tradeoff.
4. When not attaching a dictionary (either forced not to, or
using a prefix dictionary) we select parameters based on the
dictionary size + source size, and assume the source size is
small, which is the same behavior as before. But, now we don't
adjust the window log (and hash and chain log) down when the
source size is unknown.
When the source size is unknown all cdicts should attach, except
when the user disables attaching, or `forceWindow` is used. This
means that when streaming with a CDict we end up in the good case
where we get small CDict parameters, and large source parameters.
TODO: Add a streaming + dictionary regression test case.
* Add a test that runs without a pledgedSrcSize and with a dictionary.
* Add github.tar data which uses the github dictionary while compressing
github.tar, instead of each file individually.
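For reference, the public entry point affected (a usage sketch; `ZSTD_getCParams()` is part of the advanced API):
```
#define ZSTD_STATIC_LINKING_ONLY   /* ZSTD_getCParams() is advanced API */
#include <zstd.h>

static ZSTD_compressionParameters paramsForDict(size_t dictSize)
{
    /* estimatedSrcSize == 0 means "unknown": the source is then assumed
     * small, but windowLog is no longer shrunk on that basis */
    return ZSTD_getCParams(/* compressionLevel */ 3, /* srcSize */ 0, dictSize);
}
```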
This ensures the symbols aren't redefined, which would result in a compiler
error.
I was getting redefined symbols for _LARGEFILE64_SOURCE when building for
32-bit x86 Linux on an older CentOS release in a CI environment. With this
change, I'm able to compile the single file library in this environment.
Closes #2443.
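The usual guard pattern for such feature-test macros:
```
/* define only if the build system hasn't already done so */
#ifndef _LARGEFILE64_SOURCE
#define _LARGEFILE64_SOURCE
#endif
```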
for some reason, this test fails at _installing_ 32-bit dependencies
using the exact same command that actually works in other tests !!?
It's unclear why it fails repeatedly for this test only.
Try another way to install dependencies to fix that.
The pkgconfig file generation didn't correctly escape the paths. It both
quoted and escaped spaces with `\`, which doesn't work. The fix is to
remove the quoting.
Comment and refactor `HUF_buildCTable()` and the helper functions
it calls as I read and understand the code. Hopefully this refactor
makes the code a bit more clear.
Add the kernel wrapper API. This keeps the same API and semantics as the
existing kernel API with name changes to be more kernel style and avoid
symbol collisions with zstd.
this build works fine on all my systems,
but seems to fail in the CI environment.
Unclear why there is a difference.
This build test is not relevant anyway.
https://github.com/facebook/zstd/pull/2339 removes the single-pass zstdmt API.
This changes the compressed size, because we no longer take the # of threads into
account when deciding the job size.
We don't use it when we have a stable input buffer, so don't allocate
it. I had to slightly modify `ZSTD_copyCCtx()` by storing the
`ZSTD_buffered_policy_e` in the `ZSTD_CCtx`, since `inBuffSize > 0` is
no longer the correct signal for the buffered mode.
When we have a stable output buffer take the single-pass shortcut.
It is okay to return `dstSizeTooSmall` if the output buffer isn't
big enough, because we know it will never grow.
Sets these parameters in ZSTD_compress2() then resets them to their
original values after the compression call.
An alternative design could be to add a flush mode `ZSTD_e_singlePass`
which implies `ZSTD_c_stable{In,Out}Buffer` but only for a single
compression call, by directly setting the applied parameters. I've opted
for the smaller change, but this is open for discussion.
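A sketch of the same contract expressed explicitly (both parameters are experimental; error handling elided):
```
#define ZSTD_STATIC_LINKING_ONLY   /* stable-buffer parameters are experimental */
#include <zstd.h>

/* caller promises src and dst stay valid and unmoved for the whole call,
 * which lets the context skip its internal input/output copies */
static size_t onePass(ZSTD_CCtx* cctx, void* dst, size_t dstCap,
                      const void* src, size_t srcSize)
{
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_stableInBuffer, 1);
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_stableOutBuffer, 1);
    /* may fail with dstSize_tooSmall: a stable output buffer can never grow */
    return ZSTD_compress2(cctx, dst, dstCap, src, srcSize);
}
```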
Previously only `nbWorkers` was set. Set all parameters, because that is
what is expected. This is needed for the `ZSTD_c_stable{In,Out}Buffer`
parameters.
Makefile now automatically detects modifications of compilation flags,
and produces object files in directories dedicated to these compilation flags.
This makes it possible, for example, to compile libzstd with different DEBUGLEVEL values.
Object files sharing the same configuration are generated into their dedicated directory.
Also : new compilation variables
- DEBUGLEVEL : select the debug level (assert & traces) inserted during compilation (default == 0 == release)
- HASH : select a hash function to differentiate configuration (default == md5sum)
- BUILD_DIR : skip the hash stage, store object files into manually specified directory
%.o object files generated for dynamic library
must be different from those generated for static library.
Due to this difference, %.o were so far only generated for the static library.
The dynamic library was rebuilt from %.c source.
This meant that, for every minor change, the entire dynamic library had to be rebuilt.
This is fixed in this PR :
only the modified %.c source get rebuilt.
That was a subtle one :
VPATH is affecting search for both %.c source and %.o object files.
This meant that, when an object file already exists in lib/,
it's used in programs/,
even though programs/ is supposed to generate its own %.o object files.
With the new vpath directive, this is no longer the case :
the search is only activated for %.c source files.
Now, local programs/%.o are always generated
even if equivalent ones are already created in lib/.
It more clearly guarantees that lib/ and programs/ can use different compilation directives
without mixing resulting %.o object files.
Building the zstd CLI costs time.
Some part of it is incompressible, leading to substantial iteration delay when testing code modifications.
That's mainly because all source files from the library must be rebuilt from source every time.
The main reason we don't build the CLI from library object files
is that we can't just build the object directly in the lib/ directory
(which they would by default)
since they use different compilation flags.
Specifically, the CLI enables multithreading, while the library doesn't (by default).
This is solved in this commit, by generating the object files locally.
Now, the CLI and the library can employ different sets of flags, without tripping over each other.
All library object files are generated directly into programs/ dir.
This works because no 2 source files have the same name.
Now, modifying a file doesn't require recompiling the entire lib, just the modified files.
The recipe is also compatible with `-j` parallel build, leading to large build time reductions on multi-core systems.
The previous recipe would build object files directly within programs/,
which could compete with other local builds happening in programs/ at the same time.
Fixed by generating the relevant object files locally.
There are compilation environments in aarch64 where NEON isn't
available. While these environments could define ZSTD_NO_INTRINSICS,
it's more fail-safe to use the more specific symbol to know if NEON
extensions are available.
__ARM_NEON is the proper symbol, defined in ARM C Language Extensions
Release 2.1 (https://developer.arm.com/documentation/ihi0053/d/). Some
sources suggest __ARM_NEON__, but that's the obsolete spelling from
prior versions of the standard.
Signed-off-by: Warner Losh <imp@bsdimp.com>
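The resulting guard is essentially:
```
/* rely on the ACLE feature macro, not on "aarch64 implies NEON" */
#if !defined(ZSTD_NO_INTRINSICS) && defined(__ARM_NEON)
#  include <arm_neon.h>
#endif
```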
The problem occurs in this scenario:
1. We find a synchronization point.
2. We attempt to create the job.
3. We fail because the job table is full: `mtctx->nextJobID > mtctx->doneJobID + mtctx->jobIDMask`.
4. We call `ZSTDMT_compressStream_generic` again.
5. We forget that we're at a sync point already, and we continue looking
for the next sync point.
This fix is to detect if we're currently paused at a sync point, and if
we are then don't load any more input.
Caught by zstreamtest. I modified it to make the bug occur more often
(~1/100K -> ~1/200) and verified that it is fixed after. I then ran a
few hundred thousand unmodified zstreamtest iterations to verify.
When zstdmt cannot get a buffer and `ZSTD_e_end` is passed, an empty
compression job can be created. Additionally, `mtctx->frameEnded` can be
set to 1, which could potentially cause problems like unterminated blocks.
The fix is to adjust to `ZSTD_e_flush` even when we can't get a buffer.
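A sketch of the fix's logic (names other than `ZSTD_e_*` are simplified stand-ins for the zstdmt internals):
```
#include <zstd.h>

static ZSTD_EndDirective adjustEndOp(ZSTD_EndDirective endOp, int bufferAvailable)
{
    /* without a buffer, the final (possibly empty) block can't be written yet:
     * downgrade to flush and terminate the frame on a later call */
    if (!bufferAvailable && endOp == ZSTD_e_end) return ZSTD_e_flush;
    return endOp;
}
```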
* Run compression twice and check the compressed data is byte-identical.
The compression loop had to be rewritten to ensure determinism. It is
guaranteed by always making maximal forward progress.
* When nbWorkers > 0, change the number of workers 1/8 of the time.
* Run in single-pass mode 1/4 of the time.
I've run a few hundred thousand iterations of zstreamtest and have seen
no determinism issues so far. Before the zstdmt fix that skips the
single-pass shortcut non-determinism showed up in a few hundred
iterations.
This commit leaves only the functions used by zstd_compress.c. All other
functions have been removed from the API. The ZSTDMT unit tests in
fuzzer.c and zstreamtest.c have been rewritten to use the ZSTD API. And
the --mt zstreamtest tests have been ripped out.
Simplifies the code and removes blocking from zstdmt.
At this point we could completely delete
`ZSTDMT_compress_advanced_internal()`. However I'm leaving it in because
I think we want to do that in the zstd-1.5.0 release, in case anyone is
still using the ZSTDMT API, even though it is not installed by default.
Fixes #2327.
Pass in the `ZSTD_cParamMode_e` to select how we define our cparams.
Based on the mode we either take the `dictSize` into account or we set
it to `0`. See the documentation for `ZSTD_cParamMode_e`.
Some of the modes currently share the same behavior. But they have
distinct modes because they are drastically different cases. E.g.
compression + reprocessing the dictionary and creating a cdict.
Additionally, when downsizing the hashLog and chainLog take the
(adjusted) dictionary size into account, since the size of the
dictionary gets added onto the window size.
Adds a simple test to ensure that we aren't downsizing too far.
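For reference, the mode enum is roughly the following (internal header, reproduced from memory; comments paraphrased):
```
typedef enum {
    ZSTD_cpm_noAttachDict = 0,  /* dictionary is copied or reprocessed */
    ZSTD_cpm_attachDict   = 1,  /* dictionary is attached: ignore its size */
    ZSTD_cpm_createCDict  = 2,  /* sizing tables for a CDict */
    ZSTD_cpm_unknown      = 3   /* no information on intended usage */
} ZSTD_cParamMode_e;
```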
The DDS structure can't be copied into the working tables like the DMS.
So it doesn't need to account for the source size when sizing its
parameters, just the dictionary size.
Conditions to trigger:
* CDict is loaded as raw content.
* CDict starts with the zstd dictionary magic number.
* The CDict is reprocessed (not attached or copied).
* The new API is used (streaming or `ZSTD_compress2()`).
Bug: The dictionary is loaded as a zstd dictionary, not a raw content
dictionary, because the dict content type is set to `ZSTD_dct_auto`.
Fix: Pass in the dictionary content type from cdict creation to the call
to `ZSTD_compress_insertDictionary()`.
Test: Added a test case that exposes the bug, and fixed the raw
content tests to not modify the `dictBuffer`; modifying it turned all
subsequent tests using the `dictBuffer` into raw content tests, which
doesn't seem intentional.
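A hedged repro sketch of the trigger conditions (experimental API; bytes and sizes illustrative):
```
#define ZSTD_STATIC_LINKING_ONLY
#include <zstd.h>

static ZSTD_CDict* trickyCDict(void)
{
    /* raw-content dictionary that happens to begin with the zstd dictionary
     * magic number 0xEC30A437 (stored little-endian) */
    static const unsigned char dict[1024] = { 0x37, 0xA4, 0x30, 0xEC };
    ZSTD_customMem const noCustomMem = { NULL, NULL, NULL };
    return ZSTD_createCDict_advanced(dict, sizeof(dict),
                                     ZSTD_dlm_byCopy, ZSTD_dct_rawContent,
                                     ZSTD_getCParams(3, 0, sizeof(dict)),
                                     noCustomMem);
    /* before the fix: reprocessing this CDict loaded it as a zstd dictionary
     * (the ZSTD_dct_auto path) instead of raw content */
}
```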
Rename ADDRESS_SANITIZER -> ZSTD_ADDRESS_SANITIZER and same for
MEMORY_SANITIZER. Also set it to 0/1 instead of checking for defined.
This allows the user to override ASAN/MSAN detection for platforms that
don't support it.
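The detect-but-overridable pattern looks roughly like this:
```
/* the build may pre-define ZSTD_ADDRESS_SANITIZER to 0 or 1 to force it */
#ifndef ZSTD_ADDRESS_SANITIZER
#  if defined(__SANITIZE_ADDRESS__)          /* gcc */
#    define ZSTD_ADDRESS_SANITIZER 1
#  elif defined(__has_feature)               /* clang */
#    if __has_feature(address_sanitizer)
#      define ZSTD_ADDRESS_SANITIZER 1
#    else
#      define ZSTD_ADDRESS_SANITIZER 0
#    endif
#  else
#    define ZSTD_ADDRESS_SANITIZER 0
#  endif
#endif
```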
This clarifies operator precedence, and quiets cppcheck in
the Kernel Test Robot. I think this is a slight bonus to
readability, so I am accepting the suggestion.
Even if the discrepancies are at the moment benign, it's probably better to
standardize on using the one true initializer, rather than trying (and failing)
to correctly duplicate its behavior.
Rewrite ZSTD_createCDict_advanced() as a wrapper around
ZSTD_createCDict_advanced2(). Evaluate whether to use DDSS mode *after* fully
resolving cparams. If not, fall back.
Add decompress_sources.h, which includes all the decompression .c files.
This is used for kernel decompression.
Also, add a test which checks that including decompress_sources.h works.
This is kind of hacky. And maybe this test doesn't need to be permanently as
exhaustive as it is now. But while we're actively developing the DDSS, we
should ensure it's compatible across many different modes.
Rather than restrict our temp chain table to 2 ** chainLog entries, this
commit uses all available space to reach further back to gather longer
chains to pack into the DDSS chain table.
Rather than interleave all of the chain table entries, tying each entry's
position to the corresponding position in the input, this commit changes the
layout so that all the entries in a single chain are laid out next to each
other. The last entry in the hash table's bucket for this hash is now a packed
pointer of position + length of this chain.
This cannot be merged as written, since it allocates temporary memory inside
ZSTD_dedicatedDictSearch_lazy_loadDictionary().
Previously, if DDSS was enabled on a CCtx and a dictionary was inserted into
the CCtx, the CCtx MatchState would be filled as a DDSS struct, causing
segfaults etc. This changes the check to use whether the MatchState is marked
as using the DDSS (which is only ever set for CDict MatchStates), rather than
looking at the CCtxParams.
Entries in the hashTable chain cache aren't subject to the same aliasing that
the circular chain table is subject to. As such, we don't need to stop when we
cross the chain limit. We can delve deeper. :)
This caused us to double-search the first position and fail to search the
last position in the chain, slowing down search and making it less effective.
This makes it clear that not only is the feature allowed here, we're actually
using it, as opposed to the CCtxParam field, in which it's enabled, but we may
or may not be using it.
The unused function definitions are hidden behind a
`#ifndef ZSTD_NO_UNUSED_FUNCTIONS` check.
Initially hiding all functions which are unused and take up more than
2KB of stack space, because these will show up as warnings in the
Linux Kernel build system.
Previously, this construct would add `-O3` onto the end of the compiler flags
variable, **after** `MOREFLAGS`, which meant that it was impossible to
override. This commit fixes this order and should otherwise be a no-op.
Unintentional integer overflow (OVERFLOW_BEFORE_WIDEN)
overflow_before_widen: Potentially overflowing expression:
cdict->dictContentSize * 6U
with type unsigned int (32 bits, unsigned) is evaluated using 32-bit
arithmetic, and then used in a context that expects an expression of
type U64 (64 bits, unsigned).
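The standard fix for this class of report is to widen an operand before the arithmetic, not after (a sketch with a stand-in for the actual expression):
```
#include <stddef.h>
#include <stdint.h>

/* where size_t is 32-bit, dictContentSize * 6U can wrap before being
 * stored into 64 bits; cast first so the multiply is done in 64-bit */
static uint64_t scaledDictSize(size_t dictContentSize)
{
    return (uint64_t)dictContentSize * 6U;
}
```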
* Fix bug introduced in PR #2271
* Fix long-standing bug that is impossible to trigger inside of zstd
* Add a fuzzer that makes sure the normalized count always round trips
correctly
Rather than special-casing a check for `/dev/null`, this uses `stat()` to
avoid `chmod()`-ing any non-regular file. I believe this is the desirable
behavior. `UTIL_chmod()` is never called on directories at the moment, only
output files.
Instead of calling `stat()`, these functions accept the result of a previous
`stat()` call on the file in question, which will allow us to make multiple
decisions around a file without redundant `stat()` calls.
I want to introduce versions of many of these functions that take pre-
populated `stat_t` objects and use those rather than doing their own redundant
`stat()` internally. These functions will have `...Stat()` suffixes. So this
commit renames these existing functions into the active voice, to avoid
confusion.
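A sketch of the resulting shape, assuming POSIX `stat`/`chmod` (the exact zstd signatures may differ):
```
#include <sys/stat.h>

/* "...Stat()" variant: reuses a prior stat() result instead of re-stat()-ing,
 * and leaves non-regular files (e.g. /dev/null) untouched */
static int chmodStat(const char* filename, const struct stat* statbuf,
                     mode_t permissions)
{
    if (!S_ISREG(statbuf->st_mode)) return 0;
    return chmod(filename, permissions);
}
```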
new version is easier to vectorize,
leading to smaller code and faster execution,
notably at the last recombination stage
(basically, a fixed cost per block).
Assembly inspected with godbolt
On my laptop, with `clang` and `-mavx2` :
2K block : 1280 MB/s -> 1550 MB/s
8K block : 1750 MB/s -> 1860 MB/s
both `--long-command=arg` and `--long-command arg`
are supported by macro `NEXT_FIELD()`
so that the long-command only needs to be listed once.
This extends the syntax to support new syntaxes like
`-o=FILE` or `-D=dict`,
though there is no need to advertise this capability for the time being.
Also : added `NEXT_UINT32()`,
which is a wrapper around `NEXT_FIELD()`
to read integer parameters.
Used the wrapper for new fields, such as `--memlimit`,
which can now support both syntaxes too.
for --patch-from origin
and --filelist list
Also : removed some constrained syntax tests,
as the new argument parsing syntax is more permissive.
For example :
zstd file -of dest
used to be disallowed.
It's now allowed, and understood as:
zstd file -o dest -f
this patch reduces complexity associated with
commands requiring a separated argument
such as :
-o filename
-D dictionary
--output-dir-flat dir
--output-dir-mirror dir
It used to be a multi-stage process with explicit context,
it's now simplified as a single macro.
Thanks to this simplification,
separated arguments logic has also been extended to
--patch-from XXX
--filelist XXX
which extends existing capability using =XXX
--patch-from=XXX
--filelist=XXX
Separated argument is useful for filenames and directories,
as it benefits from shell expansion
such as ~/dir/file
where the ~ is automatically translated by the shell.
In contrast --long-command=FILE does not interpret FILE,
so ~/ is transmitted as is to the main() function,
generally resulting in an incorrect file name.
Since version 1.4.5 and commit
5af8cb7aea8d890b4801e50e5274371510f2cf33, if st_mtime is not defined,
programs/util.c uses utime without including utime.h which will raise
the following build failure on some of the buildroot autobuilders:
util.c: In function 'UTIL_setFileStat':
util.c:161:24: error: storage size of 'timebuf' isn't known
struct utimbuf timebuf;
^~~~~~~
Fixes:
- http://autobuild.buildroot.org/results/be902c5d110f37bce622a2215191f155b7d3e7e0
Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com>
Fuzzing build mode (FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION) doesn't
necessarily imply that assert() is enabled, according to the manual.
When the current do-nothing is expanded under -Wunused-variable (-Wall),
it results in unused variables in some of the FUZZING_BUILD_MODE...
blocks.
This patch extends the do-nothing to avoid the unused variable.
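The classic shape of such a fix (macro name hypothetical):
```
/* before: the no-op expansion left 'x' unused in some build modes
 *   #define FUZZ_ONLY_CHECK(x)
 * after: still a runtime no-op, but the argument is formally "used" */
#define FUZZ_ONLY_CHECK(x) ((void)(x))
```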
The bound check condition should always be met, because we selected `set_basic` as
our encoding type. But that code is very far away, so assert that it holds, so that
if it is ever false we can catch it, and add a bounds check.
Fixes #2213.
Allow compression to use dictionaries with missing symbols in their
entropy tables. We set the FSE repeat mode to check when there are
missing symbols, and set the FSE repeat mode to valid when all symbols
are present.
Note that when not all symbols are present, the heuristics which favor
dictionary tables for lower compression levels won't activate.
Tested by manually creating a dictionary with missing symbols of every
type, and validating that the compressor rejects it before this change,
and accepts it after this change. Also, I ran the `dictionary_loader`
fuzzer for >1 hour of CPU time without running into cases where
compression succeeds, but decompression fails.
Fixes #2174.
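A hedged sketch of the policy, using zstd's internal `FSE_repeat` states as I understand them:
```
typedef enum { FSE_repeat_none,   /* table must be rebuilt */
               FSE_repeat_check,  /* usable, but validate symbols each block */
               FSE_repeat_valid   /* covers every symbol: use freely */
} FSE_repeat;

/* dictionaries with missing symbols are now accepted, but demoted to
 * 'check' so blocks containing an absent symbol fall back safely */
static FSE_repeat repeatModeFor(int allSymbolsPresent)
{
    return allSymbolsPresent ? FSE_repeat_valid : FSE_repeat_check;
}
```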
This adds support for just showing the version string
without the full welcome message if the log level is
less than the default. That means that you pass -q once
and you will just see "1.4.5".
This makes it easier to parse in scripts.
default rule is `lib-release`
`lib-release` wasn't working : it was just skipped.
Removing `lib-release` from the list of .PHONY targets fixes it.
Same for `lib-mt`.
this API has been deprecated for a long time now,
all related symbols will be removed in a future version (likely v1.5.0)
and the header file `zbuff.h` doesn't compile from `include/` anyway,
because it needs to be positioned one directory below `zstd.h`.
Also removed `cover.h` from `cmake` installer,
as it should have never been part of this list to begin with.
# Dev PR jobs that still have to be migrated from travis
#
# icc (need self-hosted)
# arm/qemu-arm (need self-hosted)
# versionTag
# valgrindTest (keeps failing for some reason. need investigation)
# staticAnalyze (need trusty so need self-hosted)
# pcc-fuzz: (need trusty so need self-hosted)
# arm-build-test (need self-hosted)
This commit pulls out the internals of `ZSTD_estimateCCtxSize_usingCCtxParams`
into a helper. It then migrates two other callsites to use that helper,
a small optimization for `ZSTD_estimateCStreamSize_usingCCtxParams`, which
folds the buffer sizing into the helper, and then `ZSTD_resetCCtx_internal`,
which is more invasive.
This attempts to guarantee that the estimates returned to users are always
correct.
Memory constrained use cases that manage multiple archives benefit from
retaining multiple archive seek tables without retaining a ZSTD_seekable
instance for each.
* New opaque type for seek table: ZSTD_seekTable.
* ZSTD_seekable_copySeekTable() supports copying seek table out of a
ZSTD_seekable.
* ZSTD_seekTable_[eachSeekTableOp]() defines seek table API that mirrors
existing seek table operations.
* Existing ZSTD_seekable_[eachSeekTableOp]() retained; they delegate to
the ZSTD_seekTable variant.
These changes allow the above-mentioned use cases to initialize a
ZSTD_seekable, extract its ZSTD_seekTable, then throw the ZSTD_seekable
away to save memory. Standard ZSTD operations can then be used to
decompress frames based on seek table offsets.
The copy and delegate patterns are intended to minimize impact on
existing code and clients. Using copy instead of move for the infrequent
operation extracting a seek table ensures that the extraction does not
render the ZSTD_seekable useless. Delegating to *new* seek
table-oriented APIs ensures that this is not a breaking change for
existing clients while supporting all meaningful operations that depend
only on seek table data.
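A usage sketch under the names described above (seekable-format contrib API; error handling elided):
```
#include "zstd_seekable.h"   /* contrib/seekable_format */

/* keep only the seek table; frames can then be decompressed with
 * plain ZSTD calls at the offsets the table reports */
static ZSTD_seekTable* extractTable(const void* archive, size_t archiveSize)
{
    ZSTD_seekable* const s = ZSTD_seekable_create();
    ZSTD_seekable_initBuff(s, archive, archiveSize);
    ZSTD_seekTable* const t = ZSTD_seekable_copySeekTable(s);  /* copy, not move */
    ZSTD_seekable_free(s);   /* copying left the seekable intact; we just no longer need it */
    return t;   /* query via the mirrored ZSTD_seekTable_*() operations */
}
```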
| gpg --dearmor | sudo tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null
# add signed entry to apt sources and configure the APT client to use Intel repository:
echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list
New versions are being developed in the "dev" branch,
or in their own feature branch.
When they are deemed ready for a release, they are merged into "master".
When they are deemed ready for a release, they are merged into "release".
As a consequences, all contributions must stage first through "dev"
As a consequence, all contributions must stage first through "dev"
or their own feature branch.
## Pull Requests
@ -47,7 +47,7 @@ Our contribution process works in three main stages:
* Topic and development:
* Make a new branch on your fork about the topic you're developing for
```
# branch names should be consise but sufficiently informative
# branch names should be concise but sufficiently informative
git checkout -b <branch-name>
git push origin <branch-name>
```
@ -60,7 +60,7 @@ Our contribution process works in three main stages:
* Note: run local tests to ensure that your changes didn't break existing functionality
* Quick check
```
make shortest
make check
```
* Longer check
```
@ -68,8 +68,8 @@ Our contribution process works in three main stages:
```
2. Code Review and CI tests
* Ensure CI tests pass:
* Before sharing anything to the community, make sure that all CI tests pass on your local fork.
See our section on setting up your CI environment for more information on how to do this.
* Before sharing anything to the community, create a pull request in your own fork against the dev branch
and make sure that all GitHub Actions CI tests pass. See the Continuous Integration section below for more information.
* Ensure that static analysis passes on your development machine. See the Static Analysis section
below to see how to do this.
* Create a pull request:
@ -104,7 +104,7 @@ Our contribution process works in three main stages:
issue at hand, then please indicate this by requesting that an issue be closed by commenting.
* Just because your changes have been merged does not mean the topic or larger issue is complete. Remember
that the change must make it to an official zstd release for it to be meaningful. We recommend
that contributers track the activity on their pull request and corresponding issue(s) page(s) until
that contributors track the activity on their pull request and corresponding issue(s) page(s) until
their change makes it to the next release of zstd. Users will often discover bugs in your code or
suggest ways to refine and improve your initial changes even after the pull request is merged.
@ -126,6 +126,56 @@ just `contrib/largeNbDicts` and nothing else, you can run:
scan-build make -C contrib/largeNbDicts largeNbDicts
```
### Pitfalls of static analysis
`scan-build` is part of our regular CI suite. Other static analyzers are not.
It can be useful to look at additional static analyzers once in a while (and we do), but it's not a good idea to multiply the number of analyzers run continuously at each commit and PR. The reasons are :
- Static analyzers are full of false positives. The signal-to-noise ratio is actually pretty low.
- A good CI policy is "zero-warning tolerance". That means that all issues must be solved, including false positives. This quickly becomes a tedious workload.
- Multiple static analyzers will feature multiple kinds of false positives, sometimes applying to the same code but in different ways, leading to :
+ tortuous code, trying to please multiple constraints, hurting readability and therefore maintenance. Sometimes, such complexity introduces other more subtle bugs, that are just out of scope of the analyzers.
+ sometimes, these constraints are mutually exclusive : if one tries to solve one, the other static analyzer will complain; they can't both be happy at the same time.
- As if that was not enough, the list of false positives changes with each version. It's hard enough to follow one static analyzer; following multiple ones, each with its own update agenda, quickly becomes a massive velocity reducer.
This is different from running a static analyzer once in a while, looking at the output, and __cherry picking__ a few warnings that seem helpful, either because they detected a genuine risk of bug, or because they help express the code in a way which is more readable or more difficult to misuse. These kinds of reports can be useful, and are accepted.
## Continuous Integration
CI tests run every time a pull request (PR) is created or updated. The exact tests
that get run will depend on the destination branch you specify. Some tests take
longer to run than others. Currently, our CI is set up to run a short
series of tests when creating a PR to the dev branch and a longer series of tests
when creating a PR to the release branch. You can look in the configuration files
of the respective CI platform for more information on what gets run when.
Most people will just want to create a PR with the destination set to their local dev
branch of zstd. You can then find the status of the tests on the PR's page. You can also
re-run tests and cancel running tests from the PR page or from the respective CI's dashboard.
Almost all of zstd's CI runs on GitHub Actions (configured at `.github/workflows`), which will automatically run on PRs to your
own fork. A small number of tests run on other services (e.g. Travis CI, Circle CI, Appveyor).
These require work to set up on your local fork, and (at least for Travis CI) cost money.
Therefore, if the PR on your local fork passes GitHub Actions, feel free to submit a PR
against the main repo.
### Third-party CI
A small number of tests cannot run on GitHub Actions, or have yet to be migrated.
For these, we use a variety of third-party services (listed below). It is not necessary to set
these up on your fork in order to contribute to zstd; however, we do link to instructions for those who wish to set them up (see the table below).
| Service | Purpose | Setup Links | Config Path |
| --- | --- | --- | --- |
| Travis CI | Used for testing on non-x86 architectures such as PowerPC | https://docs.travis-ci.com/user/tutorial/#to-get-started-with-travis-ci-using-github <br> https://github.com/marketplace/travis-ci | `.travis.yml` |
| AppVeyor | Used for some Windows testing (e.g. cygwin, mingw) | https://www.appveyor.com/blog/2018/10/02/github-apps-integration/ <br> https://github.com/marketplace/appveyor | `appveyor.yml` |
| Cirrus CI | Used for testing on FreeBSD | https://github.com/marketplace/cirrus-ci/ | `.cirrus.yml` |
| Circle CI | Historically was used to provide faster signal,<br/> but we may be able to migrate these to Github Actions | https://circleci.com/docs/2.0/getting-started/#setting-up-circleci <br> https://youtu.be/Js3hMUsSZ2c <br> https://circleci.com/docs/2.0/enable-checks/ | `.circleci/config.yml` |
Note: the instructions linked above mostly cover how to set up a repository with CI from scratch.
The general idea should be the same for setting up CI on your fork of zstd, but you may have to
follow slightly different steps. In particular, please ignore any instructions related to setting up
config files (since zstd already has configs for each of these services).
## Performance
Performance is extremely important for zstd and we only merge pull requests whose performance
landscape and corresponding trade-offs have been adequately analyzed, reproduced, and presented.
@ -147,7 +197,7 @@ something subtle merged is extensive benchmarking. You will be doing us a great
take the time to run extensive, long-duration, and potentially cross-(os, platform, process, etc)
benchmarks on your end before submitting a PR. Of course, you will not be able to benchmark
your changes on every single processor and os out there (and neither will we) but do that best
you can:) We've adding some things to think about when benchmarking below in the Benchmarking
you can:) We've added some things to think about when benchmarking below in the Benchmarking
Performance section which might be helpful for you.
3. Optimizing performance for a certain OS, processor vendor, compiler, or network system is a perfectly
legitimate thing to do as long as it does not harm the overall performance health of Zstd.
@ -166,7 +216,7 @@ will typically not be stable enough to obtain reliable benchmark results. If you
hands on a desktop, this is usually a better scenario.
Of course, benchmarking can be done on non-hyper-stable machines as well. You will just have to
do a little more work to ensure that you are in fact measuring the changes you've made not and
do a little more work to ensure that you are in fact measuring the changes you've made and not
noise. Here are some things you can do to make your benchmarks more stable:
1. The most simple thing you can do to drastically improve the stability of your benchmark is
@ -223,7 +273,7 @@ for that options you have just provided. If you want to look at the internals of
benchmarking script works, you can check out programs/benchzstd.c
For example: say you have made a change that you believe improves the speed of zstd level 1. The
very first thing you should use to asses whether you actually achieved any sort of improvement
very first thing you should use to assess whether you actually achieved any sort of improvement
is `zstd -b`. You might try to do something like this. Note: you can use the `-i` option to
specify a running time for your benchmark in seconds (default is 3 seconds).
Usually, the longer the running time, the more stable your results will be.
@ -249,24 +299,24 @@ this method of evaluation will not be sufficient.
### Profiling
There are a number of great profilers out there. We're going to briefly mention how you can
profile your code using `instruments` on mac, `perf` on linux and `visual studio profiler`
on windows.
on Windows.
Say you have an idea for a change that you think will provide some good performance gains
for level 1 compression on Zstd. Typically this means, you have identified a section of
code that you think can be made to run faster.
The first thing you will want to do is make sure that the piece of code is actually taking up
a notable amount of time to run. It is usually not worth optimzing something which accounts for less than
a notable amount of time to run. It is usually not worth optimizing something which accounts for less than
0.0001% of the total running time. Luckily, there are tools to help with this.
Profilers will let you see how much time your code spends inside a particular function.
If your target code snippit is only part of a function, it might be worth trying to
isolate that snippit by moving it to its own function (this is usually not necessary but
If your target code snippet is only part of a function, it might be worth trying to
isolate that snippet by moving it to its own function (this is usually not necessary but
might be).
Most profilers (including the profilers dicusssed below) will generate a call graph of
functions for you. Your goal will be to find your function of interest in this call grapch
and then inspect the time spent inside of it. You might also want to to look at the
annotated assembly which most profilers will provide you with.
Most profilers (including the profilers discussed below) will generate a call graph of
functions for you. Your goal will be to find your function of interest in this call graph
and then inspect the time spent inside of it. You might also want to look at the annotated
assembly which most profilers will provide you with.
#### Instruments
We will once again consider the scenario where you think you've identified a piece of code
@ -280,23 +330,23 @@ Instruments.
* You will want a benchmark that runs for at least a few seconds (5 seconds will
usually be long enough). This way the profiler will have something to work with
and you will have ample time to attach your profiler to this process:)
* I will just use benchzstd as my bencharmking script for this example:
* I will just use benchzstd as my benchmarking script for this example:
```
$ zstd -b1 -i5 <my-data> # this will run for 5 seconds
```
5. Once you run your benchmarking script, switch back over to instruments and attach your
process to the time profiler. You can do this by:
* Clicking on the `All Processes` drop down in the top left of the toolbar.
* Selecting your process from the dropdown. In my case, it is just going to be labled
* Selecting your process from the dropdown. In my case, it is just going to be labeled
`zstd`
* Hitting the bright red record circle button on the top left of the toolbar
6. You profiler will now start collecting metrics from your bencharking script. Once
6. Your profiler will now start collecting metrics from your benchmarking script. Once
you think you have collected enough samples (usually this is the case after 3 seconds of
recording), stop your profiler.
7. Make sure that in toolbar of the bottom window, `profile` is selected.
8. You should be able to see your call graph.
* If you don't see the call graph or an incomplete call graph, make sure you have compiled
zstd and your benchmarking scripg using debug flags. On mac and linux, this just means
zstd and your benchmarking script using debug flags. On mac and linux, this just means
you will have to supply the `-g` flag along with your build script. You might also
have to provide the `-fno-omit-frame-pointer` flag
9. Dig down the graph to find your function call and then inspect it by double clicking
@ -315,7 +365,7 @@ Some general notes on perf:
counter statistics. Perf uses a high resolution timer and this is likely one
of the first things your team will run when assessing your PR.
* Perf has a long list of hardware counters that can be viewed with `perf --list`.
When measuring optimizations, something worth trying is to make sure the handware
When measuring optimizations, something worth trying is to make sure the hardware
counters you expect to be impacted by your change are in fact being so. For example,
if you expect the L1 cache misses to decrease with your change, you can look at the
$(MAKE) -C $(PRGDIR) all CC="$(CXX) -Wno-deprecated" CFLAGS="$(CXXFLAGS)"  # adding -Wno-deprecated to avoid clang++ warning on dealing with C files directly
gcc5test:clean
gcc-5 -v
$(MAKE) all CC=gcc-5 MOREFLAGS="-Werror"
$(MAKE) all CC=gcc-5 MOREFLAGS="-Werror$(MOREFLAGS)"
gcc6test:clean
gcc-6 -v
$(MAKE) all CC=gcc-6 MOREFLAGS="-Werror"
$(MAKE) all CC=gcc-6 MOREFLAGS="-Werror$(MOREFLAGS)"
armtest:clean
$(MAKE) -C $(TESTDIR) datagen # use native, faster
$(MAKE) test CC=clang MOREFLAGS="-g -fsanitize=memory -fno-omit-frame-pointer -Werror" HAVE_LZMA=0  # datagen.c fails this test for no obvious reason
$(MAKE) test CC=clang MOREFLAGS="-g -fsanitize=memory -fno-omit-frame-pointer -Werror $(MOREFLAGS)" HAVE_LZMA=0  # datagen.c fails this test for no obvious reason
@ -4,23 +4,21 @@ __Zstandard__, or `zstd` as short version, is a fast lossless compression algori
targeting real-time compression scenarios at zlib-level and better compression ratios.
It's backed by a very fast entropy stage, provided by [Huff0 and FSE library](https://github.com/Cyan4973/FiniteStateEntropy).
The project is provided as an open-source dual [BSD](LICENSE) and [GPLv2](COPYING) licensed **C** library,
Zstandard's format is stable and documented in [RFC8878](https://datatracker.ietf.org/doc/html/rfc8878). Multiple independent implementations are already available.
This repository represents the reference implementation, provided as an open-source dual [BSD](LICENSE) OR [GPLv2](COPYING) licensed **C** library,
and a command line utility producing and decoding `.zst`, `.gz`, `.xz` and `.lz4` files.
Should your project require another programming language,
a list of known ports and bindings is provided on [Zstandard homepage](http://www.zstd.net/#other-languages).
a list of known ports and bindings is provided on [Zstandard homepage](https://facebook.github.io/zstd/#other-languages).
**Development branch status:**
[![Build Status][travisDevBadge]][travisLink]
[![Build status][AppveyorDevBadge]][AppveyorLink]
[![Build status][CircleDevBadge]][CircleLink]
[![Build status][CirrusDevBadge]][CirrusLink]
[![Fuzzing Status][OSSFuzzBadge]][OSSFuzzLink]
[travisDevBadge]: https://travis-ci.org/facebook/zstd.svg?branch=dev "Continuous Integration test suite"
[travisLink]: https://travis-ci.org/facebook/zstd
[AppveyorDevBadge]: https://ci.appveyor.com/api/projects/status/xt38wbdxjk5mrbem/branch/dev?svg=true "Windows test suite"
The zstd Conan recipe is kept up to date by Conan maintainers and community contributors.
If the version is out of date, please [create an issue or pull request](https://github.com/conan-io/conan-center-index) on the ConanCenterIndex repository.
### Visual Studio (Windows)
Going into `build` directory, you will find additional possibilities:
@ -176,18 +208,30 @@ Going into `build` directory, you will find additional possibilities:
You can build the zstd binary via buck by executing: `buck build programs:zstd` from the root of the repo.
The output binary will be in `buck-out/gen/programs/`.
### Bazel
You can easily integrate zstd into your Bazel project by using the module hosted on the [Bazel Central Repository](https://registry.bazel.build/modules/zstd).
## Testing
You can run quick local smoke tests by running `make check`.
If you can't use `make`, execute the `playTest.sh` script from the `src/tests` directory.
Two env variables `$ZSTD_BIN` and `$DATAGEN_BIN` are needed for the test script to locate the `zstd` and `datagen` binary.
For information on CI testing, please refer to `TESTING.md`.
## Status
Zstandard is currently deployed within Facebook. It is used continuously to compress large amounts of data in multiple formats and use cases.
Zstandard is currently deployed within Facebook and many other large cloud infrastructures.
It is run continuously to compress large amounts of data in multiple formats and use cases.
Zstandard is considered safe for production environments.
## License
Zstandard is dual-licensed under [BSD](LICENSE) and [GPLv2](COPYING).
Zstandard is dual-licensed under [BSD](LICENSE) OR [GPLv2](COPYING).
## Contributing
The "dev" branch is the one where all contributions are merged before reaching "master".
If you plan to propose a patch, please commit into the "dev" branch, or its own feature branch.
Direct commit to "master" are not permitted.
The `dev` branch is the one where all contributions are merged before reaching `release`.
If you plan to propose a patch, please commit into the `dev` branch, or its own feature branch.
Direct commit to `release` are not permitted.
For more information, please read [CONTRIBUTING](CONTRIBUTING.md).
Please do not open GitHub issues or pull requests - this makes the problem immediately visible to everyone, including malicious actors. Security issues in this open source project can be safely reported via the Meta Bug Bounty program:
https://www.facebook.com/whitehat
Meta's security team will triage your report and determine whether or not it is eligible for a bounty under our program.
# Receiving Vulnerability Notifications
In the case that a significant security vulnerability is reported to us or discovered by us, without being publicly known, we will, at our discretion, notify high-profile, high-exposure users of Zstandard ahead of our public disclosure of the issue and associated fix.
If you believe your project would benefit from inclusion in this list, please reach out to one of the maintainers.
<!-- Note to maintainers: this list is kept [here](https://fburl.com/wiki/cgc1l62x). -->
SET ADDITIONALPARAM=/p:LibraryPath="C:\Program Files\Microsoft SDKs\Windows\v7.1\lib\x64;c:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\lib\amd64;C:\Program Files (x86)\Microsoft Visual Studio 10.0\;C:\Program Files (x86)\Microsoft Visual Studio 10.0\lib\amd64;"
)
build_script:
- ECHO Building %COMPILER% %PLATFORM% %CONFIGURATION%
It's generally recommended to have CMake with versions higher than 3.14 for [iOS-derived platforms](https://cmake.org/cmake/help/latest/manual/cmake-toolchains.7.html#id27).
[Looking for a 'cmake clean' command to clear up CMake output](https://stackoverflow.com/questions/9680420/looking-for-a-cmake-clean-command-to-clear-up-cmake-output)
message(WARNING"BUILD_SHARED_LIBS is OFF, but ZSTD_BUILD_SHARED is ON and ZSTD_BUILD_STATIC is OFF, which takes precedence, so libzstd is a shared library")
message(WARNING"BUILD_SHARED_LIBS is ON, but ZSTD_BUILD_SHARED is OFF and ZSTD_BUILD_STATIC is ON, which takes precedence, is set so libzstd is a static library")
It's possible to create a compressor-only library but since the decompressor is so small in comparison this doesn't bring much of a gain (but for the curious, simply remove the files in the _decompress_ section at the end of `zstd-in.c`).