827 Commits

Author SHA1 Message Date
ZijianLi
87cc127705 - Modify the GCC version used for CI testing of the RISC-V architecture
- Fix a bug in the ZSTD_row_getRVVMask function
- Improve the performance of ZSTD_copy16()
2025-09-26 22:34:57 +08:00
Ryan Lefkowitz
c59812e558 🔧 Fix memory leak in pthread init functions on failure
When pthread_mutex_init() or pthread_cond_init() fails in the debug
implementation (DEBUGLEVEL >= 1), the previously allocated memory was
not freed, causing a memory leak.

This fix ensures that allocated memory is properly freed when pthread
initialization functions fail, preventing resource leaks in error
conditions.

The issue affects:
- ZSTD_pthread_mutex_init() at lib/common/threading.c:146
- ZSTD_pthread_cond_init() at lib/common/threading.c:167

This is particularly important for long-running applications or
scenarios with resource constraints where pthread initialization
might fail due to system limits.
2025-09-15 18:20:01 -04:00
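A minimal sketch of the cleanup pattern this fix describes. It assumes a debug build in which the mutex handle is a heap-allocated `pthread_mutex_t`. The names `debug_mutex_t`, `debug_mutex_init()` and `debug_mutex_destroy()` are hypothetical and are not zstd's `ZSTD_pthread_mutex_init()` code. A condition-variable version would follow the same shape with `pthread_cond_init()`.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical debug-style handle: the mutex object lives on the heap. */
typedef pthread_mutex_t* debug_mutex_t;

/* Returns 0 on success; non-zero on allocation or initialization failure. */
static int debug_mutex_init(debug_mutex_t* mutex, const pthread_mutexattr_t* attr)
{
    int ret;
    *mutex = (pthread_mutex_t*)malloc(sizeof(pthread_mutex_t));
    if (*mutex == NULL) return 1;
    ret = pthread_mutex_init(*mutex, attr);
    if (ret != 0) {
        /* The point of the fix: release the allocation instead of leaking it. */
        free(*mutex);
        *mutex = NULL;
    }
    return ret;
}

/* Destroys and frees the mutex; safe to call on a NULL handle. */
static int debug_mutex_destroy(debug_mutex_t* mutex)
{
    int ret;
    if (*mutex == NULL) return 0;
    ret = pthread_mutex_destroy(*mutex);
    free(*mutex);
    *mutex = NULL;
    return ret;
}
```

Setting the handle back to `NULL` on failure also lets a later destroy call on the same handle become a no-op instead of a double free.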
ZijianLi
d04e7944dd add compiler version check. 2025-07-07 23:07:39 +08:00
Arpad Panyik
1e9d2006ae AArch64: Use better block copy8
The vector copy is only necessary for 16-byte blocks on AArch64.

Decompression uplifts on a Neoverse V2 system, using Zstd-1.5.8
compiled with "-O3 -march=armv8.2-a+sve2":

                 Clang-19  Clang-20    GCC-14    GCC-15
 1#silesia.tar:   +0.316%   +0.865%   +0.025%   +0.096%
 2#silesia.tar:   +0.689%   +1.374%   +0.027%   +0.065%
 3#silesia.tar:   +0.811%   +1.654%   +0.034%   +0.033%
 4#silesia.tar:   +0.912%   +1.755%   +0.027%   +0.042%
 5#silesia.tar:   +0.995%   +1.826%   +0.062%   +0.094%
 6#silesia.tar:   +0.976%   +1.777%   +0.065%   +0.104%
 7#silesia.tar:   +0.910%   +1.738%   +0.077%   +0.110%
2025-06-20 17:05:41 +00:00
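A sketch of the idea in the commit message, not the patch itself. An 8-byte block fits in one 64-bit general-purpose register, so a plain `memcpy` is enough. Only the 16-byte block uses a 128-bit NEON load/store. The helper names are hypothetical. The NEON path assumes `<arm_neon.h>` with `vld1q_u8`/`vst1q_u8`, and other targets fall back to `memcpy`.

```c
#include <stdint.h>
#include <string.h>
#if defined(__aarch64__) && defined(__ARM_NEON)
#  include <arm_neon.h>
#endif

/* 8-byte block: plain memcpy; a compiler can lower this to a single
 * 64-bit general-purpose load/store, so no vector register is needed. */
static void sketch_copy8(void* dst, const void* src)
{
    memcpy(dst, src, 8);
}

/* 16-byte block: on AArch64 with NEON, one 128-bit vector load/store. */
static void sketch_copy16(void* dst, const void* src)
{
#if defined(__aarch64__) && defined(__ARM_NEON)
    vst1q_u8((uint8_t*)dst, vld1q_u8((const uint8_t*)src));
#else
    memcpy(dst, src, 16);
#endif
}
```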
Arpad Panyik
7e4937bc75 AArch64: Add SVE2 implementation of histogram computation
The existing scalar implementation uses a 4-way pipelined histogram
calculation, which is very efficient on out-of-order CPUs. However,
this can be further accelerated using the SVE2 HISTSEG instruction,
which computes a histogram for each 16-byte segment of a vector register.

On a system with 128-bit vectors (VL128) we need 16 HISTSEG executions
to compute the histogram for the whole symbol space (0..255) of 16
bytes of input. However, each execution can add up to 16 to an 8-bit
counter, so we can only accumulate 15 such 16-byte strips (15 × 16 = 240
<= 255) before a possible overflow. So we need to widen the 8-bit
histogram accumulators to 16-bit and save them after every 240-byte chunk
of input. To keep all of them in registers we would need 32 128-bit
registers (256 symbols × 16 bits = 4096 bits). Longer SVE2 vectors could
help here, if such machines become available.

The maximum input block size in Zstd is 128 KiB, so 16-bit accumulators
would not be enough in the worst case (a single symbol repeated 131072
times exceeds 65535). However, an LZ pass precedes the histogram
calculation, so it should be impossible (my assumption) to overflow the
16-bit accumulators.

The symbol distribution is also not uniform: the lower values are more
common, so we used a 3-pass algorithm to prevent stack spilling. In the
first pass we only compute histograms for 64 symbols (4-way SIMD) while
also computing the maximum symbol value. If any symbol value is 64 or
larger, we start the second pass to compute the next 96 elements of the
histogram. The final pass calculates the remaining part of the histogram
(256 symbols in total) if needed. This split of histogram generation gave
the best overall results for performance.

This implementation is the best performing of a number of different
cache blocking schemes tested.

Compression uplifts on a Neoverse V2 system, using Zstd-1.5.8
(e26dde3d) as a baseline, compiled with "-O3 -march=armv8.2-a+sve2":

                 Clang-20    GCC-14
 1#silesia.tar:   +6.173%   +5.987%
 2#silesia.tar:   +5.200%   +5.011%
 3#silesia.tar:   +4.332%   +5.031%
 4#silesia.tar:   +2.789%   +3.064%
 5#silesia.tar:   +2.028%   +1.838%
 6#silesia.tar:   +1.562%   +1.340%
 7#silesia.tar:   +1.160%   +0.959%
2025-06-11 12:14:22 +00:00
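The SVE2 code itself is not reproduced here. This scalar emulation, with hypothetical names, only illustrates the overflow bound from the commit. Counts go into narrow 8-bit bins, and those bins are widened and reset every 15 × 16 = 240 bytes, so no bin can exceed 240. As an assumption of this sketch, the bins are widened into `uint32_t` rather than the commit's 16-bit accumulators, so it does not depend on the LZ-pass argument. It also tracks the maximum symbol, as the commit's first pass does.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 15 strips of 16 bytes: worst case one 8-bit bin reaches 240 <= 255. */
#define STRIP_BYTES      16
#define STRIPS_PER_FLUSH 15
#define FLUSH_BYTES      (STRIP_BYTES * STRIPS_PER_FLUSH)

/* Fills count[0..255] and returns the largest symbol seen (0 if empty). */
static unsigned hist_narrow_accum(uint32_t count[256], const uint8_t* src, size_t size)
{
    uint8_t narrow[256];
    unsigned maxSymbol = 0;
    size_t pos = 0;
    memset(count, 0, 256 * sizeof(count[0]));
    while (pos < size) {
        size_t const chunk = (size - pos < FLUSH_BYTES) ? size - pos : FLUSH_BYTES;
        size_t i;
        memset(narrow, 0, sizeof(narrow));
        for (i = 0; i < chunk; i++) {
            uint8_t const s = src[pos + i];
            narrow[s]++;                 /* cannot exceed 240 within a chunk */
            if (s > maxSymbol) maxSymbol = s;
        }
        for (i = 0; i < 256; i++)
            count[i] += narrow[i];       /* widen and flush the 8-bit bins */
        pos += chunk;
    }
    return maxSymbol;
}
```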
李子建
d95123f2e6 Improve speed of ZSTD_compressSequencesAndLiterals() using RVV 2025-06-02 17:21:02 +08:00
Nick Terrell
0de4991942 Add a method for checking if ZSTD was compiled with flags that impact determinism 2025-03-07 10:31:19 -05:00
Yann Collet
db2d205ada fixed -Wconversion for lib/decompress/zstd_decompress_block.c 2025-02-26 10:01:05 -08:00
Yann Collet
30281d889f fix conversion warning 2025-02-26 07:41:34 -08:00
Yann Collet
54e9d46db4 added __clang__ to compiler-specific alignment attribute
when clang is used within msvc, `__GNUC__` isn't defined,
so testing `__clang__` explicitly is required.
2025-02-05 13:48:24 -08:00
Yann Collet
bcf404c0ab changed C11 keyword to _Alignas
so that it doesn't depend on #include
2025-02-05 13:25:14 -08:00
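A sketch of one plausible way to select an alignment spelling. It takes the two points above into account. `_Alignas` is the C11 keyword itself, while `alignas` is a macro from `<stdalign.h>` before C23. Clang running under MSVC (clang-cl) does not define `__GNUC__`, so `__clang__` has to be tested explicitly. `SKETCH_ALIGNED` is a hypothetical name. zstd's own macro is `ZSTD_ALIGNED()` in `common/compiler.h`, and its exact definition may differ.

```c
#include <stdint.h>

#if defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 201112L)
#  define SKETCH_ALIGNED(n) _Alignas(n)                 /* C11 keyword, no #include */
#elif defined(__GNUC__) || defined(__clang__)
#  define SKETCH_ALIGNED(n) __attribute__((aligned(n))) /* __clang__ covers clang-cl */
#elif defined(_MSC_VER)
#  define SKETCH_ALIGNED(n) __declspec(align(n))
#else
#  define SKETCH_ALIGNED(n)                             /* no alignment guarantee */
#endif

/* Usage: a 64-byte table aligned to a 32-byte boundary. */
static SKETCH_ALIGNED(32) unsigned char sketch_buffer[64];
```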
Yann Collet
26a2b5d5df
Merge pull request #4265 from pps83/static-bmi2-check
Check `STATIC_BMI2` instead of `STATIC_BMI2 == 1`
2025-01-31 14:39:20 -08:00
Pavel P
0cda0100ea fix formatting 2025-01-24 03:03:22 +02:00
Pavel P
f7e8fc339b Check STATIC_BMI2 instead of STATIC_BMI2 == 1 2025-01-24 03:03:21 +02:00
Pavel P
0a183620a3 Reorder __BMI2__ check
+ if `__BMI2__` is defined, set STATIC_BMI2 for all compilers
+ use `defined(_MSC_VER) && defined(__AVX2__)` as a fallback for the MS compiler
2025-01-24 03:02:47 +02:00
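A sketch of the detection order described in these commits. It checks `__BMI2__` first for any compiler, then uses the MSVC-only `_MSC_VER && __AVX2__` fallback. The macro is then tested for its truth value rather than `== 1`. `SKETCH_STATIC_BMI2` and `sketch_bmi2_enabled()` are hypothetical names. The `#ifndef` guard, which lets a build override the value, is an assumption of this sketch.

```c
/* Hypothetical macro; a build may predefine it to force a value. */
#ifndef SKETCH_STATIC_BMI2
#  if defined(__BMI2__)
#    define SKETCH_STATIC_BMI2 1   /* any compiler advertising BMI2 */
#  elif defined(_MSC_VER) && defined(__AVX2__)
#    define SKETCH_STATIC_BMI2 1   /* MSVC fallback from the commit */
#  else
#    define SKETCH_STATIC_BMI2 0
#  endif
#endif

/* Tests the truth value, like "#if STATIC_BMI2": any non-zero override
 * (e.g. -DSKETCH_STATIC_BMI2=2) enables the path, whereas
 * "#if SKETCH_STATIC_BMI2 == 1" would silently ignore it. */
static int sketch_bmi2_enabled(void)
{
#if SKETCH_STATIC_BMI2
    return 1;
#else
    return 0;
#endif
}
```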
Pavel P
d486ccc9e9 Update comment for STATIC_BMI2 macro 2025-01-24 03:02:47 +02:00
Pavel P
1b15e888fc Move STATIC_BMI2 block as-is to portability_macros.h 2025-01-24 03:02:46 +02:00
Yann Collet
a7b59bcb7f
Merge pull request #4257 from pps83/dev-x64test
Use _M_X64 only without mixing with _M_AMD64
2025-01-23 12:50:27 -08:00
Yann Collet
55c0c5bdca
Merge pull request #4258 from pps83/dev-ZSTD_ALIGNED
Implement ZSTD_ALIGNED for ms compiler
2025-01-22 15:09:35 -08:00
Pavel P
a0872a8372 Implement ZSTD_ALIGNED for ms compiler 2025-01-21 02:33:25 +02:00
Pavel P
6c1d1cc600 Use _M_X64 only without mixing with _M_AMD64 2025-01-21 02:27:39 +02:00
Yann Collet
48b186f76b
Merge pull request #4253 from facebook/BitContainerType
minor: use BitContainerType when appropriate
2025-01-19 18:35:36 -08:00
Yann Collet
82346b92bb minor: generalize BitContainerType
technically equivalent to `size_t`,
but it's the proper type for underlying register representation.

This makes it possible to control register type, and therefore size, independently from `size_t`,
which can be useful on systems where `size_t` is 32-bit, while the architecture supports 64-bit registers.
2025-01-19 18:05:57 -08:00
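A minimal sketch of choosing the bit-container type independently of `size_t`. It assumes the x32 ABI (`__x86_64__` together with `__ILP32__`) as the example of a 32-bit `size_t` on a target with 64-bit registers. `SketchBitContainer` and `sketch_get_middle_bits()` are hypothetical stand-ins for zstd's `BitContainerType` and its bit helpers.

```c
#include <stddef.h>
#include <stdint.h>

/* Wider container where size_t under-uses 64-bit registers (assumed x32). */
#if defined(__x86_64__) && defined(__ILP32__)
typedef uint64_t SketchBitContainer;
#else
typedef size_t SketchBitContainer;
#endif

#define SKETCH_CONTAINER_BITS ((unsigned)(sizeof(SketchBitContainer) * 8))

/* Extracts nbBits starting at bit 'start' (0 = least significant).
 * Requires start < SKETCH_CONTAINER_BITS and nbBits < SKETCH_CONTAINER_BITS,
 * so that both shifts are well defined. */
static SketchBitContainer sketch_get_middle_bits(SketchBitContainer c,
                                                 unsigned start, unsigned nbBits)
{
    SketchBitContainer const mask = (((SketchBitContainer)1) << nbBits) - 1;
    return (c >> start) & mask;
}
```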
Yann Collet
4bbf4a285d enable DYNAMIC_BMI2 by default on x86 (32-bit mode)
so far it was only enabled for x64 (64-bit mode)
2025-01-19 08:11:59 -08:00
Yann Collet
a556559841 no longer limit automated BMI2 detection to x64
this was previously not triggered in x86 32-bit mode,
due to a limitation in `bitstream.h` that was fixed in #4248.

Now, `bmi2` will be automatically detected and triggered
at compilation time, if the corresponding instruction set is enabled,
even in 32-bit mode.

Also: updated library documentation to describe the STATIC_BMI2 build variable
2025-01-19 00:08:57 -08:00
Yann Collet
27d7940631 minor: cosmetic, indentation 2025-01-18 22:49:16 -08:00
Yann Collet
9efb09749b added a CI test for x86 32-bit + avx2 combination
which is expected to be quite rare, but nonetheless possible.

This test is initially expected to fail, until the #4248 fix is integrated
2025-01-18 22:49:16 -08:00
Yann Collet
a469e7c083
Merge pull request #4248 from pps83/dev-bzhi32
Use _bzhi_u32 for 32-bit builds when building with STATIC_BMI2
2025-01-18 22:48:24 -08:00
Pavel P
fcd684b9b4 update sizeof check 2025-01-19 02:37:35 +02:00
Pavel P
d60c4d75e9 remove unrelated changes 2025-01-19 02:36:00 +02:00
Pavel P
462484d5dc change to BitContainerType 2025-01-19 02:34:41 +02:00
Pavel P
26e5fb3614 handle 32bit size_t when building for x64 2025-01-18 23:37:50 +02:00
Pavel P
936927a427 handle 32bit size_t when building for x64 2025-01-18 23:30:55 +02:00
Pavel P
ee17f4c6d2 Use _bzhi_u32 for 32-bit builds when building with STATIC_BMI2
`_bzhi_u64` is available only for 64-bit builds, while `BIT_getLowerBits` expects `nbBits` to be less than `BIT_MASK_SIZE` (`BIT_MASK_SIZE` is 32)
2025-01-18 21:33:04 +02:00
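A sketch of the selection this commit describes. It assumes the `<immintrin.h>` intrinsics `_bzhi_u64()` (64-bit builds only) and `_bzhi_u32()`. BZHI clears every bit from index `nbBits` upward. Because `nbBits` is always below 32, the 32-bit form is sufficient on 32-bit builds. `sketch_get_lower_bits()` is a hypothetical stand-in for `BIT_getLowerBits()`. Without BMI2 enabled, it falls back to an ordinary mask.

```c
#include <stddef.h>
#if defined(__BMI2__)
#  include <immintrin.h>
#endif

/* Keeps the low nbBits of bitContainer; requires nbBits < 32. */
static size_t sketch_get_lower_bits(size_t bitContainer, unsigned nbBits)
{
#if defined(__BMI2__) && (defined(__x86_64__) || defined(_M_X64))
    return (size_t)_bzhi_u64(bitContainer, nbBits);            /* 64-bit builds */
#elif defined(__BMI2__)
    return (size_t)_bzhi_u32((unsigned)bitContainer, nbBits);  /* 32-bit builds */
#else
    return bitContainer & (((size_t)1 << nbBits) - 1);         /* portable mask */
#endif
}
```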
Pavel P
46e17b805b [asm] Enable x86_64 asm for windows builds 2025-01-18 05:33:08 +02:00
Yann Collet
8bff69af86 Alignment instruction ZSTD_ALIGNED() in common/compiler.h 2025-01-15 17:11:27 -08:00
Yann Collet
6f8e6f3c97 create new compilation macro ZSTD_ARCH_X86_AVX2 2025-01-15 17:11:27 -08:00
MessyHack
42d704ad5e should check defined(_M_X64) not defined(_M_X86) when building with MSVC.
_M_X86 is only defined under MSVC 32Bit
_M_X64 is only defined under MSVC 64Bit
2025-01-10 22:47:48 -08:00
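For reference, MSVC documents `_M_X64` (and the equivalent `_M_AMD64`) for x64 targets and `_M_IX86` for 32-bit x86. GCC and Clang use `__x86_64__` and `__i386__`. The following is a minimal sketch of a target check that tests one MSVC spelling per architecture, in the spirit of "Use _M_X64 only without mixing with _M_AMD64". The function name is hypothetical.

```c
/* Hypothetical helper: names the target architecture at compile time. */
static const char* sketch_target_arch(void)
{
#if defined(_M_X64) || defined(__x86_64__)
    return "x86-64";
#elif defined(_M_IX86) || defined(__i386__)
    return "x86";
#elif defined(_M_ARM64) || defined(__aarch64__)
    return "aarch64";
#else
    return "other";
#endif
}
```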
Victor Zhang
a610550e2c
Merge pull request #4218 from facebook/externC
Move #includes out of `extern "C"` blocks
2025-01-07 10:06:08 -08:00
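A minimal header-layout sketch with hypothetical names. The `#include` directives come before the `extern "C"` block, and only the header's own declarations go inside it. One common motivation is that a header pulled in from C++ may declare templates or overloaded functions, and those cannot be given C linkage.

```c
/* sketch.h (hypothetical) */
#ifndef SKETCH_H
#define SKETCH_H

#include <stddef.h>   /* outside, not inside, the extern "C" block */

#if defined(__cplusplus)
extern "C" {
#endif

size_t sketch_double(size_t n);   /* this header's own C API */

#if defined(__cplusplus)
}
#endif

#endif /* SKETCH_H */

/* sketch.c (hypothetical) */
size_t sketch_double(size_t n) { return 2 * n; }
```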
Yann Collet
a2ff6ea784 improve ZSTD_getFrameHeader on skippable frames
now reports:
- the header size
- the magic variant (within @dictID field)
2024-12-29 12:26:04 -08:00
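In the Zstandard format (RFC 8878), a skippable frame starts with a 4-byte little-endian magic number in the range 0x184D2A50 to 0x184D2A5F. It is followed by a 4-byte little-endian Frame_Size that gives the length of the user data. The header is therefore always 8 bytes, and the low nibble of the magic number is the variant. The following is a minimal parser sketch with hypothetical names. Per the commit, the real `ZSTD_getFrameHeader()` reports the variant through its `dictID` field, and its struct layout is not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SKIPPABLE_MAGIC_START 0x184D2A50u  /* low nibble = variant */
#define SKETCH_SKIPPABLE_HEADER_SIZE 8u

typedef struct {
    unsigned headerSize;    /* always 8 for a skippable frame */
    unsigned magicVariant;  /* 0..15 */
    uint32_t userDataSize;  /* Frame_Size field */
} SketchSkippableHeader;

static uint32_t sketch_read_le32(const uint8_t* p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if src is too short or not a skippable frame. */
static int sketch_parse_skippable(SketchSkippableHeader* h, const void* src, size_t srcSize)
{
    const uint8_t* const p = (const uint8_t*)src;
    uint32_t magic;
    if (srcSize < SKETCH_SKIPPABLE_HEADER_SIZE) return -1;
    magic = sketch_read_le32(p);
    if ((magic & 0xFFFFFFF0u) != SKETCH_SKIPPABLE_MAGIC_START) return -1;
    h->headerSize = SKETCH_SKIPPABLE_HEADER_SIZE;
    h->magicVariant = (unsigned)(magic - SKETCH_SKIPPABLE_MAGIC_START);
    h->userDataSize = sketch_read_le32(p + 4);
    return 0;
}
```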
Yann Collet
b339efff2b add dedicated error code for special case
ZSTD_compressSequencesAndLiterals() cannot produce an uncompressed block
2024-12-20 10:37:00 -08:00
Yann Collet
0a5c0807af minor conversion warning fix 2024-12-20 10:36:59 -08:00
Yann Collet
477a01067f codemod: symbolEncodingType_e -> SymbolEncodingType_e 2024-12-20 10:36:56 -08:00
Yann Collet
b4a40a845f move Sequences definition to zstd_compress_internal.h
they should not be in common/zstd_internal.h,
since these definitions are not shared beyond lib/compress/.
2024-12-20 10:36:55 -08:00
Victor Zhang
8f49db5a02 Revert "Remove unnecessary extern C declarations from xxhash.h"
This reverts commit 10b9d81909f8631e3ac64bd45e3bdd04982e39d6.
2024-12-19 17:54:41 -08:00
Victor Zhang
10b9d81909 Remove unnecessary extern C declarations from xxhash.h 2024-12-19 16:54:32 -08:00
Victor Zhang
d0d5ce4c00 Remove extern C blocks from lib/* internal APIs (except xxhash.h) 2024-12-19 16:00:11 -08:00
Victor Zhang
d51e6072a8 Test: remove extern C from some lib/common files 2024-12-19 14:59:02 -08:00
Victor Zhang
a7bb6d6c49 Oopsie with xxhash.h [1/?] 2024-12-18 12:41:53 -08:00
Victor Zhang
07ffcc6b65 Separate xxhash includes from extern C blocks 2024-12-18 12:35:10 -08:00