sharpetronics/zstd - zstd

mirror of https://github.com/facebook/zstd.git synced 2025-10-04 00:02:33 -04:00

Author	SHA1	Message	Date
ZijianLi	87cc127705	- Modify the GCC version used for CI testing of the RISCV architecture
- Fix a bug in the ZSTD_row_getRVVMask function
- Improve some performance for ZSTD_copy16()	2025-09-26 22:34:57 +08:00
Ryan Lefkowitz	c59812e558	🔧 Fix memory leak in pthread init functions on failure
When pthread_mutex_init() or pthread_cond_init() fails in the debug implementation (DEBUGLEVEL >= 1), the previously allocated memory was not freed, causing a memory leak.
This fix ensures that allocated memory is properly freed when pthread initialization functions fail, preventing resource leaks in error conditions.
The issue affects:
- ZSTD_pthread_mutex_init() at lib/common/threading.c:146
- ZSTD_pthread_cond_init() at lib/common/threading.c:167
This is particularly important for long-running applications or scenarios with resource constraints where pthread initialization might fail due to system limits.	2025-09-15 18:20:01 -04:00
ZijianLi	d04e7944dd	add compiler version check.	2025-07-07 23:07:39 +08:00
Arpad Panyik	1e9d2006ae	AArch64: Use better block copy8
The vector copy is only necessary for 16-byte blocks on AArch64.
Decompression uplifts on a Neoverse V2 system, using Zstd-1.5.8 compiled with "-O3 -march=armv8.2-a+sve2":
               Clang-19  Clang-20  GCC-14   GCC-15
1#silesia.tar: +0.316%   +0.865%   +0.025%  +0.096%
2#silesia.tar: +0.689%   +1.374%   +0.027%  +0.065%
3#silesia.tar: +0.811%   +1.654%   +0.034%  +0.033%
4#silesia.tar: +0.912%   +1.755%   +0.027%  +0.042%
5#silesia.tar: +0.995%   +1.826%   +0.062%  +0.094%
6#silesia.tar: +0.976%   +1.777%   +0.065%  +0.104%
7#silesia.tar: +0.910%   +1.738%   +0.077%  +0.110%	2025-06-20 17:05:41 +00:00
Arpad Panyik	7e4937bc75	AArch64: Add SVE2 implementation of histogram computation
The existing scalar implementation uses a 4-way pipelined histogram calculation which is very efficient on out-of-order CPUs. However, this can be further accelerated using the SVE2 HISTSEG instructions, which compute a histogram for 16-byte chunks in a vector register.
On a system with 128-bit vectors (VL128) we need 16 HISTSEG executions to compute the histogram for the whole symbol space (0..255) of 16 bytes of input. However, we can only accumulate 15 such 16-byte strips before possible overflow, so we need to extend and save the 8-bit histogram accumulators to 16-bit after every 240-byte chunk of input. To store all in registers we would need 32 128-bit registers. Longer SVE2 vectors could help here, if such machines become available.
The maximum input block size in Zstd is 128 KiB, so 16-bit accumulators would not be enough. However, an LZ pass will precede the histogram calculation, so it is impossible (my assumption) to overflow the 16-bit accumulators.
The symbol distribution is also not uniform, the lower values are more common, so we used a 3-pass algorithm to prevent stack spilling. In the first pass we only compute histograms for 64 symbols (4-way SIMD) while also computing the maximum symbol value. If we have symbol values larger than 64 we start the second pass to compute the next 96 elements of the histogram. The final pass calculates the remaining part of the histogram (256 symbols in total) if needed. This split of histogram generation gave the best overall results for performance. This implementation is the best performing of a number of different cache blocking schemes tested.
Compression uplifts on a Neoverse V2 system, using Zstd-1.5.8 (e26dde3d) as a baseline, compiled with "-O3 -march=armv8.2-a+sve2":
               Clang-20  GCC-14
1#silesia.tar: +6.173%   +5.987%
2#silesia.tar: +5.200%   +5.011%
3#silesia.tar: +4.332%   +5.031%
4#silesia.tar: +2.789%   +3.064%
5#silesia.tar: +2.028%   +1.838%
6#silesia.tar: +1.562%   +1.340%
7#silesia.tar: +1.160%   +0.959%	2025-06-11 12:14:22 +00:00
李子建	d95123f2e6	Improve speed of ZSTD_compressSequencesAndLiterals() using RVV	2025-06-02 17:21:02 +08:00
Nick Terrell	0de4991942	Add a method for checking if ZSTD was compiled with flags that impact determinism	2025-03-07 10:31:19 -05:00
Yann Collet	db2d205ada	fixed -Wconversion for lib/decompress/zstd_decompress_block.c	2025-02-26 10:01:05 -08:00
Yann Collet	30281d889f	fix conversion warning	2025-02-26 07:41:34 -08:00
Yann Collet	54e9d46db4	added __clang__ to compiler-specific alignment attribute when clang is used within msvc, `__GNUC__` isn't defined, so testing `__clang__` explicitly is required.	2025-02-05 13:48:24 -08:00
Yann Collet	bcf404c0ab	changed C11 keyword to _Alignas so that it doesn't depend on #include	2025-02-05 13:25:14 -08:00
Yann Collet	26a2b5d5df	Merge pull request #4265 from pps83/static-bmi2-check Check `STATIC_BMI2` instead of `STATIC_BMI2 == 1`	2025-01-31 14:39:20 -08:00
Pavel P	0cda0100ea	fix formatting	2025-01-24 03:03:22 +02:00
Pavel P	f7e8fc339b	Check `STATIC_BMI2` instead of `STATIC_BMI2 == 1`	2025-01-24 03:03:21 +02:00
Pavel P	0a183620a3	Reorder __BMI2__ check + if `__BMI2__` defined, then set STATIC_BMI2 for all compilers + use `defined(_MSC_VER) && defined(__AVX2__)` as fallback for ms compiler	2025-01-24 03:02:47 +02:00
Pavel P	d486ccc9e9	Update comment for STATIC_BMI2 macro	2025-01-24 03:02:47 +02:00
Pavel P	1b15e888fc	Move STATIC_BMI2 block as-is to portability_macros.h	2025-01-24 03:02:46 +02:00
Yann Collet	a7b59bcb7f	Merge pull request #4257 from pps83/dev-x64test Use _M_X64 only without mixing with _M_AMD64	2025-01-23 12:50:27 -08:00
Yann Collet	55c0c5bdca	Merge pull request #4258 from pps83/dev-ZSTD_ALIGNED Implement ZSTD_ALIGNED for ms compiler	2025-01-22 15:09:35 -08:00
Pavel P	a0872a8372	Implement ZSTD_ALIGNED for ms compiler	2025-01-21 02:33:25 +02:00
Pavel P	6c1d1cc600	Use _M_X64 only without mixing with _M_AMD64	2025-01-21 02:27:39 +02:00
Yann Collet	48b186f76b	Merge pull request #4253 from facebook/BitContainerType minor: use BitContainerType when appropriate	2025-01-19 18:35:36 -08:00
Yann Collet	82346b92bb	minor: generalize BitContainerType technically equivalent to `size_t`, but it's the proper type for underlying register representation. This makes it possible to control register type, and therefore size, independently from `size_t`, which can be useful on systems where `size_t` is 32-bit, while the architecture supports 64-bit registers.	2025-01-19 18:05:57 -08:00
Yann Collet	4bbf4a285d	enable DYNAMIC_BMI2 by default on x86 (32-bit mode) so far was only enabled for x64 (64-bit mode)	2025-01-19 08:11:59 -08:00
Yann Collet	a556559841	no longer limit automated BMI2 detection to x64 this was previously not triggered in x86 32-bit mode, due to a limitation in `bitstream.h`, that was fixed in #4248. Now, `bmi2` will be automatically detected and triggered at compilation time, if the corresponding instruction set is enabled, even in 32-bit mode. Also: updated library documentation, to feature STATIC_BMI2 build variable	2025-01-19 00:08:57 -08:00
Yann Collet	27d7940631	minor: cosmetic, indentation	2025-01-18 22:49:16 -08:00
Yann Collet	9efb09749b	added a CI test for x86 32-bit + avx2 combination which is expected to be quite rare, but nonetheless possible. This test is initially expected to fail, before integration of #4248 fix	2025-01-18 22:49:16 -08:00
Yann Collet	a469e7c083	Merge pull request #4248 from pps83/dev-bzhi32 Use _bzhi_u32 for 32-bit builds when building with STATIC_BMI2	2025-01-18 22:48:24 -08:00
Pavel P	fcd684b9b4	update sizeof check	2025-01-19 02:37:35 +02:00
Pavel P	d60c4d75e9	remove unrelated changes	2025-01-19 02:36:00 +02:00
Pavel P	462484d5dc	change to BitContainerType	2025-01-19 02:34:41 +02:00
Pavel P	26e5fb3614	handle 32bit size_t when building for x64	2025-01-18 23:37:50 +02:00
Pavel P	936927a427	handle 32bit size_t when building for x64	2025-01-18 23:30:55 +02:00
Pavel P	ee17f4c6d2	Use _bzhi_u32 for 32-bit builds when building with STATIC_BMI2 `_bzhi_u64` is available only for 64-bit builds, while `BIT_getLowerBits` expects `nbBits` to be less than `BIT_MASK_SIZE` (`BIT_MASK_SIZE` is 32)	2025-01-18 21:33:04 +02:00
Pavel P	46e17b805b	[asm] Enable x86_64 asm for windows builds	2025-01-18 05:33:08 +02:00
Yann Collet	8bff69af86	Alignment instruction ZSTD_ALIGNED() in common/compiler.h	2025-01-15 17:11:27 -08:00
Yann Collet	6f8e6f3c97	create new compilation macro ZSTD_ARCH_X86_AVX2	2025-01-15 17:11:27 -08:00
MessyHack	42d704ad5e	should check defined(_M_X64) not defined(_M_X86) when building with MSVC. _M_X86 is only defined under MSVC 32Bit _M_X64 is only defined under MSVC 64Bit	2025-01-10 22:47:48 -08:00
Victor Zhang	a610550e2c	Merge pull request #4218 from facebook/externC Move #includes out of `extern "C"` blocks	2025-01-07 10:06:08 -08:00
Yann Collet	a2ff6ea784	improve ZSTD_getFrameHeader on skippable frames now reports: - the header size - the magic variant (within @dictID field)	2024-12-29 12:26:04 -08:00
Yann Collet	b339efff2b	add dedicated error code for special case ZSTD_compressSequencesAndLiterals() cannot produce an uncompressed block	2024-12-20 10:37:00 -08:00
Yann Collet	0a5c0807af	minor conversion warning fix	2024-12-20 10:36:59 -08:00
Yann Collet	477a01067f	codemod: symbolEncodingType_e -> SymbolEncodingType_e	2024-12-20 10:36:56 -08:00
Yann Collet	b4a40a845f	move Sequences definition to zstd_compress_internal.h they should not be in common/zstd_internal.h, since these definitions are not shared beyond lib/compress/.	2024-12-20 10:36:55 -08:00
Victor Zhang	8f49db5a02	Revert "Remove unnecessary extern C declarations from xxhash.h" This reverts commit 10b9d81909f8631e3ac64bd45e3bdd04982e39d6.	2024-12-19 17:54:41 -08:00
Victor Zhang	10b9d81909	Remove unnecessary extern C declarations from xxhash.h	2024-12-19 16:54:32 -08:00
Victor Zhang	d0d5ce4c00	Remove extern C blocks from lib/* internal APIs (except xxhash.h)	2024-12-19 16:00:11 -08:00
Victor Zhang	d51e6072a8	Test: remove extern C from some lib/common files	2024-12-19 14:59:02 -08:00
Victor Zhang	a7bb6d6c49	Oopsie with xxhash.h [1/?]	2024-12-18 12:41:53 -08:00
Victor Zhang	07ffcc6b65	Separate xxhash includes from extern C blocks	2024-12-18 12:35:10 -08:00
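
Note on c59812e558 (pthread init leak): the pattern that commit describes, a heap-allocated mutex that must be freed when initialization fails, can be sketched as follows. This is a minimal illustration and not the code in lib/common/threading.c; `dbg_mutex_t`, `dbg_mutex_init` and `dbg_mutex_destroy` are hypothetical names chosen for this example.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical debug-style wrapper: the mutex lives on the heap, so a
 * missing destroy call shows up in a leak checker. */
typedef pthread_mutex_t* dbg_mutex_t;

/* Returns 0 on success, non-zero on failure.
 * On failure no allocation is left behind and *mutex is NULL. */
static int dbg_mutex_init(dbg_mutex_t* mutex, const pthread_mutexattr_t* attr)
{
    int ret;
    *mutex = (pthread_mutex_t*)malloc(sizeof(pthread_mutex_t));
    if (*mutex == NULL) return 1;
    ret = pthread_mutex_init(*mutex, attr);
    if (ret != 0) {
        /* Without this free, a failed init would leak the allocation. */
        free(*mutex);
        *mutex = NULL;
    }
    return ret;
}

/* Destroys the mutex and releases its storage. Safe to call on NULL. */
static int dbg_mutex_destroy(dbg_mutex_t* mutex)
{
    int ret;
    if (*mutex == NULL) return 0;
    ret = pthread_mutex_destroy(*mutex);
    free(*mutex);
    *mutex = NULL;
    return ret;
}
```

Resetting the pointer to NULL on failure also makes a later destroy call harmless, which is a design choice of this sketch rather than something the commit message states.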

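Note on 7e4937bc75 (SVE2 histogram): the overflow bound in that commit message follows from 15 strips x 16 bytes = 240 <= 255, so an 8-bit counter cannot wrap within one 240-byte chunk. The scalar sketch below shows only that chunk-then-widen idea. It is not the SVE2 implementation, it uses 32-bit wide counters instead of the 16-bit ones the commit mentions (an assumption made to keep the example safe for any input size), and `histogram_chunked` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STRIP_BYTES      16   /* bytes covered by one histogram step */
#define STRIPS_PER_FLUSH 15   /* 15 * 16 = 240 <= 255: no 8-bit wrap */
#define FLUSH_BYTES      (STRIP_BYTES * STRIPS_PER_FLUSH)

/* Counts byte occurrences in src[0..size) into hist[0..255].
 * Narrow 8-bit counters are used per chunk and widened after each chunk. */
static void histogram_chunked(const uint8_t* src, size_t size, uint32_t hist[256])
{
    uint8_t narrow[256];
    size_t pos = 0;
    memset(hist, 0, 256 * sizeof(hist[0]));
    while (pos < size) {
        size_t const remaining = size - pos;
        size_t const chunk = remaining < FLUSH_BYTES ? remaining : FLUSH_BYTES;
        size_t i;
        int s;
        memset(narrow, 0, sizeof(narrow));
        /* At most 240 increments per chunk, so no narrow counter overflows. */
        for (i = 0; i < chunk; i++) narrow[src[pos + i]]++;
        /* Widen and accumulate into the full-size histogram. */
        for (s = 0; s < 256; s++) hist[s] += narrow[s];
        pos += chunk;
    }
}
```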

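Note on 8bff69af86, a0872a8372, 54e9d46db4 and bcf404c0ab (ZSTD_ALIGNED): those commits describe a compiler-specific alignment macro that tests `__clang__` explicitly, supports the MS compiler, and otherwise uses the C11 `_Alignas` keyword. A rough sketch of such a macro is shown below. It is not the definition in common/compiler.h, and `DEMO_ALIGNED`, `demo_buffer` and `demo_is_aligned` are hypothetical names.

```c
#include <stdint.h>

/* Pick an alignment spelling per compiler. clang is tested explicitly
 * because, as one commit notes, __GNUC__ is not defined when clang is
 * used within msvc. */
#if defined(__GNUC__) || defined(__clang__)
#  define DEMO_ALIGNED(n) __attribute__((aligned(n)))
#elif defined(_MSC_VER)
#  define DEMO_ALIGNED(n) __declspec(align(n))
#else
#  define DEMO_ALIGNED(n) _Alignas(n)   /* C11 keyword, no header needed */
#endif

/* A buffer requested on a 64-byte boundary. */
static DEMO_ALIGNED(64) unsigned char demo_buffer[128];

/* Returns 1 when p is a multiple of alignment, 0 otherwise. */
static int demo_is_aligned(const void* p, uintptr_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}
```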
827 Commits
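
Note on ee17f4c6d2 (`_bzhi_u32` for 32-bit builds): that commit states that `_bzhi_u64` is only available in 64-bit builds and that `BIT_getLowerBits` expects `nbBits` below 32. The sketch below illustrates the idea with a 32-bit value: use `_bzhi_u32` when BMI2 is enabled at compile time, otherwise fall back to a mask. It is an assumption-laden illustration and not the code in `bitstream.h`; `demo_lower_bits` and `DEMO_HAVE_BMI2` are hypothetical names.

```c
#include <stdint.h>

/* Use the BMI2 intrinsic only when the compiler targets x86 with BMI2. */
#if defined(__BMI2__) && (defined(__x86_64__) || defined(__i386__) \
                          || defined(_M_X64) || defined(_M_IX86))
#  include <immintrin.h>
#  define DEMO_HAVE_BMI2 1
#endif

/* Keeps the nbBits lowest bits of value and clears the rest.
 * Precondition (caller's responsibility): nbBits < 32. */
static uint32_t demo_lower_bits(uint32_t value, unsigned nbBits)
{
#ifdef DEMO_HAVE_BMI2
    return _bzhi_u32(value, nbBits);       /* zero bits from index nbBits up */
#else
    return value & ((1u << nbBits) - 1u);  /* portable mask fallback */
#endif
}
```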