sharpetronics/zstd

mirror of https://github.com/facebook/zstd.git synced 2025-10-04 00:02:33 -04:00

Author	SHA1	Message	Date
Yann Collet	c15fa3cd40	update documentation of ZSTD_getFrameContentSize() hopefully answering #4495	2025-09-23 23:17:11 -07:00
Yann Collet	4c1f86c777	fix minor warning in legacy decoders for mingw + clang CI test	2025-09-23 13:01:38 -07:00
Yann Collet	be072c708e	Added documentation details for Makefile installation and pkg-config.	2025-09-20 16:33:41 +00:00
Yann Collet	085cc9319a	Merge pull request #4486 from rlefko/fix-pthread-init-memleak Fix memory leak in pthread init functions on failure	2025-09-19 21:42:21 -08:00
Ryan Lefkowitz	c59812e558	🔧 Fix memory leak in pthread init functions on failure
When pthread_mutex_init() or pthread_cond_init() fails in the debug implementation (DEBUGLEVEL >= 1), the previously allocated memory was not freed, causing a memory leak. This fix ensures that allocated memory is properly freed when pthread initialization functions fail, preventing resource leaks in error conditions.
The issue affects:
- ZSTD_pthread_mutex_init() at lib/common/threading.c:146
- ZSTD_pthread_cond_init() at lib/common/threading.c:167
This is particularly important for long-running applications or scenarios with resource constraints where pthread initialization might fail due to system limits.	2025-09-15 18:20:01 -04:00
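The pattern described in this commit can be sketched as follows. This is a minimal illustration, not the code in lib/common/threading.c: the names `demo_mutex_init` and `demo_mutex_destroy` are hypothetical, and the sketch only assumes a heap-allocated mutex as in the debug implementation the message mentions.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical wrapper (not zstd's implementation): allocate a mutex on the
 * heap, initialize it, and release the allocation if initialization fails. */
int demo_mutex_init(pthread_mutex_t** mutex, const pthread_mutexattr_t* attr)
{
    int err;
    assert(mutex != NULL);
    *mutex = (pthread_mutex_t*)malloc(sizeof(pthread_mutex_t));
    if (*mutex == NULL) return 1;
    err = pthread_mutex_init(*mutex, attr);
    if (err != 0) {
        /* Without this free(), the allocation would leak on failure. */
        free(*mutex);
        *mutex = NULL;
    }
    return err;
}

/* Hypothetical counterpart: destroy the mutex and free its storage. */
int demo_mutex_destroy(pthread_mutex_t** mutex)
{
    int err;
    assert(mutex != NULL);
    if (*mutex == NULL) return 0;
    err = pthread_mutex_destroy(*mutex);
    free(*mutex);
    *mutex = NULL;
    return err;
}
```

Resetting the pointer to NULL on failure also keeps a later destroy call harmless, which is a design choice of this sketch rather than something the commit message states.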
w1m024	fb7a86f20f	Refactor ZSTD_row_getMatchMask for RVV optimization Performance (vs. SWAR) - 16-byte data: 5.87x speedup - 32-byte data: 9.63x speedup - 64-byte data: 17.98x speedup Co-authored-by: gong-flying <gongxiaofei24@iscas.ac.cn>	2025-09-11 20:45:54 +00:00
w1m024	c9d2cbd5ba	add RVV optimization for ZSTD_row_getMatchMask Co-authored-by: gong-flying <gongxiaofei24@iscas.ac.cn>	2025-09-09 06:20:55 +00:00
Yann Collet	b5c294ea01	Merge pull request #4440 from arpadpanyik-arm/convert_seq_sve2 AArch64: Add SVE2 path for convertSequences_noRepcodes	2025-08-21 17:20:33 -07:00
Arpad Panyik	2849f3a5d1	AArch64: Add SVE2 path for convertSequences_noRepcodes
Add an 8-way vector length agnostic (VLA) SVE2 code path for convertSequences_noRepcodes. It works with any SVE vector length.
Relative performance to GCC-13 using: `./fullbench -b18 -l5 enwik5`
               Neon        SVE2
Neoverse-V2    before      after       uplift
GCC-13:        100.000%    103.209%    1.032x
GCC-14:        100.309%    134.872%    1.344x
GCC-15:        100.355%    134.827%    1.343x
Clang-18:      123.614%    128.565%    1.040x
Clang-19:      123.587%    132.984%    1.076x
Clang-20:      123.629%    133.023%    1.075x
               Neon        SVE2
Cortex-A720    before      after       uplift
GCC-13:        100.000%    116.032%    1.160x
GCC-14:         99.700%    116.648%    1.169x
GCC-15:        100.354%    117.047%    1.166x
Clang-18:      100.447%    116.762%    1.162x
Clang-19:      100.454%    116.627%    1.160x
Clang-20:      100.452%    116.649%    1.161x	2025-08-21 17:37:41 +00:00
Yann Collet	290e692ef8	Merge pull request #4463 from brad0/gnu_source_qsort Check for build environment instead of just _GNU_SOURCE	2025-08-21 09:30:29 -07:00
Thirumalai Nagalingam	42243c3d46	CI: Update build_package.bat for CMake builds	2025-08-20 17:12:05 +05:30
Brad Smith	0d1f8de9ad	Check for build environment instead of just _GNU_SOURCE Fixes the build on OpenBSD and NetBSD. It is too easy for _GNU_SOURCE to be defined even on non-Linux systems. Found via py-zstandard with the embedded copy of zstandard and Python defines _GNU_SOURCE. Also simplify the Linux checking, there is no need to check the rest of the symbol names.	2025-08-19 20:06:24 -04:00
Yann Collet	40c285e0ba	Merge pull request #4419 from AZero13/patch-1 Check for job before releasing resources	2025-08-19 17:02:48 -07:00
Yann Collet	e128976193	Merge pull request #4448 from Cyan4973/install_oses regroup list of OSes for install inside common variable	2025-07-28 11:01:58 -08:00
Yann Collet	8bca04ba9f	regroup list of OSes for install inside common variable within lib/install_oses.mk. fixes #4445	2025-07-28 11:33:22 -07:00
Yann Collet	34f3a0ab11	Merge pull request #4413 from arpadpanyik-arm/huf_decode2x AArch64: Enhance struct access in Huffman decode 2X	2025-07-23 15:03:37 -08:00
Yann Collet	6f1cb87ade	Merge pull request #4443 from facebook/opt_simplify_4442 simplify sequence resolution in zstd_opt	2025-07-23 15:01:36 -08:00
Yann Collet	0055ce7a02	simplify sequence resolution in zstd_opt initially hinted by @pitaj in #4442	2025-07-18 21:21:47 -07:00
Yann Collet	f9e26bb42b	Merge pull request #4394 from AZero13/zstd Remove redundant setting of allJobsCompleted to 1	2025-07-18 18:55:47 -08:00
Yann Collet	8c651868ff	Merge pull request #4418 from arpadpanyik-arm/decode_seq_opt AArch64: Improve ZSTD_decodeSequence performance	2025-07-18 18:54:49 -08:00
Yann Collet	a1e11db08a	Merge pull request #4435 from zijianli1234/dev add riscv ci	2025-07-18 18:54:24 -08:00
Arpad Panyik	07cd78d366	AArch64: Add Neon path for convertSequences_noRepcodes
Add a 4-way Neon implementation for the convertSequences_noRepcodes function. Remove 'static' keywords from all of its implementations to be able to add unit tests.
Relative performance to Clang-18 using: `./fullbench -b18 -l5 enwik5`
Neoverse-V2    before      after
Clang-18:      100.000%    311.703%
Clang-19:      100.191%    311.714%
Clang-20:      100.181%    311.723%
GCC-13:        107.520%    252.309%
GCC-14:        107.652%    253.158%
GCC-15:        107.674%    253.168%
Cortex-A720    before      after
Clang-18:      100.000%    204.512%
Clang-19:      102.825%    204.600%
Clang-20:      102.807%    204.558%
GCC-13:        110.668%    203.594%
GCC-14:        110.684%    203.978%
GCC-15:        102.864%    204.299%
Co-authored by, Thomas Daubney <Thomas.Daubney@arm.com>	2025-07-10 18:20:57 +00:00
Arpad Panyik	8e4400463a	Improve ZSTD_get1BlockSummary
Add a faster scalar implementation of ZSTD_get1BlockSummary which removes the data dependency of the accumulators in the hot loop to leverage the superscalar potential of recent out-of-order CPUs.
The new algorithm leverages SWAR (SIMD Within A Register) methodology to exploit the capabilities of 64-bit architectures. It achieves this by packing two 32-bit data elements into a single 64-bit register, enabling parallel operations on these subcomponents while ensuring that the 32-bit boundaries prevent overflow, thereby optimizing computational efficiency.
Corresponding unit tests are included.
Relative performance to GCC-13 using: `./fullbench -b19 -l5 enwik5`
Neoverse-V2    before      after
GCC-13:        100.000%    290.527%
GCC-14:        100.000%    291.714%
GCC-15:         99.914%    291.495%
Clang-18:      148.072%    264.524%
Clang-19:      148.075%    264.512%
Clang-20:      148.062%    264.490%
Cortex-A720    before      after
GCC-13:        100.000%    235.261%
GCC-14:        101.064%    234.903%
GCC-15:        112.977%    218.547%
Clang-18:      127.135%    180.359%
Clang-19:      127.149%    180.297%
Clang-20:      127.154%    180.260%
Co-authored by, Thomas Daubney <Thomas.Daubney@arm.com>	2025-07-10 18:20:49 +00:00
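The SWAR idea in this message (two 32-bit lanes packed into one 64-bit register) can be illustrated with a small sketch. It is not the ZSTD_get1BlockSummary implementation: `DemoSeq`, `DemoSummary` and `demo_sum_swar` are hypothetical names, the 16-bit field widths are an assumption chosen so the no-overflow bound is easy to state, and only the lane packing is shown, not the removal of accumulator dependencies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sequence record with two 16-bit lengths (an assumption). */
typedef struct { uint16_t litLength; uint16_t matchLength; } DemoSeq;
typedef struct { uint32_t litSum; uint32_t matchSum; } DemoSummary;

/* Sum both fields with a single 64-bit accumulator:
 * low 32 bits hold the literal-length sum, high 32 bits the match-length sum. */
DemoSummary demo_sum_swar(const DemoSeq* seqs, size_t n)
{
    DemoSummary result;
    uint64_t acc = 0;
    size_t i;
    /* With at most 65536 entries of at most 65535 each, the low lane
     * stays below 2^32, so it can never carry into the high lane. */
    assert(n <= 65536);
    for (i = 0; i < n; i++) {
        acc += ((uint64_t)seqs[i].matchLength << 32) | seqs[i].litLength;
    }
    result.litSum = (uint32_t)acc;
    result.matchSum = (uint32_t)(acc >> 32);
    return result;
}
```

One 64-bit addition per element updates both sums; the bound on `n` is what plays the role of "the 32-bit boundaries prevent overflow" in this sketch.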
ZijianLi	d04e7944dd	add compiler version check.	2025-07-07 23:07:39 +08:00
ZijianLi	2c3f23b018	fix dereferencing type-punned pointer error	2025-06-29 15:36:25 +08:00
Rose	4efbd56749	Check for job before releasing ZSTDMT_freeCCtx calls ZSTDMT_releaseAllJobResources, but ZSTDMT_releaseAllJobResources may be called when ZSTDMT_freeCCtx is called when initialization fails, resulting in a NULL pointer dereference.	2025-06-24 14:05:08 -04:00
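A minimal sketch of the guard this commit describes, assuming a context whose job table may still be NULL when initialization fails. All names here (`DemoCtx`, `demo_createCtx`, `demo_freeCtx`, `demo_releaseAllJobResources`) are hypothetical and do not mirror the ZSTDMT internals.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical types for illustration only. */
typedef struct { void* buffer; } DemoJob;
typedef struct { DemoJob* jobs; unsigned nbJobs; } DemoCtx;

static void demo_releaseAllJobResources(DemoCtx* ctx)
{
    unsigned i;
    assert(ctx != NULL);
    /* Guard: the job table may not exist if initialization failed early. */
    if (ctx->jobs == NULL) return;
    for (i = 0; i < ctx->nbJobs; i++) {
        free(ctx->jobs[i].buffer);
        ctx->jobs[i].buffer = NULL;
    }
}

void demo_freeCtx(DemoCtx* ctx)
{
    if (ctx == NULL) return;
    demo_releaseAllJobResources(ctx);
    free(ctx->jobs);
    free(ctx);
}

/* failJobTable simulates an allocation failure for the job table. */
DemoCtx* demo_createCtx(unsigned nbJobs, int failJobTable)
{
    DemoCtx* ctx = (DemoCtx*)calloc(1, sizeof(*ctx));
    if (ctx == NULL) return NULL;
    ctx->nbJobs = nbJobs;
    if (!failJobTable) ctx->jobs = (DemoJob*)calloc(nbJobs, sizeof(DemoJob));
    if (ctx->jobs == NULL) {
        /* The free path runs on a partially initialized context. */
        demo_freeCtx(ctx);
        return NULL;
    }
    return ctx;
}
```

Without the NULL check, the failure path in `demo_createCtx` would index into a missing job table, which is the kind of dereference the commit message refers to.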
Rose	50f169411b	Remove redundant setting of allJobsCompleted to 1 This will do it automatically.	2025-06-24 14:04:21 -04:00
Arpad Panyik	a28e8182b1	AArch64: Improve ZSTD_decodeSequence performance
LLVM's alias-analysis sometimes fails to see that a static-array member of a struct cannot alias other members. This patch:
- Reduces array accesses via struct indirection to aid load/store alias analysis under Clang.
- Converts dynamic array indexing into conditional-move arithmetic, eliminating branches and extra loads/stores on out-of-order CPUs.
- Reloads the bitstream only when match-length bits are consumed (assuming each reload only needs to happen once per match-length read), improving branch-prediction rates.
- Removes the UNLIKELY() hint, which recent compilers already handle well without cost.
Decompression uplifts on a Neoverse V2 system, using Zstd-1.5.8 compiled with "-O3 -march=armv8.2-a+sve2":
                 Clang-19    Clang-20    Clang-*     GCC-14     GCC-15
1#silesia.tar:   +11.556%    +16.203%    +0.240%    +2.216%     +7.891%
2#silesia.tar:   +15.493%    +21.140%    -0.041%    +2.850%     +9.926%
3#silesia.tar:   +16.887%    +22.570%    -0.183%    +3.056%    +10.660%
4#silesia.tar:   +17.785%    +23.315%    -0.262%    +3.343%    +11.187%
5#silesia.tar:   +18.125%    +24.175%    -0.466%    +3.350%    +11.228%
6#silesia.tar:   +17.607%    +23.339%    -0.591%    +3.175%    +10.851%
7#silesia.tar:   +17.463%    +22.837%    -0.486%    +3.292%    +10.868%
* Requires Clang-21 support from LLVM commit hash `a53003fe23cb6c871e72d70ff2d3a075a7490da2` (Clang-21 hasn’t been released as of this writing)
Co-authored by:
David Sherwood, David.Sherwood@arm.com
Ola Liljedahl, Ola.Liljedahl@arm.com	2025-06-24 12:22:23 +00:00
Arpad Panyik	bd38fc2c5f	AArch64: Enhance struct access in Huffman decode 2X
In the multi-stream multi-symbol Huffman decoder GCC generates suboptimal code - emitting more loads for HUF_DEltX2 struct member accesses. Forcing it to use 32-bit loads and bit arithmetic to extract the necessary parts (UBFX) improves the overall decode speed.
Also avoid integer type conversions in the symbol decodes, which leads to better instruction selection in table lookup accesses.
On AArch64 the decoder no longer runs into register-pressure limits, so we can simplify the hot path and improve throughput.
Decompression uplifts on a Neoverse V2 system, using Zstd-1.5.8 compiled with "-O3 -march=armv8.2-a+sve2":
                 Clang-20    Clang-*     GCC-13     GCC-14     GCC-15
1#silesia.tar:   +0.820%     +1.365%    +2.480%    +1.348%    +0.987%
2#silesia.tar:   +0.426%     +0.784%    +1.218%    +0.665%    +0.554%
3#silesia.tar:   +0.112%     +0.389%    +0.508%    +0.188%    +0.261%
* Requires Clang-21 support from LLVM commit hash `a53003fe23cb6c871e72d70ff2d3a075a7490da2` (Clang-21 hasn’t been released as of this writing)	2025-06-23 14:16:25 +00:00
Arpad Panyik	1e9d2006ae	AArch64: Use better block copy8
The vector copy is only necessary for 16-byte blocks on AArch64.
Decompression uplifts on a Neoverse V2 system, using Zstd-1.5.8 compiled with "-O3 -march=armv8.2-a+sve2":
                 Clang-19    Clang-20    GCC-14     GCC-15
1#silesia.tar:   +0.316%     +0.865%    +0.025%    +0.096%
2#silesia.tar:   +0.689%     +1.374%    +0.027%    +0.065%
3#silesia.tar:   +0.811%     +1.654%    +0.034%    +0.033%
4#silesia.tar:   +0.912%     +1.755%    +0.027%    +0.042%
5#silesia.tar:   +0.995%     +1.826%    +0.062%    +0.094%
6#silesia.tar:   +0.976%     +1.777%    +0.065%    +0.104%
7#silesia.tar:   +0.910%     +1.738%    +0.077%    +0.110%	2025-06-20 17:05:41 +00:00
Yann Collet	7eefc22169	Merge pull request #4367 from ClickHouse/cfi Add unwind information in huf_decompress_amd64.S	2025-06-19 23:41:38 -07:00
Arpad Panyik	7e4937bc75	AArch64: Add SVE2 implementation of histogram computation The existing scalar implementation uses a 4-way pipelined histogram calculation which is very efficient on out-of-order CPUs. However, this can be further accelerated using the SVE2 HISTSEG instructions - which compute a histogram for 16 byte chunks in a vector register. On a system with 128-bit vectors (VL128) we need 16 HISTSEG executions to compute the histogram for the whole symbol space (0..255) of 16 bytes input. However we can only accumulate 15 of such 16 byte strips before possible overflow. So we need to extend and save the 8-bit histogram accumulators to 16-bit after every 240 byte chunks of input. To store all in registers we would need 32 128-bit registers. Longer SVE2 vectors could help here, if such machines become available. The maximum input block size in Zstd is 128 KiB, so 16-bit accumulators would not be enough. However an LZ pass will prepend the histogram calculation, so it is impossible (my assumption) to overflow the 16-bit accumulators. The symbol distribution is also not uniform, the lower values are more common, so we used a 3 pass algorithm to prevent stack spilling. In the first pass we only compute histograms for 64 symbols (4-way SIMD) while also computing the maximum symbol value. If we have symbol values larger than 64 we start the second pass to compute the next 96 elements of the histogram. The final pass calculates the remaining part of the histogram (256 symbols in total) if needed. This split of histogram generation gave the best overall results for performance. This implementation is the best performing of a number of different cache blocking schemes tested. 
Compression uplifts on a Neoverse V2 system, using Zstd-1.5.8 (e26dde3d) as a baseline, compiled with "-O3 -march=armv8.2-a+sve2":
                 Clang-20    GCC-14
1#silesia.tar:   +6.173%     +5.987%
2#silesia.tar:   +5.200%     +5.011%
3#silesia.tar:   +4.332%     +5.031%
4#silesia.tar:   +2.789%     +3.064%
5#silesia.tar:   +2.028%     +1.838%
6#silesia.tar:   +1.562%     +1.340%
7#silesia.tar:   +1.160%     +0.959%	2025-06-11 12:14:22 +00:00
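The overflow argument in the histogram commit above (8-bit accumulators are safe for 15 strips of 16 bytes, i.e. 240 bytes, and must then be widened) can be shown with a scalar sketch. This is not the SVE2 code and uses no HISTSEG instructions; `demo_histogram` is a hypothetical name, and the sketch widens into 32-bit counters rather than the 16-bit ones the message describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Largest chunk for which an 8-bit counter cannot overflow: even if every
 * byte is the same symbol, the count reaches only 240 (<= 255). */
#define DEMO_CHUNK 240

/* Hypothetical scalar illustration: count into narrow 8-bit accumulators,
 * then flush them into wide counters after every 240-byte chunk. */
void demo_histogram(uint32_t counts[256], const uint8_t* src, size_t size)
{
    uint8_t narrow[256];
    assert(counts != NULL);
    assert(src != NULL || size == 0);
    memset(counts, 0, 256 * sizeof(counts[0]));
    while (size > 0) {
        size_t const chunk = (size < DEMO_CHUNK) ? size : DEMO_CHUNK;
        size_t i;
        unsigned s;
        memset(narrow, 0, sizeof(narrow));
        for (i = 0; i < chunk; i++) narrow[src[i]]++;
        /* Widen and accumulate before the narrow counters could wrap. */
        for (s = 0; s < 256; s++) counts[s] += narrow[s];
        src += chunk;
        size -= chunk;
    }
}
```

The flush after every chunk costs 256 additions here; in a vector implementation the narrow counters live in registers, which is where the trade-off described in the message comes from.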
Michael Kolupaev	a480191f9e	Fix Darwin build of huf_decompress_amd64.S	2025-06-08 05:07:09 +00:00
Michael Kolupaev	80cac404c7	Add unwind information in huf_decompress_amd64.S	2025-06-08 05:07:09 +00:00
李子建	d95123f2e6	Improve speed of ZSTD_compressSequencesAndLiterals() using RVV	2025-06-02 17:21:02 +08:00
Nobuhiro Iwamatsu	2d224dc745	Add License variable to pkg-config file The pkg-config file has License variable that allows you to set the license for the software. This sets 'BSD-3-Clause OR GPL-2.0-only' to License. Ref: https://github.com/pkgconf/pkgconf/blob/master/man/pc.5#L116 Signed-off-by: Nobuhiro Iwamatsu <iwamatsu@nigauri.org>	2025-05-06 12:16:28 -07:00
Etienne Cordonnier	8929d3b09f	Fix duplicate LC_RPATH error on MacOS
After the update to MacOS 15.4, the dynamic loader dyld treats duplicated LC_RPATH as an error. The `FLAGS` variable already contains `LDFLAGS`, thus using both `FLAGS` and `LDFLAGS` duplicates all `LDFLAGS`, including `-Wl,rpath` parameters.
The duplicate LC_RPATH causes this kind of errors:
```
dyld[29361]: Library not loaded: @loader_path/../lib/libzstd.1.dylib
  Referenced from: <7131C877-3CF0-33AC-AA05-257BA4FDD770> /Users/foobar/...
  Reason: tried: '/Users/foobar/..../lib/libzstd.1.dylib' (duplicate LC_RPATH '/usr/mypath.../lib')
```
Closes https://github.com/facebook/zstd/issues/4369
Signed-off-by: Etienne Cordonnier <ecordonnier@snap.com>	2025-04-18 15:59:06 +02:00
Yann Collet	2fec3989c1	add an assert to help static analyzers understand there is no overflow risk there.	2025-03-22 18:23:31 -07:00
Z. Liu	cd8ca9d92e	lib/zstd.h: move pragma before static otherwise will cause dev-python/zstandard build failed when compiling with clang as reported at https://bugs.gentoo.org/950259 the root cause is pycparser, which is unfixed since reported 2.5 years ago, :( Signed-off-by: Z. Liu <zhixu.liu@gmail.com>	2025-03-20 03:40:42 +00:00
Yann Collet	4d53e27144	removed OpenBSD specificity	2025-03-12 09:55:14 -07:00
Yann Collet	ddcb41a282	updated documentation	2025-03-11 14:10:35 -07:00
Yann Collet	a9b8fef2e8	add support for C11 Annex K qsort_s() standard defined re-entrant variant of qsort(). Unfortunately, Annex K is optional.	2025-03-11 14:10:35 -07:00
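Because Annex K is optional, code that wants `qsort_s()` has to detect it and fall back otherwise. The sketch below shows one way to do that using the standard `__STDC_LIB_EXT1__` macro; it is not the selection logic used in zstd's cover code. `demo_sort_indices` and its helpers are hypothetical, and the fallback through a global pointer is deliberately simple and not re-entrant.

```c
#define __STDC_WANT_LIB_EXT1__ 1  /* request Annex K before any standard header */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Compare two indices by the values they refer to; ctx is the value array. */
static int demo_cmp_ctx(const void* a, const void* b, void* ctx)
{
    const int* values = (const int*)ctx;
    int const va = values[*(const size_t*)a];
    int const vb = values[*(const size_t*)b];
    return (va > vb) - (va < vb);
}

#ifndef __STDC_LIB_EXT1__
/* Fallback only: plain qsort() has no context argument, so the context
 * travels through a global. This path is NOT re-entrant. */
static void* g_demo_ctx = NULL;
static int demo_cmp_global(const void* a, const void* b)
{
    return demo_cmp_ctx(a, b, g_demo_ctx);
}
#endif

/* Sort idx[0..n) so that values[idx[i]] is ascending. */
void demo_sort_indices(size_t* idx, size_t n, const int* values)
{
    assert(idx != NULL || n == 0);
#ifdef __STDC_LIB_EXT1__
    /* Annex K variant: the context pointer is passed explicitly. */
    qsort_s(idx, n, sizeof(idx[0]), demo_cmp_ctx, (void*)values);
#else
    g_demo_ctx = (void*)values;
    qsort(idx, n, sizeof(idx[0]), demo_cmp_global);
    g_demo_ctx = NULL;
#endif
}
```

Many common C libraries do not provide Annex K, so on those platforms the fallback branch is the one that gets compiled.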
Yann Collet	dcf675886b	re-design qsort() selection in cover centralizes auto detection tests, then distribute the outcome in all the places where it's active.	2025-03-11 14:10:35 -07:00
Yann Collet	51b6e79f65	fix #4312 and upgraded the test so that it would fail, both at compile time and at run time, without the fix	2025-03-11 14:10:35 -07:00
Nick Terrell	68dfd14a8c	[linux] Opt out of row based match finder for the kernel The row based match finder is slower without SIMD. We used to detect the presence of SIMD to set the lower bound to 17, but that breaks determinism. Instead, specifically opt into it for the kernel, because it is one of the rare cases that doesn't have SIMD support.	2025-03-11 16:18:59 -04:00
Yann Collet	2ff87aefac	fix FreeBSD use an alias instead of a function also: added more traces and updated version nb to v1.5.8	2025-03-10 19:04:41 -07:00
Nick Terrell	0de4991942	Add a method for checking if ZSTD was compiled with flags that impact determinism	2025-03-07 10:31:19 -05:00
Nick Terrell	190a620974	[zstd] Remove global variables in dictBuilder D50949782 fixed a race condition updating `g_displayLevel` by disabling display. Instead of disabling display, delete the global variable and always "capture" a local `displayLevel` variable. This also fixes `DISPLAYUPDATE()` by requiring the user to pass in the last update time as the first parameter.	2025-03-05 10:35:01 -05:00
Nick Terrell	d5b84f5a27	[zstd] Backport D49756856	2025-03-05 10:35:01 -05:00
Yann Collet	4e1723a7e4	fixed the script so that it fails when a copy fails and also: fix the list of files, as `zdict.h` was incorrectly set.	2025-02-27 16:18:44 -08:00

4869 Commits