sharpetronics/zstd

mirror of https://github.com/facebook/zstd.git synced 2025-10-04 00:02:33 -04:00

Author	SHA1	Message	Date
Arpad Panyik	bd38fc2c5f	AArch64: Enhance struct access in Huffman decode 2X In the multi-stream multi-symbol Huffman decoder GCC generates suboptimal code - emitting more loads for HUF_DEltX2 struct member accesses. Forcing it to use 32-bit loads and bit arithmetic to extract the necessary parts (UBFX) improves the overall decode speed. Also avoid integer type conversions in the symbol decodes, which leads to better instruction selection in table lookup accesses. On AArch64 the decoder no longer runs into register-pressure limits, so we can simplify the hot path and improve throughput. Decompression uplifts on a Neoverse V2 system, using Zstd-1.5.8 compiled with "-O3 -march=armv8.2-a+sve2":
                 Clang-20  Clang-*   GCC-13   GCC-14   GCC-15
1#silesia.tar:    +0.820%  +1.365%  +2.480%  +1.348%  +0.987%
2#silesia.tar:    +0.426%  +0.784%  +1.218%  +0.665%  +0.554%
3#silesia.tar:    +0.112%  +0.389%  +0.508%  +0.188%  +0.261%
* Requires Clang-21 support from LLVM commit hash `a53003fe23cb6c871e72d70ff2d3a075a7490da2` (Clang-21 hasn’t been released as of this writing)	2025-06-23 14:16:25 +00:00
Yann Collet	7eefc22169	Merge pull request #4367 from ClickHouse/cfi Add unwind information in huf_decompress_amd64.S	2025-06-19 23:41:38 -07:00
Arpad Panyik	7e4937bc75	AArch64: Add SVE2 implementation of histogram computation The existing scalar implementation uses a 4-way pipelined histogram calculation which is very efficient on out-of-order CPUs. However, this can be further accelerated using the SVE2 HISTSEG instructions, which compute a histogram for 16-byte chunks in a vector register. On a system with 128-bit vectors (VL128) we need 16 HISTSEG executions to compute the histogram for the whole symbol space (0..255) of 16 bytes of input. However, we can only accumulate 15 such 16-byte strips before possible overflow, so we need to extend and save the 8-bit histogram accumulators to 16-bit after every 240-byte chunk of input. To store everything in registers we would need 32 128-bit registers. Longer SVE2 vectors could help here, if such machines become available. The maximum input block size in Zstd is 128 KiB, so 16-bit accumulators would not be enough in general. However, an LZ pass precedes the histogram calculation, so it is impossible (my assumption) to overflow the 16-bit accumulators. The symbol distribution is also not uniform: the lower values are more common, so we used a 3-pass algorithm to prevent stack spilling. In the first pass we only compute histograms for 64 symbols (4-way SIMD) while also computing the maximum symbol value. If we have symbol values larger than 64 we start the second pass to compute the next 96 elements of the histogram. The final pass calculates the remaining part of the histogram (256 symbols in total) if needed. This split of histogram generation gave the best overall performance of the several cache blocking schemes tested.
Compression uplifts on a Neoverse V2 system, using Zstd-1.5.8 (e26dde3d) as a baseline, compiled with "-O3 -march=armv8.2-a+sve2":
                 Clang-20   GCC-14
1#silesia.tar:    +6.173%  +5.987%
2#silesia.tar:    +5.200%  +5.011%
3#silesia.tar:    +4.332%  +5.031%
4#silesia.tar:    +2.789%  +3.064%
5#silesia.tar:    +2.028%  +1.838%
6#silesia.tar:    +1.562%  +1.340%
7#silesia.tar:    +1.160%  +0.959%	2025-06-11 12:14:22 +00:00
Michael Kolupaev	a480191f9e	Fix Darwin build of huf_decompress_amd64.S	2025-06-08 05:07:09 +00:00
Michael Kolupaev	80cac404c7	Add unwind information in huf_decompress_amd64.S	2025-06-08 05:07:09 +00:00
李子建	d95123f2e6	Improve speed of ZSTD_compressSequencesAndLiterals() using RVV	2025-06-02 17:21:02 +08:00
Nobuhiro Iwamatsu	2d224dc745	Add License variable to pkg-config file The pkg-config file has License variable that allows you to set the license for the software. This sets 'BSD-3-Clause OR GPL-2.0-only' to License. Ref: https://github.com/pkgconf/pkgconf/blob/master/man/pc.5#L116 Signed-off-by: Nobuhiro Iwamatsu <iwamatsu@nigauri.org>	2025-05-06 12:16:28 -07:00
Etienne Cordonnier	8929d3b09f	Fix duplicate LC_RPATH error on MacOS After the update to MacOS 15.4, the dynamic loader dyld treats duplicated LC_RPATH as an error. The `FLAGS` variable already contains `LDFLAGS`, thus using both `FLAGS` and `LDFLAGS` duplicates all `LDFLAGS`, including `-Wl,rpath` parameters. The duplicate LC_RPATH causes this kind of error:
```
dyld[29361]: Library not loaded: @loader_path/../lib/libzstd.1.dylib
  Referenced from: <7131C877-3CF0-33AC-AA05-257BA4FDD770> /Users/foobar/...
  Reason: tried: '/Users/foobar/..../lib/libzstd.1.dylib' (duplicate LC_RPATH '/usr/mypath.../lib')
```
Closes https://github.com/facebook/zstd/issues/4369 Signed-off-by: Etienne Cordonnier <ecordonnier@snap.com>	2025-04-18 15:59:06 +02:00
Yann Collet	2fec3989c1	add an assert to help static analyzers understand there is no overflow risk there.	2025-03-22 18:23:31 -07:00
Z. Liu	cd8ca9d92e	lib/zstd.h: move pragma before static Otherwise the dev-python/zstandard build fails when compiling with clang, as reported at https://bugs.gentoo.org/950259. The root cause is pycparser, which has remained unfixed since it was reported 2.5 years ago :( Signed-off-by: Z. Liu <zhixu.liu@gmail.com>	2025-03-20 03:40:42 +00:00
Yann Collet	4d53e27144	removed OpenBSD specificity	2025-03-12 09:55:14 -07:00
Yann Collet	ddcb41a282	updated documentation	2025-03-11 14:10:35 -07:00
Yann Collet	a9b8fef2e8	add support for C11 Annex K qsort_s(), the standard-defined re-entrant variant of qsort(). Unfortunately, Annex K is optional.	2025-03-11 14:10:35 -07:00
Yann Collet	dcf675886b	re-design qsort() selection in cover: centralize auto-detection tests, then distribute the outcome to all the places where it's active.	2025-03-11 14:10:35 -07:00
Yann Collet	51b6e79f65	fix #4312 and upgraded the test so that it would fail, both at compile time and at run time, without the fix	2025-03-11 14:10:35 -07:00
Nick Terrell	68dfd14a8c	[linux] Opt out of row based match finder for the kernel The row based match finder is slower without SIMD. We used to detect the presence of SIMD to set the lower bound to 17, but that breaks determinism. Instead, specifically opt into it for the kernel, because it is one of the rare cases that doesn't have SIMD support.	2025-03-11 16:18:59 -04:00
Yann Collet	2ff87aefac	fix FreeBSD use an alias instead of a function also: added more traces and updated version nb to v1.5.8	2025-03-10 19:04:41 -07:00
Nick Terrell	0de4991942	Add a method for checking if ZSTD was compiled with flags that impact determinism	2025-03-07 10:31:19 -05:00
Nick Terrell	190a620974	[zstd] Remove global variables in dictBuilder D50949782 fixed a race condition updating `g_displayLevel` by disabling display. Instead of disabling display, delete the global variable and always "capture" a local `displayLevel` variable. This also fixes `DISPLAYUPDATE()` by requiring the user to pass in the last update time as the first parameter.	2025-03-05 10:35:01 -05:00
Nick Terrell	d5b84f5a27	[zstd] Backport D49756856	2025-03-05 10:35:01 -05:00
Yann Collet	4e1723a7e4	fixed the script so that it fails when a copy fails and also: fix the list of files, as `zdict.h` was incorrectly set.	2025-02-27 16:18:44 -08:00
Yann Collet	7340657c6f	update build_package.bat by using a subroutine	2025-02-27 16:18:44 -08:00
Yann Collet	e94e09dd7b	ensure that a copy error results in the task failing clearly: error code != 0, red status. Checked by intentionally inserting an error in another run	2025-02-27 16:18:44 -08:00
Yann Collet	a1a5154b69	Merge pull request #4312 from Cyan4973/musl_ci introduce ZSTD_USE_C90_QSORT	2025-02-27 14:27:21 -08:00
Yann Collet	22b2fd2517	Merge pull request #4317 from hirohira9119/fix-function-signature Fix function signature mismatch for ZSTD_convertBlockSequences	2025-02-27 13:03:03 -08:00
Yann Collet	d6fbaaac99	Merge pull request #4320 from sebres/patch-1 build_package.bat: fix path to zstd_errors.h, avoid silently ignoring errors if the build failed	2025-02-26 15:15:03 -08:00
Yann Collet	dca9791862	fixed minor C++ compat warnings	2025-02-26 14:30:29 -08:00
Sergey G. Brester	f0d3173203	build_package.bat: don't swallow the error(s) by copy, exit with error if failed somewhere	2025-02-26 20:02:48 +01:00
Sergey G. Brester	97bc43cc68	build_package.bat: fix path to zstd_errors.h (it is in lib not in lib/common) closes gh-4318	2025-02-26 19:27:44 +01:00
Yann Collet	db2d205ada	fixed -Wconversion for lib/decompress/zstd_decompress_block.c	2025-02-26 10:01:05 -08:00
Yann Collet	2413f17322	fixed -Wconversion for cover.c	2025-02-26 08:33:01 -08:00
Yann Collet	8ffa27d93b	fixed -Wconversion for divsufsort.c	2025-02-26 08:12:11 -08:00
Yann Collet	e635221f1b	fixed -Wconversion for zdict	2025-02-26 08:07:51 -08:00
Yann Collet	30281d889f	fix conversion warning	2025-02-26 07:41:34 -08:00
hirohira	2840631dc1	Fix function signature mismatch for ZSTD_convertBlockSequences	2025-02-26 08:23:48 +09:00
Yann Collet	fd5498a179	document ZSTD_USE_C90_QSORT	2025-02-21 12:48:26 -08:00
Yann Collet	ebfa660b82	introduce ZSTD_USE_C90_QSORT	2025-02-21 11:36:30 -08:00
Yann Collet	d2c562b803	update hrlog comment	2025-02-10 10:48:56 -08:00
Yann Collet	67fad95f79	derive hashratelog from hashlog when only hashlog is set	2025-02-10 10:46:37 -08:00
Yann Collet	09d7e34ed8	adjust mml	2025-02-10 10:46:37 -08:00
Yann Collet	d5e4698267	fix boundary condition	2025-02-10 10:46:37 -08:00
Yann Collet	72406b71c3	update hrlog rule to favor compression ratio a bit more at low levels	2025-02-10 10:46:37 -08:00
Yann Collet	f26cc54f37	dynamic bucket sizes	2025-02-10 10:46:37 -08:00
Yann Collet	4609a40b89	dynamically adjust hratelog and ldmml based on strategy	2025-02-10 10:46:37 -08:00
Yann Collet	23e5f80390	Revert "pass dictionary loading method as parameter" This reverts commit 821fc567f93a415e9fbe856271ccd452ee7acf07.	2025-02-05 18:47:26 -08:00
Yann Collet	c7cd7dc04b	better MT fluidity --patch-from no longer blocked on first job dictionary loading	2025-02-05 18:42:00 -08:00
Yann Collet	f11bd19c7f	ensure cdict is properly reset to NULL	2025-02-05 18:42:00 -08:00
Yann Collet	7406d2b6eb	skips the need to create a temporary cdict for --patch-from thus saving a bit of memory and a little bit of cpu time	2025-02-05 18:42:00 -08:00
Yann Collet	220abe6da8	reduced memory usage by no longer duplicating in memory a dictionary that was passed by reference.	2025-02-05 18:42:00 -08:00
Yann Collet	85a44b233a	always free .cdictLocal	2025-02-05 18:41:59 -08:00
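
The overflow reasoning in commit 7e4937bc75 (SVE2 histogram) can be modelled in plain scalar C. The sketch below is not the SVE2 code from that commit and uses no HISTSEG intrinsics; it only illustrates why narrow 8-bit accumulators must be widened after at most 15 strips of 16 bytes (15 * 16 = 240, which still fits in 8 bits, while a 16th strip could reach 256). The function name `histogram_narrow_acc`, the macro names, and the choice of 32-bit wide counters (instead of the 16-bit ones the commit relies on) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STRIP_BYTES      16u   /* one 128-bit vector worth of input */
#define STRIPS_PER_FLUSH 15u   /* 15 * 16 = 240 <= 255, so no 8-bit overflow */
#define FLUSH_BYTES      (STRIP_BYTES * STRIPS_PER_FLUSH)

/* Hypothetical helper: scalar model of histogramming with 8-bit
 * accumulators that are widened and reset every 240 input bytes.
 * The wide counters are 32-bit here (an assumption) so the sketch does
 * not depend on any bound on the input length. */
static void histogram_narrow_acc(const uint8_t *src, size_t len,
                                 uint32_t count[256])
{
    uint8_t narrow[256];
    size_t pos = 0;

    assert(src != NULL || len == 0);
    assert(count != NULL);
    memset(count, 0, 256 * sizeof(count[0]));

    while (pos < len) {
        size_t chunk = len - pos;
        size_t i;
        if (chunk > FLUSH_BYTES) chunk = FLUSH_BYTES;

        /* Count at most 240 bytes: even if every byte is the same
         * symbol, its narrow counter reaches 240 and cannot wrap. */
        memset(narrow, 0, sizeof(narrow));
        for (i = 0; i < chunk; i++) narrow[src[pos + i]]++;

        /* Widen: fold the 8-bit counters into the wide histogram. */
        for (i = 0; i < 256; i++) count[i] += narrow[i];

        pos += chunk;
    }
}
```

Calling `histogram_narrow_acc(buf, n, counts)` on a buffer filled with a single byte value exercises the worst case for the narrow counters, since every flush interval then pushes one counter to its 240 maximum.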


4840 Commits