sharpetronics/zstd - zstd

mirror of https://github.com/facebook/zstd.git synced 2025-10-04 00:02:33 -04:00

Author	SHA1	Message	Date
Yann Collet	34f3a0ab11	Merge pull request #4413 from arpadpanyik-arm/huf_decode2x AArch64: Enhance struct access in Huffman decode 2X	2025-07-23 15:03:37 -08:00
Arpad Panyik	a28e8182b1	AArch64: Improve ZSTD_decodeSequence performance LLVM's alias-analysis sometimes fails to see that a static-array member of a struct cannot alias other members. This patch: - Reduces array accesses via struct indirection to aid load/store alias analysis under Clang. - Converts dynamic array indexing into conditional-move arithmetic, eliminating branches and extra loads/stores on out-of-order CPUs. - Reloads the bitstream only when match-length bits are consumed (assuming each reload only needs to happen once per match-length read), improving branch-prediction rates. - Removes the UNLIKELY() hint, which recent compilers already handle well without cost. Decompression uplifts on a Neoverse V2 system, using Zstd-1.5.8 compiled with "-O3 -march=armv8.2-a+sve2": Clang-19 Clang-20 Clang-* GCC-14 GCC-15 1#silesia.tar: +11.556% +16.203% +0.240% +2.216% +7.891% 2#silesia.tar: +15.493% +21.140% -0.041% +2.850% +9.926% 3#silesia.tar: +16.887% +22.570% -0.183% +3.056% +10.660% 4#silesia.tar: +17.785% +23.315% -0.262% +3.343% +11.187% 5#silesia.tar: +18.125% +24.175% -0.466% +3.350% +11.228% 6#silesia.tar: +17.607% +23.339% -0.591% +3.175% +10.851% 7#silesia.tar: +17.463% +22.837% -0.486% +3.292% +10.868% * Requires Clang-21 support from LLVM commit hash `a53003fe23cb6c871e72d70ff2d3a075a7490da2` (Clang-21 hasn’t been released as of this writing) Co-authored by: David Sherwood, David.Sherwood@arm.com Ola Liljedahl, Ola.Liljedahl@arm.com	2025-06-24 12:22:23 +00:00
Arpad Panyik	bd38fc2c5f	AArch64: Enhance struct access in Huffman decode 2X In the multi-stream multi-symbol Huffman decoder GCC generates suboptimal code - emitting more loads for HUF_DEltX2 struct member accesses. Forcing it to use 32-bit loads and bit arithmetic to extract the necessary parts (UBFX) improves the overall decode speed. Also avoid integer type conversions in the symbol decodes, which leads to better instruction selection in table lookup accesses. On AArch64 the decoder no longer runs into register-pressure limits, so we can simplify the hot path and improve throughput. Decompression uplifts on a Neoverse V2 system, using Zstd-1.5.8 compiled with "-O3 -march=armv8.2-a+sve2": Clang-20 Clang-* GCC-13 GCC-14 GCC-15 1#silesia.tar: +0.820% +1.365% +2.480% +1.348% +0.987% 2#silesia.tar: +0.426% +0.784% +1.218% +0.665% +0.554% 3#silesia.tar: +0.112% +0.389% +0.508% +0.188% +0.261% * Requires Clang-21 support from LLVM commit hash `a53003fe23cb6c871e72d70ff2d3a075a7490da2` (Clang-21 hasn’t been released as of this writing)	2025-06-23 14:16:25 +00:00
Michael Kolupaev	a480191f9e	Fix Darwin build of huf_decompress_amd64.S	2025-06-08 05:07:09 +00:00
Michael Kolupaev	80cac404c7	Add unwind information in huf_decompress_amd64.S	2025-06-08 05:07:09 +00:00
Nick Terrell	d5b84f5a27	[zstd] Backport D49756856	2025-03-05 10:35:01 -05:00
Yann Collet	dca9791862	fixed minor C++ compat warnings	2025-02-26 14:30:29 -08:00
Yann Collet	db2d205ada	fixed -Wconversion for lib/decompress/zstd_decompress_block.c	2025-02-26 10:01:05 -08:00
Pavel P	59afb28c97	Remove unused ZSTD_decompressSequences_t typedef	2025-01-24 02:13:20 +02:00
Pavel P	1204626138	Check `DYNAMIC_BMI2` instead of `DYNAMIC_BMI2 != 0` `#if DYNAMIC_BMI2` is consistent with the rest of the code. + use spaces instead of tabs	2025-01-23 23:59:38 +02:00
Yann Collet	167b00495d	Merge pull request #4246 from pps83/dev-asmx64-win [asm] Enable x86_64 asm for windows builds	2025-01-18 20:03:16 -08:00
Yann Collet	e8de8085f4	minor: assert that state is not null replaces #4016	2025-01-18 13:08:04 -08:00
Pavel P	46e17b805b	[asm] Enable x86_64 asm for windows builds	2025-01-18 05:33:08 +02:00
Yann Collet	04a2a0219c	update type names naming convention: Type names should start with a Capital letter (after the prefix)	2024-12-29 14:25:33 -08:00
Yann Collet	a2ff6ea784	improve ZSTD_getFrameHeader on skippable frames now reports: - the header size - the magic variant (within @dictID field)	2024-12-29 12:26:04 -08:00
Yann Collet	477a01067f	codemod: symbolEncodingType_e -> SymbolEncodingType_e	2024-12-20 10:36:56 -08:00
Yann Collet	31d48e9ffa	fixing minor formatting issue in 32-bit mode with logs enabled	2024-10-23 11:50:56 -07:00
Dimitri Papadopoulos	44e83e9180	Fix typos not found by codespell	2024-06-20 20:16:25 +02:00
Elliot Gorokhovsky	741b87bbe1	Fuzzing and bugfixes for magicless-format decoding (#3976) * fuzzing and bugfixes for magicless format * reset dctx before each decompression * do not memcmp empty buffers * nit: decompressor errata	2024-03-20 19:22:34 -04:00
Elliot Gorokhovsky	7d970bd83c	Implement one-shot fallback for magicless format (#3971)	2024-03-18 10:55:53 -04:00
Elliot Gorokhovsky	559762da12	Remove duplicate and incorrect docs in zstd_decompress.c (#3967)	2024-03-14 15:55:01 -04:00
Nick Terrell	ff0afbad58	[asm][aarch64] Mark that BTI and PAC are supported Mark that `huf_decompress_amd64.S` supports BTI and PAC, which it trivially does because it is empty for aarch64. The issue only requested BTI markings, but it also makes sense to mark PAC, which is the only other feature. Also add a test for this mode to the ARM64 QEMU test. Before this PR it warns on `huf_decompress_amd64.S`, after it doesn't. Fixes Issue #3841.	2024-03-13 16:15:51 -04:00
Elliot Gorokhovsky	f65b9e27ce	Exercise ZSTD_findDecompressedSize() in the simple decompression fuzzer (#3959) * Improve decompression fuzzer * Fix legacy frame header fuzzer crash, add unit test	2024-03-12 17:07:06 -04:00
Yann Collet	a9fb8d4c41	new method to deal with offset==0 in this new method, when an `offset==0` is detected, it's converted into (size_t)(-1), instead of 1. The logic is that (size_t)(-1) is effectively an extremely large positive number, which will not pass the offset distance test at next stage (`execSequence()`). Checked the source code, and offset is always checked (as it should), using a formula which is not vulnerable to arithmetic overflow: ``` RETURN_ERROR_IF(sequence.offset > (size_t)(oLitEnd - virtualStart), ``` The benefit is that such a case (offset==0) is always detected as corrupted data as opposed to relying on the checksum to detect the error.	2024-03-08 15:26:06 -08:00
Yann Collet	8689633fdf	Merge pull request #3840 from aimuz/fix-reserved lib/decompress: check for reserved bit corruption in zstd	2024-03-05 13:40:12 -08:00
Yann Collet	f77f634d41	update API documentation	2024-02-24 01:28:17 -08:00
Yann Collet	4b51526412	fix partial block uncompressed	2024-02-24 01:24:58 -08:00
Yann Collet	4683667785	refactor optimal parser store stretches as intermediate solution instead of sequences. makes it possible to link a solution to a predecessor.	2024-01-31 02:51:46 -08:00
aimuz	468bb17378	lib/decompress: check for reserved bit corruption in zstd The patch adds a validation to ensure that the last field, which is reserved, must be all-zeroes in ZSTD_decodeSeqHeaders. This prevents potential corruption from going undetected. Fixes an issue where corrupted input could lead to undefined behavior due to improper validation of reserved bits. Signed-off-by: aimuz <mr.imuz@gmail.com>	2023-11-28 21:04:37 +08:00
Nick Terrell	8193250615	Modernize macros to use `do { } while (0)` This PR introduces no functional changes. It attempts to change all macros currently using `{ }` or some variant of that to `do { } while (0)`, and introduces trailing `;` where necessary. There were no bugs found during this migration. The bug in Visual Studio warning on this has been fixed since VS2015. Additionally, we have several instances of `do { } while (0)` which have been present for several releases, so we don't have to worry about breaking people's builds. Fixes Issue #3830.	2023-11-21 20:05:17 -05:00
Nick Terrell	dd4de1dd7a	[huf] Fix null pointer addition `HUF_DecompressFastArgs_init()` was adding 0 to NULL. Fix it by exiting early for empty outputs. This is no change in behavior, because the function was already exiting 0 in this case, just slightly later.	2023-11-20 17:13:01 -05:00
Nick Terrell	5ab78c0418	[huf] Improve fast C & ASM performance on small data * Rename `ilimit` to `ilowest` and set it equal to `src` instead of `src + 6 + 8`. This is safe because the fast decoding loops guarantee to never read below `ilowest` already. This allows the fast decoder to run for at least two more iterations, because it consumes at most 7 bytes per iteration. * Continue the fast loop all the way until the number of safe iterations is 0. Initially, I thought that when it got towards the end, the computation of how many iterations of safe might become expensive. But it ends up being slower to have to decode each of the 4 streams individually, which makes sense. This drastically speeds up the Huffman decoder on the `github` dataset for the issue raised in #3762, measured with `zstd -b1e1r github/`. \| Decoder \| Speed before \| Speed after \| \|----------\|--------------\|-------------\| \| Fallback \| 477 MB/s \| 477 MB/s \| \| Fast C \| 384 MB/s \| 492 MB/s \| \| Assembly \| 385 MB/s \| 501 MB/s \| We can also look at the speed delta for different block sizes of silesia using `zstd -b1e1r silesia.tar -B#`. \| Decoder \| -B1K ∆ \| -B2K ∆ \| -B4K ∆ \| -B8K ∆ \| -B16K ∆ \| -B32K ∆ \| -B64K ∆ \| -B128K ∆ \| \|----------\|--------\|--------\|--------\|--------\|---------\|---------\|---------\|----------\| \| Fast C \| +11.2% \| +8.2% \| +6.1% \| +4.4% \| +2.7% \| +1.5% \| +0.6% \| +0.2% \| \| Assembly \| +12.5% \| +9.0% \| +6.2% \| +3.6% \| +1.5% \| +0.7% \| +0.2% \| +0.03% \|	2023-11-20 17:13:01 -05:00
Nick Terrell	c7269add7e	[huf] Improve fast huffman decoding speed in linux kernel gcc in the linux kernel was not unrolling the inner loops of the Huffman decoder, which was destroying decoding performance. The compiler was generating crazy code with all sorts of branches. I suspect because of Spectre mitigations, but I'm not certain. Once the loops were manually unrolled, performance was restored. Additionally, when gcc couldn't prove that the variable left shift in the 4X2 decode loop wasn't greater than 63, it inserted checks to verify it. To fix this, mask `entry.nbBits & 0x3F`, which allows gcc to elide this check. This is a no op, because `entry.nbBits` is guaranteed to be less than 64. Lastly, introduce the `HUF_DISABLE_FAST_DECODE` macro to disable the fast C loops for Issue #3762. So if even after this change, there is a performance regression, users can opt-out at compile time.	2023-11-20 14:56:46 -05:00
Yann Collet	c1e588fcb4	Merge pull request #3771 from DimitriPapadopoulos/codespell Fix new typos found by codespell	2023-10-07 19:29:41 -07:00
Nick Terrell	43118da8a7	Stop suppressing pointer-overflow UBSAN errors * Remove all pointer-overflow suppressions from our UBSAN builds/tests. * Add `ZSTD_ALLOW_POINTER_OVERFLOW_ATTR` macro to suppress pointer-overflow at a per-function level. This is a superior approach because it also applies to users who build zstd with UBSAN. * Add `ZSTD_wrappedPtr{Diff,Add,Sub}()` that use these suppressions. The end goal is to only tag these functions with `ZSTD_ALLOW_POINTER_OVERFLOW`. But we can start by annotating functions that rely on pointer overflow, and gradually transition to using these. * Add `ZSTD_maybeNullPtrAdd()` to simplify pointer addition when the pointer may be `NULL`. * Fix all the fuzzer issues that came up. I'm sure there will be a lot more, but these are the ones that came up within a few minutes of running the fuzzers, and while running GitHub CI.	2023-09-28 17:35:05 -04:00
Nick Terrell	3daed7017a	Revert "Work around nullptr-with-nonzero-offset warning" This reverts commit c27fa399042f466080e79bb4fd8a4871bc0bcf28.	2023-09-28 17:35:05 -04:00
Dimitri Papadopoulos	fe34776c20	Fix new typos found by codespell	2023-09-23 18:56:01 +02:00
Nick Terrell	cdceb0fce5	Improve macro guards for ZSTD_assertValidSequence Refine the macro guards to define the functions exactly when they are needed. This fixes the chromium build with zstd. Thanks to @GregTho for reporting!	2023-09-22 16:36:14 -04:00
Nick Terrell	c27fa39904	Work around nullptr-with-nonzero-offset warning See comment.	2023-08-25 13:20:59 -04:00
Yann Collet	c123e69ad0	fixed static analyzer false positive regarding @sequence initialization make a mock initialization to please the tool	2023-06-16 16:24:48 -07:00
Yann Collet	c60dcedcc9	adapted long decoder to new decodeSequences removed older decodeSequences	2023-06-16 15:52:00 -07:00
Yann Collet	33fca19dd4	changed ZSTD_decompressSequences_bodySplitLitBuffer() decoding loop to behave more like the regular decoding loop.	2023-06-16 15:32:07 -07:00
Yann Collet	84e898a76c	removed _old variant from splitLit	2023-06-16 14:42:28 -07:00
Yann Collet	02134fad12	changed (partially) the decodeSequences flow logic this allows detecting overflow events without a checksum.	2023-06-16 11:57:12 -07:00
Yann Collet	b46236278a	detect extraneous bytes in the Sequences section when nbSeq == 0. Reported by @ip7z	2023-06-13 11:43:45 -07:00
Yann Collet	3732a08f5b	fixed decoder behavior when nbSeqs==0 is encoded using 2 bytes The sequence section starts with a number, which tells how many sequences are present in the section. If this number is 0, the section automatically ends. The number 0 can be represented using the 1-byte or the 2-byte format. That's because the 2-byte format fully overlaps the 1-byte format. However, when 0 is represented using the 2-byte format, the decoder was expecting the sequence section to continue, and was looking for FSE tables, which is incorrect. Fixed this behavior, in both the reference decoder and the educational decoder. In practice, this behavior never happens, because the encoder will always select the 1-byte format to represent 0, since this is more efficient. Completed the fix with a new golden sample for tests, a clarification of the specification, and a decoder errata paragraph.	2023-06-05 16:03:00 -07:00
Nick Terrell	61efb2a047	Add ZSTD_d_maxBlockSize parameter Reduces memory when blocks are guaranteed to be smaller than allowed by the format. This is useful for streaming compression in conjunction with ZSTD_c_maxBlockSize. This PR saves 2 * (formatMaxBlockSize - paramMaxBlockSize) when streaming. Once it is rebased on top of PR #3616 it will save 3 * (formatMaxBlockSize - paramMaxBlockSize).	2023-04-17 22:06:44 -07:00
Nick Terrell	0abf2baef9	Reduce streaming decompression memory by 128KB The split literals buffer patch increased streaming decompression memory by 64KB (shrunk lit buffer from 128KB to 64KB, and added 128KB). This patch removes the added 128KB buffer, because it isn't necessary. The buffer was there because the literals compression code didn't know the true `blockSizeMax` of the frame, and always put split literals so they ended 128KB - 32 from the beginning of the block. Instead, we can pass down the true `blockSizeMax` and ensure that the split literals end up at `blockSizeMax - 32` from the beginning of the block. We already reserve a full `blockSizeMax` bytes in streaming mode, so we won't be overwriting the extDict window.	2023-04-17 16:31:02 -07:00
Yann Collet	e4120c5513	fixing potential over-reads detected by @terrelln, these issues could be triggered in specific scenarios, namely decompression of certain invalid magic-less frames, or requested properties from certain invalid skippable frames.	2023-04-03 16:52:32 -07:00
daniellerozenblit	fcaf06ddb4	Check that `dest` is valid for decompression (#3555) * add check for valid dest buffer and fuzz on random dest ptr when malloc 0 * add uptrval to linux-kernel * remove bin files * get rid of uptrval * restrict max pointer value check to platforms where sizeof(size_t) == sizeof(void*)	2023-03-31 23:00:55 -07:00
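
Commit a9fb8d4c41 above describes turning a decoded `offset==0` into `(size_t)(-1)` so that the later distance test always rejects it. The idea can be sketched roughly as follows. The helper names `sanitize_offset` and `offset_is_corrupted` are hypothetical and are not zstd functions; only the comparison `offset > (size_t)(oLitEnd - virtualStart)` is taken from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a zstd function): an offset of 0 is invalid,
 * so map it to the largest size_t value instead of to 1. */
static size_t sanitize_offset(size_t offset)
{
    return offset == 0 ? (size_t)-1 : offset;
}

/* Hypothetical check modeled on the comparison quoted in the commit.
 * Returns 1 when the offset reaches before virtual_start, else 0.
 * The subtraction is done on pointers first, so nothing is added to the
 * untrusted offset and the comparison cannot overflow. */
static int offset_is_corrupted(size_t offset,
                               const unsigned char* o_lit_end,
                               const unsigned char* virtual_start)
{
    assert(o_lit_end >= virtual_start);
    return offset > (size_t)(o_lit_end - virtual_start);
}
```

Under these assumptions, a sanitized zero offset is larger than any possible window distance, so the check flags it as corrupted without relying on a checksum.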

729 Commits
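
Commit 8193250615 above migrates statement-like macros from bare `{ }` to `do { } while (0)`. A minimal sketch of why that form is preferred is shown below. The macro `CLAMP_TO` and the function `bounded_len` are hypothetical illustrations and do not appear in zstd.

```c
#include <stddef.h>

/* Hypothetical macro (not from zstd). Wrapping the body in
 * do { } while (0) makes the expansion a single statement that needs a
 * trailing ';', so it behaves like a function call at the call site. */
#define CLAMP_TO(var, limit) \
    do { if ((var) > (limit)) (var) = (limit); } while (0)

/* With a bare { } body, "CLAMP_TO(want, cap);" before "else" would expand
 * to "{ ... }; else", and the stray ';' would detach the else branch,
 * which is a compile error. The do/while form compiles as intended. */
static size_t bounded_len(size_t want, size_t cap, int enabled)
{
    if (enabled)
        CLAMP_TO(want, cap);
    else
        want = 0;
    return want;
}
```

For example, under this sketch `bounded_len(10, 4, 1)` clamps the request to 4, while `bounded_len(10, 4, 0)` takes the `else` branch and returns 0.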