
mirror of https://github.com/facebook/zstd.git synced 2025-10-15 00:02:02 -04:00

Author	SHA1	Message	Date
Nick Terrell	8193250615	Modernize macros to use `do { } while (0)` This PR introduces no functional changes. It attempts to change all macros currently using `{ }` or some variant of that to `do { } while (0)`, and introduces trailing `;` where necessary. There were no bugs found during this migration. The bug in Visual Studio's warning on this has been fixed since VS2015. Additionally, we have several instances of `do { } while (0)` which have been present for several releases, so we don't have to worry about breaking people's builds. Fixes Issue #3830.	2023-11-21 20:05:17 -05:00
Nick Terrell	dd4de1dd7a	[huf] Fix null pointer addition `HUF_DecompressFastArgs_init()` was adding 0 to NULL. Fix it by exiting early for empty outputs. This is no change in behavior, because the function was already exiting 0 in this case, just slightly later.	2023-11-20 17:13:01 -05:00
Nick Terrell	5ab78c0418	[huf] Improve fast C & ASM performance on small data * Rename `ilimit` to `ilowest` and set it equal to `src` instead of `src + 6 + 8`. This is safe because the fast decoding loops guarantee to never read below `ilowest` already. This allows the fast decoder to run for at least two more iterations, because it consumes at most 7 bytes per iteration. * Continue the fast loop all the way until the number of safe iterations is 0. Initially, I thought that when it got towards the end, the computation of how many iterations of safe might become expensive. But it ends up being slower to have to decode each of the 4 streams individually, which makes sense. This drastically speeds up the Huffman decoder on the `github` dataset for the issue raised in #3762, measured with `zstd -b1e1r github/`. \| Decoder \| Speed before \| Speed after \| \|----------\|--------------\|-------------\| \| Fallback \| 477 MB/s \| 477 MB/s \| \| Fast C \| 384 MB/s \| 492 MB/s \| \| Assembly \| 385 MB/s \| 501 MB/s \| We can also look at the speed delta for different block sizes of silesia using `zstd -b1e1r silesia.tar -B#`. \| Decoder \| -B1K ∆ \| -B2K ∆ \| -B4K ∆ \| -B8K ∆ \| -B16K ∆ \| -B32K ∆ \| -B64K ∆ \| -B128K ∆ \| \|----------\|--------\|--------\|--------\|--------\|---------\|---------\|---------\|----------\| \| Fast C \| +11.2% \| +8.2% \| +6.1% \| +4.4% \| +2.7% \| +1.5% \| +0.6% \| +0.2% \| \| Assembly \| +12.5% \| +9.0% \| +6.2% \| +3.6% \| +1.5% \| +0.7% \| +0.2% \| +0.03% \|	2023-11-20 17:13:01 -05:00
Nick Terrell	c7269add7e	[huf] Improve fast huffman decoding speed in linux kernel gcc in the linux kernel was not unrolling the inner loops of the Huffman decoder, which was destroying decoding performance. The compiler was generating crazy code with all sorts of branches. I suspect because of Spectre mitigations, but I'm not certain. Once the loops were manually unrolled, performance was restored. Additionally, when gcc couldn't prove that the variable left shift in the 4X2 decode loop wasn't greater than 63, it inserted checks to verify it. To fix this, mask `entry.nbBits & 0x3F`, which allows gcc to elide this check. This is a no-op, because `entry.nbBits` is guaranteed to be less than 64. Lastly, introduce the `HUF_DISABLE_FAST_DECODE` macro to disable the fast C loops for Issue #3762. So if even after this change, there is a performance regression, users can opt-out at compile time.	2023-11-20 14:56:46 -05:00
Nick Terrell	43118da8a7	Stop suppressing pointer-overflow UBSAN errors * Remove all pointer-overflow suppressions from our UBSAN builds/tests. * Add `ZSTD_ALLOW_POINTER_OVERFLOW_ATTR` macro to suppress pointer-overflow at a per-function level. This is a superior approach because it also applies to users who build zstd with UBSAN. * Add `ZSTD_wrappedPtr{Diff,Add,Sub}()` that use these suppressions. The end goal is to only tag these functions with `ZSTD_ALLOW_POINTER_OVERFLOW`. But we can start by annotating functions that rely on pointer overflow, and gradually transition to using these. * Add `ZSTD_maybeNullPtrAdd()` to simplify pointer addition when the pointer may be `NULL`. * Fix all the fuzzer issues that came up. I'm sure there will be a lot more, but these are the ones that came up within a few minutes of running the fuzzers, and while running GitHub CI.	2023-09-28 17:35:05 -04:00
Elliot Gorokhovsky	a7de1d9f49	Fix all MSVC warnings (#3495) * fix and test MSVC AVX2 build * treat msbuild warnings as errors * fix incorrect MSVC 2019 compiler warning * fix MSVC error D9035: option 'Gm' has been deprecated and will be removed in a future release	2023-02-11 10:56:59 -05:00
Nick Terrell	bda947e17a	[huf] Fix bug in fast C decoders The input bounds checks were buggy because they were only breaking from the inner loop, not the outer loop. The fuzzers found this immediately. The fix is to use `goto _out` instead of `break`. This condition can happen on corrupted inputs. I've benchmarked before and after on x86-64 and there were small changes in performance, some positive, and some negative, and they end up about balancing out. Credit to OSS-Fuzz	2023-01-26 14:39:13 -08:00
Nick Terrell	8957fef554	[huf] Add generic C versions of the fast decoding loops Add generic C versions of the fast decoding loops to serve architectures that don't have an assembly implementation. Also allow selecting the C decoding loop over the assembly decoding loop through a zstd decompression parameter `ZSTD_d_disableHuffmanAssembly`. I benchmarked on my Intel i9-9900K and my Macbook Air with an M1 processor. The benchmark command forces zstd to compress without any matches, using only literals compression, and measures only Huffman decompression speed: ``` zstd -b1e1 --compress-literals --zstd=tlen=131072 silesia.tar ``` The new fast decoding loops outperform the previous implementation uniformly, but don't beat the x86-64 assembly. Additionally, the fast C decoding loops suffer from the same stability problems that we've seen in the past, where the assembly version doesn't. So even though clang gets close to assembly on x86-64, it still has stability issues. \| Arch \| Function \| Compiler \| Default (MB/s) \| Assembly (MB/s) \| Fast (MB/s) \| \|---------\|----------------\|--------------\|----------------\|-----------------\|-------------\| \| x86-64 \| decompress 4X1 \| gcc-12.2.0 \| 1029.6 \| 1308.1 \| 1208.1 \| \| x86-64 \| decompress 4X1 \| clang-14.0.6 \| 1019.3 \| 1305.6 \| 1276.3 \| \| x86-64 \| decompress 4X2 \| gcc-12.2.0 \| 1348.5 \| 1657.0 \| 1374.1 \| \| x86-64 \| decompress 4X2 \| clang-14.0.6 \| 1027.6 \| 1659.9 \| 1468.1 \| \| aarch64 \| decompress 4X1 \| clang-12.0.5 \| 1081.0 \| N/A \| 1234.9 \| \| aarch64 \| decompress 4X2 \| clang-12.0.5 \| 1270.0 \| N/A \| 1516.6 \|	2023-01-25 13:47:51 -08:00
Nick Terrell	dc2b3e8876	Fix -Wstringop-overflow warning Backported from kernel patch [0]. I wasn't able to reproduce the warning locally, but could repro it in the kernel. [0] https://lore.kernel.org/lkml/20220330193352.GA119296@embeddedor/	2023-01-23 10:12:25 -08:00
Nick Terrell	329169189c	Replace Huffman boolean args with flags bit set	2023-01-20 14:12:53 -08:00
Nick Terrell	0cc1b0cb22	Delete unused Huffman functions Remove all Huffman functions that aren't used by zstd.	2023-01-20 14:12:53 -08:00
Yann Collet	6a9c525903	spec update : require minimum nb of literals for 4-streams mode Reported by @shulib : the specification for 4-streams mode doesn't work when the amount of literals to compress is 5 bytes. Extending it, it also doesn't work for sizes 1 or 2. This patch updates the specification and the implementation to require a minimum of 6 literals to trigger or accept the 4-streams mode. The impact is expected to be a no-op : the 4-streams mode is never triggered for such small quantity of literals anyway, since it would be wasteful (it costs ~7.3 bytes more than single-stream mode). An informal lower limit is set at ~256 bytes, so the technical minimum is very far from this limit. This is just meant for completeness of the specification.	2022-12-22 16:14:34 -08:00
W. Felix Handte	5d693cc38c	Coalesce Almost All Copyright Notices to Standard Phrasing ``` for f in $(find . \( -path ./.git -o -path ./tests/fuzz/corpora -o -path ./tests/regression/data-cache -o -path ./tests/regression/cache \) -prune -o -type f); do sed -i '/Copyright .* \(Yann Collet\)\|\(Meta Platforms\)/ s/Copyright .*/Copyright (c) Meta Platforms, Inc. and affiliates./' $f; done git checkout HEAD -- build/VS2010/libzstd-dll/libzstd-dll.rc build/VS2010/zstd/zstd.rc tests/test-license.py contrib/linux-kernel/test/include/linux/xxhash.h examples/streaming_compression_thread_pool.c lib/legacy/zstd_v0*.c lib/legacy/zstd_v0*.h nano ./programs/windres/zstd.rc nano ./build/VS2010/zstd/zstd.rc nano ./build/VS2010/libzstd-dll/libzstd-dll.rc ```	2022-12-20 12:52:34 -05:00
W. Felix Handte	8927f985ff	Update Copyright Headers 'Facebook' -> 'Meta Platforms' ``` for f in $(find . \( -path ./.git -o -path ./tests/fuzz/corpora \) -prune -o -type f); do sed -i 's/Facebook, Inc\./Meta Platforms, Inc. and affiliates./' $f; done ```	2022-12-20 12:37:57 -05:00
Elliot Gorokhovsky	529cd7b821	Fix nits	2022-02-14 14:24:50 -05:00
Elliot Gorokhovsky	db2f4a6532	Move bitwise builtins into bits.h	2022-02-14 11:16:03 -05:00
H.J. Lu	568c69a4eb	x86-64: Hide internal assembly functions Hide x86-64 internal assembly functions. Before $ nm -D lib/libzstd.so.1 | grep usingDTable_internal_bmi2_asm_loop 00000000000c23c0 T _HUF_decompress4X1_usingDTable_internal_bmi2_asm_loop 00000000000c23c0 T HUF_decompress4X1_usingDTable_internal_bmi2_asm_loop 00000000000c283d T _HUF_decompress4X2_usingDTable_internal_bmi2_asm_loop 00000000000c283d T HUF_decompress4X2_usingDTable_internal_bmi2_asm_loop $ After $ nm -D lib/libzstd.so.1 | grep usingDTable_internal_bmi2_asm_loop $ This fixes issue #2990.	2022-01-11 10:12:24 -08:00
Yann Collet	e1ab2200ff	fixed x32 compatibility	2021-12-10 21:02:17 -08:00
Nick Terrell	c284569457	[asm] Share portability macros and restrict ASM further Move portability macros to `lib/common/portability_macros.h`. This file only contains platform/feature detection (e.g. 0/1 macros). This file is shared between C and ASM code, so it cannot include any C code. Rename `HUF_` ASM macros to be `ZSTD_` prefixed, and move to the new header. Restrict `ZSTD_ASM_SUPPORTED` to `__GNUC__`, because we need the GAS assembler. Finally, only include the ASM code if we are actually going to use it. This disables it on all Windows platforms, which should resolve the problem brought up in Issue #2789.	2021-12-02 16:58:04 -08:00
Nick Terrell	5414dd7978	[bmi2] Add lzcnt and bmi target attributes * When dynamic dispatching to bmi2 add lzcnt and bmi to the TARGET_ATTRIBUTE. * Centralize the bmi2 TARGET_ATTRIBUTE definition to BMI2_TARGET_ATTRIBUTE so we can change it in the future. * Only enable bmi2 when both bmi1 & bmi2 are supported. There shouldn't be any cases where bmi2 is supported but bmi1 isn't. But, since we are using the instruction we should check bmi1 as well.	2021-11-30 17:54:56 -08:00
Dimitris Apostolou	ebbd675998	Fix typos	2021-11-13 10:04:04 +02:00
Nick Terrell	d46995efeb	Backport zstd patch from LKML Credit to Nathan Chancellor for the bug fix and Nick Desaulniers for the bug report. Link: https://github.com/ClangBuiltLinux/linux/issues/1486 Link: https://lore.kernel.org/all/20211021202353.2356400-1-nathan@kernel.org/	2021-11-05 14:09:49 -07:00
Nick Terrell	3a4d421c0f	Merge pull request #2802 from solbjorn/fix-kernel-wundef [contrib][linux] Fix -Wundef inside Linux kernel tree	2021-09-29 09:48:17 -07:00
Nick Terrell	a07ddb47f7	[huf] Fix OSS-Fuzz assert PR #2784 introduced a bug in the decompressor that caused some valid inputs to fail to decompress. The bitstream isn't reloaded after the 4X* loop if the number of elements remaining is small enough, causing us to read more bits than are available in the bitcontainer. This was caught by the MSAN fuzzer in OSS-Fuzz because the assembly implementation isn't used in the MSAN build. Credit to OSS-Fuzz.	2021-09-27 13:56:07 -07:00
Alexander Lobakin	71526e6f29	[contrib][linux] Fix -Wundef inside Linux kernel tree Commit d7ef97a013b5 ("[build] Fix oss-fuzz build with the dataflow sanitizer") broke build inside Linux-kernel after 'import', as it no longer can conditionally remove ZSTD_MEMORY_SANITIZER definition from the #if DEF_A || DEF_B block. This emits -Wundef warning which can be treated as error. Split this preprocessor condition into two separate conditions to fix this. Fixes: d7ef97a013b5 ("[build] Fix oss-fuzz build with the dataflow sanitizer") Signed-off-by: Alexander Lobakin <alobakin@pm.me>	2021-09-25 13:35:25 +02:00
Nick Terrell	d7ef97a013	[build] Fix oss-fuzz build with the dataflow sanitizer The dataflow sanitizer requires all code to be instrumented. We can't instrument the ASM function, so we have to disable it.	2021-09-23 11:48:39 -07:00
Nick Terrell	9450876a9d	[huf] Fix compilation when DYNAMIC_BMI2=0 && BMI2 is supported * Fix compilation issues pointed out in PR #2790. * Add test cases to GitHub actions that test all combinations of `DYNAMIC_BMI2` BMI2 support.	2021-09-21 16:49:13 -07:00
Nick Terrell	a5f2c45528	Huffman ASM	2021-09-20 14:46:43 -07:00
Nick Terrell	d7542aacd9	[fuzzer] Add huf_decompress fuzzer Add a fuzzer for Huffman decompression. Fix several bugs in Huffman decompression, mostly related to `op == NULL` and pointer underflow.	2021-09-17 15:00:49 -07:00
Nick Terrell	a494308ae9	[copyright][license] Switch to yearless copyright and some cleanup in the linux-kernel files * Switch to yearless copyright per FB policy * Fix up SPDX-License-Identifier lines in `contrib/linux-kernel` sources * Add zstd copyright/license header to the `contrib/linux-kernel` sources * Update the `tests/test-license.py` to check for yearless copyright * Improvements to `tests/test-license.py` * Check `contrib/linux-kernel` in `tests/test-license.py`	2021-03-30 10:30:43 -07:00
Nick Terrell	756bd59322	[huf][fse] Clean up workspaces * Move `counting` to a struct in `FSE_decompress_wksp_body()` * Fix error code in `FSE_decompress_wksp_body()` * Rename a variable in `HUF_ReadDTableX2_Workspace`	2021-03-17 16:50:37 -07:00
Nick Terrell	0f18059a4e	[huf] Reduce stack usage of HUF_readDTableX2 by ~460 bytes * Use `HUF_readStats_wksp()` * Use workspace in `HUF_fillDTableX2()` Clean up workspace usage to use a workspace struct	2021-03-05 12:39:46 -08:00
Nick Terrell	66e811d782	[license] Update year to 2021	2021-01-04 17:53:52 -05:00
Nick Terrell	79ded1b4a9	[lib] Add ZSTD_NO_UNUSED_FUNCTIONS macro to hide unused functions The unused function definitions are hidden behind a `#ifndef ZSTD_NO_UNUSED_FUNCTIONS` check. Initially hiding all functions which are unused and take up more than 2KB of stack space, because these will show up as warnings in the Linux Kernel build system.	2020-09-09 14:35:39 -07:00
Nick Terrell	f91ed5c766	[lib] s/current/curr because it collides with Linux Kernel macro	2020-09-09 14:35:39 -07:00
Nick Terrell	c465f24457	ZSTD_ prefix mem{cpy,move,set},malloc,calloc,free	2020-08-26 12:26:03 -07:00
Nick Terrell	80f577baa2	Move standard includes to zstd_deps.h	2020-08-26 12:25:08 -07:00
Nick Terrell	8f8bd2d1ac	[regression] Update results.csv	2020-08-20 12:41:35 -07:00
Nick Terrell	612e947c5e	wire up bmi2 support	2020-08-17 16:35:28 -07:00
Nick Terrell	ba1fd17a9f	speed up literal header decoding	2020-08-17 12:17:53 -07:00
W. Felix Handte	6028827fee	Rewrite Include Paths to be Relative Addresses #1998.	2020-05-04 15:20:26 -04:00
Carl Woffenden	a93fadfcd9	Further replication removed `CHECK_F` is now in `error_private.h`. Minor tidy.	2020-04-07 11:25:16 +02:00
Carl Woffenden	7202184ee0	Fixes decompressor when using -Wshorten-64-to-32 (#2062) Spotted on iOS when building with `-Wshorten-64-to-32` (since `__builtin_expect` returns a `long`).	2020-04-03 02:55:29 -07:00
Nick Terrell	ac58c8d720	Fix copyright and license lines * All copyright lines now have -2020 instead of -present * All copyright lines include "Facebook, Inc" * All licenses are now standardized The copyright in `threading.{h,c}` is not changed because it comes from zstdmt. The copyright and license of `divsufsort.{h,c}` is not changed.	2020-03-26 17:02:06 -07:00
Nick Terrell	cb2abc3dbe	Fix performance regression on aarch64 with clang	2020-01-23 17:31:14 -08:00
Nick Terrell	718f00ff6f	Optimize decompression speed for gcc and clang (#1892) * Optimize `ZSTD_decodeSequence()` * Optimize Huffman decoding * Optimize `ZSTD_decompressSequences()` * Delete `ZSTD_decodeSequenceLong()`	2019-11-25 18:26:19 -08:00
Nick Terrell	e0d6daabac	Fix Appveyor failure	2019-11-19 11:12:26 -08:00
Clement Courbet	b3c9fc27b4	Optimized loop bounds to allow the compiler to unroll the loop. This has no measurable impact on large files but improves small file decompression by ~1-2% for 10kB, benchmarked with: head -c 10000 silesia.tar > /tmp/test make CC=/usr/local/bin/clang-9 BUILD_STATIC=1 && ./lzbench -ezstd -t1,5 /tmp/test	2019-11-15 08:27:05 +01:00
Carl Woffenden	901ea61f83	Tweaks to create a single-file decoder The CHECK_F macros differ slightly (but eventually do the same thing). Older GCC needs to fallback on the old-style pragma optimisation flags.	2019-08-21 17:49:17 +02:00
W. Felix Handte	0d606ee3db	Fix Incorrect assert()	2018-12-18 13:36:39 -08:00
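
Commit 8193250615 above moves macros from a bare `{ }` body to `do { } while (0)`. A minimal sketch of why that form is preferred follows. The macro and function names here are hypothetical and are not taken from the zstd sources.

```c
#include <stddef.h>

/* Hypothetical brace-only macro (illustration only, not zstd code).
 * Used as `if (c) STORE_PAIR_BRACES(...); else ...`, the `;` after the
 * closing `}` ends the `if` statement, so the `else` fails to compile. */
#define STORE_PAIR_BRACES(dst, a, b) { (dst)[0] = (a); (dst)[1] = (b); }

/* The same body wrapped in do { } while (0): the expansion is a single
 * statement that requires, and absorbs, exactly one trailing `;`. */
#define STORE_PAIR(dst, a, b) do { (dst)[0] = (a); (dst)[1] = (b); } while (0)

/* Uses the macro in an unbraced if/else, which only compiles with the
 * do { } while (0) form. Returns 12 when flag is nonzero, else 34. */
int store_pair_demo(int flag)
{
    int out[2] = {0, 0};
    if (flag)
        STORE_PAIR(out, 1, 2);
    else
        STORE_PAIR(out, 3, 4);
    return out[0] * 10 + out[1];
}
```

Swapping `STORE_PAIR` for `STORE_PAIR_BRACES` in `store_pair_demo()` should produce a compile error at the `else`, which is the class of problem the migration guards against.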

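
Commit bda947e17a above describes a bounds check that used `break` inside a nested loop and therefore only left the inner loop. The sketch below illustrates that pattern with a toy four-stream counter. The function `decode_rounds()` and its parameters are hypothetical and much simpler than the real Huffman decoder.

```c
#include <stddef.h>

/* Toy model, not zstd code: each round consumes one unit from each of
 * four streams. Decoding must stop entirely as soon as any stream is
 * exhausted. Returns the number of units consumed. */
size_t decode_rounds(const size_t remaining_in[4], size_t rounds)
{
    size_t remaining[4];
    size_t consumed = 0;
    size_t r;
    int s;

    for (s = 0; s < 4; ++s)
        remaining[s] = remaining_in[s];

    for (r = 0; r < rounds; ++r) {
        for (s = 0; s < 4; ++s) {
            if (remaining[s] == 0) {
                /* A `break` here would only exit the inner loop; the
                 * outer loop would start another round and keep
                 * consuming from the other streams. */
                goto _out;
            }
            --remaining[s];
            ++consumed;
        }
    }
_out:
    return consumed;
}
```

With stream sizes `{3, 3, 1, 3}` and three rounds, the third stream runs dry in the second round, so the function stops after six units; a `break` in place of the `goto` would have let the outer loop continue.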

105 Commits
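
Commits dd4de1dd7a and 43118da8a7 above both deal with adding an offset to a pointer that may be `NULL`, which UBSAN can report even when the offset is 0. The following is a minimal sketch of the guard idea. The helper name `maybe_null_ptr_add()` is hypothetical; it is modeled on the description of `ZSTD_maybeNullPtrAdd()` in the commit message and is not copied from zstd.

```c
#include <stddef.h>

/* Hypothetical helper (assumption, not zstd's implementation): performs
 * the pointer addition only when the offset is positive, so a NULL
 * pointer with a zero offset is returned untouched instead of going
 * through pointer arithmetic. */
void *maybe_null_ptr_add(void *ptr, ptrdiff_t add)
{
    return add > 0 ? (void *)((char *)ptr + add) : ptr;
}
```

A caller computing an end pointer for a possibly empty output buffer could write `end = maybe_null_ptr_add(dst, dstSize)` rather than `dst + dstSize`; the variable names in that call are illustrative only.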