118 Commits

Author SHA1 Message Date
Dimitri Papadopoulos
44e83e9180
Fix typos not found by codespell 2024-06-20 20:16:25 +02:00
Yann Collet
a9fb8d4c41 new method to deal with offset==0
in this new method, when an `offset==0` is detected,
it's converted into (size_t)(-1), instead of 1.

The logic is that (size_t)(-1) is effectively an extremely large positive number,
which will not pass the offset distance test at next stage (`execSequence()`).
Checked the source code, and offset is always checked (as it should),
using a formula which is not vulnerable to arithmetic overflow:
```
RETURN_ERROR_IF(sequence.offset > (size_t)(oLitEnd - virtualStart),
```

The benefit is that such a case (offset==0) is always detected as corrupted data
as opposed to relying on the checksum to detect the error.
2024-03-08 15:26:06 -08:00
Yann Collet
8689633fdf
Merge pull request #3840 from aimuz/fix-reserved
lib/decompress: check for reserved bit corruption in zstd
2024-03-05 13:40:12 -08:00
Yann Collet
4b51526412 fix partial block uncompressed 2024-02-24 01:24:58 -08:00
Yann Collet
4683667785 refactor optimal parser
store stretches as intermediate solution instead of sequences.
makes it possible to link a solution to a predecessor.
2024-01-31 02:51:46 -08:00
aimuz
468bb17378
lib/decompress: check for reserved bit corruption in zstd
The patch adds a validation to ensure that the last field, which is
reserved, must be all-zeroes in ZSTD_decodeSeqHeaders. This prevents
potential corruption from going undetected.

Fixes an issue where corrupted input could lead to undefined behavior
due to improper validation of reserved bits.

Signed-off-by: aimuz <mr.imuz@gmail.com>
2023-11-28 21:04:37 +08:00
Yann Collet
c1e588fcb4
Merge pull request #3771 from DimitriPapadopoulos/codespell
Fix new typos found by codespell
2023-10-07 19:29:41 -07:00
Nick Terrell
43118da8a7 Stop suppressing pointer-overflow UBSAN errors
* Remove all pointer-overflow suppressions from our UBSAN builds/tests.
* Add `ZSTD_ALLOW_POINTER_OVERFLOW_ATTR` macro to suppress
  pointer-overflow at a per-function level. This is a superior approach
  because it also applies to users who build zstd with UBSAN.
* Add `ZSTD_wrappedPtr{Diff,Add,Sub}()` that use these suppressions.
  The end goal is to only tag these functions with
  `ZSTD_ALLOW_POINTER_OVERFLOW`. But we can start by annoting functions
  that rely on pointer overflow, and gradually transition to using
  these.
* Add `ZSTD_maybeNullPtrAdd()` to simplify pointer addition when the
  pointer may be `NULL`.
* Fix all the fuzzer issues that came up. I'm sure there will be a lot
  more, but these are the ones that came up within a few minutes of
  running the fuzzers, and while running GitHub CI.
2023-09-28 17:35:05 -04:00
Dimitri Papadopoulos
fe34776c20
Fix new typos found by codespell 2023-09-23 18:56:01 +02:00
Nick Terrell
cdceb0fce5 Improve macro guards for ZSTD_assertValidSequence
Refine the macro guards to define the functions exactly when they are
needed.

This fixes the chromium build with zstd.

Thanks to @GregTho for reporting!
2023-09-22 16:36:14 -04:00
Yann Collet
c123e69ad0 fixed static analyzer false positive regarding @sequence initialization
make a mock initialization to please the tool
2023-06-16 16:24:48 -07:00
Yann Collet
c60dcedcc9 adapted long decoder to new decodeSequences
removed older decodeSequences
2023-06-16 15:52:00 -07:00
Yann Collet
33fca19dd4 changed ZSTD_decompressSequences_bodySplitLitBuffer() decoding loop
to behave more like the regular decoding loop.
2023-06-16 15:32:07 -07:00
Yann Collet
84e898a76c removed _old variant from splitLit 2023-06-16 14:42:28 -07:00
Yann Collet
02134fad12 changed (partially) the decodeSequences flow logic
this allows detecting overflow events without a checksum.
2023-06-16 11:57:12 -07:00
Yann Collet
b46236278a detect extraneous bytes in the Sequences section
when nbSeq == 0.

Reported by @ip7z
2023-06-13 11:43:45 -07:00
Yann Collet
3732a08f5b fixed decoder behavior when nbSeqs==0 is encoded using 2 bytes
The sequence section starts with a number, which tells how sequences are present in the section.
If this number if 0, the section automatically ends.

The number 0 can be represented using the 1 byte or the 2 bytes formats.
That's because the 2-bytes formats fully overlaps the 1 byte format.

However, when 0 is represented using the 2-bytes format,
the decoder was expecting the sequence section to continue,
and was looking for FSE tables, which is incorrect.

Fixed this behavior, in both the reference decoder and the educational behavior.

In practice, this behavior never happens,
because the encoder will always select the 1-byte format to represent 0,
since this is more efficient.

Completed the fix with a new golden sample for tests,
a clarification of the specification,
and a decoder errata paragraph.
2023-06-05 16:03:00 -07:00
Nick Terrell
0abf2baef9 Reduce streaming decompression memory by 128KB
The split literals buffer patch increased streaming decompression memory
by 64KB (shrunk lit buffer from 128KB to 64KB, and added 128KB). This
patch removes the added 128KB buffer, because it isn't necessary.

The buffer was there because the literals compression code didn't know
the true `blockSizeMax` of the frame, and always put split literals so
they ended 128KB - 32 from the beginning of the block. Instead, we can
pass down the true `blockSizeMax` and ensure that the split literals
end up at `blockSizeMax - 32` from the beginning of the block. We
already reserve a full `blockSizeMax` bytes in streaming mode, so we
won't be overwriting the extDict window.
2023-04-17 16:31:02 -07:00
daniellerozenblit
fcaf06ddb4
Check that dest is valid for decompression (#3555)
* add check for valid dest buffer and fuzz on random dest ptr when malloc 0

* add uptrval to linux-kernel

* remove bin files

* get rid of uptrval

* restrict max pointer value check to platforms where sizeof(size_t) == sizeof(void*)
2023-03-31 23:00:55 -07:00
Han Zhu
b558190ac7 Remove clang-only branch hints from ZSTD_decodeSequence
Looking at the __builtin_expect in ZSTD_decodeSequence:

{   size_t offset;
    #if defined(__clang__)
 if (LIKELY(ofBits > 1)) {
    #else
 if (ofBits > 1) {
    #endif
 ZSTD_STATIC_ASSERT(ZSTD_lo_isLongOffset == 1);

From profile-annotated assembly, the probability of ofBits > 1 is about 75%
(101k counts out of 135k counts). This is much smaller than the recommended
likelihood to use __builtin_expect which is 99%. As a result, clang moved the
else block further away which hurts cache locality. Removing this
__built_expect along with two others in ZSTD_decodeSequence gave better
performance when PGO is enabled. I suggest to remove these branch hints and
rely on PGO which leverages runtime profiles from actual workload to calculate
branch probability instead.
2023-03-28 15:36:22 -07:00
Nick Terrell
fbd97f305a Deprecated bufferless and block level APIs
* Mark all bufferless and block level functions as deprecated
* Update documentation to suggest not using these functions
* Add `_deprecated()` wrappers for functions that we use internally and
  call those instead
2023-03-16 10:04:15 -07:00
Dimitri Papadopoulos
547794ef40
Fix typos found by codespell 2023-02-18 10:31:48 +01:00
Nick Terrell
71a0259247 Fix ZSTD_getOffsetInfo() when nbSeq == 0
In 32-bit mode, ZSTD_getOffsetInfo() can be called when nbSeq == 0, and
in this case the offset table is uninitialized. The function should just
return 0 for both values, because there are no sequences.

Credit to OSS-Fuzz
2023-02-02 14:26:41 -08:00
Nick Terrell
cc3e3acd34 Fix 32-bit decoding with large dictionary
The 32-bit decoder could corrupt the regenerated data by using regular
offset mode when there were actually long offsets. This is because we
were only considering the window size in the calculation, not the
dictionary size. So a large dictionary could allow longer offsets.

Fix this in two ways:
1. Instead of looking at the window size, look at the total referencable
   bytes in the history buffer. Use this in the comparison instead of
   the window size. Additionally, we were comparing against the wrong
   value, it was too low. Fix that by computing exactly the maximum
   offset for regular sequence decoding.
2. If it is possible that we have long offsets due to (1), then check
   the offset code decoding table, and if the decoding table's maximum
   number of additional bits is no more than STREAM_ACCUMULATOR_MIN,
   then we can't have long offsets.

This gates us to be using the long offsets decoder only when we are very
likely to actually have long offsets.

Note that this bug only affects the decoding of the data, and the
original compressed data, if re-read with a patched decoder, will
correctly regenerate the orginal data. Except that the encoder also had
the same issue previously.

This fixes both the open OSS-Fuzz issues.

Credit to OSS-Fuzz
2023-02-01 17:22:44 -08:00
Nick Terrell
2f74507bbd Simplify 32-bit long offsets decoding logic
The previous code had an issue when `bitsConsumed == 32` it would read 0
bits for the `ofBits` read, which violates the precondition of
`BIT_readBitsFast()`. This can happen when the stream is corrupted.

Fix thie issue by always reading the maximum possible number of extra
bits. I've measured neutral decoding performance, likely because this
branch is unlikely, but this should be faster anyways. And if not, it is
only 32-bit decoding, so performance isn't as critical.

Credit to OSS-Fuzz
2023-01-30 12:21:42 -08:00
Nick Terrell
b3b43f2893 Fix invalid assert in 32-bit decoding
The assert is only correct for valid sequences, so disable it for
everything execpt round trip fuzzers.
2023-01-27 14:40:38 -08:00
Nick Terrell
8957fef554 [huf] Add generic C versions of the fast decoding loops
Add generic C versions of the fast decoding loops to serve architectures
that don't have an assembly implementation. Also allow selecting the C
decoding loop over the assembly decoding loop through a zstd
decompression parameter `ZSTD_d_disableHuffmanAssembly`.

I benchmarked on my Intel i9-9900K and my Macbook Air with an M1 processor.
The benchmark command forces zstd to compress without any matches, using
only literals compression, and measures only Huffman decompression speed:

```
zstd -b1e1 --compress-literals --zstd=tlen=131072 silesia.tar
```

The new fast decoding loops outperform the previous implementation uniformly,
but don't beat the x86-64 assembly. Additionally, the fast C decoding loops suffer
from the same stability problems that we've seen in the past, where the assembly
version doesn't. So even though clang gets close to assembly on x86-64, it still
has stability issues.

| Arch    | Function       | Compiler     | Default (MB/s) | Assembly (MB/s) | Fast (MB/s) |
|---------|----------------|--------------|----------------|-----------------|-------------|
| x86-64  | decompress 4X1 | gcc-12.2.0   |         1029.6 |          1308.1 |      1208.1 |
| x86-64  | decompress 4X1 | clang-14.0.6 |         1019.3 |          1305.6 |      1276.3 |
| x86-64  | decompress 4X2 | gcc-12.2.0   |         1348.5 |          1657.0 |      1374.1 |
| x86-64  | decompress 4X2 | clang-14.0.6 |         1027.6 |          1659.9 |      1468.1 |
| aarch64 | decompress 4X1 | clang-12.0.5 |         1081.0 |             N/A |      1234.9 |
| aarch64 | decompress 4X2 | clang-12.0.5 |         1270.0 |             N/A |      1516.6 |
2023-01-25 13:47:51 -08:00
Nick Terrell
329169189c Replace Huffman boolean args with flags bit set 2023-01-20 14:12:53 -08:00
Nick Terrell
0cc1b0cb22 Delete unused Huffman functions
Remove all Huffman functions that aren't used by zstd.
2023-01-20 14:12:53 -08:00
Yann Collet
089b2797e3
Merge pull request #3398 from facebook/fix3316
spec update : require minimum nb of literals for 4-streams mode
2022-12-22 16:57:05 -08:00
Yann Collet
6a9c525903 spec update : require minimum nb of literals for 4-streams mode
Reported by @shulib :
the specification for 4-streams mode
doesn't work when the amount of literals to compress is 5 bytes.
Extending it, it also doesn't work for sizes 1 or 2.

This patch updates the specification and the implementation
to require a minimum of 6 literals to trigger or accept the 4-streams mode.

The impact is expected to be a no-op :
the 4-streams mode is never triggered for such small quantity of literals anyway,
since it would be wasteful (it costs ~7.3 bytes more than single-stream mode).
An informal lower limit is set at ~256 bytes,
so the technical minimum is very far from this limit.

This is just meant for completeness of the specification.
2022-12-22 16:14:34 -08:00
Yann Collet
ea2895cef4 Support decompression of compressed blocks of size ZSTD_BLOCKSIZE_MAX exactly 2022-12-22 12:40:27 -08:00
W. Felix Handte
5d693cc38c Coalesce Almost All Copyright Notices to Standard Phrasing
```
for f in $(find . \( -path ./.git -o -path ./tests/fuzz/corpora -o -path ./tests/regression/data-cache -o -path ./tests/regression/cache \) -prune -o -type f); do sed -i '/Copyright .* \(Yann Collet\)\|\(Meta Platforms\)/ s/Copyright .*/Copyright (c) Meta Platforms, Inc. and affiliates./' $f; done

git checkout HEAD -- build/VS2010/libzstd-dll/libzstd-dll.rc build/VS2010/zstd/zstd.rc tests/test-license.py contrib/linux-kernel/test/include/linux/xxhash.h examples/streaming_compression_thread_pool.c lib/legacy/zstd_v0*.c lib/legacy/zstd_v0*.h
nano ./programs/windres/zstd.rc
nano ./build/VS2010/zstd/zstd.rc
nano ./build/VS2010/libzstd-dll/libzstd-dll.rc
```
2022-12-20 12:52:34 -05:00
W. Felix Handte
8927f985ff Update Copyright Headers 'Facebook' -> 'Meta Platforms'
```
for f in $(find . \( -path ./.git -o -path ./tests/fuzz/corpora \) -prune -o -type f);
do
  sed -i 's/Facebook, Inc\./Meta Platforms, Inc. and affiliates./' $f;
done
```
2022-12-20 12:37:57 -05:00
Nick Terrell
a70ca2bd7d
Fix off-by-one error in superblock mode (#3221)
Fixes #3212.

Long literal and match lengths had an off-by-one error in ZSTD_getSequenceLength.
Fix the off-by-one error, and add a golden compression test that catches the bug.
Also run all the golden tests in the cli-tests framework.
2022-08-03 11:28:39 -07:00
Jun He
ec5fdcde19
lib: add hint to generate more pipeline friendly code (#3138)
With statistic data of test data files of silesia
the chance of position beyond highThreshold is very
low (~1.3%@L8 in most cases, all <2.5%), and is in
"lowprob area". Add the branch hint so compiler can
get better pipiline codegen.
With this change it is observed ~1% of mozilla and
xml, and slight (0.3%~0.8%) but consistent uplift on
other files on Arm N1.

Signed-off-by: Jun He <jun.he@arm.com>
Change-Id: Id9ba1d5c767e975290b5c1bf0ecce906544f4ade
2022-07-29 10:28:04 -07:00
Jun He
558cf20d0d
decomp: add prefetch for matched seq on aarch64 (#3164)
match is used for following sequence copy. It is
only updated when extDict is needed, which is a
low probability case. So it can be prefetched to
reduce cache miss.
The benchmarks on various Arm platforms showed
uplift from 1% ~ 14% with gcc-11/clang-14.

Signed-off-by: Jun He <jun.he@arm.com>
Change-Id: If201af4799d2455d74c79f8387404439d7f684ae
2022-07-29 10:27:20 -07:00
udayanbapat
43f21a600e
Intial commit to address 3090. Added support to decompress empty block. (#3118)
* Intial commit to address 3090. Added support to decompress empty block

* Update zstd_decompress_block.c

Addressed review comments for the case of 'set_basic'

* Update lib/decompress/zstd_decompress_block.c

Co-authored-by: Nick Terrell <nickrterrell@gmail.com>

* Update lib/decompress/zstd_decompress_block.c

Co-authored-by: Nick Terrell <nickrterrell@gmail.com>

Co-authored-by: Nick Terrell <nickrterrell@gmail.com>
2022-07-14 11:54:34 -07:00
Jun He
2491c65937 dec: adjust seqSymbol load on aarch64
ZSTD_seqSymbol is a structure with total of 64 bits
wide. So it can be loaded in one operation and
extract its fields by simply shifting or extracting
on aarch64.
GCC doesn't recognize this and generates more
unnecessary ldr/ldrb/ldrh operations that cause
performance drop.
With this change it is observed 2~4% uplift of
silesia and 2.5~6% of cantrbry @L8 on Arm N1.

Signed-off-by: Jun He <jun.he@arm.com>
Change-Id: I7748909204cf78a17eb9d4f2333692d53239daa8
2022-05-30 22:01:38 +08:00
Dominique Pelle
b772f53952 Typo and grammar fixes 2022-03-12 08:58:04 +01:00
Elliot Gorokhovsky
db2f4a6532 Move bitwise builtins into bits.h 2022-02-14 11:16:03 -05:00
Yann Collet
7616e39f3b adding traces to better track processing of literals 2022-01-26 14:47:21 -08:00
Norbert Lange
2fbb1d10c1 Reduce bit tables to 8bit
This saves some 1.7Kb in rodata section (x86_64, zstd tool),
while assembler code stays the same except
the type of a few load/extend instructions.

Should not have negative performance implications.
2021-12-14 23:47:57 +01:00
Nick Terrell
5414dd7978 [bmi2] Add lzcnt and bmi target attributes
* When dynamic dispatching to bmi2 add lzcnt and bmi to the
  TARGET_ATTRIBUTE.
* Centralize the bmi2 TARGET_ATTRIBUTE definition to
  BMI2_TARGET_ATTRIBUTE so we can change it in the future.
* Only enable bmi2 when both bmi1 & bmi2 are supported. There shouldn't
  be any cases where bmi2 is supported but bmi1 isn't. But, since we are
  using the instruction we should check bmi1 as well.
2021-11-30 17:54:56 -08:00
binhdvo
04734ee84a
Fix oss fuzz test error (#2837) 2021-10-29 10:29:50 -04:00
binhdvo
6a7ede3dfc
Reduce size of dctx by reutilizing dst buffer (#2751)
* Reduce size of dctx by reutilizing dst buffer

Co-authored-by: Binh Vo <binhvo@fb.com>
2021-10-25 10:38:01 -04:00
Norbert Lange
0d45540695 decompress: conditionally remove bmi2 from context
Use an helper function, which will just return 0 in case
the feature is disabled.
Allows constant propagation and removal of dead code.
2021-09-26 14:41:37 +02:00
Nick Terrell
189e87bcbe [lib] Make lib compatible with -Wfall-through excepting legacy
Switch to a macro `ZSTD_FALLTHROUGH;` instead of a comment. On supported
compilers this uses an attribute, otherwise it becomes a comment.

This is necessary to be compatible with clang's `-Wfall-through`, and
gcc's `-Wfall-through=2` which don't support comments. Without this the
linux build emits a bunch of warnings.

Also add a test to CI to ensure that we don't regress.
2021-09-23 10:51:18 -07:00
Danila Kutenin
2c2c9e7dfd Add possible improvements for gcc-11 2021-06-29 09:06:47 +01:00
Danila Kutenin
08a3ddbd28 Add comment for gcc-11 2021-06-08 20:54:21 +01:00