* Cap shortCache chainLog to 24
* Cap row match finder hashLog so that rowLog <= 24
* Add unit tests to expose all cases. The row match finder unit tests
are only run in 64-bit mode, because they allocate ~1GB.
Fixes#3336
fix#3328
In situations where the alphabet size is very small,
the evaluation of literal costs from the Optimal Parser is initially incorrect.
It takes some time to converge, during which compression is less efficient.
This is especially important for small files,
because there will not be enough data to converge,
so most of the parsing is selected based on incorrect metrics.
After this patch, the scenario ##3328 gets fixed,
delivering the expected 29 bytes compressed size (smallest known compressed size).
Inspired by #3395,
offer a new capability to set all parameters defined in a ZSTD_compressionParameters structure
with a single symbol invocation
to improve user code brevity.
```
for f in $(find . \( -path ./.git -o -path ./tests/fuzz/corpora \) -prune -o -type f);
do
sed -i 's/Facebook, Inc\./Meta Platforms, Inc. and affiliates./' $f;
done
```
fixed#3323, reported by @nigeltao
Completed documentation around this risk
(which is largely theoretical,
I can't see that happening in any "real world" scenario,
but an erroneous @srcSize value could indeed trigger it).
Fix an off-by-one error in the compressor that emits corrupt blocks if:
* Zstd is compiled in 32-bit mode
* The windowLog == 25 exactly
* An offset of 2^25-3, 2^25-2, 2^25-1, or 2^25 is emitted
* The bitstream had 7 bits leftover before writing the offset
This bug has been present since before v1.0, but wasn't able to easily
be triggered, since until somewhat recently zstd wasn't able to find
matches that were within 128KB of the window size.
Add a test case, and fix 2 bugs in `ZSTD_compressSequences()`:
* The `ZSTD_isRLE()` check was incorrect. It wouldn't produce
corruption, but it could waste CPU and not emit RLE even if the block
was RLE
* One windowSize was `1 << windowLog`, not `1u << windowLog`
Thanks to @tansy for finding the issue, and giving us a reproducer!
Fixes Issue #3350.
* Intial commit to address 3090. Added support to decompress empty block
* Update zstd_decompress_block.c
Addressed review comments for the case of 'set_basic'
* Update lib/decompress/zstd_decompress_block.c
Co-authored-by: Nick Terrell <nickrterrell@gmail.com>
* Update lib/decompress/zstd_decompress_block.c
Co-authored-by: Nick Terrell <nickrterrell@gmail.com>
Co-authored-by: Nick Terrell <nickrterrell@gmail.com>
* first attempt at fast DMS short cache
* significant wins for some scenarios
* fix all clang regressions
* nits
* fix 1.5% gcc11 regression on hot 110Kdict scenario
* fix CI
* nit
* Add tags to doublefast hash table
* use tags in doublefast DMS
* Fix CI
* Clean up some hardcoded logic / constants
* Switch forCCtx to an enum
* nit
* add short cache to ip+1 long search
* Move tag size into hashLog
* Minor nits
* Truncate dictionaries greater than 16MB in short cache mode
* Helper function for tag comparison
* Cap short cache hashLog at 24 to prevent overflow
* size_t dictTagsMatch -> int dictTagsMatch
* nit
* Clean up and comment dictionary truncation
* Move ZSTD_tableFillPurpose_e next to ZSTD_dictTableLoadMethod_e
* Comment and expand helper functions
* Asserts and documentation
* nit
credit to oss-fuzz
This issue could happen when using the new Sequence Compression API in Explicit Delimiter Mode
with a too small dstCapacity.
In which case, there was one place where the buffer size wasn't checked.
discovered by oss-fuzz
It's a bug in the test itself :
ZSTD_compressBound() as an upper bound of the compress size
only works for data compressed "normally".
But in situations where many flushes are forcefully introduced,
this creates many more blocks,
each of which has a potential to increase the size by 3 bytes.
In extreme cases (lots of small incompressible blocks), the expansion can go beyond ZSTD_compressBound().
This situation is similar when using the CompressSequences() API
with Explicit Block Delimiters.
In which case, each explicit block acts like a deliberate flush.
When employed by a fuzzer, it's possible to generate scenarios like the one described above,
with tons of incompressible blocks of small sizes,
thus going beyond ZSTD_compressBound().
fix : when using Explicit Block Delimiters, use a larger bound, to account for this scenario.
credit to oss-fuzz
In rare circumstances, the block-splitter might cut a block at the exact beginning of a repcode.
In which case, since litlength=0, if the repcode expected 1+ literals in front, its signification changes.
This scenario is controlled in ZSTD_seqStore_resolveOffCodes(),
and the repcode is transformed into a raw offset when its new meaning is incorrect.
In more complex scenarios, the previous block might be emitted as uncompressed after all,
thus modifying the expected repcode history.
In the case discovered by oss-fuzz, the first block is emitted as uncompressed,
so the repcode history remains at default values: 1,4,8.
But since the starting repcode is repcode3, and the literal length is == 0,
its meaning is : = repcode1 - 1.
Since repcode1==1, it results in an offset value of 0, which is invalid.
So that's what the `assert()` was verifying : the result of the repcode translation should be a valid offset.
But actually, it doesn't matter, because this result will then be compared to reality,
and since it's an invalid offset, it will necessarily be discarded if incorrect,
then the repcode will be replaced by a raw offset.
So the `assert()` is not useful.
Furthermore, it's incorrect, because it assumes this situation cannot happen, but it does, as described in above scenario.
which properly represents the maximum bit size of compressed literals (11) as defined in the specification.
To be preferred from HUF_TABLELOG_DEFAULT which represents the same value but by accident.
Name selected to keep the same convention as existing width definitions,
MLFSELog, LLFSELog and OffFSELog.