1360 Commits

Author SHA1 Message Date
Nick Terrell
07a2a33135 Add ZSTD_set{C,F,}Params() helper functions
* Add ZSTD_setFParams() and ZSTD_setParams()
* Modify ZSTD_setCParams() to use ZSTD_setParameter() to avoid a second path setting parameters
* Add unit tests
* Update documentation to suggest using them to replace deprecated functions

Fixes #3396.
2023-03-08 09:57:35 -08:00
Yonatan Komornik
988ce61a0c
Adds initialization of clevel to static cdict (#3525) (#3527)
- Initializes clevel in `ZSTD_CCtxParams_init`
- Adds CI workflow for msan fuzzers runs without optimization (`-O0`)
- Fixes Makefile to correctly pass on user defined `MOREFLAGS` and `FUZZER_FLAGS` in cases they have been overwritten
2023-03-06 18:05:12 -08:00
Nick Terrell
395a2c5462 [bug-fix] Fix rare corruption bug affecting the block splitter
The block splitter confuses sequences with literal length == 65536 that use a
repeat offset code. It interprets this as literal length == 0 when deciding the
meaning of the repeat offset, and corrupts the repeat offset history. This is
benign, merely causing suboptimal compression performance, if the confused
history is flushed before the end of the block, e.g. if there are 3 consecutive
non-repeat code sequences after the mistake. It also is only triggered if the
block splitter decided to split the block.

All that to say: This is a rare bug, and requires quite a few conditions to
trigger. However, the good news is that if you have a way to validate that the
decompressed data is correct, e.g. you've enabled zstd's checksum or have a
checksum elsewhere, the original data is very likely recoverable. So if you were
affected by this bug please reach out.

The fix is to remind the block splitter that the literal length is actually 64K.
The test case is a bit tricky to set up, but I've managed to reproduce the issue.

Thanks to @danlark1 for alerting us to the issue and providing us a reproducer!
2023-02-23 10:54:31 -08:00
Yonatan Komornik
c78f434aa4
Fix zstd-dll build missing dependencies (#3496)
* Fixes zstd-dll build (https://github.com/facebook/zstd/issues/3492):
- Adds pool.o and threading.o dependency to the zstd-dll target
- Moves custom allocation functions into header to avoid needing to add dependency on common.o
- Adds test target for zstd-dll
- Adds github workflow that buildis zstd-dll
2023-02-12 12:32:31 -08:00
Elliot Gorokhovsky
ff42ed1582
Rename "External Matchfinder" to "Block-Level Sequence Producer" (#3484)
* change "external matchfinder" to "external sequence producer"

* migrate contrib/ to new naming convention

* fix contrib build

* fix error message

* update debug strings

* fix def of invalid sequences in zstd.h

* nit

* update CHANGELOG

* fix .gitignore
2023-02-09 17:01:17 -05:00
Elliot Gorokhovsky
3fe5f1fbb9 assert externalRepSearch != ZSTD_ps_auto 2023-02-01 18:24:46 -08:00
Elliot Gorokhovsky
7f8189ca57 add ZSTD_c_fastExternalSequenceParsing cctxParam 2023-02-01 09:09:53 -08:00
Elliot Gorokhovsky
64052ef57d
Guard against invalid sequences from external matchfinders (#3465) 2023-01-31 13:55:48 -05:00
daniellerozenblit
2bde9fbf85
Update lib/compress/zstd_compress.c
Co-authored-by: Nick Terrell <nickrterrell@gmail.com>
2023-01-27 16:58:53 -05:00
Danielle Rozenblit
9e4c66b9e9 record long offsets in ZSTD_symbolEncodingTypeStats_t + add test case 2023-01-27 12:04:29 -08:00
Danielle Rozenblit
814f4bfb99 fix long offset resolution 2023-01-27 08:21:47 -08:00
Danielle Rozenblit
7d600c628a fix bound check for ZSTD_copySequencesToSeqStoreNoBlockDelim() 2023-01-24 06:40:40 -08:00
daniellerozenblit
9116000be6
Merge pull request #3439 from daniellerozenblit/sequence-validation-bug-fix
Fix sequence validation and seqStore bounds check
2023-01-23 13:50:37 -05:00
Danielle Rozenblit
815d1d4eda update external sequence error to fit error naming scheme 2023-01-23 09:58:34 -08:00
Danielle Rozenblit
1b65727e74 fix nits and add new error code for invalid external sequences 2023-01-23 07:59:02 -08:00
Nick Terrell
329169189c Replace Huffman boolean args with flags bit set 2023-01-20 14:12:53 -08:00
Nick Terrell
0cc1b0cb22 Delete unused Huffman functions
Remove all Huffman functions that aren't used by zstd.
2023-01-20 14:12:53 -08:00
Yann Collet
6742f20a7f
Merge pull request #3435 from facebook/c89build
added c89 build test to CI
2023-01-20 14:07:12 -08:00
Nick Terrell
666944fbe6 Cap hashLog & chainLog to ensure that we only use 32 bits of hash
* Cap shortCache chainLog to 24
* Cap row match finder hashLog so that rowLog <= 24
* Add unit tests to expose all cases. The row match finder unit tests
  are only run in 64-bit mode, because they allocate ~1GB.

Fixes #3336
2023-01-20 14:05:26 -08:00
Danielle Rozenblit
aa385ece13 fix sequence validation and bounds check in ZSTD_copySequencesToSeqStore() 2023-01-20 10:32:35 -08:00
Yann Collet
ea684c335a added c89 build test to CI 2023-01-19 14:59:30 -08:00
Elliot Gorokhovsky
bce0382c82
Bugfixes for the External Matchfinder API (#3433)
* external matchfinder bugfixes + tests

* small doc fix
2023-01-19 10:41:24 -05:00
Danielle Rozenblit
8353a4b095 fix maxBlockSize resolution + add test cases 2023-01-17 12:24:18 -08:00
Danielle Rozenblit
06b096db47 additional tests and documentation updates + allow maxBlockSize to be set to 0 (goes to default) 2023-01-12 13:41:50 -08:00
Danielle Rozenblit
53eb5a758c add simple test for maxBlockSize expected functionality 2023-01-12 08:55:39 -08:00
Danielle Rozenblit
1fffcfe01d update minimum threshold for max block size 2023-01-11 11:09:57 -08:00
Danielle Rozenblit
fe08137d9a resolve max block value in cctx and use when calculating the max block size 2023-01-09 07:53:53 -08:00
daniellerozenblit
d913417f72
Merge branch 'dev' into fuzz-max-block-size 2023-01-04 16:34:07 -05:00
Danielle Rozenblit
908e812733 initial commit 2023-01-04 13:01:54 -08:00
Yann Collet
5434de01e2 improve compression ratio of small alphabets
fix #3328

In situations where the alphabet size is very small,
the evaluation of literal costs from the Optimal Parser is initially incorrect.
It takes some time to converge, during which compression is less efficient.
This is especially important for small files,
because there will not be enough data to converge,
so most of the parsing is selected based on incorrect metrics.

After this patch, the scenario ##3328 gets fixed,
delivering the expected 29 bytes compressed size (smallest known compressed size).
2023-01-03 12:22:37 -08:00
Yann Collet
481a2e1010
Merge pull request #3403 from facebook/setCParams
ZSTD_CCtx_setCParams
2022-12-28 14:07:13 -08:00
Elliot Gorokhovsky
2a402626dd
External matchfinder API (#3333)
* First building commit with sample matchfinder

* Set up ZSTD_externalMatchCtx struct

* move seqBuffer to ZSTD_Sequence*

* support non-contiguous dictionary

* clean up parens

* add clearExternalMatchfinder, handle allocation errors

* Add useExternalMatchfinder cParam

* validate useExternalMatchfinder cParam

* Disable LDM + external matchfinder

* Check for static CCtx

* Validate mState and mStateDestructor

* Improve LDM check to cover both branches

* Error API with optional fallback

* handle RLE properly for external matchfinder

* nit

* Move to a CDict-like model for resource ownership

* Add hidden useExternalMatchfinder bool to CCtx_params_s

* Eliminate malloc, move to cwksp allocation

* Handle CCtx reset properly

* Ensure seqStore has enough space for external sequences

* fix capitalization

* Add DEBUGLOG statements

* Add compressionLevel param to matchfinder API

* fix c99 issues and add a param combination error code

* nits

* Test external matchfinder API

* C90 compat for simpleExternalMatchFinder

* Fix some @nocommits and an ASAN bug

* nit

* nit

* nits

* forward declare copySequencesToSeqStore functions in zstd_compress_internal.h

* nit

* nit

* nits

* Update copyright headers

* Fix CMake zstreamtest build

* Fix copyright headers (again)

* typo

* Add externalMatchfinder demo program to make contrib

* Reduce memory consumption for small blockSize

* ZSTD_postProcessExternalMatchFinderResult nits

* test sum(matchlen) + sum(litlen) == srcSize in debug builds

* refExternalMatchFinder -> registerExternalMatchFinder

* C90 nit

* zstreamtest nits

* contrib nits

* contrib nits

* allow block splitter + external matchfinder, refactor

* add windowSize param

* add contrib/externalMatchfinder/README.md

* docs

* go back to old RLE heuristic because of the first block issue

* fix initializer element is not a constant expression

* ref contrib from zstd.h

* extremely pedantic compiler warning fix, meson fix, typo fix

* Additional docs on API limitations

* minor nits

* Refactor maxNbSeq calculation into a helper function

* Fix copyright
2022-12-28 16:45:14 -05:00
Yann Collet
b17743e41b Signal parameter change during MT compression 2022-12-28 13:14:58 -08:00
Yann Collet
89342d1e07 New xp library symbol : ZSTD_CCtx_setCParams()
Inspired by #3395,
offer a new capability to set all parameters defined in a ZSTD_compressionParameters structure
with a single symbol invocation
to improve user code brevity.
2022-12-27 23:49:22 -08:00
Yann Collet
ea2895cef4 Support decompression of compressed blocks of size ZSTD_BLOCKSIZE_MAX exactly 2022-12-22 12:40:27 -08:00
Nick Terrell
40a7188130 Fix make clangbuild & add CI
Fix the errors for:
* `-Wdocumentation`
* `-Wconversion` except `-Wsign-conversion`
2022-12-21 17:31:04 -08:00
W. Felix Handte
5d693cc38c Coalesce Almost All Copyright Notices to Standard Phrasing
```
for f in $(find . \( -path ./.git -o -path ./tests/fuzz/corpora -o -path ./tests/regression/data-cache -o -path ./tests/regression/cache \) -prune -o -type f); do sed -i '/Copyright .* \(Yann Collet\)\|\(Meta Platforms\)/ s/Copyright .*/Copyright (c) Meta Platforms, Inc. and affiliates./' $f; done

git checkout HEAD -- build/VS2010/libzstd-dll/libzstd-dll.rc build/VS2010/zstd/zstd.rc tests/test-license.py contrib/linux-kernel/test/include/linux/xxhash.h examples/streaming_compression_thread_pool.c lib/legacy/zstd_v0*.c lib/legacy/zstd_v0*.h
nano ./programs/windres/zstd.rc
nano ./build/VS2010/zstd/zstd.rc
nano ./build/VS2010/libzstd-dll/libzstd-dll.rc
```
2022-12-20 12:52:34 -05:00
W. Felix Handte
8927f985ff Update Copyright Headers 'Facebook' -> 'Meta Platforms'
```
for f in $(find . \( -path ./.git -o -path ./tests/fuzz/corpora \) -prune -o -type f);
do
  sed -i 's/Facebook, Inc\./Meta Platforms, Inc. and affiliates./' $f;
done
```
2022-12-20 12:37:57 -05:00
Yann Collet
832c1a6a1c minor reformatting
and minor reliability and maintenance changes
2022-12-18 11:26:57 -08:00
Yann Collet
51355e1f70
Merge pull request #3362 from facebook/compressBound
check potential overflow of compressBound()
2022-12-16 14:22:22 -08:00
Yann Collet
45ed0df18a check potential overflow of compressBound()
fixed #3323, reported by @nigeltao

Completed documentation around this risk
(which is largely theoretical,
I can't see that happening in any "real world" scenario,
but an erroneous @srcSize value could indeed trigger it).
2022-12-15 15:23:15 -08:00
Nick Terrell
a91e7ec175 Fix corruption that rarely occurs in 32-bit mode with wlog=25
Fix an off-by-one error in the compressor that emits corrupt blocks if:

* Zstd is compiled in 32-bit mode
* The windowLog == 25 exactly
* An offset of 2^25-3, 2^25-2, 2^25-1, or 2^25 is emitted
* The bitstream had 7 bits leftover before writing the offset

This bug has been present since before v1.0, but wasn't able to easily
be triggered, since until somewhat recently zstd wasn't able to find
matches that were within 128KB of the window size.

Add a test case, and fix 2 bugs in `ZSTD_compressSequences()`:
* The `ZSTD_isRLE()` check was incorrect. It wouldn't produce
  corruption, but it could waste CPU and not emit RLE even if the block
  was RLE
* One windowSize was `1 << windowLog`, not `1u << windowLog`

Thanks to @tansy for finding the issue, and giving us a reproducer!

Fixes Issue #3350.
2022-12-15 14:41:50 -08:00
Danielle Rozenblit
69ec75f0d5 fix window resizing edge case 2022-12-13 08:35:20 -08:00
Yann Collet
ecd7601c36 minor: proper pledgedSrcSize trace 2022-11-22 06:00:45 -08:00
Elliot Gorokhovsky
c8d870fe52 Improve LDM cparam validation logic 2022-11-21 15:39:18 -05:00
Danielle Rozenblit
fa7d9c1139 Set threshold to use optimal table log 2022-10-11 14:33:25 -07:00
Danielle Rozenblit
8888a2ddcc CI failure fixes 2022-10-11 13:12:19 -07:00
udayanbapat
43f21a600e
Intial commit to address 3090. Added support to decompress empty block. (#3118)
* Intial commit to address 3090. Added support to decompress empty block

* Update zstd_decompress_block.c

Addressed review comments for the case of 'set_basic'

* Update lib/decompress/zstd_decompress_block.c

Co-authored-by: Nick Terrell <nickrterrell@gmail.com>

* Update lib/decompress/zstd_decompress_block.c

Co-authored-by: Nick Terrell <nickrterrell@gmail.com>

Co-authored-by: Nick Terrell <nickrterrell@gmail.com>
2022-07-14 11:54:34 -07:00
Elliot Gorokhovsky
cb9e341129 Nits 2022-06-23 16:59:21 -04:00
Elliot Gorokhovsky
2a128110d0 Add prefetchCDictTables CCtxParam 2022-06-22 16:13:07 -04:00