1410 Commits

Author SHA1 Message Date
Yann Collet
7d3e5e3ba1 split all full 128 KB blocks
this helps make the streaming behavior more consistent,
since it no longer depends on having more data presented on the input.

suggested by @terrelln
2024-10-23 14:18:48 -07:00
Yann Collet
b68ddce818 rewrite fingerprint storage to no longer need 64-bit members
so that it can be stored using the standard alignment requirement (sizeof(void*)).

The distance function still requires 64-bit signed multiplication though,
so this won't change the ubsan bug for 32-bit clang on GitHub CI.
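For the alignment point, a minimal sketch with illustrative names (not the actual zstd definitions):

```c
/* Replacing U64 members with 32-bit ones relaxes the struct's
 * alignment requirement, so the state can be placed in a workspace
 * aligned only on sizeof(void*). Illustrative layout only. */
typedef struct {
    unsigned events[256];   /* 32-bit counters instead of U64 */
    unsigned nbEvents;
} Fingerprint32;            /* alignment requirement: 4 bytes */
```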
2024-10-23 11:50:57 -07:00
Yann Collet
0be334d208 fixes static state allocation check
detected by @felixhandte
2024-10-23 11:50:57 -07:00
Yann Collet
ea85dc7af6 conservatively estimate over-splitting loss in the presence of incompressible data
ensure data can never be expanded by more than 3 bytes per full block.
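For intuition, a worked worst case, assuming the standard 3-byte zstd block header (the bound itself is from the commit; the arithmetic below is illustrative):

```c
/* Splitting one full 128 KB block of incompressible (raw) data in two
 * emits exactly one extra 3-byte block header: */
enum {
    unsplitCost = 3 + 131072,                /* header + raw payload  */
    splitCost   = (3 + 65536) + (3 + 65536)  /* two headers + payload */
};
/* splitCost - unsplitCost == 3 bytes of expansion */
```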
2024-10-23 11:50:57 -07:00
Yann Collet
5ae34e4c96 ensure lastBlock is correctly determined
reported by @terrelln
2024-10-23 11:50:57 -07:00
Yann Collet
a167571db5 added a faster block splitter variant
that samples 1 in 5 positions.

This variant is fast enough for lazy2 and btlazy2,
but it's less effective in combination with the post-splitter at higher levels (>= btopt).
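As a rough sketch of the sampling idea (hypothetical helper, not the actual splitter code):

```c
#include <stddef.h>

/* Collect a byte histogram from 1 position in 5, trading a little
 * splitting accuracy for speed. */
static void recordFingerprint_sampled(unsigned events[256],
                                      const unsigned char* src,
                                      size_t srcSize)
{
    size_t pos;
    for (pos = 0; pos < srcSize; pos += 5)   /* sample 1 in 5 positions */
        events[src[pos]] += 1;
}
```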
2024-10-23 11:50:57 -07:00
Yann Collet
4ce91cbf2b fixed workspace alignment on non-64-bit systems 2024-10-23 11:50:57 -07:00
Yann Collet
cae8d13294 splitter workspace is now provided by ZSTD_CCtx* 2024-10-23 11:50:56 -07:00
Yann Collet
73a6653653 ZSTD_splitBlock_4k() uses externally provided workspace
ideally, this workspace would be provided from the ZSTD_CCtx* state
2024-10-23 11:50:56 -07:00
Yann Collet
20c3d176cd fix assert 2024-10-23 11:50:56 -07:00
Yann Collet
0d4b520657 only split full blocks
short term simplification
2024-10-23 11:50:56 -07:00
Yann Collet
f83ed087f6 fixed RLE detection test 2024-10-23 11:50:56 -07:00
Yann Collet
83a3402a92 fix overlap write scenario in presence of incompressible data 2024-10-23 11:50:56 -07:00
Yann Collet
a5bce4ae84 XP: add a pre-splitter
instead of ingesting only full blocks, analyze the data and infer where to split.
2024-10-23 11:50:56 -07:00
Adenilson Cavalcanti
a40bad8ec0 [zstd][leak] Avoid memory leak on early return of ZSTD_generateSequences
Sanity checks on a few of the context parameters (i.e. workers and block size)
may prompt an early return from ZSTD_generateSequences.

Allocating the destination buffer past those return points avoids a potential
memory leak.

This patch should fix issue #4112.
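The fix follows a common pattern; a simplified sketch with hypothetical names (not the actual zstd code):

```c
#include <stdlib.h>

#define MAX_WORKERS 200
#define MAX_BLOCK   (1 << 17)

/* Run all parameter sanity checks before allocating, so that no early
 * return can leak the destination buffer. */
int generateSequences_sketch(size_t nbWorkers, size_t blockSize)
{
    void* dst;
    if (nbWorkers > MAX_WORKERS) return -1;   /* early returns first */
    if (blockSize > MAX_BLOCK) return -1;
    dst = malloc(blockSize);                  /* allocate only after checks */
    if (dst == NULL) return -1;
    /* ... fill and consume dst ... */
    free(dst);
    return 0;
}
```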
2024-08-06 18:01:20 -07:00
Dimitri Papadopoulos
44e83e9180
Fix typos not found by codespell 2024-06-20 20:16:25 +02:00
Yann Collet
c6e5257240
Merge pull request #3977 from facebook/doc_advanced
Doc update
2024-03-21 12:33:15 -07:00
Nick Terrell
731f4b70fc Fix & fuzz ZSTD_generateSequences
This function was seriously flawed:
* It didn't do output bounds checks
* It produced invalid sequences when an uncompressed or RLE block was emitted
* It produced invalid sequences when the block splitter was enabled
* It produced invalid sequences when ZSTD_c_targetCBlockSize was enabled

I've attempted to fix these issues, but this function is just a bad idea,
so I've marked it as deprecated and unsafe. We should replace it with
`ZSTD_extractSequences()` which operates on a compressed frame.
2024-03-21 07:18:05 -07:00
Yann Collet
6f1215b874 fix ZSTD_TARGETCBLOCKSIZE_MIN test
when requested CBlockSize is too low,
bound it to the minimum
instead of returning an error.
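A sketch of the clamping (ZSTD_TARGETCBLOCKSIZE_MIN is the real constant from zstd.h; the surrounding function is illustrative):

```c
#define ZSTD_STATIC_LINKING_ONLY   /* in case the constant is experimental */
#include <zstd.h>

size_t boundedCBlockSize(size_t requested)
{
    if (requested < ZSTD_TARGETCBLOCKSIZE_MIN)
        return ZSTD_TARGETCBLOCKSIZE_MIN;   /* bound to the minimum, don't error */
    return requested;
}
```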
2024-03-18 14:10:08 -07:00
Christoph Grüninger
b921f1aad6 Reduce scope of variables
This improves readability, keeps variables local, and
prevents unintended use (e.g. via a typo) later on.
Found by Cppcheck (variableScope)
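An illustrative before/after (not actual zstd code):

```c
int sum_before(const int* a, int n)
{
    int i, sum = 0;               /* 'i' is visible to the whole function */
    for (i = 0; i < n; i++)
        sum += a[i];
    return sum;                   /* a later typo could silently reuse 'i' */
}

int sum_after(const int* a, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++)   /* scope limited to the loop (C99) */
        sum += a[i];
    return sum;
}
```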
2024-02-11 22:00:03 +01:00
Yann Collet
5474edbe60 fixed wrong assert
by introducing ZSTD_OPT_SIZE
2024-02-03 19:31:53 -08:00
Yann Collet
4683667785 refactor optimal parser
store stretches as an intermediate solution instead of sequences.
This makes it possible to link a solution to its predecessor.
2024-01-31 02:51:46 -08:00
Yann Collet
a07cae3976
Merge pull request #3847 from michoecho/fix_nullptr_deref_in_createCDict
Fix a nullptr dereference in ZSTD_createCDict_advanced2()
2023-12-30 13:23:39 -08:00
Elliot Gorokhovsky
c6cabf9441
Make offload API compatible with static CCtx (#3854)
* Add ZSTD_CCtxParams_registerSequenceProducer() to public API

* add unit test

* add docs to zstd.h

* nits

* Add ZSTDLIB_STATIC_API prefix

* Add asserts
2023-12-28 14:48:46 -05:00
Michał Chojnowski
9a3b17c4d6 Fix a nullptr dereference in ZSTD_createCDict_advanced2()
If the relevant allocation returns NULL, ZSTD_createCDict_advanced_internal()
will return NULL. But ZSTD_createCDict_advanced2() doesn't check for
this and attempts to use the returned pointer anyway, which leads to
a segfault.
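A simplified sketch of the fix with hypothetical names (the real pair is ZSTD_createCDict_advanced_internal() / ZSTD_createCDict_advanced2()):

```c
#include <stdlib.h>

typedef struct { int initialized; } CDict_sketch;

static CDict_sketch* createInternal_sketch(void)
{
    return (CDict_sketch*)malloc(sizeof(CDict_sketch));  /* may be NULL */
}

CDict_sketch* createAdvanced2_sketch(void)
{
    CDict_sketch* const cdict = createInternal_sketch();
    if (cdict == NULL) return NULL;   /* the missing check */
    cdict->initialized = 1;           /* only safe after the check */
    return cdict;
}
```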
2023-12-16 13:02:18 +01:00
Elliot Gorokhovsky
d151a4880b Move offload API params into ZSTD_CCtx_params 2023-11-27 08:11:01 -08:00
Elliot Gorokhovsky
809c7eb6bf Refactor ZSTD_sequenceProducer_F typedef to ZSTD_sequenceProducer_F* 2023-11-27 06:56:37 -08:00
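A sketch of the difference, with a simplified signature (the real ZSTD_sequenceProducer_F takes more parameters):

```c
#include <stddef.h>

/* before: the typedef itself is a pointer type */
typedef size_t (*SeqProducerPtr_old)(void* state);

/* after: the typedef names the function type, so pointer-ness is
 * spelled out at each use site, as in ZSTD_sequenceProducer_F* */
typedef size_t SeqProducer_new(void* state);

SeqProducer_new* registeredProducer;   /* visibly a function pointer */
```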
Nick Terrell
8193250615 Modernize macros to use do { } while (0)
This PR introduces no functional changes. It attempts to change all
macros currently using `{ }` or some variant of that to
`do { } while (0)`, and introduces trailing `;` where necessary.
There were no bugs found during this migration.

The bug in Visual Studio's warning on this has been fixed since VS2015.
Additionally, we have several instances of `do { } while (0)` which have
been present for several releases, so we don't have to worry about
breaking people's builds.

Fixes Issue #3830.
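For reference, the classic reason for the `do { } while (0)` idiom (illustrative macros, not zstd's):

```c
#include <stdio.h>

#define LOG_BRACES(msg)  { printf("%s\n", msg); fflush(stdout); }
#define LOG_DOWHILE(msg) do { printf("%s\n", msg); fflush(stdout); } while (0)

void report(int ok)
{
    if (ok)
        LOG_DOWHILE("ok");    /* expands to exactly one statement */
    else
        LOG_DOWHILE("fail");
    /* Using LOG_BRACES above would not compile: the ';' after the
     * closing brace terminates the `if`, orphaning the `else`. */
}
```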
2023-11-21 20:05:17 -05:00
Yann Collet
e8ff7d18eb removed FlexArray pattern from CCtxPool
within ZSTDMT_.
This pattern is flagged by less forgiving variants of ubsan,
notably those used during compilation of the Linux Kernel.

There are 2 other places in the code where this pattern is used.
This fixes just one of them.
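The flagged pattern, in simplified illustrative form (hypothetical names):

```c
#include <stdlib.h>

/* A trailing size-1 array that is over-allocated and then indexed past
 * its declared bound: */
typedef struct {
    int   totalCCtx;
    void* cctxs[1];    /* "FlexArray": really holds totalCCtx entries */
} PoolFlagged;
/* p = malloc(sizeof(PoolFlagged) + (n-1) * sizeof(void*));
 * p->cctxs[i] with i >= 1 is what strict ubsan variants object to. */

/* One conforming alternative: allocate the array separately. */
typedef struct {
    int    totalCCtx;
    void** cctxs;      /* sized to exactly totalCCtx entries */
} PoolFixed;
```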
2023-10-07 21:30:08 -07:00
Dimitri Papadopoulos
fe34776c20
Fix new typos found by codespell 2023-09-23 18:56:01 +02:00
Nick Terrell
bd02c9be6e No longer reject dictionaries with literals maxSymbolValue < 255
We already have logic in our Huffman encoder to validate Huffman tables with missing symbols.
We use this for higher compression levels to re-use the previous block's statistics, or when the dictionary's table has zero-weighted symbols.
This check was left over as an oversight from before we added validation for Huffman tables.

I validated that the `dictionary_loader` fuzzer has coverage of every line in the `ZSTD_loadCEntropy()` function to validate that it is correctly testing this function.
2023-08-22 13:22:35 -04:00
Jacob Greenfield
55ff3e4e17 Save one byte on the frame epilogue 2023-07-20 18:59:44 -04:00
Elliot Gorokhovsky
c6a888c073 suppress false error message in LDM mode 2023-06-21 19:19:02 -07:00
Nick Terrell
d01a2c6929 Fix UBSAN issue (zero addition to NULL)
Fix UBSAN issue that came up internally.
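In C, pointer arithmetic on NULL is undefined behavior even when adding zero; a sketch of the usual guard (illustrative, not the exact fix):

```c
#include <stddef.h>

static const char* advance(const char* base, size_t offset)
{
    if (base == NULL) return NULL;   /* NULL + 0 would be UB */
    return base + offset;
}
```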
2023-05-26 13:43:47 -07:00
W. Felix Handte
d09f195ceb Remove blockCompressor NULL Checks 2023-05-04 12:18:58 -04:00
W. Felix Handte
b7add1dd67 Abort if Unsupported Parameters Used 2023-05-04 12:18:58 -04:00
W. Felix Handte
39b7946b95 Define Macros for Possibly-Present Functions; Use Them Rather than Ifdef Guards 2023-05-04 12:18:58 -04:00
W. Felix Handte
b12e8cb3e7 Merge Ultra and Ultra2 Exclusion
Ultra2 does not exist for dictionary compression, which therefore falls back to
ultra. So ultra must be present if ultra2 is.
2023-05-04 12:18:58 -04:00
W. Felix Handte
5a75956001 Adjust Strategy in CParams to Avoid Using Excluded Block Compressors 2023-05-04 12:18:58 -04:00
W. Felix Handte
50cdf84f58 Macro-Exclude Block Compressors from Declaration/Definition 2023-05-04 12:18:58 -04:00
W. Felix Handte
81b86a2024 NULL Out Block Compressor Table Entries When Excluded
Don't bother excluding `ZSTD_fast`: it's always included, so that we know
we can resolve downwards and hit a strategy that's present.
2023-05-04 12:18:58 -04:00
W. Felix Handte
cbf3e26316 Allow ZSTD_selectBlockCompressor() to Return NULL
Return an error rather than segfaulting.
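A rough sketch of the scheme across these commits (hypothetical names): excluded strategies leave NULL entries in the dispatch table, and selection is checked instead of called blindly.

```c
#include <stddef.h>

typedef size_t (*blockCompressor_sk)(const void* src, size_t srcSize);

static size_t compressFast_sk(const void* src, size_t srcSize)
{
    (void)src; return srcSize;   /* stand-in: ZSTD_fast is always present */
}

static blockCompressor_sk const table_sk[3] = {
    compressFast_sk,   /* always-included baseline */
    NULL,              /* strategy compiled out */
    NULL,              /* strategy compiled out */
};

blockCompressor_sk selectBlockCompressor_sk(int strategy)
{
    return table_sk[strategy];   /* caller must treat NULL as an error */
}
```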
2023-05-04 12:18:58 -04:00
Yann Collet
2e29728797 fix #3583
As reported by @georgmu,
the previous fix is undone by the later initialization.
Switch order, so that initialization is adjusted by special case.
2023-04-03 09:45:11 -07:00
daniellerozenblit
3e0550ee52
fix window update (#3556) 2023-03-21 13:28:26 -04:00
Nick Terrell
a3c3a38b9b [lazy] Skip over incompressible data
Every 256 bytes the lazy match finders process without finding a match,
they will increase their step size by 1. So for bytes [0, 256) they search
every position, for bytes [256, 512) they search every other position,
and so on. However, they currently still insert every position into
their hash tables. This is different from fast & dfast, which only
insert the positions they search.

This PR changes that, so now after we've searched 2KB without finding
any matches, at which point we'll only be searching one in 9 positions,
we'll stop inserting every position, and only insert the positions we
search. The exact cutoff of 2KB isn't terribly important, I've just
selected a cutoff that is reasonably large, to minimize the impact on
"normal" data.

This PR only adds skipping to greedy, lazy, and lazy2, but does not
touch btlazy2.
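A very rough sketch of the idea (hypothetical helpers, not the actual match-finder loop):

```c
#include <stddef.h>

void insertPosition(size_t pos);   /* stand-in for the hash-table insert */
int  searchAt(size_t pos);         /* stand-in for the match search */

void scan_sketch(size_t srcSize)
{
    size_t step = 1;
    size_t bytesWithoutMatch = 0;
    size_t pos;
    for (pos = 0; pos < srcSize; pos += step) {
        if (bytesWithoutMatch < 2048) {
            size_t p;                        /* old behavior: insert every  */
            for (p = pos; p < pos + step && p < srcSize; p++)
                insertPosition(p);           /* position, even skipped ones */
        } else {
            insertPosition(pos);             /* new: only searched positions */
        }
        if (searchAt(pos)) {
            step = 1;
            bytesWithoutMatch = 0;
        } else {
            bytesWithoutMatch += step;
            step = 1 + bytesWithoutMatch / 256;   /* +1 step per 256 bytes */
        }
    }
}
```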

| Dataset | Level | Compiler     | CSize ∆ | Speed ∆ |
|---------|-------|--------------|---------|---------|
| Random  |     5 | clang-14.0.6 |    0.0% |   +704% |
| Random  |     5 | gcc-12.2.0   |    0.0% |   +670% |
| Random  |     7 | clang-14.0.6 |    0.0% |   +679% |
| Random  |     7 | gcc-12.2.0   |    0.0% |   +657% |
| Random  |    12 | clang-14.0.6 |    0.0% |  +1355% |
| Random  |    12 | gcc-12.2.0   |    0.0% |  +1331% |
| Silesia |     5 | clang-14.0.6 | +0.002% |  +0.35% |
| Silesia |     5 | gcc-12.2.0   | +0.002% |  +2.45% |
| Silesia |     7 | clang-14.0.6 | +0.001% |  -1.40% |
| Silesia |     7 | gcc-12.2.0   | +0.007% |  +0.13% |
| Silesia |    12 | clang-14.0.6 | +0.011% | +22.70% |
| Silesia |    12 | gcc-12.2.0   | +0.011% |  -6.68% |
| Enwik8  |     5 | clang-14.0.6 |    0.0% |  -1.02% |
| Enwik8  |     5 | gcc-12.2.0   |    0.0% |  +0.34% |
| Enwik8  |     7 | clang-14.0.6 |    0.0% |  -1.22% |
| Enwik8  |     7 | gcc-12.2.0   |    0.0% |  -0.72% |
| Enwik8  |    12 | clang-14.0.6 |    0.0% | +26.19% |
| Enwik8  |    12 | gcc-12.2.0   |    0.0% |  -5.70% |

The speed difference for clang at level 12 is real, but is probably
caused by some sort of alignment or codegen issue. clang was
significantly slower than gcc before this PR, and gets up to parity
with it after.

I also measured the ratio difference for the HC match finder, and it
looks basically the same as for the row-based match finder. The speedup
on random data looks similar, and performance is about neutral, without
the big difference at level 12 for either clang or gcc.
2023-03-20 11:18:29 -07:00
Nick Terrell
fbd97f305a Deprecated bufferless and block level APIs
* Mark all bufferless and block level functions as deprecated
* Update documentation to suggest not using these functions
* Add `_deprecated()` wrappers for functions that we use internally and
  call those instead
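The wrapper pattern, sketched with hypothetical names (GCC/Clang attribute syntax): the public symbol carries the deprecation attribute, while internal call sites use an undecorated twin, keeping the library's own build warning-free.

```c
#include <stddef.h>

size_t myLib_blockBound_deprecated(size_t srcSize);   /* internal twin */

__attribute__((deprecated("use the frame-level API instead")))
size_t myLib_blockBound(size_t srcSize);

size_t myLib_blockBound(size_t srcSize)
{
    return myLib_blockBound_deprecated(srcSize);      /* thin forwarder */
}

size_t myLib_blockBound_deprecated(size_t srcSize)
{
    return srcSize + 3;   /* dummy body for the sketch */
}
```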
2023-03-16 10:04:15 -07:00
daniellerozenblit
53bad103ce
patch-from speed optimization (#3545)
* patch-from speed optimization: only load a portion of the dictionary into normal matchfinders

* test regression for x8 multiplier

* fix off-by-one error for bit shift bound

* restrict patchfrom speed optimization to strategy < ZSTD_btultra

* update results.csv

* update regression test
2023-03-14 20:36:56 -04:00
Yonatan Komornik
91f4c23e63
Add salt into row hash (#3528 part 2) (#3533)
Part 2 of #3528

Adds hash salt that helps to avoid regressions where consecutive compressions use the same tag space with similar data (running zstd -b5e7 enwik8 -B128K reproduces this regression).
2023-03-13 15:34:13 -07:00
Yonatan Komornik
9420bce8a4
Add init once memory (#3528) (#3529)
- Adds a memory type that is guaranteed to have been initialized at least once in the workspace's lifetime.
- Changes the tag space in row hash to be based on init-once memory.
2023-03-13 13:20:49 -07:00
Yonatan Komornik
33e39094e7
Reduce RowHash's tag space size by x2 (#3543)
Allocate half the memory for tag space, which means that we get one less slot for an actual tag (it needs to be used for the next position index).
The result is a slight loss in compression ratio (up to 0.2%) and some regressions/improvements to speed depending on level and sample. In turn, we get to save 16% of the hash table's space (5 bytes per entry instead of 6 bytes per entry).
2023-03-10 14:15:04 -08:00