2278 Commits

Author SHA1 Message Date
Christoph Grüninger
b921f1aad6 Reduce scope of variables
This improves readability, keeps variables local, and
prevents the unintended use (e.g. typo) later on.
Found by Cppcheck (variableScope)
2024-02-11 22:00:03 +01:00
Yann Collet
b0e8580dc7 fix fuzz issue 5131069967892480 2024-02-08 16:38:20 -08:00
Yann Collet
22574d848d fix issue 5921623844651008
ossfuzz managed to create a scenario which triggers an `assert`.
This fixes it, by giving +1 more space for the backward search pass.
2024-02-06 13:01:14 -08:00
Yann Collet
b88c593d8f added or updated code comments
as suggested by @terrelln,
to make the code of the optimal parser a bit more understandable.
2024-02-05 18:32:25 -08:00
Yann Collet
6c35fb2e8c fix msan warnings 2024-02-05 01:21:06 -08:00
Yann Collet
641749fc09 fix uasan dictionary_stream_round_trip fuzz test 2024-02-05 00:36:10 -08:00
Yann Collet
fe2e2ad36d use ZSTD_memcpy()
which can be redirected in Linux kernel mode
2024-02-03 19:57:38 -08:00
Yann Collet
0ae21d8c31 removed trace control 2024-02-03 19:32:59 -08:00
Yann Collet
5474edbe60 fixed wrong assert
by introducing ZSTD_OPT_SIZE
2024-02-03 19:31:53 -08:00
Yann Collet
e5af24c5fa fixed wrong assert 2024-02-03 17:48:29 -08:00
Yann Collet
8168a451e5 minor optimization, mostly for clarity 2024-02-03 17:26:47 -08:00
Yann Collet
d31018e223 finally, a version that generalizes well
While it's not always strictly a win,
it's a win for files that see a noticeably compression ratio increase,
while it's a very small noise for other files.

Downside is, this patch is less efficient for 32-bit arrays of integer
than the previous patch which was introducing losses for other files,
but it's still a net improvement on this scenario.
2024-02-03 14:26:18 -08:00
Yann Collet
0166b2ba80 modification: differentiate literal update at pos+1
helps when litlen==1 is cheaper than litlen==0

works great on pathological arr[u32] examples
but doesn't generalize well on other files.

silesia/x-ray is amoung the most negatively affected ones.
2024-01-31 11:20:43 -08:00
Yann Collet
4683667785 refactor optimal parser
store stretches as intermediate solution instead of sequences.
makes it possible to link a solution to a predecessor.
2024-01-31 02:51:46 -08:00
Yann Collet
de10f56be2 improve high compression ratio for file like #3793
this works great for 32-bit arrays,
notably the synthetic ones, with extreme regularity,
unfortunately, it's not universal,
and in some cases, it's a loss.
Crucially, on average, it's a loss on silesia.
The most negatively impacted file is x-ray.
It deserves an investigation before suggesting it as an evolution.
2024-01-29 23:25:24 -08:00
Yann Collet
a07cae3976
Merge pull request #3847 from michoecho/fix_nullptr_deref_in_createCDict
Fix a nullptr dereference in ZSTD_createCDict_advanced2()
2023-12-30 13:23:39 -08:00
Elliot Gorokhovsky
c6cabf9441
Make offload API compatible with static CCtx (#3854)
* Add ZSTD_CCtxParams_registerSequenceProducer() to public API

* add unit test

* add docs to zstd.h

* nits

* Add ZSTDLIB_STATIC_API prefix

* Add asserts
2023-12-28 14:48:46 -05:00
Michał Chojnowski
9a3b17c4d6 Fix a nullptr dereference in ZSTD_createCDict_advanced2()
If the relevant allocation returns NULL, ZSTD_createCDict_advanced_internal()
will return NULL. But ZSTD_createCDict_advanced2() doesn't check for
this and attempts to use the returned pointer anyway, which leads to
a segfault.
2023-12-16 13:02:18 +01:00
Elliot Gorokhovsky
d151a4880b Move offload API params into ZSTD_CCtx_params 2023-11-27 08:11:01 -08:00
Elliot Gorokhovsky
809c7eb6bf Refactor ZSTD_sequenceProducer_F typedef to ZSTD_sequenceProducer_F* 2023-11-27 06:56:37 -08:00
Nick Terrell
8193250615 Modernize macros to use do { } while (0)
This PR introduces no functional changes. It attempts to change all
macros currently using `{ }` or some variant of that to to
`do { } while (0)`, and introduces trailing `;` where necessary.
There were no bugs found during this migration.

The bug in Visual Studios warning on this has been fixed since VS2015.
Additionally, we have several instances of `do { } while (0)` which have
been present for several releases, so we don't have to worry about
breaking peoples builds.

Fixes Issue #3830.
2023-11-21 20:05:17 -05:00
Yann Collet
d988e00a7f baby-step towards solving flexArray issue #3785
the flexArray in structure FSE_DecompressWksp
is just a way to derive a pointer easily,
without risk/complexity of calculating it manually.

Not sure if this change is good enough to avoid ubsan warnings though.
2023-10-18 16:21:39 -07:00
Yann Collet
6bb1688c1a extended the fix to ZSTDMT's Buffer Pool 2023-10-08 00:25:17 -07:00
Yann Collet
ea4027c003 removed unused macro constant 2023-10-07 23:32:22 -07:00
Yann Collet
c87ad5bdb5 fixes suggested by @ebiggers 2023-10-07 23:29:42 -07:00
Yann Collet
e8ff7d18eb removed FlexArray pattern from CCtxPool
within ZSTDMT_.
This pattern is flagged by less forgiving variants of ubsan
notably used during compilation of the Linux Kernel.

There are 2 other places in the code where this pattern is used.
This fixes just one of them.
2023-10-07 21:30:08 -07:00
Yann Collet
c1e588fcb4
Merge pull request #3771 from DimitriPapadopoulos/codespell
Fix new typos found by codespell
2023-10-07 19:29:41 -07:00
Nick Terrell
43118da8a7 Stop suppressing pointer-overflow UBSAN errors
* Remove all pointer-overflow suppressions from our UBSAN builds/tests.
* Add `ZSTD_ALLOW_POINTER_OVERFLOW_ATTR` macro to suppress
  pointer-overflow at a per-function level. This is a superior approach
  because it also applies to users who build zstd with UBSAN.
* Add `ZSTD_wrappedPtr{Diff,Add,Sub}()` that use these suppressions.
  The end goal is to only tag these functions with
  `ZSTD_ALLOW_POINTER_OVERFLOW`. But we can start by annoting functions
  that rely on pointer overflow, and gradually transition to using
  these.
* Add `ZSTD_maybeNullPtrAdd()` to simplify pointer addition when the
  pointer may be `NULL`.
* Fix all the fuzzer issues that came up. I'm sure there will be a lot
  more, but these are the ones that came up within a few minutes of
  running the fuzzers, and while running GitHub CI.
2023-09-28 17:35:05 -04:00
Dimitri Papadopoulos
fe34776c20
Fix new typos found by codespell 2023-09-23 18:56:01 +02:00
Nick Terrell
396ef5b434 Fix & refactor Huffman repeat tables for dictionaries
The Huffman repeat mode checker assumed that the CTable was zeroed in the region `[maxSymbolValue + 1, 256)`.
This assumption didn't hold for tables built in the dictionaries, because it didn't go through the same codepath.

Since this code was originally written, we added a header to the CTable that specifies the `tableLog`.
Add `maxSymbolValue` to that header, and check that the table's `maxSymbolValue` is at least the block's `maxSymbolValue`.

This solution is cleaner because we write this header for every CTable we build, so it can't be missed in any code path.

Credit to OSS-Fuzz
2023-08-25 13:21:58 -04:00
Nick Terrell
bd02c9be6e No longer reject dictionaries with literals maxSymbolValue < 255
We already have logic in our Huffman encoder to validate Huffman tables with missing symbols.
We use this for higher compression levels to re-use the previous blocks statistics, or when the dictionaries table has zero-weighted symbols.
This check was leftover as an oversight from before we added validation for Huffman tables.

I validated that the `dictionary_loader` fuzzer has coverage of every line in the `ZSTD_loadCEntropy()` function to validate that it is correctly testing this function.
2023-08-22 13:22:35 -04:00
W. Felix Handte
9987d2f594 Unpoison Workspace Memory Before Freeing to Custom Free
MSAN is hooked into the system malloc, but when the user provides a custom
allocator, it may not provide the same cleansing behavior. So if we leave
memory poisoned and return it to the user's allocator, where it is re-used
elsewhere, our poisoning can blow up in some other context.
2023-08-16 12:09:12 -04:00
W. Felix Handte
5f5bdc1e5d Easy: Move Helper Functions Up 2023-08-16 12:08:52 -04:00
Jacob Greenfield
55ff3e4e17 Save one byte on the frame epilogue 2023-07-20 18:59:44 -04:00
Elliot Gorokhovsky
c6a888c073 suppress false error message in LDM mode 2023-06-21 19:19:02 -07:00
Yann Collet
1f83b7cfc4 fix a minor inefficiency in compress_superblock
and in `decodecorpus`:
the specific case `nbSeq=127` can be represented using the 1-byte format.
Note that both the 1-byte and the 2-bytes formats are valid to represent this case,
so there was no "error", produced data remains valid,
it's just that the 1-byte format is more efficient.

fix #3667

Credit to @ip7z for finding this issue.
2023-06-05 09:51:52 -07:00
Nick Terrell
d01a2c6929 Fix UBSAN issue (zero addition to NULL)
Fix UBSAN issue that came up internally.
2023-05-26 13:43:47 -07:00
W. Felix Handte
1b65803fe7 Reorder Definitions in zstd_opt.c to Group Under Macro Guards (Slightly) 2023-05-22 12:41:48 -04:00
W. Felix Handte
59c7b2a492 Reorder Definitions in zstd_lazy.c to Group Under Macro Guards 2023-05-22 12:37:03 -04:00
W. Felix Handte
eb9227935e Also Reorganize Zstd Opt Declarations 2023-05-04 12:18:58 -04:00
W. Felix Handte
d09f195ceb Remove blockCompressor NULL Checks 2023-05-04 12:18:58 -04:00
W. Felix Handte
b7add1dd67 Abort if Unsupported Parameters Used 2023-05-04 12:18:58 -04:00
W. Felix Handte
f242f5be8f Re-Order Lazy Declarations; Minimize ifndefs 2023-05-04 12:18:58 -04:00
W. Felix Handte
39b7946b95 Define Macros for Possibly-Present Functions; Use Them Rather than Ifdef Guards 2023-05-04 12:18:58 -04:00
W. Felix Handte
b12e8cb3e7 Merge Ultra and Ultra2 Exclusion
Ultra2 does not exist for dict compression, and so uses ultra. So ultra must
be present if ultra2 is.
2023-05-04 12:18:58 -04:00
W. Felix Handte
6761e1c949 Tweak Ultra/Opt Guards 2023-05-04 12:18:58 -04:00
W. Felix Handte
5a75956001 Adjust Strategy in CParams to Avoid Using Excluded Block Compressors 2023-05-04 12:18:58 -04:00
W. Felix Handte
50cdf84f58 Macro-Exclude Block Compressors from Declaration/Definition 2023-05-04 12:18:58 -04:00
W. Felix Handte
81b86a2024 NULL Out Block Compressor Table Entries When Excluded
Don't check about excluding `ZSTD_fast`. It's always included so that we know
we can resolve downwards and hit a strategy that's present.
2023-05-04 12:18:58 -04:00
W. Felix Handte
cbf3e26316 Allow ZSTD_selectBlockCompressor() to Return NULL
Return an error rather than segfaulting.
2023-05-04 12:18:58 -04:00