2296 Commits

Author SHA1 Message Date
Yann Collet
c8ab027227 reduce the amount of includes in "cover.h" 2024-03-13 11:29:28 -07:00
Yonatan Komornik
b20703f273
Updates ZSTD_RowFindBestMatch comment (#3947)
Updates the comment on the head of `ZSTD_RowFindBestMatch` to make sure it's aligned with recent changes to the hash table.
2024-03-12 15:10:07 -07:00
Yann Collet
aed172a8fe minor: fix incorrect debug level 2024-03-08 14:29:44 -08:00
Yann Collet
8d31e8ec42 sizeBlockSequences() also tracks uncompressed size
and only defines a sub-block boundary when
it believes that it is compressible.

It's effectively an optimization,
avoiding a compression cycle to reach the same conclusion.
2024-02-26 14:31:12 -08:00
Yann Collet
d23b95d21d minor refactor for clarity
since we can ensure that nbSubBlocks>0
2024-02-26 14:06:34 -08:00
Yann Collet
86db60752d optimization: bail out faster in presence of incompressible data 2024-02-26 13:27:59 -08:00
Yann Collet
ef82b214ad nit: comment indentation
as reported by @terrelln
2024-02-26 13:23:59 -08:00
Yann Collet
aa8592c532 minor: reformulate nbSubBlocks assignment 2024-02-26 13:21:14 -08:00
Yann Collet
e0412c2062 fix extraneous semicolon ';'
as reported by @terrelln
2024-02-26 12:26:54 -08:00
Yann Collet
1fafd0c4ae fix minor visual static analyzer warning
it's a false positive,
but change the code nonetheless to make it more obvious to the static analyzer.
2024-02-25 19:45:32 -08:00
Yann Collet
038a8a906b targetCBlockSize: modified splitting strategy to generate blocks of more regular size
notably avoiding to feature a larger first block
2024-02-25 17:39:29 -08:00
Yann Collet
f8372191f5 reduced minimum compressed block size
with the intention to match the transport layer size,
such as Ethernet and 4G mobile networks.
2024-02-24 01:59:16 -08:00
Yann Collet
4b51526412 fix partial block uncompressed 2024-02-24 01:24:58 -08:00
Yann Collet
6719794379 fixed some regressionTests
but not all
2024-02-23 18:48:29 -08:00
Yann Collet
0591e7eea1 minor: fix overly cautious conversion warning 2024-02-23 16:05:09 -08:00
Yann Collet
3b40100058 fix long sequences (> 64 KB) 2024-02-23 15:35:12 -08:00
Yann Collet
6b11fc436c fix issue with incompressible sections 2024-02-23 14:53:56 -08:00
Yann Collet
cc4530924b speed optimized version of targetCBlockSize
note that the size of individual compressed blocks will vary more wildly with this modification.
But it seems good enough for a first test, and fix the speed regression issue.
Further refinements can be attempted later.
2024-02-23 14:03:26 -08:00
Christoph Grüninger
b921f1aad6 Reduce scope of variables
This improves readability, keeps variables local, and
prevents the unintended use (e.g. typo) later on.
Found by Cppcheck (variableScope)
2024-02-11 22:00:03 +01:00
Yann Collet
b0e8580dc7 fix fuzz issue 5131069967892480 2024-02-08 16:38:20 -08:00
Yann Collet
22574d848d fix issue 5921623844651008
ossfuzz managed to create a scenario which triggers an `assert`.
This fixes it, by giving +1 more space for the backward search pass.
2024-02-06 13:01:14 -08:00
Yann Collet
b88c593d8f added or updated code comments
as suggested by @terrelln,
to make the code of the optimal parser a bit more understandable.
2024-02-05 18:32:25 -08:00
Yann Collet
6c35fb2e8c fix msan warnings 2024-02-05 01:21:06 -08:00
Yann Collet
641749fc09 fix uasan dictionary_stream_round_trip fuzz test 2024-02-05 00:36:10 -08:00
Yann Collet
fe2e2ad36d use ZSTD_memcpy()
which can be redirected in Linux kernel mode
2024-02-03 19:57:38 -08:00
Yann Collet
0ae21d8c31 removed trace control 2024-02-03 19:32:59 -08:00
Yann Collet
5474edbe60 fixed wrong assert
by introducing ZSTD_OPT_SIZE
2024-02-03 19:31:53 -08:00
Yann Collet
e5af24c5fa fixed wrong assert 2024-02-03 17:48:29 -08:00
Yann Collet
8168a451e5 minor optimization, mostly for clarity 2024-02-03 17:26:47 -08:00
Yann Collet
d31018e223 finally, a version that generalizes well
While it's not always strictly a win,
it's a win for files that see a noticeably compression ratio increase,
while it's a very small noise for other files.

Downside is, this patch is less efficient for 32-bit arrays of integer
than the previous patch which was introducing losses for other files,
but it's still a net improvement on this scenario.
2024-02-03 14:26:18 -08:00
Yann Collet
0166b2ba80 modification: differentiate literal update at pos+1
helps when litlen==1 is cheaper than litlen==0

works great on pathological arr[u32] examples
but doesn't generalize well on other files.

silesia/x-ray is amoung the most negatively affected ones.
2024-01-31 11:20:43 -08:00
Yann Collet
4683667785 refactor optimal parser
store stretches as intermediate solution instead of sequences.
makes it possible to link a solution to a predecessor.
2024-01-31 02:51:46 -08:00
Yann Collet
de10f56be2 improve high compression ratio for file like #3793
this works great for 32-bit arrays,
notably the synthetic ones, with extreme regularity,
unfortunately, it's not universal,
and in some cases, it's a loss.
Crucially, on average, it's a loss on silesia.
The most negatively impacted file is x-ray.
It deserves an investigation before suggesting it as an evolution.
2024-01-29 23:25:24 -08:00
Yann Collet
a07cae3976
Merge pull request #3847 from michoecho/fix_nullptr_deref_in_createCDict
Fix a nullptr dereference in ZSTD_createCDict_advanced2()
2023-12-30 13:23:39 -08:00
Elliot Gorokhovsky
c6cabf9441
Make offload API compatible with static CCtx (#3854)
* Add ZSTD_CCtxParams_registerSequenceProducer() to public API

* add unit test

* add docs to zstd.h

* nits

* Add ZSTDLIB_STATIC_API prefix

* Add asserts
2023-12-28 14:48:46 -05:00
Michał Chojnowski
9a3b17c4d6 Fix a nullptr dereference in ZSTD_createCDict_advanced2()
If the relevant allocation returns NULL, ZSTD_createCDict_advanced_internal()
will return NULL. But ZSTD_createCDict_advanced2() doesn't check for
this and attempts to use the returned pointer anyway, which leads to
a segfault.
2023-12-16 13:02:18 +01:00
Elliot Gorokhovsky
d151a4880b Move offload API params into ZSTD_CCtx_params 2023-11-27 08:11:01 -08:00
Elliot Gorokhovsky
809c7eb6bf Refactor ZSTD_sequenceProducer_F typedef to ZSTD_sequenceProducer_F* 2023-11-27 06:56:37 -08:00
Nick Terrell
8193250615 Modernize macros to use do { } while (0)
This PR introduces no functional changes. It attempts to change all
macros currently using `{ }` or some variant of that to to
`do { } while (0)`, and introduces trailing `;` where necessary.
There were no bugs found during this migration.

The bug in Visual Studios warning on this has been fixed since VS2015.
Additionally, we have several instances of `do { } while (0)` which have
been present for several releases, so we don't have to worry about
breaking peoples builds.

Fixes Issue #3830.
2023-11-21 20:05:17 -05:00
Yann Collet
d988e00a7f baby-step towards solving flexArray issue #3785
the flexArray in structure FSE_DecompressWksp
is just a way to derive a pointer easily,
without risk/complexity of calculating it manually.

Not sure if this change is good enough to avoid ubsan warnings though.
2023-10-18 16:21:39 -07:00
Yann Collet
6bb1688c1a extended the fix to ZSTDMT's Buffer Pool 2023-10-08 00:25:17 -07:00
Yann Collet
ea4027c003 removed unused macro constant 2023-10-07 23:32:22 -07:00
Yann Collet
c87ad5bdb5 fixes suggested by @ebiggers 2023-10-07 23:29:42 -07:00
Yann Collet
e8ff7d18eb removed FlexArray pattern from CCtxPool
within ZSTDMT_.
This pattern is flagged by less forgiving variants of ubsan
notably used during compilation of the Linux Kernel.

There are 2 other places in the code where this pattern is used.
This fixes just one of them.
2023-10-07 21:30:08 -07:00
Yann Collet
c1e588fcb4
Merge pull request #3771 from DimitriPapadopoulos/codespell
Fix new typos found by codespell
2023-10-07 19:29:41 -07:00
Nick Terrell
43118da8a7 Stop suppressing pointer-overflow UBSAN errors
* Remove all pointer-overflow suppressions from our UBSAN builds/tests.
* Add `ZSTD_ALLOW_POINTER_OVERFLOW_ATTR` macro to suppress
  pointer-overflow at a per-function level. This is a superior approach
  because it also applies to users who build zstd with UBSAN.
* Add `ZSTD_wrappedPtr{Diff,Add,Sub}()` that use these suppressions.
  The end goal is to only tag these functions with
  `ZSTD_ALLOW_POINTER_OVERFLOW`. But we can start by annoting functions
  that rely on pointer overflow, and gradually transition to using
  these.
* Add `ZSTD_maybeNullPtrAdd()` to simplify pointer addition when the
  pointer may be `NULL`.
* Fix all the fuzzer issues that came up. I'm sure there will be a lot
  more, but these are the ones that came up within a few minutes of
  running the fuzzers, and while running GitHub CI.
2023-09-28 17:35:05 -04:00
Dimitri Papadopoulos
fe34776c20
Fix new typos found by codespell 2023-09-23 18:56:01 +02:00
Nick Terrell
396ef5b434 Fix & refactor Huffman repeat tables for dictionaries
The Huffman repeat mode checker assumed that the CTable was zeroed in the region `[maxSymbolValue + 1, 256)`.
This assumption didn't hold for tables built in the dictionaries, because it didn't go through the same codepath.

Since this code was originally written, we added a header to the CTable that specifies the `tableLog`.
Add `maxSymbolValue` to that header, and check that the table's `maxSymbolValue` is at least the block's `maxSymbolValue`.

This solution is cleaner because we write this header for every CTable we build, so it can't be missed in any code path.

Credit to OSS-Fuzz
2023-08-25 13:21:58 -04:00
Nick Terrell
bd02c9be6e No longer reject dictionaries with literals maxSymbolValue < 255
We already have logic in our Huffman encoder to validate Huffman tables with missing symbols.
We use this for higher compression levels to re-use the previous blocks statistics, or when the dictionaries table has zero-weighted symbols.
This check was leftover as an oversight from before we added validation for Huffman tables.

I validated that the `dictionary_loader` fuzzer has coverage of every line in the `ZSTD_loadCEntropy()` function to validate that it is correctly testing this function.
2023-08-22 13:22:35 -04:00
W. Felix Handte
9987d2f594 Unpoison Workspace Memory Before Freeing to Custom Free
MSAN is hooked into the system malloc, but when the user provides a custom
allocator, it may not provide the same cleansing behavior. So if we leave
memory poisoned and return it to the user's allocator, where it is re-used
elsewhere, our poisoning can blow up in some other context.
2023-08-16 12:09:12 -04:00