314 Commits

Author SHA1 Message Date
Yann Collet
22b2fd2517
Merge pull request #4317 from hirohira9119/fix-function-signature
Fix function signature mismatch for ZSTD_convertBlockSequences
2025-02-27 13:03:03 -08:00
Yann Collet
db2d205ada fixed -Wconversion for lib/decompress/zstd_decompress_block.c 2025-02-26 10:01:05 -08:00
hirohira
2840631dc1 Fix function signature mismatch for ZSTD_convertBlockSequences 2025-02-26 08:23:48 +09:00
Yann Collet
8eb2587432 added benchmark for get1BlockSummary() 2025-01-15 17:11:27 -08:00
Victor Zhang
a610550e2c
Merge pull request #4218 from facebook/externC
Move #includes out of `extern "C"` blocks
2025-01-07 10:06:08 -08:00
Yann Collet
12c47d3262 improved speed of the Sequences converter 2024-12-20 10:37:00 -08:00
Yann Collet
95ad9e47ff added benchmark for ZSTD_convertBlockSequences_wBlockDelim() 2024-12-20 10:37:00 -08:00
Yann Collet
1c8f5b0f11 minor optimization for ZSTD_compressSequencesAndLiterals()
does not need to track and update internal `litPtr`.
note: does not measurably impact performance.
2024-12-20 10:36:59 -08:00
Yann Collet
8ab04097ed add the compressSequences() benchmark scenario 2024-12-20 10:36:59 -08:00
Yann Collet
4ef9d7d585 codemod: ZSTD_cParamMode_e -> ZSTD_CParamMode_e 2024-12-20 10:36:58 -08:00
Yann Collet
56cfb7816a codemod: ZSTD_paramSwitch_e -> ZSTD_ParamSwitch_e 2024-12-20 10:36:58 -08:00
Yann Collet
08edecb78c codemod: ZSTD_blockCompressor -> ZSTD_BlockCompressor_f 2024-12-20 10:36:57 -08:00
Yann Collet
25bef24c5c codemod: rawSeqStore_t -> RawSeqStore_t 2024-12-20 10:36:57 -08:00
Yann Collet
41c667c0fd codemod: repcodes_t -> Repcodes_t 2024-12-20 10:36:57 -08:00
Yann Collet
5df80acedb codemod: ZSTD_matchState_t -> ZSTD_MatchState_t 2024-12-20 10:36:57 -08:00
Yann Collet
30671d77af codemod: ZSTD_sequencePosition -> ZSTD_SequencePosition 2024-12-20 10:36:57 -08:00
Yann Collet
03d95f9d13 fix proper type for .forceNonContiguous 2024-12-20 10:36:57 -08:00
Yann Collet
76dd3a98c4 scope: ZSTD_copySequencesToSeqStore*() are private to zstd_compress.c
no need to publish them outside of this unit.
2024-12-20 10:36:57 -08:00
Yann Collet
c97522f7fb codemod: ZSTD_sequenceFormat_e -> ZSTD_SequenceFormat_e
since it's a type name.

Note: in contrast with the previous renames, this one is on the public API side,
so a #define is provided to keep existing programs using ZSTD_sequenceFormat_e working.
2024-12-20 10:36:56 -08:00
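A compatibility alias like this is typically a one-line macro in the public header; a minimal sketch, assuming the enum and the alias sit together in zstd.h (the enum member names match the public API, the placement is an assumption):

    /* new canonical spelling of the type */
    typedef enum {
        ZSTD_sf_noBlockDelimiters = 0,
        ZSTD_sf_explicitBlockDelimiters = 1
    } ZSTD_SequenceFormat_e;
    /* old spelling keeps compiling */
    #define ZSTD_sequenceFormat_e ZSTD_SequenceFormat_e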
Yann Collet
477a01067f codemod: symbolEncodingType_e -> SymbolEncodingType_e 2024-12-20 10:36:56 -08:00
Yann Collet
8d4506bc94 codemod: ZSTD_sequenceLength -> ZSTD_SequenceLength 2024-12-20 10:36:55 -08:00
Yann Collet
a2245721ca codemod: seqStore_t -> SeqStore_t
Same idea: SeqStore_t is a type name, so it should start with a capital letter.
2024-12-20 10:36:55 -08:00
Yann Collet
9671813375 codemod: seqDef -> SeqDef
SeqDef is a type name, so it should start with a capital letter.
It's an internal symbol, no impact on public API.
2024-12-20 10:36:55 -08:00
Yann Collet
b4a40a845f move Sequences definition to zstd_compress_internal.h
they should not be in common/zstd_internal.h,
since these definitions are not shared beyond lib/compress/.
2024-12-20 10:36:55 -08:00
Yann Collet
a00f45a037 created ZSTD_storeSeqOnly()
makes it possible to register a sequence without copying its literals.
2024-12-20 10:36:04 -08:00
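A hedged sketch of the distinction; the struct and signatures below are illustrative stand-ins, not zstd's internal API:

    #include <string.h>
    typedef unsigned char BYTE;
    typedef unsigned short U16;
    typedef unsigned U32;

    typedef struct { U16 litLength; U32 offBase; U16 mlBase; } Seq;
    typedef struct { Seq* sequences; BYTE* lit; } Store;

    /* classic path: record the sequence AND copy its literals into the store */
    static void storeSeq(Store* s, const BYTE* literals, size_t litLength,
                         U32 offBase, size_t mlBase)
    {
        memcpy(s->lit, literals, litLength);
        s->lit += litLength;
        s->sequences->litLength = (U16)litLength;
        s->sequences->offBase = offBase;
        s->sequences->mlBase = (U16)mlBase;
        s->sequences++;
    }

    /* store-only variant: record the sequence fields, leave literals in place */
    static void storeSeqOnly(Store* s, size_t litLength, U32 offBase, size_t mlBase)
    {
        s->sequences->litLength = (U16)litLength;
        s->sequences->offBase = offBase;
        s->sequences->mlBase = (U16)mlBase;
        s->sequences++;
    }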
Victor Zhang
d0d5ce4c00 Remove extern C blocks from lib/* internal APIs (except xxhash.h) 2024-12-19 16:00:11 -08:00
Yann Collet
01474bf73b add internal compression parameter preBlockSplitter_level
not yet exposed to the interface.

Also: renames `useBlockSplitter` to `postBlockSplitter`
to better distinguish the two settings.
2024-10-28 16:31:15 -07:00
Yann Collet
cae8d13294 splitter workspace is now provided by ZSTD_CCtx* 2024-10-23 11:50:56 -07:00
Yann Collet
31d48e9ffa fixing minor formatting issue in 32-bit mode with logs enabled 2024-10-23 11:50:56 -07:00
Yann Collet
8c38bda935
Merge pull request #4165 from facebook/cspeed_cmov
Improve compression speed on small blocks
2024-10-11 16:20:19 -07:00
Yann Collet
186b132495 made the search strategy switchable
between cmov and branch,
and added a simple heuristic based on wlog to select between them.

note: performance is not good on clang (yet)
2024-10-08 13:52:56 -07:00
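A hedged sketch of the switch itself; the variant names, the function-pointer dispatch, and the wlog cutoff (including its direction) are assumptions, and the two variants are sketched in full under the "Optimize compression by avoiding unpredictable branches" entry below:

    #include <stddef.h>
    typedef unsigned U32;

    /* two interchangeable search predicates, defined elsewhere */
    int found_cmov(const U32* x, size_t idx, size_t limit, U32 val);
    int found_branch(const U32* x, size_t idx, size_t limit, U32 val);

    typedef int (*found_f)(const U32*, size_t, size_t, U32);

    static found_f selectSearch(U32 windowLog)
    {
        /* guessed heuristic: small windows stay in cache, favoring the
           branch-free variant; the real cutoff lives in lib/compress */
        return (windowLog <= 19) ? found_cmov : found_branch;
    }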
Yann Collet
2cc600bab2 refactor search into an inline function
so that implementations can be swapped via a parameter
2024-10-08 11:10:48 -07:00
Yann Collet
1e7fa242f4 minor refactor of zstd_fast
make hot variables more local
2024-10-07 11:22:40 -07:00
Ilya Tokar
e8fce38954 Optimize compression by avoiding unpredictable branches
Avoid an unpredictable branch: use a conditional move to generate an address
that is guaranteed to be safe, then compare unconditionally.
Instead of

    if (idx < limit && x[idx] == val)  // mispredicted idx < limit branch

Do

    addr = cmov(safe, x + idx)
    if (*addr == val && idx < limit)  // almost always false so well predicted

Using microbenchmarks from https://github.com/google/fleetbench,
I get a ~10% speed-up:

name                                                                                          old cpu/op   new cpu/op    delta
BM_ZSTD_COMPRESS_Fleet/compression_level:-7/window_log:15                                     1.46ns ± 3%   1.31ns ± 7%   -9.88%  (p=0.000 n=35+38)
BM_ZSTD_COMPRESS_Fleet/compression_level:-7/window_log:16                                     1.41ns ± 3%   1.28ns ± 3%   -9.56%  (p=0.000 n=36+39)
BM_ZSTD_COMPRESS_Fleet/compression_level:-5/window_log:15                                     1.61ns ± 1%   1.43ns ± 3%  -10.70%  (p=0.000 n=30+39)
BM_ZSTD_COMPRESS_Fleet/compression_level:-5/window_log:16                                     1.54ns ± 2%   1.39ns ± 3%   -9.21%  (p=0.000 n=37+39)
BM_ZSTD_COMPRESS_Fleet/compression_level:-3/window_log:15                                     1.82ns ± 2%   1.61ns ± 3%  -11.31%  (p=0.000 n=37+40)
BM_ZSTD_COMPRESS_Fleet/compression_level:-3/window_log:16                                     1.73ns ± 3%   1.56ns ± 3%   -9.50%  (p=0.000 n=38+39)
BM_ZSTD_COMPRESS_Fleet/compression_level:-1/window_log:15                                     2.12ns ± 2%   1.79ns ± 3%  -15.55%  (p=0.000 n=34+39)
BM_ZSTD_COMPRESS_Fleet/compression_level:-1/window_log:16                                     1.99ns ± 3%   1.72ns ± 3%  -13.70%  (p=0.000 n=38+38)
BM_ZSTD_COMPRESS_Fleet/compression_level:0/window_log:15                                      3.22ns ± 3%   2.94ns ± 3%   -8.67%  (p=0.000 n=38+40)
BM_ZSTD_COMPRESS_Fleet/compression_level:0/window_log:16                                      3.19ns ± 4%   2.86ns ± 4%  -10.55%  (p=0.000 n=40+38)
BM_ZSTD_COMPRESS_Fleet/compression_level:1/window_log:15                                      2.60ns ± 3%   2.22ns ± 3%  -14.53%  (p=0.000 n=40+39)
BM_ZSTD_COMPRESS_Fleet/compression_level:1/window_log:16                                      2.46ns ± 3%   2.13ns ± 2%  -13.67%  (p=0.000 n=39+36)
BM_ZSTD_COMPRESS_Fleet/compression_level:2/window_log:15                                      2.69ns ± 3%   2.46ns ± 3%   -8.63%  (p=0.000 n=37+39)
BM_ZSTD_COMPRESS_Fleet/compression_level:2/window_log:16                                      2.63ns ± 3%   2.36ns ± 3%  -10.47%  (p=0.000 n=40+40)
BM_ZSTD_COMPRESS_Fleet/compression_level:3/window_log:15                                      3.20ns ± 2%   2.95ns ± 3%   -7.94%  (p=0.000 n=35+40)
BM_ZSTD_COMPRESS_Fleet/compression_level:3/window_log:16                                      3.20ns ± 4%   2.87ns ± 4%  -10.33%  (p=0.000 n=40+40)

I've also measured the impact on internal workloads and saw a similar
~10% improvement in performance, measured by CPU usage per byte of data.
2024-09-20 16:07:01 -04:00
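A minimal self-contained sketch of the pattern described above; the helper names are mine, and the index-clamp formulation is one common way to coax the compiler into a cmov, not necessarily the exact code in zstd:

    #include <stddef.h>
    typedef unsigned U32;

    /* branchy form: the idx < limit test mispredicts on random data */
    static int found_branch(const U32* x, size_t idx, size_t limit, U32 val)
    {
        return idx < limit && x[idx] == val;
    }

    /* branch-free form: clamp the index with a ternary (typically lowered
       to a cmov) so the load is always in bounds, then test the
       well-predicted condition last */
    static int found_cmov(const U32* x, size_t idx, size_t limit, U32 val)
    {
        size_t const safeIdx = (idx < limit) ? idx : 0;  /* always a valid slot */
        return (x[safeIdx] == val) & (idx < limit);      /* & avoids a short-circuit branch */
    }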
Yann Collet
09cb37cbb1 Limit range of operations on Indexes in 32-bit mode
and use an unsigned type.
This reduces the risk that an operation produces a negative number when crossing the 2 GB limit.
2024-08-21 11:03:43 -07:00
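An illustration of the hazard being avoided (the values are mine, chosen to sit astride the 2 GB line):

    #include <stdio.h>

    int main(void)
    {
        unsigned a = 3500000000u;   /* an index beyond the 2 GB boundary */
        unsigned b = 1000000000u;
        int s = (int)(a - b);       /* 2500000000 doesn't fit in int: negative on typical targets */
        unsigned u = a - b;         /* well defined: 2500000000 */
        printf("%d %u\n", s, u);
        return 0;
    }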
Yann Collet
cb784edf5d added android-ndk-build 2024-07-30 11:34:49 -07:00
Federico Maresca
5e9a6c2fe4
Refactor dictionary matchfinder index safety check (#4039) 2024-05-29 12:35:24 -04:00
Yann Collet
22574d848d fix issue 5921623844651008
ossfuzz managed to create a scenario that triggers an `assert`.
This fixes it by giving +1 more space for the backward search pass.
2024-02-06 13:01:14 -08:00
Yann Collet
b88c593d8f added or updated code comments
as suggested by @terrelln,
to make the code of the optimal parser a bit more understandable.
2024-02-05 18:32:25 -08:00
Yann Collet
5474edbe60 fixed wrong assert
by introducing ZSTD_OPT_SIZE
2024-02-03 19:31:53 -08:00
Yann Collet
4683667785 refactor optimal parser
store stretches as intermediate solutions instead of sequences,
making it possible to link a solution to its predecessor.
2024-01-31 02:51:46 -08:00
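A hedged sketch of what a predecessor-linked stretch might look like; this struct is purely illustrative, not the parser's actual layout:

    /* one stretch = a literal run followed by a match, priced in bits and
       chained to the solution it extends; backtracking the chain at the
       end yields the sequence list */
    typedef struct {
        unsigned price;     /* cumulative cost of the best path ending here */
        unsigned litLen;    /* literals preceding the match */
        unsigned mlen;      /* match length (0 for a literals-only stretch) */
        unsigned offBase;   /* match offset code */
        int prev;           /* index of the predecessor stretch, -1 at origin */
    } Stretch;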
Elliot Gorokhovsky
d151a4880b Move offload API params into ZSTD_CCtx_params 2023-11-27 08:11:01 -08:00
Elliot Gorokhovsky
809c7eb6bf Refactor ZSTD_sequenceProducer_F typedef to ZSTD_sequenceProducer_F* 2023-11-27 06:56:37 -08:00
Yann Collet
c1e588fcb4
Merge pull request #3771 from DimitriPapadopoulos/codespell
Fix new typos found by codespell
2023-10-07 19:29:41 -07:00
Nick Terrell
43118da8a7 Stop suppressing pointer-overflow UBSAN errors
* Remove all pointer-overflow suppressions from our UBSAN builds/tests.
* Add `ZSTD_ALLOW_POINTER_OVERFLOW_ATTR` macro to suppress
  pointer-overflow at a per-function level. This is a superior approach
  because it also applies to users who build zstd with UBSAN.
* Add `ZSTD_wrappedPtr{Diff,Add,Sub}()` that use these suppressions.
  The end goal is to only tag these functions with
  `ZSTD_ALLOW_POINTER_OVERFLOW`. But we can start by annotating functions
  that rely on pointer overflow, and gradually transition to using
  these.
* Add `ZSTD_maybeNullPtrAdd()` to simplify pointer addition when the
  pointer may be `NULL`.
* Fix all the fuzzer issues that came up. I'm sure there will be a lot
  more, but these are the ones that came up within a few minutes of
  running the fuzzers, and while running GitHub CI.
2023-09-28 17:35:05 -04:00
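A hedged sketch of the per-function suppression pattern; the macro name comes from the commit, but the feature-detection plumbing and the wrapper signature shown here are assumptions:

    #include <stddef.h>

    /* the attribute is only understood by recent clang/gcc, so gate it */
    #if defined(__has_attribute)
    # if __has_attribute(no_sanitize)
    #  define ZSTD_ALLOW_POINTER_OVERFLOW_ATTR __attribute__((no_sanitize("pointer-overflow")))
    # endif
    #endif
    #ifndef ZSTD_ALLOW_POINTER_OVERFLOW_ATTR
    # define ZSTD_ALLOW_POINTER_OVERFLOW_ATTR
    #endif

    /* wrapped pointer add: pointer arithmetic that may overflow is funneled
       through here, so the suppression stays local to one tiny function */
    ZSTD_ALLOW_POINTER_OVERFLOW_ATTR
    static const char* ZSTD_wrappedPtrAdd(const char* ptr, ptrdiff_t add)
    {
        return ptr + add;
    }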
Dimitri Papadopoulos
fe34776c20
Fix new typos found by codespell 2023-09-23 18:56:01 +02:00
Elliot Gorokhovsky
c6a888c073 suppress false error message in LDM mode 2023-06-21 19:19:02 -07:00
Nick Terrell
a3c3a38b9b [lazy] Skip over incompressible data
For every 256 bytes the lazy match finders process without finding a match,
they increase their step size by 1. So for bytes [0, 256) they search
every position, for bytes [256, 512) they search every other position,
and so on. However, they currently still insert every position into
their hash tables. This is different from fast & dfast, which only
insert the positions they search.

This PR changes that: after we've searched 2 KB without finding any
matches (at which point we're only searching one in 9 positions), we
stop inserting every position and only insert the positions we search;
a sketch of this schedule follows this entry. The exact cutoff of 2 KB
isn't terribly important; I've just selected a cutoff that is reasonably
large, to minimize the impact on "normal" data.

This PR only adds skipping to greedy, lazy, and lazy2, but does not
touch btlazy2.

| Dataset | Level | Compiler     | CSize ∆ | Speed ∆ |
|---------|-------|--------------|---------|---------|
| Random  |     5 | clang-14.0.6 |    0.0% |   +704% |
| Random  |     5 | gcc-12.2.0   |    0.0% |   +670% |
| Random  |     7 | clang-14.0.6 |    0.0% |   +679% |
| Random  |     7 | gcc-12.2.0   |    0.0% |   +657% |
| Random  |    12 | clang-14.0.6 |    0.0% |  +1355% |
| Random  |    12 | gcc-12.2.0   |    0.0% |  +1331% |
| Silesia |     5 | clang-14.0.6 | +0.002% |  +0.35% |
| Silesia |     5 | gcc-12.2.0   | +0.002% |  +2.45% |
| Silesia |     7 | clang-14.0.6 | +0.001% |  -1.40% |
| Silesia |     7 | gcc-12.2.0   | +0.007% |  +0.13% |
| Silesia |    12 | clang-14.0.6 | +0.011% | +22.70% |
| Silesia |    12 | gcc-12.2.0   | +0.011% |  -6.68% |
| Enwik8  |     5 | clang-14.0.6 |    0.0% |  -1.02% |
| Enwik8  |     5 | gcc-12.2.0   |    0.0% |  +0.34% |
| Enwik8  |     7 | clang-14.0.6 |    0.0% |  -1.22% |
| Enwik8  |     7 | gcc-12.2.0   |    0.0% |  -0.72% |
| Enwik8  |    12 | clang-14.0.6 |    0.0% | +26.19% |
| Enwik8  |    12 | gcc-12.2.0   |    0.0% |  -5.70% |

The speed difference for clang at level 12 is real, but is probably
caused by some sort of alignment or codegen issue. clang was
significantly slower than gcc before this PR, but gets up to parity with
it.

I also measured the ratio difference for the HC match finder, and it
looks basically the same as for the row-based match finder. The speedup
on random data looks similar, and performance is about neutral, without
the big difference at level 12 for either clang or gcc.
2023-03-20 11:18:29 -07:00
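A hedged, self-contained sketch of the step schedule described above; the constants follow the description (+1 step per 256 match-less bytes, skipping once step exceeds 8, i.e. after ~2 KB), while the function names are stand-ins:

    #include <stddef.h>
    typedef unsigned char BYTE;

    static int  searchAt(const BYTE* p) { (void)p; return 0; }  /* stub: match search */
    static void insertAt(const BYTE* p) { (void)p; }            /* stub: hash-table insert */

    static void lazyScan(const BYTE* ip, const BYTE* iend)
    {
        size_t step = 1, noMatchBytes = 0;
        while (ip < iend) {
            if (searchAt(ip)) {
                step = 1; noMatchBytes = 0;        /* reset on a match */
            } else {
                noMatchBytes += step;
                step = 1 + (noMatchBytes >> 8);    /* bytes [0,256): 1; [256,512): 2; ... */
            }
            if (step <= 8) {                       /* under the ~2 KB cutoff: */
                size_t i;                          /* keep inserting skipped positions */
                for (i = 1; i < step && ip + i < iend; i++)
                    insertAt(ip + i);
            }                                      /* past it: only searched positions */
            ip += step;
        }
    }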
Nick Terrell
fbd97f305a Deprecated bufferless and block level APIs
* Mark all bufferless and block level functions as deprecated
* Update documentation to suggest not using these functions
* Add `_deprecated()` wrappers for functions that we use internally and
  call those instead
2023-03-16 10:04:15 -07:00
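A hedged sketch of the wrapper pattern in the last bullet; zstd wraps the attribute in a portability macro, and the exact plumbing shown here is an assumption:

    #include <stddef.h>
    typedef struct ZSTD_CCtx_s ZSTD_CCtx;

    /* internal twin without the attribute, so zstd's own call sites
       keep building cleanly under -Werror=deprecated-declarations */
    size_t ZSTD_compressBegin_deprecated(ZSTD_CCtx* cctx, int compressionLevel);

    /* the public declaration now warns at compile time */
    __attribute__((deprecated("bufferless API: consider the streaming API instead")))
    size_t ZSTD_compressBegin(ZSTD_CCtx* cctx, int compressionLevel);

    size_t ZSTD_compressBegin(ZSTD_CCtx* cctx, int compressionLevel)
    {
        return ZSTD_compressBegin_deprecated(cctx, compressionLevel);
    }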
Yonatan Komornik
91f4c23e63
Add salt into row hash (#3528 part 2) (#3533)
Part 2 of #3528

Adds a hash salt that helps avoid regressions where consecutive compressions of similar data use the same tag space (running `zstd -b5e7 enwik8 -B128K` reproduces this regression).
2023-03-13 15:34:13 -07:00
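A hedged sketch of the salting idea; mixing the salt into the hashed bytes before the multiply is one plausible formulation (the constant is a generic 64-bit multiplicative-hash prime, and all names here are stand-ins):

    #include <stdint.h>

    /* Without a salt, two consecutive compressions of similar data hash to
       the same rows and tags, recreating the same pathological collisions.
       A per-context salt, re-drawn when the context is reset, shifts the
       whole tag space between runs. */
    static uint32_t hashSalted(uint64_t bytes, uint64_t salt, unsigned hashLog)
    {
        uint64_t const prime = 0x9E3779B185EBCA87ULL;  /* generic 64-bit mix constant */
        return (uint32_t)(((bytes ^ salt) * prime) >> (64 - hashLog));
    }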