1497 Commits

Author SHA1 Message Date
Yann Collet
14a21e43b3 produced ZSTD_compressSequencesAndLiterals() as a separate pipeline
only supports explicit delimiter mode, at least for the time being
2024-12-20 10:36:58 -08:00
Yann Collet
bcb15091aa minor: more accurate variable scope 2024-12-20 10:36:58 -08:00
Yann Collet
047db4f1f8 ZSTD_SequenceCopier_f now returns the nb of bytes consumed from input
which feels much more natural
2024-12-20 10:36:58 -08:00
Yann Collet
4ef9d7d585 codemod: ZSTD_cParamMode_e -> ZSTD_CParamMode_e 2024-12-20 10:36:58 -08:00
Yann Collet
56cfb7816a codemod: ZSTD_paramSwitch_e -> ZSTD_ParamSwitch_e 2024-12-20 10:36:58 -08:00
Yann Collet
13b9296d79 minor simplification 2024-12-20 10:36:58 -08:00
Yann Collet
08edecb78c codemod: ZSTD_blockCompressor -> ZSTD_BlockCompressor_f 2024-12-20 10:36:57 -08:00
Yann Collet
25bef24c5c codemod: rawSeqStore_t -> RawSeqStore_t 2024-12-20 10:36:57 -08:00
Yann Collet
41c667c0fd codemod: repcodes_t -> Repcodes_t 2024-12-20 10:36:57 -08:00
Yann Collet
5df80acedb codemod: ZSTD_matchState_t -> ZSTD_MatchState_t 2024-12-20 10:36:57 -08:00
Yann Collet
fa468944f2 codemod: ZSTD_buildSeqStore_e -> ZSTD_BuildSeqStore_e 2024-12-20 10:36:57 -08:00
Yann Collet
30671d77af codemod: ZSTD_sequencePosition -> ZSTD_SequencePosition 2024-12-20 10:36:57 -08:00
Yann Collet
5359d16d8d enable proper type 2024-12-20 10:36:57 -08:00
Yann Collet
76dd3a98c4 scope: ZSTD_copySequencesToSeqStore*() are private to ZSTD_compress.c
no need to publish them outside of this unit.
2024-12-20 10:36:57 -08:00
Yann Collet
1ac79ba1b6 minor: simplify ZSTD_selectSequenceCopier 2024-12-20 10:36:56 -08:00
Yann Collet
894ea31281 codemod: ZSTD_sequenceCopier -> ZSTD_SequenceCopier_f 2024-12-20 10:36:56 -08:00
Yann Collet
c97522f7fb codemod: ZSTD_sequenceFormat_e -> ZSTD_SequenceFormat_e
since it's a type name.

Note: in contrast with previous names, this one is on the Public API side.
So there is a #define, so that existing programs using ZSTD_sequenceFormat_e still work.
2024-12-20 10:36:56 -08:00
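
The compatibility `#define` described in c97522f7fb can be illustrated with a minimal sketch. Every `Demo_`/`demo_` name below is hypothetical and the enum values are illustrative only; this is not the actual declaration from zstd.h. It only shows how an old lowercase type name can keep compiling once the type has been renamed.

```c
/* Hypothetical stand-in for a renamed public enum (values are illustrative). */
typedef enum {
    demo_sf_noBlockDelimiters = 0,
    demo_sf_explicitBlockDelimiters = 1
} Demo_SequenceFormat_e;

/* Compatibility alias: the old spelling expands to the new type name,
 * so existing programs that use the old name still compile. */
#define Demo_sequenceFormat_e Demo_SequenceFormat_e

/* Caller code written against the old name. */
static int demo_isExplicit(Demo_sequenceFormat_e format)
{
    return format == demo_sf_explicitBlockDelimiters;
}
```

A macro alias, as opposed to a second typedef, is what the commit message mentions; which form the header really uses should be checked against zstd.h.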
Yann Collet
0165eeb441 created ZSTD_entropyCompressSeqStore_wExtLitBuffer()
can receive externally defined buffer of literals
2024-12-20 10:36:56 -08:00
Yann Collet
e9f8a119b4 ZSTD_entropyCompressSeqStore_internal() can accept an externally defined literals buffer 2024-12-20 10:36:56 -08:00
Yann Collet
0442e43aca codemod: ZSTD_defaultPolicy_e -> ZSTD_DefaultPolicy_e 2024-12-20 10:36:56 -08:00
Yann Collet
477a01067f codemod: symbolEncodingType_e -> SymbolEncodingType_e 2024-12-20 10:36:56 -08:00
Yann Collet
a2245721ca codemod: seqStore_t -> SeqStore_t
same idea, SeqStore_t is a type name, it should start with a Capital letter.
2024-12-20 10:36:55 -08:00
Yann Collet
9671813375 codemod: seqDef -> SeqDef
SeqDef is a type name, so it should start with a Capital letter.
It's an internal symbol, no impact on public API.
2024-12-20 10:36:55 -08:00
Yann Collet
b4a40a845f move Sequences definition to zstd_compress_internal.h
they should not be in common/zstd_internal.h,
since these definitions are not shared beyond lib/compress/.
2024-12-20 10:36:55 -08:00
Yann Collet
a00f45a037 created ZSTD_storeSeqOnly()
makes it possible to register a sequence without copying its literals.
2024-12-20 10:36:04 -08:00
Yann Collet
bbaba45589 change experimental parameter name
from ZSTD_c_useBlockSplitter to ZSTD_c_splitAfterSequences.
2024-10-31 13:43:40 -07:00
Yann Collet
4f93206d62 changed variable name to ZSTD_c_blockSplitterLevel
suggested by @terrelln
2024-10-29 11:12:09 -07:00
Yann Collet
fcbf6b014a fixed minor conversion warning 2024-10-28 16:47:38 -07:00
Yann Collet
37706a677c added a test
test both that the new parameter works as intended,
and that the over-split protection works as intended
2024-10-28 16:31:15 -07:00
Yann Collet
226ae73311 expose new parameter ZSTD_c_blockSplitter_level 2024-10-28 16:31:15 -07:00
Yann Collet
01474bf73b add internal compression parameter preBlockSplitter_level
not yet exposed to the interface.

Also: renames `useBlockSplitter` to `postBlockSplitter`
to better qualify the difference between the 2 settings.
2024-10-28 16:31:15 -07:00
Yann Collet
e557abc8a0 new block splitting variant _fromBorders
less precise but still suitable for `fast` strategy.
2024-10-25 16:13:55 -07:00
Yann Collet
da2c0dffd8 add faster block splitting heuristic, suitable for dfast strategy 2024-10-24 14:37:00 -07:00
Yann Collet
ca6e55cbf5 reduce splitBlock arguments 2024-10-24 13:17:56 -07:00
Yann Collet
566763fdc9 new variant, sampling by 11 2024-10-24 13:17:56 -07:00
Yann Collet
90095f056d apply limit conditions for all splitting strategies
instead of just for blind split.

This is in anticipation of adversarial input,
that would intentionally target the sampling pattern of the split detector.

Note that, even without this protection, splitting can never expand beyond ZSTD_COMPRESSBOUND(),
because this upper limit uses a 1KB block size worst case scenario,
and splitting never creates blocks that small.

The protection is more to ensure that data is not expanded by more than 3-bytes per 128 KB full block,
which is a much stricter limit.
2024-10-24 11:36:56 -07:00
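
The limit stated in 90095f056d (at most 3 bytes of expansion per 128 KB full block) can be written out as simple arithmetic. The sketch below is an assumption-laden illustration: the `DEMO_`/`demo_` names are hypothetical, and the check is not the logic used inside ZSTD_compress.c. It only restates the bound from the commit message for an input made of whole blocks.

```c
#include <stddef.h>

#define DEMO_BLOCK_SIZE ((size_t)128 * 1024)  /* one full block, in bytes */
#define DEMO_MAX_EXPANSION_PER_BLOCK 3        /* limit quoted in the commit message */

/* Largest output size allowed for nbFullBlocks full blocks under the stated limit. */
static size_t demo_maxSizeForFullBlocks(size_t nbFullBlocks)
{
    return nbFullBlocks * (DEMO_BLOCK_SIZE + DEMO_MAX_EXPANSION_PER_BLOCK);
}

/* Returns 1 when a candidate output size respects the limit, 0 otherwise. */
static int demo_respectsExpansionLimit(size_t nbFullBlocks, size_t outputSize)
{
    return outputSize <= demo_maxSizeForFullBlocks(nbFullBlocks);
}
```

For a single full block this gives a ceiling of 131072 + 3 = 131075 bytes, which is far tighter than the ZSTD_COMPRESSBOUND() worst case mentioned above.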
Yann Collet
c80645a055 stricter limits to ensure expansion factor with blind-split strategy
issue reported by @terrelln
2024-10-23 14:55:10 -07:00
Yann Collet
7d3e5e3ba1 split all full 128 KB blocks
this helps make the streaming behavior more consistent,
since it no longer depends on having more data presented on the input.

suggested by @terrelln
2024-10-23 14:18:48 -07:00
Yann Collet
b68ddce818 rewrite fingerprint storage to no longer need 64-bit members
so that it can be stored using standard alignment requirement (sizeof(void*)).

Distance function still requires 64-bit signed multiplication though,
so it won't change the issue regarding the bug in ubsan for clang 32-bit on github ci.
2024-10-23 11:50:57 -07:00
Yann Collet
0be334d208 fixes static state allocation check
detected by @felixhandte
2024-10-23 11:50:57 -07:00
Yann Collet
ea85dc7af6 conservatively estimate over-splitting in presence of incompressible loss
ensure data can never be expanded by more than 3 bytes per full block.
2024-10-23 11:50:57 -07:00
Yann Collet
5ae34e4c96 ensure lastBlock is correctly determined
reported by @terrelln
2024-10-23 11:50:57 -07:00
Yann Collet
a167571db5 added a faster block splitter variant
that samples 1 in 5 positions.

This variant is fast enough for lazy2 and btlazy2,
but it's less good in combination with post-splitter at higher levels (>= btopt).
2024-10-23 11:50:57 -07:00
Yann Collet
4ce91cbf2b fixed workspace alignment on non 64-bit systems 2024-10-23 11:50:57 -07:00
Yann Collet
cae8d13294 splitter workspace is now provided by ZSTD_CCtx* 2024-10-23 11:50:56 -07:00
Yann Collet
73a6653653 ZSTD_splitBlock_4k() uses externally provided workspace
ideally, this workspace would be provided from the ZSTD_CCtx* state
2024-10-23 11:50:56 -07:00
Yann Collet
20c3d176cd fix assert 2024-10-23 11:50:56 -07:00
Yann Collet
0d4b520657 only split full blocks
short term simplification
2024-10-23 11:50:56 -07:00
Yann Collet
f83ed087f6 fixed RLE detection test 2024-10-23 11:50:56 -07:00
Yann Collet
83a3402a92 fix overlap write scenario in presence of incompressible data 2024-10-23 11:50:56 -07:00