Yann Collet
9e52789962
fixed strict C90 semantics
2024-10-23 11:50:56 -07:00
Yann Collet
a5bce4ae84
XP: add a pre-splitter
...
instead of ingesting only full blocks, analyze the data and infer where to split.
2024-10-23 11:50:56 -07:00
Yann Collet
47d4f5662d
rewrite code in the manner suggested by @terrelln
2024-10-17 09:37:23 -07:00
Yann Collet
6326775166
slightly improved compression ratio at levels 3 & 4
...
The compression ratio benefits are small but consistent, i.e. always positive.
On `silesia.tar` corpus, this modification saves ~75 KB at level 3.
The measured speed cost is negligible, i.e. below noise level, between 0 and -1%.
2024-10-17 09:37:23 -07:00
Yann Collet
c2abfc5ba4
minor improvement to level 3 dictionary compression ratio
2024-10-15 17:58:33 -07:00
Yann Collet
e63896eb58
small dictionary compression speed improvement
...
not as good as small-blocks improvement,
but generally positive.
2024-10-15 17:48:35 -07:00
Yann Collet
8c38bda935
Merge pull request #4165 from facebook/cspeed_cmov
...
Improve compression speed on small blocks
2024-10-11 16:20:19 -07:00
Yann Collet
8e5823b65c
rename variable
...
findMatch -> matchFound
since it's a test, as opposed to an active search operation.
suggested by @terrelln
2024-10-11 15:38:12 -07:00
Yann Collet
83de00316c
fixed parameter ordering in dfast
...
noticed by @terrelln
2024-10-11 15:36:15 -07:00
Yann Collet
fa1fcb08ab
minor: better variable naming
2024-10-10 16:07:20 -07:00
Yann Collet
d45aee43f4
make __asm__ __GNUC__-specific
2024-10-08 16:38:35 -07:00
Yann Collet
741b860fc1
store dummy bytes within ZSTD_match4Found_cmov()
...
feels more logical, better contained
2024-10-08 16:34:40 -07:00
Yann Collet
197c258a79
introduce memory barrier to force test order
...
suggested by @terrelln
2024-10-08 15:54:48 -07:00
Yann Collet
186b132495
made search strategy switchable
...
between cmov and branch
and use a simple heuristic based on wlog to select between them.
note: performance is not good on clang (yet)
2024-10-08 13:52:56 -07:00
Yann Collet
2cc600bab2
refactor search into an inline function
...
for easier swapping with a parameter
2024-10-08 11:10:48 -07:00
Yann Collet
1e7fa242f4
minor refactor zstd_fast
...
make hot variables more local
2024-10-07 11:22:40 -07:00
Ilya Tokar
e8fce38954
Optimize compression by avoiding unpredictable branches
...
Avoid unpredictable branch. Use conditional move to generate the address
that is guaranteed to be safe and compare unconditionally.
Instead of
    if (idx < limit && x[idx] == val)   // mispredicted idx < limit branch
do
    addr = cmov(safe, x + idx)
    if (*addr == val && idx < limit)    // almost always false so well predicted
Using microbenchmarks from https://github.com/google/fleetbench ,
I get a speed-up of roughly 10%:
name                                                       old cpu/op   new cpu/op   delta
BM_ZSTD_COMPRESS_Fleet/compression_level:-7/window_log:15  1.46ns ± 3%  1.31ns ± 7%   -9.88%  (p=0.000 n=35+38)
BM_ZSTD_COMPRESS_Fleet/compression_level:-7/window_log:16  1.41ns ± 3%  1.28ns ± 3%   -9.56%  (p=0.000 n=36+39)
BM_ZSTD_COMPRESS_Fleet/compression_level:-5/window_log:15  1.61ns ± 1%  1.43ns ± 3%  -10.70%  (p=0.000 n=30+39)
BM_ZSTD_COMPRESS_Fleet/compression_level:-5/window_log:16  1.54ns ± 2%  1.39ns ± 3%   -9.21%  (p=0.000 n=37+39)
BM_ZSTD_COMPRESS_Fleet/compression_level:-3/window_log:15  1.82ns ± 2%  1.61ns ± 3%  -11.31%  (p=0.000 n=37+40)
BM_ZSTD_COMPRESS_Fleet/compression_level:-3/window_log:16  1.73ns ± 3%  1.56ns ± 3%   -9.50%  (p=0.000 n=38+39)
BM_ZSTD_COMPRESS_Fleet/compression_level:-1/window_log:15  2.12ns ± 2%  1.79ns ± 3%  -15.55%  (p=0.000 n=34+39)
BM_ZSTD_COMPRESS_Fleet/compression_level:-1/window_log:16  1.99ns ± 3%  1.72ns ± 3%  -13.70%  (p=0.000 n=38+38)
BM_ZSTD_COMPRESS_Fleet/compression_level:0/window_log:15   3.22ns ± 3%  2.94ns ± 3%   -8.67%  (p=0.000 n=38+40)
BM_ZSTD_COMPRESS_Fleet/compression_level:0/window_log:16   3.19ns ± 4%  2.86ns ± 4%  -10.55%  (p=0.000 n=40+38)
BM_ZSTD_COMPRESS_Fleet/compression_level:1/window_log:15   2.60ns ± 3%  2.22ns ± 3%  -14.53%  (p=0.000 n=40+39)
BM_ZSTD_COMPRESS_Fleet/compression_level:1/window_log:16   2.46ns ± 3%  2.13ns ± 2%  -13.67%  (p=0.000 n=39+36)
BM_ZSTD_COMPRESS_Fleet/compression_level:2/window_log:15   2.69ns ± 3%  2.46ns ± 3%   -8.63%  (p=0.000 n=37+39)
BM_ZSTD_COMPRESS_Fleet/compression_level:2/window_log:16   2.63ns ± 3%  2.36ns ± 3%  -10.47%  (p=0.000 n=40+40)
BM_ZSTD_COMPRESS_Fleet/compression_level:3/window_log:15   3.20ns ± 2%  2.95ns ± 3%   -7.94%  (p=0.000 n=35+40)
BM_ZSTD_COMPRESS_Fleet/compression_level:3/window_log:16   3.20ns ± 4%  2.87ns ± 4%  -10.33%  (p=0.000 n=40+40)
I've also measured the impact on internal workloads and saw a similar
~10% improvement in performance, measured as CPU usage per byte of data.
2024-09-20 16:07:01 -04:00
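The address-selection idea described in commit e8fce38954 can be sketched in plain C as follows. This is a minimal illustration, not the code from the commit: the names `read32`, `match4_branch`, `match4_select`, the `dummy` buffer and its contents are all hypothetical, and `limit` is assumed to be the number of positions from which 4 bytes can be read safely. A compiler is free to turn the select back into a branch, so whether a conditional move is actually emitted has to be checked in the generated assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read 4 bytes without alignment assumptions. */
static uint32_t read32(const uint8_t* p)
{
    uint32_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}

/* Branchy form: the bounds test runs first and guards the load. */
static int match4_branch(const uint8_t* base, size_t idx, size_t limit, uint32_t val)
{
    return (idx < limit) && (read32(base + idx) == val);
}

/* Select form: always load from an address that is safe to read,
 * compare unconditionally, and test the bounds last. */
static int match4_select(const uint8_t* base, size_t idx, size_t limit, uint32_t val)
{
    /* Hypothetical dummy bytes used when idx is out of range. */
    static const uint8_t dummy[4] = { 0x12, 0x34, 0x56, 0x78 };
    const uint8_t* addr = (idx < limit) ? base + idx : dummy;

    if (read32(addr) != val) return 0;  /* usually taken, so well predicted */
    /* The dummy bytes could equal val by accident, so the bounds test
     * is still required for correctness. */
    return idx < limit;
}
```

Both functions return the same result for every input; only the order of the two tests, and therefore the branch that the predictor has to guess, differs.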
Yann Collet
09cb37cbb1
Limit range of operations on Indexes in 32-bit mode
...
and use unsigned type.
This reduces the risk that an operation produces a negative number when crossing the 2 GB limit.
2024-08-21 11:03:43 -07:00
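As a rough illustration of why commit 09cb37cbb1 prefers unsigned indexes, the sketch below computes a distance between two 32-bit indexes that straddle the 2 GB mark. The helper names `index_distance` and `index_in_window` are hypothetical and do not come from the zstd sources; the point is only that `uint32_t` arithmetic stays well defined where a signed 32-bit index would already be negative.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: distance between two 32-bit indexes,
 * computed entirely in unsigned arithmetic. */
static uint32_t index_distance(uint32_t current, uint32_t match)
{
    assert(match <= current);
    return current - match;
}

/* Hypothetical helper: is `match` within `maxDistance` of `current`? */
static int index_in_window(uint32_t current, uint32_t match, uint32_t maxDistance)
{
    return match <= current && index_distance(current, match) <= maxDistance;
}
```

For example, with `current = 0x80000010` and `match = 0x7FFFFFF0` the two indexes sit on either side of the 2 GB boundary, yet the unsigned distance is simply 0x20.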
Yann Collet
1eb32ff594
Merge pull request #4115 from Adenilson/leak01
...
[zstd][leak] Avoid memory leak on early return of ZSTD_generateSequences
2024-08-09 14:09:17 -07:00
Adenilson Cavalcanti
a40bad8ec0
[zstd][leak] Avoid memory leak on early return of ZSTD_generateSequences
...
Sanity checks on a few of the context parameters (i.e. workers and block size)
may prompt an early return on ZSTD_generateSequences.
Allocating the destination buffer past those return points avoids a potential
memory leak.
This patch should fix issue #4112.
2024-08-06 18:01:20 -07:00
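The pattern described in commit a40bad8ec0, allocating only after every early-return check has passed, can be sketched as below. This is not the body of `ZSTD_generateSequences`: the `params_t` type, the `generate_sketch` function, the exact conditions and the error codes are all hypothetical stand-ins chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the context parameters being checked. */
typedef struct {
    int    nbWorkers;
    size_t blockSize;
} params_t;

/* Returns 0 on success, or a nonzero (hypothetical) error code. */
static int generate_sketch(const params_t* p, size_t dstCapacity)
{
    void* dst;
    assert(p != NULL);

    /* Sanity checks first: nothing is allocated yet,
     * so returning here cannot leak. */
    if (p->nbWorkers != 0) return 1;
    if (p->blockSize == 0) return 2;

    /* Allocate only once all early returns are behind us. */
    dst = malloc(dstCapacity);
    if (dst == NULL) return 3;

    /* ... the actual work would use dst here ... */

    free(dst);
    return 0;
}
```

Had the `malloc` call been placed before the two checks, each early return would have needed its own `free(dst)`; moving the allocation down removes that obligation.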
Yann Collet
cb784edf5d
added android-ndk-build
2024-07-30 11:34:49 -07:00
Dimitri Papadopoulos
44e83e9180
Fix typos not found by codespell
2024-06-20 20:16:25 +02:00
Dimitri Papadopoulos
2d736d9c50
Fix new typos found by codespell
2024-06-20 20:12:16 +02:00
Federico Maresca
5e9a6c2fe4
Refactor dictionary matchfinder index safety check (#4039)
2024-05-29 12:35:24 -04:00
Yann Collet
c6e5257240
Merge pull request #3977 from facebook/doc_advanced
...
Doc update
2024-03-21 12:33:15 -07:00
Nick Terrell
731f4b70fc
Fix & fuzz ZSTD_generateSequences
...
This function was seriously flawed:
* It didn't do output bounds checks
* It produced invalid sequences when an uncompressed or RLE block was emitted
* It produced invalid sequences when the block splitter was enabled
* It produced invalid sequences when ZSTD_c_targetCBlockSize was enabled
I've attempted to fix these issues, but this function is just a bad idea,
so I've marked it as deprecated and unsafe. We should replace it with
`ZSTD_extractSequences()` which operates on a compressed frame.
2024-03-21 07:18:05 -07:00
Yann Collet
6f1215b874
fix ZSTD_TARGETCBLOCKSIZE_MIN test
...
when requested CBlockSize is too low,
bound it to the minimum
instead of returning an error.
2024-03-18 14:10:08 -07:00
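The behavior change in commit 6f1215b874, bounding a too-small request to the minimum rather than rejecting it, amounts to a clamp. The sketch below is an assumption-laden illustration: `SKETCH_TARGET_MIN` and its value are placeholders rather than the library's `ZSTD_TARGETCBLOCKSIZE_MIN`, `clamp_target_block_size` is a hypothetical name, and any special meaning the library gives to particular values (such as a "disabled" setting) is not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder minimum, chosen for illustration only. */
#define SKETCH_TARGET_MIN ((size_t)1340)

/* Hypothetical helper: raise a too-small request to the minimum
 * instead of reporting an error. */
static size_t clamp_target_block_size(size_t requested)
{
    size_t const result =
        (requested < SKETCH_TARGET_MIN) ? SKETCH_TARGET_MIN : requested;
    assert(result >= SKETCH_TARGET_MIN);
    return result;
}
```

With this approach a caller asking for, say, 100 bytes gets the minimum target instead of a failure, while requests at or above the minimum pass through unchanged.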
Yann Collet
f5728da365
update targetCBlockSize documentation
2024-03-18 12:04:02 -07:00
Yann Collet
c8ab027227
reduce the amount of includes in "cover.h"
2024-03-13 11:29:28 -07:00
Yonatan Komornik
b20703f273
Updates ZSTD_RowFindBestMatch comment (#3947)
...
Updates the comment on the head of `ZSTD_RowFindBestMatch` to make sure it's aligned with recent changes to the hash table.
2024-03-12 15:10:07 -07:00
Yann Collet
aed172a8fe
minor: fix incorrect debug level
2024-03-08 14:29:44 -08:00
Yann Collet
8d31e8ec42
sizeBlockSequences() also tracks uncompressed size
...
and only defines a sub-block boundary when
it believes that it is compressible.
It's effectively an optimization,
avoiding a compression cycle to reach the same conclusion.
2024-02-26 14:31:12 -08:00
Yann Collet
d23b95d21d
minor refactor for clarity
...
since we can ensure that nbSubBlocks>0
2024-02-26 14:06:34 -08:00
Yann Collet
86db60752d
optimization: bail out faster in presence of incompressible data
2024-02-26 13:27:59 -08:00
Yann Collet
ef82b214ad
nit: comment indentation
...
as reported by @terrelln
2024-02-26 13:23:59 -08:00
Yann Collet
aa8592c532
minor: reformulate nbSubBlocks assignment
2024-02-26 13:21:14 -08:00
Yann Collet
e0412c2062
fix extraneous semicolon ';'
...
as reported by @terrelln
2024-02-26 12:26:54 -08:00
Yann Collet
1fafd0c4ae
fix minor visual static analyzer warning
...
it's a false positive,
but change the code nonetheless to make it more obvious to the static analyzer.
2024-02-25 19:45:32 -08:00
Yann Collet
038a8a906b
targetCBlockSize: modified splitting strategy to generate blocks of more regular size
...
notably avoiding a larger first block
2024-02-25 17:39:29 -08:00
Yann Collet
f8372191f5
reduced minimum compressed block size
...
with the intention to match the transport layer size,
such as Ethernet and 4G mobile networks.
2024-02-24 01:59:16 -08:00
Yann Collet
4b51526412
fix partial block uncompressed
2024-02-24 01:24:58 -08:00
Yann Collet
6719794379
fixed some regressionTests
...
but not all
2024-02-23 18:48:29 -08:00
Yann Collet
0591e7eea1
minor: fix overly cautious conversion warning
2024-02-23 16:05:09 -08:00
Yann Collet
3b40100058
fix long sequences (> 64 KB)
2024-02-23 15:35:12 -08:00
Yann Collet
6b11fc436c
fix issue with incompressible sections
2024-02-23 14:53:56 -08:00
Yann Collet
cc4530924b
speed optimized version of targetCBlockSize
...
note that the size of individual compressed blocks will vary more wildly with this modification.
But it seems good enough for a first test, and fixes the speed regression issue.
Further refinements can be attempted later.
2024-02-23 14:03:26 -08:00
Christoph Grüninger
b921f1aad6
Reduce scope of variables
...
This improves readability, keeps variables local, and
prevents unintended use (e.g. through a typo) later on.
Found by Cppcheck (variableScope)
2024-02-11 22:00:03 +01:00
Yann Collet
b0e8580dc7
fix fuzz issue 5131069967892480
2024-02-08 16:38:20 -08:00
Yann Collet
22574d848d
fix issue 5921623844651008
...
ossfuzz managed to create a scenario which triggers an `assert`.
This fixes it, by giving +1 more space for the backward search pass.
2024-02-06 13:01:14 -08:00
Yann Collet
b88c593d8f
added or updated code comments
...
as suggested by @terrelln,
to make the code of the optimal parser a bit more understandable.
2024-02-05 18:32:25 -08:00