9639 Commits

Author SHA1 Message Date
Elliot Gorokhovsky
f9f27de91c Disallow empty output directory 2022-07-29 14:48:33 -07:00
Elliot Gorokhovsky
e1873ad576 Fix buffer underflow for null dir1 2022-07-29 11:10:47 -07:00
Elliot Gorokhovsky
e5db7c93f5
Merge pull request #3197 from embg/docstring_clarify
Clarify benchmark chunking docstring
2022-07-26 13:26:15 -04:00
Elliot Gorokhovsky
bef1d9a831
Merge pull request #3209 from zhuhan0/dev
[largeNbDicts] Second try at fixing decompression segfault to always create compressInstructions
2022-07-26 13:19:38 -04:00
Han Zhu
6255f994d3 [largeNbDicts] Second try at fixing decompression segfault to always create compressInstructions
Summary:
Freeing an uninitialized pointer is undefined behavior. This caused a segfault
when compiling the benchmark with Clang -O3 and benching decompression.

V2: always create compressInstructions but check if cctxParams is NULL before
setting CCtx params to avoid segfault.

Test Plan:
make and run
2022-07-21 11:55:01 -07:00
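Note on 6255f994d3: the V2 approach (always create the instructions object, but only touch the compression parameters when they exist) can be sketched roughly as follows. This is a minimal illustration, not the largeNbDicts code: `params_t`, `instructions_t`, `createInstructions` and `freeInstructions` are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the benchmark's types (assumption, not the real definitions). */
typedef struct { int level; } params_t;
typedef struct { params_t* params; int paramsApplied; } instructions_t;

/* Always builds the instructions object.
 * `params` may be NULL, e.g. when only decompression is benchmarked. */
instructions_t* createInstructions(const params_t* params)
{
    /* calloc zeroes the struct, so ci->params starts as NULL, never uninitialized */
    instructions_t* const ci = calloc(1, sizeof(*ci));
    if (ci == NULL) return NULL;
    if (params != NULL) {   /* only apply parameters when they exist */
        ci->params = malloc(sizeof(*ci->params));
        if (ci->params == NULL) { free(ci); return NULL; }
        *ci->params = *params;
        ci->paramsApplied = 1;
    }
    return ci;
}

/* Safe on every path: free(NULL) is a no-op, freeing an uninitialized pointer is not. */
void freeInstructions(instructions_t* ci)
{
    if (ci == NULL) return;
    free(ci->params);
    free(ci);
}
```

Because the object is always created and zero-initialized, the cleanup path can run unconditionally for both compression and decompression runs.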
Elliot Gorokhovsky
466e13f722
Merge pull request #3205 from zhuhan0/dev
[contrib][largeNbDicts] Fix decompression segfault; Add additional benchmark metrics
2022-07-20 16:07:04 -04:00
Han Zhu
d993a288e0 [largeNbDicts] Add an option to print out median speed
Summary:
Added an option -p# where -p0 (default) sets the aggregation method to fastest
speed while -p1 sets the aggregation method to median. Also added a new column
in the csv file to report this option's value.

Test Plan:
```
$ ./largeNbDicts -1 --nbDicts=1 -D ~/benchmarks/html/html_8_16K.32K.dict
~/benchmarks/html/html_8_16K/*
loading 7450 files...
created src buffer of size 83.4 MB
split input into 7450 blocks
loading dictionary /home/zhuhan/benchmarks/html/html_8_16K.32K.dict
compressing at level 1 without dictionary : Ratio=3.03  (28827863 bytes)
compressed using a 32768 bytes dictionary : Ratio=4.28  (20410262 bytes)
generating 1 dictionaries, using 0.1 MB of memory
Compression Speed : 306.0 MB/s
Fastest Speed : 310.6 MB/s

$ ./largeNbDicts -1 --nbDicts=1 -p1 -D ~/benchmarks/html/html_8_16K.32K.dict
~/benchmarks/html/html_8_16K/*
loading 7450 files...
created src buffer of size 83.4 MB
split input into 7450 blocks
loading dictionary /home/zhuhan/benchmarks/html/html_8_16K.32K.dict
compressing at level 1 without dictionary : Ratio=3.03  (28827863 bytes)
compressed using a 32768 bytes dictionary : Ratio=4.28  (20410262 bytes)
generating 1 dictionaries, using 0.1 MB of memory
Compression Speed : 306.9 MB/s
Median Speed : 298.4 MB/s
```
2022-07-20 11:19:41 -07:00
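Note on d993a288e0: the difference between the two aggregation methods selected by -p0 and -p1 can be sketched as below. This is an illustrative sketch under assumptions, not the benchmark's implementation: `aggregation_e`, `agg_fastest`, `agg_median` and `aggregateSpeed` are hypothetical names, and averaging the two middle values for an even number of rounds is a choice made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical enum mirroring -p0 (fastest) and -p1 (median). */
typedef enum { agg_fastest = 0, agg_median = 1 } aggregation_e;

/* qsort comparator for doubles, ascending order */
static int cmpDouble(const void* a, const void* b)
{
    double const x = *(const double*)a;
    double const y = *(const double*)b;
    return (x > y) - (x < y);
}

/* Reduces per-round speeds (MB/s) to a single figure.
 * Sorts `speeds` in place; requires n > 0. */
double aggregateSpeed(double* speeds, size_t n, aggregation_e method)
{
    assert(speeds != NULL && n > 0);
    qsort(speeds, n, sizeof(speeds[0]), cmpDouble);
    if (method == agg_fastest) return speeds[n - 1];   /* best round */
    if (n % 2 == 1) return speeds[n / 2];              /* middle round */
    return (speeds[n / 2 - 1] + speeds[n / 2]) / 2.0;  /* mean of the two middle rounds */
}
```

The fastest round approximates the speed with the least interference from the system, while the median is less sensitive to a single unusually good or bad round.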
Han Zhu
b550f9b77e [largeNbDicts] Print more metrics into csv file
Summary:
Add column headers and data for whether it's a compression or a decompression
run, compression level, nbDicts and dictAttachPref, in addition to
compr/decompr speed.

Test Plan:
Example output:

```
./largeNbDicts
Compression/Decompression,Level,nbDicts,dictAttachPref,Speed
Compression,1,1,0,300.9
Compression,1,1,1,296.4
Compression,1,1,2,307.8
Compression,1,10,0,292.3
Compression,1,100,0,293.3
Compression,3,110,0,106.0
Decompression,-1,110,-1,155.6
Decompression,-1,110,-1,709.4
Decompression,-1,120,-1,709.1
Decompression,-1,120,-1,734.6
```
2022-07-19 16:50:28 -07:00
Han Zhu
d0c88afe6d [largeNbDicts] Fix decompression segfault in createCompressInstructions
Benchmarking decompression results in a segfault in `createCompressInstructions`
because `cctxParams` is NULL. Skip running that function if we are not benching
compression.
2022-07-19 13:55:52 -07:00
udayanbapat
43f21a600e
Initial commit to address 3090. Added support to decompress empty block. (#3118)
* Initial commit to address 3090. Added support to decompress empty block

* Update zstd_decompress_block.c

Addressed review comments for the case of 'set_basic'

* Update lib/decompress/zstd_decompress_block.c

Co-authored-by: Nick Terrell <nickrterrell@gmail.com>

* Update lib/decompress/zstd_decompress_block.c

Co-authored-by: Nick Terrell <nickrterrell@gmail.com>

Co-authored-by: Nick Terrell <nickrterrell@gmail.com>
2022-07-14 11:54:34 -07:00
Elliot Gorokhovsky
6d75b36b7f Clarify -B docstring 2022-07-14 00:22:21 -04:00
Felix Handte
02ef78be58
Merge pull request #3184 from htnhan/features/list_verbose_to_show_dictionary_id
zstd -lv <file> to show dictID
2022-07-08 16:04:39 -04:00
htnhan
d7eb829af5 Detect multiple dictIDs in one file 2022-07-08 12:20:50 -05:00
htnhan
cc8c98485a zstd -lv <file> to show dictID 2022-07-05 21:28:33 -05:00
Elliot Gorokhovsky
3ef92cfcd4
Merge pull request #3180 from nocnokneo/MSVCBuildTests
Fix ZSTD_BUILD_TESTS=ON with MSVC
2022-07-05 13:13:34 -04:00
Taylor Braun-Jones
cd9d0a7e6e Fix ZSTD_BUILD_TESTS=ON build with MSVC
Fixes:

    Command line error D8021 : invalid numeric argument '/Wno-deprecated-declarations'
2022-06-30 13:20:42 -04:00
Elliot Gorokhovsky
5d2fb4288f
Merge pull request #3179 from embg/1.5.3_bump
Prepare v1.5.3
2022-06-29 13:03:52 -07:00
Elliot Gorokhovsky
bb3839a78c make -C programs zstd.1 2022-06-29 14:55:14 -04:00
Elliot Gorokhovsky
5c382bf110 1.5.3 version bump 2022-06-29 14:45:53 -04:00
Elliot Gorokhovsky
e9d6fc867a
Merge pull request #3177 from embg/dms_prefetch2
Add prefetchCDictTables CCtxParam (+10-20% cold dict compression speed)
2022-06-24 08:24:43 -07:00
Elliot Gorokhovsky
cb9e341129 Nits 2022-06-23 16:59:21 -04:00
Elliot Gorokhovsky
bb4a3c71ef
Update README.md for fuzzers (#3174)
* Update README.md for fuzzers

* Add ls corpora/*crash command

* nit

* Clarify wording and add Nick's command

* Minor clarification
2022-06-22 21:02:07 -04:00
Elliot Gorokhovsky
747e06f4f6 Add tests 2022-06-22 17:05:23 -04:00
Elliot Gorokhovsky
6bd5ac6713 add prefetchCDictTables to largeNbDicts 2022-06-22 16:13:07 -04:00
Elliot Gorokhovsky
93b89fb24b Add docs 2022-06-22 16:13:07 -04:00
Elliot Gorokhovsky
2a128110d0 Add prefetchCDictTables CCtxParam 2022-06-22 16:13:07 -04:00
Yann Collet
f5c4ec4658
Merge pull request #3175 from facebook/fix3169
Streaming decompression can detect incorrect header ID sooner
2022-06-22 11:21:09 -07:00
Yann Collet
91aeade735 Streaming decompression can detect incorrect header ID sooner
Streaming decompression used to wait for a minimum of 5 bytes before attempting decoding.
This meant that, in the case that only a few bytes (<5) were provided,
and assuming these bytes are incorrect,
there would be no error reported.
The streaming API would simply request more data, waiting for at least 5 bytes.

This PR makes it possible to detect incorrect Frame IDs as soon as the first byte is provided.

Fix #3169
2022-06-21 23:09:03 -07:00
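Note on 91aeade735: the idea of rejecting a bad frame ID from a partial header can be sketched as a prefix comparison against the magic numbers defined by the zstd format (0xFD2FB528 for regular frames, 0x184D2A50 to 0x184D2A5F for skippable frames, both stored little-endian). This is a simplified illustration, not the library's code: `checkMagicPrefix` and `prefixStatus_e` are hypothetical names, and magicless or legacy formats are ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type for this sketch. */
typedef enum { prefix_ok, prefix_invalid } prefixStatus_e;

/* Checks however many magic-number bytes are available (0 to 4).
 * Returns prefix_invalid as soon as the bytes seen so far cannot
 * begin a regular zstd frame or a skippable frame. */
prefixStatus_e checkMagicPrefix(const unsigned char* src, size_t srcSize)
{
    static const unsigned char zstdMagic[4] = { 0x28, 0xB5, 0x2F, 0xFD };
    static const unsigned char skipMagic[4] = { 0x50, 0x2A, 0x4D, 0x18 };
    size_t const n = srcSize < 4 ? srcSize : 4;
    int zstdMatch = 1;
    int skipMatch = 1;
    size_t i;
    assert(src != NULL || srcSize == 0);
    for (i = 0; i < n; i++) {
        if (src[i] != zstdMagic[i]) zstdMatch = 0;
        if (i == 0) {
            /* skippable frames allow any low nibble in the first byte */
            if ((src[0] & 0xF0) != skipMagic[0]) skipMatch = 0;
        } else if (src[i] != skipMagic[i]) {
            skipMatch = 0;
        }
    }
    return (zstdMatch || skipMatch) ? prefix_ok : prefix_invalid;
}
```

With a check of this kind, a caller that has received a single wrong byte can report an error immediately instead of asking for more input.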
Elliot Gorokhovsky
f6ef14329f
"Short cache" optimization for level 1-4 DMS (+5-30% compression speed) (#3152)
* first attempt at fast DMS short cache

* significant wins for some scenarios

* fix all clang regressions

* nits

* fix 1.5% gcc11 regression on hot 110Kdict scenario

* fix CI

* nit

* Add tags to doublefast hash table

* use tags in doublefast DMS

* Fix CI

* Clean up some hardcoded logic / constants

* Switch forCCtx to an enum

* nit

* add short cache to ip+1 long search

* Move tag size into hashLog

* Minor nits

* Truncate dictionaries greater than 16MB in short cache mode

* Helper function for tag comparison

* Cap short cache hashLog at 24 to prevent overflow

* size_t dictTagsMatch -> int dictTagsMatch

* nit

* Clean up and comment dictionary truncation

* Move ZSTD_tableFillPurpose_e next to ZSTD_dictTableLoadMethod_e

* Comment and expand helper functions

* Asserts and documentation

* nit
2022-06-21 17:27:19 -04:00
Yann Collet
eb842a2260
Merge pull request #3170 from facebook/mesongnu99
removed gnu99 statement from meson recipe
2022-06-21 10:17:36 -07:00
Yann Collet
15f3605135 removed gnu99 statement from meson recipe 2022-06-20 18:18:40 -07:00
Yann Collet
3367e6d414
Merge pull request #3167 from facebook/cmake_std
remove explicit standard setting from cmake script
2022-06-19 16:49:21 -07:00
Yann Collet
eceecc5b2c removed explicit compilation standard from cmake script
it's not expected to be useful
and can actually lead to subtle side effects
such as #3163.
2022-06-19 14:52:32 -07:00
Yann Collet
f15dd6420c
Merge pull request #3166 from facebook/warning_clockt
display a warning message when using C90 clock_t
2022-06-19 14:45:49 -07:00
Yann Collet
574ecbb0fc display a warning message when using C90 clock_t for MT speed measurements. 2022-06-19 11:38:06 -07:00
Yann Collet
b33ef91694 updated documentation regarding build systems 2022-06-19 11:12:16 -07:00
Elliot Gorokhovsky
b7b7edb3a3
Merge pull request #3161 from embg/largeNbDictsImprovements
[contrib] largeNbDicts bugfix + improvements
2022-06-15 07:39:50 -07:00
Elliot Gorokhovsky
24364057bc
fix typo
Co-authored-by: Nick Terrell <nickrterrell@gmail.com>
2022-06-14 19:18:49 -04:00
Elliot Gorokhovsky
2bbdc9f40e Fix FILE handle leak 2022-06-14 14:57:54 -07:00
Elliot Gorokhovsky
f7ebbcd0cc Support advanced API so forceCopy/forceAttach works properly 2022-06-14 14:52:51 -07:00
Elliot Gorokhovsky
e0c4863c5c largeNbDicts bugfix + improvements 2022-06-13 17:26:44 -07:00
Elliot Gorokhovsky
b944db0c45
Merge pull request #3160 from danlark1/patch-1
Fix big endian ARM NEON path
2022-06-13 14:01:43 -04:00
Daniel Kutenin
05f3f415ce
Fix big endian ARM NEON path
It is not using the NEON acceleration but the bit grouping was applied
2022-06-13 09:16:24 +01:00
Nick Terrell
3b1bd91852
Merge pull request #3141 from JunHe77/seqDec
dec: adjust seqSymbol load on aarch64
2022-06-09 13:40:51 -07:00
Nick Terrell
3b915cd94b
Merge pull request #3145 from JunHe77/wildcopy
common: apply two stage copy to aarch64
2022-06-09 13:38:30 -07:00
Elliot Gorokhovsky
f313a773a4
Merge pull request #3157 from embg/huge_dict_bugfix
Bugfix for huge dictionaries
2022-06-09 15:35:29 -04:00
Elliot Gorokhovsky
31bd6402c6 Bugfix for huge dictionaries 2022-06-09 11:39:30 -04:00
Yann Collet
27bf96e72b updated --single-thread man 2022-06-07 17:45:15 -07:00
Nick Terrell
802ad778cc
Merge pull request #3154 from terrelln/rsyncable-speed-fix
Remove expensive assert in --rsyncable hot loop
2022-06-06 16:07:20 -07:00
Nick Terrell
7c05b9aec3 Remove expensive assert in --rsyncable hot loop
This assert slows the loop down by 10x. We can get similar
coverage by asserting at the beginning & end of the loop.

We need this fix because Debian compiles zstd with asserts
enabled. Separately, we should ask them why, and if they would
consider disabling asserts in their builds, since we don't
optimize for assert-enabled builds.

Fixes Issue #3150.
2022-06-06 11:56:13 -07:00
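Note on 7c05b9aec3: the general pattern of moving an expensive assert out of a hot loop and checking the invariant only on entry and exit can be sketched as below. This is a generic illustration and not the --rsyncable code: the hash, the multiplier `PRIME`, and the functions `hashFromScratch` and `hashBuffer` are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PRIME 0x9E3779B185EBCA87ULL  /* arbitrary odd multiplier, chosen for illustration */

/* Costly reference computation: O(len) on every call. */
uint64_t hashFromScratch(const unsigned char* buf, size_t len)
{
    uint64_t h = 0;
    size_t i;
    for (i = 0; i < len; i++) h = h * PRIME + buf[i];
    return h;
}

/* Incremental hash with the invariant checked only at the loop boundaries. */
uint64_t hashBuffer(const unsigned char* buf, size_t len)
{
    uint64_t h = 0;
    size_t i;
    assert(h == hashFromScratch(buf, 0));      /* checked once on entry */
    for (i = 0; i < len; i++) {
        h = h * PRIME + buf[i];
        /* Asserting h == hashFromScratch(buf, i + 1) here would turn the
         * loop quadratic in builds that keep asserts enabled. */
    }
    assert(h == hashFromScratch(buf, len));    /* checked once on exit */
    return h;
}
```

The two boundary asserts still catch a broken update step in debug builds, while the per-iteration cost stays constant even when asserts are compiled in.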