3971 Commits

Author SHA1 Message Date
stanjo74
52598d54e9
Limit train samples (#2809)
* Limit training samples size to 2GB

* simplified DISPLAYLEVEL() macro to use global variable instead of local.

* refactored training samples loading

* fixed compiler warning

* addressed comments from the pull request

* addressed @terrelln comments

* missed some fixes

* fixed type mismatch

* Fixed bug passing the estimated number of samples rather than the loaded number of samples.
Changed unit conversion not to use bit-shifts.

* fixed a declaration after code

* fixed type conversion compile errors

* fixed more type casting

* fixed more type mismatching

* changed sizes type to size_t

* move type casting

* more type cast fixes
2021-10-04 17:47:52 -07:00
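The sample cap described in 52598d54e9 can be illustrated with a minimal sketch. This is not the code from the pull request: `DEMO_SAMPLES_MAX_BYTES` and `demo_countSamplesWithinBudget()` are hypothetical names, and the loop only shows the general idea of stopping once a byte budget is reached and reporting the number of samples actually loaded rather than the estimate.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 2 GB cap, written as a product instead of a bit-shift. */
#define DEMO_SAMPLES_MAX_BYTES (2ULL * 1024 * 1024 * 1024)

/* Returns how many leading samples fit within `budget` bytes.
 * Callers would pass this loaded count on, not their initial estimate. */
static size_t demo_countSamplesWithinBudget(const size_t* sizes, size_t nbSamples,
                                            unsigned long long budget)
{
    unsigned long long total = 0;
    size_t n;
    assert(sizes != NULL || nbSamples == 0);
    for (n = 0; n < nbSamples; n++) {
        if (sizes[n] > budget - total) break;  /* next sample would exceed the cap */
        total += sizes[n];
    }
    return n;
}
```

A caller would typically invoke it as `demo_countSamplesWithinBudget(sizes, nbSamples, DEMO_SAMPLES_MAX_BYTES)` and use the returned count for everything downstream.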
Yann Collet
7868f38019
Merge pull request #2747 from Helflym/dev
Add AIX support in Makefile
2021-10-01 08:13:39 -07:00
Nick Terrell
3a4d421c0f
Merge pull request #2802 from solbjorn/fix-kernel-wundef
[contrib][linux] Fix -Wundef inside Linux kernel tree
2021-09-29 09:48:17 -07:00
Sen Huang
4b7f45cb04 Pull hot loop into its own function 2021-09-28 08:19:44 -07:00
Sen Huang
ccdcbf4621 Try beginning and end of match 2021-09-28 08:19:44 -07:00
Sen Huang
b8fd6bf30c Skip most long matches in lazy hash table update 2021-09-28 08:19:39 -07:00
Nick Terrell
9ef055d706
Merge pull request #2808 from terrelln/huf-oss-fuzz-fix
[huf] Fix OSS-Fuzz assert
2021-09-27 15:00:52 -07:00
Felix Handte
8b7a19fcd4
Merge pull request #2805 from nolange/smaller_code_with_disabled_features
Smaller code with disabled features
2021-09-27 17:43:21 -04:00
Nick Terrell
a07ddb47f7 [huf] Fix OSS-Fuzz assert
PR #2784 introduced a bug in the decompressor that caused some valid
inputs to fail to decompress. The bitstream isn't reloaded after the 4X*
loop if the number of elements remaining is small enough, causing us to
read more bits than are available in the bitcontainer.

This was caught by the MSAN fuzzer in OSS-Fuzz because the assembly
implementation isn't used in the MSAN build.

Credit to OSS-Fuzz.
2021-09-27 13:56:07 -07:00
Yann Collet
2ed14c2476 minor : fix comment
provide correct reasons to include zstd_internal.h
2021-09-26 08:44:18 -07:00
Norbert Lange
6763f40331 zstd_decompress: use a helper function for context create
Multiple ZSTD_createDCtx* functions call other (public)
ZSTD_createDCtx* functions, which makes it harder for humans
and compilers to throw out code that is not used.

This farms out the logic into a static function. If a program
only uses a single ZSTD_createDCtx variant, all others can be easily
dropped and the remaining implementation can be specialized.
2021-09-26 14:41:37 +02:00
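The refactoring pattern described in 6763f40331 can be sketched as follows. All `demo_*` names and types are hypothetical stand-ins, not the zstd API. The sketch only shows the shape of the change: public constructors stop calling each other and instead share one static helper, so an unused public variant carries no dependency on the others.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    void* (*allocFn)(size_t size);
    void  (*freeFn)(void* ptr);
} demo_customMem;

typedef struct {
    demo_customMem mem;
    int initialized;
} demo_DCtx;

/* Single static worker: every public constructor funnels into this one. */
static demo_DCtx* demo_createDCtx_internal(demo_customMem mem)
{
    demo_DCtx* dctx;
    if (mem.allocFn == NULL || mem.freeFn == NULL) return NULL;
    dctx = (demo_DCtx*)mem.allocFn(sizeof(*dctx));
    if (dctx == NULL) return NULL;
    dctx->mem = mem;
    dctx->initialized = 1;
    return dctx;
}

/* Public variants no longer call each other, only the static helper. */
demo_DCtx* demo_createDCtx(void)
{
    demo_customMem const defaultMem = { malloc, free };
    return demo_createDCtx_internal(defaultMem);
}

demo_DCtx* demo_createDCtx_advanced(demo_customMem mem)
{
    return demo_createDCtx_internal(mem);
}

void demo_freeDCtx(demo_DCtx* dctx)
{
    if (dctx == NULL) return;
    assert(dctx->mem.freeFn != NULL);
    dctx->mem.freeFn(dctx);
}
```

Because the helper is `static`, a compiler or linker that drops `demo_createDCtx_advanced()` is free to specialize the helper for the one remaining caller.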
Norbert Lange
0d45540695 decompress: conditionally remove bmi2 from context
Use a helper function, which will just return 0 in case
the feature is disabled.
Allows constant propagation and removal of dead code.
2021-09-26 14:41:37 +02:00
Norbert Lange
02296cac82 decompress: conditionally remove legacy members from context
Remove the then unneeded variables from the struct,
and all accesses to them.
2021-09-26 12:12:17 +02:00
Alexander Lobakin
71526e6f29 [contrib][linux] Fix -Wundef inside Linux kernel tree
Commit d7ef97a013b5
("[build] Fix oss-fuzz build with the dataflow sanitizer") broke
the build inside the Linux kernel tree after 'import', as it can no longer
conditionally remove the ZSTD_MEMORY_SANITIZER definition from
the #if DEF_A || DEF_B block. This emits a -Wundef warning, which
can be treated as an error.
Split this preprocessor condition into two separate conditions
to fix this.

Fixes: d7ef97a013b5 ("[build] Fix oss-fuzz build with the dataflow sanitizer")
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
2021-09-25 13:35:25 +02:00
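`-Wundef` warns whenever an identifier that is not defined as a macro is evaluated inside `#if`. The sketch below illustrates the general shape of splitting a combined condition, as 71526e6f29 describes. The macro names `DEMO_MSAN`, `DEMO_DFSAN`, and `DEMO_DISABLE_ASM` are hypothetical and the block is not the code from the commit.

```c
#include <assert.h>

/* Combined form (not used here): `#if DEMO_MSAN || DEMO_DFSAN`.
 * If a tool strips the definition of one macro, the other half of the
 * expression still names it, and -Wundef reports the undefined identifier. */

#ifndef DEMO_MSAN
#  define DEMO_MSAN 0
#endif
#ifndef DEMO_DFSAN
#  define DEMO_DFSAN 0
#endif

/* Split form: each macro is tested in its own block, so either block
 * can be removed whole without leaving a dangling reference behind. */
#if DEMO_MSAN
#  define DEMO_DISABLE_ASM 1
#endif
#if DEMO_DFSAN
#  undef  DEMO_DISABLE_ASM
#  define DEMO_DISABLE_ASM 1
#endif
#ifndef DEMO_DISABLE_ASM
#  define DEMO_DISABLE_ASM 0
#endif

/* Reports whether the (hypothetical) assembly path stays enabled. */
static int demo_asmEnabled(void)
{
    assert(DEMO_DISABLE_ASM == 0 || DEMO_DISABLE_ASM == 1);
    return !DEMO_DISABLE_ASM;
}
```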
Nick Terrell
14772d97be
Merge pull request #2796 from terrelln/linux-fixes
[lib] Make lib compatible with `-Wfall-through` excepting legacy
2021-09-23 16:11:53 -07:00
Nick Terrell
01976ce4cd
Merge pull request #2799 from terrelln/oss-fuzz-build
[build] Fix oss-fuzz build with the dataflow sanitizer
2021-09-23 15:55:10 -07:00
Nick Terrell
1903d6a5a8
Merge pull request #2798 from abxhr/typo-fix
Fix typo
2021-09-23 13:11:45 -07:00
Nick Terrell
d7ef97a013 [build] Fix oss-fuzz build with the dataflow sanitizer
The dataflow sanitizer requires all code to be instrumented. We can't
instrument the ASM function, so we have to disable it.
2021-09-23 11:48:39 -07:00
Abshar Mohammed Aslam
54a888b57b
Fix typo 2021-09-23 21:54:38 +04:00
Nick Terrell
189e87bcbe [lib] Make lib compatible with -Wfall-through excepting legacy
Switch to a macro `ZSTD_FALLTHROUGH;` instead of a comment. On supported
compilers this uses an attribute, otherwise it becomes a comment.

This is necessary to be compatible with clang's `-Wfall-through`, and
gcc's `-Wfall-through=2` which don't support comments. Without this the
linux build emits a bunch of warnings.

Also add a test to CI to ensure that we don't regress.
2021-09-23 10:51:18 -07:00
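A fall-through macro of the kind 189e87bcbe describes might look like the sketch below. `DEMO_FALLTHROUGH` and `demo_tailSum()` are hypothetical names, and the feature test shown is one possible way to detect the attribute, not necessarily the one zstd uses. In GCC and clang the warning flag is usually spelled `-Wimplicit-fallthrough`.

```c
#include <assert.h>

/* Use the statement attribute where the compiler advertises it;
 * otherwise the macro expands to nothing but a comment. */
#if defined(__has_attribute)
#  if __has_attribute(__fallthrough__)
#    define DEMO_FALLTHROUGH __attribute__((__fallthrough__))
#  endif
#endif
#ifndef DEMO_FALLTHROUGH
#  define DEMO_FALLTHROUGH /* fall-through */
#endif

/* Sums the last `remaining` bytes (0 to 3) using deliberate fall-through. */
static int demo_tailSum(const unsigned char* p, int remaining)
{
    int sum = 0;
    assert(remaining >= 0 && remaining <= 3);
    switch (remaining) {
    case 3: sum += p[2];
        DEMO_FALLTHROUGH;
    case 2: sum += p[1];
        DEMO_FALLTHROUGH;
    case 1: sum += p[0];
        break;
    default:
        break;
    }
    return sum;
}
```

Writing `DEMO_FALLTHROUGH;` with a trailing semicolon keeps the call site valid in both cases: it is either an attributed null statement or a plain empty statement.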
Yann Collet
fa2a4d77c7 constify MatchState* parameter when possible
turns out, it's possible to constify MatchState* parameter
in some parts of the binary tree algorithm,
making it a pure read-only parameter,
as opposed to a mutable state.

This is supposed to be helpful for both maintenance and the compiler.
2021-09-23 08:27:44 -07:00
senhuang42
1d8143c84f Move block splitter from stack to CCtx 2021-09-23 00:02:31 -04:00
sen
044c8b4722
Merge pull request #2779 from senhuang42/fse_fix
Fix NCountWriteBound
2021-09-22 13:51:21 -04:00
sen
1e99d36361
Merge pull request #2788 from senhuang42/param_switch
Use new paramSwitch enum for row matchfinder and block splitter
2021-09-22 13:27:55 -04:00
Nick Terrell
9450876a9d [huf] Fix compilation when DYNAMIC_BMI2=0 && BMI2 is supported
* Fix compilation issues pointed out in PR #2790.
* Add test cases to GitHub actions that test all combinations of
  `DYNAMIC_BMI2` and BMI2 support.
2021-09-21 16:49:13 -07:00
senhuang42
06f42c3bfd Use new paramSwitch enum for LDM 2021-09-21 14:22:09 -04:00
senhuang42
b5c35d7ea3 Use new paramSwitch enum for LCM, row matchfinder, and block splitter 2021-09-21 14:22:02 -04:00
Nick Terrell
a5f2c45528 Huffman ASM 2021-09-20 14:46:43 -07:00
Nick Terrell
d7542aacd9 [fuzzer] Add huf_decompress fuzzer
Add a fuzzer for Huffman decompression. Fix several bugs in Huffman
decompression, mostly related to `op == NULL` and pointer underflow.
2021-09-17 15:00:49 -07:00
Nick Terrell
8bf699aa59 [build] Add support for ASM files in Make + CMake
* Extract out common portion of `lib/Makefile` into `lib/libzstd.mk`.
  Most relevantly, the way we find library files.
* Use `lib/libzstd.mk` in the other Makefiles instead of repeating the
  same code.
* Add a test `tests/test-variants.sh` that checks that the builds of
  `make -C programs allVariants` are correct, and run it in Actions.
* Adds support for ASM files in the CMake build.

The Meson build is not updated because it lists every file in zstd,
and supports ASM off the bat, so the Huffman ASM commit will just add
the ASM file to the list.

The Visual Studio build is not updated because I'm not adding ASM
support to Visual Studio yet.
2021-09-17 14:13:53 -07:00
sen
9d2a45a705
Merge pull request #2778 from senhuang42/opt_inlining_revert
Revert opt outlining change
2021-09-15 14:22:10 -04:00
Sen Huang
a7aa2c5df6 Fix NCountWriteBound 2021-09-15 09:51:42 -07:00
Sen Huang
bd84e4a9d3 Revert opt outlining change 2021-09-15 09:08:41 -07:00
Nick Terrell
2fabd370bb
Merge pull request #2777 from terrelln/oss-fuzz-fix
[rsyncable] Fix test failures
2021-09-14 13:20:22 -07:00
Nick Terrell
9d9e2ed00b [rsyncable] Fix test failures
Test failures showed up on the daily cron job. They didn't show up
in CI because the condition is somewhat rare, and didn't trigger
during the CI tests.

This PR fixes up the logic in `findSynchronizationPoint()` to correctly
handle the edge case. It also un-comments an assert that helps catch the
issue, and verify that rsyncable mode is calculating the correct hash.

After the fix, the test that failed passes:

```
./zstreamtest --newapi -t1 --no-big-tests -s9680
```
2021-09-14 12:28:53 -07:00
Yann Collet
2e6f5bc0d8
Merge pull request #2771 from facebook/opt_investigation
Improve optimal parser performance on small data
2021-09-14 10:36:34 -07:00
Nick Terrell
d22bbed5db
Merge pull request #2776 from terrelln/oss-fuzz-fix
[rsyncable] Ensure ZSTD_compressBound() is respected
2021-09-14 09:37:43 -07:00
Yann Collet
fd94b9d1c9 Merge branch 'dev' into opt_investigation 2021-09-14 01:15:51 -07:00
Nick Terrell
a418b4e478 [rsyncable] Ensure ZSTD_compressBound() is respected
In degenerate cases `--rsyncable` could create very small blocks (1
byte). This causes the compressed output to be larger than
`ZSTD_compressBound()`. Fix the issue by ensuring that rsyncable mode
never outputs blocks smaller than 128 KB.

The minimum job size is 512 KB, so we shouldn't lose many
synchronization points from skipping any that cause blocks smaller than
128 KB. And even if we do, that is fine, because we'll find the next
one.

This fixes the `raw_dictionary_round_trip` oss-fuzz assert.

Credit to OSS-Fuzz
2021-09-13 17:14:07 -07:00
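The minimum-block rule from a418b4e478 can be sketched as a simple predicate. `DEMO_RSYNC_MIN_BLOCK_BYTES` and `demo_acceptSyncPoint()` are hypothetical names. The real logic lives in the multithreaded compressor and is more involved, so this only illustrates the decision of skipping synchronization points that would create a block under 128 KB.

```c
#include <assert.h>
#include <stddef.h>

/* Minimum block size taken from the commit message: 128 KB. */
#define DEMO_RSYNC_MIN_BLOCK_BYTES ((size_t)128 * 1024)

/* Decide whether to end the current block at this position.
 * `hashHit` is 1 when the rolling hash flags a synchronization point. */
static int demo_acceptSyncPoint(size_t bytesInCurrentBlock, int hashHit)
{
    assert(hashHit == 0 || hashHit == 1);
    if (!hashHit) return 0;
    /* Skip sync points that would produce a block under the minimum size. */
    return bytesInCurrentBlock >= DEMO_RSYNC_MIN_BLOCK_BYTES;
}
```

A skipped point costs little: scanning continues, and the next hash hit past the minimum is accepted instead.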
Sen Huang
1daf3c8dbc Use 32 buckets for log2 bucketing in huffman sort 2021-09-13 12:29:16 -04:00
Yann Collet
f58e63bee7 Merge branch 'dev' into opt_investigation 2021-09-12 01:42:49 -07:00
Felix Handte
d68aa19a2f
Merge pull request #2749 from felixhandte/zstd-fast-pipelined
Pipelined Implementation of ZSTD_fast (~+5% Speed)
2021-09-09 17:05:30 -04:00
Yann Collet
b7f46ebc23 use ZSTD_memcpy() for better portability
notably within kernel space
2021-09-08 14:45:53 -07:00
Yann Collet
7fce9a41b5 change update rate to 12/11/11/11
better for large files, and sources with relatively "stable" entropy,
like silesia.tar.
slightly worse for files with rapidly changing entropy,
like Calgary.tar.

Updated small files tests in fuzzer
2021-09-08 14:05:57 -07:00
Yann Collet
ef78611c26 change update rate to 11/10/10/10
better for larger blocks,
very small inefficiency on small blocks.
2021-09-08 08:58:28 -07:00
Yann Collet
42a3ed752a removed frequency booster for stat initialization of btultra2
It used to be necessary to counter-balance the fixed-weight frequency update,
which has recently been changed to an adaptive rate (targeting stable starting frequency stats).
2021-09-08 07:56:43 -07:00
Yann Collet
08ceda3dfc new statistics update policy
small general compression ratio improvement for btopt+ strategies.
2021-09-04 00:52:44 -07:00
Yann Collet
23a9368c45 new starting offcode table for zstd_opt 2021-09-03 17:41:42 -07:00
Yann Collet
27a8bbe265 new initializer for ll price 2021-09-03 16:07:31 -07:00
Yann Collet
f0fc8cb3e1 Disable console notification by default within the library
As a library, the default shouldn't be to write anything on console.
`cover` and `fastcover` have a `g_displayLevel` variable to control this behavior.
It's now set to 0 (no display) by default.
Setting notification to a higher level should be an explicit operation by a console application.
2021-09-03 13:44:07 -07:00
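The "silent by default" policy from f0fc8cb3e1 can be sketched with a level-gated logger. The names `g_demoDisplayLevel`, `demo_setDisplayLevel()`, and `demo_display()` are hypothetical. They mirror the idea of a global display level that starts at 0, not the actual `cover` / `fastcover` code.

```c
#include <assert.h>
#include <stdio.h>

/* Library-side default: 0 means nothing is written to the console. */
static int g_demoDisplayLevel = 0;

/* A console application opts in explicitly by raising the level. */
static void demo_setDisplayLevel(int level)
{
    assert(level >= 0);
    g_demoDisplayLevel = level;
}

/* Writes `msg` to stderr only if the global level is at least `level`.
 * Returns 1 if the message was written, 0 if it was suppressed. */
static int demo_display(int level, const char* msg)
{
    if (g_demoDisplayLevel < level) return 0;
    fputs(msg, stderr);
    return 1;
}
```

With the default level, a call such as `demo_display(1, "training...\n")` writes nothing. Output appears only after the application calls `demo_setDisplayLevel()` with a level of 1 or higher.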