3949 Commits

Author SHA1 Message Date
TrianglesPCT
77d54eb3b3
Add files via upload 2021-05-14 16:40:32 -06:00
TrianglesPCT
52f44bb365
Add files via upload
msvc
2021-05-14 16:33:07 -06:00
TrianglesPCT
25bda9053a
Add files via upload
msvc suport
avx2 path
2021-05-14 16:32:04 -06:00
Nick Terrell
03c4111299 [lib] Fix dictionary invalidation logic
Call `ZSTD_enforceMaxDist()` before each block with the beginning of the
block. This ensures that `lowLimit` is updated to `dictLimit` whenever
the ext-dict is out of range, so we can use prefix mode for speed.

This can cause non-determinism because prefix mode and ext-dict mode
match finders can return different results. It can also hurt speed
because ext-dict match finders are slower.

The scenario is:
1. Compress large data with a dictionary.
2. The dictionary goes out of bounds, so we invalidate it.
3. However, we still have `lowLimit < dictLimit`, since it is
   never updated.
4. We will call the ext-dict match finder instead of the prefix one.
2021-05-13 17:05:59 -07:00
Nick Terrell
10b35b312b [lib] Fix off-by-one error in repcode checks
The repcode checks disallowed repcodes that are equal to `windowLow`.
This is slightly inefficient, but isn't a problem on its own. Together
with the next commit, it cause non-determinism.
2021-05-13 17:05:59 -07:00
Nick Terrell
91c9a247b6 [lib] Fix determinism bug in the optimal parser
`ZSTD_insertBt1()` has a speed optimization that skips the prefix of
very long matches.

40def70387/lib/compress/zstd_opt.c (L476)

This optimization is based off the length longest match found. However,
when indices are reset, we only ensure that we can reference the whole
window starting from `ip`. If the previous block ended with a long match
then `nextToUpdate` could be much less than `ip`. It might be far enough
back that `nextToUpdate < maxDist`, so it doesn't have a full window of
data to reference. This can cause non-determinism bugs, because we may
find a match that is beyond `ip - maxDist`, and may sometimes be
un-referencable, and that match triggers the speed optimization.

The fix is to base the `windowLow` off of the `target` of
`ZSTD_updateTree_internal()`, because anything below that value will be
obsolete by the time `ZSTD_updateTree_internal()` completes.
2021-05-13 17:05:59 -07:00
Yann Collet
8fae35591e Merge branch 'dev' of github.com:facebook/zstd into dev 2021-05-12 13:12:30 -07:00
Yann Collet
cb0cad9b79 reduce Max nb Workers to 64 in 32-bit mode
and restored limit to 256 when in 64-bit mode
(it was reduced to 200 to give more room for 32-bit).

This should fix test instability issues
using lot of threads in 32-bit environments.
2021-05-12 13:10:25 -07:00
sen
c730b8c5a3
Remove const data members in threadpooltest payload (#2639) (#2640) 2021-05-12 16:09:48 -04:00
sen
9c23ea9e2b
Bump version to 1.5.0, rebuild documentation (#2634) 2021-05-11 16:32:09 -04:00
Bernhard M. Wiedemann
28d0120b5a Avoid SIGBUS on armv6
When running armv6 userspace on armv8 hardware with a 64 bit Linux kernel,
the mode 2 caused SIGBUS (unaligned memory access).
Running all our arm builds in the build farm
only on armv8 simplifies administration a lot.

Depending on compiler and environment, this change might slow down
memory accesses (did not benchmark it). The original analysis is 6 years old.

Fixes #2632
2021-05-11 17:51:03 +02:00
Yann Collet
9fb5a0407c
Merge pull request #2630 from facebook/gcc9
improved gcc-9 and gcc-10 decoding speed
2021-05-10 10:54:16 -07:00
Yann Collet
334ac69db7
Merge pull request #2628 from skitt/libzstd-nomt-flags
Apply flags to libzstd-nomt in libzstd style
2021-05-08 00:21:59 -07:00
Yann Collet
439e58d060 improved gcc-9 and gcc-10 decoding speed
the new alignment setting is better for gcc-9 and gcc-10
by about ~+5%.

Unfortunately, it's worse for essentially all other compilers.

Make the new alignment setting conditional to gcc-9+.
2021-05-08 00:01:01 -07:00
Yann Collet
5b6d38a99e
Merge pull request #2547 from facebook/d_prefetch_refactor
Refactor prefetching for the decoding loop
2021-05-07 16:28:00 -07:00
Yann Collet
6755baf940 update decoder hot loop alignment
This seems to bring an additional ~+1.2% decompression speed
on average across 10 compilers x 6 scenarios.
2021-05-07 15:18:16 -07:00
Yann Collet
4d9caa4928 Merge branch 'd_prefetch_refactor' of github.com:facebook/zstd into d_prefetch_refactor 2021-05-07 11:30:44 -07:00
Yann Collet
1db5947591 improve decompression speed of long variant by ~+5%
changed strategy,
now unconditionally prefetch the first 2 cache lines,
instead of cache lines corresponding to the first and last bytes of the match.

This better corresponds to cpu expectation,
which should auto-prefetch following cachelines on detecting the sequential nature of the read.

This is globally positive, by +5%,
though exact gains depend on compiler (from -2% to +15%).
The only negative counter-example is gcc-9.
2021-05-07 11:26:14 -07:00
sen
13449d7ce1
Add PHONY targets to makefiles (#2629) 2021-05-07 14:03:19 -04:00
Nick Terrell
66772efe73
Merge pull request #2627 from terrelln/timeout-fix
[lib] Fix fuzzer timeouts by backing off overflow correction
2021-05-07 10:55:26 -07:00
sen
9e94b7cac5
Assert no divison by 0, correct superblocks 0 sequences case (#2592) 2021-05-07 13:26:56 -04:00
Yann Collet
a4d55c8748 Merge branch 'dev' into d_prefetch_refactor 2021-05-07 09:32:53 -07:00
sen
91465e23b2
[1.5.0] Enable multithreading in lib build by default (#2584)
* Update lib Makefile to have new targets

* Update lib/README.md for mt
2021-05-07 11:13:30 -04:00
Stephen Kitt
b2582de3c9
Apply flags to libzstd-nomt in libzstd style
... for consistency (this doesn't actually change the build flags used
in practice, currently).

Signed-off-by: Stephen Kitt <steve@sk2.org>
2021-05-07 13:25:27 +02:00
Nick Terrell
c2555f8c6f [lib] Fix fuzzer timeouts by backing off overflow correction
Linearly back off the frequency of overflow correction based on the
number of times the `ZSTD_window_t` has been overflow corrected. This
will still allow the fuzzer to quickly find overflow correction bugs,
while also keeping good speed for larger inputs.

Additionally, the `nbOverflowCorrections` variable can be useful for
debugging coredumps, since we can inspect the `ZSTD_CCtx` to see if
overflow correction has happened yet.

I've verified this fixes the timeouts in OSS-Fuzz (176 seconds -> 6
seconds). I've also verified that fuzzers and `fuzzer` and `zstreamtest`
still catch the row-hash overflow correction bug.
2021-05-06 22:03:41 -07:00
Yann Collet
ee425faaa7 Merge branch 'dev' into d_prefetch_refactor 2021-05-06 19:49:26 -07:00
Nick Terrell
b052b583e5 [lib] Fix UBSAN warning in ZSTD_decompressSequences() 2021-05-06 15:31:30 -07:00
sen
698f261b35
[1.5.0] Deprecate some functions (#2582)
* Add deprecated macro to zstd.h, mark certain functions as deprecated

* Remove ZSTD_compress.c dependencies on deprecated functions
2021-05-06 17:59:32 -04:00
Nick Terrell
2b82948e58
Merge pull request #2622 from terrelln/zdict-api
[zdict] Add a FAQ to the top of zdict.h
2021-05-06 12:42:56 -07:00
Nick Terrell
1874f0844d [zdict] Add a FAQ to the top of zdict.h
The FAQ covers the questions asked in Issue #2566. It first covers why
you would want to use a dictionary, then what a dictionary is, and
finally it tells you how to train a dictionary, and clarifies some of
the parameters.

There is definitely more that could be said about some of the advanced
trainers, but this should be a good start.
2021-05-06 12:48:19 -07:00
Nick Terrell
207e33bb61
Merge pull request #2616 from terrelln/deterministic-dict
[lib] Add ZSTD_c_deterministicRefPrefix
2021-05-06 11:09:22 -07:00
Nick Terrell
d2925de98a
Merge pull request #2615 from terrelln/stack-space
[lib] Move some ZSTD_CCtx_params off the stack
2021-05-05 19:43:39 -07:00
Nick Terrell
172b4b6ac4 [lib] Add ZSTD_c_deterministicRefPrefix
This flag forces zstd to always load the prefix in ext-dict mode, even
if it happens to be contiguous, to force determinism. It also applies to
dictionaries that are re-processed.

A determinism test case is also added, which fails without
`ZSTD_c_deterministicRefPrefix` and passes with it set.

Question: Should this be the default behavior? It isn't in this PR.
2021-05-05 18:49:56 -07:00
Nick Terrell
eb7e74ccb7 [tests] Set DEBUGLEVEL=2 by default
This allows us to quickly check for compile errors in debug log
messages, which are compiled out when `DEBUGLEVEL < 2`.
2021-05-05 13:29:06 -07:00
Nick Terrell
c2183d7cdf [lib] Move some ZSTD_CCtx_params off the stack
* Take `params` by const reference in `ZSTD_resetCCtx_internal()`.
* Add `simpleApiParams` to the CCtx and use them in the simple API
  functions, instead of creating those parameters on the stack.

I think this is a good direction to move in, because we shouldn't need
to worry about adding parameters to `ZSTD_CCtx_params`, since it should
always be on the heap (unless they become absoultely gigantic).

Some `ZSTD_CCtx_params` are still on the stack in the CDict functions,
but I've left them for now, because it was a little more complex, and we
don't use those functions in stack-constrained currently.
2021-05-05 13:25:16 -07:00
Yann Collet
7ef6d7b36c deeper prefetching pipeline for decompressSequencesLong
pipeline increased from 4 to 8 slots.
This change substantially improves decompression speed when there are long distance offsets.
example with enwik9 compressed at level 22 :
gcc-9 : 947 -> 1039 MB/s
clang-10: 884 -> 946 MB/s

I also checked the "cold dictionary" scenario,
and found a smaller benefit, around ~2%
(measurements are more noisy for this scenario).
2021-05-05 10:04:03 -07:00
Yann Collet
8cde167a27 Merge branch 'dev' into d_prefetch_refactor 2021-05-05 09:13:38 -07:00
Yann Collet
455fd1a067 updated documentation regarding minimum job size 2021-05-05 09:03:11 -07:00
Yann Collet
c077f257b4
Merge pull request #2611 from facebook/smallerJobs
allow jobSize to be as low as 512 KB
2021-05-05 00:03:29 -07:00
Nick Terrell
8389a5122b
Merge pull request #2602 from terrelln/ldm-opt
[LDM] Speed optimization on repetitive data
2021-05-04 23:13:09 -07:00
Nick Terrell
d40f55cd95
Merge pull request #2610 from senhuang42/lazy_underflow_fix
Fix bad integer wraparound in repcode index for fast, dfast, lazy
2021-05-04 23:10:23 -07:00
Nick Terrell
0b88c2582c [test] Add large dict/data --patch-from test
Dictionary size must be > `ZSTD_CHUNKSIZE_MAX`.
2021-05-04 17:31:32 -07:00
Sen Huang
e6c8a5dd40 Fix incorrect usages of repIndex across all strategies 2021-05-04 19:50:55 -04:00
Nick Terrell
94db4398a0 [lib] Always load the dictionary in one go
Dictionaries larger than `ZSTD_CHUNKSIZE_MAX` used to have to be loaded
in multiple segments. Instead, when we detect large dictionaries, ensure
that we reset the context's indicies. Then, for dictionaries larger than
`ZSTD_CURRENT_MAX - 1`, only load the suffix of the dictionary. Finally,
enable DDS for large dictionaries, since we no longer load in multiple
segments.

This simplifes the dictionary loading code, and reduces opportunities
for non-determinism to slip in.
2021-05-04 16:45:25 -07:00
Yann Collet
1026b9fa10 fix rsyncable mode 2021-05-04 15:59:27 -07:00
Nick Terrell
8a8899fc08
Merge pull request #2612 from terrelln/minor-fix
[easy] Rewrite rowHashLog computation
2021-05-04 15:02:00 -07:00
Yann Collet
40cabd0efd
Merge pull request #2608 from facebook/docMinVer
Documented minimum version numbers
2021-05-04 12:10:52 -07:00
Nick Terrell
1ffa80a09e [easy] Rewrite rowHashLog computation
`ZSTD_highbit32(1u << x) == x` when it isn't undefined behavior.
2021-05-04 11:43:20 -07:00
Nick Terrell
a8ecf4ff88
Merge pull request #2597 from terrelln/public-headers
[1.5.0] Move `zstd_errors.h` and `zdict.h` to `lib/` root
2021-05-04 11:28:41 -07:00
Yann Collet
8f86c29c06 allow jobSize to be as low as 512 KB
previous lower limit was 1 MB.

Note : by default, the lowest job size is 2 MB, achieved at level 1.
Even lower job sizes can be achieved by manipulating this value directly,
or manually modifying window sizes to lower amounts.

Updated unit test to ensure that this new limit works fine
(test would fail with previous 1 MB limit).
2021-05-04 11:02:55 -07:00