8751 Commits

Author SHA1 Message Date
sen
d6be7659b0
Add seekable roundtrip fuzzer (#2617) 2021-05-06 10:08:21 -04:00
Yann Collet
26c7b0038e
Merge pull request #2619 from facebook/winbench
improved benchmark experience on Windows
2021-05-05 20:34:31 -07:00
Nick Terrell
d2925de98a
Merge pull request #2615 from terrelln/stack-space
[lib] Move some ZSTD_CCtx_params off the stack
2021-05-05 19:43:39 -07:00
Yann Collet
fed8589430
Merge pull request #2614 from facebook/dlong8
faster speed for decompressSequencesLong
2021-05-05 16:55:40 -07:00
Yann Collet
9750f3c87b improved benchmark experience on Windows
benchmark results are not progressively displayed on Windows terminal.
For long benchmark sessions, nothing is displayed,
until the end, where everything is flushed.

Force display to be flushed after each update.
Updates happen roughly every second, or even less,
so it's not a substantial workload.
2021-05-05 16:52:21 -07:00
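
The flush-after-each-update idea from 9750f3c87b can be sketched as below. This is a minimal illustration, not the benchmark code itself: `display_update` and its arguments are hypothetical names, and the output format is made up.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: prints one progress update and flushes right away,
 * so the line shows up even when the stream is fully buffered. */
static int display_update(FILE* out, int pass, double speedMBs)
{
    int written;
    assert(out != NULL);
    written = fprintf(out, "\rpass %d : %.1f MB/s", pass, speedMBs);
    if (written < 0) return -1;
    /* One fflush() per update; at about one update per second
     * the extra cost is negligible. */
    return fflush(out) == 0 ? written : -1;
}
```

Calling `display_update(stdout, 1, 947.0)` once per measurement keeps the terminal current instead of deferring all output to the end of the session.
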
Felix Handte
b062d97520
Merge pull request #2525 from felixhandte/fix-file-permissions-again
Improve Setting Permissions of Created Files
2021-05-05 17:59:13 -04:00
Nick Terrell
eb7e74ccb7 [tests] Set DEBUGLEVEL=2 by default
This allows us to quickly check for compile errors in debug log
messages, which are compiled out when `DEBUGLEVEL < 2`.
2021-05-05 13:29:06 -07:00
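
Why eb7e74ccb7 helps can be seen in a simplified logging macro. The sketch below is an assumption-laden stand-in for the project's debug header: the macro body is reduced to the essentials and `checked_half` is a hypothetical caller.

```c
#include <assert.h>
#include <stdio.h>

#ifndef DEBUGLEVEL
#  define DEBUGLEVEL 2   /* the default this sketch assumes for test builds */
#endif

#if DEBUGLEVEL >= 2
/* Log statements are real code: format strings and arguments get compiled. */
#  define DEBUGLOG(level, ...)                 \
    do {                                       \
        if ((level) <= DEBUGLEVEL) {           \
            fprintf(stderr, __VA_ARGS__);      \
            fputc('\n', stderr);               \
        }                                      \
    } while (0)
#else
/* Arguments are dropped by the preprocessor: errors inside them go unseen. */
#  define DEBUGLOG(level, ...) do { } while (0)
#endif

/* Hypothetical function containing a debug log statement. */
static unsigned checked_half(unsigned value)
{
    assert(value % 2 == 0);
    DEBUGLOG(2, "halving %u", value);
    return value / 2;
}
```

With `DEBUGLEVEL < 2`, a misspelled variable inside a `DEBUGLOG()` call would never reach the compiler, which is the class of error the new default is meant to surface.
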
Nick Terrell
c2183d7cdf [lib] Move some ZSTD_CCtx_params off the stack
* Take `params` by const reference in `ZSTD_resetCCtx_internal()`.
* Add `simpleApiParams` to the CCtx and use them in the simple API
  functions, instead of creating those parameters on the stack.

I think this is a good direction to move in, because we shouldn't need
to worry about adding parameters to `ZSTD_CCtx_params`, since it should
always be on the heap (unless they become absolutely gigantic).

Some `ZSTD_CCtx_params` are still on the stack in the CDict functions,
but I've left them for now, because it was a little more complex, and we
don't use those functions in stack-constrained environments currently.
2021-05-05 13:25:16 -07:00
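
The two changes in c2183d7cdf follow a common pattern, sketched below with hypothetical stand-in types. `Params`, `Ctx`, `reset_internal`, and `simple_compress_begin` are illustrative names only; the real structures are much larger and are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a large parameter structure. */
typedef struct {
    int compressionLevel;
    int windowLog;
    int reserved[64];      /* padding to suggest a big struct */
} Params;

/* Hypothetical context, assumed to be heap-allocated by its owner. */
typedef struct {
    Params requestedParams;
    Params simpleApiParams; /* scratch params stored with the context */
} Ctx;

/* Takes params by pointer-to-const, so no copy is made at the call. */
static int reset_internal(Ctx* ctx, const Params* params)
{
    assert(ctx != NULL && params != NULL);
    ctx->requestedParams = *params;
    return params->compressionLevel;
}

/* Simple-API entry point: fills the context's own scratch params
 * instead of declaring a large Params local on the stack. */
static int simple_compress_begin(Ctx* ctx, int level)
{
    assert(ctx != NULL);
    ctx->simpleApiParams.compressionLevel = level;
    ctx->simpleApiParams.windowLog = 0;
    return reset_internal(ctx, &ctx->simpleApiParams);
}
```

Because the scratch parameters live inside the context, growing the parameter struct no longer increases the stack usage of the simple API path.
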
W. Felix Handte
4f9c6fdb7f Attempt to Fix Windows Build Error 2021-05-05 13:13:56 -04:00
W. Felix Handte
da61918c75 Also Pass Mode Bits in on Windows
I think in some unix emulation environments on Windows (cygwin?), mode bits
are somehow respected. So we might as well pass them in. Can't hurt.
2021-05-05 13:10:34 -04:00
W. Felix Handte
bea1b2ba70 rm -f in playTests.sh 2021-05-05 13:10:34 -04:00
W. Felix Handte
45c4918ccf Fix Build for Windows 2021-05-05 13:10:34 -04:00
W. Felix Handte
018ed6552a Attempt to Fix stat Format for BSDs 2021-05-05 13:10:34 -04:00
W. Felix Handte
1fb10ba831 Don't Block Removing File on Being Able to Read It
`open()`'s mode bits are only applied to files that are created by the call.
If the output file already exists, but is not readable, the `fopen()` would
fail, preventing us from removing it, which would mean that the file would
not end up with the correct permission bits.

It's not clear to me why the `fopen()` is there at all. `UTIL_isRegularFile()`
should be sufficient, AFAICT.
2021-05-05 13:10:34 -04:00
W. Felix Handte
b87f97b3ea Create Files with Desired Permissions; Avoid chmod(); Remove UTIL_chmod() 2021-05-05 13:10:34 -04:00
W. Felix Handte
4e10ff15f5 Add Tests Checking File Permissions of Created Files 2021-05-05 13:10:34 -04:00
Felix Handte
2d10544b84
Merge pull request #2613 from felixhandte/allow-block-device
Allow Reading from Block Devices with `--force`
2021-05-05 13:06:32 -04:00
Yann Collet
7ef6d7b36c deeper prefetching pipeline for decompressSequencesLong
pipeline increased from 4 to 8 slots.
This change substantially improves decompression speed when there are long distance offsets.
example with enwik9 compressed at level 22 :
gcc-9 : 947 -> 1039 MB/s
clang-10: 884 -> 946 MB/s

I also checked the "cold dictionary" scenario,
and found a smaller benefit, around 2%
(measurements are more noisy for this scenario).
2021-05-05 10:04:03 -07:00
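
The general technique behind 7ef6d7b36c, a ring of in-flight prefetches that are consumed a fixed number of iterations later, can be illustrated on a toy gather loop. This is not the decoder itself: `gather_sum`, `PIPELINE_SLOTS`, and the `PREFETCH` macro are names assumed for the sketch, and `__builtin_prefetch` is a GCC/Clang builtin.

```c
#include <assert.h>
#include <stddef.h>

#define PIPELINE_SLOTS 8                     /* must be a power of two */
#define PIPELINE_MASK  (PIPELINE_SLOTS - 1)

#if defined(__GNUC__) || defined(__clang__)
#  define PREFETCH(ptr) __builtin_prefetch((ptr), 0, 3)
#else
#  define PREFETCH(ptr) ((void)(ptr))        /* no-op fallback */
#endif

/* Sums table[idx[i]] for i in [0, n). Each address is prefetched
 * PIPELINE_SLOTS iterations before it is read, which hides memory
 * latency when consecutive indices are far apart. */
static unsigned long gather_sum(const unsigned char* table,
                                const size_t* idx, size_t n)
{
    const unsigned char* slots[PIPELINE_SLOTS];
    unsigned long sum = 0;
    size_t i;
    size_t const warmup = n < PIPELINE_SLOTS ? n : PIPELINE_SLOTS;
    assert(table != NULL && (idx != NULL || n == 0));

    /* Fill the pipeline: compute and prefetch the first addresses. */
    for (i = 0; i < warmup; i++) {
        slots[i] = table + idx[i];
        PREFETCH(slots[i]);
    }
    /* Steady state: consume the oldest slot, then refill it. */
    for (; i < n; i++) {
        sum += *slots[(i - PIPELINE_SLOTS) & PIPELINE_MASK];
        slots[i & PIPELINE_MASK] = table + idx[i];
        PREFETCH(slots[i & PIPELINE_MASK]);
    }
    /* Drain the entries still in flight. */
    for (i = (n < PIPELINE_SLOTS ? 0 : n - PIPELINE_SLOTS); i < n; i++) {
        sum += *slots[i & PIPELINE_MASK];
    }
    return sum;
}
```

A deeper pipeline gives each prefetch more time to complete before its data is needed, at the cost of more state held in the ring. The best depth depends on the hardware and workload, so it needs measuring, as the commit message does.
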
Yann Collet
455fd1a067 updated documentation regarding minimum job size 2021-05-05 09:03:11 -07:00
Azat Khuzhin
53a60e98de
seekable decompression fixes (#2594)
* seekable_format: fix from-file reading (not in-memory)

It tries to check the buffer boundary, but there is no buffer for
from-file reading.

* seekable_decompression: break when ZSTD_seekable_decompress() returns zero

* seekable_decompression_mem: break when ZSTD_seekable_decompress() returns zero

* seekable_format: cap the offset+len up to the last dOffset

This will allow reading the whole file w/o getting a corruption error if
the offset is more than the data left in the file, i.e.:

    $ ./seekable_compression seekable_compression.c 8192 | head
    $ zstd -cdq seekable_compression.c.zst | wc -c
    4737

Before this patch:

    $ ./seekable_decompression seekable_compression.c.zst 0 10000000 | wc -c
    ZSTD_seekable_decompress() error : Corrupted block detected
    0

After:

    $ ./seekable_decompression seekable_compression.c.zst 0 10000000 | wc -c
    4737
2021-05-05 10:05:41 -04:00
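
The capping described in the last bullet of 53a60e98de amounts to clamping the requested range against the total decompressed size. A minimal sketch, assuming a hypothetical helper `clamp_read_len` and that `totalSize` stands for the last `dOffset`:

```c
#include <assert.h>
#include <stddef.h>

/* Returns how many bytes can be served for a read of `len` bytes starting
 * at decompressed position `offset`, given `totalSize` bytes of
 * decompressed data in total. */
static size_t clamp_read_len(unsigned long long offset, size_t len,
                             unsigned long long totalSize)
{
    size_t result = len;
    if (offset >= totalSize) {
        result = 0;                            /* nothing left to read */
    } else if ((unsigned long long)len > totalSize - offset) {
        result = (size_t)(totalSize - offset); /* cap at end of data */
    }
    assert(result <= len);
    return result;
}
```

With the numbers from the commit message, a request for 10000000 bytes at offset 0 of a 4737-byte file is reduced to 4737 bytes rather than reported as corruption.
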
Yann Collet
c077f257b4
Merge pull request #2611 from facebook/smallerJobs
allow jobSize to be as low as 512 KB
2021-05-05 00:03:29 -07:00
Nick Terrell
8389a5122b
Merge pull request #2602 from terrelln/ldm-opt
[LDM] Speed optimization on repetitive data
2021-05-04 23:13:09 -07:00
Nick Terrell
d40f55cd95
Merge pull request #2610 from senhuang42/lazy_underflow_fix
Fix bad integer wraparound in repcode index for fast, dfast, lazy
2021-05-04 23:10:23 -07:00
Nick Terrell
10e5513113
Merge pull request #2607 from terrelln/deterministic-dict
[lib] Always load the dictionary in one go
2021-05-04 22:48:48 -07:00
Nick Terrell
0b88c2582c [test] Add large dict/data --patch-from test
Dictionary size must be > `ZSTD_CHUNKSIZE_MAX`.
2021-05-04 17:31:32 -07:00
Sen Huang
e6c8a5dd40 Fix incorrect usages of repIndex across all strategies 2021-05-04 19:50:55 -04:00
Nick Terrell
94db4398a0 [lib] Always load the dictionary in one go
Dictionaries larger than `ZSTD_CHUNKSIZE_MAX` used to have to be loaded
in multiple segments. Instead, when we detect large dictionaries, ensure
that we reset the context's indices. Then, for dictionaries larger than
`ZSTD_CURRENT_MAX - 1`, only load the suffix of the dictionary. Finally,
enable DDS for large dictionaries, since we no longer load in multiple
segments.

This simplifies the dictionary loading code, and reduces opportunities
for non-determinism to slip in.
2021-05-04 16:45:25 -07:00
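
The "only load the suffix" step in 94db4398a0 can be sketched as a pointer adjustment. `DictView` and `dict_suffix` are hypothetical names, and `maxDictSize` stands in for the limit the commit derives from `ZSTD_CURRENT_MAX`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical view over the part of a dictionary that gets loaded. */
typedef struct {
    const unsigned char* start;
    size_t size;
} DictView;

/* If the dictionary exceeds maxDictSize, keep only its last
 * maxDictSize bytes; otherwise keep all of it. */
static DictView dict_suffix(const unsigned char* dict, size_t dictSize,
                            size_t maxDictSize)
{
    DictView v;
    assert(dict != NULL || dictSize == 0);
    if (dictSize > maxDictSize) {
        v.start = dict + (dictSize - maxDictSize);
        v.size = maxDictSize;
    } else {
        v.start = dict;
        v.size = dictSize;
    }
    return v;
}
```

Loading one contiguous region in a single pass means the resulting table state depends only on the dictionary content, not on how it was split into segments.
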
Yann Collet
1026b9fa10 fix rsyncable mode 2021-05-04 15:59:27 -07:00
W. Felix Handte
e58e9c7928 Add Test Case (Behind Flag); Run in GitHub Action 2021-05-04 18:43:39 -04:00
Nick Terrell
8a8899fc08
Merge pull request #2612 from terrelln/minor-fix
[easy] Rewrite rowHashLog computation
2021-05-04 15:02:00 -07:00
W. Felix Handte
33f3e293e8 Allow Reading from Block Devices with --force 2021-05-04 16:25:26 -04:00
Yann Collet
40cabd0efd
Merge pull request #2608 from facebook/docMinVer
Documented minimum version numbers
2021-05-04 12:10:52 -07:00
Nick Terrell
1ffa80a09e [easy] Rewrite rowHashLog computation
`ZSTD_highbit32(1u << x) == x` when it isn't undefined behavior.
2021-05-04 11:43:20 -07:00
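
The identity quoted in 1ffa80a09e can be checked with a portable stand-in for `ZSTD_highbit32()`. The `highbit32` below is a simple loop written for this sketch, not the library's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Position of the highest set bit; val must be non-zero. */
static unsigned highbit32(uint32_t val)
{
    unsigned r = 0;
    assert(val != 0);
    while (val >>= 1) r++;
    return r;
}

/* Round trip through a shift: defined only for x < 32, because shifting
 * a 32-bit value by 32 or more is undefined behavior in C. */
static unsigned log_via_shift(unsigned x)
{
    assert(x < 32);
    return highbit32((uint32_t)1 << x);
}
```

For every `x` from 0 to 31, `log_via_shift(x)` returns `x`, so using `x` directly yields the same value without the shift and its undefined-behavior edge case.
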
Nick Terrell
a8ecf4ff88
Merge pull request #2597 from terrelln/public-headers
[1.5.0] Move `zstd_errors.h` and `zdict.h` to `lib/` root
2021-05-04 11:28:41 -07:00
Felix Handte
da74f1c717
Merge pull request #2609 from felixhandte/md5sum-darwin
Detect Presence of `md5` on Darwin
2021-05-04 14:22:54 -04:00
Yann Collet
8f86c29c06 allow jobSize to be as low as 512 KB
previous lower limit was 1 MB.

Note : by default, the lowest job size is 2 MB, achieved at level 1.
Even lower job sizes can be achieved by manipulating this value directly,
or manually modifying window sizes to lower amounts.

Updated unit test to ensure that this new limit works fine
(test would fail with previous 1 MB limit).
2021-05-04 11:02:55 -07:00
Nick Terrell
32823bc150 [LDM] Speed optimization on repetitive data
LDM does especially poorly on repetitive data when that data's hash happens
to have `(hash & stopMask) == 0`, either because `stopMask == 0` or by
random chance. Optimize this case by skipping over repetitive patterns.
The detection is very simplistic, but should catch most of the offending
cases.

```
head -c 1G /dev/zero | perf stat -- ./zstd -1 -o /dev/null -v --zstd=ldmHashRateLog=1 --long
      21.187881087 seconds time elapsed

head -c 1G /dev/zero | perf stat -- ./zstd -1 -o /dev/null -v --zstd=ldmHashRateLog=1 --long
       1.149707921 seconds time elapsed

```
2021-05-04 10:57:42 -07:00
W. Felix Handte
ee122baacf Detect Presence of md5 on Darwin
This fixes #2568.
2021-05-04 12:33:19 -04:00
Yann Collet
8aafbd3604 Documented minimum version numbers
Any stable API entry point introduced after v1.0
should be documented with its minimum version number.

This PR fulfills this requirement,
updating mostly entry points introduced since v1.4.0
and newly introduced ones for the future v1.5.0.
2021-05-04 09:05:22 -07:00
Nick Terrell
6f40571ae2
Merge pull request #2606 from terrelln/test-memory
[tests] Reduce memory usage of MT CLI tests
2021-05-03 21:16:28 -07:00
Nick Terrell
0b370e9da8
Merge pull request #2603 from terrelln/reduce-indices-fuzzer
Bug fix & run overflow correction much more frequently in tests
2021-05-03 19:24:55 -07:00
Nick Terrell
2e4fca38d8 [tests] Reduce memory usage of MT CLI tests
Switch from `-T0` to the default `-T1` which significantly reduces
memory usage for level 19 when there are many cores. This fixes
32-bit issues of running out of address space.

Fixes #2603.
2021-05-03 16:29:11 -07:00
Nick Terrell
34aff7ea06 Bug fix & run overflow correction much more frequently in tests
* Fix overflow correction when `windowLog < cycleLog`. Previously, we
  got the correction wrong in this case, and our chain tables and binary
  trees would be corrupted. Now, we work as long as `maxDist` is a power
  of two, by adding `MAX(maxDist, cycleSize)` to our indices.
* When `ZSTD_WINDOW_OVERFLOW_CORRECT_FREQUENTLY` is defined to non-zero
  run overflow correction as frequently as allowed without impacting
  compression ratio.
* Enable `ZSTD_WINDOW_OVERFLOW_CORRECT_FREQUENTLY` in `fuzzer` and
  `zstreamtest` as well as all the OSS-Fuzz fuzzers. This has a 5-10%
  speed penalty at most, which seems reasonable.
2021-05-03 15:21:47 -07:00
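
The arithmetic in the first bullet of 34aff7ea06 can be worked through in a small sketch. This is one reading of the description, with assumed names (`overflow_correction`, `MAX_U32`), not a copy of the library code.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_U32(a, b) ((a) > (b) ? (a) : (b))

/* Returns the amount to subtract from every index so that `curr` becomes
 * small again, while (1) the low cycleLog bits of curr are preserved and
 * (2) at least maxDist of history stays addressable below the new curr.
 * Assumes maxDist is a power of two. */
static uint32_t overflow_correction(uint32_t curr, uint32_t cycleLog,
                                    uint32_t maxDist)
{
    assert(cycleLog < 32);
    assert(maxDist != 0 && (maxDist & (maxDist - 1)) == 0);

    uint32_t const cycleSize = (uint32_t)1 << cycleLog;
    uint32_t const cycleMask = cycleSize - 1;
    uint32_t const cycle0 = curr & cycleMask;
    /* Avoid zero so that newCurrent - maxDist is at least 1. */
    uint32_t const cycle1 = (cycle0 == 0) ? cycleSize : cycle0;
    /* MAX() of two powers of two is a multiple of cycleSize, so the
     * correction below is a multiple of cycleSize as well. */
    uint32_t const newCurrent = cycle1 + MAX_U32(maxDist, cycleSize);

    assert(curr > newCurrent);
    return curr - newCurrent;
}
```

Adding only `maxDist` when `maxDist < cycleSize` (that is, `windowLog < cycleLog`) would make the correction a non-multiple of the cycle size and shift positions within the cycle. That is consistent with the table corruption the commit describes. Using `MAX(maxDist, cycleSize)` keeps the correction aligned to the cycle.
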
sen
cc31bb8b66
Merge pull request #2598 from senhuang42/reduce_index_rowhash_fix
Fix chaintable check to include rowhash in ZSTD_reduceIndex()
2021-05-03 17:34:39 -04:00
sen
4c5cc345fb
Merge pull request #2581 from senhuang42/lcm_stable
[1.5.0] Promote ZSTD_c_literalCompressionMode to stable params
2021-05-03 11:59:19 -04:00
sen
cdc979ddb3
Merge pull request #2580 from senhuang42/defaultclevel_to_stable
[1.5.0] Promote ZSTD_defaultCLevel() into stable API
2021-05-03 11:59:05 -04:00
senhuang42
61fe571af6 Fix chaintable check to include rowhash in ZSTD_reduceIndex() 2021-04-30 19:52:04 -04:00
Nick Terrell
09149beaf8 [1.5.0] Move zstd_errors.h and zdict.h to lib/ root
`zstd_errors.h` and `zdict.h` are public headers, so they deserve to be
in the root `lib/` directory with `zstd.h`, not mixed in with our private
headers.
2021-04-30 15:13:54 -07:00
Nick Terrell
0e2345b859
Merge pull request #2593 from terrelln/linux-comments
[linux-kernel] Replace kernel-style comments
2021-04-29 17:15:40 -07:00
Nick Terrell
fbb9006e18 [linux-kernel] Replace kernel-style comments
Replace kernel-style comments with regular comments.

E.g.

```
/** Before */

/* After */

/**
 * Before
 */

/*
 * After
 */

/***********************************
 * Before
 ***********************************/

/* *********************************
 * After
 ***********************************/
```
2021-04-29 15:50:23 -07:00