10208 Commits

Author SHA1 Message Date
Yann Collet
9f58241dcc updated version number to v1.5.5
also : updated man pages
2023-03-31 23:02:08 -07:00
Yann Collet
c45eddfa40
Merge pull request #3584 from facebook/fix_o_blockdev
fix decompression with -o writing into a block device
2023-03-31 23:01:50 -07:00
daniellerozenblit
fcaf06ddb4
Check that dest is valid for decompression (#3555)
* add check for valid dest buffer and fuzz on random dest ptr when malloc 0

* add uptrval to linux-kernel

* remove bin files

* get rid of uptrval

* restrict max pointer value check to platforms where sizeof(size_t) == sizeof(void*)
2023-03-31 23:00:55 -07:00
Yann Collet
14d0cd5d69 do not add invocation of UTIL_isRegularFile() 2023-03-31 13:09:52 -07:00
Yann Collet
7b828aaeb5
Merge pull request #3581 from facebook/seekable_readOpt
Seekable format read optimization
2023-03-31 12:26:16 -07:00
Yann Collet
f33a4068b1
Merge pull request #3579 from facebook/clangclwintest
added a Clang-CL Windows test to CI
2023-03-31 12:26:01 -07:00
Yann Collet
c1024af3e3
Merge pull request #3540 from dvoropaev/tests_timeout
Increase tests timeout
2023-03-31 12:25:38 -07:00
Yann Collet
bb7fbd56a6
Merge pull request #3576 from zhuhan0/dev
Couple tweaks to improve decompression speed with clang PGO compilation
2023-03-31 12:25:13 -07:00
Yann Collet
5bf1359e3b fix decompression with -o writing into a block device
decompression features automatic support of sparse files,
aka a form of "compression" where entire blocks consists only of zeroes.
This only works for some compatible file systems (like ext4),
others simply ignore it (like afs).

Triggering this feature relies of `fseek()`.
But `fseek()` is not compatible with non-seekable devices, such as pipes.
Therefore it's disabled for pipes.

However, there are other objects which are not compatible with `fseek()`, such as block devices.

Changed the logic, so that `fseek()` (and therefore sparse write) is only automatically enabled on regular files.

Note that this automatic behavior can always be overridden by explicit commands `--sparse` and `--no-sparse`.

fix #3583
2023-03-31 11:29:16 -07:00
Yoni Gilad
649a9c85c3 seekable_format: Add unit test for multiple decompress calls
This does the following:
1. Compress test data into multiple frames
2. Perform a series of small decompressions and seeks forward, checking
   that compressed data wasn't reread unnecessarily.
3. Perform some seeks forward and backward to ensure correctness.
2023-03-29 21:35:52 -07:00
Yoni Gilad
618bf84e0d seekable_format: Prevent rereading frame when seeking forward
When decompressing a seekable file, if seeking forward within
a frame (by issuing multiple ZSTD_seekable_decompress calls
with a small gap between them), the frame will be unnecessarily
reread from the beginning. This patch makes it continue using
the current frame data and simply skip over the unneeded bytes.
2023-03-29 21:24:12 -07:00
Yann Collet
0f77956bcc added a Clang-CL Windows test to CI
If I understand correctly,
this should trigger the issue notified in #3569.
2023-03-28 22:06:18 -07:00
Yann Collet
871f3a4026
Merge pull request #3569 from tru/linker_flag_fix
Disable linker flag detection on MSVC/ClangCL.
2023-03-28 22:05:07 -07:00
Yann Collet
262e553b23
Merge pull request #3573 from facebook/dependabot/github_actions/github/codeql-action-2.2.8
Bump github/codeql-action from 2.2.6 to 2.2.8
2023-03-28 16:59:41 -07:00
daniellerozenblit
b2ad17a658
mmap for windows (#3557)
* mmap for windows

* remove enabling mmap for testing

* rename FIO dictionary initialization methods + un-const dictionary objects in free functions

* remove enabling mmap for testing

* initDict returns void, underlying setDictBuffer methods return the size of the set buffer

* fix comment
2023-03-28 19:44:53 -04:00
Han Zhu
b558190ac7 Remove clang-only branch hints from ZSTD_decodeSequence
Looking at the __builtin_expect in ZSTD_decodeSequence:

{   size_t offset;
    #if defined(__clang__)
 if (LIKELY(ofBits > 1)) {
    #else
 if (ofBits > 1) {
    #endif
 ZSTD_STATIC_ASSERT(ZSTD_lo_isLongOffset == 1);

From profile-annotated assembly, the probability of ofBits > 1 is about 75%
(101k counts out of 135k counts). This is much smaller than the recommended
likelihood to use __builtin_expect which is 99%. As a result, clang moved the
else block further away which hurts cache locality. Removing this
__built_expect along with two others in ZSTD_decodeSequence gave better
performance when PGO is enabled. I suggest to remove these branch hints and
rely on PGO which leverages runtime profiles from actual workload to calculate
branch probability instead.
2023-03-28 15:36:22 -07:00
Han Zhu
e6dccbf482 Inline BIT_reloadDStream
Inlining `BIT_reloadDStream` provided >3% decompression speed improvement for
clang PGO-optimized zstd binary, measured using the Silesia corpus with
compression level 1. The win comes from improved register allocation which leads
to fewer spills and reloads. Take a look at this comparison of
profile-annotated hot assembly before and after this change:
https://www.diffchecker.com/UjDGIyLz/. The diff is a bit messy, but notice three
fewer moves after inlining.

In general LLVM's register allocator works better when it can see more code. For
example, when the register allocator sees a call instruction, it partitions the
registers into caller registers and callee registers, and it is not free to do
whatever it wants with all the registers for the current function. Inlining the
callee lets the register allocation access all registers and use them more
flexsibly.
2023-03-28 15:36:02 -07:00
Elliot Gorokhovsky
57e1b45920
Merge pull request #3551 from embg/seq_prod_fuzz
Provide an interface for fuzzing sequence producer plugins
2023-03-28 14:20:54 -07:00
Elliot Gorokhovsky
a810e1eeb7 Provide an interface for fuzzing sequence producer plugins 2023-03-28 12:02:57 -07:00
Yann Collet
abb3585c3b
Merge pull request #3568 from facebook/readme_cmake_fat
Add instructions for building Universal2 on macOS via CMake
2023-03-28 10:48:39 -07:00
Felix Handte
93da0416e8
Merge pull request #3574 from felixhandte/pzstd-max-cpp-std
[contrib/pzstd] Detect and Select Maximum Available C++ Standard
2023-03-27 16:44:07 -07:00
W. Felix Handte
cbe0f0e435 Switch Strategies: Only Set -std=c++11 When Default is Older 2023-03-27 18:37:19 -04:00
Yann Collet
c36d54f5ed
Update README.md
fix minor doc mistake (`ninja build` doesn't work)
2023-03-27 09:09:22 -07:00
Yann Collet
7306832e8a
Merge pull request #3570 from facebook/rsync_doc
[easy] minor doc update for --rsyncable
2023-03-27 09:07:13 -07:00
W. Felix Handte
1b8bddc41e [contrib/pzstd] Detect and Select Maximum Available C++ Standard
Rather than remove the flag entirely, as proposed in #3499, this commit uses
the newest C++ standard the compiler supports. This retains the selection of
using only standardized features (excluding GNU extensions) and keeps the
recency requirements of the codebase explicit.

Tested with various versions of `g++` and `clang++`.
2023-03-27 11:24:47 -04:00
Yann Collet
167157dd74
Merge pull request #3572 from facebook/dependabot/github_actions/actions/checkout-3.5.0
Bump actions/checkout from 3.3.0 to 3.5.0
2023-03-27 07:07:55 -07:00
dependabot[bot]
191d22994f
Bump github/codeql-action from 2.2.6 to 2.2.8
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 2.2.6 to 2.2.8.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](16964e90ba...67a35a0858)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-03-27 06:06:11 +00:00
dependabot[bot]
4cf9c7e098
Bump actions/checkout from 3.3.0 to 3.5.0
Bumps [actions/checkout](https://github.com/actions/checkout) from 3.3.0 to 3.5.0.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](ac59398561...8f4b7f8486)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-03-27 06:06:05 +00:00
Yann Collet
35c0c2075e minor doc update on --rsyncable
as requested by @devZer0.
fix #3567
2023-03-23 15:42:27 -06:00
Rick Mark
ca799f84ca Merge branch 'readme_cmake_fat' of github.com:facebook/zstd into readme_cmake_fat 2023-03-23 09:44:06 -07:00
Rick Mark
408bd1e9fe Add instructions for building Universal2 on macOS via CMake 2023-03-23 09:41:31 -07:00
Tobias Hieta
979b047114 Disable linker flag detection on MSVC/ClangCL.
This fixes compilation with clang-cl on Windows. There
is a bug in cmake so that check_linker_flag() doesn't give
the correct result when using link.exe/lld-link.exe.

Details in CMake's gitlab: https://gitlab.kitware.com/cmake/cmake/-/issues/22023

Fixes #3522
2023-03-22 22:13:57 +01:00
Rick Mark
82cf6037ac Add instructions for building Universal2 on macOS via CMake 2023-03-22 11:28:03 -07:00
daniellerozenblit
3e0550ee52
fix window update (#3556) 2023-03-21 13:28:26 -04:00
Nick Terrell
a3c3a38b9b [lazy] Skip over incompressible data
Every 256 bytes the lazy match finders process without finding a match,
they will increase their step size by 1. So for bytes [0, 256) they search
every position, for bytes [256, 512) they search every other position,
and so on. However, they currently still insert every position into
their hash tables. This is different from fast & dfast, which only
insert the positions they search.

This PR changes that, so now after we've searched 2KB without finding
any matches, at which point we'll only be searching one in 9 positions,
we'll stop inserting every position, and only insert the positions we
search. The exact cutoff of 2KB isn't terribly important, I've just
selected a cutoff that is reasonably large, to minimize the impact on
"normal" data.

This PR only adds skipping to greedy, lazy, and lazy2, but does not
touch btlazy2.

| Dataset | Level | Compiler     | CSize ∆ | Speed ∆ |
|---------|-------|--------------|---------|---------|
| Random  |     5 | clang-14.0.6 |    0.0% |   +704% |
| Random  |     5 | gcc-12.2.0   |    0.0% |   +670% |
| Random  |     7 | clang-14.0.6 |    0.0% |   +679% |
| Random  |     7 | gcc-12.2.0   |    0.0% |   +657% |
| Random  |    12 | clang-14.0.6 |    0.0% |  +1355% |
| Random  |    12 | gcc-12.2.0   |    0.0% |  +1331% |
| Silesia |     5 | clang-14.0.6 | +0.002% |  +0.35% |
| Silesia |     5 | gcc-12.2.0   | +0.002% |  +2.45% |
| Silesia |     7 | clang-14.0.6 | +0.001% |  -1.40% |
| Silesia |     7 | gcc-12.2.0   | +0.007% |  +0.13% |
| Silesia |    12 | clang-14.0.6 | +0.011% | +22.70% |
| Silesia |    12 | gcc-12.2.0   | +0.011% |  -6.68% |
| Enwik8  |     5 | clang-14.0.6 |    0.0% |  -1.02% |
| Enwik8  |     5 | gcc-12.2.0   |    0.0% |  +0.34% |
| Enwik8  |     7 | clang-14.0.6 |    0.0% |  -1.22% |
| Enwik8  |     7 | gcc-12.2.0   |    0.0% |  -0.72% |
| Enwik8  |    12 | clang-14.0.6 |    0.0% | +26.19% |
| Enwik8  |    12 | gcc-12.2.0   |    0.0% |  -5.70% |

The speed difference for clang at level 12 is real, but is probably
caused by some sort of alignment or codegen issues. clang is
significantly slower than gcc before this PR, but gets up to parity with
it.

I also measured the ratio difference for the HC match finder, and it
looks basically the same as the row-based match finder. The speedup on
random data looks similar. And performance is about neutral, without the
big difference at level 12 for either clang or gcc.
2023-03-20 11:18:29 -07:00
Peter Pentchev
3b001a38fe Simplify line splitting in the CLI tests 2023-03-20 11:17:43 -07:00
Peter Pentchev
29b8a3d8f2 Fix a Python bytes/int mismatch in CLI tests
In Python 3.x, a single element of a bytes array is returned as
an integer number. Thus, NEWLINE is an int variable, and attempting
to add it to the line array will fail with a type mismatch error
that may be demonstrated as follows:

    [roam@straylight ~]$ python3 -c 'b"hello" + b"\n"[0]'
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    TypeError: can't concat int to bytes
    [roam@straylight ~]$
2023-03-20 11:17:43 -07:00
Yann Collet
e2208242ac
Merge pull request #3553 from facebook/ldm_dict
added documentation for LDM + dictionary compatibility
2023-03-16 11:20:32 -07:00
Nick Terrell
fbd97f305a Deprecated bufferless and block level APIs
* Mark all bufferless and block level functions as deprecated
* Update documentation to suggest not using these functions
* Add `_deprecated()` wrappers for functions that we use internally and
  call those instead
2023-03-16 10:04:15 -07:00
daniellerozenblit
53bad103ce
patch-from speed optimization (#3545)
* patch-from speed optimization: only load portion of dictionary into normal matchfinders

* test regression for x8 multiplier

* fix off-by-one error for bit shift bound

* restrict patchfrom speed optimization to strategy < ZSTD_btultra

* update results.csv

* update regression test
2023-03-14 20:36:56 -04:00
Yann Collet
f4563d87b9 added documentation for LDM + dictionary compatibility 2023-03-14 17:17:21 -07:00
Yann Collet
488e45f38b
Merge pull request #3547 from facebook/seekable_doc
added documentation for the seekable format
2023-03-13 20:25:58 -07:00
Yonatan Komornik
91f4c23e63
Add salt into row hash (#3528 part 2) (#3533)
Part 2 of #3528

Adds hash salt that helps to avoid regressions where consecutive compressions use the same tag space with similar data (running zstd -b5e7 enwik8 -B128K reproduces this regression).
2023-03-13 15:34:13 -07:00
Yonatan Komornik
9420bce8a4
Add init once memory (#3528) (#3529)
- Adds memory type that is guaranteed to have been initialized at least once in the workspace's lifetime.
- Changes tag space in row hash to be based on init once memory.
2023-03-13 13:20:49 -07:00
dependabot[bot]
e2965edd10
Bump github/codeql-action from 2.2.5 to 2.2.6 (#3549)
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 2.2.5 to 2.2.6.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](32dc499307...16964e90ba)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-03-13 10:07:20 -07:00
Yonatan Komornik
a91e91d614
[Bugfix] row hash tries to match position 0 (#3548)
#3543 decreases the size of the tagTable by a factor of 2, which requires using the first tag position in each row for head position instead of a tag.
Although position 0 stopped being a valid match, it still persisted in mask calculation resulting in the matches loops possibly terminating before it should have. The fix skips position 0 to solve this problem.
2023-03-13 10:00:03 -07:00
Yann Collet
dd8cb5a0f1 added documentation for the seekable format
and notably provide additional context for the
Maximum Frame Size parameter.

requested by @P-E-Meunier
at 1df9f36c6c (commitcomment-103856979).
2023-03-10 15:54:31 -08:00
Yonatan Komornik
33e39094e7
Reduce RowHash's tag space size by x2 (#3543)
Allocate half the memory for tag space, which means that we get one less slot for an actual tag (needs to be used for next position index).
The results is a slight loss in compression ratio (up to 0.2%) and some regressions/improvements to speed depending on level and sample. In turn, we get to save 16% of the hash table's space (5 bytes per entry instead of 6 bytes per entry).
2023-03-10 14:15:04 -08:00
Yann Collet
134d332b10
Merge pull request #3544 from facebook/seek_faster
Improved seekable format ingestion speed for small frame size
2023-03-10 12:33:33 -08:00
Yann Collet
1df9f36c6c Improved seekable format ingestion speed for small frame size
As reported by @P-E-Meunier in https://github.com/facebook/zstd/issues/2662#issuecomment-1443836186,
seekable format ingestion speed can be particularly slow
when selected `FRAME_SIZE` is very small,
especially in combination with the recent row_hash compression mode.
The specific scenario mentioned was `pijul`,
using frame sizes of 256 bytes and level 10.

This is improved in this PR,
by providing approximate parameter adaptation to the compression process.

Tested locally on a M1 laptop,
ingestion of `enwik8` using `pijul` parameters
went from 35sec. (before this PR) to 2.5sec (with this PR).
For the specific corner case of a file full of zeroes,
this is even more pronounced, going from 45sec. to 0.5sec.

These benefits are unrelated to (and come on top of) other improvement efforts currently being made by @yoniko for the row_hash compression method specifically.

The `seekable_compress` test program has been updated to allows setting compression level,
in order to produce these performance results.
2023-03-09 18:00:30 -08:00