10316 Commits

Author SHA1 Message Date
Yonatan Komornik
91f4c23e63
Add salt into row hash (#3528 part 2) (#3533)
Part 2 of #3528

Adds hash salt that helps to avoid regressions where consecutive compressions use the same tag space with similar data (running zstd -b5e7 enwik8 -B128K reproduces this regression).
2023-03-13 15:34:13 -07:00
Yonatan Komornik
9420bce8a4
Add init once memory (#3528) (#3529)
- Adds memory type that is guaranteed to have been initialized at least once in the workspace's lifetime.
- Changes tag space in row hash to be based on init once memory.
2023-03-13 13:20:49 -07:00
dependabot[bot]
e2965edd10
Bump github/codeql-action from 2.2.5 to 2.2.6 (#3549)
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 2.2.5 to 2.2.6.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](32dc499307...16964e90ba)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-03-13 10:07:20 -07:00
Yonatan Komornik
a91e91d614
[Bugfix] row hash tries to match position 0 (#3548)
#3543 decreases the size of the tagTable by a factor of 2, which requires using the first tag position in each row for head position instead of a tag.
Although position 0 stopped being a valid match, it still persisted in mask calculation resulting in the matches loops possibly terminating before it should have. The fix skips position 0 to solve this problem.
2023-03-13 10:00:03 -07:00
Yann Collet
dd8cb5a0f1 added documentation for the seekable format
and notably provide additional context for the
Maximum Frame Size parameter.

requested by @P-E-Meunier
at 1df9f36c6c (commitcomment-103856979).
2023-03-10 15:54:31 -08:00
Yonatan Komornik
33e39094e7
Reduce RowHash's tag space size by x2 (#3543)
Allocate half the memory for tag space, which means that we get one less slot for an actual tag (needs to be used for next position index).
The results is a slight loss in compression ratio (up to 0.2%) and some regressions/improvements to speed depending on level and sample. In turn, we get to save 16% of the hash table's space (5 bytes per entry instead of 6 bytes per entry).
2023-03-10 14:15:04 -08:00
Yann Collet
134d332b10
Merge pull request #3544 from facebook/seek_faster
Improved seekable format ingestion speed for small frame size
2023-03-10 12:33:33 -08:00
Yann Collet
1df9f36c6c Improved seekable format ingestion speed for small frame size
As reported by @P-E-Meunier in https://github.com/facebook/zstd/issues/2662#issuecomment-1443836186,
seekable format ingestion speed can be particularly slow
when selected `FRAME_SIZE` is very small,
especially in combination with the recent row_hash compression mode.
The specific scenario mentioned was `pijul`,
using frame sizes of 256 bytes and level 10.

This is improved in this PR,
by providing approximate parameter adaptation to the compression process.

Tested locally on a M1 laptop,
ingestion of `enwik8` using `pijul` parameters
went from 35sec. (before this PR) to 2.5sec (with this PR).
For the specific corner case of a file full of zeroes,
this is even more pronounced, going from 45sec. to 0.5sec.

These benefits are unrelated to (and come on top of) other improvement efforts currently being made by @yoniko for the row_hash compression method specifically.

The `seekable_compress` test program has been updated to allows setting compression level,
in order to produce these performance results.
2023-03-09 18:00:30 -08:00
Felix Handte
d55a6483d7
Merge pull request #3542 from felixhandte/pin-moar-action-deps
Pin Moar Action Dependencies
2023-03-09 16:22:11 -08:00
W. Felix Handte
cd9486031d Also Pin Dockerfile Dependency Hashes 2023-03-09 17:01:22 -05:00
Felix Handte
283c228abe
Merge pull request #3541 from felixhandte/fix-setvbuf-segfault
Avoid Segfault Caused by Calling `setvbuf()` on Null File Pointer
2023-03-09 13:54:11 -08:00
Yann Collet
e769da1645
Merge pull request #3526 from facebook/bench_zstd_api
Simplify benchmark unit invocation API from CLI
2023-03-09 13:11:11 -08:00
Yann Collet
6bedef8095
Merge pull request #3538 from facebook/doc_huffman
added clarifications for sizes of compressed huffman blocks and streams.
2023-03-09 13:09:42 -08:00
daniellerozenblit
e0fc9fd90b
Merge pull request #3486 from daniellerozenblit/patch-from-low-memory-mode
Mmap large dictionaries in patch-from mode
2023-03-09 15:30:09 -05:00
Nick Terrell
c40c7378c6 Clarify dstCapacity requirements
Clarify `dstCapacity` requirements for single-pass functions.

Fixes #3524.
2023-03-09 10:18:30 -08:00
W. Felix Handte
1ec556238e Pin Moar Action Dependencies
An offering to the Scorecard gods, may they have mercy on our souls.
2023-03-09 12:54:07 -05:00
W. Felix Handte
957a0ae52d Add CLI Test 2023-03-09 12:48:11 -05:00
W. Felix Handte
c4c3e11958 Avoid Calling setvbuf() on Null File Pointer 2023-03-09 12:47:40 -05:00
W. Felix Handte
50e8f55e7d Fix Python 3.6 Incompatibility in CLI Tests 2023-03-09 12:46:37 -05:00
Dmitriy Voropaev
b7080f4c67 Increase tests timeout
Current timeout is too small for some slower machines, e.g. most modern riscv64 boards,
    where tests fail with the following diagnostics:

    	Traceback (most recent call last):
    	  File "/usr/src/RPM/BUILD/zstd-1.5.4-alt2/tests/./cli-tests/run.py", line 734, in <module>
    	    success = run_tests(tests, opts)
    	  File "/usr/src/RPM/BUILD/zstd-1.5.4-alt2/tests/./cli-tests/run.py", line 601, in run_tests
    	    tests[test_case.name] = test_case.run()
    	  File "/usr/src/RPM/BUILD/zstd-1.5.4-alt2/tests/./cli-tests/run.py", line 285, in run
    	    return self.analyze()
    	  File "/usr/src/RPM/BUILD/zstd-1.5.4-alt2/tests/./cli-tests/run.py", line 275, in analyze
    	    self._join_test()
    	  File "/usr/src/RPM/BUILD/zstd-1.5.4-alt2/tests/./cli-tests/run.py", line 330, in _join_test
    	    (stdout, stderr) = self._test_process.communicate(timeout=self._opts.timeout)
    	  File "/usr/lib64/python3.10/subprocess.py", line 1154, in communicate
    	    stdout, stderr = self._communicate(input, endtime, timeout)
    	  File "/usr/lib64/python3.10/subprocess.py", line 2006, in _communicate
    	    self._check_timeout(endtime, orig_timeout, stdout, stderr)
    	  File "/usr/lib64/python3.10/subprocess.py", line 1198, in _check_timeout
    	    raise TimeoutExpired(
    	subprocess.TimeoutExpired: Command '['/usr/src/RPM/BUILD/zstd-1.5.4-alt2/tests/cli-tests/compression/window-resize.sh']' timed out after 60 seconds
2023-03-09 16:31:05 +04:00
Danielle Rozenblit
70850eb72b assert to ensure that dict buffer type is valid 2023-03-08 16:54:57 -08:00
Yann Collet
64e8511b26 added clarifications for sizes of compressed huffman blocks and streams. 2023-03-08 15:31:36 -08:00
Nick Terrell
07a2a33135 Add ZSTD_set{C,F,}Params() helper functions
* Add ZSTD_setFParams() and ZSTD_setParams()
* Modify ZSTD_setCParams() to use ZSTD_setParameter() to avoid a second path setting parameters
* Add unit tests
* Update documentation to suggest using them to replace deprecated functions

Fixes #3396.
2023-03-08 09:57:35 -08:00
Danielle Rozenblit
96e55c14f2 ability to disable mmap + struct to manage FIO dictionary 2023-03-08 08:06:10 -08:00
Nick Terrell
6313a58e45 [linux-kernel] Fix assert definition
Backport upstream fix of the assert definition. This code is currently unused, and can be enabled for testing, which is why it wasn't caught.

https://lore.kernel.org/lkml/20230129131436.1343228-1-j.neuschaefer@gmx.net/
2023-03-07 16:53:36 -08:00
Yonatan Komornik
988ce61a0c
Adds initialization of clevel to static cdict (#3525) (#3527)
- Initializes clevel in `ZSTD_CCtxParams_init`
- Adds CI workflow for msan fuzzers runs without optimization (`-O0`)
- Fixes Makefile to correctly pass on user defined `MOREFLAGS` and `FUZZER_FLAGS` in cases they have been overwritten
2023-03-06 18:05:12 -08:00
Yann Collet
1e38e07b3d simplified BMK_benchFilesAdvanced() 2023-03-06 12:34:13 -08:00
Yann Collet
9efc14804e minor: fixed zlib wrapper internal benchmark
another possibility could be to link it to programs/benchfn .
Not worth the effort.
2023-03-06 12:20:06 -08:00
Yann Collet
db79219f70 simplify BMK_syntheticTest() 2023-03-06 12:15:22 -08:00
Yann Collet
db7d7b6974
Merge pull request #3516 from dloidolt/fullbench_2_files
fullbench with two files
2023-03-06 11:56:30 -08:00
dependabot[bot]
1be95291a8
Bump github/codeql-action from 2.2.4 to 2.2.5 (#3518)
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 2.2.4 to 2.2.5.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](17573ee1cc...32dc499307)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-03-02 10:44:06 -08:00
Yann Collet
bd86e24637
Merge pull request #3513 from DimitriPapadopoulos/codespell
Fix typos found by codespell
2023-02-27 11:44:31 -08:00
Yann Collet
e1ab6913ad
Merge pull request #3514 from facebook/spec_huffman
Clarify zstd specification for Huffman blocks
2023-02-23 15:35:00 -08:00
Nick Terrell
395a2c5462 [bug-fix] Fix rare corruption bug affecting the block splitter
The block splitter confuses sequences with literal length == 65536 that use a
repeat offset code. It interprets this as literal length == 0 when deciding the
meaning of the repeat offset, and corrupts the repeat offset history. This is
benign, merely causing suboptimal compression performance, if the confused
history is flushed before the end of the block, e.g. if there are 3 consecutive
non-repeat code sequences after the mistake. It also is only triggered if the
block splitter decided to split the block.

All that to say: This is a rare bug, and requires quite a few conditions to
trigger. However, the good news is that if you have a way to validate that the
decompressed data is correct, e.g. you've enabled zstd's checksum or have a
checksum elsewhere, the original data is very likely recoverable. So if you were
affected by this bug please reach out.

The fix is to remind the block splitter that the literal length is actually 64K.
The test case is a bit tricky to set up, but I've managed to reproduce the issue.

Thanks to @danlark1 for alerting us to the issue and providing us a reproducer!
2023-02-23 10:54:31 -08:00
Dominik Loidolt
4b9e3d11a6 When benchmarking two files with fullbench, the second file will not be benchmarked because the benchNb has not been reset to zero. 2023-02-20 16:36:26 +01:00
Yann Collet
832f559b0b clarify zstd specification for Huffman blocks
Following detailed comments from @dweiller in #3508.
2023-02-18 18:18:16 -08:00
Dimitri Papadopoulos
547794ef40
Fix typos found by codespell 2023-02-18 10:31:48 +01:00
Yann Collet
4ebaf36582
Merge pull request #3490 from eli-schwartz/meson-tests-noprograms
meson: always build the zstd binary when tests are enabled
2023-02-16 11:27:27 -08:00
Sutou Kouhei
8420502ef9 Don't require CMake 3.18 or later
fix #3500

CMake 3.18 or later was required by #3392. Because it uses
`CheckLinkerFlag`. But requiring CMake 3.18 or later is a bit
aggressive. Because Ubuntu 20.04 LTS still uses CMake 3.16.3:
https://packages.ubuntu.com/search?keywords=cmake

This change disables `-z noexecstack` check with old CMake. This will
not break any existing users. Because users who need `-z noexecstack`
must already use CMake 3.18 or later.
2023-02-16 10:08:45 -08:00
Felix Handte
1c42844668
Merge pull request #3479 from felixhandte/faster-file-ops
Use `f`-variants of `chmod()` and `chown()`
2023-02-16 13:07:34 -05:00
Felix Handte
3c50854c05
Merge pull request #3511 from felixhandte/fix-release-artifact-upload-permission
Fix Permissions on Publish Release Artifacts Job
2023-02-15 13:35:04 -05:00
daniellerozenblit
345ed63976
Merge pull request #3509 from daniellerozenblit/fix-window-resize-test
Fix cli-tests issues
2023-02-15 13:32:07 -05:00
W. Felix Handte
d54ad3c234 Fix Permissions on Publish Release Artifacts Job
Publishing release artifacts requires the `contents` permission, as documented
by: https://docs.github.com/en/rest/overview/permissions-required-for-github-apps.
2023-02-15 11:05:54 -05:00
Danielle Rozenblit
d3d0b92e5e add make test for 32bit 2023-02-15 06:03:02 -08:00
Danielle Rozenblit
7da1c6ddbf fix cli-tests issues 2023-02-14 11:33:26 -08:00
Danielle Rozenblit
2d8afd9ce1 add manual flag to mmap dictionary 2023-02-14 09:42:23 -08:00
Yonatan Komornik
6a86db11a4
CI workflow to test external compressors dependencies
Implemented CI workflow for testing compilation with external compressors and without them. This serves as a sanity check to avoid any code dependencies on libraries that may not always be present. (Reference: #3497 for a bug fix related to this issue.)
2023-02-13 18:00:13 -08:00
Yonatan Komornik
727d03161f
Make Github workflows permissions read-only by default (#3488)
* Make Github workflows permissions read-only by default

* Pins `skx/github-action-publish-binaries` action to specific hash
2023-02-13 16:57:05 -08:00
Alex Xu
886de7bc04
Use correct types in LZMA comp/decomp (#3497)
Bytef and uInt are zlib types, not available when zlib is disabled

Fixes: 1598e6c634ac ("Async write for decompression")
Fixes: cc0657f27d81 ("AsyncIO compression part 2 - added async read and asyncio to compression code (#3022)")
2023-02-13 16:30:56 -08:00
Danielle Rozenblit
8a189b1b29 refactor dictionary file stat 2023-02-13 15:23:06 -08:00