1888 Commits

Author SHA1 Message Date
daniellerozenblit
9116000be6
Merge pull request #3439 from daniellerozenblit/sequence-validation-bug-fix
Fix sequence validation and seqStore bounds check
2023-01-23 13:50:37 -05:00
Danielle Rozenblit
7fc00c18b8 calloc dictionary in sequence compression fuzzer rather than generating a random buffer 2023-01-23 10:42:09 -08:00
Danielle Rozenblit
815d1d4eda update external sequence error to fit error naming scheme 2023-01-23 09:58:34 -08:00
Danielle Rozenblit
f75afb613f merge dev 2023-01-23 08:12:19 -08:00
Danielle Rozenblit
1b65727e74 fix nits and add new error code for invalid external sequences 2023-01-23 07:59:02 -08:00
Danielle Rozenblit
638d502002 modify sequence compression api fuzzer 2023-01-23 07:55:11 -08:00
Yann Collet
cee6bec9fa refactor : --rm is ignored with stdout
The `zstd` CLI has progressively moved to a policy of
ignoring the `--rm` command when the output is `stdout`.
The primary driver is to offer behavior more consistent with `gzip`,
where `--rm` is the default but is also ignored when the output is `stdout`.
Other policies are certainly possible, but would break from this `gzip` convention.

The new policy was inconsistently enforced, depending on the exact list of commands.
For example, it was possible to circumvent it by using `-c --rm` in this order,
which would re-establish source removal.

- Updated the CLI so that it always catches these situations and ensures that `--rm` is disabled when output is `stdout`.
- Added a warning message in this case (for verbosity 3 `-v`).
- Added an `assert()`, which verifies that `--rm` is no longer active with `stdout`
- Added tests, which verify the behavior, even when `--rm` is added after `-c`
- Removed some legacy code which was trying to apply a specific policy for the `stdout` + `--rm` case, which is no longer possible
2023-01-20 18:04:55 -08:00
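The order-independent policy described in the commit above can be illustrated with a minimal Python sketch. The function name `resolve_remove_source` and the simplified flag handling are hypothetical and do not mirror the actual `zstd` CLI code; only `-c`, `--stdout`, and `--rm` are modeled.

```python
def resolve_remove_source(args):
    """Return True if the source file should be removed after processing.

    Hypothetical helper: flags are collected first, and the stdout rule is
    applied once at the end, so the order of `-c` and `--rm` cannot matter.
    """
    remove_source = False
    to_stdout = False
    for arg in args:
        if arg == "--rm":
            remove_source = True
        elif arg in ("-c", "--stdout"):
            to_stdout = True
    # Final check, applied after all arguments are parsed:
    # source removal is always disabled when writing to stdout.
    if to_stdout and remove_source:
        remove_source = False
    # Plays the role of the assert() mentioned in the commit message.
    assert not (to_stdout and remove_source)
    return remove_source
```

Because the decision is made after parsing rather than while each flag is read, `-c --rm` and `--rm -c` resolve to the same result.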
Felix Handte
3d25502c2d
Merge pull request #3432 from felixhandte/fix-perms
Fix CLI Handling of Permissions and Ownership (Again)
2023-01-20 19:19:05 -05:00
Nick Terrell
b4467c1061 Fix bufferless API with attached dictionary
Fixes #3102.
2023-01-20 16:15:16 -08:00
Nick Terrell
329169189c Replace Huffman boolean args with flags bit set 2023-01-20 14:12:53 -08:00
Nick Terrell
0cc1b0cb22 Delete unused Huffman functions
Remove all Huffman functions that aren't used by zstd.
2023-01-20 14:12:53 -08:00
Nick Terrell
667eb6d4fd [versions-test] Work around bug in dictionary builder for older versions
Older versions of zstandard have a bug in the dictionary builder, that
can cause dictionary building to fail. The process still exits 0, but
the dictionary is not created.

For reference, the bug is that it creates a dictionary that starts with
the zstd dictionary magic, in the process of writing the dictionary header,
but the header isn't fully written yet, and zstd fails compressions in
this case, because the dictionary is malformed. We fixed this later on
by trying to load the dictionary as a zstd dictionary, but if that fails
we fall back to content only (by default).

The fix is to:
1. Make the dictionary deterministic by sorting the input files.
   Previously the bug would only sometimes occur, when the input files
   were in a particular order.
2. If dictionary creation fails, fall back to the `head` dictionary.
2023-01-20 14:05:36 -08:00
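The two-step workaround above could be sketched roughly as follows. This is an illustration only, not the code from the versions test: `make_dictionary`, the `train` callable, and the `fallback_dict` argument are hypothetical names introduced here.

```python
import os


def make_dictionary(input_files, train, fallback_dict, out_path):
    """Try to train a dictionary; return `fallback_dict` if training fails.

    `train(sorted_files, out_path)` is a hypothetical callable standing in
    for the dictionary builder of an older zstd version. It may return
    normally without writing `out_path`, so its exit status is not trusted.
    """
    # Step 1: sort the inputs so every run sees the same file order.
    sorted_files = sorted(input_files)
    train(sorted_files, out_path)
    # Step 2: treat the build as successful only if the dictionary file
    # actually exists and is non-empty; otherwise use the fallback.
    if os.path.isfile(out_path) and os.path.getsize(out_path) > 0:
        return out_path
    return fallback_dict
```

Checking for the output file, rather than the return status, matches the failure mode described in the message: the process exits 0 but no dictionary is created.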
Nick Terrell
666944fbe6 Cap hashLog & chainLog to ensure that we only use 32 bits of hash
* Cap shortCache chainLog to 24
* Cap row match finder hashLog so that rowLog <= 24
* Add unit tests to expose all cases. The row match finder unit tests
  are only run in 64-bit mode, because they allocate ~1GB.

Fixes #3336
2023-01-20 14:05:26 -08:00
Danielle Rozenblit
aa385ece13 fix sequence validation and bounds check in ZSTD_copySequencesToSeqStore() 2023-01-20 10:32:35 -08:00
Elliot Gorokhovsky
f593e54ee1
Enable if == 1 rather than if == 0
Co-authored-by: Nick Terrell <nickrterrell@gmail.com>
2023-01-20 11:41:53 -05:00
Elliot Gorokhovsky
3f9f568aa6 Fuzz the external matchfinder API 2023-01-19 13:33:25 -08:00
Elliot Gorokhovsky
bce0382c82
Bugfixes for the External Matchfinder API (#3433)
* external matchfinder bugfixes + tests

* small doc fix
2023-01-19 10:41:24 -05:00
daniellerozenblit
dc1c6cc5df
Merge pull request #3418 from daniellerozenblit/fuzz-max-block-size
Fuzz on maxBlockSize
2023-01-19 08:18:04 -05:00
Yann Collet
bbe65d760c
Merge pull request #3423 from facebook/ptime
Refactor timefn, restore support for clock_gettime()
2023-01-18 13:27:42 -08:00
W. Felix Handte
7a8c8f3fe7 Easy: Print Mode as Octal in chmod() Trace 2023-01-18 11:57:54 -08:00
W. Felix Handte
0d2d460223 Mimic gzip chown(gid), chmod(), chown(uid) Behavior
Avoids a race condition in which we unintentionally open up permissions to
the wrong group.
2023-01-18 11:57:54 -08:00
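A minimal Python sketch of the call ordering named in the commit title, assuming a POSIX system and a destination file that starts out with restrictive permissions. The helper name `copy_owner_and_mode` is hypothetical; the actual CLI is written in C and is not reproduced here.

```python
import os


def copy_owner_and_mode(dst_path, uid, gid, mode):
    """Apply ownership and permissions in the gzip-like order.

    The group is changed first, while the file still has its initial
    permissions; then the mode is applied; then the owner is changed.
    """
    os.chown(dst_path, -1, gid)  # 1. chown(gid): -1 leaves the uid unchanged
    os.chmod(dst_path, mode)     # 2. chmod(): group bits now apply to the intended group
    os.chown(dst_path, uid, -1)  # 3. chown(uid): -1 leaves the gid unchanged
```

Applying the mode only after the group has been set means the group permission bits are never granted to whatever group the file happened to have at creation time.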
W. Felix Handte
1e3eba65a6 Copy Permissions from Source File 2023-01-18 11:57:35 -08:00
W. Felix Handte
0382076af7 Re-Use stat_t in FIO_compressFilename_srcFile() 2023-01-18 11:33:07 -08:00
Nick Terrell
860548cd5b [tests] Fix version test determinism
The dictionary source files were taken from the `dev` branch before this
commit, which could introduce non-determinism on PR jobs. Instead take
the sources from the PR checkout.

This PR also adds stderr logging, and verbose output for the jobs that
are failing, to help catch the failure if it occurs again.
2023-01-17 14:10:46 -08:00
W. Felix Handte
a5ed28f1fb Use Existing Src File Stat in *_dstFile() Funcs
One fewer `stat()` call to make per operation!
2023-01-17 14:08:22 -08:00
Danielle Rozenblit
8353a4b095 fix maxBlockSize resolution + add test cases 2023-01-17 12:24:18 -08:00
Yann Collet
2086e7396e missing #include for Windows 2023-01-13 11:38:27 -08:00
Yann Collet
bcfb7ad03c refactor timefn
The timer storage type is no longer dependent on OS.
This will make it possible to re-enable posix precise timers
since the timer storage type will no longer be sensitive to #include order.
See #3168 for details of the problems with the previous interface.

Suggestion by @terrelln
2023-01-12 19:24:31 -08:00
Nick Terrell
5b266196a4 Add support for in-place decompression
* Add a function and macro ZSTD_decompressionMargin() that computes the
  decompression margin for in-place decompression. The function computes
  a tight margin that works in all cases, and the macro computes an upper
  bound that will only work if flush isn't used.
* When doing in-place decompression, make sure that our output buffer
  doesn't overlap with the input buffer. This ensures that we don't
  decide to use the portion of the output buffer that overlaps the input
  buffer for temporary memory, like for literals.
* Add a simple unit test.
* Add in-place decompression to the simple_round_trip and
  stream_round_trip fuzzers. This should help verify that our margin stays
  correct.
2023-01-12 16:28:08 -08:00
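The buffer arrangement for in-place decompression can be sketched as simple offset arithmetic. This assumes the layout in which the output occupies the start of a shared buffer of size (decompressed size + margin) and the compressed input is placed at the very end of that buffer. The function `in_place_layout` is hypothetical, and the margin is taken as an input rather than computed.

```python
def in_place_layout(compressed_size, decompressed_size, margin):
    """Compute offsets for in-place decompression within one shared buffer.

    `margin` is assumed to come from a call such as ZSTD_decompressionMargin();
    this sketch does not compute it.
    Returns (buffer_size, output_offset, input_offset).
    """
    buffer_size = decompressed_size + margin
    if compressed_size > buffer_size:
        raise ValueError("compressed data does not fit in the shared buffer")
    output_offset = 0                             # output starts at the front
    input_offset = buffer_size - compressed_size  # input sits at the very end
    return buffer_size, output_offset, input_offset
```

With this arrangement the decoder writes from the front while reading from the back, and the margin is what keeps the write position from overtaking unread input.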
Yann Collet
423500d1ae
Merge pull request #3413 from facebook/timefn
minor refactoring for timefn
2023-01-12 15:34:00 -08:00
Danielle Rozenblit
06b096db47 additional tests and documentation updates + allow maxBlockSize to be set to 0 (goes to default) 2023-01-12 13:41:50 -08:00
Danielle Rozenblit
53eb5a758c add simple test for maxBlockSize expected functionality 2023-01-12 08:55:39 -08:00
Danielle Rozenblit
1fffcfe01d update minimum threshold for max block size 2023-01-11 11:09:57 -08:00
Daniel Kutenin
ca2ff788df Make the producer use the same amount of entropy 2023-01-11 10:09:19 -08:00
Daniel Kutenin
3ac0b91302 Fix fuzzing with ZSTD_MULTITHREAD
At Google we fuzz zstd without ZSTD_MULTITHREAD but we want inputs to be as reproducible as possible. This allows us to test new fuzzing methods for our fuzz team internally and gives us more horsepower to find bugs
2023-01-11 10:09:19 -08:00
Danielle Rozenblit
fe08137d9a resolve max block value in cctx and use when calculating the max block size 2023-01-09 07:53:53 -08:00
Yann Collet
8b130009e3 minor simplification refactoring for timefn
`UTIL_getSpanTimeMicro()` can be factored in a generic way,
reducing OS-dependent code.
2023-01-06 16:12:54 -08:00
Danielle Rozenblit
908e812733 initial commit 2023-01-04 13:01:54 -08:00
Yann Collet
c79fb4d78d update levels.sh test
Comparing level 19 to level 22 and expecting a strictly better result from level 22
is not guaranteed to succeed,
because levels 19 and 22 are very close to each other,
especially for small files,
so any noise in the final compression result
can make this test fail.

Level 22 could be compared to something much lower, like level 15,
but level 19 is required anyway, because there is a clamping test which depends on it.

Removed level 22, kept level 19
2023-01-03 14:04:41 -08:00
Yann Collet
ebba9ff425 update regression results 2023-01-03 14:04:23 -08:00
daniellerozenblit
1c818e3a0a
Merge pull request #3302 from daniellerozenblit/optimal-huff-depth-speed
Optimal huff depth speed improvements
2023-01-03 12:51:51 -05:00
Danielle Rozenblit
87becc567d update regression results.csv 2023-01-03 08:41:40 -08:00
Yann Collet
bcbd395c1c
Merge pull request #3395 from terrelln/2022-12-21-deprecated-test
[tests] Remove deprecated function from longmatch.c test
2022-12-28 15:49:50 -08:00
Yann Collet
481a2e1010
Merge pull request #3403 from facebook/setCParams
ZSTD_CCtx_setCParams
2022-12-28 14:07:13 -08:00
Elliot Gorokhovsky
2a402626dd
External matchfinder API (#3333)
* First building commit with sample matchfinder

* Set up ZSTD_externalMatchCtx struct

* move seqBuffer to ZSTD_Sequence*

* support non-contiguous dictionary

* clean up parens

* add clearExternalMatchfinder, handle allocation errors

* Add useExternalMatchfinder cParam

* validate useExternalMatchfinder cParam

* Disable LDM + external matchfinder

* Check for static CCtx

* Validate mState and mStateDestructor

* Improve LDM check to cover both branches

* Error API with optional fallback

* handle RLE properly for external matchfinder

* nit

* Move to a CDict-like model for resource ownership

* Add hidden useExternalMatchfinder bool to CCtx_params_s

* Eliminate malloc, move to cwksp allocation

* Handle CCtx reset properly

* Ensure seqStore has enough space for external sequences

* fix capitalization

* Add DEBUGLOG statements

* Add compressionLevel param to matchfinder API

* fix c99 issues and add a param combination error code

* nits

* Test external matchfinder API

* C90 compat for simpleExternalMatchFinder

* Fix some @nocommits and an ASAN bug

* nit

* nit

* nits

* forward declare copySequencesToSeqStore functions in zstd_compress_internal.h

* nit

* nit

* nits

* Update copyright headers

* Fix CMake zstreamtest build

* Fix copyright headers (again)

* typo

* Add externalMatchfinder demo program to make contrib

* Reduce memory consumption for small blockSize

* ZSTD_postProcessExternalMatchFinderResult nits

* test sum(matchlen) + sum(litlen) == srcSize in debug builds

* refExternalMatchFinder -> registerExternalMatchFinder

* C90 nit

* zstreamtest nits

* contrib nits

* contrib nits

* allow block splitter + external matchfinder, refactor

* add windowSize param

* add contrib/externalMatchfinder/README.md

* docs

* go back to old RLE heuristic because of the first block issue

* fix initializer element is not a constant expression

* ref contrib from zstd.h

* extremely pedantic compiler warning fix, meson fix, typo fix

* Additional docs on API limitations

* minor nits

* Refactor maxNbSeq calculation into a helper function

* Fix copyright
2022-12-28 16:45:14 -05:00
Yann Collet
89342d1e07 New xp library symbol : ZSTD_CCtx_setCParams()
Inspired by #3395,
offer a new capability to set all parameters defined in a ZSTD_compressionParameters structure
with a single symbol invocation
to improve user code brevity.
2022-12-27 23:49:22 -08:00
Yann Collet
90597d78ea
Merge pull request #3394 from terrelln/issue-3010
[cli-tests] Test file stat read/write
2022-12-27 16:20:05 -08:00
Nick Terrell
7fe7a166c2 [cli-tests] Add tests that use --trace-file-stat
Basic tests for (de)compressing in the following modes:
* file to file
* file to stdout
* stdin to file
* stdin to stdout

These are basic tests, and aren't testing more advanced scenarios, but
they lay the groundwork for more complex tests as needed.

Fixes #3010.
2022-12-21 18:32:12 -08:00
Nick Terrell
4b40e405d3 [tests] Remove deprecated function from longmatch.c test
Thanks to @eli-schwartz for pointing it out!

We should maybe consider adding a helper function for applying
`ZSTD_parameters` and `ZSTD_compressionParameters` to a context.
That would aid the transition to the new API in situations like this.
2022-12-21 17:52:10 -08:00
Nick Terrell
40a7188130 Fix make clangbuild & add CI
Fix the errors for:
* `-Wdocumentation`
* `-Wconversion` except `-Wsign-conversion`
2022-12-21 17:31:04 -08:00