9456 Commits

Author SHA1 Message Date
Nick Terrell
f3096ff6d1 [test] Add new CLI testing platform
Adds the new CLI testing platform that I'm proposing.
See the added `README.md` for details.
2022-01-27 13:56:59 -08:00
Nick Terrell
f088c430e3 [datagen] Remove extra newline printed
`datagen` was printing a `\n` even when it had no other output. Raise
the output level for the final `\n` to the minimum output level used.

This minor bug was caught by the new testing framework.
2022-01-27 13:56:59 -08:00
Nick Terrell
495dcb839a [zstdcli] Fix option detection for --auto-threads
The option `--auto-threads` should still be accepted and parsed, even if
`ZSTD_MULTITHREAD` is not defined. It doesn't mean anything, but we
should still accept the option. Since we want scripts to be able to work
generically.

This bug was caught by tests I added to the new testing framework.
2022-01-27 13:56:59 -08:00
Nick Terrell
246982e782 [dibio] Fix assertion triggered by no inputs
Passing 0 inputs to `DiB_shuffle()` caused an assertion failure where
it should just return.

A test is added in a later commit, with the initial introduction of the
new testing framework.

Fixes #3007.
2022-01-27 13:56:59 -08:00
Yann Collet
5d70ec0bc4
Merge pull request #3033 from facebook/fix44108
fix issue 44108
2022-01-27 10:57:48 -08:00
Yann Collet
bad7f82300
Merge pull request #2974 from facebook/fix2966_part3
Lazy parameters adaptation (part 1 - ZSTD_c_stableInBuffer)
2022-01-27 06:14:04 -08:00
Yann Collet
8df1257c3c fix issue 44108
credit to oss-fuzz

In rare circumstances, the block-splitter might cut a block at the exact beginning of a repcode.
In which case, since litlength=0, if the repcode expected 1+ literals in front, its signification changes.
This scenario is controlled in ZSTD_seqStore_resolveOffCodes(),
and the repcode is transformed into a raw offset when its new meaning is incorrect.

In more complex scenarios, the previous block might be emitted as uncompressed after all,
thus modifying the expected repcode history.
In the case discovered by oss-fuzz, the first block is emitted as uncompressed,
so the repcode history remains at default values: 1,4,8.

But since the starting repcode is repcode3, and the literal length is == 0,
its meaning is : = repcode1 - 1.
Since repcode1==1, it results in an offset value of 0, which is invalid.

So that's what the `assert()` was verifying : the result of the repcode translation should be a valid offset.

But actually, it doesn't matter, because this result will then be compared to reality,
and since it's an invalid offset, it will necessarily be discarded if incorrect,
then the repcode will be replaced by a raw offset.

So the `assert()` is not useful.
Furthermore, it's incorrect, because it assumes this situation cannot happen, but it does, as described in above scenario.
2022-01-27 05:49:59 -08:00
Yann Collet
f2d9652ad8 more usage of new error code stabilityCondition_notRespected
as suggested by @terrelln
2022-01-26 18:30:55 -08:00
Yann Collet
7543085013
Merge pull request #3019 from facebook/huf_traces
More traces to improved debugging of literals compression
2022-01-26 18:02:05 -08:00
Yann Collet
8b46895588 removed new huffman depth heuristic
results are now identical to before this PR
2022-01-26 15:22:06 -08:00
Yann Collet
a66e8bb437 introduced LitHufLog constant
which properly represents the maximum bit size of compressed literals (11) as defined in the specification.

To be preferred from HUF_TABLELOG_DEFAULT which represents the same value but by accident.

Name selected to keep the same convention as existing width definitions,
MLFSELog, LLFSELog and OffFSELog.
2022-01-26 14:47:24 -08:00
Yann Collet
2d154e627a renamed HufLog into ZSTD_HUFFDTABLE_CAPACITY_LOG
old name was not descriptive and actually misleading
2022-01-26 14:47:24 -08:00
Yann Collet
32a5d95dcb moved HufLog to lib/decompress
it's only used to size decompression tables
2022-01-26 14:47:24 -08:00
Yann Collet
e9dd923fa4 only declare debug functions in debug mode 2022-01-26 14:47:24 -08:00
Yann Collet
5db717af10 proper max limit to 11 2022-01-26 14:47:24 -08:00
Yann Collet
4684836f4f update regression tests
minor compression ratio benefits in some cases,
no compression ratio regression in the measured scenarios.
2022-01-26 14:47:24 -08:00
Yann Collet
51da2d2ff2 improved compression of literals in specific corner cases
In rare cases, the default huffman depth selector is a bit too harsh,
requiring brutal adaptations to the tree,
resulting is some loss of compression ratio.
This new heuristic avoids the worse cases, favoring compression ratio.

As an example, compression of a specific distribution of 771 literals
is now improved to 441 bytes, from 601 bytes before.
2022-01-26 14:47:24 -08:00
Yann Collet
7616e39f3b adding traces to better track processing of literals 2022-01-26 14:47:21 -08:00
Yann Collet
a0acf9aa49
Merge pull request #3023 from facebook/fix_seqCompress_withDelimiter
fix sequence compression API in Explicit Delimiter mode
2022-01-26 14:15:28 -08:00
Yann Collet
dda4c10f07 added ZSTD_compressStream2() + ZSTD_c_stableInBuffer test 2022-01-26 13:33:04 -08:00
Yann Collet
cbff372d10 added helper function inBuffer_forEndFlush() 2022-01-26 11:05:57 -08:00
Yann Collet
b99ece96b9 converted checks into user validation generating error codes
had to create a new error code for this condition,
none of the existing ones were fitting enough.
2022-01-26 10:43:50 -08:00
Yann Collet
af3d9c506e added streaming test starting from non-0 pos 2022-01-26 10:31:25 -08:00
Yann Collet
c1668a00d2 fix extended case combining stableInBuffer with continue() and flush() modes 2022-01-26 10:31:25 -08:00
Yann Collet
270f9bf005 better consistency in accessing @input
as suggested by @terrelln.

Also : commented zstreamtest more
to ensure ZSTD_stableInBuffer is tested/
2022-01-26 10:31:24 -08:00
Yann Collet
8296be4a0a pretend consuming input to provide a sense of forward progress 2022-01-26 10:31:24 -08:00
Yann Collet
4b9d1dd9ff fixed incorrect comment 2022-01-26 10:31:24 -08:00
Yann Collet
27d336b099 minor behavior refinements
specifically, there is no obligation to start streaming compression with pos=0.
stableSrc mode is now compatible with this setup.
2022-01-26 10:31:24 -08:00
Yann Collet
37b87add7a make stableSrc compatible with regular streaming API
including flushStream().

Now the only condition is for `input.size` to continuously grow.
2022-01-26 10:31:24 -08:00
Yann Collet
c0c5ffa973 streaming compression : lazy parameter adaptation with stable input
effectively makes ZSTD_c_stableInput compatible ZSTD_compressStream()
and zstd_e_continue operation mode.
2022-01-26 10:31:24 -08:00
Yann Collet
5684bae4f6 minor refactoring
on streaming compression implementation.
2022-01-26 10:31:23 -08:00
Yann Collet
fc2ea97442 refactored fuzzer tests for sequence compression api
add explicit delimiter mode to libfuzzer test
2022-01-26 00:19:35 -08:00
Yann Collet
87dcd3326a fix sequence compression API in Explicit Delimiter mode 2022-01-25 13:33:41 -08:00
Yann Collet
cc7d23bcec
Merge pull request #2965 from facebook/offbase
Converge sumtype (offset | repcode) numeric representation towards offBase
2022-01-24 15:47:42 -08:00
Yonatan Komornik
70df5de1b2
AsyncIO compression part 1 - refactor of existing asyncio code (#3021)
* Refactored fileio.c:
- Extracted asyncio code to fileio_asyncio.c/.h
- Moved type definitions to fileio_types.h
- Moved common macro definitions needed by both fileio.c and fileio_asyncio.c to fileio_common.h

* Bugfix - rename fileio_asycio to fileio_asyncio

* Added copyrights & license to new files

* CR fixes
2022-01-24 14:43:02 -08:00
Nick Terrell
87f81d0796
Merge pull request #3026 from trixirt/from-linux-fix
cleanup double word in comment.
2022-01-24 14:31:16 -08:00
Tom Rix
2b957afec7 cleanup double word in comment.
Remove the second 'a' and 'into'

Signed-off-by: Tom Rix <trix@redhat.com>
2022-01-24 12:43:39 -08:00
Yann Collet
feaaf7a6b1 slightly shortened status and summary lines in very verbose mode 2022-01-21 21:38:35 -08:00
binhdvo
17017ac8db
Change zstdless behavior to align with zless (#2909)
* Change zstdless behavior to align with zless
2022-01-21 19:57:19 -05:00
Yann Collet
b27356fd27
Merge pull request #3005 from cwoffenden/faster-amalgamate
Use faster Python script to amalgamate
2022-01-21 16:12:22 -08:00
Yann Collet
24318093cc slightly shortened compression status update line
to fit within 80 columns limit.
2022-01-21 14:08:46 -08:00
Yonatan Komornik
8ab95f24da
Merge pull request #2985 from yoniko/zstd-output-file-buffer
ZSTD CLI: Use buffered output
2022-01-21 13:57:05 -08:00
Yonatan Komornik
1598e6c634
Async write for decompression (#2975)
* Async IO decompression:
- Added --[no-]asyncio flag for CLI decompression.
- Replaced dstBuffer in decompression with a pool of write jobs.
- Added an ability to execute write jobs in a separate thread.
- Added an ability to wait (join) on all jobs in a thread pool (queued and running).
2022-01-21 13:55:41 -08:00
Nick Terrell
2f03c1996f
Merge pull request #3013 from WojciechMula/simplify-asm
Simplify HUF_decompress4X2_usingDTable_internal_bmi2_asm_loop
2022-01-21 11:13:40 -08:00
Felix Handte
4a35912b3b
Merge pull request #3018 from felixhandte/gh-action-release-artifacts-trigger-fix
Trigger Release Artifact Generation on Publish
2022-01-20 21:08:59 -05:00
Yann Collet
71921e596f
Merge pull request #2983 from facebook/minLitPricev2
[opt] minor compression ratio improvement
2022-01-20 16:02:31 -08:00
W. Felix Handte
fa9cb4510a Trigger Release Artifact Generation on Publish
We previously triggered release artifact generation on release creation. We
sometimes observed that the action failed to run. I hypothesized that we were
hitting rate limiting or something. I just stumbled across [this documentat-
ion](https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#release), which says:

> Note: Workflows are not triggered for the `created`, `edited`, or `deleted`
> activity types for draft releases. When you create your release through the
> GitHub browser UI, your release may automatically be saved as a draft.

This must have been what was happening. This commit therefore changes the
trigger to the `published` activity. This should be more reliable.

This does have the unfortunate side effect that artifacts won't be generated
or attached until *after* the release has been published, which is what I was
trying to avoid by using the `created` activity. Oh well.
2022-01-20 17:36:28 -05:00
Felix Handte
330c97d2bf
Merge pull request #3015 from felixhandte/enable-cet-test-illegal-instruction
Add GitHub Action Checking that Zstd Runs Successfully Under CET
2022-01-20 16:18:42 -05:00
Elliot Gorokhovsky
a8f1aa2f6d
Merge pull request #3016 from embg/macro_lint
Minor lint fix
2022-01-20 13:05:02 -07:00
Felix Handte
c7e8315d88
Merge pull request #2994 from hjl-tools/hjl/cet-report/dev
x86: Append -z cet-report=error to LDFLAGS
2022-01-20 14:58:13 -05:00