Merge branch 'dev' into initStatic_tests

2025-10-19 00:05:29 -04:00 · 2020-05-11 16:51:13 -07:00 · 2020-05-11 16:51:13 -07:00 · 58227db405
commit 58227db405
parent dd026ca505 d8b40fe0be
3 changed files with 129 additions and 73 deletions
--- a/README.md
+++ b/README.md
@@ -31,10 +31,10 @@ a list of known ports and bindings is provided on [Zstandard homepage](http://ww
 ## Benchmarks
 For reference, several fast compression algorithms were tested and compared
-on a server running Arch Linux (`Linux version 5.0.5-arch1-1`),
+on a server running Arch Linux (`Linux version 5.5.11-arch1-1`),
 with a Core i9-9900K CPU @ 5.0GHz,
 using [lzbench], an open-source in-memory benchmark by @inikep
-compiled with [gcc] 8.2.1,
+compiled with [gcc] 9.3.0,
 on the [Silesia compression corpus].
 [lzbench]: https://github.com/inikep/lzbench
@@ -43,18 +43,26 @@ on the [Silesia compression corpus].
 | Compressor name         | Ratio | Compression| Decompress.|
 | ---------------         | ------| -----------| ---------- |
-| **zstd 1.4.4 -1**       | 2.884 |   520 MB/s |  1600 MB/s |
-| zlib 1.2.11 -1          | 2.743 |   110 MB/s |   440 MB/s |
-| brotli 1.0.7 -0         | 2.701 |   430 MB/s |   470 MB/s |
-| quicklz 1.5.0 -1        | 2.238 |   600 MB/s |   800 MB/s |
-| lzo1x 2.09 -1           | 2.106 |   680 MB/s |   950 MB/s |
-| lz4 1.8.3               | 2.101 |   800 MB/s |  4220 MB/s |
-| snappy 1.1.4            | 2.073 |   580 MB/s |  2020 MB/s |
-| lzf 3.6 -1              | 2.077 |   440 MB/s |   930 MB/s |
+| **zstd 1.4.5 -1**       | 2.884 |   500 MB/s |  1660 MB/s |
+| zlib 1.2.11 -1          | 2.743 |    90 MB/s |   400 MB/s |
+| brotli 1.0.7 -0         | 2.703 |   400 MB/s |   450 MB/s |
+| **zstd 1.4.5 --fast=1** | 2.434 |   570 MB/s |  2200 MB/s |
+| **zstd 1.4.5 --fast=3** | 2.312 |   640 MB/s |  2300 MB/s |
+| quicklz 1.5.0 -1        | 2.238 |   560 MB/s |   710 MB/s |
+| **zstd 1.4.5 --fast=5** | 2.178 |   700 MB/s |  2420 MB/s |
+| lzo1x 2.10 -1           | 2.106 |   690 MB/s |   820 MB/s |
+| lz4 1.9.2               | 2.101 |   740 MB/s |  4530 MB/s |
+| **zstd 1.4.5 --fast=7** | 2.096 |   750 MB/s |  2480 MB/s |
+| lzf 3.6 -1              | 2.077 |   410 MB/s |   860 MB/s |
+| snappy 1.1.8            | 2.073 |   560 MB/s |  1790 MB/s |
 [zlib]: http://www.zlib.net/
 [LZ4]: http://www.lz4.org/
+The negative compression levels, specified with `--fast=#`,
+offer faster compression and decompression speed in exchange for some loss in
+compression ratio compared to level 1, as seen in the table above.
 Zstd can also offer stronger compression ratios at the cost of compression speed.
 Speed vs Compression trade-off is configurable by small increments.
 Decompression speed is preserved and remains roughly the same at all settings,
--- a/lib/compress/zstd_compress.c
+++ b/lib/compress/zstd_compress.c
@@ -1144,13 +1144,26 @@ size_t ZSTD_estimateCCtxSize_usingCCtxParams(const ZSTD_CCtx_params* params)
        size_t const ldmSpace = ZSTD_ldm_getTableSize(params->ldmParams);
        size_t const ldmSeqSpace = ZSTD_cwksp_alloc_size(ZSTD_ldm_getMaxNbSeq(params->ldmParams, blockSize) * sizeof(rawSeq));
-        size_t const neededSpace = entropySpace + blockStateSpace + tokenSpace +
-                                   matchStateSize + ldmSpace + ldmSeqSpace;
+        /* estimateCCtxSize is for one-shot compression. So no buffers should
+         * be needed. However, we still allocate two 0-sized buffers, which can
+         * take space under ASAN. */
+        size_t const bufferSpace = ZSTD_cwksp_alloc_size(0)
+                                 + ZSTD_cwksp_alloc_size(0);
         size_t const cctxSpace = ZSTD_cwksp_alloc_size(sizeof(ZSTD_CCtx));
-        DEBUGLOG(5, "sizeof(ZSTD_CCtx) : %u", (U32)cctxSpace);
+        size_t const neededSpace =
+            cctxSpace +
+            entropySpace +
+            blockStateSpace +
+            ldmSpace +
+            ldmSeqSpace +
+            matchStateSize +
+            tokenSpace +
+            bufferSpace;
         DEBUGLOG(5, "estimate workspace : %u", (U32)neededSpace);
-        return cctxSpace + neededSpace;
+        return neededSpace;
    }
 }
--- a/programs/README.md
+++ b/programs/README.md
@@ -10,7 +10,7 @@ There are however other Makefile targets that create different variations of CLI
 - `zstd-decompress` : version of CLI which can only decompress zstd format
-#### Compilation variables
+### Compilation variables
 `zstd` scope can be altered by modifying the following `make` variables :
 - __HAVE_THREAD__ : multithreading is automatically enabled when `pthread` is detected.
@@ -61,6 +61,24 @@ There are however other Makefile targets that create different variations of CLI
  In which case, linking stage will fail if `lz4` library cannot be found.
  This is useful to prevent silent feature disabling.
+- __ZSTD_NOBENCH__ : `zstd` cli will be compiled without its integrated benchmark module.
+ This can be useful to produce smaller binaries.
+ In this case, the corresponding unit can also be excluded from compilation target.
+- __ZSTD_NODICT__ : `zstd` cli will be compiled without support for the integrated dictionary builder.
+ This can be useful to produce smaller binaries.
+ In this case, the corresponding unit can also be excluded from compilation target.
+- __ZSTD_NOCOMPRESS__ : `zstd` cli will be compiled without support for compression.
+ The resulting binary will only be able to decompress files.
+ This can be useful to produce smaller binaries.
+ A corresponding `Makefile` target using this ability is `zstd-decompress`.
+- __ZSTD_NODECOMPRESS__ : `zstd` cli will be compiled without support for decompression.
+ The resulting binary will only be able to compress files.
+ This can be useful to produce smaller binaries.
+ A corresponding `Makefile` target using this ability is `zstd-compress`.
 - __BACKTRACE__ : `zstd` can display a stack backtrace when execution
  generates a runtime exception. By default, this feature may be
  degraded/disabled on some platforms unless additional compiler directives are
@@ -69,11 +87,11 @@ There are however other Makefile targets that create different variations of CLI
  Example : `make zstd BACKTRACE=1`
-#### Aggregation of parameters
+### Aggregation of parameters
 CLI supports aggregation of parameters i.e. `-b1`, `-e18`, and `-i1` can be joined into `-b1e18i1`.
-#### Symlink shortcuts
+### Symlink shortcuts
 It's possible to invoke `zstd` through a symlink.
 When the name of the symlink has a specific value, it triggers an associated behavior.
 - `zstdmt` : compress using all cores available on local system.
@@ -86,7 +104,7 @@ When the name of the symlink has a specific value, it triggers an associated beh
 - `ungz`, `unxz` and `unlzma` will do the same, and will also remove source file by default (use `--keep` to preserve).
-#### Dictionary builder in Command Line Interface
+### Dictionary builder in Command Line Interface
 Zstd offers a training mode, which can be used to tune the algorithm for a selected
 type of data, by providing it with a few samples. The result of the training is stored
 in a file selected with the `-o` option (default name is `dictionary`),
@@ -106,7 +124,7 @@ Usage of the dictionary builder and created dictionaries with CLI:
 3. Decompress with the dictionary: `zstd --decompress FILE.zst -D dictionaryName`
-#### Benchmark in Command Line Interface
+### Benchmark in Command Line Interface
 CLI includes in-memory compression benchmark module for zstd.
 The benchmark is conducted using given filenames. The files are read into memory and joined together.
 It makes benchmark more precise as it eliminates I/O overhead.
@@ -118,81 +136,84 @@ One can select compression levels starting from `-b` and ending with `-e`.
 The `-i` parameter selects minimal time used for each of tested levels.
-#### Usage of Command Line Interface
+### Usage of Command Line Interface
 The full list of options can be obtained with `-h` or `-H` parameter:
 ```
-Usage : 
+Usage :
-      zstd [args] [FILE(s)] [-o file] 
+      zstd [args] [FILE(s)] [-o file]
-FILE    : a filename 
+FILE    : a filename
          with no FILE, or when FILE is - , read standard input
-Arguments : 
+Arguments :
- -#     : # compression level (1-19, default: 3) 
+ -#     : # compression level (1-19, default: 3)
- -d     : decompression 
+ -d     : decompression
- -D file: use `file` as Dictionary 
+ -D file: use `file` as Dictionary
- -o file: result stored into `file` (only if 1 input file) 
+ -o file: result stored into `file` (only if 1 input file)
- -f     : overwrite output without prompting and (de)compress links 
+ -f     : overwrite output without prompting and (de)compress links
--rm    : remove source file(s) after successful de/compression 
+--rm    : remove source file(s) after successful de/compression
- -k     : preserve source file(s) (default) 
+ -k     : preserve source file(s) (default)
- -h/-H  : display help/long help and exit 
+ -h/-H  : display help/long help and exit
-Advanced arguments : 
+Advanced arguments :
- -V     : display Version number and exit 
+ -V     : display Version number and exit
 -v     : verbose mode; specify multiple times to increase verbosity
 -q     : suppress warnings; specify twice to suppress errors too
 -c     : force write to standard output, even if it is the console
- -l     : print information about zstd compressed files 
+ -l     : print information about zstd compressed files
--exclude-compressed:  only compress files that are not previously compressed 
+--exclude-compressed:  only compress files that are not previously compressed
 --ultra : enable levels beyond 19, up to 22 (requires more memory)
 --long[=#]: enable long distance matching with given window log (default: 27)
 --fast[=#]: switch to very fast compression levels (default: 1)
--adapt : dynamically adapt compression level to I/O conditions 
+--adapt : dynamically adapt compression level to I/O conditions
--stream-size=# : optimize compression parameters for streaming input of given number of bytes 
+--stream-size=# : optimize compression parameters for streaming input of given number of bytes
 --size-hint=# optimize compression parameters for streaming input of approximately this size
--target-compressed-block-size=# : make compressed block near targeted size 
+--target-compressed-block-size=# : make compressed block near targeted size
- -T#    : spawns # compression threads (default: 1, 0==# cores) 
+ -T#    : spawns # compression threads (default: 1, 0==# cores)
- -B#    : select size of each job (default: 0==automatic) 
+ -B#    : select size of each job (default: 0==automatic)
--rsyncable : compress using a rsync-friendly method (-B sets block size) 
+--rsyncable : compress using a rsync-friendly method (-B sets block size)
 --no-dictID : don't write dictID into header (dictionary compression)
--[no-]check : integrity check (default: enabled) 
+--[no-]check : integrity check (default: enabled)
--[no-]compress-literals : force (un)compressed literals 
+--[no-]compress-literals : force (un)compressed literals
- -r     : operate recursively on directories 
+ -r     : operate recursively on directories
--output-dir-flat[=directory]: all resulting files stored into `directory`. 
+--output-dir-flat[=directory]: all resulting files stored into `directory`.
--format=zstd : compress files to the .zst format (default) 
+--format=zstd : compress files to the .zst format (default)
--format=gzip : compress files to the .gz format 
+--format=gzip : compress files to the .gz format
--test  : test compressed file integrity 
+--test  : test compressed file integrity
 --[no-]sparse : sparse mode (default: disabled)
- -M#    : Set a memory usage limit for decompression 
+ -M#    : Set a memory usage limit for decompression
--no-progress : do not display the progress bar 
+--no-progress : do not display the progress bar
--      : All arguments after "--" are treated as files 
+--      : All arguments after "--" are treated as files
-Dictionary builder : 
+Dictionary builder :
--train ## : create a dictionary from a training set of files 
+--train ## : create a dictionary from a training set of files
 --train-cover[=k=#,d=#,steps=#,split=#,shrink[=#]] : use the cover algorithm with optional args
 --train-fastcover[=k=#,d=#,f=#,steps=#,split=#,accel=#,shrink[=#]] : use the fast cover algorithm with optional args
 --train-legacy[=s=#] : use the legacy algorithm with selectivity (default: 9)
- -o file : `file` is dictionary name (default: dictionary) 
+ -o file : `file` is dictionary name (default: dictionary)
--maxdict=# : limit dictionary to specified size (default: 112640) 
+--maxdict=# : limit dictionary to specified size (default: 112640)
 --dictID=# : force dictionary ID to specified value (default: random)
-Benchmark arguments : 
+Benchmark arguments :
- -b#    : benchmark file(s), using # compression level (default: 3) 
+ -b#    : benchmark file(s), using # compression level (default: 3)
 -e#    : test all compression levels from -bX to # (default: 1)
- -i#    : minimum evaluation time in seconds (default: 3s) 
+ -i#    : minimum evaluation time in seconds (default: 3s)
 -B#    : cut file into independent blocks of size # (default: no block)
--priority=rt : set process priority to real-time 
+--priority=rt : set process priority to real-time
 ```
-#### Restricted usage of Environment Variables
-Using environment variables to set parameters has security implications.
-Therefore, this avenue is intentionally restricted.
-Only `ZSTD_CLEVEL` is supported currently, for setting compression level.
-`ZSTD_CLEVEL` can be used to set the level between 1 and 19 (the "normal" range).
-If the value of `ZSTD_CLEVEL` is not a valid integer, it will be ignored with a warning message.
-`ZSTD_CLEVEL` just replaces the default compression level (`3`).
-It can be overridden by corresponding command line arguments.
-#### Long distance matching mode
+### Passing parameters through Environment Variables
+`ZSTD_CLEVEL` can be used to modify the default compression level of `zstd`
+(usually set to `3`) to another value between 1 and 19 (the "normal" range).
+This can be useful when `zstd` CLI is invoked in a way that doesn't allow passing arguments.
+One such scenario is `tar --zstd`.
+As `ZSTD_CLEVEL` only replaces the default compression level,
+it can then be overridden by corresponding command line arguments.
+There is no "generic" way to pass "any kind of parameter" to `zstd` in a pass-through manner.
+Using environment variables for this purpose has security implications.
+Therefore, this avenue is intentionally restricted and only supports `ZSTD_CLEVEL`.
+### Long distance matching mode
 The long distance matching mode, enabled with `--long`, is designed to improve
 the compression ratio for files with long matches at a large distance (up to the
 maximum window size, `128 MiB`) while still maintaining compression speed.
@@ -216,12 +237,12 @@ Compression Speed vs Ratio | Decompression Speed
 | Method | Compression ratio | Compression speed | Decompression speed  |
 |:-------|------------------:|-------------------------:|---------------------------:|
-| `zstd -1`   | `5.065`   | `284.8 MB/s`  | `759.3 MB/s`  |
+| `zstd -1`  | `5.065`    | `284.8 MB/s`  | `759.3 MB/s`  |
 | `zstd -5`  | `5.826`    | `124.9 MB/s`  | `674.0 MB/s`  |
 | `zstd -10` | `6.504`    | `29.5 MB/s`   | `771.3 MB/s`  |
 | `zstd -1 --long` | `17.426` | `220.6 MB/s` | `1638.4 MB/s` |
-| `zstd -5 --long` | `19.661` | `165.5 MB/s` | `1530.6 MB/s`|
+| `zstd -5 --long` | `19.661` | `165.5 MB/s` | `1530.6 MB/s` |
-| `zstd -10 --long`| `21.949` | `75.6 MB/s` | `1632.6 MB/s`|
+| `zstd -10 --long`| `21.949` |  `75.6 MB/s` | `1632.6 MB/s` |
 On this file, the compression ratio improves significantly with minimal impact
 on compression speed, and the decompression speed doubles.
@@ -243,13 +264,27 @@ The below table illustrates this on the [Silesia compression corpus].
 | `zstd -10`       | `3.523` | `16.4 MB/s`       | `489.2 MB/s`   |
 | `zstd -10 --long`| `3.566` | `16.2 MB/s`       | `415.7 MB/s`   |
-#### zstdgrep
+
+### zstdgrep
 `zstdgrep` is a utility which makes it possible to `grep` directly a `.zst` compressed file.
 It's used the same way as normal `grep`, for example :
 `zstdgrep pattern file.zst`
 `zstdgrep` is _not_ compatible with dictionary compression.
+`zstdgrep` does not support the following grep options
+```
+--dereference-recursive (-R)
+   --directories (-d)
+   --exclude
+   --exclude-from
+   --exclude-dir
+   --include
+   --null (-Z),
+   --null-data (-z)
+   --recursive (-r)
+```
 To search into a file compressed with a dictionary,
 it's necessary to decompress it using `zstd` or `zstdcat`,