2037 Commits

Author SHA1 Message Date
Nick Terrell
014bbb29f8
Merge pull request #2898 from terrelln/issue-2862
Improve zstd_opt build speed and size
2021-12-02 19:49:43 -05:00
Yann Collet
1bf3d8a475
Merge pull request #2896 from facebook/m68k
Zstandard compiles and run on m68k cpus
2021-12-02 14:25:45 -08:00
Nick Terrell
e5bfaeede7 Improve zstd_opt build speed and size
Use the same trick as we did for zstd_lazy in PR #2828:
* Create one search function specialization for each (dictMode, mls).
* Select the search function pointer at the top of the match finder.

Additionally, we no longer inline `ZSTD_compressBlock_opt_generic` into
every function, since `dictMode` is no longer used as a template. Create
two specializations, for opt levels 0 and 2, and call one of the two
specializations.

Lastly, remove the hack that disabled inlining for zstd_opt for the
Linux Kernel, as we've gotten most of the benefit already.

Compilation time sees a ~4x reduction:

| Compiler | Flags                            | Dev Time (s) | PR Time (s) | Delta |
|----------|----------------------------------|--------------|-------------|-------|
| gcc      | -O3                              |         10.1 |         2.3 |  -77% |
| gcc      | -O3 -fsanitize=address,undefined |         61.1 |        10.2 |  -83% |
| clang    | -O3                              |          9.0 |         2.1 |  -76% |
| clang    | -O3 -fsanitize=address,undefined |         33.5 |         5.1 |  -84% |

Build size is reduced by 150KB - 200KB:

| Compiler | Dev libzstd.a Size (B) | PR libzstd.a Size (B) | Delta |
|----------|------------------------|-----------------------|-------|
| gcc      |                1327476 |               1177108 |  -11% |
| clang    |                1378324 |               1167780 |  -15% |

There is a <2% speed loss in all cases:

| Compiler | Level | Dev Speed (MB/s) | PR Speed (MB/s) | Delta  |
|----------|-------|------------------|-----------------|--------|
| gcc      |    16 |             4.78 |            4.72 | -1.25% |
| gcc      |    17 |             3.49 |            3.46 | -0.85% |
| gcc      |    18 |             2.92 |            2.86 | -2.04% |
| gcc      |    19 |             2.61 |            2.61 |  0.00% |
| clang    |    16 |             4.69 |            4.80 |  2.34% |
| clang    |    17 |             3.53 |            3.49 | -1.13% |
| clang    |    18 |             2.86 |            2.85 | -0.34% |
| clang    |    19 |             2.61 |            2.61 |  0.00% |

Fixes Issue #2862.
2021-12-02 14:19:41 -08:00
Nick Terrell
01ecd6ffc0
Merge pull request #2892 from terrelln/issue-2785
[CircleCI] Fix short-tests-0
2021-12-02 16:20:56 -05:00
Yann Collet
30b9db8ae4 changed macro name to ZSTD_ALIGNOF
for better consistency
2021-12-02 12:57:42 -08:00
Nick Terrell
21e28f5c24
Merge pull request #2891 from supperPants/dev
Fix typos
2021-12-02 13:53:33 -05:00
Yann Collet
39dced092e fix align conditions for huf_compress 2021-12-01 23:02:00 -08:00
Nick Terrell
91f5891dd0 [CircleCI] Fix short-tests-0
short-tests-0 were silently failing. I think because of the && make clean construction. Switch to ; instead.

Also fix all the test failures that were exposed.

`make all` is failing on CircleCI because it is missing Docker. Move that test
to GitHub actions, and switch the pedantic CircleCI test to `make allmost`.
2021-12-01 17:43:46 -08:00
Yann Collet
e89e847820 added alignment test
and fix an incorrect alignment check in cwksp which was failing on m68k
2021-12-01 17:16:36 -08:00
Yann Collet
3f64b31585 Merge branch 'dev' into tomerge2051 2021-12-01 15:29:49 -08:00
Yann Collet
8031dc7a48
Merge pull request #2885 from yoniko/limit-level-32bit-systems
Limit `ZSTD_maxCLevel` to 21 for 32-bit binaries.
2021-12-01 14:19:16 -08:00
supperPants
d4713de5a3 Fix typos. 2021-12-01 22:36:21 +08:00
Nick Terrell
5414dd7978 [bmi2] Add lzcnt and bmi target attributes
* When dynamic dispatching to bmi2 add lzcnt and bmi to the
  TARGET_ATTRIBUTE.
* Centralize the bmi2 TARGET_ATTRIBUTE definition to
  BMI2_TARGET_ATTRIBUTE so we can change it in the future.
* Only enable bmi2 when both bmi1 & bmi2 are supported. There shouldn't
  be any cases where bmi2 is supported but bmi1 isn't. But, since we are
  using the instruction we should check bmi1 as well.
2021-11-30 17:54:56 -08:00
Yonatan Komornik
ef2cba609d ZSTD_maxCLevel now limited to 21 for 32-bit binaries.
CI tests for constrained memory runs with max level on 32-bit binaries.
2021-11-30 10:31:52 -08:00
Felix Handte
c2c6a4ab40
Merge pull request #2869 from felixhandte/oss-fuzz-fix-41005
Determinism: Avoid Mapping Window into Reserved Indices during Reduction
2021-11-18 10:11:48 -05:00
W. Felix Handte
66079085f0 Determinism: Avoid Mapping Window into Reserved Indices during Reduction
PR #2850 attempted to fix a determinism bug that was uncovered by OSS-Fuzz. It
succeeded in addressing that source of non-determinism, but introduced a new
one: it was possible, when index reduction occurred, to map indices in the
window to the reserved value, which would cause them to be zeroed, potentially
altering parsing of the input.

This PR addresses this issue. It makes sure that the bottom of the window is
always `>= ZSTD_WINDOW_START_INDEX`.

I'm not sure if this makes #2850 redundant. I think it's probably still
valuable to have that protection as well.

Credit to OSS-Fuzz for discovering this issue.
2021-11-17 18:09:18 -05:00
Yann Collet
a37a8df532
Merge pull request #2856 from rex4539/typos
Fix typos
2021-11-17 13:04:30 -08:00
Nick Terrell
b7d899d99d
Merge pull request #2864 from terrelln/linux-opt
[linux-kernel] Don't inline function in zstd_opt.c
2021-11-16 14:13:39 -08:00
Nick Terrell
19eb459da3 [linux-kernel] Don't inline function in zstd_opt.c
The optimal parser is unlikely to be used in the linux kernel in
practice. There is no reason these functions should be force inlined,
since we aren't gaining anything, and are losing build size.

| Compiler | Before (Bytes) | After (Bytes) | Delta (Bytes) |
|----------|----------------|---------------|---------------|
| gcc-11   |        1142090 |        952754 |       -189336 |
| clang-12 |        1228402 |        976290 |       -252112 |

This is a temporary solution pending the resolution of PR #2862 in the
`dev` branch.
2021-11-15 20:37:30 -08:00
Nick Terrell
802ea885ef Reduce function size in fast & dfast
Take the same approach as in PR #2828 [0] to remove functions that force
inline many function bodies and `switch`. Instead, create one function per
"template" combination, and then switch between these functions. This
allows the compiler to break the large function into many small
functions, which generally helps codegen.

Also, in the `extDict` modes when there is no ext-dict, call the top
level function instead of the force inlined one, to save on code size.

I'm specifically doing this because gcc on the parisc architecture doesn't
handle the large function body well, and ends up using a lot of excess
stack space. Outlining these functions fixes it.
2021-11-15 19:05:48 -08:00
Dimitris Apostolou
ebbd675998
Fix typos 2021-11-13 10:04:04 +02:00
W. Felix Handte
48572f52b1 Rewrite Fix to Still Auto-Vectorize 2021-11-09 12:17:03 -05:00
W. Felix Handte
61765cacd0 Avoid Reducing Indices to Reserved Values
Previously, if an index was equal to `reducerValue + 1`, it would get remapped
during index reduction to 1 i.e. `ZSTD_DUBT_UNSORTED_MARK`. This can affect the
parsing of the input slightly, by causing tree nodes to be nullified when they
otherwise wouldn't be. This hardly matters from a correctness or efficiency
perspective, but it does impact determinism.

So this commit changes index reduction to avoid mapping indices to collide with
`ZSTD_DUBT_UNSORTED_MARK`.
2021-11-08 20:03:52 -05:00
senhuang42
384744888e Void out unused functions 2021-11-04 14:32:07 +03:00
binhdvo
b399b47467
Move mingw tests from appveyor to github actions (#2838) 2021-11-02 13:17:55 -04:00
Yann Collet
aba88fa996
Merge pull request #2829 from facebook/ZSTD_DECODER_INTERNAL_BUFFER
minor : change build macro to ZSTD_DECODER_INTERNAL_BUFFER
2021-10-26 10:48:16 -07:00
Yann Collet
2b2a5c449a fix minor cast warning 2021-10-26 08:38:17 -07:00
Yann Collet
518f06b281 added minimum for decoder buffer
also : introduced macro BOUNDED()
2021-10-26 08:21:31 -07:00
Yann Collet
12e177cba8
Merge pull request #2830 from facebook/clevels
separate compression level tables into their own file
2021-10-25 13:35:54 -07:00
Yann Collet
082d6c6775 separate compression level tables into their own files
that's clearer than finding the tables somewhere in the middle of `compress.c`.

Also, down the line, it may potentially allows zstd to feature adjusted tables depending on target cpu.
2021-10-25 08:49:54 -07:00
Nick Terrell
13cad3abb1 [lazy] Speed up compilation times
Speed up compilation times by moving each specialized search function
into its own function. This is faster because compilers can handle many
smaller functions much faster than one gigantic function. The previous
approach generated one giant function with `switch` statements and
inlining to select the implementation.

| Compiler | Flags                               | Dev Time (s) | PR Time (s) | Delta |
|----------|-------------------------------------|--------------|-------------|-------|
| gcc      | -O3                                 |         16.5 |         5.6 |  -66% |
| gcc      | -O3 -g -fsanitize=address,undefined |        158.9 |        38.2 |  -75% |
| clang    | -O3                                 |         36.5 |         5.5 |  -85% |
| clang    | -O3 -g -fsanitize=address,undefined |         27.8 |        17.5 |  -37% |

This also reduces the binary size because the search functions are no
longer inlined into the main body.

| Compiler | Dev libzstd.a Size (B) | PR libzstd.a Size (B) | Delta |
|----------|------------------------|-----------------------|-------|
| gcc      |                1563868 |               1308844 |  -16% |
| clang    |                1924372 |               1376020 |  -28% |

Finally, the performance is not impacted significantly by this change,
in fact we generally see a small speed boost.

| Compiler | Level | Dev Speed (MB/s) | PR Speed (MB/s) | Delta |
|----------|-------|------------------|-----------------|-------|
| gcc      |     5 |            110.6 |           110.0 | -0.5% |
| gcc      |     7 |             70.4 |            72.2 | +2.5% |
| gcc      |     9 |             53.2 |            53.5 | +0.5% |
| gcc      |    13 |             12.7 |            12.9 | +1.5% |
| clang    |     5 |            113.9 |           110.4 | -3.0% |
| clang    |     7 |             67.7 |            70.6 | +4.2% |
| clang    |     9 |             51.9 |            52.2 | +0.5% |
| clang    |    13 |             12.4 |            13.3 | +7.2% |

The compression strategy is unmodified in this PR, so the compressed size
should be exactly the same. I may have a follow up PR to slightly improve
the compression ratio, if it doesn't cost too much speed.
2021-10-22 13:45:26 -07:00
Yann Collet
9d62957b31
Merge pull request #2800 from animalize/fix_c89
Fix a C89 error in msvc
2021-10-18 14:32:04 -07:00
Felix Handte
23c1a2d260
Merge pull request #2774 from felixhandte/zstd-dfast-pipelined-single
Pipelined Implementation of ZSTD_dfast
2021-10-13 16:38:43 -04:00
W. Felix Handte
0bfc935add Convert Outer Control Structure to Loop 2021-10-12 13:34:17 -04:00
Nick Terrell
b77d95b053
Merge pull request #2820 from terrelln/nb-compares
[binary-tree] Fix underflow of nbCompares
2021-10-11 09:59:57 -07:00
Nick Terrell
26486db9ab
Merge pull request #2819 from terrelln/ldm-hash-rate-log
[ldm] Fix ZSTD_c_ldmHashRateLog bounds check
2021-10-08 14:58:29 -07:00
Nick Terrell
c6c482fe07 [binary-tree] Fix underflow of nbCompares
Fix underflow of `nbCompares` by switching to an `int` and comparing
`nbCompares > 0`. This is a minimal fix, because I don't want to change
the logic. These loops seem to be doing `nbCompares + 1` comparisons.

The bug was reported by Dan Carpenter and found by Smatch static
checker.

https://lore.kernel.org/all/20211008063704.GA5370@kili/
2021-10-08 13:22:55 -07:00
Nick Terrell
1bbb372e3e [ldm] Fix ZSTD_c_ldmHashRateLog bounds check
There is no minimum value check, so the parameter could be negative.
Switch to the standard pattern of using `BOUNDCHECK()`.

The bug was reported by Dan Carpenter and found by Smatch static
checker.

https://lore.kernel.org/all/20211008063704.GA5370@kili/
2021-10-08 11:17:40 -07:00
Nick Terrell
399644b1f1 [nit] Fix buggy indentation
The bug was reported by Dan Carpenter and found by Smatch static
checker.

https://lore.kernel.org/all/20211008063704.GA5370@kili/
2021-10-08 11:13:11 -07:00
W. Felix Handte
79ca830766 Style: Add Comments to Variables and Move a Couple into the Loop 2021-10-05 16:18:09 -04:00
W. Felix Handte
62536ef7da Simplify DMS Implementation by Removing noDict Support 2021-10-05 14:54:37 -04:00
W. Felix Handte
051b473e7e Fall Back in _extDict to New _noDict Rather than Old Merged Impl 2021-10-05 14:54:37 -04:00
W. Felix Handte
fcab4841aa Nit: Rename Function 2021-10-05 14:54:37 -04:00
W. Felix Handte
47fd762ecc Nit: Unnest Blocks that Don't Declare Anything 2021-10-05 14:54:37 -04:00
W. Felix Handte
2cdfad538c Search One Last Position 2021-10-05 14:54:37 -04:00
W. Felix Handte
6ae44c0db8 Advance Long Index Lookup (+0.5% Speed)
This lookup can be advanced to before the short match check because either way
we will use it (in the next loop iter or in `_search_next_long`).
2021-10-05 14:54:37 -04:00
W. Felix Handte
2ddef7c872 Write Back Advanced Hash in Long Matches as Well (+Ratio)
Since we're now hashing the position ahead even if we find a long match and
don't search that next position, we can write it back into the hashtable even
in long matches. This seems to cost us no speed, and improves compression
ratio slightly!
2021-10-05 14:54:37 -04:00
W. Felix Handte
39f2491bfc Use Look-Ahead Hash for Next Long Check after Short Match (+0.5% Speed)
This costs a little ratio, unfortunately.
2021-10-05 14:54:37 -04:00
W. Felix Handte
db4e1b5479 Hash Long One Position Ahead (+2.5% Speed)
Aside from maybe a latency win in the loop, this means that when we find a
short match, we've already done the hash we need to check the next long match.
2021-10-05 14:54:37 -04:00
W. Felix Handte
a1ac7205d0 Pull Match Found Stuff Out of the Loop 2021-10-05 14:54:37 -04:00