Use a switch statement to select the search function instead of an
indirect function call. This results in a sizable performance win.
This PR is a modification of the approach taken in PR #2828.
When I measured performance for that commit, it was neutral.
However, I now see a performance regression on gcc, but still
neutral on clang. I'm measuring on the same platform, but with
newer compilers. The new approach beats both the current dev
branch and the baseline before PR #2828 was merged.
This PR is necessary for Issue #3275, to update zstd in the kernel.
Without this PR there is a large regression in greedy - btlazy2
compression speed. With this PR it is about neutral.
gcc version: 12.2.0
clang version: 14.0.6
dataset: silesia.tar
| Compiler | Level | Dev Speed (MB/s) | PR Speed (MB/s) | Delta |
|----------|-------|------------------|-----------------|--------|
| gcc | 5 | 102.6 | 113.7 | +10.8% |
| gcc | 7 | 66.6 | 74.8 | +12.3% |
| gcc | 9 | 51.5 | 58.9 | +14.3% |
| gcc | 13 | 14.3 | 14.3 | +0.0% |
| clang | 5 | 108.1 | 114.8 | +6.2% |
| clang | 7 | 68.5 | 72.3 | +5.5% |
| clang | 9 | 53.2 | 56.2 | +5.6% |
| clang | 13 | 14.3 | 14.7 | +2.8% |
The binary size stays just about the same for clang and gcc, measured
using the `size` command:
| Compiler | Branch | Text | Data | BSS | Total |
|----------|--------|---------|------|-----|---------|
| gcc | dev | 1127950 | 3312 | 280 | 1131542 |
| gcc | PR | 1123422 | 2512 | 280 | 1126214 |
| clang | dev | 1046254 | 3256 | 216 | 1049726 |
| clang | PR | 1048198 | 2296 | 216 | 1050710 |
Currently this function actually reads the dict ID from the dictionary's
header, via `ZSTD_getDictID_fromDict()`. But during decompression the decomp-
ressor actually compares the dict ID in the frame header with the dict ID in
the DDict. Now of course the dict ID in the dictionary contents and the dict
ID in the DDict struct *should* be the same. But in cases of memory corrupt-
ion, where they can drift out of sync, it's misleading for this function to
read it again from the dict buffer rather then return the dict ID that will
actually be used.
Also doing it this way avoids rechecking the magic and so on and so it is a
tiny bit more efficient.
Clang doesn't allow [[deprecated(...)]] attribute after __attribute__.
Move [[deprecated(...)]] before __attribute__ to fix C++14/C++17 uses
with Clang.
Fix#3250
Comparing 4B instead of comparing 1B in ZSTD_noDict
mode, thus it can avoid cases like match in match[ml]
but mismatch in match[ml-3]..match[ml-1]. So the call
count of ZSTD_count can be reduced.
Signed-off-by: Jun He <jun.he@arm.com>
Change-Id: I3449ea423d5c8e8344f75341f19a2d1643c703f6
Fixes#3212.
Long literal and match lengths had an off-by-one error in ZSTD_getSequenceLength.
Fix the off-by-one error, and add a golden compression test that catches the bug.
Also run all the golden tests in the cli-tests framework.
The discussion for this task is here: facebook/zstd#3128.
This task can probably be scoped to the first part: marking these functions deprecated.
We'll later look at removal when we roll out v1.6.0.
With statistic data of test data files of silesia
the chance of position beyond highThreshold is very
low (~1.3%@L8 in most cases, all <2.5%), and is in
"lowprob area". Add the branch hint so compiler can
get better pipiline codegen.
With this change it is observed ~1% of mozilla and
xml, and slight (0.3%~0.8%) but consistent uplift on
other files on Arm N1.
Signed-off-by: Jun He <jun.he@arm.com>
Change-Id: Id9ba1d5c767e975290b5c1bf0ecce906544f4ade
match is used for following sequence copy. It is
only updated when extDict is needed, which is a
low probability case. So it can be prefetched to
reduce cache miss.
The benchmarks on various Arm platforms showed
uplift from 1% ~ 14% with gcc-11/clang-14.
Signed-off-by: Jun He <jun.he@arm.com>
Change-Id: If201af4799d2455d74c79f8387404439d7f684ae
* Intial commit to address 3090. Added support to decompress empty block
* Update zstd_decompress_block.c
Addressed review comments for the case of 'set_basic'
* Update lib/decompress/zstd_decompress_block.c
Co-authored-by: Nick Terrell <nickrterrell@gmail.com>
* Update lib/decompress/zstd_decompress_block.c
Co-authored-by: Nick Terrell <nickrterrell@gmail.com>
Co-authored-by: Nick Terrell <nickrterrell@gmail.com>