mirror of
https://github.com/facebook/zstd.git
synced 2025-11-30 00:03:21 -05:00
Use the same trick as we did for zstd_lazy in PR #2828:

* Create one search function specialization for each (dictMode, mls).
* Select the search function pointer at the top of the match finder.

Additionally, we no longer inline `ZSTD_compressBlock_opt_generic` into every function, since `dictMode` is no longer used as a template parameter. Instead, create two specializations, for opt levels 0 and 2, and call one of the two.

Lastly, remove the hack that disabled inlining for zstd_opt for the Linux Kernel, as we've gotten most of the benefit already.

Compilation time is reduced by 4x or more:

| Compiler | Flags                            | Dev Time (s) | PR Time (s) | Delta |
|----------|----------------------------------|--------------|-------------|-------|
| gcc      | -O3                              | 10.1         | 2.3         | -77%  |
| gcc      | -O3 -fsanitize=address,undefined | 61.1         | 10.2        | -83%  |
| clang    | -O3                              | 9.0          | 2.1         | -76%  |
| clang    | -O3 -fsanitize=address,undefined | 33.5         | 5.1         | -84%  |

Build size is reduced by roughly 150KB - 200KB:

| Compiler | Dev libzstd.a Size (B) | PR libzstd.a Size (B) | Delta |
|----------|------------------------|-----------------------|-------|
| gcc      | 1327476                | 1177108               | -11%  |
| clang    | 1378324                | 1167780               | -15%  |

The speed loss is about 2% or less in all cases:

| Compiler | Level | Dev Speed (MB/s) | PR Speed (MB/s) | Delta  |
|----------|-------|------------------|-----------------|--------|
| gcc      | 16    | 4.78             | 4.72            | -1.25% |
| gcc      | 17    | 3.49             | 3.46            | -0.85% |
| gcc      | 18    | 2.92             | 2.86            | -2.04% |
| gcc      | 19    | 2.61             | 2.61            | 0.00%  |
| clang    | 16    | 4.69             | 4.80            | 2.34%  |
| clang    | 17    | 3.53             | 3.49            | -1.13% |
| clang    | 18    | 2.86             | 2.85            | -0.34% |
| clang    | 19    | 2.61             | 2.61            | 0.00%  |

Fixes Issue #2862.
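
For readers unfamiliar with the pattern, here is a minimal sketch of the "one specialization per (dictMode, mls), selected through a function pointer" idea. It is not the code from this change: every name (`toy_dict_mode`, `toy_search_generic`, `GEN_TOY_SEARCH`, `toy_select_search`, `toy_compress_block`) is hypothetical, the search body is placeholder work, and the clamp of `mls` to the range 3..6 is an assumption made for the example. Plain `static inline` is only a hint to the compiler; a real implementation would likely use a forced-inline attribute.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; none of these names come from zstd itself. */
typedef enum { MODE_NO_DICT = 0, MODE_EXT_DICT = 1, MODE_DICT_MATCH_STATE = 2 } toy_dict_mode;

typedef size_t (*toy_search_fn)(const unsigned char* src, size_t len);

/* Generic body. When inlined into a wrapper, `mode` and `mls` are
 * compile-time constants, so the compiler can fold the branches on them. */
static inline size_t toy_search_generic(const unsigned char* src, size_t len,
                                        toy_dict_mode mode, unsigned mls)
{
    size_t count = 0;
    size_t i;
    if (len < mls) return 0;
    /* Placeholder work: count later positions whose first `mls` bytes
     * repeat the first `mls` bytes of the buffer. */
    for (i = 1; i + mls <= len; i++) {
        if (memcmp(src + i, src, mls) == 0) count++;
    }
    /* Placeholder for mode-specific behaviour: pretend a dictionary
     * always contributes one extra candidate. */
    if (mode != MODE_NO_DICT) count++;
    return count;
}

/* One small wrapper (specialization) per (mode, mls) pair. */
#define GEN_TOY_SEARCH(mode, mls)                                                 \
    static size_t toy_search_##mode##_##mls(const unsigned char* src, size_t len) \
    {                                                                             \
        return toy_search_generic(src, len, mode, mls);                           \
    }

#define GEN_TOY_SEARCH_ALL_MLS(mode) \
    GEN_TOY_SEARCH(mode, 3)          \
    GEN_TOY_SEARCH(mode, 4)          \
    GEN_TOY_SEARCH(mode, 5)          \
    GEN_TOY_SEARCH(mode, 6)

GEN_TOY_SEARCH_ALL_MLS(MODE_NO_DICT)
GEN_TOY_SEARCH_ALL_MLS(MODE_EXT_DICT)
GEN_TOY_SEARCH_ALL_MLS(MODE_DICT_MATCH_STATE)

#define TOY_ROW(mode)                                       \
    { toy_search_##mode##_3, toy_search_##mode##_4,         \
      toy_search_##mode##_5, toy_search_##mode##_6 }

/* Pick the specialization once, from a table indexed by (mode, mls). */
static toy_search_fn toy_select_search(toy_dict_mode mode, unsigned mls)
{
    static const toy_search_fn table[3][4] = {
        TOY_ROW(MODE_NO_DICT),
        TOY_ROW(MODE_EXT_DICT),
        TOY_ROW(MODE_DICT_MATCH_STATE)
    };
    /* Assumed clamp: only mls values 3..6 have a specialization here. */
    unsigned const clamped = mls < 3 ? 3 : (mls > 6 ? 6 : mls);
    assert((unsigned)mode < 3);
    return table[mode][clamped - 3];
}

/* The caller is NOT specialized on mode or mls: it resolves the pointer
 * at the top and pays one indirect call per search instead. */
size_t toy_compress_block(const unsigned char* src, size_t len,
                          toy_dict_mode mode, unsigned mls)
{
    toy_search_fn const search = toy_select_search(mode, mls);
    return search(src, len);
}
```

The design trade-off in this sketch matches the numbers above in spirit: only the small search wrappers are duplicated per (mode, mls), the larger caller is compiled once, and the cost is an indirect call at each search.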