11297 Commits

Author SHA1 Message Date
Yann Collet
a1e11db08a
Merge pull request #4435 from zijianli1234/dev
add riscv ci
2025-07-18 18:54:24 -08:00
Yann Collet
afa96bbf25
Merge pull request #4429 from arpadpanyik-arm/convertSequences_Neon
Improve speed of ZSTD_compressSequencesAndLiterals using Neon
2025-07-13 23:52:48 -08:00
Yann Collet
c768d7b94b
Merge pull request #4436 from facebook/dependabot/github_actions/cygwin/cygwin-install-action-6
Bump cygwin/cygwin-install-action from 5 to 6
2025-07-13 23:52:32 -08:00
dependabot[bot]
3ce4d1cba3
Bump cygwin/cygwin-install-action from 5 to 6
Bumps [cygwin/cygwin-install-action](https://github.com/cygwin/cygwin-install-action) from 5 to 6.
- [Release notes](https://github.com/cygwin/cygwin-install-action/releases)
- [Commits](f61179d722...f200932376)

---
updated-dependencies:
- dependency-name: cygwin/cygwin-install-action
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-07-14 06:27:46 +00:00
Yann Collet
9a41990883
Merge pull request #4433 from facebook/vs2025
removed VS2019 runners
2025-07-12 19:44:28 -08:00
ZijianLi
534860c90b add -DMEM_FORCE_MEMORY_ACCESS=0 in CI RVV test 2025-07-13 10:51:08 +08:00
Yann Collet
7325384a68 removed VS2019 runners
replaced by one vs2025 runner,
which is badly named since it is still running MSVC 2022,
but it's a good test that shows that the matrix is able to handle multiple MSVC versions.
2025-07-11 10:29:07 -07:00
Arpad Panyik
703f855734 AArch64: Enable optimized QEMU CI builds
Add missing `-O3` flag to the compilation of AArch64 SVE2 builds
executed by QEMU. This can decrease the CI job runtime considerably.
2025-07-10 18:20:57 +00:00
Arpad Panyik
07cd78d366 AArch64: Add Neon path for convertSequences_noRepcodes
Add a 4-way Neon implementation for the convertSequences_noRepcodes
function. Remove 'static' keywords from all of its implementations to
be able to add unit tests.

Relative performance to Clang-18 using: `./fullbench -b18 -l5 enwik5`

Neoverse-V2   before     after
Clang-18:    100.000%  311.703%
Clang-19:    100.191%  311.714%
Clang-20:    100.181%  311.723%
GCC-13:      107.520%  252.309%
GCC-14:      107.652%  253.158%
GCC-15:      107.674%  253.168%

Cortex-A720   before     after
Clang-18:    100.000%  204.512%
Clang-19:    102.825%  204.600%
Clang-20:    102.807%  204.558%
GCC-13:      110.668%  203.594%
GCC-14:      110.684%  203.978%
GCC-15:      102.864%  204.299%

Co-authored-by: Thomas Daubney <Thomas.Daubney@arm.com>
2025-07-10 18:20:57 +00:00
Arpad Panyik
8e4400463a Improve ZSTD_get1BlockSummary
Add a faster scalar implementation of ZSTD_get1BlockSummary which
removes the data dependency of the accumulators in the hot loop to
leverage the superscalar potential of recent out-of-order CPUs.
The new algorithm leverages SWAR (SIMD Within A Register) methodology
to exploit the capabilities of 64-bit architectures. It achieves this
by packing two 32-bit data elements into a single 64-bit register,
enabling parallel operations on these subcomponents while ensuring
that the 32-bit boundaries prevent overflow, thereby optimizing
computational efficiency.
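
The SWAR idea described above can be sketched as follows. This is only an illustration under stated assumptions, not the code from this commit: the names `LengthPair`, `PairSums`, `pack_pair` and `sum_pairs_swar` are hypothetical, and the sketch assumes each running sum fits in 32 bits so that no carry crosses from the low half into the high half. Two independent 64-bit accumulators are used so that consecutive additions do not depend on each other.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical input: two 32-bit fields that must be summed independently. */
typedef struct {
    uint32_t litLength;
    uint32_t matchLength;
} LengthPair;

/* Hypothetical result type holding both sums. */
typedef struct {
    uint32_t litSum;
    uint32_t matchSum;
} PairSums;

/* Pack both fields into one 64-bit word:
 * litLength in the low 32 bits, matchLength in the high 32 bits. */
static uint64_t pack_pair(LengthPair p)
{
    return (uint64_t)p.litLength | ((uint64_t)p.matchLength << 32);
}

/* SWAR sum of both fields with a single 64-bit addition per element.
 * Assumption: each true sum fits in 32 bits, so the low half never
 * carries into the high half. */
PairSums sum_pairs_swar(const LengthPair* pairs, size_t n)
{
    uint64_t acc0 = 0;   /* accumulator for even-indexed elements */
    uint64_t acc1 = 0;   /* accumulator for odd-indexed elements  */
    uint64_t total;
    PairSums result;
    size_t i = 0;

    /* Two independent accumulators: neither addition waits on the other. */
    for (; i + 2 <= n; i += 2) {
        acc0 += pack_pair(pairs[i]);
        acc1 += pack_pair(pairs[i + 1]);
    }
    if (i < n) acc0 += pack_pair(pairs[i]);   /* odd leftover element */

    total = acc0 + acc1;
    result.litSum   = (uint32_t)total;          /* low 32 bits  */
    result.matchSum = (uint32_t)(total >> 32);  /* high 32 bits */
    return result;
}
```

A caller would pass an array of pairs and read both sums from the returned struct; if the no-overflow assumption cannot be guaranteed, a plain loop with two separate 64-bit sums is the safer choice.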

Corresponding unit tests are included.

Relative performance to GCC-13 using: `./fullbench -b19 -l5 enwik5`

Neoverse-V2   before     after
GCC-13:      100.000%  290.527%
GCC-14:      100.000%  291.714%
GCC-15:       99.914%  291.495%
Clang-18:    148.072%  264.524%
Clang-19:    148.075%  264.512%
Clang-20:    148.062%  264.490%

Cortex-A720   before     after
GCC-13:      100.000%  235.261%
GCC-14:      101.064%  234.903%
GCC-15:      112.977%  218.547%
Clang-18:    127.135%  180.359%
Clang-19:    127.149%  180.297%
Clang-20:    127.154%  180.260%

Co-authored-by: Thomas Daubney <Thomas.Daubney@arm.com>
2025-07-10 18:20:49 +00:00
ZijianLi
d04e7944dd add compiler version check. 2025-07-07 23:07:39 +08:00
ZijianLi
2c3f23b018 fix dereferencing type-punned pointer error 2025-06-29 15:36:25 +08:00
ZijianLi
40f64f3493 add riscv rvv ci 2025-06-29 15:33:50 +08:00
Yann Collet
1dbc2e0908
Merge pull request #4414 from arpadpanyik-arm/copy8
AArch64: Use better block COPY8
2025-06-25 07:47:01 -04:00
Yann Collet
3c3b8274c5
Merge pull request #4417 from facebook/dependabot/github_actions/msys2/setup-msys2-2.28.0
Bump msys2/setup-msys2 from 2.27.0 to 2.28.0
2025-06-23 06:32:14 -07:00
dependabot[bot]
7b1b6a0d2d
Bump msys2/setup-msys2 from 2.27.0 to 2.28.0
Bumps [msys2/setup-msys2](https://github.com/msys2/setup-msys2) from 2.27.0 to 2.28.0.
- [Release notes](https://github.com/msys2/setup-msys2/releases)
- [Changelog](https://github.com/msys2/setup-msys2/blob/main/CHANGELOG.md)
- [Commits](61f9e5e925...40677d36a5)

---
updated-dependencies:
- dependency-name: msys2/setup-msys2
  dependency-version: 2.28.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-06-23 06:24:00 +00:00
Yann Collet
bdceb81271
Merge pull request #4415 from bgilbert/buildtype
meson: drop unused variable
2025-06-21 20:31:26 -07:00
Yann Collet
2e8ec28b30
Merge pull request #4416 from facebook/test_largeDictionary
added test-largeDictionary to dev-long CI script
2025-06-21 12:37:08 -07:00
Yann Collet
2295826266 update tests duration indications 2025-06-21 12:01:07 -07:00
Yann Collet
d77a7b6895 added test-largeDictionary to dev-long CI script 2025-06-21 11:34:10 -07:00
Yann Collet
528132e9a0
Merge pull request #4402 from mugitya03/tests
Release resources in error paths via cleanup
2025-06-21 11:33:44 -07:00
jinyaoguo
878be1c8f0 fix 2025-06-21 13:43:47 -04:00
jinyaoguo
16e13ebdeb delete 2025-06-21 13:03:13 -04:00
jinyaoguo
a74f7fcabd merge 2025-06-21 12:57:12 -04:00
Benjamin Gilbert
a4b9ebcbeb meson: drop unused variable 2025-06-20 23:34:13 -07:00
Arpad Panyik
1e9d2006ae AArch64: Use better block copy8
The vector copy is only necessary for 16-byte blocks on AArch64.

Decompression uplifts on a Neoverse V2 system, using Zstd-1.5.8
compiled with "-O3 -march=armv8.2-a+sve2":

                 Clang-19  Clang-20    GCC-14    GCC-15
 1#silesia.tar:   +0.316%   +0.865%   +0.025%   +0.096%
 2#silesia.tar:   +0.689%   +1.374%   +0.027%   +0.065%
 3#silesia.tar:   +0.811%   +1.654%   +0.034%   +0.033%
 4#silesia.tar:   +0.912%   +1.755%   +0.027%   +0.042%
 5#silesia.tar:   +0.995%   +1.826%   +0.062%   +0.094%
 6#silesia.tar:   +0.976%   +1.777%   +0.065%   +0.104%
 7#silesia.tar:   +0.910%   +1.738%   +0.077%   +0.110%
2025-06-20 17:05:41 +00:00
Yann Collet
7eefc22169
Merge pull request #4367 from ClickHouse/cfi
Add unwind information in huf_decompress_amd64.S
2025-06-19 23:41:38 -07:00
Yann Collet
354cede369
Merge pull request #4412 from Cyan4973/rm_bd
remove duplicate
2025-06-19 14:32:32 -07:00
Yann Collet
e315155cc2 removed duplicate
this file is already present as `largeDictionary.c`
2025-06-18 15:07:32 -07:00
Yann Collet
429dc891b2
Merge pull request #4411 from arpadpanyik-arm/hist_sve2
AArch64: Add SVE2 implementation of histogram computation
2025-06-18 13:48:54 -07:00
Yann Collet
2082749775
Merge pull request #4409 from bgilbert/meson-license
meson: use SPDX expression for license
2025-06-16 10:54:43 -07:00
Yann Collet
4255c5ea89
Merge pull request #4408 from mugitya03/MLK-3
Ensure BMK_timedFnState is always freed in benchMem
2025-06-16 09:01:58 -07:00
Benjamin Gilbert
57bd0eb6a7 meson: use SPDX expression for license
This is the format recommended by Meson documentation.
2025-06-14 19:48:40 -07:00
Arpad Panyik
d28a737750 Add unit tests for HIST_count_wksp
The following tests are included:
- Empty input scenario test.
- Workspace size and alignment tests.
- Symbol out-of-range tests.
- Tests covering multiple input sizes, varying permitted maximum symbol
  values, and diverse symbol distributions.

These tests verify count table correctness, maxSymbolValuePtr
updates, and error-handling paths. They also enable automated regression
testing of the core histogram logic.
2025-06-13 22:55:53 +00:00
jinyaoguo
cad0b72ad8 Ensure BMK_timedFnState is always freed in benchMem
When an error occurs in BMK_isSuccessful_runOutcome, the code
previously skipped the call to BMK_freeTimedFnState(tfs),
leaking the allocated tfs object.
Fixed by calling BMK_freeTimedFnState(tfs) before goto _cleanOut.
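
The shape of this fix can be sketched with stand-in names. The sketch below does not use the real benchmark API: `TimedState`, `createState`, `freeState`, `liveStateCount` and `runBench` are hypothetical, and the failure is simulated by a flag. It only shows the pattern of releasing the state object before jumping to the exit label on the error path.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the timed-function state object. */
typedef struct { int unused; } TimedState;

/* Counts live objects so that a leak becomes observable. */
static int g_liveStates = 0;

static TimedState* createState(void)
{
    TimedState* s = (TimedState*)malloc(sizeof(*s));
    if (s != NULL) g_liveStates++;
    return s;
}

static void freeState(TimedState* s)
{
    if (s != NULL) {
        g_liveStates--;
        free(s);
    }
}

int liveStateCount(void) { return g_liveStates; }

/* Returns 0 on success, -1 on failure. No state object outlives the call. */
int runBench(int simulateFailure)
{
    int result = 0;
    TimedState* tfs = createState();
    if (tfs == NULL) return -1;

    if (simulateFailure) {
        freeState(tfs);   /* the release that the error path must not skip */
        result = -1;
        goto cleanOut;
    }

    /* ... normal benchmark work would happen here ... */
    freeState(tfs);

cleanOut:
    return result;
}
```

An alternative design is to move the single release call under the exit label so that every path shares it; the version above mirrors the description in the commit message instead.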
2025-06-12 19:52:58 -04:00
Arpad Panyik
7e4937bc75 AArch64: Add SVE2 implementation of histogram computation
The existing scalar implementation uses a 4-way pipelined histogram
calculation which is very efficient on out-of-order CPUs. However,
this can be further accelerated using the SVE2 HISTSEG instructions,
which compute a histogram for 16-byte chunks in a vector register.
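
For reference, the scalar 4-way technique mentioned above can be sketched as below. This is a simplified illustration, not the library's implementation: the function name `histogram_4way` is hypothetical, and the sketch keeps four full tables on the stack rather than using a caller-provided workspace. Using four tables means that four consecutive bytes, even when equal, increment different memory locations, so the increments do not serialize on each other.

```c
#include <stddef.h>
#include <string.h>

/* Count byte frequencies of src[0..size) into count[0..255].
 * Four independent tables avoid back-to-back read-modify-write
 * dependencies on the same counter. */
void histogram_4way(unsigned count[256], const unsigned char* src, size_t size)
{
    unsigned c0[256], c1[256], c2[256], c3[256];
    size_t i = 0;
    int s;

    memset(c0, 0, sizeof(c0));
    memset(c1, 0, sizeof(c1));
    memset(c2, 0, sizeof(c2));
    memset(c3, 0, sizeof(c3));

    /* Main loop: each of the four bytes updates its own table. */
    for (; i + 4 <= size; i += 4) {
        c0[src[i]]++;
        c1[src[i + 1]]++;
        c2[src[i + 2]]++;
        c3[src[i + 3]]++;
    }
    /* Tail: fewer than four bytes remain. */
    for (; i < size; i++) c0[src[i]]++;

    /* Merge the four partial histograms. */
    for (s = 0; s < 256; s++) count[s] = c0[s] + c1[s] + c2[s] + c3[s];
}
```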

On a system with 128-bit vectors (VL128) we need 16 HISTSEG executions
to compute the histogram for the whole symbol space (0..255) of 16
bytes of input. However, we can only accumulate 15 such 16-byte strips
before an 8-bit accumulator could overflow (15 x 16 = 240 fits in 8 bits,
16 x 16 = 256 does not). So we need to widen and save the 8-bit
histogram accumulators to 16-bit after every 240 bytes of input.
To store all in registers we would need 32 128-bit registers. Longer
SVE2 vectors could help here, if such machines become available.
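
The accumulator arithmetic in this paragraph can be checked with a scalar model. The sketch below does not use SVE2 at all; it only imitates the bookkeeping: 8-bit counters are filled for at most 15 strips of 16 bytes (240 bytes) and then added into 16-bit counters. The name `histogram_narrow_acc` and the two macros are hypothetical, and the sketch assumes the input is at most 65535 bytes so the 16-bit counters cannot overflow.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STRIP_SIZE        16   /* bytes handled per strip                */
#define STRIPS_PER_FLUSH  15   /* 15 * 16 = 240 <= 255, safe for 8 bits  */

/* Scalar model of narrow accumulators with periodic widening.
 * Assumption: size <= 65535, so 16-bit counters cannot overflow. */
void histogram_narrow_acc(uint16_t wide[256], const uint8_t* src, size_t size)
{
    uint8_t narrow[256];
    size_t pos = 0;
    int s;

    memset(wide, 0, 256 * sizeof(wide[0]));

    while (pos < size) {
        size_t chunk = size - pos;
        size_t i;
        if (chunk > (size_t)STRIP_SIZE * STRIPS_PER_FLUSH)
            chunk = (size_t)STRIP_SIZE * STRIPS_PER_FLUSH;

        /* At most 240 increments land on any single 8-bit counter. */
        memset(narrow, 0, sizeof(narrow));
        for (i = 0; i < chunk; i++) narrow[src[pos + i]]++;

        /* Widen: fold the 8-bit counters into the 16-bit table. */
        for (s = 0; s < 256; s++) wide[s] = (uint16_t)(wide[s] + narrow[s]);

        pos += chunk;
    }
}
```

With 1000 identical input bytes the model flushes after 240, 480, 720 and 960 bytes and once more for the final 40, and no 8-bit counter ever exceeds 240.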

The maximum input block size in Zstd is 128 KiB, so 16-bit accumulators
would not be enough in general. However, an LZ pass precedes the histogram
calculation, so it should be impossible (my assumption) to overflow the 16-bit
accumulators.

The symbol distribution is also not uniform: the lower values are more
common, so we used a 3-pass algorithm to prevent stack spilling. In the
first pass we only compute histograms for 64 symbols (4-way SIMD) while
also computing the maximum symbol value. If we have symbol values
larger than 64 we start the second pass to compute the next 96 elements
of the histogram. The final pass calculates the remaining part of the
histogram (256 symbols in total) if needed. This split of histogram
generation gave the best overall results for performance.

This implementation is the best performing of a number of different
cache blocking schemes tested.

Compression uplifts on a Neoverse V2 system, using Zstd-1.5.8
(e26dde3d) as a baseline, compiled with "-O3 -march=armv8.2-a+sve2":

                 Clang-20    GCC-14
 1#silesia.tar:   +6.173%   +5.987%
 2#silesia.tar:   +5.200%   +5.011%
 3#silesia.tar:   +4.332%   +5.031%
 4#silesia.tar:   +2.789%   +3.064%
 5#silesia.tar:   +2.028%   +1.838%
 6#silesia.tar:   +1.562%   +1.340%
 7#silesia.tar:   +1.160%   +0.959%
2025-06-11 12:14:22 +00:00
Yann Collet
5e6bdf5e3d
Merge pull request #4406 from Cyan4973/separate-cmake-tests
cmake CI tests refactor
2025-06-09 15:19:47 -07:00
Yann Collet
9a6fe9a428 remove global variable
overkill and leaky to transport a test result just in one place.
2025-06-09 21:55:06 +00:00
Yann Collet
88bea95d11
Merge pull request #4403 from dloidolt/fix_FUZZ_malloc_rand
fuzz: Fix FUZZ_malloc_rand() to return non-NULL for zero-size allocations
2025-06-09 10:57:59 -07:00
Yann Collet
39c091bc9e
Merge pull request #4397 from xiaoge1001/free
Fix several locations with potential memory leak
2025-06-09 10:06:36 -07:00
shixuantong
de8d9e8914 Fix several locations with potential memory leak 2025-06-09 21:23:23 +08:00
Yann Collet
472acf5d83 fix #4405 2025-06-09 07:24:03 +00:00
Yann Collet
7e0324e124 fixed cmake + windows + visual + clang-cl
by removing processing of resource files in this case
2025-06-09 07:09:51 +00:00
Yann Collet
b6dc2924f8 remove fail-fast so that the outcome of other tests can be observed 2025-06-09 06:59:18 +00:00
Yann Collet
49fe2ec793 refactor: modularize CMakeLists.txt for better maintainability
- Split monolithic 235-line CMakeLists.txt into focused modules
- Main file reduced to 78 lines with clear section organization
- Created 5 specialized modules:
  * ZstdVersion.cmake - CMake policies and version management
  * ZstdOptions.cmake - Build options and platform configuration
  * ZstdDependencies.cmake - External dependency management
  * ZstdBuild.cmake - Build targets and validation
  * ZstdPackage.cmake - Package configuration generation

Benefits:
- Improved readability and maintainability
- Better separation of concerns
- Easier debugging and modification
- Preserved 100% backward compatibility
- All existing build options and targets unchanged

The refactored build system passes all tests and maintains
identical functionality while being much easier to understand
and maintain.
2025-06-09 03:47:33 +00:00
Yann Collet
75abb8bc1c add cmake build test with ZSTD_BUILD_TESTS disabled
should reproduce #4405 and fail
2025-06-09 00:05:19 +00:00
Yann Collet
c826c572cf added macos arm64 tests
and commented out windows arm64 tests due to unacceptably long queue time
2025-06-08 22:40:15 +00:00
Yann Collet
a168ae7232 added windows arm64 runner to cmake tests 2025-06-08 22:19:57 +00:00
Yann Collet
b922774602 refactor CMake tests workflow for readability 2025-06-08 21:44:21 +00:00
Yann Collet
a2dba85fd1 ci: separate cmake tests into dedicated workflow file
- Create new .github/workflows/cmake-tests.yml with all cmake-related jobs
- Move cmake-build-and-test-check, cmake-source-directory-with-spaces, and cmake-visual-2022 jobs
- Remove cmake tests from dev-short-tests.yml to improve organization
- Maintain same trigger conditions and test configurations
- Add dedicated concurrency group for cmake tests

This separation allows cmake tests to run independently and makes
the CI configuration more modular and easier to maintain.
2025-06-08 20:25:25 +00:00