Release notes for 0.9.2-rc1

Run copy_from_upstream
Checkout post-0.9.0 copy_from_upstream fixes
2025-06-23 00:01:22 -04:00 · 2024-01-11 17:54:58 +01:00 · 2024-01-08 11:51:32 -05:00 · 2024-01-08 11:51:32 -05:00 · 2024-01-08 11:51:32 -05:00 · 2024-01-08 11:51:32 -05:00
30 changed files with 578 additions and 136 deletions
--- a/.github/workflows/linux.yml
+++ b/.github/workflows/linux.yml
@@ -30,6 +30,7 @@ jobs:
          git config --global user.name "ciuser" && \
          git config --global user.email "ci@openquantumsafe.org" && \
          export LIBOQS_DIR=`pwd` && \
+          git config --global --add safe.directory $LIBOQS_DIR && \
          cd scripts/copy_from_upstream && \
          ! pip3 install -r requirements.txt 2>&1 | grep ERROR && \
          python3 copy_from_upstream.py copy && \
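The added `safe.directory` line works around Git's repository-ownership check, introduced in Git 2.35.2. Under this check, Git refuses to operate in a repository owned by a different user than the one running it, which can happen when the CI container user differs from the owner of the checkout. The sketch below is a simplified, hypothetical approximation of that condition, together with the argument list the CI step passes to `git`. Git's real check covers more cases, such as the `.git` directory.

```python
import os
import tempfile


def owner_differs(path: str) -> bool:
    """Hypothetical helper: True if `path` is owned by a user other than the
    current effective user, the situation in which Git asks for a
    safe.directory entry. Simplified: looks only at `path` itself (POSIX only)."""
    return os.stat(path).st_uid != os.geteuid()


def safe_directory_command(path: str) -> list[str]:
    """Argument list equivalent to the CI step that registers LIBOQS_DIR."""
    return ["git", "config", "--global", "--add", "safe.directory", path]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # A directory we just created is owned by us, so Git would not object.
        print(owner_differs(d))  # False
        print(" ".join(safe_directory_command(d)))
```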
--- a/.travis.yml
+++ b/.travis.yml
@@ -1,6 +1,6 @@
 language: c
 before_script:
-  - sudo apt -y install astyle cmake gcc ninja-build libssl-dev python3-pytest python3-pytest-xdist unzip xsltproc doxygen graphviz valgrind
+  - sudo apt update && sudo apt -y install astyle cmake gcc ninja-build libssl-dev python3-pytest python3-pytest-xdist unzip xsltproc doxygen graphviz valgrind
 jobs:
  include:
    - arch: ppc64le         # The IBM Power LXD container based build for OSS only
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -33,7 +33,7 @@ set(CMAKE_C_STANDARD 11)
 set(CMAKE_C_STANDARD_REQUIRED ON)
 set(CMAKE_POSITION_INDEPENDENT_CODE ON)
 set(CMAKE_C_VISIBILITY_PRESET hidden)
-set(OQS_VERSION_TEXT "0.9.0")
+set(OQS_VERSION_TEXT "0.9.2-rc1")
 set(OQS_COMPILE_BUILD_TARGET "${CMAKE_SYSTEM_PROCESSOR}-${CMAKE_HOST_SYSTEM}")
 set(OQS_MINIMAL_GCC_VERSION "7.1.0")
 set(CMAKE_EXPORT_COMPILE_COMMANDS ON)
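The version bump sets `OQS_VERSION_TEXT` to a pre-release string. The following sketch shows how a release script could read that value back from `CMakeLists.txt` and split it into numeric components and an optional pre-release tag. The helper is hypothetical and not part of liboqs, and it assumes the `MAJOR.MINOR.PATCH[-TAG]` shape used here.

```python
import re
from typing import NamedTuple, Optional

# Matches e.g. set(OQS_VERSION_TEXT "0.9.2-rc1"); the pattern is an assumption
# based on the line shown in this diff.
_SET_RE = re.compile(r'set\(\s*OQS_VERSION_TEXT\s+"([^"]+)"\s*\)')
# MAJOR.MINOR.PATCH with an optional "-TAG" suffix such as "-rc1".
_VER_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.]+))?")


class Version(NamedTuple):
    major: int
    minor: int
    patch: int
    prerelease: Optional[str]


def version_from_cmake(text: str) -> Version:
    """Hypothetical helper: extract and parse OQS_VERSION_TEXT from CMake text."""
    m = _SET_RE.search(text)
    if m is None:
        raise ValueError("OQS_VERSION_TEXT not found")
    v = _VER_RE.fullmatch(m.group(1))
    if v is None:
        raise ValueError(f"unexpected version format: {m.group(1)!r}")
    major, minor, patch, pre = v.groups()
    return Version(int(major), int(minor), int(patch), pre)


if __name__ == "__main__":
    line = 'set(OQS_VERSION_TEXT "0.9.2-rc1")'
    print(version_from_cmake(line))  # Version(major=0, minor=9, patch=2, prerelease='rc1')
```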
--- a/RELEASE.md
+++ b/RELEASE.md
@@ -1,5 +1,5 @@
-liboqs version 0.9.0
-====================
+liboqs version 0.9.2-rc1
+========================

 About
 -----
@@ -28,78 +28,22 @@ liboqs can also be used in the following programming languages via language-spec
 Release notes
 =============

-This is version 0.9.0 of liboqs. It was released on October 12, 2023.
+This is release candidate 1 of version 0.9.2 of liboqs. It was released on January 11, 2024.

-This release features an update to the Classic McEliece KEM, bringing it in line with NIST Round 4. It also adds or updates ARM implementations for Kyber, Dilithium, and Falcon.
+This release is a security release which fixes potential non-constant-time behaviour in Kyber based on https://github.com/pq-crystals/kyber/commit/272125f6acc8e8b6850fd68ceb901a660ff48196

 What's New
 ----------

-This release continues from the 0.8.0 release of liboqs.
+This release continues from the 0.9.1 release of liboqs.

 ### Key encapsulation mechanisms

-- Classic McEliece: updated to Round 4 version.
-- Kyber: aarch64 implementation updated.
-
-### Digital signature schemes
-
-- Dilithium: aarch64 implementation updated.
-- Falcon: aarch64 implementation added.
-
-### Other changes
-
-- Update algorithm documentation
-- Support compilation for Windows on ARM64, Apple mobile, and Android platforms
-- Improve resilience of randombytes on Apple systems
-
-Release call
-============
-
-Users of liboqs are invited to join a webinar on Thursday, November 2, 2023, from 12-1pm US Eastern time for information on this release, plans for the next release cycle, and to provide feedback on OQS usage and features.  
-
-The Zoom link for the webinar is: https://uwaterloo.zoom.us/j/98288698086
-
----
+- Kyber: C, AVX2, and aarch64 implementation updated

 Detailed changelog
 ------------------

-* Fix libdir value in liboqs.pc by @vt-alt in https://github.com/open-quantum-safe/liboqs/pull/1496
-* update version and remove CCI triggers by @baentsch in https://github.com/open-quantum-safe/liboqs/pull/1498
-* create deb package and retain as artifact by @baentsch in https://github.com/open-quantum-safe/liboqs/pull/1501
-* README correction to docs path & additional gitignore to macos + vscode by @planetf1 in https://github.com/open-quantum-safe/liboqs/pull/1503
-* Trigger liboqs-python CI via GitHub API by @SWilson4 in https://github.com/open-quantum-safe/liboqs/pull/1507
-* Update Classic McEliece by @praveksharma in https://github.com/open-quantum-safe/liboqs/pull/1470
-* update BIKE documentation by @baentsch in https://github.com/open-quantum-safe/liboqs/pull/1509
-* kyber/dilithium aarch64 pull from pqclean + patches by @bhess in https://github.com/open-quantum-safe/liboqs/pull/1512
-* Pull Falcon updates from PQClean by @dstebila in https://github.com/open-quantum-safe/liboqs/pull/1523
-* Bump XCode by @baentsch in https://github.com/open-quantum-safe/liboqs/pull/1526
-* Update Classic McEliece supression files by @praveksharma in https://github.com/open-quantum-safe/liboqs/pull/1527
-* Bump gitpython from 3.1.30 to 3.1.32 in /scripts/copy_from_upstream by @dependabot in https://github.com/open-quantum-safe/liboqs/pull/1524
-* ci: add CI for android by @res0nance in https://github.com/open-quantum-safe/liboqs/pull/1531
-* re-enable armhf speed testing by @baentsch in https://github.com/open-quantum-safe/liboqs/pull/1535
-* Bump gitpython from 3.1.32 to 3.1.34 in /scripts/copy_from_upstream by @dependabot in https://github.com/open-quantum-safe/liboqs/pull/1538
-* Prefer arc4random on Apple platforms by @res0nance in https://github.com/open-quantum-safe/liboqs/pull/1544
-* Bump gitpython from 3.1.34 to 3.1.35 in /scripts/copy_from_upstream by @dependabot in https://github.com/open-quantum-safe/liboqs/pull/1551
-* Update Classic McEliece suppression files by @praveksharma in https://github.com/open-quantum-safe/liboqs/pull/1541
-* Pull Neon implementation of Falcon from PQClean by @SWilson4 in https://github.com/open-quantum-safe/liboqs/pull/1547
-* ci: add CI for apple mobile platforms by @res0nance in https://github.com/open-quantum-safe/liboqs/pull/1546
-* Add Windows ARM64 support by @res0nance in https://github.com/open-quantum-safe/liboqs/pull/1545
-* Document Falcon constant time errors by @praveksharma in https://github.com/open-quantum-safe/liboqs/pull/1552
-* ci: github actions CI for Windows x86 and x64 by @res0nance in https://github.com/open-quantum-safe/liboqs/pull/1554
-* build: Align VS test folder with all other Generators by @res0nance in https://github.com/open-quantum-safe/liboqs/pull/1557
-* Fix weekly.yml to skip McEliece by @praveksharma in https://github.com/open-quantum-safe/liboqs/pull/1562
-* Enable extensions in constant-time tests by @SWilson4 in https://github.com/open-quantum-safe/liboqs/pull/1567
-* Update Classic McEliece supression files by @praveksharma in https://github.com/open-quantum-safe/liboqs/pull/1568
-* liboqs 0.9.0 release candidate 1 by @SWilson4 in https://github.com/open-quantum-safe/liboqs/pull/1570
-* add community standard documentation [skip ci] by @baentsch in https://github.com/open-quantum-safe/liboqs/pull/1565
-* Bump gitpython from 3.1.35 to 3.1.37 in /scripts/copy_from_upstream by @dependabot in https://github.com/open-quantum-safe/liboqs/pull/1575
+* Pull Kyber division fixes from PQ-Crystals into dev-092 by @praveksharma in https://github.com/open-quantum-safe/liboqs/pull/1652

-## New Contributors
-* @planetf1 made their first contribution in https://github.com/open-quantum-safe/liboqs/pull/1503
-* @SWilson4 made their first contribution in https://github.com/open-quantum-safe/liboqs/pull/1507
-* @praveksharma made their first contribution in https://github.com/open-quantum-safe/liboqs/pull/1470
-* @res0nance made their first contribution in https://github.com/open-quantum-safe/liboqs/pull/1531
-
-**Full Changelog**: https://github.com/open-quantum-safe/liboqs/compare/0.8.0...0.9.0
+**Full Changelog**: https://github.com/open-quantum-safe/liboqs/compare/0.9.1...0.9.2-rc1
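Kyber's message decoding rounds each coefficient t (0 <= t < q, with q = 3329) to one bit by computing ((2t + q/2) / q) mod 2. When t is secret, a hardware division whose latency depends on its operands can leak timing information. This is the class of issue the linked upstream fix addresses, broadly by replacing the division with a multiplication by a fixed-point reciprocal followed by a shift. The Python sketch below only checks that the two forms are arithmetically equivalent, exhaustively for every t in [0, q); Python says nothing about timing. The constants used here are assumptions chosen for illustration: 80635 as an approximation of 2^28/q, a shift of 28, and a rounding offset of 1665. The exact C code is in the upstream commit.

```python
Q = 3329  # Kyber modulus


def msg_bit_div(t: int) -> int:
    """Reference form: round(2t/q) mod 2 via integer division (q // 2 == 1664)."""
    return ((t << 1) + Q // 2) // Q & 1


def msg_bit_mulshift(t: int) -> int:
    """Division-free form (assumed constants): multiply by ~2**28/q, shift by 28.
    The offset 1665 instead of 1664 compensates for 80635 being slightly
    below 2**28/q, so the floor comes out the same."""
    x = (t << 1) + 1665
    return ((x * 80635) >> 28) & 1


def check_all() -> bool:
    """Exhaustively compare both forms over every reduced coefficient."""
    return all(msg_bit_div(t) == msg_bit_mulshift(t) for t in range(Q))


if __name__ == "__main__":
    print(check_all())  # True
```

The largest intermediate product is (2 * 3328 + 1665) * 80635 = 670,963,835, which is below 2^32. The same arithmetic therefore fits in a 32-bit unsigned integer in C.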
--- a/docs/algorithms/kem/classic_mceliece.md
+++ b/docs/algorithms/kem/classic_mceliece.md
@@ -14,7 +14,7 @@
 ## Advisories

 - Classic-McEliece-460896, Classic-McEliece-460896f, Classic-McEliece-6960119, and Classic-McEliece-6960119f parameter sets fail memory leak testing on x86-64 when building with ``clang`` using optimization level ``-O2`` and ``-O3``. Care is advised when using the algorithm at higher optimization levels, and any other compiler and architecture.
-- Current implementation of the algorithm may not be constant-time. Additionally, environment specific constant-time leaks may not be documented; please report potential constant-time leaks when found. 
+- Current implementation of the algorithm may not be constant-time. Additionally, environment specific constant-time leaks may not be documented; please report potential constant-time leaks when found.

 ## Parameter set summary

@@ -35,8 +35,8 @@

 |       Implementation source       | Identifier in upstream   | Supported architecture(s)   | Supported operating system(s)   | CPU extension(s) used   | No branching-on-secrets claimed?   | No branching-on-secrets checked by valgrind?   | Large stack usage?‡   |
 |:---------------------------------:|:-------------------------|:----------------------------|:--------------------------------|:------------------------|:-----------------------------------|:-----------------------------------------------|:----------------------|
-| [Primary Source](#primary-source) | clean                    | All                         | All                             | None                    | True                               | True                                           | True                  |
-| [Primary Source](#primary-source) | avx2                     | x86\_64                     | Linux,Darwin                    | AVX2,POPCNT             | False                              | True                                           | True                  |
+| [Primary Source](#primary-source) | clean                    | All                         | All                             | None                    | False                              | False                                          | True                  |
+| [Primary Source](#primary-source) | avx2                     | x86\_64                     | Linux,Darwin                    | AVX2,POPCNT             | False                              | False                                          | True                  |

 Are implementations chosen based on runtime CPU feature detection? **Yes**.

@@ -46,8 +46,8 @@ Are implementations chosen based on runtime CPU feature detection? **Yes**.

 |       Implementation source       | Identifier in upstream   | Supported architecture(s)   | Supported operating system(s)   | CPU extension(s) used   | No branching-on-secrets claimed?   | No branching-on-secrets checked by valgrind?   | Large stack usage?   |
 |:---------------------------------:|:-------------------------|:----------------------------|:--------------------------------|:------------------------|:-----------------------------------|:-----------------------------------------------|:---------------------|
-| [Primary Source](#primary-source) | clean                    | All                         | All                             | None                    | True                               | True                                           | True                 |
-| [Primary Source](#primary-source) | avx2                     | x86\_64                     | Linux,Darwin                    | AVX2,POPCNT,BMI1        | False                              | True                                           | True                 |
+| [Primary Source](#primary-source) | clean                    | All                         | All                             | None                    | False                              | False                                          | True                 |
+| [Primary Source](#primary-source) | avx2                     | x86\_64                     | Linux,Darwin                    | AVX2,POPCNT,BMI1        | False                              | False                                          | True                 |

 Are implementations chosen based on runtime CPU feature detection? **Yes**.

@@ -55,8 +55,8 @@ Are implementations chosen based on runtime CPU feature detection? **Yes**.

 |       Implementation source       | Identifier in upstream   | Supported architecture(s)   | Supported operating system(s)   | CPU extension(s) used   | No branching-on-secrets claimed?   | No branching-on-secrets checked by valgrind?   | Large stack usage?   |
 |:---------------------------------:|:-------------------------|:----------------------------|:--------------------------------|:------------------------|:-----------------------------------|:-----------------------------------------------|:---------------------|
-| [Primary Source](#primary-source) | clean                    | All                         | All                             | None                    | True                               | True                                           | True                 |
-| [Primary Source](#primary-source) | avx2                     | x86\_64                     | Linux,Darwin                    | AVX2,POPCNT             | False                              | True                                           | True                 |
+| [Primary Source](#primary-source) | clean                    | All                         | All                             | None                    | False                              | False                                          | True                 |
+| [Primary Source](#primary-source) | avx2                     | x86\_64                     | Linux,Darwin                    | AVX2,POPCNT             | False                              | False                                          | True                 |

 Are implementations chosen based on runtime CPU feature detection? **Yes**.

@@ -64,8 +64,8 @@ Are implementations chosen based on runtime CPU feature detection? **Yes**.

 |       Implementation source       | Identifier in upstream   | Supported architecture(s)   | Supported operating system(s)   | CPU extension(s) used   | No branching-on-secrets claimed?   | No branching-on-secrets checked by valgrind?   | Large stack usage?   |
 |:---------------------------------:|:-------------------------|:----------------------------|:--------------------------------|:------------------------|:-----------------------------------|:-----------------------------------------------|:---------------------|
-| [Primary Source](#primary-source) | clean                    | All                         | All                             | None                    | True                               | True                                           | True                 |
-| [Primary Source](#primary-source) | avx2                     | x86\_64                     | Linux,Darwin                    | AVX2,POPCNT,BMI1        | False                              | True                                           | True                 |
+| [Primary Source](#primary-source) | clean                    | All                         | All                             | None                    | False                              | False                                          | True                 |
+| [Primary Source](#primary-source) | avx2                     | x86\_64                     | Linux,Darwin                    | AVX2,POPCNT,BMI1        | False                              | False                                          | True                 |

 Are implementations chosen based on runtime CPU feature detection? **Yes**.

@@ -73,8 +73,8 @@ Are implementations chosen based on runtime CPU feature detection? **Yes**.

 |       Implementation source       | Identifier in upstream   | Supported architecture(s)   | Supported operating system(s)   | CPU extension(s) used   | No branching-on-secrets claimed?   | No branching-on-secrets checked by valgrind?   | Large stack usage?   |
 |:---------------------------------:|:-------------------------|:----------------------------|:--------------------------------|:------------------------|:-----------------------------------|:-----------------------------------------------|:---------------------|
-| [Primary Source](#primary-source) | clean                    | All                         | All                             | None                    | True                               | True                                           | True                 |
-| [Primary Source](#primary-source) | avx2                     | x86\_64                     | Linux,Darwin                    | AVX2,POPCNT             | False                              | True                                           | True                 |
+| [Primary Source](#primary-source) | clean                    | All                         | All                             | None                    | False                              | False                                          | True                 |
+| [Primary Source](#primary-source) | avx2                     | x86\_64                     | Linux,Darwin                    | AVX2,POPCNT             | False                              | False                                          | True                 |

 Are implementations chosen based on runtime CPU feature detection? **Yes**.

@@ -82,8 +82,8 @@ Are implementations chosen based on runtime CPU feature detection? **Yes**.

 |       Implementation source       | Identifier in upstream   | Supported architecture(s)   | Supported operating system(s)   | CPU extension(s) used   | No branching-on-secrets claimed?   | No branching-on-secrets checked by valgrind?   | Large stack usage?   |
 |:---------------------------------:|:-------------------------|:----------------------------|:--------------------------------|:------------------------|:-----------------------------------|:-----------------------------------------------|:---------------------|
-| [Primary Source](#primary-source) | clean                    | All                         | All                             | None                    | True                               | True                                           | True                 |
-| [Primary Source](#primary-source) | avx2                     | x86\_64                     | Linux,Darwin                    | AVX2,POPCNT,BMI1        | False                              | True                                           | True                 |
+| [Primary Source](#primary-source) | clean                    | All                         | All                             | None                    | False                              | False                                          | True                 |
+| [Primary Source](#primary-source) | avx2                     | x86\_64                     | Linux,Darwin                    | AVX2,POPCNT,BMI1        | False                              | False                                          | True                 |

 Are implementations chosen based on runtime CPU feature detection? **Yes**.

@@ -91,8 +91,8 @@ Are implementations chosen based on runtime CPU feature detection? **Yes**.

 |       Implementation source       | Identifier in upstream   | Supported architecture(s)   | Supported operating system(s)   | CPU extension(s) used   | No branching-on-secrets claimed?   | No branching-on-secrets checked by valgrind?   | Large stack usage?   |
 |:---------------------------------:|:-------------------------|:----------------------------|:--------------------------------|:------------------------|:-----------------------------------|:-----------------------------------------------|:---------------------|
-| [Primary Source](#primary-source) | clean                    | All                         | All                             | None                    | True                               | True                                           | True                 |
-| [Primary Source](#primary-source) | avx2                     | x86\_64                     | Linux,Darwin                    | AVX2,POPCNT             | False                              | True                                           | True                 |
+| [Primary Source](#primary-source) | clean                    | All                         | All                             | None                    | False                              | False                                          | True                 |
+| [Primary Source](#primary-source) | avx2                     | x86\_64                     | Linux,Darwin                    | AVX2,POPCNT             | False                              | False                                          | True                 |

 Are implementations chosen based on runtime CPU feature detection? **Yes**.

@@ -100,8 +100,8 @@ Are implementations chosen based on runtime CPU feature detection? **Yes**.

 |       Implementation source       | Identifier in upstream   | Supported architecture(s)   | Supported operating system(s)   | CPU extension(s) used   | No branching-on-secrets claimed?   | No branching-on-secrets checked by valgrind?   | Large stack usage?   |
 |:---------------------------------:|:-------------------------|:----------------------------|:--------------------------------|:------------------------|:-----------------------------------|:-----------------------------------------------|:---------------------|
-| [Primary Source](#primary-source) | clean                    | All                         | All                             | None                    | True                               | True                                           | True                 |
-| [Primary Source](#primary-source) | avx2                     | x86\_64                     | Linux,Darwin                    | AVX2,POPCNT,BMI1        | False                              | True                                           | True                 |
+| [Primary Source](#primary-source) | clean                    | All                         | All                             | None                    | False                              | False                                          | True                 |
+| [Primary Source](#primary-source) | avx2                     | x86\_64                     | Linux,Darwin                    | AVX2,POPCNT,BMI1        | False                              | False                                          | True                 |

 Are implementations chosen based on runtime CPU feature detection? **Yes**.

@@ -109,8 +109,8 @@ Are implementations chosen based on runtime CPU feature detection? **Yes**.

 |       Implementation source       | Identifier in upstream   | Supported architecture(s)   | Supported operating system(s)   | CPU extension(s) used   | No branching-on-secrets claimed?   | No branching-on-secrets checked by valgrind?   | Large stack usage?   |
 |:---------------------------------:|:-------------------------|:----------------------------|:--------------------------------|:------------------------|:-----------------------------------|:-----------------------------------------------|:---------------------|
-| [Primary Source](#primary-source) | clean                    | All                         | All                             | None                    | True                               | True                                           | True                 |
-| [Primary Source](#primary-source) | avx2                     | x86\_64                     | Linux,Darwin                    | AVX2,POPCNT             | False                              | True                                           | True                 |
+| [Primary Source](#primary-source) | clean                    | All                         | All                             | None                    | False                              | False                                          | True                 |
+| [Primary Source](#primary-source) | avx2                     | x86\_64                     | Linux,Darwin                    | AVX2,POPCNT             | False                              | False                                          | True                 |

 Are implementations chosen based on runtime CPU feature detection? **Yes**.

@@ -118,8 +118,8 @@ Are implementations chosen based on runtime CPU feature detection? **Yes**.

 |       Implementation source       | Identifier in upstream   | Supported architecture(s)   | Supported operating system(s)   | CPU extension(s) used   | No branching-on-secrets claimed?   | No branching-on-secrets checked by valgrind?   | Large stack usage?   |
 |:---------------------------------:|:-------------------------|:----------------------------|:--------------------------------|:------------------------|:-----------------------------------|:-----------------------------------------------|:---------------------|
-| [Primary Source](#primary-source) | clean                    | All                         | All                             | None                    | True                               | True                                           | True                 |
-| [Primary Source](#primary-source) | avx2                     | x86\_64                     | Linux,Darwin                    | AVX2,POPCNT,BMI1        | False                              | True                                           | True                 |
+| [Primary Source](#primary-source) | clean                    | All                         | All                             | None                    | False                              | False                                          | True                 |
+| [Primary Source](#primary-source) | avx2                     | x86\_64                     | Linux,Darwin                    | AVX2,POPCNT,BMI1        | False                              | False                                          | True                 |

 Are implementations chosen based on runtime CPU feature detection? **Yes**.

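Across these tables, two changes are made. In the clean rows, both "No branching-on-secrets" columns change from True to False. In the avx2 rows, the valgrind column changes to False. Both changes match the advisory that the implementation may not be constant-time. As an illustration only, the sketch below shows a small hypothetical helper that reads one such Markdown pipe-table row into a dictionary keyed by the column headers and pulls out the two constant-time flags. It assumes cells never contain a literal pipe character.

```python
def parse_row(header: str, row: str) -> dict[str, str]:
    """Hypothetical helper: map each column header of a Markdown pipe table
    to the corresponding cell of `row`."""

    def cells(line: str) -> list[str]:
        # Drop the outer pipes, split on the inner ones, trim padding.
        return [c.strip() for c in line.strip().strip("|").split("|")]

    keys, values = cells(header), cells(row)
    if len(keys) != len(values):
        raise ValueError("header and row have different column counts")
    return dict(zip(keys, values))


def branching_flags(row: dict[str, str]) -> tuple[bool, bool]:
    """(claimed free of secret-dependent branching, checked by valgrind)."""
    return (
        row["No branching-on-secrets claimed?"] == "True",
        row["No branching-on-secrets checked by valgrind?"] == "True",
    )


if __name__ == "__main__":
    header = ("| Implementation source | Identifier in upstream | Supported architecture(s) "
              "| Supported operating system(s) | CPU extension(s) used "
              "| No branching-on-secrets claimed? "
              "| No branching-on-secrets checked by valgrind? | Large stack usage? |")
    row = ("| [Primary Source](#primary-source) | clean | All | All | None "
           "| False | False | True |")
    parsed = parse_row(header, row)
    print(parsed["Identifier in upstream"], branching_flags(parsed))  # clean (False, False)
```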
--- a/docs/algorithms/kem/classic_mceliece.yml
+++ b/docs/algorithms/kem/classic_mceliece.yml
@@ -26,7 +26,9 @@ advisories:
  building with ``clang`` using optimization level ``-O2`` and ``-O3``. Care is advised
  when using the algorithm at higher optimization levels, and any other compiler and
  architecture.
-- Current implementation of the algorithm may not be constant-time. Additionally, environment specific constant-time leaks may not be documented; please report potential constant-time leaks when found. 
+- Current implementation of the algorithm may not be constant-time. Additionally,
+  environment specific constant-time leaks may not be documented; please report potential
+  constant-time leaks when found.
 parameter-sets:
 - name: Classic-McEliece-348864
  claimed-nist-level: 1
--- a/docs/algorithms/kem/kyber.md
+++ b/docs/algorithms/kem/kyber.md
@@ -7,9 +7,9 @@
 - **Authors' website**: https://pq-crystals.org/
 - **Specification version**: NIST Round 3 submission.
 - **Primary Source**<a name="primary-source"></a>:
-  - **Source**: https://github.com/pq-crystals/kyber/commit/518de2414a85052bb91349bcbcc347f391292d5b with copy_from_upstream patches
+  - **Source**: https://github.com/pq-crystals/kyber/commit/b628ba78711bc28327dc7d2d5c074a00f061884e with copy_from_upstream patches
  - **Implementation license (SPDX-Identifier)**: CC0-1.0 or Apache-2.0
-- **Optimized Implementation sources**: https://github.com/pq-crystals/kyber/commit/518de2414a85052bb91349bcbcc347f391292d5b with copy_from_upstream patches
+- **Optimized Implementation sources**: https://github.com/pq-crystals/kyber/commit/b628ba78711bc28327dc7d2d5c074a00f061884e with copy_from_upstream patches
  - **pqclean-aarch64**:<a name="pqclean-aarch64"></a>
      - **Source**: https://github.com/PQClean/PQClean/commit/8e220a87308154d48fdfac40abbb191ac7fce06a with copy_from_upstream patches
      - **Implementation license (SPDX-Identifier)**: CC0-1.0 and (CC0-1.0 or Apache-2.0) and (CC0-1.0 or MIT) and MIT
--- a/docs/algorithms/kem/kyber.yml
+++ b/docs/algorithms/kem/kyber.yml
@@ -17,7 +17,7 @@ website: https://pq-crystals.org/
 nist-round: 3
 spec-version: NIST Round 3 submission
 primary-upstream:
-  source: https://github.com/pq-crystals/kyber/commit/518de2414a85052bb91349bcbcc347f391292d5b
+  source: https://github.com/pq-crystals/kyber/commit/b628ba78711bc28327dc7d2d5c074a00f061884e
    with copy_from_upstream patches
  spdx-license-identifier: CC0-1.0 or Apache-2.0
 optimized-upstreams:
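This change moves the Kyber source in `kyber.md` and `kyber.yml` to commit b628ba78711bc28327dc7d2d5c074a00f061884e. This is the same commit pinned as `git_commit` for `pqcrystals-kyber` in scripts/copy_from_upstream/copy_from_upstream.yml. The sketch below is a hypothetical consistency check that could confirm a documentation file mentions the pinned commit. It uses regular expressions instead of a YAML parser and assumes `git_commit` appears after the upstream's `name:` line.

```python
import re

# A full 40-character lowercase hexadecimal Git commit hash.
_SHA_RE = re.compile(r"\b[0-9a-f]{40}\b")


def pinned_commit(yml_text: str, upstream_name: str) -> str:
    """Hypothetical: return the first git_commit following `name: <upstream_name>`
    in copy_from_upstream.yml-style text."""
    parts = yml_text.split(f"name: {upstream_name}", 1)
    if len(parts) != 2:
        raise KeyError(upstream_name)
    m = re.search(r"git_commit:\s*([0-9a-f]{40})", parts[1])
    if m is None:
        raise ValueError(f"no git_commit for {upstream_name}")
    return m.group(1)


def commits_in(text: str) -> set[str]:
    """Every full commit hash mentioned anywhere in `text`."""
    return set(_SHA_RE.findall(text))


def docs_match_pin(yml_text: str, upstream_name: str, doc_text: str) -> bool:
    """True if the documentation mentions the commit pinned for this upstream."""
    return pinned_commit(yml_text, upstream_name) in commits_in(doc_text)
```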
--- a/docs/algorithms/sig/falcon.md
+++ b/docs/algorithms/sig/falcon.md
@@ -22,7 +22,7 @@

 |       Implementation source       | Identifier in upstream   | Supported architecture(s)   | Supported operating system(s)   | CPU extension(s) used   | No branching-on-secrets claimed?   | No branching-on-secrets checked by valgrind?   | Large stack usage?‡   |
 |:---------------------------------:|:-------------------------|:----------------------------|:--------------------------------|:------------------------|:-----------------------------------|:-----------------------------------------------|:----------------------|
-| [Primary Source](#primary-source) | clean                    | All                         | All                             | None                    | False                              | False                                          | False                 |
+| [Primary Source](#primary-source) | clean                    | All                         | All                             | None                    | True                               | True                                           | False                 |
 | [Primary Source](#primary-source) | avx2                     | x86\_64                     | All                             | AVX2                    | False                              | False                                          | False                 |
 | [Primary Source](#primary-source) | aarch64                  | ARM64\_V8                   | Linux,Darwin                    | None                    | False                              | False                                          | False                 |

@@ -34,7 +34,7 @@ Are implementations chosen based on runtime CPU feature detection? **Yes**.

 |       Implementation source       | Identifier in upstream   | Supported architecture(s)   | Supported operating system(s)   | CPU extension(s) used   | No branching-on-secrets claimed?   | No branching-on-secrets checked by valgrind?   | Large stack usage?   |
 |:---------------------------------:|:-------------------------|:----------------------------|:--------------------------------|:------------------------|:-----------------------------------|:-----------------------------------------------|:---------------------|
-| [Primary Source](#primary-source) | clean                    | All                         | All                             | None                    | False                              | False                                          | False                |
+| [Primary Source](#primary-source) | clean                    | All                         | All                             | None                    | True                               | True                                           | False                |
 | [Primary Source](#primary-source) | avx2                     | x86\_64                     | All                             | AVX2                    | False                              | False                                          | False                |
 | [Primary Source](#primary-source) | aarch64                  | ARM64\_V8                   | Linux,Darwin                    | None                    | False                              | False                                          | False                |

--- a/scripts/copy_from_upstream/copy_from_upstream.py
+++ b/scripts/copy_from_upstream/copy_from_upstream.py
@ -611,8 +611,6 @@ def copy_from_upstream():
    for t in ["kem", "sig"]:
        with open(os.path.join(os.environ['LIBOQS_DIR'], 'tests', 'KATs', t, 'kats.json'), "w") as f:
            json.dump(kats[t], f, indent=2, sort_keys=True)
-    if not keepdata:
-        shutil.rmtree('repos')

    update_upstream_alg_docs.do_it(os.environ['LIBOQS_DIR'])

@ -622,6 +620,10 @@ def copy_from_upstream():
    update_docs_from_yaml.do_it(os.environ['LIBOQS_DIR'])
    update_cbom.update_cbom_if_algs_not_changed(os.environ['LIBOQS_DIR'], "git")

+    if not keepdata:
+        shutil.rmtree('repos')
+
+
 def verify_from_upstream():
    instructions = load_instructions()
    basedir = "verify_from_upstream"
--- a/scripts/copy_from_upstream/copy_from_upstream.yml
+++ b/scripts/copy_from_upstream/copy_from_upstream.yml
@ -8,13 +8,13 @@ upstreams:
    sig_meta_path: 'crypto_sign/{pqclean_scheme}/META.yml'
    kem_scheme_path: 'crypto_kem/{pqclean_scheme}'
    sig_scheme_path: 'crypto_sign/{pqclean_scheme}'
-    patches: [pqclean-sphincs.patch, pqclean-dilithium-arm-randomized-signing.patch, pqclean-kyber-armneon-shake-fixes.patch, pqclean-kyber-armneon-768-1024-fixes.patch]
+    patches: [pqclean-sphincs.patch, pqclean-dilithium-arm-randomized-signing.patch, pqclean-kyber-armneon-shake-fixes.patch, pqclean-kyber-armneon-768-1024-fixes.patch, pqclean-kyber-armneon-variable-timing-fix.patch]
    ignore: pqclean_sphincs-shake-256s-simple_aarch64, pqclean_sphincs-shake-256s-simple_aarch64, pqclean_sphincs-shake-256f-simple_aarch64, pqclean_sphincs-shake-192s-simple_aarch64, pqclean_sphincs-shake-192f-simple_aarch64, pqclean_sphincs-shake-128s-simple_aarch64, pqclean_sphincs-shake-128f-simple_aarch64
  -
    name: pqcrystals-kyber
    git_url: https://github.com/pq-crystals/kyber.git
    git_branch: master
-    git_commit: 518de2414a85052bb91349bcbcc347f391292d5b
+    git_commit: b628ba78711bc28327dc7d2d5c074a00f061884e
    kem_meta_path: '{pretty_name_full}_META.yml'
    kem_scheme_path: '.'
    patches: [pqcrystals-kyber-yml.patch, pqcrystals-kyber-ref-shake-aes.patch, pqcrystals-kyber-avx2-shake-aes.patch]
--- a/scripts/copy_from_upstream/patches/pqclean-kyber-armneon-variable-timing-fix.patch
+++ b/scripts/copy_from_upstream/patches/pqclean-kyber-armneon-variable-timing-fix.patch
@ -0,0 +1,274 @@
+927a0eff4a45781218062953002001af4e6a5c8a
+diff --git a/crypto_kem/kyber1024/aarch64/poly.c b/crypto_kem/kyber1024/aarch64/poly.c
+index 1dfa52c..3115d1c 100644
+--- a/crypto_kem/kyber1024/aarch64/poly.c
+++ b/crypto_kem/kyber1024/aarch64/poly.c
+@@ -51,6 +51,7 @@
+ void poly_compress(uint8_t r[KYBER_POLYCOMPRESSEDBYTES], const int16_t a[KYBER_N]) {
+     unsigned int i, j;
+     int16_t u;
++    uint32_t d0;
+     uint8_t t[8];
+ 
+     for (i = 0; i < KYBER_N / 8; i++) {
+@@ -58,7 +59,12 @@ void poly_compress(uint8_t r[KYBER_POLYCOMPRESSEDBYTES], const int16_t a[KYBER_N
+             // map to positive standard representatives
+             u  = a[8 * i + j];
+             u += (u >> 15) & KYBER_Q;
+-            t[j] = ((((uint32_t)u << 5) + KYBER_Q / 2) / KYBER_Q) & 31;
++            // t[j] = ((((uint32_t)u << 5) + KYBER_Q / 2) / KYBER_Q) & 31;
++            d0 = u << 5;
++            d0 += 1664;
++            d0 *= 40318;
++            d0 >>= 27;
++            t[j] = d0 & 0x1f;
+         }
+ 
+         r[0] = (t[0] >> 0) | (t[1] << 5);
+@@ -207,14 +213,19 @@ void poly_frommsg(int16_t r[KYBER_N], const uint8_t msg[KYBER_INDCPA_MSGBYTES])
+ **************************************************/
+ void poly_tomsg(uint8_t msg[KYBER_INDCPA_MSGBYTES], const int16_t a[KYBER_N]) {
+     unsigned int i, j;
+-    uint16_t t;
++    uint32_t t;
+ 
+     for (i = 0; i < KYBER_N / 8; i++) {
+         msg[i] = 0;
+         for (j = 0; j < 8; j++) {
+             t  = a[8 * i + j];
+-            t += ((int16_t)t >> 15) & KYBER_Q;
+-            t  = (((t << 1) + KYBER_Q / 2) / KYBER_Q) & 1;
++            // t += ((int16_t)t >> 15) & KYBER_Q;
++            // t  = (((t << 1) + KYBER_Q/2)/KYBER_Q) & 1;
++            t <<= 1;
++            t += 1665;
++            t *= 80635;
++            t >>= 28;
++            t &= 1;
+             msg[i] |= t << j;
+         }
+     }
+diff --git a/crypto_kem/kyber1024/aarch64/polyvec.c b/crypto_kem/kyber1024/aarch64/polyvec.c
+index d400348..f9a1ebf 100644
+--- a/crypto_kem/kyber1024/aarch64/polyvec.c
+++ b/crypto_kem/kyber1024/aarch64/polyvec.c
+@@ -21,6 +21,7 @@
+ **************************************************/
+ void polyvec_compress(uint8_t r[KYBER_POLYVECCOMPRESSEDBYTES], int16_t a[KYBER_K][KYBER_N]) {
+     unsigned int i, j, k;
++    uint64_t d0;
+ 
+     #if (KYBER_POLYVECCOMPRESSEDBYTES == (KYBER_K * 352))
+     uint16_t t[8];
+@@ -29,7 +30,13 @@ void polyvec_compress(uint8_t r[KYBER_POLYVECCOMPRESSEDBYTES], int16_t a[KYBER_K
+             for (k = 0; k < 8; k++) {
+                 t[k]  = a[i][8 * j + k];
+                 t[k] += ((int16_t)t[k] >> 15) & KYBER_Q;
+-                t[k]  = ((((uint32_t)t[k] << 11) + KYBER_Q / 2) / KYBER_Q) & 0x7ff;
++                // t[k]  = ((((uint32_t)t[k] << 11) + KYBER_Q / 2) / KYBER_Q) & 0x7ff;
++                d0 = t[k];
++                d0 <<= 11;
++                d0 += 1664;
++                d0 *= 645084;
++                d0 >>= 31;
++                t[k] = d0 & 0x7ff;
+             }
+ 
+             r[ 0] = (t[0] >>  0);
+@@ -53,7 +60,13 @@ void polyvec_compress(uint8_t r[KYBER_POLYVECCOMPRESSEDBYTES], int16_t a[KYBER_K
+             for (k = 0; k < 4; k++) {
+                 t[k]  = a[i][4 * j + k];
+                 t[k] += ((int16_t)t[k] >> 15) & KYBER_Q;
+-                t[k]  = ((((uint32_t)t[k] << 10) + KYBER_Q / 2) / KYBER_Q) & 0x3ff;
++                // t[k]  = ((((uint32_t)t[k] << 10) + KYBER_Q / 2) / KYBER_Q) & 0x3ff;
++                d0 = t[k];
++                d0 <<= 10;
++                d0 += 1665;
++                d0 *= 1290167;
++                d0 >>= 32;
++                t[k] = d0 & 0x3ff;
+             }
+ 
+             r[0] = (t[0] >> 0);
+diff --git a/crypto_kem/kyber512/aarch64/poly.c b/crypto_kem/kyber512/aarch64/poly.c
+index dffc655..361ce89 100644
+--- a/crypto_kem/kyber512/aarch64/poly.c
+++ b/crypto_kem/kyber512/aarch64/poly.c
+@@ -51,6 +51,7 @@
+ void poly_compress(uint8_t r[KYBER_POLYCOMPRESSEDBYTES], const int16_t a[KYBER_N]) {
+     unsigned int i, j;
+     int16_t u;
++    uint32_t d0;
+     uint8_t t[8];
+ 
+     for (i = 0; i < KYBER_N / 8; i++) {
+@@ -58,7 +59,12 @@ void poly_compress(uint8_t r[KYBER_POLYCOMPRESSEDBYTES], const int16_t a[KYBER_N
+             // map to positive standard representatives
+             u  = a[8 * i + j];
+             u += (u >> 15) & KYBER_Q;
+-            t[j] = ((((uint16_t)u << 4) + KYBER_Q / 2) / KYBER_Q) & 15;
++            // t[j] = ((((uint16_t)u << 4) + KYBER_Q / 2) / KYBER_Q) & 15;
++            d0 = u << 4;
++            d0 += 1665;
++            d0 *= 80635;
++            d0 >>= 28;
++            t[j] = d0 & 0xf;
+         }
+ 
+         r[0] = t[0] | (t[1] << 4);
+@@ -194,14 +200,19 @@ void poly_frommsg(int16_t r[KYBER_N], const uint8_t msg[KYBER_INDCPA_MSGBYTES])
+ **************************************************/
+ void poly_tomsg(uint8_t msg[KYBER_INDCPA_MSGBYTES], const int16_t a[KYBER_N]) {
+     unsigned int i, j;
+-    uint16_t t;
++    uint32_t t;
+ 
+     for (i = 0; i < KYBER_N / 8; i++) {
+         msg[i] = 0;
+         for (j = 0; j < 8; j++) {
+             t  = a[8 * i + j];
+-            t += ((int16_t)t >> 15) & KYBER_Q;
+-            t  = (((t << 1) + KYBER_Q / 2) / KYBER_Q) & 1;
++            // t += ((int16_t)t >> 15) & KYBER_Q;
++            // t  = (((t << 1) + KYBER_Q/2)/KYBER_Q) & 1;
++            t <<= 1;
++            t += 1665;
++            t *= 80635;
++            t >>= 28;
++            t &= 1;
+             msg[i] |= t << j;
+         }
+     }
+diff --git a/crypto_kem/kyber512/aarch64/polyvec.c b/crypto_kem/kyber512/aarch64/polyvec.c
+index d400348..f9a1ebf 100644
+--- a/crypto_kem/kyber512/aarch64/polyvec.c
+++ b/crypto_kem/kyber512/aarch64/polyvec.c
+@@ -21,6 +21,7 @@
+ **************************************************/
+ void polyvec_compress(uint8_t r[KYBER_POLYVECCOMPRESSEDBYTES], int16_t a[KYBER_K][KYBER_N]) {
+     unsigned int i, j, k;
++    uint64_t d0;
+ 
+     #if (KYBER_POLYVECCOMPRESSEDBYTES == (KYBER_K * 352))
+     uint16_t t[8];
+@@ -29,7 +30,13 @@ void polyvec_compress(uint8_t r[KYBER_POLYVECCOMPRESSEDBYTES], int16_t a[KYBER_K
+             for (k = 0; k < 8; k++) {
+                 t[k]  = a[i][8 * j + k];
+                 t[k] += ((int16_t)t[k] >> 15) & KYBER_Q;
+-                t[k]  = ((((uint32_t)t[k] << 11) + KYBER_Q / 2) / KYBER_Q) & 0x7ff;
++                // t[k]  = ((((uint32_t)t[k] << 11) + KYBER_Q / 2) / KYBER_Q) & 0x7ff;
++                d0 = t[k];
++                d0 <<= 11;
++                d0 += 1664;
++                d0 *= 645084;
++                d0 >>= 31;
++                t[k] = d0 & 0x7ff;
+             }
+ 
+             r[ 0] = (t[0] >>  0);
+@@ -53,7 +60,13 @@ void polyvec_compress(uint8_t r[KYBER_POLYVECCOMPRESSEDBYTES], int16_t a[KYBER_K
+             for (k = 0; k < 4; k++) {
+                 t[k]  = a[i][4 * j + k];
+                 t[k] += ((int16_t)t[k] >> 15) & KYBER_Q;
+-                t[k]  = ((((uint32_t)t[k] << 10) + KYBER_Q / 2) / KYBER_Q) & 0x3ff;
++                // t[k]  = ((((uint32_t)t[k] << 10) + KYBER_Q / 2) / KYBER_Q) & 0x3ff;
++                d0 = t[k];
++                d0 <<= 10;
++                d0 += 1665;
++                d0 *= 1290167;
++                d0 >>= 32;
++                t[k] = d0 & 0x3ff;
+             }
+ 
+             r[0] = (t[0] >> 0);
+diff --git a/crypto_kem/kyber768/aarch64/poly.c b/crypto_kem/kyber768/aarch64/poly.c
+index dffc655..361ce89 100644
+--- a/crypto_kem/kyber768/aarch64/poly.c
+++ b/crypto_kem/kyber768/aarch64/poly.c
+@@ -51,6 +51,7 @@
+ void poly_compress(uint8_t r[KYBER_POLYCOMPRESSEDBYTES], const int16_t a[KYBER_N]) {
+     unsigned int i, j;
+     int16_t u;
++    uint32_t d0;
+     uint8_t t[8];
+ 
+     for (i = 0; i < KYBER_N / 8; i++) {
+@@ -58,7 +59,12 @@ void poly_compress(uint8_t r[KYBER_POLYCOMPRESSEDBYTES], const int16_t a[KYBER_N
+             // map to positive standard representatives
+             u  = a[8 * i + j];
+             u += (u >> 15) & KYBER_Q;
+-            t[j] = ((((uint16_t)u << 4) + KYBER_Q / 2) / KYBER_Q) & 15;
++            // t[j] = ((((uint16_t)u << 4) + KYBER_Q / 2) / KYBER_Q) & 15;
++            d0 = u << 4;
++            d0 += 1665;
++            d0 *= 80635;
++            d0 >>= 28;
++            t[j] = d0 & 0xf;
+         }
+ 
+         r[0] = t[0] | (t[1] << 4);
+@@ -194,14 +200,19 @@ void poly_frommsg(int16_t r[KYBER_N], const uint8_t msg[KYBER_INDCPA_MSGBYTES])
+ **************************************************/
+ void poly_tomsg(uint8_t msg[KYBER_INDCPA_MSGBYTES], const int16_t a[KYBER_N]) {
+     unsigned int i, j;
+-    uint16_t t;
++    uint32_t t;
+ 
+     for (i = 0; i < KYBER_N / 8; i++) {
+         msg[i] = 0;
+         for (j = 0; j < 8; j++) {
+             t  = a[8 * i + j];
+-            t += ((int16_t)t >> 15) & KYBER_Q;
+-            t  = (((t << 1) + KYBER_Q / 2) / KYBER_Q) & 1;
++            // t += ((int16_t)t >> 15) & KYBER_Q;
++            // t  = (((t << 1) + KYBER_Q/2)/KYBER_Q) & 1;
++            t <<= 1;
++            t += 1665;
++            t *= 80635;
++            t >>= 28;
++            t &= 1;
+             msg[i] |= t << j;
+         }
+     }
+diff --git a/crypto_kem/kyber768/aarch64/polyvec.c b/crypto_kem/kyber768/aarch64/polyvec.c
+index d400348..f9a1ebf 100644
+--- a/crypto_kem/kyber768/aarch64/polyvec.c
+++ b/crypto_kem/kyber768/aarch64/polyvec.c
+@@ -21,6 +21,7 @@
+ **************************************************/
+ void polyvec_compress(uint8_t r[KYBER_POLYVECCOMPRESSEDBYTES], int16_t a[KYBER_K][KYBER_N]) {
+     unsigned int i, j, k;
++    uint64_t d0;
+ 
+     #if (KYBER_POLYVECCOMPRESSEDBYTES == (KYBER_K * 352))
+     uint16_t t[8];
+@@ -29,7 +30,13 @@ void polyvec_compress(uint8_t r[KYBER_POLYVECCOMPRESSEDBYTES], int16_t a[KYBER_K
+             for (k = 0; k < 8; k++) {
+                 t[k]  = a[i][8 * j + k];
+                 t[k] += ((int16_t)t[k] >> 15) & KYBER_Q;
+-                t[k]  = ((((uint32_t)t[k] << 11) + KYBER_Q / 2) / KYBER_Q) & 0x7ff;
++                // t[k]  = ((((uint32_t)t[k] << 11) + KYBER_Q / 2) / KYBER_Q) & 0x7ff;
++                d0 = t[k];
++                d0 <<= 11;
++                d0 += 1664;
++                d0 *= 645084;
++                d0 >>= 31;
++                t[k] = d0 & 0x7ff;
+             }
+ 
+             r[ 0] = (t[0] >>  0);
+@@ -53,7 +60,13 @@ void polyvec_compress(uint8_t r[KYBER_POLYVECCOMPRESSEDBYTES], int16_t a[KYBER_K
+             for (k = 0; k < 4; k++) {
+                 t[k]  = a[i][4 * j + k];
+                 t[k] += ((int16_t)t[k] >> 15) & KYBER_Q;
+-                t[k]  = ((((uint32_t)t[k] << 10) + KYBER_Q / 2) / KYBER_Q) & 0x3ff;
++                // t[k]  = ((((uint32_t)t[k] << 10) + KYBER_Q / 2) / KYBER_Q) & 0x3ff;
++                d0 = t[k];
++                d0 <<= 10;
++                d0 += 1665;
++                d0 *= 1290167;
++                d0 >>= 32;
++                t[k] = d0 & 0x3ff;
+             }
+ 
+             r[0] = (t[0] >> 0);
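The multipliers in this patch appear to be fixed-point reciprocals of `KYBER_Q` = 3329. Each one is the nearest integer to 2^k / 3329, where k is the right shift that follows it. The sketch below is an illustration only and is not part of liboqs or the upstream patch. The macro name `RECIP_Q` is hypothetical. The sketch recomputes those reciprocals at compile time using C11 `static_assert`, which matches the project's `CMAKE_C_STANDARD 11`, and checks them against the constants in the diff.

```c
#include <assert.h>
#include <stdint.h>

#define KYBER_Q 3329u

/* Hypothetical macro: nearest integer to 2^k / KYBER_Q, computed as
 * floor((2^k + KYBER_Q / 2) / KYBER_Q) in 64-bit arithmetic. */
#define RECIP_Q(k) (((UINT64_C(1) << (k)) + KYBER_Q / 2) / KYBER_Q)

/* Each multiplier from the patch, paired with the shift that follows it. */
static_assert(RECIP_Q(28) == 80635, "poly_tomsg and 4-bit poly_compress");
static_assert(RECIP_Q(27) == 40318, "5-bit poly_compress");
static_assert(RECIP_Q(31) == 645084, "11-bit polyvec_compress");
static_assert(RECIP_Q(32) == 1290167, "10-bit polyvec_compress");
```

With a constant divisor, the rounding step reduces to a multiply and a shift. This avoids the operand-dependent latency that integer division has on many CPUs, which is the variable-timing issue the patch name refers to.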
--- a/src/kem/kyber/pqclean_kyber1024_aarch64/poly.c
+++ b/src/kem/kyber/pqclean_kyber1024_aarch64/poly.c
@ -51,6 +51,7 @@
 void poly_compress(uint8_t r[KYBER_POLYCOMPRESSEDBYTES], const int16_t a[KYBER_N]) {
    unsigned int i, j;
    int16_t u;
+    uint32_t d0;
    uint8_t t[8];

    for (i = 0; i < KYBER_N / 8; i++) {
@ -58,7 +59,12 @@ void poly_compress(uint8_t r[KYBER_POLYCOMPRESSEDBYTES], const int16_t a[KYBER_N
            // map to positive standard representatives
            u  = a[8 * i + j];
            u += (u >> 15) & KYBER_Q;
-            t[j] = ((((uint32_t)u << 5) + KYBER_Q / 2) / KYBER_Q) & 31;
+            // t[j] = ((((uint32_t)u << 5) + KYBER_Q / 2) / KYBER_Q) & 31;
+            d0 = u << 5;
+            d0 += 1664;
+            d0 *= 40318;
+            d0 >>= 27;
+            t[j] = d0 & 0x1f;
        }

        r[0] = (t[0] >> 0) | (t[1] << 5);
@ -207,14 +213,19 @@ void poly_frommsg(int16_t r[KYBER_N], const uint8_t msg[KYBER_INDCPA_MSGBYTES])
 **************************************************/
 void poly_tomsg(uint8_t msg[KYBER_INDCPA_MSGBYTES], const int16_t a[KYBER_N]) {
    unsigned int i, j;
-    uint16_t t;
+    uint32_t t;

    for (i = 0; i < KYBER_N / 8; i++) {
        msg[i] = 0;
        for (j = 0; j < 8; j++) {
            t  = a[8 * i + j];
-            t += ((int16_t)t >> 15) & KYBER_Q;
-            t  = (((t << 1) + KYBER_Q / 2) / KYBER_Q) & 1;
+            // t += ((int16_t)t >> 15) & KYBER_Q;
+            // t  = (((t << 1) + KYBER_Q/2)/KYBER_Q) & 1;
+            t <<= 1;
+            t += 1665;
+            t *= 80635;
+            t >>= 28;
+            t &= 1;
            msg[i] |= t << j;
        }
    }
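Here is a minimal sketch for checking the `kyber1024` 5-bit `poly_compress` and `poly_tomsg` replacements against the division-based lines they comment out. It assumes inputs already lie in [0, KYBER_Q), which is what the conditional `+ KYBER_Q` step produces in `poly_compress`. The patched `poly_tomsg` also drops that step, and this sketch does not examine negative inputs. All function names are hypothetical helpers, not liboqs APIs.

```c
#include <assert.h>
#include <stdint.h>

#define KYBER_Q 3329u

/* Division-based forms that the patch comments out. */
static uint32_t compress5_div(uint32_t u) {
  return (((u << 5) + KYBER_Q / 2) / KYBER_Q) & 31;
}
static uint32_t tomsg_div(uint32_t t) {
  return (((t << 1) + KYBER_Q / 2) / KYBER_Q) & 1;
}

/* Multiply-and-shift forms, step for step as in the patch. */
static uint32_t compress5_mul(uint32_t u) {
  uint32_t d0 = u << 5;
  d0 += 1664;
  d0 *= 40318;  /* wraps mod 2^32 for large u; 2^32 = 32 * 2^27, so the
                   wrap only changes bits above the 5-bit mask */
  d0 >>= 27;
  return d0 & 0x1f;
}
static uint32_t tomsg_mul(uint32_t t) {
  t <<= 1;
  t += 1665;
  t *= 80635;
  t >>= 28;
  return t & 1;
}

/* Exhaustive comparison over every representative in [0, KYBER_Q). */
static void check_poly_rounding(void) {
  for (uint32_t x = 0; x < KYBER_Q; x++) {
    assert(compress5_div(x) == compress5_mul(x));
    assert(tomsg_div(x) == tomsg_mul(x));
  }
}
```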
--- a/src/kem/kyber/pqclean_kyber1024_aarch64/polyvec.c
+++ b/src/kem/kyber/pqclean_kyber1024_aarch64/polyvec.c
@ -21,6 +21,7 @@
 **************************************************/
 void polyvec_compress(uint8_t r[KYBER_POLYVECCOMPRESSEDBYTES], int16_t a[KYBER_K][KYBER_N]) {
    unsigned int i, j, k;
+    uint64_t d0;

    #if (KYBER_POLYVECCOMPRESSEDBYTES == (KYBER_K * 352))
    uint16_t t[8];
@ -29,7 +30,13 @@ void polyvec_compress(uint8_t r[KYBER_POLYVECCOMPRESSEDBYTES], int16_t a[KYBER_K
            for (k = 0; k < 8; k++) {
                t[k]  = a[i][8 * j + k];
                t[k] += ((int16_t)t[k] >> 15) & KYBER_Q;
-                t[k]  = ((((uint32_t)t[k] << 11) + KYBER_Q / 2) / KYBER_Q) & 0x7ff;
+                // t[k]  = ((((uint32_t)t[k] << 11) + KYBER_Q / 2) / KYBER_Q) & 0x7ff;
+                d0 = t[k];
+                d0 <<= 11;
+                d0 += 1664;
+                d0 *= 645084;
+                d0 >>= 31;
+                t[k] = d0 & 0x7ff;
            }

            r[ 0] = (t[0] >>  0);
@ -53,7 +60,13 @@ void polyvec_compress(uint8_t r[KYBER_POLYVECCOMPRESSEDBYTES], int16_t a[KYBER_K
            for (k = 0; k < 4; k++) {
                t[k]  = a[i][4 * j + k];
                t[k] += ((int16_t)t[k] >> 15) & KYBER_Q;
-                t[k]  = ((((uint32_t)t[k] << 10) + KYBER_Q / 2) / KYBER_Q) & 0x3ff;
+                // t[k]  = ((((uint32_t)t[k] << 10) + KYBER_Q / 2) / KYBER_Q) & 0x3ff;
+                d0 = t[k];
+                d0 <<= 10;
+                d0 += 1665;
+                d0 *= 1290167;
+                d0 >>= 32;
+                t[k] = d0 & 0x3ff;
            }

            r[0] = (t[0] >> 0);
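The `polyvec_compress` replacements declare `d0` as `uint64_t`. In the 11-bit branch, the largest intermediate is ((3328 << 11) + 1664) * 645084, about 4.4 x 10^12, which is far beyond `UINT32_MAX`. In the 10-bit branch, a shift by 32 is not even defined for a 32-bit operand. A 32-bit `d0` would therefore not work for either branch.

The sketch below records that bound at compile time and compares both branches exhaustively against the commented-out division. It assumes inputs already in [0, KYBER_Q), as the conditional `+ KYBER_Q` step above provides. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define KYBER_Q 3329u

/* The 11-bit intermediate does not fit in 32 bits, hence uint64_t d0. */
static_assert((((uint64_t)(KYBER_Q - 1) << 11) + 1664) * 645084 > UINT32_MAX,
              "11-bit intermediate needs 64 bits");

/* Division-based forms that the patch comments out. */
static uint32_t compress11_div(uint32_t t) {
  return (((t << 11) + KYBER_Q / 2) / KYBER_Q) & 0x7ff;
}
static uint32_t compress10_div(uint32_t t) {
  return (((t << 10) + KYBER_Q / 2) / KYBER_Q) & 0x3ff;
}

/* Patched forms, step for step. */
static uint32_t compress11_mul(uint16_t t) {
  uint64_t d0 = t;
  d0 <<= 11;
  d0 += 1664;
  d0 *= 645084;
  d0 >>= 31;
  return (uint32_t)(d0 & 0x7ff);
}
static uint32_t compress10_mul(uint16_t t) {
  uint64_t d0 = t;
  d0 <<= 10;
  d0 += 1665;
  d0 *= 1290167;
  d0 >>= 32;
  return (uint32_t)(d0 & 0x3ff);
}

/* Exhaustive comparison over every representative in [0, KYBER_Q). */
static void check_polyvec_rounding(void) {
  for (uint32_t x = 0; x < KYBER_Q; x++) {
    assert(compress11_div(x) == compress11_mul((uint16_t)x));
    assert(compress10_div(x) == compress10_mul((uint16_t)x));
  }
}
```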
--- a/src/kem/kyber/pqclean_kyber512_aarch64/poly.c
+++ b/src/kem/kyber/pqclean_kyber512_aarch64/poly.c
@ -51,6 +51,7 @@
 void poly_compress(uint8_t r[KYBER_POLYCOMPRESSEDBYTES], const int16_t a[KYBER_N]) {
    unsigned int i, j;
    int16_t u;
+    uint32_t d0;
    uint8_t t[8];

    for (i = 0; i < KYBER_N / 8; i++) {
@ -58,7 +59,12 @@ void poly_compress(uint8_t r[KYBER_POLYCOMPRESSEDBYTES], const int16_t a[KYBER_N
            // map to positive standard representatives
            u  = a[8 * i + j];
            u += (u >> 15) & KYBER_Q;
-            t[j] = ((((uint16_t)u << 4) + KYBER_Q / 2) / KYBER_Q) & 15;
+            // t[j] = ((((uint16_t)u << 4) + KYBER_Q / 2) / KYBER_Q) & 15;
+            d0 = u << 4;
+            d0 += 1665;
+            d0 *= 80635;
+            d0 >>= 28;
+            t[j] = d0 & 0xf;
        }

        r[0] = t[0] | (t[1] << 4);
@ -194,14 +200,19 @@ void poly_frommsg(int16_t r[KYBER_N], const uint8_t msg[KYBER_INDCPA_MSGBYTES])
 **************************************************/
 void poly_tomsg(uint8_t msg[KYBER_INDCPA_MSGBYTES], const int16_t a[KYBER_N]) {
    unsigned int i, j;
-    uint16_t t;
+    uint32_t t;

    for (i = 0; i < KYBER_N / 8; i++) {
        msg[i] = 0;
        for (j = 0; j < 8; j++) {
            t  = a[8 * i + j];
-            t += ((int16_t)t >> 15) & KYBER_Q;
-            t  = (((t << 1) + KYBER_Q / 2) / KYBER_Q) & 1;
+            // t += ((int16_t)t >> 15) & KYBER_Q;
+            // t  = (((t << 1) + KYBER_Q/2)/KYBER_Q) & 1;
+            t <<= 1;
+            t += 1665;
+            t *= 80635;
+            t >>= 28;
+            t &= 1;
            msg[i] |= t << j;
        }
    }
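The `kyber512` and `kyber768` `poly_compress` use the 4-bit variant, and it keeps a 32-bit `d0`. For the largest inputs (u >= 3225), the product `d0 * 80635` exceeds 2^32 and wraps. Because 2^32 = 16 * 2^28, the wrap lowers the shifted value by exactly 16, and the `& 0xf` mask discards that bit anyway.

Here is a minimal sketch that checks this exhaustively. It assumes inputs in [0, KYBER_Q) and uses hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

#define KYBER_Q 3329u

/* Division-based 4-bit form that the patch comments out. */
static uint32_t compress4_div(uint32_t u) {
  return (((u << 4) + KYBER_Q / 2) / KYBER_Q) & 15;
}

/* Patched form; unsigned wraparound is well defined in C. */
static uint32_t compress4_mul(uint32_t u) {
  uint32_t d0 = u << 4;
  d0 += 1665;
  d0 *= 80635;  /* wraps for u >= 3225 */
  d0 >>= 28;
  return d0 & 0xf;
}

/* Exhaustive comparison over every representative in [0, KYBER_Q). */
static void check_compress4(void) {
  for (uint32_t x = 0; x < KYBER_Q; x++)
    assert(compress4_div(x) == compress4_mul(x));
}
```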
--- a/src/kem/kyber/pqclean_kyber512_aarch64/polyvec.c
+++ b/src/kem/kyber/pqclean_kyber512_aarch64/polyvec.c
@ -21,6 +21,7 @@
 **************************************************/
 void polyvec_compress(uint8_t r[KYBER_POLYVECCOMPRESSEDBYTES], int16_t a[KYBER_K][KYBER_N]) {
    unsigned int i, j, k;
+    uint64_t d0;

    #if (KYBER_POLYVECCOMPRESSEDBYTES == (KYBER_K * 352))
    uint16_t t[8];
@ -29,7 +30,13 @@ void polyvec_compress(uint8_t r[KYBER_POLYVECCOMPRESSEDBYTES], int16_t a[KYBER_K
            for (k = 0; k < 8; k++) {
                t[k]  = a[i][8 * j + k];
                t[k] += ((int16_t)t[k] >> 15) & KYBER_Q;
-                t[k]  = ((((uint32_t)t[k] << 11) + KYBER_Q / 2) / KYBER_Q) & 0x7ff;
+                // t[k]  = ((((uint32_t)t[k] << 11) + KYBER_Q / 2) / KYBER_Q) & 0x7ff;
+                d0 = t[k];
+                d0 <<= 11;
+                d0 += 1664;
+                d0 *= 645084;
+                d0 >>= 31;
+                t[k] = d0 & 0x7ff;
            }

            r[ 0] = (t[0] >>  0);
@ -53,7 +60,13 @@ void polyvec_compress(uint8_t r[KYBER_POLYVECCOMPRESSEDBYTES], int16_t a[KYBER_K
            for (k = 0; k < 4; k++) {
                t[k]  = a[i][4 * j + k];
                t[k] += ((int16_t)t[k] >> 15) & KYBER_Q;
-                t[k]  = ((((uint32_t)t[k] << 10) + KYBER_Q / 2) / KYBER_Q) & 0x3ff;
+                // t[k]  = ((((uint32_t)t[k] << 10) + KYBER_Q / 2) / KYBER_Q) & 0x3ff;
+                d0 = t[k];
+                d0 <<= 10;
+                d0 += 1665;
+                d0 *= 1290167;
+                d0 >>= 32;
+                t[k] = d0 & 0x3ff;
            }

            r[0] = (t[0] >> 0);
--- a/src/kem/kyber/pqclean_kyber768_aarch64/poly.c
+++ b/src/kem/kyber/pqclean_kyber768_aarch64/poly.c
@ -51,6 +51,7 @@
 void poly_compress(uint8_t r[KYBER_POLYCOMPRESSEDBYTES], const int16_t a[KYBER_N]) {
    unsigned int i, j;
    int16_t u;
+    uint32_t d0;
    uint8_t t[8];

    for (i = 0; i < KYBER_N / 8; i++) {
@ -58,7 +59,12 @@ void poly_compress(uint8_t r[KYBER_POLYCOMPRESSEDBYTES], const int16_t a[KYBER_N
            // map to positive standard representatives
            u  = a[8 * i + j];
            u += (u >> 15) & KYBER_Q;
-            t[j] = ((((uint16_t)u << 4) + KYBER_Q / 2) / KYBER_Q) & 15;
+            // t[j] = ((((uint16_t)u << 4) + KYBER_Q / 2) / KYBER_Q) & 15;
+            d0 = u << 4;
+            d0 += 1665;
+            d0 *= 80635;
+            d0 >>= 28;
+            t[j] = d0 & 0xf;
        }

        r[0] = t[0] | (t[1] << 4);
@ -194,14 +200,19 @@ void poly_frommsg(int16_t r[KYBER_N], const uint8_t msg[KYBER_INDCPA_MSGBYTES])
 **************************************************/
 void poly_tomsg(uint8_t msg[KYBER_INDCPA_MSGBYTES], const int16_t a[KYBER_N]) {
    unsigned int i, j;
-    uint16_t t;
+    uint32_t t;

    for (i = 0; i < KYBER_N / 8; i++) {
        msg[i] = 0;
        for (j = 0; j < 8; j++) {
            t  = a[8 * i + j];
-            t += ((int16_t)t >> 15) & KYBER_Q;
-            t  = (((t << 1) + KYBER_Q / 2) / KYBER_Q) & 1;
+            // t += ((int16_t)t >> 15) & KYBER_Q;
+            // t  = (((t << 1) + KYBER_Q/2)/KYBER_Q) & 1;
+            t <<= 1;
+            t += 1665;
+            t *= 80635;
+            t >>= 28;
+            t &= 1;
            msg[i] |= t << j;
        }
    }
--- a/src/kem/kyber/pqclean_kyber768_aarch64/polyvec.c
+++ b/src/kem/kyber/pqclean_kyber768_aarch64/polyvec.c
@ -21,6 +21,7 @@
 **************************************************/
 void polyvec_compress(uint8_t r[KYBER_POLYVECCOMPRESSEDBYTES], int16_t a[KYBER_K][KYBER_N]) {
    unsigned int i, j, k;
+    uint64_t d0;

    #if (KYBER_POLYVECCOMPRESSEDBYTES == (KYBER_K * 352))
    uint16_t t[8];
@ -29,7 +30,13 @@ void polyvec_compress(uint8_t r[KYBER_POLYVECCOMPRESSEDBYTES], int16_t a[KYBER_K
            for (k = 0; k < 8; k++) {
                t[k]  = a[i][8 * j + k];
                t[k] += ((int16_t)t[k] >> 15) & KYBER_Q;
-                t[k]  = ((((uint32_t)t[k] << 11) + KYBER_Q / 2) / KYBER_Q) & 0x7ff;
+                // t[k]  = ((((uint32_t)t[k] << 11) + KYBER_Q / 2) / KYBER_Q) & 0x7ff;
+                d0 = t[k];
+                d0 <<= 11;
+                d0 += 1664;
+                d0 *= 645084;
+                d0 >>= 31;
+                t[k] = d0 & 0x7ff;
            }

            r[ 0] = (t[0] >>  0);
@ -53,7 +60,13 @@ void polyvec_compress(uint8_t r[KYBER_POLYVECCOMPRESSEDBYTES], int16_t a[KYBER_K
            for (k = 0; k < 4; k++) {
                t[k]  = a[i][4 * j + k];
                t[k] += ((int16_t)t[k] >> 15) & KYBER_Q;
-                t[k]  = ((((uint32_t)t[k] << 10) + KYBER_Q / 2) / KYBER_Q) & 0x3ff;
+                // t[k]  = ((((uint32_t)t[k] << 10) + KYBER_Q / 2) / KYBER_Q) & 0x3ff;
+                d0 = t[k];
+                d0 <<= 10;
+                d0 += 1665;
+                d0 *= 1290167;
+                d0 >>= 32;
+                t[k] = d0 & 0x3ff;
            }

            r[0] = (t[0] >> 0);
--- a/src/kem/kyber/pqcrystals-kyber_kyber1024_avx2/verify.c
+++ b/src/kem/kyber/pqcrystals-kyber_kyber1024_avx2/verify.c
@ -57,6 +57,16 @@ void cmov(uint8_t * restrict r, const uint8_t *x, size_t len, uint8_t b)
  size_t i;
  __m256i xvec, rvec, bvec;

+#if defined(__GNUC__) || defined(__clang__)
+  // Prevent the compiler from
+  //    1) inferring that b is 0/1-valued, and
+  //    2) handling the two cases with a branch.
+  // This is not necessary when verify.c and kem.c are separate translation
+  // units, but we expect that downstream consumers will copy this code and/or
+  // change how it is built.
+  __asm__("" : "+r"(b) : /* no inputs */);
+#endif
+
  bvec = _mm256_set1_epi64x(-(uint64_t)b);
  for(i=0;i<len/32;i++) {
    rvec = _mm256_loadu_si256((__m256i *)&r[32*i]);
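In the AVX2 `cmov`, `-(uint64_t)b` turns b = 1 into an all-ones 64-bit word and b = 0 into zero. `_mm256_set1_epi64x` then broadcasts that word to all four 64-bit lanes.

The same masking idea works on ordinary 64-bit words. The sketch below is a hypothetical portable analogue, not code from liboqs. It keeps the patch's empty-`asm` barrier, blends 8 bytes per step, and finishes any remainder byte by byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* memcpy below moves exactly 8 bytes per 64-bit word. */
static_assert(sizeof(uint64_t) == 8, "expects 8-bit bytes");

/* Hypothetical portable analogue of the AVX2 cmov: r = b ? x : r. */
static void cmov_u64_sketch(uint8_t *r, const uint8_t *x, size_t len, uint8_t b) {
  size_t i;
  uint64_t mask, rw, xw;

#if defined(__GNUC__) || defined(__clang__)
  __asm__("" : "+r"(b) : /* no inputs */);  /* same barrier as the patch */
#endif
  mask = -(uint64_t)b;  /* b = 1 -> all ones, b = 0 -> all zeros */

  for (i = 0; i < len / 8; i++) {
    memcpy(&rw, r + 8 * i, 8);  /* memcpy avoids unaligned-access UB */
    memcpy(&xw, x + 8 * i, 8);
    rw ^= mask & (rw ^ xw);
    memcpy(r + 8 * i, &rw, 8);
  }
  for (i *= 8; i < len; i++)    /* leftover bytes, one at a time */
    r[i] ^= (uint8_t)mask & (r[i] ^ x[i]);
}
```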
--- a/src/kem/kyber/pqcrystals-kyber_kyber1024_ref/poly.c
+++ b/src/kem/kyber/pqcrystals-kyber_kyber1024_ref/poly.c
@ -19,6 +19,7 @@ void poly_compress(uint8_t r[KYBER_POLYCOMPRESSEDBYTES], const poly *a)
 {
  unsigned int i,j;
  int16_t u;
+  uint32_t d0;
  uint8_t t[8];

 #if (KYBER_POLYCOMPRESSEDBYTES == 128)
@ -27,7 +28,12 @@ void poly_compress(uint8_t r[KYBER_POLYCOMPRESSEDBYTES], const poly *a)
      // map to positive standard representatives
      u  = a->coeffs[8*i+j];
      u += (u >> 15) & KYBER_Q;
-      t[j] = ((((uint16_t)u << 4) + KYBER_Q/2)/KYBER_Q) & 15;
+/*    t[j] = ((((uint16_t)u << 4) + KYBER_Q/2)/KYBER_Q) & 15; */
+      d0 = u << 4;
+      d0 += 1665;
+      d0 *= 80635;
+      d0 >>= 28;
+      t[j] = d0 & 0xf;
    }

    r[0] = t[0] | (t[1] << 4);
@ -42,7 +48,12 @@ void poly_compress(uint8_t r[KYBER_POLYCOMPRESSEDBYTES], const poly *a)
      // map to positive standard representatives
      u  = a->coeffs[8*i+j];
      u += (u >> 15) & KYBER_Q;
-      t[j] = ((((uint32_t)u << 5) + KYBER_Q/2)/KYBER_Q) & 31;
+/*      t[j] = ((((uint32_t)u << 5) + KYBER_Q/2)/KYBER_Q) & 31; */
+      d0 = u << 5;
+      d0 += 1664;
+      d0 *= 40318;
+      d0 >>= 27;
+      t[j] = d0 & 0x1f;
    }

    r[0] = (t[0] >> 0) | (t[1] << 5);
@ -180,14 +191,19 @@ void poly_frommsg(poly *r, const uint8_t msg[KYBER_INDCPA_MSGBYTES])
 void poly_tomsg(uint8_t msg[KYBER_INDCPA_MSGBYTES], const poly *a)
 {
  unsigned int i,j;
-  uint16_t t;
+  uint32_t t;

  for(i=0;i<KYBER_N/8;i++) {
    msg[i] = 0;
    for(j=0;j<8;j++) {
      t  = a->coeffs[8*i+j];
-      t += ((int16_t)t >> 15) & KYBER_Q;
-      t  = (((t << 1) + KYBER_Q/2)/KYBER_Q) & 1;
+      // t += ((int16_t)t >> 15) & KYBER_Q;
+      // t  = (((t << 1) + KYBER_Q/2)/KYBER_Q) & 1;
+      t <<= 1;
+      t += 1665;
+      t *= 80635;
+      t >>= 28;
+      t &= 1;
      msg[i] |= t << j;
    }
  }
--- a/src/kem/kyber/pqcrystals-kyber_kyber1024_ref/polyvec.c
+++ b/src/kem/kyber/pqcrystals-kyber_kyber1024_ref/polyvec.c
@ -15,6 +15,7 @@
 void polyvec_compress(uint8_t r[KYBER_POLYVECCOMPRESSEDBYTES], const polyvec *a)
 {
  unsigned int i,j,k;
+  uint64_t d0;

 #if (KYBER_POLYVECCOMPRESSEDBYTES == (KYBER_K * 352))
  uint16_t t[8];
@ -23,7 +24,13 @@ void polyvec_compress(uint8_t r[KYBER_POLYVECCOMPRESSEDBYTES], const polyvec *a)
      for(k=0;k<8;k++) {
        t[k]  = a->vec[i].coeffs[8*j+k];
        t[k] += ((int16_t)t[k] >> 15) & KYBER_Q;
-        t[k]  = ((((uint32_t)t[k] << 11) + KYBER_Q/2)/KYBER_Q) & 0x7ff;
+/*      t[k]  = ((((uint32_t)t[k] << 11) + KYBER_Q/2)/KYBER_Q) & 0x7ff; */
+        d0 = t[k];
+        d0 <<= 11;
+        d0 += 1664;
+        d0 *= 645084;
+        d0 >>= 31;
+        t[k] = d0 & 0x7ff;
      }

      r[ 0] = (t[0] >>  0);
@ -47,7 +54,13 @@ void polyvec_compress(uint8_t r[KYBER_POLYVECCOMPRESSEDBYTES], const polyvec *a)
      for(k=0;k<4;k++) {
        t[k]  = a->vec[i].coeffs[4*j+k];
        t[k] += ((int16_t)t[k] >> 15) & KYBER_Q;
-        t[k]  = ((((uint32_t)t[k] << 10) + KYBER_Q/2)/ KYBER_Q) & 0x3ff;
+/*      t[k]  = ((((uint32_t)t[k] << 10) + KYBER_Q/2)/ KYBER_Q) & 0x3ff; */
+        d0 = t[k];
+        d0 <<= 10;
+        d0 += 1665;
+        d0 *= 1290167;
+        d0 >>= 32;
+        t[k] = d0 & 0x3ff;
      }

      r[0] = (t[0] >> 0);
--- a/src/kem/kyber/pqcrystals-kyber_kyber1024_ref/verify.c
+++ b/src/kem/kyber/pqcrystals-kyber_kyber1024_ref/verify.c
@ -41,6 +41,16 @@ void cmov(uint8_t *r, const uint8_t *x, size_t len, uint8_t b)
 {
  size_t i;

+#if defined(__GNUC__) || defined(__clang__)
+  // Prevent the compiler from
+  //    1) inferring that b is 0/1-valued, and
+  //    2) handling the two cases with a branch.
+  // This is not necessary when verify.c and kem.c are separate translation
+  // units, but we expect that downstream consumers will copy this code and/or
+  // change how it is built.
+  __asm__("" : "+r"(b) : /* no inputs */);
+#endif
+
  b = -b;
  for(i=0;i<len;i++)
    r[i] ^= b & (r[i] ^ x[i]);
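The comment added to `cmov` concerns the case where the caller and `cmov` end up in one translation unit. This can happen, for example, when the code is copied into a single file or built with link-time optimization. The compiler can then see that `b` comes from `verify()` and is 0 or 1, and it may turn the masked copy back into a branch. The empty `asm` with a `"+r"` operand hides the value of `b` from the optimizer.

Here is a self-contained sketch of that situation. It assumes an FO-style tail that keeps `ss` on success and substitutes `z` on failure. `ct_verify`, `ct_cmov` and `select_shared_secret` are hypothetical names, and the selection order is illustrative rather than a copy of Kyber's `kem.c`.

```c
#include <stddef.h>
#include <stdint.h>

/* 0 if a and b are equal, 1 otherwise, with no data-dependent branch. */
static int ct_verify(const uint8_t *a, const uint8_t *b, size_t len) {
  uint8_t r = 0;
  for (size_t i = 0; i < len; i++)
    r |= a[i] ^ b[i];
  return (int)((-(uint64_t)r) >> 63);  /* top bit set iff r != 0 */
}

/* Copies x into r when b == 1; leaves r unchanged when b == 0. */
static void ct_cmov(uint8_t *r, const uint8_t *x, size_t len, uint8_t b) {
#if defined(__GNUC__) || defined(__clang__)
  /* Value barrier from the patch: after this, the compiler cannot
   * assume b is 0/1, so it has no basis for emitting a branch. */
  __asm__("" : "+r"(b) : /* no inputs */);
#endif
  b = -b;  /* 1 -> 0xFF, 0 -> 0x00 */
  for (size_t i = 0; i < len; i++)
    r[i] ^= b & (r[i] ^ x[i]);
}

/* Hypothetical decapsulation tail: keep ss if the re-encrypted
 * ciphertext matches, otherwise overwrite ss with the rejection key z. */
static void select_shared_secret(uint8_t ss[32], const uint8_t z[32],
                                 const uint8_t *ct, const uint8_t *cmp,
                                 size_t ct_len) {
  int fail = ct_verify(ct, cmp, ct_len);
  ct_cmov(ss, z, 32, (uint8_t)fail);
}
```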
--- a/src/kem/kyber/pqcrystals-kyber_kyber512_avx2/verify.c
+++ b/src/kem/kyber/pqcrystals-kyber_kyber512_avx2/verify.c
@ -57,6 +57,16 @@ void cmov(uint8_t * restrict r, const uint8_t *x, size_t len, uint8_t b)
  size_t i;
  __m256i xvec, rvec, bvec;

+#if defined(__GNUC__) || defined(__clang__)
+  // Prevent the compiler from
+  //    1) inferring that b is 0/1-valued, and
+  //    2) handling the two cases with a branch.
+  // This is not necessary when verify.c and kem.c are separate translation
+  // units, but we expect that downstream consumers will copy this code and/or
+  // change how it is built.
+  __asm__("" : "+r"(b) : /* no inputs */);
+#endif
+
  bvec = _mm256_set1_epi64x(-(uint64_t)b);
  for(i=0;i<len/32;i++) {
    rvec = _mm256_loadu_si256((__m256i *)&r[32*i]);
--- a/src/kem/kyber/pqcrystals-kyber_kyber512_ref/poly.c
+++ b/src/kem/kyber/pqcrystals-kyber_kyber512_ref/poly.c
@ -19,6 +19,7 @@ void poly_compress(uint8_t r[KYBER_POLYCOMPRESSEDBYTES], const poly *a)
 {
  unsigned int i,j;
  int16_t u;
+  uint32_t d0;
  uint8_t t[8];

 #if (KYBER_POLYCOMPRESSEDBYTES == 128)
@ -27,7 +28,12 @@ void poly_compress(uint8_t r[KYBER_POLYCOMPRESSEDBYTES], const poly *a)
      // map to positive standard representatives
      u  = a->coeffs[8*i+j];
      u += (u >> 15) & KYBER_Q;
-      t[j] = ((((uint16_t)u << 4) + KYBER_Q/2)/KYBER_Q) & 15;
+/*    t[j] = ((((uint16_t)u << 4) + KYBER_Q/2)/KYBER_Q) & 15; */
+      d0 = u << 4;
+      d0 += 1665;
+      d0 *= 80635;
+      d0 >>= 28;
+      t[j] = d0 & 0xf;
    }

    r[0] = t[0] | (t[1] << 4);
@ -42,7 +48,12 @@ void poly_compress(uint8_t r[KYBER_POLYCOMPRESSEDBYTES], const poly *a)
      // map to positive standard representatives
      u  = a->coeffs[8*i+j];
      u += (u >> 15) & KYBER_Q;
-      t[j] = ((((uint32_t)u << 5) + KYBER_Q/2)/KYBER_Q) & 31;
+/*      t[j] = ((((uint32_t)u << 5) + KYBER_Q/2)/KYBER_Q) & 31; */
+      d0 = u << 5;
+      d0 += 1664;
+      d0 *= 40318;
+      d0 >>= 27;
+      t[j] = d0 & 0x1f;
    }

    r[0] = (t[0] >> 0) | (t[1] << 5);
@ -180,14 +191,19 @@ void poly_frommsg(poly *r, const uint8_t msg[KYBER_INDCPA_MSGBYTES])
 void poly_tomsg(uint8_t msg[KYBER_INDCPA_MSGBYTES], const poly *a)
 {
  unsigned int i,j;
-  uint16_t t;
+  uint32_t t;

  for(i=0;i<KYBER_N/8;i++) {
    msg[i] = 0;
    for(j=0;j<8;j++) {
      t  = a->coeffs[8*i+j];
-      t += ((int16_t)t >> 15) & KYBER_Q;
-      t  = (((t << 1) + KYBER_Q/2)/KYBER_Q) & 1;
+      // t += ((int16_t)t >> 15) & KYBER_Q;
+      // t  = (((t << 1) + KYBER_Q/2)/KYBER_Q) & 1;
+      t <<= 1;
+      t += 1665;
+      t *= 80635;
+      t >>= 28;
+      t &= 1;
      msg[i] |= t << j;
    }
  }
--- a/src/kem/kyber/pqcrystals-kyber_kyber512_ref/polyvec.c
+++ b/src/kem/kyber/pqcrystals-kyber_kyber512_ref/polyvec.c
@ -15,6 +15,7 @@
 void polyvec_compress(uint8_t r[KYBER_POLYVECCOMPRESSEDBYTES], const polyvec *a)
 {
  unsigned int i,j,k;
+  uint64_t d0;

 #if (KYBER_POLYVECCOMPRESSEDBYTES == (KYBER_K * 352))
  uint16_t t[8];
@ -23,7 +24,13 @@ void polyvec_compress(uint8_t r[KYBER_POLYVECCOMPRESSEDBYTES], const polyvec *a)
      for(k=0;k<8;k++) {
        t[k]  = a->vec[i].coeffs[8*j+k];
        t[k] += ((int16_t)t[k] >> 15) & KYBER_Q;
-        t[k]  = ((((uint32_t)t[k] << 11) + KYBER_Q/2)/KYBER_Q) & 0x7ff;
+/*      t[k]  = ((((uint32_t)t[k] << 11) + KYBER_Q/2)/KYBER_Q) & 0x7ff; */
+        d0 = t[k];
+        d0 <<= 11;
+        d0 += 1664;
+        d0 *= 645084;
+        d0 >>= 31;
+        t[k] = d0 & 0x7ff;
      }

      r[ 0] = (t[0] >>  0);
@ -47,7 +54,13 @@ void polyvec_compress(uint8_t r[KYBER_POLYVECCOMPRESSEDBYTES], const polyvec *a)
      for(k=0;k<4;k++) {
        t[k]  = a->vec[i].coeffs[4*j+k];
        t[k] += ((int16_t)t[k] >> 15) & KYBER_Q;
-        t[k]  = ((((uint32_t)t[k] << 10) + KYBER_Q/2)/ KYBER_Q) & 0x3ff;
+/*      t[k]  = ((((uint32_t)t[k] << 10) + KYBER_Q/2)/ KYBER_Q) & 0x3ff; */
+        d0 = t[k];
+        d0 <<= 10;
+        d0 += 1665;
+        d0 *= 1290167;
+        d0 >>= 32;
+        t[k] = d0 & 0x3ff;
      }

      r[0] = (t[0] >> 0);
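polyvec.c applies the same idea to the 10- and 11-bit compression of the ciphertext vector, but it widens d0 to uint64_t. The input is shifted by 10 or 11 bits and then multiplied by a constant near 2^32/3329 (1290167) or 2^31/3329 (645084). The products reach about 2^42, and the bits that are kept sit at and above bit 31. The sketch below is again a hypothetical harness with invented helper names, assuming KYBER_Q = 3329. It checks every standard representative against the original division.

```c
#include <assert.h>
#include <stdint.h>

#define KYBER_Q 3329 /* Kyber modulus q (assumed, as in params.h) */

/* Original formula: round(u * 2^bits / q) mod 2^bits, for u in [0, q). */
static uint32_t compress_div(uint32_t u, unsigned bits)
{
  assert(u < KYBER_Q);
  return (((u << bits) + KYBER_Q / 2) / KYBER_Q) & ((1u << bits) - 1);
}

/* 10-bit case from the diff: 1290167 ~= 2^32 / q (rounded down).
 * The product needs more than 32 bits, hence the 64-bit accumulator. */
static uint32_t compress10_mul(uint16_t u)
{
  uint64_t d0 = u;
  d0 <<= 10;
  d0 += 1665;
  d0 *= 1290167;
  d0 >>= 32;
  return (uint32_t)(d0 & 0x3ff);
}

/* 11-bit case from the diff: 645084 ~= 2^31 / q (rounded up). */
static uint32_t compress11_mul(uint16_t u)
{
  uint64_t d0 = u;
  d0 <<= 11;
  d0 += 1664;
  d0 *= 645084;
  d0 >>= 31;
  return (uint32_t)(d0 & 0x7ff);
}

/* Exhaustive comparison over [0, q); returns the number of mismatches. */
static long count_polyvec_mismatches(void)
{
  long bad = 0;
  for (uint32_t u = 0; u < KYBER_Q; u++) {
    bad += compress_div(u, 10) != compress10_mul((uint16_t)u);
    bad += compress_div(u, 11) != compress11_mul((uint16_t)u);
  }
  return bad;
}
```

Note that q - 1 = 3328 compresses to 0 in the 10-bit case, because it rounds to 2^10, which wraps. This matches the original `& 0x3ff`.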
--- a/src/kem/kyber/pqcrystals-kyber_kyber512_ref/verify.c
+++ b/src/kem/kyber/pqcrystals-kyber_kyber512_ref/verify.c
@ -41,6 +41,16 @@ void cmov(uint8_t *r, const uint8_t *x, size_t len, uint8_t b)
 {
  size_t i;

+#if defined(__GNUC__) || defined(__clang__)
+  // Prevent the compiler from
+  //    1) inferring that b is 0/1-valued, and
+  //    2) handling the two cases with a branch.
+  // This is not necessary when verify.c and kem.c are separate translation
+  // units, but we expect that downstream consumers will copy this code and/or
+  // change how it is built.
+  __asm__("" : "+r"(b) : /* no inputs */);
+#endif
+
  b = -b;
  for(i=0;i<len;i++)
    r[i] ^= b & (r[i] ^ x[i]);
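The comment in this hunk explains the motivation. If verify.c and kem.c end up in one translation unit, the compiler can see where b comes from, infer that it is 0 or 1, and handle the two cases with a branch. The empty `__asm__` with a `"+r"` operand tells GCC and Clang that b may have been changed in a register, which discards that knowledge.

Below is a stand-alone sketch of the patched function. The name `cmov_sketch` is hypothetical, while the body mirrors the reference hunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical copy of the patched reference cmov: overwrite r with x when
 * b == 1, leave r unchanged when b == 0, without branching on b. */
static void cmov_sketch(uint8_t *r, const uint8_t *x, size_t len, uint8_t b)
{
  size_t i;

#if defined(__GNUC__) || defined(__clang__)
  /* The empty asm claims to read and rewrite b in a register, so the
   * optimizer cannot carry over what it knew about b (for example, that
   * it is the 0/1 result of a comparison in an inlined caller). */
  __asm__("" : "+r"(b) : /* no inputs */);
#endif

  b = -b; /* 1 -> 0xff, 0 -> 0x00: an all-ones or all-zeros byte mask */
  for (i = 0; i < len; i++)
    r[i] ^= b & (r[i] ^ x[i]); /* yields x[i] under 0xff, r[i] under 0x00 */
}
```

With b = 0 the buffer is left alone, and with b = 1 it is overwritten by x. The barrier emits no instructions of its own. It only hides where b came from, so it makes a branch less likely but does not guarantee constant-time code generation.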
--- a/src/kem/kyber/pqcrystals-kyber_kyber768_avx2/verify.c
+++ b/src/kem/kyber/pqcrystals-kyber_kyber768_avx2/verify.c
@ -57,6 +57,16 @@ void cmov(uint8_t * restrict r, const uint8_t *x, size_t len, uint8_t b)
  size_t i;
  __m256i xvec, rvec, bvec;

+#if defined(__GNUC__) || defined(__clang__)
+  // Prevent the compiler from
+  //    1) inferring that b is 0/1-valued, and
+  //    2) handling the two cases with a branch.
+  // This is not necessary when verify.c and kem.c are separate translation
+  // units, but we expect that downstream consumers will copy this code and/or
+  // change how it is built.
+  __asm__("" : "+r"(b) : /* no inputs */);
+#endif
+
  bvec = _mm256_set1_epi64x(-(uint64_t)b);
  for(i=0;i<len/32;i++) {
    rvec = _mm256_loadu_si256((__m256i *)&r[32*i]);
--- a/src/kem/kyber/pqcrystals-kyber_kyber768_ref/poly.c
+++ b/src/kem/kyber/pqcrystals-kyber_kyber768_ref/poly.c
@ -19,6 +19,7 @@ void poly_compress(uint8_t r[KYBER_POLYCOMPRESSEDBYTES], const poly *a)
 {
  unsigned int i,j;
  int16_t u;
+  uint32_t d0;
  uint8_t t[8];

 #if (KYBER_POLYCOMPRESSEDBYTES == 128)
@ -27,7 +28,12 @@ void poly_compress(uint8_t r[KYBER_POLYCOMPRESSEDBYTES], const poly *a)
      // map to positive standard representatives
      u  = a->coeffs[8*i+j];
      u += (u >> 15) & KYBER_Q;
-      t[j] = ((((uint16_t)u << 4) + KYBER_Q/2)/KYBER_Q) & 15;
+/*    t[j] = ((((uint16_t)u << 4) + KYBER_Q/2)/KYBER_Q) & 15; */
+      d0 = u << 4;
+      d0 += 1665;
+      d0 *= 80635;
+      d0 >>= 28;
+      t[j] = d0 & 0xf;
    }

    r[0] = t[0] | (t[1] << 4);
@ -42,7 +48,12 @@ void poly_compress(uint8_t r[KYBER_POLYCOMPRESSEDBYTES], const poly *a)
      // map to positive standard representatives
      u  = a->coeffs[8*i+j];
      u += (u >> 15) & KYBER_Q;
-      t[j] = ((((uint32_t)u << 5) + KYBER_Q/2)/KYBER_Q) & 31;
+/*      t[j] = ((((uint32_t)u << 5) + KYBER_Q/2)/KYBER_Q) & 31; */
+      d0 = u << 5;
+      d0 += 1664;
+      d0 *= 40318;
+      d0 >>= 27;
+      t[j] = d0 & 0x1f;
    }

    r[0] = (t[0] >> 0) | (t[1] << 5);
@ -180,14 +191,19 @@ void poly_frommsg(poly *r, const uint8_t msg[KYBER_INDCPA_MSGBYTES])
 void poly_tomsg(uint8_t msg[KYBER_INDCPA_MSGBYTES], const poly *a)
 {
  unsigned int i,j;
-  uint16_t t;
+  uint32_t t;

  for(i=0;i<KYBER_N/8;i++) {
    msg[i] = 0;
    for(j=0;j<8;j++) {
      t  = a->coeffs[8*i+j];
-      t += ((int16_t)t >> 15) & KYBER_Q;
-      t  = (((t << 1) + KYBER_Q/2)/KYBER_Q) & 1;
+      // t += ((int16_t)t >> 15) & KYBER_Q;
+      // t  = (((t << 1) + KYBER_Q/2)/KYBER_Q) & 1;
+      t <<= 1;
+      t += 1665;
+      t *= 80635;
+      t >>= 28;
+      t &= 1;
      msg[i] |= t << j;
    }
  }
--- a/src/kem/kyber/pqcrystals-kyber_kyber768_ref/polyvec.c
+++ b/src/kem/kyber/pqcrystals-kyber_kyber768_ref/polyvec.c
@ -15,6 +15,7 @@
 void polyvec_compress(uint8_t r[KYBER_POLYVECCOMPRESSEDBYTES], const polyvec *a)
 {
  unsigned int i,j,k;
+  uint64_t d0;

 #if (KYBER_POLYVECCOMPRESSEDBYTES == (KYBER_K * 352))
  uint16_t t[8];
@ -23,7 +24,13 @@ void polyvec_compress(uint8_t r[KYBER_POLYVECCOMPRESSEDBYTES], const polyvec *a)
      for(k=0;k<8;k++) {
        t[k]  = a->vec[i].coeffs[8*j+k];
        t[k] += ((int16_t)t[k] >> 15) & KYBER_Q;
-        t[k]  = ((((uint32_t)t[k] << 11) + KYBER_Q/2)/KYBER_Q) & 0x7ff;
+/*      t[k]  = ((((uint32_t)t[k] << 11) + KYBER_Q/2)/KYBER_Q) & 0x7ff; */
+        d0 = t[k];
+        d0 <<= 11;
+        d0 += 1664;
+        d0 *= 645084;
+        d0 >>= 31;
+        t[k] = d0 & 0x7ff;
      }

      r[ 0] = (t[0] >>  0);
@ -47,7 +54,13 @@ void polyvec_compress(uint8_t r[KYBER_POLYVECCOMPRESSEDBYTES], const polyvec *a)
      for(k=0;k<4;k++) {
        t[k]  = a->vec[i].coeffs[4*j+k];
        t[k] += ((int16_t)t[k] >> 15) & KYBER_Q;
-        t[k]  = ((((uint32_t)t[k] << 10) + KYBER_Q/2)/ KYBER_Q) & 0x3ff;
+/*      t[k]  = ((((uint32_t)t[k] << 10) + KYBER_Q/2)/ KYBER_Q) & 0x3ff; */
+        d0 = t[k];
+        d0 <<= 10;
+        d0 += 1665;
+        d0 *= 1290167;
+        d0 >>= 32;
+        t[k] = d0 & 0x3ff;
      }

      r[0] = (t[0] >> 0);
--- a/src/kem/kyber/pqcrystals-kyber_kyber768_ref/verify.c
+++ b/src/kem/kyber/pqcrystals-kyber_kyber768_ref/verify.c
@ -41,6 +41,16 @@ void cmov(uint8_t *r, const uint8_t *x, size_t len, uint8_t b)
 {
  size_t i;

+#if defined(__GNUC__) || defined(__clang__)
+  // Prevent the compiler from
+  //    1) inferring that b is 0/1-valued, and
+  //    2) handling the two cases with a branch.
+  // This is not necessary when verify.c and kem.c are separate translation
+  // units, but we expect that downstream consumers will copy this code and/or
+  // change how it is built.
+  __asm__("" : "+r"(b) : /* no inputs */);
+#endif
+
  b = -b;
  for(i=0;i<len;i++)
    r[i] ^= b & (r[i] ^ x[i]);
Author	SHA1	Message	Date
Douglas Stebila	9922f7cd13	Release notes for 0.9.2-rc1	2024-01-11 17:54:58 +01:00
Spencer Wilson	4522fae9a4	Run copy_from_upstream	2024-01-08 11:51:32 -05:00
Spencer Wilson	6252372d47	Checkout post-0.9.0 copy_from_upstream fixes	2024-01-08 11:51:32 -05:00
Spencer Wilson	b0c20b9fce	Update ARM patch	2024-01-08 11:51:32 -05:00
Pravek Sharma	9ffad26326	Run copy_from_upstream.py	2024-01-08 11:51:32 -05:00
Pravek Sharma	58eae24cce	Update to latest Kyber commit	2024-01-08 11:51:32 -05:00
Pravek Sharma	0c0675d180	Run copy_from_upstream.py -k	2024-01-08 11:51:32 -05:00
Pravek Sharma	9c42d64705	Update copy_from_upstream.yml	2024-01-08 11:51:32 -05:00
Douglas Stebila	31f570b553	0.9.2 dev branch	2024-01-02 13:09:51 -05:00
Douglas Stebila	7a680dff97	Release notes for 0.9.1	2023-12-22 15:27:57 -05:00
Douglas Stebila	0ab83c8fe4	Detailed changelog [skip ci]	2023-12-19 15:17:06 -05:00
Douglas Stebila	d9a34c93d3	Release notes for 0.9.1-rc1 [skip-ci]	2023-12-19 15:13:20 -05:00
Basil Hess	f22c8316f9	Adds patch to aarch64 Kyber pulled from PQClean for variable-time division in poly_tomsg.	2023-12-19 14:58:37 -05:00
Basil Hess	e68dbc6f6e	update .travis.yml (#1629 )	2023-12-19 11:25:34 -05:00
Basil Hess	5197b9e125	pull kyber from upstream: dda29cc63af721981ee2c831cf00822e69be3220 (#1631 )	2023-12-19 11:25:34 -05:00