Update to picnic 3.0.11 (fixes #1178) (#1181)

Sebastian Ramacher 2022-01-25 18:42:26 +01:00 committed by GitHub
parent 18b3fe39b2
commit 0a0adf1639
GPG Key ID: 4AEE18F83AFDEB23
10 changed files with 245 additions and 176 deletions

@ -4,8 +4,8 @@
- **Main cryptographic assumption**: hash function security (ROM/QROM), key recovery attacks on the lowMC block cipher.
- **Principal submitters**: Greg Zaverucha, Melissa Chase, David Derler, Steven Goldfeder, Claudio Orlandi, Sebastian Ramacher, Christian Rechberger, Daniel Slamanig, Jonathan Katz, Xiao Wang, Vladimir Kolesnikov.
- **Authors' website**: https://microsoft.github.io/Picnic/
- **Specification version**: 3.0.10.
- **Implementation source**: https://github.com/IAIK/Picnic/tree/v3.0.10
- **Specification version**: 3.0.11.
- **Implementation source**: https://github.com/IAIK/Picnic/tree/v3.0.11
- **Implementation license (SPDX-Identifier)**: MIT.
## Parameter set summary

@ -16,9 +16,9 @@ crypto-assumption: hash function security (ROM/QROM), key recovery attacks on th
lowMC block cipher
website: https://microsoft.github.io/Picnic/
nist-round: 3
spec-version: 3.0.10
spec-version: 3.0.11
spdx-license-identifier: MIT
upstream: https://github.com/IAIK/Picnic/tree/v3.0.10
upstream: https://github.com/IAIK/Picnic/tree/v3.0.11
parameter-sets:
- name: picnic_L1_FS
claimed-nist-level: 1

@ -1,37 +1,39 @@
Version 3.0.10 -- 2022-01-08
----------------------------
# Changelog for the optimized Picnic implementation
## Version 3.0.11 -- 2022-01-25
* Fix NEON code on M1.
* Ensure SSE2/AVX2/NEON shift intrinsics with immediate operands are used correctly.
* Use Boost.Test as unit test framework.
## Version 3.0.10 -- 2022-01-08
* Fix build with llvm on ARM with NEON enabled
Version 3.0.9 -- 2021-12-22
---------------------------
## Version 3.0.9 -- 2021-12-22
* Unbreak x86-32 build.
* Fix build on M1 with NEON enabled.
Version 3.0.8 -- 2021-12-18
---------------------------
## Version 3.0.8 -- 2021-12-18
* Prefix compat function implementations with `picnic_`.
* Use OQS instruction set checking functions.
* Use OQS implementations of `aligned_alloc`, `aligned_free`, `explicit_bzero`, and `timingsafe_bcmp`.
* Install cmake configuration files.
Version 3.0.7 -- 2021-12-15
---------------------------
## Version 3.0.7 -- 2021-12-15
* Various changes to improve OQS integration.
* Require cmake version 3.10.
Version 3.0.6 -- 2021-12-14
---------------------------
## Version 3.0.6 -- 2021-12-14
* Reduce size of global parameters for instance specification to 12 bytes per instance.
* Provide compat implementation of `clz` on MSVC using `_BitScanReverse`.
* Do not assume that `aligned_alloc` is available on MSVC.
Version 3.0.5 -- 2021-10-19
---------------------------
## Version 3.0.5 -- 2021-10-19
* Update SHAKE3 implementation.
* Fix build with GCC 11.
@ -39,91 +41,78 @@ Version 3.0.5 -- 2021-10-19
* Expose `picnic_get_{private,public}key_size` as part of the public API.
* Add `picnic_get_{private,public}_key_param` to retrieve a key's parameter set.
Version 3.0.4 -- 2020-12-17
---------------------------
## Version 3.0.4 -- 2020-12-17
* Slightly improve memory consumption.
* Initial work to support PQClean integration in the future.
* Add cmake options to control availability of specific LowMC instances.
Version 3.0.3 -- 2020-10-12
---------------------------
## Version 3.0.3 -- 2020-10-12
* Fix `explicit_bzero` fallback implementation.
* Remove some unused code.
Version 3.0.2 -- 2020-10-06
---------------------------
## Version 3.0.2 -- 2020-10-06
* Update SHAKE3 implementation.
* Add support to check constant time implementation with TIMECOP.
* Slightly reduce memory consumption.
* Add support for BSD variants.
Version 3.0.1 -- 2020-08-11
---------------------------
## Version 3.0.1 -- 2020-08-11
* Expose `picnic_sk_to_pk` as part of the public API.
* Add `picnic_clear_private_key` to clear the private key.
Version 3.0 -- 2020-04-15
-------------------------
## Version 3.0 -- 2020-04-15
* Implement new Picnic 3 parameter set. This implementation replaces the Picnic 2 parameter set.
* Implement new Picnic instances with full Sbox layer.
* Various small improvements and bug fixes.
* Remove all optimizations for partial LowMC instances except for OLLE.
Version 2.2 -- 2020-04-08
---------------------------
## Version 2.2 -- 2020-04-08
* Fix Picnic2 implementation on big endian systems.
* Add support for SHA3/SHAKE3 instructions on IBM z.
* Various small improvements and bug fixes.
* Remove LowMC instances with m=1.
Version 2.1.2 -- 2019-10-03
---------------------------
## Version 2.1.2 -- 2019-10-03
* Add options to build with ZKB++- or KKW-based instances only.
* Fix ARM NEON optimizations.
* Slightly reduce heap usage.
* Remove more unused code.
Version 2.1.1 -- 2019-08-07
---------------------------
## Version 2.1.1 -- 2019-08-07
* Various small improvements and bug fixes.
Version 2.1 -- 2019-07-29
-------------------------
## Version 2.1 -- 2019-07-29
* Remove M4RM-based implementation.
* Fix input size in Picnic2's commitment implementation.
* Additional improvements and optimizations of the Picnic2 code.
Version 2.0 -- 2019-04-08
-------------------------
## Version 2.0 -- 2019-04-08
* Implement Picnic 2.
* Use 4-times parallel SHAKE3 for faster PRF evaluation, commitment generation, etc.
* Fix size of salts to 32 bytes.
Version 1.3.1 -- 2018-12-21
---------------------------
## Version 1.3.1 -- 2018-12-21
* Reduce heap usage.
Version 1.3 -- 2018-12-21
-------------------------
## Version 1.3 -- 2018-12-21
* Implement linear layer optimizations to speed up LowMC evaluations. Besides the runtime improvements, this optimization also greatly reduces the memory size of the LowMC instances.
* Provide LowMC instances with m=1 to demonstrate feasibility of those instances.
* Slightly improve internal storage of matrices to require less memory.
* Remove unused code and support for dynamic LowMC instances.
Version 1.2 -- 2018-12-05
-------------------------
## Version 1.2 -- 2018-12-05
* Implement RRKC optimizations for round constants.
* Compatibility fixes for Mac OS X.
@ -133,8 +122,7 @@ Version 1.2 -- 2018-12-05
* Record state before Sbox evaluation and drop one branch of XOR computations. This optimization is based on an idea by Markus Schofnegger.
* Add per-signature salt to random tapes generation. Prevents a seed-guessing attack reported by Itai Dinur.
Version 1.1 -- 2018-06-29
-------------------------
## Version 1.1 -- 2018-06-29
* Compatibility fixes for Visual Studio, clang and MinGW.
* Various improvements to the SIMD versions of the matrix operations.
@ -142,8 +130,7 @@ Version 1.1 -- 2018-06-29
* Add option to feed extra randomness to initial seed expansion to counter fault attacks.
* Version submitted for inclusion in SUPERCOP.
Version 1.0 -- 2017-11-28
-------------------------
## Version 1.0 -- 2017-11-28
* Initial release.
* Version submitted to the NIST PQC project.

@ -150,5 +150,76 @@ static inline void mzd_from_bitstream(bitstream_t* bs, mzd_local_t* v, const siz
*d = 0;
}
}
#if defined(WITH_OPT)
#if defined(WITH_AVX2)
ATTR_TARGET_S256
static inline void w256_to_bitstream(bitstream_t* bs, const word256 v, const size_t width,
const size_t size) {
uint64_t buf[4] ATTR_ALIGNED(32);
mm256_store(buf, v);
const uint64_t* d = &buf[width - 1];
size_t bits = size;
for (; bits >= sizeof(uint64_t) * 8; bits -= sizeof(uint64_t) * 8, --d) {
bitstream_put_bits(bs, *d, sizeof(uint64_t) * 8);
}
if (bits) {
bitstream_put_bits(bs, *d >> (sizeof(uint64_t) * 8 - bits), bits);
}
}
ATTR_TARGET_S256
static inline word256 w256_from_bitstream(bitstream_t* bs, const size_t width, const size_t size) {
uint64_t buf[4] ATTR_ALIGNED(32) = {0};
uint64_t* d = &buf[width - 1];
size_t bits = size;
for (; bits >= sizeof(uint64_t) * 8; bits -= sizeof(uint64_t) * 8, --d) {
*d = bitstream_get_bits(bs, sizeof(uint64_t) * 8);
}
if (bits) {
*d = bitstream_get_bits(bs, bits) << (sizeof(uint64_t) * 8 - bits);
}
return mm256_load(&buf[0]);
}
#endif
#if defined(WITH_SSE2) || defined(WITH_NEON)
ATTR_TARGET_S128
static inline void w128_to_bitstream(bitstream_t* bs, const word128 v[2], const size_t width,
const size_t size) {
uint64_t buf[4] ATTR_ALIGNED(16);
mm128_store(&buf[0], v[0]);
mm128_store(&buf[2], v[1]);
const uint64_t* d = &buf[width - 1];
size_t bits = size;
for (; bits >= sizeof(uint64_t) * 8; bits -= sizeof(uint64_t) * 8, --d) {
bitstream_put_bits(bs, *d, sizeof(uint64_t) * 8);
}
if (bits) {
bitstream_put_bits(bs, *d >> (sizeof(uint64_t) * 8 - bits), bits);
}
}
ATTR_TARGET_S128
static inline void w128_from_bitstream(bitstream_t* bs, word128 v[2], const size_t width,
const size_t size) {
uint64_t buf[4] ATTR_ALIGNED(16) = {0};
uint64_t* d = &buf[width - 1];
size_t bits = size;
for (; bits >= sizeof(uint64_t) * 8; bits -= sizeof(uint64_t) * 8, --d) {
*d = bitstream_get_bits(bs, sizeof(uint64_t) * 8);
}
if (bits) {
*d = bitstream_get_bits(bs, bits) << (sizeof(uint64_t) * 8 - bits);
}
v[0] = mm128_load(&buf[0]);
v[1] = mm128_load(&buf[2]);
}
#endif
#endif
#endif
#endif
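
The new `w256_*`/`w128_*` helpers spill a SIMD register into four 64-bit words and then walk those words from index `width - 1` down to 0, sending the top `size` bits through the bitstream. The following is a minimal portable sketch of that word order, assuming an MSB-first bit layout. `demo_bitstream_t`, `demo_put_bits`, `demo_get_bits` and the other `demo_*` names are hypothetical stand-ins and not the library's `bitstream_t` API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the library's bitstream_t: a byte buffer plus a
 * bit position, written and read most-significant bit first. */
typedef struct {
  uint8_t* buffer;
  size_t position; /* in bits */
} demo_bitstream_t;

/* Append the nbits least-significant bits of value, MSB first. */
static void demo_put_bits(demo_bitstream_t* bs, uint64_t value, size_t nbits) {
  for (size_t i = nbits; i-- > 0; ++bs->position) {
    const unsigned shift = 7u - (unsigned)(bs->position % 8);
    uint8_t* byte        = &bs->buffer[bs->position / 8];
    *byte = (uint8_t)((*byte & ~(1u << shift)) | (((value >> i) & 1u) << shift));
  }
}

/* Read nbits bits, MSB first, into the low bits of the result. */
static uint64_t demo_get_bits(demo_bitstream_t* bs, size_t nbits) {
  uint64_t value = 0;
  for (size_t i = 0; i < nbits; ++i, ++bs->position) {
    const unsigned shift = 7u - (unsigned)(bs->position % 8);
    value = (value << 1) | ((bs->buffer[bs->position / 8] >> shift) & 1u);
  }
  return value;
}

/* Write the top `size` bits of a `width`-word value, highest word first. */
static void demo_words_to_bitstream(demo_bitstream_t* bs, const uint64_t* words,
                                    size_t width, size_t size) {
  assert(size <= width * 64);
  size_t idx = width, bits = size;
  for (; bits >= 64; bits -= 64) {
    demo_put_bits(bs, words[--idx], 64);
  }
  if (bits) {
    /* partial word: only its most-significant `bits` bits are stored */
    demo_put_bits(bs, words[--idx] >> (64 - bits), bits);
  }
}

/* Inverse of demo_words_to_bitstream; unused low bits are left at zero. */
static void demo_words_from_bitstream(demo_bitstream_t* bs, uint64_t* words,
                                      size_t width, size_t size) {
  assert(size <= width * 64);
  memset(words, 0, width * sizeof(uint64_t));
  size_t idx = width, bits = size;
  for (; bits >= 64; bits -= 64) {
    words[--idx] = demo_get_bits(bs, 64);
  }
  if (bits) {
    words[--idx] = demo_get_bits(bs, bits) << (64 - bits);
  }
}

/* Round-trip a 255-bit value (4 words); returns 1 on success. */
int demo_roundtrip(void) {
  /* bit 0 of in[0] is clear because only 255 of the 256 bits are stored */
  const uint64_t in[4] = {UINT64_C(0xfedcba9876543210), UINT64_C(0x0123456789abcdef),
                          UINT64_C(0xdeadbeefcafef00d), UINT64_C(0x8000000000000001)};
  uint64_t out[4];
  uint8_t storage[32]  = {0};
  demo_bitstream_t bs  = {storage, 0};

  demo_words_to_bitstream(&bs, in, 4, 255);
  if (bs.position != 255) {
    return 0;
  }
  bs.position = 0;
  demo_words_from_bitstream(&bs, out, 4, 255);
  return memcmp(in, out, sizeof(in)) == 0;
}
```

With `LOWMC_N` equal to 255, the width expression `(LOWMC_N + 63) / (sizeof(uint64_t) * 8)` evaluates to 4, which is the case `demo_roundtrip` exercises. The sketch indexes the array instead of decrementing a pointer, so it never forms a pointer before the start of the buffer.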

@ -486,11 +486,11 @@ static void sbox_aux_uint64_lowmc_255_255_4(mzd_local_t* statein, mzd_local_t* s
word128 t0[2] ATTR_ALIGNED(alignof(word128)); \
word128 t1[2] ATTR_ALIGNED(alignof(word128)); \
word128 t2[2] ATTR_ALIGNED(alignof(word128)); \
mzd_local_t tmp[1], aux[1]; \
word128 aux[2] ATTR_ALIGNED(alignof(word128)); \
SHR(t2, fresh_output_ca, 2); \
SHR(t1, fresh_output_bc, 1); \
XOR(t2, t2, t1); \
XOR(aux->w128, t2, fresh_output_ab); \
XOR(aux, t2, fresh_output_ab); \
\
/* a & b */ \
AND(t0, a, b); \
@ -502,21 +502,21 @@ static void sbox_aux_uint64_lowmc_255_255_4(mzd_local_t* statein, mzd_local_t* s
SHR(t1, t1, 1); \
XOR(t2, t2, t1); \
XOR(t2, t2, t0); \
XOR(aux->w128, aux->w128, t2); \
XOR(aux, aux, t2); \
\
bitstream_t parity_tape = {{tapes->parity_tapes}, tapes->pos}; \
bitstream_t last_party_tape = {{tapes->tape[15]}, tapes->pos}; \
\
/* calculate aux_bits to fix and_helper */ \
mzd_from_bitstream(&parity_tape, tmp, (LOWMC_N + 63) / (sizeof(uint64_t) * 8), LOWMC_N); \
XOR(aux->w128, aux->w128, tmp->w128); \
mzd_from_bitstream(&last_party_tape, tmp, (LOWMC_N + 63) / (sizeof(uint64_t) * 8), LOWMC_N); \
XOR(aux->w128, aux->w128, tmp->w128); \
w128_from_bitstream(&parity_tape, t0, (LOWMC_N + 63) / (sizeof(uint64_t) * 8), LOWMC_N); \
XOR(aux, aux, t0); \
w128_from_bitstream(&last_party_tape, t1, (LOWMC_N + 63) / (sizeof(uint64_t) * 8), LOWMC_N); \
XOR(aux, aux, t1); \
\
last_party_tape.position = tapes->pos; \
mzd_to_bitstream(&last_party_tape, aux, (LOWMC_N + 63) / (sizeof(uint64_t) * 8), LOWMC_N); \
w128_to_bitstream(&last_party_tape, aux, (LOWMC_N + 63) / (sizeof(uint64_t) * 8), LOWMC_N); \
bitstream_t aux_tape = {{tapes->aux_bits}, tapes->aux_pos}; \
mzd_to_bitstream(&aux_tape, aux, (LOWMC_N + 63) / (sizeof(uint64_t) * 8), LOWMC_N); \
w128_to_bitstream(&aux_tape, aux, (LOWMC_N + 63) / (sizeof(uint64_t) * 8), LOWMC_N); \
\
tapes->aux_pos += LOWMC_N; \
} while (0)
@ -616,37 +616,38 @@ static void sbox_aux_s128_lowmc_255_255_4(mzd_local_t* statein, mzd_local_t* sta
word256 t0 ATTR_ALIGNED(alignof(word256)); \
word256 t1 ATTR_ALIGNED(alignof(word256)); \
word256 t2 ATTR_ALIGNED(alignof(word256)); \
mzd_local_t tmp[1], aux[1]; \
t2 = ROR(fresh_output_ca, 2); \
t1 = ROR(fresh_output_bc, 1); \
t2 = XOR(t2, t1); \
aux->w256 = XOR(t2, fresh_output_ab); \
word256 aux ATTR_ALIGNED(alignof(word256)); \
\
t2 = ROR(fresh_output_ca, 2); \
t1 = ROR(fresh_output_bc, 1); \
t2 = XOR(t2, t1); \
aux = XOR(t2, fresh_output_ab); \
\
/* a & b */ \
t0 = AND(a, b); \
/* b & c */ \
t1 = AND(b, c); \
/* c & a */ \
t2 = AND(c, a); \
t2 = ROR(t2, 2); \
t1 = ROR(t1, 1); \
t2 = XOR(t2, t1); \
t2 = XOR(t2, t0); \
aux->w256 = XOR(aux->w256, t2); \
t2 = AND(c, a); \
t2 = ROR(t2, 2); \
t1 = ROR(t1, 1); \
t2 = XOR(t2, t1); \
t2 = XOR(t2, t0); \
aux = XOR(aux, t2); \
\
bitstream_t parity_tape = {{tapes->parity_tapes}, tapes->pos}; \
bitstream_t last_party_tape = {{tapes->tape[15]}, tapes->pos}; \
\
/* calculate aux_bits to fix and_helper */ \
mzd_from_bitstream(&parity_tape, tmp, (LOWMC_N + 63) / (sizeof(uint64_t) * 8), LOWMC_N); \
aux->w256 = XOR(aux->w256, tmp->w256); \
mzd_from_bitstream(&last_party_tape, tmp, (LOWMC_N + 63) / (sizeof(uint64_t) * 8), LOWMC_N); \
aux->w256 = XOR(aux->w256, tmp->w256); \
t0 = w256_from_bitstream(&parity_tape, (LOWMC_N + 63) / (sizeof(uint64_t) * 8), LOWMC_N); \
aux = XOR(aux, t0); \
t1 = w256_from_bitstream(&last_party_tape, (LOWMC_N + 63) / (sizeof(uint64_t) * 8), LOWMC_N); \
aux = XOR(aux, t1); \
\
last_party_tape.position = tapes->pos; \
mzd_to_bitstream(&last_party_tape, aux, (LOWMC_N + 63) / (sizeof(uint64_t) * 8), LOWMC_N); \
w256_to_bitstream(&last_party_tape, aux, (LOWMC_N + 63) / (sizeof(uint64_t) * 8), LOWMC_N); \
bitstream_t aux_tape = {{tapes->aux_bits}, tapes->aux_pos}; \
mzd_to_bitstream(&aux_tape, aux, (LOWMC_N + 63) / (sizeof(uint64_t) * 8), LOWMC_N); \
w256_to_bitstream(&aux_tape, aux, (LOWMC_N + 63) / (sizeof(uint64_t) * 8), LOWMC_N); \
\
tapes->aux_pos += LOWMC_N; \
} while (0)

@ -171,7 +171,7 @@
/* target attribute */
#if defined(__GNUC__) || __has_attribute(target)
#define ATTR_TARGET(x) __attribute__((target((x))))
#define ATTR_TARGET_AVX2 __attribute__((target("avx2,bmi2")))
#define ATTR_TARGET_AVX2 __attribute__((target("avx2,bmi2,sse2")))
#define ATTR_TARGET_SSE2 __attribute__((target("sse2")))
#else
#define ATTR_TARGET(x)
@ -186,6 +186,20 @@
#define ATTR_ARTIFICIAL
#endif
/* may_alias attribute */
#if GNUC_CHECK(3, 3) || __has_attribute(__may_alias__)
#define ATTR_MAY_ALIAS __attribute__((__may_alias__))
#else
#define ATTR_MAY_ALIAS
#endif
/* vector_size attribute */
#if GNUC_CHECK(4, 8) || __has_attribute(__vector_size__)
#define ATTR_VECTOR_SIZE(s) __attribute__((__vector_size__(s)))
#else
#define ATTR_VECTOR_SIZE(s)
#endif
#define FN_ATTRIBUTES_AVX2 ATTR_ARTIFICIAL ATTR_ALWAYS_INLINE ATTR_TARGET_AVX2
#define FN_ATTRIBUTES_SSE2 ATTR_ARTIFICIAL ATTR_ALWAYS_INLINE ATTR_TARGET_SSE2
#define FN_ATTRIBUTES_NEON ATTR_ARTIFICIAL ATTR_ALWAYS_INLINE
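
As a rough illustration of what the two new attribute macros wrap, the sketch below uses the underlying GCC/Clang attributes directly. It assumes a GNU-compatible compiler and C11. The `demo_*` names are hypothetical and do not appear in the Picnic sources.

```c
#include <stdint.h>
#include <string.h>

/* GCC/Clang only: a 32-byte vector of four uint64_t lanes. */
typedef uint64_t demo_vec256 __attribute__((__vector_size__(32)));

/* GCC/Clang only: a uint64_t that may alias objects of any other type. */
typedef uint64_t demo_u64_alias __attribute__((__may_alias__));

/* Element-wise XOR via the vector extension; returns 1 if lanes match. */
int demo_vector_xor(void) {
  const demo_vec256 a = {1, 2, 3, 4};
  const demo_vec256 b = {4, 3, 2, 1};
  const demo_vec256 c = a ^ b;
  return c[0] == 5 && c[1] == 1 && c[2] == 1 && c[3] == 5;
}

/* Read a byte array through the may_alias type and compare the result with
 * a memcpy-based read; returns 1 if both agree. */
int demo_alias_read(void) {
  _Alignas(uint64_t) unsigned char bytes[sizeof(uint64_t)] = {1, 2, 3, 4, 5, 6, 7, 8};
  uint64_t expected;
  memcpy(&expected, bytes, sizeof(expected));

  /* without may_alias this access would violate strict aliasing */
  const demo_u64_alias* p = (const demo_u64_alias*)bytes;
  return *p == expected;
}
```

On compilers where neither the version check nor `__has_attribute` succeeds, the macros above expand to nothing, so code relying on them needs a separate fallback path.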

@ -466,8 +466,8 @@ static void mpc_sbox_verify_uint64_lowmc_255_255_4(mzd_local_t* out, const mzd_l
#if defined(WITH_OPT)
#define NROLR(a, b, c) \
do { \
(void)a; \
(void)b; \
a[0] = b[0]; \
a[1] = b[1]; \
(void)c; \
} while (0)

@ -240,17 +240,17 @@ static void picnic3_mpc_sbox_uint64_lowmc_255_255_4(mzd_local_t* statein, random
/* a & b */ \
AND(s_ab, a, b); \
for (int i = 0; i < 16; i++) { \
mzd_local_t tmp[1]; \
word128 tmp[2] ATTR_ALIGNED(alignof(word128)); \
bitstream_t party_msgs = {{msgs->msgs[i]}, msgs->pos}; \
if (i == msgs->unopened) { \
/* we are in verify, just grab the broadcast s from the msgs array */ \
mzd_from_bitstream(&party_msgs, tmp, (LOWMC_N + 63) / (sizeof(uint64_t) * 8), LOWMC_N); \
w128_from_bitstream(&party_msgs, tmp, (LOWMC_N + 63) / (sizeof(uint64_t) * 8), LOWMC_N); \
/* a */ \
AND(t0, bitmask_a->w128, tmp->w128); \
AND(t0, bitmask_a->w128, tmp); \
/* b */ \
AND(t1, bitmask_b->w128, tmp->w128); \
AND(t1, bitmask_b->w128, tmp); \
/* c */ \
AND(t2, bitmask_c->w128, tmp->w128); \
AND(t2, bitmask_c->w128, tmp); \
SHL(t0, t0, 2); \
SHL(t1, t1, 1); \
XOR(s_ab, t2, s_ab); \
@ -264,13 +264,13 @@ static void picnic3_mpc_sbox_uint64_lowmc_255_255_4(mzd_local_t* statein, random
word128 mask_a[2] ATTR_ALIGNED(alignof(word128)); \
word128 mask_b[2] ATTR_ALIGNED(alignof(word128)); \
word128 mask_c[2] ATTR_ALIGNED(alignof(word128)); \
mzd_from_bitstream(&party_tape, tmp, (LOWMC_N + 63) / (sizeof(uint64_t) * 8), LOWMC_N); \
w128_from_bitstream(&party_tape, tmp, (LOWMC_N + 63) / (sizeof(uint64_t) * 8), LOWMC_N); \
/* a */ \
AND(mask_a, bitmask_a->w128, tmp->w128); \
AND(mask_a, bitmask_a->w128, tmp); \
/* b */ \
AND(mask_b, bitmask_b->w128, tmp->w128); \
AND(mask_b, bitmask_b->w128, tmp); \
/* c */ \
AND(mask_c, bitmask_c->w128, tmp->w128); \
AND(mask_c, bitmask_c->w128, tmp); \
SHL(mask_a, mask_a, 2); \
SHL(mask_b, mask_b, 1); \
\
@ -278,13 +278,13 @@ static void picnic3_mpc_sbox_uint64_lowmc_255_255_4(mzd_local_t* statein, random
word128 and_helper_ab[2] ATTR_ALIGNED(alignof(word128)); \
word128 and_helper_bc[2] ATTR_ALIGNED(alignof(word128)); \
word128 and_helper_ca[2] ATTR_ALIGNED(alignof(word128)); \
mzd_from_bitstream(&party_tape, tmp, (LOWMC_N + 63) / (sizeof(uint64_t) * 8), LOWMC_N); \
w128_from_bitstream(&party_tape, tmp, (LOWMC_N + 63) / (sizeof(uint64_t) * 8), LOWMC_N); \
/* a */ \
AND(and_helper_ab, bitmask_c->w128, tmp->w128); \
AND(and_helper_ab, bitmask_c->w128, tmp); \
/* b */ \
AND(and_helper_bc, bitmask_b->w128, tmp->w128); \
AND(and_helper_bc, bitmask_b->w128, tmp); \
/* c */ \
AND(and_helper_ca, bitmask_a->w128, tmp->w128); \
AND(and_helper_ca, bitmask_a->w128, tmp); \
SHL(and_helper_ca, and_helper_ca, 2); \
SHL(and_helper_bc, and_helper_bc, 1); \
\
@ -292,8 +292,8 @@ static void picnic3_mpc_sbox_uint64_lowmc_255_255_4(mzd_local_t* statein, random
AND(t0, a, mask_b); \
AND(t1, b, mask_a); \
XOR(t0, t0, t1); \
XOR(tmp->w128, t0, and_helper_ab); \
XOR(s_ab, tmp->w128, s_ab); \
XOR(tmp, t0, and_helper_ab); \
XOR(s_ab, tmp, s_ab); \
/* s_bc */ \
AND(t0, b, mask_c); \
AND(t1, c, mask_b); \
@ -302,7 +302,7 @@ static void picnic3_mpc_sbox_uint64_lowmc_255_255_4(mzd_local_t* statein, random
XOR(s_bc, t0, s_bc); \
\
SHR(t0, t0, 1); \
XOR(tmp->w128, tmp->w128, t0); \
XOR(tmp, tmp, t0); \
/* s_ca */ \
AND(t0, c, mask_a); \
AND(t1, a, mask_c); \
@ -311,8 +311,8 @@ static void picnic3_mpc_sbox_uint64_lowmc_255_255_4(mzd_local_t* statein, random
XOR(s_ca, t0, s_ca); \
\
SHR(t0, t0, 2); \
XOR(tmp->w128, tmp->w128, t0); \
mzd_to_bitstream(&party_msgs, tmp, (LOWMC_N + 63) / (sizeof(uint64_t) * 8), LOWMC_N); \
XOR(tmp, tmp, t0); \
w128_to_bitstream(&party_msgs, tmp, (LOWMC_N + 63) / (sizeof(uint64_t) * 8), LOWMC_N); \
} \
tapes->pos += LOWMC_N; \
tapes->pos += LOWMC_N; \
@ -421,17 +421,17 @@ static void picnic3_mpc_sbox_s128_lowmc_255_255_4(mzd_local_t* statein, randomTa
/* a & b */ \
s_ab = AND(a, b); \
for (int i = 0; i < 16; i++) { \
mzd_local_t tmp[1]; \
word256 tmp ATTR_ALIGNED(alignof(word256)); \
bitstream_t party_msgs = {{msgs->msgs[i]}, msgs->pos}; \
if (i == msgs->unopened) { \
/* we are in verify, just grab the broadcast s from the msgs array */ \
mzd_from_bitstream(&party_msgs, tmp, (LOWMC_N + 63) / (sizeof(uint64_t) * 8), LOWMC_N); \
tmp = w256_from_bitstream(&party_msgs, (LOWMC_N + 63) / (sizeof(uint64_t) * 8), LOWMC_N); \
/* a */ \
t0 = AND(bitmask_a->w256, tmp->w256); \
t0 = AND(bitmask_a->w256, tmp); \
/* b */ \
t1 = AND(bitmask_b->w256, tmp->w256); \
t1 = AND(bitmask_b->w256, tmp); \
/* c */ \
t2 = AND(bitmask_c->w256, tmp->w256); \
t2 = AND(bitmask_c->w256, tmp); \
t0 = ROL(t0, 2); \
t1 = ROL(t1, 1); \
s_ab = XOR(t2, s_ab); \
@ -445,13 +445,13 @@ static void picnic3_mpc_sbox_s128_lowmc_255_255_4(mzd_local_t* statein, randomTa
word256 mask_a ATTR_ALIGNED(alignof(word256)); \
word256 mask_b ATTR_ALIGNED(alignof(word256)); \
word256 mask_c ATTR_ALIGNED(alignof(word256)); \
mzd_from_bitstream(&party_tape, tmp, (LOWMC_N + 63) / (sizeof(uint64_t) * 8), LOWMC_N); \
tmp = w256_from_bitstream(&party_tape, (LOWMC_N + 63) / (sizeof(uint64_t) * 8), LOWMC_N); \
/* a */ \
mask_a = AND(bitmask_a->w256, tmp->w256); \
mask_a = AND(bitmask_a->w256, tmp); \
/* b */ \
mask_b = AND(bitmask_b->w256, tmp->w256); \
mask_b = AND(bitmask_b->w256, tmp); \
/* c */ \
mask_c = AND(bitmask_c->w256, tmp->w256); \
mask_c = AND(bitmask_c->w256, tmp); \
mask_a = ROL(mask_a, 2); \
mask_b = ROL(mask_b, 1); \
\
@ -459,22 +459,22 @@ static void picnic3_mpc_sbox_s128_lowmc_255_255_4(mzd_local_t* statein, randomTa
word256 and_helper_ab ATTR_ALIGNED(alignof(word256)); \
word256 and_helper_bc ATTR_ALIGNED(alignof(word256)); \
word256 and_helper_ca ATTR_ALIGNED(alignof(word256)); \
mzd_from_bitstream(&party_tape, tmp, (LOWMC_N + 63) / (sizeof(uint64_t) * 8), LOWMC_N); \
tmp = w256_from_bitstream(&party_tape, (LOWMC_N + 63) / (sizeof(uint64_t) * 8), LOWMC_N); \
/* a */ \
and_helper_ab = AND(bitmask_c->w256, tmp->w256); \
and_helper_ab = AND(bitmask_c->w256, tmp); \
/* b */ \
and_helper_bc = AND(bitmask_b->w256, tmp->w256); \
and_helper_bc = AND(bitmask_b->w256, tmp); \
/* c */ \
and_helper_ca = AND(bitmask_a->w256, tmp->w256); \
and_helper_ca = AND(bitmask_a->w256, tmp); \
and_helper_ca = ROL(and_helper_ca, 2); \
and_helper_bc = ROL(and_helper_bc, 1); \
\
/* s_ab */ \
t0 = AND(a, mask_b); \
t1 = AND(b, mask_a); \
t0 = XOR(t0, t1); \
tmp->w256 = XOR(t0, and_helper_ab); \
s_ab = XOR(tmp->w256, s_ab); \
t0 = AND(a, mask_b); \
t1 = AND(b, mask_a); \
t0 = XOR(t0, t1); \
tmp = XOR(t0, and_helper_ab); \
s_ab = XOR(tmp, s_ab); \
/* s_bc */ \
t0 = AND(b, mask_c); \
t1 = AND(c, mask_b); \
@ -482,8 +482,8 @@ static void picnic3_mpc_sbox_s128_lowmc_255_255_4(mzd_local_t* statein, randomTa
t0 = XOR(t0, and_helper_bc); \
s_bc = XOR(t0, s_bc); \
\
t0 = ROR(t0, 1); \
tmp->w256 = XOR(tmp->w256, t0); \
t0 = ROR(t0, 1); \
tmp = XOR(tmp, t0); \
/* s_ca */ \
t0 = AND(c, mask_a); \
t1 = AND(a, mask_c); \
@ -491,9 +491,9 @@ static void picnic3_mpc_sbox_s128_lowmc_255_255_4(mzd_local_t* statein, randomTa
t0 = XOR(t0, and_helper_ca); \
s_ca = XOR(t0, s_ca); \
\
t0 = ROR(t0, 2); \
tmp->w256 = XOR(tmp->w256, t0); \
mzd_to_bitstream(&party_msgs, tmp, (LOWMC_N + 63) / (sizeof(uint64_t) * 8), LOWMC_N); \
t0 = ROR(t0, 2); \
tmp = XOR(tmp, t0); \
w256_to_bitstream(&party_msgs, tmp, (LOWMC_N + 63) / (sizeof(uint64_t) * 8), LOWMC_N); \
} \
tapes->pos += LOWMC_N; \
tapes->pos += LOWMC_N; \

@ -107,6 +107,8 @@ typedef __m256i word256;
#endif
#define mm256_zero _mm256_setzero_si256()
#define mm256_load(s) _mm256_load_si256((const word256*)s)
#define mm256_store(d, s) _mm256_store_si256((word256*)d, s)
#define mm256_xor(l, r) _mm256_xor_si256((l), (r))
#define mm256_and(l, r) _mm256_and_si256((l), (r))
/* !l & r */
@ -147,11 +149,14 @@ apply_mask(mm256_xor_mask, word256, mm256_xor, mm256_and, FN_ATTRIBUTES_AVX2_CON
typedef __m128i word128;
#define mm128_zero _mm_setzero_si128()
#define mm128_load(s) _mm_load_si128((const word128*)s)
#define mm128_store(d, s) _mm_store_si128((word128*)d, s)
#define mm128_xor(l, r) _mm_xor_si128((l), (r))
#define mm128_and(l, r) _mm_and_si128((l), (r))
/* !l & r */
#define mm128_nand(l, r) _mm_andnot_si128((l), (r))
#define mm128_broadcast_u64(x) _mm_set1_epi64x((x))
/* bit shifts up to 63 bits */
#define mm128_sl_u64(x, s) _mm_slli_epi64((x), (s))
#define mm128_sr_u64(x, s) _mm_srli_epi64((x), (s))
@ -177,73 +182,64 @@ apply_array(mm128_and_256, word128, mm128_and, 2, FN_ATTRIBUTES_SSE2)
_mm_or_si128(_mm_srli_epi64(data, count), \
_mm_shuffle_epi32(_mm_slli_epi64(data, 64 - count), _MM_SHUFFLE(1, 0, 3, 2)))
static inline void FN_ATTRIBUTES_SSE2 mm128_shift_right_256(__m128i res[2], __m128i const data[2],
const unsigned int count) {
__m128i total_carry = _mm_bslli_si128(data[1], 8);
total_carry = _mm_slli_epi64(total_carry, 64 - count);
for (unsigned int i = 0; i < 2; ++i) {
__m128i carry = _mm_bsrli_si128(data[i], 8);
carry = _mm_slli_epi64(carry, 64 - count);
res[i] = _mm_srli_epi64(data[i], count);
res[i] = _mm_or_si128(res[i], carry);
}
res[0] = _mm_or_si128(res[0], total_carry);
}
#define mm128_shift_right_256(res, data, count) \
do { \
const __m128i total_carry = _mm_slli_epi64(_mm_bslli_si128(data[1], 8), 64 - count); \
__m128i carry = _mm_slli_epi64(_mm_bsrli_si128(data[0], 8), 64 - count); \
res[0] = _mm_or_si128(_mm_srli_epi64(data[0], count), carry); \
carry = _mm_slli_epi64(_mm_bsrli_si128(data[1], 8), 64 - count); \
res[1] = _mm_or_si128(_mm_srli_epi64(data[1], count), carry); \
res[0] = _mm_or_si128(res[0], total_carry); \
} while (0)
static inline void FN_ATTRIBUTES_SSE2 mm128_shift_left_256(__m128i res[2], __m128i const data[2],
const unsigned int count) {
__m128i total_carry = _mm_bsrli_si128(data[0], 8);
total_carry = _mm_srli_epi64(total_carry, 64 - count);
for (unsigned int i = 0; i < 2; ++i) {
__m128i carry = _mm_bslli_si128(data[i], 8);
carry = _mm_srli_epi64(carry, 64 - count);
res[i] = _mm_slli_epi64(data[i], count);
res[i] = _mm_or_si128(res[i], carry);
}
res[1] = _mm_or_si128(res[1], total_carry);
}
#define mm128_shift_left_256(res, data, count) \
do { \
const __m128i total_carry = _mm_srli_epi64(_mm_bsrli_si128(data[0], 8), 64 - count); \
__m128i carry = _mm_srli_epi64(_mm_bslli_si128(data[0], 8), 64 - count); \
res[0] = _mm_or_si128(_mm_slli_epi64(data[0], count), carry); \
carry = _mm_srli_epi64(_mm_bslli_si128(data[1], 8), 64 - count); \
res[1] = _mm_or_si128(_mm_slli_epi64(data[1], count), carry); \
res[1] = _mm_or_si128(res[1], total_carry); \
} while (0)
/* shift left by 64 to 127 bits */
#define mm128_shift_left_64_127(data, count) _mm_slli_epi64(_mm_bslli_si128(data, 8), count - 64)
/* shift right by 64 to 127 bits */
#define mm128_shift_right_64_127(data, count) _mm_srli_epi64(_mm_bsrli_si128(data, 8), count - 64)
static inline void FN_ATTRIBUTES_SSE2 mm128_rotate_left_256(__m128i res[2], __m128i const data[2],
const unsigned int count) {
const __m128i carry = mm128_shift_right_64_127(data[0], 128 - count);
#define mm128_rotate_left_256(res, data, count) \
do { \
const __m128i carry = mm128_shift_right_64_127(data[0], 128 - count); \
\
res[0] = _mm_or_si128(mm128_shift_left(data[0], count), \
mm128_shift_right_64_127(data[1], 128 - count)); \
res[1] = _mm_or_si128(mm128_shift_left(data[1], count), carry); \
} while (0)
res[0] = _mm_or_si128(mm128_shift_left(data[0], count),
mm128_shift_right_64_127(data[1], 128 - count));
res[1] = _mm_or_si128(mm128_shift_left(data[1], count), carry);
}
static inline void FN_ATTRIBUTES_SSE2 mm128_rotate_right_256(__m128i res[2], __m128i const data[2],
const unsigned int count) {
const __m128i carry = mm128_shift_left_64_127(data[0], 128 - count);
res[0] = _mm_or_si128(mm128_shift_right(data[0], count),
mm128_shift_left_64_127(data[1], 128 - count));
res[1] = _mm_or_si128(mm128_shift_right(data[1], count), carry);
}
#define mm128_rotate_right_256(res, data, count) \
do { \
const __m128i carry = mm128_shift_left_64_127(data[0], 128 - count); \
\
res[0] = _mm_or_si128(mm128_shift_right(data[0], count), \
mm128_shift_left_64_127(data[1], 128 - count)); \
res[1] = _mm_or_si128(mm128_shift_right(data[1], count), carry); \
} while (0)
#endif
#if defined(WITH_NEON)
typedef uint64x2_t word128;
#define mm128_zero vmovq_n_u64(0)
#define mm128_load(s) vld1q_u64(s)
#define mm128_store(d, s) vst1q_u64(d, s)
#define mm128_xor(l, r) veorq_u64((l), (r))
#define mm128_and(l, r) vandq_u64((l), (r))
/* !l & r, requires l to be an immediate */
#define mm128_nand(l, r) vbicq_u64((r), (l))
#define mm128_broadcast_u64(x) vdupq_n_u64((x))
#define mm128_sl_u64(x, s) \
__builtin_choose_expr(__builtin_constant_p(s), vshlq_n_u64((x), (s)), \
vshlq_u64((x), vdupq_n_s64(s)))
#define mm128_sr_u64(x, s) \
__builtin_choose_expr(__builtin_constant_p(s), vshrq_n_u64((x), (s)), \
vshlq_u64((x), vdupq_n_s64(-(int64_t)(s))))
/* bit shifts up to 63 bits */
#define mm128_sl_u64(x, s) vshlq_n_u64((x), (s))
#define mm128_sr_u64(x, s) vshrq_n_u64((x), (s))
// clang-format off
apply_region(mm128_xor_region, word128, mm128_xor, FN_ATTRIBUTES_NEON)
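
To make the carry handling in `mm128_shift_right_256` and `mm128_shift_left_256` easier to follow, here is a portable model on four 64-bit limbs. It assumes limb 0 is the least significant and that `count` lies between 1 and 63. The `demo_*` functions are hypothetical and only mirror the arithmetic, not the SIMD code. Converting the helpers to macros presumably keeps `count` a compile-time constant at each call site, which the changelog entry about immediate operands suggests is the intent.

```c
#include <assert.h>
#include <stdint.h>

/* Logical right shift of a 256-bit value stored as four limbs. Each limb
 * receives the low `count` bits of the next-higher limb as carry. */
static void demo_shift_right_256(uint64_t res[4], const uint64_t data[4], unsigned count) {
  assert(count >= 1 && count <= 63);
  for (unsigned i = 0; i < 4; ++i) {
    const uint64_t carry = (i + 1 < 4) ? data[i + 1] << (64 - count) : 0;
    res[i]               = (data[i] >> count) | carry;
  }
}

/* Logical left shift; iterates from the top limb so res may alias data. */
static void demo_shift_left_256(uint64_t res[4], const uint64_t data[4], unsigned count) {
  assert(count >= 1 && count <= 63);
  for (unsigned i = 4; i-- > 0;) {
    const uint64_t carry = (i > 0) ? data[i - 1] >> (64 - count) : 0;
    res[i]               = (data[i] << count) | carry;
  }
}

/* Returns 1 if both shifts move bits across limb boundaries as expected. */
int demo_shift_check(void) {
  const uint64_t x[4] = {0, 1, 0, UINT64_C(1) << 63};
  uint64_t r[4], l[4];

  demo_shift_right_256(r, x, 1);
  demo_shift_left_256(l, x, 1);

  /* right: bit 64 moves to bit 63, bit 255 moves to bit 254 */
  const int right_ok =
      r[0] == UINT64_C(1) << 63 && r[1] == 0 && r[2] == 0 && r[3] == UINT64_C(1) << 62;
  /* left: bit 64 moves to bit 65, bit 255 is shifted out */
  const int left_ok = l[0] == 0 && l[1] == 2 && l[2] == 0 && l[3] == 0;
  return right_ok && left_ok;
}
```

In the SSE2 version, `data[0]` holds limbs 0 and 1 and `data[1]` holds limbs 2 and 3, so `total_carry` corresponds to the carry between limb 1 and limb 2 in this model.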

@ -125,7 +125,7 @@ OQS_SIG *OQS_SIG_picnic_L1_FS_new() {
return NULL;
}
sig->method_name = OQS_SIG_alg_picnic_L1_FS;
sig->alg_version = "https://github.com/IAIK/Picnic/tree/v3.0.10";
sig->alg_version = "https://github.com/IAIK/Picnic/tree/v3.0.11";
sig->claimed_nist_level = 1;
sig->euf_cma = true;
@ -164,7 +164,7 @@ OQS_SIG *OQS_SIG_picnic_L1_UR_new() {
return NULL;
}
sig->method_name = OQS_SIG_alg_picnic_L1_UR;
sig->alg_version = "https://github.com/IAIK/Picnic/tree/v3.0.10";
sig->alg_version = "https://github.com/IAIK/Picnic/tree/v3.0.11";
sig->claimed_nist_level = 1;
sig->euf_cma = true;
@ -203,7 +203,7 @@ OQS_SIG *OQS_SIG_picnic_L1_full_new() {
return NULL;
}
sig->method_name = OQS_SIG_alg_picnic_L1_full;
sig->alg_version = "https://github.com/IAIK/Picnic/tree/v3.0.10";
sig->alg_version = "https://github.com/IAIK/Picnic/tree/v3.0.11";
sig->claimed_nist_level = 1;
sig->euf_cma = true;
@ -242,7 +242,7 @@ OQS_SIG *OQS_SIG_picnic_L3_FS_new() {
return NULL;
}
sig->method_name = OQS_SIG_alg_picnic_L3_FS;
sig->alg_version = "https://github.com/IAIK/Picnic/tree/v3.0.10";
sig->alg_version = "https://github.com/IAIK/Picnic/tree/v3.0.11";
sig->claimed_nist_level = 3;
sig->euf_cma = true;
@ -281,7 +281,7 @@ OQS_SIG *OQS_SIG_picnic_L3_UR_new() {
return NULL;
}
sig->method_name = OQS_SIG_alg_picnic_L3_UR;
sig->alg_version = "https://github.com/IAIK/Picnic/tree/v3.0.10";
sig->alg_version = "https://github.com/IAIK/Picnic/tree/v3.0.11";
sig->claimed_nist_level = 3;
sig->euf_cma = true;
@ -320,7 +320,7 @@ OQS_SIG *OQS_SIG_picnic_L3_full_new() {
return NULL;
}
sig->method_name = OQS_SIG_alg_picnic_L3_full;
sig->alg_version = "https://github.com/IAIK/Picnic/tree/v3.0.10";
sig->alg_version = "https://github.com/IAIK/Picnic/tree/v3.0.11";
sig->claimed_nist_level = 3;
sig->euf_cma = true;
@ -359,7 +359,7 @@ OQS_SIG *OQS_SIG_picnic_L5_FS_new() {
return NULL;
}
sig->method_name = OQS_SIG_alg_picnic_L5_FS;
sig->alg_version = "https://github.com/IAIK/Picnic/tree/v3.0.10";
sig->alg_version = "https://github.com/IAIK/Picnic/tree/v3.0.11";
sig->claimed_nist_level = 5;
sig->euf_cma = true;
@ -399,7 +399,7 @@ OQS_SIG *OQS_SIG_picnic_L5_UR_new() {
}
sig->method_name = OQS_SIG_alg_picnic_L5_UR;
sig->alg_version = "https://github.com/IAIK/Picnic/tree/v3.0.10";
sig->alg_version = "https://github.com/IAIK/Picnic/tree/v3.0.11";
sig->claimed_nist_level = 5;
sig->euf_cma = true;
@ -438,7 +438,7 @@ OQS_SIG *OQS_SIG_picnic_L5_full_new() {
return NULL;
}
sig->method_name = OQS_SIG_alg_picnic_L5_full;
sig->alg_version = "https://github.com/IAIK/Picnic/tree/v3.0.10";
sig->alg_version = "https://github.com/IAIK/Picnic/tree/v3.0.11";
sig->claimed_nist_level = 5;
sig->euf_cma = true;
@ -475,7 +475,7 @@ OQS_SIG *OQS_SIG_picnic3_L1_new() {
return NULL;
}
sig->method_name = OQS_SIG_alg_picnic3_L1;
sig->alg_version = "https://github.com/IAIK/Picnic/tree/v3.0.10";
sig->alg_version = "https://github.com/IAIK/Picnic/tree/v3.0.11";
sig->claimed_nist_level = 1;
sig->euf_cma = true;
@ -513,7 +513,7 @@ OQS_SIG *OQS_SIG_picnic3_L3_new() {
return NULL;
}
sig->method_name = OQS_SIG_alg_picnic3_L3;
sig->alg_version = "https://github.com/IAIK/Picnic/tree/v3.0.10";
sig->alg_version = "https://github.com/IAIK/Picnic/tree/v3.0.11";
sig->claimed_nist_level = 3;
sig->euf_cma = true;
@ -551,7 +551,7 @@ OQS_SIG *OQS_SIG_picnic3_L5_new() {
return NULL;
}
sig->method_name = OQS_SIG_alg_picnic3_L5;
sig->alg_version = "https://github.com/IAIK/Picnic/tree/v3.0.10";
sig->alg_version = "https://github.com/IAIK/Picnic/tree/v3.0.11";
sig->claimed_nist_level = 5;
sig->euf_cma = true;