From ce91b64f00e8fb871c440ce66342f5361126ccc3 Mon Sep 17 00:00:00 2001 From: Nick Terrell Date: Mon, 26 Jun 2017 18:33:04 -0700 Subject: [PATCH 01/14] [linux-kernel] Update patches for v2 * Reduce stack usage of many zstd functions, none use over 388 B anymore. * Remove an incorrect `const` in `xxhash`. --- contrib/linux-kernel/0000-cover-letter.patch | 96 ++++ .../0001-lib-Add-xxhash-module.patch | 20 +- .../0002-lib-Add-zstd-modules.patch | 537 ++++++++++-------- .../0003-btrfs-Add-zstd-support.patch | 2 +- .../0004-squashfs-Add-zstd-support.patch | 2 +- contrib/linux-kernel/lib/xxhash.c | 2 +- contrib/linux-kernel/lib/zstd/compress.c | 47 +- contrib/linux-kernel/lib/zstd/decompress.c | 52 +- .../linux-kernel/lib/zstd/entropy_common.c | 5 +- contrib/linux-kernel/lib/zstd/fse.h | 13 +- contrib/linux-kernel/lib/zstd/fse_compress.c | 100 +--- .../linux-kernel/lib/zstd/fse_decompress.c | 33 +- contrib/linux-kernel/lib/zstd/huf.h | 49 +- contrib/linux-kernel/lib/zstd/huf_compress.c | 72 ++- .../linux-kernel/lib/zstd/huf_decompress.c | 92 ++- contrib/linux-kernel/lib/zstd/zstd_common.c | 2 +- contrib/linux-kernel/test/Makefile | 12 +- .../linux-kernel/test/include/linux/math64.h | 11 + 18 files changed, 691 insertions(+), 456 deletions(-) create mode 100644 contrib/linux-kernel/0000-cover-letter.patch create mode 100644 contrib/linux-kernel/test/include/linux/math64.h diff --git a/contrib/linux-kernel/0000-cover-letter.patch b/contrib/linux-kernel/0000-cover-letter.patch new file mode 100644 index 000000000..763b5a9cb --- /dev/null +++ b/contrib/linux-kernel/0000-cover-letter.patch @@ -0,0 +1,96 @@ +From 8bc9a0ae5c86a6d02d9a5274b9965ddac0e8d330 Mon Sep 17 00:00:00 2001 +From: Nick Terrell +Date: Wed, 28 Jun 2017 22:00:00 -0700 +Subject: [PATCH v2 0/4] Add xxhash and zstd modules + +Hi all, + +This patch set adds xxhash, zstd compression, and zstd decompression +modules. It also adds zstd support to BtrFS and SquashFS. + +Each patch has relevant summaries, benchmarks, and tests. 
+ +Best, +Nick Terrell + +Changelog: + +v1 -> v2: +- Make pointer in lib/xxhash.c:394 non-const (1/4) +- Use div_u64() for division of u64s (2/4) +- Reduce stack usage of ZSTD_compressSequences(), ZSTD_buildSeqTable(), + ZSTD_decompressSequencesLong(), FSE_buildDTable(), FSE_decompress_wksp(), + HUF_writeCTable(), HUF_readStats(), HUF_readCTable(), + HUF_compressWeights(), HUF_readDTableX2(), and HUF_readDTableX4() (2/4) +- No zstd function uses more than 400 B of stack space (2/4) + +Nick Terrell (4): + lib: Add xxhash module + lib: Add zstd modules + btrfs: Add zstd support + squashfs: Add zstd support + + fs/btrfs/Kconfig | 2 + + fs/btrfs/Makefile | 2 +- + fs/btrfs/compression.c | 1 + + fs/btrfs/compression.h | 6 +- + fs/btrfs/ctree.h | 1 + + fs/btrfs/disk-io.c | 2 + + fs/btrfs/ioctl.c | 6 +- + fs/btrfs/props.c | 6 + + fs/btrfs/super.c | 12 +- + fs/btrfs/sysfs.c | 2 + + fs/btrfs/zstd.c | 433 ++++++ + fs/squashfs/Kconfig | 14 + + fs/squashfs/Makefile | 1 + + fs/squashfs/decompressor.c | 7 + + fs/squashfs/decompressor.h | 4 + + fs/squashfs/squashfs_fs.h | 1 + + fs/squashfs/zstd_wrapper.c | 150 ++ + include/linux/xxhash.h | 236 +++ + include/linux/zstd.h | 1157 +++++++++++++++ + include/uapi/linux/btrfs.h | 8 +- + lib/Kconfig | 11 + + lib/Makefile | 3 + + lib/xxhash.c | 500 +++++++ + lib/zstd/Makefile | 18 + + lib/zstd/bitstream.h | 374 +++++ + lib/zstd/compress.c | 3479 ++++++++++++++++++++++++++++++++++++++++++++ + lib/zstd/decompress.c | 2526 ++++++++++++++++++++++++++++++++ + lib/zstd/entropy_common.c | 243 ++++ + lib/zstd/error_private.h | 53 + + lib/zstd/fse.h | 575 ++++++++ + lib/zstd/fse_compress.c | 795 ++++++++++ + lib/zstd/fse_decompress.c | 332 +++++ + lib/zstd/huf.h | 212 +++ + lib/zstd/huf_compress.c | 771 ++++++++++ + lib/zstd/huf_decompress.c | 960 ++++++++++++ + lib/zstd/mem.h | 151 ++ + lib/zstd/zstd_common.c | 75 + + lib/zstd/zstd_internal.h | 269 ++++ + lib/zstd/zstd_opt.h | 1014 +++++++++++++ + 39 files changed, 14400 insertions(+), 12 deletions(-) + create mode 100644 fs/btrfs/zstd.c + create mode 100644 fs/squashfs/zstd_wrapper.c + create mode 100644 include/linux/xxhash.h + create mode 100644 include/linux/zstd.h + create mode 100644 lib/xxhash.c + create mode 100644 lib/zstd/Makefile + create mode 100644 lib/zstd/bitstream.h + create mode 100644 lib/zstd/compress.c + create mode 100644 lib/zstd/decompress.c + create mode 100644 lib/zstd/entropy_common.c + create mode 100644 lib/zstd/error_private.h + create mode 100644 lib/zstd/fse.h + create mode 100644 lib/zstd/fse_compress.c + create mode 100644 lib/zstd/fse_decompress.c + create mode 100644 lib/zstd/huf.h + create mode 100644 lib/zstd/huf_compress.c + create mode 100644 lib/zstd/huf_decompress.c + create mode 100644 lib/zstd/mem.h + create mode 100644 lib/zstd/zstd_common.c + create mode 100644 lib/zstd/zstd_internal.h + create mode 100644 lib/zstd/zstd_opt.h + +-- +2.9.3 diff --git a/contrib/linux-kernel/0001-lib-Add-xxhash-module.patch b/contrib/linux-kernel/0001-lib-Add-xxhash-module.patch index 9a8f50a25..84a2c53c6 100644 --- a/contrib/linux-kernel/0001-lib-Add-xxhash-module.patch +++ b/contrib/linux-kernel/0001-lib-Add-xxhash-module.patch @@ -1,7 +1,7 @@ -From e75beb7c2e05550b2846e31ad8a0082c188504da Mon Sep 17 00:00:00 2001 +From 5ac909c415ab4a18fd90794793c96e450795e8c6 Mon Sep 17 00:00:00 2001 From: Nick Terrell -Date: Wed, 21 Jun 2017 17:27:42 -0700 -Subject: [PATCH 1/4] lib: Add xxhash module +Date: Wed, 21 Jun 2017 17:37:36 -0700 +Subject: [PATCH v2 1/4] lib: Add xxhash module Adds xxhash kernel 
module with xxh32 and xxh64 hashes. xxhash is an extremely fast non-cryptographic hash algorithm for checksumming. @@ -73,6 +73,9 @@ XXHash source repository: https://github.com/cyan4973/xxhash Signed-off-by: Nick Terrell --- +v1 -> v2: +- Make pointer in lib/xxhash.c:394 non-const + include/linux/xxhash.h | 236 +++++++++++++++++++++++ lib/Kconfig | 3 + lib/Makefile | 1 + @@ -330,7 +333,7 @@ index 0c8b78a..b6009d7 100644 @@ -184,6 +184,9 @@ config CRC8 when they need to do cyclic redundancy check according CRC8 algorithm. Module will be called crc8. - + +config XXHASH + tristate + @@ -347,11 +350,11 @@ index 0166fbc..1338226 100644 obj-$(CONFIG_CRC8) += crc8.o +obj-$(CONFIG_XXHASH) += xxhash.o obj-$(CONFIG_GENERIC_ALLOCATOR) += genalloc.o - + obj-$(CONFIG_842_COMPRESS) += 842/ diff --git a/lib/xxhash.c b/lib/xxhash.c new file mode 100644 -index 0000000..dc94904 +index 0000000..aa61e2a --- /dev/null +++ b/lib/xxhash.c @@ -0,0 +1,500 @@ @@ -748,7 +751,7 @@ index 0000000..dc94904 + } + + if (state->memsize) { /* tmp buffer is full */ -+ const uint64_t *p64 = state->mem64; ++ uint64_t *p64 = state->mem64; + + memcpy(((uint8_t *)p64) + state->memsize, input, + 32 - state->memsize); @@ -855,6 +858,5 @@ index 0000000..dc94904 + +MODULE_LICENSE("Dual BSD/GPL"); +MODULE_DESCRIPTION("xxHash"); --- +-- 2.9.3 - diff --git a/contrib/linux-kernel/0002-lib-Add-zstd-modules.patch b/contrib/linux-kernel/0002-lib-Add-zstd-modules.patch index f94afe362..971093996 100644 --- a/contrib/linux-kernel/0002-lib-Add-zstd-modules.patch +++ b/contrib/linux-kernel/0002-lib-Add-zstd-modules.patch @@ -1,7 +1,7 @@ -From b52ae824ae6c0f7c7786380b34da9daaa54bfc26 Mon Sep 17 00:00:00 2001 +From d2626127c6d6e60e940dd9a3ed58323bdcdc4930 Mon Sep 17 00:00:00 2001 From: Nick Terrell -Date: Wed, 21 Jun 2017 17:31:24 -0700 -Subject: [PATCH 2/4] lib: Add zstd modules +Date: Tue, 16 May 2017 14:55:36 -0700 +Subject: [PATCH v2 2/4] lib: Add zstd modules Add zstd compression and decompression kernel modules. zstd offers a wide varity of compression speed and quality trade-offs. 
@@ -102,26 +102,34 @@ zstd source repository: https://github.com/facebook/zstd Signed-off-by: Nick Terrell --- +v1 -> v2: +- Use div_u64() for division of u64s +- Reduce stack usage of ZSTD_compressSequences(), ZSTD_buildSeqTable(), + ZSTD_decompressSequencesLong(), FSE_buildDTable(), FSE_decompress_wksp(), + HUF_writeCTable(), HUF_readStats(), HUF_readCTable(), + HUF_compressWeights(), HUF_readDTableX2(), and HUF_readDTableX4() +- No function uses more than 400 B of stack space + include/linux/zstd.h | 1157 +++++++++++++++ lib/Kconfig | 8 + lib/Makefile | 2 + lib/zstd/Makefile | 18 + lib/zstd/bitstream.h | 374 +++++ - lib/zstd/compress.c | 3468 +++++++++++++++++++++++++++++++++++++++++++++ - lib/zstd/decompress.c | 2514 ++++++++++++++++++++++++++++++++ - lib/zstd/entropy_common.c | 244 ++++ + lib/zstd/compress.c | 3479 +++++++++++++++++++++++++++++++++++++++++++++ + lib/zstd/decompress.c | 2526 ++++++++++++++++++++++++++++++++ + lib/zstd/entropy_common.c | 243 ++++ lib/zstd/error_private.h | 53 + - lib/zstd/fse.h | 584 ++++++++ - lib/zstd/fse_compress.c | 857 +++++++++++ - lib/zstd/fse_decompress.c | 313 ++++ - lib/zstd/huf.h | 203 +++ - lib/zstd/huf_compress.c | 731 ++++++++++ - lib/zstd/huf_decompress.c | 920 ++++++++++++ + lib/zstd/fse.h | 575 ++++++++ + lib/zstd/fse_compress.c | 795 +++++++++++ + lib/zstd/fse_decompress.c | 332 +++++ + lib/zstd/huf.h | 212 +++ + lib/zstd/huf_compress.c | 771 ++++++++++ + lib/zstd/huf_decompress.c | 960 +++++++++++++ lib/zstd/mem.h | 151 ++ lib/zstd/zstd_common.c | 75 + lib/zstd/zstd_internal.h | 269 ++++ lib/zstd/zstd_opt.h | 1014 +++++++++++++ - 19 files changed, 12955 insertions(+) + 19 files changed, 13014 insertions(+) create mode 100644 include/linux/zstd.h create mode 100644 lib/zstd/Makefile create mode 100644 lib/zstd/bitstream.h @@ -1741,10 +1749,10 @@ index 0000000..a826b99 +#endif /* BITSTREAM_H_MODULE */ diff --git a/lib/zstd/compress.c b/lib/zstd/compress.c new file mode 100644 -index 0000000..1aff542 +index 0000000..d60ab7d --- /dev/null +++ b/lib/zstd/compress.c -@@ -0,0 +1,3468 @@ +@@ -0,0 +1,3479 @@ +/** + * Copyright (c) 2016-present, Yann Collet, Facebook, Inc. + * All rights reserved. 
@@ -1831,7 +1839,7 @@ index 0000000..1aff542 + FSE_CTable offcodeCTable[FSE_CTABLE_SIZE_U32(OffFSELog, MaxOff)]; + FSE_CTable matchlengthCTable[FSE_CTABLE_SIZE_U32(MLFSELog, MaxML)]; + FSE_CTable litlengthCTable[FSE_CTABLE_SIZE_U32(LLFSELog, MaxLL)]; -+ unsigned tmpCounters[HUF_WORKSPACE_SIZE_U32]; ++ unsigned tmpCounters[HUF_COMPRESS_WORKSPACE_SIZE_U32]; +}; + +size_t ZSTD_CCtxWorkspaceBound(ZSTD_compressionParameters cParams) @@ -2334,8 +2342,6 @@ index 0000000..1aff542 +{ + const int longOffsets = zc->params.cParams.windowLog > STREAM_ACCUMULATOR_MIN; + const seqStore_t *seqStorePtr = &(zc->seqStore); -+ U32 count[MaxSeq + 1]; -+ S16 norm[MaxSeq + 1]; + FSE_CTable *CTable_LitLength = zc->litlengthCTable; + FSE_CTable *CTable_OffsetBits = zc->offcodeCTable; + FSE_CTable *CTable_MatchLength = zc->matchlengthCTable; @@ -2349,7 +2355,21 @@ index 0000000..1aff542 + BYTE *op = ostart; + size_t const nbSeq = seqStorePtr->sequences - seqStorePtr->sequencesStart; + BYTE *seqHead; -+ BYTE scratchBuffer[1 << MAX(MLFSELog, LLFSELog)]; ++ ++ U32 *count; ++ S16 *norm; ++ U32 *workspace; ++ size_t workspaceSize = sizeof(zc->tmpCounters); ++ { ++ size_t spaceUsed32 = 0; ++ count = (U32 *)zc->tmpCounters + spaceUsed32; ++ spaceUsed32 += MaxSeq + 1; ++ norm = (S16 *)((U32 *)zc->tmpCounters + spaceUsed32); ++ spaceUsed32 += ALIGN(sizeof(S16) * (MaxSeq + 1), sizeof(U32)) >> 2; ++ ++ workspace = (U32 *)zc->tmpCounters + spaceUsed32; ++ workspaceSize -= (spaceUsed32 << 2); ++ } + + /* Compress literals */ + { @@ -2385,7 +2405,7 @@ index 0000000..1aff542 + /* CTable for Literal Lengths */ + { + U32 max = MaxLL; -+ size_t const mostFrequent = FSE_countFast_wksp(count, &max, llCodeTable, nbSeq, zc->tmpCounters); ++ size_t const mostFrequent = FSE_countFast_wksp(count, &max, llCodeTable, nbSeq, workspace); + if ((mostFrequent == nbSeq) && (nbSeq > 2)) { + *op++ = llCodeTable[0]; + FSE_buildCTable_rle(CTable_LitLength, (BYTE)max); @@ -2393,7 +2413,7 @@ index 0000000..1aff542 + } else if ((zc->flagStaticTables) && (nbSeq < MAX_SEQ_FOR_STATIC_FSE)) { + LLtype = set_repeat; + } else if ((nbSeq < MIN_SEQ_FOR_DYNAMIC_FSE) || (mostFrequent < (nbSeq >> (LL_defaultNormLog - 1)))) { -+ FSE_buildCTable_wksp(CTable_LitLength, LL_defaultNorm, MaxLL, LL_defaultNormLog, scratchBuffer, sizeof(scratchBuffer)); ++ FSE_buildCTable_wksp(CTable_LitLength, LL_defaultNorm, MaxLL, LL_defaultNormLog, workspace, workspaceSize); + LLtype = set_basic; + } else { + size_t nbSeq_1 = nbSeq; @@ -2409,7 +2429,7 @@ index 0000000..1aff542 + return NCountSize; + op += NCountSize; + } -+ FSE_buildCTable_wksp(CTable_LitLength, norm, max, tableLog, scratchBuffer, sizeof(scratchBuffer)); ++ FSE_buildCTable_wksp(CTable_LitLength, norm, max, tableLog, workspace, workspaceSize); + LLtype = set_compressed; + } + } @@ -2417,7 +2437,7 @@ index 0000000..1aff542 + /* CTable for Offsets */ + { + U32 max = MaxOff; -+ size_t const mostFrequent = FSE_countFast_wksp(count, &max, ofCodeTable, nbSeq, zc->tmpCounters); ++ size_t const mostFrequent = FSE_countFast_wksp(count, &max, ofCodeTable, nbSeq, workspace); + if ((mostFrequent == nbSeq) && (nbSeq > 2)) { + *op++ = ofCodeTable[0]; + FSE_buildCTable_rle(CTable_OffsetBits, (BYTE)max); @@ -2425,7 +2445,7 @@ index 0000000..1aff542 + } else if ((zc->flagStaticTables) && (nbSeq < MAX_SEQ_FOR_STATIC_FSE)) { + Offtype = set_repeat; + } else if ((nbSeq < MIN_SEQ_FOR_DYNAMIC_FSE) || (mostFrequent < (nbSeq >> (OF_defaultNormLog - 1)))) { -+ FSE_buildCTable_wksp(CTable_OffsetBits, OF_defaultNorm, MaxOff, OF_defaultNormLog, 
scratchBuffer, sizeof(scratchBuffer)); ++ FSE_buildCTable_wksp(CTable_OffsetBits, OF_defaultNorm, MaxOff, OF_defaultNormLog, workspace, workspaceSize); + Offtype = set_basic; + } else { + size_t nbSeq_1 = nbSeq; @@ -2441,7 +2461,7 @@ index 0000000..1aff542 + return NCountSize; + op += NCountSize; + } -+ FSE_buildCTable_wksp(CTable_OffsetBits, norm, max, tableLog, scratchBuffer, sizeof(scratchBuffer)); ++ FSE_buildCTable_wksp(CTable_OffsetBits, norm, max, tableLog, workspace, workspaceSize); + Offtype = set_compressed; + } + } @@ -2449,7 +2469,7 @@ index 0000000..1aff542 + /* CTable for MatchLengths */ + { + U32 max = MaxML; -+ size_t const mostFrequent = FSE_countFast_wksp(count, &max, mlCodeTable, nbSeq, zc->tmpCounters); ++ size_t const mostFrequent = FSE_countFast_wksp(count, &max, mlCodeTable, nbSeq, workspace); + if ((mostFrequent == nbSeq) && (nbSeq > 2)) { + *op++ = *mlCodeTable; + FSE_buildCTable_rle(CTable_MatchLength, (BYTE)max); @@ -2457,7 +2477,7 @@ index 0000000..1aff542 + } else if ((zc->flagStaticTables) && (nbSeq < MAX_SEQ_FOR_STATIC_FSE)) { + MLtype = set_repeat; + } else if ((nbSeq < MIN_SEQ_FOR_DYNAMIC_FSE) || (mostFrequent < (nbSeq >> (ML_defaultNormLog - 1)))) { -+ FSE_buildCTable_wksp(CTable_MatchLength, ML_defaultNorm, MaxML, ML_defaultNormLog, scratchBuffer, sizeof(scratchBuffer)); ++ FSE_buildCTable_wksp(CTable_MatchLength, ML_defaultNorm, MaxML, ML_defaultNormLog, workspace, workspaceSize); + MLtype = set_basic; + } else { + size_t nbSeq_1 = nbSeq; @@ -2473,7 +2493,7 @@ index 0000000..1aff542 + return NCountSize; + op += NCountSize; + } -+ FSE_buildCTable_wksp(CTable_MatchLength, norm, max, tableLog, scratchBuffer, sizeof(scratchBuffer)); ++ FSE_buildCTable_wksp(CTable_MatchLength, norm, max, tableLog, workspace, workspaceSize); + MLtype = set_compressed; + } + } @@ -4359,14 +4379,13 @@ index 0000000..1aff542 + const BYTE *const dictEnd = dictPtr + dictSize; + short offcodeNCount[MaxOff + 1]; + unsigned offcodeMaxValue = MaxOff; -+ BYTE scratchBuffer[1 << MAX(MLFSELog, LLFSELog)]; + + dictPtr += 4; /* skip magic number */ + cctx->dictID = cctx->params.fParams.noDictIDFlag ? 
0 : ZSTD_readLE32(dictPtr); + dictPtr += 4; + + { -+ size_t const hufHeaderSize = HUF_readCTable(cctx->hufTable, 255, dictPtr, dictEnd - dictPtr); ++ size_t const hufHeaderSize = HUF_readCTable_wksp(cctx->hufTable, 255, dictPtr, dictEnd - dictPtr, cctx->tmpCounters, sizeof(cctx->tmpCounters)); + if (HUF_isError(hufHeaderSize)) + return ERROR(dictionary_corrupted); + dictPtr += hufHeaderSize; @@ -4380,7 +4399,7 @@ index 0000000..1aff542 + if (offcodeLog > OffFSELog) + return ERROR(dictionary_corrupted); + /* Defer checking offcodeMaxValue because we need to know the size of the dictionary content */ -+ CHECK_E(FSE_buildCTable_wksp(cctx->offcodeCTable, offcodeNCount, offcodeMaxValue, offcodeLog, scratchBuffer, sizeof(scratchBuffer)), ++ CHECK_E(FSE_buildCTable_wksp(cctx->offcodeCTable, offcodeNCount, offcodeMaxValue, offcodeLog, cctx->tmpCounters, sizeof(cctx->tmpCounters)), + dictionary_corrupted); + dictPtr += offcodeHeaderSize; + } @@ -4396,7 +4415,7 @@ index 0000000..1aff542 + /* Every match length code must have non-zero probability */ + CHECK_F(ZSTD_checkDictNCount(matchlengthNCount, matchlengthMaxValue, MaxML)); + CHECK_E( -+ FSE_buildCTable_wksp(cctx->matchlengthCTable, matchlengthNCount, matchlengthMaxValue, matchlengthLog, scratchBuffer, sizeof(scratchBuffer)), ++ FSE_buildCTable_wksp(cctx->matchlengthCTable, matchlengthNCount, matchlengthMaxValue, matchlengthLog, cctx->tmpCounters, sizeof(cctx->tmpCounters)), + dictionary_corrupted); + dictPtr += matchlengthHeaderSize; + } @@ -4411,7 +4430,7 @@ index 0000000..1aff542 + return ERROR(dictionary_corrupted); + /* Every literal length code must have non-zero probability */ + CHECK_F(ZSTD_checkDictNCount(litlengthNCount, litlengthMaxValue, MaxLL)); -+ CHECK_E(FSE_buildCTable_wksp(cctx->litlengthCTable, litlengthNCount, litlengthMaxValue, litlengthLog, scratchBuffer, sizeof(scratchBuffer)), ++ CHECK_E(FSE_buildCTable_wksp(cctx->litlengthCTable, litlengthNCount, litlengthMaxValue, litlengthLog, cctx->tmpCounters, sizeof(cctx->tmpCounters)), + dictionary_corrupted); + dictPtr += litlengthHeaderSize; + } @@ -5215,10 +5234,10 @@ index 0000000..1aff542 +MODULE_DESCRIPTION("Zstd Compressor"); diff --git a/lib/zstd/decompress.c b/lib/zstd/decompress.c new file mode 100644 -index 0000000..ec673d7 +index 0000000..62449ae --- /dev/null +++ b/lib/zstd/decompress.c -@@ -0,0 +1,2514 @@ +@@ -0,0 +1,2526 @@ +/** + * Copyright (c) 2016-present, Yann Collet, Facebook, Inc. + * All rights reserved. @@ -5291,6 +5310,7 @@ index 0000000..ec673d7 + FSE_DTable OFTable[FSE_DTABLE_SIZE_U32(OffFSELog)]; + FSE_DTable MLTable[FSE_DTABLE_SIZE_U32(MLFSELog)]; + HUF_DTable hufTable[HUF_DTABLE_SIZE(HufLog)]; /* can accommodate HUF_decompress4X */ ++ U64 workspace[HUF_DECOMPRESS_WORKSPACE_SIZE_U32 / 2]; + U32 rep[ZSTD_REP_NUM]; +} ZSTD_entropyTables_t; + @@ -5704,8 +5724,10 @@ index 0000000..ec673d7 + ? (singleStream ? HUF_decompress1X_usingDTable(dctx->litBuffer, litSize, istart + lhSize, litCSize, dctx->HUFptr) + : HUF_decompress4X_usingDTable(dctx->litBuffer, litSize, istart + lhSize, litCSize, dctx->HUFptr)) + : (singleStream -+ ? HUF_decompress1X2_DCtx(dctx->entropy.hufTable, dctx->litBuffer, litSize, istart + lhSize, litCSize) -+ : HUF_decompress4X_hufOnly(dctx->entropy.hufTable, dctx->litBuffer, litSize, istart + lhSize, litCSize)))) ++ ? 
HUF_decompress1X2_DCtx_wksp(dctx->entropy.hufTable, dctx->litBuffer, litSize, istart + lhSize, litCSize, ++ dctx->entropy.workspace, sizeof(dctx->entropy.workspace)) ++ : HUF_decompress4X_hufOnly_wksp(dctx->entropy.hufTable, dctx->litBuffer, litSize, istart + lhSize, litCSize, ++ dctx->entropy.workspace, sizeof(dctx->entropy.workspace))))) + return ERROR(corruption_detected); + + dctx->litPtr = dctx->litBuffer; @@ -5968,7 +5990,7 @@ index 0000000..ec673d7 + or an error code if it fails, testable with ZSTD_isError() +*/ +static size_t ZSTD_buildSeqTable(FSE_DTable *DTableSpace, const FSE_DTable **DTablePtr, symbolEncodingType_e type, U32 max, U32 maxLog, const void *src, -+ size_t srcSize, const FSE_decode_t4 *defaultTable, U32 flagRepeatTable) ++ size_t srcSize, const FSE_decode_t4 *defaultTable, U32 flagRepeatTable, void *workspace, size_t workspaceSize) +{ + const void *const tmpPtr = defaultTable; /* bypass strict aliasing */ + switch (type) { @@ -5988,15 +6010,23 @@ index 0000000..ec673d7 + default: /* impossible */ + case set_compressed: { + U32 tableLog; -+ S16 norm[MaxSeq + 1]; -+ size_t const headerSize = FSE_readNCount(norm, &max, &tableLog, src, srcSize); -+ if (FSE_isError(headerSize)) -+ return ERROR(corruption_detected); -+ if (tableLog > maxLog) -+ return ERROR(corruption_detected); -+ FSE_buildDTable(DTableSpace, norm, max, tableLog); -+ *DTablePtr = DTableSpace; -+ return headerSize; ++ S16 *norm = (S16 *)workspace; ++ size_t const spaceUsed32 = ALIGN(sizeof(S16) * (MaxSeq + 1), sizeof(U32)) >> 2; ++ ++ if ((spaceUsed32 << 2) > workspaceSize) ++ return ERROR(GENERIC); ++ workspace = (U32 *)workspace + spaceUsed32; ++ workspaceSize -= (spaceUsed32 << 2); ++ { ++ size_t const headerSize = FSE_readNCount(norm, &max, &tableLog, src, srcSize); ++ if (FSE_isError(headerSize)) ++ return ERROR(corruption_detected); ++ if (tableLog > maxLog) ++ return ERROR(corruption_detected); ++ FSE_buildDTable_wksp(DTableSpace, norm, max, tableLog, workspace, workspaceSize); ++ *DTablePtr = DTableSpace; ++ return headerSize; ++ } + } + } +} @@ -6044,21 +6074,21 @@ index 0000000..ec673d7 + /* Build DTables */ + { + size_t const llhSize = ZSTD_buildSeqTable(dctx->entropy.LLTable, &dctx->LLTptr, LLtype, MaxLL, LLFSELog, ip, iend - ip, -+ LL_defaultDTable, dctx->fseEntropy); ++ LL_defaultDTable, dctx->fseEntropy, dctx->entropy.workspace, sizeof(dctx->entropy.workspace)); + if (ZSTD_isError(llhSize)) + return ERROR(corruption_detected); + ip += llhSize; + } + { + size_t const ofhSize = ZSTD_buildSeqTable(dctx->entropy.OFTable, &dctx->OFTptr, OFtype, MaxOff, OffFSELog, ip, iend - ip, -+ OF_defaultDTable, dctx->fseEntropy); ++ OF_defaultDTable, dctx->fseEntropy, dctx->entropy.workspace, sizeof(dctx->entropy.workspace)); + if (ZSTD_isError(ofhSize)) + return ERROR(corruption_detected); + ip += ofhSize; + } + { + size_t const mlhSize = ZSTD_buildSeqTable(dctx->entropy.MLTable, &dctx->MLTptr, MLtype, MaxML, MLFSELog, ip, iend - ip, -+ ML_defaultDTable, dctx->fseEntropy); ++ ML_defaultDTable, dctx->fseEntropy, dctx->entropy.workspace, sizeof(dctx->entropy.workspace)); + if (ZSTD_isError(mlhSize)) + return ERROR(corruption_detected); + ip += mlhSize; @@ -6581,10 +6611,11 @@ index 0000000..ec673d7 +#define STORED_SEQS 4 +#define STOSEQ_MASK (STORED_SEQS - 1) +#define ADVANCED_SEQS 4 -+ seq_t sequences[STORED_SEQS]; ++ seq_t *sequences = (seq_t *)dctx->entropy.workspace; + int const seqAdvance = MIN(nbSeq, ADVANCED_SEQS); + seqState_t seqState; + int seqNb; ++ 
ZSTD_STATIC_ASSERT(sizeof(dctx->entropy.workspace) >= sizeof(seq_t) * STORED_SEQS); + dctx->fseEntropy = 1; + { + U32 i; @@ -7087,7 +7118,7 @@ index 0000000..ec673d7 + dictPtr += 8; /* skip header = magic + dictID */ + + { -+ size_t const hSize = HUF_readDTableX4(entropy->hufTable, dictPtr, dictEnd - dictPtr); ++ size_t const hSize = HUF_readDTableX4_wksp(entropy->hufTable, dictPtr, dictEnd - dictPtr, entropy->workspace, sizeof(entropy->workspace)); + if (HUF_isError(hSize)) + return ERROR(dictionary_corrupted); + dictPtr += hSize; @@ -7101,7 +7132,7 @@ index 0000000..ec673d7 + return ERROR(dictionary_corrupted); + if (offcodeLog > OffFSELog) + return ERROR(dictionary_corrupted); -+ CHECK_E(FSE_buildDTable(entropy->OFTable, offcodeNCount, offcodeMaxValue, offcodeLog), dictionary_corrupted); ++ CHECK_E(FSE_buildDTable_wksp(entropy->OFTable, offcodeNCount, offcodeMaxValue, offcodeLog, entropy->workspace, sizeof(entropy->workspace)), dictionary_corrupted); + dictPtr += offcodeHeaderSize; + } + @@ -7113,7 +7144,7 @@ index 0000000..ec673d7 + return ERROR(dictionary_corrupted); + if (matchlengthLog > MLFSELog) + return ERROR(dictionary_corrupted); -+ CHECK_E(FSE_buildDTable(entropy->MLTable, matchlengthNCount, matchlengthMaxValue, matchlengthLog), dictionary_corrupted); ++ CHECK_E(FSE_buildDTable_wksp(entropy->MLTable, matchlengthNCount, matchlengthMaxValue, matchlengthLog, entropy->workspace, sizeof(entropy->workspace)), dictionary_corrupted); + dictPtr += matchlengthHeaderSize; + } + @@ -7125,7 +7156,7 @@ index 0000000..ec673d7 + return ERROR(dictionary_corrupted); + if (litlengthLog > LLFSELog) + return ERROR(dictionary_corrupted); -+ CHECK_E(FSE_buildDTable(entropy->LLTable, litlengthNCount, litlengthMaxValue, litlengthLog), dictionary_corrupted); ++ CHECK_E(FSE_buildDTable_wksp(entropy->LLTable, litlengthNCount, litlengthMaxValue, litlengthLog, entropy->workspace, sizeof(entropy->workspace)), dictionary_corrupted); + dictPtr += litlengthHeaderSize; + } + @@ -7735,10 +7766,10 @@ index 0000000..ec673d7 +MODULE_DESCRIPTION("Zstd Decompressor"); diff --git a/lib/zstd/entropy_common.c b/lib/zstd/entropy_common.c new file mode 100644 -index 0000000..b354fc2 +index 0000000..2b0a643 --- /dev/null +++ b/lib/zstd/entropy_common.c -@@ -0,0 +1,244 @@ +@@ -0,0 +1,243 @@ +/* + * Common functions of New Generation Entropy library + * Copyright (C) 2016, Yann Collet. @@ -7905,7 +7936,7 @@ index 0000000..b354fc2 + @return : size read from `src` , or an error Code . + Note : Needed by HUF_readCTable() and HUF_readDTableX?() . 
+*/ -+size_t HUF_readStats(BYTE *huffWeight, size_t hwSize, U32 *rankStats, U32 *nbSymbolsPtr, U32 *tableLogPtr, const void *src, size_t srcSize) ++size_t HUF_readStats_wksp(BYTE *huffWeight, size_t hwSize, U32 *rankStats, U32 *nbSymbolsPtr, U32 *tableLogPtr, const void *src, size_t srcSize, void *workspace, size_t workspaceSize) +{ + U32 weightTotal; + const BYTE *ip = (const BYTE *)src; @@ -7933,10 +7964,9 @@ index 0000000..b354fc2 + } + } + } else { /* header compressed with FSE (normal case) */ -+ FSE_DTable fseWorkspace[FSE_DTABLE_SIZE_U32(6)]; /* 6 is max possible tableLog for HUF header (maybe even 5, to be tested) */ + if (iSize + 1 > srcSize) + return ERROR(srcSize_wrong); -+ oSize = FSE_decompress_wksp(huffWeight, hwSize - 1, ip + 1, iSize, fseWorkspace, 6); /* max (hwSize-1) values decoded, as last one is implied */ ++ oSize = FSE_decompress_wksp(huffWeight, hwSize - 1, ip + 1, iSize, 6, workspace, workspaceSize); /* max (hwSize-1) values decoded, as last one is implied */ + if (FSE_isError(oSize)) + return oSize; + } @@ -8044,10 +8074,10 @@ index 0000000..1a60b31 +#endif /* ERROR_H_MODULE */ diff --git a/lib/zstd/fse.h b/lib/zstd/fse.h new file mode 100644 -index 0000000..bc2962a +index 0000000..7460ab0 --- /dev/null +++ b/lib/zstd/fse.h -@@ -0,0 +1,584 @@ +@@ -0,0 +1,575 @@ +/* + * FSE : Finite State Entropy codec + * Public Prototypes declaration @@ -8237,7 +8267,7 @@ index 0000000..bc2962a +/*! FSE_buildDTable(): + Builds 'dt', which must be already allocated, using FSE_createDTable(). + return : 0, or an errorCode, which can be tested using FSE_isError() */ -+FSE_PUBLIC_API size_t FSE_buildDTable(FSE_DTable *dt, const short *normalizedCounter, unsigned maxSymbolValue, unsigned tableLog); ++FSE_PUBLIC_API size_t FSE_buildDTable_wksp(FSE_DTable *dt, const short *normalizedCounter, unsigned maxSymbolValue, unsigned tableLog, void *workspace, size_t workspaceSize); + +/*! FSE_decompress_usingDTable(): + Decompress compressed source `cSrc` of size `cSrcSize` using `dt` @@ -8313,15 +8343,6 @@ index 0000000..bc2962a +unsigned FSE_optimalTableLog_internal(unsigned maxTableLog, size_t srcSize, unsigned maxSymbolValue, unsigned minus); +/**< same as FSE_optimalTableLog(), which used `minus==2` */ + -+/* FSE_compress_wksp() : -+ * Same as FSE_compress2(), but using an externally allocated scratch buffer (`workSpace`). -+ * FSE_WKSP_SIZE_U32() provides the minimum size required for `workSpace` as a table of FSE_CTable. -+ */ -+#define FSE_WKSP_SIZE_U32(maxTableLog, maxSymbolValue) \ -+ (FSE_CTABLE_SIZE_U32(maxTableLog, maxSymbolValue) + ((maxTableLog > 12) ? 
(1 << (maxTableLog - 2)) : 1024)) -+size_t FSE_compress_wksp(void *dst, size_t dstSize, const void *src, size_t srcSize, unsigned maxSymbolValue, unsigned tableLog, void *workSpace, -+ size_t wkspSize); -+ +size_t FSE_buildCTable_raw(FSE_CTable *ct, unsigned nbBits); +/**< build a fake FSE_CTable, designed for a flat distribution, where each symbol uses nbBits */ + @@ -8340,7 +8361,7 @@ index 0000000..bc2962a +size_t FSE_buildDTable_rle(FSE_DTable *dt, unsigned char symbolValue); +/**< build a fake FSE_DTable, designed to always generate the same symbolValue */ + -+size_t FSE_decompress_wksp(void *dst, size_t dstCapacity, const void *cSrc, size_t cSrcSize, FSE_DTable *workSpace, unsigned maxLog); ++size_t FSE_decompress_wksp(void *dst, size_t dstCapacity, const void *cSrc, size_t cSrcSize, unsigned maxLog, void *workspace, size_t workspaceSize); +/**< same as FSE_decompress(), using an externally allocated `workSpace` produced with `FSE_DTABLE_SIZE_U32(maxLog)` */ + +/* ***************************************** @@ -8634,10 +8655,10 @@ index 0000000..bc2962a +#endif /* FSE_H */ diff --git a/lib/zstd/fse_compress.c b/lib/zstd/fse_compress.c new file mode 100644 -index 0000000..e016bb1 +index 0000000..ef3d174 --- /dev/null +++ b/lib/zstd/fse_compress.c -@@ -0,0 +1,857 @@ +@@ -0,0 +1,795 @@ +/* + * FSE : Finite State Entropy encoder + * Copyright (C) 2013-2015, Yann Collet. @@ -8688,6 +8709,8 @@ index 0000000..e016bb1 +#include "bitstream.h" +#include "fse.h" +#include ++#include ++#include +#include /* memcpy, memset */ + +/* ************************************************************** @@ -8727,7 +8750,7 @@ index 0000000..e016bb1 + * wkspSize should be sized to handle worst case situation, which is `1<> 1 : 1); + FSE_symbolCompressionTransform *const symbolTT = (FSE_symbolCompressionTransform *)(FSCT); + U32 const step = FSE_TABLESTEP(tableSize); -+ U32 cumul[FSE_MAX_SYMBOL_VALUE + 2]; -+ -+ FSE_FUNCTION_TYPE *const tableSymbol = (FSE_FUNCTION_TYPE *)workSpace; + U32 highThreshold = tableSize - 1; + -+ /* CTable header */ -+ if (((size_t)1 << tableLog) * sizeof(FSE_FUNCTION_TYPE) > wkspSize) ++ U32 *cumul; ++ FSE_FUNCTION_TYPE *tableSymbol; ++ size_t spaceUsed32 = 0; ++ ++ cumul = (U32 *)workspace + spaceUsed32; ++ spaceUsed32 += FSE_MAX_SYMBOL_VALUE + 2; ++ tableSymbol = (FSE_FUNCTION_TYPE *)((U32 *)workspace + spaceUsed32); ++ spaceUsed32 += ALIGN(sizeof(FSE_FUNCTION_TYPE) * ((size_t)1 << tableLog), sizeof(U32)) >> 2; ++ ++ if ((spaceUsed32 << 2) > workspaceSize) + return ERROR(tableLog_tooLarge); ++ workspace = (U32 *)workspace + spaceUsed32; ++ workspaceSize -= (spaceUsed32 << 2); ++ ++ /* CTable header */ + tableU16[-2] = (U16)tableLog; + tableU16[-1] = (U16)maxSymbolValue; + @@ -9215,7 +9247,7 @@ index 0000000..e016bb1 + { + U64 const vStepLog = 62 - tableLog; + U64 const mid = (1ULL << (vStepLog - 1)) - 1; -+ U64 const rStep = ((((U64)1 << vStepLog) * ToDistribute) + mid) / total; /* scale on remaining */ ++ U64 const rStep = div_u64((((U64)1 << vStepLog) * ToDistribute) + mid, (U32)total); /* scale on remaining */ + U64 tmpTotal = mid; + for (s = 0; s <= maxSymbolValue; s++) { + if (norm[s] == NOT_YET_ASSIGNED) { @@ -9249,7 +9281,7 @@ index 0000000..e016bb1 + { + U32 const rtbTable[] = {0, 473195, 504333, 520860, 550000, 700000, 750000, 830000}; + U64 const scale = 62 - tableLog; -+ U64 const step = ((U64)1 << 62) / total; /* <== here, one division ! */ ++ U64 const step = div_u64((U64)1 << 62, (U32)total); /* <== here, one division ! 
*/ + U64 const vStep = 1ULL << (scale - 20); + int stillToDistribute = 1 << tableLog; + unsigned s; @@ -9422,85 +9454,12 @@ index 0000000..e016bb1 +} + +size_t FSE_compressBound(size_t size) { return FSE_COMPRESSBOUND(size); } -+ -+#define CHECK_V_F(e, f) \ -+ size_t const e = f; \ -+ if (ERR_isError(e)) \ -+ return f -+#define CHECK_F(f) \ -+ { \ -+ CHECK_V_F(_var_err__, f); \ -+ } -+ -+/* FSE_compress_wksp() : -+ * Same as FSE_compress2(), but using an externally allocated scratch buffer (`workSpace`). -+ * `wkspSize` size must be `(1< not compressible */ -+ if (maxCount < (srcSize >> 7)) -+ return 0; /* Heuristic : not compressible enough */ -+ } -+ -+ tableLog = FSE_optimalTableLog(tableLog, srcSize, maxSymbolValue); -+ CHECK_F(FSE_normalizeCount(norm, tableLog, count, srcSize, maxSymbolValue)); -+ -+ /* Write table description header */ -+ { -+ CHECK_V_F(nc_err, FSE_writeNCount(op, oend - op, norm, maxSymbolValue, tableLog)); -+ op += nc_err; -+ } -+ -+ /* Compress */ -+ CHECK_F(FSE_buildCTable_wksp(CTable, norm, maxSymbolValue, tableLog, scratchBuffer, scratchBufferSize)); -+ { -+ CHECK_V_F(cSize, FSE_compress_usingCTable(op, oend - op, src, srcSize, CTable)); -+ if (cSize == 0) -+ return 0; /* not enough space for compressed data */ -+ op += cSize; -+ } -+ -+ /* check compressibility */ -+ if ((size_t)(op - ostart) >= srcSize - 1) -+ return 0; -+ -+ return op - ostart; -+} diff --git a/lib/zstd/fse_decompress.c b/lib/zstd/fse_decompress.c new file mode 100644 -index 0000000..96cf89f +index 0000000..a84300e --- /dev/null +++ b/lib/zstd/fse_decompress.c -@@ -0,0 +1,313 @@ +@@ -0,0 +1,332 @@ +/* + * FSE : Finite State Entropy decoder + * Copyright (C) 2013-2015, Yann Collet. @@ -9551,6 +9510,7 @@ index 0000000..96cf89f +#include "bitstream.h" +#include "fse.h" +#include ++#include +#include /* memcpy, memset */ + +/* ************************************************************** @@ -9594,17 +9554,19 @@ index 0000000..96cf89f + +/* Function templates */ + -+size_t FSE_buildDTable(FSE_DTable *dt, const short *normalizedCounter, unsigned maxSymbolValue, unsigned tableLog) ++size_t FSE_buildDTable_wksp(FSE_DTable *dt, const short *normalizedCounter, unsigned maxSymbolValue, unsigned tableLog, void *workspace, size_t workspaceSize) +{ + void *const tdPtr = dt + 1; /* because *dt is unsigned, 32-bits aligned on 32-bits */ + FSE_DECODE_TYPE *const tableDecode = (FSE_DECODE_TYPE *)(tdPtr); -+ U16 symbolNext[FSE_MAX_SYMBOL_VALUE + 1]; ++ U16 *symbolNext = (U16 *)workspace; + + U32 const maxSV1 = maxSymbolValue + 1; + U32 const tableSize = 1 << tableLog; + U32 highThreshold = tableSize - 1; + + /* Sanity Checks */ ++ if (workspaceSize < sizeof(U16) * (FSE_MAX_SYMBOL_VALUE + 1)) ++ return ERROR(tableLog_tooLarge); + if (maxSymbolValue > FSE_MAX_SYMBOL_VALUE) + return ERROR(maxSymbolValue_tooLarge); + if (tableLog > FSE_MAX_TABLELOG) @@ -9791,16 +9753,32 @@ index 0000000..96cf89f + return FSE_decompress_usingDTable_generic(dst, originalSize, cSrc, cSrcSize, dt, 0); +} + -+size_t FSE_decompress_wksp(void *dst, size_t dstCapacity, const void *cSrc, size_t cSrcSize, FSE_DTable *workSpace, unsigned maxLog) ++size_t FSE_decompress_wksp(void *dst, size_t dstCapacity, const void *cSrc, size_t cSrcSize, unsigned maxLog, void *workspace, size_t workspaceSize) +{ + const BYTE *const istart = (const BYTE *)cSrc; + const BYTE *ip = istart; -+ short counting[FSE_MAX_SYMBOL_VALUE + 1]; + unsigned tableLog; + unsigned maxSymbolValue = FSE_MAX_SYMBOL_VALUE; ++ size_t NCountLength; ++ ++ FSE_DTable *dt; ++ short 
*counting; ++ size_t spaceUsed32 = 0; ++ ++ FSE_STATIC_ASSERT(sizeof(FSE_DTable) == sizeof(U32)); ++ ++ dt = (FSE_DTable *)((U32 *)workspace + spaceUsed32); ++ spaceUsed32 += FSE_DTABLE_SIZE_U32(maxLog); ++ counting = (short *)((U32 *)workspace + spaceUsed32); ++ spaceUsed32 += ALIGN(sizeof(short) * (FSE_MAX_SYMBOL_VALUE + 1), sizeof(U32)) >> 2; ++ ++ if ((spaceUsed32 << 2) > workspaceSize) ++ return ERROR(tableLog_tooLarge); ++ workspace = (U32 *)workspace + spaceUsed32; ++ workspaceSize -= (spaceUsed32 << 2); + + /* normal FSE decoding mode */ -+ size_t const NCountLength = FSE_readNCount(counting, &maxSymbolValue, &tableLog, istart, cSrcSize); ++ NCountLength = FSE_readNCount(counting, &maxSymbolValue, &tableLog, istart, cSrcSize); + if (FSE_isError(NCountLength)) + return NCountLength; + // if (NCountLength >= cSrcSize) return ERROR(srcSize_wrong); /* too small input size; supposed to be already checked in NCountLength, only remaining @@ -9810,16 +9788,16 @@ index 0000000..96cf89f + ip += NCountLength; + cSrcSize -= NCountLength; + -+ CHECK_F(FSE_buildDTable(workSpace, counting, maxSymbolValue, tableLog)); ++ CHECK_F(FSE_buildDTable_wksp(dt, counting, maxSymbolValue, tableLog, workspace, workspaceSize)); + -+ return FSE_decompress_usingDTable(dst, dstCapacity, ip, cSrcSize, workSpace); /* always return, even if it is an error code */ ++ return FSE_decompress_usingDTable(dst, dstCapacity, ip, cSrcSize, dt); /* always return, even if it is an error code */ +} diff --git a/lib/zstd/huf.h b/lib/zstd/huf.h new file mode 100644 -index 0000000..56abe2f +index 0000000..2143da2 --- /dev/null +++ b/lib/zstd/huf.h -@@ -0,0 +1,203 @@ +@@ -0,0 +1,212 @@ +/* + * Huffman coder, part of New Generation Entropy library + * header file @@ -9877,7 +9855,7 @@ index 0000000..56abe2f +/** HUF_compress4X_wksp() : +* Same as HUF_compress2(), but uses externally allocated `workSpace`, which must be a table of >= 1024 unsigned */ +size_t HUF_compress4X_wksp(void *dst, size_t dstSize, const void *src, size_t srcSize, unsigned maxSymbolValue, unsigned tableLog, void *workSpace, -+ size_t wkspSize); /**< `workSpace` must be a table of at least HUF_WORKSPACE_SIZE_U32 unsigned */ ++ size_t wkspSize); /**< `workSpace` must be a table of at least HUF_COMPRESS_WORKSPACE_SIZE_U32 unsigned */ + +/* *** Dependencies *** */ +#include "mem.h" /* U32 */ @@ -9913,17 +9891,23 @@ index 0000000..56abe2f +#define HUF_CREATE_STATIC_DTABLEX4(DTable, maxTableLog) HUF_DTable DTable[HUF_DTABLE_SIZE(maxTableLog)] = {((U32)(maxTableLog)*0x01000001)} + +/* The workspace must have alignment at least 4 and be at least this large */ -+#define HUF_WORKSPACE_SIZE (6 << 10) -+#define HUF_WORKSPACE_SIZE_U32 (HUF_WORKSPACE_SIZE / sizeof(U32)) ++#define HUF_COMPRESS_WORKSPACE_SIZE (6 << 10) ++#define HUF_COMPRESS_WORKSPACE_SIZE_U32 (HUF_COMPRESS_WORKSPACE_SIZE / sizeof(U32)) ++ ++/* The workspace must have alignment at least 4 and be at least this large */ ++#define HUF_DECOMPRESS_WORKSPACE_SIZE (3 << 10) ++#define HUF_DECOMPRESS_WORKSPACE_SIZE_U32 (HUF_DECOMPRESS_WORKSPACE_SIZE / sizeof(U32)) + +/* **************************************** +* Advanced decompression functions +******************************************/ -+size_t HUF_decompress4X_DCtx(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize); /**< decodes RLE and uncompressed */ -+size_t HUF_decompress4X_hufOnly(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, -+ size_t cSrcSize); /**< considers RLE and uncompressed as errors */ -+size_t 
HUF_decompress4X2_DCtx(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize); /**< single-symbol decoder */ -+size_t HUF_decompress4X4_DCtx(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize); /**< double-symbols decoder */ ++size_t HUF_decompress4X_DCtx_wksp(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize, void *workspace, size_t workspaceSize); /**< decodes RLE and uncompressed */ ++size_t HUF_decompress4X_hufOnly_wksp(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize, void *workspace, ++ size_t workspaceSize); /**< considers RLE and uncompressed as errors */ ++size_t HUF_decompress4X2_DCtx_wksp(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize, void *workspace, ++ size_t workspaceSize); /**< single-symbol decoder */ ++size_t HUF_decompress4X4_DCtx_wksp(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize, void *workspace, ++ size_t workspaceSize); /**< double-symbols decoder */ + +/* **************************************** +* HUF detailed API @@ -9933,7 +9917,7 @@ index 0000000..56abe2f +1. count symbol occurrence from source[] into table count[] using FSE_count() +2. (optional) refine tableLog using HUF_optimalTableLog() +3. build Huffman table from count using HUF_buildCTable() -+4. save Huffman table to memory buffer using HUF_writeCTable() ++4. save Huffman table to memory buffer using HUF_writeCTable_wksp() +5. encode the data stream using HUF_compress4X_usingCTable() + +The following API allows targeting specific sub-functions for advanced tasks. @@ -9943,7 +9927,7 @@ index 0000000..56abe2f +/* FSE_count() : find it within "fse.h" */ +unsigned HUF_optimalTableLog(unsigned maxTableLog, size_t srcSize, unsigned maxSymbolValue); +typedef struct HUF_CElt_s HUF_CElt; /* incomplete type */ -+size_t HUF_writeCTable(void *dst, size_t maxDstSize, const HUF_CElt *CTable, unsigned maxSymbolValue, unsigned huffLog); ++size_t HUF_writeCTable_wksp(void *dst, size_t maxDstSize, const HUF_CElt *CTable, unsigned maxSymbolValue, unsigned huffLog, void *workspace, size_t workspaceSize); +size_t HUF_compress4X_usingCTable(void *dst, size_t dstSize, const void *src, size_t srcSize, const HUF_CElt *CTable); + +typedef enum { @@ -9959,7 +9943,7 @@ index 0000000..56abe2f +* If preferRepeat then the old table will always be used if valid. */ +size_t HUF_compress4X_repeat(void *dst, size_t dstSize, const void *src, size_t srcSize, unsigned maxSymbolValue, unsigned tableLog, void *workSpace, + size_t wkspSize, HUF_CElt *hufTable, HUF_repeat *repeat, -+ int preferRepeat); /**< `workSpace` must be a table of at least HUF_WORKSPACE_SIZE_U32 unsigned */ ++ int preferRepeat); /**< `workSpace` must be a table of at least HUF_COMPRESS_WORKSPACE_SIZE_U32 unsigned */ + +/** HUF_buildCTable_wksp() : + * Same as HUF_buildCTable(), but using externally allocated scratch buffer. @@ -9972,11 +9956,12 @@ index 0000000..56abe2f + `huffWeight` is destination buffer. + @return : size read from `src` , or an error Code . + Note : Needed by HUF_readCTable() and HUF_readDTableXn() . 
*/ -+size_t HUF_readStats(BYTE *huffWeight, size_t hwSize, U32 *rankStats, U32 *nbSymbolsPtr, U32 *tableLogPtr, const void *src, size_t srcSize); ++size_t HUF_readStats_wksp(BYTE *huffWeight, size_t hwSize, U32 *rankStats, U32 *nbSymbolsPtr, U32 *tableLogPtr, const void *src, size_t srcSize, ++ void *workspace, size_t workspaceSize); + +/** HUF_readCTable() : +* Loading a CTable saved with HUF_writeCTable() */ -+size_t HUF_readCTable(HUF_CElt *CTable, unsigned maxSymbolValue, const void *src, size_t srcSize); ++size_t HUF_readCTable_wksp(HUF_CElt *CTable, unsigned maxSymbolValue, const void *src, size_t srcSize, void *workspace, size_t workspaceSize); + +/* +HUF_decompress() does the following: @@ -9992,8 +9977,8 @@ index 0000000..56abe2f +* Assumption : 0 < cSrcSize < dstSize <= 128 KB */ +U32 HUF_selectDecoder(size_t dstSize, size_t cSrcSize); + -+size_t HUF_readDTableX2(HUF_DTable *DTable, const void *src, size_t srcSize); -+size_t HUF_readDTableX4(HUF_DTable *DTable, const void *src, size_t srcSize); ++size_t HUF_readDTableX2_wksp(HUF_DTable *DTable, const void *src, size_t srcSize, void *workspace, size_t workspaceSize); ++size_t HUF_readDTableX4_wksp(HUF_DTable *DTable, const void *src, size_t srcSize, void *workspace, size_t workspaceSize); + +size_t HUF_decompress4X_usingDTable(void *dst, size_t maxDstSize, const void *cSrc, size_t cSrcSize, const HUF_DTable *DTable); +size_t HUF_decompress4X2_usingDTable(void *dst, size_t maxDstSize, const void *cSrc, size_t cSrcSize, const HUF_DTable *DTable); @@ -10002,7 +9987,7 @@ index 0000000..56abe2f +/* single stream variants */ + +size_t HUF_compress1X_wksp(void *dst, size_t dstSize, const void *src, size_t srcSize, unsigned maxSymbolValue, unsigned tableLog, void *workSpace, -+ size_t wkspSize); /**< `workSpace` must be a table of at least HUF_WORKSPACE_SIZE_U32 unsigned */ ++ size_t wkspSize); /**< `workSpace` must be a table of at least HUF_COMPRESS_WORKSPACE_SIZE_U32 unsigned */ +size_t HUF_compress1X_usingCTable(void *dst, size_t dstSize, const void *src, size_t srcSize, const HUF_CElt *CTable); +/** HUF_compress1X_repeat() : +* Same as HUF_compress1X_wksp(), but considers using hufTable if *repeat != HUF_repeat_none. @@ -10011,11 +9996,13 @@ index 0000000..56abe2f +* If preferRepeat then the old table will always be used if valid. 
*/ +size_t HUF_compress1X_repeat(void *dst, size_t dstSize, const void *src, size_t srcSize, unsigned maxSymbolValue, unsigned tableLog, void *workSpace, + size_t wkspSize, HUF_CElt *hufTable, HUF_repeat *repeat, -+ int preferRepeat); /**< `workSpace` must be a table of at least HUF_WORKSPACE_SIZE_U32 unsigned */ ++ int preferRepeat); /**< `workSpace` must be a table of at least HUF_COMPRESS_WORKSPACE_SIZE_U32 unsigned */ + -+size_t HUF_decompress1X_DCtx(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize); -+size_t HUF_decompress1X2_DCtx(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize); /**< single-symbol decoder */ -+size_t HUF_decompress1X4_DCtx(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize); /**< double-symbols decoder */ ++size_t HUF_decompress1X_DCtx_wksp(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize, void *workspace, size_t workspaceSize); ++size_t HUF_decompress1X2_DCtx_wksp(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize, void *workspace, ++ size_t workspaceSize); /**< single-symbol decoder */ ++size_t HUF_decompress1X4_DCtx_wksp(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize, void *workspace, ++ size_t workspaceSize); /**< double-symbols decoder */ + +size_t HUF_decompress1X_usingDTable(void *dst, size_t maxDstSize, const void *cSrc, size_t cSrcSize, + const HUF_DTable *DTable); /**< automatic selection of sing or double symbol decoder, based on DTable */ @@ -10025,10 +10012,10 @@ index 0000000..56abe2f +#endif /* HUF_H_298734234 */ diff --git a/lib/zstd/huf_compress.c b/lib/zstd/huf_compress.c new file mode 100644 -index 0000000..e82a136 +index 0000000..0361f38 --- /dev/null +++ b/lib/zstd/huf_compress.c -@@ -0,0 +1,731 @@ +@@ -0,0 +1,771 @@ +/* + * Huffman encoder, part of New Generation Entropy library + * Copyright (C) 2013-2016, Yann Collet. @@ -10074,6 +10061,7 @@ index 0000000..e82a136 +#include "bitstream.h" +#include "fse.h" /* header compression */ +#include "huf.h" ++#include +#include /* memcpy, memset */ + +/* ************************************************************** @@ -10109,7 +10097,7 @@ index 0000000..e82a136 + * Note : all elements within weightTable are supposed to be <= HUF_TABLELOG_MAX. 
+ */ +#define MAX_FSE_TABLELOG_FOR_HUFF_HEADER 6 -+size_t HUF_compressWeights(void *dst, size_t dstSize, const void *weightTable, size_t wtSize) ++size_t HUF_compressWeights_wksp(void *dst, size_t dstSize, const void *weightTable, size_t wtSize, void *workspace, size_t workspaceSize) +{ + BYTE *const ostart = (BYTE *)dst; + BYTE *op = ostart; @@ -10118,11 +10106,24 @@ index 0000000..e82a136 + U32 maxSymbolValue = HUF_TABLELOG_MAX; + U32 tableLog = MAX_FSE_TABLELOG_FOR_HUFF_HEADER; + -+ FSE_CTable CTable[FSE_CTABLE_SIZE_U32(MAX_FSE_TABLELOG_FOR_HUFF_HEADER, HUF_TABLELOG_MAX)]; -+ BYTE scratchBuffer[1 << MAX_FSE_TABLELOG_FOR_HUFF_HEADER]; ++ FSE_CTable *CTable; ++ U32 *count; ++ S16 *norm; ++ size_t spaceUsed32 = 0; + -+ U32 count[HUF_TABLELOG_MAX + 1]; -+ S16 norm[HUF_TABLELOG_MAX + 1]; ++ HUF_STATIC_ASSERT(sizeof(FSE_CTable) == sizeof(U32)); ++ ++ CTable = (FSE_CTable *)((U32 *)workspace + spaceUsed32); ++ spaceUsed32 += FSE_CTABLE_SIZE_U32(MAX_FSE_TABLELOG_FOR_HUFF_HEADER, HUF_TABLELOG_MAX); ++ count = (U32 *)workspace + spaceUsed32; ++ spaceUsed32 += HUF_TABLELOG_MAX + 1; ++ norm = (S16 *)((U32 *)workspace + spaceUsed32); ++ spaceUsed32 += ALIGN(sizeof(S16) * (HUF_TABLELOG_MAX + 1), sizeof(U32)) >> 2; ++ ++ if ((spaceUsed32 << 2) > workspaceSize) ++ return ERROR(tableLog_tooLarge); ++ workspace = (U32 *)workspace + spaceUsed32; ++ workspaceSize -= (spaceUsed32 << 2); + + /* init conditions */ + if (wtSize <= 1) @@ -10147,7 +10148,7 @@ index 0000000..e82a136 + } + + /* Compress */ -+ CHECK_F(FSE_buildCTable_wksp(CTable, norm, maxSymbolValue, tableLog, scratchBuffer, sizeof(scratchBuffer))); ++ CHECK_F(FSE_buildCTable_wksp(CTable, norm, maxSymbolValue, tableLog, workspace, workspaceSize)); + { + CHECK_V_F(cSize, FSE_compress_usingCTable(op, oend - op, weightTable, wtSize, CTable)); + if (cSize == 0) @@ -10163,16 +10164,28 @@ index 0000000..e82a136 + BYTE nbBits; +}; /* typedef'd to HUF_CElt within "huf.h" */ + -+/*! HUF_writeCTable() : ++/*! HUF_writeCTable_wksp() : + `CTable` : Huffman tree to save, using huf representation. 
+ @return : size of saved CTable */ -+size_t HUF_writeCTable(void *dst, size_t maxDstSize, const HUF_CElt *CTable, U32 maxSymbolValue, U32 huffLog) ++size_t HUF_writeCTable_wksp(void *dst, size_t maxDstSize, const HUF_CElt *CTable, U32 maxSymbolValue, U32 huffLog, void *workspace, size_t workspaceSize) +{ -+ BYTE bitsToWeight[HUF_TABLELOG_MAX + 1]; /* precomputed conversion table */ -+ BYTE huffWeight[HUF_SYMBOLVALUE_MAX]; + BYTE *op = (BYTE *)dst; + U32 n; + ++ BYTE *bitsToWeight; ++ BYTE *huffWeight; ++ size_t spaceUsed32 = 0; ++ ++ bitsToWeight = (BYTE *)((U32 *)workspace + spaceUsed32); ++ spaceUsed32 += ALIGN(HUF_TABLELOG_MAX + 1, sizeof(U32)) >> 2; ++ huffWeight = (BYTE *)((U32 *)workspace + spaceUsed32); ++ spaceUsed32 += ALIGN(HUF_SYMBOLVALUE_MAX, sizeof(U32)) >> 2; ++ ++ if ((spaceUsed32 << 2) > workspaceSize) ++ return ERROR(tableLog_tooLarge); ++ workspace = (U32 *)workspace + spaceUsed32; ++ workspaceSize -= (spaceUsed32 << 2); ++ + /* check conditions */ + if (maxSymbolValue > HUF_SYMBOLVALUE_MAX) + return ERROR(maxSymbolValue_tooLarge); @@ -10186,7 +10199,7 @@ index 0000000..e82a136 + + /* attempt weights compression by FSE */ + { -+ CHECK_V_F(hSize, HUF_compressWeights(op + 1, maxDstSize - 1, huffWeight, maxSymbolValue)); ++ CHECK_V_F(hSize, HUF_compressWeights_wksp(op + 1, maxDstSize - 1, huffWeight, maxSymbolValue, workspace, workspaceSize)); + if ((hSize > 1) & (hSize < maxSymbolValue / 2)) { /* FSE compressed */ + op[0] = (BYTE)hSize; + return hSize + 1; @@ -10205,15 +10218,29 @@ index 0000000..e82a136 + return ((maxSymbolValue + 1) / 2) + 1; +} + -+size_t HUF_readCTable(HUF_CElt *CTable, U32 maxSymbolValue, const void *src, size_t srcSize) ++size_t HUF_readCTable_wksp(HUF_CElt *CTable, U32 maxSymbolValue, const void *src, size_t srcSize, void *workspace, size_t workspaceSize) +{ -+ BYTE huffWeight[HUF_SYMBOLVALUE_MAX + 1]; /* init not required, even though some static analyzer may complain */ -+ U32 rankVal[HUF_TABLELOG_ABSOLUTEMAX + 1]; /* large enough for values from 0 to 16 */ ++ U32 *rankVal; ++ BYTE *huffWeight; + U32 tableLog = 0; + U32 nbSymbols = 0; ++ size_t readSize; ++ size_t spaceUsed32 = 0; ++ ++ rankVal = (U32 *)workspace + spaceUsed32; ++ spaceUsed32 += HUF_TABLELOG_ABSOLUTEMAX + 1; ++ huffWeight = (BYTE *)((U32 *)workspace + spaceUsed32); ++ spaceUsed32 += ALIGN(HUF_SYMBOLVALUE_MAX + 1, sizeof(U32)) >> 2; ++ ++ if ((spaceUsed32 << 2) > workspaceSize) ++ return ERROR(tableLog_tooLarge); ++ workspace = (U32 *)workspace + spaceUsed32; ++ workspaceSize -= (spaceUsed32 << 2); + + /* get symbol weights */ -+ CHECK_V_F(readSize, HUF_readStats(huffWeight, HUF_SYMBOLVALUE_MAX + 1, rankVal, &nbSymbols, &tableLog, src, srcSize)); ++ readSize = HUF_readStats_wksp(huffWeight, HUF_SYMBOLVALUE_MAX + 1, rankVal, &nbSymbols, &tableLog, src, srcSize, workspace, workspaceSize); ++ if (ERR_isError(readSize)) ++ return readSize; + + /* check result */ + if (tableLog > HUF_TABLELOG_MAX) @@ -10711,7 +10738,7 @@ index 0000000..e82a136 + + /* Write table description header */ + { -+ CHECK_V_F(hSize, HUF_writeCTable(op, dstSize, CTable, maxSymbolValue, huffLog)); ++ CHECK_V_F(hSize, HUF_writeCTable_wksp(op, dstSize, CTable, maxSymbolValue, huffLog, workSpace, wkspSize)); + /* Check if using the previous table will be beneficial */ + if (repeat && *repeat != HUF_repeat_none) { + size_t const oldSize = HUF_estimateCompressedSize(oldHufTable, count, maxSymbolValue); @@ -10762,10 +10789,10 @@ index 0000000..e82a136 +} diff --git a/lib/zstd/huf_decompress.c b/lib/zstd/huf_decompress.c 
new file mode 100644 -index 0000000..950c194 +index 0000000..6526482 --- /dev/null +++ b/lib/zstd/huf_decompress.c -@@ -0,0 +1,920 @@ +@@ -0,0 +1,960 @@ +/* + * Huffman decoder, part of New Generation Entropy library + * Copyright (C) 2013-2016, Yann Collet. @@ -10817,6 +10844,7 @@ index 0000000..950c194 +#include "fse.h" /* header compression */ +#include "huf.h" +#include ++#include +#include /* memcpy, memset */ + +/* ************************************************************** @@ -10854,20 +10882,32 @@ index 0000000..950c194 + BYTE nbBits; +} HUF_DEltX2; /* single-symbol decoding */ + -+size_t HUF_readDTableX2(HUF_DTable *DTable, const void *src, size_t srcSize) ++size_t HUF_readDTableX2_wksp(HUF_DTable *DTable, const void *src, size_t srcSize, void *workspace, size_t workspaceSize) +{ -+ BYTE huffWeight[HUF_SYMBOLVALUE_MAX + 1]; -+ U32 rankVal[HUF_TABLELOG_ABSOLUTEMAX + 1]; /* large enough for values from 0 to 16 */ + U32 tableLog = 0; + U32 nbSymbols = 0; + size_t iSize; + void *const dtPtr = DTable + 1; + HUF_DEltX2 *const dt = (HUF_DEltX2 *)dtPtr; + ++ U32 *rankVal; ++ BYTE *huffWeight; ++ size_t spaceUsed32 = 0; ++ ++ rankVal = (U32 *)workspace + spaceUsed32; ++ spaceUsed32 += HUF_TABLELOG_ABSOLUTEMAX + 1; ++ huffWeight = (BYTE *)((U32 *)workspace + spaceUsed32); ++ spaceUsed32 += ALIGN(HUF_SYMBOLVALUE_MAX + 1, sizeof(U32)) >> 2; ++ ++ if ((spaceUsed32 << 2) > workspaceSize) ++ return ERROR(tableLog_tooLarge); ++ workspace = (U32 *)workspace + spaceUsed32; ++ workspaceSize -= (spaceUsed32 << 2); ++ + HUF_STATIC_ASSERT(sizeof(DTableDesc) == sizeof(HUF_DTable)); + /* memset(huffWeight, 0, sizeof(huffWeight)); */ /* is not necessary, even though some analyzer complain ... */ + -+ iSize = HUF_readStats(huffWeight, HUF_SYMBOLVALUE_MAX + 1, rankVal, &nbSymbols, &tableLog, src, srcSize); ++ iSize = HUF_readStats_wksp(huffWeight, HUF_SYMBOLVALUE_MAX + 1, rankVal, &nbSymbols, &tableLog, src, srcSize, workspace, workspaceSize); + if (HUF_isError(iSize)) + return iSize; + @@ -10984,11 +11024,11 @@ index 0000000..950c194 + return HUF_decompress1X2_usingDTable_internal(dst, dstSize, cSrc, cSrcSize, DTable); +} + -+size_t HUF_decompress1X2_DCtx(HUF_DTable *DCtx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize) ++size_t HUF_decompress1X2_DCtx_wksp(HUF_DTable *DCtx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize, void *workspace, size_t workspaceSize) +{ + const BYTE *ip = (const BYTE *)cSrc; + -+ size_t const hSize = HUF_readDTableX2(DCtx, cSrc, cSrcSize); ++ size_t const hSize = HUF_readDTableX2_wksp(DCtx, cSrc, cSrcSize, workspace, workspaceSize); + if (HUF_isError(hSize)) + return hSize; + if (hSize >= cSrcSize) @@ -11115,11 +11155,11 @@ index 0000000..950c194 + return HUF_decompress4X2_usingDTable_internal(dst, dstSize, cSrc, cSrcSize, DTable); +} + -+size_t HUF_decompress4X2_DCtx(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize) ++size_t HUF_decompress4X2_DCtx_wksp(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize, void *workspace, size_t workspaceSize) +{ + const BYTE *ip = (const BYTE *)cSrc; + -+ size_t const hSize = HUF_readDTableX2(dctx, cSrc, cSrcSize); ++ size_t const hSize = HUF_readDTableX2_wksp(dctx, cSrc, cSrcSize, workspace, workspaceSize); + if (HUF_isError(hSize)) + return hSize; + if (hSize >= cSrcSize) @@ -11190,6 +11230,7 @@ index 0000000..950c194 +} + +typedef U32 rankVal_t[HUF_TABLELOG_MAX][HUF_TABLELOG_MAX + 1]; ++typedef U32 rankValCol_t[HUF_TABLELOG_MAX + 1]; + +static void 
HUF_fillDTableX4(HUF_DEltX4 *DTable, const U32 targetLog, const sortedSymbol_t *sortedList, const U32 sortedListSize, const U32 *rankStart, + rankVal_t rankValOrigin, const U32 maxWeight, const U32 nbBitsBaseline) @@ -11233,27 +11274,50 @@ index 0000000..950c194 + } +} + -+size_t HUF_readDTableX4(HUF_DTable *DTable, const void *src, size_t srcSize) ++size_t HUF_readDTableX4_wksp(HUF_DTable *DTable, const void *src, size_t srcSize, void *workspace, size_t workspaceSize) +{ -+ BYTE weightList[HUF_SYMBOLVALUE_MAX + 1]; -+ sortedSymbol_t sortedSymbol[HUF_SYMBOLVALUE_MAX + 1]; -+ U32 rankStats[HUF_TABLELOG_MAX + 1] = {0}; -+ U32 rankStart0[HUF_TABLELOG_MAX + 2] = {0}; -+ U32 *const rankStart = rankStart0 + 1; -+ rankVal_t rankVal; + U32 tableLog, maxW, sizeOfSort, nbSymbols; + DTableDesc dtd = HUF_getDTableDesc(DTable); + U32 const maxTableLog = dtd.maxTableLog; + size_t iSize; + void *dtPtr = DTable + 1; /* force compiler to avoid strict-aliasing */ + HUF_DEltX4 *const dt = (HUF_DEltX4 *)dtPtr; ++ U32 *rankStart; ++ ++ rankValCol_t *rankVal; ++ U32 *rankStats; ++ U32 *rankStart0; ++ sortedSymbol_t *sortedSymbol; ++ BYTE *weightList; ++ size_t spaceUsed32 = 0; ++ ++ HUF_STATIC_ASSERT((sizeof(rankValCol_t) & 3) == 0); ++ ++ rankVal = (rankValCol_t *)((U32 *)workspace + spaceUsed32); ++ spaceUsed32 += (sizeof(rankValCol_t) * HUF_TABLELOG_MAX) >> 2; ++ rankStats = (U32 *)workspace + spaceUsed32; ++ spaceUsed32 += HUF_TABLELOG_MAX + 1; ++ rankStart0 = (U32 *)workspace + spaceUsed32; ++ spaceUsed32 += HUF_TABLELOG_MAX + 2; ++ sortedSymbol = (sortedSymbol_t *)((U32 *)workspace + spaceUsed32); ++ spaceUsed32 += ALIGN(sizeof(sortedSymbol_t) * (HUF_SYMBOLVALUE_MAX + 1), sizeof(U32)) >> 2; ++ weightList = (BYTE *)((U32 *)workspace + spaceUsed32); ++ spaceUsed32 += ALIGN(HUF_SYMBOLVALUE_MAX + 1, sizeof(U32)) >> 2; ++ ++ if ((spaceUsed32 << 2) > workspaceSize) ++ return ERROR(tableLog_tooLarge); ++ workspace = (U32 *)workspace + spaceUsed32; ++ workspaceSize -= (spaceUsed32 << 2); ++ ++ rankStart = rankStart0 + 1; ++ memset(rankStats, 0, sizeof(U32) * (2 * HUF_TABLELOG_MAX + 2 + 1)); + + HUF_STATIC_ASSERT(sizeof(HUF_DEltX4) == sizeof(HUF_DTable)); /* if compiler fails here, assertion is wrong */ + if (maxTableLog > HUF_TABLELOG_MAX) + return ERROR(tableLog_tooLarge); + /* memset(weightList, 0, sizeof(weightList)); */ /* is not necessary, even though some analyzer complain ... 
*/ + -+ iSize = HUF_readStats(weightList, HUF_SYMBOLVALUE_MAX + 1, rankStats, &nbSymbols, &tableLog, src, srcSize); ++ iSize = HUF_readStats_wksp(weightList, HUF_SYMBOLVALUE_MAX + 1, rankStats, &nbSymbols, &tableLog, src, srcSize, workspace, workspaceSize); + if (HUF_isError(iSize)) + return iSize; + @@ -11420,11 +11484,11 @@ index 0000000..950c194 + return HUF_decompress1X4_usingDTable_internal(dst, dstSize, cSrc, cSrcSize, DTable); +} + -+size_t HUF_decompress1X4_DCtx(HUF_DTable *DCtx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize) ++size_t HUF_decompress1X4_DCtx_wksp(HUF_DTable *DCtx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize, void *workspace, size_t workspaceSize) +{ + const BYTE *ip = (const BYTE *)cSrc; + -+ size_t const hSize = HUF_readDTableX4(DCtx, cSrc, cSrcSize); ++ size_t const hSize = HUF_readDTableX4_wksp(DCtx, cSrc, cSrcSize, workspace, workspaceSize); + if (HUF_isError(hSize)) + return hSize; + if (hSize >= cSrcSize) @@ -11553,11 +11617,11 @@ index 0000000..950c194 + return HUF_decompress4X4_usingDTable_internal(dst, dstSize, cSrc, cSrcSize, DTable); +} + -+size_t HUF_decompress4X4_DCtx(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize) ++size_t HUF_decompress4X4_DCtx_wksp(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize, void *workspace, size_t workspaceSize) +{ + const BYTE *ip = (const BYTE *)cSrc; + -+ size_t hSize = HUF_readDTableX4(dctx, cSrc, cSrcSize); ++ size_t hSize = HUF_readDTableX4_wksp(dctx, cSrc, cSrcSize, workspace, workspaceSize); + if (HUF_isError(hSize)) + return hSize; + if (hSize >= cSrcSize) @@ -11629,7 +11693,7 @@ index 0000000..950c194 + +typedef size_t (*decompressionAlgo)(void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize); + -+size_t HUF_decompress4X_DCtx(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize) ++size_t HUF_decompress4X_DCtx_wksp(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize, void *workspace, size_t workspaceSize) +{ + /* validation checks */ + if (dstSize == 0) @@ -11647,11 +11711,12 @@ index 0000000..950c194 + + { + U32 const algoNb = HUF_selectDecoder(dstSize, cSrcSize); -+ return algoNb ? HUF_decompress4X4_DCtx(dctx, dst, dstSize, cSrc, cSrcSize) : HUF_decompress4X2_DCtx(dctx, dst, dstSize, cSrc, cSrcSize); ++ return algoNb ? HUF_decompress4X4_DCtx_wksp(dctx, dst, dstSize, cSrc, cSrcSize, workspace, workspaceSize) ++ : HUF_decompress4X2_DCtx_wksp(dctx, dst, dstSize, cSrc, cSrcSize, workspace, workspaceSize); + } +} + -+size_t HUF_decompress4X_hufOnly(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize) ++size_t HUF_decompress4X_hufOnly_wksp(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize, void *workspace, size_t workspaceSize) +{ + /* validation checks */ + if (dstSize == 0) @@ -11661,11 +11726,12 @@ index 0000000..950c194 + + { + U32 const algoNb = HUF_selectDecoder(dstSize, cSrcSize); -+ return algoNb ? HUF_decompress4X4_DCtx(dctx, dst, dstSize, cSrc, cSrcSize) : HUF_decompress4X2_DCtx(dctx, dst, dstSize, cSrc, cSrcSize); ++ return algoNb ? 
HUF_decompress4X4_DCtx_wksp(dctx, dst, dstSize, cSrc, cSrcSize, workspace, workspaceSize) ++ : HUF_decompress4X2_DCtx_wksp(dctx, dst, dstSize, cSrc, cSrcSize, workspace, workspaceSize); + } +} + -+size_t HUF_decompress1X_DCtx(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize) ++size_t HUF_decompress1X_DCtx_wksp(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize, void *workspace, size_t workspaceSize) +{ + /* validation checks */ + if (dstSize == 0) @@ -11683,7 +11749,8 @@ index 0000000..950c194 + + { + U32 const algoNb = HUF_selectDecoder(dstSize, cSrcSize); -+ return algoNb ? HUF_decompress1X4_DCtx(dctx, dst, dstSize, cSrc, cSrcSize) : HUF_decompress1X2_DCtx(dctx, dst, dstSize, cSrc, cSrcSize); ++ return algoNb ? HUF_decompress1X4_DCtx_wksp(dctx, dst, dstSize, cSrc, cSrcSize, workspace, workspaceSize) ++ : HUF_decompress1X2_DCtx_wksp(dctx, dst, dstSize, cSrc, cSrcSize, workspace, workspaceSize); + } +} diff --git a/lib/zstd/mem.h b/lib/zstd/mem.h @@ -11845,7 +11912,7 @@ index 0000000..3a0f34c +#endif /* MEM_H_MODULE */ diff --git a/lib/zstd/zstd_common.c b/lib/zstd/zstd_common.c new file mode 100644 -index 0000000..6ebf68d +index 0000000..a282624 --- /dev/null +++ b/lib/zstd/zstd_common.c @@ -0,0 +1,75 @@ @@ -11902,7 +11969,7 @@ index 0000000..6ebf68d +void *ZSTD_stackAllocAll(void *opaque, size_t *size) +{ + ZSTD_stack *stack = (ZSTD_stack *)opaque; -+ *size = stack->end - ZSTD_PTR_ALIGN(stack->ptr); ++ *size = (BYTE const *)stack->end - (BYTE *)ZSTD_PTR_ALIGN(stack->ptr); + return stack_push(stack, *size); +} + diff --git a/contrib/linux-kernel/0003-btrfs-Add-zstd-support.patch b/contrib/linux-kernel/0003-btrfs-Add-zstd-support.patch index 53d03d3ca..abc8326cc 100644 --- a/contrib/linux-kernel/0003-btrfs-Add-zstd-support.patch +++ b/contrib/linux-kernel/0003-btrfs-Add-zstd-support.patch @@ -1,7 +1,7 @@ From 599f8f2aaace3df939cb145368574a52268d82d0 Mon Sep 17 00:00:00 2001 From: Nick Terrell Date: Wed, 21 Jun 2017 17:31:39 -0700 -Subject: [PATCH 3/4] btrfs: Add zstd support +Subject: [PATCH v2 3/4] btrfs: Add zstd support Add zstd compression and decompression support to BtrFS. zstd at its fastest level compresses almost as well as zlib, while offering much diff --git a/contrib/linux-kernel/0004-squashfs-Add-zstd-support.patch b/contrib/linux-kernel/0004-squashfs-Add-zstd-support.patch index e9c4b98c5..b638194f6 100644 --- a/contrib/linux-kernel/0004-squashfs-Add-zstd-support.patch +++ b/contrib/linux-kernel/0004-squashfs-Add-zstd-support.patch @@ -1,7 +1,7 @@ From 5ff6a64abaea7b7f11d37cb0fdf08642316a3a90 Mon Sep 17 00:00:00 2001 From: Nick Terrell Date: Mon, 12 Jun 2017 12:18:23 -0700 -Subject: [PATCH 4/4] squashfs: Add zstd support +Subject: [PATCH v2 4/4] squashfs: Add zstd support Add zstd compression and decompression support to SquashFS. 
zstd is a great fit for SquashFS because it can compress at ratios approaching xz, diff --git a/contrib/linux-kernel/lib/xxhash.c b/contrib/linux-kernel/lib/xxhash.c index dc94904c6..aa61e2a38 100644 --- a/contrib/linux-kernel/lib/xxhash.c +++ b/contrib/linux-kernel/lib/xxhash.c @@ -391,7 +391,7 @@ int xxh64_update(struct xxh64_state *state, const void *input, const size_t len) } if (state->memsize) { /* tmp buffer is full */ - const uint64_t *p64 = state->mem64; + uint64_t *p64 = state->mem64; memcpy(((uint8_t *)p64) + state->memsize, input, 32 - state->memsize); diff --git a/contrib/linux-kernel/lib/zstd/compress.c b/contrib/linux-kernel/lib/zstd/compress.c index 1aff542b0..d60ab7d4f 100644 --- a/contrib/linux-kernel/lib/zstd/compress.c +++ b/contrib/linux-kernel/lib/zstd/compress.c @@ -84,7 +84,7 @@ struct ZSTD_CCtx_s { FSE_CTable offcodeCTable[FSE_CTABLE_SIZE_U32(OffFSELog, MaxOff)]; FSE_CTable matchlengthCTable[FSE_CTABLE_SIZE_U32(MLFSELog, MaxML)]; FSE_CTable litlengthCTable[FSE_CTABLE_SIZE_U32(LLFSELog, MaxLL)]; - unsigned tmpCounters[HUF_WORKSPACE_SIZE_U32]; + unsigned tmpCounters[HUF_COMPRESS_WORKSPACE_SIZE_U32]; }; size_t ZSTD_CCtxWorkspaceBound(ZSTD_compressionParameters cParams) @@ -587,8 +587,6 @@ ZSTD_STATIC size_t ZSTD_compressSequences(ZSTD_CCtx *zc, void *dst, size_t dstCa { const int longOffsets = zc->params.cParams.windowLog > STREAM_ACCUMULATOR_MIN; const seqStore_t *seqStorePtr = &(zc->seqStore); - U32 count[MaxSeq + 1]; - S16 norm[MaxSeq + 1]; FSE_CTable *CTable_LitLength = zc->litlengthCTable; FSE_CTable *CTable_OffsetBits = zc->offcodeCTable; FSE_CTable *CTable_MatchLength = zc->matchlengthCTable; @@ -602,7 +600,21 @@ ZSTD_STATIC size_t ZSTD_compressSequences(ZSTD_CCtx *zc, void *dst, size_t dstCa BYTE *op = ostart; size_t const nbSeq = seqStorePtr->sequences - seqStorePtr->sequencesStart; BYTE *seqHead; - BYTE scratchBuffer[1 << MAX(MLFSELog, LLFSELog)]; + + U32 *count; + S16 *norm; + U32 *workspace; + size_t workspaceSize = sizeof(zc->tmpCounters); + { + size_t spaceUsed32 = 0; + count = (U32 *)zc->tmpCounters + spaceUsed32; + spaceUsed32 += MaxSeq + 1; + norm = (S16 *)((U32 *)zc->tmpCounters + spaceUsed32); + spaceUsed32 += ALIGN(sizeof(S16) * (MaxSeq + 1), sizeof(U32)) >> 2; + + workspace = (U32 *)zc->tmpCounters + spaceUsed32; + workspaceSize -= (spaceUsed32 << 2); + } /* Compress literals */ { @@ -638,7 +650,7 @@ ZSTD_STATIC size_t ZSTD_compressSequences(ZSTD_CCtx *zc, void *dst, size_t dstCa /* CTable for Literal Lengths */ { U32 max = MaxLL; - size_t const mostFrequent = FSE_countFast_wksp(count, &max, llCodeTable, nbSeq, zc->tmpCounters); + size_t const mostFrequent = FSE_countFast_wksp(count, &max, llCodeTable, nbSeq, workspace); if ((mostFrequent == nbSeq) && (nbSeq > 2)) { *op++ = llCodeTable[0]; FSE_buildCTable_rle(CTable_LitLength, (BYTE)max); @@ -646,7 +658,7 @@ ZSTD_STATIC size_t ZSTD_compressSequences(ZSTD_CCtx *zc, void *dst, size_t dstCa } else if ((zc->flagStaticTables) && (nbSeq < MAX_SEQ_FOR_STATIC_FSE)) { LLtype = set_repeat; } else if ((nbSeq < MIN_SEQ_FOR_DYNAMIC_FSE) || (mostFrequent < (nbSeq >> (LL_defaultNormLog - 1)))) { - FSE_buildCTable_wksp(CTable_LitLength, LL_defaultNorm, MaxLL, LL_defaultNormLog, scratchBuffer, sizeof(scratchBuffer)); + FSE_buildCTable_wksp(CTable_LitLength, LL_defaultNorm, MaxLL, LL_defaultNormLog, workspace, workspaceSize); LLtype = set_basic; } else { size_t nbSeq_1 = nbSeq; @@ -662,7 +674,7 @@ ZSTD_STATIC size_t ZSTD_compressSequences(ZSTD_CCtx *zc, void *dst, size_t dstCa return NCountSize; op += NCountSize; } 
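For reference, a self-contained sketch of the workspace-carving idiom these hunks introduce in place of large on-stack arrays; the macro and identifiers ending in _ex (or _example) are illustrative stand-ins, not part of the patch:

/* Hypothetical stand-in for the kernel's ALIGN(): round x up to a multiple of a (a power of two). */
#define ALIGN_EX(x, a) (((x) + (a) - 1) & ~((size_t)(a) - 1))

/* Carve two typed sub-buffers out of one U32-aligned workspace, in the same
 * way ZSTD_compressSequences() above now carves count[], norm[] and the FSE
 * scratch out of zc->tmpCounters. */
static size_t carve_example(void *workspace, size_t workspaceSize)
{
	enum { MaxSeqEx = 52 };          /* illustrative bound only */
	unsigned *count;                 /* MaxSeqEx + 1 symbol counters        */
	short *norm;                     /* MaxSeqEx + 1 normalized frequencies */
	size_t spaceUsed32 = 0;          /* running offset, in U32 units        */

	count = (unsigned *)workspace + spaceUsed32;
	spaceUsed32 += MaxSeqEx + 1;
	norm = (short *)((unsigned *)workspace + spaceUsed32);
	/* byte-sized allocations are rounded up to whole U32s to keep the offset 4-byte aligned */
	spaceUsed32 += ALIGN_EX(sizeof(short) * (MaxSeqEx + 1), sizeof(unsigned)) >> 2;

	if ((spaceUsed32 << 2) > workspaceSize)
		return (size_t)-1;           /* caller's workspace is too small */
	(void)count; (void)norm;
	return workspaceSize - (spaceUsed32 << 2); /* bytes left over for the FSE tables */
}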
- FSE_buildCTable_wksp(CTable_LitLength, norm, max, tableLog, scratchBuffer, sizeof(scratchBuffer)); + FSE_buildCTable_wksp(CTable_LitLength, norm, max, tableLog, workspace, workspaceSize); LLtype = set_compressed; } } @@ -670,7 +682,7 @@ ZSTD_STATIC size_t ZSTD_compressSequences(ZSTD_CCtx *zc, void *dst, size_t dstCa /* CTable for Offsets */ { U32 max = MaxOff; - size_t const mostFrequent = FSE_countFast_wksp(count, &max, ofCodeTable, nbSeq, zc->tmpCounters); + size_t const mostFrequent = FSE_countFast_wksp(count, &max, ofCodeTable, nbSeq, workspace); if ((mostFrequent == nbSeq) && (nbSeq > 2)) { *op++ = ofCodeTable[0]; FSE_buildCTable_rle(CTable_OffsetBits, (BYTE)max); @@ -678,7 +690,7 @@ ZSTD_STATIC size_t ZSTD_compressSequences(ZSTD_CCtx *zc, void *dst, size_t dstCa } else if ((zc->flagStaticTables) && (nbSeq < MAX_SEQ_FOR_STATIC_FSE)) { Offtype = set_repeat; } else if ((nbSeq < MIN_SEQ_FOR_DYNAMIC_FSE) || (mostFrequent < (nbSeq >> (OF_defaultNormLog - 1)))) { - FSE_buildCTable_wksp(CTable_OffsetBits, OF_defaultNorm, MaxOff, OF_defaultNormLog, scratchBuffer, sizeof(scratchBuffer)); + FSE_buildCTable_wksp(CTable_OffsetBits, OF_defaultNorm, MaxOff, OF_defaultNormLog, workspace, workspaceSize); Offtype = set_basic; } else { size_t nbSeq_1 = nbSeq; @@ -694,7 +706,7 @@ ZSTD_STATIC size_t ZSTD_compressSequences(ZSTD_CCtx *zc, void *dst, size_t dstCa return NCountSize; op += NCountSize; } - FSE_buildCTable_wksp(CTable_OffsetBits, norm, max, tableLog, scratchBuffer, sizeof(scratchBuffer)); + FSE_buildCTable_wksp(CTable_OffsetBits, norm, max, tableLog, workspace, workspaceSize); Offtype = set_compressed; } } @@ -702,7 +714,7 @@ ZSTD_STATIC size_t ZSTD_compressSequences(ZSTD_CCtx *zc, void *dst, size_t dstCa /* CTable for MatchLengths */ { U32 max = MaxML; - size_t const mostFrequent = FSE_countFast_wksp(count, &max, mlCodeTable, nbSeq, zc->tmpCounters); + size_t const mostFrequent = FSE_countFast_wksp(count, &max, mlCodeTable, nbSeq, workspace); if ((mostFrequent == nbSeq) && (nbSeq > 2)) { *op++ = *mlCodeTable; FSE_buildCTable_rle(CTable_MatchLength, (BYTE)max); @@ -710,7 +722,7 @@ ZSTD_STATIC size_t ZSTD_compressSequences(ZSTD_CCtx *zc, void *dst, size_t dstCa } else if ((zc->flagStaticTables) && (nbSeq < MAX_SEQ_FOR_STATIC_FSE)) { MLtype = set_repeat; } else if ((nbSeq < MIN_SEQ_FOR_DYNAMIC_FSE) || (mostFrequent < (nbSeq >> (ML_defaultNormLog - 1)))) { - FSE_buildCTable_wksp(CTable_MatchLength, ML_defaultNorm, MaxML, ML_defaultNormLog, scratchBuffer, sizeof(scratchBuffer)); + FSE_buildCTable_wksp(CTable_MatchLength, ML_defaultNorm, MaxML, ML_defaultNormLog, workspace, workspaceSize); MLtype = set_basic; } else { size_t nbSeq_1 = nbSeq; @@ -726,7 +738,7 @@ ZSTD_STATIC size_t ZSTD_compressSequences(ZSTD_CCtx *zc, void *dst, size_t dstCa return NCountSize; op += NCountSize; } - FSE_buildCTable_wksp(CTable_MatchLength, norm, max, tableLog, scratchBuffer, sizeof(scratchBuffer)); + FSE_buildCTable_wksp(CTable_MatchLength, norm, max, tableLog, workspace, workspaceSize); MLtype = set_compressed; } } @@ -2612,14 +2624,13 @@ static size_t ZSTD_loadZstdDictionary(ZSTD_CCtx *cctx, const void *dict, size_t const BYTE *const dictEnd = dictPtr + dictSize; short offcodeNCount[MaxOff + 1]; unsigned offcodeMaxValue = MaxOff; - BYTE scratchBuffer[1 << MAX(MLFSELog, LLFSELog)]; dictPtr += 4; /* skip magic number */ cctx->dictID = cctx->params.fParams.noDictIDFlag ? 
0 : ZSTD_readLE32(dictPtr); dictPtr += 4; { - size_t const hufHeaderSize = HUF_readCTable(cctx->hufTable, 255, dictPtr, dictEnd - dictPtr); + size_t const hufHeaderSize = HUF_readCTable_wksp(cctx->hufTable, 255, dictPtr, dictEnd - dictPtr, cctx->tmpCounters, sizeof(cctx->tmpCounters)); if (HUF_isError(hufHeaderSize)) return ERROR(dictionary_corrupted); dictPtr += hufHeaderSize; @@ -2633,7 +2644,7 @@ static size_t ZSTD_loadZstdDictionary(ZSTD_CCtx *cctx, const void *dict, size_t if (offcodeLog > OffFSELog) return ERROR(dictionary_corrupted); /* Defer checking offcodeMaxValue because we need to know the size of the dictionary content */ - CHECK_E(FSE_buildCTable_wksp(cctx->offcodeCTable, offcodeNCount, offcodeMaxValue, offcodeLog, scratchBuffer, sizeof(scratchBuffer)), + CHECK_E(FSE_buildCTable_wksp(cctx->offcodeCTable, offcodeNCount, offcodeMaxValue, offcodeLog, cctx->tmpCounters, sizeof(cctx->tmpCounters)), dictionary_corrupted); dictPtr += offcodeHeaderSize; } @@ -2649,7 +2660,7 @@ static size_t ZSTD_loadZstdDictionary(ZSTD_CCtx *cctx, const void *dict, size_t /* Every match length code must have non-zero probability */ CHECK_F(ZSTD_checkDictNCount(matchlengthNCount, matchlengthMaxValue, MaxML)); CHECK_E( - FSE_buildCTable_wksp(cctx->matchlengthCTable, matchlengthNCount, matchlengthMaxValue, matchlengthLog, scratchBuffer, sizeof(scratchBuffer)), + FSE_buildCTable_wksp(cctx->matchlengthCTable, matchlengthNCount, matchlengthMaxValue, matchlengthLog, cctx->tmpCounters, sizeof(cctx->tmpCounters)), dictionary_corrupted); dictPtr += matchlengthHeaderSize; } @@ -2664,7 +2675,7 @@ static size_t ZSTD_loadZstdDictionary(ZSTD_CCtx *cctx, const void *dict, size_t return ERROR(dictionary_corrupted); /* Every literal length code must have non-zero probability */ CHECK_F(ZSTD_checkDictNCount(litlengthNCount, litlengthMaxValue, MaxLL)); - CHECK_E(FSE_buildCTable_wksp(cctx->litlengthCTable, litlengthNCount, litlengthMaxValue, litlengthLog, scratchBuffer, sizeof(scratchBuffer)), + CHECK_E(FSE_buildCTable_wksp(cctx->litlengthCTable, litlengthNCount, litlengthMaxValue, litlengthLog, cctx->tmpCounters, sizeof(cctx->tmpCounters)), dictionary_corrupted); dictPtr += litlengthHeaderSize; } diff --git a/contrib/linux-kernel/lib/zstd/decompress.c b/contrib/linux-kernel/lib/zstd/decompress.c index ec673d7e6..62449ae05 100644 --- a/contrib/linux-kernel/lib/zstd/decompress.c +++ b/contrib/linux-kernel/lib/zstd/decompress.c @@ -70,6 +70,7 @@ typedef struct { FSE_DTable OFTable[FSE_DTABLE_SIZE_U32(OffFSELog)]; FSE_DTable MLTable[FSE_DTABLE_SIZE_U32(MLFSELog)]; HUF_DTable hufTable[HUF_DTABLE_SIZE(HufLog)]; /* can accommodate HUF_decompress4X */ + U64 workspace[HUF_DECOMPRESS_WORKSPACE_SIZE_U32 / 2]; U32 rep[ZSTD_REP_NUM]; } ZSTD_entropyTables_t; @@ -483,8 +484,10 @@ size_t ZSTD_decodeLiteralsBlock(ZSTD_DCtx *dctx, const void *src, size_t srcSize ? (singleStream ? HUF_decompress1X_usingDTable(dctx->litBuffer, litSize, istart + lhSize, litCSize, dctx->HUFptr) : HUF_decompress4X_usingDTable(dctx->litBuffer, litSize, istart + lhSize, litCSize, dctx->HUFptr)) : (singleStream - ? HUF_decompress1X2_DCtx(dctx->entropy.hufTable, dctx->litBuffer, litSize, istart + lhSize, litCSize) - : HUF_decompress4X_hufOnly(dctx->entropy.hufTable, dctx->litBuffer, litSize, istart + lhSize, litCSize)))) + ? 
HUF_decompress1X2_DCtx_wksp(dctx->entropy.hufTable, dctx->litBuffer, litSize, istart + lhSize, litCSize, + dctx->entropy.workspace, sizeof(dctx->entropy.workspace)) + : HUF_decompress4X_hufOnly_wksp(dctx->entropy.hufTable, dctx->litBuffer, litSize, istart + lhSize, litCSize, + dctx->entropy.workspace, sizeof(dctx->entropy.workspace))))) return ERROR(corruption_detected); dctx->litPtr = dctx->litBuffer; @@ -747,7 +750,7 @@ static const FSE_decode_t4 OF_defaultDTable[(1 << OF_DEFAULTNORMLOG) + 1] = { or an error code if it fails, testable with ZSTD_isError() */ static size_t ZSTD_buildSeqTable(FSE_DTable *DTableSpace, const FSE_DTable **DTablePtr, symbolEncodingType_e type, U32 max, U32 maxLog, const void *src, - size_t srcSize, const FSE_decode_t4 *defaultTable, U32 flagRepeatTable) + size_t srcSize, const FSE_decode_t4 *defaultTable, U32 flagRepeatTable, void *workspace, size_t workspaceSize) { const void *const tmpPtr = defaultTable; /* bypass strict aliasing */ switch (type) { @@ -767,15 +770,23 @@ static size_t ZSTD_buildSeqTable(FSE_DTable *DTableSpace, const FSE_DTable **DTa default: /* impossible */ case set_compressed: { U32 tableLog; - S16 norm[MaxSeq + 1]; - size_t const headerSize = FSE_readNCount(norm, &max, &tableLog, src, srcSize); - if (FSE_isError(headerSize)) - return ERROR(corruption_detected); - if (tableLog > maxLog) - return ERROR(corruption_detected); - FSE_buildDTable(DTableSpace, norm, max, tableLog); - *DTablePtr = DTableSpace; - return headerSize; + S16 *norm = (S16 *)workspace; + size_t const spaceUsed32 = ALIGN(sizeof(S16) * (MaxSeq + 1), sizeof(U32)) >> 2; + + if ((spaceUsed32 << 2) > workspaceSize) + return ERROR(GENERIC); + workspace = (U32 *)workspace + spaceUsed32; + workspaceSize -= (spaceUsed32 << 2); + { + size_t const headerSize = FSE_readNCount(norm, &max, &tableLog, src, srcSize); + if (FSE_isError(headerSize)) + return ERROR(corruption_detected); + if (tableLog > maxLog) + return ERROR(corruption_detected); + FSE_buildDTable_wksp(DTableSpace, norm, max, tableLog, workspace, workspaceSize); + *DTablePtr = DTableSpace; + return headerSize; + } } } } @@ -823,21 +834,21 @@ size_t ZSTD_decodeSeqHeaders(ZSTD_DCtx *dctx, int *nbSeqPtr, const void *src, si /* Build DTables */ { size_t const llhSize = ZSTD_buildSeqTable(dctx->entropy.LLTable, &dctx->LLTptr, LLtype, MaxLL, LLFSELog, ip, iend - ip, - LL_defaultDTable, dctx->fseEntropy); + LL_defaultDTable, dctx->fseEntropy, dctx->entropy.workspace, sizeof(dctx->entropy.workspace)); if (ZSTD_isError(llhSize)) return ERROR(corruption_detected); ip += llhSize; } { size_t const ofhSize = ZSTD_buildSeqTable(dctx->entropy.OFTable, &dctx->OFTptr, OFtype, MaxOff, OffFSELog, ip, iend - ip, - OF_defaultDTable, dctx->fseEntropy); + OF_defaultDTable, dctx->fseEntropy, dctx->entropy.workspace, sizeof(dctx->entropy.workspace)); if (ZSTD_isError(ofhSize)) return ERROR(corruption_detected); ip += ofhSize; } { size_t const mlhSize = ZSTD_buildSeqTable(dctx->entropy.MLTable, &dctx->MLTptr, MLtype, MaxML, MLFSELog, ip, iend - ip, - ML_defaultDTable, dctx->fseEntropy); + ML_defaultDTable, dctx->fseEntropy, dctx->entropy.workspace, sizeof(dctx->entropy.workspace)); if (ZSTD_isError(mlhSize)) return ERROR(corruption_detected); ip += mlhSize; @@ -1360,10 +1371,11 @@ static size_t ZSTD_decompressSequencesLong(ZSTD_DCtx *dctx, void *dst, size_t ma #define STORED_SEQS 4 #define STOSEQ_MASK (STORED_SEQS - 1) #define ADVANCED_SEQS 4 - seq_t sequences[STORED_SEQS]; + seq_t *sequences = (seq_t *)dctx->entropy.workspace; int const seqAdvance 
= MIN(nbSeq, ADVANCED_SEQS); seqState_t seqState; int seqNb; + ZSTD_STATIC_ASSERT(sizeof(dctx->entropy.workspace) >= sizeof(seq_t) * STORED_SEQS); dctx->fseEntropy = 1; { U32 i; @@ -1866,7 +1878,7 @@ static size_t ZSTD_loadEntropy(ZSTD_entropyTables_t *entropy, const void *const dictPtr += 8; /* skip header = magic + dictID */ { - size_t const hSize = HUF_readDTableX4(entropy->hufTable, dictPtr, dictEnd - dictPtr); + size_t const hSize = HUF_readDTableX4_wksp(entropy->hufTable, dictPtr, dictEnd - dictPtr, entropy->workspace, sizeof(entropy->workspace)); if (HUF_isError(hSize)) return ERROR(dictionary_corrupted); dictPtr += hSize; @@ -1880,7 +1892,7 @@ static size_t ZSTD_loadEntropy(ZSTD_entropyTables_t *entropy, const void *const return ERROR(dictionary_corrupted); if (offcodeLog > OffFSELog) return ERROR(dictionary_corrupted); - CHECK_E(FSE_buildDTable(entropy->OFTable, offcodeNCount, offcodeMaxValue, offcodeLog), dictionary_corrupted); + CHECK_E(FSE_buildDTable_wksp(entropy->OFTable, offcodeNCount, offcodeMaxValue, offcodeLog, entropy->workspace, sizeof(entropy->workspace)), dictionary_corrupted); dictPtr += offcodeHeaderSize; } @@ -1892,7 +1904,7 @@ static size_t ZSTD_loadEntropy(ZSTD_entropyTables_t *entropy, const void *const return ERROR(dictionary_corrupted); if (matchlengthLog > MLFSELog) return ERROR(dictionary_corrupted); - CHECK_E(FSE_buildDTable(entropy->MLTable, matchlengthNCount, matchlengthMaxValue, matchlengthLog), dictionary_corrupted); + CHECK_E(FSE_buildDTable_wksp(entropy->MLTable, matchlengthNCount, matchlengthMaxValue, matchlengthLog, entropy->workspace, sizeof(entropy->workspace)), dictionary_corrupted); dictPtr += matchlengthHeaderSize; } @@ -1904,7 +1916,7 @@ static size_t ZSTD_loadEntropy(ZSTD_entropyTables_t *entropy, const void *const return ERROR(dictionary_corrupted); if (litlengthLog > LLFSELog) return ERROR(dictionary_corrupted); - CHECK_E(FSE_buildDTable(entropy->LLTable, litlengthNCount, litlengthMaxValue, litlengthLog), dictionary_corrupted); + CHECK_E(FSE_buildDTable_wksp(entropy->LLTable, litlengthNCount, litlengthMaxValue, litlengthLog, entropy->workspace, sizeof(entropy->workspace)), dictionary_corrupted); dictPtr += litlengthHeaderSize; } diff --git a/contrib/linux-kernel/lib/zstd/entropy_common.c b/contrib/linux-kernel/lib/zstd/entropy_common.c index b354fc2cd..2b0a643c3 100644 --- a/contrib/linux-kernel/lib/zstd/entropy_common.c +++ b/contrib/linux-kernel/lib/zstd/entropy_common.c @@ -164,7 +164,7 @@ size_t FSE_readNCount(short *normalizedCounter, unsigned *maxSVPtr, unsigned *ta @return : size read from `src` , or an error Code . Note : Needed by HUF_readCTable() and HUF_readDTableX?() . 
*/ -size_t HUF_readStats(BYTE *huffWeight, size_t hwSize, U32 *rankStats, U32 *nbSymbolsPtr, U32 *tableLogPtr, const void *src, size_t srcSize) +size_t HUF_readStats_wksp(BYTE *huffWeight, size_t hwSize, U32 *rankStats, U32 *nbSymbolsPtr, U32 *tableLogPtr, const void *src, size_t srcSize, void *workspace, size_t workspaceSize) { U32 weightTotal; const BYTE *ip = (const BYTE *)src; @@ -192,10 +192,9 @@ size_t HUF_readStats(BYTE *huffWeight, size_t hwSize, U32 *rankStats, U32 *nbSym } } } else { /* header compressed with FSE (normal case) */ - FSE_DTable fseWorkspace[FSE_DTABLE_SIZE_U32(6)]; /* 6 is max possible tableLog for HUF header (maybe even 5, to be tested) */ if (iSize + 1 > srcSize) return ERROR(srcSize_wrong); - oSize = FSE_decompress_wksp(huffWeight, hwSize - 1, ip + 1, iSize, fseWorkspace, 6); /* max (hwSize-1) values decoded, as last one is implied */ + oSize = FSE_decompress_wksp(huffWeight, hwSize - 1, ip + 1, iSize, 6, workspace, workspaceSize); /* max (hwSize-1) values decoded, as last one is implied */ if (FSE_isError(oSize)) return oSize; } diff --git a/contrib/linux-kernel/lib/zstd/fse.h b/contrib/linux-kernel/lib/zstd/fse.h index bc2962aec..7460ab04b 100644 --- a/contrib/linux-kernel/lib/zstd/fse.h +++ b/contrib/linux-kernel/lib/zstd/fse.h @@ -187,7 +187,7 @@ typedef unsigned FSE_DTable; /* don't allocate that. It's just a way to be more /*! FSE_buildDTable(): Builds 'dt', which must be already allocated, using FSE_createDTable(). return : 0, or an errorCode, which can be tested using FSE_isError() */ -FSE_PUBLIC_API size_t FSE_buildDTable(FSE_DTable *dt, const short *normalizedCounter, unsigned maxSymbolValue, unsigned tableLog); +FSE_PUBLIC_API size_t FSE_buildDTable_wksp(FSE_DTable *dt, const short *normalizedCounter, unsigned maxSymbolValue, unsigned tableLog, void *workspace, size_t workspaceSize); /*! FSE_decompress_usingDTable(): Decompress compressed source `cSrc` of size `cSrcSize` using `dt` @@ -263,15 +263,6 @@ size_t FSE_count_simple(unsigned *count, unsigned *maxSymbolValuePtr, const void unsigned FSE_optimalTableLog_internal(unsigned maxTableLog, size_t srcSize, unsigned maxSymbolValue, unsigned minus); /**< same as FSE_optimalTableLog(), which used `minus==2` */ -/* FSE_compress_wksp() : - * Same as FSE_compress2(), but using an externally allocated scratch buffer (`workSpace`). - * FSE_WKSP_SIZE_U32() provides the minimum size required for `workSpace` as a table of FSE_CTable. - */ -#define FSE_WKSP_SIZE_U32(maxTableLog, maxSymbolValue) \ - (FSE_CTABLE_SIZE_U32(maxTableLog, maxSymbolValue) + ((maxTableLog > 12) ? 
(1 << (maxTableLog - 2)) : 1024)) -size_t FSE_compress_wksp(void *dst, size_t dstSize, const void *src, size_t srcSize, unsigned maxSymbolValue, unsigned tableLog, void *workSpace, - size_t wkspSize); - size_t FSE_buildCTable_raw(FSE_CTable *ct, unsigned nbBits); /**< build a fake FSE_CTable, designed for a flat distribution, where each symbol uses nbBits */ @@ -290,7 +281,7 @@ size_t FSE_buildDTable_raw(FSE_DTable *dt, unsigned nbBits); size_t FSE_buildDTable_rle(FSE_DTable *dt, unsigned char symbolValue); /**< build a fake FSE_DTable, designed to always generate the same symbolValue */ -size_t FSE_decompress_wksp(void *dst, size_t dstCapacity, const void *cSrc, size_t cSrcSize, FSE_DTable *workSpace, unsigned maxLog); +size_t FSE_decompress_wksp(void *dst, size_t dstCapacity, const void *cSrc, size_t cSrcSize, unsigned maxLog, void *workspace, size_t workspaceSize); /**< same as FSE_decompress(), using an externally allocated `workSpace` produced with `FSE_DTABLE_SIZE_U32(maxLog)` */ /* ***************************************** diff --git a/contrib/linux-kernel/lib/zstd/fse_compress.c b/contrib/linux-kernel/lib/zstd/fse_compress.c index e016bb177..ef3d1741d 100644 --- a/contrib/linux-kernel/lib/zstd/fse_compress.c +++ b/contrib/linux-kernel/lib/zstd/fse_compress.c @@ -48,6 +48,8 @@ #include "bitstream.h" #include "fse.h" #include +#include +#include #include /* memcpy, memset */ /* ************************************************************** @@ -87,7 +89,7 @@ * wkspSize should be sized to handle worst case situation, which is `1<> 1 : 1); FSE_symbolCompressionTransform *const symbolTT = (FSE_symbolCompressionTransform *)(FSCT); U32 const step = FSE_TABLESTEP(tableSize); - U32 cumul[FSE_MAX_SYMBOL_VALUE + 2]; - - FSE_FUNCTION_TYPE *const tableSymbol = (FSE_FUNCTION_TYPE *)workSpace; U32 highThreshold = tableSize - 1; - /* CTable header */ - if (((size_t)1 << tableLog) * sizeof(FSE_FUNCTION_TYPE) > wkspSize) + U32 *cumul; + FSE_FUNCTION_TYPE *tableSymbol; + size_t spaceUsed32 = 0; + + cumul = (U32 *)workspace + spaceUsed32; + spaceUsed32 += FSE_MAX_SYMBOL_VALUE + 2; + tableSymbol = (FSE_FUNCTION_TYPE *)((U32 *)workspace + spaceUsed32); + spaceUsed32 += ALIGN(sizeof(FSE_FUNCTION_TYPE) * ((size_t)1 << tableLog), sizeof(U32)) >> 2; + + if ((spaceUsed32 << 2) > workspaceSize) return ERROR(tableLog_tooLarge); + workspace = (U32 *)workspace + spaceUsed32; + workspaceSize -= (spaceUsed32 << 2); + + /* CTable header */ tableU16[-2] = (U16)tableLog; tableU16[-1] = (U16)maxSymbolValue; @@ -575,7 +586,7 @@ static size_t FSE_normalizeM2(short *norm, U32 tableLog, const unsigned *count, { U64 const vStepLog = 62 - tableLog; U64 const mid = (1ULL << (vStepLog - 1)) - 1; - U64 const rStep = ((((U64)1 << vStepLog) * ToDistribute) + mid) / total; /* scale on remaining */ + U64 const rStep = div_u64((((U64)1 << vStepLog) * ToDistribute) + mid, (U32)total); /* scale on remaining */ U64 tmpTotal = mid; for (s = 0; s <= maxSymbolValue; s++) { if (norm[s] == NOT_YET_ASSIGNED) { @@ -609,7 +620,7 @@ size_t FSE_normalizeCount(short *normalizedCounter, unsigned tableLog, const uns { U32 const rtbTable[] = {0, 473195, 504333, 520860, 550000, 700000, 750000, 830000}; U64 const scale = 62 - tableLog; - U64 const step = ((U64)1 << 62) / total; /* <== here, one division ! */ + U64 const step = div_u64((U64)1 << 62, (U32)total); /* <== here, one division ! 
*/ U64 const vStep = 1ULL << (scale - 20); int stillToDistribute = 1 << tableLog; unsigned s; @@ -782,76 +793,3 @@ size_t FSE_compress_usingCTable(void *dst, size_t dstSize, const void *src, size } size_t FSE_compressBound(size_t size) { return FSE_COMPRESSBOUND(size); } - -#define CHECK_V_F(e, f) \ - size_t const e = f; \ - if (ERR_isError(e)) \ - return f -#define CHECK_F(f) \ - { \ - CHECK_V_F(_var_err__, f); \ - } - -/* FSE_compress_wksp() : - * Same as FSE_compress2(), but using an externally allocated scratch buffer (`workSpace`). - * `wkspSize` size must be `(1< not compressible */ - if (maxCount < (srcSize >> 7)) - return 0; /* Heuristic : not compressible enough */ - } - - tableLog = FSE_optimalTableLog(tableLog, srcSize, maxSymbolValue); - CHECK_F(FSE_normalizeCount(norm, tableLog, count, srcSize, maxSymbolValue)); - - /* Write table description header */ - { - CHECK_V_F(nc_err, FSE_writeNCount(op, oend - op, norm, maxSymbolValue, tableLog)); - op += nc_err; - } - - /* Compress */ - CHECK_F(FSE_buildCTable_wksp(CTable, norm, maxSymbolValue, tableLog, scratchBuffer, scratchBufferSize)); - { - CHECK_V_F(cSize, FSE_compress_usingCTable(op, oend - op, src, srcSize, CTable)); - if (cSize == 0) - return 0; /* not enough space for compressed data */ - op += cSize; - } - - /* check compressibility */ - if ((size_t)(op - ostart) >= srcSize - 1) - return 0; - - return op - ostart; -} diff --git a/contrib/linux-kernel/lib/zstd/fse_decompress.c b/contrib/linux-kernel/lib/zstd/fse_decompress.c index 96cf89ff9..a84300e5a 100644 --- a/contrib/linux-kernel/lib/zstd/fse_decompress.c +++ b/contrib/linux-kernel/lib/zstd/fse_decompress.c @@ -48,6 +48,7 @@ #include "bitstream.h" #include "fse.h" #include +#include #include /* memcpy, memset */ /* ************************************************************** @@ -91,17 +92,19 @@ /* Function templates */ -size_t FSE_buildDTable(FSE_DTable *dt, const short *normalizedCounter, unsigned maxSymbolValue, unsigned tableLog) +size_t FSE_buildDTable_wksp(FSE_DTable *dt, const short *normalizedCounter, unsigned maxSymbolValue, unsigned tableLog, void *workspace, size_t workspaceSize) { void *const tdPtr = dt + 1; /* because *dt is unsigned, 32-bits aligned on 32-bits */ FSE_DECODE_TYPE *const tableDecode = (FSE_DECODE_TYPE *)(tdPtr); - U16 symbolNext[FSE_MAX_SYMBOL_VALUE + 1]; + U16 *symbolNext = (U16 *)workspace; U32 const maxSV1 = maxSymbolValue + 1; U32 const tableSize = 1 << tableLog; U32 highThreshold = tableSize - 1; /* Sanity Checks */ + if (workspaceSize < sizeof(U16) * (FSE_MAX_SYMBOL_VALUE + 1)) + return ERROR(tableLog_tooLarge); if (maxSymbolValue > FSE_MAX_SYMBOL_VALUE) return ERROR(maxSymbolValue_tooLarge); if (tableLog > FSE_MAX_TABLELOG) @@ -288,16 +291,32 @@ size_t FSE_decompress_usingDTable(void *dst, size_t originalSize, const void *cS return FSE_decompress_usingDTable_generic(dst, originalSize, cSrc, cSrcSize, dt, 0); } -size_t FSE_decompress_wksp(void *dst, size_t dstCapacity, const void *cSrc, size_t cSrcSize, FSE_DTable *workSpace, unsigned maxLog) +size_t FSE_decompress_wksp(void *dst, size_t dstCapacity, const void *cSrc, size_t cSrcSize, unsigned maxLog, void *workspace, size_t workspaceSize) { const BYTE *const istart = (const BYTE *)cSrc; const BYTE *ip = istart; - short counting[FSE_MAX_SYMBOL_VALUE + 1]; unsigned tableLog; unsigned maxSymbolValue = FSE_MAX_SYMBOL_VALUE; + size_t NCountLength; + + FSE_DTable *dt; + short *counting; + size_t spaceUsed32 = 0; + + FSE_STATIC_ASSERT(sizeof(FSE_DTable) == sizeof(U32)); + + dt = 
(FSE_DTable *)((U32 *)workspace + spaceUsed32); + spaceUsed32 += FSE_DTABLE_SIZE_U32(maxLog); + counting = (short *)((U32 *)workspace + spaceUsed32); + spaceUsed32 += ALIGN(sizeof(short) * (FSE_MAX_SYMBOL_VALUE + 1), sizeof(U32)) >> 2; + + if ((spaceUsed32 << 2) > workspaceSize) + return ERROR(tableLog_tooLarge); + workspace = (U32 *)workspace + spaceUsed32; + workspaceSize -= (spaceUsed32 << 2); /* normal FSE decoding mode */ - size_t const NCountLength = FSE_readNCount(counting, &maxSymbolValue, &tableLog, istart, cSrcSize); + NCountLength = FSE_readNCount(counting, &maxSymbolValue, &tableLog, istart, cSrcSize); if (FSE_isError(NCountLength)) return NCountLength; // if (NCountLength >= cSrcSize) return ERROR(srcSize_wrong); /* too small input size; supposed to be already checked in NCountLength, only remaining @@ -307,7 +326,7 @@ size_t FSE_decompress_wksp(void *dst, size_t dstCapacity, const void *cSrc, size ip += NCountLength; cSrcSize -= NCountLength; - CHECK_F(FSE_buildDTable(workSpace, counting, maxSymbolValue, tableLog)); + CHECK_F(FSE_buildDTable_wksp(dt, counting, maxSymbolValue, tableLog, workspace, workspaceSize)); - return FSE_decompress_usingDTable(dst, dstCapacity, ip, cSrcSize, workSpace); /* always return, even if it is an error code */ + return FSE_decompress_usingDTable(dst, dstCapacity, ip, cSrcSize, dt); /* always return, even if it is an error code */ } diff --git a/contrib/linux-kernel/lib/zstd/huf.h b/contrib/linux-kernel/lib/zstd/huf.h index 56abe2f1c..2143da28d 100644 --- a/contrib/linux-kernel/lib/zstd/huf.h +++ b/contrib/linux-kernel/lib/zstd/huf.h @@ -55,7 +55,7 @@ unsigned HUF_isError(size_t code); /**< tells if a return value is an error code /** HUF_compress4X_wksp() : * Same as HUF_compress2(), but uses externally allocated `workSpace`, which must be a table of >= 1024 unsigned */ size_t HUF_compress4X_wksp(void *dst, size_t dstSize, const void *src, size_t srcSize, unsigned maxSymbolValue, unsigned tableLog, void *workSpace, - size_t wkspSize); /**< `workSpace` must be a table of at least HUF_WORKSPACE_SIZE_U32 unsigned */ + size_t wkspSize); /**< `workSpace` must be a table of at least HUF_COMPRESS_WORKSPACE_SIZE_U32 unsigned */ /* *** Dependencies *** */ #include "mem.h" /* U32 */ @@ -91,17 +91,23 @@ typedef U32 HUF_DTable; #define HUF_CREATE_STATIC_DTABLEX4(DTable, maxTableLog) HUF_DTable DTable[HUF_DTABLE_SIZE(maxTableLog)] = {((U32)(maxTableLog)*0x01000001)} /* The workspace must have alignment at least 4 and be at least this large */ -#define HUF_WORKSPACE_SIZE (6 << 10) -#define HUF_WORKSPACE_SIZE_U32 (HUF_WORKSPACE_SIZE / sizeof(U32)) +#define HUF_COMPRESS_WORKSPACE_SIZE (6 << 10) +#define HUF_COMPRESS_WORKSPACE_SIZE_U32 (HUF_COMPRESS_WORKSPACE_SIZE / sizeof(U32)) + +/* The workspace must have alignment at least 4 and be at least this large */ +#define HUF_DECOMPRESS_WORKSPACE_SIZE (3 << 10) +#define HUF_DECOMPRESS_WORKSPACE_SIZE_U32 (HUF_DECOMPRESS_WORKSPACE_SIZE / sizeof(U32)) /* **************************************** * Advanced decompression functions ******************************************/ -size_t HUF_decompress4X_DCtx(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize); /**< decodes RLE and uncompressed */ -size_t HUF_decompress4X_hufOnly(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, - size_t cSrcSize); /**< considers RLE and uncompressed as errors */ -size_t HUF_decompress4X2_DCtx(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize); /**< single-symbol decoder */ 
-size_t HUF_decompress4X4_DCtx(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize); /**< double-symbols decoder */ +size_t HUF_decompress4X_DCtx_wksp(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize, void *workspace, size_t workspaceSize); /**< decodes RLE and uncompressed */ +size_t HUF_decompress4X_hufOnly_wksp(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize, void *workspace, + size_t workspaceSize); /**< considers RLE and uncompressed as errors */ +size_t HUF_decompress4X2_DCtx_wksp(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize, void *workspace, + size_t workspaceSize); /**< single-symbol decoder */ +size_t HUF_decompress4X4_DCtx_wksp(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize, void *workspace, + size_t workspaceSize); /**< double-symbols decoder */ /* **************************************** * HUF detailed API @@ -111,7 +117,7 @@ HUF_compress() does the following: 1. count symbol occurrence from source[] into table count[] using FSE_count() 2. (optional) refine tableLog using HUF_optimalTableLog() 3. build Huffman table from count using HUF_buildCTable() -4. save Huffman table to memory buffer using HUF_writeCTable() +4. save Huffman table to memory buffer using HUF_writeCTable_wksp() 5. encode the data stream using HUF_compress4X_usingCTable() The following API allows targeting specific sub-functions for advanced tasks. @@ -121,7 +127,7 @@ or to save and regenerate 'CTable' using external methods. /* FSE_count() : find it within "fse.h" */ unsigned HUF_optimalTableLog(unsigned maxTableLog, size_t srcSize, unsigned maxSymbolValue); typedef struct HUF_CElt_s HUF_CElt; /* incomplete type */ -size_t HUF_writeCTable(void *dst, size_t maxDstSize, const HUF_CElt *CTable, unsigned maxSymbolValue, unsigned huffLog); +size_t HUF_writeCTable_wksp(void *dst, size_t maxDstSize, const HUF_CElt *CTable, unsigned maxSymbolValue, unsigned huffLog, void *workspace, size_t workspaceSize); size_t HUF_compress4X_usingCTable(void *dst, size_t dstSize, const void *src, size_t srcSize, const HUF_CElt *CTable); typedef enum { @@ -137,7 +143,7 @@ typedef enum { * If preferRepeat then the old table will always be used if valid. */ size_t HUF_compress4X_repeat(void *dst, size_t dstSize, const void *src, size_t srcSize, unsigned maxSymbolValue, unsigned tableLog, void *workSpace, size_t wkspSize, HUF_CElt *hufTable, HUF_repeat *repeat, - int preferRepeat); /**< `workSpace` must be a table of at least HUF_WORKSPACE_SIZE_U32 unsigned */ + int preferRepeat); /**< `workSpace` must be a table of at least HUF_COMPRESS_WORKSPACE_SIZE_U32 unsigned */ /** HUF_buildCTable_wksp() : * Same as HUF_buildCTable(), but using externally allocated scratch buffer. @@ -150,11 +156,12 @@ size_t HUF_buildCTable_wksp(HUF_CElt *tree, const U32 *count, U32 maxSymbolValue `huffWeight` is destination buffer. @return : size read from `src` , or an error Code . Note : Needed by HUF_readCTable() and HUF_readDTableXn() . 
*/ -size_t HUF_readStats(BYTE *huffWeight, size_t hwSize, U32 *rankStats, U32 *nbSymbolsPtr, U32 *tableLogPtr, const void *src, size_t srcSize); +size_t HUF_readStats_wksp(BYTE *huffWeight, size_t hwSize, U32 *rankStats, U32 *nbSymbolsPtr, U32 *tableLogPtr, const void *src, size_t srcSize, + void *workspace, size_t workspaceSize); /** HUF_readCTable() : * Loading a CTable saved with HUF_writeCTable() */ -size_t HUF_readCTable(HUF_CElt *CTable, unsigned maxSymbolValue, const void *src, size_t srcSize); +size_t HUF_readCTable_wksp(HUF_CElt *CTable, unsigned maxSymbolValue, const void *src, size_t srcSize, void *workspace, size_t workspaceSize); /* HUF_decompress() does the following: @@ -170,8 +177,8 @@ HUF_decompress() does the following: * Assumption : 0 < cSrcSize < dstSize <= 128 KB */ U32 HUF_selectDecoder(size_t dstSize, size_t cSrcSize); -size_t HUF_readDTableX2(HUF_DTable *DTable, const void *src, size_t srcSize); -size_t HUF_readDTableX4(HUF_DTable *DTable, const void *src, size_t srcSize); +size_t HUF_readDTableX2_wksp(HUF_DTable *DTable, const void *src, size_t srcSize, void *workspace, size_t workspaceSize); +size_t HUF_readDTableX4_wksp(HUF_DTable *DTable, const void *src, size_t srcSize, void *workspace, size_t workspaceSize); size_t HUF_decompress4X_usingDTable(void *dst, size_t maxDstSize, const void *cSrc, size_t cSrcSize, const HUF_DTable *DTable); size_t HUF_decompress4X2_usingDTable(void *dst, size_t maxDstSize, const void *cSrc, size_t cSrcSize, const HUF_DTable *DTable); @@ -180,7 +187,7 @@ size_t HUF_decompress4X4_usingDTable(void *dst, size_t maxDstSize, const void *c /* single stream variants */ size_t HUF_compress1X_wksp(void *dst, size_t dstSize, const void *src, size_t srcSize, unsigned maxSymbolValue, unsigned tableLog, void *workSpace, - size_t wkspSize); /**< `workSpace` must be a table of at least HUF_WORKSPACE_SIZE_U32 unsigned */ + size_t wkspSize); /**< `workSpace` must be a table of at least HUF_COMPRESS_WORKSPACE_SIZE_U32 unsigned */ size_t HUF_compress1X_usingCTable(void *dst, size_t dstSize, const void *src, size_t srcSize, const HUF_CElt *CTable); /** HUF_compress1X_repeat() : * Same as HUF_compress1X_wksp(), but considers using hufTable if *repeat != HUF_repeat_none. @@ -189,11 +196,13 @@ size_t HUF_compress1X_usingCTable(void *dst, size_t dstSize, const void *src, si * If preferRepeat then the old table will always be used if valid. 
*/ size_t HUF_compress1X_repeat(void *dst, size_t dstSize, const void *src, size_t srcSize, unsigned maxSymbolValue, unsigned tableLog, void *workSpace, size_t wkspSize, HUF_CElt *hufTable, HUF_repeat *repeat, - int preferRepeat); /**< `workSpace` must be a table of at least HUF_WORKSPACE_SIZE_U32 unsigned */ + int preferRepeat); /**< `workSpace` must be a table of at least HUF_COMPRESS_WORKSPACE_SIZE_U32 unsigned */ -size_t HUF_decompress1X_DCtx(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize); -size_t HUF_decompress1X2_DCtx(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize); /**< single-symbol decoder */ -size_t HUF_decompress1X4_DCtx(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize); /**< double-symbols decoder */ +size_t HUF_decompress1X_DCtx_wksp(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize, void *workspace, size_t workspaceSize); +size_t HUF_decompress1X2_DCtx_wksp(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize, void *workspace, + size_t workspaceSize); /**< single-symbol decoder */ +size_t HUF_decompress1X4_DCtx_wksp(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize, void *workspace, + size_t workspaceSize); /**< double-symbols decoder */ size_t HUF_decompress1X_usingDTable(void *dst, size_t maxDstSize, const void *cSrc, size_t cSrcSize, const HUF_DTable *DTable); /**< automatic selection of sing or double symbol decoder, based on DTable */ diff --git a/contrib/linux-kernel/lib/zstd/huf_compress.c b/contrib/linux-kernel/lib/zstd/huf_compress.c index e82a136a1..0361f387f 100644 --- a/contrib/linux-kernel/lib/zstd/huf_compress.c +++ b/contrib/linux-kernel/lib/zstd/huf_compress.c @@ -43,6 +43,7 @@ #include "bitstream.h" #include "fse.h" /* header compression */ #include "huf.h" +#include #include /* memcpy, memset */ /* ************************************************************** @@ -78,7 +79,7 @@ unsigned HUF_optimalTableLog(unsigned maxTableLog, size_t srcSize, unsigned maxS * Note : all elements within weightTable are supposed to be <= HUF_TABLELOG_MAX. 
*/ #define MAX_FSE_TABLELOG_FOR_HUFF_HEADER 6 -size_t HUF_compressWeights(void *dst, size_t dstSize, const void *weightTable, size_t wtSize) +size_t HUF_compressWeights_wksp(void *dst, size_t dstSize, const void *weightTable, size_t wtSize, void *workspace, size_t workspaceSize) { BYTE *const ostart = (BYTE *)dst; BYTE *op = ostart; @@ -87,11 +88,24 @@ size_t HUF_compressWeights(void *dst, size_t dstSize, const void *weightTable, s U32 maxSymbolValue = HUF_TABLELOG_MAX; U32 tableLog = MAX_FSE_TABLELOG_FOR_HUFF_HEADER; - FSE_CTable CTable[FSE_CTABLE_SIZE_U32(MAX_FSE_TABLELOG_FOR_HUFF_HEADER, HUF_TABLELOG_MAX)]; - BYTE scratchBuffer[1 << MAX_FSE_TABLELOG_FOR_HUFF_HEADER]; + FSE_CTable *CTable; + U32 *count; + S16 *norm; + size_t spaceUsed32 = 0; - U32 count[HUF_TABLELOG_MAX + 1]; - S16 norm[HUF_TABLELOG_MAX + 1]; + HUF_STATIC_ASSERT(sizeof(FSE_CTable) == sizeof(U32)); + + CTable = (FSE_CTable *)((U32 *)workspace + spaceUsed32); + spaceUsed32 += FSE_CTABLE_SIZE_U32(MAX_FSE_TABLELOG_FOR_HUFF_HEADER, HUF_TABLELOG_MAX); + count = (U32 *)workspace + spaceUsed32; + spaceUsed32 += HUF_TABLELOG_MAX + 1; + norm = (S16 *)((U32 *)workspace + spaceUsed32); + spaceUsed32 += ALIGN(sizeof(S16) * (HUF_TABLELOG_MAX + 1), sizeof(U32)) >> 2; + + if ((spaceUsed32 << 2) > workspaceSize) + return ERROR(tableLog_tooLarge); + workspace = (U32 *)workspace + spaceUsed32; + workspaceSize -= (spaceUsed32 << 2); /* init conditions */ if (wtSize <= 1) @@ -116,7 +130,7 @@ size_t HUF_compressWeights(void *dst, size_t dstSize, const void *weightTable, s } /* Compress */ - CHECK_F(FSE_buildCTable_wksp(CTable, norm, maxSymbolValue, tableLog, scratchBuffer, sizeof(scratchBuffer))); + CHECK_F(FSE_buildCTable_wksp(CTable, norm, maxSymbolValue, tableLog, workspace, workspaceSize)); { CHECK_V_F(cSize, FSE_compress_usingCTable(op, oend - op, weightTable, wtSize, CTable)); if (cSize == 0) @@ -132,16 +146,28 @@ struct HUF_CElt_s { BYTE nbBits; }; /* typedef'd to HUF_CElt within "huf.h" */ -/*! HUF_writeCTable() : +/*! HUF_writeCTable_wksp() : `CTable` : Huffman tree to save, using huf representation. 
@return : size of saved CTable */ -size_t HUF_writeCTable(void *dst, size_t maxDstSize, const HUF_CElt *CTable, U32 maxSymbolValue, U32 huffLog) +size_t HUF_writeCTable_wksp(void *dst, size_t maxDstSize, const HUF_CElt *CTable, U32 maxSymbolValue, U32 huffLog, void *workspace, size_t workspaceSize) { - BYTE bitsToWeight[HUF_TABLELOG_MAX + 1]; /* precomputed conversion table */ - BYTE huffWeight[HUF_SYMBOLVALUE_MAX]; BYTE *op = (BYTE *)dst; U32 n; + BYTE *bitsToWeight; + BYTE *huffWeight; + size_t spaceUsed32 = 0; + + bitsToWeight = (BYTE *)((U32 *)workspace + spaceUsed32); + spaceUsed32 += ALIGN(HUF_TABLELOG_MAX + 1, sizeof(U32)) >> 2; + huffWeight = (BYTE *)((U32 *)workspace + spaceUsed32); + spaceUsed32 += ALIGN(HUF_SYMBOLVALUE_MAX, sizeof(U32)) >> 2; + + if ((spaceUsed32 << 2) > workspaceSize) + return ERROR(tableLog_tooLarge); + workspace = (U32 *)workspace + spaceUsed32; + workspaceSize -= (spaceUsed32 << 2); + /* check conditions */ if (maxSymbolValue > HUF_SYMBOLVALUE_MAX) return ERROR(maxSymbolValue_tooLarge); @@ -155,7 +181,7 @@ size_t HUF_writeCTable(void *dst, size_t maxDstSize, const HUF_CElt *CTable, U32 /* attempt weights compression by FSE */ { - CHECK_V_F(hSize, HUF_compressWeights(op + 1, maxDstSize - 1, huffWeight, maxSymbolValue)); + CHECK_V_F(hSize, HUF_compressWeights_wksp(op + 1, maxDstSize - 1, huffWeight, maxSymbolValue, workspace, workspaceSize)); if ((hSize > 1) & (hSize < maxSymbolValue / 2)) { /* FSE compressed */ op[0] = (BYTE)hSize; return hSize + 1; @@ -174,15 +200,29 @@ size_t HUF_writeCTable(void *dst, size_t maxDstSize, const HUF_CElt *CTable, U32 return ((maxSymbolValue + 1) / 2) + 1; } -size_t HUF_readCTable(HUF_CElt *CTable, U32 maxSymbolValue, const void *src, size_t srcSize) +size_t HUF_readCTable_wksp(HUF_CElt *CTable, U32 maxSymbolValue, const void *src, size_t srcSize, void *workspace, size_t workspaceSize) { - BYTE huffWeight[HUF_SYMBOLVALUE_MAX + 1]; /* init not required, even though some static analyzer may complain */ - U32 rankVal[HUF_TABLELOG_ABSOLUTEMAX + 1]; /* large enough for values from 0 to 16 */ + U32 *rankVal; + BYTE *huffWeight; U32 tableLog = 0; U32 nbSymbols = 0; + size_t readSize; + size_t spaceUsed32 = 0; + + rankVal = (U32 *)workspace + spaceUsed32; + spaceUsed32 += HUF_TABLELOG_ABSOLUTEMAX + 1; + huffWeight = (BYTE *)((U32 *)workspace + spaceUsed32); + spaceUsed32 += ALIGN(HUF_SYMBOLVALUE_MAX + 1, sizeof(U32)) >> 2; + + if ((spaceUsed32 << 2) > workspaceSize) + return ERROR(tableLog_tooLarge); + workspace = (U32 *)workspace + spaceUsed32; + workspaceSize -= (spaceUsed32 << 2); /* get symbol weights */ - CHECK_V_F(readSize, HUF_readStats(huffWeight, HUF_SYMBOLVALUE_MAX + 1, rankVal, &nbSymbols, &tableLog, src, srcSize)); + readSize = HUF_readStats_wksp(huffWeight, HUF_SYMBOLVALUE_MAX + 1, rankVal, &nbSymbols, &tableLog, src, srcSize, workspace, workspaceSize); + if (ERR_isError(readSize)) + return readSize; /* check result */ if (tableLog > HUF_TABLELOG_MAX) @@ -680,7 +720,7 @@ static size_t HUF_compress_internal(void *dst, size_t dstSize, const void *src, /* Write table description header */ { - CHECK_V_F(hSize, HUF_writeCTable(op, dstSize, CTable, maxSymbolValue, huffLog)); + CHECK_V_F(hSize, HUF_writeCTable_wksp(op, dstSize, CTable, maxSymbolValue, huffLog, workSpace, wkspSize)); /* Check if using the previous table will be beneficial */ if (repeat && *repeat != HUF_repeat_none) { size_t const oldSize = HUF_estimateCompressedSize(oldHufTable, count, maxSymbolValue); diff --git a/contrib/linux-kernel/lib/zstd/huf_decompress.c 
b/contrib/linux-kernel/lib/zstd/huf_decompress.c index 950c19443..652648204 100644 --- a/contrib/linux-kernel/lib/zstd/huf_decompress.c +++ b/contrib/linux-kernel/lib/zstd/huf_decompress.c @@ -49,6 +49,7 @@ #include "fse.h" /* header compression */ #include "huf.h" #include +#include #include /* memcpy, memset */ /* ************************************************************** @@ -86,20 +87,32 @@ typedef struct { BYTE nbBits; } HUF_DEltX2; /* single-symbol decoding */ -size_t HUF_readDTableX2(HUF_DTable *DTable, const void *src, size_t srcSize) +size_t HUF_readDTableX2_wksp(HUF_DTable *DTable, const void *src, size_t srcSize, void *workspace, size_t workspaceSize) { - BYTE huffWeight[HUF_SYMBOLVALUE_MAX + 1]; - U32 rankVal[HUF_TABLELOG_ABSOLUTEMAX + 1]; /* large enough for values from 0 to 16 */ U32 tableLog = 0; U32 nbSymbols = 0; size_t iSize; void *const dtPtr = DTable + 1; HUF_DEltX2 *const dt = (HUF_DEltX2 *)dtPtr; + U32 *rankVal; + BYTE *huffWeight; + size_t spaceUsed32 = 0; + + rankVal = (U32 *)workspace + spaceUsed32; + spaceUsed32 += HUF_TABLELOG_ABSOLUTEMAX + 1; + huffWeight = (BYTE *)((U32 *)workspace + spaceUsed32); + spaceUsed32 += ALIGN(HUF_SYMBOLVALUE_MAX + 1, sizeof(U32)) >> 2; + + if ((spaceUsed32 << 2) > workspaceSize) + return ERROR(tableLog_tooLarge); + workspace = (U32 *)workspace + spaceUsed32; + workspaceSize -= (spaceUsed32 << 2); + HUF_STATIC_ASSERT(sizeof(DTableDesc) == sizeof(HUF_DTable)); /* memset(huffWeight, 0, sizeof(huffWeight)); */ /* is not necessary, even though some analyzer complain ... */ - iSize = HUF_readStats(huffWeight, HUF_SYMBOLVALUE_MAX + 1, rankVal, &nbSymbols, &tableLog, src, srcSize); + iSize = HUF_readStats_wksp(huffWeight, HUF_SYMBOLVALUE_MAX + 1, rankVal, &nbSymbols, &tableLog, src, srcSize, workspace, workspaceSize); if (HUF_isError(iSize)) return iSize; @@ -216,11 +229,11 @@ size_t HUF_decompress1X2_usingDTable(void *dst, size_t dstSize, const void *cSrc return HUF_decompress1X2_usingDTable_internal(dst, dstSize, cSrc, cSrcSize, DTable); } -size_t HUF_decompress1X2_DCtx(HUF_DTable *DCtx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize) +size_t HUF_decompress1X2_DCtx_wksp(HUF_DTable *DCtx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize, void *workspace, size_t workspaceSize) { const BYTE *ip = (const BYTE *)cSrc; - size_t const hSize = HUF_readDTableX2(DCtx, cSrc, cSrcSize); + size_t const hSize = HUF_readDTableX2_wksp(DCtx, cSrc, cSrcSize, workspace, workspaceSize); if (HUF_isError(hSize)) return hSize; if (hSize >= cSrcSize) @@ -347,11 +360,11 @@ size_t HUF_decompress4X2_usingDTable(void *dst, size_t dstSize, const void *cSrc return HUF_decompress4X2_usingDTable_internal(dst, dstSize, cSrc, cSrcSize, DTable); } -size_t HUF_decompress4X2_DCtx(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize) +size_t HUF_decompress4X2_DCtx_wksp(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize, void *workspace, size_t workspaceSize) { const BYTE *ip = (const BYTE *)cSrc; - size_t const hSize = HUF_readDTableX2(dctx, cSrc, cSrcSize); + size_t const hSize = HUF_readDTableX2_wksp(dctx, cSrc, cSrcSize, workspace, workspaceSize); if (HUF_isError(hSize)) return hSize; if (hSize >= cSrcSize) @@ -422,6 +435,7 @@ static void HUF_fillDTableX4Level2(HUF_DEltX4 *DTable, U32 sizeLog, const U32 co } typedef U32 rankVal_t[HUF_TABLELOG_MAX][HUF_TABLELOG_MAX + 1]; +typedef U32 rankValCol_t[HUF_TABLELOG_MAX + 1]; static void HUF_fillDTableX4(HUF_DEltX4 *DTable, const U32 targetLog, const 
sortedSymbol_t *sortedList, const U32 sortedListSize, const U32 *rankStart, rankVal_t rankValOrigin, const U32 maxWeight, const U32 nbBitsBaseline) @@ -465,27 +479,50 @@ static void HUF_fillDTableX4(HUF_DEltX4 *DTable, const U32 targetLog, const sort } } -size_t HUF_readDTableX4(HUF_DTable *DTable, const void *src, size_t srcSize) +size_t HUF_readDTableX4_wksp(HUF_DTable *DTable, const void *src, size_t srcSize, void *workspace, size_t workspaceSize) { - BYTE weightList[HUF_SYMBOLVALUE_MAX + 1]; - sortedSymbol_t sortedSymbol[HUF_SYMBOLVALUE_MAX + 1]; - U32 rankStats[HUF_TABLELOG_MAX + 1] = {0}; - U32 rankStart0[HUF_TABLELOG_MAX + 2] = {0}; - U32 *const rankStart = rankStart0 + 1; - rankVal_t rankVal; U32 tableLog, maxW, sizeOfSort, nbSymbols; DTableDesc dtd = HUF_getDTableDesc(DTable); U32 const maxTableLog = dtd.maxTableLog; size_t iSize; void *dtPtr = DTable + 1; /* force compiler to avoid strict-aliasing */ HUF_DEltX4 *const dt = (HUF_DEltX4 *)dtPtr; + U32 *rankStart; + + rankValCol_t *rankVal; + U32 *rankStats; + U32 *rankStart0; + sortedSymbol_t *sortedSymbol; + BYTE *weightList; + size_t spaceUsed32 = 0; + + HUF_STATIC_ASSERT((sizeof(rankValCol_t) & 3) == 0); + + rankVal = (rankValCol_t *)((U32 *)workspace + spaceUsed32); + spaceUsed32 += (sizeof(rankValCol_t) * HUF_TABLELOG_MAX) >> 2; + rankStats = (U32 *)workspace + spaceUsed32; + spaceUsed32 += HUF_TABLELOG_MAX + 1; + rankStart0 = (U32 *)workspace + spaceUsed32; + spaceUsed32 += HUF_TABLELOG_MAX + 2; + sortedSymbol = (sortedSymbol_t *)((U32 *)workspace + spaceUsed32); + spaceUsed32 += ALIGN(sizeof(sortedSymbol_t) * (HUF_SYMBOLVALUE_MAX + 1), sizeof(U32)) >> 2; + weightList = (BYTE *)((U32 *)workspace + spaceUsed32); + spaceUsed32 += ALIGN(HUF_SYMBOLVALUE_MAX + 1, sizeof(U32)) >> 2; + + if ((spaceUsed32 << 2) > workspaceSize) + return ERROR(tableLog_tooLarge); + workspace = (U32 *)workspace + spaceUsed32; + workspaceSize -= (spaceUsed32 << 2); + + rankStart = rankStart0 + 1; + memset(rankStats, 0, sizeof(U32) * (2 * HUF_TABLELOG_MAX + 2 + 1)); HUF_STATIC_ASSERT(sizeof(HUF_DEltX4) == sizeof(HUF_DTable)); /* if compiler fails here, assertion is wrong */ if (maxTableLog > HUF_TABLELOG_MAX) return ERROR(tableLog_tooLarge); /* memset(weightList, 0, sizeof(weightList)); */ /* is not necessary, even though some analyzer complain ... 
*/ - iSize = HUF_readStats(weightList, HUF_SYMBOLVALUE_MAX + 1, rankStats, &nbSymbols, &tableLog, src, srcSize); + iSize = HUF_readStats_wksp(weightList, HUF_SYMBOLVALUE_MAX + 1, rankStats, &nbSymbols, &tableLog, src, srcSize, workspace, workspaceSize); if (HUF_isError(iSize)) return iSize; @@ -652,11 +689,11 @@ size_t HUF_decompress1X4_usingDTable(void *dst, size_t dstSize, const void *cSrc return HUF_decompress1X4_usingDTable_internal(dst, dstSize, cSrc, cSrcSize, DTable); } -size_t HUF_decompress1X4_DCtx(HUF_DTable *DCtx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize) +size_t HUF_decompress1X4_DCtx_wksp(HUF_DTable *DCtx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize, void *workspace, size_t workspaceSize) { const BYTE *ip = (const BYTE *)cSrc; - size_t const hSize = HUF_readDTableX4(DCtx, cSrc, cSrcSize); + size_t const hSize = HUF_readDTableX4_wksp(DCtx, cSrc, cSrcSize, workspace, workspaceSize); if (HUF_isError(hSize)) return hSize; if (hSize >= cSrcSize) @@ -785,11 +822,11 @@ size_t HUF_decompress4X4_usingDTable(void *dst, size_t dstSize, const void *cSrc return HUF_decompress4X4_usingDTable_internal(dst, dstSize, cSrc, cSrcSize, DTable); } -size_t HUF_decompress4X4_DCtx(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize) +size_t HUF_decompress4X4_DCtx_wksp(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize, void *workspace, size_t workspaceSize) { const BYTE *ip = (const BYTE *)cSrc; - size_t hSize = HUF_readDTableX4(dctx, cSrc, cSrcSize); + size_t hSize = HUF_readDTableX4_wksp(dctx, cSrc, cSrcSize, workspace, workspaceSize); if (HUF_isError(hSize)) return hSize; if (hSize >= cSrcSize) @@ -861,7 +898,7 @@ U32 HUF_selectDecoder(size_t dstSize, size_t cSrcSize) typedef size_t (*decompressionAlgo)(void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize); -size_t HUF_decompress4X_DCtx(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize) +size_t HUF_decompress4X_DCtx_wksp(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize, void *workspace, size_t workspaceSize) { /* validation checks */ if (dstSize == 0) @@ -879,11 +916,12 @@ size_t HUF_decompress4X_DCtx(HUF_DTable *dctx, void *dst, size_t dstSize, const { U32 const algoNb = HUF_selectDecoder(dstSize, cSrcSize); - return algoNb ? HUF_decompress4X4_DCtx(dctx, dst, dstSize, cSrc, cSrcSize) : HUF_decompress4X2_DCtx(dctx, dst, dstSize, cSrc, cSrcSize); + return algoNb ? HUF_decompress4X4_DCtx_wksp(dctx, dst, dstSize, cSrc, cSrcSize, workspace, workspaceSize) + : HUF_decompress4X2_DCtx_wksp(dctx, dst, dstSize, cSrc, cSrcSize, workspace, workspaceSize); } } -size_t HUF_decompress4X_hufOnly(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize) +size_t HUF_decompress4X_hufOnly_wksp(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize, void *workspace, size_t workspaceSize) { /* validation checks */ if (dstSize == 0) @@ -893,11 +931,12 @@ size_t HUF_decompress4X_hufOnly(HUF_DTable *dctx, void *dst, size_t dstSize, con { U32 const algoNb = HUF_selectDecoder(dstSize, cSrcSize); - return algoNb ? HUF_decompress4X4_DCtx(dctx, dst, dstSize, cSrc, cSrcSize) : HUF_decompress4X2_DCtx(dctx, dst, dstSize, cSrc, cSrcSize); + return algoNb ? 
HUF_decompress4X4_DCtx_wksp(dctx, dst, dstSize, cSrc, cSrcSize, workspace, workspaceSize) + : HUF_decompress4X2_DCtx_wksp(dctx, dst, dstSize, cSrc, cSrcSize, workspace, workspaceSize); } } -size_t HUF_decompress1X_DCtx(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize) +size_t HUF_decompress1X_DCtx_wksp(HUF_DTable *dctx, void *dst, size_t dstSize, const void *cSrc, size_t cSrcSize, void *workspace, size_t workspaceSize) { /* validation checks */ if (dstSize == 0) @@ -915,6 +954,7 @@ size_t HUF_decompress1X_DCtx(HUF_DTable *dctx, void *dst, size_t dstSize, const { U32 const algoNb = HUF_selectDecoder(dstSize, cSrcSize); - return algoNb ? HUF_decompress1X4_DCtx(dctx, dst, dstSize, cSrc, cSrcSize) : HUF_decompress1X2_DCtx(dctx, dst, dstSize, cSrc, cSrcSize); + return algoNb ? HUF_decompress1X4_DCtx_wksp(dctx, dst, dstSize, cSrc, cSrcSize, workspace, workspaceSize) + : HUF_decompress1X2_DCtx_wksp(dctx, dst, dstSize, cSrc, cSrcSize, workspace, workspaceSize); } } diff --git a/contrib/linux-kernel/lib/zstd/zstd_common.c b/contrib/linux-kernel/lib/zstd/zstd_common.c index 6ebf68d2a..a282624ee 100644 --- a/contrib/linux-kernel/lib/zstd/zstd_common.c +++ b/contrib/linux-kernel/lib/zstd/zstd_common.c @@ -51,7 +51,7 @@ ZSTD_customMem ZSTD_initStack(void *workspace, size_t workspaceSize) void *ZSTD_stackAllocAll(void *opaque, size_t *size) { ZSTD_stack *stack = (ZSTD_stack *)opaque; - *size = stack->end - ZSTD_PTR_ALIGN(stack->ptr); + *size = (BYTE const *)stack->end - (BYTE *)ZSTD_PTR_ALIGN(stack->ptr); return stack_push(stack, *size); } diff --git a/contrib/linux-kernel/test/Makefile b/contrib/linux-kernel/test/Makefile index 892264f4c..8411462c9 100644 --- a/contrib/linux-kernel/test/Makefile +++ b/contrib/linux-kernel/test/Makefile @@ -5,21 +5,21 @@ SOURCES := $(wildcard ../lib/zstd/*.c) OBJECTS := $(patsubst %.c,%.o,$(SOURCES)) ARFLAGS := rcs -CXXFLAGS += -std=c++11 -CFLAGS += -g -O0 +CXXFLAGS += -std=c++11 -g -O3 -Wcast-align +CFLAGS += -g -O3 -Wframe-larger-than=400 -Wcast-align CPPFLAGS += $(IFLAGS) ../lib/zstd/libzstd.a: $(OBJECTS) $(AR) $(ARFLAGS) $@ $^ DecompressCrash: DecompressCrash.o $(OBJECTS) libFuzzer.a - $(CXX) $(TEST_CPPFLAGS) $(TEST_CXXFLAGS) $(LDFLAGS) $^ -o $@ + $(CXX) $(CPPFLAGS) $(CXXFLAGS) $(LDFLAGS) $^ -o $@ RoundTripCrash: RoundTripCrash.o $(OBJECTS) ../lib/xxhash.o libFuzzer.a - $(CXX) $(TEST_CPPFLAGS) $(TEST_CXXFLAGS) $(LDFLAGS) $^ -o $@ + $(CXX) $(CPPFLAGS) $(CXXFLAGS) $(LDFLAGS) $^ -o $@ UserlandTest: UserlandTest.cpp ../lib/zstd/libzstd.a ../lib/xxhash.o - $(CXX) $(CXXFLAGS) $(CFLAGS) $(CPPFLAGS) $^ googletest/build/googlemock/gtest/libgtest.a googletest/build/googlemock/gtest/libgtest_main.a -o $@ + $(CXX) $(CXXFLAGS) $(CPPFLAGS) $^ googletest/build/googlemock/gtest/libgtest.a googletest/build/googlemock/gtest/libgtest_main.a -o $@ XXHashUserlandTest: XXHashUserlandTest.cpp ../lib/xxhash.o ../../../lib/common/xxhash.o $(CXX) $(CXXFLAGS) $(CFLAGS) $(CPPFLAGS) $^ googletest/build/googlemock/gtest/libgtest.a googletest/build/googlemock/gtest/libgtest_main.a -o $@ @@ -39,5 +39,5 @@ googletest: @cd googletest/build && cmake .. 
&& $(MAKE) clean: - $(RM) -f *.{o,a} ../lib/zstd/*.{o,a} + $(RM) -f *.{o,a} ../lib/zstd/*.{o,a} ../lib/*.o $(RM) -f DecompressCrash RoundTripCrash UserlandTest XXHashUserlandTest diff --git a/contrib/linux-kernel/test/include/linux/math64.h b/contrib/linux-kernel/test/include/linux/math64.h new file mode 100644 index 000000000..3d0ae72d5 --- /dev/null +++ b/contrib/linux-kernel/test/include/linux/math64.h @@ -0,0 +1,11 @@ +#ifndef LINUX_MATH64_H +#define LINUX_MATH64_H + +#include + +static uint64_t div_u64(uint64_t n, uint32_t d) +{ + return n / d; +} + +#endif From acbef3decdde96e30b11a09e13563cff08d35311 Mon Sep 17 00:00:00 2001 From: Yann Collet Date: Thu, 29 Jun 2017 05:18:09 -0700 Subject: [PATCH 02/14] ZSTD_getFrameContentSize() is promoted to "stable" status --- NEWS | 5 ++- lib/zstd.h | 91 ++++++++++++++++++++++++++++-------------------------- 2 files changed, 51 insertions(+), 45 deletions(-) diff --git a/NEWS b/NEWS index db9df9634..d23a58f02 100644 --- a/NEWS +++ b/NEWS @@ -1,12 +1,15 @@ v1.3.0 cli : new : `--list` command, by Paul Cruz +cli : changed : xz/lzma support enabled by default cli : changed : `-t *` continue processing list after a decompression error API : added : ZSTD_versionString() +API : promoted to stable status : ZSTD_getFrameContentSize(), by Sean Purcell API exp : new advanced API : ZSTD_compress_generic(), ZSTD_CCtx_setParameter() API exp : new : API for static or external allocation : ZSTD_initStatic?Ctx() API exp : added : ZSTD_decompressBegin_usingDDict(), requested by Guy Riddle (#700) +API exp : clarified memory estimation / measurement functions. API exp : changed : strongest strategy renamed ZSTD_btultra, fastest strategy ZSTD_fast set to 1 -API exp : clarified presentation of memory estimation / measurement functions. +tools : decodecorpus can generate random dictionary-compressed samples, by Paul Cruz new : contrib/seekable_format, demo and API, by Sean Purcell changed : contrib/linux-kernel, updated version and license, by Nick Terrell diff --git a/lib/zstd.h b/lib/zstd.h index 59a09be6f..71b641004 100644 --- a/lib/zstd.h +++ b/lib/zstd.h @@ -68,7 +68,7 @@ ZSTDLIB_API unsigned ZSTD_versionNumber(void); /**< useful to check dll versio #define ZSTD_QUOTE(str) #str #define ZSTD_EXPAND_AND_QUOTE(str) ZSTD_QUOTE(str) #define ZSTD_VERSION_STRING ZSTD_EXPAND_AND_QUOTE(ZSTD_LIB_VERSION) -ZSTDLIB_API const char* ZSTD_versionString(void); /* >= v1.3.0 */ +ZSTDLIB_API const char* ZSTD_versionString(void); /* v1.3.0 */ /*************************************** @@ -92,27 +92,41 @@ ZSTDLIB_API size_t ZSTD_compress( void* dst, size_t dstCapacity, ZSTDLIB_API size_t ZSTD_decompress( void* dst, size_t dstCapacity, const void* src, size_t compressedSize); -/*! ZSTD_getDecompressedSize() : - * NOTE: This function is planned to be obsolete, in favor of ZSTD_getFrameContentSize(). - * ZSTD_getFrameContentSize() works the same way, - * returning the decompressed size of a single frame, - * but distinguishes empty frames from frames with an unknown size, or errors. - * - * 'src' is the start of a zstd compressed frame. - * @return : content size to be decompressed, as a 64-bits value _if known_, 0 otherwise. - * note 1 : decompressed size is an optional field, it may not be present, typically in streaming mode. - * When `return==0`, data to decompress could be any size. +/*! ZSTD_getFrameContentSize() : v1.3.0 + * `src` should point to the start of a ZSTD encoded frame. + * `srcSize` must be at least as large as the frame header. 
+ * hint : any size >= `ZSTD_frameHeaderSize_max` is large enough. + * @return : - decompressed size of the frame in `src`, if known + * - ZSTD_CONTENTSIZE_UNKNOWN if the size cannot be determined + * - ZSTD_CONTENTSIZE_ERROR if an error occurred (e.g. invalid magic number, srcSize too small) + * note 1 : a 0 return value means the frame is valid but "empty". + * note 2 : decompressed size is an optional field, it may not be present, typically in streaming mode. + * When `return==ZSTD_CONTENTSIZE_UNKNOWN`, data to decompress could be any size. * In which case, it's necessary to use streaming mode to decompress data. - * Optionally, application can use ZSTD_decompress() while relying on implied limits. - * (For example, data may be necessarily cut into blocks <= 16 KB). - * note 2 : decompressed size is always present when compression is done with ZSTD_compress() - * note 3 : decompressed size can be very large (64-bits value), + * Optionally, application can rely on some implicit limit, + * as ZSTD_decompress() only needs an upper bound of decompressed size. + * (For example, data could be necessarily cut into blocks <= 16 KB). + * note 3 : decompressed size is always present when compression is done with ZSTD_compress() + * note 4 : decompressed size can be very large (64-bits value), * potentially larger than what local system can handle as a single memory segment. * In which case, it's necessary to use streaming mode to decompress data. - * note 4 : If source is untrusted, decompressed size could be wrong or intentionally modified. - * Always ensure result fits within application's authorized limits. + * note 5 : If source is untrusted, decompressed size could be wrong or intentionally modified. + * Always ensure return value fits within application's authorized limits. * Each application can set its own limits. - * note 5 : when `return==0`, if precise failure cause is needed, use ZSTD_getFrameHeader() to know more. */ + * note 6 : This function replaces ZSTD_getDecompressedSize() */ +#define ZSTD_CONTENTSIZE_UNKNOWN (0ULL - 1) +#define ZSTD_CONTENTSIZE_ERROR (0ULL - 2) +ZSTDLIB_API unsigned long long ZSTD_getFrameContentSize(const void *src, size_t srcSize); + +/*! ZSTD_getDecompressedSize() : + * NOTE: This function is now obsolete, in favor of ZSTD_getFrameContentSize(). + * Both functions work the same way, + * but ZSTD_getDecompressedSize() blends + * "empty", "unknown" and "error" results in the same return value (0), + * while ZSTD_getFrameContentSize() distinguishes them. + * + * 'src' is the start of a zstd compressed frame. + * @return : content size to be decompressed, as a 64-bits value _if known and not empty_, 0 otherwise. 
*/ ZSTDLIB_API unsigned long long ZSTD_getDecompressedSize(const void* src, size_t srcSize); @@ -291,7 +305,7 @@ typedef struct ZSTD_outBuffer_s { * *******************************************************************/ typedef ZSTD_CCtx ZSTD_CStream; /**< CCtx and CStream are now effectively same object (>= v1.3.0) */ - /* But continue to distinguish them for compatibility with versions <= v1.2.0 */ + /* Continue to distinguish them for compatibility with versions <= v1.2.0 */ /*===== ZSTD_CStream management functions =====*/ ZSTDLIB_API ZSTD_CStream* ZSTD_createCStream(void); ZSTDLIB_API size_t ZSTD_freeCStream(ZSTD_CStream* zcs); @@ -330,7 +344,7 @@ ZSTDLIB_API size_t ZSTD_CStreamOutSize(void); /**< recommended size for output * *******************************************************************************/ typedef ZSTD_DCtx ZSTD_DStream; /**< DCtx and DStream are now effectively same object (>= v1.3.0) */ - /* But continue to distinguish them for compatibility with versions <= v1.2.0 */ + /* Continue to distinguish them for compatibility with versions <= v1.2.0 */ /*===== ZSTD_DStream management functions =====*/ ZSTDLIB_API ZSTD_DStream* ZSTD_createDStream(void); ZSTDLIB_API size_t ZSTD_freeDStream(ZSTD_DStream* zds); @@ -349,8 +363,8 @@ ZSTDLIB_API size_t ZSTD_DStreamOutSize(void); /*!< recommended size for output /**************************************************************************************** * START OF ADVANCED AND EXPERIMENTAL FUNCTIONS * The definitions in this section are considered experimental. - * They should never be used with a dynamic library, as they may change in the future. - * They are provided for advanced usages. + * They should never be used with a dynamic library, as prototypes may change in the future. + * They are provided for advanced scenarios. * Use them only in association with static linking. * ***************************************************************************************/ @@ -381,8 +395,8 @@ ZSTDLIB_API size_t ZSTD_DStreamOutSize(void); /*!< recommended size for output #define ZSTD_FRAMEHEADERSIZE_MAX 18 /* for static allocation */ #define ZSTD_FRAMEHEADERSIZE_MIN 6 static const size_t ZSTD_frameHeaderSize_prefix = 5; /* minimum input size to know frame header size */ -static const size_t ZSTD_frameHeaderSize_min = ZSTD_FRAMEHEADERSIZE_MIN; static const size_t ZSTD_frameHeaderSize_max = ZSTD_FRAMEHEADERSIZE_MAX; +static const size_t ZSTD_frameHeaderSize_min = ZSTD_FRAMEHEADERSIZE_MIN; static const size_t ZSTD_skippableHeaderSize = 8; /* magic number + skippable frame length */ @@ -433,26 +447,15 @@ static const ZSTD_customMem ZSTD_defaultCMem = { NULL, NULL, NULL }; /*! ZSTD_findFrameCompressedSize() : * `src` should point to the start of a ZSTD encoded frame or skippable frame * `srcSize` must be at least as large as the frame - * @return : the compressed size of the frame pointed to by `src`, + * @return : the compressed size of the first frame starting at `src`, * suitable to pass to `ZSTD_decompress` or similar, - * or an error code if given invalid input. */ + * or an error code if input is invalid */ ZSTDLIB_API size_t ZSTD_findFrameCompressedSize(const void* src, size_t srcSize); -/*! ZSTD_getFrameContentSize() : - * `src` should point to the start of a ZSTD encoded frame. - * `srcSize` must be at least as large as the frame header. - * A value >= `ZSTD_frameHeaderSize_max` is guaranteed to be large enough. 
- * @return : - decompressed size of the frame pointed to be `src` if known - * - ZSTD_CONTENTSIZE_UNKNOWN if the size cannot be determined - * - ZSTD_CONTENTSIZE_ERROR if an error occurred (e.g. invalid magic number, srcSize too small) */ -#define ZSTD_CONTENTSIZE_UNKNOWN (0ULL - 1) -#define ZSTD_CONTENTSIZE_ERROR (0ULL - 2) -ZSTDLIB_API unsigned long long ZSTD_getFrameContentSize(const void *src, size_t srcSize); - /*! ZSTD_findDecompressedSize() : * `src` should point the start of a series of ZSTD encoded and/or skippable frames - * `srcSize` must be the _exact_ size of this series - * (i.e. there should be a frame boundary exactly `srcSize` bytes after `src`) + * `srcSize` must be the _exact_ size of this serie + * (i.e. there should be a frame boundary exactly at `srcSize` bytes after `src`) * @return : - decompressed size of all data in all successive frames * - if the decompressed size cannot be determined: ZSTD_CONTENTSIZE_UNKNOWN * - if an error occurred: ZSTD_CONTENTSIZE_ERROR @@ -460,8 +463,6 @@ ZSTDLIB_API unsigned long long ZSTD_getFrameContentSize(const void *src, size_t * note 1 : decompressed size is an optional field, that may not be present, especially in streaming mode. * When `return==ZSTD_CONTENTSIZE_UNKNOWN`, data to decompress could be any size. * In which case, it's necessary to use streaming mode to decompress data. - * Optionally, application can still use ZSTD_decompress() while relying on implied limits. - * (For example, data may be necessarily cut into blocks <= 16 KB). * note 2 : decompressed size is always present when compression is done with ZSTD_compress() * note 3 : decompressed size can be very large (64-bits value), * potentially larger than what local system can handle as a single memory segment. @@ -470,7 +471,7 @@ ZSTDLIB_API unsigned long long ZSTD_getFrameContentSize(const void *src, size_t * Always ensure result fits within application's authorized limits. * Each application can set its own limits. * note 5 : ZSTD_findDecompressedSize handles multiple frames, and so it must traverse the input to - * read each contained frame header. This is efficient as most of the data is skipped, + * read each contained frame header. This is fast as most of the data is skipped, * however it does mean that all frame data must be present and valid. */ ZSTDLIB_API unsigned long long ZSTD_findDecompressedSize(const void* src, size_t srcSize); @@ -478,7 +479,8 @@ ZSTDLIB_API unsigned long long ZSTD_findDecompressedSize(const void* src, size_t * `src` should point to the start of a ZSTD frame * `srcSize` must be >= ZSTD_frameHeaderSize_prefix. 
* @return : size of the Frame Header */ -size_t ZSTD_frameHeaderSize(const void* src, size_t srcSize); +ZSTDLIB_API size_t ZSTD_frameHeaderSize(const void* src, size_t srcSize); + /*************************************** * Context memory usage @@ -575,7 +577,7 @@ ZSTDLIB_API size_t ZSTD_setCCtxParameter(ZSTD_CCtx* cctx, ZSTD_CCtxParameter par ZSTDLIB_API ZSTD_CDict* ZSTD_createCDict_byReference(const void* dictBuffer, size_t dictSize, int compressionLevel); -typedef enum { ZSTD_dm_auto=0, /* dictionary is "full" if it starts with ZSTD_MAGIC_DICTIONARY, rawContent otherwize */ +typedef enum { ZSTD_dm_auto=0, /* dictionary is "full" if it starts with ZSTD_MAGIC_DICTIONARY, otherwise it is "rawContent" */ ZSTD_dm_rawContent, /* ensures dictionary is always loaded as rawContent, even if it starts with ZSTD_MAGIC_DICTIONARY */ ZSTD_dm_fullDict /* refuses to load a dictionary if it does not respect Zstandard's specification */ } ZSTD_dictMode_e; @@ -583,7 +585,8 @@ typedef enum { ZSTD_dm_auto=0, /* dictionary is "full" if it starts with * Create a ZSTD_CDict using external alloc and free, and customized compression parameters */ ZSTDLIB_API ZSTD_CDict* ZSTD_createCDict_advanced(const void* dict, size_t dictSize, unsigned byReference, ZSTD_dictMode_e dictMode, - ZSTD_compressionParameters cParams, ZSTD_customMem customMem); + ZSTD_compressionParameters cParams, + ZSTD_customMem customMem); /*! ZSTD_initStaticCDict_advanced() : * Generate a digested dictionary in provided memory area. From 97f2bf66dace6b837bec09be182498a547053dd2 Mon Sep 17 00:00:00 2001 From: Yann Collet Date: Thu, 29 Jun 2017 11:31:40 -0700 Subject: [PATCH 03/14] minor : fix typo --- lib/zstd.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/zstd.h b/lib/zstd.h index 71b641004..58e9a5606 100644 --- a/lib/zstd.h +++ b/lib/zstd.h @@ -454,7 +454,7 @@ ZSTDLIB_API size_t ZSTD_findFrameCompressedSize(const void* src, size_t srcSize) /*! ZSTD_findDecompressedSize() : * `src` should point the start of a series of ZSTD encoded and/or skippable frames - * `srcSize` must be the _exact_ size of this serie + * `srcSize` must be the _exact_ size of this series * (i.e. 
there should be a frame boundary exactly at `srcSize` bytes after `src`) * @return : - decompressed size of all data in all successive frames * - if the decompressed size cannot be determined: ZSTD_CONTENTSIZE_UNKNOWN From 99e315999c69f3d443c5992e4d21c6b5848e23ed Mon Sep 17 00:00:00 2001 From: Stella Lau Date: Thu, 29 Jun 2017 11:49:59 -0700 Subject: [PATCH 04/14] Reduce stack usage of HUF_readDTableX4 and HUF_readDTableX2 --- lib/common/huf.h | 11 ++ lib/decompress/huf_decompress.c | 172 +++++++++++++++++++++++++++---- lib/decompress/zstd_decompress.c | 11 +- 3 files changed, 170 insertions(+), 24 deletions(-) diff --git a/lib/common/huf.h b/lib/common/huf.h index 7873ca3d4..801f74c1e 100644 --- a/lib/common/huf.h +++ b/lib/common/huf.h @@ -111,6 +111,9 @@ HUF_PUBLIC_API size_t HUF_compress2 (void* dst, size_t dstCapacity, const void* #define HUF_WORKSPACE_SIZE_U32 (HUF_WORKSPACE_SIZE / sizeof(U32)) HUF_PUBLIC_API size_t HUF_compress4X_wksp (void* dst, size_t dstCapacity, const void* src, size_t srcSize, unsigned maxSymbolValue, unsigned tableLog, void* workSpace, size_t wkspSize); +/* The workspace must have alignment at least 4 and be at least this large */ +#define HUF_DECOMPRESS_WORKSPACE_SIZE (2 << 10) +#define HUF_DECOMPRESS_WORKSPACE_SIZE_U32 (HUF_DECOMPRESS_WORKSPACE_SIZE / sizeof(U32)) /* ****************************************************************** @@ -170,8 +173,11 @@ size_t HUF_decompress4X4 (void* dst, size_t dstSize, const void* cSrc, size_t cS size_t HUF_decompress4X_DCtx (HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize); /**< decodes RLE and uncompressed */ size_t HUF_decompress4X_hufOnly(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize); /**< considers RLE and uncompressed as errors */ +size_t HUF_decompress4X_hufOnly_wksp(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize, void* workSpace, size_t wkspSize); /**< considers RLE and uncompressed as errors */ size_t HUF_decompress4X2_DCtx(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize); /**< single-symbol decoder */ +size_t HUF_decompress4X2_DCtx_wksp(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize, void* workSpace, size_t wkspSize); /**< single-symbol decoder */ size_t HUF_decompress4X4_DCtx(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize); /**< double-symbols decoder */ +size_t HUF_decompress4X4_DCtx_wksp(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize, void* workSpace, size_t wkspSize); /**< double-symbols decoder */ /* **************************************** @@ -243,7 +249,9 @@ HUF_decompress() does the following: U32 HUF_selectDecoder (size_t dstSize, size_t cSrcSize); size_t HUF_readDTableX2 (HUF_DTable* DTable, const void* src, size_t srcSize); +size_t HUF_readDTableX2_wksp (HUF_DTable* DTable, const void* src, size_t srcSize, void* workSpace, size_t wkspSize); size_t HUF_readDTableX4 (HUF_DTable* DTable, const void* src, size_t srcSize); +size_t HUF_readDTableX4_wksp (HUF_DTable* DTable, const void* src, size_t srcSize, void* workSpace, size_t wkspSize); size_t HUF_decompress4X_usingDTable(void* dst, size_t maxDstSize, const void* cSrc, size_t cSrcSize, const HUF_DTable* DTable); size_t HUF_decompress4X2_usingDTable(void* dst, size_t maxDstSize, const void* cSrc, size_t cSrcSize, const HUF_DTable* DTable); @@ -266,8 +274,11 @@ size_t HUF_decompress1X2 (void* dst, size_t dstSize, const void* cSrc, 
size_t cS size_t HUF_decompress1X4 (void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize); /* double-symbol decoder */ size_t HUF_decompress1X_DCtx (HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize); +size_t HUF_decompress1X_DCtx_wksp (HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize, void* workSpace, size_t wkspSize); size_t HUF_decompress1X2_DCtx(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize); /**< single-symbol decoder */ +size_t HUF_decompress1X2_DCtx_wksp(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize, void* workSpace, size_t wkspSize); /**< single-symbol decoder */ size_t HUF_decompress1X4_DCtx(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize); /**< double-symbols decoder */ +size_t HUF_decompress1X4_DCtx_wksp(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize, void* workSpace, size_t wkspSize); /**< double-symbols decoder */ size_t HUF_decompress1X_usingDTable(void* dst, size_t maxDstSize, const void* cSrc, size_t cSrcSize, const HUF_DTable* DTable); /**< automatic selection of sing or double symbol decoder, based on DTable */ size_t HUF_decompress1X2_usingDTable(void* dst, size_t maxDstSize, const void* cSrc, size_t cSrcSize, const HUF_DTable* DTable); diff --git a/lib/decompress/huf_decompress.c b/lib/decompress/huf_decompress.c index d40b3be04..4011dee39 100644 --- a/lib/decompress/huf_decompress.c +++ b/lib/decompress/huf_decompress.c @@ -67,6 +67,12 @@ #define HUF_STATIC_ASSERT(c) { enum { HUF_static_assert = 1/(int)(!!(c)) }; } /* use only *after* variable declarations */ +/* ************************************************************** +* Byte alignment for workSpace management +****************************************************************/ +#define ALIGN(x, a) ALIGN_MASK((x), (a) - 1) +#define ALIGN_MASK(x, mask) (((x) + (mask)) & ~(mask)) + /*-***************************/ /* generic DTableDesc */ /*-***************************/ @@ -87,16 +93,29 @@ static DTableDesc HUF_getDTableDesc(const HUF_DTable* table) typedef struct { BYTE byte; BYTE nbBits; } HUF_DEltX2; /* single-symbol decoding */ -size_t HUF_readDTableX2 (HUF_DTable* DTable, const void* src, size_t srcSize) +size_t HUF_readDTableX2_wksp(HUF_DTable* DTable, const void* src, size_t srcSize, void* workSpace, size_t wkspSize) { - BYTE huffWeight[HUF_SYMBOLVALUE_MAX + 1]; - U32 rankVal[HUF_TABLELOG_ABSOLUTEMAX + 1]; /* large enough for values from 0 to 16 */ U32 tableLog = 0; U32 nbSymbols = 0; size_t iSize; void* const dtPtr = DTable + 1; HUF_DEltX2* const dt = (HUF_DEltX2*)dtPtr; + U32 *rankVal; + BYTE *huffWeight; + size_t spaceUsed = 0; + + rankVal = (U32 *)((BYTE *)workSpace + spaceUsed); + spaceUsed += sizeof(U32) * (HUF_TABLELOG_ABSOLUTEMAX + 1); + huffWeight = (BYTE *)workSpace + spaceUsed; + spaceUsed += HUF_SYMBOLVALUE_MAX + 1; + spaceUsed = ALIGN(spaceUsed, sizeof(U32)); + + if (spaceUsed > wkspSize) + return ERROR(tableLog_tooLarge); + workSpace = (BYTE *)workSpace + spaceUsed; + wkspSize -= spaceUsed; + HUF_STATIC_ASSERT(sizeof(DTableDesc) == sizeof(HUF_DTable)); /* memset(huffWeight, 0, sizeof(huffWeight)); */ /* is not necessary, even though some analyzer complain ... 
*/ @@ -135,6 +154,13 @@ size_t HUF_readDTableX2 (HUF_DTable* DTable, const void* src, size_t srcSize) return iSize; } +size_t HUF_readDTableX2(HUF_DTable* DTable, const void* src, size_t srcSize) +{ + U32 workSpace[HUF_DECOMPRESS_WORKSPACE_SIZE_U32]; + return HUF_readDTableX2_wksp(DTable, src, srcSize, + workSpace, sizeof(workSpace)); +} + static BYTE HUF_decodeSymbolX2(BIT_DStream_t* Dstream, const HUF_DEltX2* dt, const U32 dtLog) { @@ -212,11 +238,13 @@ size_t HUF_decompress1X2_usingDTable( return HUF_decompress1X2_usingDTable_internal(dst, dstSize, cSrc, cSrcSize, DTable); } -size_t HUF_decompress1X2_DCtx (HUF_DTable* DCtx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize) +size_t HUF_decompress1X2_DCtx_wksp(HUF_DTable* DCtx, void* dst, size_t dstSize, + const void* cSrc, size_t cSrcSize, + void* workSpace, size_t wkspSize) { const BYTE* ip = (const BYTE*) cSrc; - size_t const hSize = HUF_readDTableX2 (DCtx, cSrc, cSrcSize); + size_t const hSize = HUF_readDTableX2_wksp(DCtx, cSrc, cSrcSize, workSpace, wkspSize); if (HUF_isError(hSize)) return hSize; if (hSize >= cSrcSize) return ERROR(srcSize_wrong); ip += hSize; cSrcSize -= hSize; @@ -224,6 +252,15 @@ size_t HUF_decompress1X2_DCtx (HUF_DTable* DCtx, void* dst, size_t dstSize, cons return HUF_decompress1X2_usingDTable_internal (dst, dstSize, ip, cSrcSize, DCtx); } + +size_t HUF_decompress1X2_DCtx(HUF_DTable* DCtx, void* dst, size_t dstSize, + const void* cSrc, size_t cSrcSize) +{ + U32 workSpace[HUF_DECOMPRESS_WORKSPACE_SIZE_U32]; + return HUF_decompress1X2_DCtx_wksp(DCtx, dst, dstSize, cSrc, cSrcSize, + workSpace, sizeof(workSpace)); +} + size_t HUF_decompress1X2 (void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize) { HUF_CREATE_STATIC_DTABLEX2(DTable, HUF_TABLELOG_MAX); @@ -335,11 +372,14 @@ size_t HUF_decompress4X2_usingDTable( } -size_t HUF_decompress4X2_DCtx (HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize) +size_t HUF_decompress4X2_DCtx_wksp(HUF_DTable* dctx, void* dst, size_t dstSize, + const void* cSrc, size_t cSrcSize, + void* workSpace, size_t wkspSize) { const BYTE* ip = (const BYTE*) cSrc; - size_t const hSize = HUF_readDTableX2 (dctx, cSrc, cSrcSize); + size_t const hSize = HUF_readDTableX2_wksp (dctx, cSrc, cSrcSize, + workSpace, wkspSize); if (HUF_isError(hSize)) return hSize; if (hSize >= cSrcSize) return ERROR(srcSize_wrong); ip += hSize; cSrcSize -= hSize; @@ -347,6 +387,13 @@ size_t HUF_decompress4X2_DCtx (HUF_DTable* dctx, void* dst, size_t dstSize, cons return HUF_decompress4X2_usingDTable_internal (dst, dstSize, ip, cSrcSize, dctx); } + +size_t HUF_decompress4X2_DCtx (HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize) +{ + U32 workSpace[HUF_DECOMPRESS_WORKSPACE_SIZE_U32]; + return HUF_decompress4X2_DCtx_wksp(dctx, dst, dstSize, cSrc, cSrcSize, + workSpace, sizeof(workSpace)); +} size_t HUF_decompress4X2 (void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize) { HUF_CREATE_STATIC_DTABLEX2(DTable, HUF_TABLELOG_MAX); @@ -404,6 +451,7 @@ static void HUF_fillDTableX4Level2(HUF_DEltX4* DTable, U32 sizeLog, const U32 co } typedef U32 rankVal_t[HUF_TABLELOG_MAX][HUF_TABLELOG_MAX + 1]; +typedef U32 rankValCol_t[HUF_TABLELOG_MAX + 1]; static void HUF_fillDTableX4(HUF_DEltX4* DTable, const U32 targetLog, const sortedSymbol_t* sortedList, const U32 sortedListSize, @@ -447,20 +495,44 @@ static void HUF_fillDTableX4(HUF_DEltX4* DTable, const U32 targetLog, } } -size_t HUF_readDTableX4 (HUF_DTable* DTable, const void* src, size_t srcSize) 
+size_t HUF_readDTableX4_wksp(HUF_DTable* DTable, const void* src, + size_t srcSize, void* workSpace, + size_t wkspSize) { - BYTE weightList[HUF_SYMBOLVALUE_MAX + 1]; - sortedSymbol_t sortedSymbol[HUF_SYMBOLVALUE_MAX + 1]; - U32 rankStats[HUF_TABLELOG_MAX + 1] = { 0 }; - U32 rankStart0[HUF_TABLELOG_MAX + 2] = { 0 }; - U32* const rankStart = rankStart0+1; - rankVal_t rankVal; U32 tableLog, maxW, sizeOfSort, nbSymbols; DTableDesc dtd = HUF_getDTableDesc(DTable); U32 const maxTableLog = dtd.maxTableLog; size_t iSize; void* dtPtr = DTable+1; /* force compiler to avoid strict-aliasing */ HUF_DEltX4* const dt = (HUF_DEltX4*)dtPtr; + U32 *rankStart; + + rankValCol_t *rankVal; + U32 *rankStats; + U32 *rankStart0; + sortedSymbol_t *sortedSymbol; + BYTE *weightList; + size_t spaceUsed = 0; + + rankVal = (rankValCol_t *)((BYTE *)workSpace + spaceUsed); + spaceUsed += sizeof(rankValCol_t) * HUF_TABLELOG_MAX; + rankStats = (U32 *)((BYTE *)workSpace + spaceUsed); + spaceUsed += sizeof(U32) * (HUF_TABLELOG_MAX + 1); + rankStart0 = (U32 *)((BYTE *)workSpace + spaceUsed); + spaceUsed += sizeof(U32) * (HUF_TABLELOG_MAX + 2); + sortedSymbol = (sortedSymbol_t *)((BYTE *)workSpace + spaceUsed); + spaceUsed += sizeof(sortedSymbol_t) * (HUF_SYMBOLVALUE_MAX + 1); + weightList = (BYTE *)workSpace + spaceUsed; + spaceUsed += HUF_SYMBOLVALUE_MAX + 1; + spaceUsed = ALIGN(spaceUsed, sizeof(U32)); + + if (spaceUsed > wkspSize) + return ERROR(tableLog_tooLarge); + workSpace = (BYTE *)workSpace + spaceUsed; + wkspSize -= spaceUsed; + + rankStart = rankStart0 + 1; + memset(rankStats, 0, sizeof(U32) * (2 * HUF_TABLELOG_MAX + 2 + 1)); HUF_STATIC_ASSERT(sizeof(HUF_DEltX4) == sizeof(HUF_DTable)); /* if compiler fails here, assertion is wrong */ if (maxTableLog > HUF_TABLELOG_MAX) return ERROR(tableLog_tooLarge); @@ -527,6 +599,12 @@ size_t HUF_readDTableX4 (HUF_DTable* DTable, const void* src, size_t srcSize) return iSize; } +size_t HUF_readDTableX4(HUF_DTable* DTable, const void* src, size_t srcSize) +{ + U32 workSpace[HUF_DECOMPRESS_WORKSPACE_SIZE_U32]; + return HUF_readDTableX4_wksp(DTable, src, srcSize, + workSpace, sizeof(workSpace)); +} static U32 HUF_decodeSymbolX4(void* op, BIT_DStream_t* DStream, const HUF_DEltX4* dt, const U32 dtLog) { @@ -627,11 +705,14 @@ size_t HUF_decompress1X4_usingDTable( return HUF_decompress1X4_usingDTable_internal(dst, dstSize, cSrc, cSrcSize, DTable); } -size_t HUF_decompress1X4_DCtx (HUF_DTable* DCtx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize) +size_t HUF_decompress1X4_DCtx_wksp(HUF_DTable* DCtx, void* dst, size_t dstSize, + const void* cSrc, size_t cSrcSize, + void* workSpace, size_t wkspSize) { const BYTE* ip = (const BYTE*) cSrc; - size_t const hSize = HUF_readDTableX4 (DCtx, cSrc, cSrcSize); + size_t const hSize = HUF_readDTableX4_wksp(DCtx, cSrc, cSrcSize, + workSpace, wkspSize); if (HUF_isError(hSize)) return hSize; if (hSize >= cSrcSize) return ERROR(srcSize_wrong); ip += hSize; cSrcSize -= hSize; @@ -639,6 +720,15 @@ size_t HUF_decompress1X4_DCtx (HUF_DTable* DCtx, void* dst, size_t dstSize, cons return HUF_decompress1X4_usingDTable_internal (dst, dstSize, ip, cSrcSize, DCtx); } + +size_t HUF_decompress1X4_DCtx(HUF_DTable* DCtx, void* dst, size_t dstSize, + const void* cSrc, size_t cSrcSize) +{ + U32 workSpace[HUF_DECOMPRESS_WORKSPACE_SIZE_U32]; + return HUF_decompress1X4_DCtx_wksp(DCtx, dst, dstSize, cSrc, cSrcSize, + workSpace, sizeof(workSpace)); +} + size_t HUF_decompress1X4 (void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize) { 
HUF_CREATE_STATIC_DTABLEX4(DTable, HUF_TABLELOG_MAX); @@ -749,11 +839,14 @@ size_t HUF_decompress4X4_usingDTable( } -size_t HUF_decompress4X4_DCtx (HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize) +size_t HUF_decompress4X4_DCtx_wksp(HUF_DTable* dctx, void* dst, size_t dstSize, + const void* cSrc, size_t cSrcSize, + void* workSpace, size_t wkspSize) { const BYTE* ip = (const BYTE*) cSrc; - size_t hSize = HUF_readDTableX4 (dctx, cSrc, cSrcSize); + size_t hSize = HUF_readDTableX4_wksp(dctx, cSrc, cSrcSize, + workSpace, wkspSize); if (HUF_isError(hSize)) return hSize; if (hSize >= cSrcSize) return ERROR(srcSize_wrong); ip += hSize; cSrcSize -= hSize; @@ -761,6 +854,15 @@ size_t HUF_decompress4X4_DCtx (HUF_DTable* dctx, void* dst, size_t dstSize, cons return HUF_decompress4X4_usingDTable_internal(dst, dstSize, ip, cSrcSize, dctx); } + +size_t HUF_decompress4X4_DCtx(HUF_DTable* dctx, void* dst, size_t dstSize, + const void* cSrc, size_t cSrcSize) +{ + U32 workSpace[HUF_DECOMPRESS_WORKSPACE_SIZE_U32]; + return HUF_decompress4X4_DCtx_wksp(dctx, dst, dstSize, cSrc, cSrcSize, + workSpace, sizeof(workSpace)); +} + size_t HUF_decompress4X4 (void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize) { HUF_CREATE_STATIC_DTABLEX4(DTable, HUF_TABLELOG_MAX); @@ -874,7 +976,25 @@ size_t HUF_decompress4X_hufOnly (HUF_DTable* dctx, void* dst, size_t dstSize, co } } -size_t HUF_decompress1X_DCtx (HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize) + +size_t HUF_decompress4X_hufOnly_wksp(HUF_DTable* dctx, void* dst, + size_t dstSize, const void* cSrc, + size_t cSrcSize, void* workSpace, + size_t wkspSize) +{ + /* validation checks */ + if (dstSize == 0) return ERROR(dstSize_tooSmall); + if ((cSrcSize >= dstSize) || (cSrcSize <= 1)) return ERROR(corruption_detected); /* invalid */ + + { U32 const algoNb = HUF_selectDecoder(dstSize, cSrcSize); + return algoNb ? HUF_decompress4X4_DCtx_wksp(dctx, dst, dstSize, cSrc, cSrcSize, workSpace, wkspSize): + HUF_decompress4X2_DCtx_wksp(dctx, dst, dstSize, cSrc, cSrcSize, workSpace, wkspSize); + } +} + +size_t HUF_decompress1X_DCtx_wksp(HUF_DTable* dctx, void* dst, size_t dstSize, + const void* cSrc, size_t cSrcSize, + void* workSpace, size_t wkspSize) { /* validation checks */ if (dstSize == 0) return ERROR(dstSize_tooSmall); @@ -883,7 +1003,17 @@ size_t HUF_decompress1X_DCtx (HUF_DTable* dctx, void* dst, size_t dstSize, const if (cSrcSize == 1) { memset(dst, *(const BYTE*)cSrc, dstSize); return dstSize; } /* RLE */ { U32 const algoNb = HUF_selectDecoder(dstSize, cSrcSize); - return algoNb ? HUF_decompress1X4_DCtx(dctx, dst, dstSize, cSrc, cSrcSize) : - HUF_decompress1X2_DCtx(dctx, dst, dstSize, cSrc, cSrcSize) ; + return algoNb ? 
HUF_decompress1X4_DCtx_wksp(dctx, dst, dstSize, cSrc, + cSrcSize, workSpace, wkspSize): + HUF_decompress1X2_DCtx_wksp(dctx, dst, dstSize, cSrc, + cSrcSize, workSpace, wkspSize); } } + +size_t HUF_decompress1X_DCtx(HUF_DTable* dctx, void* dst, size_t dstSize, + const void* cSrc, size_t cSrcSize) +{ + U32 workSpace[HUF_DECOMPRESS_WORKSPACE_SIZE_U32]; + return HUF_decompress1X_DCtx_wksp(dctx, dst, dstSize, cSrc, cSrcSize, + workSpace, sizeof(workSpace)); +} diff --git a/lib/decompress/zstd_decompress.c b/lib/decompress/zstd_decompress.c index 95d18d4b5..b87925c54 100644 --- a/lib/decompress/zstd_decompress.c +++ b/lib/decompress/zstd_decompress.c @@ -93,6 +93,7 @@ typedef struct { FSE_DTable OFTable[FSE_DTABLE_SIZE_U32(OffFSELog)]; FSE_DTable MLTable[FSE_DTABLE_SIZE_U32(MLFSELog)]; HUF_DTable hufTable[HUF_DTABLE_SIZE(HufLog)]; /* can accommodate HUF_decompress4X */ + U32 workspace[HUF_DECOMPRESS_WORKSPACE_SIZE_U32]; U32 rep[ZSTD_REP_NUM]; } ZSTD_entropyTables_t; @@ -559,8 +560,10 @@ size_t ZSTD_decodeLiteralsBlock(ZSTD_DCtx* dctx, HUF_decompress1X_usingDTable(dctx->litBuffer, litSize, istart+lhSize, litCSize, dctx->HUFptr) : HUF_decompress4X_usingDTable(dctx->litBuffer, litSize, istart+lhSize, litCSize, dctx->HUFptr) ) : ( singleStream ? - HUF_decompress1X2_DCtx(dctx->entropy.hufTable, dctx->litBuffer, litSize, istart+lhSize, litCSize) : - HUF_decompress4X_hufOnly (dctx->entropy.hufTable, dctx->litBuffer, litSize, istart+lhSize, litCSize)) )) + HUF_decompress1X2_DCtx_wksp(dctx->entropy.hufTable, dctx->litBuffer, litSize, istart+lhSize, litCSize, + dctx->entropy.workspace, sizeof(dctx->entropy.workspace)) : + HUF_decompress4X_hufOnly_wksp(dctx->entropy.hufTable, dctx->litBuffer, litSize, istart+lhSize, litCSize, + dctx->entropy.workspace, sizeof(dctx->entropy.workspace))))) return ERROR(corruption_detected); dctx->litPtr = dctx->litBuffer; @@ -1836,7 +1839,9 @@ static size_t ZSTD_loadEntropy(ZSTD_entropyTables_t* entropy, const void* const dictPtr += 8; /* skip header = magic + dictID */ - { size_t const hSize = HUF_readDTableX4(entropy->hufTable, dictPtr, dictEnd-dictPtr); + { size_t const hSize = HUF_readDTableX4_wksp( + entropy->hufTable, dictPtr, dictEnd - dictPtr, + entropy->workspace, sizeof(entropy->workspace)); if (HUF_isError(hSize)) return ERROR(dictionary_corrupted); dictPtr += hSize; } From c6a5275a281accb56ac6023aafb5adaea9285b26 Mon Sep 17 00:00:00 2001 From: Stella Lau Date: Thu, 29 Jun 2017 12:39:34 -0700 Subject: [PATCH 05/14] Fix alignment warnings with pointer casting --- lib/decompress/huf_decompress.c | 60 ++++++++++++++++----------------- 1 file changed, 29 insertions(+), 31 deletions(-) diff --git a/lib/decompress/huf_decompress.c b/lib/decompress/huf_decompress.c index 4011dee39..45e00618f 100644 --- a/lib/decompress/huf_decompress.c +++ b/lib/decompress/huf_decompress.c @@ -101,20 +101,19 @@ size_t HUF_readDTableX2_wksp(HUF_DTable* DTable, const void* src, size_t srcSize void* const dtPtr = DTable + 1; HUF_DEltX2* const dt = (HUF_DEltX2*)dtPtr; - U32 *rankVal; - BYTE *huffWeight; - size_t spaceUsed = 0; + U32* rankVal; + BYTE* huffWeight; + size_t spaceUsed32 = 0; - rankVal = (U32 *)((BYTE *)workSpace + spaceUsed); - spaceUsed += sizeof(U32) * (HUF_TABLELOG_ABSOLUTEMAX + 1); - huffWeight = (BYTE *)workSpace + spaceUsed; - spaceUsed += HUF_SYMBOLVALUE_MAX + 1; - spaceUsed = ALIGN(spaceUsed, sizeof(U32)); + rankVal = (U32 *)workSpace + spaceUsed32; + spaceUsed32 += HUF_TABLELOG_ABSOLUTEMAX + 1; + huffWeight = (BYTE *)((U32 *)workSpace + spaceUsed32); + spaceUsed32 += 
ALIGN(HUF_SYMBOLVALUE_MAX + 1, sizeof(U32)) >> 2; - if (spaceUsed > wkspSize) + if ((spaceUsed32 >> 2) > wkspSize) return ERROR(tableLog_tooLarge); - workSpace = (BYTE *)workSpace + spaceUsed; - wkspSize -= spaceUsed; + workSpace = (U32 *)workSpace + spaceUsed32; + wkspSize -= (spaceUsed32 << 2); HUF_STATIC_ASSERT(sizeof(DTableDesc) == sizeof(HUF_DTable)); /* memset(huffWeight, 0, sizeof(huffWeight)); */ /* is not necessary, even though some analyzer complain ... */ @@ -507,29 +506,28 @@ size_t HUF_readDTableX4_wksp(HUF_DTable* DTable, const void* src, HUF_DEltX4* const dt = (HUF_DEltX4*)dtPtr; U32 *rankStart; - rankValCol_t *rankVal; - U32 *rankStats; - U32 *rankStart0; - sortedSymbol_t *sortedSymbol; - BYTE *weightList; - size_t spaceUsed = 0; + rankValCol_t* rankVal; + U32* rankStats; + U32* rankStart0; + sortedSymbol_t* sortedSymbol; + BYTE* weightList; + size_t spaceUsed32 = 0; - rankVal = (rankValCol_t *)((BYTE *)workSpace + spaceUsed); - spaceUsed += sizeof(rankValCol_t) * HUF_TABLELOG_MAX; - rankStats = (U32 *)((BYTE *)workSpace + spaceUsed); - spaceUsed += sizeof(U32) * (HUF_TABLELOG_MAX + 1); - rankStart0 = (U32 *)((BYTE *)workSpace + spaceUsed); - spaceUsed += sizeof(U32) * (HUF_TABLELOG_MAX + 2); - sortedSymbol = (sortedSymbol_t *)((BYTE *)workSpace + spaceUsed); - spaceUsed += sizeof(sortedSymbol_t) * (HUF_SYMBOLVALUE_MAX + 1); - weightList = (BYTE *)workSpace + spaceUsed; - spaceUsed += HUF_SYMBOLVALUE_MAX + 1; - spaceUsed = ALIGN(spaceUsed, sizeof(U32)); + rankVal = (rankValCol_t *)((U32 *)workSpace + spaceUsed32); + spaceUsed32 += (sizeof(rankValCol_t) * HUF_TABLELOG_MAX) >> 2; + rankStats = (U32 *)workSpace + spaceUsed32; + spaceUsed32 += HUF_TABLELOG_MAX + 1; + rankStart0 = (U32 *)workSpace + spaceUsed32; + spaceUsed32 += HUF_TABLELOG_MAX + 2; + sortedSymbol = (sortedSymbol_t *)((U32 *)workSpace + spaceUsed32); + spaceUsed32 += ALIGN(sizeof(sortedSymbol_t) * (HUF_SYMBOLVALUE_MAX + 1), sizeof(U32)) >> 2; + weightList = (BYTE *)((U32 *)workSpace + spaceUsed32); + spaceUsed32 += ALIGN(HUF_SYMBOLVALUE_MAX + 1, sizeof(U32)) >> 2; - if (spaceUsed > wkspSize) + if ((spaceUsed32 >> 2) > wkspSize) return ERROR(tableLog_tooLarge); - workSpace = (BYTE *)workSpace + spaceUsed; - wkspSize -= spaceUsed; + workSpace = (U32 *)workSpace + spaceUsed32; + wkspSize -= (spaceUsed32 << 2); rankStart = rankStart0 + 1; memset(rankStats, 0, sizeof(U32) * (2 * HUF_TABLELOG_MAX + 2 + 1)); From fedc94de8c1d7265db7d2bdebb2458270f8ceab1 Mon Sep 17 00:00:00 2001 From: Stella Lau Date: Thu, 29 Jun 2017 13:04:15 -0700 Subject: [PATCH 06/14] Fix pointer casting warning --- lib/decompress/huf_decompress.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/decompress/huf_decompress.c b/lib/decompress/huf_decompress.c index 45e00618f..cbf78e807 100644 --- a/lib/decompress/huf_decompress.c +++ b/lib/decompress/huf_decompress.c @@ -519,7 +519,7 @@ size_t HUF_readDTableX4_wksp(HUF_DTable* DTable, const void* src, spaceUsed32 += HUF_TABLELOG_MAX + 1; rankStart0 = (U32 *)workSpace + spaceUsed32; spaceUsed32 += HUF_TABLELOG_MAX + 2; - sortedSymbol = (sortedSymbol_t *)((U32 *)workSpace + spaceUsed32); + sortedSymbol = (sortedSymbol_t *)workSpace + (spaceUsed32 * sizeof(U32)) / sizeof(sortedSymbol_t); spaceUsed32 += ALIGN(sizeof(sortedSymbol_t) * (HUF_SYMBOLVALUE_MAX + 1), sizeof(U32)) >> 2; weightList = (BYTE *)((U32 *)workSpace + spaceUsed32); spaceUsed32 += ALIGN(HUF_SYMBOLVALUE_MAX + 1, sizeof(U32)) >> 2; From 104c4d57c1fa7ca6872ad95925a9448bc3d1f1db Mon Sep 17 00:00:00 2001 From: Stella Lau 
Date: Thu, 29 Jun 2017 15:40:49 -0700 Subject: [PATCH 07/14] Fix bitshift error --- lib/decompress/huf_decompress.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/lib/decompress/huf_decompress.c b/lib/decompress/huf_decompress.c index cbf78e807..68938aebd 100644 --- a/lib/decompress/huf_decompress.c +++ b/lib/decompress/huf_decompress.c @@ -110,8 +110,8 @@ size_t HUF_readDTableX2_wksp(HUF_DTable* DTable, const void* src, size_t srcSize huffWeight = (BYTE *)((U32 *)workSpace + spaceUsed32); spaceUsed32 += ALIGN(HUF_SYMBOLVALUE_MAX + 1, sizeof(U32)) >> 2; - if ((spaceUsed32 >> 2) > wkspSize) - return ERROR(tableLog_tooLarge); + if ((spaceUsed32 << 2) > wkspSize) + return ERROR(tableLog_tooLarge); workSpace = (U32 *)workSpace + spaceUsed32; wkspSize -= (spaceUsed32 << 2); @@ -524,7 +524,7 @@ size_t HUF_readDTableX4_wksp(HUF_DTable* DTable, const void* src, weightList = (BYTE *)((U32 *)workSpace + spaceUsed32); spaceUsed32 += ALIGN(HUF_SYMBOLVALUE_MAX + 1, sizeof(U32)) >> 2; - if ((spaceUsed32 >> 2) > wkspSize) + if ((spaceUsed32 << 2) > wkspSize) return ERROR(tableLog_tooLarge); workSpace = (U32 *)workSpace + spaceUsed32; wkspSize -= (spaceUsed32 << 2); From 70ad6829e751fb542ae5f3534c0c62096bab3ce9 Mon Sep 17 00:00:00 2001 From: Stella Lau Date: Thu, 29 Jun 2017 16:22:32 -0700 Subject: [PATCH 08/14] Delegate HUF_decompress4X_hufOnly to workspace version --- lib/decompress/huf_decompress.c | 13 ++++--------- 1 file changed, 4 insertions(+), 9 deletions(-) diff --git a/lib/decompress/huf_decompress.c b/lib/decompress/huf_decompress.c index 68938aebd..4343eb6ea 100644 --- a/lib/decompress/huf_decompress.c +++ b/lib/decompress/huf_decompress.c @@ -962,16 +962,11 @@ size_t HUF_decompress4X_DCtx (HUF_DTable* dctx, void* dst, size_t dstSize, const } } -size_t HUF_decompress4X_hufOnly (HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize) +size_t HUF_decompress4X_hufOnly(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize) { - /* validation checks */ - if (dstSize == 0) return ERROR(dstSize_tooSmall); - if ((cSrcSize >= dstSize) || (cSrcSize <= 1)) return ERROR(corruption_detected); /* invalid */ - - { U32 const algoNb = HUF_selectDecoder(dstSize, cSrcSize); - return algoNb ? 
HUF_decompress4X4_DCtx(dctx, dst, dstSize, cSrc, cSrcSize) : - HUF_decompress4X2_DCtx(dctx, dst, dstSize, cSrc, cSrcSize) ; - } + U32 workSpace[HUF_DECOMPRESS_WORKSPACE_SIZE_U32]; + return HUF_decompress4X_hufOnly_wksp(dctx, dst, dstSize, cSrc, cSrcSize, + workSpace, sizeof(workSpace)); } From 28f711ef9513f30c8e20604542f84328b053abcd Mon Sep 17 00:00:00 2001 From: Stella Lau Date: Fri, 30 Jun 2017 09:38:11 -0700 Subject: [PATCH 09/14] Rename ALIGN and ALIGN_MASK to HUF_ALIGN and HUF_ALIGN_MASK --- lib/decompress/huf_decompress.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/lib/decompress/huf_decompress.c b/lib/decompress/huf_decompress.c index 4343eb6ea..b9fc4d51e 100644 --- a/lib/decompress/huf_decompress.c +++ b/lib/decompress/huf_decompress.c @@ -70,8 +70,8 @@ /* ************************************************************** * Byte alignment for workSpace management ****************************************************************/ -#define ALIGN(x, a) ALIGN_MASK((x), (a) - 1) -#define ALIGN_MASK(x, mask) (((x) + (mask)) & ~(mask)) +#define HUF_ALIGN(x, a) HUF_ALIGN_MASK((x), (a) - 1) +#define HUF_ALIGN_MASK(x, mask) (((x) + (mask)) & ~(mask)) /*-***************************/ /* generic DTableDesc */ @@ -108,7 +108,7 @@ size_t HUF_readDTableX2_wksp(HUF_DTable* DTable, const void* src, size_t srcSize rankVal = (U32 *)workSpace + spaceUsed32; spaceUsed32 += HUF_TABLELOG_ABSOLUTEMAX + 1; huffWeight = (BYTE *)((U32 *)workSpace + spaceUsed32); - spaceUsed32 += ALIGN(HUF_SYMBOLVALUE_MAX + 1, sizeof(U32)) >> 2; + spaceUsed32 += HUF_ALIGN(HUF_SYMBOLVALUE_MAX + 1, sizeof(U32)) >> 2; if ((spaceUsed32 << 2) > wkspSize) return ERROR(tableLog_tooLarge); @@ -520,9 +520,9 @@ size_t HUF_readDTableX4_wksp(HUF_DTable* DTable, const void* src, rankStart0 = (U32 *)workSpace + spaceUsed32; spaceUsed32 += HUF_TABLELOG_MAX + 2; sortedSymbol = (sortedSymbol_t *)workSpace + (spaceUsed32 * sizeof(U32)) / sizeof(sortedSymbol_t); - spaceUsed32 += ALIGN(sizeof(sortedSymbol_t) * (HUF_SYMBOLVALUE_MAX + 1), sizeof(U32)) >> 2; + spaceUsed32 += HUF_ALIGN(sizeof(sortedSymbol_t) * (HUF_SYMBOLVALUE_MAX + 1), sizeof(U32)) >> 2; weightList = (BYTE *)((U32 *)workSpace + spaceUsed32); - spaceUsed32 += ALIGN(HUF_SYMBOLVALUE_MAX + 1, sizeof(U32)) >> 2; + spaceUsed32 += HUF_ALIGN(HUF_SYMBOLVALUE_MAX + 1, sizeof(U32)) >> 2; if ((spaceUsed32 << 2) > wkspSize) return ERROR(tableLog_tooLarge); From 4c71f59c775340a267f0c6070cbc4cbab50923cc Mon Sep 17 00:00:00 2001 From: Stella Lau Date: Fri, 30 Jun 2017 09:52:20 -0700 Subject: [PATCH 10/14] Clarify typedef of rankVal_t and rankValCol_t --- lib/decompress/huf_decompress.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/decompress/huf_decompress.c b/lib/decompress/huf_decompress.c index b9fc4d51e..2a1b70ea5 100644 --- a/lib/decompress/huf_decompress.c +++ b/lib/decompress/huf_decompress.c @@ -449,8 +449,8 @@ static void HUF_fillDTableX4Level2(HUF_DEltX4* DTable, U32 sizeLog, const U32 co } } } -typedef U32 rankVal_t[HUF_TABLELOG_MAX][HUF_TABLELOG_MAX + 1]; typedef U32 rankValCol_t[HUF_TABLELOG_MAX + 1]; +typedef rankValCol_t rankVal_t[HUF_TABLELOG_MAX]; static void HUF_fillDTableX4(HUF_DEltX4* DTable, const U32 targetLog, const sortedSymbol_t* sortedList, const U32 sortedListSize, From b0513b519c81d51565e4ffd6c0856eefb8b03fc0 Mon Sep 17 00:00:00 2001 From: Stella Lau Date: Fri, 30 Jun 2017 12:53:56 -0700 Subject: [PATCH 11/14] Add comment to HUF_DECOMPRESS_WORKSPACE_SIZE --- lib/common/huf.h | 11 ++++++++++- 1 file changed, 10 
insertions(+), 1 deletion(-) diff --git a/lib/common/huf.h b/lib/common/huf.h index 801f74c1e..16184e490 100644 --- a/lib/common/huf.h +++ b/lib/common/huf.h @@ -111,7 +111,16 @@ HUF_PUBLIC_API size_t HUF_compress2 (void* dst, size_t dstCapacity, const void* #define HUF_WORKSPACE_SIZE_U32 (HUF_WORKSPACE_SIZE / sizeof(U32)) HUF_PUBLIC_API size_t HUF_compress4X_wksp (void* dst, size_t dstCapacity, const void* src, size_t srcSize, unsigned maxSymbolValue, unsigned tableLog, void* workSpace, size_t wkspSize); -/* The workspace must have alignment at least 4 and be at least this large */ +/** + * The minimum workspace size for the `workSpace` used in + * HUF_readDTableX2_wksp() and HUF_readDTableX4_wksp(). + * + * The space used depends on HUF_TABLELOG_MAX, ranging from ~1500 bytes when + * HUF_TABLE_LOG_MAX=12 to ~1850 bytes when HUF_TABLE_LOG_MAX=15. + * Buffer overflow errors may potentially occur if code modifications result in + * a requiredw workspace size greater than that specified in the following + * macro. + */ #define HUF_DECOMPRESS_WORKSPACE_SIZE (2 << 10) #define HUF_DECOMPRESS_WORKSPACE_SIZE_U32 (HUF_DECOMPRESS_WORKSPACE_SIZE / sizeof(U32)) From 32df49e9f85f11f961655c70bae85203087db730 Mon Sep 17 00:00:00 2001 From: Stella Lau Date: Fri, 30 Jun 2017 12:56:24 -0700 Subject: [PATCH 12/14] Fix typo --- lib/common/huf.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/common/huf.h b/lib/common/huf.h index 16184e490..dabd35991 100644 --- a/lib/common/huf.h +++ b/lib/common/huf.h @@ -118,7 +118,7 @@ HUF_PUBLIC_API size_t HUF_compress4X_wksp (void* dst, size_t dstCapacity, const * The space used depends on HUF_TABLELOG_MAX, ranging from ~1500 bytes when * HUF_TABLE_LOG_MAX=12 to ~1850 bytes when HUF_TABLE_LOG_MAX=15. * Buffer overflow errors may potentially occur if code modifications result in - * a requiredw workspace size greater than that specified in the following + * a required workspace size greater than that specified in the following * macro. */ #define HUF_DECOMPRESS_WORKSPACE_SIZE (2 << 10) From 56e3964d8589aee1b7f27e3ced5bd9a175327f0c Mon Sep 17 00:00:00 2001 From: Nick Terrell Date: Fri, 30 Jun 2017 16:29:37 -0700 Subject: [PATCH 13/14] [man] Specify that strategies start at 1 --- programs/zstd.1 | 6 +++++- programs/zstd.1.md | 4 ++-- 2 files changed, 7 insertions(+), 3 deletions(-) diff --git a/programs/zstd.1 b/programs/zstd.1 index 7e2e99159..5df45db21 100644 --- a/programs/zstd.1 +++ b/programs/zstd.1 @@ -89,6 +89,10 @@ Benchmark file(s) using compression level # \fB\-\-train FILEs\fR Use FILEs as a training set to create a dictionary\. The training set should contain a lot of small files (> 100)\. . +.TP +\fB\-l\fR, \fB\-\-list\fR +Display information related to a zstd compressed file, such as size, ratio, and checksum\. Some of these fields may not be available\. This command can be augmented with the \fB\-v\fR modifier\. +. .SS "Operation modifiers" . .TP @@ -252,7 +256,7 @@ set process priority to real\-time Specify a strategy used by a match finder\. . .IP -There are 8 strategies numbered from 0 to 7, from faster to stronger: 0=ZSTD_fast, 1=ZSTD_dfast, 2=ZSTD_greedy, 3=ZSTD_lazy, 4=ZSTD_lazy2, 5=ZSTD_btlazy2, 6=ZSTD_btopt, 7=ZSTD_btultra\. +There are 8 strategies numbered from 1 to 8, from faster to stronger: 1=ZSTD_fast, 2=ZSTD_dfast, 3=ZSTD_greedy, 4=ZSTD_lazy, 5=ZSTD_lazy2, 6=ZSTD_btlazy2, 7=ZSTD_btopt, 8=ZSTD_btultra\. . 
.TP \fBwindowLog\fR=\fIwlog\fR, \fBwlog\fR=\fIwlog\fR diff --git a/programs/zstd.1.md b/programs/zstd.1.md index d68efe764..24e25a2f3 100644 --- a/programs/zstd.1.md +++ b/programs/zstd.1.md @@ -96,7 +96,7 @@ the last one takes effect. * `-l`, `--list`: Display information related to a zstd compressed file, such as size, ratio, and checksum. Some of these fields may not be available. - This command can be augmented with the `-v` modifier. + This command can be augmented with the `-v` modifier. ### Operation modifiers @@ -254,7 +254,7 @@ The list of available _options_: - `strategy`=_strat_, `strat`=_strat_: Specify a strategy used by a match finder. - There are 8 strategies numbered from 0 to 7, from faster to stronger: + There are 8 strategies numbered from 1 to 8, from faster to stronger: 1=ZSTD\_fast, 2=ZSTD\_dfast, 3=ZSTD\_greedy, 4=ZSTD\_lazy, 5=ZSTD\_lazy2, 6=ZSTD\_btlazy2, 7=ZSTD\_btopt, 8=ZSTD\_btultra. From 88481e40520b0c764b0f6513d0c650065e501733 Mon Sep 17 00:00:00 2001 From: Nick Terrell Date: Fri, 30 Jun 2017 16:31:11 -0700 Subject: [PATCH 14/14] [pzstd] Remove appveyor tests The appveyor tests sometimes hang, and pzstd is now deprecated in favor of zstdmt, so delete the tests. --- appveyor.yml | 13 ------------- 1 file changed, 13 deletions(-) diff --git a/appveyor.yml b/appveyor.yml index 1f8b8cf8a..1815563e7 100644 --- a/appveyor.yml +++ b/appveyor.yml @@ -30,12 +30,6 @@ SCRIPT: "" TEST: "cmake" - - COMPILER: "gcc" - HOST: "mingw" - PLATFORM: "x64" - SCRIPT: "" - TEST: "pzstd" - - COMPILER: "visual" HOST: "visual" PLATFORM: "x64" @@ -157,13 +151,6 @@ cd ..\..\.. && make clean ) - - if [%TEST%]==[pzstd] ( - make -C contrib\pzstd googletest-mingw64 && - make -C contrib\pzstd pzstd.exe && - make -C contrib\pzstd tests && - make -C contrib\pzstd check && - make -C contrib\pzstd clean - ) - SET "FUZZERTEST=-T30s" - if [%HOST%]==[visual] if [%CONFIGURATION%]==[Release] ( CD tests &&