Clarified API comments, from suggestions by Bryan O'Sullivan

Yann Collet 2016-07-28 03:47:45 +02:00
parent 19c1002e46
commit d469a98c01
2 changed files with 63 additions and 55 deletions


@@ -38,43 +38,28 @@
 extern "C" {
 #endif

-/*-*************************************
-*  Public functions
-***************************************/
 /*! ZDICT_trainFromBuffer() :
-    Train a dictionary from a memory buffer `samplesBuffer`,
-    where `nbSamples` samples have been stored concatenated.
-    Each sample size is provided into an orderly table `samplesSizes`.
-    Resulting dictionary will be saved into `dictBuffer`.
+    Train a dictionary from an array of samples.
+    Samples must be stored concatenated in a single flat buffer `samplesBuffer`,
+    supplied with an array of sizes `samplesSizes`, providing the size of each sample, in order.
+    The resulting dictionary will be saved into `dictBuffer`.
     @return : size of dictionary stored into `dictBuffer` (<= `dictBufferCapacity`)
-              or an error code, which can be tested by ZDICT_isError().
+              or an error code, which can be tested with ZDICT_isError().
+    Tips : In general, a reasonable dictionary has a size of ~ 100 KB.
+           It's obviously possible to target smaller or larger ones, just by specifying different `dictBufferCapacity`.
+           In general, it's recommended to provide a few thousands samples, but this can vary a lot.
+           It's recommended that total size of all samples be about ~x100 times the target size of dictionary.
 */
 size_t ZDICT_trainFromBuffer(void* dictBuffer, size_t dictBufferCapacity,
                              const void* samplesBuffer, const size_t* samplesSizes, unsigned nbSamples);

-/*! ZDICT_addEntropyTablesFromBuffer() :
-    Given a content-only dictionary (built for example from common strings in
-    the input), add entropy tables computed from the memory buffer
-    `samplesBuffer`, where `nbSamples` samples have been stored concatenated.
-    Each sample size is provided into an orderly table `samplesSizes`.
-    The input dictionary is the last `dictContentSize` bytes of `dictBuffer`. The
-    resulting dictionary with added entropy tables will written back to
-    `dictBuffer`.
-    @return : size of dictionary stored into `dictBuffer` (<= `dictBufferCapacity`).
-*/
-size_t ZDICT_addEntropyTablesFromBuffer(void* dictBuffer, size_t dictContentSize, size_t dictBufferCapacity,
-                                        const void* samplesBuffer, const size_t* samplesSizes, unsigned nbSamples);

-/*-*************************************
-*  Helper functions
-***************************************/
+/*====== Helper functions ======*/
 unsigned ZDICT_isError(size_t errorCode);
 const char* ZDICT_getErrorName(size_t errorCode);


 #ifdef ZDICT_STATIC_LINKING_ONLY

 /* ====================================================================================
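The revised comment translates directly into calling code. Below is a minimal sketch of using ZDICT_trainFromBuffer() as documented above; the include name, the helper name and the error reporting are illustrative, not part of the header.

    #include <stdio.h>
    #include "zdict.h"

    /* Sketch : train a dictionary from `nbSamples` samples already concatenated
     * in `samplesBuffer`, with one size per sample in `samplesSizes`. */
    size_t train_dict(void* dictBuffer, size_t dictCapacity,
                      const void* samplesBuffer, const size_t* samplesSizes, unsigned nbSamples)
    {
        size_t const dictSize = ZDICT_trainFromBuffer(dictBuffer, dictCapacity,
                                                      samplesBuffer, samplesSizes, nbSamples);
        if (ZDICT_isError(dictSize)) {
            fprintf(stderr, "dictionary training failed: %s\n", ZDICT_getErrorName(dictSize));
            return 0;
        }
        return dictSize;   /* <= dictCapacity */
    }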
@@ -85,7 +70,7 @@ const char* ZDICT_getErrorName(size_t errorCode);
 * ==================================================================================== */

 typedef struct {
-    unsigned selectivityLevel;   /* 0 means default; larger => bigger selection => larger dictionary */
+    unsigned selectivityLevel;   /* 0 means default; larger => select more => larger dictionary */
     int compressionLevel;        /* 0 means default; target a specific zstd compression level */
     unsigned notificationLevel;  /* Write to stderr; 0 = none (default); 1 = errors; 2 = progression; 3 = details; 4 = debug; */
     unsigned dictID;             /* 0 means auto mode (32-bits random value); other : force dictID value */
@@ -96,13 +81,32 @@ typedef struct {
 /*! ZDICT_trainFromBuffer_advanced() :
     Same as ZDICT_trainFromBuffer() with control over more parameters.
     `parameters` is optional and can be provided with values set to 0 to mean "default".
-    @return : size of dictionary stored into `dictBuffer` (<= `dictBufferSize`)
+    @return : size of dictionary stored into `dictBuffer` (<= `dictBufferSize`),
               or an error code, which can be tested by ZDICT_isError().
-    note : ZDICT_trainFromBuffer_advanced() will send notifications into stderr if instructed to, using ZDICT_setNotificationLevel()
+    note : ZDICT_trainFromBuffer_advanced() will send notifications into stderr if instructed to, using notificationLevel>0.
 */
 size_t ZDICT_trainFromBuffer_advanced(void* dictBuffer, size_t dictBufferCapacity,
                                       const void* samplesBuffer, const size_t* samplesSizes, unsigned nbSamples,
                                       ZDICT_params_t parameters);

+/*! ZDICT_addEntropyTablesFromBuffer() :
+    Given a content-only dictionary (built using any 3rd party algorithm),
+    add entropy tables computed from an array of samples.
+    Samples must be stored concatenated in a flat buffer `samplesBuffer`,
+    supplied with an array of sizes `samplesSizes`, providing the size of each sample in order.
+    The input dictionary content must be stored *at the end* of `dictBuffer`.
+    Its size is `dictContentSize`.
+    The resulting dictionary with added entropy tables will be *written back to `dictBuffer`*,
+    starting from its beginning.
+    @return : size of dictionary stored into `dictBuffer` (<= `dictBufferCapacity`).
+*/
+size_t ZDICT_addEntropyTablesFromBuffer(void* dictBuffer, size_t dictContentSize, size_t dictBufferCapacity,
+                                        const void* samplesBuffer, const size_t* samplesSizes, unsigned nbSamples);
+
 #endif   /* ZDICT_STATIC_LINKING_ONLY */
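A sketch of the workflow the new comment describes: place the externally built content at the end of dictBuffer, then let ZDICT_addEntropyTablesFromBuffer() finish the dictionary in place. The helper name and the size guard are illustrative; ZDICT_STATIC_LINKING_ONLY must be defined before including zdict.h to reach this declaration.

    #include <string.h>
    #define ZDICT_STATIC_LINKING_ONLY
    #include "zdict.h"

    /* Sketch : finalize a content-only dictionary produced by another tool.
     * The raw content is copied to the *end* of dictBuffer, as required above;
     * the finished dictionary is written back from the beginning of dictBuffer. */
    size_t finalize_dict(void* dictBuffer, size_t dictCapacity,
                         const void* rawContent, size_t rawContentSize,
                         const void* samplesBuffer, const size_t* samplesSizes, unsigned nbSamples)
    {
        if (rawContentSize > dictCapacity) return 0;   /* illustrative guard */
        memcpy((char*)dictBuffer + dictCapacity - rawContentSize, rawContent, rawContentSize);
        return ZDICT_addEntropyTablesFromBuffer(dictBuffer, rawContentSize, dictCapacity,
                                                samplesBuffer, samplesSizes, nbSamples);
        /* result is the dictionary size, or an error code testable with ZDICT_isError() */
    }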


@@ -80,13 +80,13 @@ ZSTDLIB_API size_t ZSTD_compress( void* dst, size_t dstCapacity,

 /*! ZSTD_getDecompressedSize() :
 *   @return : decompressed size if known, 0 otherwise.
-       note 1 : if `0`, follow up with ZSTD_getFrameParams() to know precise failure cause.
-       note 2 : decompressed size could be wrong or intentionally modified !
-                always ensure results fit within application's authorized limits */
+       note 1 : decompressed size could be wrong or intentionally modified !
+                always ensure results fit within application's authorized limits !
+       note 2 : when `0`, if precise failure cause is needed, use ZSTD_getFrameParams() to know more. */
 unsigned long long ZSTD_getDecompressedSize(const void* src, size_t srcSize);

 /*! ZSTD_decompress() :
-    `compressedSize` : must be _exact_ size of compressed input, otherwise decompression will fail.
+    `compressedSize` : must be the _exact_ size of compressed input, otherwise decompression will fail.
     `dstCapacity` must be equal or larger than originalSize.
     @return : the number of bytes decompressed into `dst` (<= `dstCapacity`),
               or an errorCode if it fails (which can be tested using ZSTD_isError()) */
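As a sketch of the revised notes: treat the value returned by ZSTD_getDecompressedSize() as untrusted and bound it before allocating. The 64 MB cap, the helper name and the error handling are arbitrary examples.

    #include <stdlib.h>
    #include "zstd.h"

    #define MAX_OUTPUT ((unsigned long long)64 << 20)   /* application-chosen limit, illustrative */

    /* Sketch : one-shot decompression guided by the frame's declared content size. */
    void* decompress_frame(const void* src, size_t srcSize, size_t* outSize)
    {
        unsigned long long const contentSize = ZSTD_getDecompressedSize(src, srcSize);
        if (contentSize == 0 || contentSize > MAX_OUTPUT) return NULL;  /* unknown, or too large to trust */
        void* const dst = malloc((size_t)contentSize);
        if (dst == NULL) return NULL;
        {   size_t const dSize = ZSTD_decompress(dst, (size_t)contentSize, src, srcSize);
            if (ZSTD_isError(dSize)) { free(dst); return NULL; }
            *outSize = dSize;
        }
        return dst;
    }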
@@ -107,16 +107,16 @@ ZSTDLIB_API const char* ZSTD_getErrorName(size_t code); /*!< provides readab

 /** Compression context */
 typedef struct ZSTD_CCtx_s ZSTD_CCtx;   /*< incomplete type */
 ZSTDLIB_API ZSTD_CCtx* ZSTD_createCCtx(void);
-ZSTDLIB_API size_t ZSTD_freeCCtx(ZSTD_CCtx* cctx);   /*!< @return : errorCode */
+ZSTDLIB_API size_t ZSTD_freeCCtx(ZSTD_CCtx* cctx);

 /** ZSTD_compressCCtx() :
     Same as ZSTD_compress(), requires an allocated ZSTD_CCtx (see ZSTD_createCCtx()) */
 ZSTDLIB_API size_t ZSTD_compressCCtx(ZSTD_CCtx* ctx, void* dst, size_t dstCapacity, const void* src, size_t srcSize, int compressionLevel);

 /** Decompression context */
-typedef struct ZSTD_DCtx_s ZSTD_DCtx;
+typedef struct ZSTD_DCtx_s ZSTD_DCtx;   /*< incomplete type */
 ZSTDLIB_API ZSTD_DCtx* ZSTD_createDCtx(void);
-ZSTDLIB_API size_t ZSTD_freeDCtx(ZSTD_DCtx* dctx);   /*!< @return : errorCode */
+ZSTDLIB_API size_t ZSTD_freeDCtx(ZSTD_DCtx* dctx);

 /** ZSTD_decompressDCtx() :
 *   Same as ZSTD_decompress(), requires an allocated ZSTD_DCtx (see ZSTD_createDCtx()) */
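A small sketch of the explicit-context pattern these declarations describe: create one ZSTD_CCtx, reuse it across several inputs, then free it. Level 3 and the back-to-back frame layout are illustrative choices, and error handling is trimmed.

    #include "zstd.h"

    /* Sketch : compress n independent buffers with a single reused context,
     * writing the resulting frames one after another into dst. */
    size_t compress_many(void* dst, size_t dstCapacity,
                         const void* srcs[], const size_t srcSizes[], unsigned n)
    {
        ZSTD_CCtx* const cctx = ZSTD_createCCtx();
        size_t total = 0;
        if (cctx == NULL) return 0;
        for (unsigned i = 0; i < n; i++) {
            size_t const cSize = ZSTD_compressCCtx(cctx, (char*)dst + total, dstCapacity - total,
                                                   srcs[i], srcSizes[i], 3 /* level, illustrative */);
            if (ZSTD_isError(cSize)) { total = 0; break; }
            total += cSize;
        }
        ZSTD_freeCCtx(cctx);
        return total;
    }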
@@ -127,7 +127,7 @@ ZSTDLIB_API size_t ZSTD_decompressDCtx(ZSTD_DCtx* ctx, void* dst, size_t dstCapa
 *  Simple dictionary API
 ***************************/

 /*! ZSTD_compress_usingDict() :
-*   Compression using a pre-defined Dictionary content (see dictBuilder).
+*   Compression using a predefined Dictionary (see dictBuilder/zdict.h).
 *   Note : This function load the dictionary, resulting in a significant startup time. */
 ZSTDLIB_API size_t ZSTD_compress_usingDict(ZSTD_CCtx* ctx,
                                            void* dst, size_t dstCapacity,
@@ -136,7 +136,7 @@ ZSTDLIB_API size_t ZSTD_compress_usingDict(ZSTD_CCtx* ctx,
                                            int compressionLevel);

 /*! ZSTD_decompress_usingDict() :
-*   Decompression using a pre-defined Dictionary content (see dictBuilder).
+*   Decompression using a predefined Dictionary (see dictBuilder/zdict.h).
 *   Dictionary must be identical to the one used during compression.
 *   Note : This function load the dictionary, resulting in a significant startup time */
 ZSTDLIB_API size_t ZSTD_decompress_usingDict(ZSTD_DCtx* dctx,
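A round-trip sketch of the two *_usingDict() entry points above, with the same raw dictionary bytes supplied on both sides. As the notes say, the dictionary is reloaded on every call, which is exactly what the digested-dictionary API further below avoids. The helper name and the fixed level are illustrative.

    #include "zstd.h"

    /* Sketch : compress src with a plain dictionary buffer, then decompress it back. */
    size_t roundtrip_with_dict(void* dst, size_t dstCapacity,
                               void* regen, size_t regenCapacity,
                               const void* src, size_t srcSize,
                               const void* dict, size_t dictSize)
    {
        ZSTD_CCtx* const cctx = ZSTD_createCCtx();
        ZSTD_DCtx* const dctx = ZSTD_createDCtx();
        size_t result = 0;
        if (cctx != NULL && dctx != NULL) {
            size_t const cSize = ZSTD_compress_usingDict(cctx, dst, dstCapacity,
                                                         src, srcSize, dict, dictSize, 3);
            result = ZSTD_isError(cSize) ? cSize
                   : ZSTD_decompress_usingDict(dctx, regen, regenCapacity, dst, cSize, dict, dictSize);
        }
        if (cctx != NULL) ZSTD_freeCCtx(cctx);
        if (dctx != NULL) ZSTD_freeDCtx(dctx);
        return result;   /* regenerated size, or an error code */
    }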
@@ -146,7 +146,7 @@ ZSTDLIB_API size_t ZSTD_decompress_usingDict(ZSTD_DCtx* dctx,

 /*-**************************
-*  Advanced Dictionary API
+*  Fast Dictionary API
 ****************************/

 /*! ZSTD_createCDict() :
 *   Create a digested dictionary, ready to start compression operation without startup delay.
@@ -156,7 +156,7 @@ ZSTDLIB_API ZSTD_CDict* ZSTD_createCDict(const void* dict, size_t dictSize, int
 ZSTDLIB_API size_t ZSTD_freeCDict(ZSTD_CDict* CDict);

 /*! ZSTD_compress_usingCDict() :
-*   Compression using a pre-digested Dictionary.
+*   Compression using a digested Dictionary.
 *   Faster startup than ZSTD_compress_usingDict(), recommended when same dictionary is used multiple times.
 *   Note that compression level is decided during dictionary creation */
 ZSTDLIB_API size_t ZSTD_compress_usingCDict(ZSTD_CCtx* cctx,
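A sketch of the digested-dictionary path: the compression level is fixed when the CDict is created, as the comment notes. For self-containment the CDict is created and freed inside the helper; in real use it would be built once and reused across many calls.

    #include "zstd.h"

    /* Sketch : digest the dictionary, then compress one input with it. */
    size_t compress_with_cdict(void* dst, size_t dstCapacity,
                               const void* src, size_t srcSize,
                               const void* dict, size_t dictSize)
    {
        ZSTD_CDict* const cdict = ZSTD_createCDict(dict, dictSize, 3 /* level fixed at creation */);
        ZSTD_CCtx*  const cctx  = ZSTD_createCCtx();
        size_t cSize = 0;
        if (cdict != NULL && cctx != NULL)
            cSize = ZSTD_compress_usingCDict(cctx, dst, dstCapacity, src, srcSize, cdict);
        if (cctx != NULL) ZSTD_freeCCtx(cctx);
        if (cdict != NULL) ZSTD_freeCDict(cdict);
        return cSize;   /* compressed size, 0 on allocation failure, or an error code */
    }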
@@ -172,7 +172,7 @@ ZSTDLIB_API ZSTD_DDict* ZSTD_createDDict(const void* dict, size_t dictSize);
 ZSTDLIB_API size_t ZSTD_freeDDict(ZSTD_DDict* ddict);

 /*! ZSTD_decompress_usingDDict() :
-*   Decompression using a pre-digested Dictionary
+*   Decompression using a digested Dictionary
 *   Faster startup than ZSTD_decompress_usingDict(), recommended when same dictionary is used multiple times. */
 ZSTDLIB_API size_t ZSTD_decompress_usingDDict(ZSTD_DCtx* dctx,
                                               void* dst, size_t dstCapacity,
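The matching decompression-side sketch; again, a real application would build the DDict once and share it across frames compressed with the same dictionary.

    #include "zstd.h"

    /* Sketch : decompress one frame using a digested dictionary. */
    size_t decompress_with_ddict(void* dst, size_t dstCapacity,
                                 const void* src, size_t srcSize,
                                 const void* dict, size_t dictSize)
    {
        ZSTD_DDict* const ddict = ZSTD_createDDict(dict, dictSize);
        ZSTD_DCtx*  const dctx  = ZSTD_createDCtx();
        size_t dSize = 0;
        if (ddict != NULL && dctx != NULL)
            dSize = ZSTD_decompress_usingDDict(dctx, dst, dstCapacity, src, srcSize, ddict);
        if (dctx != NULL) ZSTD_freeDCtx(dctx);
        if (ddict != NULL) ZSTD_freeDDict(ddict);
        return dSize;
    }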
@@ -312,6 +312,10 @@ ZSTDLIB_API size_t ZSTD_sizeofDCtx(const ZSTD_DCtx* dctx);

 /* ******************************************************************
 *  Buffer-less streaming functions (synchronous mode)
 ********************************************************************/
+/* This is an advanced API, giving full control over buffer management, for users which need direct control over memory.
+ * But it's also a complex one, with a lot of restrictions (documented below).
+ * For an easier streaming API, look into common/zbuff.h
+ * which removes all restrictions by implementing and managing its own internal buffer */
 ZSTDLIB_API size_t ZSTD_compressBegin(ZSTD_CCtx* cctx, int compressionLevel);
 ZSTDLIB_API size_t ZSTD_compressBegin_usingDict(ZSTD_CCtx* cctx, const void* dict, size_t dictSize, int compressionLevel);
 ZSTDLIB_API size_t ZSTD_compressBegin_advanced(ZSTD_CCtx* cctx, const void* dict, size_t dictSize, ZSTD_parameters params, unsigned long long pledgedSrcSize);
@@ -373,16 +377,16 @@ ZSTDLIB_API size_t ZSTD_decompressContinue(ZSTD_DCtx* dctx, void* dst, size_t ds
   Use ZSTD_createDCtx() / ZSTD_freeDCtx() to manage it.
   A ZSTD_DCtx object can be re-used multiple times.

-  First optional operation is to retrieve frame parameters, using ZSTD_getFrameParams(), which doesn't consume the input.
-  It can provide the minimum size of rolling buffer required to properly decompress data (`windowSize`),
+  First optional operation is to retrieve frame parameters, using ZSTD_getFrameParams().
+  ZSTD_getFrameParams() fills a ZSTD_frameParams structure,
+  which can provide the minimum size of rolling buffer required to decompress data (`windowSize`),
   and optionally the final size of uncompressed content.
-  (Note : content size is an optional info that may not be present. 0 means : content size unknown)
-  Frame parameters are extracted from the beginning of compressed frame.
-  The amount of data to read is variable, from ZSTD_frameHeaderSize_min to ZSTD_frameHeaderSize_max (so if `srcSize` >= ZSTD_frameHeaderSize_max, it will always work)
-  If `srcSize` is too small for operation to succeed, function will return the minimum size it requires to produce a result.
-  Result : 0 when successful, it means the ZSTD_frameParams structure has been filled.
-           >0 : means there is not enough data into `src`. Provides the expected size to successfully decode header.
-           errorCode, which can be tested using ZSTD_isError()
+  (Note : content size is an optional info that may not be present. 0 means : content size unknown).
+  These information are extracted from the beginning of the compressed frame.
+  Size of data fragment must be large enough to ensure successful decoding, typically ZSTD_frameHeaderSize_max bytes.
+  @result : 0 : successful decoding, it means the ZSTD_frameParams structure is correctly filled.
+            >0 : `srcSize` is too small, please provide at least @result bytes on next try.
+            errorCode, which can be tested using ZSTD_isError().

   Start decompression, with ZSTD_decompressBegin() or ZSTD_decompressBegin_usingDict().
   Alternatively, you can copy a prepared context, using ZSTD_copyDCtx().
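A sketch of the decompression loop this text describes, assuming the signatures of this API generation (ZSTD_getFrameParams(&fparams, src, srcSize), ZSTD_decompressBegin(), ZSTD_nextSrcSizeToDecompress(), ZSTD_decompressContinue()). It decodes one frame into a single flat output buffer, which trivially satisfies the contiguity requirement discussed below; handling of truncated input is simplified.

    #define ZSTD_STATIC_LINKING_ONLY
    #include "zstd.h"

    /* Sketch : buffer-less decompression of one frame held entirely in memory. */
    size_t bufferless_decompress(void* dst, size_t dstCapacity,
                                 const void* src, size_t srcSize)
    {
        ZSTD_frameParams fparams;
        size_t const hResult = ZSTD_getFrameParams(&fparams, src, srcSize);
        if (hResult != 0) return hResult;   /* >0 : need at least hResult bytes ; or an error code */
        (void)fparams;   /* fparams.windowSize would size a rolling buffer; unused with a flat dst */

        {   ZSTD_DCtx* const dctx = ZSTD_createDCtx();
            const char* ip = (const char*)src;
            size_t remaining = srcSize;
            size_t produced = 0;
            size_t const initError = ZSTD_decompressBegin(dctx);
            if (ZSTD_isError(initError)) { ZSTD_freeDCtx(dctx); return initError; }

            while (1) {
                size_t const toRead = ZSTD_nextSrcSizeToDecompress(dctx);
                if (toRead == 0) break;             /* frame fully decoded */
                if (toRead > remaining) break;      /* truncated input; simplified handling */
                {   size_t const genSize = ZSTD_decompressContinue(dctx,
                                              (char*)dst + produced, dstCapacity - produced,
                                              ip, toRead);
                    if (ZSTD_isError(genSize)) { ZSTD_freeDCtx(dctx); return genSize; }
                    produced += genSize;
                }
                ip += toRead;
                remaining -= toRead;
            }
            ZSTD_freeDCtx(dctx);
            return produced;
        }
    }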
@@ -399,7 +403,7 @@ ZSTDLIB_API size_t ZSTD_decompressContinue(ZSTD_DCtx* dctx, void* dst, size_t ds
   Alternatively, a round buffer of sufficient size is also possible. Sufficient size is determined by frame parameters.
   ZSTD_decompressContinue() is very sensitive to contiguity,
   if 2 blocks don't follow each other, make sure that either the compressor breaks contiguity at the same place,
   or that previous contiguous segment is large enough to properly handle maximum back-reference.

   A frame is fully decoded when ZSTD_nextSrcSizeToDecompress() returns zero.
   Context can then be reset to start a new decompression.
@@ -407,7 +411,7 @@ ZSTDLIB_API size_t ZSTD_decompressContinue(ZSTD_DCtx* dctx, void* dst, size_t ds

   == Special case : skippable frames ==

-  Skippable frames allow the integration of user-defined data into a flow of concatenated frames.
+  Skippable frames allow integration of user-defined data into a flow of concatenated frames.
   Skippable frames will be ignored (skipped) by a decompressor. The format of skippable frames is as follows :
   a) Skippable frame ID - 4 Bytes, Little endian format, any value from 0x184D2A50 to 0x184D2A5F
   b) Frame Size - 4 Bytes, Little endian format, unsigned 32-bits
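A sketch of emitting a skippable frame with the layout listed above. The helper names are illustrative, and the frame-size field is taken to cover only the user data that follows the 8-byte header.

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Write a 32-bit value in little-endian order, per the format above. */
    static void write_LE32(unsigned char* p, uint32_t v)
    {
        p[0] = (unsigned char)(v);
        p[1] = (unsigned char)(v >> 8);
        p[2] = (unsigned char)(v >> 16);
        p[3] = (unsigned char)(v >> 24);
    }

    /* Sketch : wrap userData in a skippable frame. Returns bytes written, or 0 if dst is too small. */
    size_t write_skippable_frame(void* dst, size_t dstCapacity,
                                 const void* userData, uint32_t userDataSize)
    {
        unsigned char* const op = (unsigned char*)dst;
        if (dstCapacity < 8 + (size_t)userDataSize) return 0;
        write_LE32(op + 0, 0x184D2A50U);    /* skippable frame ID (any of the 16 allowed values) */
        write_LE32(op + 4, userDataSize);   /* frame size : size of the user data that follows */
        memcpy(op + 8, userData, userDataSize);
        return 8 + (size_t)userDataSize;
    }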