mirror of
https://github.com/facebook/zstd.git
synced 2025-10-08 00:04:02 -04:00
Merge pull request #3547 from facebook/seekable_doc
added documentation for the seekable format
commit
488e45f38b
42
contrib/seekable_format/README.md
Normal file
@@ -0,0 +1,42 @@
# Zstandard Seekable Format

The seekable format splits compressed data into a series of independent "frames",
each compressed individually,
so that decompressing a section in the middle of an archive
only requires zstd to decompress at most a frame's worth of extra data,
instead of the entire archive.

The frames are concatenated, so decompressing the entire payload
with any compliant zstd decoder still regenerates the original content.

On top of that, the seekable format generates a jump table,
which makes it possible to jump directly to the position of the relevant frame
when requesting only a segment of the data.
The jump table is stored in a skippable frame,
so zstd decoders unaware of the seekable format simply ignore it.

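As a rough illustration of how a jump table enables direct access, the sketch below finds the frame covering a given decompressed position with a binary search. The `JumpEntry` layout and the `frame_for_offset` name are hypothetical, chosen for illustration only; they are not the zstd seekable API, and the on-disk table stores per-frame sizes rather than these precomputed offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory jump table entry, one per frame.
 * Illustrative only: not a type from the zstd seekable API. */
typedef struct {
    uint64_t dOffset;  /* position of the frame's first byte in the decompressed stream */
    uint64_t cOffset;  /* position of the frame's first byte in the compressed file */
} JumpEntry;

/* Returns the index of the frame that contains decompressed position `pos`.
 * Assumptions: n >= 1, table[0].dOffset == 0, entries sorted by dOffset,
 * and `pos` smaller than the total decompressed size. */
size_t frame_for_offset(const JumpEntry* table, size_t n, uint64_t pos)
{
    size_t lo = 0, hi = n;  /* invariant: table[lo].dOffset <= pos */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (table[mid].dOffset <= pos) lo = mid;
        else hi = mid;
    }
    return lo;
}
```

A reader would then seek to `table[i].cOffset` in the compressed file, decompress that single frame, and discard the first `pos - table[i].dOffset` bytes of its output.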
The format is delivered with an API to create seekable archives
and to retrieve arbitrary segments inside the archive.

### Maximum Frame Size parameter

When creating a seekable archive, the main parameter is the maximum frame size.

At compression time, the user can manually select the boundaries between segments,
but doesn't have to: segments larger than the selected maximum frame size
are split automatically.

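The automatic split can be pictured with the small sketch below. It assumes a greedy policy in which every frame is filled up to the maximum frame size and the last frame takes the remainder; that policy and both function names are assumptions made for illustration, not a description of the library's internals.

```c
#include <assert.h>
#include <stddef.h>

/* Number of frames needed for a segment of `segmentSize` uncompressed bytes
 * when no frame may hold more than `maxFrameSize` bytes.
 * Assumes maxFrameSize > 0. Hypothetical helper, not part of zstd. */
size_t frames_for_segment(size_t segmentSize, size_t maxFrameSize)
{
    if (segmentSize == 0) return 0;
    return (segmentSize - 1) / maxFrameSize + 1;  /* ceiling division */
}

/* Uncompressed size of frame `i` (0-based) under the greedy split.
 * Assumes i < frames_for_segment(segmentSize, maxFrameSize). */
size_t frame_size_at(size_t segmentSize, size_t maxFrameSize, size_t i)
{
    size_t const nbFrames = frames_for_segment(segmentSize, maxFrameSize);
    if (i + 1 < nbFrames) return maxFrameSize;        /* full frame */
    return segmentSize - (nbFrames - 1) * maxFrameSize; /* last frame gets the rest */
}
```

For instance, a 10000-byte segment with a 4096-byte maximum frame size yields three frames of 4096, 4096 and 1808 bytes.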
Small frame sizes reduce decompression cost when requesting small segments,
because the decoder has to decompress an entire frame
to recover even a single byte from it.

A good rule of thumb is to select a maximum frame size roughly equal
to the access granularity, when it's known.
For example, if the application tends to request 4 KB blocks,
then it's a good idea to set a maximum frame size in the vicinity of 4 KB.

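To make the rule of thumb concrete, the sketch below estimates how many bytes must be decompressed to serve one request. It is a simplified model with hypothetical names: it assumes every frame holds exactly `frameSize` uncompressed bytes and ignores the shorter final frame.

```c
#include <assert.h>
#include <stdint.h>

/* Uncompressed bytes the decoder must produce to serve the request
 * [offset, offset + len), assuming uniform frames of `frameSize` bytes
 * (frameSize > 0). Every frame touched by the request is decompressed whole. */
uint64_t bytes_decompressed(uint64_t offset, uint64_t len, uint64_t frameSize)
{
    if (len == 0) return 0;
    uint64_t const first = offset / frameSize;              /* first frame touched */
    uint64_t const last  = (offset + len - 1) / frameSize;  /* last frame touched */
    return (last - first + 1) * frameSize;
}
```

Under this model, an aligned 4096-byte request costs 4096 bytes of decompression with 4096-byte frames, but 1048576 bytes with 1 MiB frames. An unaligned 4096-byte request can straddle two 4096-byte frames and cost 8192 bytes.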
But small frame sizes also reduce compression ratio,
and increase the cost of the jump table,
so there is a balance to find.

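The jump table side of this trade-off can be estimated with the sketch below. The byte counts are assumptions based on the seekable format layout as commonly documented (an 8-byte skippable-frame header, 8 bytes per entry or 12 with checksums, and a 9-byte footer); the function name is hypothetical, and the figures should be checked against the format specification before relying on them.

```c
#include <assert.h>
#include <stdint.h>

/* Estimated size in bytes of the jump table for `numFrames` frames.
 * All constants are assumptions about the on-disk layout, see the text above. */
uint64_t seek_table_size(uint64_t numFrames, int withChecksum)
{
    uint64_t const headerSize = 8;  /* skippable-frame magic number + frame size field */
    uint64_t const footerSize = 9;  /* frame count + descriptor byte + seekable magic number */
    uint64_t const entrySize  = withChecksum ? 12 : 8;  /* compressed size, decompressed size, optional checksum */
    return headerSize + numFrames * entrySize + footerSize;
}
```

Under these assumptions, 1 GiB of data cut into 4 KiB frames (262144 frames) needs about 2 MiB of jump table, or about 3 MiB with checksums, while 1 MiB frames (1024 frames) need roughly 8 KiB.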
In general, avoid really tiny frame sizes (< 1 KB),
which would have a large negative impact on compression ratio.
@@ -48,10 +48,19 @@ typedef struct ZSTD_seekTable_s ZSTD_seekTable;
  *
  * Use ZSTD_seekable_initCStream() to initialize a ZSTD_seekable_CStream object
  * for a new compression operation.
- * `maxFrameSize` indicates the size at which to automatically start a new
- * seekable frame. `maxFrameSize == 0` implies the default maximum size.
- * `checksumFlag` indicates whether or not the seek table should include frame
- * checksums on the uncompressed data for verification.
+ * - `maxFrameSize` indicates the size at which to automatically start a new
+ * seekable frame.
+ * `maxFrameSize == 0` implies the default maximum size.
+ * Smaller frame sizes allow faster decompression of small segments,
+ * since retrieving a single byte requires decompression of
+ * the full frame where the byte belongs.
+ * In general, size the frames to roughly correspond to
+ * the access granularity (when it's known).
+ * But small sizes also reduce compression ratio.
+ * Avoid really tiny frame sizes (< 1 KB),
+ * that would hurt compression ratio considerably.
+ * - `checksumFlag` indicates whether or not the seek table should include frame
+ * checksums on the uncompressed data for verification.
  * @return : a size hint for input to provide for compression, or an error code
  * checkable with ZSTD_isError()
  *