mirror of https://github.com/facebook/zstd.git (synced 2025-10-19 00:05:29 -04:00)
Merge pull request #1338 from facebook/rfc
updated format documentation
commit 3dae90c4e0
@@ -16,7 +16,7 @@ Distribution of this document is unlimited.
 
 ### Version
 
-0.2.9 (05/09/18)
+0.3.0 (25/09/18)
 
 
 Introduction
@@ -72,7 +72,7 @@ A frame is completely independent, has a defined beginning and end,
 and a set of parameters which tells the decoder how to decompress it.
 
 A frame encapsulates one or multiple __blocks__.
-Each block can be compressed or not,
+Each block contains arbitrary content, which is described by its header,
 and has a guaranteed maximum content size, which depends on frame parameters.
 Unlike frames, each block depends on previous blocks for proper decoding.
 However, each block can be decompressed without waiting for its successor,
@@ -591,7 +591,7 @@ It is the number of bytes to be copied (or extracted) from the Literals Section.
 A match copy command specifies an offset and a length.
 
 When all _sequences_ are decoded,
-if there are literals left in the _literal section_,
+if there are literals left in the _literals section_,
 these bytes are added at the end of the block.
 
 This is described in more detail in [Sequence Execution](#sequence-execution).
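As an illustration of the paragraph in the hunk above, the following Python sketch executes a list of already-decoded sequences and then appends whatever literals remain. The function name `execute_block`, the tuple layout, and the assumption that every offset has already been resolved to a plain backward distance in bytes are hypothetical and are not taken from the specification.

```python
def execute_block(literals, sequences, history=b""):
    """Rebuild one block from decoded literals and decoded sequences.

    Assumption: each sequence is (literals_length, distance, match_length),
    where `distance` is an already-resolved backward distance in bytes.
    """
    out = bytearray(history)  # previously decoded data that matches may reference
    lit_pos = 0
    for literals_length, distance, match_length in sequences:
        if lit_pos + literals_length > len(literals):
            raise ValueError("sequence consumes more literals than available")
        # 1. copy literals_length bytes from the decoded literals
        out += literals[lit_pos:lit_pos + literals_length]
        lit_pos += literals_length
        # 2. copy match_length bytes from previously decoded data
        if not 0 < distance <= len(out):
            raise ValueError("match distance points outside decoded data")
        start = len(out) - distance
        for i in range(match_length):
            out.append(out[start + i])
    # Literals left over after the last sequence go at the end of the block.
    out += literals[lit_pos:]
    return bytes(out[len(history):])


block = execute_block(b"abcXYZ", [(3, 3, 3)])
```

Here the single sequence consumes `abc` and repeats it once, and the three unused literals `XYZ` are appended afterwards, so `block` is `b"abcabcXYZ"`.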
@@ -608,7 +608,7 @@ followed by the bitstream.
 | -------------------------- | ------------------------- | ---------------- | ---------------------- | --------- |
 
 To decode the `Sequences_Section`, it's required to know its size.
-This size is deduced from the literals section size:
+Its size is deduced from the size of `Literals_Section`:
 `Sequences_Section_Size = Block_Size - Literals_Section_Size`.
 
 
@@ -805,7 +805,7 @@ one and ending with the first.
 
 ##### Decoding a sequence
 For each of the symbol types, the FSE state can be used to determine the appropriate code.
-The code then defines the baseline and number of bits to read for each type.
+The code then defines the `Baseline` and `Number_of_Bits` to read for each type.
 See the [description of the codes] for how to determine these values.
 
 [description of the codes]: #the-codes-for-literals-lengths-match-lengths-and-offsets
@@ -872,8 +872,8 @@ they are combined to produce the decoded content of a block.
 
 Each sequence consists of a tuple of (`literals_length`, `offset_value`, `match_length`),
 decoded as described in the [Sequences Section](#sequences-section).
-To execute a sequence, first copy `literals_length` bytes from the literals section
-to the output.
+To execute a sequence, first copy `literals_length` bytes
+from the decoded literals to the output.
 
 Then `match_length` bytes are copied from previous decoded data.
 The offset to copy from is determined by `offset_value`:
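The match copy step in the hunk above can be sketched in Python as follows. The rules that turn `offset_value` into an actual offset come after the colon in the specification and are not part of this hunk, so the sketch assumes a hypothetical, already-resolved backward `distance`. The helper name `copy_match` is likewise an assumption. Copying one byte at a time is a deliberate choice: as in LZ77-style formats generally, it lets a match overlap the bytes it is producing when `distance` is smaller than `match_length`.

```python
def copy_match(out, distance, match_length):
    """Append match_length bytes read `distance` bytes back from the end of `out`.

    Assumption: `distance` is a resolved backward distance, not the raw
    `offset_value` field.
    """
    if not 0 < distance <= len(out):
        raise ValueError("match distance points outside decoded data")
    start = len(out) - distance
    for i in range(match_length):
        # Byte-by-byte copy so that freshly written bytes can be re-read.
        out.append(out[start + i])


out = bytearray(b"ab")
copy_match(out, 1, 3)  # repeats the last byte three times
```

After the call, `out` holds `b"abbbb"`.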
@@ -1219,8 +1219,8 @@ It gives the following series of weights :
 
 The decoder will do the inverse operation :
 having collected weights of literal symbols from `0` to `4`,
-it knows the last literal, `5`, is present with a non-zero weight.
-The weight of `5` can be determined by advancing to the next power of 2.
+it knows the last literal, `5`, is present with a non-zero `Weight`.
+The `Weight` of `5` can be determined by advancing to the next power of 2.
 The sum of `2^(Weight-1)` (excluding 0's) is :
 `8 + 4 + 2 + 0 + 1 = 15`.
 Nearest larger power of 2 value is 16.
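The arithmetic in the hunk above can be reproduced with a short Python sketch. The function name `last_huffman_weight` is hypothetical. The input weights `4, 3, 2, 0, 1` are inferred from the terms `8 + 4 + 2 + 0 + 1`, since each non-zero term equals `2^(Weight-1)`.

```python
def last_huffman_weight(weights):
    """Deduce the Weight of the last symbol from the weights already decoded."""
    total = sum(1 << (w - 1) for w in weights if w > 0)
    if total == 0:
        raise ValueError("at least one non-zero Weight is required")
    # Advance to the next power of 2 strictly above the current sum.
    next_power = 1 << total.bit_length()
    remainder = next_power - total
    if remainder & (remainder - 1):
        raise ValueError("missing share is not a power of 2")
    # remainder == 2^(Weight-1), hence Weight = log2(remainder) + 1
    return remainder.bit_length()


weight_of_5 = last_huffman_weight([4, 3, 2, 0, 1])
```

With a sum of 15, the next power of 2 is 16, the missing share is 1, and the deduced `Weight` of literal `5` is 1.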
@@ -1265,7 +1265,7 @@ To decode an FSE bitstream, it is necessary to know its compressed size.
 Compressed size is provided by `headerByte`.
 It's also necessary to know its _maximum possible_ decompressed size,
 which is `255`, since literal values span from `0` to `255`,
-and last symbol's weight is not represented.
+and last symbol's `Weight` is not represented.
 
 An FSE bitstream starts by a header, describing probabilities distribution.
 It will create a Decoding Table.
@@ -1275,7 +1275,7 @@ For more description see the [FSE header description](#fse-table-description)
 The Huffman header compression uses 2 states,
 which share the same FSE distribution table.
 The first state (`State1`) encodes the even indexed symbols,
-and the second (`State2`) encodes the odd indexes.
+and the second (`State2`) encodes the odd indexed symbols.
 `State1` is initialized first, and then `State2`, and they take turns
 decoding a single symbol and updating their state.
 For more details on these FSE operations, see the [FSE section](#fse).
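The alternation between the two states described in the hunk above can be sketched in Python. This is a simplified illustration and not the specification's decoding procedure. The table layout `(symbol, number_of_bits, baseline)`, the `read_bits` callback, the fixed `count`, and the toy table are all assumptions; a real decoder builds the table from the FSE header and stops when the bitstream is exhausted.

```python
def decode_two_states(table, state1, state2, read_bits, count):
    """Decode `count` symbols, alternating between two FSE states.

    Assumption: table[state] == (symbol, number_of_bits, baseline) and
    read_bits(n) returns the next n bits as an integer.
    """
    states = [state1, state2]
    symbols = []
    for index in range(count):
        which = index & 1  # 0: State1 (even index), 1: State2 (odd index)
        symbol, number_of_bits, baseline = table[states[which]]
        symbols.append(symbol)
        # Only the state that just produced a symbol is updated.
        states[which] = baseline + read_bits(number_of_bits)
    return symbols


# Toy data: two table entries, each reading one bit to pick the next state.
toy_table = [(10, 1, 0), (20, 1, 0)]
toy_bits = iter([1, 0, 0, 1])


def toy_read_bits(n):
    value = 0
    for _ in range(n):
        value = (value << 1) | next(toy_bits)
    return value


decoded = decode_two_states(toy_table, 0, 1, toy_read_bits, 4)
```

With the toy inputs, `decoded` is `[10, 20, 20, 10]`: symbols at even positions come from `State1`, symbols at odd positions from `State2`.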
@@ -1296,7 +1296,7 @@ Number_of_Bits = (Weight>0) ? Max_Number_of_Bits + 1 - Weight : 0
 Symbols are sorted by `Weight`.
 Within same `Weight`, symbols keep natural sequential order.
 Symbols with a `Weight` of zero are removed.
-Then, starting from lowest weight, prefix codes are distributed in sequential order.
+Then, starting from lowest `Weight`, prefix codes are distributed in sequential order.
 
 __Example__ :
 Let's presume the following list of weights has been decoded :
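The diff context stops before the specification's own example table. As a separate illustration, the Python sketch below applies the rule stated in this hunk: sort by `Weight`, drop zero weights, then hand out prefix codes sequentially starting from the lowest `Weight`. The function name `assign_prefix_codes` and the input weights are illustrative. Deriving `Max_Number_of_Bits` as the base-2 logarithm of the sum of `2^(Weight-1)` relies on a definition that lies outside this hunk and is treated here as an assumption.

```python
def assign_prefix_codes(weights):
    """Map symbol -> prefix code string, given the full list of weights.

    Assumption: weights[s] is the Weight of symbol s, last symbol included,
    so that the sum of 2^(Weight-1) is an exact power of 2.
    """
    total = sum(1 << (w - 1) for w in weights if w > 0)
    if total == 0 or total & (total - 1):
        raise ValueError("sum of 2^(Weight-1) must be a power of 2")
    max_number_of_bits = total.bit_length() - 1

    # Sort by Weight, keeping natural symbol order within the same Weight;
    # symbols with a Weight of zero are removed.
    ordered = sorted((w, s) for s, w in enumerate(weights) if w > 0)

    codes = {}
    position = 0  # next free slot, counted in units of the longest code
    for w, s in ordered:
        number_of_bits = max_number_of_bits + 1 - w
        codes[s] = format(position >> (w - 1), f"0{number_of_bits}b")
        position += 1 << (w - 1)
    return codes


codes = assign_prefix_codes([4, 3, 2, 0, 1, 1])
```

For these weights, symbols `4` and `5` receive the 4-bit codes `0000` and `0001`, symbol `2` receives `001`, symbol `1` receives `01`, and symbol `0` receives `1`. Symbol `3` has a `Weight` of zero and gets no code.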
@@ -1323,7 +1323,7 @@ Each bitstream must be read _backward_,
 that is starting from the end down to the beginning.
 Therefore it's necessary to know the size of each bitstream.
 
-It's also necessary to know exactly which _bit_ is the latest.
+It's also necessary to know exactly which _bit_ is the last one.
 This is detected by a final bit flag :
 the highest bit of latest byte is a final-bit-flag.
 Consequently, a last byte of `0` is not possible.
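A minimal Python sketch of locating the final-bit-flag follows. It reads "highest bit" as the highest bit that is set in the last byte, with any bits above it being zero padding. That reading is an assumption, though it is consistent with the statement that a last byte of `0` is not possible. The function name `bitstream_length_in_bits` is hypothetical.

```python
def bitstream_length_in_bits(stream):
    """Return the number of payload bits in a backward-read bitstream.

    Assumption: the highest set bit of the last byte is the final-bit-flag;
    it marks the end of the data and is not itself part of the payload.
    """
    if not stream:
        raise ValueError("empty bitstream")
    last = stream[-1]
    if last == 0:
        raise ValueError("last byte of 0 carries no final-bit-flag")
    flag_index = last.bit_length() - 1  # bit position of the flag, 0..7
    # All earlier bytes are full; the last byte contributes the bits below the flag.
    return 8 * (len(stream) - 1) + flag_index


payload_bits = bitstream_length_in_bits(bytes([0xFF, 0b00010110]))
```

In this example the flag is bit 4 of the last byte, so that byte carries 4 payload bits (`0110`) and the whole stream carries 12.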
@@ -1629,6 +1629,7 @@ or at least provide a meaningful error code explaining for which reason it canno
 
 Version changes
 ---------------
+- 0.3.0 : minor edits to match RFC8478
 - 0.2.9 : clarifications for huffman weights direct representation, by Ulrich Kunitz
 - 0.2.8 : clarifications for IETF RFC discuss
 - 0.2.7 : clarifications from IETF RFC review, by Vijay Gurbani and Nick Terrell