Merge pull request #1338 from facebook/rfc

updated format documentation
Yann Collet 2018-09-26 13:40:35 -07:00 committed by GitHub
commit 3dae90c4e0


@ -16,7 +16,7 @@ Distribution of this document is unlimited.
### Version ### Version
0.2.9 (05/09/18) 0.3.0 (25/09/18)
Introduction Introduction
@ -72,7 +72,7 @@ A frame is completely independent, has a defined beginning and end,
and a set of parameters which tells the decoder how to decompress it. and a set of parameters which tells the decoder how to decompress it.
A frame encapsulates one or multiple __blocks__. A frame encapsulates one or multiple __blocks__.
Each block can be compressed or not, Each block contains arbitrary content, which is described by its header,
and has a guaranteed maximum content size, which depends on frame parameters. and has a guaranteed maximum content size, which depends on frame parameters.
Unlike frames, each block depends on previous blocks for proper decoding. Unlike frames, each block depends on previous blocks for proper decoding.
However, each block can be decompressed without waiting for its successor, However, each block can be decompressed without waiting for its successor,
@ -591,7 +591,7 @@ It is the number of bytes to be copied (or extracted) from the Literals Section.
A match copy command specifies an offset and a length. A match copy command specifies an offset and a length.
When all _sequences_ are decoded, When all _sequences_ are decoded,
if there are literals left in the _literal section_, if there are literals left in the _literals section_,
these bytes are added at the end of the block. these bytes are added at the end of the block.
This is described in more detail in [Sequence Execution](#sequence-execution). This is described in more detail in [Sequence Execution](#sequence-execution).
@ -608,7 +608,7 @@ followed by the bitstream.
| -------------------------- | ------------------------- | ---------------- | ---------------------- | --------- | | -------------------------- | ------------------------- | ---------------- | ---------------------- | --------- |
To decode the `Sequences_Section`, it's required to know its size. To decode the `Sequences_Section`, it's required to know its size.
This size is deduced from the literals section size: Its size is deduced from the size of `Literals_Section`:
`Sequences_Section_Size = Block_Size - Literals_Section_Size`. `Sequences_Section_Size = Block_Size - Literals_Section_Size`.
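The subtraction above can be sketched as a small helper. This is only an illustration: the function name is hypothetical, and it assumes `Literals_Section_Size` counts every byte the literals section occupies inside the block.

```python
def sequences_section_size(block_size: int, literals_section_size: int) -> int:
    """Return Sequences_Section_Size = Block_Size - Literals_Section_Size.

    Assumption: literals_section_size covers the whole literals section
    as stored in the block, so the remainder belongs to the sequences section.
    """
    if literals_section_size < 0 or literals_section_size > block_size:
        # A literals section larger than its block indicates corrupted input.
        raise ValueError("literals section does not fit in the block")
    return block_size - literals_section_size
```

A decoder would typically compute this right after parsing the literals section, before touching the sequences header.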
@ -805,7 +805,7 @@ one and ending with the first.
##### Decoding a sequence ##### Decoding a sequence
For each of the symbol types, the FSE state can be used to determine the appropriate code. For each of the symbol types, the FSE state can be used to determine the appropriate code.
The code then defines the baseline and number of bits to read for each type. The code then defines the `Baseline` and `Number_of_Bits` to read for each type.
See the [description of the codes] for how to determine these values. See the [description of the codes] for how to determine these values.
[description of the codes]: #the-codes-for-literals-lengths-match-lengths-and-offsets [description of the codes]: #the-codes-for-literals-lengths-match-lengths-and-offsets
@ -872,8 +872,8 @@ they are combined to produce the decoded content of a block.
Each sequence consists of a tuple of (`literals_length`, `offset_value`, `match_length`), Each sequence consists of a tuple of (`literals_length`, `offset_value`, `match_length`),
decoded as described in the [Sequences Section](#sequences-section). decoded as described in the [Sequences Section](#sequences-section).
To execute a sequence, first copy `literals_length` bytes from the literals section To execute a sequence, first copy `literals_length` bytes
to the output. from the decoded literals to the output.
Then `match_length` bytes are copied from previous decoded data. Then `match_length` bytes are copied from previous decoded data.
The offset to copy from is determined by `offset_value`: The offset to copy from is determined by `offset_value`:
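The rules that turn `offset_value` into an actual offset fall outside this hunk. As a rough sketch of the two copy steps only, the hypothetical helper below assumes each sequence already carries a resolved `offset` (a distance back from the current end of the output) instead of a raw `offset_value`. The function name and the `history` parameter are illustrative assumptions.

```python
def execute_sequences(literals: bytes, sequences, history: bytes = b"") -> bytes:
    """Execute (literals_length, offset, match_length) tuples.

    Assumption: `offset` is already resolved to a distance back from the
    current end of the output; mapping `offset_value` to it is not shown.
    """
    out = bytearray(history)  # previously decoded data that matches may reference
    lit_pos = 0
    for literals_length, offset, match_length in sequences:
        if lit_pos + literals_length > len(literals):
            raise ValueError("sequence consumes more literals than available")
        # Step 1: copy literals_length bytes from the decoded literals.
        out += literals[lit_pos:lit_pos + literals_length]
        lit_pos += literals_length
        # Step 2: copy match_length bytes from previously decoded data.
        if offset <= 0 or offset > len(out):
            raise ValueError("offset points outside decoded data")
        start = len(out) - offset
        for i in range(match_length):
            # Byte-by-byte so that a match may overlap the bytes it produces.
            out.append(out[start + i])
    # Literals left after the last sequence are added at the end of the block.
    out += literals[lit_pos:]
    return bytes(out[len(history):])
```

For example, `execute_sequences(b"abcXY", [(3, 3, 6)])` copies `abc`, repeats it twice through an overlapping match, then appends the remaining `XY`.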
@ -1219,8 +1219,8 @@ It gives the following series of weights :
The decoder will do the inverse operation : The decoder will do the inverse operation :
having collected weights of literal symbols from `0` to `4`, having collected weights of literal symbols from `0` to `4`,
it knows the last literal, `5`, is present with a non-zero weight. it knows the last literal, `5`, is present with a non-zero `Weight`.
The weight of `5` can be determined by advancing to the next power of 2. The `Weight` of `5` can be determined by advancing to the next power of 2.
The sum of `2^(Weight-1)` (excluding 0's) is : The sum of `2^(Weight-1)` (excluding 0's) is :
`8 + 4 + 2 + 0 + 1 = 15`. `8 + 4 + 2 + 0 + 1 = 15`.
Nearest larger power of 2 value is 16. Nearest larger power of 2 value is 16.
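The arithmetic above can be sketched as follows. The helper name is hypothetical, and the sketch assumes the missing contribution `2^(Weight-1)` of the last symbol is exactly the gap between the sum and the next power of 2.

```python
def last_symbol_weight(weights):
    """Deduce the Weight of the last symbol from the explicitly coded weights."""
    # Sum of 2^(Weight-1), excluding symbols whose Weight is 0.
    total = sum(1 << (w - 1) for w in weights if w > 0)
    if total == 0:
        raise ValueError("at least one non-zero Weight is required")
    # Smallest power of 2 strictly greater than the sum:
    # the last symbol is present, so it must contribute at least 1.
    next_pow2 = 1 << total.bit_length()
    rest = next_pow2 - total
    if rest & (rest - 1):
        # The gap must itself be a power of 2 to be a valid 2^(Weight-1).
        raise ValueError("weights do not complete to a power of 2")
    return rest.bit_length()  # log2(rest) + 1
```

With the weights from the example, `last_symbol_weight([4, 3, 2, 0, 1])` sums to 15, advances to 16, and returns a `Weight` of 1 for literal `5`.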
@ -1265,7 +1265,7 @@ To decode an FSE bitstream, it is necessary to know its compressed size.
Compressed size is provided by `headerByte`. Compressed size is provided by `headerByte`.
It's also necessary to know its _maximum possible_ decompressed size, It's also necessary to know its _maximum possible_ decompressed size,
which is `255`, since literal values span from `0` to `255`, which is `255`, since literal values span from `0` to `255`,
and last symbol's weight is not represented. and last symbol's `Weight` is not represented.
An FSE bitstream starts by a header, describing probabilities distribution. An FSE bitstream starts by a header, describing probabilities distribution.
It will create a Decoding Table. It will create a Decoding Table.
@ -1275,7 +1275,7 @@ For more description see the [FSE header description](#fse-table-description)
The Huffman header compression uses 2 states, The Huffman header compression uses 2 states,
which share the same FSE distribution table. which share the same FSE distribution table.
The first state (`State1`) encodes the even indexed symbols, The first state (`State1`) encodes the even indexed symbols,
and the second (`State2`) encodes the odd indexes. and the second (`State2`) encodes the odd indexed symbols.
`State1` is initialized first, and then `State2`, and they take turns `State1` is initialized first, and then `State2`, and they take turns
decoding a single symbol and updating their state. decoding a single symbol and updating their state.
For more details on these FSE operations, see the [FSE section](#fse). For more details on these FSE operations, see the [FSE section](#fse).
@ -1296,7 +1296,7 @@ Number_of_Bits = (Weight>0) ? Max_Number_of_Bits + 1 - Weight : 0
Symbols are sorted by `Weight`. Symbols are sorted by `Weight`.
Within same `Weight`, symbols keep natural sequential order. Within same `Weight`, symbols keep natural sequential order.
Symbols with a `Weight` of zero are removed. Symbols with a `Weight` of zero are removed.
Then, starting from lowest weight, prefix codes are distributed in sequential order. Then, starting from lowest `Weight`, prefix codes are distributed in sequential order.
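One way to read these rules is as a running position counter over a table of `2^Max_Number_of_Bits` slots, where a symbol of weight `W` takes `2^(W-1)` consecutive slots. The sketch below follows that reading. The function name and the counter formulation are assumptions made for illustration, not wording taken from the specification.

```python
def assign_prefix_codes(weights):
    """Map each symbol with a non-zero Weight to its prefix code (bit string)."""
    total = sum(1 << (w - 1) for w in weights if w > 0)
    if total == 0 or total & (total - 1):
        raise ValueError("sum of 2^(Weight-1) must be a power of 2")
    max_bits = total.bit_length() - 1  # Max_Number_of_Bits = log2(total)
    # Sort by Weight; within the same Weight keep natural order; drop Weight 0.
    order = sorted((s for s, w in enumerate(weights) if w > 0),
                   key=lambda s: (weights[s], s))
    codes = {}
    position = 0  # next free slot in the conceptual 2^max_bits table
    for symbol in order:
        w = weights[symbol]
        nbits = max_bits + 1 - w  # Number_of_Bits for a non-zero Weight
        codes[symbol] = format(position >> (w - 1), f"0{nbits}b")
        position += 1 << (w - 1)
    return codes
```

For instance, weights `[4, 3, 2, 0, 1, 1]` for literals `0` to `5` give 4-bit codes to literals `4` and `5`, and a single-bit code to literal `0`.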
__Example__ : __Example__ :
Let's presume the following list of weights has been decoded : Let's presume the following list of weights has been decoded :
@ -1323,7 +1323,7 @@ Each bitstream must be read _backward_,
that is starting from the end down to the beginning. that is starting from the end down to the beginning.
Therefore it's necessary to know the size of each bitstream. Therefore it's necessary to know the size of each bitstream.
It's also necessary to know exactly which _bit_ is the latest. It's also necessary to know exactly which _bit_ is the last one.
This is detected by a final bit flag : This is detected by a final bit flag :
the highest bit of latest byte is a final-bit-flag. the highest bit of latest byte is a final-bit-flag.
Consequently, a last byte of `0` is not possible. Consequently, a last byte of `0` is not possible.
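A minimal sketch of locating the final-bit-flag, reading "highest bit" as the highest bit that is set in the last byte. The helper names are hypothetical, and the sketch assumes the flag itself is not counted as payload.

```python
def useful_bits_in_last_byte(last_byte: int) -> int:
    """Return how many payload bits sit below the final-bit-flag."""
    if not 0 < last_byte <= 0xFF:
        # A last byte of 0 carries no flag, so it is not a valid ending.
        raise ValueError("last byte must be in 1..255")
    # bit_length() - 1 is the index of the highest set bit, i.e. the flag.
    return last_byte.bit_length() - 1


def bitstream_length_in_bits(stream: bytes) -> int:
    """Total payload bits of a backward-read bitstream of known byte size."""
    if not stream:
        raise ValueError("empty bitstream")
    return (len(stream) - 1) * 8 + useful_bits_in_last_byte(stream[-1])
```

A backward reader would start at the bit just below the flag in the last byte and move toward the first byte.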
@ -1629,6 +1629,7 @@ or at least provide a meaningful error code explaining for which reason it canno
Version changes Version changes
--------------- ---------------
- 0.3.0 : minor edits to match RFC8478
- 0.2.9 : clarifications for huffman weights direct representation, by Ulrich Kunitz - 0.2.9 : clarifications for huffman weights direct representation, by Ulrich Kunitz
- 0.2.8 : clarifications for IETF RFC discuss - 0.2.8 : clarifications for IETF RFC discuss
- 0.2.7 : clarifications from IETF RFC review, by Vijay Gurbani and Nick Terrell - 0.2.7 : clarifications from IETF RFC review, by Vijay Gurbani and Nick Terrell