mirror of
https://github.com/facebook/zstd.git
synced 2025-11-27 00:05:09 -05:00
clarify zstd specification for Huffman blocks
Following detailed comments from @dweiller in #3508.
This commit is contained in:
parent
4ebaf36582
commit
832f559b0b
@ -16,7 +16,7 @@ Distribution of this document is unlimited.
|
||||
|
||||
### Version
|
||||
|
||||
0.3.7 (2020-12-09)
|
||||
0.3.8 (2023-02-18)
|
||||
|
||||
|
||||
Introduction
|
||||
@ -470,6 +470,7 @@ This field uses 2 lowest bits of first byte, describing 4 different block types
|
||||
repeated `Regenerated_Size` times.
|
||||
- `Compressed_Literals_Block` - This is a standard Huffman-compressed block,
|
||||
starting with a Huffman tree description.
|
||||
In this mode, there are at least 2 different literals represented in the Huffman tree description.
|
||||
See details below.
|
||||
- `Treeless_Literals_Block` - This is a Huffman-compressed block,
|
||||
using Huffman tree _from previous Huffman-compressed literals block_.
|
||||
@ -566,6 +567,7 @@ or from a dictionary.
|
||||
|
||||
### `Huffman_Tree_Description`
|
||||
This section is only present when `Literals_Block_Type` type is `Compressed_Literals_Block` (`2`).
|
||||
The tree describes the weights of all literals symbols that can be present in the literals block, at least 2 and up to 256.
|
||||
The format of the Huffman tree description can be found at [Huffman Tree description](#huffman-tree-description).
|
||||
The size of `Huffman_Tree_Description` is determined during decoding process,
|
||||
it must be used to determine where streams begin.
|
||||
@ -1197,7 +1199,7 @@ Huffman Coding
|
||||
--------------
|
||||
Zstandard Huffman-coded streams are read backwards,
|
||||
similar to the FSE bitstreams.
|
||||
Therefore, to find the start of the bitstream, it is therefore to
|
||||
Therefore, to find the start of the bitstream, it is required to
|
||||
know the offset of the last byte of the Huffman-coded stream.
|
||||
|
||||
After writing the last bit containing information, the compressor
|
||||
@ -1239,9 +1241,15 @@ Transformation from `Weight` to `Number_of_Bits` follows this formula :
|
||||
```
|
||||
Number_of_Bits = Weight ? (Max_Number_of_Bits + 1 - Weight) : 0
|
||||
```
|
||||
The last symbol's `Weight` is deduced from previously decoded ones,
|
||||
by completing to the nearest power of 2.
|
||||
This power of 2 gives `Max_Number_of_Bits`, the depth of the current tree.
|
||||
When a literal value is not present, it receives a `Weight` of 0.
|
||||
The least frequent symbol receives a `Weight` of 1.
|
||||
Consequently, the `Weight` 1 is necessarily present.
|
||||
The most frequent symbol receives a `Weight` anywhere between 1 and 11 (max).
|
||||
The last symbol's `Weight` is deduced from previously retrieved Weights,
|
||||
by completing to the nearest power of 2. It's necessarily non 0.
|
||||
If it's not possible to reach a clean power of 2 with a single `Weight` value,
|
||||
the Huffman Tree Description is considered invalid.
|
||||
This final power of 2 gives `Max_Number_of_Bits`, the depth of the current tree.
|
||||
`Max_Number_of_Bits` must be <= 11,
|
||||
otherwise the representation is considered corrupted.
|
||||
|
||||
@ -1254,7 +1262,7 @@ Let's presume the following Huffman tree must be described :
|
||||
|
||||
The tree depth is 4, since its longest elements uses 4 bits
|
||||
(longest elements are the one with smallest frequency).
|
||||
Value `5` will not be listed, as it can be determined from values for 0-4,
|
||||
Literal value `5` will not be listed, as it can be determined from previous values 0-4,
|
||||
nor will values above `5` as they are all 0.
|
||||
Values from `0` to `4` will be listed using `Weight` instead of `Number_of_Bits`.
|
||||
Weight formula is :
|
||||
@ -1274,7 +1282,7 @@ The `Weight` of `5` can be determined by advancing to the next power of 2.
|
||||
The sum of `2^(Weight-1)` (excluding 0's) is :
|
||||
`8 + 4 + 2 + 0 + 1 = 15`.
|
||||
Nearest larger power of 2 value is 16.
|
||||
Therefore, `Max_Number_of_Bits = 4` and `Weight[5] = 16-15 = 1`.
|
||||
Therefore, `Max_Number_of_Bits = 4` and `Weight[5] = log_2(16 - 15) + 1 = 1`.
|
||||
|
||||
#### Huffman Tree header
|
||||
|
||||
@ -1683,6 +1691,7 @@ or at least provide a meaningful error code explaining for which reason it canno
|
||||
|
||||
Version changes
|
||||
---------------
|
||||
- 0.3.8 : clarifications for Huffman Blocks and Huffman Tree descriptions.
|
||||
- 0.3.7 : clarifications for Repeat_Offsets, matching RFC8878
|
||||
- 0.3.6 : clarifications for Dictionary_ID
|
||||
- 0.3.5 : clarifications for Block_Maximum_Size
|
||||
|
||||
Loading…
x
Reference in New Issue
Block a user