Mirror of https://github.com/facebook/zstd.git, synced 2025-10-18 00:03:50 -04:00

updated spec

parent c5fb5b7fcd
commit c40ba718d7
```diff
@@ -409,7 +409,7 @@ To decode a compressed block, the following elements are necessary :
 
 ### Literals section
 
-Literals are compressed using huffman compression.
+Literals are compressed using Huffman prefix codes.
 During sequence phase, literals will be entangled with match copy operations.
 All literals are regrouped in the first part of the block.
 They can be decoded first, and then copied during sequence operations,
```
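The hunk above rewords the literals description as "Huffman prefix codes". To illustrate what decoding a prefix code involves, here is a generic sketch: a lookup table indexed by the next `maxBits` bits gives the symbol, and only that symbol's `nbBits` are consumed. This is not zstd's actual layout. The names `PrefixEntry`, `peek_bits` and `decode_prefix` are hypothetical, bits are read MSB-first from the start of the buffer (zstd reads its Huffman streams differently), and the decoding table is assumed to be already built.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoding-table entry: the symbol and the length of its code. */
typedef struct {
    uint8_t symbol;
    uint8_t nbBits;
} PrefixEntry;

/* Read n bits MSB-first starting at bitPos; bits past the end read as 0.
 * Simplified reader, for illustration only. */
static unsigned peek_bits(const uint8_t *src, size_t srcSize,
                          size_t bitPos, unsigned n)
{
    unsigned v = 0;
    for (unsigned i = 0; i < n; i++) {
        size_t b = bitPos + i;
        unsigned bit = (b / 8 < srcSize) ? (src[b / 8] >> (7 - b % 8)) & 1u : 0u;
        v = (v << 1) | bit;
    }
    return v;
}

/* Decode `count` symbols using a table of (1 << maxBits) entries.
 * Each lookup peeks maxBits bits but consumes only the matched code's nbBits.
 * Returns the number of bits consumed. */
size_t decode_prefix(const uint8_t *src, size_t srcSize,
                     const PrefixEntry *table, unsigned maxBits,
                     uint8_t *dst, size_t count)
{
    size_t bitPos = 0;
    for (size_t i = 0; i < count; i++) {
        PrefixEntry e = table[peek_bits(src, srcSize, bitPos, maxBits)];
        dst[i] = e.symbol;
        bitPos += e.nbBits;
    }
    return bitPos;
}
```

With codes `a = 0`, `b = 10`, `c = 11` and `maxBits = 2`, the table is `{a,1}, {a,1}, {b,2}, {c,2}` (every index starting with bit `0` maps to `a`), and the single byte `0x58` decodes to `abca` using 6 bits.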
```diff
@@ -718,6 +718,9 @@ The Sequences section starts by a header,
 followed by optional Probability tables for each symbol type,
 followed by the bitstream.
 
+| Header | (LitLengthTable) | (OffsetTable) | (MatchLengthTable) | bitStream |
+| ------ | ---------------- | ------------- | ------------------ | --------- |
+
 To decode the Sequence section, it's required to know its size.
 This size is deducted from `blockSize - literalSectionSize`.
 
```
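A minimal sketch of the size rule in the hunk above, assuming the Literals section size has already been read from its own header. `sequences_section` is a hypothetical helper name, and rejecting a Literals section larger than the block is an assumption about how a decoder might handle inconsistent sizes.

```c
#include <stddef.h>
#include <stdint.h>

/* Locate the Sequences section inside a compressed block, given the size of
 * the Literals section that precedes it.
 * Returns the Sequences section size, or (size_t)-1 if sizes are inconsistent. */
size_t sequences_section(const uint8_t *block, size_t blockSize,
                         size_t literalSectionSize,
                         const uint8_t **sequencesStart)
{
    if (literalSectionSize > blockSize)
        return (size_t)-1;                     /* corrupted block */
    *sequencesStart = block + literalSectionSize;
    return blockSize - literalSectionSize;     /* header + tables + bitstream */
}
```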
```diff
@@ -774,7 +777,7 @@ They define lengths from 0 to 131071 bytes.
 
 | Code | 0-15 |
 | ------ | ---- |
-| value | Code |
+| length | Code |
 | nbBits | 0 |
 
 
```
````diff
@@ -798,7 +801,7 @@ __Default distribution__
 When "compression mode" is "predef"",
 a pre-defined distribution is used for FSE compression.
 
-Here is its definition. It uses an accuracy of 6 bits (64 states).
+Below is its definition. It uses an accuracy of 6 bits (64 states).
 ```
 short literalLengths_defaultDistribution[36] =
 { 4, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1,
````
```diff
@@ -833,7 +836,7 @@ They define lengths from 3 to 131074 bytes.
 
 __Default distribution__
 
-When "compression mode" is defined as "default distribution",
+When "compression mode" is defined as "predef",
 a pre-defined distribution is used for FSE compression.
 
 Here is its definition. It uses an accuracy of 6 bits (64 states).
```
```diff
@@ -950,9 +953,11 @@ Probability is obtained from Value decoded by following formulae :
 
 It means value `0` becomes negative probability `-1`.
 `-1` is a special probability, which means `less than 1`.
-Its effect on distribution table is described in next paragraph.
+Its effect on distribution table is described in [next paragraph].
 For the purpose of calculating cumulated distribution, it counts as one.
 
+[next paragraph]:#fse-decoding--from-normalized-distribution-to-decoding-tables
+
 When a symbol has a probability of `zero`,
 it is followed by a 2-bits repeat flag.
 This repeat flag tells how many probabilities of zeroes follow the current one.
```
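The rules in the hunk above can be sketched as follows, assuming the formula referenced in the hunk header is `probability = value - 1` (consistent with value `0` becoming `-1`). `cumulate` is a hypothetical name. It ignores the 2-bit repeat flags, so the caller is assumed to pass exactly one decoded value per symbol.

```c
#include <stddef.h>

/* Convert decoded values to probabilities (probability = value - 1, so a
 * value of 0 becomes the special probability -1, "less than 1") and build
 * the cumulated distribution, in which -1 counts as one.
 * Returns the total, which for a valid distribution equals the table size
 * (1 << accuracyLog). Zero-probability repeat flags are not handled. */
int cumulate(const int *values, int *proba, int *cumul, size_t nbSymbols)
{
    int total = 0;
    for (size_t s = 0; s < nbSymbols; s++) {
        proba[s] = values[s] - 1;
        cumul[s] = total;                          /* first slot of symbol s */
        total += (proba[s] == -1) ? 1 : proba[s];  /* -1 counts as one */
    }
    return total;
}
```

For decoded values `{5, 1, 0, 3, 2}` the probabilities are `{4, 0, -1, 2, 1}`, the cumulated starts are `{0, 4, 4, 5, 7}`, and the total is 8, i.e. a table of `1 << 3` states.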
```diff
@@ -1040,7 +1045,7 @@ All sequences are stored in a single bitstream, read _backward_.
 It is therefore necessary to know the bitstream size,
 which is deducted from compressed block size.
 
-The bit of the stream is followed by a set-bit-flag.
+The last useful bit of the stream is followed by an end-bit-flag.
 Highest bit of last byte is this flag.
 It does not belong to the useful part of the bitstream.
 Therefore, last byte has 0-7 useful bits.
```
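A sketch of locating the end-bit-flag described above. It treats the highest *set* bit of the last byte as the flag, which is what lets the last byte carry anywhere from 0 to 7 useful bits. `useful_bits` is a hypothetical name, and rejecting a zero last byte is an assumption about error handling.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the number of useful bits in a backward-read bitstream, or -1 if
 * the stream is empty or its last byte is 0 (no end-bit-flag: corrupted).
 * The highest set bit of the last byte is the end-bit-flag; only the bits
 * below it in that byte (0 to 7 of them) are useful. */
long useful_bits(const uint8_t *stream, size_t size)
{
    if (size == 0 || stream[size - 1] == 0)
        return -1;
    unsigned last = stream[size - 1];
    int flagPos = 7;
    while (!(last & (1u << flagPos)))
        flagPos--;                       /* locate the end-bit-flag */
    return (long)(size - 1) * 8 + flagPos;
}
```

For example, a two-byte stream ending in `0x01` carries 8 useful bits, while the single byte `0x13` carries 4.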
```diff
@@ -1068,7 +1073,9 @@ Decoding starts by reading the nb of bits required to decode offset.
 It then does the same for match length,
 and then for literal length.
 
-Offset / matchLength / litLength define a sequence, which can be applied.
+Offset / matchLength / litLength define a sequence.
+It starts by inserting the number of literals defined by `litLength`,
+then continue by copying `matchLength` bytes from `currentPos - offset`.
 
 The next operation is to update states.
 Using rules pre-calculated in the decoding tables,
```
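The added lines describe how a sequence is applied: insert `litLength` literals, then copy `matchLength` bytes from `currentPos - offset`. A minimal sketch, assuming `offset` is already the actual distance in bytes (any repeat-offset handling is out of scope) and that the output buffer holds the whole decoded block; `execute_sequence` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Apply one sequence at output position `pos`:
 * 1. copy litLength bytes from *literals (and advance *literals),
 * 2. copy matchLength bytes from pos - offset, byte by byte, so that an
 *    overlapping copy (offset < matchLength) repeats earlier output.
 * Returns the new output position, or (size_t)-1 on invalid input. */
size_t execute_sequence(uint8_t *out, size_t outCap, size_t pos,
                        const uint8_t **literals, size_t litLength,
                        size_t offset, size_t matchLength)
{
    if (pos > outCap || litLength > outCap - pos)
        return (size_t)-1;
    memcpy(out + pos, *literals, litLength);
    *literals += litLength;
    pos += litLength;

    if (offset == 0 || offset > pos || matchLength > outCap - pos)
        return (size_t)-1;
    for (size_t i = 0; i < matchLength; i++, pos++)
        out[pos] = out[pos - offset];
    return pos;
}
```

The match copy is done byte by byte on purpose: when `offset < matchLength` the source overlaps the bytes being written, so literals `ab` followed by offset 2 and match length 6 expand to `abababab`. A single `memcpy` (undefined for overlapping regions) or `memmove` would not reproduce that pattern.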