Reverse Engineering: [Encoding] A qualitative overview of x264's ratecontrol methods

A qualitative overview of x264's ratecontrol methods
By Loren Merritt

Historical note:
This document is outdated, but a significant part of it is still accurate. Here are some important ways ratecontrol has changed since the authoring of this document:
- By default, MB-tree is used instead of qcomp for weighting frame quality based on complexity. MB-tree is effectively a generalization of qcomp to the macroblock level. MB-tree also replaces the constant offsets for B-frame quantizers. The legacy algorithm is still available for low-latency applications.
- Adaptive quantization is now used to distribute quality among each frame; frames are no longer constant quantizer, even if MB-tree is off.
- VBV runs per-row rather than per-frame to improve accuracy.

x264's ratecontrol is based on libavcodec's, and is mostly empirical. But I can retroactively propose the following theoretical points which underlie most of the algorithms:

- You want the movie to be somewhere approaching constant quality. However, constant quality does not mean constant PSNR nor constant QP. Details are less noticeable in high-complexity or high-motion scenes, so you can get away with somewhat higher QP for the same perceived quality.
- On the other hand, you get more quality per bit if you spend those bits in scenes where motion compensation works well: A given artifact may stick around several seconds in a low-motion scene, and you only have to fix it in one frame to improve the quality of the whole scene.
- Both of the above are correlated with the number of bits it takes to encode a frame at a given QP.
- Given one encoding of a frame, we can predict the number of bits needed to encode it at a different QP. This prediction gets less accurate if the QPs are far apart.
- The importance of a frame depends on the number of other frames that are predicted from it. Hence I-frames get reduced QP depending on the number and complexity of following inter-frames, disposable B-frames get higher QP than P-frames, and referenced B-frames are between P-frames and disposable B-frames.

The modes:

2pass:
Given some data about each frame of a 1st pass (e.g. generated by 1pass ABR, below), we try to choose QPs to maximize quality while matching a specified total size. This is separated into 3 parts:
(1) Before starting the 2nd pass, select the relative number of bits to allocate between frames. This pays no attention to the total size of the encode. The default formula, empirically selected to balance between the 1st 2 theoretical points, is "complexity ** 0.6", where complexity is defined to be the bit size of the frame at a constant QP (estimated from the 1st pass).
(2) Scale the results of (1) to fill the requested total size. Optional: Impose VBV limitations. Due to nonlinearities in the frame size predictor and in VBV, this is an iterative process.
(3) Now start encoding. After each frame, update future QPs to compensate for mispredictions in size. If the 2nd pass is consistently off from the predicted size (usually because we use slower compression options than the 1st pass), then we multiply all future frames' qscales by the reciprocal of the error. Additionally, there is a short-term compensation to prevent us from deviating too far from the desired size near the beginning (when we don't have much data for the global compensation) and near the end (when global doesn't have time to react).

1pass, average bitrate:
The goal is the same as in 2pass, but here we don't have the benefit of a previous encode, so all ratecontrol must be done during the encode.
(1) This is the same as in 2pass, except that instead of estimating complexity from a previous encode, we run a fast motion estimation algo over a half-resolution version of the frame, and use the SATD residuals (these are also used in the decision between P- and B-frames). Also, we don't know the size or complexity of the following GOP, so I-frame bonus is based on the past.
(2) We don't know the complexities of future frames, so we can only scale based on the past. The scaling factor is chosen to be the one that would have resulted in the desired bitrate if it had been applied to all frames so far.
(3) Overflow compensation is the same as in 2pass. By tuning the strength of compensation, you can get anywhere from near the quality of 2pass (but unpredictable size, like +- 10%) to reasonably strict filesize but lower quality.

1pass, constant bitrate (VBV compliant):
(1) Same as ABR.
(2) Scaling factor is based on a local average (dependent on VBV buffer size) instead of all past frames.
(3) Overflow compensation is stricter, and has an additional term to hard limit the QPs if the VBV is near empty. Note that no hard limit is done for a full VBV, so CBR may use somewhat less than the requested bitrate. Note also that if a frame violates VBV constraints despite the best efforts of prediction, it is not re-encoded.

1pass, constant ratefactor:
(1) Same as ABR.
(2) The scaling factor is a constant based on the --crf argument.
(3) No overflow compensation is done.

constant quantizer:
QPs are simply based on frame type.

http://git.videolan.org/?p=x264.git;a=blob_plain;f=doc/ratecontrol.txt;hb=HEAD

Reverse Engineering

Trao đổi với tôi

11/11/13

[Encoding] A qualitative overview of x264's ratecontrol methods