Once audio signals have been digitally encoded, one of the most difficult problems is the mass storage and transmission of the resulting data. Digital audio compression is therefore a key link in the digital television broadcasting system: compression efficiency and compression quality directly affect the transmission efficiency of digital TV broadcasting and the quality of the delivered audio and video. This paper analyzes digital audio compression technology.
Compared with analog signals, digital signals have obvious advantages, but they also have a corresponding shortcoming: greater demands on storage capacity and on the capacity of the transmission channel. Audio compression technology refers to applying suitable digital signal processing to the original digital audio stream (PCM code) so that, on the condition that no useful information is lost (or the loss is negligible), the bit rate is reduced. This is also called compression coding, and it must have a corresponding inverse transform, called decompression or decoding. Generally speaking, audio compression techniques fall into two categories: lossless data compression and lossy data compression.
Lossless data compression
With a lossless compression scheme, the original data can be restored bit for bit after decompression. Such schemes eliminate the statistical redundancy in the audio signal by predicting each value from past samples. Only a small compression ratio can be achieved, at best about 2:1, depending on the complexity of the original audio signal. Lossless compression is made feasible by time-domain predictive coding techniques, namely:
1. Difference algorithm
Audio signals contain repetitive sounds and a large amount of redundant, perceptually irrelevant content. Repeated data is removed during encoding and reintroduced during decoding. The audio signal is first decomposed into several subbands containing discrete tones, and DPCM is then applied with a predictor suited to short-term periodic signals. The coding is adaptive: it monitors the input signal energy and modifies the quantization step size accordingly, which leads to the so-called adaptive DPCM (ADPCM).
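The adaptive difference idea can be sketched as follows. This is a minimal illustration, not any broadcast standard's ADPCM: the predictor is a trivial previous-sample predictor, and the adaptation rule (grow the step on large residuals, shrink it otherwise) is a hypothetical choice.

```python
def adpcm_encode(samples, step=1.0, min_step=0.5, max_step=64.0):
    """Sketch of adaptive DPCM: quantize the prediction error with a
    step size that adapts to the residual magnitude (illustrative rule)."""
    prediction = 0.0
    codes = []
    for x in samples:
        error = x - prediction            # difference from the predicted value
        code = int(round(error / step))   # quantize the residual
        codes.append(code)
        prediction += code * step         # reconstruct exactly as the decoder will
        # adapt: grow the step after large residuals, shrink it after small ones
        step = min(max_step, step * 1.5) if abs(code) > 2 else max(min_step, step * 0.9)
    return codes

def adpcm_decode(codes, step=1.0, min_step=0.5, max_step=64.0):
    """Mirror of the encoder: the same adaptation rule is driven by the codes."""
    prediction = 0.0
    out = []
    for code in codes:
        prediction += code * step
        out.append(prediction)
        step = min(max_step, step * 1.5) if abs(code) > 2 else max(min_step, step * 0.9)
    return out
```

Because the encoder predicts from its own reconstructed values, quantization errors do not accumulate: the decoder tracks the input to within about half a quantization step.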
2. Entropy encoder
The redundancy in the quantized subband representation is exploited to improve the efficiency of entropy coding. The coefficients are transmitted in order of increasing frequency, producing large values at low frequencies and long runs of near-zero values at high frequencies. Variable-length codes (VLC) are drawn from different Huffman tables that best match the statistics of the low-frequency and high-frequency regions.
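The VLC idea can be illustrated by deriving Huffman code lengths from coefficient statistics. This toy builder is not any standard's table; it only shows that the frequent near-zero values typical of high frequencies receive the shortest codes.

```python
import heapq
from collections import Counter

def huffman_code_lengths(symbols):
    """Derive per-symbol Huffman code lengths from symbol frequencies.
    Each heap entry is (count, unique_id, {symbol: depth})."""
    freq = Counter(symbols)
    if len(freq) == 1:                       # degenerate one-symbol alphabet
        return {next(iter(freq)): 1}
    heap = [(count, uid, {sym: 0}) for uid, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    uid = len(heap)
    while len(heap) > 1:
        n1, _, t1 = heapq.heappop(heap)      # merge the two rarest subtrees;
        n2, _, t2 = heapq.heappop(heap)      # every contained symbol gets 1 bit deeper
        merged = {s: d + 1 for s, d in {**t1, **t2}.items()}
        heapq.heappush(heap, (n1 + n2, uid, merged))
        uid += 1
    return heap[0][2]
```

For a run of quantized coefficients dominated by zeros, the zero symbol ends up with a one- or two-bit code while rare large values get long codes, so the average rate drops below a fixed-length encoding.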
3. Floating-point system
The binary values from the A/D conversion process are grouped into data blocks, either in the time domain (adjacent samples taken at the A/D converter output) or in the frequency domain (adjacent frequency coefficients taken at the FDCT output). The binary values within each block are then scaled up proportionally so that the maximum value sits just below full scale. The scale factor is called the exponent, and it is common to all values in the block.
Each value can thus be represented by a mantissa (a sample value) and a shared exponent. The bit-allocation calculation is derived from a model of the human auditory system (HAS), and data-rate compression is achieved by sending the exponent only once per block. Coding performance is good, but the noise is signal-dependent; masking helps keep this noise inaudible.
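The shared-exponent scheme can be sketched with NumPy. The 8-bit mantissa and power-of-two exponent below are illustrative assumptions, not the parameters of any particular standard.

```python
import numpy as np

def block_float_encode(block, mantissa_bits=8):
    """Block floating point: one shared exponent per block scales all
    samples so the largest magnitude sits just below full scale."""
    peak = np.max(np.abs(block))
    if peak == 0:
        return np.zeros(len(block), dtype=int), 0
    exponent = int(np.ceil(np.log2(peak)))   # power of two covering the peak
    scale = 2.0 ** exponent
    full = 2 ** (mantissa_bits - 1) - 1      # e.g. 127 for 8-bit signed mantissas
    mantissas = np.round(block / scale * full).astype(int)
    return mantissas, exponent

def block_float_decode(mantissas, exponent, mantissa_bits=8):
    full = 2 ** (mantissa_bits - 1) - 1
    return mantissas / full * (2.0 ** exponent)
```

Quiet blocks get a small exponent, so the quantization step shrinks with the signal; this is why the noise floor tracks the content, as the text notes.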
Lossy data compression
Lossy data compression is achieved by combining two or more processing techniques to exploit the HAS's limited ability to detect a spectral component in the presence of other, higher-amplitude components. In this way, high-performance compression schemes with much higher compression ratios, from 2:1 up to 20:1, can be obtained, depending on the complexity of the encoding/decoding process and the audio quality requirements.
Lossy data compression systems use perceptual coding. The basic principle is to discard every signal component that falls below the masking threshold curve, thereby eliminating the perceptual redundancy in the audio signal. Such systems are therefore also described as perceptually lossless. Perceptually lossless compression is made feasible by the combination of several techniques, such as:
1. Time- and frequency-domain masking of signal components
2. Masking of quantization noise by each audible tone
By assigning enough bits, the quantization noise level is kept below the masking curve at all times. At frequencies close to audible signal components, an SNR of 20 to 30 dB is acceptable.
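The bit-assignment idea can be made concrete with the common rule of thumb that each quantizer bit lowers the noise floor by about 6.02 dB. The per-band function below is a hypothetical sketch of such an allocator, not any standard's algorithm.

```python
import math

def allocate_bits(signal_db, mask_db):
    """Hypothetical per-band rule: quantization noise must stay below the
    masking threshold, and each bit buys roughly 6.02 dB of SNR."""
    smr = signal_db - mask_db              # signal-to-mask ratio for this band
    return max(0, math.ceil(smr / 6.02))   # bands below the mask get zero bits
```

A band whose energy is 20 dB above its masking threshold needs 4 bits, while a fully masked band can be dropped entirely, which is where the large lossy savings come from.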
3. Joint coding
This technique exploits the redundancy in multichannel audio systems, where a large amount of identical data appears in all channels. Compression is obtained by encoding this common data only once and signaling to the decoder that it must be duplicated into the other channels.
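One widely used form of joint coding is mid/side stereo. The source does not name a specific scheme, so treat the following as an illustrative sketch: content common to both channels travels once in the mid signal, and only the inter-channel difference goes into the side signal.

```python
import numpy as np

def mid_side_encode(left, right):
    """Mid carries the common content; side carries only the difference."""
    mid = (left + right) / 2.0
    side = (left - right) / 2.0
    return mid, side

def mid_side_decode(mid, side):
    """Exact inverse: recover the original left/right channels."""
    return mid + side, mid - side
```

When the two channels are highly correlated (the common case), the side signal is near zero and compresses to almost nothing, while the transform remains perfectly invertible.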
Implementation of the audio coding process
The most important masking effects occur in the frequency domain. To exploit this property, the spectrum of the audio signal is decomposed into several subbands whose time and frequency resolution is matched to the critical bands of the HAS.
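The subband decomposition can be illustrated with a toy FFT-based splitter. Real coders use proper analysis filter banks; this sketch only shows the idea that the spectrum is partitioned into bands whose resynthesized signals sum back to the input.

```python
import numpy as np

def split_subbands(signal, n_bands=4):
    """Toy equal-width analysis: slice the FFT spectrum into n_bands
    pieces and resynthesize each piece as a time-domain subband signal."""
    spectrum = np.fft.rfft(signal)
    edges = np.linspace(0, len(spectrum), n_bands + 1).astype(int)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        piece = np.zeros_like(spectrum)
        piece[lo:hi] = spectrum[lo:hi]       # keep only this band's bins
        bands.append(np.fft.irfft(piece, n=len(signal)))
    return bands
```

Because the slices partition the spectrum exactly, the subband signals sum to the original, and each band can then be quantized with its own bit allocation.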
The perceptual encoder is composed of the following parts:
1. Multiband filter
Usually referred to as a filter bank; its function is to decompose the spectrum into subbands.
2. Bit allocator
It estimates the masking threshold and allocates bits based on the spectral energy of the audio signal and a psychoacoustic model.
3. Scaling and quantization processor
4. Data multiplexer
It receives the quantized data and adds the side information (bit allocation and scale factors) needed by the decoding process.
3.1 Filter banks (there are three types of filter banks).
(1) Subband group. The signal spectrum is divided into subbands of equal width. This is analogous to the frequency analysis performed by the HAS, which divides the audio spectrum into critical bands. The width of a critical band is variable: below 500 Hz it is about 100 Hz, while at 10 kHz,