INTERNATIONAL TELECOMMUNICATION UNION
ITU-T
TELECOMMUNICATION STANDARDIZATION SECTOR OF ITU
(03/96)

[...] the speech is reconstructed by filtering the excitation through the LP synthesis filter; the reconstructed speech signal is passed through a post-processing stage, which includes
an adaptive postfilter based on the long-term and short-term synthesis filters, followed by a high-pass filter and scaling operation.

Recommendation G.729 (03/96)

FIGURE 3/G.729
Principle of the CS-ACELP decoder
[Block diagram: fixed codebook and adaptive codebook, gains G_C and G_P, short-term filter, post-processing.]

2.3 Delay

This coder encodes speech and other audio signals with 10 ms frames. In addition, there is a look-ahead of 5 ms, resulting in a total algorithmic delay of 15 ms. All additional delays in a practical implementation of this coder are due to:
- processing time needed for encoding and decoding operations;
- transmission time on the communication link;
- multiplexing delay when combining audio data with other data.

2.4 Speech coder description

The description of the speech coding algorithm of this Recommendation is made in terms of bit-exact, fixed-point mathematical operations. The ANSI
C code indicated in clause 5, which constitutes an integral part of this Recommendation, reflects this bit-exact, fixed-point descriptive approach. The mathematical descriptions of the encoder (clause 3) and decoder (clause 4) can be implemented in several other fashions, possibly leading to a codec
implementation not complying with this Recommendation. Therefore, the algorithm description of the ANSI C code of clause 5 shall take precedence over the mathematical descriptions of clauses 3 and 4 whenever discrepancies are found. A non-exhaustive set of test signals, which can be used with the ANSI C code, is available from the ITU.

2.5 Notational conventions

Throughout this Recommendation, an attempt is made to maintain the following notational conventions:
- Codebooks are denoted by calligraphic characters (e.g. $\mathcal{C}$).
- Time signals are denoted by their symbol and a sample index between parentheses, e.g. $s(n)$.
  The symbol $n$ is used as sample index.
- Superscript indices between parentheses (e.g. $g^{(m)}$) are used to indicate time-dependency of variables. The variable $m$ refers, depending on the context, to either a frame or subframe index, and the variable $n$ to a sample index.
- Recursion indices are identified by a superscript between square brackets (e.g. $E^{[k]}$).
- Subscript indices identify a particular element in a coefficient array.
- The symbol ^ identifies a quantized version of a parameter (e.g. $\hat{g}_c$).
- Parameter ranges are given between square brackets, and include the boundaries (e.g. [0.6, 0.9]).
- The function log denotes a logarithm with base 10.
- The function int denotes truncation to its integer value.
- The decimal floating-point numbers used are rounded versions of the values used in the 16 bit fixed-point ANSI C implementation.

Table 2 lists the most relevant symbols used throughout this Recommendation. A glossary of the most relevant signals is given in Table 3. Table 4 summarizes relevant variables and their dimension. Constant parameters are listed in Table 5. The acronyms used in this Recommendation are summarized in Table 6.

TABLE 2/G.729
Glossary of most relevant symbols

Name              Reference        Description
$1/\hat{A}(z)$    Equation (2)     LP synthesis filter
$H_{h1}(z)$       Equation (1)     Input high-pass filter
$H_p(z)$          Equation (78)    Long-term postfilter
$H_f(z)$          Equation (84)    Short-term postfilter
$H_t(z)$          Equation (86)    Tilt-compensation filter
$H_{h2}(z)$       Equation (91)    Output high-pass filter
$P(z)$            Equation (46)    Pre-filter for fixed codebook
$W(z)$            Equation (27)    Weighting filter

TABLE 3/G.729
Glossary of most relevant signals

Name              Reference   Description
$c(n)$            3.8         Fixed-codebook contribution
$d(n)$            3.8.1       Correlation between target signal and $h(n)$
$ew(n)$           3.10        Error signal
$h(n)$            3.5         Impulse response of weighting and synthesis filters
$r(n)$            3.6         Residual signal
$s(n)$            3.1         Pre-processed speech signal
$\hat{s}(n)$      4.1.6       Reconstructed speech signal
$s'(n)$           3.2.1       Windowed speech signal
$sf(n)$           4.2         Postfiltered output
$sf'(n)$          4.2         Gain-scaled postfiltered output
$sw(n)$           3.6         Weighted speech signal
$x(n)$            3.6         Target signal
$x'(n)$           3.8.1       Second target signal
$u(n)$            3.10        Excitation to LP synthesis filter
$v(n)$            3.7.1       Adaptive-codebook contribution
$y(n)$            3.7.3       Convolution $v(n) * h(n)$
$z(n)$            3.9         Convolution $c(n) * h(n)$

TABLE 4/G.729
Glossary of most relevant variables

Name              Size   Description
$g_p$             1      Adaptive-codebook gain
$g_c$             1      Fixed-codebook gain
$g_l$             1      Gain term for long-term postfilter
$g_f$             1      Gain term for short-term postfilter
$g_t$             1      Gain term for tilt postfilter
$G$               1      Gain for gain normalization
$T_{op}$          1      Open-loop pitch delay
$a_i$             11     LP coefficients ($a_0 = 1.0$)
$k_i$             10     Reflection coefficients
$k'_1$            1      Reflection coefficient for tilt postfilter
$o_i$             2      LAR coefficients
$\omega_i$        10     LSF normalized frequencies
$\hat{p}_{i,j}$   40     MA predictor for LSF quantization
$q_i$             10     LSP coefficients
$r(k)$            11     Auto-correlation coefficients
$r'(k)$           11     Modified auto-correlation coefficients
$w_i$             10     LSP weighting coefficients
$\hat{l}_i$       10     LSP quantizer output

TABLE 5/G.729
Glossary of most relevant constants

Name              Value              Description
$f_s$             8000               Sampling frequency (Hz)
$f_0$             60                 Bandwidth expansion (Hz)
$\gamma_1$        0.94/0.98          Weight factor perceptual weighting filter
$\gamma_2$        0.60/[0.4, 0.7]    Weight factor perceptual weighting filter
$\gamma_n$        0.55               Weight factor postfilter
$\gamma_d$        0.70               Weight factor postfilter
$\gamma_p$        0.50               Weight factor pitch postfilter
$\gamma_t$        0.90/0.2           Weight factor tilt postfilter
$\mathcal{C}$     Table 7            Fixed (algebraic) codebook
$\mathcal{L}0$    3.2.4              Moving-average predictor codebook
$\mathcal{L}1$    3.2.4              First stage LSP codebook
$\mathcal{L}2$    3.2.4              Second stage LSP codebook (low part)
$\mathcal{L}3$    3.2.4              Second stage LSP codebook (high part)
$\mathcal{GA}$    3.9                Gain codebook (first stage)
$\mathcal{GB}$    3.9                Gain codebook (second stage)
$w_{lag}$         Equation (6)       Correlation lag window
$w_{lp}$          Equation (3)       LP analysis window

TABLE 6/G.729
Glossary of acronyms

3 Functional description of the encoder

In this clause the different functions of the encoder represented in the blocks of Figure 2 are described. A detailed signal flow is shown in Figure 4.

3.1 Pre-processing

As stated in clause 2, the input to the speech encoder is assumed to be a 16 bit PCM signal. Two pre-processing functions are applied before the encoding process:
1) signal scaling; and
2) high-pass filtering.

The scaling consists of dividing the input by a factor 2 to reduce the possibility of overflows in the fixed-point implementation. The high-pass filter serves as a precaution against undesired low-frequency components. A second order pole/zero filter with a cut-off frequency of 140 Hz is used. Both the scaling and high-pass filtering are combined by dividing the coefficients at the numerator of this filter by 2. The resulting filter is given by:

$$H_{h1}(z) = \frac{0.46363718 - 0.92724705\,z^{-1} + 0.46363718\,z^{-2}}{1 - 1.9059465\,z^{-1} + 0.9114024\,z^{-2}} \tag{1}$$

The input signal filtered through $H_{h1}(z)$ is referred to as $s(n)$, and will be used in all subsequent coder operations.

3.2 Linear prediction analysis and quantization

The short-term analysis and synthesis filters are based on 10th order Linear Prediction (LP) filters. The LP synthesis filter is defined as:

$$\frac{1}{\hat{A}(z)} = \frac{1}{1 + \sum_{i=1}^{10} \hat{a}_i z^{-i}} \tag{2}$$

where $\hat{a}_i$, $i = 1,\ldots,10$, are the (quantized) Linear Prediction (LP) coefficients. Short-term prediction, or linear prediction analysis, is performed once per speech frame
using the autocorrelation method with a 30 ms asymmetric window. Every 80 samples (10 ms), the autocorrelation coefficients of windowed speech are computed and converted to the LP coefficients using the Levinson algorithm. Then the LP coefficients are transformed to the LSP domain for quantization and
interpolation purposes. The interpolated quantized and unquantized filters are converted back to the LP filter coefficients (to construct the synthesis and weighting filters for each subframe).

Acronym     Description
CELP        Code-Excited Linear-Prediction
CS-ACELP    Conjugate-Structure Algebraic-CELP
MA          Moving Average
MSB         Most Significant Bit
MSE         Mean-Squared Error
LAR         Log Area Ratio
LP          Linear Prediction
LSP         Line Spectral Pair
LSF         Line Spectral Frequency
VQ          Vector quantization

[Figure 4, encoder signal flow: diagram not recoverable from the extracted text. Surviving block labels: input samples, high pass, windowing and autocorrelation (3.2.1), Levinson-Durbin, A(z) to LSP, LSP quantization (3.2.4), interpolation, conjugate-structure VQ (3.9), pitch delay; outputs include the LSP index (L0, L1, L2, L3) and the pitch indices (P0, P1, P2), computed per frame and per subframe.]

[...] that is:

$$\begin{aligned} f'_1(i) &= f_1(i) + f_1(i-1), & i &= 1,\ldots,5 \\ f'_2(i) &= f_2(i) - f_2(i-1), & i &= 1,\ldots,5 \end{aligned} \tag{25}$$

Finally the LP coefficients are computed from $f'_1(i)$ and $f'_2(i)$ by:

$$a_i = \begin{cases} 0.5\,f'_1(i) + 0.5\,f'_2(i), & i = 1,\ldots,5 \\ 0.5\,f'_1(11-i) - 0.5\,f'_2(11-i), & i = 6,\ldots,10 \end{cases} \tag{26}$$

This is directly derived from the relation $A(z) = (F'_1(z) + F'_2(z))/2$, and because $F'_1(z)$ and $F'_2(z)$ are symmetric and antisymmetric polynomials, respectively.

3.3 Perceptual weighting

The perceptual weighting filter is based on the unquantized LP filter coefficients $a_i$, and is given by:

$$W(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)} = \frac{1 + \sum_{i=1}^{10} \gamma_1^i a_i z^{-i}}{1 + \sum_{i=1}^{10} \gamma_2^i a_i z^{-i}} \tag{27}$$

The values of $\gamma_1$ and $\gamma_2$ determine the frequency response of the filter $W(z)$. By proper adjustment of these variables it is possible to make the weighting more effective. This is done by making $\gamma_1$ and $\gamma_2$ a function of the spectral shape of the input signal. This adaptation is done once per 10 ms frame, but an interpolation procedure for each first subframe is used to smooth this adaptation process. The spectral shape is obtained from a 2nd order linear prediction filter, obtained as a by-product from the Levinson-Durbin recursion (3.2.2). The reflection coefficients $k_i$ are converted to Log Area Ratio (LAR) coefficients $o_i$ by:

$$o_i = \log \frac{(1.0 + k_i)}{(1.0 - k_i)}, \quad i = 1, 2 \tag{28}$$

The LAR coefficients corresponding to the current 10 ms frame are used for the second subframe. The LAR coefficients for the first subframe are obtained through linear interpolation with the LAR parameters from the previous frame. The interpolated LAR coefficients in each of the two subframes are given by:

$$\begin{aligned} \text{Subframe 1:}\quad o_i^{(1)} &= 0.5\,o_i^{(\text{previous})} + 0.5\,o_i^{(\text{current})}, & i &= 1, 2 \\ \text{Subframe 2:}\quad o_i^{(2)} &= o_i^{(\text{current})}, & i &= 1, 2 \end{aligned} \tag{29}$$

The spectral envelope is characterized as being either flat (flat = 1) or tilted (flat = 0). For each subframe this characterization is obtained by applying a threshold function to the LAR coefficients. To avoid rapid changes, a hysteresis is used by taking into account the value of flat in the previous subframe $m - 1$,

$$\text{flat}^{(m)} = \begin{cases} 0 & \text{if } o_1^{(m)}\ [\ldots]\ 0.65 \text{ and } \text{flat}^{(m-1)} = 1 \\ 1 & \text{if } o_1^{(m)}\ [\ldots]\ 1.52 \text{ or } o_2^{(m)}\ [\ldots] \end{cases}$$

[...]

    if t_max > 143 then
        t_max = 143
        t_min = t_max - 6
    end

For the second subframe, closed-loop pitch analysis is done around the pitch selected in the first subframe to find the optimal delay $T_2$. The search boundaries are between $t_{min} - \frac{2}{3}$ and $t_{max} + \frac{2}{3}$, where $t_{min}$ and $t_{max}$ are derived from $T_1$ as follows:

    t_min = int(T_1) - 5
    if t_min < 20 then t_min = 20
    t_max = t_min + 9
    if t_max > 143 then
        t_max = 143
        t_min = t_max - 9
    end

The closed-loop pitch search minimizes the mean-squared weighted error between the original and reconstructed speech. This is achieved by maximizing the term:

$$R(k) = \frac{\sum_{n=0}^{39} x(n)\,y_k(n)}{\sqrt{\sum_{n=0}^{39} y_k(n)\,y_k(n)}} \tag{37}$$

where $x(n)$ is the target signal and $y_k(n)$ is the past filtered excitation at delay $k$ (past excitation convolved with $h(n)$). Note that the search range is limited around a preselected value, which is the open-loop pitch $T_{op}$ for the first subframe, and $T_1$ for the second subframe.

The convolution $y_k(n)$ is computed for the delay $t_{min}$. For the other integer delays in the search range $k = t_{min} + 1,\ldots,t_{max}$, it is updated using the recursive relation:

$$y_k(n) = y_{k-1}(n-1) + u(-k)\,h(n), \quad n = 39,\ldots,0 \tag{38}$$

where $u(n)$, $n = -143,\ldots,39$, is the excitation
buffer, and $y_{k-1}(-1) = 0$. Note that in the search stage, the samples $u(n)$, $n = 0,\ldots,39$ are not known, and they are needed for pitch delays less than 40. To simplify the search, the LP residual is copied to $u(n)$ to make the relation in Equation (38) valid for all delays.

For the determination of $T_2$, and $T_1$
if the optimum integer closed-loop delay is less than 85, the fractions around the optimum integer delay have to be tested. The fractional pitch search is done by interpolating the normalized correlation in Equation (37) and searching for its maximum. The interpolation is done using a FIR filter $b_{12}$ based on a Hamming windowed sinc function with the sinc truncated at $\pm 11$ and padded with zeros at $\pm 12$ ($b_{12}(12) = 0$). The filter has its cut-off frequency ($-3$ dB) at 3600 Hz in the oversampled domain. The interpolated values of $R(k)$ for the fractions $-\frac{2}{3}$, $-\frac{1}{3}$, 0, $\frac{1}{3}$ and $\frac{2}{3}$ are obtained using the interpolation formula

$$R(k)_t = \sum_{i=0}^{3} R(k-i)\,b_{12}(t+3i) + \sum_{i=0}^{3} R(k+1+i)\,b_{12}(3-t+3i), \quad t = 0, 1, 2 \tag{39}$$

where $t = 0, 1, 2$ corresponds to the fractions 0, $\frac{1}{3}$ and $\frac{2}{3}$, respectively. Note that it is necessary to compute the correlation terms in Equation (37) using a range $t_{min} - 4$, $t_{max} + 4$, to allow for the proper interpolation.

3.7.1 Generation of the adaptive-codebook vector

Once the pitch delay has been determined, the adaptive-codebook vector $v(n)$ is computed by interpolating the past excitation signal $u(n)$ at the given integer delay $k$ and fraction $t$:

$$v(n) = \sum_{i=0}^{9} u(n-k+i)\,b_{30}(t+3i) + \sum_{i=0}^{9} u(n-k+1+i)\,b_{30}(3-t+3i), \quad n = 0,\ldots,39,\quad t = 0, 1, 2 \tag{40}$$

The interpolation filter $b_{30}$ is based on a Hamming windowed sinc function truncated at $\pm 29$ and padded with zeros at $\pm 30$ ($b_{30}(30) = 0$). The filter has a cut-off frequency ($-3$ dB) at 3600 Hz in the oversampled domain.

3.7.2 Codeword computation for adaptive-codebook delays

The pitch delay $T_1$ is encoded with 8 bits in the first subframe and the relative delay in the second subframe is encoded with 5 bits. A fractional delay $T$ is represented by its integer part int($T$), and a fractional part frac/3, frac $= -1, 0, 1$. The pitch index P1 is now encoded
as:

$$P1 = \begin{cases} 3\,(\text{int}(T_1) - 19) + \text{frac} - 1, & \text{if } T_1 = 19,\ldots,85,\ \text{frac} = -1, 0, 1 \\ (\text{int}(T_1) - 85) + 197, & \text{if } T_1 = 86,\ldots,143,\ \text{frac} = 0 \end{cases} \tag{41}$$

The value of the pitch delay $T_2$ is encoded relative to the value of $T_1$. Using the same interpretation as before, the fractional delay $T_2$, represented by its integer part int($T_2$) and a fractional
part frac/3, frac $= -1, 0, 1$, is encoded as:

$$P2 = 3\,(\text{int}(T_2) - t_{min}) + \text{frac} + 2 \tag{42}$$

where $t_{min}$ is derived from $T_1$ as in 3.7.

To make the coder more robust against random bit errors, a parity bit P0 is computed on the delay index P1 of the first subframe. The parity bit is generated through an XOR operation on
the six most significant bits of P1. At the decoder this parity bit is recomputed and if the recomputed value does not agree with the transmitted value, an error concealment procedure is applied.

3.7.3 Computation of the adaptive-codebook gain

Once the adaptive-codebook delay is determined, the adaptive-codebook gain $g_p$ is computed as:

$$g_p = \frac{\sum_{n=0}^{39} x(n)\,y(n)}{\sum_{n=0}^{39} y(n)\,y(n)}, \quad \text{bounded by } 0 \le g_p \le 1.2 \tag{43}$$

where $x(n)$ is the target signal and $y(n)$ is the filtered adaptive-codebook vector (zero-state response of $W(z)/\hat{A}(z)$ to $v(n)$). This vector is obtained by convolving $v(n)$ with $h(n)$:

$$y(n) = \sum_{i=0}^{n} v(i)\,h(n-i), \quad n = 0,\ldots,39 \tag{44}$$

3.8 Fixed codebook: Structure and search

The fixed code