Simple ways to manage different clock frequencies of audio codecs
Audio processing is essential to many consumer electronic applications such as mobile phones, MP3 players and a host of other products. While size and power consumption are often critical SoC design requirements, the market also demands high-fidelity (Hi-Fi) audio. To meet this consumer demand, designers are now embedding audio codecs into their next-generation, advanced SoCs.
The audio codec creates the interface between the digital host processor and the audio transducers, such as microphones and speakers. It is also responsible for several routine audio functions, thereby alleviating the workload on the host processor.
The clocks required by the data converters on an audio codec depend on the audio material sampling rates as well as on the clocks available in the host application and SoC. The combinations are quite complex due to the multitude of audio sample rate options and available host clocks. To further complicate matters, in audio-video (A/V) applications the audio clocks also need to be synchronized with the video clocks required by the video data converters. Therefore, many designers are confronted with complex choices when deciding on trade-offs to minimize the system costs of clock generation and of interfacing a multitude of sample rates.
The digital filters play an important role in synchronizing the different clocks: they process the digital samples between the digital audio interface and the audio data converters, and can therefore perform sample rate conversions. This article reviews the functions of digital filters in audio codecs and provides several examples illustrating how they can interface to a range of sample rates and clock environments.
The audio codec is composed of two types of data converters: a digital-to-analog converter (DAC) for playback and an analog-to-digital converter (ADC) for recording.
On the digital side, there are multiple blocks. The most important are the digital audio filters that convert the data rate to the oversampled clocks of the data converters and remove the high-frequency noise outside the audio band. Also important is a clock management block, which makes sure that all multi-rate blocks are synchronized with each other and supports all the required sampling rate combinations.
Today, data converters in audio codecs operate at highly oversampled frequencies, which means that their conversion frequency is much higher than the audio band, often by a factor of 100 or more. For example, a Red Book CD player has an audio data rate of 44.1 kSamples per second (kS/s); with a typical oversampling rate of 128X, the DAC’s conversion rate is 5.6448 MSamples per second (MS/s).
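The oversampling arithmetic above is simply a multiplication, which the following sketch makes explicit (the numbers are those of the Red Book CD example in the text):

```python
# The converter's conversion rate is the audio sample rate times the
# oversampling ratio.

def conversion_rate(audio_rate_hz: int, oversampling: int) -> int:
    """Return the data converter's conversion rate in samples per second."""
    return audio_rate_hz * oversampling

cd_rate = 44_100   # Red Book CD audio rate, S/s
osr = 128          # typical DAC oversampling ratio
print(conversion_rate(cd_rate, osr))  # 5644800, i.e., 5.6448 MS/s
```

The same helper reproduces the 48-kS/s family: 48,000 S/s at 256X gives the 12.288-MHz converter clock used later in this article.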
Why are Digital Audio Filters Required on an Audio Codec?
The main reason filters are required on an audio codec is to remove the aliasing or imaging bands. These are replicas of the signal band around multiples of the audio sampling rate (FS) and are a result of the multi-rate operation. For example, an audio stream at 44.1 kS/s up-sampled to 5.6448 MS/s has spectrum replicas around 44.1 kHz, 88.2 kHz, 132.3 kHz and so on. This follows from the Nyquist sampling theorem, as illustrated in Figure 1.
Figure 1: Audio signal sampled at FS and its spectrum replicas at 2FS, 3FS, etc. (in orange)
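A short sketch of where these image bands land, following Figure 1: replicas of the audio band appear centered at integer multiples of the original sample rate FS, up to the Nyquist frequency of the up-sampled rate (the function name is ours, for illustration):

```python
# Centers of the spectral replicas created by up-sampling a signal
# originally sampled at fs_hz to a new rate new_rate_hz.

def image_centers(fs_hz: int, new_rate_hz: int) -> list:
    """Replica center frequencies (Hz) below the new Nyquist limit."""
    nyquist = new_rate_hz / 2
    centers = []
    k = 1
    while k * fs_hz < nyquist:
        centers.append(k * fs_hz)
        k += 1
    return centers

# 44.1 kS/s audio up-sampled to 5.6448 MS/s, the example in the text:
print(image_centers(44_100, 5_644_800)[:3])  # [44100, 88200, 132300]
```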
On a DAC, the image bands cause a stair-like waveform, as shown in Figure 2. The filter smooths the waveform and reduces the high-frequency energy. If this high-frequency energy were not removed, it would waste power and cause intermodulation distortion in the output drivers, making the loudspeakers generate audible noise.
Figure 2: The digital filter up-samples and smooths the signal waveform before it is applied to the DAC
On an ADC, the filter removes any out-of-band noise picked up at the input or generated within the ADC, as shown in Figure 3. If this noise is not removed, then when the signal is re-sampled at the standard audio rate, it folds down in-band due to aliasing and becomes audible.
Figure 3: On an ADC, any out-of-band noise (red signal in the left diagram) folds down into the signal band when the signal is re-sampled to the standard audio rate at the output (right diagram)
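The fold-down described above can be illustrated with basic aliasing arithmetic: a tone at frequency f, sampled at rate fs, appears at its distance from the nearest multiple of fs (the helper below is ours, for illustration):

```python
# Where an input tone lands after sampling: the apparent frequency is the
# distance from f to the nearest integer multiple of the sample rate.

def alias_frequency(f_hz: int, fs_hz: int) -> int:
    """Apparent frequency (Hz) of a tone at f_hz after sampling at fs_hz."""
    f_mod = f_hz % fs_hz
    return min(f_mod, fs_hz - f_mod)

# An out-of-band tone at 50 kHz, re-sampled at the standard 48-kS/s audio
# rate without filtering, folds down to an audible 2-kHz tone:
print(alias_frequency(50_000, 48_000))  # 2000
```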
Clocks and Sampling Rates
Digital audio signals are sampled at standard frequencies. As a legacy of the old Red Book CD standard, many audio recordings use the 44.1-kS/s rate. This unconventional number derives from the early practice of reusing video tape equipment for digital audio recordings (the rate fits the line structure of both PAL and NTSC recorders). Modern audio systems, like DVDs, use 48 kS/s and its multiples 96 kS/s and 192 kS/s.
Voice applications, such as those in cell phones, use 8 kS/s and its multiples, 16 kS/s and 32 kS/s. Some applications may also use multiples of 44.1 kS/s, namely 88.2 kS/s and 176.4 kS/s. Since the data converters must operate at oversampled frequencies, typically 128X or 256X, the required master clock frequencies to drive the data converters are in the range of 5 to 12 MHz.
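A quick check of the quoted master-clock range: taking the two Hi-Fi base rates and the typical oversampling ratios from the text, the required clocks indeed land between roughly 5 and 12 MHz:

```python
# Master clock = audio sample rate x oversampling ratio, for the two
# Hi-Fi base rates and typical oversampling ratios.

master_clocks = {
    (rate, osr): rate * osr
    for rate in (44_100, 48_000)   # Hi-Fi base rates, S/s
    for osr in (128, 256)          # typical oversampling ratios
}

for (rate, osr), mclk in sorted(master_clocks.items(), key=lambda kv: kv[1]):
    print(f"{rate} S/s x {osr}X -> {mclk / 1e6:.4f} MHz")
# The four results: 5.6448, 6.1440, 11.2896 and 12.2880 MHz
```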
An audio codec must therefore support a wide variety of audio sample rates and accommodate a range of master clock frequencies to operate in the application. This is not straightforward, due to the multitude of combinations and the restrictions on possible clock frequency ratios. For this reason, the digital filters must support programmable sample rate conversion.
For example, let’s consider a practical case with an audio rate of 48 kS/s and a converter sampling frequency of 12.288 MS/s. The resulting sample rate conversion is 256X. To support 96 kS/s, the filters are reconfigured for a sample rate conversion of 128X, and to support 192 kS/s, for a conversion of 64X. The sampling frequency of the data converters stays the same at 12.288 MS/s because the audio band limit is fixed at 20 kHz. For the 44.1-kS/s audio rate family, the corresponding master clock would be 11.2896 MHz.
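The reconfiguration described above is just a division: with the converter clock fixed, the filter's sample rate conversion ratio is the converter rate divided by the audio rate:

```python
# Fixed converter sampling rate for the 48-kS/s audio family.
MCLK = 12_288_000  # S/s

for audio_rate in (48_000, 96_000, 192_000):
    ratio = MCLK // audio_rate  # filter's sample rate conversion ratio
    print(f"{audio_rate} S/s -> {ratio}X conversion")
# 48000 -> 256X, 96000 -> 128X, 192000 -> 64X
```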
Solutions to Support Audio-Video Applications with Advanced Audio Codecs
By proper reconfiguration of the digital filters, sample rate conversion and flexible clock frequency choices, it is possible to support a wide range of audio-video applications with advanced audio codecs. There are several solutions for these applications that help designers understand the trade-offs needed to minimize the costs of their SoCs.
Phase-Locked Loop (PLL) for the Audio Clock
Many applications, such as portable products, cannot have a dedicated crystal oscillator for the audio codec due to space and/or cost limitations. The audio codec must then support the different audio rates from the available host master clock, which is often the USB clock operating at 12 MHz or a multiple of it. In this case, a phase-locked loop (PLL) can be used to generate the required audio clocks. But such a PLL is relatively complex, since it must provide very fine frequency resolution to support all the frequency combinations while delivering a low-jitter output clock for good performance. Solutions not requiring a PLL would therefore be preferable.
An alternative solution is the PLL-less technique of re-using the USB clock, avoiding a dedicated audio PLL. USB is a very popular interface, present in almost any application. Its 12-MHz or 24-MHz clocks have relatively low jitter, an important requirement for audio. A 12-MHz USB clock can support the 48-kS/s audio rate exactly, because it is an integer multiple (12,000 = 250 x 48). To use it, the filters’ sample rate conversion needs to be reconfigured from the nominal 256X to 250X.
The 44.1-kS/s audio rate, however, can only be approximated. Using a sample rate conversion of 272X, the audio rate becomes 44.1176 kS/s, slightly different from the nominal value. But the difference is quite small and hardly noticeable. In effect, it is just a 0.04% change in pitch, more than 100 times smaller than a semitone. Another way to appreciate the effect of the clock approximation is in the duration of a song: 0.04% faster playback means a 3-minute song completes about 72 milliseconds (ms) earlier.
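The PLL-less arithmetic above can be sketched as follows: pick the integer division of the 12-MHz USB clock closest to the target audio rate, then compute the actual rate and the resulting pitch error (the helper name is ours, for illustration):

```python
USB_CLK = 12_000_000  # Hz

def best_ratio(target_rate: int) -> int:
    """Integer sample rate conversion ratio closest to USB_CLK / target."""
    return round(USB_CLK / target_rate)

for target in (48_000, 44_100):
    n = best_ratio(target)
    actual = USB_CLK / n
    error_pct = (actual - target) / target * 100
    print(f"{target} S/s: {n}X -> {actual:.1f} S/s ({error_pct:+.3f}%)")
# 48 kS/s: 250X, exact; 44.1 kS/s: 272X -> 44117.6 S/s (+0.040%)

# A 0.04% pitch error over a 3-minute (180-second) song:
print(f"{180 * 0.0004 * 1000:.0f} ms")  # 72 ms
```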
A/V multimedia equipment produces data streams that are both video and audio. Examples are DVD players and MPEG media readers. The sampling rates are independent for video and audio. Video uses 27 MHz clocks or multiples. The audio clocks must be derived from 27 MHz and all standards based on 44.1 kS/s, 48 kS/s and 8 kS/s must be supported.
These audio clocks are best generated with a PLL that uses a divided-down version of the 27-MHz video clock as its reference. Because the audio and video data are synchronous, the PLL-less technique is not applicable: it would change the audio playback cadence relative to the video, causing the audio to drift out of alignment with the image and lose lip-sync.
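One way to derive the PLL division and multiplication factors is to reduce the ratio between the audio master clock and the 27-MHz video clock to lowest terms. The sketch below is illustrative arithmetic only; a real PLL design has additional constraints (reference frequency limits, jitter, loop bandwidth):

```python
from math import gcd

VIDEO_CLK = 27_000_000    # Hz, video reference clock
AUDIO_MCLK = 11_289_600   # Hz, 256 x 44.1 kS/s audio master clock

# Reduce AUDIO_MCLK / VIDEO_CLK to lowest terms: divide the reference by
# the denominator, multiply up by the numerator.
g = gcd(AUDIO_MCLK, VIDEO_CLK)
mult, div = AUDIO_MCLK // g, VIDEO_CLK // g
print(f"27 MHz / {div} x {mult} = {VIDEO_CLK // div * mult} Hz")
# divide by 1875 (a 14.4-kHz reference), multiply by 784
```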
Synchronization with the Source Clock
There are applications where the source of the data is remote from the equipment. Examples include TV broadcasts (cable, terrestrial or satellite), HDMI links and web streaming. In these cases, the audio and video clocks must be synchronized with the source clocks.
To illustrate this situation, let’s consider a digital TV transmission (as illustrated in Figure 4). The data transmitted by the TV station is synchronized to a reference 27-MHz clock at the transmitter site. Due to tolerances of the transmitter crystal oscillator and Doppler effects during propagation, the frequency received by the TV antenna at the receiver site may vary. To ensure synchronization of the data, a coded time stamp (CTS) is included in the transmitted MPEG data stream. The CTS allows the video and audio packets to be synchronized for accurate lip-sync during playback.
Since the received frequency varies over time, data may be arriving slower or faster, producing clock drifts relative to the receiver’s own 27-MHz clock. This variation in data flow will result in either too much or too little data arriving at the receiver to be processed. For video data, this drift is not significant and can be corrected by dropping or repeating frames. While the video is decoded, the receiver is constantly comparing the time stamp received from the transmitter with that of the decoded video.
When too much data accumulates in the decoder (i.e., the receiver’s 27-MHz clock is slower than the transmitter clock), a frame is dropped. When too little data is available for processing, a frame is repeated. Because frames are dropped or repeated only infrequently, the effect is not noticeable to the viewer.
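A back-of-the-envelope calculation shows why these corrections are so rare. Assuming a hypothetical 30-ppm offset between the transmitter and receiver crystals (a typical crystal tolerance, not a figure from this article) and 30 frames per second:

```python
# Assumed numbers for illustration only: 30-ppm relative clock error
# between transmitter and receiver, 30 frames per second video.
ppm_offset = 30e-6
frame_period = 1 / 30  # seconds per frame

# Timing error accumulates at ppm_offset seconds per second of playback,
# so a full frame of error builds up only every frame_period / ppm_offset
# seconds -- roughly once every 18.5 minutes here.
seconds_per_correction = frame_period / ppm_offset
print(f"one frame dropped/repeated every {seconds_per_correction / 60:.1f} minutes")
```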
Audio data must also be synchronized with the transmitter clock. The audio clocks must be derived from the same 27-MHz video clock, and all standards must be supported. But dropping or repeating samples is unacceptable in audio, because it would be highly perceptible to the ear. The solution is to use a PLL to generate the audio clock and use the time stamp information to lock the PLL to the received sampling data.
For more information on Synopsys’ DesignWare Audio IP solutions, please visit www.synopsys.com/audio.
About the author:
Carlos Azeredo-Leme has been a senior staff engineer for DesignWare Analog IP at Synopsys since 2009. Prior to joining Synopsys, he co-founded Chipidea Microelectronics in 1993, where he was a member of the Board of Directors and held the position of Chief Technical Officer, responsible for complete mixed-signal solutions, analog front-ends and RF. He worked in the areas of audio, power management, cellular and wireless communications and RF transceivers. Since 1994 he has also held a teaching position at the Technical University of Lisbon (UTL-IST) in Portugal. His research interests are in analog and mixed-signal design, focusing on low power and low voltage. Carlos holds an MSEE from the Technical University of Lisbon (UTL-IST) in Portugal and a Ph.D. from ETH Zurich in Switzerland. Carlos can be reached at firstname.lastname@example.org.