Let’s take our good old standby, the Red Book standard of CD Audio, where we sample using 16-bit data at a 44.1kHz sample rate. This sample rate implies a series of specific timing events, and when we play back the audio waveform we need to be sure we recreate those 16-bit waveform values at those precise instants in time. All of the mathematical theories underpinning digital audio assume that this is exactly what happens. Imagine, though, that we make a small timing error when it comes to creating the output waveform. The stored data does not represent the value of the original waveform at that erroneous timing point. In other words – and this is a mission-critical observation – “the right data at the wrong time is the wrong data”.
The interesting question becomes, how big does the timing error have to be before it matters? The way to understand this is to work out the magnitude of the error such a timing slip produces, and compare that to the other limiting factors in the playback chain. The most important of these – certainly for 16-bit audio – is the bit depth. The question then becomes, how much time has to pass before the magnitude of the original waveform changes by more than the resolution limit of the 16th bit? That is the maximum timing error we could tolerate.
Timing errors do the most damage when the rate of change of the encoded signal is at its maximum. If it were an analog signal we would be talking about its ‘slew rate’, and we know, for example, that amplifier circuits sound compromised if they cannot keep up with the maximum slew rate demanded by the signal. The maximum encodable slew rate corresponds to the signal swinging from its maximum to its minimum value between two consecutive samples. A 16-bit signal making that swing traverses all 65,536 quantization steps within a single sample interval, so it crosses a single step – the resolution limit of the 16th bit – in as little as 1/65,536 of one sample interval, which for standard CD audio works out to be 346 picoseconds. In other words, if the timing in the DAC is out by more than this amount we have the potential for errors of some description in the output signal.
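If you want to verify that number for yourself, here is a quick back-of-the-envelope calculation in Python (the variable names are mine; the only inputs are the Red Book sample rate and bit depth):

```python
# Back-of-the-envelope check of the 346 ps figure quoted above.
# Inputs are just the Red Book parameters: 16-bit samples at 44.1kHz.

SAMPLE_RATE_HZ = 44_100
BIT_DEPTH = 16

sample_interval_s = 1.0 / SAMPLE_RATE_HZ      # ~22.7 microseconds between samples
quantization_steps = 2 ** BIT_DEPTH           # 65,536 steps across the full scale

# Worst case: the signal swings full scale between two consecutive samples,
# so it crosses one quantization step in this much time.
time_per_step_s = sample_interval_s / quantization_steps

print(f"Sample interval: {sample_interval_s * 1e6:.2f} us")
print(f"Time per step  : {time_per_step_s * 1e12:.0f} ps")   # prints ~346 ps
```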
Just how precise is 346 picoseconds? Well, if your wristwatch lost 346ps every second it would take over 90 years for it to lose a whole second. In 346ps light travels about 4 inches, and a speeding bullet will penetrate only a small fraction of one percent of the way into a single sheet of paper. It is a very small snippet of time.
These timing errors are referred to as ‘jitter’, and most of you will already have heard of it. What I have just described has been the standard view of jitter among audiophiles for the last 25 years. But there’s more to it than that.
This simplistic picture assumes that setting the output voltage of the DAC to the desired level is an instantaneous event that happens at the precise instant the clock triggers it. It also assumes that that output voltage is held at the exact correct level until the next clock trigger comes along and causes it to be set to the next required level. Neither is an accurate description of reality. The process of recognizing a trigger event occurs in an analog circuit, and in order to be capable of recognizing the trigger with a precision of 346ps it needs to have a bandwidth of close to 3GHz. That bandwidth lies deep in the RF spectrum, which means that the circuit will be susceptible to noise and desperately sensitive to external interference. Circuits with bandwidths in the GHz range and above become dramatically more difficult to design, and dramatically more expensive to construct, as you attempt to reduce their noise. Have you ever wondered why DACs with so-called ‘femtosecond clocks’ cost thousands – even tens of thousands – of dollars?
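If you are curious where a figure like 3GHz comes from, here is a rough estimate using two common rules of thumb – treating the required bandwidth as the reciprocal of the timing precision, or using the familiar 0.35/rise-time approximation. Neither is rigorous; both simply show that the answer lands in the GHz region:

```python
# Rule-of-thumb estimates of the analog bandwidth needed to resolve a 346 ps event.
# Neither is exact; both only illustrate the order of magnitude involved.

timing_precision_s = 346e-12

bandwidth_reciprocal_hz = 1.0 / timing_precision_s    # 1/t rule         -> ~2.9 GHz
bandwidth_risetime_hz = 0.35 / timing_precision_s     # 0.35/t_rise rule -> ~1.0 GHz

print(f"1/t estimate    : {bandwidth_reciprocal_hz / 1e9:.1f} GHz")
print(f"0.35/t estimate : {bandwidth_risetime_hz / 1e9:.1f} GHz")
```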
The clock circuit registers a clock edge when it detects that the voltage in the circuit has crossed from below to above a certain trigger level. But any noise present in a circuit means that the instantaneous signal level in that circuit is perpetually fluttering about. Sometimes that noise pushes the signal level over the detection threshold when it wasn’t supposed to cross it, and registers a false clock signal; at other times it drags the level below the threshold and prevents the correct registration of the clock pulse. The presence of RF noise in any part of the DAC circuit that interacts with the clocking function will cause this kind of jitter-like behaviour – even with a perfect, jitter-free clock signal.
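To make that mechanism concrete, here is a minimal simulation sketch of my own devising (it does not model any particular DAC, and every number in it is purely illustrative). A clean, perfectly repeatable clock edge with a finite rise time crosses a fixed threshold; broadband noise is added on top; and the detected crossing times acquire a spread that looks exactly like jitter, even though the underlying edge never moved:

```python
import numpy as np

rng = np.random.default_rng(0)

# A clean, perfectly repeatable clock edge: a linear ramp from 0 V to 1 V
# over a 1 ns rise time, sampled on a fine time grid.
rise_time_s = 1e-9
n_points = 2000
t = np.linspace(0.0, rise_time_s, n_points)
clean_edge = t / rise_time_s                # 0 V -> 1 V

threshold_v = 0.5                           # nominal crossing at 500 ps
noise_rms_v = 0.02                          # 20 mV RMS of broadband noise
                                            # (treated as independent per point)

n_trials = 5000
crossing_times = np.empty(n_trials)
for i in range(n_trials):
    noisy_edge = clean_edge + rng.normal(0.0, noise_rms_v, n_points)
    # The detected "clock" is the first point where the noisy edge
    # pokes above the threshold.
    crossing_times[i] = t[np.argmax(noisy_edge > threshold_v)]

# The clock edge itself never moved, yet its detected timing now has a spread.
print(f"Mean detected crossing : {crossing_times.mean() * 1e12:.0f} ps")
print(f"RMS spread (jitter)    : {crossing_times.std() * 1e12:.0f} ps")
```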
The actual D-to-A conversion requires the output voltage to be set to a target level as soon as possible after the clock pulse is detected, and held at that level until the next clock pulse is detected. Once again, that circuit needs a bandwidth close to 3GHz if it is to respond quickly and accurately enough to the clock pulse, and it is consequently highly susceptible to noise. Additionally, it is important to the operation of the DAC that the target output voltage is accurately maintained over the duration of the clock cycle. However, at one instant the noise will nudge that level up, and at the next it will drag it down. In effect, the noise changes the signal level actually delivered: the bitstream called for one value, but a slightly different value comes out thanks to the noise.
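And here is an equally rough sketch of this second mechanism, under the simplifying assumption that the downstream analog stage effectively averages the held output over each sample interval. Again, the full-scale voltage, noise level and other numbers are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

FULL_SCALE_V = 2.0                    # assume a 2 V full-scale output range
LSB_V = FULL_SCALE_V / 2 ** 16        # ~30.5 uV per 16-bit step

coded_level_v = 0.1234                # the level the bitstream asked for
noise_rms_v = 200e-6                  # 200 uV RMS of noise riding on the output
n_points = 64                         # fine-grained look across one hold interval

# Simulate a few consecutive hold intervals, all nominally at the same level.
for interval in range(5):
    noisy_hold = coded_level_v + rng.normal(0.0, noise_rms_v, n_points)
    # If the downstream analog stage effectively averages over the interval,
    # what it delivers is the mean of the noisy waveform, not the coded value.
    delivered_v = noisy_hold.mean()
    error_lsb = (delivered_v - coded_level_v) / LSB_V
    print(f"interval {interval}: level error = {error_lsb:+.2f} LSB")
```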
So the presence of RF noise in the DAC circuitry can cause the DAC to deliver an output which contains errors due to at least two separate mechanisms. And both of these mechanisms can be viewed as introducing the exact same sort of problems that our simplistic understanding of ‘jitter’ is held to describe. So even though ‘jitter’ does not strictly speaking occur as such in most modern high-end DAC designs, there is still room for them to be afflicted by sonic issues with ‘jitter’-like characteristics, ultimately caused by unwanted RF noise.