From The Audiophile's Guide: Understanding Digital Audio, Part Two

Written by Paul McGowan

PS Audio CEO Paul McGowan has launched The Audiophile’s Guide, a 10-book set of the knowledge he’s garnered over the years. It’s a comprehensive collection of practical information, from understanding room acoustics and speaker setup to getting the best from analog and digital audio.

Copper will be featuring excerpts from The Audiophile’s Guide in this and future issues. We continue with Part Two of a look into digital audio technology. Part One, which covered the history of early digital audio and some digital encoding fundamentals, appeared in Issue 215 and you can read it here. We continue with a look at the nature of analog versus digital audio.

 

Analog Versus Digital

In the world of audio, we often hear about analog and digital as if they’re two completely different beasts. But at their core, they’re just two ways of representing the same thing: sound.

Analog is the natural state of sound. When we speak, sing, or play an instrument, we create sound waves that travel through the air. These waves are smooth, continuous variations in air pressure. Our ears are beautifully designed to detect these pressure changes and turn them into electrical signals our brains can interpret as sound.

Digital, on the other hand, is a way of representing these smooth, continuous waves using numbers or symbols. It’s like taking a photograph of a beautiful landscape. The photo isn’t the landscape itself, but a representation of it that we can easily store, copy, and share.

In the audio world, we use digital techniques to capture and store sound in a way that’s convenient and flexible. But here’s the key point: we can’t listen to digital directly. Our ears and speakers work in the analog domain, so at some point we need to convert that digital representation back into analog form.

Think of it like a recipe. Analog is the actual meal, with all its flavors, textures, and aromas. Digital is the recipe written down – a set of instructions that, when followed correctly, can recreate the meal. But you can’t eat the recipe itself; you need to cook the meal to enjoy it.

The process of turning analog sound into digital information is called analog-to-digital conversion, and the device that does it is an analog-to-digital converter (ADC). We take measurements, or samples, of the analog signal many thousands of times per second and assign each sample a numerical value. Later, when we want to listen to the sound, we use a digital-to-analog converter (DAC) to turn those numbers back into a continuous analog signal that we can play through speakers or headphones.

Digital has some advantages. It’s easier to store, copy, and transmit without loss of quality. It’s also more resistant to certain types of distortion and noise. But it’s not magic – the quality of the digital representation depends on how accurately and frequently we take those measurements of the original analog signal.

Sound waves going from analog to digital and back to analog again. Courtesy of Wikimedia Commons/Teeks99.

 

Pulse Code Modulation

There are two main formats we use to represent analog sound in digital form: pulse code modulation (PCM) and pulse density modulation (PDM), also known as Direct Stream Digital (DSD). PCM is the more common of the two. At its heart, PCM is a way of taking snapshots of a sound wave at regular intervals and assigning each snapshot a numeric value. The goal is to capture enough information about the original sound wave that we can later reconstruct it with high fidelity.

Imagine you’re trying to describe the shape of a roller coaster to someone who’s never seen it. You might take measurements of its height every few feet along its length. With enough measurements, the person could plot these points and get a pretty good idea of the roller coaster’s shape. PCM does something similar with sound waves.

The history of PCM dates back further than you might expect. Its roots lie in the work of Alec Reeves, a British engineer who patented the concept in 1939. Reeves was working on improving telephone systems, and he realized that converting analog signals to digital form could help reduce noise and interference during transmission. However, the technology of the time wasn’t up to the task of implementing PCM for audio. It wasn’t until the 1960s and ’70s that advances in electronics made PCM practical for high-quality audio recording and playback. The CD, introduced in 1982, brought PCM audio to the mass market and ushered in the digital audio revolution.

Now, let’s dive into how PCM actually works. The process involves three key steps: sampling, quantization, and encoding.

Sampling is the process of measuring the amplitude (loudness) of the sound wave at regular intervals. The number of times we take these measurements each second is called the sample rate. For CD-quality audio, we sample 44,100 times per second, for the reasons we discussed earlier.

Quantization is the step where we assign a numeric value to each of these samples. The precision with which we can specify these values is determined by the bit depth. CD audio uses 16 bits, which allows for 65,536 different possible values for each sample. Higher bit depths, like 24-bit, allow for even more precise quantization.

Finally, encoding is the process of turning these quantized values into a stream of binary digits (bits) that can be stored or transmitted. In its simplest form, this might just mean writing out each sample’s value in binary.
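The three steps above can be sketched in a few lines of Python. This is only an illustration, not how any particular converter is built: the function name `pcm_encode`, the 1 kHz test tone, the 8 kHz sample rate, and the offset-binary code mapping are all assumptions chosen to keep the output small enough to read.

```python
import math

def pcm_encode(signal, sample_rate, n_samples, bit_depth):
    """Sample signal(t) (values in -1.0..1.0), quantize, and encode as binary strings."""
    max_code = 2 ** bit_depth - 1              # 4 bits -> codes 0..15
    words = []
    for n in range(n_samples):
        t = n / sample_rate                    # sampling: evenly spaced instants
        x = max(-1.0, min(1.0, signal(t)))     # keep the input inside full scale
        code = round((x + 1.0) / 2.0 * max_code)      # quantization: nearest level
        words.append(format(code, f"0{bit_depth}b"))  # encoding: write it in binary
    return words

def tone(t):
    """Hypothetical test signal: a 1 kHz sine wave at full scale."""
    return math.sin(2 * math.pi * 1000 * t)

# One cycle of the tone, sampled 8 times and stored as 4-bit words.
words = pcm_encode(tone, sample_rate=8000, n_samples=8, bit_depth=4)
print(words)
```

The positive peak of the sine lands on the highest code and the negative peak on the lowest; every other sample is rounded to whichever of the 16 levels is nearest.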

An example of sampling and quantization of a signal (red) for 4-bit linear PCM, sampled at regular intervals in the time domain. Courtesy of Wikimedia Commons/unlisted author.

 

When it’s time to play back the audio, we reverse the process. We decode the binary data back into numeric values, and then use a DAC to turn these discrete values back into a smooth, continuous analog signal.

One of the strengths of PCM is its simplicity and flexibility. By adjusting the sample rate and bit depth, we can trade off between audio quality and data size. This has allowed PCM to remain relevant from the early days of digital audio right up to today’s high-resolution formats. However, PCM isn’t perfect. The process of quantization introduces a small amount of error, which we hear as noise. We use techniques like dithering to mitigate this, but it’s an inherent limitation of the PCM approach.
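The trade-off between quality and data size is simple arithmetic: the number of levels doubles with every added bit, and the raw data rate is the sample rate times the bit depth times the number of channels. A minimal sketch follows; the function names are made up for illustration, and the 96 kHz/24-bit stereo figure is just an assumed example of a high-resolution format.

```python
def quantization_levels(bit_depth):
    """Number of distinct values a sample can take."""
    return 2 ** bit_depth

def pcm_bit_rate(sample_rate_hz, bit_depth, channels):
    """Raw (uncompressed) PCM data rate in bits per second."""
    return sample_rate_hz * bit_depth * channels

cd_levels = quantization_levels(16)          # 65,536
hires_levels = quantization_levels(24)       # 16,777,216

cd_rate = pcm_bit_rate(44_100, 16, 2)        # stereo CD: 1,411,200 bits/s
hires_rate = pcm_bit_rate(96_000, 24, 2)     # assumed hi-res example: 4,608,000 bits/s

# Megabytes needed for one minute of stereo CD audio
cd_mb_per_minute = cd_rate * 60 / 8 / 1_000_000
print(cd_levels, hires_levels, cd_rate, hires_rate, cd_mb_per_minute)
```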

Despite these challenges, PCM remains the backbone of digital audio. Its straightforward approach and proven track record have made it the go-to choice for everything from streaming services to professional recording studios. As we continue to push the boundaries of audio quality, PCM evolves with us, adapting to new demands while maintaining compatibility with decades of existing hardware and software.

 

Pulse Density Modulation

PDM/DSD takes the concept of binary representation to a whole new level of elegance and simplicity. While PCM gives us detailed snapshots of sound, PDM offers a continuous stream of single bits that dance to the rhythm of our music. It’s as if we’ve moved from painting by numbers to creating art with a single, ever-flowing brushstroke. In this sense, it is more analog-like.

The history of PDM dates back to the 1940s and 1950s, when engineers were exploring different ways to digitize analog signals. However, its application in high-quality audio began in the 1990s, when Sony and Philips developed PDM for their Super Audio CD (SACD) format, which they branded as “Direct Stream Digital.” They were looking for a way to capture and reproduce audio with even greater fidelity than CD-quality PCM. (To this day, all of PS Audio’s higher-end DACs are based on PDM, as are the recordings of our record label, Octave Records. We’re committed diehard PDM fans, because to our ears, PDM is remarkably better sounding than PCM – though the debate rages on within the hallowed halls of audio nerds like me.)

At its core, PDM uses a technique called delta-sigma modulation. Here’s a simplified explanation:

Imagine you’re trying to fill a bucket with water to a specific level. With PCM, you’d measure the water level at regular intervals and add or remove water as needed. With PDM, you’d have a very small cup that you’re constantly either adding to the bucket (1) or not adding to the bucket (0), millions of times per second. The rate at which you add water (the density of 1s in the bitstream) corresponds to the desired water level (the amplitude of the audio signal).

In a PDM system, an analog signal is fed into a delta-sigma modulator. This modulator compares the input signal to its current output level many times per second (typically 2.8224 million times per second for standard DSD64 [used for SACDs], or about 22.6 million times a second for DSD512 [not currently used in any commercial product]). If the input is higher than the current output, it outputs a 1. If it’s lower, it outputs a 0. This creates a stream of bits where the density of 1s corresponds to the amplitude of the original signal.
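For readers who like to see the idea in code, here is a minimal sketch of a first-order delta-sigma modulator. It is a textbook simplification, not the design of any real DSD converter: the name `delta_sigma_1bit` and the choice to treat a 1 as +1.0 and a 0 as -1.0 are assumptions made for illustration. The loop keeps a running total of the difference between the input and the last output, and emits a 1 whenever that total is at or above zero.

```python
def delta_sigma_1bit(samples):
    """First-order delta-sigma modulator. Input values in -1.0..1.0, output bits 0/1."""
    integrator = 0.0     # running sum of (input - previous output)
    feedback = 0.0       # previous output expressed as +1.0 or -1.0
    bits = []
    for x in samples:
        integrator += x - feedback
        bit = 1 if integrator >= 0 else 0
        feedback = 1.0 if bit else -1.0
        bits.append(bit)
    return bits

def ones_density(bits):
    """Fraction of 1s in the bitstream."""
    return sum(bits) / len(bits)

# Steady input levels: the density of 1s tracks the amplitude.
print(ones_density(delta_sigma_1bit([0.5] * 1000)))    # three quarters of the bits are 1
print(ones_density(delta_sigma_1bit([0.0] * 1000)))    # half the bits are 1
print(ones_density(delta_sigma_1bit([-0.5] * 1000)))   # one quarter of the bits are 1
```

An input halfway between silence and positive full scale produces 1s three quarters of the time, which is the "density corresponds to amplitude" relationship described above.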

PCM vs. DSD processing. Courtesy of Wikimedia Commons/Paweł Zdziarski.

 

To help visualize the difference between PCM and PDM, let’s use an analogy of two different types of trains transporting water. Imagine PCM as a train with large tanker cars. Each car can hold a precise amount of water, measured in 65,536 distinct levels (representing 16-bit audio). This train moves at a steady pace, let’s say 44.1 mph (representing 44.1 kHz sample rate). At each station (sample point), the train stops, and we fill the tanker car to one of those 65,536 levels, depending on how much water (audio amplitude) we need to transport at that moment.

Now, picture PDM as a very different kind of train. Instead of large tanker cars, this train has a long sequence of small cups. Each cup can only be either full (1) or empty (0). But here’s the key difference: this train moves much faster, about 2,822 mph (representing the 2.8224 MHz rate of standard DSD). As the train zooms by, we’re rapidly deciding for each cup: do we fill it or leave it empty?

The PCM train gives us a series of precise measurements at regular intervals. The PDM train, on the other hand, gives us a very fast stream of simple yes/no decisions. In the PCM train, the water level in each tanker car directly represents the audio amplitude at that moment. In the PDM train, it’s the proportion of full cups over a short time that represents the audio amplitude.
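The speeds of the two trains come straight from the numbers. DSD rates are named as multiples of the CD sample rate, so DSD64 runs at 64 times 44,100 Hz. The short sketch below works out that rate and compares the raw bits per second, per channel, of the two formats; the function names are hypothetical.

```python
CD_SAMPLE_RATE_HZ = 44_100

def dsd_rate_hz(multiple):
    """Bit rate of a 1-bit DSD stream named by its multiple of the CD sample rate."""
    return multiple * CD_SAMPLE_RATE_HZ

def pcm_bits_per_second(sample_rate_hz, bit_depth):
    """Raw PCM data rate for a single channel."""
    return sample_rate_hz * bit_depth

dsd64 = dsd_rate_hz(64)                               # 2,822,400 one-bit cups per second
cd = pcm_bits_per_second(CD_SAMPLE_RATE_HZ, 16)       # 705,600 bits per second
print(dsd64, cd, dsd64 / cd)                          # DSD64 carries 4x the raw bits of CD
```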

To extend this analogy further, think about how we’d reconstruct our water signal at the destination. With the PCM train, we’d need a complex system to precisely measure and output the water from each tanker car. With the PDM train, we could simply pour all the cups through a fine mesh (representing our low-pass filter). The amount of water coming through at any moment would naturally correspond to our original signal.

This analogy helps illustrate why PDM can be simpler to convert back to analog. The PDM signal is already a kind of analog signal, just at a very high rate. We’re not dealing with complex digital words that need to be carefully reconstructed, but a simple stream of ones and zeros that naturally average out to our desired signal.

Of course, both systems have their strengths and challenges. The PCM train allows for very precise measurements and easy editing, while the PDM train offers simplicity and potentially more natural conversion to analog. In practice, the choice between them often depends on specific needs and preferences in recording, processing, and playback of audio.

What makes PDM potentially better sonically than PCM is its simplicity and directness. The PDM bitstream is, in a sense, already analog-like. To convert it back to an analog signal, you essentially just need to smooth it out with a low-pass filter. This is much simpler than the complex digital-to-analog conversion process required for PCM.
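The smoothing step can be imitated in software with the crudest low-pass filter there is, a moving average. This is a sketch under stated assumptions: a real playback chain uses an analog filter (or a far better digital one), and the name `moving_average` and the window length of 4 are chosen only to make the example easy to follow. Each 1 is treated as a pulse of +1.0 and each 0 as -1.0.

```python
def moving_average(bits, window):
    """Smooth a 1-bit stream by averaging the most recent `window` pulses."""
    pulses = [1.0 if b else -1.0 for b in bits]
    smoothed = []
    total = 0.0
    for i, value in enumerate(pulses):
        total += value
        if i >= window:
            total -= pulses[i - window]      # drop the pulse that left the window
        if i >= window - 1:
            smoothed.append(total / window)  # only emit once the window is full
    return smoothed

# A bitstream that is 1 three quarters of the time averages out to a steady +0.5.
bits = [1, 1, 0, 1] * 16
levels = moving_average(bits, window=4)
print(levels[:5])
```

No arithmetic on multi-bit words is needed to recover the level; averaging the pulses is enough, which is the point the mesh analogy is making.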

In fact, a PDM stream can be directly injected into an analog power amplifier with nothing more than a simple analog low-pass filter. This bypasses the entire D/A conversion process required for PCM, potentially resulting in a more direct and faithful reproduction of the original analog signal. It’s like the difference between a complex mechanical watch and a sundial – while the watch might offer more features, the sundial provides a more direct connection to the thing it’s measuring.

This simplicity in the playback chain is one reason why many audiophiles (and certainly me and everyone at PS Audio and Octave Records) prefer the sound of DSD. We argue that it sounds more “analog-like” and natural, with better preservation of subtle details and spatial information.

 


An example of pulse density modulation of 100 samples of one period of a sine wave. 1s are represented by blue and 0s are represented by white, overlaid with the sine wave. Courtesy of Wikimedia Commons/Kaldosh at en.wikipedia.

 

Comparing the Two Formats

Let’s imagine two expert painters, Peter Charles Martin (PCM) and Pamela Diane Morris (PDM), each tasked with capturing the same breathtaking landscape. PCM is armed with a high-resolution camera, taking detailed snapshots at regular intervals. PDM, on the other hand, has a magical pen that never leaves the paper, creating a continuous line that somehow captures the entire scene.

PCM’s approach has clear advantages. Each snapshot is a complete, detailed representation of the scene at that moment. If PCM wants to change the color of a tree or adjust the brightness of the sky, it’s a simple matter of editing the relevant snapshots. This flexibility is why PCM is the go-to format for most digital audio work. It’s like having a digital audio workstation where you can easily cut, paste, fade, and adjust volume to your heart’s content.

PDM’s continuous stream of 1s and 0s, while capturing the audio with incredible fidelity, presents unique challenges when it comes to processing. In this single-bit system, each 1 or 0 represents either a tiny pulse of maximum amplitude or no pulse at all – there’s no in-between. This makes it impossible to directly adjust volume or apply tonal changes, as these operations require mathematical calculations that can’t be performed on single bits.

Editing PDM, in terms of cutting and splicing, is actually straightforward – you’re simply choosing where to start and stop the bitstream. However, any operation that requires changing the audio itself, like volume adjustment or equalization, cannot be done in the PDM domain. For these manipulations, the PDM stream must first be converted to a multi-bit format (like PCM or multi-bit PDM) where mathematical operations can be performed, and then converted back to single-bit PDM.

This is why pure DSD recordings, despite their superior sound quality, are challenging to work with in traditional audio production workflows. You can’t simply adjust the volume or apply effects without first converting the PDM stream into a format that allows such manipulations, then converting back to PDM after the changes are made.

Here’s where things get interesting. Let’s say PDM starts the painting, capturing the initial landscape with her unbroken line. If we need to make changes, we can temporarily convert this line into a series of snapshots (multi-bit PDM or PCM), make our edits, and then convert back to PDM’s continuous line. Remarkably, this back-and-forth conversion doesn’t significantly degrade the quality of PDM’s original capture. It’s as if PDM’s magical pen can recreate its initial strokes with near-perfect accuracy, even after we’ve tinkered with the image.

However, the reverse isn’t quite true. If PCM takes the initial snapshots, converting to PDM’s continuous line and back again might not preserve all the original detail. It’s like trying to recreate a detailed photograph using only a single, unbroken line – you might capture the essence, but some nuances could be lost. This is why the best high-end recording studios (including our own Octave Records) use PDM/DSD for the initial capture of sound. They’re essentially starting with PDM’s unbroken line, knowing they can convert to PCM for editing if needed, and then back to PDM without losing the magic of that initial capture.

In the end, both PCM and PDM have their place in the audio world. PCM offers unparalleled flexibility and ease of use, making it the standard for most digital audio applications. PDM, with its incredible fidelity and analog-like qualities, shines in high-end audio capture and playback. The choice between them often comes down to the specific needs of the project and the preferences of the listener.

As we continue to push the boundaries of digital audio, it’s exciting to think about how these two approaches might evolve and complement each other in the future. After all, in the world of audio, the ultimate goal is always the same: to recreate the magic of live sound in a way that moves and inspires us.

 

Analog to Digital Conversion

At the heart of our modern digital audio experience lies a crucial component that often goes unnoticed: the analog-to-digital converter, or ADC. This unsung hero is responsible for transforming the continuous waves of analog sound into the discrete digital data that our computers, smartphones, and audio equipment can process and manipulate.

Imagine you’re using your smartphone. As you speak, your voice creates sound waves that travel through the air. These waves are analog – continuous and smooth – but your phone can’t understand analog signals directly. It needs those smooth waves turned into a series of numbers it can work with. That’s where the ADC comes in, acting as a translator between the analog and digital worlds.

But it’s not just smartphones. Every time you make a call on a landline phone, an ADC is at work. The moment your voice enters the microphone, it’s converted to digital data for transmission over modern telephone networks. Even if you’re using an old-fashioned rotary phone, your voice is digitized at some point in its journey to the person on the other end.

The history of the ADC is fascinating and closely tied to the development of modern communications. The groundwork dates back to the 1920s, when engineer Harry Nyquist’s studies of telegraph transmission laid the theoretical foundation for sampling. However, it wasn’t until the 1960s and ’70s that practical ADCs began to appear, driven by the needs of the burgeoning computer industry and digital telephony.

The world of high-quality audio recording is dominated by two main types of ADCs: successive approximation (SAR) and delta-sigma. For high-quality PCM recordings, which form the basis of most digital audio we consume today, delta-sigma ADCs are the go-to choice. These converters excel at high-resolution audio, offering excellent noise performance and linearity. They’re particularly good at handling the nuances of music, capturing subtle details that cheaper converters might miss.

Delta-sigma ADCs work by oversampling the input signal (sampling at a much higher rate than the final output), then using noise shaping techniques to push quantization noise out of the audible frequency range. This allows them to achieve high resolution and low noise in the frequency range that matters most for music.

However, when you look at the crème de la crème of audio recording, such as what we do at Octave Records with DSD, you see a specialized form of delta-sigma modulation, where the ADC is essentially a delta-sigma modulator without the decimation stage found in PCM delta-sigma ADCs. (Decimation is a process that reduces the sampling rate of a signal, typically used in PCM to bring the oversampled signal down to the desired output rate.) It outputs a continuous stream of single bits, representing the audio as the density of 1s versus 0s over time. This approach allows for extremely high temporal resolution and a very analog-like quality to the digital signal. The ADCs used in top-tier DSD recording setups are often custom-built or heavily modified commercial units. These ADCs are optimized for ultra-low noise, extreme linearity, and the ability to handle the very high data rates required for DSD recording.
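To make the decimation step concrete, here is a rough sketch that turns a single-bit stream into 16-bit PCM by averaging each block of 64 bits, which is the ratio between the DSD64 rate and 44.1 kHz. Real decimators use carefully designed low-pass filters rather than a plain block average, so treat this only as an illustration; the function name and the scaling to plus or minus 32,767 are assumptions.

```python
def decimate_to_pcm(bits, factor=64, bit_depth=16):
    """Crude decimator: average each block of `factor` bits into one PCM sample."""
    full_scale = 2 ** (bit_depth - 1) - 1          # 32767 for 16-bit audio
    pcm = []
    for start in range(0, len(bits) - factor + 1, factor):
        block = bits[start:start + factor]
        density = sum(block) / factor              # fraction of 1s, 0.0..1.0
        amplitude = 2.0 * density - 1.0            # map to -1.0..1.0
        pcm.append(round(amplitude * full_scale))
    return pcm

# All 1s, all 0s, then alternating bits: positive peak, negative peak, silence.
stream = [1] * 64 + [0] * 64 + [1, 0] * 32
print(decimate_to_pcm(stream))
```

A DSD converter simply skips this stage and keeps the original bitstream, while a PCM delta-sigma converter performs some version of it before handing the samples on.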

So how does an ADC actually work? At its core, the process involves taking rapid “snapshots” of the analog signal at regular intervals – a process called sampling. The ADC measures the voltage of the analog signal at each of these sample points and assigns it a numeric value. This is called quantization. The sampling rate determines how many of these snapshots are taken each second. For CD-quality audio, that’s 44,100 times per second. It’s a bit like one of those optical-illusion illustrations made up of hundreds of squares of solid colors. The smaller the squares on the paper (higher sampling rate) and the more colors you have to choose from (higher bit depth), the more accurately you can represent the original curve.
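The effect of bit depth on accuracy can be checked numerically. The sketch below rounds values between -1.0 and 1.0 to the nearest available level and reports the worst rounding error found across a grid of test values. The names `quantize` and `worst_error` and the grid of 10,001 test points are assumptions for illustration; the takeaway is only that the error can never exceed half a step, and that the step shrinks as bits are added.

```python
def quantize(x, bit_depth):
    """Round x (in -1.0..1.0) to the nearest of 2**bit_depth evenly spaced levels."""
    steps = 2 ** bit_depth - 1
    code = round((x + 1.0) / 2.0 * steps)
    return code / steps * 2.0 - 1.0

def worst_error(bit_depth, n_points=10_001):
    """Largest rounding error over an evenly spaced grid of input values."""
    grid = [-1.0 + 2.0 * i / (n_points - 1) for i in range(n_points)]
    return max(abs(x - quantize(x, bit_depth)) for x in grid)

for bits in (4, 8, 16):
    half_step = 1.0 / (2 ** bits - 1)      # half the spacing between levels
    print(bits, worst_error(bits), half_step)
```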

The importance of ADCs in the audio world cannot be overstated. Every digital recording we hear – whether it’s a chart-topping pop song, a classical symphony, or a podcast – has passed through an ADC at some point. The quality of that ADC plays a huge role in determining the fidelity of the final product we hear.

In high-end audio, we obsess over the quality of ADCs because we know that once information is lost in the conversion process, it can never be recovered. A high-quality ADC can capture the subtleties of a performance – the resonance of a concert hall, the breathiness of a vocalist, the attack of a drum stick on a cymbal – with remarkable accuracy.

As we continue to push the boundaries of digital audio, with higher sampling rates and greater bit depths, ADCs remain at the forefront of innovation. They are the gatekeepers of our digital audio world, ensuring that the richness and complexity of analog sound is preserved as faithfully as possible in the realm of ones and zeros.

In the next installment, I’ll discuss the particulars of different types of digital-to-analog converters.
