From The Audiophile's Guide: Understanding Digital Audio

Written by Paul McGowan

PS Audio CEO Paul McGowan has launched The Audiophile’s Guide, a 10-book set distilling the knowledge he’s garnered over the years. It’s a comprehensive collection of practical information, from understanding room acoustics and speaker setup to getting the best from analog and digital audio.

Copper will be featuring excerpts from The Audiophile’s Guide in this and future issues. We begin with a look at the history and technology of digital audio.

 

Understanding Digital Audio

The history of digital audio is a fascinating journey that spans over a century, intertwining with the broader story of technological advancement and our ever-growing capacity to store and process information.

Our tale begins in the late 19th century with Thomas Edison’s invention of the phonograph. This remarkable device, capable of capturing and reproducing sound using wax cylinders, marked the birth of analog recording. For nearly a hundred years, analog methods would dominate the music industry, evolving from Edison’s cylinders to vinyl records and magnetic tape.

 

An Edison phonograph and wax cylinders. Courtesy of Wikimedia Commons/Infrogmation of New Orleans.

 

While analog recording was a significant achievement, it had inherent limitations. Vinyl records, cherished for their warm sound, were prone to wear and tear. Each play caused minute damage to the grooves, gradually degrading sound quality. They were also susceptible to dust, scratches, and warping. Magnetic tape, introduced in the mid-20th century, allowed for easier editing and multi-track recording but could stretch, break, or degrade over time. But perhaps the biggest drawback of analog formats was the issue of copying. Each generation of copies resulted in a noticeable loss in quality, a significant problem for the music industry where multiple copies were often needed in production and distribution.

As these challenges persisted, the concept of digital audio was beginning to take shape. Its roots can be traced back to 1928 when Harry Nyquist, a Swedish-born American engineer at Bell Labs, published a paper on signal sampling. Nyquist proposed that, to accurately reconstruct a signal, you need to sample it at a rate at least twice its highest frequency. This became known as the Nyquist Theorem, a cornerstone of digital audio theory.

Building on Nyquist’s work, Claude Shannon published his groundbreaking paper on information theory in 1948. Shannon’s work provided the mathematical foundation for digital communication, including audio. Together, Nyquist and Shannon laid the theoretical groundwork for digital audio decades before the technology to implement it existed.

The first practical step towards digital audio came in the 1960s with pulse-code modulation (PCM), a method of representing analog signals digitally. Initially used for telephony, PCM’s potential for high-quality audio reproduction was quickly recognized. Around the same time, another technique emerged: pulse-density modulation (PDM), later known as DSD (Direct Stream Digital). PDM took a different approach, using a very high sampling rate to produce a stream of single bits.

 

A sound wave, in red, represented digitally, in blue (after sampling and 4-bit quantization). Courtesy of Wikipedia/Aquegg.

 

In the early 1970s, Thomas Stockham, often called the father of digital audio, developed one of the first digital audio recording systems. His Soundstream system proved that digital audio could rival or surpass the quality of analog recordings. The next major breakthrough came in 1974 when Nippon Columbia (Denon) released the first commercially available digital audio recorder, the DN-023R, though it was massive, expensive, and used videotape for storage.

While the concept of converting sound to numbers was promising, a significant challenge remained: how to store and process all this digital data.

 

The Challenge of Storing Data

The challenge of storing data has been a constant throughout human history. From cave paintings to clay tablets, our ancestors sought ways to preserve knowledge and share information across time and distance. The written word was a revolutionary step, allowing complex ideas to be captured and transmitted. As civilizations grew, so did their information needs, leading to innovations like the abacus and various numbering systems.

The printing press, invented in the 15th century, marked another leap forward, enabling mass production of written material. However, it didn’t solve the problem of data processing. The industrial revolution brought new challenges, spurring innovations like Herman Hollerith’s punch card system in the late 19th century, a significant step towards automated data processing.

The advent of electronic computing began to offer solutions for both data storage and processing. Let’s start with processing.

Early computers used vacuum tubes, which were revolutionary but prone to burnout and generated significant heat. The invention of the transistor in 1947 was a major breakthrough, allowing for more powerful and compact computers.*

 

*It was during this era of room-sized computers that the term “bug” entered the computing lexicon. In 1947, Grace Hopper and her team found a moth trapped in a relay of the Harvard Mark II computer, causing a malfunction. This incident popularized the terms “bug” and “debugging” in computing.

 

The development of integrated circuits (ICs) in the late 1950s by Jack Kilby and Robert Noyce was the next big leap. This allowed multiple transistors to be packed onto a single chip, dramatically increasing computing power while reducing size and cost. Moore’s Law predicted the exponential growth in transistor density, a prediction that has held from the earliest chips with a handful of transistors to modern processors with billions.

These advancements in computing power and miniaturization were crucial for digital audio. They enabled the creation of digital-to-analog converters (DACs), digital signal processors (DSPs), and compact, powerful amplifiers that could process and reproduce digital audio with unprecedented accuracy.

As computers evolved, so did data storage methods. Magnetic tape gave way to floppy disks, then to hard drives, and eventually to solid-state drives. Each new technology offered greater capacity and faster access times, crucial for storing and accessing high-quality digital audio.

 

Enter the Compact Disc

A pivotal moment in digital audio history came in 1982 with the introduction of the Compact Disc (CD), jointly developed by Sony and Philips. This revolutionary format offered 74 minutes (later extended to 80 minutes) of high-quality digital audio in a small, durable package, fundamentally changing how we consume and store music.

The history of optical disc storage dates back to the 1960s, with roots in both audio and video technologies. The concept of using light to store and read information from a disc was first explored by David Paul Gregg, who patented the basic technology for optical discs in 1961. This laid the groundwork for future developments in both audio and video formats.

In the realm of video, the first commercially available optical disc format was the LaserDisc, introduced by Philips and MCA in 1978. LaserDisc was a significant leap forward in home video quality, offering superior picture and sound compared to VHS tapes. The format stored analog video and audio on a 12-inch disc, read by a laser. While LaserDisc never achieved widespread adoption due to its high cost and lack of recording capability, it was popular among videophiles and laid important groundwork for future optical disc technologies.

Meanwhile, in the audio world, Sony and Philips were independently working on digital audio disc systems in the late 1970s. Sony’s work was partially based on optical disc technology they had been developing for video applications. Philips, on the other hand, had been working on both LaserDisc technology and a smaller audio-only disc format called Compact Disc Digital Audio.

Recognizing the potential for conflict and market fragmentation, Sony and Philips decided to collaborate in 1979. Their goal was to create a single standard for digital audio discs that would dominate the market. This collaboration brought together Philips’ expertise in laser technology and Sony’s experience with digital signal processing.

 

Sony CDP-101, the world's first commercially available CD player. Courtesy of Wikimedia Commons/Atreyu.

 

The development of the CD faced numerous technical challenges. One major hurdle was determining the optimal sampling rate and bit depth for digital audio. After much debate and testing, they settled on a 44.1 kHz sampling rate and 16-bit resolution. This combination was chosen to capture the full range of human hearing (typically 20 Hz to 20 kHz) while providing sufficient dynamic range (about 96 dB) to reproduce the quietest and loudest sounds in music.

The 44.1 kHz sampling rate wasn’t chosen arbitrarily. It relates directly to the Nyquist Theorem mentioned earlier, which states that to accurately reconstruct a signal, you need to sample it at a rate at least twice its highest frequency. Human hearing typically extends to about 20 kHz, so a sampling rate of at least 40 kHz was necessary to capture the entire audible spectrum.

The engineers chose 44.1 kHz, slightly above the minimum required by Nyquist’s Theorem, to allow for the limitations of analog anti-aliasing filters used in the recording process. This extra headroom ensured that the entire audible spectrum could be captured without introducing aliasing artifacts (unwanted frequencies that can occur when a signal is sampled at too low a rate, causing higher frequencies to “fold back” into the audible range and create distortions). Aliasing can result in false or phantom frequencies that weren’t present in the original audio, potentially causing a harsh or unnatural sound. By sampling at 44.1 kHz, engineers provided a small buffer above the theoretical minimum, allowing for more gradual and effective anti-aliasing filters and ensuring a cleaner representation of the highest audible frequencies.
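To make fold-back concrete, here’s a small Python sketch (the tone frequencies are hypothetical, chosen purely for illustration) that computes where a tone lands after being sampled at the CD rate of 44.1 kHz:

```python
# A minimal sketch: where does a tone above the Nyquist limit "fold back" to?
# The tone frequencies below are hypothetical, chosen only for illustration.

def alias_frequency(tone_hz: float, sample_rate_hz: float) -> float:
    """Return the apparent (aliased) frequency of a sampled pure tone."""
    nyquist = sample_rate_hz / 2
    f = tone_hz % sample_rate_hz          # fold into one sampling-rate span
    return f if f <= nyquist else sample_rate_hz - f

fs = 44_100.0                             # CD sampling rate in Hz
for tone in (15_000.0, 25_000.0, 30_000.0):
    print(f"A {tone/1000:.0f} kHz tone sampled at 44.1 kHz appears as "
          f"{alias_frequency(tone, fs)/1000:.1f} kHz")
```

A 25 kHz tone, for example, folds back and masquerades as a 19.1 kHz tone, which is exactly the kind of phantom content the anti-aliasing filter is there to prevent.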

The 16-bit resolution was chosen as a balance between audio quality and practical considerations of data storage and processing capabilities of the time. It provides a theoretical dynamic range of 96 dB, which was deemed sufficient to capture the range between the quietest and loudest sounds in most music recordings.
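If you’re curious where the 96 dB figure comes from, here’s a quick back-of-the-envelope calculation in Python, a sketch of the standard 20 * log10(2^bits) approximation (it ignores the finer points of quantization noise):

```python
import math

def dynamic_range_db(bits: int) -> float:
    """Approximate dynamic range of linear PCM: 20 * log10(2**bits)."""
    return 20 * math.log10(2 ** bits)

print(f"16-bit: {dynamic_range_db(16):.1f} dB")   # about 96 dB, the CD figure
print(f"24-bit: {dynamic_range_db(24):.1f} dB")   # about 144 dB
```

Each additional bit adds roughly 6 dB of dynamic range, which is why 24-bit recordings have so much more headroom than 16-bit CDs.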

Another significant challenge was developing reliable laser pickup technology that could accurately read the microscopic pits on the disc’s surface. The team also had to create a robust error correction system to ensure that minor scratches or dust wouldn’t affect playback.

CDs store data in a spiral track of pits and lands on a reflective surface. What the hell are “pits” and “lands”? Imagine a smooth, shiny surface, like a mirror; this represents the “lands.” Now, picture tiny indentations or dents in this surface; these are the “pits.” The pits are incredibly small, about 100 nanometers deep and 500 nanometers wide, far too tiny to see with the naked eye. To put this in perspective, if a human hair were as wide as a football field, a pit on a CD would be about the size of a grain of sand on that field. Or, imagine shrinking yourself down so that the width of a CD (about 4.7 inches) looked as big as the distance from New York to Los Angeles. At that scale, a pit would still be smaller than the width of your fingernail. These minuscule dimensions allow CDs to pack an enormous amount of data into a small space, enabling their high storage capacity.

 

The pits and lands of a CD under a microscope. Courtesy of Wikimedia Commons/Akroti.

 

And if you think that’s small, consider the Digital Versatile Disc (or DVD), introduced to the public in the mid-1990s, more than a decade after the commercial release of the CD. DVD pits are even tinier, about half the size of CD pits. Going even further, Blu-ray discs (introduced roughly a decade after the DVD) have even smaller pits, about a quarter the size of DVD pits. In our analogy, these would be roughly the size of a large molecule. This minute size is why Blu-ray discs can store 25 GB of data on a single-layer disc (50 GB on a dual-layer disc), compared to the 4.7 GB of a single-layer DVD and the mere 700 MB (or 0.7 GB) of a standard CD. To put it another way, a dual-layer Blu-ray disc can hold about 10 times as much data as a DVD, or over 70 times as much as a CD. This massive increase in storage capacity is what allows Blu-ray discs to contain entire high-definition movies with multiple audio tracks and special features, while CDs are limited to about 80 minutes of standard-quality audio.

A laser reads these pits and lands as the CD spins. When the laser hits a land, it reflects straight back to a sensor. When it hits a pit, the light scatters and less light returns to the sensor. This difference in reflection is interpreted as the 1s and 0s of digital data, which represents the audio information.

This optical system meant CDs were immune to the physical wear and tear that affected vinyl records and tapes. Unlike a record needle that physically contacts the disc surface, or magnetic tape that rubs against playback heads, the CD’s laser never touches the disc. This allows for theoretically perfect playback every time, no matter how often the CD is played.

The pit and land structure, combined with the disc’s protective plastic layer, also made CDs more resistant to dust and minor scratches compared to earlier formats. Even if a small scratch or dust particle obscures some pits, the CD player’s error correction system can often reconstruct the missing data, ensuring uninterrupted playback. This error correction system was particularly innovative, using a combination of interpolation and redundancy to ensure that minor scratches or dust wouldn’t affect playback. This made CDs much more robust than previous formats.

Despite the poor sound quality of early CDs, and the subsequent outrage from us audiophiles, the format quickly gained popularity. CDs offered several advantages over vinyl and cassettes: no surface noise, no need to flip sides, quick access to any track, and consistent sound quality regardless of how many times they were played.

The CD’s 44.1 kHz/16-bit format became the standard for digital audio for decades, influencing everything from professional recording practices to consumer playback equipment. Even as higher resolution formats have emerged, many argue that CD quality is sufficient for most listening situations, given the limitations of human hearing. Moreover, the CD’s physical format proved to be remarkably versatile. It became the basis for other optical disc formats like CD-ROM for data storage, CD-R for home recording, and eventually (as mentioned earlier) DVDs and Blu-rays for video and higher resolution audio.

While digital downloads and streaming have largely supplanted CDs for everyday listening, the format remains important in the audiophile community. Many enthusiasts appreciate the tangible nature of CDs, their consistent quality, and the fact that they offer uncompressed audio without the need for an internet connection.

 

God and 0

The transition from CDs to cloud-based music access represents another paradigm shift in how we store and consume music. Streaming services offer instant access to vast libraries of music, far beyond what any personal collection could contain. This shift has changed not just how we listen to music, but how we discover new artists, create playlists, and share our musical tastes with others. But how exactly does it work? Most of us understand the idea that computers work with bits of data, 1s and 0s. How in the world can we count to 65,535 (the maximum number used in a CD’s digital audio) using only 1s and 0s? Seems impossible, right?

Gottfried Wilhelm Leibniz, a brilliant German polymath of the 17th century, found himself pondering that same question: how could one represent all numbers using only powers of 2? This seemingly simple query would lead to a revolutionary way of thinking about numbers. Leibniz wasn’t the first to consider a base-2 number system. Ancient cultures, including the Chinese and Egyptians, had explored similar ideas. However, it was Leibniz who formalized the binary system as we know it today and recognized its profound implications.

 

Gottfried Wilhelm Leibniz. Courtesy of Wikimedia Commons/public domain.

 

In 1679, Leibniz wrote his seminal paper “Explanation of Binary Arithmetic,” where he laid out the principles of counting using only two digits: 0 and 1. He was inspired partly by the ancient Chinese text known as the I Ching, which used a system of broken and unbroken lines to represent different states. Leibniz saw in binary a reflection of creation itself, with 1 representing God and 0 representing nothingness. Beyond this philosophical angle, he recognized the system’s elegance and simplicity. Every number could be represented as a sum of powers of 2, just as every physical weight could be measured using a set of binary-based balance scale weights.

While Leibniz’s work on binary numbers was groundbreaking, it would take centuries before the practical applications of this system would be fully realized. The dawn of the computer age in the 20th century would finally bring binary counting to the forefront, forming the foundation of all digital technology, including our modern audio systems.

We’re used to working with ten digits in our everyday lives, but computers and digital audio systems work with just two: 0 and 1. It’s like having a bunch of light switches that can only be on or off. This might seem limiting, but it’s actually incredibly powerful. Let’s start with how we count in binary. In our usual decimal system, each digit represents a power of 10. For example, in the number 254, we have 2 hundreds, 5 tens, and 4 ones. In binary, we do the same thing, but with powers of 2 instead.

Here’s how we count from 0 to 10 in binary:

0 = 0
1 = 1
2 = 10 (1 two and 0 ones)
3 = 11 (1 two and 1 one)
4 = 100 (1 four, 0 twos, and 0 ones)
5 = 101
6 = 110
7 = 111
8 = 1000
9 = 1001
10 = 1010

Each position in a binary number represents a power of 2. From right to left, we have the ones place (2^0), the twos place (2^1), the fours place (2^2), the eights place (2^3), and so on. To convert a binary number to decimal, we simply add up the values of all the “on” positions. For example, the binary number 1010 can be represented as:

1 * 2^3 (8) + 0 * 2^2 (0) + 1 * 2^1 (2) + 0 * 2^0 (0) = 10

Now, let’s tackle how we can represent big numbers like 65,535 with just 1s and 0s. The key is using enough binary digits, or bits. With 16 bits, we can represent 2^16 different values, which is exactly 65,536 (the numbers 0 through 65,535). The binary number 1111111111111111 (sixteen 1s) represents 65,535 in decimal, while adding one more bit (making the number 17 bits long) would double the number of values we can represent, raising the maximum to 131,071.
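For readers who like to see the arithmetic spelled out, here is a short Python sketch of the positional counting described above (purely illustrative; real audio hardware does this in silicon, not in Python):

```python
# A sketch of the positional arithmetic described above -- purely illustrative.

def binary_to_decimal(bits: str) -> int:
    """Add up the power of 2 for every '1' digit, working right to left."""
    total = 0
    for position, digit in enumerate(reversed(bits)):
        if digit == "1":
            total += 2 ** position
    return total

print(binary_to_decimal("1010"))        # 10  (8 + 2)
print(binary_to_decimal("1" * 16))      # 65535, the largest 16-bit value
print(2 ** 16, 2 ** 17)                 # 65536 and 131072 distinct values
```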

In digital audio, we use these binary numbers to represent the amplitude of our sound wave at each sample point. The more bits we use, the more precise our representation can be. This is why 24-bit audio can capture more detail than 16-bit audio – it has many more possible values to work with.

But how do we get from these binary numbers back to the voltages that drive our speakers? This is where digital-to-analog converters (DACs) come in. A simple DAC might work like this: Imagine we have a 4-bit number, so we can represent values from 0 to 15. This would let us have 16 different voltage levels, each corresponding to one of these numbers. When the DAC sees a binary number, it outputs the corresponding voltage. More sophisticated DACs use techniques like delta-sigma modulation, which rapidly switches between two voltage levels (DSD or PDM) to create an average voltage that corresponds to our digital value. This allows for very precise control over the output voltage.
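Here’s a toy model of that simple 4-bit DAC in Python; the 0-to-3-volt output range and the function name are hypothetical choices made only to illustrate the mapping from binary code to voltage level, not how any real converter is built:

```python
# A toy model of the simple 4-bit DAC described above. The 0-to-3.0 V output
# range is a hypothetical choice, used only to illustrate the mapping.

BITS = 4
LEVELS = 2 ** BITS            # 16 distinct output levels
FULL_SCALE_VOLTS = 3.0        # hypothetical full-scale output

def dac_output(code: int) -> float:
    """Map a binary code (0..15) to one of 16 evenly spaced voltages."""
    if not 0 <= code < LEVELS:
        raise ValueError("code out of range for a 4-bit DAC")
    return FULL_SCALE_VOLTS * code / (LEVELS - 1)

for code in (0b0000, 0b0101, 0b1111):
    print(f"{code:04b} -> {dac_output(code):.2f} V")
```

Feeding it the code 0101 (decimal 5) produces one-third of full scale; a real 16-bit audio DAC does the same thing with 65,536 steps instead of 16.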

The DAC’s job is to take our string of binary numbers and turn them back into a smooth, continuous voltage that represents our original sound wave. It’s doing this tens of thousands of times per second (44,100 times per second for CD audio), fast enough that to our ears, it sounds like a continuous sound rather than a series of discrete steps. This process of converting between the continuous analog world and the discrete digital world is at the heart of digital audio. By understanding how we can represent complex information with just 1s and 0s, we begin to grasp the power and flexibility of digital audio systems.

In the next installment we'll discuss analog vs. digital, pulse code modulation and other types of digital encoding, and analog to digital conversion.
