What attributes should an item of equipment in a sound reproduction chain possess in order to meet the objectives of high-end audio playback? One attribute we tend to think of as important is flatness of frequency response. If a piece of equipment either boosts or attenuates a particular band of frequencies we tend to consider that a big no-no. Departures from a nominally flat frequency response – sometimes even subtle departures – can often be correlated with some sort of perceived tonal coloration in the resultant sound output. So flat frequency response is high on our list of good things, especially with equipment such as amplifiers.
But with loudspeakers the situation becomes horribly convoluted. Loudspeakers don’t even have a simple frequency response. For a start, their output is highly directional. More comes out in the straight-ahead direction than comes out at some angle to the side, or at some angle upwards or downwards. The laws of physics tend to demand that the proportion of the drive unit’s total output which is delivered off-center is strongly frequency dependent. So if, as you might be inclined to do, you measure a speaker’s frequency response in the straight-ahead direction, and optimize all the design variables so that the resultant on-axis frequency response is ruler flat, then the total energy delivered into the room (the sum of the loudspeaker’s outputs in all directions) cannot be flat at all. And vice-versa.
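To put rough numbers on that frequency dependence, here is a minimal sketch. It assumes the textbook model of a rigid circular piston in a large baffle, which is only a crude stand-in for a real drive unit. The 0.08 m radius, the 45-degree angle, and the function names are illustrative assumptions, not measurements of any actual loudspeaker.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, approximate value at room temperature


def bessel_j1(x, n=2000):
    """Bessel function J1(x), from Bessel's integral via the midpoint rule."""
    tau = (np.arange(n) + 0.5) * np.pi / n
    return np.mean(np.cos(tau - x * np.sin(tau)))


def piston_off_axis_db(freq_hz, angle_deg, radius_m=0.08):
    """Off-axis level in dB relative to on-axis, for an idealized piston.

    radius_m is a hypothetical driver radius chosen for illustration.
    """
    k = 2.0 * np.pi * freq_hz / SPEED_OF_SOUND  # wavenumber
    x = k * radius_m * np.sin(np.radians(angle_deg))
    if x < 1e-9:
        return 0.0  # on-axis: no attenuation by definition
    directivity = 2.0 * bessel_j1(x) / x
    return 20.0 * np.log10(abs(directivity))


for f in (100.0, 1000.0, 5000.0):
    print(f"{f:6.0f} Hz at 45 degrees: {piston_off_axis_db(f, 45.0):+6.1f} dB")
```

Under these assumptions the off-axis output is essentially identical to the on-axis output at low frequencies and falls away sharply in the treble, which is why a flat on-axis response and a flat total power response cannot both be achieved with the same driver.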
Then there’s the loudspeaker’s interaction with the room. Only a tiny proportion of the total sound output of the loudspeaker travels directly from the loudspeaker to your ear. All the rest of it is launched into the room, where it bounces around off the walls and furniture. Some of it doesn’t so much bounce mirror-like, as diffuse like a spotlight shone onto a matte-painted wall. In any case, eventually, some of that dispersed sound also reaches your ears. When all these sounds reach your ears, having taken a plethora of paths to get there, they will recombine. Each of these sounds, depending on the path it has taken, will have a different phase delay, which means that they can recombine constructively or destructively, and the net effect is a somewhat unpredictable and chaotic disturbance to the overall frequency response. And should you move six inches to one side or the other, the net effect can change dramatically.
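The recombination effect can be illustrated with the simplest possible case: the direct sound plus one single reflection. The sketch below is a toy model, not a room simulation. The path lengths, the reflection gain of 0.7, and the function name are all assumptions chosen for illustration, and a real room has a great many reflections rather than one.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, approximate value at room temperature


def combined_level_db(freq_hz, direct_m, reflected_m, reflection_gain=0.7):
    """Level in dB (relative to the direct sound alone) of direct + one reflection."""
    freq_hz = np.asarray(freq_hz, dtype=float)
    # Extra travel time of the reflected path relative to the direct path.
    delay_s = (reflected_m - direct_m) / SPEED_OF_SOUND
    # Add the two arrivals as phasors; the reflection is delayed and attenuated.
    total = 1.0 + reflection_gain * np.exp(-2j * np.pi * freq_hz * delay_s)
    return 20.0 * np.log10(np.abs(total))


freqs = np.array([250.0, 500.0, 1000.0, 2000.0, 4000.0])
# Two hypothetical listening positions: the reflected path is 15 cm
# (roughly six inches) longer at seat B than at seat A.
seat_a = combined_level_db(freqs, direct_m=3.0, reflected_m=4.0)
seat_b = combined_level_db(freqs, direct_m=3.0, reflected_m=4.15)
for f, a, b in zip(freqs, seat_a, seat_b):
    print(f"{f:6.0f} Hz   seat A {a:+5.1f} dB   seat B {b:+5.1f} dB")
```

Even with just one reflection the response swings between a boost of about 4.6 dB and a cut of about 10.5 dB, and the frequencies at which the peaks and dips fall shift when the path length changes by a few inches.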
To a certain extent, you can attempt to correct for these effects by applying room-correction techniques to the signal. Effectively, you pre-distort the signal in such a way that the speaker/room combination exactly compensates for this added distortion and cancels it out completely. This is a complicated topic of its own, but the short answer, if you want one, is that it works really well at low frequencies and really badly at high frequencies.
Other effects, such as the ‘liveliness’ of a room – in effect the propensity of sound to continue reverberating around a room long after the generating stimulus has passed – add further complexities to the picture which cannot be reduced to a simple disturbance of the frequency response.
Another related issue boils down to the question of what a person who listens to something actually hears. What we actually hear is governed quite strongly by the peculiar shapes of our ears. If we hold the empty tube from a used-up kitchen roll to our ears, and listen through it, we hear an obviously colored sound. We would clearly be upset if our loudspeakers sounded like that. And yet that is precisely what our weirdly-shaped ears do. They effectively color the sound in quite a strong manner. Moreover, this coloration changes strongly according to the direction the sound is coming from. In other words, if you turn your head slightly, the coloration (i.e. frequency response) of what you hear will change slightly. Finally, because your ears have a different shape from mine, what you hear in terms of frequency response will be totally different to what I hear.
Now, to be fair, all of this ear stuff is compensated for to a large degree by our brains, which process what our ears detect, and decide what it is we are actually hearing. And the argument is a good one which holds that – over the short term at least – our ears are unchanging, and whatever colorations they might impose upon incoming sounds our brains are able to compensate for. But even so, at least in tests I have conducted upon myself, applying very small frequency response aberrations to a real music signal played through my loudspeakers, it is quite surprising how large a frequency response error has to be in order for it to be unambiguously noticeable. [NB: You can’t perform this test rigorously just by going into iTunes and playing with the graphical equalizer. It is quite a complex problem, in reality.]
So, an interesting question you might want to ask is this. Why, if the frequency response of a loudspeaker in a room is not remotely close to being flat, should it matter a jot that the frequency response of, say, the amplifier feeding it, is not itself ruler flat? Here we get into difficult territory, because there is plenty of anecdotal evidence which supports the truism that the flatter an amplifier’s frequency response, the better it tends to sound.
Before you start throwing things at me, I’m not trying to suggest that any amplifier with a flat frequency response will sound better than any other with a less-flat response. Painting with a broad brush here, the frequency response of just about any non-flat amplifier can be flattened up nicely with, for example, a hefty dose of negative feedback, and as a rule we find that the application of such feedback is normally deleterious to the perceived sound quality.
But it is an interesting observation nonetheless. And the equivalent situation holds for phase response: the phase response of a loudspeaker in a room is also a total mess, yet amplifiers and other electronic circuitry with only minor amplitude or phase response problems are often found to sound notably less satisfactory in real-world systems.
I would add to that digital processing. Low-pass Butterworth filters often sound fractionally better than equivalent Chebyshev filters, and both of these IIR filters tend to sound better than broadly equivalent FIR filters. There I go with the broad brush again, but when you wash away the noise that is the sort of consistent picture that tends to emerge. I will add this, though. At BitPerfect we have a digital processing engine that allows us to separate out the effects of phase and amplitude response, and the picture that emerges quite clearly is that phase response is by far the more important of the two. In an experiment we have prepared filters which produce a mathematically identical amplitude response, but with three different phase responses, one of which is totally flat. Most listeners express a clear preference for the totally flat (i.e. totally linear) phase response.
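For readers wondering how filters can share a response in amplitude yet differ in phase, here is a minimal sketch. It is emphatically not BitPerfect's engine or the filters used in that experiment. It is a toy three-tap construction, with a hypothetical coefficient `a = 0.5`, that relies on the fact that reversing a filter's taps in time leaves its magnitude response untouched while changing its phase.

```python
import numpy as np

a = 0.5  # hypothetical coefficient, chosen only for illustration
h_min = np.array([1.0, a])  # zero inside the unit circle (minimum phase)
h_max = h_min[::-1]         # time-reversed: same magnitude, maximum phase

# Three filters built from the same two factors, differing only in phase.
filters = {
    "minimum phase": np.convolve(h_min, h_min),
    "linear phase": np.convolve(h_min, h_max),  # symmetric taps
    "maximum phase": np.convolve(h_max, h_max),
}


def freq_response(taps, n_points=512):
    """Evaluate H(e^{jw}) for w from 0 to pi (DC to the Nyquist frequency)."""
    omega = np.linspace(0.0, np.pi, n_points)
    k = np.arange(len(taps))
    return omega, np.exp(-1j * np.outer(omega, k)) @ taps


def group_delay(omega, H):
    """Group delay in samples: minus the slope of the unwrapped phase."""
    return -np.gradient(np.unwrap(np.angle(H)), omega)


for name, taps in filters.items():
    omega, H = freq_response(taps)
    gd = group_delay(omega, H)
    print(f"{name:14s} taps={taps}  |H| at DC={abs(H[0]):.3f}  "
          f"group delay {gd.min():+.2f} to {gd.max():+.2f} samples")
```

All three filters have exactly the same magnitude response, but only the symmetric one delays every frequency by the same amount (one sample). The other two delay different frequencies by different amounts, which is what a non-linear phase response means in practice.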
So there is a definite dichotomy which is still in play. When we listen to loudspeakers in a room, we are hearing a sound which, with the best will in the world, is pickled with serious amplitude and phase distortions across the entirety of the audio band. These massively swamp any corresponding distortions that may have been introduced in the amplification chain. I would go so far as to suggest (not having tried it!) that it would be nigh on impossible to even measure the frequency and/or phase responses of an amplifier using microphones to measure the in-room output via the loudspeaker.
Why, then, are those properties of those amplifiers (and, from my personal perspective, digital audio processing chains) so gosh-darned audible?