A transient is defined as an event that quickly moves from one state to another. The strike of a cymbal in a quiet room is a transient event.
How an audio circuit handles that rapid change from nothing to something and back again is critical to its performance. A slurring of the speed of reproduction, or an overreaction to a sudden change, is instantly recognized by the ear as being wrong. (Our ears' sensitivity to sudden events is likely a protective mechanism we developed over millions of years. The snap of a twig could mean the difference between life and death.)
This is one reason why in the lab we always measure the square wave response of any circuit we're designing. If the transient response (usually characterized by the step response, and closely tied to rise time and slew rate) is off, the shape of the square wave is visibly wrong on an oscilloscope.
Here is the ideal response we're looking for.
But that's never what we get, since no circuit is perfect. Here is a picture that is more like reality.
Even though it isn't as perfect as the input, this is the kind of performance we're happy with.
Why? Because the rise times are reasonable and there's no overshoot like you find in this photo.
See those spiky lines at the beginning of the square wave? They indicate the transient response is way off. It's going to sound bright and harsh.
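For readers who like numbers alongside the scope traces, here is a rough sketch of how rise time and overshoot could be read off one edge of a square wave. It is not our lab procedure. It assumes the circuit behaves like a simple underdamped second-order low-pass, which is a textbook simplification, and the names step_response, measure_edge, zeta (damping ratio) and f_n (natural frequency) are made up for the illustration, as is the 100 kHz figure. Slew rate limiting is a nonlinear effect and is not modeled here.

```python
import math


def step_response(zeta, f_n, t):
    """Unit-step response at time t (seconds) of an underdamped
    second-order low-pass. zeta and f_n are illustrative parameters."""
    if not 0 < zeta < 1:
        raise ValueError("this sketch only covers the underdamped case 0 < zeta < 1")
    w_n = 2 * math.pi * f_n                  # natural frequency in rad/s
    w_d = w_n * math.sqrt(1 - zeta ** 2)     # damped ringing frequency in rad/s
    envelope = math.exp(-zeta * w_n * t)     # decay of the ringing
    ringing = math.cos(w_d * t) + zeta / math.sqrt(1 - zeta ** 2) * math.sin(w_d * t)
    return 1 - envelope * ringing


def measure_edge(zeta, f_n, n_samples=20000, n_periods=5):
    """Sample one rising edge and return (overshoot_percent, rise_time_seconds).
    Rise time is taken between the 10% and 90% points of the final value."""
    duration = n_periods / f_n
    times = [i * duration / n_samples for i in range(n_samples + 1)]
    values = [step_response(zeta, f_n, t) for t in times]
    overshoot = max(0.0, (max(values) - 1.0) * 100)
    t10 = next(t for t, v in zip(times, values) if v >= 0.1)
    t90 = next(t for t, v in zip(times, values) if v >= 0.9)
    return overshoot, t90 - t10


if __name__ == "__main__":
    # Hypothetical circuits: one well damped, one that rings.
    for label, zeta in (("well damped", 0.7), ("ringing", 0.2)):
        overshoot, rise = measure_edge(zeta, f_n=100e3)
        print(f"{label}: overshoot {overshoot:.1f}%, rise time {rise * 1e6:.2f} us")
```

In this toy model the ringing case overshoots by roughly half the step height while the well damped case stays under 5 percent, even though the ringing case has the faster rise time. That is why a fast edge on its own doesn't tell you the transient response is right.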
When we look at audio product specs we focus a great deal on lowering distortion and noise, but as designers, there's a lot more to lose sleep over.