I recently got a reminder of how often lossy compression technologies such as MP3, AAC and Dolby Digital are misunderstood, when another journalist sent me an e-mail critiquing an article I’d written. Of course, once I got to thinking about the subject, I started to see more evidence of such confusion in the comments sections following articles here and on other sites.
In all of these cases, the person commented about the reduction in dynamic range caused by MP3. Having worked at Dolby Labs back in 2000 to 2002, I knew right away that these commenters’ knowledge about lossy compression was incomplete. But their comments got me curious. I’ve used MP3 for almost two decades, but I’d never actually looked at its effects on an audio waveform. So I decided to do just that, and to find out whether or not MP3 can really affect dynamic range.
You can see the results in the top graph, which shows a 4-millisecond sample of an original 16-bit/44.1-kilohertz waveform in blue and the waveform of the same file coded in 128 kbps MP3 in red.
What you’re looking at is the first big drum hit from “War,” a percussion piece taken from Dr. Chesky’s Ultimate Headphone Demonstration Disc. As with Chesky’s other recordings, minimal sound processing is used. What you’re hearing (and seeing) is the full dynamics of the recording, without the compression used in almost all pop, rock and jazz recordings.
You can see there’s a difference between the original and the MP3. In some places, MP3 reduces the peak output slightly. The peak level of the recording is very high, about -1 dBFS — just 1 dB below full scale (the absolute maximum level that a recording system can manage). Coding this recording in MP3 reduces that peak by about 0.3 dB. That’s a difference you’d struggle to hear with steady-state tones; on a drum hit, it’s difficult or impossible to hear.
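To put that 0.3 dB in perspective, a level change in decibels converts to a linear amplitude ratio with the standard relation ratio = 10^(dB/20). The short Python sketch below applies it to the figures quoted above. The function and variable names are illustrative only and do not come from any measurement tool.

```python
def db_to_amplitude_ratio(db):
    """Convert a level change in decibels to a linear amplitude ratio."""
    return 10 ** (db / 20)

original_peak_dbfs = -1.0   # peak level of the original, as quoted above
mp3_peak_change_db = -0.3   # approximate peak reduction after MP3 coding

ratio = db_to_amplitude_ratio(mp3_peak_change_db)
mp3_peak_dbfs = original_peak_dbfs + mp3_peak_change_db

print(f"Amplitude ratio for {mp3_peak_change_db} dB: {ratio:.3f}")
print(f"MP3 peak: about {mp3_peak_dbfs:.1f} dBFS")
```

In other words, the MP3's peak is roughly 3 to 4 percent smaller in amplitude than the original's.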
You can also see that the level of the MP3 is slightly higher than the original in places. That’s because the differences between an original 16/44.1 digital file and an MP3 copy of it come from imperfections in the MP3 coding process, not from any systematic reduction in level. As the graph suggests, the differences are more along the lines of random errors than specific, identifiable and consistent flaws.
For the sake of comparison, I took the same file and ran it through a dynamic range compression algorithm, set for a 2:1 compression ratio with a threshold of -18 dB. (This is the first preset in the Sony Sound Forge audio editing software I used.) The original’s in blue and the compressed waveform is in green. That big drum hit is reduced by about 8 dB, a difference that would be obvious to the ear.
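The size of that reduction is roughly what the settings predict. Here is a minimal sketch of the static gain curve of an idealized hard-knee compressor, assuming no attack or release smoothing and no makeup gain. It is not Sound Forge's actual algorithm, and the function name is hypothetical.

```python
def compress_level_db(input_db, threshold_db=-18.0, ratio=2.0):
    """Static curve of an idealized hard-knee downward compressor.

    Levels at or below the threshold pass unchanged; above it, the
    amount by which the level exceeds the threshold is divided by ratio.
    """
    if input_db <= threshold_db:
        return input_db
    return threshold_db + (input_db - threshold_db) / ratio

peak_in = -1.0                      # peak of the drum hit, in dBFS
peak_out = compress_level_db(peak_in)
reduction = peak_in - peak_out

print(f"Peak after compression: {peak_out:.1f} dBFS")
print(f"Gain reduction: {reduction:.1f} dB")
```

For a -1 dBFS peak, this idealized curve gives -9.5 dBFS, a reduction of 8.5 dB, which is in line with the roughly 8 dB seen in the graph. A real compressor's timing behavior would account for small differences.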
Many audio enthusiasts describe the effects of MP3 much as they would describe the effects of dubbing audio onto analog tape, which has a deleterious effect on almost every aspect of audio quality (frequency response, dynamic range, signal-to-noise ratio, distortion, etc.). It doesn’t work that way. It’d be more useful (although not literally accurate) to think of MP3 and other lossy codecs as introducing random elements. The lower the bitrate (e.g., 128 kbps vs. 256 kbps), the more random elements are introduced, and the lower the audio quality. The frequency response and dynamic range are essentially unchanged; there’s just more junk in the signal.
That’s because MP3 works not by reducing dynamic range or frequency response, but by discarding data that’s less likely to be heard. It breaks an audio sample down into multiple frequency bins; analyzes them to find sounds that are unlikely to be heard (for example, a 1.1 kHz tone at -20 dBFS adjacent to a 1 kHz tone at -3 dBFS); then reduces or zeroes out the number of bits used to encode those relatively inaudible tones. You’ll still hear that loud -3 dBFS tone in almost all of its original glory, minus or plus the slight level error that MP3 might introduce. Of course, that’s a greatly simplified explanation; if you want to dig deeper, try this site.
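The idea can be illustrated with a toy example. The sketch below is emphatically not how an MP3 encoder is implemented; a real encoder uses a far more elaborate psychoacoustic model. The masking rule, its thresholds (`max_ratio`, `min_gap_db`) and the bit counts are all made-up values chosen only so that the example from the paragraph above comes out as described.

```python
def is_masked(tone, masker, max_ratio=1.2, min_gap_db=15.0):
    """Toy rule: a tone counts as masked when another tone is close in
    frequency and at least min_gap_db louder. Thresholds are invented."""
    freq, level = tone
    m_freq, m_level = masker
    close = max(freq, m_freq) / min(freq, m_freq) <= max_ratio
    return close and (m_level - level) >= min_gap_db

def allocate_bits(tones, full_bits=16, masked_bits=0):
    """Give masked tones fewer bits (here zero) and all others full bits."""
    allocation = {}
    for tone in tones:
        masked = any(is_masked(tone, other) for other in tones if other != tone)
        allocation[tone] = masked_bits if masked else full_bits
    return allocation

# (frequency in Hz, level in dBFS)
tones = [(1000.0, -3.0), (1100.0, -20.0), (5000.0, -20.0)]
allocation = allocate_bits(tones)

for (freq, level), bits in allocation.items():
    print(f"{freq:.0f} Hz at {level:.0f} dBFS -> {bits} bits")
```

Under these assumptions, the quiet 1.1 kHz tone next to the loud 1 kHz tone gets no bits, while an equally quiet tone at 5 kHz, far from any louder neighbor, is kept. The loud tone itself is untouched, which is why the peak level barely moves.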
Here’s another way to think about it. Imagine the audio signal as a wall-size painting. Then think of MP3 as a kid with a BB gun. If the kid starts shooting the picture in random places with the BB gun, and you’re viewing the painting from 20 feet away, you probably wouldn’t notice the first few holes. As dozens of holes appear, you’d eventually start to notice them, but the overall content of the picture would remain unchanged — the color would appear the same, the black and white levels would appear the same, and the objects depicted would still be easily recognizable.
By the time the kid empties the entire 650-shot magazine of his Red Ryder, parts of the painting might be missing enough canvas that its colors and shades start to shift, and certain elements pictured in the painting become unrecognizable. That would be roughly analogous to encoding MP3 at 32 kbps, as compared to the 128 kbps I’ve used here and the 256 kbps bitrate now used for most commercially distributed MP3 downloads.
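For a sense of how much data each of those bitrates keeps, here is a quick back-of-the-envelope calculation. It assumes the 16-bit/44.1-kilohertz original is stereo, which the figures above don't state explicitly; the variable names are illustrative.

```python
bits_per_sample = 16
sample_rate_hz = 44_100
channels = 2  # assumption: a stereo recording

# Uncompressed PCM bitrate in kilobits per second
pcm_kbps = bits_per_sample * sample_rate_hz * channels / 1000

kept_percent = {}
for mp3_kbps in (32, 128, 256):
    kept_percent[mp3_kbps] = 100 * mp3_kbps / pcm_kbps
    print(f"{mp3_kbps} kbps keeps about {kept_percent[mp3_kbps]:.1f}% "
          f"of the {pcm_kbps:.1f} kbps original")
```

Under that assumption the original runs at 1,411.2 kbps, so a 128 kbps MP3 keeps roughly 9 percent of the data and a 32 kbps file only a little over 2 percent.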
MP3 and other lossy codecs employ data compression, not dynamic range compression. I think everybody who’s interested in audio basically knows this; they just tend to forget.
(NOTE: The BB gun analogy is included only to illustrate my point. Audiophile Review strongly discourages the firing of BB guns at paintings. Always follow the rules of BB gun safety. If you put someone’s eye out, it’s on you.)