All but possibly the very newest “newbies” to our hobby must be aware of the apparently never-ending war between audio “subjectivists” – those who believe that what they hear and the emotional or psychophysical response it elicits from them is both real and important, and audio “objectivists”, those others, often engineers or academics, who think that our ears are too easily deceived, and insist on measurement or objective testing as the only way to truly evaluate audio equipment and its performance.
Unquestionably the pinnacle of such testing in the minds of its advocates is “double-blind” testing, which is done in such a way that neither the participants in the test nor those administering it have any idea of which of some number of items or products under test is actually being heard and evaluated at any given time. This, they say, eliminates any kind of bias, whether intentional or inadvertent on the part of either the testers or the testees and, they claim, makes for more reliable and believable results.
The most common use for double blind testing in audio seems to be to determine whether a sonic difference claimed to be heard by one or more people is actually perceived or is an error or illusion resulting from “placebo effect”, “expectation bias”, individual hearing characteristics (like tinnitus, for example), or some other circumstantial, situational, or individual aberration. Although it has been used in the past for evaluating many other claims, one current favorite subject for double-blind testing is audio cables and whether they actually do sound significantly different and, as I am regularly told, the findings have, to at least a major degree, been that they do NOT.
If you have read any of the things written by or about me in this and other publications, you probably know that I disagree completely with those findings. And you probably also know that I was the designer of XLO cables – at one time recognized everywhere as “the Best in the World” – and that I sold that company more than a dozen years ago, in 2002.
Knowing those things, you may ask “How can I be sure that you aren’t just challenging double-blind testing to protect your own business interests?” or “How can a test as simple as ‘Can you hear a difference between these two things?’ possibly be wrong?”
My answer to that first question is simple: I’m not. Other than writing and a little consulting, I no longer have any business interests in the hi-fi industry. Even if I did, though, to assume that I would deceive you to protect them would be wrong. My main reason for designing and selling cables was that I, my customers, and most of the world’s most important audio reviewers believed them to work and to provide real value to their buyers.
As to the second question, there are LOTS of ways that even a popular test can give wrong, inapplicable, or misleading results: You can use the wrong test; test the wrong thing; do the test in the wrong way; use the wrong testing tools or materials; misinterpret the results; and on and on and on… Any one of those errors will produce results that may either give wrong information or tell nothing at all about whatever you were trying to learn. The simple fact is that if the test results and the facts disagree, it’s never the facts that are wrong.
To illustrate to you that, while double-blind testing may be of considerable importance in other fields and scientific or technological endeavors, it doesn’t work very well at all for most audio applications, I’d like to propose a little test of my own that could either prove my point about double-blind testing or prove me forever to be wrong:
My proposed test is, itself, double-blind. To conduct it, I propose that we first gather a number of sets of loudspeakers of whatever kind or mix of kinds we wish, subject to only two simple conditions: They must all be of roughly similar physical size, and they must all have significantly different measured performance. This last requirement should be easy; of all of the elements of a (high-end) hi-fi system, the one most likely to have the greatest measurable distortion; the most wildly varying (non-“flat”) frequency response; and the most obviously different characteristic sound quality as compared to other same-purpose products is the speakers. A system’s electronics, regardless of brand or model, will almost certainly have a frequency-response curve flat to within one decibel across the audio range, and will – even the most “classic” tube gear – likely have a THD spec of less than one percent. Even most good phono cartridges will measure well – a little less flat, perhaps, with a little more distortion, but still pretty good. It is not at all uncommon, though, for speakers to have – both at the frequency extremes and at points between them – variations of 6 to even 10dB, and to have measured harmonic and other distortion figures of anywhere from 3 or 4% to as much as 40% (usually heard as “frequency doubling” in the bass frequencies).
In short, speakers, even to the most doubting of Thomases, DO clearly and obviously sound and measure different, and that’s what makes them perfect for my proposed test: Let’s get some, measure them, get a bunch of people to listen to them – all on the same system; all playing the same music in the same room, at the same measured (using pink noise) listening volume at the same listening position, and get the people to give us their comments on each of the speakers, always knowing exactly which one (if mono, or which set, if we decide to do the test in stereo) is playing. (Doing it in stereo will add other complicating factors, like the polar patterns of the speakers, into the mix and will allow for other judgment criteria, like imaging and soundstaging, but that’s okay).
As everybody listens to each of the speakers, let’s have each listener write down his or her listening impressions and give a brief description of what each (speaker or pair) sounds like. Then, finally, let’s mount an acoustically transparent “scrim” between the listener(s) and the speaker(s), so that which speaker(s) are playing can’t be seen, and let’s run the whole series of tests again, playing all of the same music on all of the same speakers in random order, and having everyone leave the room between plays while some non-participant third party changes the speakers, so that neither the testers nor the testees can ever know which ones are playing.
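For anyone who actually wants to run the blinded pass, the random play order could be drawn up ahead of time and handed only to the third party who changes the speakers. What follows is just a minimal sketch in Python of one way that might be done; the function name `make_blind_schedule`, the speaker labels, and the track names are all hypothetical and are not part of the test as I described it.

```python
import random

def make_blind_schedule(speakers, selections, seed=None):
    """Return every (speaker, selection) pairing once, in random order.

    Hypothetical helper: the list is meant to be seen only by the
    non-participant third party who changes the speakers.
    """
    rng = random.Random(seed)  # a seed makes the order reproducible for later auditing
    trials = [(speaker, music) for speaker in speakers for music in selections]
    rng.shuffle(trials)        # shuffles the list in place
    return trials

# Example with made-up labels: three speakers, two pieces of music.
schedule = make_blind_schedule(["A", "B", "C"], ["track 1", "track 2"], seed=1)
for number, (speaker, music) in enumerate(schedule, start=1):
    print(f"Play {number}: speaker {speaker}, {music}")
```

Each speaker is played once on each selection, which matches "all of the same music on all of the same speakers"; repeating pairings would only need the `trials` list to be extended before shuffling.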
My guess is that if the testees are asked – for each playing of each speaker on each test selection of music – to blindly identify which speakers they are listening to, just as they would have to do in a test of cables or other items without such great measured differences, their test scores will come out not one whit better than they would for any of those other items. The reason, IMHO, is that music has multiple different characteristics and aspects, not all of which may be available to hear at any given instant, and that listeners have different differentiating characteristics that they listen for, which also may not all be present at any given moment, with the result that we’re really administering a test with two sets of multiple variables when everyone knows that, in order for a test to be able to produce meaningful results, no more than a single variable can be involved.
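Whether a listener's score really is "not one whit better" than guessing can be checked with a simple binomial calculation. Here is a minimal sketch in Python. It assumes that every play is an independent guess among equally likely speakers, which is my simplifying assumption for illustration and not something the test itself guarantees, and the function name `chance_p_value` is hypothetical.

```python
from math import comb

def chance_p_value(correct, trials, n_speakers):
    """Probability of getting at least `correct` identifications right
    out of `trials` plays by pure guessing among `n_speakers` speakers.

    Assumes independent plays with a 1-in-n_speakers chance of a lucky
    guess on each one (an illustrative assumption).
    """
    p = 1.0 / n_speakers
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(correct, trials + 1)
    )

# Example with made-up numbers: 9 right out of 20 plays, 4 speakers.
print(chance_p_value(9, 20, 4))
```

A small result means the score would be hard to reach by guessing alone; a large one means the score is consistent with guessing. It says nothing about why a listener scored as he did.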
At least for audio, while other forms of measurement and testing may be useful, double-blind testing CAN’T work. If you doubt me, try my simple test. If I’m right, you might be amazed to learn that you really WERE hearing all those differences that the double-blind testers told you were just snake oil and voodoo, and might want to go back and listen to them again. And if I’m wrong, and if you show me well-documented proof that you have conducted a double-blind test performed EXACTLY as I just described it and the testees could consistently tell which speakers were playing, I’ll apologize to you right here in print and send you a free set of cables.