Today’s column is about scientific robustness and how it relates to the strength of the conclusions generated by an observation. First, let me define robustness. A robust scientific test, procedure, or opinion is one that is readily repeatable across a large sample. A test of what occurs when you heat water in a pan – it comes to a boil – is an extremely robust test. The less repeatable and less universal an experiment or test is, the less robust it is.
For you scientific types, what I just told you is obvious. But I’m good at that…
How does this relate to audio? Audible (and inaudible) sonic differences can have a wide range of robustness, but those that are more robust are far more likely to be universally true.
If you reverse the leads on one speaker cable, switching it from correct polarity to incorrect polarity so that one of two stereo speakers is out of phase with the other, then in a blind A/B test with the listener seated in the central listening position, most humans should be able to hear a difference. That would be a robust test.
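To make the idea of a robust blind A/B test concrete, here is a minimal sketch of how such results are usually scored: count how many trials the listener identifies correctly and ask how likely that score would be under pure guessing. The trial counts below are hypothetical examples, not figures from any actual test.

```python
from math import comb

def guessing_p_value(correct: int, trials: int) -> float:
    """Probability of getting at least `correct` right out of `trials`
    blind A/B trials by pure chance (coin-flip guessing)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2**trials

# Hypothetical example: 14 correct out of 16 trials.
# Chance alone produces a score this good only ~0.2% of the time,
# so the listener is very likely hearing a real difference.
print(round(guessing_p_value(14, 16), 4))

# 9 correct out of 16 is entirely consistent with guessing.
print(round(guessing_p_value(9, 16), 4))
```

The same arithmetic explains why repeatability over a large sample matters: a handful of trials, or a single listener, simply cannot separate a real audible difference from lucky guessing.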
Conversely, one person’s subjective description comparing several headphone cables would, by definition, lack robustness, since few people have the exact same signal chain with which to confirm the findings or duplicate the test environment. This test would be far less robust and have less universal application.
Recently, Audio Bacon published a report on power cables that the reviewer spent one solid year working on. It is extremely detailed, and the test procedure was as rigorous as possible given the test parameters. But due to its small sample size (one test subject) and the difficulty of corroborating the findings, this, too, despite the author’s best intentions, is not a robust review.
Conversely, tests done by Sean Olive at Harman’s test facility are extremely robust because he employs multiple test subjects in multiple repeatable tests. The primary area for disagreement isn’t the test methodologies, but the narrowness of the test parameters and whether the conclusions have more general application. An example would be the “room feel” tests, where listeners preferred an EQ curve that reintroduced, on headphones, the bass augmentation that natural “room gain” gives loudspeakers. Whether this is an increase in accuracy or merely euphony is the question here. But the tests clearly show that humans can hear the differences, and most prefer more bass…
Another factor that vastly increases the robustness of a review is the reviewer’s past experience and printed history. Michael Fremer’s analog reviews are robust because of his long history of published reviews, which anyone can reference to see whether their tastes align with Fremer’s. If they do align, that increases the review’s value to the reader, and Fremer’s conclusions gain additional weight and usefulness. Someone else could reach similar conclusions, but without the history behind them, they would not have the same level of truthiness.
Yes, some “subjective” reviews and reviewers are more robust than others. Fremer’s are robust, while a first review on a headphone site by someone who’s never been published before is not. Fremer has earned that level of credibility through years of work.
So, what’s the takeaway here? Check your sources. Just as with political articles, the source and the source’s track record matter. A review from a source that is a known quantity should carry more weight and a greater level of truth than an anonymous one. Methodologies, track record, and personal biases all affect the usefulness and trustworthiness of a review. So check your sources, because not all reviews have equal levels of truth or contain universally applicable conclusions…
Great column. A lot of readers judge the credibility of a review not by the solid criteria you list, but by how much it confirms their existing beliefs, flatters their egos, etc. Harry Pearson was probably the first to realize that, and he exploited it like nobody before or since.
Thanks for this, Steven. As a professional who worked in statistics, I can’t say much more than, “I agree.”
I was expecting a discussion of the measurements of John Atkinson as part of this discussion.
I should add, beware of trolls and certain people on internet forums who insist that if you haven’t done measurements, your findings are invalid. I feel, like Steve, that the observational opinions of experienced listeners and reviewers ARE valid.