What do audiophiles hate the most? Well, there’s Bose. And the late Julian Hirsch. After those two, it would probably be blind testing, specifically ABX testing. Why? Because the results of ABX testing tend to conflict with much of what audiophiles believe. The topic of ABX testing may be poised for another look with the recent emergence of the Audio by Van Alstine AVA ABX, which to my knowledge is the first commercially produced ABX box released in more than a decade. In this article, I’ll discuss what ABX testing is, explain the criticisms of ABX testing, and get a little into my first experiences with the AVA ABX.
When I found out about the AVA ABX from reading the comments section of this website, I immediately contacted Audio by Van Alstine’s namesake, Frank Van Alstine, to see if I could borrow one to try out and then buy if it met my needs. I was attracted to it not for its ABX capabilities, but because it looked like a well-made and versatile switcher I could use in my reviews. I have a good switching system that I designed for this purpose; however, like most hand-built, one-off electronic products, it’s not very reliable. I could see from the interior shot of the AVA ABX that it was built the same way my switcher is–with high-quality relays, minimalist controls for level matching, and a switching system. But the AVA ABX was designed by an experienced audio engineer, Dan Kuechle, who has the knowledge and resources to build products with professional-grade reliability.
What is ABX testing?
I’ve been using the AVA ABX for level-matching and switching in my reviews for a few months, but I hadn’t actually experimented with the ABX function until recently. Here’s how ABX testing works: The ABX box presents two audio signals, A and B, plus a third, X. X is either A or B; the assignment is random, and it changes (or doesn’t change) with every trial. So you listen to A, listen to B, listen to X, and then decide whether X is A or B. Then you or the test administrator activates a function on the ABX box that displays whether X was A or B for each trial.
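For readers who think better in code, the trial procedure described above can be sketched as a small Python simulation. This is only an illustration of the logic, not a description of how the AVA ABX is implemented; the function names, the `listener` callback, and the `seed` parameter are all hypothetical, and the eight-trial default simply mirrors the session length mentioned in this article.

```python
import random

def run_abx_session(listener, trials=8, seed=None):
    """Simulate one ABX session and return a list of (x, guess) pairs.

    The log plays the role of the box's reveal function: it is only
    inspected after all trials are finished.
    """
    rng = random.Random(seed)
    log = []
    for _ in range(trials):
        # X is freshly and randomly assigned on every trial, so it may
        # or may not change from one trial to the next.
        x = rng.choice(["A", "B"])
        # In a real test the listener hears A, B, and X and never sees
        # the assignment. Passing x in is a simulation shortcut only.
        guess = listener(x, rng)
        log.append((x, guess))
    return log

def guessing_listener(x, rng):
    """A listener who hears no difference and picks at random."""
    return rng.choice(["A", "B"])

def perfect_listener(x, rng):
    """A listener who always identifies X correctly."""
    return x

def score(log):
    """Count the trials in which the guess matched the hidden assignment."""
    return sum(1 for x, guess in log if x == guess)

if __name__ == "__main__":
    print(score(run_abx_session(guessing_listener, seed=1)), "of 8 by guessing")
    print(score(run_abx_session(perfect_listener, seed=1)), "of 8 by a perfect listener")
```

The design point is that the assignment of X is drawn independently on each trial and stays hidden until the session is over, which is what keeps both the listener and the administrator blind.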
Random guessing will, after enough trials, result in correct selections 50 percent of the time. So, to show that there’s an audible difference between A and B, you’d have to identify X correctly well above 50 percent of the time. Even someone randomly guessing might get 6 or 7 out of 10 right, so the results aren’t meaningful unless you can do even better than that. For a 95 percent confidence level (a typical standard for statistical significance), you’d need correct identifications on at least 17 out of 24 trials, assuming a one-sided binomial test. That’s three test sessions on the AVA ABX, which provides eight trials per test session; within a single eight-trial session, you’d need at least 7 correct. Either way, it’s quite a high hurdle.
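The arithmetic behind a significance threshold like this can be checked with a few lines of Python. The sketch below assumes a one-sided exact binomial test with a 50 percent chance of a correct answer per trial; the article doesn’t specify which statistical test applies, so treat that choice, and the function names, as my own assumptions.

```python
from math import comb

def p_value(correct, trials):
    """Probability of at least `correct` hits in `trials` by pure guessing."""
    favorable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favorable / 2 ** trials

def min_correct(trials, alpha=0.05):
    """Smallest score whose guessing probability is at or below `alpha`.

    Returns None when even a perfect score cannot reach that level,
    which happens when there are too few trials.
    """
    for correct in range(trials + 1):
        if p_value(correct, trials) <= alpha:
            return correct
    return None

if __name__ == "__main__":
    for n in (8, 10, 16, 24):
        print(n, "trials: need at least", min_correct(n), "correct")
```

Under these assumptions, `p_value(7, 10)` comes out to roughly 0.17, which is why 7 out of 10 isn’t convincing on its own: a guesser would manage that about one time in six.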
A and B can be, well, anything: two speakers, two amplifiers, two preamps, two cables, two types of digital music files, etc.
What’s the problem with ABX?
It seems pretty straightforward, right? The assignment of X is random, so neither the test subject nor the test administrator knows whether it’s A or B until someone goes back to check. Thus, there’s no chance that the brands, appearance, or prices of the components under test will influence the results.
The problem for audiophiles is, ABX testing has, to date, rarely revealed differences in sound among audio electronics components. This is why the debate about ABX testing became so fierce when the process emerged in the 1980s.
On one side, we have millions of audio enthusiasts and professionals who report hearing differences among audio electronics, between hi-res files and standard-res files, etc. And 50,000,000 fans can’t be wrong, right? They present many scientific (or at least scientific-sounding) reasons why ABX testing is invalid. Some of those reasons are obviously questionable, as I’ll detail below. And of course, audio writers are unlikely to embrace a methodology that may cast doubt on their previous written statements, and that might threaten their status as opinion-makers; and enthusiasts who just spent $5,000 on an amplifier don’t want to hear that it’s no better than a $300 receiver.
On the other side, we have a small group of scientifically oriented researchers, enthusiasts, and writers (many now retired or deceased) who insist that ABX testing proves such differences cannot be heard. When I read their articles (which are hard to find unless you have a stack of Stereo Review magazines from the 1990s), I sometimes get the sense that their work began not as an effort to find the truth, but as an effort to prove audiophiles foolish. Of course, there are many ways to set up a blind test to “prove” two products are alike in their performance and characteristics. You can use test material and conditions that make the differences among products hard to distinguish. Or you can get panelists who aren’t particularly interested in the subject, or who have already made up their minds. To take an extreme example, I wonder if my late father could have passed an ABX test with Led Zeppelin’s “Immigrant Song” as A and Deep Purple’s “Highway Star” as B. When he heard tunes like these as I played them on the 8-track, it sounded like nothing but noise and screaming to him. So, if he couldn’t have reliably identified which tune was which, does that mean they’re indistinguishable?
Considering that both sides in the debate have an axe to grind, and that both seem so convinced of the correctness of their positions, I don’t find either side persuasive. That’s why I decided to take a fresh look at ABX. My hope is that, as a writer who doesn’t fit into any particular camp of the audio world, I can weed through all the invective to find some honest, unbiased answers.
Criticisms of ABX
I probably couldn’t pursue this without the AVA ABX, which I believe addresses some of the criticisms many audiophiles have of ABX testing. Let’s examine the key criticisms here:
1) ABX boxes are poorly built and degrade the sound quality of the components under test.
2) ABX testing places too much stress on the test subject, whose performance will thus be impaired.
3) Judging the quality of audio components requires long-term listening.
4) Blind testing employs the left side of the brain, but art can be appreciated only with the right side of the brain.
None of these statements is, to my knowledge, verifiable or supported. I suspect that’s in part because most of the critics of ABX have little or no actual experience with it. Here’s how I respond to the above contentions:
1) I have never seen any audio writer present specific criticism of the supposed technical flaws in ABX boxes. You can see the guts of the AVA ABX box in the picture included with this article; tell me what the technical flaw is. And what would be the technical flaw in the ABX testing plug-in for the digital music player software Foobar2000?
2) Based on my experience with the AVA ABX so far, I can confirm that ABX testing is difficult and requires great concentration, but so does a serious comparison of any two products that are fundamentally similar. It’s stressful only if you’re worried that you won’t get the “right” answer. And if you think there’s a “right” answer, you’re using your own bias as the standard to judge the validity of the results.
3) The idea that long-term listening allows audio components to be more easily and reliably distinguished is one that a lot of audio writers throw around, but I’ve seen no actual research supporting it. Fortunately, by owning the AVA ABX, I can make my comparisons as long as I wish. And how does that long-term listening test work, anyway? Let’s say you’ve been listening to amplifier B for a month, and you think, “Wow, this thing really does seem to throw a bigger soundstage than amplifier A that I was listening to last month.” No one’s auditory memory is anywhere near good enough to retain the subtleties of something heard a month ago, so you have to go back to amplifier A to confirm–and then you’re making short-term A/B comparisons again.
4) When you’re judging amplifiers, you’re judging the technical qualities of an electronic component, not art. Judging art would involve, for example, listening for how melodic or lyrical or rich or smooth or original a tenor saxophone player sounds. I can easily measure an amplifier’s performance; no one can measure a saxophone player’s performance.
The beauty of owning the AVA ABX (besides the fact that it makes my reviews more accurate and much easier to set up) is that it gets past so many of the criticisms noted above. In most cases, there’s nothing but a relay (i.e., a switch) in the circuit, plus a simple volume control circuit. It lets me test products at my leisure, with whatever music I want, for as long as I want; I can do an ABX trial with a six-second snippet of music, the complete works of Gustav Mahler, or anything in between. I can “engage my left brain” and listen closely and repeatedly to a single element of a recording (such as a cymbal crash or a vocal phrase) or just let the music play and “engage my right brain” to form a gut-feel assessment of the sound.
I’ve gotten enough experience with the AVA ABX to know that I need a lot more experience with it before I make any proclamations about the validity of ABX testing. And that’s just what I’ll be doing in the coming months. I’ll be testing various categories of products, and I’ll bring in outside listeners to add their results to mine. Maybe, just maybe, we can get past some of the old debates about ABX testing–and some of the attitudes that have, in my opinion, calcified the craft of audio writing.