Topic: Double Blind testing on Old Italian Fiddles... (Read 2851 times)

drmimosa · « **on:** January 06, 2012, 03:51:55 pm »

Quote from: drmimosa on January 05, 2012, 08:50:01 pm

OT, but here's an interesting "double-blind" test that probably wouldn't pass the Hydrogenaudio TOS test either:

http://www.npr.org/blogs/deceptivecadence/2012/01/02/144482863/double-blind-violin-test-can-you-pick-the-strad

Quote from: candycane on January 06, 2012, 01:28:48 am

Was your comment a veiled shot at hydrogenaudio, did you not read the article, are you questioning the test's adherence to double blind testing, or are you unfamiliar with hydrogenaudio's ToSs? The violin test was specifically mentioned as being "double blind", which obviously meets HA's ToS for claims if the test actually was double blind, and I've not seen anyone anywhere (other than you) question the validity of the test's methodology, analyses, or conclusions.

But maybe you've caught something that peer review of the published study overlooked - please provide your evidence that the test does not meet the standards for a valid double blind test or other deficiencies in the study.

This NPR article made me think of recent discussions concerning DBX testing on Interact, and gives an instructive example of some of the difficulties inherent in setting up a good DBX test.

First, it's important to note that there is a DBX test of two audio files embedded in the article itself, and that's not the one I'm going to complain about! It's a cool test, however: two posted excerpts feature the same musician playing the opening to the Tchaikovsky violin concerto, one on a violin made in 1980 and the other on a Stradivarius violin (made probably around the year 1720). There are no results listed for this test, but I personally hear a clear difference between the two instruments on the recordings. In fact, one sounds hands down better to me. This seems like a classic DBX testing setup, and it's fun... take the test and see if you can pick the Strad!

However, the NPR article's description of the actual playing test conducted by researchers and players doesn't seem fair at all. First, there are six instruments being evaluated at once, which seems absurdly large to me. It's very hard to evaluate and remember sound differences between two samples, let alone six!

It's hard to tell what constitutes a successful answer from reading the article, but it sounds like the musicians had to get all three Italian instruments correct, otherwise they "couldn't tell the difference." Statistically, would anybody ordinarily get three answers correct randomly out of a test group of 17? Perhaps the fact that 3 people did identify the instruments points toward a substantive difference in the instruments, not away from one!
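The "would anybody get this right by chance?" question can be put into rough numbers. This is only a sketch under assumptions that are not taken from the article or the study: each of the 17 players is imagined to guess blindly which three of the six violins are the old Italians, independently of the others. The function names are made up for illustration.

```python
from math import comb

def chance_all_correct(n_old: int, n_total: int) -> float:
    """Chance that one blind guess picks exactly the old violins
    (assumes every subset of size n_old is equally likely)."""
    return 1 / comb(n_total, n_old)

def prob_at_least(k: int, n: int, p: float) -> float:
    """Binomial tail: chance that at least k of n independent guessers succeed."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Assumed setup (hypothetical): 3 old violins among 6, 17 players guessing blindly.
p_single = chance_all_correct(n_old=3, n_total=6)
p_three_or_more = prob_at_least(k=3, n=17, p=p_single)

print(f"one player, pure luck: {p_single:.3f}")
print(f"3 or more of 17 by luck: {p_three_or_more:.3f}")
```

Under those assumptions a single player gets all three right by luck 1 time in 20, and three or more of 17 doing so by luck alone comes out at roughly 5%. That is uncommon but not impossible, and the real protocol may not match this simplified picture at all.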

Also, there are lots of variables that aren't mentioned in this article. First, setup of these instruments with strings, post, and bridge is a huge variable. I would assume care was taken to minimize this factor, but it is impossible to eliminate. Second, many of the great, older string instruments sound better in a large hall, and under the ear sound undistinguished. I've experienced this effect firsthand, walking away from a Stradivarius cello being played in a concert hall and hearing the sound get louder, richer, and warmer as I walk into the acoustic of the hall.

The consensus among musicians that have played, owned, or worked with Stradivarius instruments is that they offer unique, distinguishing tonal qualities that can't be found elsewhere. I would demand a better test and more evidence before that gets turned around, and the test described in this article doesn't convince me at all.

What does all of this have to do with Audio DBX testing? Ha ha! Well, everything...and nothing.

I'm not really that familiar with Hydrogen Audio boards, other than my reading of the responses to the Absolute Sound computer audio article posted by JimH on that board and studying their terms of service. It doesn't sound like this "Audio" test in the NPR article would fly (that's assuming the subject at hand, fine Italian musical instruments, were of interest to the members in the first place!)

Basically I think constructing a good DBX test with violins is virtually impossible, there are just too many variables at play. Therefore, I'm inclined to believe that DBX is of limited use in making evaluations about music. The conditions necessary to isolate variables are so hard to achieve that they limit options by default.

This is certainly worth considering in the Music vs. Audio vs. Science debate that emerges on forums out of debates like Wav vs FLAC, bit perfect sounds different, etc. Science is science and engineering is engineering, but audio systems aren't simple; isolating variables for a good DBX is very difficult. I've certainly heard differences in digital cable lengths, etc. come out of changes in super resolving systems using big Martin Logans, etc.

Anyway, food for thought!

candycane · « **Reply #1 on:** January 06, 2012, 04:32:06 pm »

Your concerns are thoughtful, although I recommend withholding judgment on the test's validity until more discussion occurs on the test itself. Many articles, including the NPR article, have tried to simplify the study methodology and conclusions in an attempt to explain them to a nonscientific audience, but in doing so I feel many articles have significantly missed or distorted the actual test and conclusions.

And attaching audio clips in articles on the study was really a bad idea, since many have inferred that there is a connection between the clips and the test approach or results. There is no connection, and the clips should be ignored completely, unless one wants to discuss how statistical studies are represented by media.

I think looking at the actual study authors' conclusions and words will allow far more accurate evaluation of the study.

abstract with summary conclusions: http://www.pnas.org/content/early/2012/01/02/1114999109
supplemental info: http://www.pnas.org/content/suppl/2012/01/02/1114999109.DCSupplemental/pnas.201114999SI.pdf

comments from one of the testers: http://thestrad.com/BlogArticle.asp?bID=196

note the key conclusion of the authors (per http://www.thestrad.com/Article.asp?ArticleID=2105), which I find to be the most insightful and accurate summary of what the authors really concluded from the study:

"Differences in taste among individual players, along with differences in playing qualities among individual instruments, appear more important than any general differences between new and old violins."

Developing an accurate view of such a study, especially the exact conclusions drawn by the authors, is tricky stuff, and many posters in the HA discussion on this study completely missed or distorted the actual conclusions of the study. From years of experience in this space, my best advice is 1) reading comprehension is critical 2) statistics are not intuitive.

drmimosa · « **Reply #2 on:** January 08, 2012, 11:31:59 am »

Thanks for your reply and for the links to the source material, I've read these and also logged onto the PNAS journal and downloaded the full text of the study. If you are interested in reading this as well, send me a personal message.

I've struck some of my comments above; the study had a much clearer structure than the NPR article seems to indicate. The study pays particular attention to ensuring that double blind conditions remain in place for back to back comparison of an old Italian instrument and a modern violin. In addition, multiple types of tests were conducted. A sample size of 6 instruments and 21 players is probably too small for statistical conclusions, but this is noted in the study as the largest practical size considering the instruments and players.
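To give a feel for what 21 players can and cannot show, here is a rough sketch. It assumes a deliberately simplified setup that is not the study's actual analysis: every player makes one two-way choice (say, "prefers the new violin" or "prefers the old one"), and under pure chance each choice is a coin flip. The function names are hypothetical.

```python
from math import comb

def upper_tail(k: int, n: int) -> float:
    """Chance that at least k of n coin-flip choices land on the same given side."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2**n

def min_agreeing(n: int, alpha: float = 0.05):
    """Smallest count of agreeing players whose one-sided chance
    probability is at or below alpha; None if no count qualifies."""
    for k in range(n + 1):
        if upper_tail(k, n) <= alpha:
            return k
    return None

n_players = 21  # the player count mentioned above
k = min_agreeing(n_players)
print(f"{k} of {n_players} must agree (chance probability {upper_tail(k, n_players):.3f})")
```

In this simplified model, 15 of the 21 players would have to land on the same side before a one-sided 5% test called it more than luck, so only a fairly lopsided preference would register with a group this size.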

I still feel like the study has a fundamental flaw. The following quote comes directly from the study:

"The old violins consisted of one by Guarneri del Gesu (ca. 1740) and two by Antonio Stradivari (ca. 1700 and ca. 1715). These violins were loaned with the stipulation that they remain in the condition in which we received them (precluding any tonal adjustments or even changing the strings) [emphasis mine]"

Not allowing tonal adjustments or string changes on the older instruments is a major factor, and it is reasonable to assume that all the modern instruments had been optimized for the best soundpost, bridge, string, and tailpiece placement. I have a lot of expertise in this arena, and it is safe to say that these tonal adjustments can account for a 20 to 40 percent boost in performance from a string instrument. Small changes in humidity and weather can totally negate a previously optimal setup. In addition, the types of changes you can make to an instrument through these adjustments are very apparent during short evaluation periods like the ones used in the study.

Obviously, these factors could potentially favor new instruments in the study. Impossible to really say that this happened, but it is a major structural flaw in the study and not even acknowledged as a factor in the test.

I'd also question the benefits of using a hotel room as the testing arena; the article argues that the dry acoustic offers a neutral testing ground but I would be inclined to think that the acoustic impairment of a dry, loud room would outweigh any benefit.

Quote from: candycane on January 06, 2012, 04:32:06 pm

"Differences in taste among individual players, along with differences in playing qualities among individual instruments, appear more important than any general differences between new and old violins."

I agree, but isn't this just stating a truism that there is more variety between people than between musical instruments?

Quote from: candycane on January 06, 2012, 04:32:06 pm

Developing an accurate view of such a study, especially the exact conclusions drawn by the authors, is tricky stuff, and many posters in the HA discussion on this study completely missed or distorted the actual conclusions of the study. From years of experience in this space, my best advice is 1) reading comprehension is critical 2) statistics are not intuitive.

I'd be interested in reading some of the HA threads on this study, I'll check out the forum but if you have any you would recommend feel free to post a link.

candycane · « **Reply #3 on:** January 08, 2012, 03:03:21 pm »

The main thread for this topic in HA is http://www.hydrogenaudio.org/forums/index.php?showtopic=92697

The HA thread displays many of the strengths and weaknesses of the HA approach. Few posters seem to have read the actual study. The initial sour posts apparently reflect the initial title of the thread (now changed), which apparently used the study results to slam "golden ears"; that was soundly rejected by many because it was not a test result.

Your point about the lack of adjustment to the old violins and room effect should be considered in creating valid conclusions, but I don't think it invalidates the test. Crudely put, validity in an ABX test requires near-perfection in methodology, statistical analyses, and how conclusions are drawn, but not perfection (or anything close to it) by demanding evaluation of every conceivable variable. That's not my opinion, that's part of the principles of the scientific approach. No test should be assumed to evaluate every conceivable variable or stopped because every conceivable variable could not be evaluated (otherwise we'd never make any progress), but test limitations must be considered when drawing conclusions. I think they did a reasonable job relative to the conclusions they drew. In this case, I think it's reasonable to assume that the owner/custodian of the old violins very carefully maintained their extremely valuable instruments, and I think it's unreasonable to assume that the restriction on adjusting the old violins resulted in the playing of shoddy or ill-tuned old violins. Similarly, the room can reasonably be assumed to affect all the violins and not just negatively affect the old violins, and I think it is unreasonable to assume the room characteristics selectively helped the new violins and hurt the old violins. Certainly, following the principles of valid scientific studies, if you assert such a selective impact, you'll need to provide a statistically valid study to prove it - opinions by themselves mean nothing in the scientific method.

I appreciate the offer to see the actual study - I may take you up on that. Regarding the limitation on the old violins - was the term "tonal adjustment" more specifically defined? Does it mean that a player could not even tighten or loosen strings to tune it? I doubt it. I find it reasonable to assume that "tonal adjustment" means altering the fundamental sound of the instrument via bridge changes or similar, but would not include basic string tension. Put another way, it would be unreasonable to compare a badly out-of-tune instrument to a tuned instrument, and such differences would obviously invalidate all other aspects of a test, and I just find it hard to believe that the entire test was purposefully invalidated by such an error. And if such a thing was allowed, then the peer-review of the study prior to publication would have required that the basic study result was "in-tune instruments were preferred over out-of-tune instruments". Put yet another way, I am unwilling to assume a test-ruining definition for "tonal adjustment" in the absence of more detail, and any more detail would be helpful.

No doubt further testing with different conditions would be great to do, and no doubt that is exactly what the authors intend to do, since the acoustics of violins are the career focus of several of the authors and not just a one-time test.

drmimosa · « **Reply #4 on:** January 09, 2012, 11:12:29 am »

Tonal adjustments don't include tuning the strings, which I am sure was done for each violin in this playing test. Tonal adjustments include moving/changing the soundpost, bridge, or changing strings to a different gauge or material. Other more drastic changes can be made, of course, but these are the basic ones which do not physically change the instrument in any way and have a significant effect on the sound, response, and resonance of a string instrument. The bridge and post on a violin are freestanding and can easily be moved with tools; here's a nice explanation: http://www.youtube.com/watch?v=5kJtnUX4-ng

I'm surprised and a little suspicious the study makes no mention of trying to equalize this factor, and I'm not the only musician who has raised eyebrows at this decision. Here's a recent testimony from one of the players involved in the study:

http://www.violinist.com/blog/laurie/20121/13039/

candycane · « **Reply #5 on:** January 09, 2012, 02:04:47 pm »

I can only speculate, but since I read in one of the articles that a strict condition of the old violin loans was that tonal adjustments were not allowed, perhaps the testers did not want to be accused of back dooring that restriction and thus lose access to the instruments for further testing.

Hopefully more thoughtful testing can occur that involves such adjustments.

Another complaint I've read is that the testing time was too short to gain appreciation of the qualities of the old instruments, but I've seen nothing credible showing that the test duration inherently favored the newer violins over the old violins.

My only bias (that I am aware of, anyway), is that I'd like to think we can find ways to make violins even better with careful experiments and research, and that we did not plateau our human knowledge of how to make a great violin centuries ago.

drmimosa · « **Reply #6 on:** January 09, 2012, 09:07:47 pm »

Quote from: candycane on January 09, 2012, 02:04:47 pm

My only bias (that I am aware of, anyway), is that I'd like to think we can find ways to make violins even better with careful experiments and research, and that we did not plateau our human knowledge of how to make a great violin centuries ago.

I agree wholeheartedly. Actually, research like Joseph Curtin's and groundbreaking work from other modern luthiers around the world means that musicians today can afford really great instruments. Many say we may even be in a new "Golden Age" of string instrument making! Innovation in this arena is a great thing for musicians and music lovers in the long run, and the more research the better.

Thanks for your responses, this has been a very interesting exchange!

INTERACT FORUM
