Can a study appear to be “scientific”, but not meet the requirements for drawing science-based conclusions? Unfortunately, there are certain words and phrases that push our “science” buttons and make us think that work is scientific when it is actually “faux science”, and, like a faux jewel, resembles the real thing only in the most superficial way. Some of the words that make us too readily accept claims as science-based are randomization, control group, and statistical analysis. These button-pushing words can make it difficult for us to think critically about research reports, or to force ourselves to do the close reading and questioning needed for a real assessment of a study. It’s easy under these circumstances to slip and fall on the snake oil of pseudoscience.
In 1995, Dr. Mehmet Oz began the activities of his Complementary Care Center at Columbia-Presbyterian Medical Center in New York with the publication of a study on hypnosis:
Ashton, R.C., Whitworth, G.C., Seldomridge, J.A., Shapiro, P.A., Michler, R.E., Smith, C.R., Rose, E.A., Fisher, S., & Oz, M.C. (1995). The effects of self-hypnosis on quality of life following coronary artery bypass surgery: Preliminary results of a prospective, randomized trial. Journal of Alternative and Complementary Medicine, 1(3), 285-290. doi:10.1089/acm.1995.1.285.
An important piece of information was omitted from that study and supplied only four years later, in a letter to the editor from Oz:
Oz, M.C. (1999). Self-hypnosis and coronary bypass surgery. Journal of Alternative and Complementary Medicine, 5(5), 397.
In their 1995 article, Mehmet Oz and his colleagues claimed to show scientific evidence that surgery patients who had been instructed in self-hypnosis had significantly better experiences than those who had not. They hypothesized that patients taught self-hypnosis would have a better quality of life after surgery, as measured by a mood self-assessment. If true, this would have been a most important finding, demonstrating a simple and inexpensive way to improve patient outcomes. The study sounded good and was full of those button-pushing words like randomization. But let’s wipe away the snake oil and examine that study for plausibility, logic, and proper use of statistical analysis.
One of Oz’s hypotheses is not, in fact, implausible. This is the idea that people who are taught techniques of self-hypnosis, and who practice them repeatedly before surgery, will later be in better moods than an untreated control group or than those who fail to practice. Such an outcome is plausible for a couple of reasons. One is simply that people who have positive social contacts and attention from others usually are better pleased than those who do not, whatever the specific content of the social interaction; this is a possible explanation for advantages over an untreated control group. As for a better outcome for those who elect to practice than for those who do not, the most parsimonious view is that people who are cheered up by the practice will continue it, and those who find it unpleasant or neutral will not.
It is a good deal less plausible that blood pressure, bleeding, and infection due to coronary artery bypass surgery will be influenced by self-hypnosis, as Oz’s group suggested (but did not test). Plausible or implausible, however, the hypothesized associations must be demonstrated empirically before it is legitimate to claim that the medical use of self-hypnosis is effective either for improvement of the patient’s emotional experiences or for physical outcomes.
Oz and his colleagues presented research results which they argued supported the hypothesized effect of self-hypnosis on mood following surgery. They used all those good words like randomization, control group, and statistical analysis. However, their research design and report would have received devastating criticism in an undergraduate research methods course. The single good statistical decision in this work is all that would have saved this project from a failing grade. The Oz paper is an egregious example of faux science, and anyone who recognizes its many problems will decline to accept its conclusions.
Here is a list of problems in the design and implementation of the 1995 study:
Intervention fidelity. To be true science-- to follow the rules that allow us to draw a reliable conclusion about a treatment-- a study must show intervention fidelity: a guarantee that each participant has experienced the treatment exactly as planned. Without this guarantee, we can’t be sure whether we are comparing apples to oranges, bananas, or mangoes.
Oz’s published report stated that subjects were randomly assigned to a treatment group and a nontreatment control group. Both groups, a total of 22 patients, were assessed before randomization on the Hypnotic Induction Profile (HIP; Spiegel, 1974; note that this author’s name is misspelled as “Siegel” in the Oz group’s citations and reference list, and that in fact there has been considerable controversy about this test). The randomization method assigned equal proportions of people judged to be highly hypnotizable to each group. However, the randomized treatment was limited to the presence or absence of instruction about self-hypnosis on the night before surgery. Patients in the treatment group were asked to repeat the self-hypnosis activity hourly that evening, and again after their surgery, but not all of them did so; meanwhile, some control group patients were interested in the activities used in the initial HIP testing and later reported that they had practiced those on their own. In other words, there was no assurance that patients in the treatment group had all had similar experiences, or that those in the control group had shared a different set of experiences. Comparison of the two groups was a look at two somewhat different fruit salads.
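The paper gives few details of how this stratified assignment was carried out. Purely as an illustration of the general idea-- splitting each hypnotizability stratum evenly, at random, between the groups-- here is a minimal Python sketch; every name and number in it is invented, not taken from the study:

```python
import random

def stratified_assign(patients, key, seed=0):
    """Split each stratum (e.g., high vs. low hypnotizability) evenly
    and at random between a treatment and a control group."""
    rng = random.Random(seed)
    strata = {}
    for p in patients:
        strata.setdefault(p[key], []).append(p)
    groups = {"treatment": [], "control": []}
    for members in strata.values():
        rng.shuffle(members)
        half = len(members) // 2
        groups["treatment"].extend(members[:half])
        groups["control"].extend(members[half:])
    return groups

# 22 hypothetical patients, some flagged as highly hypnotizable
patients = [{"id": i, "high_hip": i % 3 == 0} for i in range(22)]
assigned = stratified_assign(patients, "high_hip")
print({name: len(members) for name, members in assigned.items()})
```

Note that a procedure like this only balances the groups at the moment of assignment; it does nothing to guarantee that the groups go on to have uniform experiences, which is exactly the fidelity problem described above.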
Confounding variables. One important technique in experimental science involves studying the effects of one factor at a time; when two factors usually operate together, their effects are confused or “confounded”, and it’s impossible to know how each would operate independently, or which causes a particular outcome. Randomization in assignment of participants to groups is a step toward making variables independent, but factors may still be confounded when treatment and control groups have experiences that differ in more than one way. In the case of this study, the treatment and control groups were different both in exposure to specific instruction and in attention and social contact from the researchers. Outcome differences, if any, can thus not be attributed to one of these factors rather than the other. Well-known effects of social attention, such as the “Hawthorne effect” in which any kind of attention improves performance, cannot be excluded by this design.
Statistical issues. You can only lie with statistics to those who don’t understand statistics, or who don’t take the time to work through the (admittedly quite boring) statistical presentation. There are a number of mistaken conclusions and claims in the Oz group’s report. (I apologize for the tedium of this section, but the only way to see what’s wrong is just to walk through the material systematically.)
The first concern raised by the Oz group’s statistical analysis has to do with the initial claim of 22 patients, 13 in the treatment group and 9 in the control group. In a 1999 letter to the editor, Oz acknowledged that in fact only 9 patients, 5 of them in the treatment group, had completed all aspects of the data collection and made data analysis possible. Why it took four years to make this rather important correction is not clear.
Oz and his colleagues deserve full credit for their choice of a nonparametric test (Mann-Whitney) under these circumstances of a small N and ordinal data; this decision is what would save their paper from a failing grade in an undergraduate course. However, the statistical results shown in a table on p. 288 of the 1995 paper cannot be taken to support the claim that one scale of the Profile of Mood States (POMS) changed meaningfully and that the others showed trends in the same direction.
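For readers unfamiliar with the test just credited, here is what such a comparison looks like in practice. This is a minimal sketch using SciPy with made-up change scores, not the study’s data; it simply shows the kind of two-group, small-N comparison the Mann-Whitney U test suits:

```python
from scipy.stats import mannwhitneyu

# Hypothetical post-minus-pre mood changes (NOT the paper's data).
# Mann-Whitney compares two independent samples without assuming
# normality, which suits small groups and ordinal POMS scores.
treatment_change = [-4, -2, -1, 0, 3]   # n = 5, as in the corrected count
control_change = [1, 2, 2, 5]           # n = 4

stat, p = mannwhitneyu(treatment_change, control_change,
                       alternative="two-sided")
print(f"U = {stat}, p = {p:.4f}")
```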
The choice of a .05 probability level as a cut-off point means that 1 in 20 tests of data would be expected to reach the level of statistical significance by chance alone, and running multiple tests on a data set, as was done here with six POMS scales, makes it increasingly difficult to interpret any single result whose probability is less than .05. Examination of the table reveals further problems, especially with respect to the claim that trends existed; before turning to those, consider how fast spurious “significance” accumulates across multiple tests.
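Here is a back-of-the-envelope calculation. It assumes the six tests are independent-- which correlated mood scales are not-- so treat the numbers as a rough illustration rather than an exact correction:

```python
alpha = 0.05
n_tests = 6  # one Mann-Whitney test per POMS scale

# Chance of at least one false positive across six independent tests
familywise = 1 - (1 - alpha) ** n_tests
print(f"family-wise false-positive rate: {familywise:.3f}")  # about 0.265

# A simple Bonferroni correction would demand p < alpha / n_tests
print(f"Bonferroni threshold: {alpha / n_tests:.4f}")  # about 0.0083
```

Even the one reported probability below .05 (the 0.0317 discussed below) would not survive that corrected threshold.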
Of the 6 scales on the POMS, there was one, the Tension scale, on which control subjects showed an increase and treatment subjects a decrease after surgery; the probability of the difference was 0.0317, a value that indicates a significant difference (with a probability of occurrence by chance less than 5 times in 100) but that needs to be interpreted in light of the comments in the preceding paragraph. The Depression scale showed an increase for control subjects and a smaller increase for treatment subjects, with a probability for this difference of 0.5556; this value, barely different from chance, was incorrectly described as a trend in the predicted direction. The Anger scale showed an increase for control subjects and a slightly smaller increase for treatment subjects, the difference between them having the large associated probability of 0.7302, again incorrectly described as a trend in the predicted direction. On the Vigor scale, both groups showed a decrease, greater in the case of the treatment group, but with a very high associated probability for the difference between groups. The Fatigue scale showed an increase for both groups, with a greater increase for the control group; the associated probability was 0.4127, far from significance, but once again described as a trend in the predicted direction. Finally, the Confusion scale showed a greater increase for the treatment group than for the control group, again at a probability much higher than .05.
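To make the pattern explicit, this snippet checks the four probabilities the 1995 table actually reports against the conventional .05 significance level and the .10 “trend” criterion (Vigor and Confusion are omitted because the paper describes their probabilities only as very high):

```python
# The four probabilities reported for the POMS comparisons (1995, p. 288)
reported = {"Tension": 0.0317, "Depression": 0.5556,
            "Anger": 0.7302, "Fatigue": 0.4127}

for scale, p in reported.items():
    if p < 0.05:
        verdict = "significant at .05"
    elif p < 0.10:
        verdict = "possible trend at .10"
    else:
        verdict = "no trend"
    print(f"{scale}: p = {p:.4f} -> {verdict}")
```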
To sum up, there was a single significant difference when changes in the 6 POMS scores were compared for the control and treatment groups; differences described in the paper as indicating “trends” were nowhere near the .10 probability level often used as a criterion for indication of a trend. The Oz group’s conclusion that “Self-hypnosis relaxation techniques can have positive effects after coronary artery bypass surgery” (1995, p. 289) is a specious conclusion with vague support from the research evidence.
Transparency and failure to report. One requirement for scientific work is transparency-- a full, straightforward, accurate description of the way a study has been done, sufficiently detailed to permit other researchers to replicate the original study and compare the two sets of data. One important piece missing from the Oz group’s report is a description of how patients were originally chosen and approached to be in the study. It’s not clear whether all patients who met the criteria were approached, nor how many (if any) of those approached refused. Neither is it clear how the informed consent information presented the possible outcomes of participation, or how the Hypnotic Induction Profile was presented; both of these events could have influenced outcomes.
The self-hypnosis instruction included suggestions that the patient concentrate on physical issues like blood pressure and infection. We might expect that the Oz group’s investigation would include comparisons of these outcomes for the treatment and control groups, especially because hospital records would be available. However, no such comparisons are reported in the paper.
The Oz group’s research is plainly faux science. They have done what they should not have done, and they have left undone that which they ought to have done. The biggest problem of all would seem to be their premature commitment to their expected outcome, and the resulting nonchalance about evidence and analysis. In real science, the goal is to seek out all possible evidence that could reject a hypothesis; in pseudoscience, the search is for affirming evidence. That Oz’s group sought confirmation and ignored disconfirmation is certainly shown in their claims of “trends” in patient moods. Above all, it shows in their statement about patients who did not comply with the self-hypnosis methods: “When an opportunity to help oneself is presented and not taken advantage of, one must question the individual’s desire to regain their health and happiness again” (1995, p. 289). What a shame that all that snake oil was wasted on people who enjoy being sick and miserable!