Causation and Correlation

The ability to determine causal connections in the world is important. What connects the cause and the effect is invisible to us (Hume). But we can take notice of correlations and from these sometimes draw conclusions about causal relationships. Not all correlations exist because there is a causal relationship.

Correlations

Statements of correlation express a relation between two properties (the values of variables) within a single population.

Group              Smokers (%)   Non-smokers (%)
American males         51              49
American females       34              66

From this data we assert that 51% of American males smoke and 34% of American females smoke. The property of being an American male is positively correlated with the property of being a smoker, and the property of being an American female is negatively correlated with being a smoker. The population here is adult Americans and we are comparing two variables: smoking and gender; each variable has two values.

A is positively correlated with B if and only if the percentage of As among Bs is greater than the percentage of As among non-Bs.

A is negatively correlated with B if and only if the percentage of As among Bs is less than the percentage of As among non-Bs.

A is not correlated with B if the percentage of As among Bs is the same as the percentage of As among non-Bs.
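
The three definitions above can be sketched as a small comparison of two percentages. This is only an illustration: the function name is made up for the sketch, and treating "non-male" as "female" within the population of adult Americans is an assumption.

```python
def correlation_direction(pct_a_among_b, pct_a_among_non_b):
    """Classify the correlation of A with B from the two percentages in the definitions."""
    if pct_a_among_b > pct_a_among_non_b:
        return "positive"
    if pct_a_among_b < pct_a_among_non_b:
        return "negative"
    return "none"

# Figures from the smoking table: 51% of American males and 34% of American females smoke.
# Assumption of this sketch: within this population, "non-male" means "female".
smoker_with_male = correlation_direction(51, 34)
smoker_with_female = correlation_direction(34, 51)

print("smoker / male:  ", smoker_with_male)
print("smoker / female:", smoker_with_female)
```

Note that the comparison needs both percentages; the figure for one group alone says nothing about correlation.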

Judging correlations

Attentional bias in judging correlations:

Nurses were asked to view 100 cards with patient information on them and then judge whether there was a relationship or connection between a particular symptom and a particular disease. Each card indicated whether the symptom was present or absent and whether the disease was present or absent. (Smedslund, 1963)

Here is the incidence of symptom and disease for 100 patients.

Patients      Disease   No disease
Symptom          37         33
No symptom       17         13

Results:

There is essentially no correlation here: 37 of the 70 patients with the symptom (about 53%) have the disease, compared with 17 of the 30 without it (about 57%). Yet 85% of the nurses thought there was a positive correlation between the symptom and the disease. The present/present cell was the best predictor of the subjects' judgments; a high figure in that cell prompted a positive judgment.

Notice that for both the symptom group and the non-symptom group about as many have the disease as don't have the disease (slightly more have it than don't have it for both groups; 37-33 with symptom, 17-13 without symptom). Whether you have the disease or not, about twice as many have the symptom as don't have it.
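
The comparison can be checked with a little arithmetic. The sketch below uses only the four cell counts from the table; the variable names are chosen for the illustration.

```python
from fractions import Fraction

# Cell counts from the 100 patient cards, as given in the table.
symptom_disease, symptom_no_disease = 37, 33
no_symptom_disease, no_symptom_no_disease = 17, 13

# Rate of disease within each row; note that all four cells are needed.
rate_with_symptom = Fraction(symptom_disease, symptom_disease + symptom_no_disease)
rate_without_symptom = Fraction(no_symptom_disease, no_symptom_disease + no_symptom_no_disease)

print(f"disease rate with symptom:    {float(rate_with_symptom):.1%}")
print(f"disease rate without symptom: {float(rate_without_symptom):.1%}")
print(f"difference:                   {float(rate_with_symptom - rate_without_symptom):+.1%}")
```

The two rates come out at roughly 53% and 57%, so the large present/present cell (37) does not by itself indicate a positive correlation.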

Subjects are inclined to look only at select cells for pertinent information.

Another example: Does God answer prayers? Many say yes because many times prayers were successful. But what about the other cells (prayers that were not successful, and good outcomes that came without prayer)?

Another example:

Subjects were asked whether Mr. Maxwell, a fictional person they were to imagine having met at a party, was a professor. They were told he was either a professor or an executive, and that he belonged to the Bear Club. Subjects were then asked what additional information they would like to have to make their judgment. For example, what percentage of professors at the party are members of the Bear Club, or what percentage of executives at the party are members of the Bear Club? 89% of the subjects wanted the first piece of information, but only 54% wanted the second piece, even though both pieces are relevant. (Also relevant is the information regarding the percentage of professors at the party.)
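
One way to see why all three pieces of information matter is to run them through Bayes' rule. The sketch below is an illustration only: the function name and every number in it are hypothetical and do not come from the study.

```python
def posterior_professor(p_prof, p_member_given_prof, p_member_given_exec):
    """P(professor | club member), assuming each guest is either a professor or an executive."""
    p_exec = 1 - p_prof
    joint_prof = p_prof * p_member_given_prof
    joint_exec = p_exec * p_member_given_exec
    return joint_prof / (joint_prof + joint_exec)

# Hypothetical figures: half the guests are professors, 80% of whom are club members.
# If executives are members at the same rate, membership tells us nothing.
same_rate = posterior_professor(0.5, 0.8, 0.8)

# If only 20% of executives are members, membership favors "professor".
lower_exec_rate = posterior_professor(0.5, 0.8, 0.2)

# If professors are rare at the party (10%), the same membership rates favor "executive".
few_professors = posterior_professor(0.1, 0.8, 0.2)

print(round(same_rate, 3), round(lower_exec_rate, 3), round(few_professors, 3))
```

The first argument is the percentage of professors at the party, the second is the information 89% of subjects asked for, and the third is the information only 54% asked for. Changing any one of them changes the answer.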

The effects of prior belief in judging correlations:

Clinical psychologists sometimes use Draw-a-Person tests, in which patients are thought to project aspects of their personalities into the drawings. Big eyes might indicate the patient is suspicious of others or paranoid; big shoulders might indicate a preoccupation with manliness.

Studies have shown these tests to be useless as indicators of personality traits. But in studies in which pictures and trait-labels are associated in ways that reflect no correlations, untrained subjects still claim to "discover" that certain traits are correlated with certain aspects of the drawings. Even professionals maintain confidence in them after learning of their inefficacy. Similar results apply to Rorschach tests. Quote: "I know paranoids don't seem to draw big eyes in the research lab, but they do in my office." (Chapman and Chapman, 1967, 1969)

Prior belief can increase attentional bias:

Subjects are told of an experiment in which boarding school children are given certain combinations of food to see whether they affect the likelihood of getting a cold. Before seeing the data the subjects are asked to formulate their own hypotheses. Once shown the data, their interpretations are clearly influenced by their own hypotheses. Even though the data reflect no correlations, subjects who hypothesized beforehand that the type of water (bottled or tap) might be relevant to getting a cold also said they saw such a correlation exemplified in the data. Subjects who, for example, hypothesized that the type of mustard would cause colds would look to the mustard/cold data and ignore the mustard/no cold data.

Causal relationships:

A causal generalization, e.g., that smoking causes lung cancer, is not about any particular smoker but states that a special relationship exists between the property of smoking and the property of getting lung cancer. As a causal statement, this says more than that there is a correlation between the two properties.

Some causal conditions are necessary conditions: the presence of oxygen is a necessary condition for combustion; in the absence of oxygen there is no combustion. "Cause" is often used in this sense when we seek to eliminate the effect by eliminating the cause (What's causing the pain?).

Some causal conditions are sufficient conditions: in the presence of a sufficient condition the effect must occur (being in temperature range R in the presence of oxygen is sufficient for combustion of many substances). "Cause" is often used in this sense when we seek to produce the effect (What causes this metal to be so strong?).

Looking for special circumstances: what was the cause of the fire? Oxygen? or an arsonist's match?

Causes are sometimes said to be INUS conditions in that they are Insufficient but Necessary parts of an Unnecessary but Sufficient set of conditions for the effect. Striking a match may be said to be a cause of its lighting. Suppose there is some set of conditions that is sufficient for a match's lighting. This might include the presence of oxygen, the appropriate chemicals in the matchhead and the striking. The striking can be said to be a necessary part of this set (though insufficient by itself) because without the striking among those other conditions the match would not have lit. But the set itself, though sufficient, is not necessary because other sets of conditions could have produced the lighting of the match.
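
The INUS idea can be sketched as a check over a list of sufficient sets. This is a rough illustration under stated assumptions: each listed set is taken to be minimally sufficient (dropping any member would stop it from being sufficient), the function name is made up, and the second set, with "contact with flame", is a hypothetical alternative that is not part of the example above.

```python
def is_inus(factor, sufficient_sets):
    """Is `factor` an INUS condition relative to the listed minimal sufficient sets?"""
    sets = [frozenset(s) for s in sufficient_sets]
    sufficient_alone = frozenset([factor]) in sets      # would violate "Insufficient"
    for s in sets:
        necessary_part = factor in s                    # "Necessary" part of this minimal set
        set_unnecessary = any(t != s for t in sets)     # "Unnecessary": another set also suffices
        if necessary_part and not sufficient_alone and set_unnecessary:
            return True
    return False

# First set follows the match example; the second is a hypothetical alternative route.
match_lighting_sets = [
    {"oxygen", "matchhead chemicals", "striking"},
    {"oxygen", "matchhead chemicals", "contact with flame"},
]

striking_is_inus = is_inus("striking", match_lighting_sets)
print(striking_is_inus)
```

If the first set were the only way for a match to light, the set would be necessary as well as sufficient, and the check would return False for "striking".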

How are causal relationships different from correlations?

1. A statement about a correlation is symmetrical while a statement about a causal relationship is asymmetrical. If being a male is positively correlated with being a smoker, being a smoker is also positively correlated with being male. But if smoking causes lung cancer it needn't be the case that lung cancer causes smoking.

2. Correlations are about actual populations and are not lawlike. Causal relationships are lawlike in the sense that they are about hypothetical populations as well as actual populations. When A is said to be the cause of B we are saying that were there an increase in the incidence of A there would be an increase in the incidence of B; or if A cases were to diminish, B cases would diminish, too. (If fewer people smoked, there would be less lung cancer.) Mere correlations pertain only to actual populations. If National Football Conference success in the Super Bowl is merely correlated with stock market decline, then we should not expect changes in the stock market to affect the outcome of the Super Bowl (or vice versa).

How can one form judgments about causal relationships based on statements about correlations?

For example, there is a strong positive correlation between an increase in the number of sex education classes and an increase in the rate of gonorrhea. Suppose we conclude that increasing the number of sex education classes has caused the increase in the gonorrhea rate.

(A) Is the statistical premise (the statement about the correlation) true or well founded?

(B) What alternative explanations are available?

1. The correlation might be accidental or coincidental. Increase in the national debt is positively correlated with an increase in the gonorrhea rate, but there is no causal connection.

2. The relation might be spurious, both an increase in the number of sex education classes and an increase in the rate of gonorrhea being the effects of the same cause.

3. The causal direction might be the reverse. Could the increase in the gonorrhea rate be causally responsible for the perceived need for more sex education classes?

4. The causal relation might have been more complex than the conclusion suggests. The increase in sex education classes might have caused a change in attitudes about sex, which led to an increase in sexual activity, which led to an increase in the gonorrhea rate.

5. The causal relation cited might be insignificant relative to other factors responsible for the increase in the gonorrhea rate.

Is a causal relationship suggested in the cases below?

At one time there was a strong negative correlation between the number of mules in a state and the salaries paid to professors (the more mules, the lower the salaries).

There is a strong positive correlation between the number of fire trucks in a borough of NYC and the number of fires that occur there.

There is a strong positive correlation between foot size and handwriting quality.

There is a strong negative correlation between the number of forward passes thrown in a football game and winning the game.

Heavy coffee consumption is positively correlated with heart attacks.

Going to the hospital is positively correlated with dying.

An increase in the number of hours kids watch TV positively correlates with a decrease in SAT scores.

Marijuana use is negatively correlated with high GPAs.

Another example:

"[W]hile half the country's communities have fluoridated water supplies and half do not, ninety percent of AIDS cases are coming from fluoridated areas and only ten percent are coming from nonfluoridated areas."

Any connection?

1. Communities aren't all the same size: fluoridated communities (likely to be big cities) might contain much more than half the population.

2. The relationship might be spurious: cosmopolitan/progressive attitudes might encourage both fluoridation and lifestyles associated with AIDS.

Another example:

Is there a causal relationship between class attendance and grades achieved?

"Students with the lowest attendance earned the poorest grades. Those who attended 79 percent of the classes or less ended up in the low C range; 90 percent and above scored above a B average. Students who sat up front got 'significantly higher grades,' but Walsh [the researcher] thinks they could be more interested in the subjects."

John Stuart Mill, A System of Logic, 1843

A is not a sufficient condition for B if A occurs without B.

A is not a necessary condition for B if B occurs without A.

The Direct Method of Agreement

Find a causal connection between an effect and a necessary condition

Which factor is always present when the effect is present?

If among the residents of a dormitory there is a rash of stomach upsets, we would likely look for one food item that all the patients ate as the cause.

1. The conclusion applies only to the occurrences considered.

2. Only probable: other important conditions might have been overlooked; it might have been a combination of factors
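
The direct method of agreement can be sketched as an elimination over sets. The dormitory data below is hypothetical, invented only to illustrate the idea, and the function name is an assumption of the sketch.

```python
def necessary_condition_candidates(cases_with_effect):
    """Factors present in every case where the effect occurred.

    A factor missing from any such case is eliminated, because A is not a
    necessary condition for B if B occurs without A.
    """
    case_sets = [set(factors) for factors in cases_with_effect]
    return set.intersection(*case_sets) if case_sets else set()

# Hypothetical data: what each student with a stomach upset ate.
sick_students = [
    {"salad", "fish", "rice"},
    {"fish", "soup", "bread"},
    {"fish", "salad", "soup"},
]

suspects = necessary_condition_candidates(sick_students)
print(suspects)
```

The result is only as good as the list of factors recorded: a factor nobody thought to write down can never show up as a candidate, which is one reason the conclusion is only probable.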

The Inverse Method of Agreement

Find a causal connection between an effect and a sufficient condition

Which factor is always absent when the effect is absent?

Five factory workers are found to be inefficient relative to others who are doing the same work. The efficient workers and the inefficient workers were found to be similar in all relevant ways except one: the inefficient workers were not part of a profit sharing plan. Conclusion: profit sharing causes efficiency.

1. The conclusion applies only to the occurrences considered.

2. Only probable: other important conditions might have been overlooked; it might have been a combination of factors
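
The inverse method of agreement can be sketched the same way, this time eliminating factors that show up where the effect is absent. The list of factors and the worker data are hypothetical, made up to mirror the profit sharing example.

```python
def sufficient_condition_candidates(all_factors, cases_without_effect):
    """Factors absent from every case where the effect is absent.

    A factor present in any such case is eliminated, because A is not a
    sufficient condition for B if A occurs without B.
    """
    present_without_effect = set().union(*cases_without_effect)
    return set(all_factors) - present_without_effect

# Hypothetical factors considered by the investigator.
factors_considered = {"experience", "training", "day shift", "profit sharing"}

# Hypothetical data: factors present for each of the five inefficient workers.
inefficient_workers = [
    {"experience", "training"},
    {"training", "day shift"},
    {"experience", "day shift"},
    {"experience"},
    {"training"},
]

candidates = sufficient_condition_candidates(factors_considered, inefficient_workers)
print(candidates)
```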

The Double Method of Agreement

Find a cause that is both a necessary and a sufficient condition

Which factor is always present when the effect is present?

Which factor is always absent when the effect is absent?

Eight patients have a disease and each was given some remedy or other. The four patients who were given serum S were cured. Among those who were cured, no other single remedy was given to all. Of the four who were not cured, every patient was given at least one of the remedies (but none was given serum S). Serum S is judged to be the cure.

1. The conclusion applies only to the occurrences considered.

2. Only probable: other important conditions might have been overlooked; it might have been a combination of factors

The Method of Difference

Identify a sufficient condition among possible candidates in a specific occurrence

The factor is the only one that is present when the phenomenon is present and absent when the phenomenon is absent.

Two identical white mice in a controlled experiment were given identical amounts of four different foods. In addition, one of the mice was fed a certain drug. A short time later the mouse that was fed the drug became nervous and agitated. The researchers concluded that the drug caused the nervousness.

1. Less general conclusion than the inverse method of agreement, which applies to all occurrences listed.

The Joint Method of Agreement and Difference

Identify a necessary and sufficient condition that is present in a specific occurrence.

Use the direct method of agreement to isolate necessary conditions (if no factor, no effect) and the method of difference to isolate those that are also sufficient.

1. Less general conclusion than the double method of agreement, which applies to all occurrences listed.

George, who exercised regularly, took vitamins, and got plenty of rest, contracted a rare disease. Doctors administered an antibiotic and the disease cleared up. Convinced that the cure was caused by either the exercise, the vitamins, the rest, or the antibiotic, the doctors searched for analogous cases. Of the two that were found, one got no exercise, took no vitamins, and got little rest. He was given the same antibiotic and was cured. The other person, who did the same things George did, was given no antibiotic and was not cured. The doctors concluded that George was cured by the antibiotic.
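
The reasoning in George's case can be sketched as two eliminations over the three patients. The case data follows the description above; the variable names and the encoding as sets are choices made for this illustration.

```python
# Factors present in each case, and whether the patient was cured.
cases = [
    ({"exercise", "vitamins", "rest", "antibiotic"}, True),   # George
    ({"antibiotic"}, True),                                    # second patient
    ({"exercise", "vitamins", "rest"}, False),                 # third patient
]
all_factors = set().union(*(present for present, _ in cases))

# Agreement step: a necessary condition must be present in every cured case.
necessary = set.intersection(*(present for present, cured in cases if cured))

# Difference step: a sufficient condition cannot occur in a case without the cure.
occurs_without_cure = set().union(*(present for present, cured in cases if not cured))
sufficient = all_factors - occurs_without_cure

conclusion = necessary & sufficient
print(conclusion)
```

Only the antibiotic survives both steps, which matches the doctors' conclusion. As with the other methods, the result holds only for the factors and cases actually considered.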

Method of Residues

"Separate from a group of causally connected conditions and phenomena those strands of causal connection that are already known, leaving the required causal connection as the 'residue'."

Method of Concomitant Variation

Match variations in one condition with variations in another.