Monday, January 21, 2008

Throw away your cold medicine?

A recent article in the New York Times touts: "Seawater Seems to Beat Medicine in Fighting Colds." The article goes on to describe a study where "scientists assigned 289 cold or flu patients ages 6 to 10 to be given a nasal wash three times a day with water from the Atlantic Ocean that had been commercially processed but retained seawater’s trace elements and minerals. As comparison, a group of 101 children used ordinary over-the-counter cough and cold medicines."

The Times gets this first part wrong. The "seawater" group got both standard medications and seawater, as explained by the journal article on the study. So, right away, we are not talking about throwing away our cold medicines. Instead, those of us not close to the ocean might have to buy seawater as well (more below on saline vs. seawater).

But the study does have some interesting results.

Here's the good part
The results for the preventative success are most striking: at week 12, 25% of the children in the control group had reported illnesses that caused an absence from school versus only 8% in the treatment group. This is statistically significant, meaning the difference was too large to be explained away by mere chance. This does not mean, however, that biases in the study could not have caused the difference: if bias exists, even a highly statistically significant difference can be spurious.
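For readers who want to see what "statistically significant" means here, below is a rough sketch of a standard two-proportion z-test. It assumes the group sizes quoted by the Times (101 in the control group, 289 in the treatment group) still held at week 12, which ignores the dropouts, and it is not necessarily the test the study's authors used. The function name is my own.

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """Return (z, two-sided p-value) for the difference p1 - p2."""
    # Pooled proportion under the null hypothesis of no real difference.
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of a standard Normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Assumed inputs: 25% of 101 control kids vs. 8% of 289 treatment kids.
z, p_value = two_proportion_z(0.25, 101, 0.08, 289)
print(f"z = {z:.2f}, p-value = {p_value:.6f}")
```

A tiny p-value only rules out luck of the draw. It says nothing about whether the unblinded design pushed the numbers in one direction.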

Here's the bad part
1. The study was not blind. This means that the children (and physicians and parents) were aware of whether the kids were taking the saline solution or not, subjecting the study to a "placebo" effect: kids who were taking the saline might have 'felt' better, but had no less incidence of a cold. The study's authors make an error in the journal article by stating: "the large number of participants, multi center design, and consistence of results between individual parameters (assessed by physician, patient, and parent) lower the risk of bias."

Bias is not mitigated by sample size -- that is, a large biased group is no better than a small biased group (imagine trying to figure out average height of all men by taking an NBA team, then doing a second study with the average of all NBA teams, saying this lessens the bias).
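The NBA example can be sketched as a small simulation. Everything in it is made up for illustration: the population of heights (Normal with mean 175 cm and standard deviation 7 cm), the 190 cm cutoff standing in for "NBA players," and the sample sizes.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population of men's heights, in centimeters.
population = [random.gauss(175, 7) for _ in range(200_000)]
true_mean = statistics.mean(population)

# Biased sampling frame: only very tall men (a stand-in for NBA rosters).
tall_only = [h for h in population if h > 190]

def error(sample):
    """How far the sample average lands from the true population average."""
    return statistics.mean(sample) - true_mean

small_biased = error(random.sample(tall_only, 15))    # "one team"
large_biased = error(random.sample(tall_only, 450))   # "all teams"
small_random = error(random.sample(population, 15))   # small but unbiased

print(f"small biased sample error: {small_biased:+.1f} cm")
print(f"large biased sample error: {large_biased:+.1f} cm")
print(f"small random sample error: {small_random:+.1f} cm")
```

The larger biased sample just gives a more precise estimate of the wrong number, while even a small random sample lands much closer to the truth.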

Similarly, having three biased parties (physician, patient, and parent) would only reduce the bias if we were comparing it to the bias of the most biased party (say, the parent).

2. The treatment is no fun.
Perhaps as important as the questionable effect of the study due to bias is the fact that the treatment involves the solution being squirted into the kid's nose 3 times a day for 12 weeks. I was sort of amazed that the study had so few dropouts (only 11 out of 401 patients in the study dropped out). The unpleasantness seems barely worth the avoidance of a cold or two.

3. Seawater, water or saline-- does it matter? The study does not provide any comparison of the seawater spray to other sprays, or even a simple water spray. To its credit, the journal article does not focus on the fact that the solution was seawater but instead on the comparison of nasal wash versus no nasal wash. In the journal article, there is little indication that seawater would be any better than salt water (the word seawater is mentioned 10 times in the journal article as opposed to saline, which is mentioned 64 times). BTW, the seawater in the study was processed (and presumably sterilized), so don't take it literally and go to your neighborhood polluted beach for your solution.

The Times article, however, focuses on the idea of seawater, as opposed to a simple saline (salt-water) solution.

Net Net
It does seem that washing your nose out with saline 3 times a day will make your kids feel better and they will miss less school. But it's unclear whether they are actually any less sick or they just think they are less sick (to that end, maybe giving them a sugar pill each day, telling them it was a special cold pill, would have the same effect).

Thursday, January 10, 2008

Election Math - Update

So my guess was wrong. Clinton won. The question is, was it the various selection biases and undecided voters, the measurement error (polls were on Saturday and Sunday but the election is on Tuesday), or something else?

Unfortunately, there is really no way to tell. However, there are two interesting things to note.

One is that one local poll shows the percentages both including and excluding those who are leaning toward a candidate but are still undecided. In this case, Obama received 2% more of the vote if the leaners are counted, indicating there is some bias in the method that most polls use, which is to count the leaners (who are actually still undecided) rather than exclude them (the polls do exclude the truly undecided, who have hovered in the 5-10% range, but implicitly assume they will vote the same way as the decided voters).

The second thing to note is that the polls very accurately predicted Obama's percentage and inaccurately predicted Hillary's. Obama's actual percentage was 36%, and the seven major polls predicted 36%, 35%, 39%, 41%, 34%, 38%, and 39% (an average of 37%). Hillary's percentages (same web page) were 28%, 34%, 28%, 28%, 31%, 29%, and 29% (average of 30%) versus her actual of 39% (a difference that is outside of the zone for mere statistical error). Hillary apparently picked up votes from undecideds, people who were leaning for Obama, and from the Edwards/Richardson camps.
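The averages above are easy to check. Here is a short sketch using only the poll numbers and actual results quoted in this post (the variable names are mine):

```python
# Poll predictions and actual results, in percent, as quoted in the post.
obama_polls = [36, 35, 39, 41, 34, 38, 39]
clinton_polls = [28, 34, 28, 28, 31, 29, 29]
obama_actual, clinton_actual = 36, 39

def average(values):
    return sum(values) / len(values)

obama_avg = average(obama_polls)
clinton_avg = average(clinton_polls)

# Positive miss = the candidate did better than the polls predicted.
obama_miss = obama_actual - obama_avg
clinton_miss = clinton_actual - clinton_avg

print(f"Obama:   poll average {obama_avg:.1f}%, actual {obama_actual}%, miss {obama_miss:+.1f}")
print(f"Clinton: poll average {clinton_avg:.1f}%, actual {clinton_actual}%, miss {clinton_miss:+.1f}")
```

The polls missed Obama by a point or two and Clinton by more than nine.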

Monday, January 7, 2008

Election Math

Today's CNN headline screams out "Obama opens double-digit lead over Clinton." When you read the article, you find that the poll, of 341 Democrats, showed Obama over Clinton, 39% to 29%. This is as compared to a Saturday poll showing them in a dead heat, at 33-33 (http://www.presidentpolls2008.com/ is a nice polling site, because it shows the polls side by side and you can link to the details of the statistical error and actual questions asked).

Did so many people change their minds in one day? If we ignore other polls, then the statistical evidence is not conclusive. Why? Because the margin of error in the 39-29 poll is 5%, meaning that Obama's percentage is likely somewhere between 34 and 44%. The Saturday poll, also with a 5% margin of error, indicates his percentage is between 28 and 38%. The overlap between these two ranges means the numbers might not have changed at all. Rather, the difference may be mere statistical error, which is an artifact of the sampling. By luck of the draw, the Sunday poll may have found more Obama supporters, even though no one changed their mind.

When comparing two polls taken independently (as above), the error is more than the stated error (5% above) but less than the sum of the stated error in the two polls (5%+5%=10% above). We can compute the error of the difference in these two polls as 7%, which means that Obama's numbers, which appeared to go up by 6% (from 33% to 39%), may have gone down by as much as 1% or up by as much as 13%.

Some math: The 7% is computed as 1.96 multiplied by the square root of the sum of the squared standard deviations of the two polls. The standard deviation of each poll is the error rate (5%) divided by 1.96, or about 2.5%. In a standard probability distribution called a Normal Distribution, 95% of the data falls within plus or minus 1.96 standard deviations of the mean. Thus, in the latest poll, the mean for Obama was 39%, with a standard deviation of 2.5%.
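The calculation just described can be sketched in a few lines. The function names are mine, and the sample-size check at the end uses the textbook margin-of-error formula for a proportion with the worst case of 50%, which is my assumption about how the poll's 5% was derived, not something the poll states.

```python
import math

Z_95 = 1.96  # Normal multiplier for a 95% interval

def margin_of_difference(margin_a, margin_b):
    """Margin of error for the difference between two independent polls."""
    sd_a = margin_a / Z_95
    sd_b = margin_b / Z_95
    return Z_95 * math.sqrt(sd_a ** 2 + sd_b ** 2)

def margin_from_sample_size(n, p=0.5):
    """Textbook 95% margin of error, in percent, for a proportion p with n respondents."""
    return 100 * Z_95 * math.sqrt(p * (1 - p) / n)

diff_margin = margin_of_difference(5.0, 5.0)
change = 39 - 33  # Obama's apparent gain from Saturday to Sunday
low, high = change - diff_margin, change + diff_margin

print(f"error of the difference: {diff_margin:.2f} points")
print(f"plausible change in Obama's share: {low:+.1f} to {high:+.1f} points")
print(f"margin implied by 341 respondents: {margin_from_sample_size(341):.1f}%")
```

With two equal margins, the error of the difference is the single-poll margin times the square root of 2. That is why it lands between 5% and 10%.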

Several implicit assumptions are made in computing the error rate in these polls, primarily summarized as: 1) the Normal Distribution is appropriate, 2) the sample is a random sample of all who will vote in the Tuesday Democratic primary, and 3) the answers in this poll are reflective of the way the voters will actually vote come Tuesday.

Assumption Review
1) Normal Distribution. This one is easy. For a large, random sample and a multiple choice question (who will you vote for), this assumption is always close to reality except when the number polled is very small or the percentages (as are those for say, Kucinich) are close to 0% or 100%. For Clinton and Obama, there is no real issue here, since the sample size is moderately large and their percentages are in the 30-40% range (a neat demo that shows how close a distribution is to Normal, depending on the sample size and percentage, is at http://www.ruf.rice.edu/~lane/stat_sim/binom_demo.html).

2) Random Sample. This is more difficult. Suppose Clinton voters go to church on Sunday, followed by lunch, while the Obama voters are home-bodies. This problem can be called selection bias. If there is church-lunch/home-body selection bias, then, in the Sunday poll, a random dialing of phone numbers would have surfaced more Obama voters and would not have been a random sample of Tuesday's voters, as opposed to Saturday, where you might have gotten more equality. [There is generally a second underlying issue of refusals--people who refuse to be polled. If these voters are more likely to vote for Clinton, then Clinton's numbers will be under-stated, but this would be true for both polls that we are comparing.]

3) The difference between how people said they felt in today's poll versus how they will vote in Tuesday's election. I find this difference, which can be called measurement error, the most troublesome. Take, for example, the fact that, in the CNN Saturday poll, one-fourth of voters stated that they had not yet decided, another quarter were only leaning toward someone, and just half had definitely decided. This is, of course, only what people are saying, and often people do not want to admit indecision, so the true number of undecideds may be even higher. Still, if the undecideds vote even 60-40 in favor of Hillary, it would erase the 10% lead of Obama.

The 5% error rate (and 7% error of difference rate) does not take the above issues into account. It implicitly assumes they will have no effect. Thus, the true error rate in election polls is likely far higher.

If we consider other polls, Obama's lead, and the change, seem clearer. In the seven polls published the 6th of January, Obama has an average lead of about 2-3%. In the 5 polls published Friday and Saturday, Clinton led in all of them, by around 5 points. We can show that this change, from pre-Sunday to Sunday, is statistically significant. However, because of the selection bias and measurement error issues above, it may not be indicative of the outcome on Tuesday.

My personal guess? Obama by a good margin...but a lot can happen in a day.

Tuesday, January 1, 2008

Where is the safest place to live?

If you choose where to live by the crime rate, many of the safest places to live are outside the United States. Crime in the U.S., though lower than it used to be, is still high by Western standards. Here in Israel, for example, fewer than 200 people are murdered a year. In the U.S., about 17,000 people are murdered a year. Of course, these figures are not directly comparable, because Israel has around 6 million people as compared to the U.S.'s 300 million.

Crime rates and murder rates are typically adjusted for population by reporting the number of crimes per 100,000 people. In Israel this rate (for murders) is about 3, as compared to about 6 per 100,000 in the U.S. Even including terror attacks, the rate has never been as high in Israel as in the U.S. In England, the rate is less than 2 per 100,000, which is in line with most of Western Europe.
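The population adjustment is simple arithmetic. Here is a sketch using the round figures quoted above (roughly 200 murders and 6 million people for Israel, 17,000 murders and 300 million people for the U.S.); the function name is mine.

```python
def rate_per_100k(count, population):
    """Convert a raw yearly count into a rate per 100,000 people."""
    return count / population * 100_000

israel_rate = rate_per_100k(200, 6_000_000)
us_rate = rate_per_100k(17_000, 300_000_000)

print(f"Israel: {israel_rate:.1f} murders per 100,000")
print(f"U.S.:   {us_rate:.1f} murders per 100,000")
print(f"U.S. rate is about {us_rate / israel_rate:.1f} times Israel's")
```

The raw counts differ by a factor of 85, but the rates differ by less than a factor of 2.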

All this is fine and good if you are willing to live abroad just to enjoy a lower crime rate, but, assuming you are looking for a nice place in the US to live, what city might suit your needs vis-a-vis lack of crime?

I considered the four cities I have lived in (see the FBI site for all the stats): Columbia, SC; Washington, DC; Philadelphia; and New York. Of these, New York (sorry Mom) is clearly the safest, with a murder rate of 6 per 100,000 and a violent crime rate of about 0.7% (less than 1 in 100). The worst is DC (murder: 29, violent crime 1.5%), but Philadelphia is a near tie (26, 1.5%). Columbia, my home town and my parents' current and future town, is in the middle (13 murders per 100,000 and 1.1% violent crime rate).

These stark differences mask three important facts. First, the chance of a random individual being a victim of a violent crime in any given year is very low, no matter what the city (though you are more likely to be killed by a person than by a car in Philly and DC, according to National Safety Council statistics--originally linked here but apparently not on the web anymore as of at least 11/28/2012).

Second, most violent crimes are committed by someone we know, and most of the people reading this blog do not have violent acquaintances. On top of that, most murders are committed with guns, and most people reading this blog do not have acquaintances with guns.

Third, crime is highly localized, and no matter which city we live in, we are likely to live in a more affluent area, and these areas have much lower crime.

So, in conclusion: don't worry, Mom, New York is safer than Columbia, as long as I can keep making a living!