Friday, July 17, 2020

Reminder to check my website for recent posts.
However, here's a preview of the most recent one:
July 16, 2020 By Alan Salzberg
Our dear leader in orange is clearly a villain with respect to COVID, whose response has ranged from nothing to blaming others.  So it is natural that we need heroes.  Anthony Fauci has proven worthy over and over, for decades, actually.  There are many other scientists who sounded the alarm, also.
When it comes to political leaders, Cuomo has gotten a lot of press recently.  This article today lauds New York's turnaround from "worst to first" and Cuomo says "this wasn't only about what government did. This was about what people did."  Cuomo put that false modesty aside when he created a ridiculous piece of self-congratulatory graphic art:
New York has won only this title thus far in the fight against COVID: Worst in the world.
... (see full post here)

Wednesday, January 15, 2020

Vaccines are good, but this article about them isn't!

Note: most of my posts to this blog can also be found at my website:

The evidence showing that vaccines are safe and that they save lives is generally overwhelming, so I'm always pleased to see another article reviewing the data behind them.  I figure such articles will lead to even more people being vaccinated and more lives saved.

However, I was disappointed that a recent New York Times article did the statistics so poorly.  The article compares 10,000 people who got various diseases with 10,000 people who were vaccinated. This comparison is inappropriate, because most people who do not get vaccinated do not get the disease they are being vaccinated for, and, especially for diseases like the flu, many people who do get vaccinated get the disease they were vaccinated for.  A proper comparison would compare some number of people who were vaccinated against the same number who were not vaccinated.

I use the CDC figures, and the figures presented in the NY Times article to do just that, focusing on the flu, since it is by far the most common illness mentioned in the article.  Millions of people in the US get the flu every year and typically tens of thousands die from it.

It is known that the flu vaccine reduces the chances of getting the flu by about 50%.  The percentage of people who get the flu varies quite a bit.  In 2017-2018, an estimated 45 million people got the flu (more than 10% of the population) but in 2011-2012, only about 9 million people got the flu (the source for these estimates also has hospitalization and death rates).  So, we'll assume a year that is somewhere between the best and worst years, where about 1 in 13 people get the flu. Given a vaccination rate of 50% (sadly it has been below this recently), that means that you have about a 1 in 20 chance of getting the flu if you get the vaccine and a 1 in 10 chance if you do not.
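The arithmetic behind the 1-in-20 and 1-in-10 figures can be checked with a short sketch. It assumes, as above, that about 1 in 13 people overall get the flu, that half the population is vaccinated, and that the vaccine halves the chance of infection. The function and parameter names are made up for illustration.

```python
def flu_risk_by_group(overall_rate=1 / 13, vaccinated_share=0.5, vaccine_effect=0.5):
    """Split an overall flu rate into vaccinated and unvaccinated risks.

    With r the risk for an unvaccinated person:
        overall = share * (1 - effect) * r + (1 - share) * r
    which is solved for r below.
    """
    unvaccinated = overall_rate / (
        vaccinated_share * (1 - vaccine_effect) + (1 - vaccinated_share)
    )
    vaccinated = unvaccinated * (1 - vaccine_effect)
    return vaccinated, unvaccinated


vaccinated, unvaccinated = flu_risk_by_group()
print(f"vaccinated:   about 1 in {1 / vaccinated:.2f}")
print(f"unvaccinated: about 1 in {1 / unvaccinated:.2f}")
```

Under these assumptions the risks come out near 1 in 19.5 and 1 in 9.75, which round to the 1 in 20 and 1 in 10 used here.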

Now let's consider the effect of everyone in the US (about 300 million people) not getting vaccinated versus getting vaccinated.  The following table summarizes the results (using the per 10,000 people figures found in the NY Times article but projecting them to the full population).

This shows that vaccinating everyone would mean fewer than 270,000 hospitalizations (versus 540,000 if no one was vaccinated) and fewer than 21,000 deaths (versus 42,000).  The only area where the vaccine is worse is "other bad effects," where I am grouping allergic reactions and Guillain-Barré Syndrome.  In this case, about 800 more people might suffer these (sometimes) very serious side effects.  However, this pales in comparison to the more than 20,000 lives saved by the vaccine annually.

Even these huge benefits are likely understated.  If everyone were to get the flu vaccine, it is likely that the flu would spread less, since many of the vaccinated people who currently catch the flu get it from people who are unvaccinated (in 2017-2018 only about 37% of adults were vaccinated).  Also, vaccinations have been shown to reduce the severity of the flu for those who get it (see here again).

So overall, despite the poor comparisons in the article, which seems to imply no one would die from the flu if vaccinated, the benefit of the flu vaccine is still overwhelming.

Wednesday, October 21, 2015

bridge splits re-visited

A couple years back, I wrote on the chances of various "splits" in bridge (and explained why this is something bridge players care about) in this post, which also explains the math behind the chances.

However, in that post, I failed to include the possibility of 7 trumps being out, because it is fairly rare. Due to some poor bidding on my part, I found myself playing 4 spades last night, and my partner and I only had 6 trumps between us.  Here are the chances of the different splits of 7 trumps that are out, between the other two players.

4-3 split: 62.2%
5-2 split: 30.5%
6-1 split: 6.8%
7-0 split: 0.5%

For completeness, here are splits with 6 and fewer (from the prior post).
For hands with 6 trumps out:
3-3 split: 35.5%
4-2 split: 48.4%
5-1 split: 14.5%
6-0 split: 1.5%

For hands with 5 trumps out, we get:
3-2 split: 67.8%
4-1 split: 28.3%
5-0 split: 3.9%

For hands with 4 trumps out:
2-2 split: 40.7%
3-1 split: 49.7%
4-0 split: 9.6%

For hands with 3 trumps out:
2-1 split: 78%
3-0 split: 22%

For hands with 2 trumps out:
1-1 split: 52%
2-0 split: 48%

It's worth mentioning that these probabilities are unconditional.  Since the bidding that precedes playing any given hand gives some information, it is typically true that some splits can be ruled out or downplayed.  For example, in the 4 spade hand I played last night, a 5-2 or (especially) worse split seemed unlikely, because there was no double from the other side, so I would've put the chances of a 4-3 split far higher than the unconditional 62%.
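The unconditional figures above can be reproduced, up to rounding in the last digit, with a short counting sketch. It assumes the 26 cards held by the two opponents are dealt at random, 13 to each, so the number of trumps in one opponent's hand is hypergeometric. The function name is only illustrative, and this is not necessarily how the numbers in the prior post were computed.

```python
from math import comb


def split_probabilities(trumps_out):
    """Chance of each split of `trumps_out` trumps between two 13-card hands."""
    total_deals = comb(26, 13)  # ways to choose one opponent's 13 cards
    probs = {}
    # small = trumps in the shorter hand; start with the most even split
    for small in range(trumps_out // 2, -1, -1):
        big = trumps_out - small
        ways = comb(trumps_out, small) * comb(26 - trumps_out, 13 - small)
        p = ways / total_deals
        if big != small:
            p *= 2  # either opponent can hold the longer trump suit
        probs[(big, small)] = p
    return probs


for trumps_out in range(7, 1, -1):
    print(f"For hands with {trumps_out} trumps out:")
    for (big, small), p in split_probabilities(trumps_out).items():
        print(f"  {big}-{small} split: {p:.1%}")
```

Adjusting for what the bidding reveals would mean conditioning these counts on the hands still consistent with the auction, which this sketch does not attempt.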

Friday, September 4, 2015

See my new posts on my web site

My newer posts (and some of the old ones) are now on my website: Salt Hill Blog

Sunday, February 15, 2015

Ultimate Frisbee: to Huck or not to Huck?

I play a lot of Ultimate Frisbee, a game akin to football in that there are end zones, but akin to soccer in that there is constant action until someone scores.  In Ultimate, you can only advance by throwing the disc (so-called because we generally do not use Wham-O branded discs, which are called Frisbees). An incomplete pass or a pass out of bounds is a turnover, as is a "stall," where the offense holds the disc without throwing for more than 10 seconds.

In other words, in order for the offense to score, you need to complete passes until someone catches the disc in the end zone.  The accepted method of doing this is to complete shorter, high-percentage passes.  On a non-windy day, it seems fairly simple for at least one of your six teammates to get open and thus you can march down the field.  Of course, one long pass, or "huck," can shortcut the process and give your team the quick score.  Much like football, the huck is not typically done except in desperation (game almost over due to time or thrower almost stalled).

However, I am not at all sure this logic makes sense.  Suppose you need six short passes to advance to a score.  If your team completes short passes with a probability of 90%, you will score about 53% of the time (90% to the sixth power gives the chances of completing six passes in a row).  In other words, as long as the chance of completing the huck is more than 53%, you would have a better chance of scoring with a huck.

Thus, the relative chances of scoring via the two methods depend on three things: 1) chance of completing a short pass, 2) chance of completing a huck, and 3) number of short passes needed for a score.  The graph below shows the threshold huck completion rate (the rate above which it makes more sense to huck) for different short pass completion rates, always assuming that 6 short passes are enough for a score and that one huck is enough for a score.
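The threshold is easy to tabulate. The sketch below assumes, as the 90% example does, that each short pass is completed independently with the same probability, so the chance of stringing six together is that probability to the sixth power. The function name and the list of rates are chosen here for illustration.

```python
def huck_threshold(short_pass_rate, passes_needed=6):
    """Chance of completing `passes_needed` short passes in a row.

    A huck is the better bet whenever its completion chance exceeds this.
    """
    return short_pass_rate ** passes_needed


for rate in (0.80, 0.85, 0.90, 0.95, 0.99):
    threshold = huck_threshold(rate)
    print(f"short pass {rate:.0%}: huck if it completes more than {threshold:.0%} of the time")
```

Changing `passes_needed` shows how quickly the threshold drops when more short passes are required to reach the end zone.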

Of course, this simple analysis assumes 6 throws equals a score, and it also leaves out a number of other factors.  For example, an incomplete huck confers a field advantage to the hucking team because the opposing team has to begin from the point of incompletion (as long as it was in-bounds).  On the other hand, it may not take long for the opposing team to figure out the hucking strategy and play a zone style defense that will lower the hucking chances considerably.

Tuesday, September 30, 2014

What is a p value and why doesn't anyone understand it?

I feel like I've written this too many times, but here we go again.
There was a splendid article in the New York Times today concerning Bayesian statistics, except that, as usual, it had some errors.

Lest you think me overly pedantic, I will note that Andrew Gelman, the Columbia professor profiled in much of the article, has already posted his own blog entry highlighting a bunch of the errors (including the one I focus on) here.

Concerning p-values the article states:
"accepting everything with a p-value of 5 percent means that one in 20 “statistically significant” results are nothing but random noise."  This is nonsense.  I found this nonsense particularly interesting because I recently read almost the same line in a work written by an MIT professor.

P-value explained in brief

Before I get to explaining why the Times is wrong, I need to explain what a p-value is.  A p-value is a probability calculation, first of all.  Second of all, it has an inherent assumption behind it (technically speaking, it is a conditional probability calculation).  Thus, it calculates a probability assuming a certain state of the world.  If that state of the world does not exist, then the probability is inapplicable.

An example: I declare: "The probability you will drown if you fall into water is 99%."  "Not true," you say, "I am a great swimmer."  "I forgot to mention," I explain, "that you fall from a boat, which continues without you to the nearest land 25 miles away...and the water is 40 degrees."  The p-value is a probability like that -- it is totally rigged.

The assumption behind the p-value is often called a Null Hypothesis. The p-value is the chance of obtaining your particular favorable research result, under the "Null Hypothesis" assumption that the research is garbage.  It is the chance that, given your research is useless, you obtained a result at least as positive as the one you did.  But, you say, "my research may not be totally useless!"  The p-value doesn't care about that one bit.

More detail using an SAT prep course example
Suppose we are trying to determine whether an SAT prep course results in a better score for the SAT. The Null Hypothesis would be characterized as follows:
H0=Average Change in Score after course is 0 points or even negative.  In shorthand, we could call the average change in score D (for difference) and say H0: D<=0.  Of course, we are hoping the course results in a higher score, so there is also a research hypothesis: D>0.  For the purposes of this example, we will assume the change that occurs is wholly due to the course and not to other factors, such as the students becoming more mature with or without the course, the later test being easier, etc.

Now suppose we have an experiment where we randomly selected 100 students who took the SAT and gave them the course before they re-took the exam.  We measure each student's change and thus calculate the average d for the sample (I am using a small d to denote the sample average while the large D is the average if we were to measure it across the universe of all students who ever existed or will exist).  Suppose that this average for the 100 students is a score increase of 40 points.  We would like to know, given the average difference, d, in the sample, is the universe average D greater than 0?  Classical statistics neither tells us the answer to this question nor does it even give the probability that the answer to this question is "yes."

Instead, classical statistics allows us only to calculate the p-value: P(d>=40| D<=0).  In words, the p-value for this example is the probability that the average difference in our sample is 40 or more, given that the Universe average difference is 0 or less (Null Hypothesis is true).  If this probability is less than 5%, we usually conclude the Null Hypothesis is FALSE, and if the Null Hypothesis were in fact true, we would be incorrectly concluding statistical significance.  This incorrect conclusion is often called a false positive.  The chance of a false positive can be written in shorthand as P(FP|H0), where FP is false positive, "|" means given, and H0 means Null Hypothesis.  (Technically, but not important here, we calculate the probability at D=0 even though the Null Hypothesis covers values less than zero, because that gives the highest (most conservative) value.)  If the cutoff for statistical significance is set at a p-value of 5%, that means P(FP|H0)=5%.

A more general way of defining the p-value is that the p-value is the chance of obtaining a result at least as extreme as our sample result under the condition/assumption that the Null Hypothesis is true.  If the Null Hypothesis is false (in our example if the universe difference is more than 0), the p-value is meaningless.
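To make the calculation concrete, here is a minimal sketch of the p-value for the SAT example using a one-sided z-test evaluated at D=0. The post does not say how much individual score changes vary, so the standard deviation of 200 points is purely a made-up assumption, as are the function and parameter names.

```python
from math import erfc, sqrt


def one_sided_p_value(sample_mean, sample_size, population_sd, null_mean=0.0):
    """P(sample average >= sample_mean | universe average = null_mean).

    Treats the sample average as normally distributed with a known
    standard deviation (an assumption made for this illustration).
    """
    standard_error = population_sd / sqrt(sample_size)
    z = (sample_mean - null_mean) / standard_error
    return 0.5 * erfc(z / sqrt(2))  # upper tail of the standard normal


# d = 40 points, 100 students, hypothetical sd of 200 points per student
p = one_sided_p_value(sample_mean=40, sample_size=100, population_sd=200)
print(f"p-value: {p:.4f}")
```

With these assumed numbers the sample average is two standard errors above zero, giving a p-value of about 2.3%, which would be declared statistically significant at the 5% level. A different assumed standard deviation would give a different p-value for the same 40-point average.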

So why do we even use the p-value?  The idea is that if the p-value is extremely small, it indicates that our underlying Null Hypothesis is false.  In fact, it says either we got really lucky or we were just assuming the wrong thing.  Thus, if it is low enough, we assume we couldn't have been that lucky and instead decide that the Null Hypothesis must have been false.  BINGO--we then have a statistically significant result.

If we set the level for statistical significance at 5% (sometimes it is set at 1% or 10%), p-values at or below 5% result in rejection of the Null Hypothesis and a declaration of a statistically significant difference.   This mode of analysis leads to four possibilities:
False Positive (FP), False Negative (FN), True Positive (TP), and True Negative(TN).
False Positives occur when the research is useless but we nonetheless get a result that leads us to conclude it is useful.
False Negatives occur when the research is useful but we nonetheless get a result that leads us to conclude that it is useless.
True Positives occur when the research is useful and we get a result that leads us to conclude that it is useful.
True Negatives occur when the research is useless and we get a result that leads us to conclude that it is useless.
We only know if the result was positive (statistically significant) or negative (not statistically significant)--we never know if the result was TRUE (correct)  or FALSE (incorrect).  The p-value limits the *chance* of a false positive to 5%.  It does not explicitly deal with FN, TP, or TN.
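The claim that the 5% cutoff limits the chance of a false positive can be checked by simulation. The sketch below repeatedly runs the SAT experiment in a world where the Null Hypothesis is exactly true (D=0) and counts how often the result is declared significant. The normal distribution for score changes, the 200-point standard deviation, the number of trials, and all names are assumptions made for illustration.

```python
import random
from math import sqrt


def false_positive_rate(trials=2000, sample_size=100, population_sd=200, seed=0):
    """Share of useless-course experiments that still reach significance."""
    rng = random.Random(seed)
    critical_z = 1.645  # one-sided 5% cutoff for a standard normal
    standard_error = population_sd / sqrt(sample_size)
    positives = 0
    for _ in range(trials):
        # Score changes for one study of a course with no real effect
        changes = [rng.gauss(0, population_sd) for _ in range(sample_size)]
        d = sum(changes) / sample_size
        if d / standard_error > critical_z:
            positives += 1  # declared significant even though D = 0
    return positives / trials


print(f"simulated P(FP|H0): {false_positive_rate():.3f}")
```

The simulated rate should land near 5%, with some run-to-run noise. Note that every study here is of a useless course, which is exactly the condition the p-value assumes and says nothing about how many courses are useless in practice.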

Back to the Question of how many published studies are garbage, but it gets a little technical
Now, back to the quote in the article: "accepting everything with a p-value of 5 percent means that one in 20 “statistically significant” results are nothing but random noise."
Let's consider a journal that publishes 100 statistically significant results regarding SAT courses that improve scores, where statistical significance is based on p-values of 5% or below.  In other words, this journal published 100 articles with research showing that 100 different courses were helpful.  What number of these courses actually are helpful?
Given what we have just learned about the p-value, I hope your answer is 'we have no idea.' There is no way to answer this question without more information.  It may be that all 100 courses are helpful and it may be that none of them are.  Why?  Because we do not know if these are all FPs or all TPs or something in-between--we only know that they are positive, statistically significant results.

To figure out the breakdown, let's do some math.  First, create an equation, using some of the terminology from earlier in the post.
The number of statistically significant results = False positives (FP) plus True positives (TP).  This is simple enough.

We can go one step further and define the probability of a false positive given the Null hypothesis is true and the probability of a true positive given the alternative hypothesis is true -- P(FP|H0) and P(TP|HA).  We know that P(FP|H0) is 5% -- we set this by only considering a result statistically significant when the p-value is 5% or below.  However, we do not know P(TP|HA), the chances of getting a true positive when the alternative hypothesis is true.  The absolute best case scenario is that it is 100%--that is, any time a course is useful, we get a statistically significant result.

Suppose that we know that a fraction B of courses are bad and the remaining fraction (1-B) are helpful.  Bad courses do not improve scores and helpful courses do.  Further, let's suppose that N courses in total were considered, in order to get the 100 with statistically significant results.  In other words, a total of N studies were performed on courses and those with statistically significant results were published by the journal.  Let's further assume the extreme concept above that ALL good courses will be found to be good (no False Negatives), so that P(TP|HA)=100%.  Now we have the components to figure out how many bad courses are among the 100 publications regarding helpful courses.

The number of statistically significant results is :
100= B*N*P(FP|H0) + (1-B)*N*P(TP|HA)
This first term just multiplies the (unknown) percent of courses that are bad by the total studies performed by the percent of studies that will give the false positive result that says the course is good.  The second term is analogous, but for good courses that achieve true positive results.  These reduce to:
    100 = N(B*5% + (1-B)*100%)  [because the FP chances are 5% and TP chances are 100%]
        = N(.05B + 1 - B)       [algebra]
        = N(1 - .95B)           [more algebra]
    ==> B = (20/19)*(1 - 100/N) [more algebra]
The number of bad courses among those published equals B*N*P(FP|H0), which in turn equals (1/19)*(N-100) [using more algebra].

If you skipped the algebra, what this comes down to is that the number of bad courses published depends on N, the total number of different courses that were researched.
If N were 100, then 0 of the publications were garbage and all 100 were useful.
If N were 1,000, then about 947 were garbage, about 47 of which were FPs and thus among the 100 publications.  So 47 garbage courses were among the 100 published.
If the total courses reviewed were 500, then about 421 were garbage, about 21 of which were FPs and thus among the 100 publications.
You might notice, that given our assumptions, N cannot be below 100, the point at which no studies published are garbage.
Also, N cannot be above 2000, the point at which all studies published are garbage.
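For anyone who wants to try other values of N, the algebra can be wrapped in a few lines. This sketch uses the same assumptions as above: 100 published results, a 5% false positive chance, and a 100% true positive chance. The function and parameter names are invented for illustration.

```python
def garbage_breakdown(n_studies, published=100, fp_chance=0.05, tp_chance=1.0):
    """Return (bad courses studied, bad courses published) for N studies.

    Solves published = B*N*fp_chance + (1-B)*N*tp_chance for B, the
    fraction of studied courses that are bad.
    """
    bad_fraction = (tp_chance - published / n_studies) / (tp_chance - fp_chance)
    if not 0 <= bad_fraction <= 1:
        raise ValueError("N is outside the range allowed by these assumptions")
    bad_courses = bad_fraction * n_studies
    bad_published = bad_courses * fp_chance  # the false positives
    return bad_courses, bad_published


for n in (100, 500, 1000, 2000):
    bad, bad_pub = garbage_breakdown(n)
    print(f"N={n}: about {bad:.0f} bad courses studied, about {bad_pub:.0f} of them published")
```

Lowering `tp_chance` below 100% (allowing False Negatives) changes both the answers and the allowable range of N, which is another reason the published count alone cannot tell us how many results are garbage.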

You might be thinking--we have no idea how many studies are done for each journal article accepted for publication though, and thus knowing that 100 studies are published tells us nothing about how many are garbage--it could be anything from 0 to 100% of all studies! Correct.  We need more information to crack this problem. However, 5%  garbage may not be so terrible anyway.

While it might seem obvious that 0 FPs is the goal, such a stringent goal, even if possible, would almost certainly lead to many more FNs, meaning good and important research would be ignored because its statistical significance did not meet a more stringent standard.  In other words, if standards were raised to 1% or 0.1%, then some TPs under the 5% standard would become FNs under the more stringent standard, important research--thought to be garbage--would be ignored, and scientific progress would be delayed.