Sunday, February 24, 2008

7 letter words and 8 card suits

A recent commenter asked whether the following is true: "ETAERIO is the most likely seven-letter word to get in scrabble."


As all Scrabble players know, if you use all 7 of your letters, you get a bonus. However, except for the first turn, you'd usually need an 8-letter word to achieve this, so ETAERIO would not do. Thus, I am going to try to answer the question: "What are the chances of getting the letters ETAERIO on your initial turn in Scrabble (you also have to hope you are first)?" I have no hope of finding out whether this is the most likely seven-letter word, because I can't automatically check all letter combinations, but I will try to give some guidance there as well.


To find the chances of getting ETAERIO, we need the number of combinations that produce these letters divided by the total number of combinations. In other words, we have to go back to 12th-grade math, where we all learned (or sort-of learned) permutations and combinations.


There are 100 tiles in Scrabble and we are choosing 7. Thus, there are 100 ways to choose the first tile, 99 ways to choose the second, and so forth down to 94 ways for the seventh. If we chose them in order, we'd have 100*99*98*97*96*95*94 permutations. However, we don't care about the order, so we have to take the above product and divide by the number of ways we can permute the 7 tiles, which is 7*6*5*4*3*2*1. The shorthand way to express this number of combinations is "100 choose 7" or

$$\binom{100}{7} = \frac{100 \times 99 \times 98 \times 97 \times 96 \times 95 \times 94}{7 \times 6 \times 5 \times 4 \times 3 \times 2 \times 1} = \frac{100!}{7!\,93!}$$

Carrying out this division, we come up with 16,007,560,800. Since most letters appear on multiple tiles, many of these tile combinations contain exactly the same letters, so to count the ones that give ETAERIO, I need to know how many times each letter appears.
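A quick way to double-check this arithmetic is a few lines of Python (3.8 or later, for `math.prod` and `math.comb`). The first value is the long division described above; the comparison checks it against the standard library's built-in "n choose k".

```python
from math import comb, factorial, prod

# Ordered draws of 7 tiles: 100 * 99 * 98 * 97 * 96 * 95 * 94
ordered = prod(range(94, 101))

# Each unordered rack shows up 7! times among the ordered draws
racks = ordered // factorial(7)

print(f"{racks:,}")            # 16,007,560,800
print(racks == comb(100, 7))   # True
```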



Thus, I found the letter distributions on Wikipedia (counting our own Scrabble pieces would probably not do with a three-year-old distributing them around the house). The most common ones are as follows:
E - 12 tiles
A, I - 9 tiles
O - 8 tiles
N, R, T - 6 tiles
D, L, S, and U - 4 tiles
other letters (and the 2 blank tiles) - 3 or fewer, but not relevant here


To figure out the chances of getting ETAERIO, we need to know the number of combinations that produce it. We need 2 E's, 1 T, 1 A, 1 R, 1 I, and 1 O. Because the tiles for each letter are chosen independently of the tiles for the other letters, the number of ways is the product of each of these implied combinations. Thus, it is "12 choose 2" (E's) times "6 choose 1" (T) times "9 choose 1" (A) and so forth, or 66*6*9*6*9*8, which comes out to 1,539,648 ways to get the letters in ETAERIO. If we divide this by the total number of combinations (16,007,560,800), we find that there is about a 1 in 10,400 (roughly 1 in 10,000) chance of getting ETAERIO as your first 7 letters. Of course, from there, you have to know it is a word and figure out that you can make that word from those letters, since they are not likely to appear in that order.
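The same product-of-combinations idea fits in a short Python sketch. The `TILES` dictionary copies the counts listed above (letters on 3 or fewer tiles and the blanks are left out because ETAERIO doesn't need them), and `ways` is just an illustrative helper name, not part of any Scrabble library.

```python
from collections import Counter
from math import comb, prod

# Tile counts copied from the list above (only the letters used here)
TILES = {"E": 12, "A": 9, "I": 9, "O": 8, "N": 6, "R": 6, "T": 6,
         "D": 4, "L": 4, "S": 4, "U": 4}
TOTAL_RACKS = comb(100, 7)  # every 7-tile draw from the 100-tile bag

def ways(word):
    """Count 7-tile racks holding exactly the letters of `word`."""
    need = Counter(word)  # e.g. ETAERIO -> E: 2, T: 1, A: 1, R: 1, I: 1, O: 1
    # pick the needed tiles of each letter separately, then multiply the counts
    return prod(comb(TILES[letter], n) for letter, n in need.items())

w = ways("ETAERIO")
print(f"{w:,}")                        # 1,539,648
print(f"1 in {TOTAL_RACKS / w:,.0f}")  # 1 in 10,397
```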


I could not find another word with a higher probability, but I did find TREASON and TRAINED (about 1 in 21,000 and 1 in 19,000, respectively). However, it's clear from the distribution of letter tiles that in order to find a word that beats ETAERIO, you can only use letters appearing on 6 or more tiles.
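The 6-or-more claim can be checked by brute force. This sketch reuses the letter counts from the list above, goes through every 7-letter rack that could be drawn from them (word or not), and finds the most likely one that includes at least one letter with only 4 tiles. Letters on 3 or fewer tiles would do even worse, so they are omitted, and blanks are ignored as in the post.

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import comb, prod

# Letter counts from the post; letters on 3 or fewer tiles and blanks are left out
TILES = {"E": 12, "A": 9, "I": 9, "O": 8, "N": 6, "R": 6, "T": 6,
         "D": 4, "L": 4, "S": 4, "U": 4}
TOTAL_RACKS = comb(100, 7)

def ways(letters):
    # math.comb returns 0 when a rack asks for more copies than exist
    return prod(comb(TILES[ch], n) for ch, n in Counter(letters).items())

for word in ("ETAERIO", "TREASON", "TRAINED"):
    print(f"{word}: 1 in {TOTAL_RACKS / ways(word):,.0f}")

# Most likely rack (word or not) containing at least one 4-tile letter
best_with_rare = max(
    ways(rack)
    for rack in combinations_with_replacement(TILES, 7)
    if any(TILES[ch] == 4 for ch in rack)
)
print(best_with_rare < ways("ETAERIO"))  # True
```

The best such rack pairs one 4-tile letter with E, A, I, O and two of N, R, T, and still falls short of ETAERIO.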


BRIDGE HANDS

Now that we all remember the mechanics of combinations (or at least, we are on the subject of them), let's investigate another oft-asked question around here: what's the chance of being dealt a 7-card suit in bridge? This would be 4 (number of suits) times "13 choose 7" (ways to choose 7 cards from one suit) times "39 choose 6" (ways to choose the other 6 from the other 3 suits) divided by "52 choose 13" (ways to choose 13 cards from 52). Since the remaining 6 cards can't form a second 7-card suit, no hand is counted twice. This comes out to about 3.5%, or 3 or 4 times in every 100 hands.
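Here is the bridge formula as a short Python sketch; `p_suit_length` is just an illustrative name. It refuses suit lengths below 7, where two suits could have the same length and the factor of 4 would start double counting.

```python
from math import comb

HANDS = comb(52, 13)  # 635,013,559,600 possible 13-card hands

def p_suit_length(k):
    """Chance of a hand with a suit of exactly k cards (valid for k >= 7)."""
    if k < 7:
        # two suits could both have k cards, so the factor of 4 would double count
        raise ValueError("formula only valid for k >= 7")
    return 4 * comb(13, k) * comb(39, 13 - k) / HANDS

print(f"{p_suit_length(7):.1%}")  # 3.5%
```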

For an 8-card suit, it is about 1 in 214. For a 9-card suit, it is about 1 in 2,700. Of course, my kids are always asking about the chances of being dealt a 10-card suit or even a 13-card suit:
10-card suit: 1 in 60,738
11-card suit: 1 in 2,746,693 (less than 1 in a million)
12-card suit: 1 in 313,123,057 (less than 1 in 300 million)
13-card suit: 1 in 158,753,389,900 (less than 1 in 150 billion)

The chances aren't too great, but with some really poor shuffling, they've managed the 13-card suit once or twice.
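The whole table can be regenerated from the same formula ("4 times 13 choose k times 39 choose 13 minus k, divided by 52 choose 13") for k from 7 to 13. A minimal Python sketch:

```python
from math import comb

HANDS = comb(52, 13)

odds = {}
for k in range(7, 14):
    # 4 suits * ways to pick k of that suit * ways to pick the rest elsewhere
    favorable = 4 * comb(13, k) * comb(39, 13 - k)
    odds[k] = HANDS / favorable
    print(f"{k}-card suit: 1 in {odds[k]:,.0f}")
```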

10 comments:

Ira Skop said...

Thanks.

James said...

An interesting post, though ETAERIO is not acceptable in Scrabble. It's commonly believed that ANEROID is the most probable bingo. There are quite a few lists of the most likely 7s and 8s out there.

Grandpotato said...

Nice post.

However, you can still get the bingo with ETAERIO after the first turn by also building a 2-letter word at the same time.

phantomfive said...

Nice post.

I've been trying to figure out the chances that on your first move you will have a bingo in your hand. (I have a theory that in every game, you will probably have at least one bingo in your letters, if only you are observant enough to see it). That is a hard problem, though.

Phil Birnbaum said...

I don't think it's that hard a problem ... get the list of 7-letter words (available in a text file), and calculate the probability of each one. Then, add up all the probabilities.

Two complications: Blanks mean you have to calculate a lot more probabilities, probably 20 times as many. And, second, you have to make sure you don't count the same combination of tiles twice, such as TEACHER and CHEATER.
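Phil's recipe can be sketched in Python, ignoring blanks (his first complication). Anagrams are merged by sorting each word's letters (its "alphagram"), so TEACHER and CHEATER become one rack, which handles his second complication; since two different racks can't be drawn at once, their chances can simply be added. The tile counts are the standard English set, and the word list is assumed to be supplied by the caller (the tiny list below is purely illustrative, and words are assumed to contain only letters).

```python
from collections import Counter
from math import comb, prod

# Standard English Scrabble set: 98 lettered tiles plus 2 blanks (blanks ignored here)
TILES = dict(zip("ABCDEFGHIJKLMNOPQRSTUVWXYZ",
                 [9, 2, 2, 4, 12, 2, 3, 2, 9, 1, 1, 4, 2,
                  6, 8, 2, 1, 6, 4, 6, 4, 2, 2, 1, 2, 1]))
TOTAL_RACKS = comb(100, 7)

def rack_ways(letters):
    """Ways to draw exactly these 7 letters (0 if the bag can't supply them)."""
    return prod(comb(TILES[ch], n) for ch, n in Counter(letters).items())

def p_first_rack_bingo(words):
    """Chance the opening rack spells at least one 7-letter word in `words`."""
    # TEACHER and CHEATER share the alphagram ACEEHRT, so each rack is counted once
    alphagrams = {"".join(sorted(w.upper())) for w in words if len(w) == 7}
    # distinct racks can't occur together, so their chances add up
    return sum(rack_ways(a) for a in alphagrams) / TOTAL_RACKS

# Illustrative only; a real run would read a full 7-letter word list from a file
print(p_first_rack_bingo(["TEACHER", "CHEATER", "TREASON"]))
```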

Anonymous said...

OTARINE is the most likely seven-letter word, leaving ETAERIO in second place. However, these two words are acceptable only in the British/international lexicon, not in the American subset of the lexicon.

Several words tie for third place; of these, ANEROID is the first one when sorted by "alphagram" (the letters of the word arranged in alphabetical order, here ADEINOR), but AILERON is the first when sorted as words. Other equally likely words are ALIENOR, ATONIES, ELATION, ERASION and TOENAIL, and outside America also ALERION, OARIEST and OTARIES.
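Using the tile counts from the post (and again ignoring blanks), a short Python check confirms two parts of this comment: OTARINE's letters can be drawn in more ways than ETAERIO's, and the words listed as tied really do share one count. Each of them swaps one of OTARINE's 6-tile letters (N, R or T) for a 4-tile letter (D, L or S). The helper names are illustrative.

```python
from collections import Counter
from math import comb, prod

# Tile counts from the post (blanks ignored)
TILES = {"E": 12, "A": 9, "I": 9, "O": 8, "N": 6, "R": 6, "T": 6,
         "D": 4, "L": 4, "S": 4}

def ways(word):
    return prod(comb(TILES[ch], n) for ch, n in Counter(word).items())

def alphagram(word):
    return "".join(sorted(word))

print(alphagram("ANEROID"))                 # ADEINOR
print(ways("OTARINE") > ways("ETAERIO"))    # True

tied = ["ANEROID", "AILERON", "ALIENOR", "ATONIES", "ELATION", "ERASION",
        "TOENAIL", "ALERION", "OARIEST", "OTARIES"]
print({ways(w) for w in tied})              # {1119744}
```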

Anonymous said...

You don't need an 8 letter word to play all seven letters during the game.

ETAERIO could be played at the end of PAL, BUT, ROT, BAN, PEE, ANT*, TA, making your turn, respectively:

PALE,ETAERIO
BUTT,ETAERIO
ROTA,ETAERIO
BANE,ETAERIO
PEER,ETAERIO
ANTI,ETAERIO
TAO,ETAERIO

*ANTI might not be allowed, but I don't think ETAERIO is, either.

Scrabbler, James said...

I built the probability Scrabble dictionary to answer these questions for all English words using the English rule set, but the blank tiles really complicate the math, so I use binomial calculations without the blank tiles and with the blank tiles counted as only two tiles. Using just binomial coefficients, calculating the chances is a simple process for all words. I think the math is completely incorrect, though.

What do you think is the best way to factor in the blank tiles? Would they be considered 26 choose 1, since both blank tiles could be any letter?

~ James Cordeiro

Alan Salzberg said...

The wildcards (blank tiles) do complicate it because you cannot just add 2 to the combination for any given letter (you cannot say that the combinations of E's is 14 choose 2 instead of 12 choose 2), because that does not fully describe the possibilities with blanks. If you instead add 2 to all of them, then you over-count. It seems that one way is to add up all the combinations of 0, 1 or 2 blanks being available for any given letter, which would deal with inter-dependencies. Or you could figure out the chances of getting any five letters and both blanks, any 6 letters and one blank, and all seven letters (original formulation). I would have to think through it some more to get the answer and figure out if there is a simple way to deal with the wildcards.
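One way to carry out the case split described above, as a Python sketch (the function name and the standard English tile counts are assumptions here, not from the thread): for 0, 1 or 2 blanks, list each distinct group of the word's letters the blanks could stand in for, count the racks made from the remaining real tiles, and multiply by the ways of picking which blank tiles are drawn. A given rack fits exactly one such case, so the cases can be added without over-counting. In this view a blank only matters for the letters of the word it replaces, rather than "26 choose 1".

```python
from collections import Counter
from itertools import combinations
from math import comb, prod

# Standard English set: 98 lettered tiles plus 2 blanks
TILES = dict(zip("ABCDEFGHIJKLMNOPQRSTUVWXYZ",
                 [9, 2, 2, 4, 12, 2, 3, 2, 9, 1, 1, 4, 2,
                  6, 8, 2, 1, 6, 4, 6, 4, 2, 2, 1, 2, 1]))
BLANKS = 2

def racks_for(word, use_blanks=True):
    """Count 7-tile racks (out of comb(100, 7)) that can spell `word`."""
    need = Counter(word)
    total = 0
    for b in range((BLANKS if use_blanks else 0) + 1):
        # each distinct group of b letters that the b blanks could stand in for
        for replaced in set(combinations(sorted(word), b)):
            real = need - Counter(replaced)  # letters still drawn as real tiles
            total += comb(BLANKS, b) * prod(comb(TILES[ch], n) for ch, n in real.items())
    return total

no_blanks = racks_for("ETAERIO", use_blanks=False)  # matches the post's 1,539,648
with_blanks = racks_for("ETAERIO")
print(f"1 in {comb(100, 7) / no_blanks:,.0f} vs 1 in {comb(100, 7) / with_blanks:,.0f}")
```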