"Paperback" Reutter: probability


Wednesday, October 25, 2023

When the Odds Are Ever Not in Your Favor, Wordscapes Edition

I was introduced to Wordscapes in late 2019, and still lightly compete for crowns on the weekends while getting steps. During the week, there is a daily puzzle, but it otherwise feels like it's turned into a Tamagotchi game to give you rewards in the form of jeweled hearts (to get you more pets -- you can have one pet "active" at a time), cocoons (to hatch butterflies for a variety of settings), and binoculars (to find pieces of portraits that you can use).

For the month of October, there is a set of 12 portraits you can collect, and to get a portrait you need to find 3 "pieces" of it. With 8 of the 12 portraits collected, if each piece is roughly equally likely, I'd expect to get a new piece about 1/3 of the time. Instead, I found 111 duplicate pieces before getting another pumpkin piece. The probability of a streak like that is (2/3)^111, which is on the order of 10^-20, so that strongly suggests the pieces are not equally likely. Unfortunately, there's no indication of this in the interface.
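As a quick check of that figure, here is a minimal sketch. It assumes each draw is independent and that a new piece turns up with probability 1/3, which is the rough estimate above rather than anything the game publishes; the variable names are mine.

```python
from fractions import Fraction
import math

# Assumption: draws are independent and a "new" piece appears with
# probability 1/3 (4 of the 12 portraits were still unfinished).
p_new = Fraction(1, 3)
duplicates_in_a_row = 111

# Probability that 111 consecutive draws are all duplicates.
p_streak = (1 - p_new) ** duplicates_in_a_row

# Order of magnitude as a power of ten.
exponent = math.floor(math.log10(float(p_streak)))

print(f"P({duplicates_in_a_row} duplicates in a row) = {float(p_streak):.2e}")
print(f"order of magnitude: 10^{exponent}")
```

Under the equal-probability assumption the streak is so unlikely that the assumption itself is the natural thing to doubt.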

Friday, September 13, 2013

This rabbit hole leads to Triple Town data analysis

Well, how did I get here?

Commenting on a previous post, a friend suggested a new game to try.
The Wikipedia page for Gone Home notes it uses the Unity Engine.
The Unity Engine page lists Triple Town as a client app.
The Triple Town page notes some research done on the distribution of tiles.
This Triple Town Tribune post by Andrew Brown mentions that the data was collected by David de Kloet.
The data was originally shared in the comments of this post, which I found by searching on "David de Kloet triple town" in G+. (NOTE: I also asked Andrew if he had a copy of the data, which he was kind enough to share.)

Now, the distributions are a good start, but my first question is: does the probability of getting a particular tile next depend on the tile you've just been given? For example, if you've just been given a bear, are you more or less likely than the overall estimated probability of 0.15 to get another bear with your next tile?

A quick way to start examining this is to look at a crosstabulation of the tile you've just been given by the next tile. In this table, the rows show the current tile, and the columns show the next tile. Looking at the first row, what this means is that of the 1043 total times that a Bear appeared, it was followed by another Bear 157 times, by a Bush 157 times, by Grass 637 times, and so on.

The raw counts are useful, but it can be hard to compare rows to see if they're different. So now let's look at the row proportions. Again, the rows show the current tile, and the columns show the next tile. Looking at the first row, what this means is that of all the times that a Bear appeared, it was followed by another Bear 15.1% of the time, by a Bush 15.1% of the time, by Grass 61.1% of the time, and so on. Bears appear pretty consistently around the overall average of 15.1% of the time, though they seem to appear less often after a Hut (11.4%) and more often after a Tree (18.4%). However, Huts and Trees don't appear very often, so these differences could be due to chance.
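For anyone without SPSS, the counts and row proportions can be sketched in plain Python. The tile sequence below is a short made-up stand-in, not the data David de Kloet collected, and the function names are mine.

```python
from collections import Counter, defaultdict

# Hypothetical stand-in for the recorded tile sequence (NOT the real data).
tiles = ["Grass", "Bear", "Grass", "Bush", "Grass", "Grass",
         "Bear", "Bear", "Grass", "Tree", "Grass", "Bush"]

def transition_counts(sequence):
    """Count how often each tile (row) is followed by each tile (column)."""
    counts = defaultdict(Counter)
    for current, following in zip(sequence, sequence[1:]):
        counts[current][following] += 1
    return counts

def row_proportions(counts):
    """Divide each row of counts by that row's total."""
    return {
        current: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for current, row in counts.items()
    }

counts = transition_counts(tiles)
props = row_proportions(counts)

for current in sorted(props):
    cells = ", ".join(
        f"{nxt} {share:.1%}" for nxt, share in sorted(props[current].items())
    )
    print(f"{current:>5} -> {cells}")
```

The same division applied to the Bear row in the post, 157 / 1043, gives the 15.1% quoted above.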

So let's look at the chi-square test. SPSS Statistics produces both Pearson and Likelihood Ratio chi-square tests. Since the significance values of the tests are each under 0.05, this suggests that there is, in fact, a relationship between what tile you have now and the next tile you'll get, but I'm a little worried about the large number of cells with low expected cell counts. That can throw the results of the test off.
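To make the two statistics and the low-expected-count worry concrete, here is a rough sketch in plain Python. It is not the SPSS procedure itself, the 2x2 table is a toy example rather than the Triple Town counts, and the function name and the "expected count under 5" threshold are my choices. It assumes every row and column has a nonzero total.

```python
import math

def chi_square_stats(table):
    """Return (Pearson X^2, likelihood-ratio G^2, degrees of freedom,
    number of cells with expected count < 5) for a table of counts.
    Assumes every row total and column total is nonzero."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    pearson = 0.0
    g2 = 0.0
    low_expected = 0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            if expected < 5:
                low_expected += 1
            pearson += (observed - expected) ** 2 / expected
            if observed > 0:  # a zero cell contributes nothing to G^2
                g2 += 2 * observed * math.log(observed / expected)

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return pearson, g2, dof, low_expected

# Toy counts for illustration only (rows: current tile, columns: next tile).
toy_table = [[30, 10],
             [20, 40]]
pearson, g2, dof, low_expected = chi_square_stats(toy_table)
print(f"Pearson = {pearson:.3f}, LR = {g2:.3f}, df = {dof}, "
      f"cells with expected < 5: {low_expected}")
```

Turning either statistic into a significance value needs the chi-square distribution with `dof` degrees of freedom; if SciPy is available, `scipy.stats.chi2.sf(pearson, dof)` is one way to get it.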

So another set of tests to look at are pairwise comparisons of the column proportions. (NOTE: I really want to compare the row proportions, but that's not an option, so I've reorganized the table so that the current tile is in the columns and the next tile is in the rows.) At any rate, the tests suggest that when your current tile is a Tree, the distribution of your next tile is different from when your current tile is a Bear, Bush, or Grass.

Looking back at the table of proportions, what's not clear from the test is whether the detected statistically significant differences come from the relatively higher rate of Bears and lower rate of Bushes when the current tile is a Tree, or whether they come from the relatively higher rate of Huts and lower rates of Bots and Trees. The latter set of relative differences arises from very rare events, and I wouldn't trust results based on it. We could re-run the test while ignoring those columns, but for now I'm pretty comfortable saying that there is no practically significant relationship between the current tile and the probability distribution of the next tile.

These tables were produced using this SPSS Statistics syntax on this text file.

Tuesday, April 23, 2013

Fun with probability and dice

My 3rd grader's class is doing money, fractions, and probability, and I've been thinking up a brief parent volunteer lesson plan to talk about probability with dice. The idea is to have two people each roll a die with a different number of sides, and talk about how likely it is that one or the other rolls the higher number.

I'd start with d4 vs. d6; there are 24 possibilities and it's easy enough to work with the kids to enumerate all of them. There are 14 wins for the d6, 6 for the d4, and 4 ties.

d6 vs. d8 is tougher to do in a short lesson; we'll work out that there are 48 possible outcomes, and there are 6 ties in those outcomes. Maybe we'll have time to work out that the number of wins for the d6 is 1+2+3+4+5 = 15, or the sum of the numbers from 1 to (number of sides - 1), and therefore the number of wins for the d8 is 48 - 6 - 15 = 27, but that's probably too much to get through in a single day.

So we're all set, except... in d4 vs. d6, 14 - 6 = 8 = 2 * 4; in d6 vs. d8, 27 - 15 = 12 = 2 * 6. Hunh.

Dice           Larger Wins   Smaller Wins   Ties
d4 vs. d6           14             6          4
d6 vs. d8           27            15          6
d8 vs. d10          44            28          8
d10 vs. d12         65            45         10
d12 vs. d20        162            66         12

The difference in the number of wins is the number of sides on the smaller die times the difference in the size of the dice. I'd never noticed this before. Cool.
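The table and the pattern are easy to check by brute force. This is a small sketch that enumerates every outcome for each pair of dice; the function name is my own.

```python
def roll_off(smaller, larger):
    """Enumerate every outcome of a d<smaller> vs. a d<larger> and
    return (larger-die wins, smaller-die wins, ties)."""
    larger_wins = smaller_wins = ties = 0
    for a in range(1, smaller + 1):      # roll of the smaller die
        for b in range(1, larger + 1):   # roll of the larger die
            if b > a:
                larger_wins += 1
            elif a > b:
                smaller_wins += 1
            else:
                ties += 1
    return larger_wins, smaller_wins, ties

for smaller, larger in [(4, 6), (6, 8), (8, 10), (10, 12), (12, 20)]:
    big, small, ties = roll_off(smaller, larger)
    # The pattern: difference in wins = smaller die's sides * size difference.
    assert big - small == smaller * (larger - smaller)
    print(f"d{smaller} vs. d{larger}: larger {big}, smaller {small}, ties {ties}")
```

The loop covers only the five pairings in the table; any other pair of dice can be passed to `roll_off` the same way.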

The visual explanation for this relationship is as follows (looking at the possible outcomes of d4 vs. d6):

1,1 2,1 3,1 4,1
1,2 2,2 3,2 4,2
1,3 2,3 3,3 4,3
1,4 2,4 3,4 4,4
1,5 2,5 3,5 4,5
1,6 2,6 3,6 4,6

The red area above the diagonal shows where the smaller die wins, the blue on the diagonal shows the ties, and the green below the diagonal shows where the larger die wins. In the first four rows, the numbers of wins are balanced, but once past the possibility of a tie, the larger die wins every possible outcome in the row. The number of outcomes in each row is the size of the smaller die, and the number of rows in which the larger die wins all possible outcomes is the difference in the sizes of the dice.
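The same picture can be written as a short count. The symbols are my own notation, not from the lesson: m is the number of sides on the smaller die and n the number on the larger.

```latex
\begin{align*}
\text{ties} &= m \\
\text{smaller-die wins} &= 1 + 2 + \dots + (m-1) = \frac{m(m-1)}{2} \\
\text{larger-die wins} &= \frac{m(m-1)}{2} + m(n-m) \\
\text{larger wins} - \text{smaller wins} &= m(n-m)
\end{align*}
```

For d4 vs. d6 this gives 4 ties, 6 wins for the d4, and 6 + 8 = 14 wins for the d6, matching the table.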

Ah, of course. It looks obvious from this angle, but wasn't immediately so coming at it from the other side.