Wsrtjtyk

Question

(a) Two monkeys are typing capital letters (A-Z) randomly. The first stops typing when the word COCONUT appears as seven successive letters. The second stops typing when TUNOCOC appears; TUNOCOC is simply COCONUT spelt backwards. Which monkey is expected to type more?

(b) A monkey is typing capital letters (A-Z) randomly. Two squirrels observe the sequence of letters thus generated. The first squirrel wins if COCONUT appears (as seven successive letters) before TUNOCOC. The second squirrel wins if TUNOCOC appears before COCONUT. Which squirrel is more likely to win?

Is there any way to solve it without using markov chains?

@mukyuu How do people win if dogs are running? It's called betting... — Nov 23 '18 at 9:48
Welcome to puzzling :) This question is better suited to MATH.SE — Nov 23 '18 at 10:32
@EricDuminil I think they'd be equivalent if (b) had two monkeys typing and each squirrel was watching a different monkey. It's a different scenario when both are watching the same monkey. — Nov 23 '18 at 11:43
Please clarify: in a, do the monkeys generate separate input strings or do they co-operate on one? — Nov 23 '18 at 17:22
Clarification? Are the two monkeys typing on two computers, each creating their own string of letters? Or are the typing parallel on the same keyboard, creating a unified line of letters? — Nov 26 '18 at 13:37

score 45 · Accepted Answer · 2018-11-23 14:39:02Z

(a) I claim that the expected typing length are the same for both monkeys. I guess something in my argument will be incorrect, as jafe's answer has 9 approvals, but finding that incorrectness would be helpful to me.

I prove that the probabilty that a monkey ends his typing after exactly $n$ letters is the same for both monkeys, which then implies that the expected typing lengths are the same.

In order for the typing to end after letter number $n$, the last seven letters must be exactly the monkey's preferred word "COCONUT" / "TUNOCOC" and that word cannot ever have appeared before in the text. That word appearing before can be of two kinds:

It could share some letters with the last 7, or

be completely contained in the first $n-7$ letters.

It's easy to see that for the first monkey, case 1 can not happen: If the word "COCONUT" shared some of its letters with the last 7 letters, then the last letter "T" would have to appear a second time in "COCONUT" (besides last place), which it doesn't.

The argument is slightly more complicated for the second monkey, as the last letter ("C") of "TUNOCOC" does appear a second time in it. But "TUNOC" is not the ending part of "TUNOCOC", so also for the second monkey case 1. cannot happen.

Let's recap: In order for each monkey to end typing after exactly $n$ letters, the text it produced must fullfill two criteria:

A) The last seven letters must be exactly the monkey's preferred word
"COCONUT"/"TUNOCOC", and

B) The previous $n-7$ letters cannot contain
the monkey's preferred word.

This is an equivalence: Any sequence of $n$ letters that fulfills A and B (for any of the two monkeys) is a sequence that has that monkey stop after exactly $n$ letters, and vice versa.

Since each letter typed is independent of others, the events that A or B aply to a sequence of $n$ letters (for a given monkey) are also independent, so the respective probabilities can be multiplied. If we use indices $1$ and $2$ to apply for the first and second monkey, we obviuously get

$$ p_1(A)=p_2(A)=left(frac1{26}right)^7$$

Calculating $p_{1/2}(B)$ directly is much harder, but they are the same: Both consider sequences of $n-7$ independent, uniformly distributed letters, and if and only if such a sequence does (does not) contain the word "COCONUT", then the reversed sequence (which is again a sequence of $n-7$ independent, uniformly distributed letters) does (does not) contain the word "TUNOCOC".

So we get

$$p_1(B) = p_2 (B)$$ and this proves my initial claim: The probability that the typing ends after exactly $n$ letters is $p_1(A)p_1(B)$ for the first monkey and $p_2(A)p_2(B)$ for the second, and they are the same.

+++++++

I cannot say why jafe's argument is incorrect, it is very compelling, and as I said initially, it may very well be that it is correct and something I said is incorrect. I just don't see what that may be, in either case.

ADDITION after more thinking and reading the comments:

I think the flaw in jafe's argument is in double counting after "COCOCO". If the next letter is "N", this is counted as 'good' for both the ending "COCO" and continuation "NUT" and the intitial "COCO" and continution "CONUT". However, both can only lead to the same end result "COCOCONUT", the additional "CONUT" option is not really additional.

The caveat is that you show that the proba for each monkey to end after $n$ letters is the same assuming that they both type at least $n$ letters. But the monkey A has a higher probability of having finished beforehand. — Nov 23 '18 at 12:30
How can that be? If the first monkey has a higher probabilty to finish beforehand than monkey 2, it has to have a higher probabilty to have finished after exactly $k$ letters for some $k < n$, and then my argument applies again. In addition, I do not assume that they have typed at least $n$ letters, I have given exact condtions for that to be true and shown that they have equal probability. — Nov 23 '18 at 12:42
Again, if you think that the first monkey has a higher probabilty to finish beforehand, which do you think is the first letter index at which the first monkey has a higher probability to have finished? — Nov 23 '18 at 12:50
I am now convinced that this is the correct answer. Very confusing problem, but the flaw is in the other reasoning (see my comment down Viktor Mellgren's answer) — Nov 23 '18 at 13:49
I wrote a Python script with a smaller alphabet ('ABCNOTU'). Both monkeys seem to need around 800000 letters before writing the desired word. I repeated the experiment 300 times and couldn't find any bias toward monkey 1 or 2. — Nov 23 '18 at 14:11

score 30 · Accepted Answer · 2018-11-25 16:23:18Z

30

(a)

Edit: This is incorrect, see comments

The TUNOCOC monkey is expected to type more.

Regardless of the previous letters, the TUNOCOC monkey has 1/26 chance of getting the next letter right. Clearly the COCONUT monkey has at least this, but turns out it does even better: Once it has typed COCO, it has two ways to get the correct result (NUT or CONUT).

edited Nov 25 '18 at 16:23

answered Nov 23 '18 at 7:56

jafe

24.6k472246

4

$begingroup$
Sounds very compelling, still I think it is wrong, see my solution below. Would be very interested in you (and everybody else, of course) to check my approach.
$endgroup$
– Ingix
Nov 23 '18 at 12:12

3

$begingroup$
I reduced the problem to the monkeys typing FFA or AFF which I think is equivalent. Then I made a computer simulation where I had them imput random characters and select a winner. The simulation does NOT support your answer. They both were winner the same amount of times.
$endgroup$
– Pieter B
Nov 23 '18 at 13:00

1

$begingroup$
PieterB & Evargalo, what is the characterset of the typewriter in the FFA example? I think it needs a third 'wrong' character.
$endgroup$
– elias
Nov 23 '18 at 13:52

2

$begingroup$
This answer indeed appears to be wrong but I don't know why. ¯_(ツ)_/¯
$endgroup$
– Eric Duminil
Nov 23 '18 at 14:12

2

$begingroup$
I listed out all combinations of C,O,U,N,T for up to 11 letters. Indeed both COCONUT and TUNOCOC show up the same number of times for each letter count. D'oh!
$endgroup$
– jafe
Nov 23 '18 at 14:12

|
show 12 more comments

score 25 · Accepted Answer · 2018-11-27 04:39:37Z

Monkey problem

To settle down which monkey is faster on average, I'll use Markov chains and Mathematica. Define a state $i$, for $i = 0..6$, as that the monkey 1 has currently written $i$ correct subsequent characters of COCONUT, but has never written all of COCONUT. Define a state 7 as the monkey 1 having written COCONUT at least once. Since state 7 cannot be escaped, we say that it is absorptive (other states are transient). The transition matrix for monkey 1 is:

P1 = {

    {25, 1, 0, 0, 0, 0, 0, 0}, 

    {24, 1, 1, 0, 0, 0, 0, 0},

    {25, 0, 0, 1, 0, 0, 0, 0},

    {24, 1, 0, 0, 1, 0, 0, 0},

    {24, 0, 0, 1, 0, 1, 0, 0},

    {24, 1, 0, 0, 0, 0, 1, 0},

    {24, 1, 0, 0, 0, 0, 0, 1},

    {0, 0, 0, 0, 0, 0, 0, 26}

} / 26;

Here $P1_{ij}$ is the probability that the monkey 1 transitions from state $i$ to state $j$ on pushing a key. The Markov chain can be visualized as a graph:

p1 = DiscreteMarkovProcess[{1, 0, 0, 0, 0, 0, 0, 0}, P1];

Graph[Table[StringTake["COCONUT", x], {x, 0, 7}], p1, 

VertexSize -> 0.6, GraphLayout -> {VertexLayout -> "CircularEmbedding" }]

Markov chain for monkey 1

Next up, we extract the sub-matrix which does not contain the absorptive state:

Q1 = P1[[1 ;; 7, 1 ;; 7]];

As described here, the expected time for the monkey 1 to write COCONUT (expected time to absorption) starting from state $i$ is given by

N1 = Inverse[IdentityMatrix[7] - Q1]

t1 = N1.{1, 1, 1, 1, 1, 1, 1};

The exact solution is given by:

-      8031810176

C      8031810150

CO     8031809500

COC    8031792574

COCO   8031352524

COCON  8019928800

COCONU 7722894400

The transition matrix for monkey 2 is given by:

P2 = {

    {25, 1, 0, 0, 0, 0, 0, 0}, 

    {24, 1, 1, 0, 0, 0, 0, 0},

    {24, 1, 0, 1, 0, 0, 0, 0},

    {24, 1, 0, 0, 1, 0, 0, 0},

    {24, 1, 0, 0, 0, 1, 0, 0},

    {24, 1, 0, 0, 0, 0, 1, 0},

    {24, 1, 0, 0, 0, 0, 0, 1},

    {0, 0, 0, 0, 0, 0, 0, 26}

} / 26;

The Markov chain for monkey 2 visualized as a graph:

p2 = DiscreteMarkovProcess[{1, 0, 0, 0, 0, 0, 0, 0}, P2];

Graph[Table[StringTake["TUNOCOC", x], {x, 0, 7}], p2, 

VertexSize -> 0.6, GraphLayout -> {VertexLayout -> "CircularEmbedding" }]

Markov chain for monkey 2

By following the same procedure as for monkey 1, we solve $t2$ as:

-      8031810176

T      8031810150

TU     8031809500

TUN    8031792600

TUNO   8031353200

TUNOC  8019928800

TUNOCO 7722894400

Because the monkeys start from the situation when nothing has been written yet, we see that the expected time for them to write their word is the same $C = 8031810176$

This result can also be obtained directly in Mathematica:

d1 = FirstPassageTimeDistribution[p1, 8];

Mean[d1]

In fact, we can compute the characteristic function for the absorption-time-distribution:

CharacteristicFunction[d1, x]

which evaluates to

e^(7ix) / (C - C e^(ix) + e^(7ix))

The distributions for both monkeys have this same characteristic function. Wikipedia claims that cumulative distribution functions and characteristic functions are in one-to-one correspondence. This implies that the monkeys' finishing-time-distributions are the same.

Squirrel problem

The transition matrix for a monkey aiming for both words COCONUT and TUNOCOC is:

P3 = {

{24, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0},

{23, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0},

{24, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0},

{23, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0},

{23, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0},

{23, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0},

{24, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0},

{23, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0},

{23, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0},

{23, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0},

{24, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0},

{23, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0},

{24, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1},

{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 26, 0},

{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 26}} / 26

Here the states have been numbered in order - C CO COC COCO COCON COCONU T TU TUN TUNO TUNOC TUNOCO COCONUT TUNOCOC. Note that the absorbing states come last. The same reference as above shows that we can compute the probability $B3_{ij}$ of ending from a transient state $i$ to an absorption state $j + 13$ as follows:

Q3 = P3[[1 ;; 13, 1 ;; 13]];

R3 = P3[[1 ;; 13, 14 ;; 15]];

N3 = Inverse[IdentityMatrix[13] - Q3]

B3 = N3.R3

The matrix $B3 * 617830874$ is

        COCONUT   TUNOCOC

-       308915099 308915775

C       308915100 308915774

CO      308915125 308915749

COC     308915776 308915098

COCO    308932701 308898173

COCON   309372075 308458799

COCONU  320796475 297034399

T       308915098 308915776

TU      308915073 308915801

TUN     308914423 308916451

TUNO    308897523 308933351

TUNOC   308458124 309372750

TUNOCO  297033749 320797125

Since the monkey starts from nothing, the probability for writing COCONUT and TUNOCOC is 308915099 and 308915775, respectively, divided by 617830874. These correspond to 0.4999994529 and 0.5000005471, computed with Mathematica accurate to 10 decimal places. Therefore the monkey is more probable to write TUNOCOC than COCONUT.

Code

Here is the Python-code I used to generate the transition matrices.

def computeTransitionMatrix(words, keys):

    def properPrefixes(word):

        return (word[:i] for i in range(len(word)))



    def suffixesInDecreasingLength(word):

        return (word[i:] for i in range(len(word) + 1))



    prefixToState = {}

    stateToPrefix = 



    def addState(prefix):

        if prefix in prefixToState:

            return

        prefixToState[prefix] = len(stateToPrefix)

        stateToPrefix.append(prefix)



    # Create a state for each proper prefix of the word.

    for word in words:

        for prefix in properPrefixes(word):

            addState(prefix)



    # Create a state for each word last.

    for word in words:

        addState(word)



    print(stateToPrefix)



    # Number of states.

    nStates = len(stateToPrefix)



    # Compute the (scaled) transition probabilities.

    transitions = 

    for i in range(nStates):

        row = [0] * nStates

        transitions.append(row)



        prefix = stateToPrefix[i]

        if prefix in words:

            # The word is an absorptive state.

            row[i] = len(keys)

            continue

        for key in keys:

            nextPrefix = prefix + key

            # Find the longest suffix which

            # is a prefix of the word. 

            for suffix in suffixesInDecreasingLength(nextPrefix):

                j = prefixToState.get(suffix)

                if j != None:

                    row[j] += 1

                    break



    return transitions



# The words the monkey is supposed to write.

words = ['COCONUT', 'TUNOCOC']



# Keys available on monkey's keyboard.

keys = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'



transitions = computeTransitionMatrix(words, keys)

print(repr(transitions).replace('[', '{').replace(']','}').replace('}, ', '},n'))

Thanks for doing the work of establishing the Markov matrices and doing the calculations! It should be noted that the discussed effects about the repeated sub-words "CO" and "OC" can be seen: the expected value vectors start and and end the same, but have different values for COC/TUN and COCO/TUNO. The differences just vanish again in the end. The reason for this is IMO that COCONUT can not overlap with itself in a text (like e.g. NUTNU could). That means that for all such words of the same length (say "TOBACCO"), the expected time should be exactly the same. — Nov 24 '18 at 19:18
It means the expected time is simply 26**7. How anticlimactic! :) — Nov 25 '18 at 9:09
Thanks. This is a great answer for learning some cool tricks in Mathematica. — Nov 26 '18 at 8:26
This is a great answer! In my answer I calculated the probabilities for the two squirrels in the second problem too, but my result is slightly more in favour to the second squirrel (rounds to around 50.037 %). I'd expect the methods used to be equivalent, so maybe there's some typo or silly error in one of the two answers? I'll check when I have more time — Nov 27 '18 at 9:30

score 16 · Accepted Answer · 2018-11-23 14:48:15Z

16

(a) has been adequately answered by

Ingix.

In (b), though:

The second squirrel is more likely to win.

Consider:

A simpler example, where instead of typing "COCONUT" or "TUNOCOC", the monkeys are trying to type "AB" or "CA". Also, the keyboard only has "A", "B", and "C" keys. Consider what happens after the first three keystrokes, for which there are 27 possibilities.

In this case:

In 5 out of the 27 possiblities, "AB" appears and "CA" does not. Similarly, in 5 of the possiblities, "CA" appears and "AB" does not. But there's also one possibility, "CAB", where both appear, and squirrel 2 wins. In this instance, the "A" is useful for both, but it was useful for the second squirrel earlier. The "setup" for the first squirrel to match will often result in the second squirrel already having matched.

Now consider:

A similar thing is true of "COCONUT" versus "TUNOCOC". The first squirrel wants "COC" as a setup, but if "COC" does appear, it's often at the tail end (pun intended) of the second squirrel's victory.

edited Nov 23 '18 at 14:48

answered Nov 23 '18 at 14:29

Sneftel

1,5971019

$begingroup$
It sounds plausible. But it would also mean that if there's any bias, it will be probably too small to be detected by random simulations, right?
$endgroup$
– Eric Duminil
Nov 23 '18 at 18:45

$begingroup$
@Duminil correct. Even simulating a single play-through would be fairly expensive.
$endgroup$
– Sneftel
Nov 23 '18 at 20:11

$begingroup$
Can't we check the CAB-Thesis, very cheaply and extrapolate from there?
$endgroup$
– Falco
Nov 26 '18 at 13:46

$begingroup$
@Falco Sure, but there's no need for that. The situation is simple enough to solve analytically based on the initial conditions and the transition matrix. In that situation, the second squirrel wins precisely 60% of the time. Of course, I don't offer a formal proof that it is valid to extrapolate from AB to COCONUT.
$endgroup$
– Sneftel
Nov 26 '18 at 14:51

$begingroup$
I ran a simulation with "C,O,N,U,T,X" as possible letters (since the number of dummy-letters will only make the bias smaller. And I got a fairly consistent bias of over 50% wins for the second squirrel
$endgroup$
– Falco
Nov 26 '18 at 14:51

|
show 2 more comments

score 15 · Accepted Answer · 2018-11-23 14:42:34Z

15

@Ingix provides a good proof, one certainly worthy of the tick, but one that could perhaps use some reinforcement. The answer is that the N number of letters would have the same expected value. A few points that might help @jafe realise the flaw in his proof.

The order of independent events occurs has no directional bias. For example, drawing a King Of Hearts followed by an Ace Of Hearts, from a 52 card deck is no more probable than an Ace followed by a King.

Whilst the COCOCONUT argument might seem to hold water, it's first important to realise that the probability of that 9 letter sequence is the same as the probability of DOCOCONUT, or for that matter TUTUNOCOC having the initial CO does not gain anything to the probability as there's still a 1/26 chance of getting the next letter correct.

The apparent gain of the fallback that COCOCONUT gives you is offset by the amount of extra letters that you have to type. If you get the sequence COCOCONUX, you have wasted 9 letters, whereas TUNOCOX only wastes 7. (And COCOCONUX is far more improbable than TUNOCOX).

TLDR: the order of independent variables has no favoured direction.

(P.S. This is my first stack answer, so feel free to give any constructive feedback)

edited Nov 23 '18 at 14:42

answered Nov 23 '18 at 14:22

WittierDinosaur

1594

2

$begingroup$
Makes sense! Welcome to Puzzling.
$endgroup$
– jafe
Nov 23 '18 at 14:25

1

$begingroup$
I'm still confused about this, but your 3) point seems very good to me
$endgroup$
– George Menoutis
Nov 23 '18 at 14:26

$begingroup$
Welcome to Puzzling! Yup, your first two points are exactly right. As for the wastage of letters, I don't think it's really a good argument. I'm not sure it works that way. IMO the reason why COCOCONUT is a false benefit, is because when considering the probability of the events happening, you are assuming the letters before the last 7 are essentially random. This means each letter has 26 possibilities. As such, the combination of CO- as a prefix to COCONUT is already considered within the probability count. To factor it in again is double counting.
$endgroup$
– Arch2K
Nov 23 '18 at 14:52

$begingroup$
Thanks for your feedback @Ong Yu Hann. The difference between the 3rd point and the others is that the first 2 are easily provable mathematical facts. The 3rd is simply to help people logically understand why COCOCOCO doesn't provide a probabilistic benefit.
$endgroup$
– WittierDinosaur
Nov 23 '18 at 14:58

$begingroup$
While it's true that COCONUT and TUNOCOC are equivalent, you are making a far more sweeping (and false) statement when you imply that all sequences of the same length are equivalent. For a simple counterexample that can be checked by hand (using Markov chains, or even just by trying it): if you flip coins over and over, on average it will take you 6 coinflips until you see HH, but only 4 coinflips until you see HT.
$endgroup$
– Misha Lavrov
Nov 27 '18 at 1:14

|
show 3 more comments

score 7 · Accepted Answer · 2018-11-26 15:46:13Z

There is a clever trick to finding the kind of expected times that appear in part (a). It turns out that both for COCONUT and TUNOCOC, the expected time is just $26^7$ (so the monkeys have equal expected times to win), but for this to be convincing, I will demonstrate the method on a case where the expected value is unexpected: the word ABRACADABRA.

Suppose that a monkey is typing randomly, and a family of beavers decides to bet on the outcome. The rules for betting are as follows:

Right before a new letter is typed, a new beaver shows up and is given $1$ dollar. The beaver bets that dollar on that letter being A, at $26:1$ odds.

If the beaver wins (and now has $26$ dollars), the beaver bets all of them on the next letter being B.

If the beaver wins (and now has $26^2$ dollars), the beaver bets all of them on the next letter being R.

In general, whenever the beaver wins the first $k$ bets (and has $26^k$ dollars), the beaver bets all of them on the next, $(k+1$)-th letter still matching the $(k+1)$-th letter of ABRACADABRA.

Any time a beaver loses a bet, that beaver has no more money and goes home. (But at any time, multiple beavers may be betting, because a new beaver shows up right before each letter.)

Each bet is fair, so after $n$ letters are typed, the expected number of money the beavers have altogether is $n$: the $n$ dollars that the beavers get when they show up.

When ABRACADABRA first appears, we can count the total amount of money in the system:

One lucky beaver has $26^{11}$ dollars from betting on ABRACADABRA all the way through.

One beaver started on the ABRA at the end of ABRACADABRA, and has $26^4$ dollars.

One beaver just started betting before the last A, and has $26$ dollars.

All other beavers lost their bets and have no money.

So we are guaranteed to have $26^{11} + 26^4 + 26$ dollars in the system. But the expected amount of money in the system is equal to the number of letters typed. So in expectation, it takes $26^{11} + 26^4 + 26$ letters to type ABRACADABRA.

(Formally, we are invoking Wald's equation to see that the rule "the expected money in the system is equal to the expected number of letters typed" still holds when the number of letters typed depends on the letters that have been typed - when we stop once ABRACADABRA is reached)

Now, going back to COCONUT and TUNOCOC: neither word ends with an initial segment of the word. (The CONUT in COCONUT doesn't change anything.) If the beavers were betting on COCONUT, then at the end when COCONUT is typed, one beaver would have $26^7$ dollars and no other beavers would have anything. The same is true for TUNOCOC. So the expected time is just $26^7$ in both cases.

(A weird consequence is that a word like AAAAA has a longer expected waiting time than a word like ABCDE. To explain this intuitively, consider that if ABCDE hasn't been typed yet, the last five letters can can be any of AABCD, BABCD, CABCD, ..., ZABCD, giving us 26 ways to win on the next letter. But if AAAAA hasn't been typed yet, the last five letters can only be any of BAAAA, CAAAA, DAAAA, ..., ZAAAA, giving us only 25 ways to win.)

This sounds exciting. Could you add a note on how Wald's equation is applied here? Referring to Wikipedia's page, what is the sequence (X_n), what is N, and how to see that X_N are i.i.d., and in particular why N is independent of the sequence (X_n)? — Nov 26 '18 at 23:03
We are using the general version, so we don't need either of the requirements you mention. Here, $N$ is the number of steps taken and $X_n$ is the change in how much money there is in the system at step $n$. Condition 2 in Wikipedia's article essentially requires that $N$ be a stopping time: $N ge n$ is independent of $X_n, X_{n+1}, X_{n+2}, dots$ (so it does not require telling the future). — Nov 26 '18 at 23:13
Thanks, I somehow missed the general version. I replicated that the mean time for ABRACADABRA is indeed $26^{11} + 26^4 + 26$ using Markov chains as in my answer. This answer is probably the best one for problem (a) this far. — Nov 27 '18 at 1:04

score 5 · Accepted Answer · 2018-11-27 22:29:39Z

Problem (a) has been adequately covered in other answers.

In this answer I'll show that in problem (b), the second squirrel has better chances.

First (for introduction purposes only) we'll see a simplified version of the problem, that is simple enough to be run on a simulation

Then we'll see how we can exactly calculate the probability that the second squirrel will win, in this simplified example. The method that we'll use does not depend on any simplification, and can be applied to the original problem as well.

We'll see that the result that we obtain with this method matches with the results of the simulation.

Having then some extra confidence in the method (not that it's really needed, since it's fairly clear), we'll apply it to the original problem and obtain a result.

Simplified problem:

A monkey is typing capital letters from an alphabet that contains the letters A, B and X. Two squirrels observe the sequence of letters thus generated. The first squirrel wins if AAB appears (as three successive letters) before BAA. The second squirrel wins if BAA appears before AAB. Which squirrel is more likely to win?

If chosen this problem because it has a similarity with the original, which I believe is crucial: "AAB" can overlap with "BAA" only in the form "AABAA", while in the other way around it can be either "BAAAB" or "BAAB". Just like "COCONUT" followed by "TUNOCOC" can be overlapped only in the form "COCONUTUNOCOC", while, the other way around, they can be overlapped like "TUNOCOCOCONUT" or like "TUNOCOCONUT".

Note however that we'll be solving the actual problem in the end. The only reason I'm including a simpler version is to explain the method on a simpler case, and because for the simpler problem we can run a simulation to verify our predictions.

Explanation

We must realize that at any point in the game, any of the two strings can be partly under construction, or not. More importantly, the only thing that affects the probabilities at any point in the game is how much of each string is already written. If, for example, the two last letters of the string so far are "XA", that means that the string AAB is partly written up to "A", and the string BAA is not partly written at all. This and only this is what determines which of the two squirrels is better off at this point, regardless of what has been written so far before "XA".

Therefore the state of the game at any given point is defined exclusively by the longest suffix of the sequence written so far that is also a prefix of at least one of the strings that the squirrels are betting at.

Thus this game has exactly seven different possible states, which are:

$epsilon$ (empty string, meaning no matching suffix)

$verb|A|$

$verb|AA|$

$verb|AAB|$

$verb|B|$

$verb|BA|$

$verb|BAA|$

On each of these different states, the squirrels may have different chances of winning. Let's note $P_epsilon$ as the chances that the second squirrel has when the state is $epsilon$, $P_{A}$ as the chances that he has when the state is $verb|A|$, etc. What we're interested in is $P_epsilon$, because that's the initial state of the game.

We can easily formulate each of this probabilities in terms of the others. From state $epsilon$ there's a 1/3 chance of writing an "A", thus moving to state A. There's also a 1/3 chance of transitioning to B, and finally a 1/3 chance of staying at e (if an "X" is written). That means that the chances for the second squirrel on state e are:

$$
P_{epsilon} = frac{1}{3} P_{A} + frac{1}{3} P_{B} + frac{1}{3} P_{epsilon}
$$

We can similarly express the chances for all the other states. Note that $P_{AAB} = 0$, because the second squirrel loses immediately on that state, and $P_{BAA} = 1$, because it wins immediately. The full list is:

$$
begin{align*}
P_epsilon &= frac{1}{3} P_{A} + frac{1}{3} P_{B} + frac{1}{3} P_epsilon \
P_{A} &= frac{1}{3} P_{AA} + frac{1}{3} P_{B} + frac{1}{3} P_epsilon \
P_{AA} &= frac{1}{3} P_{AAB} + frac{1}{3} P_{AA} + frac{1}{3} P_epsilon \
P_{AAB} &= 0 \
P_{B} &= frac{1}{3} P_{BA} + frac{1}{3} P_{B} + frac{1}{3} P_epsilon \
P_{BA} &= frac{1}{3} P_{BAA} + frac{1}{3} P_{B} + frac{1}{3} P_epsilon \
P_{BAA} &= 1
end{align*}
$$

This is simply a system of linear equations, and it's simple enough to be solved by hand. The solution is $P_epsilon = frac{8}{13}$ or approximately 61.54%. These are the chances for the second squirrel when the game starts. The chances for the first squirrel are $frac{5}{13}$, about 38.46%.

Running the simplified problem

Now, because this game has only three letters, the average game will be pretty short, so much in fact that we can play it enough times to have some statistics. The following program (in Java) simulates 10 million games and prints the percentage of wins for each squirrel. I've run it several times and the results are always in the range of 38.4 - 38.5% of wins for the first squirrel, and 61.5 - 61.6% of wins for the second squirrel, which matches very well with our previous calculations.

public class Monkeys {



     private enum Squirrel {

         FIRST_SQUIRREL,

         SECOND_SQUIRREL

     }



     private static final Random random = new Random();

     private static final char letters = "ABX".toCharArray();



     public static void main(String args) {



         int winsF = 0;

         int winsS = 0;

         int iterations = 10000000;



         for (int i = 0; i < iterations; i++) {



             Squirrel winner = playGame();

             if (winner == Squirrel.FIRST_SQUIRREL) {

                 winsF++;

             } else if (winner == Squirrel.SECOND_SQUIRREL) {

                 winsS++;

             }

         }



         System.out.println("First squirrel wins:  " + ((double) winsF) / (winsF + winsS));

         System.out.println("Second squirrel wins: " + ((double) winsS) / (winsF + winsS));



     }



     private static Squirrel playGame() {

         StringBuilder sb = new StringBuilder();

         while (true) {

             if (sb.length() >= 3) {

                 if (sb.substring(sb.length() - 3, sb.length()).equals("AAB")) {

                     return Squirrel.FIRST_SQUIRREL;

                 }

                 if (sb.substring(sb.length() - 3, sb.length()).equals("BAA")) {

                     return Squirrel.SECOND_SQUIRREL;

                 }

             }

             sb.append(letters[random.nextInt(letters.length)]);

         }

     }

 }

Actual problem

Great! So the only thing we need to do now is apply the same method for the original problem. Following the same reasoning, in the original problem there are 15 distinct states:

$epsilon$

$verb|C|$

$verb|CO|$

$verb|COC|$

$verb|COCO|$

$verb|COCON|$

$verb|COCONU|$

$verb|COCONUT|$

$verb|T|$

$verb|TU|$

$verb|TUN|$

$verb|TUNO|$

$verb|TUNOC|$

$verb|TUNOCO|$

$verb|TUNOCOC|$

The chances for the second squirrel in each of the states are (thanks @kaba for pointing a previous error):

$$
begin{align*}
P_epsilon &= frac{1}{26} P_{C} + frac{1}{26} P_{T} + frac{24}{26} P_epsilon \
P_{C} &= frac{1}{26} P_{CO} + frac{1}{26} P_{C} + frac{1}{26} P_{T} + frac{23}{26} P_epsilon \
P_{CO} &= frac{1}{26} P_{COC} + frac{1}{26} P_{T} + frac{24}{26} P_epsilon \
P_{COC} &= frac{1}{26} P_{COCO} + frac{1}{26} P_{C} + frac{1}{26} P_{T} + frac{23}{26} P_epsilon \
P_{COCO} &= frac{1}{26} P_{COCON} + frac{1}{26} P_{COC} + frac{1}{26} P_{T} + frac{23}{26} P_epsilon \
P_{COCON} &= frac{1}{26} P_{COCONU} + frac{1}{26} P_{C} + frac{1}{26} P_{T} + frac{23}{26} P_epsilon \
P_{COCONU} &= frac{1}{26} P_{COCONUT} + frac{1}{26} P_{C} + frac{24}{26} P_epsilon \
P_{COCONUT} &= 0 \
P_{T} &= frac{1}{26} P_{TU} + frac{1}{26} P_{C} + frac{1}{26} P_{T} + frac{23}{26} P_epsilon \
P_{TU} &= frac{1}{26} P_{TUN} + frac{1}{26} P_{C} + frac{1}{26} P_{T} + frac{23}{26} P_epsilon \
P_{TUN} &= frac{1}{26} P_{TUNO} + frac{1}{26} P_{C} + frac{1}{26} P_{T} + frac{23}{26} P_epsilon \
P_{TUNO} &= frac{1}{26} P_{TUNOC} + frac{1}{26} P_{T} + frac{24}{26} P_epsilon \
P_{TUNOC} &= frac{1}{26} P_{TUNOCO} + frac{1}{26} P_{C} + frac{1}{26} P_{T} + frac{23}{26} P_epsilon \
P_{TUNOCO} &= frac{1}{26} P_{TUNOCOC} + frac{1}{26} P_{T} + frac{24}{26} P_epsilon \
P_{TUNOCOC} &= 1
end{align*}
$$

Finally let's solve this system of equations. I've used SageMath for that purpose. The script is (after renaming the variables):

var('a, b, c, d, e, f, g, h, i, j, k, l, m, n, u')



eqn = [

  a == (1/26) * b + (1/26) * i + (24/26) * a,

  b == (1/26) * c + (1/26) * b + (1/26) * i + (23/26) * a,

  c == (1/26) * d + (1/26) * i + (24/26) * a,

  d == (1/26) * e + (1/26) * b + (1/26) * i + (23/26) * a,

  e == (1/26) * f + (1/26) * d + (1/26) * i + (23/26) * a,

  f == (1/26) * g + (1/26) * b + (1/26) * i + (23/26) * a,

  g == (1/26) * h + (1/26) * b + (24/26) * a,

  h == 0,

  i == (1/26) * j + (1/26) * b + (1/26) * i + (23/26) * a,

  j == (1/26) * k + (1/26) * b + (1/26) * i + (23/26) * a,

  k == (1/26) * l + (1/26) * b + (1/26) * i + (23/26) * a,

  l == (1/26) * m + (1/26) * i + (24/26) * a,

  m == (1/26) * n + (1/26) * b + (1/26) * i + (23/26) * a,

  n == (1/26) * u + (1/26) * i + (24/26) * a,

  u == 1

]



solution = solve(eqn, a, b, c, d, e, f, g, h, i, j, k, l, m, n, u)



print(solution)

And the output (formatting mine, for readability):

[

[

a == (308915775/617830874),

b == (154457887/308915437),

c == (308915749/617830874),

d == (154457549/308915437),

e == (308898173/617830874),

f == (308458799/617830874),

g == (297034399/617830874),

h == 0,

i == (154457888/308915437),

j == (308915801/617830874),

k == (308916451/617830874),

l == (308933351/617830874),

m == (154686375/308915437),

n == (320797125/617830874),

u == 1

]

]

We see that the probability that the second squirrel will win is exactly $frac{308915775}{617830874}$ or approximately 50.0000547%

The coefficients you use to relate different $P_{?}$ correspond to the transition matrix $P3$ in my answer. I think you have a typo for $P_{COCO}$ where $P_C$ should be replaced with $P_{COC}$. — Nov 27 '18 at 13:04
Using notation from my answer, you are solving the equation $P3 begin{bmatrix} x_T \ x_A end{bmatrix} = begin{bmatrix}Q3 && R3 \ 0 && I end{bmatrix} begin{bmatrix} x_T \ x_A end{bmatrix} = begin{bmatrix} x_T \ 0 \ 1 end{bmatrix}$ which is equivalent to $x_A = begin{bmatrix} 0 \ 1 end{bmatrix}$ and $x_T = (I - Q3)^{-1} R3 begin{bmatrix} 0 \ 1 end{bmatrix}$. — Nov 27 '18 at 13:30
@kaba you're right, thanks for spotting that! It's not really a typo but rather a mistake that carries on to the result too. That probably explains the difference between my result and yours. I will fix it later. — Nov 27 '18 at 13:38
Our solutions are equivalent, and therefore should be equal. I like how leaving it in the linear equation form, as you have, is clearer than solving explicitly using matrix inverse. — Nov 27 '18 at 13:42

Fabio TuratiFabio Turati 1756 · Accepted Answer · 2018-11-24 02:37:27Z

I have tried to create a Markov chain modelling problem a). I haven't even tried to solve it, but the model (i.e. the states and the transitions) should be correct (with one caveat which I'll explain at the end).

First, a simplification: I have reduced the problem to getting either the string A-A-B (as in CO-CO-NUT) or B-A-A (as in TUN-OC-OC), using an alphabet that only includes 3 letters: A, B, and C.

Here's what I've got. This is for AAB:

As you can see, I have separated the state "Only one A" from the state "AA". It is noteworthy that there is no way to go from "AA" to "A".

And this is for BAA:

I'm not sure what this leads to. I was hoping there would be some visible hint about the solution (like: both chains are very similar, with just one difference that tilts the balance in favour of one of them), but this doesn't seem to be the case.

To be honest, I'm not even sure my simplification from COCONUT to AAB is correct, because in my case A and B are equally likely (both of them are just a letter which is typed with probability P = 1/3), whereas CO and NUT are not (CO is only 2 letters long, NUT is 3). This means that solving my chains could, or could not, help solving the original problem.

Hopefully someone can pick up from here.

score 3 · Accepted Answer · 2018-11-25 20:29:58Z

3

I'll try to make the argument that the answer to both parts is:

The two are just as likely.

We start from the simple fact that the probability that a window of 7 letters reads COCONUT, or TUNOCOC, or anything else, is equal. (It's 26^(-7), as @Ingix said.). Crucially, it's completely independent of any letters that appear before or after the window, simply because these letters are not in the window...

Now, let's start with (b). We look at letters 1-7. If they say COCONUT, the first squirrel wins. If they say TUNOCOC (at the same probability!), the second one wins. If they say neither, we move on to letters 2-8 and do the same. At this point, letter 1 is completely irrelevant. You see how we move on and on through the letters, until there's a winner, and the chances of either squirrels to win never differ from each other.

Further clarification:

Of course, if letters 2-7 happened to be COCONU, the first squirrel is in a very good position. Similarly (and in the same probability), if they were TUNOCO, the second squirrel is in a good position. But this is symmetrical, so no one has an overall advantage.

Of course, if letters 1-7 were TUNOCOC, the chance to get COCONUT at letters 5-11 increases, but this doesn't matter since, in this instance, the game has already finished...

In other words, if we were to ask "Which of the two words are we more likely to see at letters 5-11?" (see @abl's comment below), the answer would be TUNOCOC. But the difference is only due to a situation in which the game has already finished. And this is just not the question in hand.

Part (a) is a very similar. We look at letters 1-7 of both monkeys. If the first monkey typed COCONUT, it wins. If the second one types TUNOCOC, it wins. If neither won, we move on to look at letters 2-8 of both, and so on.

No Markov chains :)

edited Nov 25 '18 at 20:29

answered Nov 24 '18 at 22:27

Angkor

1,147416

1

$begingroup$
Except that in (b) the probabilities are not independent across windows. The probability of letters 5-11 being COCONUT, conditional to the fact that you reached this window, are lower than the equivalent for TUNOCOC, simply because you have to account for the case "TUNOCOCONUT" (in which case you wouldn't reach window 5-11). Maybe you can believe the discussion now :)
$endgroup$
– abl
Nov 25 '18 at 0:11

$begingroup$
@abl Sorry, but this is just not true... Are you saying that, typing randomly, one is more likely to type COCONUT than TUNOCOC? These are just two 7-letter sequences, and they're just as likely. Arguments like "consider the case X" are not really valid, because there are so many other cases to consider too. The trick is to be able to consider all cases together with a general argument.
$endgroup$
– Angkor
Nov 25 '18 at 13:37

4

$begingroup$
@Angkor Maybe I didn't explain myself well. Consider all possible sequences that start with "****TUNOCOC...". It's obvious that there are just as many of those as sequences of the form "****COCONUT...", i.e. both situations are equally probable, right? Note that all sequences of the form "****TUNOCOC..." are a win for squirrel 2. And all sequences of the form "****COCONUT..." are a win for squirrel 1, except for those that start with "TUNOCOCONUT..." which are a win for squirrel 2. There's an asymmetry here, so at the very least we can say that those are not "just two 7-letter sequences".
$endgroup$
– abl
Nov 25 '18 at 16:33

2

$begingroup$
@Angkor Point taken, I apologize if I was out of tone. Imagine that the alphabet has only two letters, A and B. The squirrels bet for AAB and BAA respectively. The first three letters are typed. Each squirrel has 1/8 chances of winning at the third letter. If none of them wins, then the sequence is one of AAA, ABA, ABB, BAB, BBA, BBB, with equal probability. At the fourth letter, the possibilities (with equal probability) are AAAA, AAAB, ABAA, ABAB, ABBA, ABBB, BABA, BABB, BBAA, BBAB, BBBA, BBBB. Squirrel 1 wins with AAAB. Squirrel 2 wins with ABAA or BBAA, twice the chance.
$endgroup$
– abl
Nov 25 '18 at 20:43

1

$begingroup$
@Angkor, also, this simplified case is simple enough to be simulated, which I did (see my answer) and the simulation over several million games shows the second squirrel has better chances.
$endgroup$
– abl
Nov 25 '18 at 20:49

|
show 3 more comments

Marco SalernoMarco Salerno 1293 · Accepted Answer · 2018-11-23 15:28:34Z

I tried to make a C# program to do that for me:

class Program

{

    static void Main(string args)

    {

        Log("Le scimmie si preparano alla sfida");



        Monkey monkey1 = new Monkey("COCONUT");

        Monkey monkey2 = new Monkey("TUNOCOC");

        int counter = 0;



        while (true)

        {

            counter++;

            Log("Le scimmie digitano una lettera per la {0}° volta.", counter);



            monkey1.PressLetter();

            monkey2.PressLetter();



            if(monkey1.IsTargetReached() && monkey2.IsTargetReached())

            {

                Log("Entrambe le scimmie hanno vinto!");

                break;

            }

            else if (monkey1.IsTargetReached())

            {

                Log("La scimmia "{0}" ha vinto!", monkey1.Target);

                break;

            }

            else if (monkey2.IsTargetReached())

            {

                Log("La scimmia "{0}" ha vinto!", monkey2.Target);

                break;

            }

        }

        Log("Le scimmie si prendono il meritato riposo dopo {0} turni!", counter);

    }



    private static void Log(string message, params object args)

    {

        Console.WriteLine(DateTime.Now + " - " + string.Format(message, args));

    }

}



public class Monkey

{

    private Random Random { get; set; }

    public string Target { get; set; }

    public string Text { get; set; }



    public Monkey(string target)

    {

        Target = target;

        Text = "";

        Random = new Random();

    }



    public void PressLetter()

    {

        int num = Random.Next(0, 26);

        char let = (char)('a' + num);

        string lastLetter = let.ToString().ToUpper();

        Text += lastLetter;

    }



    public bool IsTargetReached()

    {

        return Text.IndexOf(Target) > -1;

    }

}

After millions of Monkeys' tries, I ended up thinking that they have
the same possibilities.

Anyway my computer still didn't successfully solve the problem.

That's not surprising. The expected number of keystrokes before either monkey wins is well into the billions. — Nov 23 '18 at 15:57
May I advice English comment? As a non native speaker I read every latin language as loren ipsum. — Nov 27 '18 at 13:44

JeffreyJeffrey 1215 · Accepted Answer · 2018-11-24 19:50:11Z

There's good arguments for both sides. Here's code that prove it definitely:

import numpy as np



def main():

    m1 = np.array([

    [25, 1, 0, 0, 0, 0, 0, 0], 

    [24, 1, 1, 0, 0, 0, 0, 0],

    [25, 0, 0, 1, 0, 0, 0, 0],

    [24, 1, 0, 0, 1, 0, 0, 0],

    [24, 0, 0, 1, 0, 1, 0, 0],

    [24, 1, 0, 0, 0, 0, 1, 0],

    [24, 1, 0, 0, 0, 0, 0, 1],

    [0, 0, 0, 0, 0, 0, 0, 26]], dtype='object')



    m2 = np.array([

    [25, 1, 0, 0, 0, 0, 0, 0], 

    [24, 1, 1, 0, 0, 0, 0, 0],

    [24, 1, 0, 1, 0, 0, 0, 0],

    [24, 1, 0, 0, 1, 0, 0, 0],

    [24, 1, 0, 0, 0, 1, 0, 0],

    [24, 1, 0, 0, 0, 0, 1, 0],

    [24, 1, 0, 0, 0, 0, 0, 1],

    [0, 0, 0, 0, 0, 0, 0, 26]], dtype = 'object')



    p1 = np.array([1,0,0,0,0,0,0,0])

    p2 = np.array([1,0,0,0,0,0,0,0])



    for i in range(1000):

        p1 = p1.dot(m1)

        p2 = p2.dot(m2)



        #this shows that how often the two monkes have 1,2,3 letters correct vary

        #but it doesn't affact past the 4th letter

        #print(i,p2 - p1)



        if p1[7] == p2[7]:

            print("after {} steps, both monkeys have equal chances".format(i))

        elif p2[7] > p1[7]:

            print("crap, this whole arguments just got destroyed")

        else:

            print("after {} steps, coconut monkey has {} more ways to have won".format(i, p1[7] - p2[7]))



main()

I ran it up to:

after 99999 steps, both monkeys have equal chances

No matter the number of steps, both monkeys have equal chances. Even though one monkey is more likely to have exactly the last 1,2, or 3 letters correct.

(mind-boggling)

score 1 · Accepted Answer · 2018-11-25 10:19:50Z

I have created the simulation in C# based on Marco's idea.
The alphabet is reduced and the performance is improved:

public class Monkey

{

    private static readonly char alphabet = "ACNOTUXY".ToCharArray();

    private static readonly Random random = new Random();



    public Monkey(string Target)

    {

        target = Target.ToCharArray();

        length = target.Length;

        input = new char[length];

        inputPosition = 0;

    }



    private readonly char target;

    private readonly int length;

    private readonly char input;

    private int inputPosition;



    public bool TargetReached { get; private set; }



    public void PressKey()

    {

        var key = alphabet[random.Next(0, 8)];

        input[inputPosition] = key;

        inputPosition = ++inputPosition % length;

        TargetReached = IsTargetReached();

    }



    private bool IsTargetReached()

    {

        for (var i = 0; i < length; i++)

        {

            if (input[i] != target[i]) return false;

        }

        return true;

    }

}



public class Simulator

{

    private int ties;

    private int monkey1Wins;

    private int monkey2Wins;



    public void Run(int numberOfCompetitons)

    {

        for (var i = 0; i < numberOfCompetitons; i++)

        {

            RunCompetition();

        }

    }



    private void RunCompetition()

    {

        var monkey1 = new Monkey("COCONUT");

        var monkey2 = new Monkey("TUNOCOC");



        while (true)

        {

            monkey1.PressKey();

            monkey2.PressKey();



            if (monkey1.TargetReached && monkey2.TargetReached)

            {

                ties++;

                Console.WriteLine("It's a TIE!");

                break;

            }

            if (monkey1.TargetReached)

            {

                monkey1Wins++;

                Console.WriteLine("Monkey 1 WON!");

                break;

            }

            if (monkey2.TargetReached)

            {

                monkey2Wins++;

                Console.WriteLine("Monkey 2 WON!");

                break;

            }

        }

    }



    public void ShowSummary()

    {

        Console.WriteLine();

        Console.WriteLine($"{ties} TIES");

    Console.WriteLine($"Monkey 1: {monkey1Wins} WINS");

        Console.WriteLine($"Monkey 2: {monkey2Wins} WINS");

    }

}



internal class Program

{

    private static void Main()

    {

        var simulator = new Simulator();

        simulator.Run(1000);

        simulator.ShowSummary();

        Console.ReadKey();

    }

}

Sample output:

...

Monkey 1 WON!

Monkey 1 WON!

Monkey 2 WON!

Monkey 1 WON!

Monkey 2 WON!

Monkey 2 WON!

Monkey 1 WON!

Monkey 1 WON!

Monkey 2 WON!

0 TIES

Monkey 1: 498 WINS

Monkey 2: 502 WINS

So it seems pretty tied to me

TibosTibos 58628 · Accepted Answer · 2018-11-24 11:25:56Z

Monkey problem

The crux the problem is the following: do repeated groups influence the expected characters needed to complete a sequence? The answer is

No, repeated groups bear no influence on the result

Let's reduce the problem to something similar - AAB vs ABC with only these letters being on the keyboard.

The probability of obtaining AAB in a sequence is:

starting at 1: P(A) * P(A) * P(B) = (1/3 * 1/3 * 1/3) +

starting at 2: P(!A) * P(A) * P(A) * P(B) = 2/3 * (1/3 * 1/3 * 1/3) +

starting at 3: P(!A) * P(!A) * P(A) * P(A) * P(B) = 2/3 * 2/3 * (1/3 * 1/3 * 1/3) + ...

The probability of obtaining ABC in a sequence is:

starting at 1: P(A) * P(B) * P(C) = (1/3 * 1/3 * 1/3) +

starting at 2: P(!A) * P(A) * P(B) * P(C) = 2/3 * (1/3 * 1/3 * 1/3) +

starting at 3: P(!A) * P(!A) * P(A) * P(B) * P(C) = 2/3 * 2/3 * (1/3 * 1/3 * 1/3) + ...

This makes it easy to see that

No matter how you shuffle the desired sequence around, the only thing that influences the probability of it occurring is the length of the sequence.

Alternative proof:
All of the possible sequences of 4 letters are:

AAAA ABAA ACAA BAAA BBAA BCAA CAAA CBAA CCAA

AAAB ABAB ACAB BAAB BBAB BCAB CAAB CBAB CCAB

AAAC ABAC ACAC BAAC BBAC BCAC CAAC CBAC CCAC

AABA ABBA ACBA BABA BBBA BCBA CABA CBBA CCBA

AABB ABBB ACBB BABB BBBB BCBB CABB CBBB CCBB

AABC ABBC ACBC BABC BBBC BCBC CABC CBBC CCBC

AACA ABCA ACCA BACA BBCA BCCA CACA CBCA CCCA

AACB ABCB ACCB BACB BBCB BCCB CACB CBCB CCCB

AACC ABCC ACCC BACC BBCC BCCC CACC CBCC CCCC

And here is the same table with the uninteresting sequences cut out:

XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX

AAAB XXXX XXXX BAAB XXXX XXXX CAAB XXXX XXXX

XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX

AABA XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX

AABB XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX

AABC XXXX XXXX BABC XXXX XXXX CABC XXXX XXXX

XXXX ABCA XXXX XXXX XXXX XXXX XXXX XXXX XXXX

XXXX ABCB XXXX XXXX XXXX XXXX XXXX XXXX XXXX

XXXX ABCC XXXX XXXX XXXX XXXX XXXX XXXX XXXX

There are

3 sequences where AAB starts at position 1: AABA, AABB, AABC

3 sequences where AAB starts at position 2: AAAB, BAAB, CAAB

3 sequences where ABC starts at position 1: ABCA, ABCB, ABCC

3 sequences where ABC starts at position 2: AABC, BABC, CABC

Thus the answer to the monkey problem is that

both monkeys have the same chance of finishing first

Squirrel problem

Let's consider the string typed by the monkey after a long (infinite) time and make some simplifications that don't affect the outcome to understand the problem better.

Replace any letter not in COCONUT with X

Replace all groups of COC with A

Replace all remaining C's with X

Replace all remaining O's with B, N's with C, U's with D and T's with E

The problem now is:

in the new string will we more likely first find ABCDE or EDCBA?

And the solution:

If A, B, C, D and E had the same chances of occurring, the answer would be "both are equally likely", but C, D and E have normal chances of occurring, A has a very low chance of occurring, B has a slightly lower than normal chance.

Considering a simplified case - 2 letters - P and Q with P occurring more often than Q, PQ has more chances of appearing first than QP.
This means EDCBA is more likely to appear first than ABCDE.

Looking at it from a different angle:

When the unlikely event that A appears in the string, it is equally likely that it is followed by a B as it is that it is preceded by one.

In the even more unlikely event that an AB or BA appears in the string, it is equally likely that it is followed by a C as it is that it is preceded by one.

...

Which makes it equally likely that ABCDE and EDCBA appear in the string, so when an A appears in either of these sequences, it is equally likely that it starts the ABCDE sequence as it is that it ends the EDCBA sequence.

Thus, the answer to the squirrel problem is that

the second squirrel's sequence (TUNOCOC) appears first

score 0 · Accepted Answer · 2018-11-25 03:03:23Z

Similar to Marco Salerno's answer, I wrote a computer program to solve the monkey problem.

static void Coconuts(int iterations)

{

    Tuple<double, double> outcomes = new Tuple<double, double>[iterations];

    char alphabet = new char { 'C', 'O', 'N', 'U', 'T' };

    string target1 = "COCONUT";

    string target2 = "TUNOCOC";

    for(int i = 0; i < iterations; i++)

    {

        outcomes[i] = new Tuple<double, double>(RandomLettersToTarget(alphabet, target1), RandomLettersToTarget(alphabet, target2));

        Console.WriteLine($"{outcomes[i].Item1:0.0}t{outcomes[i].Item2:0.0}");

    }



    double average1 = outcomes.Select(pair => pair.Item1).Average();

    double average2 = outcomes.Select(pair => pair.Item2).Average();



    Console.WriteLine($"Average durations are {average1:00000.0}, {average2:00000.0}");

}



static int RandomLettersToTarget(char alphabet, string target, int stepsToInsanity = 100000000)

{

    string soFar = "";

    int steps = 0;

    while(steps < stepsToInsanity)

    {

        soFar += alphabet[rng.Next(0, alphabet.Length)];

        if(soFar.Length >= target.Length)

        {

            soFar = soFar.Substring(soFar.Length - target.Length, target.Length);

        }

        if(target.Equals(soFar))

        {

            return steps;

        }

        steps++;

    }

    return stepsToInsanity;

}

With the reduced alphabet (containing only 5 letters) this program does complete, and the results over thousands of iterations

show no clear winner. Thus, I conclude that each monkey has equal probability of winning the first game.

The difference in odds is so slight that a computer program will give you no reliable answer. See my answer for details. — Nov 27 '18 at 13:35

score 0 · Accepted Answer · 2018-11-27 13:40:12Z

(a)

The fact that two monkeys are typing is irrelevant to the study of the sequence. They're both typing on the same computer (see title), so there is only one sequence of letters. There is no bias, both monkeys have the exact same chance of having their word appear first. Winning, in the clear text, means "having my word come out first".

We state obviously that, if less than $7$ letters are typed, both monkeys have equally $0$ chance of winning.

Let's do some recurrence reasoning, trying to prove that if both monkeys have the same chance of winning after $n$ letters are typed, they have the same chance of winning after $n+1$ letters are typed.

Let $n$ be greater than or equal to $6$. Given there is no winner after the $n$th letter is typed, and both monkeys have had the same chance of winning up until now, (which is trivially established for $n=6$) there are three cases:

the letters $n-5$ to $n$ say COCONU and the letter typed is T, monkey 1 wins. $1$ chance out of $26^7$ ($26^6$ suffixes times $26$ possible letters)

the letters $n-5$ to $n$ say TUNOCO and the letter typed is C, monkey 2 wins. $1$ chance out of $26^7$ ($26^6$ suffixes times $26$ possible letters)

all other cases, no winner. $26^7 - 2$ out of $26^7$

And the monkeys have the same chance of winning upon the $n+1$th letter typed. This completes the proof by recurrence.

So, reading the question correctly, the one expected to type more will be the one that stops last, assuming they type at the same frequency. There are two cases. Either the first word out is COCONUT, in which case, once monkey 1 stops typing, monkey 2 has T as a prefix for his word to come out. Or the first word out is TUNOCOC, in which case, once monkey 2 stops typing, monkey 1 has COC as a prefix for his word to come out. Therefore, monkey 2 is expected to have to type more than monkey 1 in this rare case, and then same as monkey 1 once he spoils his opportunity to finish the word.
So, yes, monkey 2 is expected to type slightly more, due to these opportunities being unequal.

(b)

If "before" means "before in time", we are in the situation described in the clear text. Both squirrels have the same odds, if they stop watching after the first word appears. If they wait for both words to appear, well, this does not change anything, the word appearing first is the word appearing first.

Whereas

If "before" means "before in the lexical order", squirrel 1 wins, and the fact that a monkey is typing on a computer is irrelevant. This shows that squirrel 1 has better chances of winning anyway.

Look, Ma! No Markov chains!

score 45 · Accepted Answer · 2018-11-23 14:39:02Z

(a) I claim that the expected typing length are the same for both monkeys. I guess something in my argument will be incorrect, as jafe's answer has 9 approvals, but finding that incorrectness would be helpful to me.

I prove that the probabilty that a monkey ends his typing after exactly $n$ letters is the same for both monkeys, which then implies that the expected typing lengths are the same.

In order for the typing to end after letter number $n$, the last seven letters must be exactly the monkey's preferred word "COCONUT" / "TUNOCOC" and that word cannot ever have appeared before in the text. That word appearing before can be of two kinds:

It could share some letters with the last 7, or

be completely contained in the first $n-7$ letters.

It's easy to see that for the first monkey, case 1 can not happen: If the word "COCONUT" shared some of its letters with the last 7 letters, then the last letter "T" would have to appear a second time in "COCONUT" (besides last place), which it doesn't.

The argument is slightly more complicated for the second monkey, as the last letter ("C") of "TUNOCOC" does appear a second time in it. But "TUNOC" is not the ending part of "TUNOCOC", so also for the second monkey case 1. cannot happen.

Let's recap: In order for each monkey to end typing after exactly $n$ letters, the text it produced must fullfill two criteria:

A) The last seven letters must be exactly the monkey's preferred word
"COCONUT"/"TUNOCOC", and

B) The previous $n-7$ letters cannot contain
the monkey's preferred word.

This is an equivalence: Any sequence of $n$ letters that fulfills A and B (for any of the two monkeys) is a sequence that has that monkey stop after exactly $n$ letters, and vice versa.

Since each letter typed is independent of others, the events that A or B aply to a sequence of $n$ letters (for a given monkey) are also independent, so the respective probabilities can be multiplied. If we use indices $1$ and $2$ to apply for the first and second monkey, we obviuously get

$$ p_1(A)=p_2(A)=left(frac1{26}right)^7$$

Calculating $p_{1/2}(B)$ directly is much harder, but they are the same: Both consider sequences of $n-7$ independent, uniformly distributed letters, and if and only if such a sequence does (does not) contain the word "COCONUT", then the reversed sequence (which is again a sequence of $n-7$ independent, uniformly distributed letters) does (does not) contain the word "TUNOCOC".

So we get

$$p_1(B) = p_2 (B)$$ and this proves my initial claim: The probability that the typing ends after exactly $n$ letters is $p_1(A)p_1(B)$ for the first monkey and $p_2(A)p_2(B)$ for the second, and they are the same.

+++++++

I cannot say why jafe's argument is incorrect, it is very compelling, and as I said initially, it may very well be that it is correct and something I said is incorrect. I just don't see what that may be, in either case.

ADDITION after more thinking and reading the comments:

I think the flaw in jafe's argument is in double counting after "COCOCO". If the next letter is "N", this is counted as 'good' for both the ending "COCO" and continuation "NUT" and the intitial "COCO" and continution "CONUT". However, both can only lead to the same end result "COCOCONUT", the additional "CONUT" option is not really additional.

The caveat is that you show that the proba for each monkey to end after $n$ letters is the same assuming that they both type at least $n$ letters. But the monkey A has a higher probability of having finished beforehand. — Nov 23 '18 at 12:30
How can that be? If the first monkey has a higher probabilty to finish beforehand than monkey 2, it has to have a higher probabilty to have finished after exactly $k$ letters for some $k < n$, and then my argument applies again. In addition, I do not assume that they have typed at least $n$ letters, I have given exact condtions for that to be true and shown that they have equal probability. — Nov 23 '18 at 12:42
Again, if you think that the first monkey has a higher probabilty to finish beforehand, which do you think is the first letter index at which the first monkey has a higher probability to have finished? — Nov 23 '18 at 12:50
I am now convinced that this is the correct answer. Very confusing problem, but the flaw is in the other reasoning (see my comment down Viktor Mellgren's answer) — Nov 23 '18 at 13:49
I wrote a Python script with a smaller alphabet ('ABCNOTU'). Both monkeys seem to need around 800000 letters before writing the desired word. I repeated the experiment 300 times and couldn't find any bias toward monkey 1 or 2. — Nov 23 '18 at 14:11

score 30 · Accepted Answer · 2018-11-25 16:23:18Z

30

(a)

Edit: This is incorrect, see comments

The TUNOCOC monkey is expected to type more.

Regardless of the previous letters, the TUNOCOC monkey has 1/26 chance of getting the next letter right. Clearly the COCONUT monkey has at least this, but turns out it does even better: Once it has typed COCO, it has two ways to get the correct result (NUT or CONUT).

edited Nov 25 '18 at 16:23

answered Nov 23 '18 at 7:56

jafe

24.6k472246

4

$begingroup$
Sounds very compelling, still I think it is wrong, see my solution below. Would be very interested in you (and everybody else, of course) to check my approach.
$endgroup$
– Ingix
Nov 23 '18 at 12:12

3

$begingroup$
I reduced the problem to the monkeys typing FFA or AFF which I think is equivalent. Then I made a computer simulation where I had them imput random characters and select a winner. The simulation does NOT support your answer. They both were winner the same amount of times.
$endgroup$
– Pieter B
Nov 23 '18 at 13:00

1

$begingroup$
PieterB & Evargalo, what is the characterset of the typewriter in the FFA example? I think it needs a third 'wrong' character.
$endgroup$
– elias
Nov 23 '18 at 13:52

2

$begingroup$
This answer indeed appears to be wrong but I don't know why. ¯_(ツ)_/¯
$endgroup$
– Eric Duminil
Nov 23 '18 at 14:12

2

$begingroup$
I listed out all combinations of C,O,U,N,T for up to 11 letters. Indeed both COCONUT and TUNOCOC show up the same number of times for each letter count. D'oh!
$endgroup$
– jafe
Nov 23 '18 at 14:12

|
show 12 more comments

score 25 · Accepted Answer · 2018-11-27 04:39:37Z

Monkey problem

To settle down which monkey is faster on average, I'll use Markov chains and Mathematica. Define a state $i$, for $i = 0..6$, as that the monkey 1 has currently written $i$ correct subsequent characters of COCONUT, but has never written all of COCONUT. Define a state 7 as the monkey 1 having written COCONUT at least once. Since state 7 cannot be escaped, we say that it is absorptive (other states are transient). The transition matrix for monkey 1 is:

P1 = {

    {25, 1, 0, 0, 0, 0, 0, 0}, 

    {24, 1, 1, 0, 0, 0, 0, 0},

    {25, 0, 0, 1, 0, 0, 0, 0},

    {24, 1, 0, 0, 1, 0, 0, 0},

    {24, 0, 0, 1, 0, 1, 0, 0},

    {24, 1, 0, 0, 0, 0, 1, 0},

    {24, 1, 0, 0, 0, 0, 0, 1},

    {0, 0, 0, 0, 0, 0, 0, 26}

} / 26;

Here $P1_{ij}$ is the probability that the monkey 1 transitions from state $i$ to state $j$ on pushing a key. The Markov chain can be visualized as a graph:

p1 = DiscreteMarkovProcess[{1, 0, 0, 0, 0, 0, 0, 0}, P1];

Graph[Table[StringTake["COCONUT", x], {x, 0, 7}], p1, 

VertexSize -> 0.6, GraphLayout -> {VertexLayout -> "CircularEmbedding" }]

Markov chain for monkey 1

Next up, we extract the sub-matrix which does not contain the absorptive state:

Q1 = P1[[1 ;; 7, 1 ;; 7]];

As described here, the expected time for the monkey 1 to write COCONUT (expected time to absorption) starting from state $i$ is given by

N1 = Inverse[IdentityMatrix[7] - Q1]

t1 = N1.{1, 1, 1, 1, 1, 1, 1};

The exact solution is given by:

-      8031810176

C      8031810150

CO     8031809500

COC    8031792574

COCO   8031352524

COCON  8019928800

COCONU 7722894400

The transition matrix for monkey 2 is given by:

P2 = {

    {25, 1, 0, 0, 0, 0, 0, 0}, 

    {24, 1, 1, 0, 0, 0, 0, 0},

    {24, 1, 0, 1, 0, 0, 0, 0},

    {24, 1, 0, 0, 1, 0, 0, 0},

    {24, 1, 0, 0, 0, 1, 0, 0},

    {24, 1, 0, 0, 0, 0, 1, 0},

    {24, 1, 0, 0, 0, 0, 0, 1},

    {0, 0, 0, 0, 0, 0, 0, 26}

} / 26;

The Markov chain for monkey 2 visualized as a graph:

p2 = DiscreteMarkovProcess[{1, 0, 0, 0, 0, 0, 0, 0}, P2];

Graph[Table[StringTake["TUNOCOC", x], {x, 0, 7}], p2, 

VertexSize -> 0.6, GraphLayout -> {VertexLayout -> "CircularEmbedding" }]

Markov chain for monkey 2

By following the same procedure as for monkey 1, we solve $t2$ as:

-      8031810176

T      8031810150

TU     8031809500

TUN    8031792600

TUNO   8031353200

TUNOC  8019928800

TUNOCO 7722894400

Because the monkeys start from the situation when nothing has been written yet, we see that the expected time for them to write their word is the same $C = 8031810176$

This result can also be obtained directly in Mathematica:

d1 = FirstPassageTimeDistribution[p1, 8];

Mean[d1]

In fact, we can compute the characteristic function for the absorption-time-distribution:

CharacteristicFunction[d1, x]

which evaluates to

e^(7ix) / (C - C e^(ix) + e^(7ix))

The distributions for both monkeys have this same characteristic function. Wikipedia claims that cumulative distribution functions and characteristic functions are in one-to-one correspondence. This implies that the monkeys' finishing-time-distributions are the same.

Squirrel problem

The transition matrix for a monkey aiming for both words COCONUT and TUNOCOC is:

P3 = {

{24, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0},

{23, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0},

{24, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0},

{23, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0},

{23, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0},

{23, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0},

{24, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0},

{23, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0},

{23, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0},

{23, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0},

{24, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0},

{23, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0},

{24, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1},

{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 26, 0},

{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 26}} / 26

Here the states have been numbered in order - C CO COC COCO COCON COCONU T TU TUN TUNO TUNOC TUNOCO COCONUT TUNOCOC. Note that the absorbing states come last. The same reference as above shows that we can compute the probability $B3_{ij}$ of ending from a transient state $i$ to an absorption state $j + 13$ as follows:

Q3 = P3[[1 ;; 13, 1 ;; 13]];

R3 = P3[[1 ;; 13, 14 ;; 15]];

N3 = Inverse[IdentityMatrix[13] - Q3]

B3 = N3.R3

The matrix $B3 * 617830874$ is

        COCONUT   TUNOCOC

-       308915099 308915775

C       308915100 308915774

CO      308915125 308915749

COC     308915776 308915098

COCO    308932701 308898173

COCON   309372075 308458799

COCONU  320796475 297034399

T       308915098 308915776

TU      308915073 308915801

TUN     308914423 308916451

TUNO    308897523 308933351

TUNOC   308458124 309372750

TUNOCO  297033749 320797125

Since the monkey starts from nothing, the probability for writing COCONUT and TUNOCOC is 308915099 and 308915775, respectively, divided by 617830874. These correspond to 0.4999994529 and 0.5000005471, computed with Mathematica accurate to 10 decimal places. Therefore the monkey is more probable to write TUNOCOC than COCONUT.

Code

Here is the Python-code I used to generate the transition matrices.

def computeTransitionMatrix(words, keys):

    def properPrefixes(word):

        return (word[:i] for i in range(len(word)))



    def suffixesInDecreasingLength(word):

        return (word[i:] for i in range(len(word) + 1))



    prefixToState = {}

    stateToPrefix = 



    def addState(prefix):

        if prefix in prefixToState:

            return

        prefixToState[prefix] = len(stateToPrefix)

        stateToPrefix.append(prefix)



    # Create a state for each proper prefix of the word.

    for word in words:

        for prefix in properPrefixes(word):

            addState(prefix)



    # Create a state for each word last.

    for word in words:

        addState(word)



    print(stateToPrefix)



    # Number of states.

    nStates = len(stateToPrefix)



    # Compute the (scaled) transition probabilities.

    transitions = 

    for i in range(nStates):

        row = [0] * nStates

        transitions.append(row)



        prefix = stateToPrefix[i]

        if prefix in words:

            # The word is an absorptive state.

            row[i] = len(keys)

            continue

        for key in keys:

            nextPrefix = prefix + key

            # Find the longest suffix which

            # is a prefix of the word. 

            for suffix in suffixesInDecreasingLength(nextPrefix):

                j = prefixToState.get(suffix)

                if j != None:

                    row[j] += 1

                    break



    return transitions



# The words the monkey is supposed to write.

words = ['COCONUT', 'TUNOCOC']



# Keys available on monkey's keyboard.

keys = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'



transitions = computeTransitionMatrix(words, keys)

print(repr(transitions).replace('[', '{').replace(']','}').replace('}, ', '},n'))

Thanks for doing the work of establishing the Markov matrices and doing the calculations! It should be noted that the discussed effects about the repeated sub-words "CO" and "OC" can be seen: the expected value vectors start and and end the same, but have different values for COC/TUN and COCO/TUNO. The differences just vanish again in the end. The reason for this is IMO that COCONUT can not overlap with itself in a text (like e.g. NUTNU could). That means that for all such words of the same length (say "TOBACCO"), the expected time should be exactly the same. — Nov 24 '18 at 19:18
It means the expected time is simply 26**7. How anticlimactic! :) — Nov 25 '18 at 9:09
Thanks. This is a great answer for learning some cool tricks in Mathematica. — Nov 26 '18 at 8:26
This is a great answer! In my answer I calculated the probabilities for the two squirrels in the second problem too, but my result is slightly more in favour to the second squirrel (rounds to around 50.037 %). I'd expect the methods used to be equivalent, so maybe there's some typo or silly error in one of the two answers? I'll check when I have more time — Nov 27 '18 at 9:30

score 16 · Accepted Answer · 2018-11-23 14:48:15Z

16

(a) has been adequately answered by

Ingix.

In (b), though:

The second squirrel is more likely to win.

Consider:

A simpler example, where instead of typing "COCONUT" or "TUNOCOC", the monkeys are trying to type "AB" or "CA". Also, the keyboard only has "A", "B", and "C" keys. Consider what happens after the first three keystrokes, for which there are 27 possibilities.

In this case:

In 5 out of the 27 possiblities, "AB" appears and "CA" does not. Similarly, in 5 of the possiblities, "CA" appears and "AB" does not. But there's also one possibility, "CAB", where both appear, and squirrel 2 wins. In this instance, the "A" is useful for both, but it was useful for the second squirrel earlier. The "setup" for the first squirrel to match will often result in the second squirrel already having matched.

Now consider:

A similar thing is true of "COCONUT" versus "TUNOCOC". The first squirrel wants "COC" as a setup, but if "COC" does appear, it's often at the tail end (pun intended) of the second squirrel's victory.

edited Nov 23 '18 at 14:48

answered Nov 23 '18 at 14:29

Sneftel

1,5971019

$begingroup$
It sounds plausible. But it would also mean that if there's any bias, it will be probably too small to be detected by random simulations, right?
$endgroup$
– Eric Duminil
Nov 23 '18 at 18:45

$begingroup$
@Duminil correct. Even simulating a single play-through would be fairly expensive.
$endgroup$
– Sneftel
Nov 23 '18 at 20:11

$begingroup$
Can't we check the CAB-Thesis, very cheaply and extrapolate from there?
$endgroup$
– Falco
Nov 26 '18 at 13:46

$begingroup$
@Falco Sure, but there's no need for that. The situation is simple enough to solve analytically based on the initial conditions and the transition matrix. In that situation, the second squirrel wins precisely 60% of the time. Of course, I don't offer a formal proof that it is valid to extrapolate from AB to COCONUT.
$endgroup$
– Sneftel
Nov 26 '18 at 14:51

$begingroup$
I ran a simulation with "C,O,N,U,T,X" as possible letters (since the number of dummy-letters will only make the bias smaller. And I got a fairly consistent bias of over 50% wins for the second squirrel
$endgroup$
– Falco
Nov 26 '18 at 14:51

|
show 2 more comments

score 15 · Accepted Answer · 2018-11-23 14:42:34Z

15

@Ingix provides a good proof, one certainly worthy of the tick, but one that could perhaps use some reinforcement. The answer is that the N number of letters would have the same expected value. A few points that might help @jafe realise the flaw in his proof.

The order of independent events occurs has no directional bias. For example, drawing a King Of Hearts followed by an Ace Of Hearts, from a 52 card deck is no more probable than an Ace followed by a King.

Whilst the COCOCONUT argument might seem to hold water, it's first important to realise that the probability of that 9 letter sequence is the same as the probability of DOCOCONUT, or for that matter TUTUNOCOC having the initial CO does not gain anything to the probability as there's still a 1/26 chance of getting the next letter correct.

The apparent gain of the fallback that COCOCONUT gives you is offset by the amount of extra letters that you have to type. If you get the sequence COCOCONUX, you have wasted 9 letters, whereas TUNOCOX only wastes 7. (And COCOCONUX is far more improbable than TUNOCOX).

TLDR: the order of independent variables has no favoured direction.

(P.S. This is my first stack answer, so feel free to give any constructive feedback)

edited Nov 23 '18 at 14:42

answered Nov 23 '18 at 14:22

WittierDinosaur

1594

2

$begingroup$
Makes sense! Welcome to Puzzling.
$endgroup$
– jafe
Nov 23 '18 at 14:25

1

$begingroup$
I'm still confused about this, but your 3) point seems very good to me
$endgroup$
– George Menoutis
Nov 23 '18 at 14:26

$begingroup$
Welcome to Puzzling! Yup, your first two points are exactly right. As for the wastage of letters, I don't think it's really a good argument. I'm not sure it works that way. IMO the reason why COCOCONUT is a false benefit, is because when considering the probability of the events happening, you are assuming the letters before the last 7 are essentially random. This means each letter has 26 possibilities. As such, the combination of CO- as a prefix to COCONUT is already considered within the probability count. To factor it in again is double counting.
$endgroup$
– Arch2K
Nov 23 '18 at 14:52

$begingroup$
Thanks for your feedback @Ong Yu Hann. The difference between the 3rd point and the others is that the first 2 are easily provable mathematical facts. The 3rd is simply to help people logically understand why COCOCOCO doesn't provide a probabilistic benefit.
$endgroup$
– WittierDinosaur
Nov 23 '18 at 14:58

$begingroup$
While it's true that COCONUT and TUNOCOC are equivalent, you are making a far more sweeping (and false) statement when you imply that all sequences of the same length are equivalent. For a simple counterexample that can be checked by hand (using Markov chains, or even just by trying it): if you flip coins over and over, on average it will take you 6 coinflips until you see HH, but only 4 coinflips until you see HT.
$endgroup$
– Misha Lavrov
Nov 27 '18 at 1:14

|
show 3 more comments

score 7 · Accepted Answer · 2018-11-26 15:46:13Z

There is a clever trick to finding the kind of expected times that appear in part (a). It turns out that both for COCONUT and TUNOCOC, the expected time is just $26^7$ (so the monkeys have equal expected times to win), but for this to be convincing, I will demonstrate the method on a case where the expected value is unexpected: the word ABRACADABRA.

Suppose that a monkey is typing randomly, and a family of beavers decides to bet on the outcome. The rules for betting are as follows:

Right before a new letter is typed, a new beaver shows up and is given $1$ dollar. The beaver bets that dollar on that letter being A, at $26:1$ odds.

If the beaver wins (and now has $26$ dollars), the beaver bets all of them on the next letter being B.

If the beaver wins (and now has $26^2$ dollars), the beaver bets all of them on the next letter being R.

In general, whenever the beaver wins the first $k$ bets (and has $26^k$ dollars), the beaver bets all of them on the next, $(k+1$)-th letter still matching the $(k+1)$-th letter of ABRACADABRA.

Any time a beaver loses a bet, that beaver has no more money and goes home. (But at any time, multiple beavers may be betting, because a new beaver shows up right before each letter.)

Each bet is fair, so after $n$ letters are typed, the expected number of money the beavers have altogether is $n$: the $n$ dollars that the beavers get when they show up.

When ABRACADABRA first appears, we can count the total amount of money in the system:

One lucky beaver has $26^{11}$ dollars from betting on ABRACADABRA all the way through.

One beaver started on the ABRA at the end of ABRACADABRA, and has $26^4$ dollars.

One beaver just started betting before the last A, and has $26$ dollars.

All other beavers lost their bets and have no money.

So we are guaranteed to have $26^{11} + 26^4 + 26$ dollars in the system. But the expected amount of money in the system is equal to the number of letters typed. So in expectation, it takes $26^{11} + 26^4 + 26$ letters to type ABRACADABRA.

(Formally, we are invoking Wald's equation to see that the rule "the expected money in the system is equal to the expected number of letters typed" still holds when the number of letters typed depends on the letters that have been typed - when we stop once ABRACADABRA is reached)

Now, going back to COCONUT and TUNOCOC: neither word ends with an initial segment of the word. (The CONUT in COCONUT doesn't change anything.) If the beavers were betting on COCONUT, then at the end when COCONUT is typed, one beaver would have $26^7$ dollars and no other beavers would have anything. The same is true for TUNOCOC. So the expected time is just $26^7$ in both cases.

(A weird consequence is that a word like AAAAA has a longer expected waiting time than a word like ABCDE. To explain this intuitively, consider that if ABCDE hasn't been typed yet, the last five letters can can be any of AABCD, BABCD, CABCD, ..., ZABCD, giving us 26 ways to win on the next letter. But if AAAAA hasn't been typed yet, the last five letters can only be any of BAAAA, CAAAA, DAAAA, ..., ZAAAA, giving us only 25 ways to win.)

This sounds exciting. Could you add a note on how Wald's equation is applied here? Referring to Wikipedia's page, what is the sequence (X_n), what is N, and how to see that X_N are i.i.d., and in particular why N is independent of the sequence (X_n)? — Nov 26 '18 at 23:03
We are using the general version, so we don't need either of the requirements you mention. Here, $N$ is the number of steps taken and $X_n$ is the change in how much money there is in the system at step $n$. Condition 2 in Wikipedia's article essentially requires that $N$ be a stopping time: $N ge n$ is independent of $X_n, X_{n+1}, X_{n+2}, dots$ (so it does not require telling the future). — Nov 26 '18 at 23:13
Thanks, I somehow missed the general version. I replicated that the mean time for ABRACADABRA is indeed $26^{11} + 26^4 + 26$ using Markov chains as in my answer. This answer is probably the best one for problem (a) this far. — Nov 27 '18 at 1:04

score 5 · Accepted Answer · 2018-11-27 22:29:39Z

Problem (a) has been adequately covered in other answers.

In this answer I'll show that in problem (b), the second squirrel has better chances.

First (for introduction purposes only) we'll see a simplified version of the problem, that is simple enough to be run on a simulation

Then we'll see how we can exactly calculate the probability that the second squirrel will win, in this simplified example. The method that we'll use does not depend on any simplification, and can be applied to the original problem as well.

We'll see that the result that we obtain with this method matches with the results of the simulation.

Having then some extra confidence in the method (not that it's really needed, since it's fairly clear), we'll apply it to the original problem and obtain a result.

Simplified problem:

A monkey is typing capital letters from an alphabet that contains the letters A, B and X. Two squirrels observe the sequence of letters thus generated. The first squirrel wins if AAB appears (as three successive letters) before BAA. The second squirrel wins if BAA appears before AAB. Which squirrel is more likely to win?

If chosen this problem because it has a similarity with the original, which I believe is crucial: "AAB" can overlap with "BAA" only in the form "AABAA", while in the other way around it can be either "BAAAB" or "BAAB". Just like "COCONUT" followed by "TUNOCOC" can be overlapped only in the form "COCONUTUNOCOC", while, the other way around, they can be overlapped like "TUNOCOCOCONUT" or like "TUNOCOCONUT".

Note however that we'll be solving the actual problem in the end. The only reason I'm including a simpler version is to explain the method on a simpler case, and because for the simpler problem we can run a simulation to verify our predictions.

Explanation

We must realize that at any point in the game, any of the two strings can be partly under construction, or not. More importantly, the only thing that affects the probabilities at any point in the game is how much of each string is already written. If, for example, the two last letters of the string so far are "XA", that means that the string AAB is partly written up to "A", and the string BAA is not partly written at all. This and only this is what determines which of the two squirrels is better off at this point, regardless of what has been written so far before "XA".

Therefore the state of the game at any given point is defined exclusively by the longest suffix of the sequence written so far that is also a prefix of at least one of the strings that the squirrels are betting at.

Thus this game has exactly seven different possible states, which are:

$epsilon$ (empty string, meaning no matching suffix)

$verb|A|$

$verb|AA|$

$verb|AAB|$

$verb|B|$

$verb|BA|$

$verb|BAA|$

On each of these different states, the squirrels may have different chances of winning. Let's note $P_epsilon$ as the chances that the second squirrel has when the state is $epsilon$, $P_{A}$ as the chances that he has when the state is $verb|A|$, etc. What we're interested in is $P_epsilon$, because that's the initial state of the game.

We can easily formulate each of this probabilities in terms of the others. From state $epsilon$ there's a 1/3 chance of writing an "A", thus moving to state A. There's also a 1/3 chance of transitioning to B, and finally a 1/3 chance of staying at e (if an "X" is written). That means that the chances for the second squirrel on state e are:

$$
P_{epsilon} = frac{1}{3} P_{A} + frac{1}{3} P_{B} + frac{1}{3} P_{epsilon}
$$

We can similarly express the chances for all the other states. Note that $P_{AAB} = 0$, because the second squirrel loses immediately on that state, and $P_{BAA} = 1$, because it wins immediately. The full list is:

$$
begin{align*}
P_epsilon &= frac{1}{3} P_{A} + frac{1}{3} P_{B} + frac{1}{3} P_epsilon \
P_{A} &= frac{1}{3} P_{AA} + frac{1}{3} P_{B} + frac{1}{3} P_epsilon \
P_{AA} &= frac{1}{3} P_{AAB} + frac{1}{3} P_{AA} + frac{1}{3} P_epsilon \
P_{AAB} &= 0 \
P_{B} &= frac{1}{3} P_{BA} + frac{1}{3} P_{B} + frac{1}{3} P_epsilon \
P_{BA} &= frac{1}{3} P_{BAA} + frac{1}{3} P_{B} + frac{1}{3} P_epsilon \
P_{BAA} &= 1
end{align*}
$$

This is simply a system of linear equations, and it's simple enough to be solved by hand. The solution is $P_epsilon = frac{8}{13}$ or approximately 61.54%. These are the chances for the second squirrel when the game starts. The chances for the first squirrel are $frac{5}{13}$, about 38.46%.

Running the simplified problem

Now, because this game has only three letters, the average game will be pretty short, so much in fact that we can play it enough times to have some statistics. The following program (in Java) simulates 10 million games and prints the percentage of wins for each squirrel. I've run it several times and the results are always in the range of 38.4 - 38.5% of wins for the first squirrel, and 61.5 - 61.6% of wins for the second squirrel, which matches very well with our previous calculations.

public class Monkeys {



     private enum Squirrel {

         FIRST_SQUIRREL,

         SECOND_SQUIRREL

     }



     private static final Random random = new Random();

     private static final char letters = "ABX".toCharArray();



     public static void main(String args) {



         int winsF = 0;

         int winsS = 0;

         int iterations = 10000000;



         for (int i = 0; i < iterations; i++) {



             Squirrel winner = playGame();

             if (winner == Squirrel.FIRST_SQUIRREL) {

                 winsF++;

             } else if (winner == Squirrel.SECOND_SQUIRREL) {

                 winsS++;

             }

         }



         System.out.println("First squirrel wins:  " + ((double) winsF) / (winsF + winsS));

         System.out.println("Second squirrel wins: " + ((double) winsS) / (winsF + winsS));



     }



     private static Squirrel playGame() {

         StringBuilder sb = new StringBuilder();

         while (true) {

             if (sb.length() >= 3) {

                 if (sb.substring(sb.length() - 3, sb.length()).equals("AAB")) {

                     return Squirrel.FIRST_SQUIRREL;

                 }

                 if (sb.substring(sb.length() - 3, sb.length()).equals("BAA")) {

                     return Squirrel.SECOND_SQUIRREL;

                 }

             }

             sb.append(letters[random.nextInt(letters.length)]);

         }

     }

 }

Actual problem

Great! So the only thing we need to do now is apply the same method for the original problem. Following the same reasoning, in the original problem there are 15 distinct states:

$epsilon$

$verb|C|$

$verb|CO|$

$verb|COC|$

$verb|COCO|$

$verb|COCON|$

$verb|COCONU|$

$verb|COCONUT|$

$verb|T|$

$verb|TU|$

$verb|TUN|$

$verb|TUNO|$

$verb|TUNOC|$

$verb|TUNOCO|$

$verb|TUNOCOC|$

The chances for the second squirrel in each of the states are (thanks @kaba for pointing a previous error):

$$
begin{align*}
P_epsilon &= frac{1}{26} P_{C} + frac{1}{26} P_{T} + frac{24}{26} P_epsilon \
P_{C} &= frac{1}{26} P_{CO} + frac{1}{26} P_{C} + frac{1}{26} P_{T} + frac{23}{26} P_epsilon \
P_{CO} &= frac{1}{26} P_{COC} + frac{1}{26} P_{T} + frac{24}{26} P_epsilon \
P_{COC} &= frac{1}{26} P_{COCO} + frac{1}{26} P_{C} + frac{1}{26} P_{T} + frac{23}{26} P_epsilon \
P_{COCO} &= frac{1}{26} P_{COCON} + frac{1}{26} P_{COC} + frac{1}{26} P_{T} + frac{23}{26} P_epsilon \
P_{COCON} &= frac{1}{26} P_{COCONU} + frac{1}{26} P_{C} + frac{1}{26} P_{T} + frac{23}{26} P_epsilon \
P_{COCONU} &= frac{1}{26} P_{COCONUT} + frac{1}{26} P_{C} + frac{24}{26} P_epsilon \
P_{COCONUT} &= 0 \
P_{T} &= frac{1}{26} P_{TU} + frac{1}{26} P_{C} + frac{1}{26} P_{T} + frac{23}{26} P_epsilon \
P_{TU} &= frac{1}{26} P_{TUN} + frac{1}{26} P_{C} + frac{1}{26} P_{T} + frac{23}{26} P_epsilon \
P_{TUN} &= frac{1}{26} P_{TUNO} + frac{1}{26} P_{C} + frac{1}{26} P_{T} + frac{23}{26} P_epsilon \
P_{TUNO} &= frac{1}{26} P_{TUNOC} + frac{1}{26} P_{T} + frac{24}{26} P_epsilon \
P_{TUNOC} &= frac{1}{26} P_{TUNOCO} + frac{1}{26} P_{C} + frac{1}{26} P_{T} + frac{23}{26} P_epsilon \
P_{TUNOCO} &= frac{1}{26} P_{TUNOCOC} + frac{1}{26} P_{T} + frac{24}{26} P_epsilon \
P_{TUNOCOC} &= 1
end{align*}
$$

Finally let's solve this system of equations. I've used SageMath for that purpose. The script is (after renaming the variables):

var('a, b, c, d, e, f, g, h, i, j, k, l, m, n, u')



eqn = [

  a == (1/26) * b + (1/26) * i + (24/26) * a,

  b == (1/26) * c + (1/26) * b + (1/26) * i + (23/26) * a,

  c == (1/26) * d + (1/26) * i + (24/26) * a,

  d == (1/26) * e + (1/26) * b + (1/26) * i + (23/26) * a,

  e == (1/26) * f + (1/26) * d + (1/26) * i + (23/26) * a,

  f == (1/26) * g + (1/26) * b + (1/26) * i + (23/26) * a,

  g == (1/26) * h + (1/26) * b + (24/26) * a,

  h == 0,

  i == (1/26) * j + (1/26) * b + (1/26) * i + (23/26) * a,

  j == (1/26) * k + (1/26) * b + (1/26) * i + (23/26) * a,

  k == (1/26) * l + (1/26) * b + (1/26) * i + (23/26) * a,

  l == (1/26) * m + (1/26) * i + (24/26) * a,

  m == (1/26) * n + (1/26) * b + (1/26) * i + (23/26) * a,

  n == (1/26) * u + (1/26) * i + (24/26) * a,

  u == 1

]



solution = solve(eqn, a, b, c, d, e, f, g, h, i, j, k, l, m, n, u)



print(solution)

And the output (formatting mine, for readability):

[

[

a == (308915775/617830874),

b == (154457887/308915437),

c == (308915749/617830874),

d == (154457549/308915437),

e == (308898173/617830874),

f == (308458799/617830874),

g == (297034399/617830874),

h == 0,

i == (154457888/308915437),

j == (308915801/617830874),

k == (308916451/617830874),

l == (308933351/617830874),

m == (154686375/308915437),

n == (320797125/617830874),

u == 1

]

]

We see that the probability that the second squirrel will win is exactly $frac{308915775}{617830874}$ or approximately 50.0000547%

The coefficients you use to relate different $P_{?}$ correspond to the transition matrix $P3$ in my answer. I think you have a typo for $P_{COCO}$ where $P_C$ should be replaced with $P_{COC}$. — Nov 27 '18 at 13:04
Using notation from my answer, you are solving the equation $P3 begin{bmatrix} x_T \ x_A end{bmatrix} = begin{bmatrix}Q3 && R3 \ 0 && I end{bmatrix} begin{bmatrix} x_T \ x_A end{bmatrix} = begin{bmatrix} x_T \ 0 \ 1 end{bmatrix}$ which is equivalent to $x_A = begin{bmatrix} 0 \ 1 end{bmatrix}$ and $x_T = (I - Q3)^{-1} R3 begin{bmatrix} 0 \ 1 end{bmatrix}$. — Nov 27 '18 at 13:30
@kaba you're right, thanks for spotting that! It's not really a typo but rather a mistake that carries on to the result too. That probably explains the difference between my result and yours. I will fix it later. — Nov 27 '18 at 13:38
Our solutions are equivalent, and therefore should be equal. I like how leaving it in the linear equation form, as you have, is clearer than solving explicitly using matrix inverse. — Nov 27 '18 at 13:42

Fabio TuratiFabio Turati 1756 · Accepted Answer · 2018-11-24 02:37:27Z

I have tried to create a Markov chain modelling problem a). I haven't even tried to solve it, but the model (i.e. the states and the transitions) should be correct (with one caveat which I'll explain at the end).

First, a simplification: I have reduced the problem to getting either the string A-A-B (as in CO-CO-NUT) or B-A-A (as in TUN-OC-OC), using an alphabet that only includes 3 letters: A, B, and C.

Here's what I've got. This is for AAB:

As you can see, I have separated the state "Only one A" from the state "AA". It is noteworthy that there is no way to go from "AA" to "A".

And this is for BAA:

I'm not sure what this leads to. I was hoping there would be some visible hint about the solution (like: both chains are very similar, with just one difference that tilts the balance in favour of one of them), but this doesn't seem to be the case.

To be honest, I'm not even sure my simplification from COCONUT to AAB is correct, because in my case A and B are equally likely (both of them are just a letter which is typed with probability P = 1/3), whereas CO and NUT are not (CO is only 2 letters long, NUT is 3). This means that solving my chains could, or could not, help solving the original problem.

Hopefully someone can pick up from here.

score 3 · Accepted Answer · 2018-11-25 20:29:58Z

3

I'll try to make the argument that the answer to both parts is:

The two are just as likely.

We start from the simple fact that the probability that a window of 7 letters reads COCONUT, or TUNOCOC, or anything else, is equal. (It's 26^(-7), as @Ingix said.). Crucially, it's completely independent of any letters that appear before or after the window, simply because these letters are not in the window...

Now, let's start with (b). We look at letters 1-7. If they say COCONUT, the first squirrel wins. If they say TUNOCOC (at the same probability!), the second one wins. If they say neither, we move on to letters 2-8 and do the same. At this point, letter 1 is completely irrelevant. You see how we move on and on through the letters, until there's a winner, and the chances of either squirrels to win never differ from each other.

Further clarification:

Of course, if letters 2-7 happened to be COCONU, the first squirrel is in a very good position. Similarly (and in the same probability), if they were TUNOCO, the second squirrel is in a good position. But this is symmetrical, so no one has an overall advantage.

Of course, if letters 1-7 were TUNOCOC, the chance to get COCONUT at letters 5-11 increases, but this doesn't matter since, in this instance, the game has already finished...

In other words, if we were to ask "Which of the two words are we more likely to see at letters 5-11?" (see @abl's comment below), the answer would be TUNOCOC. But the difference is only due to a situation in which the game has already finished. And this is just not the question in hand.

Part (a) is a very similar. We look at letters 1-7 of both monkeys. If the first monkey typed COCONUT, it wins. If the second one types TUNOCOC, it wins. If neither won, we move on to look at letters 2-8 of both, and so on.

No Markov chains :)

edited Nov 25 '18 at 20:29

answered Nov 24 '18 at 22:27

Angkor

1,147416

1

$begingroup$
Except that in (b) the probabilities are not independent across windows. The probability of letters 5-11 being COCONUT, conditional to the fact that you reached this window, are lower than the equivalent for TUNOCOC, simply because you have to account for the case "TUNOCOCONUT" (in which case you wouldn't reach window 5-11). Maybe you can believe the discussion now :)
$endgroup$
– abl
Nov 25 '18 at 0:11

$begingroup$
@abl Sorry, but this is just not true... Are you saying that, typing randomly, one is more likely to type COCONUT than TUNOCOC? These are just two 7-letter sequences, and they're just as likely. Arguments like "consider the case X" are not really valid, because there are so many other cases to consider too. The trick is to be able to consider all cases together with a general argument.
$endgroup$
– Angkor
Nov 25 '18 at 13:37

4

$begingroup$
@Angkor Maybe I didn't explain myself well. Consider all possible sequences that start with "****TUNOCOC...". It's obvious that there are just as many of those as sequences of the form "****COCONUT...", i.e. both situations are equally probable, right? Note that all sequences of the form "****TUNOCOC..." are a win for squirrel 2. And all sequences of the form "****COCONUT..." are a win for squirrel 1, except for those that start with "TUNOCOCONUT..." which are a win for squirrel 2. There's an asymmetry here, so at the very least we can say that those are not "just two 7-letter sequences".
$endgroup$
– abl
Nov 25 '18 at 16:33

2

$begingroup$
@Angkor Point taken, I apologize if I was out of tone. Imagine that the alphabet has only two letters, A and B. The squirrels bet for AAB and BAA respectively. The first three letters are typed. Each squirrel has 1/8 chances of winning at the third letter. If none of them wins, then the sequence is one of AAA, ABA, ABB, BAB, BBA, BBB, with equal probability. At the fourth letter, the possibilities (with equal probability) are AAAA, AAAB, ABAA, ABAB, ABBA, ABBB, BABA, BABB, BBAA, BBAB, BBBA, BBBB. Squirrel 1 wins with AAAB. Squirrel 2 wins with ABAA or BBAA, twice the chance.
$endgroup$
– abl
Nov 25 '18 at 20:43

1

$begingroup$
@Angkor, also, this simplified case is simple enough to be simulated, which I did (see my answer) and the simulation over several million games shows the second squirrel has better chances.
$endgroup$
– abl
Nov 25 '18 at 20:49

|
show 3 more comments

Marco SalernoMarco Salerno 1293 · Accepted Answer · 2018-11-23 15:28:34Z

I tried to make a C# program to do that for me:

class Program

{

    static void Main(string args)

    {

        Log("Le scimmie si preparano alla sfida");



        Monkey monkey1 = new Monkey("COCONUT");

        Monkey monkey2 = new Monkey("TUNOCOC");

        int counter = 0;



        while (true)

        {

            counter++;

            Log("Le scimmie digitano una lettera per la {0}° volta.", counter);



            monkey1.PressLetter();

            monkey2.PressLetter();



            if(monkey1.IsTargetReached() && monkey2.IsTargetReached())

            {

                Log("Entrambe le scimmie hanno vinto!");

                break;

            }

            else if (monkey1.IsTargetReached())

            {

                Log("La scimmia "{0}" ha vinto!", monkey1.Target);

                break;

            }

            else if (monkey2.IsTargetReached())

            {

                Log("La scimmia "{0}" ha vinto!", monkey2.Target);

                break;

            }

        }

        Log("Le scimmie si prendono il meritato riposo dopo {0} turni!", counter);

    }



    private static void Log(string message, params object args)

    {

        Console.WriteLine(DateTime.Now + " - " + string.Format(message, args));

    }

}



public class Monkey

{

    private Random Random { get; set; }

    public string Target { get; set; }

    public string Text { get; set; }



    public Monkey(string target)

    {

        Target = target;

        Text = "";

        Random = new Random();

    }



    public void PressLetter()

    {

        int num = Random.Next(0, 26);

        char let = (char)('a' + num);

        string lastLetter = let.ToString().ToUpper();

        Text += lastLetter;

    }



    public bool IsTargetReached()

    {

        return Text.IndexOf(Target) > -1;

    }

}

After millions of Monkeys' tries, I ended up thinking that they have
the same possibilities.

Anyway my computer still didn't successfully solve the problem.

That's not surprising. The expected number of keystrokes before either monkey wins is well into the billions. — Nov 23 '18 at 15:57
May I advice English comment? As a non native speaker I read every latin language as loren ipsum. — Nov 27 '18 at 13:44

JeffreyJeffrey 1215 · Accepted Answer · 2018-11-24 19:50:11Z

There's good arguments for both sides. Here's code that prove it definitely:

import numpy as np



def main():

    m1 = np.array([

    [25, 1, 0, 0, 0, 0, 0, 0], 

    [24, 1, 1, 0, 0, 0, 0, 0],

    [25, 0, 0, 1, 0, 0, 0, 0],

    [24, 1, 0, 0, 1, 0, 0, 0],

    [24, 0, 0, 1, 0, 1, 0, 0],

    [24, 1, 0, 0, 0, 0, 1, 0],

    [24, 1, 0, 0, 0, 0, 0, 1],

    [0, 0, 0, 0, 0, 0, 0, 26]], dtype='object')



    m2 = np.array([

    [25, 1, 0, 0, 0, 0, 0, 0], 

    [24, 1, 1, 0, 0, 0, 0, 0],

    [24, 1, 0, 1, 0, 0, 0, 0],

    [24, 1, 0, 0, 1, 0, 0, 0],

    [24, 1, 0, 0, 0, 1, 0, 0],

    [24, 1, 0, 0, 0, 0, 1, 0],

    [24, 1, 0, 0, 0, 0, 0, 1],

    [0, 0, 0, 0, 0, 0, 0, 26]], dtype = 'object')



    p1 = np.array([1,0,0,0,0,0,0,0])

    p2 = np.array([1,0,0,0,0,0,0,0])



    for i in range(1000):

        p1 = p1.dot(m1)

        p2 = p2.dot(m2)



        #this shows that how often the two monkes have 1,2,3 letters correct vary

        #but it doesn't affact past the 4th letter

        #print(i,p2 - p1)



        if p1[7] == p2[7]:

            print("after {} steps, both monkeys have equal chances".format(i))

        elif p2[7] > p1[7]:

            print("crap, this whole arguments just got destroyed")

        else:

            print("after {} steps, coconut monkey has {} more ways to have won".format(i, p1[7] - p2[7]))



main()

I ran it up to:

after 99999 steps, both monkeys have equal chances

No matter the number of steps, both monkeys have equal chances. Even though one monkey is more likely to have exactly the last 1,2, or 3 letters correct.

(mind-boggling)

score 1 · Accepted Answer · 2018-11-25 10:19:50Z

I have created the simulation in C# based on Marco's idea.
The alphabet is reduced and the performance is improved:

public class Monkey

{

    private static readonly char alphabet = "ACNOTUXY".ToCharArray();

    private static readonly Random random = new Random();



    public Monkey(string Target)

    {

        target = Target.ToCharArray();

        length = target.Length;

        input = new char[length];

        inputPosition = 0;

    }



    private readonly char target;

    private readonly int length;

    private readonly char input;

    private int inputPosition;



    public bool TargetReached { get; private set; }



    public void PressKey()

    {

        var key = alphabet[random.Next(0, 8)];

        input[inputPosition] = key;

        inputPosition = ++inputPosition % length;

        TargetReached = IsTargetReached();

    }



    private bool IsTargetReached()

    {

        for (var i = 0; i < length; i++)

        {

            if (input[i] != target[i]) return false;

        }

        return true;

    }

}



public class Simulator

{

    private int ties;

    private int monkey1Wins;

    private int monkey2Wins;



    public void Run(int numberOfCompetitons)

    {

        for (var i = 0; i < numberOfCompetitons; i++)

        {

            RunCompetition();

        }

    }



    private void RunCompetition()

    {

        var monkey1 = new Monkey("COCONUT");

        var monkey2 = new Monkey("TUNOCOC");



        while (true)

        {

            monkey1.PressKey();

            monkey2.PressKey();



            if (monkey1.TargetReached && monkey2.TargetReached)

            {

                ties++;

                Console.WriteLine("It's a TIE!");

                break;

            }

            if (monkey1.TargetReached)

            {

                monkey1Wins++;

                Console.WriteLine("Monkey 1 WON!");

                break;

            }

            if (monkey2.TargetReached)

            {

                monkey2Wins++;

                Console.WriteLine("Monkey 2 WON!");

                break;

            }

        }

    }



    public void ShowSummary()

    {

        Console.WriteLine();

        Console.WriteLine($"{ties} TIES");

    Console.WriteLine($"Monkey 1: {monkey1Wins} WINS");

        Console.WriteLine($"Monkey 2: {monkey2Wins} WINS");

    }

}



internal class Program

{

    private static void Main()

    {

        var simulator = new Simulator();

        simulator.Run(1000);

        simulator.ShowSummary();

        Console.ReadKey();

    }

}

Sample output:

...

Monkey 1 WON!

Monkey 1 WON!

Monkey 2 WON!

Monkey 1 WON!

Monkey 2 WON!

Monkey 2 WON!

Monkey 1 WON!

Monkey 1 WON!

Monkey 2 WON!

0 TIES

Monkey 1: 498 WINS

Monkey 2: 502 WINS

So it seems pretty tied to me

TibosTibos 58628 · Accepted Answer · 2018-11-24 11:25:56Z

Monkey problem

The crux the problem is the following: do repeated groups influence the expected characters needed to complete a sequence? The answer is

No, repeated groups bear no influence on the result

Let's reduce the problem to something similar - AAB vs ABC with only these letters being on the keyboard.

The probability of obtaining AAB in a sequence is:

starting at 1: P(A) * P(A) * P(B) = (1/3 * 1/3 * 1/3) +

starting at 2: P(!A) * P(A) * P(A) * P(B) = 2/3 * (1/3 * 1/3 * 1/3) +

starting at 3: P(!A) * P(!A) * P(A) * P(A) * P(B) = 2/3 * 2/3 * (1/3 * 1/3 * 1/3) + ...

The probability of obtaining ABC in a sequence is:

starting at 1: P(A) * P(B) * P(C) = (1/3 * 1/3 * 1/3) +

starting at 2: P(!A) * P(A) * P(B) * P(C) = 2/3 * (1/3 * 1/3 * 1/3) +

starting at 3: P(!A) * P(!A) * P(A) * P(B) * P(C) = 2/3 * 2/3 * (1/3 * 1/3 * 1/3) + ...

This makes it easy to see that

No matter how you shuffle the desired sequence around, the only thing that influences the probability of it occurring is the length of the sequence.

Alternative proof:
All of the possible sequences of 4 letters are:

AAAA ABAA ACAA BAAA BBAA BCAA CAAA CBAA CCAA

AAAB ABAB ACAB BAAB BBAB BCAB CAAB CBAB CCAB

AAAC ABAC ACAC BAAC BBAC BCAC CAAC CBAC CCAC

AABA ABBA ACBA BABA BBBA BCBA CABA CBBA CCBA

AABB ABBB ACBB BABB BBBB BCBB CABB CBBB CCBB

AABC ABBC ACBC BABC BBBC BCBC CABC CBBC CCBC

AACA ABCA ACCA BACA BBCA BCCA CACA CBCA CCCA

AACB ABCB ACCB BACB BBCB BCCB CACB CBCB CCCB

AACC ABCC ACCC BACC BBCC BCCC CACC CBCC CCCC

And here is the same table with the uninteresting sequences cut out:

XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX

AAAB XXXX XXXX BAAB XXXX XXXX CAAB XXXX XXXX

XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX

AABA XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX

AABB XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX

AABC XXXX XXXX BABC XXXX XXXX CABC XXXX XXXX

XXXX ABCA XXXX XXXX XXXX XXXX XXXX XXXX XXXX

XXXX ABCB XXXX XXXX XXXX XXXX XXXX XXXX XXXX

XXXX ABCC XXXX XXXX XXXX XXXX XXXX XXXX XXXX

There are

3 sequences where AAB starts at position 1: AABA, AABB, AABC

3 sequences where AAB starts at position 2: AAAB, BAAB, CAAB

3 sequences where ABC starts at position 1: ABCA, ABCB, ABCC

3 sequences where ABC starts at position 2: AABC, BABC, CABC

Thus the answer to the monkey problem is that

both monkeys have the same chance of finishing first

Squirrel problem

Let's consider the string typed by the monkey after a long (infinite) time and make some simplifications that don't affect the outcome to understand the problem better.

Replace any letter not in COCONUT with X

Replace all groups of COC with A

Replace all remaining C's with X

Replace all remaining O's with B, N's with C, U's with D and T's with E

The problem now is:

in the new string will we more likely first find ABCDE or EDCBA?

And the solution:

If A, B, C, D and E had the same chances of occurring, the answer would be "both are equally likely", but C, D and E have normal chances of occurring, A has a very low chance of occurring, B has a slightly lower than normal chance.

Considering a simplified case - 2 letters - P and Q with P occurring more often than Q, PQ has more chances of appearing first than QP.
This means EDCBA is more likely to appear first than ABCDE.

Looking at it from a different angle:

When the unlikely event that A appears in the string, it is equally likely that it is followed by a B as it is that it is preceded by one.

In the even more unlikely event that an AB or BA appears in the string, it is equally likely that it is followed by a C as it is that it is preceded by one.

...

Which makes it equally likely that ABCDE and EDCBA appear in the string, so when an A appears in either of these sequences, it is equally likely that it starts the ABCDE sequence as it is that it ends the EDCBA sequence.

Thus, the answer to the squirrel problem is that

the second squirrel's sequence (TUNOCOC) appears first

score 0 · Accepted Answer · 2018-11-25 03:03:23Z

Similar to Marco Salerno's answer, I wrote a computer program to solve the monkey problem.

static void Coconuts(int iterations)

{

    Tuple<double, double> outcomes = new Tuple<double, double>[iterations];

    char alphabet = new char { 'C', 'O', 'N', 'U', 'T' };

    string target1 = "COCONUT";

    string target2 = "TUNOCOC";

    for(int i = 0; i < iterations; i++)

    {

        outcomes[i] = new Tuple<double, double>(RandomLettersToTarget(alphabet, target1), RandomLettersToTarget(alphabet, target2));

        Console.WriteLine($"{outcomes[i].Item1:0.0}t{outcomes[i].Item2:0.0}");

    }



    double average1 = outcomes.Select(pair => pair.Item1).Average();

    double average2 = outcomes.Select(pair => pair.Item2).Average();



    Console.WriteLine($"Average durations are {average1:00000.0}, {average2:00000.0}");

}



static int RandomLettersToTarget(char alphabet, string target, int stepsToInsanity = 100000000)

{

    string soFar = "";

    int steps = 0;

    while(steps < stepsToInsanity)

    {

        soFar += alphabet[rng.Next(0, alphabet.Length)];

        if(soFar.Length >= target.Length)

        {

            soFar = soFar.Substring(soFar.Length - target.Length, target.Length);

        }

        if(target.Equals(soFar))

        {

            return steps;

        }

        steps++;

    }

    return stepsToInsanity;

}

With the reduced alphabet (containing only 5 letters) this program does complete, and the results over thousands of iterations

show no clear winner. Thus, I conclude that each monkey has equal probability of winning the first game.

The difference in odds is so slight that a computer program will give you no reliable answer. See my answer for details. — Nov 27 '18 at 13:35

score 0 · Accepted Answer · 2018-11-27 13:40:12Z

(a)

The fact that two monkeys are typing is irrelevant to the study of the sequence. They're both typing on the same computer (see title), so there is only one sequence of letters. There is no bias, both monkeys have the exact same chance of having their word appear first. Winning, in the clear text, means "having my word come out first".

We state obviously that, if less than $7$ letters are typed, both monkeys have equally $0$ chance of winning.

Let's do some recurrence reasoning, trying to prove that if both monkeys have the same chance of winning after $n$ letters are typed, they have the same chance of winning after $n+1$ letters are typed.

Let $n$ be greater than or equal to $6$. Given there is no winner after the $n$th letter is typed, and both monkeys have had the same chance of winning up until now, (which is trivially established for $n=6$) there are three cases:

the letters $n-5$ to $n$ say COCONU and the letter typed is T, monkey 1 wins. $1$ chance out of $26^7$ ($26^6$ suffixes times $26$ possible letters)

the letters $n-5$ to $n$ say TUNOCO and the letter typed is C, monkey 2 wins. $1$ chance out of $26^7$ ($26^6$ suffixes times $26$ possible letters)

all other cases, no winner. $26^7 - 2$ out of $26^7$

And the monkeys have the same chance of winning upon the $n+1$th letter typed. This completes the proof by recurrence.

So, reading the question correctly, the one expected to type more will be the one that stops last, assuming they type at the same frequency. There are two cases. Either the first word out is COCONUT, in which case, once monkey 1 stops typing, monkey 2 has T as a prefix for his word to come out. Or the first word out is TUNOCOC, in which case, once monkey 2 stops typing, monkey 1 has COC as a prefix for his word to come out. Therefore, monkey 2 is expected to have to type more than monkey 1 in this rare case, and then same as monkey 1 once he spoils his opportunity to finish the word.
So, yes, monkey 2 is expected to type slightly more, due to these opportunities being unequal.

(b)

If "before" means "before in time", we are in the situation described in the clear text. Both squirrels have the same odds, if they stop watching after the first word appears. If they wait for both words to appear, well, this does not change anything, the word appearing first is the word appearing first.

Whereas

If "before" means "before in the lexical order", squirrel 1 wins, and the fact that a monkey is typing on a computer is irrelevant. This shows that squirrel 1 has better chances of winning anyway.

Look, Ma! No Markov chains!

2 Monkeys on a computer [closed]

closed as off-topic by ABcDexter, Astralbee, Chowzen, rhsquared, boboquack Dec 2 '18 at 9:35

closed as off-topic by ABcDexter, Astralbee, Chowzen, rhsquared, boboquack Dec 2 '18 at 9:35

closed as off-topic by ABcDexter, Astralbee, Chowzen, rhsquared, boboquack Dec 2 '18 at 9:35

closed as off-topic by ABcDexter, Astralbee, Chowzen, rhsquared, boboquack Dec 2 '18 at 9:35

15 Answers 15

Monkey problem

Squirrel problem

Code

Simplified problem:

Explanation

Running the simplified problem

Actual problem

15 Answers 15

15 Answers 15

Monkey problem

Squirrel problem

Code

Monkey problem

Squirrel problem

Code

Monkey problem

Squirrel problem

Code

Monkey problem

Squirrel problem

Code

Simplified problem:

Explanation

Running the simplified problem

Actual problem

Simplified problem:

Explanation

Running the simplified problem

Actual problem

Simplified problem:

Explanation

Running the simplified problem

Actual problem

Simplified problem:

Explanation

Running the simplified problem

Actual problem

這個網誌中的熱門文章

Academy of Television Arts & Sciences

L'Équipe

1995 France bombings

15 Answers
15

15 Answers
15

15 Answers
15