Yahtzee AI

this has nothing to do with jmonkey, but i need some creative answers here…

you probably know about yahtzee. i made a little tool which calculates the expected value for every possible situation in the game, so you know what to do if you want to get the maximum possible average score.

now, the ai reached an average of 245 points, sometimes more, sometimes less. i want to reduce the variance.



the key problem is:

if the possible scores for a specific choice are zero a bazillion times, then 1,2,3,4,4,4,4,4,4,4,4, and an almost unlimited score once, the average value will tell me to always take the chance, because no matter how low the chance is, if i win, this will boost my average score to almost unlimited.

now imagine another player who always takes a safe choice. in this case, he might always take a secure single point, leading to a much lower average score, but still win most of the games.



i need a formula which takes numbers like 0,0,0,0,0,0,1,1,1,50 and tells me that choosing this path is a bad idea. ideally, it should be a function like evalExpectancyValue(float[] values, float howRiskyShouldIPlay) where risky = 0 means "take only what is guaranteed" and 1 means "take the average value"
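
one very simple way to get a function with that signature is to blend the guaranteed outcome (the minimum) with the plain average. this is only a sketch, not a tested yahtzee strategy: it assumes every entry of values is equally likely, and the linear blend is an assumption of mine, other risk measures are possible.

```java
final class RiskAdjusted {
    /**
     * Blends the guaranteed (minimum) outcome with the plain average.
     * howRiskyShouldIPlay = 0 returns the minimum, 1 returns the mean.
     * Assumption: every entry of values is equally likely.
     */
    static float evalExpectancyValue(float[] values, float howRiskyShouldIPlay) {
        if (values.length == 0) {
            throw new IllegalArgumentException("values must not be empty");
        }
        float min = values[0];
        double sum = 0;
        for (float v : values) {
            min = Math.min(min, v);
            sum += v;
        }
        float mean = (float) (sum / values.length);
        // Linear interpolation between "only what is guaranteed" and "the average".
        return min + howRiskyShouldIPlay * (mean - min);
    }

    public static void main(String[] args) {
        float[] gamble = {0, 0, 0, 0, 0, 0, 1, 1, 1, 50};
        float[] safe = {1, 1, 1, 1, 1, 1, 1, 1, 1, 1};
        System.out.println(evalExpectancyValue(gamble, 0.1f));
        System.out.println(evalExpectancyValue(safe, 0.1f));
    }
}
```

with a low risk setting like 0.1 the gamble set scores below a set of guaranteed single points, while at 1 it scores its full average of 5.3.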



any ideas?

Why not take the "expectancy" (i.e. add payoff times its probability), and then recompute expectancy by subsampling the space of possible moves until you find an optimum set of moves… In particular there is a greedy way of doing this (and I think would suffice for you) that is removing all choices for which Probability * payoff < Current Expectancy… And repeat!



Obviously you could also compute simply the one move that has the highest individual expectancy.

I'm not sure that I understand the problem (I don't know the game), and have no real experience in practice, but that's not going to stop me offering advice!



I think you are saying that you have to predict a value randomly chosen from a given set, and that if you win you get that value. So for your example, the set is (0,0,0,0,0,0,1,1,1,50), only picking 1 or 50 will earn any points, and they have expected values of 0.3 and 5 respectively, but you'd like to be able to set your algorithm to go for the safer option (1) sometimes. Analogous to working out what bets to lay: favourites, outsiders or lottery tickets. Right?



My approach would be this function:



    calcChoice(float[] values, float pointLead, float expectedRemainingTurns)



With only a few turns left (relative to the frequency of the jackpot option) and an inconclusive lead you might want to tighten up. Conversely, when narrowly behind with only a few turns left you may want to loosen up. Otherwise you'd just play expected value.



The algorithm in the function basically wants to work out which strategy increases the chance of maintaining (or making up) the lead. I'd use expected values and variances for this, but there are other methods which may be better in this case.
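
As a starting point, the expected value and variance of a score set could be computed as below. This is only a sketch: the class and method names are made up, every entry is assumed to be equally likely, and how to turn the two numbers into a decision given pointLead and expectedRemainingTurns is left open.

```java
final class ScoreStats {
    /** Arithmetic mean, assuming all entries are equally likely. */
    static double mean(double[] values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** Population variance: the average squared distance from the mean. */
    static double variance(double[] values) {
        double m = mean(values);
        double sumSq = 0;
        for (double v : values) {
            sumSq += (v - m) * (v - m);
        }
        return sumSq / values.length;
    }

    public static void main(String[] args) {
        double[] gamble = {0, 0, 0, 0, 0, 0, 1, 1, 1, 50};
        System.out.println(mean(gamble) + " " + variance(gamble));
    }
}
```

Two options with a similar mean can then be told apart by their variance: the lower one is the tighter play, the higher one the looser play.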

my current solution is to simply compare each value of set one to each value of set two and count which set would win. a simple example:

1,3,6 vs 2,4



set 1 vs set 2:

1 -> 0 wins

3 -> 1 win (beats 2)

6 -> 2 wins (beats 2 and 4)



set 1 wins 3 times.



set 2 vs set 1:

2 -> 1 win (beats 1)

4 -> 2 wins (beats 1 and 3)



set 2 wins 3 times.



so both sets are equal in terms of "which one wins more often".
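
in code, the counting above looks roughly like this. it is a minimal sketch, not my actual implementation: the names are made up for illustration, ties count for neither side, and every value is treated as equally likely.

```java
final class PairwiseCompare {
    /** Counts the pairs (x, y) with x from a and y from b where x beats y. */
    static int countWins(int[] a, int[] b) {
        int wins = 0;
        for (int x : a) {
            for (int y : b) {
                if (x > y) {
                    wins++;
                }
            }
        }
        return wins;
    }

    public static void main(String[] args) {
        int[] set1 = {1, 3, 6};
        int[] set2 = {2, 4};
        System.out.println(countWins(set1, set2)); // 3
        System.out.println(countWins(set2, set1)); // 3
    }
}
```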



this works perfectly fine. the question is, can this method be used when several steps are combined? imagine a tree-like structure of possible moves. at the end of every branch there are score sets like the ones in my example. how can i compare which nodes (not leaves) are better?

my idea was to do this:

  // Replaces this distribution with the distribution of the summed score:
  // every score/rate pair of this set is combined with every pair of p_scores.
  public void mergeWith(final VarSizeScores p_scores)
  {
    TreeMap<VInt, VDouble> l_tmp = new TreeMap<VInt, VDouble>(ms_cmp);
    for (Map.Entry<VInt, VDouble> l_entry : m_rates.entrySet())
    {
      double l_count = l_entry.getValue().d;
      p_scores.addToAll(l_count, l_entry.getKey().i, l_tmp);
    }
    m_rates = l_tmp;
    m_compressedData = null;
    onDataChanged();
  }

  // Adds p_value to each score of this set and accumulates
  // (rate * p_count) for the resulting score in p_storeIn.
  private void addToAll(final double p_count, final int p_value, final TreeMap<VInt, VDouble> p_storeIn)
  {
    for (Map.Entry<VInt, VDouble> l_entry : m_rates.entrySet())
    {
      getRateForScore(l_entry.getKey().i + p_value, p_storeIn).add(l_entry.getValue().d * p_count);
    }
  }



in human speech:


if i can combine the choice of set 1 with set 2 OR set 3, which combined result is the better one?
to figure that out, i add each element of set 2 to each of set 1, and each of set 3 to a clone of set 1.
example:
set 1 is "1,2,3,4,5". set 2 is "35". the combined set is "36,37,38,39,40".
set 3 is "10,50", so the combined 1+3 is "11,12,13,14,15,51,52,53,54,55".
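
stripped of the rate bookkeeping, the combination step in this example can be sketched like this. the names are made up for illustration, plain int arrays stand in for the score sets, and the result is sorted only to make it easier to read.

```java
import java.util.Arrays;

final class CombineSets {
    /** Adds every element of second to every element of first. */
    static int[] combine(int[] first, int[] second) {
        int[] out = new int[first.length * second.length];
        int k = 0;
        for (int b : second) {
            for (int a : first) {
                out[k++] = a + b;
            }
        }
        Arrays.sort(out);
        return out;
    }

    public static void main(String[] args) {
        int[] set1 = {1, 2, 3, 4, 5};
        // [36, 37, 38, 39, 40]
        System.out.println(Arrays.toString(combine(set1, new int[]{35})));
        // [11, 12, 13, 14, 15, 51, 52, 53, 54, 55]
        System.out.println(Arrays.toString(combine(set1, new int[]{10, 50})));
    }
}
```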

does this work? the result is good, but is it really the best one?
in my case, this algorithm beats the much simpler maximum average algorithm in a simplified yahtzee (only four of a kind, three of a kind and full house), but slightly loses in the full game because of precision problems, so i can't tell if it would win.
rare cases are "rounded away" during the calculations.
i tried using BigDecimal to solve this, but the calculations would take WEEKS and eat about 60 GB of RAM, so that's a big "not yet"