Optimize for maximum median instead of maximum average

i have a problem i can't solve. i have a virtual reality (details don't matter) in which an ai or a human player can make a lot of choices. every choice leads to another node of the choice tree, and for each choice, the player is rewarded with a random value out of a collection which is assigned to that specific path.

example: basically any dice-based game - yahtzee, star quest, even secret of mana or diablo (because of the random damage)



the complete tree can be calculated, all possible states of the reality are known. i can now easily optimize the ai's choices to get the maximum average value*, but that's not what i want. i want it to go for the maximum median.

if the number of choices is really large, going for the maximum average score of a path is the way to go, but the shorter the path, the higher the chance that the "average-target" score will be lower than the "median-target" score.



a little example, i already posted it somewhere:

you can choose between one path giving 0, 0, 0, 0, 0, 0, 0 or 99999 points (each value equally likely) and another one giving 1 point. when aiming for the maximum average score, you will choose path one. the median strategy will have a lower average score, but win most of the games.
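in numbers (python's statistics module, just illustrating the example above):

```python
from statistics import mean, median

risky = [0, 0, 0, 0, 0, 0, 0, 99999]  # path one
safe = [1]                            # path two

# the mean strategy prefers the risky path...
print(mean(risky))    # 12499.875
print(mean(safe))     # 1

# ...but the median strategy sees through the gamble
print(median(risky))  # 0.0 - 7 out of 8 games score nothing
print(median(safe))   # 1
```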

the question is:

how to make my ai make those choices that lead to the maximum median?

it's really simple for depth one (one choice) - just calculate the median value of each choice and pick the highest one - but how to do it for two or more?



*

this is pretty simple once you get the idea. just traverse all directly reachable choices, calculate the average score of each choice (remember, each choice has a collection of scores) and add the expectancy value of all remaining open choices (difficult to calculate, but easy to talk about). for simplicity, let's just assume every node has a value which represents the score you can expect to get if you reach that node and play an unlimited number of games.
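the traversal described in this footnote might look roughly like this - a sketch with made-up names (`Node`, `expected_value`, `best_choice`), assuming every value in a collection is equally likely:

```python
from statistics import mean

class Node:
    """one choice: a collection of possible rewards plus follow-up choices"""
    def __init__(self, rewards, children=()):
        self.rewards = rewards
        self.children = children

def expected_value(node):
    # average reward of this choice plus the best average continuation
    value = mean(node.rewards)
    if node.children:
        value += max(expected_value(child) for child in node.children)
    return value

def best_choice(choices):
    # the "maximum average" ai simply picks the highest expectancy value
    return max(choices, key=expected_value)
```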


now that i think about it, it doesn't have to be the median. there are other possibilities as well. a high median would help greatly, though.

When you are evaluating each node, if you take both the mean and median values it might give a good hint to the rule system - if the mean is pretty much the same as the median, you know that choice has a fairly good chance of paying off no matter which of the random values you are awarded for visiting.  If the mean is much greater than the median, the AI knows it's a bit of a gamble going for that spot (as in your example, most of the time you'll draw a zero rather than the 99k points).



If you score each choice based on a factor of mean and median, might that help?
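A minimal sketch of that hint (`risk_hint` is an invented name): the gap between mean and median flags how much of a gamble a choice is:

```python
from statistics import mean, median

def risk_hint(rewards):
    # 0 means the choice pays off reliably; the bigger the gap,
    # the more the mean is propped up by rare big wins
    return mean(rewards) - median(rewards)

print(risk_hint([5, 5, 5, 5]))                   # 0 - a safe bet
print(risk_hint([0, 0, 0, 0, 0, 0, 0, 99999]))  # 12499.875 - a gamble
```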

i'll do some research and run a few simulations

(i love doing this. when i was a child, i watched pretty much every star trek episode and every time i thought: i want to do funky stuff with computers, do calculations to solve complex problems with a modified tachyon pulse - now i'm actually doing it and none of my friends and family understands wtf i'm doing ;D)

Lol yeah that sounds familiar :slight_smile:



I was thinking about this some more and thought you could add a parameter to the AI to decide whether it was a cautious player or a risk-taking player:



The cautious AI would score the nodes based on the modal value (the one most likely to occur on a single play-through), a medium-risk AI would use the median value (the value most likely to occur after a couple of plays), medium-high would take the mean value, and the max-risk AI would take the maximum possible value each time (a crazy strategy, but it might make things interesting!)
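Sketched out (hypothetical `score` function; `multimode` needs Python 3.8+, and taking `max` of the modes is an arbitrary tie-break):

```python
from statistics import mean, median, multimode

def score(rewards, profile):
    if profile == "cautious":      # modal value: most likely single outcome
        return max(multimode(rewards))
    if profile == "medium":        # median: likely after a couple of plays
        return median(rewards)
    if profile == "medium-high":   # mean: long-run expectancy
        return mean(rewards)
    if profile == "max":           # jackpot hunting
        return max(rewards)
    raise ValueError(profile)

rewards = [0, 0, 0, 0, 0, 0, 0, 99999]
# cautious -> 0, medium -> 0.0, medium-high -> 12499.875, max -> 99999
```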



It's strange which problems make you ponder for a while - this is definitely one of them :slight_smile:

The idea of including the depth of the choice tree is interesting because that'll change the behaviour at different ends of the tree.  Depending on the mix it will be more optimistic at one end and more pessimistic at the other.



To be consistent, the strategy for making a choice at each node in the tree would have to be independent of depth.  I think you are right about the variance thing: if there are a large number of unbalanced nodes you'd expect the mean AI to score really highly in a small fraction of games, but the median AI would still win the lion's share.



I wonder if there is ever a time you'd want to take the mean score - if you play a one-node game the median would always come out on top (assuming the node always has > 2 values that aren't evenly distributed) - choosing based on the chance of getting a big reward would just be taking a gamble… so I think I'm back to liking the sound of the "riskiness" factor on the AI.



Give it a float between 0 and 1 and score the node something like (median + ((mean - median) * riskiness)), so it would encourage the risky AI to prefer the mean, the middle AI to take half the gap on top of the median, and the low-risk AI to just stick with the median every time.
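As a sketch (`node_score` is an invented name):

```python
from statistics import mean, median

def node_score(rewards, riskiness):
    """riskiness in [0, 1]: 0 sticks to the median, 1 goes for the mean"""
    med = median(rewards)
    return med + (mean(rewards) - med) * riskiness

rewards = [0, 0, 0, 0, 0, 0, 0, 99999]
print(node_score(rewards, 0.0))  # 0.0 - pure median play
print(node_score(rewards, 1.0))  # 12499.875 - pure mean play
print(node_score(rewards, 0.5))  # 6249.9375 - somewhere in between
```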



If I wasn't at work I'd give it a bash, just to see if the median AI is always going to win…




it won't always win, but its chance of winning will correlate with the variance of a node's value.


I wonder if there is ever a time you'd want to take the mean score


yes, there is: if "won" isn't defined as "more points than the enemy in a single match" but as "the sum of all points in a series of matches must be higher than the enemy's sum of points", the mean ai is better (in fact optimal).

i played around a bit. in a simplified yahtzee match, the median ai can beat the mean ai, but the more categories i add, the better the mean ai gets. makes sense. in the complete game, the number of moves (13*3) is big enough to give the mean ai a significant advantage.

i actually tested the mean ai at an online gaming site (evil me); it won 350 games and lost 250, which is pretty nice, but i still have a strong feeling that this is not yet what could be achieved. i think i could boost the score by identifying risky choices somehow and using the median there. this is where the current depth comes in. in the beginning, going for a risky choice is a better idea than making that choice at the end of the game. if you have 13 tries, even if a risky choice fails a few times, there's still a good enough chance to get it. if i simply ignore the current depth, i never make a risky choice, which is why the median ai fails in this case.
my problem is: how to put that into an algorithm? what exactly is "consider the depth" in numbers?
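one possible (completely unproven) answer, just to make the idea concrete: reuse the median + ((mean - median) * riskiness) score suggested earlier in the thread, and let the riskiness shrink as the moves run out - bold early, safe late. the parameterization is pure guesswork:

```python
from statistics import mean, median

def depth_aware_score(rewards, moves_left, total_moves):
    # guessed parameterization: riskiness = fraction of the game remaining
    riskiness = moves_left / total_moves
    med = median(rewards)
    return med + (mean(rewards) - med) * riskiness

rewards = [0, 0, 0, 0, 0, 0, 0, 99999]
print(depth_aware_score(rewards, 13, 13))  # 12499.875 - first move: gamble
print(depth_aware_score(rewards, 1, 13))   # near the end: play it safe
```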

correction: the mean ai probably wins because the calculation of the expectancy values for my nodes probably only works when going for the maximum average score. have to think about it. too bad yahtzee is too complex to fit in a single brain

the result of my research:

the median ai beats the mean ai in many cases.

the median ai is equal (or almost equal) to the mean ai in all other cases.



so the median should be preferred. but that leads to a problem, at least for a yahtzee ai:

how to calculate the median values?

to calculate the mean value of a collection, i add the values and divide the sum by the collection's size. now, if i want to get the average of two collections, i can simply calculate the mean of both means (as long as both collections are the same size).



but that doesn't work for the median of two collections, does it? the bigger the tree is, the more values i'd have to consider. for yahtzee, it would probably blow up my 8 gb machine. but let's not give up that fast! says a voice in my head…
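right - the median doesn't compose like the mean. a tiny demonstration, plus the exact (but memory-hungry) alternative of carrying whole distributions along as value -> count maps:

```python
from collections import Counter
from statistics import median

a = [0, 0, 10]
b = [9, 9, 9]

# median of medians is NOT the median of the merged values
print(median([median(a), median(b)]))  # 4.5
print(median(a + b))                   # 9.0

# the exact way: merge the full distributions and take the median there -
# this is what grows without bound for deep trees
dist = Counter(a) + Counter(b)
print(median(dist.elements()))         # 9.0
```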

new problems, new solutions:

if i use the median, it tells me that full house, yahtzee, and every single one of the "0 or full score" categories have a value of zero. makes sense: i can only get 0 points (in most cases) or the full score (in fewer cases).



that's the problem. i tried to solve it, and came up with this:

basically, i have to evaluate the "chance of winning", not the expected score, median score, or mean score. the question is: how do i do that? for one category left, it's simple. more points = win, so i simply count which choice leads to which possible scores and compare those numbers to all possible results of all other possible categories.

the result is how many times choice a will beat choice b, and i'd simply take the choice that wins as many matchups as possible.
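that counting step for a single remaining category might look like this (made-up names, every value assumed equally likely):

```python
def win_count(my_scores, other_scores):
    # over all pairings, how often does the first choice score strictly higher?
    return sum(1 for mine in my_scores for theirs in other_scores
               if mine > theirs)

def best_vs(choices, other_scores):
    # pick the choice that wins the most of these matchups
    return max(choices, key=lambda c: win_count(c, other_scores))

safe = [1]
risky = [0, 0, 0, 0, 0, 0, 0, 99999]
print(win_count(safe, risky))   # 7 - the safe point beats 7 of the 8 outcomes
print(win_count(risky, safe))   # 1 - only the jackpot beats the safe point
```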

but for 2 or more categories left?