Discussion:
Balancing number of plies and number of trials
MK
2023-12-13 09:47:08 UTC
In another thread that I posted earlier today,
I saw that Tim had done an xg-roller+ rollout
with 2592 trials.

The default number of trials being 1296, and
having asked earlier whether you guys deem a
rollout done at a lower ply more reliable than
a simple analysis at a higher ply, I wondered
whether there may be a somewhat constant
inverse ratio between plies and trials.

For example, would a rollout with 5184 trials
at xg-roller level be as reliable as a rollout
with 2592 trials at xg-roller+ level and as a
rollout with 1296 trials at xg-roller++ level?

Honestly, I probably don't understand these
things nearly as well as you guys do, but
I would like to understand (even if, in the end,
I may still call rollouts fantasizing jackoffs).

Any facts, guesses, thoughts, horse muffins?

MK
Timothy Chow
2023-12-13 14:27:53 UTC
Post by MK
For example, would a rollout with 5184 trials
at xg-roller level be as reliable as a rollout
with 2592 trials at xg-roller+ level and as a
rollout with 1296 trials at xg-roller++ level?
There's a subtle distinction between "precision" and "accuracy."

An "accurate" verdict is one that gives the correct answer.

A "precise" estimate has very little statistical noise.

Increasing the number of trials increases the precision. If you
have a lot of trials then you can be very confident that you are
learning "what the bot really thinks" and that it is very unlikely
to change its mind even if you increase the number of trials to
infinity.
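
Here is a toy illustration of that point (a made-up
Python sketch, not code from any real bot): treat each
rollout trial as a noisy draw around a fixed "bot
opinion" and watch the error bar shrink like one over
the square root of the number of trials.

=======================================
import random, statistics

random.seed(1)
BOT_OPINION = 0.112   # the bot's "real opinion" (made-up number)
NOISE = 0.9           # std. dev. of one game's result (made up)

def run_trials(n):
    # one rollout trial = the bot's opinion plus dice noise
    return [random.gauss(BOT_OPINION, NOISE) for _ in range(n)]

for n in (1296, 2592, 5184, 20736):
    results = run_trials(n)
    mean = statistics.fmean(results)
    stderr = statistics.stdev(results) / n ** 0.5
    print(f"{n:6d} trials: estimate {mean:+.4f} +/- {stderr:.4f}")
=======================================

Quadrupling the trials halves the error bar, which is
why ladders like 1296/5184 keep coming up; but note
that the estimate converges to the bot's opinion,
whether that opinion is accurate or not.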

Accuracy is another matter. Murat of all people should understand
that "what the bot thinks the correct play is" is not necessarily
the same as "the correct play"; indeed, in some positions, it is
debatable what "the correct play" is since that can depend on who
your opponent is, what their emotional state is at the time, etc.
But even setting those things aside, suppose for the sake of
argument that we define "the correct play" as what game theorists
would call an (expectiminimax) "equilibrium" play. We can ask whether
stronger settings are more likely to yield the correct play. The
answer is that we can't ever be completely sure, but one can give
heuristic arguments in support of this principle. For example,
equilibrium play has a certain self-consistency property, so you
can "cross-examine" the bot and see its answers are self-consistent.
Experience suggests that stronger settings exhibit greater
self-consistency. Bob Wachtel's book "In the Game Until the End"
has some examples of this. But again, the arguments are only
heuristic, and we certainly can't be completely sure in any
particular instance that stronger settings are giving us more
"accurate" answers.

---
Tim Chow
Bradley K. Sherman
2023-12-13 14:46:35 UTC
Post by Timothy Chow
...
Accuracy is another matter. Murat of all people should understand
that "what the bot thinks the correct play is" is not necessarily
the same as "the correct play"; indeed, in some positions, it is
debatable what "the correct play" is since that can depend on who
your opponent is, what their emotional state is at the time, etc.
But even setting those things aside, suppose for the sake of
argument that we define "the correct play" as what game theorists
would call an (expectiminimax) "equilibrium" play. We can ask whether
stronger settings are more likely to yield the correct play. The
answer is that we can't ever be completely sure, but one can give
heuristic arguments in support of this principle. For example,
equilibrium play has a certain self-consistency property, so you
can "cross-examine" the bot and see its answers are self-consistent.
Experience suggests that stronger settings exhibit greater
self-consistency. Bob Wachtel's book "In the Game Until the End"
has some examples of this. But again, the arguments are only
heuristic, and we certainly can't be completely sure in any
particular instance that stronger settings are giving us more
"accurate" answers.
Related:
|
| Man beats machine at Go in human victory over AI
|
| Amateur exploited weakness in systems that have otherwise
| dominated grandmasters.
| ...
<https://arstechnica.com/information-technology/2023/02/man-beats-machine-at-go-in-human-victory-over-ai/>

--bks
MK
2023-12-22 16:09:11 UTC
Post by Bradley K. Sherman
| Man beats machine at Go in human victory over AI
| ...
<https://arstechnica.com/information-technology/2023/02/man-beats-machine-at-go-in-human-victory-over-ai/>
The title "Man beats machine at Go in human
victory over AI" is misleading, as it later says:

"The tactics that put a human back on top on
"the Go board were suggested by a computer
"program that had probed the AI systems
"looking for weaknesses. The suggested plan
"was then ruthlessly delivered by Pelrine.

Still, it's quite interesting but unfortunately this
won't happen in gamblegammon anytime soon
because there is no dissenting bot and I am the
only dissenting human, who can't even lead the
horses to water, let alone make them drink... :(

MK
Tikli Chestikov
2023-12-26 18:03:16 UTC
Post by MK
Still, it's quite interesting but unfortunately this
won't happen in gamblegammon anytime soon
because there is no dissenting bot and I am the
only dissenting human, who can't even lead the
horses to water, let alone make them drink... :(
MK
No you're not alone, I'm firmly behind your thought processes.

The fact that the main protagonist here has a ridiculous interest in hawking fast cars around various US "strips" is irrelevant.

Keep challenging the status quo.
Timothy Chow
2023-12-27 04:08:39 UTC
Post by Tikli Chestikov
The fact that the main protagonist here has a ridiculous interest in hawking fast cars around various US "strips" is irrelevant.
He's back!! Yay!!

Been busy helping Hans Niemann file lawsuits, I presume?

---
Tim Chow
MK
2023-12-26 19:26:45 UTC
Post by MK
Still, it's quite interesting but unfortunately this
won't happen in gamblegammon anytime soon
because there is no dissenting bot and I am the
only dissenting human, who can't even lead the
horses to water, let alone make them drink... :(
Just as I was about to follow up to my own post
quoting the above paragraph, I saw that someone
else quoted it in a worthless, distracting post that
really pissed me off but I will take a deep breath
and say what I wanted to add as an afterthought.

I realized that the "Murat mutant bot" experiments
done by Axel were indeed attempts at dissenting
bots and we need to appreciate his open-minded
efforts, even though he didn't like the results of his
own tests and advised people not to take them
for what they actually demonstrated.

So, I guess some horses did take a few small sips
occasionally and I am hopeful that Axel, if nobody
else, will take bigger swigs in the future.

I have had a draft of another mutant experiment
for him for a while but just can't seem to make
time to edit and post it here, which I promise
once more to try to do soon. I think this one
will still be easy to do but the results will be
more telling.

MK
MK
2023-12-22 17:18:47 UTC
Post by Timothy Chow
Post by MK
For example, would a rollout with 5184 trials
at xg-roller level be as reliable as a rollout
with 2592 trials at xg-roller+ level and as a
rollout with 1296 trials at xg-roller++ level?
There's a subtle distinction between "precision"
and "accuracy."
The distinction is more than subtle, especially
in this context.
Post by Timothy Chow
An "accurate" verdict is one that gives the
correct answer.
That's the loose definition. The strict definition
is "correct and also consistent", i.e. without any
systematic or random errors.
Post by Timothy Chow
A "precise" estimate has very little statistical
noise. Increasing the number of trials increases
the precision.
Yes, more trials reduce random errors ("noise")
and give more "consistent" but not necessarily
"correct" results because they don't eliminate
systematic errors.
Post by Timothy Chow
If you have a lot of trials then you can be very
confident that you are learning "what the bot
really thinks" and that it is very unlikely to
change its mind even if you increase the number
of trials to infinity.
This isn't necessarily true, and it is incomplete besides.

While random errors decrease, systematic errors
may increase (accumulate and compound), thus
causing the bot to change its mind.

And I would say that "precision" may be useful or
even necessary to determine what is "correct" to
begin with, like during the training of bots through
lots of random decisions to figure out the "correct"
ones without already knowing them.
Post by Timothy Chow
Accuracy is another matter. Murat of all people
should understand that "what the bot thinks the
correct play is" is not necessarily the same as
"the correct play"; indeed, in some positions, it is
debatable what "the correct play" is since that
can depend on who your opponent is, what their
emotional state is at the time, etc.
It's good that you acknowledge/agree on these
but my arguments go beyond them.
Post by Timothy Chow
But even setting those things aside,
Yes, let's focus on the more tangible...
Post by Timothy Chow
suppose for the sake of argument that we define
"the correct play" as what game theorists would
call an (expectiminimax) "equilibrium" play.
I can only accept "correct play" based on empirical
data (i.e. cubeless equities derived from random
trials), not extrapolated data (i.e. cubeful equities
derived through applying arbitrary formulas to the
empirical data).
Post by Timothy Chow
We can ask whether stronger settings are more
likely to yield the correct play.
I assume you mean look-ahead plies? Can you (or
someone else) expand on this and explain/clarify
how plies work during play and during rollouts?
Post by Timothy Chow
The answer is that we can't ever be completely
sure, but one can give heuristic arguments in
support of this principle. For example, equilibrium
play has a certain self-consistency property,
I won't argue against self-consistency if you can
prove that your equilibrium play is actually that.
Post by Timothy Chow
so you can "cross-examine" the bot and see whether its
answers are self-consistent.
This would be most interesting for me to see. Has
any bot been cross-examined for this and how?
Post by Timothy Chow
Experience suggests that stronger settings exhibit
greater self-consistency. Bob Wachtel's book "In
the Game Until the End" has some examples of this.
Can you give some examples here from the book
(under fair use) or from other studies/experiments?
Post by Timothy Chow
But again, the arguments are only heuristic, and
we certainly can't be completely sure in any
particular instance that stronger settings are
giving us more "accurate" answers.
I argue that we can if we have unbiased bots that
are trained not only through cubeless, single-game
play but also through cubeful and "matchful" play,
eliminating extrapolated cubeful/matchful equities.

MK
Timothy Chow
2023-12-23 13:57:17 UTC
Post by MK
Post by Timothy Chow
If you have a lot of trials then you can be very
confident that you are learning "what the bot
really thinks" and that it is very unlikely to
change its mind even if you increase the number
of trials to infinity.
This isn't necessarily true, and it is incomplete besides.
While random errors decrease, systematic errors
may increase (accumulate and compound), thus
causing the bot to change its mind.
No, this is not correct, at least when you are simply extending
a specific rollout. Systematic errors can indeed accumulate and
compound over the course of a game, but a rollout trial repeatedly
samples an entire game, so *each individual* trial is subject to
the accumulated systematic error. There will be some randomness
involved from trial to trial, of course; some trials may be "lucky"
enough to avoid the variations that suffer from a lot of accumulated
systematic error, while other trials may be "unlucky" enough to hit
those variations, but in the long run these fluctuations will even
out, and the rollout will converge. The final result will be an
average over all accumulated systematic errors.
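
A minimal simulation of this point (made-up Python,
made-up numbers): give each broad variation of the game
its own fixed evaluation bias. Adding trials then removes
the noise, but the estimate converges to the true equity
plus the average bias, not to the true equity itself.

=======================================
import random, statistics

random.seed(7)
TRUE_EQUITY = 0.050
# pretend the game tree has five broad variations, each
# with its own fixed evaluation bias; they average to -0.01
BIASES = [-0.30, -0.05, 0.00, 0.10, 0.20]

def one_trial():
    bias = random.choice(BIASES)     # which variation this game hits
    noise = random.gauss(0.0, 0.8)   # dice luck within the trial
    return TRUE_EQUITY + bias + noise

for n in (1296, 5184, 20736, 82944):
    print(n, round(statistics.fmean(one_trial() for _ in range(n)), 4))
# converges to 0.050 + (-0.01) = 0.040, never to 0.050:
# more trials remove the noise but not the built-in bias
=======================================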
Post by MK
I assume you mean look-ahead plies? Can you (or
someone else) expand on this and explain/clarify
how plies work during play and during rollouts?
The GNU team can answer this better than I can. One thing to note
is that during rollouts, the bots will apply some kind of move
filter to screen out unpromising plays. That is, if you perform
a 3-ply rollout, the bot doesn't necessarily evaluate every legal
move at 3-ply and pick the highest-scoring one. It will evaluate
all the options at the lowest ply but then discard a lot of them
as not likely to emerge as the top play.
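
In outline, that filtering looks something like this
(a sketch with invented names and thresholds, not GNU
Backgammon's actual code):

=======================================
def pick_move(position, legal_moves, max_candidates=8, window=0.16):
    # cheap pass: score every legal move at 0-ply
    # (evaluate_0ply and evaluate_nply are hypothetical helpers)
    scored = [(evaluate_0ply(position, m), m) for m in legal_moves]
    scored.sort(key=lambda pair: pair[0], reverse=True)

    # keep only the few moves close enough to the cheap best...
    best0 = scored[0][0]
    candidates = [m for score, m in scored[:max_candidates]
                  if score >= best0 - window]

    # ...and spend the expensive deep evaluation on those alone
    return max(candidates, key=lambda m: evaluate_nply(position, m))
=======================================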
Post by MK
I won't argue against self-consistency if you can
prove that your equilibrium play is actually that.
The *theoretical* equilibrium play is *defined* in terms of a
system of equations that expresses self-consistency. If you insist
on an empirical definition, though, then self-consistency can't be
proved.
Post by MK
Post by Timothy Chow
so you can "cross-examine" the bot and see whether its
answers are self-consistent.
This would be most interesting for me to see. Has
any bot been cross-examined for this and how?
I don't know if anyone has done this in a systematic fashion, but
certainly, if you take some crazy superbackgame or containment
position, you can observe inconsistency yourself. Note down the
3-ply equity (for example). Then run through all the possible rolls,
and note down their 3-ply equities. Average them, and you'll find
that they don't average out to the original 3-ply equity. This means
that the 3-ply equity isn't (entirely) self-consistent. In many
positions, the top play will still be the top play, but in the crazy
superbackgame positions, this experiment can result in wild swings
that drastically change the top play.
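
That cross-examination fits in a one-screen script; here
is a sketch, with evaluate() and best_child() standing in
(hypothetically) for the bot's evaluator and its best
reply to a given roll:

=======================================
# the 21 distinct rolls: (1,1), (1,2), ..., (6,6)
ROLLS = [(d1, d2) for d1 in range(1, 7) for d2 in range(d1, 7)]

def consistency_gap(position, ply):
    direct = evaluate(position, ply)        # the bot's n-ply opinion
    total = 0.0
    for d1, d2 in ROLLS:
        weight = 1 if d1 == d2 else 2       # mixed rolls occur twice in 36
        child = best_child(position, (d1, d2), ply)
        total += weight * -evaluate(child, ply)  # sign flips: opponent on roll
    return total / 36.0 - direct  # zero for a perfectly self-consistent bot
=======================================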
Post by MK
Post by Timothy Chow
But again, the arguments are only heuristic, and
we certainly can't be completely sure in any
particular instance that stronger settings are
giving us more "accurate" answers.
I argue that we can if we have unbiased bots that
are trained not only through cubeless, single-game
play but also through cubeful and "matchful" play,
eliminating extrapolated cubeful/matchful equities.
There are certainly ways to improve the way bots are trained, but it
will still be true that we won't be *completely* sure that we're getting
more accurate answers in every position. That would require more
computing power than is available in the observable universe.

---
Tim Chow
MK
2023-12-26 23:41:31 UTC
Post by Timothy Chow
Post by MK
While random errors decrease, systematic errors
may increase (accumulate and compound), thus
causing the bot to change its mind.
No, this is not correct, at least when you are simply
extending a specific rollout.
You would be right if the number of trials were infinite.
Otherwise the amount of accumulated systematic
errors will continuously fluctuate and will be different
depending on when the rollout stops.

I shouldn't have said "and compound" because I have
no clear idea on how different systematic errors may
interact to create a compounded, (combined?), effect.
I'll set this aside for now but you can expand on it if
you want.
Post by Timothy Chow
Systematic errors can indeed accumulate and
compound over the course of a game,
Since you also reused "compound", now I am curious
to know whether you have a clear enough idea of it.
Post by Timothy Chow
but a rollout trial repeatedly samples an entire game,
so *each individual* trial is subject to the accumulated
systematic error.
Okay, I agree.
Post by Timothy Chow
There will be some randomness involved from trial
to trial, of course; some trials may be "lucky" enough
to avoid the variations that suffer from a lot of
accumulated systematic error, while other trials may
be "unlucky" enough to hit those variations,
I don't like the use of the word "luck" in this
context, when the random errors in rollouts already
refer to the luck of the dice rolls. Or, by
"randomness", are you referring to some other events
as well?

Clusters of dice luck will cause clusters of systematic
errors. I think this is what you call "unlucky trials"(?)
These may cause the bot to change its mind several times
during a rollout, especially when equity differences are
very small.
Post by Timothy Chow
but in the long run these fluctuations will even out,
and the rollout will converge. The final result will be
an average over all accumulated systematic errors.
A rollout is a continuum. When you stop it after any given
number of trials, accumulated systematic errors may be
high, low or average. If you keep it going again and stop
after some more trials, accumulated systematic errors
at that point may be high, low or average. Bots' changing
their minds throughout a rollout is inevitable, whether by
a lot or not, whether often or not, etc. When you look for
it intently, it's easy to see it.

MK
MK
2023-12-27 07:16:24 UTC
Post by Timothy Chow
Post by MK
I assume you mean look-ahead plies? Can you (or
someone else) expand on this and explain/clarify
how plies work during play and during rollouts?
The GNU team can answer this better than I can.
Let's see if they do. If not, we may ask this on their
forum. BTW, my posts were getting rather long, so
I decided to reply to sub-topics separately.
Post by Timothy Chow
One thing to note is that during rollouts, the bots
will apply some kind of move filter to screen out
unpromising plays. That is, if you perform a 3-ply
rollout, the bot doesn't necessarily evaluate every
legal move at 3-ply and pick the highest-scoring
one. It will evaluate all the options at the lowest
ply but then discard a lot of them as not likely to
emerge as the top play.
I kind of knew this but didn't ponder much on it.

Now that I do, my immediate reaction is that it
sounds really bad. Shouldn't it be the other way
around? That is, evaluate at a higher ply first?

If lower plies are less reliable, there is a higher
chance that good moves will be eliminated by
error early on. After that, it doesn't matter how
well the other moves are refined at higher plies;
they may not really be the best. For example, the
possibly second best play may never make it to
the final ranking.

However, if the inferior moves are eliminated by
higher plies first, even if lower plies refine them
not so accurately later, at least the inaccuracies
will be small and will have occurred only among
the top plays. For example, the possibly second
best play may be ranked fourth or first, etc. but
at least will not be missing from the final ranking.

What am I missing...?

MK
Timothy Chow
2023-12-27 12:22:02 UTC
Post by MK
Now that I do, my immediate reaction is that it
sounds really bad. Shouldn't it be the other way
around? That is, evaluate at a higher ply first?
It's done for speed. Each additional ply slows things
down by a factor of (about) 21.
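
(For reference, the counting behind the 21: there are
6 x 6 = 36 ordered rolls, but 5-2 plays the same as 2-5,
so there are only 15 mixed rolls + 6 doubles = 21
distinct rolls to look ahead over, and every surviving
candidate move must be evaluated under all 21 of them
one ply deeper. Move filtering, caching and the varying
number of legal moves make the factor "about" rather
than exactly 21.)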

---
Tim Chow
MK
2024-01-07 00:50:05 UTC
Post by Timothy Chow
Post by MK
Now that I do, my immediate reaction is that it
sounds really bad. Shouldn't it be the other way
around? That is, evaluate at a higher ply first?
It's done for speed. Each additional ply slows
things down by a factor of (about) 21.
Ah, that magic number 21 again. :) The number
of possible dice rolls at every turn... ;)

But why is the factor imprecise, i.e. "about 21"?
Can't you give us the exact math...?

When I wrote my above questions, I wasn't sure
if I myself understood what I was talking about.

I looked in the Ex-Gee and Noo-BG documentation
and saw that the question had been addressed
at least in Ex-Gee rollouts. See:

https://www.extremegammon.com/Searchinterval.aspx

You should read the entire article but here are a
few relevant snippets:

"Backgammon programs rely on search interval
"to reduce the amount of moves to be analyzed
"at higher level. This allows much faster analyze
"as typically only 4 moves get analyzed in 3-ply.
"The downside of it is that it is possible the best
"move is missed.

"1. All moves are analyzed in 1-ply (direct neural
"network output) cubeless.

"For the purpose of this study a new search
"interval is defined: infinite. It analyzes all 32
"moves after step 1 in 3-ply. This mode is not
"available to users.

"Note that is possible that at higher level (XGR+
"for instance) that the best move can be different
"than the one pick by 3-ply infinite,

"I think a much better solution than to use larger
"interval is to set the first move to use a higher
"ply: 4-ply will analyze up to 8 moves in 3-ply .....
"The speed cost is not that bad as the first move
"need to be calculated only 21 times (the rest of
"the time the program will get the result from the
"cache).

So, now I feel better about the questions that I
was able to raise, even having only a minimal
understanding of "rolledoats horse muffins"...

MK
Timothy Chow
2024-01-08 13:55:34 UTC
Post by MK
Post by Timothy Chow
Post by MK
Now that I do, my immediate reaction is that it
sounds really bad. Shouldn't it be the other way
around? That is, evaluate at a higher ply first?
It's done for speed. Each additional ply slows
things down by a factor of (about) 21.
Ah, that magic number 21 again. :) The number
of possible dice rolls at every turn... ;)
But why is the factor imprecise, i.e. "about 21"?
Can't you give us the exact math...?
The speed at which a complex piece of code runs depends on many
factors beyond the simple math of how many different rolls there
are.

---
Tim Chow
MK
2024-01-08 18:14:20 UTC
Post by Timothy Chow
Post by MK
Post by Timothy Chow
It's done for speed. Each additional ply slows
things down by a factor of (about) 21.
But why is the factor imprecise, i.e. "about 21"?
Can't you give us the exact math...?
The speed at which a complex piece of code
You mean like this one?:

=======================================
GNU Backgammon Manual V1.00.0
10.4.5.4 n-ply Cubeful equities
..... so how does GNU Backgammon calculate cubeful
2-ply equities? The answer is: by simple recursion:
Equity=0
Loop over 21 dice rolls
Find best move for given roll
Equity = Equity + Evaluate n-1 ply equity for resulting position
End Loop
Equity = Equity/36
=======================================
Post by Timothy Chow
runs depends on many factors beyond the simple
math of how many different rolls there are.
I have a feeling that it has something to do with the
rolls also. Perhaps a real mathematician will know. ;)

MK
Timothy Chow
2024-01-09 02:26:04 UTC
Post by MK
Post by Timothy Chow
The speed at which a complex piece of code
=======================================
GNU Backgammon Manual V1.00.0
10.4.5.4 n-ply Cubeful equities
..... so how does GNU Backgammon calculate cubeful
Equity=0
Loop over 21 dice rolls
Find best move for given roll
Equity = Equity + Evaluate n-1 ply equity for resulting position
End Loop
Equity = Equity/36
=======================================
That's pseudocode, not code.

---
Tim Chow
MK
2024-01-09 06:28:41 UTC
Post by Timothy Chow
Post by MK
Post by Timothy Chow
The speed at which a complex piece of code
=======================================
GNU Backgammon Manual V1.00.0
10.4.5.4 n-ply Cubeful equities
..... so how does GNU Backgammon calculate cubeful
Equity=0
Loop over 21 dice rolls
Find best move for given roll
Equity = Equity + Evaluate n-1 ply equity for resulting position
End Loop
Equity = Equity/36
=======================================
That's pseudocode, not code.
So? Why isn't it good enough for you??

It's a translation of the actual code into a language
easier for humans to understand, done by Noo-BG
programmers. Unless there are errors in the actual
implementation, it will produce the same results.

We see the phrase "approximately 21 times" all over
the gamblegammon forums. Surely they don't mean
that it is because of bugs in the actual code, do they?

So, you should be able to explain the reason based
on the above pseudocode. Alternatively, feel free
to post the actual code here and point it out there.

I have come to really enjoy exposing that you are a
mediocre "mathematician" (if even that)... ;)

Oh, I almost forgot. There is a kind of rotten easter
egg in the above pseudocode. Let's see how long it
will take for you whizzes to find it...? :)

MK
Timothy Chow
2024-01-09 13:48:16 UTC
Post by MK
So, you should be able to explain the reason based
on the above pseudocode.
Finding the best move for a given roll isn't necessarily going
to take the same amount of time for every roll. To find the
best move, one must first generate all the legal moves and
evaluate them. The number of legal ways to play 11 is not
necessarily going to be the same as the number of legal ways
to play 66. It will depend on the position.

---
Tim Chow
MK
2024-01-11 09:17:21 UTC
Post by Timothy Chow
Post by MK
So, you should be able to explain the reason
based on the above pseudocode.
Finding the best move for a given roll isn't
necessarily going to take the same amount
of time for every roll. To find the best move,
one must first generate all the legal moves
and evaluate them. The number of legal ways
to play 11 is not necessarily going to be
the same as the number of legal ways to
play 66. It will depend on the position.
This is not it. Just like dice rolls even out (or can
be forced to artificially even out faster), the number
of legal ways to play for given dice rolls at given
positions will also average out.

Here is what Noo-BG manual says about it:

======================================
GNU Backgammon Manual V1.00.0
9.1.2 The depth to search and plies
(last two paragraphs on the page)
For a single move, on average there are about
20 legal moves to consider.

When doing a one ply analysis/evaluation, for
the top n moves (from the move filter), GNU
Backgammon needs to consider 21 rolls by
the opponent and 20 possible legal moves
per roll = 420 positions to evaluate.
======================================

Stop blabbering carelessly. Think. Read. Educate
yourself. Learn math. It may be useful in life... ;)

BTW: Has anyone spotted the fudge yet, in the
pseudocode I had quoted from the Noo-BG manual?

Also, here is a bonus question: When you guys
say "approximately 21 times", how "approximate"
do you mean..? Plus/minus 2%? 5%? 10%? 25%?

It's amazing how the gamblegammon herd keeps
roaming the same overgrazed pastures and keeps
splatting their cowpies wherever, all over, without
a care in the world...

MK
Timothy Chow
2024-01-11 22:44:13 UTC
Post by MK
This is not it. Just like dice rolls even out (or can
be forced to artificially even out faster), the number
of legal ways to play for given dice rolls at given
positions will also average out.
Of course. That's what "approximately" means. Check your
dictionary.

---
Tim Chow
MK
2024-01-12 10:17:33 UTC
Post by Timothy Chow
Post by MK
This is not it. Just like dice rolls even out
(or can be forced to artificially even out
faster), the number of legal ways to play for given
dice rolls at given positions will average out.
Of course. That's what "approximately" means.
Absolutely not!
Post by Timothy Chow
Check your dictionary.
I would prefer to check your dictionary instead.
Please tell us which dictionary you checked?

The Noo-BG manual says "on average there are about
20 legal moves", but that's just because those
chimpanzees are incapable of human language either.

An average is just a single number result, like
the average winning/losing PR in your contrived
example.

Once you compute an average, you treat it as a
constant in your later calculations. There is
no such thing as an "approximate average".

Indeed, the following paragraph in the Noo-BG
manual says: "GNU Backgammon needs to consider
21 rolls by the opponent and 20 possible legal
moves per roll = 420 positions to evaluate."

Do you understand why it doesn't say *about*
420 positions to evaluate? Because neither the
21 possible combinations of rolls, nor the
average of 20 possible legal moves, nor their
product is *approximate*..!

Thus, the reason for each additional ply being
approximately 21 times slower has to do with
something other than the number of possible
legal moves.

Ask someone who knows math. Axel, Paul, et al.
are looking up to you but maybe Bob Coca can
help you with this on bgonline... ;)

MK
MK
2024-01-20 01:14:24 UTC
Post by MK
Post by MK
=======================================
GNU Backgammon Manual V1.00.0
10.4.5.4 n-ply Cubeful equities
..... so how does GNU Backgammon calculate cubeful
Equity=0
Loop over 21 dice rolls
Find best move for given roll
Equity = Equity + Evaluate n-1 ply equity for resulting position
End Loop
Equity = Equity/36
=======================================
Oh, I almost forgot. There is a kind of rotten easter
egg in the above pseudocode. Let's see how long it
will take for you whizzes to find it...? :)
Bzzzt! Time's up.

Loop over 21 dice rolls and divide by 36...?

I keep telling you folks that your venerated
bots are garbage... :(
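
For what it's worth, the quoted loop only balances if
each mixed roll is weighted twice (15 x 2 + 6 x 1 = 36);
the manual's pseudocode leaves that weight implicit. A
minimal runnable rendering with the weights spelled out
(Python; evaluate_0ply and find_best_move are hypothetical
helpers, and cube handling and sign conventions are
ignored, as in the original):

=======================================
def cubeful_equity(position, ply):
    if ply == 0:
        return evaluate_0ply(position)     # direct neural-net output
    total = 0.0
    for d1 in range(1, 7):
        for d2 in range(d1, 7):            # the 21 distinct rolls
            weight = 1 if d1 == d2 else 2  # mixed rolls occur twice in 36
            best = find_best_move(position, (d1, d2))
            total += weight * cubeful_equity(best, ply - 1)
    return total / 36.0                    # the weights sum to exactly 36
=======================================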

MK

MK
2023-12-27 07:29:14 UTC
Post by Timothy Chow
Post by MK
I won't argue against self-consistency if you can
prove that your equilibrium play is actually that.
The *theoretical* equilibrium play is *defined* in
terms of a system of equations that expresses
self-consistency. If you insist on an empirical
definition, though, then self-consistency can't be
proved.
This sounds good to me. Let's archive it... ;)
Post by Timothy Chow
Post by MK
Post by Timothy Chow
so you can "cross-examine" the bot and see whether its
answers are self-consistent.
This would be most interesting for me to see. Has
any bot been cross-examined for this and how?
I don't know if anyone has done this in a systematic
fashion, but certainly, if you take some crazy
superbackgame or containment position, you can
observe inconsistency yourself. Note down the
3-ply equity (for example). Then run through all the
possible rolls, and note down their 3-ply equities.
Average them, and you'll find that they don't average
out to the original 3-ply equity. This means that the
3-ply equity isn't (entirely) self-consistent. In many
positions, the top play will still be the top play, but
in the crazy superbackgame positions, this experiment
can result in wild swings that drastically change the
top play.
Even though I wouldn't limit it to "crazy superbackgame"
or any other specific type of positions, this also sounds
good enough to me. So, let's archive it too... ;)

MK
MK
2023-12-27 07:52:08 UTC
Post by Timothy Chow
Post by MK
I argue that we can if we have unbiased bots that
are trained not only through cubeless, single-game
play but also through cubeful and "matchful" play,
eliminating extrapolated cubeful/matchful equities.
There are certainly ways to improve the way bots
are trained, but it will still be true that we won't be
*completely* sure that we're getting more accurate
answers in every position. That would require more
computing power than is available in the observable
universe.
a- We wouldn't need to train for every possible position.

b- Rarer positions that don't come up during the training
can be rolled out later the same way, as they arise.

c- Training can be ongoing, with trusted rollout results
continuously contributed to a shared database.

Progress is like hitting a golf ball. It first flies high and
far. Then it hits the ground and bounces a few times,
each getting shorter. Then it rolls some more and stops.

People are talking about a new bot they are developing
in the Noo-BG forum but it sounds like it's going to be
just another small bounce, if not a last short roll... :(

We need to hit a new golf ball using a club of the latest
knowledge and technology!

MK
Philippe Michel
2023-12-28 22:09:57 UTC
Post by Timothy Chow
Post by MK
I assume you mean look-ahead plies? Can you (or
someone else) expand on this and explain/clarify
how plies work during play and during rollouts?
The GNU team can answer this better than I can. One thing to note
is that during rollouts, the bots will apply some kind of move
filter to screen out unpromising plays. That is, if you perform
a 3-ply rollout, the bot doesn't necessarily evaluate every legal
move at 3-ply and pick the highest-scoring one. It will evaluate
all the options at the lowest ply but then discard a lot of them
as not likely to emerge as the top play.
This is not specific to rollouts. Interactive play, hints, and analysis
all use this.

To answer the issues raised later in the thread by Murat, this is done
for speed, as already mentioned by Timothy.

The cost in accuracy seems perfectly acceptable although it is not
entirely negligible. For instance there are two predefined 2-ply
settings: world class and supremo.

The first one evaluates at 2-ply up to the top 8 0-ply moves, provided
they are no more than 0.16 points weaker than the best. The second one
evaluates up to 16 moves within 0.32 points. On the Depreli benchmark,
the cost of errors from world class is about 4% more than from supremo.

The differences between either 1-ply or 3-ply and either of these 2-ply
settings are much larger than this.

You can change this in the analysis or rollout settings (look for
Advanced settings and then Move filter). As far as I know, the default
settings are conservative compared to what is used by the similar
feature in eXtreme Gammon.
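
Written out as data, the two predefined filters described
above come to this (a hypothetical structure; the numbers
are the ones given in this post):

=======================================
# (max moves kept for 2-ply, allowed equity gap below the 0-ply best)
MOVE_FILTERS = {
    "world class": (8, 0.16),
    "supremo": (16, 0.32),
}

def keep_for_2ply(scored, setting):
    # scored: (0-ply equity, move) pairs sorted best-first
    max_moves, window = MOVE_FILTERS[setting]
    best = scored[0][0]
    return [m for eq, m in scored[:max_moves] if eq >= best - window]
=======================================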