You aim to teach folks to write good code. A noble goal!
deps
Before a student can run or enhance this codebase, they first need to get over some installation hurdles. This is a common hangup for python newbies, and worth devoting some care to.
First we must clone the repo. But it is huge, something I will touch on below.
Then we have some text to wade through. It is well organized and not too long, so we'll soon find the how to run section. In my opinion it's better to offer the user short commands which will definitely succeed the first time and which reveal a path for exploring what longer interior commands do.
Here, I encourage you to do that with a Makefile,
or if need be with a brief bash script that uses set -x
so it is nearly as self explanatory as a make run.
In particular, there's no reason for the student to
attend to the administrivia of unpacking pickles in
the crucial initial minutes where we're trying to show
them something cool, quickly, to pull them in.
Beware the danger of someone becoming a little frustrated
before they're hooked, and wandering off to other pursuits.
The text completely ignores this detail:
$ python -c 'import numpy, torch'
We hope it will silently succeed.
If it doesn't, we should offer some sort of venv / poetry advice.
Ideally a makefile would automatically notice deficiencies and correct them.
requirements.txt
I don't know what interpreter version(s) you target,
and more importantly what versions of numpy and torch
you have tested against.
You must add a conda environment.yml,
a venv requirements.txt,
or poetry.lock file to the repo.
Else the codebase is unmaintainable.
CI / CD
Consider publishing this package on PyPI, and taking advantage of the icons they can auto-generate via continuous-integration continuous-deployment workflows. Or leverage GitHub's support for automatic workflows that will run unit tests for you. Remember, this isn't just about making life easy for you as a developer. It's about instilling confidence in a student some months down the road who hits a speedbump and needs the courage to persevere and make things work.
repo URL
Students use copy-n-paste all the time, and don't always include citations. Put at least one homepage URL, such as for the github repo, in your README.md. Then folks using a fork won't be confused.
serialized NN
size
Your muzero_2048.pkl is more than 520 MiB.
I claim a github repo is not the appropriate place to put that.
Or at least, keep your source repo "small" and
banish giant binary pickles to another repo
or some other download location.
GitHub offers e.g.
LFS
support.
Many pip-installed packages use the requests package
to transparently cache binary assets without the
end user needing to really worry about them.
evolution
Git deals with diff-able text much better
than with opaque binary blobs, especially as they evolve over time.
Every student that forks or clones your repo
will need to download the giant binary history,
even after the original pickles have rev'd or been deleted.
format
Pickle is not my favorite format.
Consider adopting PyArrow's .parquet file format,
which offers better support for compression and zero-copy access.
python3 classes
class Game(object):
Maybe you had some didactic goal related to inheritance when you wrote that? Or maybe it's just boilerplate copy-n-paste?
In python2 that line distinguishes
a "new-style" class MRO from an earlier scheme.
In python3 there's no different behavior to distinguish between,
so it's better to simply write class Game:
In particular, it is better to train students to write that.
zomg we see this everywhere. Global cleanup across three files, please.
comments
self.history = (
[]
) # history of prev actions; used for recurrent inference for training
Thank you for using black; I truly do appreciate it. But here the formatting is rather mangled. Black is essentially suggesting that you instead author these two lines of code:
# history of prev actions; used for recurrent inference for training
self.history = []
Similarly for child_visits.
Consider arranging for $ make lint to run the black formatter.
Also, elide the uninformative "Game specific environment" remark. If need be, beef up this docstring instead:
class Environment(object):
"""The environment MuZero is interacting with."""
Or if you feel the class name is too vague (I don't!)
then rename to GameEnvironment.
pre-condition
self.discount = discount
A student might plausibly mess this up,
and it's easy to validate that it's in the unit interval.
Consider adding an assert or a conditional raise ValueError
pass
def terminal(self) -> bool:
# Game specific termination rules.
pass
Teaching students about the pass keyword should be deferred
as long as possible, as it is one of the less intuitive aspects
of the python grammar.
I can't tell you how many times I have seen it inappropriately
embedded in if / else clauses, alone or with other statements.
Or incorrectly attempting to break out of a loop.
If you can satisfy syntax rules with something else, I
recommend doing that.
I might even put an ... Ellipsis object there, which we
evaluate and then discard the value.
Here, turning the comment into a """docstring""" seems the best option.
abc
This raises another question: should we maybe raise
a diagnostic that explains the student's obligation
to supply some sensible implementation for the method?
Should this class perhaps be an abstract base class?
parallel lists
This appears to be Fortran-style COMMON declared parallel arrays:
def apply(self, action: Action):
reward = ...
self.rewards.append(reward)
self.history.append(action)
That is, I think that rewards[i] and history[i]
always refer to the same time step.
But I'm not sure; the code doesn't make it clear.
Consider using this instead:
from collections import namedtuple
HistoryRecord = namedtuple("HistoryRecord", "action, reward")
...
self.history.append(HistoryRecord(action, reward))
ZeroDivisionError
sum_visits = sum(child.visit_count for child in root.children.values())
I imagine it is possible to prove that this sum is always positive.
It isn't obvious to me that that will be so.
Let's just say that the subsequent ... / sum_visits expression makes me nervous.
comments are not docstrings
def make_target(self, state_index: int, num_unroll_steps: int, td_steps: int):
# The value target is the discounted root value of the search tree N steps
# into the future, plus the discounted sum of all rewards until then.
Please teach students good habits. This should be a docstring. And it should begin with a sentence which describes its single responsibility, before it goes on to helpfully explain the meaning of "target".
Similarly for most of the class Network methods.
Optional[reward]
value += (
reward * self.discount**i
) # pytype: disable=unsupported-operands
I imagine that reward can be None here?
Consider using a no-op filter or an assert to
convince mypy that you've already dealt with that case
by the time we start multiplying.
Tacking on linter comments is sometimes very useful,
but here it seems easier to just deal with
the type ambiguity up front.
Then students will have fewer things to scratch their heads about.
helpful comments
# States past the end of games are treated as absorbing states.
You included some insightful remarks such as this one, for which I thank you.
the typing module is being phased out
class NetworkOutput(typing.NamedTuple):
Maybe there's a better way to phrase that annotation? I honestly don't know. So consider making this a @dataclass, just because they're easier to annotate.
(At a future time the typing module will be deleted
or at least it will become much much smaller.
It offered a good bridge to modern practice,
and now its days are numbered.)
no ASCII art
https://github.com/fnclovers/Minimal-AlphaZero/blob/1d7232bcd/alpha_zero.py#L26
##########################
####### Helpers ##########
MAXIMUM_FLOAT_VALUE = ...
##### End Helpers ########
##########################
##################################
####### Part 1: Self-Play ########
# Each self-play job is independent ...
######### End Self-Play ##########
##################################
##################################
####### Part 2: Training #########
No, please do not train students to write code that needs such visual decoration.
Rather, use the tools python gave you for organizing sections of code.
Use modules, e.g. alphazero_p1_self_play,
alphazero_p2_training, etc.
Additionally, rather than # comments
it would be much better to introduce
a module with a module-level """docstring"""
before the first line of code.
Careful thought went into authoring this high-quality code. It achieves its design goals.
I would be willing to delegate or accept maintenance tasks on it.