Return to Revisions

2 of 7

added 53 characters in body

edited Mar 22, 2024 at 17:01

J_H

43.3k
3
38
158

You aim to teach folks to write good code. A noble goal!

deps

Before a student can run or enhance this codebase, they first need to get over some installation hurdles. This is a common hangup for python newbies, and worth devoting some care to.

First we must clone the repo. But it is huge, something I will touch on below.

Then we have some text to wade through. It is well organized and not too long, so we'll soon find the how to run section. In my opinion it's better to offer the user short commands which will definitely succeed the first time and which reveal a path for exploring what longer interior commands do.

Here, I encourage you to do that with a Makefile, or if need be with a brief bash script that uses set -x so it is nearly as self explanatory as a make run. In particular, there's no reason for the student to attend to the administrivia of unpacking pickles in the crucial initial minutes where we're trying to show them something cool, quickly, to pull them in. Beware the danger of someone becoming a little frustrated before they're hooked, and wandering off to other pursuits.

The text completely ignores this detail:

$ python -c 'import numpy, torch'

We hope it will silently succeed. If it doesn't, we should offer some sort of venv / poetry advice. Ideally a makefile would automatically notice deficiencies and correct them.

requirements.txt

I don't know what interpreter version(s) you target, and more importantly what versions of numpy and torch you have tested against. You must add a conda environment.yml, a venv requirements.txt, or poetry.lock file to the repo. Else the codebase is unmaintainable.

CI / CD

Consider publishing this package on PyPI, and taking advantage of the icons they can auto-generate via continuous-integration continuous-deployment workflows. Or leverage GitHub's support for automatic workflows that will run unit tests for you. Remember, this isn't just about making life easy for you as a developer. It's about instilling confidence in a student some months down the road who hits a speedbump and needs the courage to persevere and make things work.

repo URL

Students use copy-n-paste all the time, and don't always include citations. Put at least one homepage URL, such as for the github repo, in your README.md. Then folks using a fork won't be confused.

serialized NN

size

Your muzero_2048.pkl is more than 520 MiB. I claim a github repo is not the appropriate place to put that. Or at least, keep your source repo "small" and banish giant binary pickles to another repo or some other download location. GitHub offers e.g. LFS support. Many pip-installed packages use the requests package to transparently cache binary assets without the end user needing to really worry about them.

evolution

Git deals with diff-able text much better than with opaque binary blobs, especially as they evolve over time. Every student that forks or clones your repo will need to download the giant binary history, even after the original pickles have rev'd or been deleted.

format

Pickle is not my favorite format. Consider adopting PyArrow's .parquet file format, which offers better support for compression and zero-copy access.

python3 classes

class Game(object):

Maybe you had some didactic goal related to inheritance when you wrote that? Or maybe it's just boilerplate copy-n-paste?

In python2 that line distinguishes a "new-style" class MRO from an earlier scheme. In python3 there's no different behavior to distinguish between, so it's better to simply write class Game:

In particular, it is better to train students to write that.

zomg we see this everywhere. Global cleanup across three files, please.

comments

        self.history = (
            []
        )  # history of prev actions; used for recurrent inference for training

Thank you for using black; I truly do appreciate it. But here the formatting is rather mangled. Black is essentially suggesting that you instead author these two lines of code:

        # history of prev actions; used for recurrent inference for training
        self.history = []

Similarly for child_visits.

Consider arranging for $ make lint to run the black formatter.

Also, elide the uninformative "Game specific environment" remark. If need be, beef up this docstring instead:

class Environment(object):
    """The environment MuZero is interacting with."""

Or if you feel the class name is too vague (I don't!) then rename to GameEnvironment.

pre-condition

        self.discount = discount

A student might plausibly mess this up, and it's easy to validate that it's in the unit interval. Consider adding an assert or a conditional raise ValueError

pass

    def terminal(self) -> bool:
        # Game specific termination rules.
        pass

Teaching students about the pass keyword should be deferred as long as possible, as it is one of the less intuitive aspects of the python grammar. I can't tell you how many times I have seen it inappropriately embedded in if / else clauses, alone or with other statements. Or incorrectly attempting to break out of a loop. If you can satisfy syntax rules with something else, I recommend doing that. I might even put an ... Ellipsis object there, which we evaluate and then discard the value. Here, turning the comment into a """docstring""" seems the best option.

abc

This raises another question: should we maybe raise a diagnostic that explains the student's obligation to supply some sensible implementation for the method?

Should this class perhaps be an abstract base class?

parallel lists

This appears to be Fortran-style COMMON declared parallel arrays:

    def apply(self, action: Action):
        reward = ...
        self.rewards.append(reward)
        self.history.append(action)

That is, I think that rewards[i] and history[i] always refer to the same time step. But I'm not sure; the code doesn't make it clear.

Consider using this instead:

from collections import namedtuple

HistoryRecord = namedtuple("HistoryRecord", "action, reward")
...
        self.history.append(HistoryRecord(action, reward))

ZeroDivisionError

        sum_visits = sum(child.visit_count for child in root.children.values())

I imagine it is possible to prove that this sum is always positive. It isn't obvious to me that that will be so. Let's just say that the subsequent ... / sum_visits expression makes me nervous.

comments are not docstrings

    def make_target(self, state_index: int, num_unroll_steps: int, td_steps: int):
        # The value target is the discounted root value of the search tree N steps
        # into the future, plus the discounted sum of all rewards until then.

Please teach students good habits. This should be a docstring. And it should begin with a sentence which describes its single responsibility, before it goes on to helpfully explain the meaning of "target".

Similarly for most of the class Network methods.

Optional[reward]

                value += (
                    reward * self.discount**i
                )  # pytype: disable=unsupported-operands

I imagine that reward can be None here? Consider using a no-op filter or an assert to convince mypy that you've already dealt with that case by the time we start multiplying. Tacking on linter comments is sometimes very useful, but here it seems easier to just deal with the type ambiguity up front. Then students will have fewer things to scratch their heads about.

helpful comments

                # States past the end of games are treated as absorbing states.

You included some insightful remarks such as this one, for which I thank you.

the typing module is being phased out

class NetworkOutput(typing.NamedTuple):

Maybe there's a better way to phrase that annotation? I honestly don't know. So consider making this a @dataclass, just because they're easier to annotate.

(At a future time the typing module will be deleted or at least it will become much much smaller. It offered a good bridge to modern practice, and now its days are numbered.)

no ASCII art

https://github.com/fnclovers/Minimal-AlphaZero/blob/1d7232bcd/alpha_zero.py#L26

##########################
####### Helpers ##########

MAXIMUM_FLOAT_VALUE = ...

##### End Helpers ########
##########################


##################################
####### Part 1: Self-Play ########


# Each self-play job is independent ...

######### End Self-Play ##########
##################################

##################################
####### Part 2: Training #########

No, please do not train students to write code that needs such visual decoration.

Rather, use the tools python gave you for organizing sections of code.

Use modules, e.g. alphazero_p1_self_play, alphazero_p2_training, etc.

Additionally, rather than # comments it would be much better to introduce a module with a module-level """docstring""" before the first line of code.

Careful thought went into authoring this high-quality code. It achieves its design goals.

I would be willing to delegate or accept maintenance tasks on it.

answered Mar 22, 2024 at 16:55

J_H

43.3k
3
38
158