Faulty premise
Consider an (unlikely but plausible) edge case:
In the first mutation, one particular base is replaced, say 'A' -> 'T'. In an unlikely sequence (but one that a fair RNG could produce), only that same position is ever replaced again:
T -> G -> C -> T -> C -> G -> C -> G -> T ...
That run of the simulation would flatline, indicating only one mutation ('A' -> something not 'A'), even though the sequence changes in every generation. Less unlikely edge cases would similarly yield (perhaps marginally less) misleading values and still corrupt any statistical results.
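To make the edge case concrete, here is a minimal sketch. It assumes the simulation scores each generation by counting the positions that differ from the original sequence; that is my reading of the "flat line", not something taken from the posted code. The names `hamming`, `original`, `site` and `chain` are made up for illustration, and `chain` simply replays the substitutions listed above.

```python
def hamming(a, b):
    """Count the positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


original = "ACGTACGT"  # hypothetical starting sequence (assumption)
site = 0               # the one position the RNG keeps hitting
chain = "TGCTCGCGT"    # bases written to that position, one per generation

current = original
distances = []   # difference from the original, recorded each generation
true_events = 0  # substitutions that actually happened

for base in chain:
    mutated = current[:site] + base + current[site + 1:]
    true_events += hamming(current, mutated)  # change since last generation
    current = mutated
    distances.append(hamming(original, current))

print(distances)    # [1, 1, 1, 1, 1, 1, 1, 1, 1]
print(true_events)  # 9
```

The distance from the original never rises above 1, yet nine substitutions took place. Any statistic built only on the first measure would undercount the real amount of change.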
It looks as though less time should have been spent on the complexities of Python and its arrays, lists and CSVs, and more thought given to how to produce data that would demonstrate the use for phylogenomics.
Oversimplification can fuzz important nuances out of existence.