I am looking for Python code to perform run-length encoding that produces a regex-like summary of a string s, given a known block length k. How should I tackle this?
For example, with k=5 the string
s=TATTTTATTTTATTTTATGTTATGTTATGTTATGTTATGTTATGTTATGTTATGTTATGTTACATTATTTTA
could become
(TATTT)3(TATGT)9TACATTATTTTA
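A minimal sketch of the most direct reading of the question, assuming the blocks are aligned at offsets 0, k, 2k and so on: cut the string into k-length chunks, group consecutive equal chunks with itertools.groupby, and write each run of two or more as "(block)count". The name block_rle is hypothetical, and single blocks plus the shorter trailing chunk are simply emitted as literal text.

```python
from itertools import groupby


def block_rle(s: str, k: int) -> str:
    """Hypothetical helper: run-length encode s in aligned blocks of length k.

    Assumes blocks start at offsets 0, k, 2k, ...; a repeat that starts
    off that grid is not detected. Runs of 2+ equal blocks become
    "(block)count"; single blocks and the final partial block stay literal.
    """
    # Split into aligned chunks; the last chunk may be shorter than k.
    blocks = [s[i:i + k] for i in range(0, len(s), k)]
    out = []
    # groupby yields one group per run of consecutive identical chunks.
    for block, group in groupby(blocks):
        n = sum(1 for _ in group)
        out.append(f"({block}){n}" if n > 1 else block)
    return "".join(out)
```

Called as block_rle(s, 5) on the example string, this should reproduce the first summary above; the (T)4 variant is out of scope because it mixes block lengths.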
(TATTT)3(TATGT)9TACATTA(T)4A?
Use (.{5})\1* to look for all the repetitions of a 5-character sequence.
(.{5})\1+, no? Also, for the OP, I think the pseudo-code would be: capture using k, surround the captured unit in parentheses, divide the length of the capture by k, and append that number after the closing parenthesis.
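The regex pseudo-code from the comment above can be sketched with re.sub and a replacement function. This is an illustration under assumptions, not the commenters' own code: regex_rle is a hypothetical name, and the pattern is built for an arbitrary k. Using \1+ rather than \1* means a unit that occurs only once is left as literal text instead of being written as "(unit)1". Unlike the chunking approach, the regex tries every start position, so a repeat does not have to begin on a multiple of k, and on some inputs the two approaches can give different summaries.

```python
import re


def regex_rle(s: str, k: int) -> str:
    r"""Hypothetical helper: collapse repeated k-character units.

    Captures a unit of length k followed by one or more copies of itself,
    (.{k})\1+, wraps the unit in parentheses, and appends the match length
    divided by k as the repeat count. Matches do not overlap, and the scan
    is not restricted to block boundaries.
    """
    pattern = re.compile(r"(.{%d})\1+" % k)

    def summarize(m: re.Match) -> str:
        count = len(m.group(0)) // k  # whole match length / unit length
        return f"({m.group(1)}){count}"

    return pattern.sub(summarize, s)
```

With k=5 on the example string, the first match is TATTT repeated three times and the second is TATGT repeated nine times; no later 5-character unit is immediately repeated, so the tail TACATTATTTTA is left unchanged.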