I wrote this program over half a year ago, but I decided to not share it because the task seems trivial to me. Today I decided to post it here. As it is designed to work with strings and messiness of human language speed is not a concern, but I still optimized the program as best as I can, however I used conditionals very liberally to validate the input at every single step.
Now, what is an English numeral? It is something like this: "six hundred ninety-three vigintillion one hundred forty-seven novemdecillion one hundred eighty octodecillion five hundred fifty-nine septendecillion nine hundred forty-five sexdecillion three hundred nine quindecillion four hundred seventeen quattordecillion two hundred thirty-two tredecillion one hundred twenty-one duodecillion four hundred fifty-eight undecillion one hundred seventy-six decillion five hundred sixty-eight nonillion seventy-five octillion five hundred septillion one hundred thirty-four sextillion three hundred sixty quintillion two hundred fifty-five quadrillion two hundred fifty-four trillion one hundred twenty billion six hundred eighty million nine thousand four hundred ninety-three", which is equal to 693147180559945309417232121458176568075500134360255254120680009493 in Arabic numerals, or 693_147_180_559_945_309_417_232_121_458_176_568_075_500_134_360_255_254_120_680_009_493 with the thousand separator.
As you can see there is a lot of structure in the English numeral system, a lot of patterns, patterns that can be encoded algorithmically. My program supports a strict subset of English numeral, it supports English numeral with numeric value up to "nine hundred ninety-nine vigintillion nine hundred ninety-nine novemdecillion nine hundred ninety-nine octodecillion nine hundred ninety-nine septendecillion nine hundred ninety-nine sexdecillion nine hundred ninety-nine quindecillion nine hundred ninety-nine quattordecillion nine hundred ninety-nine tredecillion nine hundred ninety-nine duodecillion nine hundred ninety-nine undecillion nine hundred ninety-nine decillion nine hundred ninety-nine nonillion nine hundred ninety-nine octillion nine hundred ninety-nine septillion nine hundred ninety-nine sextillion nine hundred ninety-nine quintillion nine hundred ninety-nine quadrillion nine hundred ninety-nine trillion nine hundred ninety-nine billion nine hundred ninety-nine million nine hundred ninety-nine thousand nine hundred ninety-nine" (\$10^{66} - 1\$), and precisely it can convert two unvigintillion minus one (2E+66 - 1) numbers to and from English numeral.
More specifically it supports American short scale, which means it supports -illions but not -illiards, with each increase in -illion multiplying the scale by a thousand. And my program supports scale words up to vigintillion.
Now here are the rules:
Rule #1: An English numeral may be preceded by the words "positive" and "negative" exactly once, they set the sign of the numeral, these words are mutually exclusive. Meaning "positive ..." and "negative ..." are allowed, but "positive positive ...", "positive negative ...", "negative positive ...", "negative negative ..." and so on are disallowed.
Rule #2: An English numeral may contain the word "zero" exactly once, the word "zero" is mutually exclusive with all other numeric words, so there are only three valid English numerals containing the word "zero": "zero", "positive zero", "negative zero", all of them have the numeric value of 0.
Rule #3: An English numeral may contain the unit words "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", corresponding to numeric values 1 to 9, they may appear alone or in the rightmost place, in which case they modify the total numeric value by adding their corresponding numeric values, or they may appear to the right of words ending in -ty and a dash (-), in which case they modify the place value by adding their own numeric value to the place value, or they may appear to the left of the word "hundred" or words signifying powers of one thousand, in which case they modify the place numeric value by multiplying the place numeric value with their own numeric value.
Rule #4: An English numeral may contain the words: "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen", "seventeen", "eighteen", "nineteen", corresponding to numeric values
range(10, 20), they may appear alone or in the rightmost place, in which case they modify the total numeric value by adding their corresponding numeric values, or they may appear to the right of the word "hundred" or words signifying powers of one thousand, in which case they modify the place numeric value by adding the place numeric value with their own numeric value.Rule #5: An English numeral may contain the words: "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety", corresponding to numeric values
range(20, 91, 10), they may appear alone or in the rightmost place, in which case they modify the total numeric value by adding their corresponding numeric values, or they may appear to the right of the word "hundred" or words signifying powers of one thousand or to the left of a dash before a unit word, in which case they modify the place numeric value by adding the place numeric value with their own numeric value.Rule #6: An English numeral may contain the word "hundred" corresponding to 100, it may appear after the word "a" in which case no other words are allowed after it, or it may appear after the unit words in which case they modify the place numeric value by adding the unit numeric value multiplied with 100 to the place value.
Rule #7: An English numeral may contain any number of chunks each corresponding to a numeric value less than 1000, a chunk may be in this format "{unit_word}|{teen_word}|{ty_word}" in which case it has the numeric value of the word consisting the chunk, or a chunk may be in this format: "{unit_word} hundred (and )? ({teen_word}|{ty_word})", in which case the place value is the first times 100 added with the second, or a chunk may be in this format: "{unit_word} hundred (and )? {ty_word}-{unit_word}", in which case the place value is first * 100 + second + third.
Rule #8: An English numeral may contain the words: "thousand", "million", "billion", "trillion", "quadrillion", "quintillion", "sextillion", "septillion", "octillion", "nonillion", "decillion", "undecillion", "duodecillion", "tredecillion", "quattordecillion", "quindecillion", "sexdecillion", "septendecillion", "octodecillion", "novemdecillion", "vigintillion", corresponding to numeric values \$\left(10^{3i}|1 \le i \le 21\right)\$, they each must be preceded by a chunk, and they modify the total numeric value by adding the place value of the chunk preceding them multiplied with their own numeric value. A chunk may consist of the word "a" which has a numeric value of 1 and it must be followed by exactly one scale word and nothing else, it must not be preceded by a sign word. All scale words must be separated by chunks, all scale words must be appearing in strictly descending order, meaning no scale word can appear multiple times in an English numeral and for any scale word all scale words appearing after it must be strictly smaller than it.
And these are all we need to implement an English numeral converter.
As for Roman numerals, it is much simpler:
Rule #1: Roman numerals have digits
{"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}Rule #2: Roman numeral digits may appear alone or appear up to three times in a row, this is a chunk with numeric value equal to the digit value times the occurrences.
Rule #3: "IV" is 4, "IX" is 9, "XL" is 40, "XC" is 90, "CD" is 400 and "CM" is 900.
Rule #4: A Roman numeral can have up to four chunks in strictly descending order, no two chunks with the same order of magnitude can appear in a Roman numeral, and the numeric of a Roman numeral is the sum of the numeric values of the chunks.
And I have just explained all the code in my program.
Here is the code:
# fmt: off
LT_100 = {
"one": b"001", "two": b"002", "three": b"003", "four": b"004", "five": b"005",
"six": b"006", "seven": b"007", "eight": b"008", "nine": b"009", "ten": b"010",
"eleven": b"011", "twelve": b"012", "thirteen": b"013", "fourteen": b"014", "fifteen": b"015",
"sixteen": b"016", "seventeen": b"017", "eighteen": b"018", "nineteen": b"019", "twenty": b"020",
"twenty-one": b"021", "twenty-two": b"022", "twenty-three": b"023", "twenty-four": b"024", "twenty-five": b"025",
"twenty-six": b"026", "twenty-seven": b"027", "twenty-eight": b"028", "twenty-nine": b"029", "thirty": b"030",
"thirty-one": b"031", "thirty-two": b"032", "thirty-three": b"033", "thirty-four": b"034", "thirty-five": b"035",
"thirty-six": b"036", "thirty-seven": b"037", "thirty-eight": b"038", "thirty-nine": b"039", "forty": b"040",
"forty-one": b"041", "forty-two": b"042", "forty-three": b"043", "forty-four": b"044", "forty-five": b"045",
"forty-six": b"046", "forty-seven": b"047", "forty-eight": b"048", "forty-nine": b"049", "fifty": b"050",
"fifty-one": b"051", "fifty-two": b"052", "fifty-three": b"053", "fifty-four": b"054", "fifty-five": b"055",
"fifty-six": b"056", "fifty-seven": b"057", "fifty-eight": b"058", "fifty-nine": b"059", "sixty": b"060",
"sixty-one": b"061", "sixty-two": b"062", "sixty-three": b"063", "sixty-four": b"064", "sixty-five": b"065",
"sixty-six": b"066", "sixty-seven": b"067", "sixty-eight": b"068", "sixty-nine": b"069", "seventy": b"070",
"seventy-one": b"071", "seventy-two": b"072", "seventy-three": b"073", "seventy-four": b"074", "seventy-five": b"075",
"seventy-six": b"076", "seventy-seven": b"077", "seventy-eight": b"078", "seventy-nine": b"079", "eighty": b"080",
"eighty-one": b"081", "eighty-two": b"082", "eighty-three": b"083", "eighty-four": b"084", "eighty-five": b"085",
"eighty-six": b"086", "eighty-seven": b"087", "eighty-eight": b"088", "eighty-nine": b"089", "ninety": b"090",
"ninety-one": b"091", "ninety-two": b"092", "ninety-three": b"093", "ninety-four": b"094", "ninety-five": b"095",
"ninety-six": b"096", "ninety-seven": b"097", "ninety-eight": b"098", "ninety-nine": b"099"
}
SCALES = {
"thousand": 3,
"million": 6,
"billion": 9,
"trillion": 12,
"quadrillion": 15,
"quintillion": 18,
"sextillion": 21,
"septillion": 24,
"octillion": 27,
"nonillion": 30,
"decillion": 33,
"undecillion": 36,
"duodecillion": 39,
"tredecillion": 42,
"quattordecillion": 45,
"quindecillion": 48,
"sexdecillion": 51,
"septendecillion": 54,
"octodecillion": 57,
"novemdecillion": 60,
"vigintillion": 63,
}
PLACES = {"hundred": 100} | { scale: 10**power for scale, power in SCALES.items() }
REV_LT_100 = [
"" , "one" , "two" , "three" , "four" ,
"five" , "six" , "seven" , "eight" , "nine" ,
"ten" , "eleven" , "twelve" , "thirteen" , "fourteen" ,
"fifteen" , "sixteen" , "seventeen" , "eighteen" , "nineteen" ,
"twenty" , "twenty-one" , "twenty-two" , "twenty-three" , "twenty-four" ,
"twenty-five" , "twenty-six" , "twenty-seven" , "twenty-eight" , "twenty-nine" ,
"thirty" , "thirty-one" , "thirty-two" , "thirty-three" , "thirty-four" ,
"thirty-five" , "thirty-six" , "thirty-seven" , "thirty-eight" , "thirty-nine" ,
"forty" , "forty-one" , "forty-two" , "forty-three" , "forty-four" ,
"forty-five" , "forty-six" , "forty-seven" , "forty-eight" , "forty-nine" ,
"fifty" , "fifty-one" , "fifty-two" , "fifty-three" , "fifty-four" ,
"fifty-five" , "fifty-six" , "fifty-seven" , "fifty-eight" , "fifty-nine" ,
"sixty" , "sixty-one" , "sixty-two" , "sixty-three" , "sixty-four" ,
"sixty-five" , "sixty-six" , "sixty-seven" , "sixty-eight" , "sixty-nine" ,
"seventy" , "seventy-one" , "seventy-two" , "seventy-three", "seventy-four" ,
"seventy-five" , "seventy-six" , "seventy-seven", "seventy-eight", "seventy-nine" ,
"eighty" , "eighty-one" , "eighty-two" , "eighty-three" , "eighty-four" ,
"eighty-five" , "eighty-six" , "eighty-seven" , "eighty-eight" , "eighty-nine" ,
"ninety" , "ninety-one" , "ninety-two" , "ninety-three" , "ninety-four" ,
"ninety-five" , "ninety-six" , "ninety-seven" , "ninety-eight" , "ninety-nine"
]
HUNDREDS = [
"" , "one hundred" , "two hundred" , "three hundred", "four hundred" ,
"five hundred" , "six hundred" , "seven hundred", "eight hundred", "nine hundred" ,
]
SCALES_LIST = [
"thousand" , "million" , "billion" ,
"trillion" , "quadrillion" , "quintillion" ,
"sextillion" , "septillion" , "octillion" ,
"nonillion" , "decillion" , "undecillion" ,
"duodecillion" , "tredecillion" , "quattordecillion",
"quindecillion" , "sexdecillion" , "septendecillion" ,
"octodecillion" , "novemdecillion" , "vigintillion"
]
ROMAN_UNITS = ["", "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX"]
ROMAN_TENS = ["", "X", "XX", "XXX", "XL", "L", "LX", "LXX", "LXXX", "XC"]
ROMAN_HUNDREDS = ["", "C", "CC", "CCC", "CD", "D", "DC", "DCC", "DCCC", "CM"]
ROMAN_THOUSANDS = ["", "M", "MM", "MMM"]
ROMAN_NUMERALS = [
("VIII", 4, 1, 8 ), ("LXXX", 4, 2, 80 ), ("DCCC", 4, 3, 800 ),
("III" , 3, 1, 3 ), ("VII" , 3, 1, 7 ), ("XXX" , 3, 2, 30 ),
("LXX" , 3, 2, 70 ), ("CCC" , 3, 3, 300 ), ("DCC" , 3, 3, 700 ),
("MMM" , 3, 4, 3000), ("II" , 2, 1, 2 ), ("IV" , 2, 1, 4 ),
("VI" , 2, 1, 6 ), ("IX" , 2, 1, 9 ), ("XX" , 2, 2, 20 ),
("XL" , 2, 2, 40 ), ("LX" , 2, 2, 60 ), ("XC" , 2, 2, 90 ),
("CC" , 2, 3, 200 ), ("CD" , 2, 3, 400 ), ("DC" , 2, 3, 600 ),
("CM" , 2, 3, 900 ), ("MM" , 2, 4, 2000), ("I" , 1, 1, 1 ),
("V" , 1, 1, 5 ), ("X" , 1, 2, 10 ), ("L" , 1, 2, 50 ),
("C" , 1, 3, 100 ), ("D" , 1, 3, 500 ), ("M" , 1, 4, 1000)
]
# fmt: on
def parse_Roman_numeral(numeral: str) -> int:
if not isinstance(numeral, str):
raise TypeError("argument numeral must be str")
if numeral in {"", "N"}:
return 0
prev_len = 5
number = 0
pos = 0
end = len(numeral)
while pos < end:
for chunk, length, places, value in ROMAN_NUMERALS:
next_pos = pos + length
if numeral[pos:next_pos] == chunk:
pos = next_pos
break
else:
raise ValueError("detected invalid digit in Roman numeral")
if (cur_len := places) >= prev_len:
raise ValueError("Roman numerals were written in strictly descending order")
number += value
prev_len = cur_len
return number
def to_Roman_numeral(number: int) -> str:
if not isinstance(number, int):
raise TypeError("argument number must be int")
if not number:
return "N"
if number < 0 or number > 3999:
raise ValueError("Roman numerals can only represent numbers 0 to 3999")
numeral = ""
q, r = divmod(number, 1000)
numeral += ROMAN_THOUSANDS[q]
q, r = divmod(r, 100)
numeral += ROMAN_HUNDREDS[q]
q, r = divmod(r, 10)
numeral += ROMAN_TENS[q]
return numeral + ROMAN_UNITS[r]
def handle_singleton(length: int, words: list[str], values: dict) -> None:
if length != 2:
raise ValueError(
"'a' must be followed by exactly one scale word and nothing else"
)
second = words[1]
if not (power := PLACES.get(second)):
raise ValueError("the word after 'a' isn't a scale word")
values |= {"found": 1, "number": power}
def preprocess_English_numeral_worker(
first: str, length: int, words: list[str], values: dict
) -> None:
negative = first == "negative"
if negative or first == "positive":
if length == 1:
raise ValueError("sign must modify a number")
words.pop(0)
length = len(words)
if words[0] == "zero":
if length != 1:
raise ValueError(
"only the following valid English numerals can contain zero: 'negative zero', 'zero', 'positive zero'"
)
values |= {"found": 1, "number": 0}
else:
values |= {"sign": "-" * negative, "words": words}
def preprocess_English_numeral(numeral: str, values: dict) -> None:
numeral = numeral.strip().lower()
if not numeral:
raise ValueError("empty string isn't valid English numeral")
words = numeral.split(" ")
first = words[0]
length = len(words)
if first == "a":
handle_singleton(length, words, values)
else:
preprocess_English_numeral_worker(first, length, words, values)
def handle_special(
word: str, multiplier: bytes, allow_and: bool
) -> tuple[int, int, int, int]:
if word == "hundred":
if multiplier[:2] != b"00" or multiplier[2] == 48:
raise ValueError(
"'hundred' must be preceded by a number between one and nine"
)
return 1, 1, 0, multiplier[::-1]
if allow_and and word == "and":
return 1, 0, 1, multiplier
return 0, 0, 0, multiplier
def handle_multiplier(
partial: int, multiplier: bytes, multiplier_count: int
) -> tuple[int, int]:
if multiplier == b"000":
return partial, 1
elif (first := multiplier[:1]) != b"0" and multiplier_count == 1:
return first + partial[1:], 2
else:
raise ValueError("two separate multipliers cannot be adjacent")
def parse_English_numeral_worker(words: list[str]) -> list[tuple[bytes, int]]:
multiplier_count = scale_power = allow_and = 0
multiplier = b"000"
last = 64
chunks = []
for word in words:
skip, allow_and, used_and, multiplier = handle_special(
word, multiplier, allow_and
)
if skip:
continue
if partial := LT_100.get(word):
multiplier, multiplier_count = handle_multiplier(
partial, multiplier, multiplier_count
)
scale_power = 0
else:
if (
used_and
or multiplier == b"000"
or not (scale_power := SCALES.get(word))
or scale_power >= last
):
raise ValueError(
f"detected illegal word in standard short scale English numeral of integers for current position: {word}"
)
chunks.append((multiplier, scale_power))
multiplier_count = 0
multiplier = b"000"
last = scale_power
allow_and = used_and = False
if not scale_power:
chunks.append((multiplier, 0))
return chunks
def parse_English_numeral(numeral: str) -> int:
if not isinstance(numeral, str):
raise TypeError("argument numeral must be str")
values = {"found": False}
preprocess_English_numeral(numeral, values)
if values["found"]:
return values["number"]
chunks = parse_English_numeral_worker(values["words"])
length = chunks[0][1] + 3
digits = bytearray(
b"0" * length,
)
for chunk, shift in chunks:
end = length - shift
digits[end - 3 : end] = chunk
digits = digits.decode()
digits.lstrip()
return int(values["sign"] + digits)
def to_English_numeral(number: int) -> str:
if not isinstance(number, int):
raise TypeError("argument number must be int")
if not number:
return "zero"
sign = ""
if number < 0:
number = -number
sign = "negative "
decimal = str(number)
length = len(decimal)
if length >= 67:
raise ValueError("this function only supports scale words up to vigintillion")
mod = length % 3
padding = (3 - mod) * bool(mod)
length += padding
decimal = "0" * padding + decimal
chunks = []
hundreds, remainder = divmod(int(decimal[-3:]), 100)
if remainder:
chunks.append(REV_LT_100[remainder])
if hundreds:
chunks.append(HUNDREDS[hundreds])
for index, (start, end) in enumerate(
zip(range(length - 6, -1, -3), range(length - 3, 0, -3))
):
if chunk := int(decimal[start:end]):
hundreds, remainder = divmod(chunk, 100)
chunks.append(SCALES_LIST[index])
if remainder:
chunks.append(REV_LT_100[remainder])
if hundreds:
chunks.append(HUNDREDS[hundreds])
chunks.reverse()
return sign + " ".join(chunks)
Test:
In [2]: to_English_numeral(299792458)
Out[2]: 'two hundred ninety-nine million seven hundred ninety-two thousand four hundred fifty-eight'
In [3]: parse_English_numeral('two hundred ninety-nine million seven hundred ninety-two thousand four hundred fifty-eight')
Out[3]: 299792458
In [4]: to_English_numeral(86400)
Out[4]: 'eighty-six thousand four hundred'
In [5]: parse_English_numeral('eighty-six thousand four hundred')
Out[5]: 86400
In [6]: to_English_numeral(31556952)
Out[6]: 'thirty-one million five hundred fifty-six thousand nine hundred fifty-two'
In [7]: parse_English_numeral('thirty-one million five hundred fifty-six thousand nine hundred fifty-two')
Out[7]: 31556952
In [8]: to_English_numeral(9460536207068016)
Out[8]: 'nine quadrillion four hundred sixty trillion five hundred thirty-six billion two hundred seven million sixty-eight thousand sixteen'
In [9]: parse_English_numeral('nine quadrillion four hundred sixty trillion five hundred thirty-six billion two hundred seven million sixty-eight thousand sixteen')
Out[9]: 9460536207068016
In [10]: to_English_numeral(86315)
Out[10]: 'eighty-six thousand three hundred fifteen'
In [11]: parse_English_numeral('eighty-six thousand three hundred fifteen')
Out[11]: 86315
In [12]: to_English_numeral(-314159265358979323846264338327950288419716939937510582097494459230)
Out[12]: 'negative three hundred fourteen vigintillion one hundred fifty-nine novemdecillion two hundred sixty-five octodecillion three hundred fifty-eight septendecillion nine hundred seventy-nine sexdecillion three hundred twenty-three quindecillion eight hundred forty-six quattordecillion two hundred sixty-four tredecillion three hundred thirty-eight duodecillion three hundred twenty-seven undecillion nine hundred fifty decillion two hundred eighty-eight nonillion four hundred nineteen octillion seven hundred sixteen septillion nine hundred thirty-nine sextillion nine hundred thirty-seven quintillion five hundred ten quadrillion five hundred eighty-two trillion ninety-seven billion four hundred ninety-four million four hundred fifty-nine thousand two hundred thirty'
In [13]: parse_English_numeral('negative three hundred fourteen vigintillion one hundred fifty-nine novemdecillion two hundred sixty-five octodecillion three hundred fifty-eight septendecillion nin
...: e hundred seventy-nine sexdecillion three hundred twenty-three quindecillion eight hundred forty-six quattordecillion two hundred sixty-four tredecillion three hundred thirty-eight duodeci
...: llion three hundred twenty-seven undecillion nine hundred fifty decillion two hundred eighty-eight nonillion four hundred nineteen octillion seven hundred sixteen septillion nine hundred t
...: hirty-nine sextillion nine hundred thirty-seven quintillion five hundred ten quadrillion five hundred eighty-two trillion ninety-seven billion four hundred ninety-four million four hundred
...: fifty-nine thousand two hundred thirty')
Out[13]: -314159265358979323846264338327950288419716939937510582097494459230
In [14]: to_English_numeral(100200300400500600700800900)
Out[14]: 'one hundred septillion two hundred sextillion three hundred quintillion four hundred quadrillion five hundred trillion six hundred billion seven hundred million eight hundred thousand nine hundred'
In [15]: parse_English_numeral('one hundred septillion two hundred sextillion three hundred quintillion four hundred quadrillion five hundred trillion six hundred billion seven hundred million eigh
...: t hundred thousand nine hundred')
Out[15]: 100200300400500600700800900
In [16]: to_English_numeral(10**42)
Out[16]: 'one tredecillion'
In [17]: parse_English_numeral('one tredecillion')
Out[17]: 1000000000000000000000000000000000000000000
In [18]: parse_English_numeral("vigintillion")
---------------------------------------------------------------------------
...
ValueError: detected illegal word in standard short scale English numeral of integers for current position: vigintillion
In [19]: parse_English_numeral("a vigintillion")
Out[19]: 1000000000000000000000000000000000000000000000000000000000000000
In [20]: parse_English_numeral("one vigintillion")
Out[20]: 1000000000000000000000000000000000000000000000000000000000000000
In [21]: parse_English_numeral("one vigintillion million")
---------------------------------------------------------------------------
...
ValueError: detected illegal word in standard short scale English numeral of integers for current position: million
In [22]: parse_English_numeral("one vigintillion a million")
---------------------------------------------------------------------------
...
ValueError: detected illegal word in standard short scale English numeral of integers for current position: a
In [23]: parse_English_numeral("one vigintillion one million")
Out[23]: 1000000000000000000000000000000000000000000000000000000001000000
In [24]: parse_English_numeral("hundred")
---------------------------------------------------------------------------
...
ValueError: 'hundred' must be preceded by a number between one and nine
In [25]: parse_English_numeral("a hundred")
Out[25]: 100
In [26]: parse_English_numeral("hundred thousand")
---------------------------------------------------------------------------
...
ValueError: 'hundred' must be preceded by a number between one and nine
In [27]: parse_English_numeral("nine hundred thousand")
Out[27]: 900000
In [28]: parse_English_numeral("nine hundred hundred")
---------------------------------------------------------------------------
...
ValueError: 'hundred' must be preceded by a number between one and nine
In [29]: parse_English_numeral("nine hundred thousand thousand")
---------------------------------------------------------------------------
...
ValueError: detected illegal word in standard short scale English numeral of integers for current position: thousand
In [30]: parse_English_numeral("one billion billion billion billion billion billion billion billion billion billion")
---------------------------------------------------------------------------
...
ValueError: detected illegal word in standard short scale English numeral of integers for current position: billion
In [31]: parse_English_numeral("one novemdecillion")
Out[31]: 1000000000000000000000000000000000000000000000000000000000000
In [32]: parse_English_numeral("nineteen hundred")
---------------------------------------------------------------------------
...
ValueError: 'hundred' must be preceded by a number between one and nine
In [33]: parse_English_numeral("one thousand nine hundred")
Out[33]: 1900
In [34]: parse_English_numeral("one thousand and hundred")
---------------------------------------------------------------------------
...
ValueError: detected illegal word in standard short scale English numeral of integers for current position: and
In [35]: parse_English_numeral("one thousand and one")
---------------------------------------------------------------------------
...
ValueError: detected illegal word in standard short scale English numeral of integers for current position: and
In [36]: parse_English_numeral("one thousand zero hundred and one")
---------------------------------------------------------------------------
...
ValueError: detected illegal word in standard short scale English numeral of integers for current position: zero
In [37]: parse_English_numeral("one thousand one hundred and one")
Out[37]: 1101
In [38]: for i in range(4000):
...: assert parse_Roman_numeral(to_Roman_numeral(i)) == i
In [39]: max((to_Roman_numeral(i) for i in range(4000)), key=len)
Out[39]: 'MMMDCCCLXXXVIII'
In [40]: parse_Roman_numeral('MMMDCCCLXXXVIII')
Out[40]: 3888
In [41]: to_English_numeral(693147180559945309417232121458176568075500134360255254120680009493)
Out[41]: 'six hundred ninety-three vigintillion one hundred forty-seven novemdecillion one hundred eighty octodecillion five hundred fifty-nine septendecillion nine hundred forty-five sexdecillion three hundred nine quindecillion four hundred seventeen quattordecillion two hundred thirty-two tredecillion one hundred twenty-one duodecillion four hundred fifty-eight undecillion one hundred seventy-six decillion five hundred sixty-eight nonillion seventy-five octillion five hundred septillion one hundred thirty-four sextillion three hundred sixty quintillion two hundred fifty-five quadrillion two hundred fifty-four trillion one hundred twenty billion six hundred eighty million nine thousand four hundred ninety-three'
In [42]: to_English_numeral(10**66-1)
Out[42]: 'nine hundred ninety-nine vigintillion nine hundred ninety-nine novemdecillion nine hundred ninety-nine octodecillion nine hundred ninety-nine septendecillion nine hundred ninety-nine sexdecillion nine hundred ninety-nine quindecillion nine hundred ninety-nine quattordecillion nine hundred ninety-nine tredecillion nine hundred ninety-nine duodecillion nine hundred ninety-nine undecillion nine hundred ninety-nine decillion nine hundred ninety-nine nonillion nine hundred ninety-nine octillion nine hundred ninety-nine septillion nine hundred ninety-nine sextillion nine hundred ninety-nine quintillion nine hundred ninety-nine quadrillion nine hundred ninety-nine trillion nine hundred ninety-nine billion nine hundred ninety-nine million nine hundred ninety-nine thousand nine hundred ninety-nine'
In [43]: parse_English_numeral("a hundred one")
---------------------------------------------------------------------------
...
ValueError: 'a' must be followed by exactly one scale word and nothing else
In [44]: parse_English_numeral("twenty twenty")
---------------------------------------------------------------------------
...
ValueError: two separate multipliers cannot be adjacent
How can my program be improved?
Update
Per the rules of classical Roman numerals, specifically the rule about no consecutive occurrences of the same digit with length more than three, it is easy to see that the maximum number that can be expressed as classical Roman numeral is 3999, which is "MMMCMXCIX" in Roman numeral. Because for numbers 4000 and up, Romans didn't have digits for them, so they would need "MMMM..." which aren't allowed. Romans also didn't have the concept of zero but it is just silly, modern extensions of Roman numerals usually use "N" to denote 0. Also Romans denied negative numbers, that is silly too, but in today's usage Roman numerals are typically used to represent positive numbers only.
Update 1
To respond to a certain comment, although it is likely a joke, here is a serious response.
In Python or any programming language I know there is no signed 0 integer, to be more precise computers use two's complement which is inverting the bits and add one. For example, in int64_t "seven quintillion five hundred forty quadrillion one hundred thirteen trillion eight hundred four billion seven hundred forty-six million three hundred forty-six thousand four hundred twenty-nine" is 0b0110100010100011110111011000111001100001111011001100111110111101. To get the negative we invert the bits and add one. To invert the bits, XOR with all ones, 0b0110100010100011110111011000111001100001111011001100111110111101 ^ (1 << 64) - 1 = 0b1001011101011100001000100111000110011110000100110011000001000010. Then we add one to get 0b1001011101011100001000100111000110011110000100110011000001000011.
It may not seem much, but this forces the nonexistence of signed zero. Why? To start, when the left most bit is set, does that mean it is a positive number greater than 9223372036854775808 or a negative number smaller than -9223372036854775808? To resolve the ambiguity and to ensure unique representation for all integers, we must sacrifice the leftmost bit to denote the sign, and so we can only use 63 bits for the numeric value. Thus if the leftmost bit is set, the number is negative, if it isn't then it is positive.
Now, this also means the smallest number we can represent using 64 bits is -9223372036854775808. Because it is one followed by 63 zeroes, after inverting we have 63 ones and add one we propagate the carry all the way to the left and we get one followed by 63 zeroes. Consequently if any bit other than the first bit is set, it becomes 0 after inverting, thus it stops the carry propagation after adding one. For example, 0b1111111111111111111111111111111111111111111111111111111111111111 is -1 in int64_t.
Now what is the negative of 0? 0 is all zeros, after inverting we have all ones, after adding one we need to propagate the carry all the way through to the left, we would need one more bit to store the number than what we started, for example in int64_t we need 1 followed by 64 0s, but that requires 65 bits and we only have 64, so we ignore the leading bit thus -0 = +0. Alternatively after inverting we get all ones, which is negative one, negative one plus one is zero.
But what about signed zero? That is because of floating point. I don't want to go into the details, but floating points cannot natively represent 0. Because two to any power isn't zero, \$\log_2(0)\$ is undefined (or in any base for that matter). Thus it is a special case, when the exponent and mantissa are all zeroes, the number is treated as 0, and the first bit is the sign bit, giving positive zero and negative zero.
But if the mantissa is all zeroes would that mean zero too? It could, but it breaks uniqueness of representations, the same number can be represented in different ways by adding/subtracting the exponent and correspondingly right-shifting/left-shifting the mantissa, so 52 bits of the mantissa is explicitly stored with the first bit of the mantissa is implicitly set as 1 except in special cases.
In [55]: -0
Out[55]: 0
In [56]: +0
Out[56]: 0
In [57]: -0.0
Out[57]: -0.0
In [58]: +0.0
Out[58]: 0.0
Update 2
I thought my variable name was self-evident, or the contents of the container should make the variable name self-explanatory. Apparently I was wrong.
LT_100 contains all the positive integers below one hundred in English numeral, so it shouldn't be hard to guess that LT_100 means Less Than 100. And similarly REV_LT_100 is the REVerse of LT_100.
As for why I chose that name, well, don't blame me, I trust most Python programmers know what __lt__, __eq__, __gt__, __le__, __ne__, __ge__ mean.
-0.\$\endgroup\$