Unicode 16 has been released, and bstr doesn't match spec behavior anymore.
From some fiddling around last night, it looks like the following needs to happen:
- Update the test cases in src/unicode/data
- Update the regex-automata dependency to 0.4.8 to ensure the transitive dependency on regex-syntax supports Unicode 16
- Update grapheme.sh to the new grammar in the spec. As far as I can tell, the changes are the inclusion of `CR` and `LF` as grapheme breaks on their own (previously only Windows-style `CR LF` was included), and rule GB9c for the new Indic_Conjunct_Break behavior. Draft Implementation
- Regenerate the finite state machines with an up-to-date regex-cli
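
To make the `CR`/`LF` point concrete, here is a minimal std-only sketch of the relevant UAX #29 rules (GB3: no break between `CR` and `LF`; GB4/GB5: break after and before a lone `CR` or `LF`). The function name `split_on_cr_lf` is made up for illustration and is not part of bstr. It lumps every other character into runs, so it is not a grapheme segmenter, only a picture of the newline cases.

```rust
/// Hypothetical helper (not a bstr API): splits `s` using only the
/// CR/LF-related rules of UAX #29. All other characters are lumped into
/// runs, so this is NOT real grapheme segmentation.
fn split_on_cr_lf(s: &str) -> Vec<&str> {
    let is_newline = |ch: char| ch == '\r' || ch == '\n';
    let mut segments = Vec::new();
    let mut start = 0;
    let mut prev: Option<char> = None;

    for (i, c) in s.char_indices() {
        if let Some(p) = prev {
            // GB3: CR x LF (never break inside a CR LF pair).
            let keep_together = p == '\r' && c == '\n';
            // GB4 / GB5: break after and before CR or LF otherwise.
            let must_break = is_newline(p) || is_newline(c);
            if must_break && !keep_together {
                segments.push(&s[start..i]);
                start = i;
            }
        }
        prev = Some(c);
    }
    // Flush the trailing segment, if any.
    if start < s.len() {
        segments.push(&s[start..]);
    }
    segments
}
```

Under these rules a lone `CR` or a lone `LF` is its own segment, while `CR LF` stays together; `LF CR` (the reverse order) is split in two.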
Unfortunately, implementing GB9c as specified depends on support for a few obscure character classes that neither regex nor ucd-generate (which codegens the relevant table) currently supports.
We could get around this by inlining the character class definitions; that would be pretty brittle, though, and upstreaming support seems like a better solution.
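
For a sense of what the inlining workaround would look like, here is a rough std-only sketch of GB9c (`Consonant [Extend Linker]* Linker [Extend Linker]* × Consonant`) over a hand-written property table. Everything here is hypothetical: the names `InCb`, `in_cb`, and `gb9c_no_break_before` are invented for illustration, and the table covers only three Devanagari-related entries rather than the full Indic_Conjunct_Break data. It is not how bstr implements segmentation, which goes through generated state machines.

```rust
/// Indic_Conjunct_Break values relevant to GB9c (hypothetical type).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum InCb {
    Consonant,
    Linker,
    Extend,
    Other, // stands in for InCB=None
}

/// Hand-inlined, deliberately tiny sample of the property table.
/// A real table would have to be re-derived for every Unicode release.
fn in_cb(c: char) -> InCb {
    match c {
        '\u{0915}'..='\u{0939}' => InCb::Consonant, // DEVANAGARI LETTER KA..HA
        '\u{094D}' => InCb::Linker,                 // DEVANAGARI SIGN VIRAMA
        '\u{200D}' => InCb::Extend,                 // ZERO WIDTH JOINER
        _ => InCb::Other,
    }
}

/// Returns true if GB9c forbids a break immediately before `chars[i]`.
fn gb9c_no_break_before(chars: &[char], i: usize) -> bool {
    if i == 0 || i >= chars.len() || in_cb(chars[i]) != InCb::Consonant {
        return false;
    }
    let mut seen_linker = false;
    // Walk backwards over [Extend Linker]*, requiring at least one Linker
    // before reaching the preceding Consonant.
    for &c in chars[..i].iter().rev() {
        match in_cb(c) {
            InCb::Linker => seen_linker = true,
            InCb::Extend => {}
            InCb::Consonant => return seen_linker,
            InCb::Other => return false,
        }
    }
    false
}
```

The brittleness is visible in `in_cb`: any code point missing from the hand-maintained ranges silently falls through to `Other`, which is why having regex and ucd-generate expose the property directly seems preferable.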