[Enhancement] Improve lex_attrs for Spanish & Portuguese #13900

Open
weezymatt wants to merge 1 commit into explosion:master from weezymatt:enhancement/lex_attrs


@weezymatt

Contributor

This PR enhances support for Spanish (es) and Portuguese (pt) in their respective spacy/lang modules by updating the lex_attrs.py files. Each change is accompanied by regression tests in the corresponding test_text.py file.

Description

Spanish (es):

  • Add feminine & apocopated ordinals

  • Add abbreviation handling (e.g., 1.º) and a plural rule for ordinals in the like_num function

  • Refactor test_issue3803 to follow spaCy code conventions by using fixtures

  • Add regression test test_es_lex_attrs_like_number
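
For reviewers, the Spanish ordinal handling described above could look roughly like the sketch below. It is not the code from this PR: the word lists are abbreviated and hypothetical, and the abbreviation pattern (`_ordinal_abbrev`) is an assumption about which forms should match (1.º, 2.ª, 3.er, 1.os, 2.as).

```python
import re

# Hypothetical, abbreviated word lists for illustration only.
_num_words = {"cero", "uno", "una", "dos", "tres", "diez", "cien", "mil"}
_ordinal_words = {"primero", "segundo", "tercero", "cuarto", "quinto", "décimo"}
# Apocopated ordinals, used before masculine singular nouns.
_apocopated_ordinals = {"primer", "tercer"}

# Assumed abbreviation pattern: digits, an optional period, then an ordinal
# marker, e.g. 1.º, 2.ª, 3.er, 1.os, 2.as
_ordinal_abbrev = re.compile(r"\d+\.?(?:º|ª|er|os|as)")


def like_num(text: str) -> bool:
    text = text.lower()
    if text.startswith(("+", "-", "±", "~")):
        text = text[1:]
    # Plain digits, allowing thousands and decimal separators.
    if text.replace(",", "").replace(".", "").isdigit():
        return True
    # Simple fractions such as 3/4.
    if text.count("/") == 1:
        num, denom = text.split("/")
        if num.isdigit() and denom.isdigit():
            return True
    if text in _num_words or text in _apocopated_ordinals:
        return True
    if _ordinal_abbrev.fullmatch(text):
        return True
    # Plural rule: drop a trailing "s"; feminine rule: map a final "a" to "o".
    singular = text[:-1] if text.endswith("s") else text
    masculine = singular[:-1] + "o" if singular.endswith("a") else singular
    return singular in _ordinal_words or masculine in _ordinal_words
```

Normalizing plural and feminine forms back to the masculine singular keeps the ordinal list short, at the cost of accepting any word that happens to reduce to a listed ordinal.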

Portuguese (pt):

  • Add number variations (i.e., uma, duas)

  • Fix typo "seicentos" -> "seiscentos"

  • Add gender rules to the hundreds [200-900]

  • Add feminine ordinals

  • Add plural rule for ordinals in like_num

  • Add tests test_pt_lex_attrs_like_number and test_pt_lex_attrs_like_number_for_ordinal to roughly maintain test coverage for the language

Additional:

  • Add weezymatt.md in .github/contributors

Last bits:

  • Code is linted with flake8 and formatted with black 25.11

Types of change

My PR covers an enhancement to the existing code.

Checklist

  • I confirm that I have the right to submit this contribution under the project's MIT license.
  • I ran the tests, and all new and existing tests passed.
  • My changes don't require a change to the documentation, or if they do, I've added all required information.
@weezymatt weezymatt changed the title Enhance lex_attrs for Spanish & Portuguese Dec 2, 2025
