dict training deduplication and bundle info #832

Closed
Victor-C-Zhang wants to merge 1 commit into facebook:dev from Victor-C-Zhang:export-D108207923

@Victor-C-Zhang

Contributor

Summary:
Add deduplication for trained dictionaries and bundle info packing support to the OpenZL training infrastructure.

base_dict_trainer.cpp:

  • Add DictIDCmp comparator for ZL_DictID to enable std::set usage with bytewise comparison.
  • Track unique dict IDs during training and skip duplicate dictionaries. When a generated dict ID is already in the set, the new dictionary is skipped, since OpenZL dict loading will materialize it from the existing entry.
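
A minimal sketch of the deduplication described above, not the code from this diff. `DictID` is a hypothetical stand-in for ZL_DictID (the real layout is defined by OpenZL), and `TrainedDict` and `dedupByID` are illustrative names. It assumes the ID is a plain byte array with no padding, so a bytewise comparison gives a valid ordering for std::set.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <set>
#include <vector>

// Hypothetical stand-in for ZL_DictID: a fixed-width, padding-free byte array.
struct DictID {
    std::array<std::uint8_t, 16> bytes;
};

// Orders IDs by their raw bytes so they can be used as std::set keys.
struct DictIDCmp {
    bool operator()(const DictID& a, const DictID& b) const {
        return std::memcmp(a.bytes.data(), b.bytes.data(), a.bytes.size()) < 0;
    }
};

// Illustrative container for a trained dictionary and its ID.
struct TrainedDict {
    DictID id;
    std::vector<std::uint8_t> content;
};

// Keeps the first dictionary seen for each ID and drops later duplicates.
std::vector<TrainedDict> dedupByID(const std::vector<TrainedDict>& dicts) {
    std::set<DictID, DictIDCmp> seen;
    std::vector<TrainedDict> unique;
    for (const auto& d : dicts) {
        // insert().second is false when the ID was already in the set.
        if (seen.insert(d.id).second) {
            unique.push_back(d);
        }
    }
    assert(unique.size() == seen.size());
    return unique;
}
```

Checking the result of `insert()` avoids a separate lookup before insertion, and the first dictionary seen for a given ID is the one that survives.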

trained_candidate.cpp/h:

  • Add packBundleInfo() method to pack standalone bundle metadata without appending dict contents. It creates a ZL_BundleInfo structure holding the bundle ID and dict IDs, then packs it using BundleInfo_pack().
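
To illustrate what packing bundle metadata without dict contents could look like, here is a rough sketch under stated assumptions. The `BundleInfo` struct, the fixed 16-byte `ID` type, the function `packBundleInfoSketch`, and the byte layout are all hypothetical; they are not ZL_BundleInfo, BundleInfo_pack(), or the OpenZL wire format, whose definitions are not shown in this PR description.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical fixed-width identifier used for both bundles and dicts.
using ID = std::array<std::uint8_t, 16>;

// Hypothetical metadata record: IDs only, no dictionary contents.
struct BundleInfo {
    ID bundleID;
    std::vector<ID> dictIDs;
};

// Illustrative layout (an assumption, not the OpenZL format):
// bundle ID, 4-byte little-endian dict count, then each dict ID.
std::vector<std::uint8_t> packBundleInfoSketch(const BundleInfo& info) {
    assert(info.dictIDs.size() <= std::numeric_limits<std::uint32_t>::max());
    const auto count = static_cast<std::uint32_t>(info.dictIDs.size());

    std::vector<std::uint8_t> out;
    out.reserve(info.bundleID.size() + 4 + info.dictIDs.size() * sizeof(ID));

    out.insert(out.end(), info.bundleID.begin(), info.bundleID.end());
    for (int shift = 0; shift < 32; shift += 8) {
        out.push_back(static_cast<std::uint8_t>(count >> shift));
    }
    for (const auto& id : info.dictIDs) {
        out.insert(out.end(), id.begin(), id.end());
    }
    return out;
}
```

The packed size depends only on the number of dict IDs, not on how large the dictionaries are, which is the point of keeping the metadata standalone.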

Reviewed By: daniellerozenblit

Differential Revision: D108207923

@meta-cla meta-cla Bot added the cla signed label Jun 22, 2026
@meta-codesync

meta-codesync Bot commented Jun 22, 2026

@Victor-C-Zhang has exported this pull request. If you are a Meta employee, you can view the originating Diff in D108207923.

@meta-codesync

meta-codesync Bot commented Jun 23, 2026

This pull request has been merged in 77939b1.