Conversation
Eagle-3 extends EAGLE with vocabulary mapping for cross-tokenizer speculation, enabling draft models with different tokenizers than the target model.

Key features:
- Vocabulary mapping between draft (32K) and target (128K) vocabularies
- Custom attention layer accepting 2x hidden_size input
- Fusion layer processing 3 verifier layers (3×H → H)
- Configurable layer normalization placement (before/after residual)
- Full checkpoint compatibility with HuggingFace models

Implementation includes:
- Eagle3SpeculatorConfig with vocabulary size configuration
- Eagle3Attention module for modified attention computation
- Eagle3DecoderLayer processing concatenated embeddings and hidden states
- Eagle3Speculator main model with vocabulary mapping support
- Proper type annotations and mypy compatibility

Based on: https://arxiv.org/abs/2503.01840

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Rahul Tuli <rtuli@redhat.com>
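The fusion layer described above (3 verifier layers, 3×H → H) can be sketched as a single linear projection over the concatenated hidden states. This is a minimal illustration, not the repo's actual module; the class and attribute names are assumptions.

```python
import torch
import torch.nn as nn


class FusionLayer(nn.Module):
    """Sketch: fuse hidden states from 3 verifier layers (3*H) down to H."""

    def __init__(self, hidden_size: int):
        super().__init__()
        # Input is 3 verifier hidden states concatenated on the feature dim.
        self.fc = nn.Linear(3 * hidden_size, hidden_size, bias=False)

    def forward(self, hidden_states: list) -> torch.Tensor:
        # hidden_states: 3 tensors, each (batch, seq, hidden_size)
        fused = torch.cat(hidden_states, dim=-1)  # (batch, seq, 3*hidden_size)
        return self.fc(fused)                     # (batch, seq, hidden_size)


layer = FusionLayer(hidden_size=64)
hs = [torch.randn(2, 5, 64) for _ in range(3)]
out = layer(hs)
print(out.shape)  # torch.Size([2, 5, 64])
```

A single bias-free linear layer keeps the fusion cheap relative to the decoder layer it feeds.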
shanjiaz
left a comment
Looks great! Ran the verification and it works very well.
- Add target_hidden_size field to Eagle3SpeculatorConfig
- Update fusion layer to use the target model's hidden size (3 × target_hidden_size)
- Properly handle 70B models where the target hidden size (8192) differs from the draft model's (6144)
- Update docstrings to clarify hidden-state dimensions

This fixes compatibility with the Eagle3-LLaMA3.3-Instruct-70B checkpoint.

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Rahul Tuli <rtuli@redhat.com>
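The dimension fix in this commit can be illustrated with the sizes it cites: the fusion layer consumes the *target* model's hidden states, so its input width is 3 × target_hidden_size (8192 for 70B) while its output matches the draft model's hidden size (6144). A minimal sketch, with the variable names as assumptions:

```python
import torch
import torch.nn as nn

# Sizes from the commit message: 70B target vs. smaller draft model.
target_hidden_size = 8192  # verifier (target model) hidden size
draft_hidden_size = 6144   # draft model hidden size

# Fusion must accept 3 concatenated *target* hidden states, not draft-sized ones.
fusion = nn.Linear(3 * target_hidden_size, draft_hidden_size, bias=False)
print(fusion.in_features, fusion.out_features)  # 24576 6144

# Shape check: 3 target-sized hidden states fuse into one draft-sized state.
hs = torch.cat([torch.randn(1, 4, target_hidden_size) for _ in range(3)], dim=-1)
fused = fusion(hs)
print(fused.shape)  # torch.Size([1, 4, 6144])
```

Before this fix, sizing the layer as 3 × draft_hidden_size would fail to load checkpoints where the two sizes differ.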
markurtz
left a comment
I'll go through the implementation in detail a bit later, but one quick thing: we need to add unit tests at a minimum.
MeganEFlynn
left a comment
Looks good to me. My only question: do we need to handle the case where the embeddings are not saved as part of the draft model's state dict and need to be pulled from the verifier?
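The fallback this review comment asks about can be sketched as a small loader helper: use the draft checkpoint's embedding weights if present, otherwise share the verifier's embedding table. The function name and the `embed_tokens.weight` key are assumptions for illustration, not the repo's actual API.

```python
import torch
import torch.nn as nn


def load_embeddings(draft_state_dict: dict, verifier: nn.Module) -> nn.Embedding:
    """Sketch: prefer draft-checkpoint embeddings, else pull from the verifier."""
    weight = draft_state_dict.get("embed_tokens.weight")  # key name is an assumption
    if weight is not None:
        emb = nn.Embedding(weight.shape[0], weight.shape[1])
        with torch.no_grad():
            emb.weight.copy_(weight)
        return emb
    # Embeddings missing from the draft state dict: share the verifier's table.
    return verifier.get_input_embeddings()


class ToyVerifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(10, 4)

    def get_input_embeddings(self) -> nn.Embedding:
        return self.embed


verifier = ToyVerifier()
emb = load_embeddings({}, verifier)            # empty draft state dict
print(emb is verifier.get_input_embeddings())  # True
```

Sharing the verifier's table (rather than copying it) also keeps the two models' token embeddings consistent by construction.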
Landing: @markurtz to follow up with tests
Eagle-3 Speculator Implementation
This PR adds support for Eagle-3 speculative decoding, which extends EAGLE with vocabulary mapping for cross-tokenizer speculation.
Architecture
Key Features
Implementation Details
- Eagle3SpeculatorConfig: Configuration with vocabulary size settings
- Eagle3Attention: Modified attention for 2×hidden_size input
- Eagle3DecoderLayer: Processes concatenated embeddings and hidden states
- Eagle3Speculator: Main model with vocabulary mapping support

Based on: https://arxiv.org/abs/2503.01840
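The 2×hidden_size attention input can be sketched as follows: the decoder layer concatenates the token embedding with the fused verifier hidden state, so the attention projections take 2 * hidden_size input features. This toy (class name assumed, causal masking and multi-head splitting omitted for brevity) is an illustration, not the repo's Eagle3Attention.

```python
import torch
import torch.nn as nn


class Toy2HAttention(nn.Module):
    """Sketch: single-head attention over concat(embeddings, hidden_states)."""

    def __init__(self, hidden_size: int):
        super().__init__()
        # Q/K/V projections read the 2*hidden_size concatenated input.
        self.qkv = nn.Linear(2 * hidden_size, 3 * hidden_size, bias=False)

    def forward(self, embeds: torch.Tensor, hidden: torch.Tensor) -> torch.Tensor:
        x = torch.cat([embeds, hidden], dim=-1)  # (batch, seq, 2*hidden_size)
        q, k, v = self.qkv(x).chunk(3, dim=-1)   # each (batch, seq, hidden_size)
        scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
        return torch.softmax(scores, dim=-1) @ v  # (batch, seq, hidden_size)


attn = Toy2HAttention(hidden_size=16)
out = attn(torch.randn(2, 7, 16), torch.randn(2, 7, 16))
print(out.shape)  # torch.Size([2, 7, 16])
```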
Verification
To verify the implementation works with your checkpoint:
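The PR's actual verification script is not reproduced here; as a stand-in, the core vocabulary-mapping idea can be smoke-tested in isolation. The `d2t` buffer name is an assumption; the real draft-to-target mapping would be loaded from the checkpoint rather than generated randomly.

```python
import torch

# Toy check of vocabulary mapping: each draft-vocab token id (32K space)
# maps to a target-vocab id (128K space) so the verifier can score drafts.
draft_vocab, target_vocab = 32_000, 128_000
torch.manual_seed(0)
d2t = torch.randint(0, target_vocab, (draft_vocab,))  # placeholder mapping

draft_ids = torch.tensor([1, 42, 31_999])  # ids sampled from the draft head
target_ids = d2t[draft_ids]                # ids in the target vocabulary
print(target_ids.shape)  # torch.Size([3])
```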
Example Output