NVIDIA/NeMo (Public)
History for NeMo/scripts/nlp_language_modeling/prepare_packed_ft_dataset.py on main
Commit History
Commits on Mar 6, 2025
minor fix to nemo1 sequence packing script (#12486)
ashors1 authored, commit ceba38f
Commits on Dec 12, 2024
Adding documentation for packed dataset preparation with context para… (#11564)
tomlifu authored, commit 4d2b765
Commits on Dec 2, 2024
Fix loss mask when answer_only_loss=True (#11444)
ashors1 authored, commit 8c921dc
Commits on Nov 28, 2024
Context Parallel SFT Support for dataset in THD format (#10688)
3 people authored, commit 48349e4
Commits on Aug 1, 2024
Support hf tokenizer in packed seq preparation script (#9974)
cuichenx authored, commit a7fbf6b
Commits on Jul 30, 2024
Adds a Knob for OnlineSampling by introducing 'global_sample_mapping' in the SFT config.yaml (#9913)
3 people authored, commit c29d91a
Commits on May 23, 2024
Refactor Sequence Packing Script (#9271)
cuichenx authored, commit a589828
Commits on Mar 16, 2024
Add script for packed sequence data preparation (#8682)
cuichenx and pre-commit-ci[bot] authored, commit 0fa7c50