
Improve AMD accelerator example #3901

Merged
peterschmidt85 merged 10 commits into `master` from `improve-amd-accelerator-example` on May 26, 2026

@peterschmidt85

Contributor

Summary

  • Rework the AMD accelerator example around fleets, inference, training, dev environments, Docker image, and metrics
  • Use MI300X configurations that request at least four GPUs (`MI300X:4..`) in the inference and training examples
  • Remove stale AMD links from the TRL and Axolotl training examples
@peterschmidt85 requested a review from @Bihan on May 23, 2026 19:36

```yaml
resources:
  gpu: MI300X:4..
  disk: 100GB..
```

@Bihan commented on May 26, 2026

Collaborator


I recommend adding

    volumes:
      - /checkpoints:/checkpoints

and setting `--output_dir /checkpoints`, so that checkpoints are written to the mounted directory.
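A dstack task fragment combining the reviewer's suggestion with the resources block from the example might look roughly like the sketch below. It is not the configuration merged in this PR: `type: task` is assumed, and `<training command>` is a hypothetical placeholder for whatever script the training example actually runs, assumed to accept `--output_dir`. The `volumes` entry uses dstack's short instance-volume syntax (`host_path:container_path`), which mounts a directory from the host instance into the container.

```yaml
# Sketch only: assumes a dstack task; the command below is a hypothetical placeholder
type: task

volumes:
  # Instance volume: host directory /checkpoints mounted at /checkpoints in the container
  - /checkpoints:/checkpoints

commands:
  # Placeholder: replace with the example's real training command
  - <training command> --output_dir /checkpoints

resources:
  # At least four MI300X GPUs and at least 100GB of disk
  gpu: MI300X:4..
  disk: 100GB..
```

Because the data lives on the host instance's disk rather than only inside the container, checkpoints should remain available after the run finishes, as long as later runs land on the same instance.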

@peterschmidt85 merged commit 39d3453 into `master` on May 26, 2026
25 checks passed
@peterschmidt85 deleted the `improve-amd-accelerator-example` branch on May 26, 2026 11:46

Labels

None yet

2 participants