Profine automatically profiles and optimizes PyTorch training jobs on real GPUs, delivering measurable speedups and lower GPU costs before teams waste days tuning configs by hand.
Updated May 20, 2026 · Python
NAV extracts and analyzes GPU performance traces from NVIDIA Nsight™ Systems (NSYS), enabling comparative analysis and visualization for efficient performance profiling and regression testing.
A collection of examples and links that use different profiling tools to show memory usage and timings.
Automated GPU profiling analysis for Adreno: turns Snapdragon Profiler captures into actionable insights with an LLM.
Unified benchmarking and profiling framework for the JAX scientific ML ecosystem. Timing, GPU/energy monitoring, FLOPS counting, roofline analysis, statistical testing, regression detection, and CI integration.
Profiling and Triton-based KV-cache optimization for protein language model inference on consumer GPUs.
"The GPU Watchers swore upon their shared memory hierarchy, from L1 to global memory, which also served as their mandate as lords of parallel computation."
Kernel-only profiling workflow for CUDA and Triton kernels with Nsight Compute, standardized reports, visual analysis, and vendor-portable adapters.
Low-overhead GPU profiler for AI workloads — correlates CPU call stacks with GPU kernels using eBPF and CUDA