A closed-loop control system for large language models that steers internal activation states in real time to prevent mode collapse and toxicity
reinforcement-learning pytorch control-theory ai-safety riser mechanistic-interpretability llm-steering activation-engineering
Updated Feb 1, 2026 - Python