GPT‑5.4‑nano is a lightweight, ultra‑efficient model designed for low‑latency, cost‑effective tasks at massive scale.
GPT‑5.4‑mini is a compact, cost‑efficient model designed for reliable performance across high‑volume, everyday AI workloads.
GPT‑5.4-Pro is OpenAI’s most capable frontier model, built to deliver faster, more reliable results for complex professional work.
GPT‑5.4 is OpenAI’s most capable frontier model, built to deliver faster, more reliable results for complex professional work.
MiniMax M2.5 is a Mixture-of-Experts model built for state-of-the-art coding, agentic tool use, and search, trained with reinforcement learning across hundreds of thousands of real-world environments.
Claude Opus 4.6 is the latest version of Anthropic's most intelligent model, and the world's best model for coding, enterprise agents, and professional work. With a 1M token context window and 128K max output, Opus 4.6 is ideal for production code, sophist
Claude Sonnet 4.6 delivers frontier intelligence at scale—built for coding, agents, and enterprise workflows. With a 1M token context window and 128K max output, Sonnet 4.6 is ideal for coding, agents, office tasks, financial analysis, cybersecurity, and c
gpt-5.3-codex is designed for steerability, front end development, and interactivity.
Model router is a deployable AI model that is trained to select the most suitable large language model (LLM) for a given prompt.
Kimi K2.5 is an open-source, native multimodal agentic model built through continual pretraining on approximately 15 trillion mixed visual and text tokens atop Kimi-K2-Base.
Qwen/Qwen3.59B, powered by vLLM. You can use cURL or any REST client to send a request to the Azure ML endpoint with your Azure ML token.
gpt-5.3-chat (preview) is an advanced model for natural, multimodal, and context-aware conversations in enterprise applications.
Grok 4.1 Fast Non‑Reasoning is designed for low‑latency, near‑instant responses, emphasizing speed, high‑quality outputs, and smooth tool‑calling in agentic workflows, making it well‑suited for high‑throughput, real‑time scenarios where immediate responses
Grok 4.1 Fast Reasoning is a frontier multimodal model built for high‑performance, agentic execution—combining strong reasoning, advanced tool calling, and agentic search to handle complex tasks with speed and precision. It delivers natural, fluid dialogue
GLM 5 is a 744B-parameter Mixture-of-Experts model targeting complex systems engineering and long-horizon agentic tasks, using DeepSeek Sparse Attention for efficient long-context processing.
gpt-oss-120b is an open-weight Mixture-of-Experts model from OpenAI with 117B total parameters, designed for powerful reasoning, agentic tasks, and production-grade general-purpose use cases.
A new S2S (speech-to-speech) model with improved instruction following.
A new S2S (speech-to-speech) model with improved instruction following.
Kimi K2.5 is an open-source, native multimodal agentic model built through continual pretraining on approximately 15 trillion mixed visual and text tokens atop Kimi-K2-Base.
DeepSeek-V3.2 is a model that harmonizes high computational efficiency with superior reasoning and agent performance.
Qwen/Qwen3.535BA3B, powered by vLLM. You can use cURL or any REST client to send a request to the Azure ML endpoint with your Azure ML tok
gpt-5.2-chat (preview) is an advanced model for natural, multimodal, and context-aware conversations in enterprise applications.
Document conversion to Markdown with interleaved images and text.
gpt-5.2-codex is designed for steerability, front end development, and interactivity.
DeepSeek V3.2 is a 675.2B-parameter Mixture-of-Experts model that combines high computational efficiency with superior reasoning and agent performance, supporting a 163.8K token context window.
DeepSeek-V3.2 Speciale is a model that harmonizes high computational efficiency with superior reasoning and agent performance.
GPT-5.2 is engineered for enterprise agent scenarios—delivering structured, auditable outputs, reliable tool use, and governed integrations.
Rerank improves search systems by sorting documents based on their semantic similarity to a query
Rerank improves search systems by sorting documents based on their semantic similarity to a query
Kimi K2 Thinking is the latest, most capable version of the open-source Kimi thinking model.
gpt-5.1-codex-max is an agentic coding model designed to streamline complex development workflows with advanced efficiency.
Claude Opus 4.5 is Anthropic’s most intelligent model, and an industry leader across coding, agents, computer use, and enterprise workflows. With a 200K token context window and 64K max output, Opus 4.5 is ideal for production code, sophisticated agents, o
Claude Sonnet 4.5 is Anthropic's most capable model for complex agents and an industry leader for coding and computer use.
gpt-5.1 is designed for logic-heavy and multi-step tasks.
gpt-5.1-codex is designed for steerability, front end development, and interactivity.
DeepSeek-V3.1 is a hybrid model that improves tool usage and thinking efficiency, and supports both thinking and non-thinking modes via chat template switching.
Mistral Large 3 is a state-of-the-art, general-purpose, multimodal, granular Mixture-of-Experts model with 39B active parameters and 673B total parameters, featuring 128 experts per layer and Multi-Latent Attention.
gpt-5-chat (preview) is an advanced model for natural, multimodal, and context-aware conversations in enterprise applications.
Claude Haiku 4.5 delivers near-frontier performance for a wide range of use cases, and stands out as one of the best coding and agent models – with the right speed and cost to power free products and scaled sub-agents.
Claude Opus 4.1 is an industry leader for coding. It delivers sustained performance on long-running tasks that require focused effort and thousands of steps, significantly expanding what AI agents can solve.
Grok 4 is the latest reasoning model from xAI with advanced reasoning and tool-use capabilities, enabling it to achieve new state-of-the-art performance across challenging academic and industry benchmarks.
Sora 2 in Azure AI Foundry is a video generation model integrated into a platform built for innovation, trust, and scale.
Embed 4 transforms texts and images into numerical vectors
gpt-5.1-chat (preview) is an advanced model for natural, multimodal, and context-aware conversations in enterprise applications.
gpt-5.1-codex-mini is designed for steerability, front end development, and interactivity.
Grok 4 Fast is an efficiency-focused large language model developed by xAI, pre-trained on general-purpose data and post-trained on task demonstrations and tool use, with built-in safety features including refusal behaviors, a fixed system prompt enforcing
gpt-5-pro uses more compute to think harder and provide consistently better answers.
Llama 4 Maverick 17B 128E Instruct FP8 is great at precise image understanding and creative writing, offering high quality at a lower price compared to Llama 3.3 70B
gpt-5 is designed for logic-heavy and multi-step tasks.
DeepSeek-V3-0324 demonstrates notable improvements over its predecessor, DeepSeek-V3, in several key aspects, including enhanced reasoning, improved function calling, and superior code generation capabilities.
gpt-4.1 outperforms gpt-4o across the board, with major gains in coding, instruction following, and long-context understanding