All AI modelsBrowse our comprehensive collection of AI models from Microsoft Foundry
Total Models: 11287
gpt-5.4-nano
gpt-5.4-nano

GPT‑5.4‑nano is a lightweight, ultra‑efficient model designed for low‑latency, cost‑effective tasks at massive scale.

chat-completion
responses
gpt-5.4-mini
gpt-5.4-mini

GPT‑5.4‑mini is a compact, cost‑efficient model designed for reliable performance across high‑volume, everyday AI workloads.

chat-completion
responses
gpt-5.4-pro
gpt-5.4-pro

GPT‑5.4-Pro is OpenAI’s most capable frontier model, built to deliver faster, more reliable results for complex professional work.

chat-completion
responses
gpt-5.4
gpt-5.4

GPT‑5.4 is OpenAI’s most capable frontier model, built to deliver faster, more reliable results for complex professional work.

chat-completion
responses
FW-MiniMax-M2.5
FW-MiniMax-M2.5

MiniMax M2.5 is a Mixture-of-Experts model built for state-of-the-art coding, agentic tool use, and search, trained with reinforcement learning across hundreds of thousands of real-world environments.

chat-completion
claude-opus-4-6
claude-opus-4-6

Claude Opus 4.6 is the latest version of Anthropic's most intelligent model, and the world's best model for coding, enterprise agents, and professional work. With a 1M token context window and 128K max output, Opus 4.6 is ideal for production code, sophist

messages
claude-sonnet-4-6
claude-sonnet-4-6

Claude Sonnet 4.6 delivers frontier intelligence at scale—built for coding, agents, and enterprise workflows. With a 1M token context window and 128K max output, Sonnet 4.6 is ideal for coding, agents, office tasks, financial analysis, cybersecurity, and c

messages
gpt-5.3-codex
gpt-5.3-codex

gpt-5.3-codex is designed for steerability, front end development, and interactivity.

responses
model-router
model-router

Model router is a deployable AI model that is trained to select the most suitable large language model (LLM) for a given prompt.

chat-completion
Kimi-K2.5
Kimi-K2.5

Kimi K2.5 is an open-source, native multimodal agentic model built through continual pretraining on approximately 15 trillion mixed visual and text tokens atop Kimi-K2-Base.

chat-completion
qwen-qwen3.5-9b
qwen-qwen3.5-9b

Qwen/Qwen3.59B powered by vLLM Original Model Card vLLM Documentation Chat Completions API Send Request You can use cURL or any REST Client to send a request to the Azure ML endpoint with your Azure ML token. ba

chat-completion
gpt-5.3-chat
gpt-5.3-chat

gpt-5.3-chat (preview) is an advanced, natural, multimodal, and context-aware conversations for enterprise applications.

chat-completion
responses
grok-4-1-fast-non-reasoning
grok-4-1-fast-non-reasoning

Grok 4.1 Fast Non‑Reasoning is designed for low‑latency, near‑instant responses, emphasizing speed, high‑quality outputs, and smooth tool‑calling in agentic workflows, making it well‑suited for high‑throughput, real‑time scenarios where immediate responses

chat-completion
grok-4-1-fast-reasoning
grok-4-1-fast-reasoning

Grok 4.1 Fast Reasoning is a frontier multimodal model built for high‑performance, agentic execution—combining strong reasoning, advanced tool calling, and agentic search to handle complex tasks with speed and precision. It delivers natural, fluid dialogue

chat-completion
FW-GLM-5
FW-GLM-5

GLM 5 is a 744B-parameter Mixture-of-Experts model targeting complex systems engineering and long-horizon agentic tasks, using DeepSeek Sparse Attention for efficient long-context processing.

chat-completion
FW-GPT-OSS-120B
FW-GPT-OSS-120B

gpt-oss-120b is an open-weight Mixture-of-Experts model from OpenAI with 117B total parameters, designed for powerful reasoning, agentic tasks, and production-grade general-purpose use cases.

chat-completion
gpt-audio-1.5
gpt-audio-1.5

A new S2S (speech to speech) model with improved instruction following.

audio-generation
gpt-realtime-1.5
gpt-realtime-1.5

A new S2S (speech to speech) model with improved instruction following.

audio-generation
FW-Kimi-K2.5
FW-Kimi-K2.5

Kimi K2.5 is an open-source, native multimodal agentic model built through continual pretraining on approximately 15 trillion mixed visual and text tokens atop Kimi-K2-Base.

chat-completion
DeepSeek-V3.2
DeepSeek-V3.2

DeepSeek-V3.2, a model that harmonizes high computational efficiency with superior reasoning and agent performance

chat-completion
qwen-qwen3.5-35b-a3b
qwen-qwen3.5-35b-a3b

Qwen/Qwen3.535BA3B powered by vLLM Original Model Card vLLM Documentation Chat Completions API Send Request You can use cURL or any REST Client to send a request to the Azure ML endpoint with your Azure ML tok

chat-completion
gpt-5.2-chat
gpt-5.2-chat

gpt-5.2-chat (preview) is an advanced, natural, multimodal, and context-aware conversations for enterprise applications.

chat-completion
responses
mistral-document-ai-2512
mistral-document-ai-2512

Document conversion to markdown with interleaved images and text

image-to-text
gpt-5.2-codex
gpt-5.2-codex

gpt-5.2-codex is designed for steerability, front end development, and interactivity.

responses
FW-DeepSeek-V3.2
FW-DeepSeek-V3.2

DeepSeek V3.2 is a 675.2B-parameter Mixture-of-Experts model that combines high computational efficiency with superior reasoning and agent performance, supporting a 163.8K token context window.

chat-completion
DeepSeek-V3.2-Speciale
DeepSeek-V3.2-Speciale

DeepSeek-V3.2 Speciale, a model that harmonizes high computational efficiency with superior reasoning and agent performance

chat-completion
gpt-5.2
gpt-5.2

GPT-5.2 is engineered for enterprise agent scenarios—delivering structured, auditable outputs, reliable tool use, and governed integrations.

chat-completion
responses
Cohere-rerank-v4.0-pro
Cohere-rerank-v4.0-pro

Rerank improves search systems by sorting documents based on their semantic similarity to a query

text-classification
Cohere-rerank-v4.0-fast
Cohere-rerank-v4.0-fast

Rerank improves search systems by sorting documents based on their semantic similarity to a query

text-classification
Kimi-K2-Thinking
Kimi-K2-Thinking

Kimi K2 Thinking is the latest, most capable version of open-source thinking model

chat-completion
gpt-5.1-codex-max
gpt-5.1-codex-max

gpt-5.1-codex-max is agentic coding model designed to streamline complex development workflows with advanced efficiency

responses
claude-opus-4-5
claude-opus-4-5

Claude Opus 4.5 is Anthropic’s most intelligent model, and an industry leader across coding, agents, computer use, and enterprise workflows. With a 200K token context window and 64K max output, Opus 4.5 is ideal for production code, sophisticated agents, o

messages
claude-sonnet-4-5
claude-sonnet-4-5

Claude Sonnet 4.5 is Anthropic's most capable model for complex agents and an industry leader for coding and computer use.

messages
gpt-5.1
gpt-5.1

gpt-5.1 is designed for logic-heavy and multi-step tasks.

chat-completion
responses
gpt-5.1-codex
gpt-5.1-codex

gpt-5.1-codex is designed for steerability, front end development, and interactivity.

responses
DeepSeek-V3.1
DeepSeek-V3.1

DeepSeek-V3.1 is a hybrid model that enhances tool usage, thinking efficiency, and supports both thinking and non-thinking modes via chat template switching

chat-completion
Mistral-Large-3
Mistral-Large-3

Mistral Large 3 is a state-of-the-art General-purpose Multimodal granular Mixture-of-Experts model with 39B active parameters, 673B total parameters featuring 128 experts per layer and Multi-Latent attention.

chat-completion
gpt-5-chat
gpt-5-chat

gpt-5-chat (preview) is an advanced, natural, multimodal, and context-aware conversations for enterprise applications.

chat-completion
responses
claude-haiku-4-5
claude-haiku-4-5

Claude Haiku 4.5 delivers near-frontier performance for a wide range of use cases, and stands out as one of the best coding and agent models – with the right speed and cost to power free products and scaled sub-agents.

messages
claude-opus-4-1
claude-opus-4-1

Claude Opus 4.1 is an industry leader for coding. It delivers sustained performance on long-running tasks that require focused effort and thousands of steps, significantly expanding what AI agents can solve.

messages
grok-4
grok-4

Grok 4 is the latest reasoning model from xAI with advanced reasoning and tool-use capabilities, enabling it to achieve new state-of-the-art performance across challenging academic and industry benchmarks.

chat-completion
sora-2
sora-2

Sora 2 in Azure AI Foundry isn't just another video generation tool; it's a creative powerhouse, seamlessly integrated into a platform built for innovation, trust, and scale.

video-generation
embed-v-4-0
embed-v-4-0

Embed 4 transforms texts and images into numerical vectors

embeddings
summarization
gpt-5.1-chat
gpt-5.1-chat

gpt-5.1-chat (preview) is an advanced, natural, multimodal, and context-aware conversations for enterprise applications.

chat-completion
responses
gpt-5.1-codex-mini
gpt-5.1-codex-mini

gpt-5.1-codex-mini is designed for steerability, front end development, and interactivity.

responses
grok-4-fast-reasoning
grok-4-fast-reasoning

Grok 4 Fast is an efficiency-focused large language model developed by xAI, pre-trained on general-purpose data and post-trained on task demonstrations and tool use, with built-in safety features including refusal behaviors, a fixed system prompt enforcing

chat-completion
gpt-5-pro
gpt-5-pro

gpt-5-pro uses more compute to think harder and provide consistently better answers.

chat-completion
responses
Llama-4-Maverick-17B-128E-Instruct-FP8
Llama-4-Maverick-17B-128E-Instruct-FP8

Llama 4 Maverick 17B 128E Instruct FP8 is great at precise image understanding and creative writing, offering high quality at a lower price compared to Llama 3.3 70B

chat-completion
gpt-5
gpt-5

gpt-5 is designed for logic-heavy and multi-step tasks.

chat-completion
responses
DeepSeek-V3-0324
DeepSeek-V3-0324

DeepSeek-V3-0324 demonstrates notable improvements over its predecessor, DeepSeek-V3, in several key aspects, including enhanced reasoning, improved function calling, and superior code generation capabilities.

chat-completion
gpt-4.1
gpt-4.1

gpt-4.1 outperforms gpt-4o across the board, with major gains in coding, instruction following, and long-context understanding

chat-completion
responses
1