DeepSeek V3.2 — frontier open-source general-purpose. Faster + cheaper than V3.1; reasoning lives inside the span. Pick for single-shot tasks.
Thirty-eight models. One key.
Reasoning, chat, image, video, transcription, embeddings — all behind one OpenAI-compatible endpoint. One key, one invoice, one data-processing agreement.
- Azure OpenAI proprietary OpenAI passthrough
- Azure MaaS open + frontier models on Azure AI Foundry
- Sovereign fleet self-hosted open weights on our own GPU servers, residency-on by default
Reasoning & chat
DeepSeek V3.1 — the V3 family variant that ships native function-calling. Pick this for agent loops keyed on OpenAI tool_calls.
DeepSeek R1 — o1-class reasoning with transparent chain-of-thought.
Meta Llama 4 Maverick — first Llama 4 generation, 1M context, 12 languages, vision+text.
Meta Llama 3.3 70B Instruct — strong general-purpose with tool use.
Meta Llama 3.1 8B Instruct — fast, capable, cheap.
Mistral Large 3 — flagship European model, strong instruction following.
Mistral Medium 2505 — mid-tier between Nemo and Large 3, image+text.
Microsoft GPT-OSS 120B — open-weight reasoning model.
Cohere Command-A — multilingual (10 languages), strong RAG and tool-calling.
xAI Grok 4.1 Fast — fast Grok variant with reasoning + tool use.
Microsoft Phi-4 Mini — small, cheap reasoning model with thinking traces.
Moonshot AI Kimi K2.6 — latest frontier reasoning with thinking traces.
Moonshot AI Kimi K2.5 — strong reasoning model with thinking traces.
OpenAI GPT-5.5 — flagship, 1.05M context, image+text reasoning.
OpenAI GPT-5.4 — production workhorse, 1.05M context, strong agentic + long-doc.
OpenAI GPT-5.4 Mini — drop-in mid-tier, 400K context, reasoning.
OpenAI GPT-5.2 — improved reasoning and speed.
OpenAI GPT-5 — flagship proprietary model (first GPT-5 release).
Qwen 3 8B — fast multilingual, self-hosted on our own GPU fleet.
Qwen 2.5 Coder 7B — code-specialised self-hosted.
Mistral Nemo 12B — strong reasoning, self-hosted on our own GPU fleet.
Google Gemma 3 27B — strong vision + text, self-hosted on our own GPU fleet.
Zhipu AI GLM-4 9B — bilingual EN/ZH chat, self-hosted.
Image
OpenAI GPT-Image-2 — flagship image generation, photoreal + editorial. Strong typography rendering.
OpenAI GPT-Image-1 Mini — cheap image generation, good for thumbnails and batch.
Black Forest Labs FLUX.2 Pro — open frontier image gen, strong typography + composition.
Video
OpenAI Sora 2 — text-to-video. Async: POST /v1/videos → poll → fetch MP4. Sizes 720×1280, 1280×720, 1024×1792, 1792×1024.
Realtime voice
OpenAI Realtime API-compatible speech-to-speech over WebSocket at /v1/realtime — ~250 ms to first audio. Pair it with any chat model via ?model=. Server-side VAD with barge-in, automatic language detection across ten languages, transcript events both directions. ≈$0.0133/min typical incl. chat tokens. EU-resident on our own fleet. Realtime docs.
Premium voice tier: same /v1/realtime WebSocket, served by Azure Voice Live in Sweden Central — 600+ studio-grade HD neural voices, deep noise suppression, echo cancellation, semantic turn detection. Accepts G.711 (μ-law/A-law) for telephony. EU-resident, exact transcripts, ~1.2 s to first audio. ≈$0.03/min typical. Realtime docs.
Native speech-to-speech tier: OpenAI gpt-realtime-2 behind the same WebSocket — best-in-class prosody, hears tone rather than just words. ~1.0 s to first audio; accepts G.711 for telephony. Global inference routing (not EU-pinned); transcripts are model approximations. ≈$0.07/min typical. Realtime docs.
Piper neural text-to-speech in ten languages (EN, DE, FR, ES, NL, SV, DA, IT, FI, RU). Streams PCM16; the voice behind /v1/realtime. Self-hosted, EU-resident.
Audio & transcription
OpenAI Whisper Large V3 Turbo — V3 distilled to ~6× faster decoding with near-identical WER. Self-hosted on our own GPU fleet, EU-resident by default.
Real-time streaming ASR over WebSocket. Deepgram-shape protocol at /v1/listen. Same Turbo model, VAD-segmented, partial + final transcripts.
NVIDIA NeMo Parakeet-TDT-0.6B v2 — English-only, SOTA Open-ASR-Leaderboard WER, VAD-segmented streaming. Self-hosted.
OpenAI GPT-4o Transcribe — higher-accuracy ASR with semantic context.
OpenAI GPT-4o Transcribe with speaker diarization — multi-speaker meetings.
Embeddings
High-quality multilingual text embeddings, self-hosted on our own GPU fleet.
Five dollars of free credit. No card, no call. Get a key