§ Models api.kataleptic.com/v1

Forty-eight models. One key.

Reasoning, chat, image, video, transcription, embeddings — all behind one OpenAI-compatible endpoint. One key, one invoice, one data-processing agreement.

Azure OpenAI proprietary OpenAI passthrough
Azure MaaS open + frontier models on Azure AI Foundry
Sovereign fleet self-hosted open weights on our own GPU servers, residency-on by default

Reasoning & chat

deepseek-v4-pro

MaaS

DeepSeek V4 Pro — flagship of the V4 generation, frontier open-source reasoning, 1M context. Cached input $0.145/MTok — one twelfth of fresh.

deepseek-v4-flash

MaaS

DeepSeek V4 Flash — the cheap, fast V4 variant for high-throughput work, 1M context. Cached input $0.028/MTok — one seventh of fresh.

deepseek-v3-2-speciale

MaaS

DeepSeek V3.2 Speciale — reasoning-tuned V3.2, stronger on maths and competition-style problems.

deepseek-v3-2

MaaS

DeepSeek V3.2 — frontier open-source general-purpose. Faster + cheaper than V3.1; reasoning lives inside the span. Pick for single-shot tasks.

deepseek-v3-1

MaaS

DeepSeek V3.1 — the V3 family variant that ships native function-calling. Pick this for agent loops keyed on OpenAI tool_calls.

deepseek-r1

MaaS

DeepSeek R1 — o1-class reasoning with transparent chain-of-thought.

llama-4-maverick

MaaS

Meta Llama 4 Maverick — first Llama 4 generation, 1M context, 12 languages, vision+text.

llama-3.3-70b

MaaS

Meta Llama 3.3 70B Instruct — strong general-purpose with tool use.

mistral-medium-3.5

MaaS

Mistral Medium 3.5 — European mid-tier: image and PDF input, 18 languages, strong instruction following.

mistral-large-3

MaaS

Mistral Large 3 — flagship European model, strong instruction following.

gpt-oss-120b

MaaS

Microsoft GPT-OSS 120B — open-weight reasoning model.

cohere-command-a-plus

MaaS

Cohere Command-A Plus — upgraded Command-A: multilingual, agentic RAG and tool-calling, 128K context, image input.

cohere-command-a

MaaS

Cohere Command-A — multilingual (10 languages), strong RAG and tool-calling.

grok-4.3

MaaS

xAI Grok 4.3 — latest frontier Grok: reasoning, tool use, image input, 200K context. Cached input $0.20/MTok — one sixth of fresh.

grok-4-1-fast

MaaS

xAI Grok 4.1 Fast — fast Grok variant with reasoning + tool use.

phi-4-mini-reasoning

MaaS

Microsoft Phi-4 Mini — small, cheap reasoning model with thinking traces.

kimi-k2.7-code

MaaS

Moonshot AI Kimi K2.7 Code — coding-specialised K2 with thinking traces, 262K context, image input. Cached input $0.19/MTok — one fifth of fresh.

kimi-k2.6

MaaS

Moonshot AI Kimi K2.6 — latest frontier reasoning with thinking traces.

kimi-k2.5

MaaS

Moonshot AI Kimi K2.5 — strong reasoning model with thinking traces.

gpt-5.6-sol

OpenAI

OpenAI GPT-5.6 Sol — flagship coding + agentic workhorse, 1.05M context. Sol Ultra mode via reasoning_effort=high. Cached input $0.50/MTok — one tenth of fresh.

gpt-5.6-terra

OpenAI

OpenAI GPT-5.6 Terra — balanced tier, 1.05M context, everyday coding + reasoning at half Sol's price. Cached input $0.25/MTok — one tenth of fresh.

gpt-5.6-luna

OpenAI

OpenAI GPT-5.6 Luna — fastest and cheapest GPT-5.6 variant, 400K context, high-throughput chat. Cached input $0.10/MTok — one tenth of fresh.

gpt-5.5

OpenAI

OpenAI GPT-5.5 — flagship, 1.05M context, image+text reasoning. Cached input $0.25/MTok — one tenth of fresh.

gpt-5.4

OpenAI

OpenAI GPT-5.4 — production workhorse, 1.05M context, strong agentic + long-doc. Cached input $0.25/MTok — one tenth of fresh.

gpt-5.4-mini

OpenAI

OpenAI GPT-5.4 Mini — drop-in mid-tier, 400K context, reasoning. Cached input $0.025/MTok — one tenth of fresh.

gpt-5.2

OpenAI

OpenAI GPT-5.2 — improved reasoning and speed. Cached input $0.25/MTok — one tenth of fresh.

gpt-5

OpenAI

OpenAI GPT-5 — flagship proprietary model (first GPT-5 release). Cached input $0.25/MTok — one tenth of fresh.

qwen3-8b

Sovereign

Qwen 3 8B — fast multilingual, self-hosted on our own GPU fleet.

qwen2.5-coder-7b

Sovereign

Qwen 2.5 Coder 7B — code-specialised self-hosted.

mistral-nemo-12b

Sovereign

Mistral Nemo 12B — strong reasoning, self-hosted on our own GPU fleet.

gemma3-27b

Sovereign

Google Gemma 3 27B — strong vision + text, self-hosted on our own GPU fleet.

glm4-9b

Sovereign

Zhipu AI GLM-4 9B — bilingual EN/ZH chat, self-hosted.

Image

gpt-image-2

OpenAI

OpenAI GPT-Image-2 — flagship image generation, photoreal + editorial. Strong typography rendering.

gpt-image-1-mini

OpenAI

OpenAI GPT-Image-1 Mini — cheap image generation, good for thumbnails and batch.

flux-2-pro

MaaS

Black Forest Labs FLUX.2 Pro — open frontier image gen, strong typography + composition.

Video

sora-2

OpenAI

OpenAI Sora 2 — text-to-video. Async: POST /v1/videos → poll → fetch MP4. Sizes 720×1280, 1280×720, 1024×1792, 1792×1024.

Realtime voice

kataleptic-realtime

Sovereign

OpenAI Realtime API-compatible speech-to-speech over WebSocket at /v1/realtime — ~250 ms to first audio. Pair it with any chat model via ?model=. Server-side VAD with barge-in, automatic language detection across ten languages, transcript events both directions. ≈$0.0133/min typical incl. chat tokens. EU-resident on our own fleet. Realtime docs.

kataleptic-realtime-hd

Premium voice tier: same /v1/realtime WebSocket, served by Azure Voice Live in Sweden Central — 600+ studio-grade HD neural voices, deep noise suppression, echo cancellation, semantic turn detection. Accepts G.711 (μ-law/A-law) for telephony. EU-resident, exact transcripts, ~1.2 s to first audio. ≈$0.03/min typical. Realtime docs.

gpt-realtime-2.1

OpenAI

Latest native speech-to-speech tier: OpenAI gpt-realtime-2.1 behind the same WebSocket — successor to gpt-realtime-2 with better instruction following and turn-taking at the same token price, plus image input. Accepts G.711 for telephony. Global inference routing (not EU-pinned); transcripts are model approximations. ≈$0.07/min typical for voice; image input bills separately at $6.25/MTok. Realtime docs.

gpt-realtime-2.1-mini

OpenAI

Volume tier for native speech-to-speech: roughly a third the audio-token cost of gpt-realtime-2.1, same protocol, voices and telephony codecs — sized for call handling. Global inference routing (not EU-pinned). ≈$0.02/min typical for voice; image input bills separately at $1.00/MTok. Realtime docs.

gpt-realtime-2

OpenAI

Native speech-to-speech tier: OpenAI gpt-realtime-2 behind the same WebSocket — best-in-class prosody, hears tone rather than just words. ~1.0 s to first audio; accepts G.711 for telephony. Global inference routing (not EU-pinned); transcripts are model approximations. ≈$0.07/min typical for voice; image input bills separately at $6.25/MTok. Realtime docs.

piper-tts

Sovereign

Piper neural text-to-speech in ten languages (EN, DE, FR, ES, NL, SV, DA, IT, FI, RU). Streams PCM16; the voice behind /v1/realtime. Self-hosted, EU-resident.

Audio & transcription

whisper-large-v3-turbo

Sovereign

OpenAI Whisper Large V3 Turbo — V3 distilled to ~6× faster decoding with near-identical WER. Self-hosted on our own GPU fleet, EU-resident by default.

whisper-large-v3-turbo-stream

Sovereign

Real-time streaming ASR over WebSocket. Deepgram-shape protocol at /v1/listen. Same Turbo model, VAD-segmented, partial + final transcripts.

parakeet-tdt-0-6b-stream

Sovereign

NVIDIA NeMo Parakeet-TDT-0.6B v2 — English-only, SOTA Open-ASR-Leaderboard WER, VAD-segmented streaming. Self-hosted.

gpt-4o-transcribe

OpenAI

OpenAI GPT-4o Transcribe — higher-accuracy ASR with semantic context.

gpt-4o-transcribe-diarize

OpenAI

OpenAI GPT-4o Transcribe with speaker diarization — multi-speaker meetings.

Embeddings

nomic-embed

Sovereign

High-quality multilingual text embeddings, self-hosted on our own GPU fleet.

Five dollars of free credit. No card, no call. Get a key