Kataleptic
§ Models api.kataleptic.com/v1

Thirty-eight models. One key.

Reasoning, chat, image, video, transcription, embeddings — all behind one OpenAI-compatible endpoint. One key, one invoice, one data-processing agreement.

Reasoning & chat

deepseek-v3-2
MaaS

DeepSeek V3.2 — frontier open-source general-purpose. Faster + cheaper than V3.1; reasoning lives inside the span. Pick for single-shot tasks.

deepseek-v3-1
MaaS

DeepSeek V3.1 — the V3 family variant that ships native function-calling. Pick this for agent loops keyed on OpenAI tool_calls.

deepseek-r1
MaaS

DeepSeek R1 — o1-class reasoning with transparent chain-of-thought.

llama-4-maverick
MaaS

Meta Llama 4 Maverick — first Llama 4 generation, 1M context, 12 languages, vision+text.

llama-3.3-70b
MaaS

Meta Llama 3.3 70B Instruct — strong general-purpose with tool use.

llama-3.1-8b
MaaS

Meta Llama 3.1 8B Instruct — fast, capable, cheap.

mistral-large-3
MaaS

Mistral Large 3 — flagship European model, strong instruction following.

mistral-medium
MaaS

Mistral Medium 2505 — mid-tier between Nemo and Large 3, image+text.

gpt-oss-120b
MaaS

Microsoft GPT-OSS 120B — open-weight reasoning model.

cohere-command-a
MaaS

Cohere Command-A — multilingual (10 languages), strong RAG and tool-calling.

grok-4-1-fast
MaaS

xAI Grok 4.1 Fast — fast Grok variant with reasoning + tool use.

phi-4-mini-reasoning
MaaS

Microsoft Phi-4 Mini — small, cheap reasoning model with thinking traces.

kimi-k2.6
MaaS

Moonshot AI Kimi K2.6 — latest frontier reasoning with thinking traces.

kimi-k2.5
MaaS

Moonshot AI Kimi K2.5 — strong reasoning model with thinking traces.

gpt-5.5
OpenAI

OpenAI GPT-5.5 — flagship, 1.05M context, image+text reasoning.

gpt-5.4
OpenAI

OpenAI GPT-5.4 — production workhorse, 1.05M context, strong agentic + long-doc.

gpt-5.4-mini
OpenAI

OpenAI GPT-5.4 Mini — drop-in mid-tier, 400K context, reasoning.

gpt-5.2
OpenAI

OpenAI GPT-5.2 — improved reasoning and speed.

gpt-5
OpenAI

OpenAI GPT-5 — flagship proprietary model (first GPT-5 release).

qwen3-8b
Sovereign

Qwen 3 8B — fast multilingual, self-hosted on our own GPU fleet.

qwen2.5-coder-7b
Sovereign

Qwen 2.5 Coder 7B — code-specialised self-hosted.

mistral-nemo-12b
Sovereign

Mistral Nemo 12B — strong reasoning, self-hosted on our own GPU fleet.

gemma3-27b
Sovereign

Google Gemma 3 27B — strong vision + text, self-hosted on our own GPU fleet.

glm4-9b
Sovereign

Zhipu AI GLM-4 9B — bilingual EN/ZH chat, self-hosted.

Image

gpt-image-2
OpenAI

OpenAI GPT-Image-2 — flagship image generation, photoreal + editorial. Strong typography rendering.

gpt-image-1-mini
OpenAI

OpenAI GPT-Image-1 Mini — cheap image generation, good for thumbnails and batch.

flux-2-pro
MaaS

Black Forest Labs FLUX.2 Pro — open frontier image gen, strong typography + composition.

Video

sora-2
OpenAI

OpenAI Sora 2 — text-to-video. Async: POST /v1/videos → poll → fetch MP4. Sizes 720×1280, 1280×720, 1024×1792, 1792×1024.

Realtime voice

kataleptic-realtime
Sovereign

OpenAI Realtime API-compatible speech-to-speech over WebSocket at /v1/realtime — ~250 ms to first audio. Pair it with any chat model via ?model=. Server-side VAD with barge-in, automatic language detection across ten languages, transcript events both directions. ≈$0.0133/min typical incl. chat tokens. EU-resident on our own fleet. Realtime docs.

kataleptic-realtime-hd
EU

Premium voice tier: same /v1/realtime WebSocket, served by Azure Voice Live in Sweden Central — 600+ studio-grade HD neural voices, deep noise suppression, echo cancellation, semantic turn detection. Accepts G.711 (μ-law/A-law) for telephony. EU-resident, exact transcripts, ~1.2 s to first audio. ≈$0.03/min typical. Realtime docs.

gpt-realtime-2
OpenAI

Native speech-to-speech tier: OpenAI gpt-realtime-2 behind the same WebSocket — best-in-class prosody, hears tone rather than just words. ~1.0 s to first audio; accepts G.711 for telephony. Global inference routing (not EU-pinned); transcripts are model approximations. ≈$0.07/min typical. Realtime docs.

piper-tts
Sovereign

Piper neural text-to-speech in ten languages (EN, DE, FR, ES, NL, SV, DA, IT, FI, RU). Streams PCM16; the voice behind /v1/realtime. Self-hosted, EU-resident.

Audio & transcription

whisper-large-v3-turbo
Sovereign

OpenAI Whisper Large V3 Turbo — V3 distilled to ~6× faster decoding with near-identical WER. Self-hosted on our own GPU fleet, EU-resident by default.

whisper-large-v3-turbo-stream
Sovereign

Real-time streaming ASR over WebSocket. Deepgram-shape protocol at /v1/listen. Same Turbo model, VAD-segmented, partial + final transcripts.

parakeet-tdt-0-6b-stream
Sovereign

NVIDIA NeMo Parakeet-TDT-0.6B v2 — English-only, SOTA Open-ASR-Leaderboard WER, VAD-segmented streaming. Self-hosted.

gpt-4o-transcribe
OpenAI

OpenAI GPT-4o Transcribe — higher-accuracy ASR with semantic context.

gpt-4o-transcribe-diarize
OpenAI

OpenAI GPT-4o Transcribe with speaker diarization — multi-speaker meetings.

Embeddings

nomic-embed
Sovereign

High-quality multilingual text embeddings, self-hosted on our own GPU fleet.

Five dollars of free credit. No card, no call. Get a key