API Documentation

TokenFlow is fully compatible with the OpenAI API format. Point an existing OpenAI client at the TokenFlow base URL and use a TokenFlow API key.

⚡ Quick Start

Get up and running in 30 seconds: install the OpenAI SDK, then set the base URL and API key when creating the client.

bash
# Install the OpenAI SDK
pip install openai

python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.tokenflow.ai/v1",
    api_key="tf-your-api-key"
)

response = client.chat.completions.create(
    model="deepseek-v3",  # or "auto" for smart routing
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in 3 sentences."}
    ],
    temperature=0.7,
    max_tokens=500
)

print(response.choices[0].message.content)

📦 Node.js

javascript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.tokenflow.ai/v1',
  apiKey: 'tf-your-api-key',
});

const response = await client.chat.completions.create({
  model: 'auto',
  messages: [{ role: 'user', content: 'Hello!' }],
});

console.log(response.choices[0].message.content);

🔗 cURL

bash
curl https://api.tokenflow.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer tf-your-api-key" \
  -d '{
    "model": "auto",
    "messages": [
      {"role": "user", "content": "What is model routing?"}
    ],
    "stream": true
  }'
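
The cURL request above sets "stream": true. In the OpenAI API format, a streamed response arrives as server-sent events: each event line starts with data: and carries a JSON chunk, and the stream ends with data: [DONE]. The sketch below uses only the Python standard library to show how such lines could be reassembled into text. It assumes TokenFlow follows that wire format exactly; the helper name iter_stream_text and the sample lines are made up for illustration.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style server-sent event lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Hypothetical sample of what a streamed reply might look like on the wire.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Model "}}]}',
    'data: {"choices": [{"delta": {"content": "routing"}}]}',
    "data: [DONE]",
]
streamed_text = "".join(iter_stream_text(sample))
print(streamed_text)
```

With the OpenAI SDK you would not normally parse these lines yourself: passing stream=True to client.chat.completions.create returns an iterator of chunk objects, and the text of each is in chunk.choices[0].delta.content.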

🧠 Smart Model Routing

Use model: "auto" to let TokenFlow automatically select the best model based on:

  • Complexity analysis — Simple tasks → cheap models, complex tasks → premium models
  • Cost optimization — Always picks the most cost-effective option
  • Latency targets — Routes to the fastest available endpoint
  • Fallback chains — Automatic retry with alternative models on failure

Or specify a model directly: deepseek-v3, kimi-k2, gpt-4o, etc.
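
TokenFlow performs routing and fallback on the server side, and this page does not describe how. To make the idea of a fallback chain concrete, here is a client-side sketch of the same pattern: try models in order and move on when one fails. The function name, the send callable, and the chain order are all hypothetical. This is an illustration of the pattern, not TokenFlow's actual routing logic.

```python
from typing import Callable, Sequence, Tuple


def complete_with_fallback(
    send: Callable[[str, str], str],
    prompt: str,
    chain: Sequence[str],
) -> Tuple[str, str]:
    """Try each model in `chain` in order; return (model, reply) from the first success."""
    errors = []
    for model in chain:
        try:
            return model, send(model, prompt)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


# Stand-in for a real API call, so the sketch runs offline.
def fake_send(model: str, prompt: str) -> str:
    if model == "deepseek-v3":
        raise TimeoutError("upstream timed out")
    return f"[{model}] reply to: {prompt}"


used_model, reply = complete_with_fallback(
    fake_send, "Hello!", ["deepseek-v3", "kimi-k2", "gpt-4o"]
)
print(used_model)  # kimi-k2
```

When you use model "auto", none of this is needed on the client; the sketch only applies if you pin specific models and want your own fallback order.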

📋 Supported Models

Model ID        Provider     Input ($/1M tokens)   Output ($/1M tokens)
deepseek-v3     DeepSeek     $0.14                 $0.28
deepseek-r1     DeepSeek     $0.55                 $2.19
kimi-k2         Moonshot     $0.20                 $0.60
glm-4-plus      Zhipu        $0.15                 $0.50
qwen-max        Alibaba      $0.16                 $0.64
gpt-4o          OpenAI       $2.50                 $10.00
claude-sonnet   Anthropic    $3.00                 $15.00
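
As a worked example of the pricing above, the following sketch estimates the cost of a single request from its token counts. It assumes billing is strictly linear in tokens, with no minimum charge or discounts, which this page does not confirm. The PRICES dictionary and the estimate_cost helper are illustrative names, not part of any TokenFlow SDK.

```python
# Prices in USD per 1M tokens, copied from the table above: (input, output).
PRICES = {
    "deepseek-v3": (0.14, 0.28),
    "deepseek-r1": (0.55, 2.19),
    "kimi-k2": (0.20, 0.60),
    "glm-4-plus": (0.15, 0.50),
    "qwen-max": (0.16, 0.64),
    "gpt-4o": (2.50, 10.00),
    "claude-sonnet": (3.00, 15.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from its token counts."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# 2,000 input tokens and 500 output tokens on gpt-4o:
# (2000 * 2.50 + 500 * 10.00) / 1,000,000 = 0.01 USD
cost = estimate_cost("gpt-4o", 2_000, 500)
print(f"${cost:.4f}")  # $0.0100
```

In the OpenAI response format, the token counts for a completed request are reported in response.usage.prompt_tokens and response.usage.completion_tokens, so the same function can be applied to real responses.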

⚠️ Error Handling

TokenFlow returns standard OpenAI-format errors:

json
{
  "error": {
    "message": "Rate limit exceeded. Try again in 1.5s",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded"
  }
}

Status   Meaning
401      Invalid API key
429      Rate limit exceeded
500      Upstream model error (auto-retried)
503      Service temporarily unavailable
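
One way a client might act on these errors is to retry only on 429 and 503, and wait for the time suggested in the error message when there is one. The sketch below is a minimal version of that idea. The "Try again in 1.5s" pattern is taken from the single example above and may not cover every message TokenFlow sends; the function name, the exponential fallback, and the 30-second cap are assumptions. Status 500 is left out because the table says it is already retried automatically.

```python
import re
from typing import Optional

RETRYABLE_STATUSES = {429, 503}
# Matches hints such as "Try again in 1.5s" (format assumed from the example above).
_HINT = re.compile(r"in (\d+(?:\.\d+)?)s")


def retry_delay(status: int, body: dict, attempt: int) -> Optional[float]:
    """Return seconds to wait before retrying, or None if a retry is pointless."""
    if status not in RETRYABLE_STATUSES:
        return None  # e.g. 401: retrying with the same key will not help
    message = body.get("error", {}).get("message", "")
    hint = _HINT.search(message)
    if hint:
        return float(hint.group(1))  # use the server's suggested wait
    return min(2.0 ** attempt, 30.0)  # otherwise back off exponentially, capped


error_body = {
    "error": {
        "message": "Rate limit exceeded. Try again in 1.5s",
        "type": "rate_limit_error",
        "code": "rate_limit_exceeded",
    }
}
print(retry_delay(429, error_body, attempt=0))  # 1.5
print(retry_delay(401, error_body, attempt=0))  # None
```

The OpenAI Python SDK also retries some failures on its own; the max_retries argument to OpenAI(...) controls how many times.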

🔧 Architecture: Powered by One-API

TokenFlow is built on top of the open-source One-API project, which provides:

  • Unified API gateway — All models accessed through one OpenAI-compatible endpoint
  • Load balancing — Automatic distribution across multiple upstream API keys
  • Token accounting — Per-user quota and billing at the token level
  • Rate limiting — Configurable per-user, per-model request throttling
  • Channel management — Hot-swap upstream providers without downtime

Self-hosting? Deploy One-API with Docker:

bash
docker run -d --restart always \
  --name one-api \
  -p 3000:3000 \
  -e TZ=Asia/Shanghai \
  -v /home/ubuntu/data/one-api:/data \
  justsong/one-api
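
Once the container is running, a client would presumably talk to it the same way as to the hosted service, only with a local base URL. The sketch below builds (but does not send) an OpenAI-format request using the Python standard library. The address http://localhost:3000/v1 is inferred from the -p 3000:3000 mapping above, the token is a placeholder you would replace with one created in your own One-API instance, and build_chat_request is an illustrative helper name.

```python
import json
import urllib.request


def build_chat_request(
    base_url: str, api_key: str, model: str, prompt: str
) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-format chat completion request."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Assumed local address, based on the -p 3000:3000 port mapping.
request = build_chat_request(
    "http://localhost:3000/v1", "your-one-api-token", "deepseek-v3", "Hello!"
)
print(request.full_url)  # http://localhost:3000/v1/chat/completions
# To actually send it: urllib.request.urlopen(request)
```

The same base URL can be passed as base_url to the OpenAI SDK client shown in the Quick Start.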