DeepSeek & Qwen through one OpenAI-compatible endpoint. Point your SDK at us and pay per token in USD.
Free $0.20 credit on signup · 1M context · billed in USD
from openai import OpenAI

client = OpenAI(
    base_url="https://wutonglu.com/v1",
    api_key="sk-tch-...",
)
resp = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Hi"}],
)
print(resp.choices[0].message.content)
A flat fee buys a generous token quota for the period. When it runs out, buy another plan or switch to pay-as-you-go.
💡 Quota counts billable tokens: cache-miss input and output count 1:1, while cached input counts at only 1/50 (it's ~50× cheaper). A typical day of 60M cached input tokens uses just ~1.2M of quota. Plans stack: buy another to add both quota and time.
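The quota rule above amounts to a weighted sum. Here is a minimal sketch of that arithmetic, assuming the 1/50 weight applies uniformly to every cache-hit input token. The helper name `quota_used` and its parameters are illustrative and not part of the API.

```python
# Weights as described above: cache-miss input and output count 1:1,
# cached (cache-hit) input counts at 1/50.
CACHED_INPUT_DIVISOR = 50


def quota_used(cache_hit_input: int, cache_miss_input: int = 0, output: int = 0) -> float:
    """Return the quota tokens consumed (hypothetical helper, not an API call)."""
    return cache_miss_input + output + cache_hit_input / CACHED_INPUT_DIVISOR


# A day of 60M input tokens served entirely from cache:
print(quota_used(cache_hit_input=60_000_000))  # 1200000.0
```

Any cache-miss input or output tokens in the same day would add to that figure at full weight.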
New users get extra credit on top-up.
Pay only for the tokens you use, in USD. The three-tier billing of DeepSeek and Qwen (cache-hit input, cache-miss input, output) is passed through, so cached input is dramatically cheaper. Models: /v1/models
| Model | Context | Input · cache hit | Input · cache miss | Output |
|---|---|---|---|---|
Prices per 1M tokens, USD. Still far below GPT-4o / Claude.
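Because prices are quoted per 1M tokens across three tiers, the cost of a request can be estimated from its token counts. The sketch below is illustrative only: `Pricing` and `estimate_cost_usd` are hypothetical helpers, and the prices in the usage line are placeholders rather than actual rates, so substitute the values from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD per 1M tokens for each billing tier (values supplied by the caller)."""
    cache_hit_input: float
    cache_miss_input: float
    output: float


def estimate_cost_usd(p: Pricing, hit_tokens: int, miss_tokens: int, output_tokens: int) -> float:
    """Sum each tier's token count times its per-1M price, then scale to USD."""
    total = (
        hit_tokens * p.cache_hit_input
        + miss_tokens * p.cache_miss_input
        + output_tokens * p.output
    )
    return total / 1_000_000


# Placeholder prices for illustration only; these are NOT real rates.
example = Pricing(cache_hit_input=0.01, cache_miss_input=0.10, output=0.20)
print(f"${estimate_cost_usd(example, 800_000, 200_000, 50_000):.4f}")
```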
Skip the +86 SIM, real-name KYC and CNY-only rails. Sign up with an email address and top up in USD or via PayPal.
Change two lines (base_url and api_key). Streaming, tool calls and JSON mode all work. 1M context, up to 384K output.
Low-latency Asia/Pacific routing with multi-key load-balancing and automatic failover.
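stream = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "Explain Bell's theorem."}],
    stream=True,
)
for chunk in stream:
    # Skip chunks that arrive with an empty choices list.
    if chunk.choices:
        # delta.content can be None, so fall back to an empty string.
        print(chunk.choices[0].delta.content or "", end="")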
✓ openai-python · openai-node · LangChain · LiteLLM · Vercel AI SDK
✓ chat/completions stream + non-stream
✓ Tool / function calling · response_format=json_object
✓ Base URL: https://wutonglu.com/v1
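To illustrate the tool-calling and JSON-mode items above, here is a sketch of the request arguments in the standard OpenAI chat-completions shape. The helper names, the `get_weather` tool and its schema are hypothetical; the model name is the one used in the quickstart. Nothing is sent over the network here.

```python
import json


def json_mode_kwargs(model: str, prompt: str) -> dict:
    """Build arguments for a JSON-mode request (hypothetical helper)."""
    return {
        "model": model,
        "messages": [
            # JSON mode generally expects the word "JSON" somewhere in the prompt.
            {"role": "system", "content": "Reply with a single JSON object."},
            {"role": "user", "content": prompt},
        ],
        "response_format": {"type": "json_object"},
    }


def tool_call_kwargs(model: str, prompt: str) -> dict:
    """Build arguments that offer the model one function tool (hypothetical helper)."""
    weather_tool = {
        "type": "function",
        "function": {
            "name": "get_weather",  # illustrative tool, not provided by the API
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [weather_tool],
    }


kwargs = json_mode_kwargs("deepseek-v4-flash", "List three primes.")
print(json.dumps(kwargs, indent=2))
```

With a client configured as in the quickstart, `client.chat.completions.create(**kwargs)` sends the request. For the tool variant, any calls the model requests appear in `resp.choices[0].message.tool_calls`.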