Streaming Responses

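Use Server-Sent Events (SSE) to stream output so that users see content while the model is still generating it, which greatly improves the interactive experience.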

How It Works

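Streaming responses are delivered with the Content-Type text/event-stream. When a request sets stream: true, the API returns tokens incrementally as they are generated, instead of waiting for the whole completion and returning it in one piece.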

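Feature	Non-streaming	Streaming
Time to first token (TTFT)	Equal to the full response time	Typically < 1 s
User experience	Wait for the full generation	See content in real time
Bandwidth usage	Single transfer	Continuous transfer
Token usage	Same	Same (billing is unchanged)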
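The TTFT row in the comparison above can be checked empirically. The following is a minimal sketch, not part of the API: `measure_ttft` and `fake_stream` are hypothetical helper names, and `fake_stream` merely simulates a model stream with fixed delays. In real use you would pass in the text fragments taken from a streaming response.

```python
import time
from typing import Iterable, Iterator, Optional, Tuple


def measure_ttft(chunks: Iterable[str]) -> Tuple[Optional[float], float, str]:
    """Consume text fragments; time the first non-empty one and the whole stream."""
    start = time.perf_counter()
    ttft: Optional[float] = None
    parts = []
    for text in chunks:
        # The first non-empty fragment marks the time to first token
        if text and ttft is None:
            ttft = time.perf_counter() - start
        parts.append(text)
    total = time.perf_counter() - start
    return ttft, total, "".join(parts)


def fake_stream() -> Iterator[str]:
    """Hypothetical stand-in for a model stream: three fragments, 50 ms apart."""
    for piece in ["Autumn ", "wind ", "rises"]:
        time.sleep(0.05)
        yield piece


if __name__ == "__main__":
    ttft, total, text = measure_ttft(fake_stream())
    print(f"TTFT: {ttft:.3f}s, total: {total:.3f}s, text: {text!r}")
```

With a non-streaming request the two timings coincide, because nothing is visible until the whole response has arrived. With streaming, TTFT should be noticeably smaller than the total time.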

Streaming with the OpenAI Protocol

Add the "stream": true parameter to the request to enable streaming output.

cURL (Bash)

curl https://www.tokpath.com/v1/chat/completions \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5.4-mini",
    "messages": [
      {"role": "user", "content": "Write a short poem about autumn"}
    ],
    "stream": true
  }'

Python

from openai import OpenAI

# Point the OpenAI SDK at the OpenAI-compatible endpoint
client = OpenAI(
    base_url="https://www.tokpath.com/v1",
    api_key="<YOUR_API_KEY>"
)

stream = client.chat.completions.create(
    model="openai/gpt-5.4-mini",
    messages=[{"role": "user", "content": "Write a short poem about autumn"}],
    stream=True
)

for chunk in stream:
    # Skip chunks whose choices list is empty (for example, a usage-only chunk)
    if chunk.choices and chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)

TypeScript

import OpenAI from 'openai'

const client = new OpenAI({
  baseURL: 'https://www.tokpath.com/v1',
  apiKey: '<YOUR_API_KEY>'
})

async function main() {
  const stream = await client.chat.completions.create({
    model: 'openai/gpt-5.4-mini',
    messages: [{ role: 'user', content: 'Write a short poem about autumn' }],
    stream: true
  })

  // Write each content fragment as soon as it arrives
  for await (const chunk of stream) {
    process.stdout.write(chunk.choices[0]?.delta?.content || '')
  }
}

main()

Streaming with the Anthropic Protocol

The Anthropic SDK provides a more concise .stream() method.

Python (Anthropic SDK)

import anthropic

client = anthropic.Anthropic(
    base_url="https://www.tokpath.com/anthropic",
    api_key="<YOUR_API_KEY>"
)

# The context manager closes the connection when the block exits
with client.messages.stream(
    model="anthropic/claude-sonnet-4.6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a short poem about autumn"}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

Streaming with the Gemini Protocol

The Gemini SDK uses the generate_content_stream method.

Python (Google GenAI SDK)

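from google import genai

client = genai.Client(
    # A full URL with scheme is expected; https:// is assumed here to match the other endpoints
    http_options={"base_url": "https://www.tokpath.com"},
    api_key="<YOUR_API_KEY>"
)

response = client.models.generate_content_stream(
    model="google/gemini-3.1-flash",
    contents="Write a short poem about autumn"
)

for chunk in response:
    # chunk.text can be None for chunks without text parts
    if chunk.text:
        print(chunk.text, end="", flush=True)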

Streaming Event Format

Example SSE events for the OpenAI-compatible protocol:

SSE event stream

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Autumn"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" wind"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
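If you consume the raw event stream without an SDK, the format above can be parsed line by line. The sketch below is one possible approach, not the official client logic: `iter_sse_content` and `SAMPLE` are hypothetical names, and the sample events are trimmed versions of the kind shown above. It assumes each event fits on a single `data:` line, as in the example.

```python
import json
from typing import Iterable, Iterator


def iter_sse_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield delta.content fragments from OpenAI-style SSE lines until [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel, not JSON
        event = json.loads(payload)
        for choice in event.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                yield content


# Hypothetical sample input mirroring the event sequence shown above
SAMPLE = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":", world"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]

if __name__ == "__main__":
    print("".join(iter_sse_content(SAMPLE)))
```

The first chunk carries only the role and an empty content string, and the last one before [DONE] carries only finish_reason, so both are skipped by the `if content` check.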

Best Practices

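UI rendering: append streamed content to the DOM instead of re-rendering the whole component
Error handling: watch for stream interruptions and degrade gracefully when the connection drops
Usage statistics: the last chunk contains the complete usage information (token counts)
Function calling: in streaming mode, function-call arguments arrive in fragments that you must concatenate yourself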
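The function-calling point can be illustrated with a small sketch. It assumes the deltas follow the OpenAI Chat Completions streaming shape, where each `tool_calls` fragment carries an `index`, the first fragment carries the `id` and function `name`, and later fragments carry pieces of the JSON `arguments` string. `assemble_tool_calls`, `SAMPLE_DELTAS`, `get_weather`, and `call_1` are hypothetical names used only for illustration.

```python
import json
from typing import Any, Dict, Iterable, List


def assemble_tool_calls(deltas: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Merge streamed tool_call fragments into complete calls, keyed by index."""
    calls: Dict[int, Dict[str, str]] = {}
    for delta in deltas:
        for frag in delta.get("tool_calls") or []:
            call = calls.setdefault(frag["index"], {"id": "", "name": "", "arguments": ""})
            if frag.get("id"):
                call["id"] = frag["id"]
            fn = frag.get("function") or {}
            if fn.get("name"):
                call["name"] = fn["name"]
            # Argument text arrives in pieces; concatenate before parsing
            call["arguments"] += fn.get("arguments") or ""
    return [
        {"id": c["id"], "name": c["name"], "arguments": json.loads(c["arguments"] or "{}")}
        for _, c in sorted(calls.items())
    ]


# Hypothetical deltas: one tool call whose JSON arguments are split mid-string
SAMPLE_DELTAS = [
    {"tool_calls": [{"index": 0, "id": "call_1", "type": "function",
                     "function": {"name": "get_weather", "arguments": ""}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": '{"city": "Ha'}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": 'ngzhou"}'}}]},
    {},  # a delta without tool_calls, such as the final finish_reason chunk
]

if __name__ == "__main__":
    print(assemble_tool_calls(SAMPLE_DELTAS))
```

Parsing is deferred until the stream ends because an individual fragment is generally not valid JSON on its own.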
