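Streaming responses
Use Server-Sent Events (SSE) to stream output so that users see content while the model is still generating it, which greatly improves the interactive experience.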
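How it works
Streaming responses are delivered with the Content-Type text/event-stream. When a request sets stream: true, the API returns generated tokens incrementally instead of waiting for the whole completion and returning it in one piece.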
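| Feature | Non-streaming | Streaming |
|---|---|---|
| Time to first token (TTFT) | Equal to the full response time | Typically < 1 s |
| User experience | Wait for the entire generation | See content in real time |
| Bandwidth usage | Single transfer | Continuous transfer |
| Token usage | Same | Same (billing unchanged) |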
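
To make the mechanism above concrete, here is a minimal sketch of how a client might turn raw text/event-stream lines into text fragments. It assumes the OpenAI-compatible chunk shape shown in the event-format section of this page; the helper name iter_sse_deltas and the sample events are made up for illustration and are not part of any SDK.

```python
import json
from typing import Iterable, Iterator


def iter_sse_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text fragments carried by OpenAI-style SSE `data:` lines."""
    for raw in lines:
        line = raw.strip()
        # Skip blank separator lines and any non-data SSE fields.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        # The stream ends with a literal [DONE] sentinel, which is not JSON.
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Hand-written sample events (illustrative only, not captured from the API).
sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"Autumn"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":" wind"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]

print("".join(iter_sse_deltas(sample)))
```

In practice the SDKs shown below do this parsing for you; a hand-rolled parser like this is mainly useful when calling the HTTP endpoint directly.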
OpenAI protocol streaming
Add the "stream": true parameter to the request to enable streaming output.
cURL (Bash)
curl https://www.ulovegpt.com/v1/chat/completions \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5.4-mini",
    "messages": [
      {"role": "user", "content": "Write a short poem about autumn"}
    ],
    "stream": true
  }'

Python
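from openai import OpenAI

# Point the OpenAI SDK at the OpenAI-compatible endpoint.
client = OpenAI(
    base_url="https://www.ulovegpt.com/v1",
    api_key="<YOUR_API_KEY>",
)

# With stream=True, create() returns an iterator of chunks.
stream = client.chat.completions.create(
    model="openai/gpt-5.4-mini",
    messages=[{"role": "user", "content": "Write a short poem about autumn"}],
    stream=True,
)

for chunk in stream:
    # Some chunks (for example a usage-only chunk) may carry no choices.
    if chunk.choices and chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)

TypeScript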
import OpenAI from 'openai'

const client = new OpenAI({
  baseURL: 'https://www.ulovegpt.com/v1',
  apiKey: '<YOUR_API_KEY>'
})

async function main() {
  const stream = await client.chat.completions.create({
    model: 'openai/gpt-5.4-mini',
    messages: [{ role: 'user', content: 'Write a short poem about autumn' }],
    stream: true
  })
  // Write each text fragment as soon as it arrives.
  for await (const chunk of stream) {
    process.stdout.write(chunk.choices[0]?.delta?.content || '')
  }
}

main()

Anthropic protocol streaming
The Anthropic SDK provides a more concise .stream() helper.
Python (Anthropic SDK)
import anthropic

client = anthropic.Anthropic(
    base_url="https://www.ulovegpt.com/anthropic",
    api_key="<YOUR_API_KEY>",
)

with client.messages.stream(
    model="anthropic/claude-sonnet-4.6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a short poem about autumn"}],
) as stream:
    # text_stream yields only the text deltas.
    for text in stream.text_stream:
        print(text, end="", flush=True)

Gemini protocol streaming
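The Gemini SDK uses the generate_content_stream method.
Python (Google GenAI SDK)
from google import genai

client = genai.Client(
    # base_url is assumed to need a full URL, including the scheme.
    http_options={"base_url": "https://www.ulovegpt.com"},
    api_key="<YOUR_API_KEY>",
)

response = client.models.generate_content_stream(
    model="google/gemini-3.1-flash",
    contents="Write a short poem about autumn",
)

for chunk in response:
    # chunk.text can be None for chunks that carry no text parts.
    if chunk.text:
        print(chunk.text, end="", flush=True)

Streaming event format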
Example SSE events for the OpenAI-compatible protocol:
SSE event stream (Text)
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Autumn"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" wind"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]

Best practices
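- UI rendering: append streamed content to the DOM instead of re-rendering the whole component.
- Error handling: watch for stream interruptions and degrade gracefully when the connection drops.
- Usage statistics: the last chunk contains the complete usage information (token counts).
- Function calling: in streaming mode, function-call arguments arrive in fragments and must be concatenated by the client.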
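
The last two points can be sketched as a single fold over the parsed chunks. This is a minimal illustration, not the gateway's reference implementation: it assumes the delta.tool_calls fragment shape used by the OpenAI Chat Completions streaming format and that usage arrives on the final chunk, as stated above. The helper name collect_stream, the get_weather function, and the token numbers are hypothetical.

```python
import json
from typing import Any, Dict, Iterable, List, Optional, Tuple


def collect_stream(
    chunks: Iterable[Dict[str, Any]],
) -> Tuple[str, List[Dict[str, Any]], Optional[Dict[str, Any]]]:
    """Fold parsed chunk dicts into (text, tool_calls, usage)."""
    text_parts: List[str] = []
    calls: Dict[int, Dict[str, str]] = {}
    usage: Optional[Dict[str, Any]] = None
    for chunk in chunks:
        # Usage is expected on the final chunk; keep the latest one seen.
        if chunk.get("usage"):
            usage = chunk["usage"]
        for choice in chunk.get("choices") or []:
            delta = choice.get("delta") or {}
            if delta.get("content"):
                text_parts.append(delta["content"])
            for frag in delta.get("tool_calls") or []:
                # Fragments of one call share an index; id and name arrive once.
                call = calls.setdefault(
                    frag.get("index", 0), {"id": "", "name": "", "arguments": ""}
                )
                fn = frag.get("function") or {}
                call["id"] = frag.get("id") or call["id"]
                call["name"] = fn.get("name") or call["name"]
                call["arguments"] += fn.get("arguments") or ""
    tool_calls = []
    for index in sorted(calls):
        call = calls[index]
        # The arguments string is only valid JSON once all fragments are joined.
        tool_calls.append({
            "id": call["id"],
            "name": call["name"],
            "arguments": json.loads(call["arguments"] or "{}"),
        })
    return "".join(text_parts), tool_calls, usage


# Hypothetical chunks: one tool call whose arguments are split in two.
demo_chunks = [
    {"choices": [{"index": 0, "delta": {"tool_calls": [
        {"index": 0, "id": "call_1", "type": "function",
         "function": {"name": "get_weather", "arguments": ""}}]}}]},
    {"choices": [{"index": 0, "delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": "{\"city\": "}}]}}]},
    {"choices": [{"index": 0, "delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": "\"Paris\"}"}}]}}]},
    {"choices": [{"index": 0, "delta": {}, "finish_reason": "tool_calls"}],
     "usage": {"prompt_tokens": 12, "completion_tokens": 9, "total_tokens": 21}},
]

text, tool_calls, usage = collect_stream(demo_chunks)
print(tool_calls)
print(usage)
```

Parsing the arguments only after the stream ends avoids calling json.loads on a partial string, which would raise an error mid-stream.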