Streaming
Streaming lets you receive partial responses from Workers AI text generation models as they are generated, delivered as Server-Sent Events (SSE). This improves the experience of applications that depend on immediate feedback, such as chatbots or live content generation, because users see output before the full response is complete.
To enable streaming on Workers AI, set the stream parameter to true in your request. The response is then sent as Server-Sent Events with the MIME type text/event-stream, so tokens arrive incrementally instead of all at once.
Here's an example of enabling streaming with Workers AI using the REST API:
```sh
curl -X POST "https://api.cloudflare.com/client/v4/accounts/<account>/ai/run/@cf/meta/llama-2-7b-chat-int8" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{ "prompt": "where is new york?", "stream": true }'
```

Response:

```
data: {"response":"New"}
data: {"response":" York"}
data: {"response":" is"}
data: {"response":" located"}
data: {"response":" in"}
data: {"response":" the"}
...
data: [DONE]
```

The data: [DONE] signal indicates the end of the stream.
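Each event carries one fragment of text in its response field, so the full answer is the concatenation of those fragments. As a minimal sketch, assuming each event is a single data: line as in the sample above, the following hypothetical collectResponse helper (not part of any Cloudflare API) rebuilds the text from a complete SSE body and stops at [DONE]:

```javascript
// Sketch: rebuild the generated text from a complete Workers AI SSE body.
// Assumes one "data:" line per event and "[DONE]" as the end-of-stream marker.
function collectResponse(sseText) {
  let text = "";
  for (const line of sseText.split("\n")) {
    // Skip blank separator lines and any non-data fields
    if (!line.startsWith("data:")) continue;
    // Drop the "data:" prefix; trim() removes the optional space and any trailing "\r"
    const payload = line.slice("data:".length).trim();
    if (payload === "[DONE]") break; // end-of-stream sentinel
    text += JSON.parse(payload).response;
  }
  return text;
}

// On the wire, events are separated by a blank line
const sample = [
  'data: {"response":"New"}',
  'data: {"response":" York"}',
  'data: {"response":" is"}',
  "data: [DONE]",
].join("\n\n");

console.log(collectResponse(sample)); // New York is
```

Note that the leading spaces belong to the fragments themselves (" York", " is"), so the fragments are joined without adding separators.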
You can also use streaming directly within a Cloudflare Worker:
```javascript
import { Ai } from "@cloudflare/ai";

export default {
  async fetch(request, env, ctx) {
    // Create the Workers AI client from the AI binding, passing the execution context
    const ai = new Ai(env.AI, { sessionOptions: { ctx: ctx } });
    const stream = await ai.run("@cf/meta/llama-2-7b-chat-int8", {
      prompt: "where is new york?",
      stream: true,
    });
    // Forward the SSE stream to the client unchanged
    return new Response(stream, {
      headers: { "content-type": "text/event-stream" },
    });
  },
};
```

To consume the streamed output in a browser, you can use the following JavaScript in a plain HTML page or within a frontend framework such as React or Vue:
```javascript
const source = new EventSource("/worker-endpoint");

source.onmessage = (event) => {
  if (event.data === "[DONE]") {
    // Close the connection to prevent automatic reconnection
    source.close();
    return;
  }
  const data = JSON.parse(event.data);
  // textContent appends the model output as plain text instead of parsing it as HTML
  document.getElementById("output").textContent += data.response;
};
```

This code works in a simple HTML page as well as in a single-page application built with React, Angular, or Vue. In React, for example, you can open the EventSource connection in a useEffect hook, append each chunk to component state as it arrives, and close the connection in the hook's cleanup function.
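EventSource only issues GET requests and cannot send a request body or custom headers. If your endpoint needs a POST request or an Authorization header, one alternative is to read the fetch() response body directly. The sketch below assumes the same data:/[DONE] framing as the REST response sample; readTokenStream is a hypothetical helper, not a Cloudflare API. It buffers partial lines because a network chunk can end in the middle of an event.

```javascript
// Sketch: read a Workers AI SSE stream from a ReadableStream of bytes, such as
// response.body from fetch(), and call onToken with each text fragment.
// Incomplete lines are buffered until the rest of the event arrives.
async function readTokenStream(body, onToken) {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";

  // Handles one complete line; returns true once the [DONE] sentinel is seen
  const handleLine = (line) => {
    if (!line.startsWith("data:")) return false; // blank separators, other fields
    const payload = line.slice("data:".length).trim();
    if (payload === "[DONE]") return true;
    onToken(JSON.parse(payload).response);
    return false;
  };

  while (true) {
    const { value, done } = await reader.read();
    // On the final read, flush any bytes still held by the decoder
    buffer += done ? decoder.decode() : decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    // Keep the last, possibly partial, line unless the stream has ended
    buffer = done ? "" : lines.pop();
    for (const line of lines) {
      if (handleLine(line)) {
        await reader.cancel(); // generation is finished; stop reading
        return;
      }
    }
    if (done) return;
  }
}
```

In a page, you might await fetch("/worker-endpoint") and pass its body along with a callback such as (token) => { output.textContent += token; }. Unlike EventSource, fetch() never reconnects on its own, so nothing needs closing when [DONE] arrives.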