Why Exa in a voice agent?
Whether you are building an internal voice agent for your employees, a customer-facing voice agent to field questions, or a personal project, calling Exa yields massive gains:
- Model agnostic: Works with OpenAI, Anthropic, or any open-source model
- Superior search: Faster, more relevant, and more comprehensive than a model's built-in search tools
- Always current: Real-time information instead of stale training data
- Configurable: Exa's search parameters can be adjusted dynamically for any use case
Example pipeline
Here's what a typical query looks like end-to-end. Each stage runs as soon as its input is ready, keeping total latency low:
Speech-to-Text
Voice transcribed in real-time as you speak. A WebSocket connection streams audio to ElevenLabs Scribe, which returns partial and committed transcripts:
```typescript
import { Scribe, RealtimeEvents } from "@elevenlabs/client";

const connection = Scribe.connect({
  token: ELEVENLABS_TOKEN,
  modelId: "scribe_v1",
  commitStrategy: "vad",
  microphone: {
    echoCancellation: true,
    noiseSuppression: true,
  },
});

connection.on(RealtimeEvents.PARTIAL_TRANSCRIPT, (data) => {
  setPartialTranscript(data.text);
});

connection.on(RealtimeEvents.COMMITTED_TRANSCRIPT, (data) => {
  setTranscript(data.text);
});
```

Powered by ElevenLabs Scribe. VAD (voice activity detection) automatically commits the transcript when the user stops speaking.
LLM Router
The LLM decides via tool-calling whether the query needs a web search or can be answered directly.
```typescript
import { GoogleGenerativeAI, SchemaType } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);

const model = genAI.getGenerativeModel({
  model: "gemini-2.0-flash",
  tools: [{
    functionDeclarations: [{
      name: "web_search",
      description: "Search the web for current, real-time, or specific factual information using Exa.",
      parameters: {
        type: SchemaType.OBJECT,
        properties: {
          query: {
            type: SchemaType.STRING,
            description: "A natural language search query.",
          },
        },
        required: ["query"],
      },
    }],
  }],
});

const result = await model.generateContent({
  contents: [{ role: "user", parts: [{ text: query }] }],
});
const functionCalls = result.response.functionCalls();
```

The system prompt tells the model when to search. Adjust it for your use case.
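The snippet above stops at `functionCalls`; the branch that acts on it might look like the sketch below. `pickSearchQuery` is a hypothetical helper, and the commented usage assumes the Exa and answer stages from the following sections.

```typescript
// Hypothetical helper: extract the query if the model requested web_search.
type FunctionCall = { name: string; args: Record<string, unknown> };

function pickSearchQuery(calls: FunctionCall[] | undefined): string | null {
  const call = calls?.find((c) => c.name === "web_search");
  return call ? String(call.args.query) : null;
}

// Usage, continuing from `functionCalls` above (the search and answer
// helpers stand in for the Exa and LLM-answer stages below):
//
//   const searchQuery = pickSearchQuery(functionCalls);
//   if (searchQuery) {
//     const results = await runExaSearch(searchQuery);
//     await answerWithSources(query, results);
//   } else {
//     await answerDirectly(query);
//   }
```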
Exa Instant Search
If search is needed, page text is retrieved for LLM context using Exa's search endpoint with contents:
```typescript
import Exa from "exa-js";

const exa = new Exa(process.env.EXA_API_KEY);

// Destructure `results` for use in the answer stage below.
const { results } = await exa.searchAndContents(query, {
  type: "instant",
  numResults: 5,
  text: { maxCharacters: 500 },
});
```

LLM Answer
Generates a concise answer — grounded in search results with citations, or answered directly from knowledge.
```typescript
const sources = results.map((r, i) =>
  `[${i + 1}] ${r.title}\n${r.text}`
).join("\n\n");

const response = await model.generateContentStream({
  contents: [{
    role: "user",
    parts: [{ text: `Question: "${query}"\n\nSOURCES:\n${sources}` }],
  }],
});

for await (const chunk of response.stream) {
  const text = chunk.text();
  sendToClient(text);
  sendToTTS(text);
}
```

The search-grounded prompt tells the model how to synthesize an answer from the sources.
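As an illustration only, a search-grounded prompt along these lines could be prepended to the user turn above; the wording here is an assumption, not the original prompt:

```typescript
// Illustrative grounded-answer prompt; the exact wording is an assumption.
const GROUNDED_PROMPT =
  "You are a voice assistant. Answer using ONLY the numbered SOURCES. " +
  "Keep it to 2-3 spoken sentences and cite sources inline as [1], [2]. " +
  "If the sources do not contain the answer, say so.";

// Prepend it to the user turn shown above.
const buildPrompt = (query: string, sources: string) =>
  `${GROUNDED_PROMPT}\n\nQuestion: "${query}"\n\nSOURCES:\n${sources}`;
```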
Text-to-Speech
Answer streamed as audio via a WebSocket to ElevenLabs, played back immediately as chunks arrive.
```typescript
const ws = new WebSocket(
  `wss://api.elevenlabs.io/v1/text-to-speech/${voiceId}/stream-input?model_id=eleven_flash_v2_5&output_format=mp3_22050_32`,
  { headers: { "xi-api-key": ELEVENLABS_API_KEY } }
);

ws.on("open", () => {
  // An initial space with voice settings opens the stream.
  ws.send(JSON.stringify({
    text: " ",
    voice_settings: { stability: 0.5, similarity_boost: 0.75 },
  }));
});

ws.on("message", (data) => {
  const { audio, isFinal } = JSON.parse(data);
  if (audio) sendAudioToClient(audio);
  if (isFinal) ws.close();
});
```

That's it! The model decides when to search, executes Exa queries for real-time information, and speaks the answer — all in under a second.
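The `sendToTTS` helper used in the answer stage is not shown above. One way to wire it, assuming ElevenLabs' stream-input convention that an empty text message flushes and ends the stream:

```typescript
// Each outgoing frame is a JSON message with a `text` field; an empty
// text signals end-of-input (an assumption about the stream-input protocol).
const ttsFrame = (text: string): string => JSON.stringify({ text });

// Hypothetical wiring for sendToTTS: `send` would be `ws.send` from above.
function makeTTSSender(send: (frame: string) => void) {
  return {
    chunk: (text: string) => send(ttsFrame(text)), // one frame per LLM chunk
    end: () => send(ttsFrame("")),                 // flush remaining audio
  };
}
```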
Get started with Exa for free.