Voice Agent with Exa Instant

Build an AI Voice Agent that intelligently calls Exa to search the web for real-time information

Why Exa in a voice agent?

Whether you are building an internal voice agent for your employees, a customer-facing voice agent to field questions, or a personal project, calling Exa yields massive gains:

  1. Model agnostic: Works with OpenAI, Anthropic, or any open-source model
  2. Superior search: Faster, more relevant, and more comprehensive than models' built-in search tools
  3. Always current: Real-time information instead of stale training data
  4. Configurable: Exa's search parameters can be adjusted dynamically for any use case

Example pipeline

Here's what a typical query looks like end-to-end. Each stage runs as soon as its input is ready, keeping total latency low:

Pipeline: ~830ms question → answer

  Speech: 1.2s
  Router: 180ms
  Search: 220ms
  Answer: 350ms
  TTS: 280ms
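The staged handoff can be sketched as a minimal async pipeline. The stage functions below are illustrative stubs standing in for the real STT, router, search, and answer calls detailed in the steps that follow:

```typescript
// Minimal sketch of the post-transcription pipeline: route the committed
// transcript, search only when the router asks for it, then answer.
type Stage<I, O> = (input: I) => Promise<O>;

async function runPipeline(
  transcript: string,
  route: Stage<string, { needsSearch: boolean; query: string }>,
  search: Stage<string, string>, // returns concatenated source text
  answer: Stage<{ query: string; sources: string }, string>,
): Promise<string> {
  const decision = await route(transcript);
  // Pay the search latency only when the router requests it.
  const sources = decision.needsSearch ? await search(decision.query) : "";
  return answer({ query: decision.query, sources });
}
```

In the real agent, `answer` streams tokens to TTS as they arrive rather than returning a complete string; the sketch returns a string to keep the data flow visible.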

The pipeline

1. Speech-to-Text

Voice transcribed in real-time as you speak. A WebSocket connection streams audio to ElevenLabs Scribe, which returns partial and committed transcripts:

```typescript
import { Scribe, RealtimeEvents } from "@elevenlabs/client";

const connection = Scribe.connect({
  token: ELEVENLABS_TOKEN,
  modelId: "scribe_v1",
  commitStrategy: "vad",
  microphone: {
    echoCancellation: true,
    noiseSuppression: true,
  },
});

connection.on(RealtimeEvents.PARTIAL_TRANSCRIPT, (data) => {
  setPartialTranscript(data.text);
});

connection.on(RealtimeEvents.COMMITTED_TRANSCRIPT, (data) => {
  setTranscript(data.text);
});
```

Powered by ElevenLabs Scribe. VAD (voice activity detection) automatically commits the transcript when the user stops speaking.

2. LLM Router

The LLM decides via tool-calling whether the query needs a web search or can be answered directly.

```typescript
import { GoogleGenerativeAI, SchemaType } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
const model = genAI.getGenerativeModel({
  model: "gemini-2.0-flash",
  tools: [{
    functionDeclarations: [{
      name: "web_search",
      description:
        "Search the web for current, real-time, or specific factual information using Exa.",
      parameters: {
        type: SchemaType.OBJECT,
        properties: {
          query: {
            type: SchemaType.STRING,
            description: "A natural language search query.",
          },
        },
        required: ["query"],
      },
    }],
  }],
});

const result = await model.generateContent({
  contents: [{ role: "user", parts: [{ text: query }] }],
});
const functionCalls = result.response.functionCalls();
```
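What to do with `functionCalls` isn't shown above; a small dispatcher can branch between the Exa path and a direct answer. A sketch, where the two callbacks are placeholders for steps 3–4 and for a plain completion:

```typescript
// Shape of a Gemini function call as returned by result.response.functionCalls().
interface FunctionCall {
  name: string;
  args: Record<string, unknown>;
}

// Branch on the router's decision: run the search-grounded path when the
// model emitted a web_search call, otherwise answer from model knowledge.
async function dispatchRoute(
  calls: FunctionCall[] | undefined,
  runSearch: (query: string) => Promise<string>,
  answerDirectly: () => Promise<string>,
): Promise<string> {
  const searchCall = calls?.find((c) => c.name === "web_search");
  if (searchCall && typeof searchCall.args.query === "string") {
    return runSearch(searchCall.args.query);
  }
  return answerDirectly();
}
```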

The system prompt tells the model when to search; adjust it for your use case.
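A minimal example of such a prompt (illustrative wording, not an official one) might look like:

```typescript
// Example routing prompt. Tune the "when to search" criteria for your domain.
const ROUTER_SYSTEM_PROMPT = `You are a helpful voice assistant.
Call the web_search tool whenever the user asks about current events, prices,
weather, sports, or any fact that may have changed since your training data.
Answer directly for definitions, math, and stable general knowledge.
Keep responses to one or two sentences: they will be spoken aloud.`;
```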

3. Exa Instant Search

If search is needed, page text is retrieved for LLM context using Exa's search endpoint with contents:

```typescript
import Exa from "exa-js";

const exa = new Exa(process.env.EXA_API_KEY);

// Destructure the hits so `results` is the array used in the next step.
const { results } = await exa.searchAndContents(query, {
  type: "instant",
  numResults: 5,
  text: { maxCharacters: 500 },
});
```
4. LLM Answer

Generates a concise answer — grounded in search results with citations, or answered directly from knowledge.

```typescript
const sources = results
  .map((r, i) => `[${i + 1}] ${r.title}\n${r.text}`)
  .join("\n\n");

const response = await model.generateContentStream({
  contents: [{
    role: "user",
    parts: [{ text: `Question: "${query}"\n\nSOURCES:\n${sources}` }],
  }],
});

for await (const chunk of response.stream) {
  const text = chunk.text();
  sendToClient(text);
  sendToTTS(text);
}
```

The search-grounded prompt tells the model how to synthesize an answer from the sources.
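An example of such a grounding prompt (illustrative wording, not the official one):

```typescript
// Example grounding prompt. The [n] markers match the numbered sources block.
const ANSWER_SYSTEM_PROMPT = `Answer the question using ONLY the numbered
SOURCES provided. Cite sources inline as [1], [2], and so on. If the sources
do not contain the answer, say you could not find it. Keep the answer to two
or three short spoken sentences.`;
```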

5. Text-to-Speech

The answer text is streamed to ElevenLabs over a WebSocket, and the returned audio is played back immediately as chunks arrive.

```typescript
const ws = new WebSocket(
  `wss://api.elevenlabs.io/v1/text-to-speech/${voiceId}/stream-input` +
    `?model_id=eleven_flash_v2_5&output_format=mp3_22050_32`,
  { headers: { "xi-api-key": ELEVENLABS_API_KEY } }
);

ws.on("open", () => {
  ws.send(JSON.stringify({
    text: " ",
    voice_settings: { stability: 0.5, similarity_boost: 0.75 },
  }));
});

ws.on("message", (data) => {
  const { audio, isFinal } = JSON.parse(data);
  if (audio) sendAudioToClient(audio);
  if (isFinal) ws.close();
});
```
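The `sendToTTS` helper from step 4 forwards each LLM chunk over this socket. One detail matters for audio quality: chunks should end on word boundaries, and an empty `text` message tells ElevenLabs the input is finished. A sketch of that buffering (the helper name and word-boundary policy are our own):

```typescript
// Forward streamed LLM text to the open TTS socket. Chunks are flushed up
// to the last space so words are never split mid-token; an empty text
// message marks the end of input so any remaining audio is flushed.
function makeTTSWriter(send: (msg: string) => void) {
  let buffer = "";
  return {
    write(chunk: string) {
      buffer += chunk;
      const cut = buffer.lastIndexOf(" ");
      if (cut > 0) {
        send(JSON.stringify({ text: buffer.slice(0, cut + 1) }));
        buffer = buffer.slice(cut + 1);
      }
    },
    end() {
      if (buffer) send(JSON.stringify({ text: buffer + " " }));
      send(JSON.stringify({ text: "" })); // signals end of input
    },
  };
}
```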

That's it! The model decides when to search, executes Exa queries for real-time information, and speaks the answer — all in under a second.

Get started with Exa for free.