Voice Agent with Exa Instant

Build an AI Voice Agent that intelligently calls Exa to search the web for real-time information

Why Exa in a voice agent?

Whether you are building an internal voice agent for your employees, a customer-facing voice agent to field questions, or a personal project, calling Exa gives you four advantages:

  1. Model agnostic: Works with OpenAI, Anthropic, or any open-source model
  2. Superior search: Faster, more relevant, and more comprehensive than a model's built-in search tool
  3. Always current: Real-time information instead of stale training data
  4. Configurable: Exa's search parameters can be adjusted dynamically for any use case

Example pipeline

Here's what a typical query looks like end-to-end. Each stage runs as soon as its input is ready, keeping total latency low:

Pipeline: 830ms from question to answer

  Speech: 1.2s
  Router: 180ms
  Search: 220ms
  Answer: 350ms
  TTS: 280ms
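
Per-stage numbers like these could be collected with a small timer wrapped around each stage. The sketch below is illustrative only: `createStageTimer` and its methods are hypothetical names, not part of Exa or the demo. Each stage is tracked independently, so stages that overlap can still be measured.

```javascript
// Hypothetical helper: records how long each named pipeline stage takes.
// `now` is injectable so the timer can be driven by a fake clock in tests.
function createStageTimer(now = () => performance.now()) {
  const started = new Map();
  const durations = new Map();
  return {
    // Mark the moment a stage begins.
    start(stage) {
      started.set(stage, now());
    },
    // Mark the moment a stage finishes and store its duration in ms.
    end(stage) {
      if (!started.has(stage)) {
        throw new Error(`Stage "${stage}" was never started`);
      }
      durations.set(stage, now() - started.get(stage));
      started.delete(stage);
    },
    // Return a plain object such as { router: 180, search: 220 }.
    report() {
      return Object.fromEntries(durations);
    },
  };
}
```

In use, you would call `timer.start("search")` just before the Exa request and `timer.end("search")` when it resolves, then log `timer.report()` once the audio starts playing.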

The pipeline

1. Speech-to-Text

Your voice is transcribed in real time as you speak. A WebSocket connection streams audio to ElevenLabs Scribe, which returns partial and committed transcripts:

javascript
import { Scribe, RealtimeEvents } from "@elevenlabs/client";

const connection = Scribe.connect({
  token: ELEVENLABS_TOKEN,
  modelId: "scribe_v1",
  commitStrategy: "vad",
  microphone: {
    echoCancellation: true,
    noiseSuppression: true,
  },
});

connection.on(RealtimeEvents.PARTIAL_TRANSCRIPT, (data) => {
  setPartialTranscript(data.text);
});

connection.on(RealtimeEvents.COMMITTED_TRANSCRIPT, (data) => {
  setTranscript(data.text);
});

Powered by ElevenLabs Scribe. VAD (voice activity detection) automatically commits the transcript when the user stops speaking.
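
The difference between partial and committed transcripts can be sketched as a small state update. This is a minimal illustration, not the demo's code: the state shape, the event objects, and the name `applyTranscriptEvent` are all assumptions. Partials overwrite each other while the user is still talking; a committed transcript is appended and clears the partial.

```javascript
// Hypothetical state shape: { committed: string[], partial: string }.
const initialTranscriptState = { committed: [], partial: "" };

// Pure update function: returns a new state for each transcript event.
function applyTranscriptEvent(state, event) {
  switch (event.type) {
    case "partial":
      // Partials are provisional: each one replaces the previous partial.
      return { ...state, partial: event.text };
    case "committed": {
      const text = event.text.trim();
      // A committed transcript is final: append it and clear the partial.
      return {
        committed: text ? [...state.committed, text] : state.committed,
        partial: "",
      };
    }
    default:
      // Unknown events leave the state untouched.
      return state;
  }
}
```

The two handlers shown earlier would then dispatch `{ type: "partial", text: data.text }` and `{ type: "committed", text: data.text }` respectively, and the newest committed entry becomes the query for the router.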

2. LLM Router

The LLM decides via tool-calling whether the query needs a web search or can be answered directly.

javascript
import { GoogleGenerativeAI, SchemaType } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);

const model = genAI.getGenerativeModel({
  model: "gemini-2.0-flash",
  tools: [{
    functionDeclarations: [{
      name: "web_search",
      description: "Search the web for current, real-time, or specific factual information using Exa.",
      parameters: {
        type: SchemaType.OBJECT,
        properties: {
          query: {
            type: SchemaType.STRING,
            description: "A natural language search query.",
          },
        },
        required: ["query"],
      },
    }],
  }],
});

const result = await model.generateContent({
  contents: [{ role: "user", parts: [{ text: query }] }],
});

const functionCalls = result.response.functionCalls();

The system prompt tells the model when to search; adjust it for your use case.
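
As a rough sketch of the routing step, the snippet below pairs an example system prompt with a function that turns the model's function calls into a decision. Both are assumptions for illustration: the prompt wording is not the demo's actual prompt, and `decideRoute` is a hypothetical helper. It relies only on the fact that `functionCalls()` returns either `undefined` or an array of `{ name, args }` objects.

```javascript
// Hypothetical prompt; the wording is illustrative, not the demo's own.
const ROUTER_SYSTEM_PROMPT = [
  "You are a voice assistant.",
  "Call web_search for current events, prices, schedules, or facts you are unsure about.",
  "Answer directly for greetings, small talk, and stable general knowledge.",
].join(" ");

// Turns the router's function calls into a decision for the rest of the pipeline.
function decideRoute(functionCalls) {
  // Look for a call to the web_search tool declared above.
  const call = (functionCalls ?? []).find((c) => c.name === "web_search");
  // Guard against a missing or empty query argument.
  const query =
    typeof call?.args?.query === "string" ? call.args.query.trim() : "";
  return query ? { type: "search", query } : { type: "direct" };
}
```

The prompt could be supplied through the `systemInstruction` option of `getGenerativeModel`, and `decideRoute(result.response.functionCalls())` then tells the pipeline whether to run the search step or skip straight to the answer.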

3. Exa Instant Search

If a search is needed, Exa's search endpoint with contents retrieves the page text that the LLM will use as context:

javascript
import Exa from "exa-js";

const exa = new Exa(process.env.EXA_API_KEY);

const { results } = await exa.searchAndContents(query, {
  type: "instant",
  numResults: 5,
  text: { maxCharacters: 500 },
});

4. LLM Answer

The LLM generates a concise answer, either grounded in the search results with citations or answered directly from its own knowledge.

javascript
// `results` is the `results` array from the Exa response in step 3.
const sources = results.map((r, i) =>
  `[${i + 1}] ${r.title}\n${r.text}`
).join("\n\n");

const response = await model.generateContentStream({
  contents: [{
    role: "user",
    parts: [{ text: `Question: "${query}"\n\nSOURCES:\n${sources}` }],
  }],
});

for await (const chunk of response.stream) {
  const text = chunk.text();
  sendToClient(text);
  sendToTTS(text);
}

The search-grounded prompt tells the model how to synthesize an answer from sources.
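
The two answer paths (grounded versus direct) can be sketched as a single prompt builder. This is an assumption-laden illustration: `buildAnswerPrompt` is a hypothetical name, and the instruction wording is a placeholder rather than the demo's prompt. It expects result objects with `title`, `url`, and `text` fields.

```javascript
// Builds the user prompt for the answer step.
// With usable search results it emits numbered sources; without them it
// falls back to a direct-answer prompt.
function buildAnswerPrompt(query, results = []) {
  // Skip results that came back without any page text.
  const usable = results.filter((r) => r.text && r.text.trim());
  if (usable.length === 0) {
    return `Question: "${query}"\n\nAnswer briefly from your own knowledge.`;
  }
  const sources = usable
    .map((r, i) => `[${i + 1}] ${r.title ?? r.url ?? "Untitled"}\n${r.text.trim()}`)
    .join("\n\n");
  return (
    `Question: "${query}"\n\nSOURCES:\n${sources}\n\n` +
    "Answer briefly and cite sources as [n]."
  );
}
```

Numbering the sources in the prompt gives the model stable labels to cite, which matters in a voice setting where the answer has to stay short.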

5. Text-to-Speech

The answer is streamed as audio over a WebSocket to ElevenLabs and played back immediately as chunks arrive.

javascript
import WebSocket from "ws"; // Node WebSocket client; the `headers` option below is specific to it

const ws = new WebSocket(
  `wss://api.elevenlabs.io/v1/text-to-speech/${voiceId}/stream-input?model_id=eleven_flash_v2_5&output_format=mp3_22050_32`,
  { headers: { "xi-api-key": ELEVENLABS_API_KEY } }
);

ws.on("open", () => {
  ws.send(JSON.stringify({
    text: " ",
    voice_settings: { stability: 0.5, similarity_boost: 0.75 },
  }));
});

ws.on("message", (data) => {
  const { audio, isFinal } = JSON.parse(data);
  if (audio) sendAudioToClient(audio);
  if (isFinal) ws.close();
});
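
The snippet above opens the socket and receives audio, but the text still has to be fed in. One way a `sendToTTS`-style helper could do that is to buffer the streamed LLM output and forward it a sentence at a time. The sketch below is a hypothetical helper, not the demo's code, and it assumes that text messages have the shape `{ text }` and that an empty `text` string signals end of input; check the ElevenLabs WebSocket reference before relying on either.

```javascript
// Buffers streamed LLM text and emits it to `send` one sentence at a time.
// `send` receives plain objects; the caller decides how to serialize them.
function createTtsChunker(send) {
  let buffer = "";
  return {
    // Add a streamed text chunk and flush any complete sentences.
    push(text) {
      buffer += text;
      let match;
      // A sentence ends with . ! or ? followed by whitespace.
      while ((match = buffer.match(/^[\s\S]*?[.!?]\s+/)) !== null) {
        send({ text: match[0] });
        buffer = buffer.slice(match[0].length);
      }
    },
    // Flush whatever is left, then signal that no more text is coming.
    end() {
      if (buffer.trim()) send({ text: buffer + " " });
      buffer = "";
      // Assumption: an empty text message marks the end of the input stream.
      send({ text: "" });
    },
  };
}
```

With the socket from above, this could be wired as `createTtsChunker((msg) => ws.send(JSON.stringify(msg)))`, calling `push` for every LLM chunk and `end` once the stream finishes. Sending whole sentences rather than individual tokens is a design choice that tends to give the voice model enough context for natural prosody.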

That's it! The model decides when to search, executes Exa queries for real-time information, and speaks the answer, all in under a second.

Get started with Exa for free.