When you ask a question, Gemini uses tool-calling to decide whether it needs to search the web. Questions about current events or specific facts trigger an Exa search; general knowledge questions get answered directly. In parallel, Exa speculatively searches every 200ms while you speak, so results are already warm before you finish talking.
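To make the speculative prefetch concrete, here is a minimal sketch of the idea: a 200ms timer re-issues a search for the newest partial transcript and caches the result. The names (`searchExa`, `latestTranscript`) are illustrative stand-ins, not the app's actual code.

```ts
// Minimal sketch of speculative prefetch; names are illustrative stand-ins.
const cache = new Map<string, unknown>();
let latestTranscript = ""; // updated by the streaming STT callback

async function searchExa(query: string): Promise<unknown> {
  // The real app calls Exa here; a stub keeps the sketch self-contained.
  return { query };
}

// Every 200ms, warm the cache for the newest partial transcript.
const timer = setInterval(async () => {
  const q = latestTranscript.trim();
  if (q && !cache.has(q)) cache.set(q, await searchExa(q));
}, 200);

// On end of speech: clearInterval(timer) and reuse cache.get(finalQuery) if present.
```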
Your voice is transcribed in real time using ElevenLabs Scribe. As soon as you stop speaking, your query is sent for processing.
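For illustration only, here is roughly what a Scribe transcription call can look like. This sketch uses ElevenLabs' batch speech-to-text endpoint with the `scribe_v1` model id; the app itself streams audio for real-time transcription, which uses a different interface, so treat the endpoint and field names as assumptions to check against the ElevenLabs docs.

```ts
import { readFile } from "node:fs/promises";

// Sketch only: batch transcription of a recorded clip with ElevenLabs Scribe.
// Endpoint, model_id, and field names are assumptions to verify in the docs.
async function transcribe(path: string): Promise<string> {
  const form = new FormData();
  form.append("model_id", "scribe_v1");
  form.append("file", new Blob([await readFile(path)]), "clip.webm");

  const res = await fetch("https://api.elevenlabs.io/v1/speech-to-text", {
    method: "POST",
    headers: { "xi-api-key": process.env.ELEVENLABS_API_KEY! },
    body: form,
  });
  const { text } = await res.json();
  return text; // the transcript sent on for processing once you stop speaking
}
```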
Gemini uses tool-calling to decide if your question needs a web search. Current events trigger Exa; general knowledge gets answered directly.
LLM decision, search, summary, and text-to-speech all run in one server-sent event stream — zero extra round trips.
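As a rough sketch of what "one stream" means on the server, the handler below writes each stage as a named SSE event on a single HTTP response. The event names and payloads are illustrative, not the app's actual protocol.

```ts
import { createServer } from "node:http";

// Illustrative only: every stage rides one SSE response, so the client never
// makes a second request.
createServer((req, res) => {
  res.writeHead(200, { "Content-Type": "text/event-stream" });
  const send = (event: string, data: unknown) =>
    res.write(`event: ${event}\ndata: ${JSON.stringify(data)}\n\n`);

  send("decision", { path: "search" });                 // LLM Decision
  send("sources", [{ url: "https://example.com" }]);    // Content Search
  send("text", { chunk: "Here's what I found..." });    // LLM Summary
  send("audio", { chunk: "<base64 audio>" });           // Text-to-Speech
  res.end();
}).listen(3000);
```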
| Stage | What happens |
|---|---|
| Speech-to-Text | Voice transcribed in real time as you speak |
| LLM Decision | Gemini decides via tool-calling whether the query needs a web search or can be answered directly |
| Content Search | If a search is needed, Exa retrieves full page text for LLM context |
| LLM Summary | Gemini generates a concise answer, either grounded in the search results with citations or drawn directly from its own knowledge |
| Text-to-Speech | The answer is streamed as audio and played back immediately |
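On the client, those stages arrive in order on that same stream. A sketch of consuming it, with a made-up `/api/answer` endpoint:

```ts
// Sketch of reading the SSE stream; the endpoint and frame handling are assumed.
async function ask(query: string) {
  const res = await fetch("/api/answer?q=" + encodeURIComponent(query));
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = "";

  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const frames = buffer.split("\n\n");
    buffer = frames.pop()!; // keep any partial frame for the next chunk
    for (const frame of frames) {
      console.log(frame); // e.g. decision, sources, text chunks, audio chunks
    }
  }
}
```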
When the LLM calls the web_search tool, Exa fetches full page content and the LLM generates a grounded answer with citations.
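A sketch of what the tool handler might do on this path, assuming the exa-js client (the real handler may request different fields):

```ts
import Exa from "exa-js";

// Assumed: the exa-js SDK and its searchAndContents method.
const exa = new Exa(process.env.EXA_API_KEY);

async function handleWebSearch(query: string) {
  const { results } = await exa.searchAndContents(query, { text: true, numResults: 5 });
  // Each result carries url, title, and full page text for grounding and citations.
  return results.map((r) => ({ url: r.url, title: r.title, text: r.text }));
}
```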
Example triggers:
"What's the latest news about OpenAI?"
"Who is the CEO of Anthropic?"
"Best restaurants in SF right now"
When the LLM can answer from its training knowledge, it skips the search entirely and responds directly — faster, with no search overhead.
Example triggers:
"What is the capital of France?"
"Explain how photosynthesis works"
"Write a haiku about rain"
// The LLM is given a web_search tool (a Gemini function declaration;
// the parameters are a JSON-schema object)
const tools = [{
  functionDeclarations: [{
    name: "web_search",
    description: "Search the web for current or specific info",
    parameters: {
      type: "object",
      properties: {
        query: { type: "string", description: "natural language search query" },
      },
      required: ["query"],
    },
  }],
}];

// If Gemini calls the tool → Search Path
// If Gemini responds directly → Direct Path
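Putting the two paths together, here is a sketch of the branch, assuming the @google/generative-ai Node SDK and the declaration above; the model name and the summarization helper are illustrative.

```ts
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
// `tools` is the declaration defined above; in strict TypeScript its parameter
// types may need the SDK's SchemaType enum.
const model = genAI.getGenerativeModel({ model: "gemini-1.5-flash", tools });

async function summarizeWithSources(query: string): Promise<string> {
  // Stand-in for the real flow: Exa search, then a second Gemini call with sources.
  return `(grounded answer for "${query}" goes here)`;
}

async function answer(question: string): Promise<string> {
  const result = await model.generateContent(question);
  const calls = result.response.functionCalls();

  if (calls?.length) {
    // Search Path: run the web_search query, then summarize with citations.
    const { query } = calls[0].args as { query: string };
    return summarizeWithSources(query);
  }
  // Direct Path: answer straight from the model, no search overhead.
  return result.response.text();
}
```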