Why Exa in a voice agent?
Whether you are building an internal voice agent for your employees, a customer-facing voice agent to field questions, or a personal project, calling Exa yields massive gains:
- Model agnostic: Works with OpenAI, Anthropic, or any open-source model
- Superior search: Faster, more relevant, and more comprehensive than a model's built-in search tools
- Always current: Real-time information instead of stale training data
- Configurable: Exa's search parameters can be adjusted dynamically for any use case
Example pipeline
Here's what a typical query looks like end-to-end. Each stage runs as soon as its input is ready, keeping total latency low:
Speech-to-Text
Voice transcribed in real-time as you speak. A WebSocket connection streams audio to ElevenLabs Scribe, which returns partial and committed transcripts:
```typescript
import { Scribe, RealtimeEvents } from "@elevenlabs/client";

const connection = Scribe.connect({
  token: ELEVENLABS_TOKEN,
  modelId: "scribe_v1",
  commitStrategy: "vad",
  microphone: {
    echoCancellation: true,
    noiseSuppression: true,
  },
});

connection.on(RealtimeEvents.PARTIAL_TRANSCRIPT, (data) => {
  setPartialTranscript(data.text);
});

connection.on(RealtimeEvents.COMMITTED_TRANSCRIPT, (data) => {
  setTranscript(data.text);
});
```

Powered by ElevenLabs Scribe. VAD (voice activity detection) automatically commits the transcript when the user stops speaking.
LLM Router
The LLM decides via tool-calling whether the query needs a web search or can be answered directly.
```typescript
import { GoogleGenerativeAI, SchemaType } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);

const model = genAI.getGenerativeModel({
  model: "gemini-2.0-flash",
  tools: [{
    functionDeclarations: [{
      name: "web_search",
      description: "Search the web for current, real-time, or specific factual information using Exa.",
      parameters: {
        type: SchemaType.OBJECT,
        properties: {
          query: {
            type: SchemaType.STRING,
            description: "A natural language search query.",
          },
        },
        required: ["query"],
      },
    }],
  }],
});

const result = await model.generateContent({
  contents: [{ role: "user", parts: [{ text: query }] }],
});
const functionCalls = result.response.functionCalls();
```

The system prompt tells the model when to search. Adjust it for your use case.
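The snippet above stops at `functionCalls`; the branch that acts on it might look like the sketch below. `pickSearchQuery` is a hypothetical helper, and the commented usage assumes the Exa and answer stages from the following sections.

```typescript
// Hypothetical helper: extract the query if the model requested web_search.
type FunctionCall = { name: string; args: Record<string, unknown> };

function pickSearchQuery(calls: FunctionCall[] | undefined): string | null {
  const call = calls?.find((c) => c.name === "web_search");
  return call ? String(call.args.query) : null;
}

// Usage, continuing from `functionCalls` above (the search and answer
// helpers stand in for the Exa and LLM-answer stages below):
//
//   const searchQuery = pickSearchQuery(functionCalls);
//   if (searchQuery) {
//     const results = await runExaSearch(searchQuery);
//     await answerWithSources(query, results);
//   } else {
//     await answerDirectly(query);
//   }
```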
Exa Instant Search
If search is needed, page text is retrieved for LLM context using Exa's search endpoint with contents:
```typescript
import Exa from "exa-js";

const exa = new Exa(process.env.EXA_API_KEY);

// Destructure `results` for use in the answer stage below.
const { results } = await exa.searchAndContents(query, {
  type: "instant",
  numResults: 5,
  text: { maxCharacters: 500 },
});
```

LLM Answer
Generates a concise answer — grounded in search results with citations, or answered directly from knowledge.
```typescript
const sources = results.map((r, i) =>
  `[${i + 1}] ${r.title}\n${r.text}`
).join("\n\n");

const response = await model.generateContentStream({
  contents: [{
    role: "user",
    parts: [{ text: `Question: "${query}"\n\nSOURCES:\n${sources}` }],
  }],
});

for await (const chunk of response.stream) {
  const text = chunk.text();
  sendToClient(text);
  sendToTTS(text);
}
```

The search-grounded prompt tells the model how to synthesize an answer from the sources.
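As an illustration only, a search-grounded prompt along these lines could be prepended to the user turn above; the wording here is an assumption, not the original prompt:

```typescript
// Illustrative grounded-answer prompt; the exact wording is an assumption.
const GROUNDED_PROMPT =
  "You are a voice assistant. Answer using ONLY the numbered SOURCES. " +
  "Keep it to 2-3 spoken sentences and cite sources inline as [1], [2]. " +
  "If the sources do not contain the answer, say so.";

// Prepend it to the user turn shown above.
const buildPrompt = (query: string, sources: string) =>
  `${GROUNDED_PROMPT}\n\nQuestion: "${query}"\n\nSOURCES:\n${sources}`;
```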
Text-to-Speech
Answer streamed as audio via a WebSocket to ElevenLabs, played back immediately as chunks arrive.
```typescript
const ws = new WebSocket(
  `wss://api.elevenlabs.io/v1/text-to-speech/${voiceId}/stream-input?model_id=eleven_flash_v2_5&output_format=mp3_22050_32`,
  { headers: { "xi-api-key": ELEVENLABS_API_KEY } }
);

ws.on("open", () => {
  // An initial space with voice settings opens the stream.
  ws.send(JSON.stringify({
    text: " ",
    voice_settings: { stability: 0.5, similarity_boost: 0.75 },
  }));
});

ws.on("message", (data) => {
  const { audio, isFinal } = JSON.parse(data);
  if (audio) sendAudioToClient(audio);
  if (isFinal) ws.close();
});
```

That's it! The model decides when to search, executes Exa queries for real-time information, and speaks the answer — all in under a second.
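The `sendToTTS` helper used in the answer stage is not shown above. One way to wire it, assuming ElevenLabs' stream-input convention that an empty text message flushes and ends the stream:

```typescript
// Each outgoing frame is a JSON message with a `text` field; an empty
// text signals end-of-input (an assumption about the stream-input protocol).
const ttsFrame = (text: string): string => JSON.stringify({ text });

// Hypothetical wiring for sendToTTS: `send` would be `ws.send` from above.
function makeTTSSender(send: (frame: string) => void) {
  return {
    chunk: (text: string) => send(ttsFrame(text)), // one frame per LLM chunk
    end: () => send(ttsFrame("")),                 // flush remaining audio
  };
}
```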
Get started with Exa for free.