How It Works

Pages first, images second. Every tile comes from a real page Exa surfaced.

Exa returns pages with image candidates. Everything after that (deduping, regex cleanup, the optional AI ranker, and moderation) runs inside this demo app, not inside the Exa API.

1. Exa searches the web

Exa /search runs in real time with type: "auto" and category hints, returning the most relevant pages for your query.

2. Demo cleans up results

Exa attaches up to 20 image candidates per page via richImageLinks. This demo then drops SVGs, icons, logos, and tiny assets, and removes duplicates by URL, alt text, and CDN asset key.

3. Demo filters (optional)

Off shows every image that survives the cleanup in step 2. Regex and AI modes are layers the demo adds: regex keeps one best image per page, while AI asks gpt-5.4-nano to rank candidates using url, alt, title, and domain only (no pixels).
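
The "one best image per page" behavior of regex mode can be sketched roughly as below. The page does not say how "best" is chosen, so this sketch assumes candidates arrive in preference order and simply keeps the first one per page that passes a junk check. The names onePerPage, pageUrl, and the isJunk predicate are hypothetical and not taken from the demo's source.

```javascript
// Sketch only: keep the first non-junk candidate for each source page.
// Assumes each candidate looks like { url, alt, pageUrl } (hypothetical shape)
// and that earlier candidates are preferable to later ones.
function onePerPage(candidates, isJunk = () => false) {
  const byPage = new Map(); // pageUrl -> chosen candidate, in insertion order
  for (const candidate of candidates) {
    if (isJunk(candidate)) continue; // skip logos, icons, and similar
    if (byPage.has(candidate.pageUrl)) continue; // page already has its pick
    byPage.set(candidate.pageUrl, candidate);
  }
  return [...byPage.values()];
}
```

A real implementation could score candidates instead (for example by size hints or alt-text quality); first-match is only the simplest stand-in.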

Exa Search Call

The core API call uses type: "auto" with contents.extras.richImageLinks, so every page comes back with candidate image URLs and their alt text. The request also sets imageLinks to the same limit of 20.

const res = await fetch("https://api.exa.ai/search", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "x-api-key": process.env.EXA_API_KEY,
  },
  body: JSON.stringify({
    query,
    type: "auto",
    numResults: 15,
    excludeDomains: BLOCKED_DOMAINS,
    contents: {
      extras: { richImageLinks: 20, imageLinks: 20 },
    },
  }),
});
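
Once the response arrives, the demo-side deduplication by URL, alt text, and CDN asset key might look something like the sketch below. The response shape is an assumption: it expects each result to carry extras.richImageLinks as an array of { url, alt } objects. The helpers cdnAssetKey and collectCandidates are hypothetical names, and the asset key here is just the filename with size suffixes removed.

```javascript
// Hypothetical helper: approximate a CDN asset key by taking the filename and
// stripping a size suffix such as "-300x200" or "@2x" before the extension.
function cdnAssetKey(imageUrl) {
  let pathname;
  try {
    pathname = new URL(imageUrl).pathname; // drops query string and fragment
  } catch {
    return imageUrl; // not an absolute URL; fall back to the raw string
  }
  const file = pathname.split("/").pop() || pathname;
  return file.toLowerCase().replace(/(-\d+x\d+|@\dx)(?=\.[a-z0-9]+$)/, "");
}

// Flatten per-page candidates and skip any whose URL, asset key, or
// non-empty alt text has already been seen.
// Assumed shape: results[i] = { url, title, extras: { richImageLinks: [{ url, alt }] } }
function collectCandidates(results) {
  const seenUrls = new Set();
  const seenAlts = new Set();
  const seenKeys = new Set();
  const out = [];
  for (const page of results) {
    const images = page.extras?.richImageLinks ?? [];
    for (const img of images) {
      const alt = (img.alt ?? "").trim().toLowerCase();
      const key = cdnAssetKey(img.url);
      if (seenUrls.has(img.url) || seenKeys.has(key)) continue;
      if (alt && seenAlts.has(alt)) continue;
      seenUrls.add(img.url);
      seenKeys.add(key);
      if (alt) seenAlts.add(alt);
      out.push({
        url: img.url,
        alt: img.alt ?? "",
        pageUrl: page.url,
        title: page.title ?? "",
      });
    }
  }
  return out;
}
```

Keying on the filename alone can merge unrelated images that share a generic name such as image.jpg, so a stricter key (host plus path, or a CDN-specific asset id) may be a better fit in practice.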

Regex Filter

This demo runs a few dozen regex rules over the image candidates Exa returns to drop obvious junk: logos, favicons, icons, SVGs, avatar-sized thumbnails, and navigation graphics. The rules are meant to be strict enough that they do not match real photos. Running them first means the nano classifier doesn't waste effort on junk, and Off mode stops showing UI pieces.

const JUNK_ALT =
  /logo|icon|avatar|banner|placeholder|brand|sprite|spacer|profile/i;

const JUNK_FILENAME =
  /(^|[^a-z])(logo|favicon|sprite|flag|ambox|placeholder|spacer|badge|icon|wordmark)/i;

const JUNK_URL_HOST =
  /^https?:\/\/(static\.licdn\.com\/aero-v1\/|.*\.bannerbear\.com\/)/i;

// tiny-asset size hints in the filename: "-40px", "80x80", "_200px".
// Group 2 captures the number; drop the image if it is < 128.
const TINY_SIZE_HINT = /(^|[^\d])(\d{1,3})(?:px|x\2)(?=[^\d]|$)/;
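
A minimal sketch of how these four rules could be combined into a single junk check follows. The patterns are copied from above, but the isJunk and lastPathSegment helpers, the candidate shape { url, alt }, and the explicit SVG extension test are assumptions for illustration. The real demo has more rules than the ones shown on this page.

```javascript
const JUNK_ALT =
  /logo|icon|avatar|banner|placeholder|brand|sprite|spacer|profile/i;
const JUNK_FILENAME =
  /(^|[^a-z])(logo|favicon|sprite|flag|ambox|placeholder|spacer|badge|icon|wordmark)/i;
const JUNK_URL_HOST =
  /^https?:\/\/(static\.licdn\.com\/aero-v1\/|.*\.bannerbear\.com\/)/i;
const TINY_SIZE_HINT = /(^|[^\d])(\d{1,3})(?:px|x\2)(?=[^\d]|$)/;
const MIN_SIZE_PX = 128; // threshold from the "drop if < 128" note

// Hypothetical helper: filename portion of a URL, without query or fragment.
function lastPathSegment(url) {
  const path = url.split(/[?#]/)[0];
  return path.slice(path.lastIndexOf("/") + 1);
}

// Returns true when a candidate looks like a UI asset rather than a photo.
function isJunk({ url, alt = "" }) {
  if (/\.svg(?:[?#]|$)/i.test(url)) return true; // assumed SVG check by extension
  if (JUNK_URL_HOST.test(url)) return true;
  if (JUNK_ALT.test(alt)) return true;
  const file = lastPathSegment(url);
  if (JUNK_FILENAME.test(file)) return true;
  const size = TINY_SIZE_HINT.exec(file);
  return size !== null && Number(size[2]) < MIN_SIZE_PX;
}
```

For example, a filename like thumb-80x80.jpg trips the size hint because 80 is below 128, while golden-gate-1600x900.jpg does not match the hint at all and is kept.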

Nano Classifier

In AI mode, the demo makes one batched call to gpt-5.4-nano that ranks every candidate by relevance to the query. The model only sees metadata (url, alt, page title, domain), not the pixels. It returns a JSON object whose picks array lists candidate indices, best first. Any index that is out of range or duplicated is dropped server-side.

// POST /api/classify: one batched request
{
  model: "gpt-5.4-nano",
  response_format: { type: "json_object" },
  messages: [
    { role: "system", content: "Rank image candidates by relevance..." },
    { role: "user", content: buildPrompt({ query, candidates, topN }) },
  ],
}

// Response
{ "picks": [3, 7, 0, 12, ...] }  // indices into candidates, best-first
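
The server-side cleanup of picks could be sketched as follows. This is an illustration under assumptions, not the demo's actual code: sanitizePicks is a hypothetical name, and it expects the raw JSON string from the model plus the number of candidates that were sent.

```javascript
// Sketch: parse the model's JSON reply and keep only valid, unique indices.
// rawContent is the model's message content as a string (assumed).
function sanitizePicks(rawContent, candidateCount, topN = candidateCount) {
  let parsed;
  try {
    parsed = JSON.parse(rawContent);
  } catch {
    return []; // malformed JSON: treat as "no picks"
  }
  const picks = Array.isArray(parsed?.picks) ? parsed.picks : [];
  const seen = new Set();
  const clean = [];
  for (const idx of picks) {
    if (clean.length >= topN) break; // stop once enough picks are kept
    // Drop non-integers and anything outside [0, candidateCount).
    if (!Number.isInteger(idx) || idx < 0 || idx >= candidateCount) continue;
    if (seen.has(idx)) continue; // drop duplicates, keeping the first occurrence
    seen.add(idx);
    clean.push(idx);
  }
  return clean;
}
```

Because the order of picks is preserved, the surviving indices can be mapped straight back onto the candidates array to get the ranked list.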

Moderation

This demo adds a lightweight safety pass that filters obviously unsafe queries and results before they reach the UI. The nano classifier is also told to drop anything inappropriate that still gets through.