0
Launch HN: Exa (YC S21) – The web as a database
Hey HN! We’re Will and Jeff from Exa (https://exa.ai). We recently launched Exa Websets, an embeddings-powered search engine designed to return exactly what you’re asking for. You can get precise results for complex queries like “all startups working on open-source developer tools based in SF, founded 2021-2025”.
Demo here - https://youtu.be/Unt8hJmCxd4We started working on Exa because we were frustrated that while LLM state-of-the-art is advancing every week, Google has gotten worse over time. The Internet used to feel like a magical information portal, but it doesn’t feel that way anymore when you’re constantly being pushed towards SEO-optimized clickbait.Websets is a step in the opposite direction. For every search, we perform dozens of embedding searches over Exa’s vector database of the web to find good search candidates, then we run agentic workflows on each result to verify they match exactly what you asked for.Websets results are good for two reasons. First, we train custom embedding models for our main search algorithm, instead of typical keyword matching search algorithms. Our embeddings models are trained specifically to return exactly the type of entity you ask for. In practice, that means if you search “startups working in nanotech”, keyword-based search engines return listicles about nanotech startups, because these listicles match the keywords in the query. In contrast, our embedding models return actual startup homepages, because these startup homepages match the meaning of the query.The second is that LLMs provide the last-mile intelligence needed to verify every result. Each result and piece of data is backed with supporting references that we used to validate that the result is actually a match for your search criteria. That’s why Websets can take minutes or even hours to run, depending on your query and how many results you ask for. For valuable search queries, we think this is worth it.Also notably, Websets are tables, not lists. You can add “enrichment” columns to find more information about each result, like “# of employees” or “does author have blog?”, and the cells asynchronously load in. This table format hopefully makes the web feel more like a database.A few examples of searches that work with Websets:- “Math blogs created by teachers from outside the US”: https://websets.exa.ai/cma1oz9xf007sis0ipzxgbamn- "research paper about ways to avoid the O(n^2) attention problem in transformers, where one of the first author's first name starts with "A","B", "S", or "T", and it was written between 2018 and 2022”: https://websets.exa.ai/cm7dpml8c001ylnymum4sp11h- “US based healthcare companies, with over 100 employees and a technical founder": https://websets.exa.ai/cm6lc0dlk004ilecmzej76qx2- “all software engineers in the Bay Area, with experience in startups, who know Rust and have published technical content before”: https://youtu.be/knjrlm1aibQYou can try it at https://websets.exa.ai/ and API docs are at https://docs.exa.ai/websets. We’d love to hear your feedback!

Comments URL: https://news.ycombinator.com/item?id=43906841
Points: 128
# Comments: 52

No comments yet.