Ollama Web Search (ollama.com)
345 points | 2 days ago | 30 comments | HN
simonw
2 days ago
[-]
I'd love to know what search engine provider they're using under the hood for this. I asked them on Twitter and didn't get a reply (yet) https://twitter.com/simonw/status/1971210260015919488

Crucially, I want to understand the license that applies to the search results. Can I store them, can I re-publish them? Different providers have different rules about this.

reply
mchiang
2 days ago
[-]
We work with search providers and ensure that we have zero data retention policies in place.

The search results are yours to own and use. You are free to do what you want with them. Of course, you are bound by the local laws of the legal jurisdiction you are in.

reply
simonw
2 days ago
[-]
OK, so it looks like you aren't willing to share which providers you are working with. Can you share the rationale for not sharing that information instead?
reply
mchiang
2 days ago
[-]
We have relationships with many providers, and I don't want to be seen as promoting or not promoting a specific provider. Some decent privacy-preserving vendors: Brave, Exa, Parallel Web Systems, DuckDuckGo, etc.

We will continue to monitor what's good to improve the output quality and results. Sometimes a combination of providers yields even better results. If I name one combination right now, then realize another combination is better and make changes, I'd have to broadcast it each time or risk misrepresenting the feature, which is to have amazing search and research capabilities that can augment models for superior output.

reply
simonw
2 days ago
[-]
The reason I care about this is that different providers have different rules about how I can use the results.

Brave: https://api-dashboard.search.brave.com/terms-of-service "Licensee shall not at any time, and shall not permit others to: store the results of the API or any derivative works from the results of the API"

Exa: https://exa.ai/assets/Exa_Labs_Terms_of_Service.pdf "You may not [...] download, modify, copy, distribute, transmit, display, perform, reproduce, duplicate, publish, license, create derivative works from, or offer for sale any information contained on, or obtained from or through, the Services, except for temporary files that are automatically cached by your web browser for display purposes"

Many of the things I want to do with a search API are blocked by these rules! So I need to know which rules I am subject to.

reply
hobofan
2 days ago
[-]
IANAL, but if Ollama says "you can do with the results whatever you want", then they would be the ones liable for any breach of TOS.

That's admittedly pretty foolish behaviour on their part and doesn't instill trust in Ollama as a service provider, but you as the end user should be in the clear.

reply
danielcampos93
2 days ago
[-]
It's pretty wild that Brave's terms of service state as much, considering their search API is entirely derived from storing the results of other search systems. https://support.brave.app/hc/en-us/articles/4409406835469-Wh.... In other words, Brave is blocking exactly what it does to Bing and Google.
reply
userbinator
2 days ago
[-]
(IANAL) You can normally safely ignore such things.
reply
simonw
2 days ago
[-]
My nightmare scenario is that I build my own crucial database of information partially derived from a search API... and then later get into legal trouble which forces me to delete that data, which is now intermingled with other information I've collected.
reply
avmich
1 day ago
[-]
So we don't have just data now, but data-obtained-by-particular-process? If you have a database, should it matter how it was gathered?
reply
simonw
1 day ago
[-]
Yes - it's important to me that I understand the source of the data I've collected and if that source results in restrictions on what I can do with that data.

Especially when I'm building databases that I want other organizations to be able to use.

Fun fact: many geocoding APIs have restrictions on what you can do with the data you get back from that geocoder - including how long you can store it and whether you are allowed to re-syndicate to other people. That's one of the reasons I like OpenCage: https://opencagedata.com/guides/how-to-compare-and-test-geoc...

reply
jrvarela56
2 days ago
[-]
I agree with you in spirit, but that’s not an answer you can apply when there’s someone else’s money at stake.
reply
dcreater
2 days ago
[-]
This information is very useful to the open source community. What's the rationale for not "building in public"? Is Ollama turning its back on the open source community? Also, why should we believe Ollama web search is better than my locally run SearXNG server?
reply
mchiang
2 days ago
[-]
Oh yes! That is why I wanted to provide the names of the providers we use. I do believe in building in the open. The web search functionality has a very generous free tier (it is behind Ollama's free account to prevent abuse) that allows you to give it a try and compare it to running a SearXNG server locally.

On making the search functionality local: we considered it and gave it a try, but had trouble with result quality and with websites blocking Ollama for behaving like a crawler. Using a hosted API, we can get results to users much faster. I'd want us to revisit this at some point. I believe in having the power of local.

reply
langitbiru
2 days ago
[-]
How much is the generous free tier? I couldn't find it on the website.
reply
sandyarmstrong
2 days ago
[-]
I believe it's free.
reply
dcreater
1 day ago
[-]
> I'd want us to revisit this at some point. I believe in having the power of local

Thanks! please do!

reply
SpaceNoodled
2 days ago
[-]
DuckDuckGo isn't a provider, it's just Bing wearing a duck hat.
reply
jpencausse
2 hours ago
[-]
I'd be curious about the legal situation under the EU AI Act, which killed the Bing API (Microsoft switched to "Grounding with Bing Search", which rephrases the content).

Yes, ephemeral queries must not retain any data, but there are other rules too; for instance, it is forbidden for commercial services (and Ollama has a pricing model?).

reply
kingnothing
2 days ago
[-]
You can say you're training an AI model and do whatever you want with it.
reply
theshrike79
2 days ago
[-]
The "Zuckerberg defence".

It's OK to pirate a massive amount of books if you're not reading or sharing them, but rather just training an AI.

reply
alex1138
2 days ago
[-]
I don't know where I stand on the issue, but it's interesting that Facebook has been known to block PB links while Google seemed to refuse requests to do the same.
reply
falcor84
2 days ago
[-]
What are peanut butter links?
reply
rezonant
2 days ago
[-]
I'm guessing Pirate Bay
reply
falcor84
2 days ago
[-]
Oh, I don't recall seeing anyone sharing Pirate Bay links; why not just share the magnet URI?

Or is it about sharing the domains of mirrors?

reply
alex1138
2 days ago
[-]
Yes

And by the way I prefer Google's approach in this particular case

Zuckerberg strikes me as far too adaptive, too fair weather

reply
userbinator
2 days ago
[-]
You should ask whether search results are even copyrightable, given that they are just a list of links.
reply
jillesvangurp
2 days ago
[-]
Instead of turning this into an academic debate about copyright, a more practical thing to do is to examine the terms and conditions of whatever API you are using. Because if you are going to end up in a conflict with a search API provider, those probably spell out pretty clearly what the provider wants to allow or not and what you are agreeing to by using their API.

Caching is a problem with many geocoding APIs (which I happen to be familiar with) and a good reason to prefer e.g. Opencage over the Google or Here geocoders because unlike most geocoder terms and conditions, Opencage actually encourages you to cache and store things; because it's all open data. The Here geocoder requires you to tell them how much data you store and will try to charge you extra for the privilege of storing and keeping data around. Because it's their data and the conditions under which they license it to you are limiting what you can and cannot do. Search APIs are very similar. Technically geocoding is a form of search (given a query, return a list of stuff).

reply
apimade
2 days ago
[-]
It is strange to launch this type of functionality without even a privacy policy in place.

It makes me wonder if they’ve partnered with another of their VC’s peers who’s recently had a cash injection, and they’re being used as a design partner/customer story.

Exa would be my bet. YC backed them early, and they've also just closed an $85M Series B. Bing would be too expensive to run freely without a Microsoft partnership.

Get on that privacy notice soon, Ollama. You're HQ'd in CA, so you're definitely subject to CCPA. (You don't need revenue to be subject to this; just being a data controller for 50,000 Californian residents is enough.)

https://oag.ca.gov/privacy/ccpa

I can imagine the reaction if it turns out the zero-retention provider backing them ended up being Alibaba.

reply
andrewmutz
2 days ago
[-]
Ollama is a business? They raised money? I thought it was just a useful open source product.

I wonder how they plan to monetize their users. Doesn't sound promising.

reply
blihp
2 days ago
[-]
There are very few recently launched pure open source projects these days (most are at least running donation-ware models or funded by corporate backers), none in the AI space that I'm aware of.
reply
brabel
2 days ago
[-]
Well, the real open source project is llama.cpp, which Ollama basically wrapped and put a nice interface on top of. Now they do more things as they want to be a real business, but llama.cpp now does most of the things people wanted from something like Ollama, like serving a REST API compatible with the OpenAI API, and downloading and managing local LLMs, while remaining an actual open source project without VC money, as far as I know.
reply
cantor_S_drug
2 days ago
[-]
https://codingwithintelligence.com/p/meta-gets-behind-open-s...

This is a new umbrella project for llama.cpp and whisper.cpp. The author, Georgi Gerganov, also announced he's forming a company for the project, as he raised money from Nat Friedman (former GitHub CEO) and Daniel Gross (ex-YC AI, ex-Apple ML).

Not sure if this is just a good faith support.

reply
coolspot
2 days ago
[-]
They are former Docker employees running the Docker playbook.
reply
Cheer2171
2 days ago
[-]
[flagged]
reply
cristoperb
2 days ago
[-]
Ollama is a Y Combinator startup, so I guess they have to find some ROI at some point. [1]

I personally found Ollama to be an easy way to try out local LLMs and appreciate them for that (and I still use it to download small models on my laptop and phone (via termux)), but I've long since switched to llama.cpp + llama-swap[2] on my dev desktop. I download whatever GGUFs I want from Hugging Face and just run `git pull` and `cmake --build build --config Release` from my llama.cpp directory whenever I want to update.

1: https://www.ycombinator.com/companies/ollama 2: https://github.com/mostlygeek/llama-swap

reply
Havoc
2 days ago
[-]
They launched a hosted platform a while back.
reply
lynnharry
2 days ago
[-]
Until I saw your reply I had thought this post is about OpenAI lol.
reply
MisterBiggs
2 days ago
[-]
I was hoping for more details about their implementation. I saw Ollama as the open source, platform-agnostic tool, but I worry their recent posturing is going against that.
reply
jmorgan
2 days ago
[-]
We did consider building functionality into Ollama that would fetch search results and website contents using a headless browser or similar. However, we had a lot of worries about result quality, and also about IP blocking from Ollama's crawler-like behavior. A hosted API felt like a fast path to get results into users' context windows, but we are still exploring the local option. Ideally you'd be able to stay fully local if you want to (even when using capabilities like search).
reply
wirybeige
2 days ago
[-]
Their GUI is closed-source. If someone wants an easy-to-use, easy-to-set-up app, they may as well use LM Studio, which doesn't pretend to be OSS. Or use RamaLama, which basically containerizes LLMs and the relevant bits, pretty similar to Ollama. Or just go back to basics and use llama.cpp or vLLM.
reply
dcreater
2 days ago
[-]
Their posture has continually been getting worse. It's deceptive, and I've expunged it from all my systems.
reply
Tepix
2 days ago
[-]
Looks like Ollama is focusing more and more on non-local offerings. Also, their performance is worse than, say, vLLM's.

What's a good Ollama alternative (for keeping 1-5x RTX 3090 busy) if you want to run things like open-webui (via an OpenAI compatible API) where your users can choose between a few LLMs?

reply
Ey7NFZ3P0nzAe
2 days ago
[-]
I've heard about llama-swap and vLLM.
reply
sorenjan
2 days ago
[-]
I had no idea they had their own cloud offering, I thought the whole point of Ollama was local models? Why would I pay $20/month to use small inferior models instead of using one of the usual AI companies like OpenAI or even Mistral? I'm not going to make an account to use models on my own computer.
reply
mchiang
2 days ago
[-]
Fair question. Some of the supported models are large and wouldn't fit on most local devices. This is just the beginning, and Ollama doesn't need to exclude cloud-hosted frontier models either, given the relationships we've built with model providers. We just have to be mindful that Ollama stands with developers, and solve their needs.

https://ollama.com/cloud

reply
sorenjan
2 days ago
[-]
> Some of the supported models are large and wouldn't fit on most local devices.

Why would I use those models on your cloud instead of using Google's or Anthropic's models? I'm glad there are open models available and that they get better and better, but if I'm paying money to use a cloud API I might as well use the best commercial models, I think they will remain much better than the open alternatives for quite some time.

reply
mchiang
2 days ago
[-]
When we started Ollama, we were told open source (open-weight wasn't a term back then) would always be inferior to the closed-source models. That was 2 years ago (Ollama's birthday is July 18th, 2023).

Fast forward to now: open models are quickly catching up, at a significantly lower price point for most uses, and they can be customized for specific tasks instead of being general purpose. For general-purpose use, the closed models are absolutely dominating right now.

reply
typpilol
2 days ago
[-]
Yeah, a lot of people don't realize you could spend $2k on a 5090 to run some of the large models.

Or spend $20 a month for models even a 5090 couldn't run, and not have to pay for your own electricity, hardware, maintenance, updates, etc.

reply
oytis
2 days ago
[-]
20 a month for a commercial model is price dumping financed by investors. For ollama it's hopefully a sustainable price.
reply
theshrike79
2 days ago
[-]
The $20-a-month models definitely aren't sustainable.

This is why everyone needs to try every flavour and speedrun building all the tools they need before the infinite money faucets are turned off.

At some point companies will start raising prices or moving toward per-token pricing (which is sustainable, but expensive).

reply
gunalx
2 days ago
[-]
Depends. API pricing from OSS-model inference providers basically has to be sustainable, because of competition in the space.

And with that in mind, I definitely don't use more than a couple of bucks a month in API refills (not that I'm really a power user or anything).

So if you consider the $20 to be balanced between power and non-power users, and with the existing rate limits, it's probably not that far off being profitable, at least on the pure inference side.

reply
ineedasername
2 days ago
[-]
A person can use Google's Gemma models on Ollama's cloud and possibly pay less. And have more quality control that way (and other types of control, I guess), since you don't need to wonder whether a recent model update or load-balancing throttling impacted results. Your use case doesn't generalize.
reply
disiplus
2 days ago
[-]
Hi, to me this sounds like you are going in the direction of OpenRouter.
reply
kordlessagain
2 days ago
[-]
You make an account to use their hosted models AND to have them available via the Ollama API LOCALLY. I'm spending $100 on Claude and $200 on GPT-5, so $20 is NOTHING and totally worth it to have access to:

Qwen3 235b

Deepseek 3.1 671b (thinking and non thinking)

Llama 3.1 405b

GPT OSS 120b

Those are hardly "small inferior models".

What is really cool is that you can set Codex up to use Ollama's API and then have it run tools on different models.

reply
brabel
2 days ago
[-]
How does it compare to Azure AI, which has all the best models and doesn't require signing up with anyone other than Azure itself?
reply
mrheosuper
2 days ago
[-]
If you are on the $100 Claude tier, what makes you think the $20 Ollama tier is enough for you?
reply
theshrike79
2 days ago
[-]
If your workflow is general enough, you can (and should) switch between models. They all have different styles and blind spots.

Like, I had Codex + gpt-5-codex (20€ tier) build me a network connectivity monitor for my very specific use case.

It worked, but made some really weird choices. I gave it to Claude Code (20€ tier again) and it immediately found a few issues and simplifications.

reply
kordlessagain
2 days ago
[-]
Right. And then there's using an MCP tool that instantiates another agent, except it uses a different model.

Here's a good example: summarizing a page of content. The content might be pulled down by an agentic crawler, so using a local model to summarize is great. It's fast, costs nothing (or not much), and I can run it without guardrails since it doesn't represent a cost risk if it runs out of control.
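
A minimal sketch of that pattern with the Ollama Python client; the model tag and prompt are my assumptions, not from the comment:

    import requests
    from ollama import chat

    def summarize_url(url: str) -> str:
        # Fetch the page (a real agentic crawler would extract the main text)
        page = requests.get(url, timeout=10).text
        # Summarize with a small local model; cheap enough to run in a loop
        response = chat(
            model="qwen3:4b",  # assumed; any small local model works
            messages=[{
                "role": "user",
                "content": "Summarize this page in three bullet points:\n\n" + page[:8000],
            }],
        )
        return response.message.content

    print(summarize_url("https://example.com"))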

reply
kordlessagain
2 days ago
[-]
To clearly articulate and repeat what makes the $20 Ollama tier valuable to me:

1. Access to specific large open models (Qwen3 235b, Deepseek 3.1 671b, Llama 3.1 405b, GPT OSS 120b)

2. Having them available via the Ollama API LOCALLY

3. The ability to set up Codex to use Ollama's API for running tools on different models

I mean, really, nothing else is even close at this point and I would rather eat a bug than use Microsoft's cloud.

reply
n4bz0r
2 days ago
[-]
Has anyone tried the hosted models? How do they compare to GPT-5?

I was thinking about trying ChatGPT Pro, but I seem to have completely missed that they bumped the price from $100 to $200. It was $100 just a while ago, right? Before GPT-5, I assume.

reply
pama
2 days ago
[-]
No, it was never $100 for ChatGPT Pro.
reply
dcreater
2 days ago
[-]
Yeah, it's been a steady pivot to profitable features. Wonderful to see them build a reputation through FOSS and a codebase from free labor, then cash in.
reply
kergonath
2 days ago
[-]
As long as the software that runs locally gets maintained (and ideally improved, though if it is not I’ll simply move to something else), I find it difficult to be angry. I am more annoyed by software companies that offer a nerfed "community edition" whose only purpose is to coerce people into buying the commercial version.
reply
dcreater
2 days ago
[-]
> software companies that offer a nerfed "community edition" whose only purpose is to coerce people into buying the commercial version.

This is the play. It's only a matter of time till they do it. Investors will want their returns.

reply
Imustaskforhelp
2 days ago
[-]
Pardon me, but is Ollama a company, though? I didn't know that, actually.

And are they VC funded? Are they funded by Y Combinator or anyone else?

I just thought it was a project by someone to write something similar to Docker but for LLMs, and that was its pitch for a really, really long time, I think.

reply
dcreater
2 days ago
[-]
Yup, that's exactly what I thought as well. I also found out late, and to much surprise, that it's a VC-backed startup: https://www.ycombinator.com/companies/ollama
reply
Imustaskforhelp
2 days ago
[-]
Oh well. Enshittification is close then, I suppose :<

Gotta pay those VCs their juicy returns somehow.

reply
all2
2 days ago
[-]
What sort of monetization model would you like to see? What model would you deem acceptable?
reply
dcreater
2 days ago
[-]
Ollama, the local inference platform, stays completely local, maintained by a non-profit org with dev time contributed by a for-profit company. That company can be VC backed, can make its own cloud inference platform, and can use Ollama as its backend and as a platform to market, etc. But keep that a separate product (not named Ollama).

This is almost exactly how DuckDB/MotherDuck functions, and I think they're doing an excellent job.

EDIT: grammar and readability

reply
depingus
2 days ago
[-]
You might want to check out RamaLama. It's a container-based replacement for Ollama by the same folks who brought us Podman.

I tried it a while back and was very surprised to find that simply running `uvx ramalama run deepseek-r1:1.5b` just worked. I'm on Fedora Silverblue with nothing layered on the ostree. Before RamaLama, getting llama.cpp working with my GPU was a major PITA.

https://github.com/containers/ramalama

reply
troyvit
2 days ago
[-]
If I were them I'd go whole-hog on local models and:

* Work with somebody like System76 or Framework to create great hardware systems that come with their ecosystem preinstalled.

* Build out a PaaS, perhaps in partnership with an existing provider, that makes it easy for anybody to do what Ollama search does. I'm more than half certain I could convince our cash-strapped organization to ditch Elasticsearch for that.

* Partner with Home Assistant, get into home automation and wipe the floor with Echo and its ilk (yeah basically resurrect Mycroft but add whole-house automation to it).

Each of those is half-baked, but it also took me 7 minutes to come up with them, and they seem more in line with what Ollama tries to represent than a pure cloud play using low-power models.

reply
Cheer2171
2 days ago
[-]
Have the Ollama server support auth / API keys (closed as out of scope) and monetize the way everyone else does, around SSO.
reply
Cheer2171
2 days ago
[-]
What reputation? People who actually know how to develop software or work with LLMs know ollama is a child's tricycle and to run the hell away from what is just a buggy shell around other people's inference engines.

Ollama is beloved by people who know how to write 5 lines of python and bash to do API calls, but can't possibly improve the actual app.

reply
dcreater
2 days ago
[-]
That's what I thought as well: that it was for people like me who aren't professional SWEs, and thus I'm sad to see them go this way. But what I've found is that people are using it for "on-prem" style deployment. I have no idea if this is common, but I wouldn't be surprised, given the reality of AI startups plus the abundance of Ollama in training data leading to a relatively greater vibe-coding success rate.
reply
Cheer2171
2 days ago
[-]
If people are using ollama for on prem deployment, for anything more than single user hobby use or backend for a UI proof of concept, then run as far away as you can from those people. Major red flag, immediately disqualifying.
reply
hadlock
2 days ago
[-]
What's your preferred method to do on prem deployment today?
reply
ricardobeat
2 days ago
[-]
For models you can't run locally like gpt-oss-120b, deepseek or qwen3-coder 480b. And a way for them to monetize the success of Ollama.
reply
zmmmmm
2 days ago
[-]
a lot of "local" models are still very large to download and slow to run on regular hardware. I think it's great to have a way to evaluate them cheaply in the cloud before deciding to pull down the model to run locally.

At some level it's also more of a principle that I could run something locally that matters rather than actually doing it. I don't want to become dependent on technology that someone could take away from me.

reply
coffeecoders
2 days ago
[-]
On a slightly related note-

I've been thinking about building a home-local "mini-Google" that indexes maybe 1,000 websites. In practice, I rarely need more than a handful of sites for my searches, so it seems like overkill to rely on full-scale search engines for my use case.

My rough idea for architecture:

- Crawler: A lightweight scraper that visits each site periodically.

- Indexer: Convert pages into text and create an inverted index for fast keyword search. Could use something like Whoosh.

- Storage: Store raw HTML and text locally, maybe compress older snapshots.

- Search Layer: Simple query parser to score results by relevance, maybe using TF-IDF or embeddings.

I would do periodic updates and build a small web UI to browse.
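
For the indexer and search layer, a minimal sketch with Whoosh (the library mentioned above); the directory, schema fields, and toy crawl loop are illustrative assumptions:

    import os
    import requests
    from whoosh import index
    from whoosh.fields import Schema, ID, TEXT
    from whoosh.qparser import QueryParser

    # Inverted index on disk; Whoosh ranks with BM25F, a TF-IDF relative
    schema = Schema(url=ID(stored=True, unique=True), content=TEXT(stored=True))
    os.makedirs("indexdir", exist_ok=True)
    ix = index.create_in("indexdir", schema)

    # Toy crawl: real code would strip HTML, respect robots.txt, and recurse
    writer = ix.writer()
    for url in ["https://example.com"]:
        writer.update_document(url=url, content=requests.get(url, timeout=10).text)
    writer.commit()

    # Search layer: parse a keyword query and rank matching pages
    with ix.searcher() as searcher:
        q = QueryParser("content", ix.schema).parse("example domain")
        for hit in searcher.search(q, limit=10):
            print(hit["url"])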

Anyone tried it or are there similar projects?

reply
andai
2 days ago
[-]
Have you ever looked at Common Crawl dumps? I did a bit of data mining and holy cow is 99.99% of the web crap. Spam, porn, ads, flame wars, random blogs by angsty teens... I understand it has historical and cultural value — and maybe literary value, in a Douglas Coupland kind of way — but for my purposes, there was very little here that I considered of interest.

Which was very encouraging to me, because it implies that indexing the Actually Important Web Pages might even be possible for a single person on their laptop.

Wikipedia, for comparison, is only ~20GB compressed. (And even most of that is not relevant to my interests, e.g. the Wikipedia articles related to stuff I'd ever ask about are probably ~200MB tops.)

reply
harias
2 days ago
[-]
YaCy (https://yacy.net) can do all this, I think. Cloudflare might block your IP pretty soon though if you try to crawl.
reply
fabiensanglard
2 days ago
[-]
Have you ever tried https://marginalia-search.com ? I love it.
reply
UltimateEdge
2 days ago
[-]
Drew DeVault tried building something similar to this under the name SearchHut, but the project was abandoned [1]. I tried hacking on it a while ago (since it's built on Postgres and a bit of Go), but I ran out of steam trying to understand the Postgres RUM extension.

[1]: https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...

reply
msephton
2 days ago
[-]
Perhaps not quite solving your problem, but I have a handful of domain-specific Google CSE (Custom Search Engine) that limit the results to predefined websites. I summon them from Alfred with short keywords when I'm doing interest-specific searches. https://blog.gingerbeardman.com/2021/04/20/interest-specific...
reply
mrkeen
2 days ago
[-]
Yep. I built a crawler, an indexer/query processor, and an engine responsible for merging/compacting indexes.

Crawling was tricky. Something like stackoverflow will stop returning pages when it detects that you're crawling, much sooner than you'd expect.

reply
_flux
2 days ago
[-]
I think a lot of the time an exhaustive searchable index of just what I've browsed would be enough, though I suppose a refresh feature would be useful.
reply
matsz
2 days ago
[-]
You could take a look at the leaked Yandex source code from a few years ago. I'd expect their architecture to be decent enough.
reply
efilife
2 days ago
[-]
Where?
reply
matsz
2 days ago
[-]
I'm not sure if linking to those files is allowed by HN, and it could potentially expose me to lawsuits.

However, searching for "Yandex git sources magnet link" might help.

reply
bryanhogan
2 days ago
[-]
Reminds me of building an Obsidian vault with all the content in markdown form. There are also plugins to show vault results when doing a Google search, making notes within your vault show up before external websites.
reply
computerex
2 days ago
[-]
Kind of. I made ainews247.org, which crawls certain sites and filters content so it's AI-specific and valuable. I think it's a really good idea.
reply
toephu2
2 days ago
[-]
With LLMs why do you even need a mini-Google?
reply
andai
2 days ago
[-]
For my LLM to use! I want sources, excerpts, cross-referencing...
reply
mrkeen
2 days ago
[-]
Any tips on local/enterprise search?

I like using ollama locally and I also index and query locally.

I would love to know how to hook ollama up to a traditional full-text-search system rather than learning how to 'fine tune' or convert my documents into embeddings or whatnot.

reply
ineedasername
2 days ago
[-]
You can use Solr: very good full-text search, and it has an MCP integration. That's sufficient on its own and straightforward to set up:

https://github.com/mjochum64/mcp-solr-search

A slightly heavier lift, but only slightly, would be to also use Solr to store a vectorized version of your docs and simultaneously do vector-similarity search; Solr has built-in kNN support for it. A pretty good combo to get good quality from both semantic and full-text search.

Though I’m not sure if it would be relatively similar work to do solr w/ chromadb, for the vector portion, and marry the result stewards via llm pixie dust (“you are the helpful officiator of a semantic full-text matrimonial ceremony” etc). Also not sure the relative strengths of chromadb vs solr on that- maybe scales better for larger vector stores?

reply
all2
2 days ago
[-]
Docling might be a good way to go here. Or consider one of the existing full-text search engines like Typesense.
reply
andai
2 days ago
[-]
I added search to my LLMs years ago with the python DuckDuckGo package.
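
For reference, that unofficial package looks roughly like this (it has since been renamed to ddgs; the result keys are from the versions I've used, so treat this as illustrative):

    from duckduckgo_search import DDGS

    # Each result is a dict with "title", "href", and "body" keys
    for r in DDGS().text("ollama web search", max_results=5):
        print(r["title"], r["href"])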

However, I found that Google gives better results, so I switched to that. (I forget exactly, but I had to set something up in a Google dev console for that.)

I think the DDG one is unofficial, and the Google one has limits (so it probably wouldn't work well for deep-research-type stuff).

I mostly just pipe it into LLM APIs. I found that "shove the first few Google results into GPT, followed by my question" gave me very good results most of the time.

It of course also works with Ollama, but I don't have a very good GPU, so it gets really slow for me on long contexts.

reply
ivape
2 days ago
[-]
How do you meaningfully use it without using scraping APIs? Aren't the official APIs severely limited?
reply
selcuka
2 days ago
[-]
Google Programmable Search Engine [1] is pretty good if your needs are within their usage limits.

[1] https://programmablesearchengine.google.com/about/

reply
andai
2 days ago
[-]
That's the one I use, yeah! You set it up here:

https://programmablesearchengine.google.com/controlpanel/cre...

And then it's just a GET:

    import os
    import requests

    url = "https://customsearch.googleapis.com/customsearch/v1"

    def search(query):
        params = {
            "q": query,
            # cx is the search engine ID; key is the API key
            "cx": os.getenv('GOOGLE_SEARCH_API_ID'),
            "key": os.getenv('GOOGLE_SEARCH_API_KEY')
        }
        response = requests.get(url, params=params)
        response.raise_for_status()
        # "items" is absent when there are no results
        return response.json().get("items", [])
reply
drnick1
2 days ago
[-]
What "Ollama account?" I am confused, I thought the point of Ollama was to self-host models.
reply
mchiang
2 days ago
[-]
To use additional features or Ollama's cloud-hosted models, you can sign up for an Ollama account.

For starters, this is completely optional; everything can stay completely local. An account also lets you publish your own models to ollama.com to share with others.

reply
thomastraum
2 days ago
[-]
I am just working on a tool that uses web search, iterating over different providers.

OpenAI, xAI, and Gemini all suffer from not being allowed on their respective competitors' sites.

In some quick tests, this search worked well for me on YouTube videos, which OpenAI's web search can't access. It kind of failed on X, but sometimes returned OK, relevant results. Definitely hit and miss, but on average good.

reply
riskable
2 days ago
[-]
WTF is going to happen to Google's ad revenue if every PC has an AI that can perform searches on the user's behalf?
reply
onesociety2022
2 days ago
[-]
How is that any different than someone installing an ad blocker in their browser? Arguably ad blocker is much simpler technology than running a local LLM and has been available for years now. And yet Google’s ad revenue seems to have remained unaffected.
reply
hadlock
2 days ago
[-]
It's been demonstrated that as ChatGPT usage goes up, traffic to sites dependent on SEO search ranking has gone down, roughly proportionally, every month over the last ~18 months. ChatGPT is free and fast and requires no technical know-how. Installing an ad blocker requires knowing what one is, plus the time and energy to install a browser plugin. Pretty much everyone I know thinks free online ChatGPT-type products are an absolute existential threat to Google's ad dominance. Even mediocre LLMs provide a vastly better experience than ad-choked pages linking to ad-choked, SEO-optimized websites serving (largely) Google's own ads.
reply
tartoran
2 days ago
[-]
They'll have to squeeze it all from Youtube!
reply
system2
2 days ago
[-]
There are millions of websites, and a local LLM cannot scrape all of them to make sense of them. Think about it: OpenAI can do it because they spend millions training their systems.

Many sites have hidden sitemaps that cannot be found unless submitted to Google directly (not even listed in robots.txt most of the time). There is no way a local LLM can keep up with an up-to-date internet.

reply
riskable
2 days ago
[-]
No, the AI will just use Google, DDG, Bing, etc. on behalf of the user (behind the scenes). The ads will be shown to the AI, which will ignore them.
reply
cantor_S_drug
2 days ago
[-]
I think because Google knows traditional search is going to die, they will aggressively push ads on traditional search to extract as much money as possible until they figure out newer ways of making money.
reply
thimabi
2 days ago
[-]
They can always pivot to their Search-via-API business :)

It takes lots of servers to build a search engine index, and there’s nothing to indicate that this will change in the near future.

reply
Havoc
2 days ago
[-]
That’s easy - they’re just going to ram the ads down your throat inline via Gemini
reply
andrewmcwatters
2 days ago
[-]
google.com/sorry
reply
frabonacci
2 days ago
[-]
This is a nice first step - web search makes sense, and it’s easy to imagine other tools being added next: filesystem, browser, maybe even full desktop control. Could turn Ollama into more than just a model runner. Curious if they’ll open up a broader tool API for third-party stuff too
reply
yggdrasil_ai
2 days ago
[-]
I wish they would instead focus on local tool use. I could just use my own web search via the Brave API.
reply
parthsareen
2 days ago
[-]
Hey! Author of the blog post here; I also work on Ollama's tool calling. There has been a big push on tool calling over the last year to improve the parsing. What issues are you running into with local tool use? What models are you using?
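
For a quick sanity check, a minimal tool-calling probe with the Ollama Python client looks roughly like this; the model tag is an assumption (try whichever model is giving you trouble):

    from ollama import chat

    def add(a: int, b: int) -> int:
        """Toy tool for checking whether a model emits tool calls at all."""
        return a + b

    response = chat(
        model="qwen3",  # assumed; substitute your model
        messages=[{"role": "user", "content": "What is 123456 + 654321? Use the tool."}],
        tools=[add],
    )
    # A tool-capable model should populate tool_calls instead of answering inline
    print(response.message.tool_calls)
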
reply
vrzucchini
2 days ago
[-]
Hey, unrelated to the question you're answering but where do I see the rate limits for free and paid tiers?
reply
yggdrasil_ai
1 day ago
[-]
I went back and had another look at my implementation, and got it to work. Sorry I was mistaken!
reply
throwaway12345t
2 days ago
[-]
Do they pull their own index like Brave, or are they using Bing/Google in the background?
reply
tripplyons
2 days ago
[-]
Based on the fact that there are very few up-to-date English-language search indexes (Google, Bing, and Brave if you count it), it must be incredibly costly. I doubt they are maintaining their own.
reply
throwaway12345t
2 days ago
[-]
We need more indexes
reply
tripplyons
2 days ago
[-]
More competition in the space would be great for me as a consumer, but the problem is that the high fixed costs make starting an index difficult.
reply
andai
2 days ago
[-]
I've been wondering: can't this be done P2P? Didn't we solve most of the technical problems in the late 90s / early 2000s, and then just abandon that entire way of thinking for some reason?

If many thousands of people care about having a free / private / distributed search engine, wouldn't it make sense for them to donate 1% of their CPU/storage/network to an indexer/DB that they then all benefit from?

reply
hombre_fatal
2 days ago
[-]
Well, flesh it out more and it doesn't sound solved at all.

How do you make it trustless? How do you fetch/crawl the index when it's scattered across arbitrary devices? How do you index the decentralized index? What is actually stored on nodes? When you want to do something useful with the crawled info, what does that look like?

reply
andai
1 day ago
[-]
I think you could do it hierarchically, and with redundancy.

You'd figure out a replication strategy based on observed reliability (Lindy effect + uptime %).

It would be less "5 million flaky randoms" and more "5,000 very reliable volunteers".

Though for the crawling layer you can and should absolutely utilize 5 million flaky randoms. That's actually the holy grail of crawling. One request per random consumer device.

I think the actual issue wouldn't be technical but the selection: how do you decide what's worth keeping?

You could just do it on a volunteer basis. One volunteer really likes Lizard Facts and volunteers to host that. Or you could dynamically generate the "desired semantic subspace" based on the search traffic...

reply
andai
1 day ago
[-]
Let me illustrate this with a more poetic example.

In 2015, I was working at a startup incubator hosted inside of an art academy.

I took a nap on the couch. I was the only person in the building, so my full attention was devoted to the strange sounds produced by the computers.

There were dozens of computers there. They were all on. They were all wasting hundreds of watts. They were all doing essentially nothing. Nothing useful.

I could feel the power there. I could feel, suddenly, all the computers in a thousand mile radius. All sitting there, all wasting time and energy.

reply
ineedasername
2 days ago
[-]
Do we know what OpenAI uses? Have they built their own, or do they piggyback on moneybags $MS and Bing?
reply
tripplyons
2 days ago
[-]
reply
pzo
2 days ago
[-]
Perplexity added an API today; I got the following email:

> Dear API user, We’re excited to launch the Perplexity Search API — giving developers direct access to the same real-time, high-quality web index that powers Perplexity’s answers.

reply
tripplyons
4 hours ago
[-]
This doesn't mean they run their own index. They are likely just reselling access to whatever index they are using for their product.
reply
JumpCrisscross
2 days ago
[-]
> We need more indexes

Not particularly. Indexes are sort of like railroads. They're costly to build and maintain. They have significant external costs. (For railroads, in land use. For indexes, in crawler pressure on hosting costs.)

If you build an index, you should be entitled to a return on your investment. But you should also be required to share that investment with others (at a cost to them, of course).

reply
chrisshroba
2 days ago
[-]
Are the rate limits documented somewhere?
reply
Havoc
2 days ago
[-]
Was looking too and couldn't see them.
reply
lgats
1 day ago
[-]
It seems not, not even for the pro plan. Just "generous".
reply
jerrygoyal
2 days ago
[-]
I'm looking to use web search in production, but they haven't mentioned the price. The only thing mentioned is $20/month, but how much quota does that include?
reply
mchiang
2 days ago
[-]
Sorry about this. We are working really hard on providing usage-based pricing.

During the preview period, we want to start by offering a $20/month plan tailored for individuals, and we are monitoring usage and making changes as people hit rate limits, so we can satisfy most use cases and be generous.

reply
enoch2090
2 days ago
[-]
That's the essence of these services: they never explicitly mention the quota, or they secretly lower it at some point.
reply
anonyonoor
2 days ago
[-]
I know it might be a security nightmare, but I still want to see an implementation of client-side web search.

Like a full search engine that can visit pages on your behalf. Is anyone building this?

reply
apimade
2 days ago
[-]
AgenticSeek, or you can get pretty far with local qwen and Playwright-Stealth or SeleniumBase integrated directly into your Chrome (running with Chrome DevTools Protocol enabled).
reply
not_really
2 days ago
[-]
Sounds like a good way to get your IP flagged by Cloudflare.
reply
dumbmrblah
2 days ago
[-]
What is the data retention policy for the free account versus the cloud account?
reply
kordlessagain
2 days ago
[-]
I have an MCP tool that uses SerpApi and it works quite well.
reply
lxgr
2 days ago
[-]
Does this work with (tool use capable) models hosted locally?
reply
parthsareen
2 days ago
[-]
Hi, author of the post here. Yes, it does! The "build a search agent" example can be used with a local model. I'd recommend trying qwen3 or gpt-oss.
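
A sketch of that agent loop with a local model, based on my reading of the blog post; the `ollama.web_search` helper (which needs an Ollama API key) and the tool-message shape are assumptions here, not verified signatures:

    import ollama

    messages = [{"role": "user", "content": "What did Ollama announce this week?"}]
    response = ollama.chat(model="qwen3", messages=messages, tools=[ollama.web_search])
    messages.append(response.message)

    for call in response.message.tool_calls or []:
        # The hosted search runs remotely; the chat model stays local
        results = ollama.web_search(**call.function.arguments)
        messages.append({"role": "tool", "content": str(results),
                         "tool_name": call.function.name})

    final = ollama.chat(model="qwen3", messages=messages)
    print(final.message.content)
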
reply
lxgr
2 days ago
[-]
Very cool, thank you!

Looking forward to trying it with a few shell scripts (via the llm-ollama extension for the amazing Python 'llm') or Raycast (the lack of web search support for Ollama has been one of my biggest reasons for preferring cloud-hosted models).

reply
parthsareen
2 days ago
[-]
Since we shipped web search with gpt-oss in the Ollama app, I've personally been using that a lot more, especially for research-heavy tasks that I can shoot off. Plus, with a 5090 or the new Macs, it's super fast.
reply
yggdrasil_ai
2 days ago
[-]
I don't think Ollama officially supports any proper tool use via the API.
reply
lxgr
2 days ago
[-]
Huh, I was pretty sure I had used it before, but maybe I'm confusing it with some other backend for the Python 'llm' tool.

Is https://ollama.com/blog/tool-support not it?

reply
all2
2 days ago
[-]
It depends on the model. Deepseek-R1 says it supports tool use, but the system prompt template does not have the tool-include callouts. YMMV
reply
kgeist
2 days ago
[-]
I use Llama.cpp with Tavily search (they give free credits each month). LibreChat has built-in support for it. No Ollama needed.
reply
tempodox
2 days ago
[-]
Is the web search also integrated into the locally running native ollama binaries, and if so, how can I use it?
reply
chungus42
2 days ago
[-]
My biggest gripe with small models has been the inability to keep them informed with new data. Seems like this at least eases the process.
reply
mchiang
2 days ago
[-]
I was pleasantly surprised by the model improvements when testing this feature.

For smaller models, it can augment them with the latest data fetched from the web, solving the problem of smaller models lacking specific knowledge.

For larger models, it can start functioning as deep research.

reply
alberth
2 days ago
[-]
Dumb question: is this affiliated with Meta?

Or is this just someone trying to monetize Meta open source models?

reply
mchiang
2 days ago
[-]
No, Ollama is its own, separate project. You can check it out via GitHub:

https://github.com/ollama/ollama

reply
nextworddev
2 days ago
[-]
Can someone tell me how much this costs and how it compares to Tavily etc.?
reply
typpilol
2 days ago
[-]
Tavily gives you 1k free requests a month.

Even with heavy AI usage I'm only at like 400/1000 for the month.

reply
orliesaurus
2 days ago
[-]
Exa, Tavily or Firecrawl. Which one is it?
reply
Cheer2171
2 days ago
[-]
Your regular reminder that you don't need Ollama to get a quick chat engine on the command line; you can just do this with pretty much any major model on Hugging Face:

    pip install transformers
    transformers chat Qwen/Qwen2.5-0.5B-Instruct

reply
bigyabai
2 days ago
[-]
> Create an API key from your Ollama account.

Dead on arrival. Thanks for playing, Ollama, but you've already done the legwork in obsoleting yourself.

reply
disiplus
2 days ago
[-]
They had to start earning money at some point.
reply
bigyabai
2 days ago
[-]
At some point you have to earn user trust. If Ollama won't be the open source Ollama API provider, there are several endpoint-compatible alternatives happy to replace them.

From where I'm standing, there's not enough money in B2C GPU hosting to make this sort of thing worthwhile. Features like this paid search API really hammer home how difficult it is to provide value around that proposition.

reply
timothymwiti
2 days ago
[-]
Does anyone know if the Python and JavaScript examples on the blog work without an Ollama account?
reply
mmaunder
2 days ago
[-]
So, use ollama to avoid cloud models and services, but ollama sells cloud models and services. The dissonance makes my teeth hurt.
reply
tripplyons
2 days ago
[-]
Just set up SearXNG locally if you want a free/local web search MCP: https://gist.github.com/tripplyons/a2f9d8bd553802f9296a7ec3b...
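
If you'd rather skip MCP and call it directly, a local SearXNG instance exposes a JSON API once you enable the "json" format under search.formats in settings.yml (the port here assumes a default-ish local setup):

    import requests

    def searxng_search(query, limit=5):
        resp = requests.get(
            "http://localhost:8888/search",
            params={"q": query, "format": "json"},
            timeout=10,
        )
        resp.raise_for_status()
        # Each result carries "title", "url", and "content" fields
        return resp.json()["results"][:limit]

    for r in searxng_search("ollama web search"):
        print(r["title"], r["url"])
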
reply
disiplus
2 days ago
[-]
That's what I have, together with Open WebUI and gpt-oss-120b. It works reasonably well, but sometimes the searches are slow.
reply
tripplyons
2 days ago
[-]
You can try removing search engines that fail or reducing their timeout setting to something faster than the default of a few seconds.
reply
disiplus
2 days ago
[-]
SearXNG is fast; it's mostly the code that triggers the searches. Because my daily driver is ChatGPT, I still haven't tried to tweak it.
reply
tripplyons
2 days ago
[-]
I haven't needed to tweak mine for similar reasons, but I'm surprised to hear that the "code that triggers the searches" is slow. Are you referring to something in Open WebUI?
reply
disiplus
2 days ago
[-]
It's the tools you can install from Open WebUI:

https://openwebui.com/tools

reply
mchiang
2 days ago
[-]
I haven't tried SearXNG personally. How does it compare to Ollama's web search in terms of the search content returned?
reply
tripplyons
2 days ago
[-]
I have no idea how well Ollama's works, but I haven't run into any issues with SearXNG. The alternatives aren't worth paying for in any use case I've encountered.
reply