Launch HN: Sitefire (YC W26) – Automating actions to improve AI visibility

21 points | 2 hours ago | 6 comments

Hi HN! We're Vincent and Jochen from sitefire (https://sitefire.ai). Our platform makes it easy for brands to improve their visibility in AI search.

We’ve been working together for years and have backgrounds in RL/optimization at Stanford and software engineering. We came to this idea after speaking with marketing teams who were seeing declining traffic due to Google’s AI Overviews and didn’t know what to do.

This space can feel esoteric. There are many case studies but few actual studies, and a constant battle against myths (e.g. you need a llms.txt vs. you don't need a llms.txt) and "GEO hacks". We try to be more data-driven. We also try to be bolder and build a system that not only monitors, but actually improves traffic from AI search.

While Google performs a single search, AI search engines expand the user prompt into 3-10 fan-out queries. The sourced pages are ranked using a classified algorithm similar to Reciprocal Rank Fusion (RRF). Finally, the LLMs skim the pages and decide what snippets to cite. Our goal is to make sure brands have the right content that makes it through this funnel.
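For readers who haven't seen RRF before, here is a minimal Python sketch of the textbook version. The engines' actual ranking is not public, so this only illustrates the general idea of fusing the result lists of several fan-out queries. The URLs are made up, and k=60 is the commonly used default constant, not something confirmed for any AI search engine.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of URLs into one ranking.

    Each URL scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is the commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, url in enumerate(ranking, start=1):
            scores[url] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical result lists for three fan-out queries of one user prompt.
fan_out_results = [
    ["a.example/pricing", "b.example/review", "c.example/blog"],
    ["b.example/review", "a.example/pricing", "d.example/guide"],
    ["b.example/review", "c.example/blog", "a.example/pricing"],
]

fused = reciprocal_rank_fusion(fan_out_results)
print(fused)
```

A page that ranks reasonably well across many fan-out queries can beat a page that ranks first for only one of them, which is why coverage across the fan-out matters.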

Here is how sitefire works:

- The user defines a set of prompts they want to monitor. These are synthetic prompts - we generate them based on SEO keywords and their monthly search volume.

- We submit these prompts to ChatGPT, Gemini, Google AI Mode, etc. on a daily basis and capture the answers. We extract fan-out queries, sourced pages, citations, and brand mentions.

- For each topic, our agents analyze which web pages are sourced and cited the most, and why. They also consider similar pages that the client already has.

- Based on the diagnosis, our content agents draft improvements or create new pages, and push them directly to the client’s CMS.

- We integrate with the client’s network logs and Google Analytics to monitor the increase in AI bot requests and human referrals to their page.
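As a rough illustration of the "which pages are cited the most" step, the sketch below aggregates cited URLs from captured answers into a per-domain citation share. The record layout (`engine`, `prompt`, `cited_urls`) and the domains are hypothetical and only stand in for whatever the real pipeline stores.

```python
from collections import Counter
from urllib.parse import urlparse


def citation_share(answers):
    """Return each domain's share of all citations across captured answers."""
    counts = Counter()
    for answer in answers:
        for url in answer["cited_urls"]:
            counts[urlparse(url).netloc.lower()] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    # most_common() keeps the most cited domains first.
    return {domain: n / total for domain, n in counts.most_common()}


# Hypothetical captured answers for one monitored prompt.
captured = [
    {
        "engine": "chatgpt",
        "prompt": "best crm for startups",
        "cited_urls": [
            "https://crm.example.com/pricing",
            "https://reviews.example.org/crm",
        ],
    },
    {
        "engine": "gemini",
        "prompt": "best crm for startups",
        "cited_urls": [
            "https://crm.example.com/compare",
            "https://crm.example.com/pricing",
        ],
    },
]

shares = citation_share(captured)
print(shares)
```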

This system is continuously updated, so it always shows which content works, and how to adapt the existing sitemap. For one client that used sitefire to optimize their blog, the AI-optimized articles increased their AI bot requests from ~200/day to ~570/day within ten days.
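For scale, the quoted jump works out to roughly 2.85x, or about +185%. A quick check in Python, using the approximate figures from the paragraph above:

```python
# Approximate AI bot requests per day, as quoted in the post.
before, after = 200, 570

ratio = after / before
percent_increase = (after - before) / before * 100

print(f"{ratio:.2f}x, +{percent_increase:.0f}%")
```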

A risk we recognize is that AI-generated content is filling brands’ websites with slop. Whilst it’s still early days and we don’t claim to have figured everything out yet, our intention is to mitigate this by focusing the content on specific, unique information: real product capabilities, real pricing, honest comparisons. The clients still review every page before it goes live, so they can ensure the content is true to their brand.

Some clients use our platform themselves. For others we act more like an agency, automating steps as we go. The goal is for sitefire to run mostly on its own, with clients approving changes via Slack, Claude or their CMS.

Here's a video demo: https://screen.studio/share/fw7VQQak

If you'd like to try what we've built so far, sign up at https://sitefire.ai.

onecommit

1 hour ago

How do models currently assess the quality of content and its accuracy/veracity when recommending products? What do the providers do to avoid a situation where more content === more traffic? Would love to see links to relevant research on this, if you have them. Much success to you; I appreciate your AI slop risk awareness.

vincko

55 minutes ago

First there is the preselection, which depends on the fan-out queries the model comes up with and the content's performance across those queries on the search index.

After that, the content is actually assessed by the model. This paper tried different strategies to improve performance for this last step: https://arxiv.org/pdf/2311.09735. Adding statistics, sources, and original data are all strategies that we apply.

In classic SEO, creating more and more content leads to "cannibalization". Generally this hurts performance of all overlapping content so much that it is not worth it.

Gobhanu

2 hours ago

How do you track where users are coming from?

vincko

1 hour ago

We currently simply integrate with your Google Analytics and filter by Source. This tends to be a lower bound, since the source is not always set correctly. Users coming from some of the native apps might be categorized as direct visitors.

There are other data sources, like Cloudflare, that we want to enable in the future.
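To make the lower-bound point concrete, here is a minimal sketch that buckets raw referrer strings by source. It is not the Google Analytics API and not our production code. The hostname list is an illustrative assumption and far from exhaustive. Visits that arrive without a referrer land in "direct", which is exactly where some AI app traffic gets lost.

```python
from collections import Counter
from urllib.parse import urlparse

# Assumed referrer hostnames for AI assistants; illustrative, not exhaustive.
AI_REFERRER_HOSTS = {
    "chatgpt.com": "ChatGPT",
    "gemini.google.com": "Gemini",
    "perplexity.ai": "Perplexity",
}


def classify_referrer(referrer):
    """Map a raw referrer string to an AI source, 'direct', or 'other'."""
    if not referrer:
        # No referrer at all: counted as direct, even if it came from an AI app.
        return "direct"
    host = (urlparse(referrer).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return AI_REFERRER_HOSTS.get(host, "other")


# Hypothetical referrers for five sessions.
sessions = [
    "https://chatgpt.com/",
    "",
    "https://www.perplexity.ai/search?q=x",
    "https://news.ycombinator.com/item",
    None,
]

counts = Counter(classify_referrer(r) for r in sessions)
print(counts)
```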

yunyu

2 hours ago

What do you guys do differently than Profound or Airops?

vincko

2 hours ago

That's a super valid question, we get it a lot. There are a lot of overlaps.

In our view Profound and Airops are aimed at existing marketing teams. Our goal is to be more hands-off, so you don't need a team. With many of our clients we act more like an agency, communicating via Slack and automating step by step. That's the experience we want to create. We aren't there yet though.

debarshri

2 hours ago

Add Peec to that list.

methyl

30 minutes ago

And Surfer, the OG content optimization platform.

vincko

2 hours ago

True, it is very competitive.

Our view on Peec is that it is an analytics solution. They did recently launch an actions feature, but they do not actually take any actions (yet). Creating content takes a lot of resources, and agencies are expensive.

As an analytics solution it is a good option.

ceejayoz

1 hour ago

Ugh. The worst of SEO, but a bunch more of it? Noooooo.

vincko

1 hour ago

I get it, there is a lot of worry about slop.

We think about it like this: all of these agents will be most useful to users if they provide valuable answers. So they will be looking for valuable content for grounding their answer.

There are exploits: you can overfit on whatever they currently use as an objective function. But those tend to be temporary. So in the long run, valuable content will win. That's what we aim to create. It's a fine line.

ceejayoz

1 hour ago

> all of these agents will be most useful to users if they provide valuable answers

This is a bald assertion.

vincko

1 hour ago

Do you doubt the statement on how to maximize usefulness? Or do you mean that the companies behind the models might not optimize (exclusively) for usefulness to the user?

I do share doubts about the latter.

ceejayoz

1 hour ago

> Do you doubt the statement on how to maximize usefulness?

Yes; the customer here is the site using it, not Google end users, who'll tend to accept whatever's the top search result even if it's deeply wrong or complete slop.

The wellbeing of search users isn't really the priority here, right?

vincko

43 minutes ago

Yes, that is correct. We help the brands, not the end user.

Let me try to rephrase the line of thinking:

To maximize value to the end user, the [AI search] models generally aim to be helpful. The companies building these models [OpenAI, etc.] are incentivized to make the model use helpful content.

Our goal is to be aligned with their objective function long term. And that incentivizes us to create helpful content.

Not all of this is a given. We don't know for sure how it will play out. There will always be ways to game the system. But we think those will get fixed over time.

Edit: added some clarifications on what I mean by "models"

ceejayoz

19 minutes ago

Let me rephrase, too.

> To maximize value to the paying customer, the models generally aim to be seen as helpful by Google's algorithm. The companies building these models are incentivized to make the model seem to use helpful content.

SEO does the same thing; appearing useful to Google is more important than actually being useful to Google's visitors.

a13n

1 hour ago

Please don't override the browser's default scroll behavior. It's so jarring and basically never a good idea.

vincko

1 hour ago

Thank you for the feedback. We'll launch our new site soon where this is fixed.

vahar

58 minutes ago

Regarding the topic of ambient agents, what’s the impact of your product? It’s hard for me to imagine the impact, but I guess that if we have ambient agents, it must be a necessity in order to get discovered at all, right? Nice to see a player from Europe on the market too!

vincko

30 minutes ago

Do you mean agents that are not answering short, specific user prompts?

For those types of agents, prompt tracking is less accurate since the context of the queries is so large. But it's still relevant to understand what web searches they tend to perform and whether you show up in those.

That's another reason why we want to integrate other data sources, especially network logs.
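As a sketch of what the network-log side could look like, the snippet below counts AI bot requests per day from access-log lines by matching user-agent tokens. It assumes the common "combined" log format, and the token list is an illustrative assumption, not a complete or authoritative registry. User agents can also be spoofed, so treat the counts as an estimate.

```python
import re
from collections import Counter

# User-agent tokens assumed to identify AI crawlers and fetchers; illustrative.
AI_BOT_TOKENS = ("GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot")

# Captures the date and the final quoted field (the user agent) of a
# combined-format access-log line.
LOG_LINE = re.compile(r'\[(?P<day>[^:\]]+):[^\]]*\].*"(?P<ua>[^"]*)"\s*$')


def ai_bot_requests_per_day(lines):
    """Count requests per day whose user agent contains a known AI bot token."""
    per_day = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and any(token in match.group("ua") for token in AI_BOT_TOKENS):
            per_day[match.group("day")] += 1
    return per_day


# Hypothetical log lines; IPs are from documentation ranges.
sample = [
    '203.0.113.5 - - [01/Jan/2025:10:01:22 +0000] "GET /blog/a HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.1)"',
    '198.51.100.7 - - [01/Jan/2025:10:02:10 +0000] "GET /blog/a HTTP/1.1" 200 5120 "https://chatgpt.com/" "Mozilla/5.0 (Macintosh) Safari/605.1.15"',
    '203.0.113.9 - - [01/Jan/2025:11:15:00 +0000] "GET /pricing HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; PerplexityBot/1.0)"',
    '203.0.113.5 - - [02/Jan/2025:09:00:00 +0000] "GET /blog/b HTTP/1.1" 200 4096 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
]

bot_counts = ai_bot_requests_per_day(sample)
print(dict(bot_counts))
```

The second sample line is a human visit referred from ChatGPT, so it is not counted here; it would show up in the referral data instead.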