Show HN: Klovr – Convert any webpage to Markdown (Cloudflare covers only 5%)
klovr.ai
vaibhavlodha98
1 hour ago
Hey HN! I'm the creator of Klovr.

  I built this to solve a problem I had when building AI agents: HTML wastes 60-95% of tokens, and Cloudflare's new "Markdown for Agents" only works on
  ~5% of the web (opt-in only).

  THE PROBLEM:
  I tested 100 popular websites with Cloudflare's Accept: text/markdown header. Only 3 of them (3%) actually served markdown. The rest? Still HTML. Turns out their
  markdown feature requires website owners to opt in, which most won't do for years (if ever).
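
  A check like the one described above can be sketched as follows. This is a minimal illustration, not the script used for the test: it assumes a runtime with a global fetch (Node 18+), and the function names isMarkdownContentType, servesMarkdown, and markdownShare are hypothetical.

```typescript
// Returns true when a Content-Type header value denotes markdown.
function isMarkdownContentType(contentType: string | null): boolean {
  if (contentType === null) return false;
  // Drop parameters such as "; charset=utf-8" before comparing.
  const mediaType = (contentType.split(";")[0] ?? "").trim().toLowerCase();
  return mediaType === "text/markdown";
}

// Requests a URL with "Accept: text/markdown" and reports whether
// the server actually answered with markdown.
async function servesMarkdown(url: string): Promise<boolean> {
  const response = await fetch(url, {
    headers: { Accept: "text/markdown" },
    redirect: "follow",
  });
  return isMarkdownContentType(response.headers.get("content-type"));
}

// Hypothetical helper: fraction of URLs that answered with markdown.
// Failed requests (timeouts, blocks) count as "no markdown".
async function markdownShare(urls: string[]): Promise<number> {
  if (urls.length === 0) return 0;
  const results = await Promise.allSettled(urls.map(servesMarkdown));
  const hits = results.filter(
    (r) => r.status === "fulfilled" && r.value
  ).length;
  return hits / urls.length;
}
```

  Checking the response Content-Type rather than the body avoids guessing whether a page "looks like" markdown; a site that ignores the header will typically answer with text/html.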

  MY SOLUTION:
  Klovr converts any webpage to markdown on-demand:
  - Same Accept headers as Cloudflare (100% compatible)
  - Works on 100% of sites (no opt-in needed)
  - Redis caching with 7-day TTL (10-100x speedup on repeated URLs)
  - Playwright for dynamic content (better anti-detection than Puppeteer; temporarily disabled for launch, see limitations below)
  - Content-Signal headers for AI compliance
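
  To illustrate the Accept-header compatibility point, here is a simplified sketch of how a server could decide whether a client wants markdown. It is an assumption about the general approach, not Klovr's code: parseAccept and prefersMarkdown are hypothetical names, and wildcards such as */* are deliberately not treated as a request for markdown.

```typescript
interface MediaRange {
  type: string; // e.g. "text/markdown"
  q: number;    // quality value, defaults to 1
}

// Parses an Accept header into media ranges with their q-values.
function parseAccept(header: string): MediaRange[] {
  return header
    .split(",")
    .map((part) => {
      const [rawType = "", ...params] = part.split(";");
      let q = 1;
      for (const param of params) {
        const [key = "", value = ""] = param.split("=");
        if (key.trim().toLowerCase() === "q") {
          const parsed = Number.parseFloat(value);
          if (!Number.isNaN(parsed)) q = parsed;
        }
      }
      return { type: rawType.trim().toLowerCase(), q };
    })
    .filter((range) => range.type !== "");
}

// True when text/markdown is explicitly accepted and ranked at least
// as high as text/html.
function prefersMarkdown(accept: string | null): boolean {
  if (!accept) return false;
  const ranges = parseAccept(accept);
  const qOf = (type: string): number =>
    ranges.find((range) => range.type === type)?.q ?? 0;
  const markdownQ = qOf("text/markdown");
  return markdownQ > 0 && markdownQ >= qOf("text/html");
}
```

  For example, prefersMarkdown("text/markdown, text/html;q=0.9") is true, while a browser-style header that only lists text/html falls through to the normal HTML response.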

  TECH STACK:
  - Next.js 15 (App Router) + Vercel
  - Playwright for browser automation
  - Redis (via ioredis) for caching
  - Drizzle ORM + Neon PostgreSQL
  - Readability.js + Turndown for conversion

  FREE TIER: 10,000 conversions/month (no credit card)

  WHAT I LEARNED:
  1. Puppeteer-extra doesn't work on Vercel (ESM/CommonJS conflicts)
  2. Playwright has better anti-detection out of the box
  3. Redis caching is critical - a first request takes about 2000ms, a cached one about 50ms
  4. Most sites still don't support Cloudflare's markdown (hence the need for universal conversion)
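
  The caching point (item 3) follows a cache-aside pattern with an expiry. The sketch below shows the idea with an in-memory Map standing in for Redis so it runs without a server; TtlCache and cachedConvert are hypothetical names, and the convert callback is a placeholder for the real fetch-and-convert step.

```typescript
const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000;

// In-memory stand-in for a Redis key with a TTL.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // The clock is injectable so expiry can be exercised without waiting.
  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: evict and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Cache-aside: return the cached markdown if present, otherwise run the
// slow conversion once and store the result.
async function cachedConvert(
  url: string,
  cache: TtlCache<string>,
  convert: (url: string) => Promise<string>
): Promise<{ markdown: string; cached: boolean }> {
  const hit = cache.get(url);
  if (hit !== undefined) return { markdown: hit, cached: true };
  const markdown = await convert(url);
  cache.set(url, markdown);
  return { markdown, cached: false };
}
```

  A cache created with new TtlCache<string>(SEVEN_DAYS_MS) mirrors the 7-day TTL mentioned earlier. With Redis the same flow would map to a GET followed by a SET with an expiry on a miss.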

  CURRENT LIMITATIONS:
  - Payment processing is in development (everyone on free tier for now)
  - Dynamic content (Playwright) temporarily disabled for launch (re-enabling next week)
  - IP-based blocking (Reddit, LinkedIn) still happens - no way around datacenter IPs

  I'd love feedback on:
  - Architecture choices (should I use a different caching strategy?)
  - The positioning (am I framing the Cloudflare comparison correctly?)
  - What features would make this more useful for your AI agents?

  GitHub isn't public yet, but happy to share code snippets for specific parts (stealth script, caching layer, etc.).