Show HN: I extract recipes from TikTok, Instagram, and the messy web
taste-buddy.app
I kept losing recipes. You know how it goes — you're scrolling TikTok at midnight, see an amazing pasta dish, save it, and never find it again. So I built TasteBuddy to fix that for myself. What I didn't expect: parsing recipes from the internet is a rabbit hole that goes deep.

The thing is, recipe content is scattered everywhere in completely different formats. A food blog might have nice JSON-LD markup. A TikTok? Just someone talking over a video. An Instagram reel? Recipe buried in the comments. Pinterest? Links to blogs that died three years ago.

So I ended up building specialized extractors for each platform.

Websites are the "easy" case. I look for JSON-LD with @type: Recipe first; most food blogs have it, thanks to SEO plugins. But the real world is messy. I've seen duration fields written as "PT30M", "30 minutes", "0:30", and my personal favorite, just "half an hour". About 30% of recipe URLs have no structured data at all, so I fall back to Gemini to make sense of the raw HTML.
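For anyone curious what normalizing those duration spellings might look like, here is a minimal TypeScript sketch. It is not TasteBuddy's actual code: the function name parseDurationMinutes is made up for illustration, and reading "0:30" as hours:minutes (rather than minutes:seconds) is an assumption.

```typescript
// Hypothetical helper: normalize the duration spellings above to whole minutes.
// Returns null for anything unrecognized so the caller can fall back to a model.
function parseDurationMinutes(raw: string): number | null {
  const s = raw.trim().toLowerCase();

  // ISO 8601 durations, the format schema.org asks for: PT30M, PT1H15M, P1DT2H.
  const iso = /^p(?:(\d+)d)?(?:t(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?)?$/.exec(s);
  if (iso && (iso[1] || iso[2] || iso[3] || iso[4])) {
    const days = Number(iso[1] || 0);
    const hours = Number(iso[2] || 0);
    const minutes = Number(iso[3] || 0);
    const seconds = Number(iso[4] || 0);
    return days * 1440 + hours * 60 + minutes + Math.round(seconds / 60);
  }

  // Clock style such as 0:30 or 1:15, assumed here to mean hours:minutes.
  const clock = /^(\d{1,2}):(\d{2})$/.exec(s);
  if (clock) return Number(clock[1]) * 60 + Number(clock[2]);

  // The one fully spelled-out phrase from the post.
  if (/^half\s+an\s+hour$/.test(s)) return 30;

  // Number + unit pairs: "30 minutes", "1 hr 15 min", "1.5 hours".
  const unit = /(\d+(?:\.\d+)?)\s*(hours?|hrs?|h|minutes?|mins?|m)\b/g;
  let total = 0;
  let matched = false;
  let part: RegExpExecArray | null;
  while ((part = unit.exec(s)) !== null) {
    matched = true;
    const n = Number(part[1]);
    total += part[2].charAt(0) === "h" ? n * 60 : n;
  }
  return matched ? Math.round(total) : null;
}
```

Returning null instead of guessing keeps the cheap path honest: anything the rules do not recognize can be handed to the model fallback.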

TikTok is where it gets fun. There's no recipe API. My pipeline resolves short URLs, then checks if the creator says something like "link in bio" (I detect this in five languages because German food TikTok is surprisingly massive). If I can find their website, great — I scrape the actual recipe from there. If not, I download the video via Apify and let Gemini analyze the frames. It works, but it's slow and expensive, so that's a Pro-only feature.
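A rough sketch of the "link in bio" check, in TypeScript. The post only names German, so the other four languages and every phrase pattern below are assumptions picked for illustration, and mentionsLinkInBio is a hypothetical name, not the real pipeline's.

```typescript
// Illustrative patterns only: the language set and phrasings are assumptions.
const LINK_IN_BIO_PATTERNS: RegExp[] = [
  /\blink\s+(?:is\s+)?in\s+(?:my\s+|the\s+)?bio\b/i,   // English: "link in bio", "link in my bio"
  /\blink\s+in\s+(?:der|meiner)\s+bio\b/i,             // German: "Link in der Bio", "Link in meiner Bio"
  /\b(?:link|enlace)\s+en\s+(?:la\s+|mi\s+)?bio\b/i,   // Spanish: "enlace en mi bio"
  /\blien\s+(?:dans|en)\s+(?:la\s+|ma\s+)?bio\b/i,     // French: "lien dans ma bio", "lien en bio"
  /\blink\s+(?:nella\s+mia|nella|in)\s+bio\b/i,        // Italian: "link nella bio"
];

// True when a caption or transcript points the viewer at the creator's bio.
function mentionsLinkInBio(caption: string): boolean {
  return LINK_IN_BIO_PATTERNS.some((pattern) => pattern.test(caption));
}
```

The patterns deliberately avoid the global flag, so calling test() repeatedly on the same RegExp carries no state between captions.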

Instagram and Facebook — similar deal. oEmbed gets me the image, but the recipe is usually in the caption or comments. Same link-in-bio detection, same website resolution.

Photos are actually straightforward — screenshot of a recipe, photo of a cookbook page, whatever. Gemini's vision model handles those surprisingly well.

One thing I'm proud of: the AI tiering. Not every task needs a big model.

- Gemini Flash Lite handles 90% of the work: classifying content, parsing ingredients, extracting recipe names. Cheap, fast, good enough.
- Gemini Flash kicks in when structured data fails: parsing messy HTML, analyzing video frames.
- Gemini Pro only for image generation (recipe share cards).
- text-embedding-004 for semantic search across your recipe collection.
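The tiering above boils down to a lookup from task to model tier. A minimal TypeScript sketch, where the task names and the pickTier helper are hypothetical and the tier labels are shorthand for the models listed rather than real API model IDs (only text-embedding-004 is a name taken from the post):

```typescript
// Hypothetical task names covering the jobs listed above.
type Task =
  | "classify_content"
  | "parse_ingredients"
  | "extract_recipe_name"
  | "parse_messy_html"
  | "analyze_video_frames"
  | "generate_share_card"
  | "embed_for_search";

// Tier labels are shorthand, not API model IDs.
type Tier = "flash-lite" | "flash" | "pro" | "text-embedding-004";

const TIER_FOR_TASK: Record<Task, Tier> = {
  classify_content: "flash-lite",
  parse_ingredients: "flash-lite",
  extract_recipe_name: "flash-lite",
  parse_messy_html: "flash",
  analyze_video_frames: "flash",
  generate_share_card: "pro",
  embed_for_search: "text-embedding-004",
};

// Look up the cheapest tier assigned to a task.
function pickTier(task: Task): Tier {
  return TIER_FOR_TASK[task];
}
```

Because the map is typed as Record<Task, Tier>, adding a new task without assigning it a tier is a compile error, which forces a cost decision up front.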

This keeps my costs sane as a solo dev.

Stuff I learned the hard way:

- JSON-LD in the wild is chaos. The spec is fine, but WordPress plugins are creative.
- "Link in bio" is how recipe distribution actually works on social media.
- AI as fallback beats AI as default. Structured data first, AI when it fails = 95%+ success at a fraction of the cost.
- Tier your models aggressively. Don't throw dollars at a problem that cents can solve.
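To make "structured data first, AI when it fails" concrete, here is a hedged TypeScript sketch of the structured half. All names (findRecipeNode, extractStructured, the Extraction type) are invented for illustration, and pulling script tags out with a regex is a simplification; a real HTML parser would be more robust. It handles three common wild-JSON-LD shapes: a top-level array, a Recipe nested under @graph, and @type given as an array.

```typescript
type JsonObject = { [key: string]: unknown };

function isObject(v: unknown): v is JsonObject {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// True when @type is "Recipe" or an array containing it, e.g. ["Recipe", "NewsArticle"].
function isRecipe(node: JsonObject): boolean {
  const t = node["@type"];
  return t === "Recipe" || (Array.isArray(t) && t.indexOf("Recipe") !== -1);
}

// Depth-first search through top-level arrays and @graph containers.
function findRecipeNode(value: unknown): JsonObject | null {
  if (Array.isArray(value)) {
    for (const item of value) {
      const found = findRecipeNode(item);
      if (found) return found;
    }
    return null;
  }
  if (!isObject(value)) return null;
  if (isRecipe(value)) return value;
  return findRecipeNode(value["@graph"]);
}

// The result says whether the caller needs to pay for a model call at all.
type Extraction =
  | { source: "json-ld"; recipe: JsonObject }
  | { source: "needs-ai" };

function extractStructured(html: string): Extraction {
  const scriptTag =
    /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  let match: RegExpExecArray | null;
  while ((match = scriptTag.exec(html)) !== null) {
    try {
      const recipe = findRecipeNode(JSON.parse(match[1]));
      if (recipe) return { source: "json-ld", recipe };
    } catch {
      // Malformed JSON in one block should not hide a valid block later on.
    }
  }
  return { source: "needs-ai" };
}
```

Only a "needs-ai" result would be routed to the model fallback, so the expensive path runs just for pages where the cheap one found nothing.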

Stack: Flutter (just me, indie dev), Supabase (Postgres + Deno Edge Functions), Gemini, Apify, PostHog.

Free with a Pro tier for video extraction and household sharing. Happy to go deeper on any part of the extraction pipeline.
