FilterHN

How do you handle lost webhooks in production?

4 points | 2 hours ago | 3 comments

I've worked at several companies where we'd discover hours later that critical webhooks from Stripe/Shopify never arrived (deployment, timeout, bug, etc.).

Every team ended up building the same solution: retry logic, dead letter queue, monitoring.
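For anyone who hasn't built one, here is a minimal sketch of that retry plus dead letter queue pattern. The names (`process_with_retry`, `dead_letters`, `handler`) are hypothetical and the queue is an in-memory deque; a real setup would presumably use durable storage and alert when the queue grows.

```python
import time
from collections import deque
from typing import Callable


def process_with_retry(
    event: dict,
    handler: Callable[[dict], None],
    dead_letters: deque,
    max_attempts: int = 3,
    base_delay: float = 1.0,
) -> bool:
    """Run handler on event, retrying with exponential backoff.

    Returns True on success. After max_attempts failures the event is
    parked in dead_letters (with the last error) and False is returned.
    """
    last_error = ""
    for attempt in range(max_attempts):
        try:
            handler(event)
            return True
        except Exception as exc:  # illustrative: catch everything the handler raises
            last_error = repr(exc)
            if attempt < max_attempts - 1:
                # Wait base_delay, 2*base_delay, 4*base_delay, ... between attempts.
                time.sleep(base_delay * (2 ** attempt))
    # Out of attempts: keep the event so it can be inspected and replayed later.
    dead_letters.append(
        {"event": event, "error": last_error, "attempts": max_attempts}
    )
    return False
```

Monitoring then reduces to watching the size and age of `dead_letters` and replaying entries once the underlying bug is fixed.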

Curious how others handle this:

- Do you rely on the provider's retry policy?
- Built your own reliability layer?
- Use a service?
- Just manually reconcile when it happens?

(Context: Building https://relaehook.com to solve this, but genuinely curious what the norm is)

super256

52 minutes ago

Ofc I rely on the retry policy. Stripe retries with exponential backoff for up to three days. If Stripe can't reach our endpoint in three days, we probably went bankrupt or a solar flare ate IT.
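To show why a three-day window is generous, here is a rough sketch of a doubling backoff schedule. This is not Stripe's actual schedule; the 60-second base and factor of 2 are assumptions for illustration, and `backoff_delays` is a hypothetical helper.

```python
def backoff_delays(
    base_s: float = 60.0,
    factor: float = 2.0,
    window_s: float = 3 * 24 * 3600,
) -> list[float]:
    """Return the waits (in seconds) before each retry that fits in window_s.

    Each wait is `factor` times the previous one, starting at base_s.
    The schedule stops once the next wait would overrun the window.
    """
    delays: list[float] = []
    elapsed = 0.0
    delay = base_s
    while elapsed + delay <= window_s:
        delays.append(delay)
        elapsed += delay
        delay *= factor
    return delays


schedule = backoff_delays()
# The endpoint would have to fail every one of these retries,
# spread over most of the window, for the event to be dropped.
print(len(schedule), "retries over", sum(schedule) / 3600, "hours")
```

Under these assumed parameters the endpoint gets a dozen chances spread across nearly three days, so a short deploy or outage is covered many times over.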

everydaydev

36 minutes ago

Stripe does retries right, no argument there.

Where things get messy is when you have a mix of providers with wildly different retry behaviors, or internal services that have their own rate limits or downtime windows. A relay layer keeps the intake consistent even when the rest of the system isn’t.
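As a rough illustration of what "consistent intake" can mean (a generic sketch, not how any particular product works), the idea is to store every event durably and acknowledge immediately, then let downstream processing drain at its own pace. The table layout and the `open_inbox`, `accept`, and `pending` names are assumptions, and SQLite stands in for whatever durable store you would actually use.

```python
import json
import sqlite3


def open_inbox(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the inbox table that buffers incoming webhooks."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS inbox (
               provider TEXT NOT NULL,
               event_id TEXT NOT NULL,
               payload  TEXT NOT NULL,
               status   TEXT NOT NULL DEFAULT 'pending',
               PRIMARY KEY (provider, event_id)
           )"""
    )
    return conn


def accept(conn: sqlite3.Connection, provider: str, event_id: str, payload: dict) -> int:
    """Persist the event, then return the HTTP status to send back.

    INSERT OR IGNORE makes provider retries idempotent: a redelivered
    event with the same (provider, event_id) is stored only once.
    """
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO inbox (provider, event_id, payload) VALUES (?, ?, ?)",
            (provider, event_id, json.dumps(payload)),
        )
    return 200


def pending(conn: sqlite3.Connection) -> list[tuple[str, str, dict]]:
    """Return events not yet processed, oldest first, for a worker to drain."""
    rows = conn.execute(
        "SELECT provider, event_id, payload FROM inbox "
        "WHERE status = 'pending' ORDER BY rowid"
    ).fetchall()
    return [(p, e, json.loads(body)) for p, e, body in rows]
```

Because the handler only writes a row and replies, a slow or rate-limited internal service no longer determines whether the provider sees a timeout.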

samarthr1

1 hour ago

Wait, so your product moves the point of failure from my infra to your infra?

Plus trusts y'all with contents of said webhook?

everydaydev

51 minutes ago

Fair question. We're not eliminating failure so much as isolating it behind a system that's purpose-built for durability. Our infra is built with redundant queues, retry pipelines, and observability you typically wouldn't stand up for a single product team.

And on the data side, we don't use webhook payloads for anything other than delivery. They're encrypted at rest and in transit, and automatically purged based on retention settings.

nickphx

4 minutes ago

Yeaaaaaaaaaaaaah.. I am not sure adding an additional third party and point of potential failure would help mitigate the issue of receiving data from third parties... but good luck.