FilterHN

Ask HN: Am I abusing Kafka? If yes, then what are the alternatives?

5 points | 1 year ago | 6 comments

I want to develop notification services that handle all events specific to users, such as mentions. I chose Kafka to process these notification events. I've created user-specific topics and publish all notification events on their respective topics. My challenge is how to manage user notification history. For instance, if a user wants to retrieve all read or unread notification events, can I distinguish between them on Kafka, or should I store all notification events in a database to differentiate between read and unread events?

rubyissimo

1 year ago

User-specific topics aren't the right path. Conventional wisdom is that you want fewer than thousands of topics.

It looks like that limit has been changed more recently: https://stackoverflow.com/questions/32950503/can-i-have-100s... but I'm still skeptical it's what you want. You'd need a consumer for each of them, and you'd be tracking offsets for all of them.

Much more normal would be a single topic here, partitioned on user.
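
To illustrate what "partitioned on user" buys you, here is a minimal sketch of key-based partitioning. It is not Kafka's own code: Kafka's default partitioner hashes the record key with murmur2, while this stand-in uses CRC32 from the standard library. `NUM_PARTITIONS` and `partition_for` are hypothetical names chosen for the example.

```python
import zlib

NUM_PARTITIONS = 12  # hypothetical partition count for the single notifications topic


def partition_for(user_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a user ID to a partition using a stable hash.

    CRC32 is only a stand-in here; Kafka's default partitioner uses murmur2.
    """
    return zlib.crc32(user_id.encode("utf-8")) % num_partitions


# The same key always maps to the same partition, so one user's
# notifications stay in order relative to each other.
for user in ("alice", "bob", "alice"):
    print(user, "->", partition_for(user))
```

The point is that the number of partitions stays fixed no matter how many users you have, and per-user ordering still holds because each user's events share a partition.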

You can have two consumers then, each in its own consumer group so that both see every message. One pulls things off and notifies. The other pulls things off and stores them in the DB, which is indexed by user ID.

Then when they read a notification, you flip a bit in the DB for that notification.
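
A rough sketch of the "flip a bit in the DB" part, using SQLite so it runs anywhere. The table layout and the function names (`open_inbox`, `store_notification`, `mark_read`, `list_notifications`) are assumptions made for illustration; in a real setup the storing consumer would call something like `store_notification` for each message it pulls off the topic.

```python
import sqlite3


def open_inbox(path: str = ":memory:") -> sqlite3.Connection:
    """Create the (hypothetical) notifications table, indexed by user ID."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS notifications (
               id      INTEGER PRIMARY KEY AUTOINCREMENT,
               user_id TEXT    NOT NULL,
               body    TEXT    NOT NULL,
               is_read INTEGER NOT NULL DEFAULT 0
           )"""
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_user_read "
        "ON notifications (user_id, is_read)"
    )
    return conn


def store_notification(conn: sqlite3.Connection, user_id: str, body: str) -> int:
    """What the storing consumer would do for each event it consumes."""
    cur = conn.execute(
        "INSERT INTO notifications (user_id, body) VALUES (?, ?)",
        (user_id, body),
    )
    conn.commit()
    return cur.lastrowid


def mark_read(conn: sqlite3.Connection, user_id: str, notification_id: int) -> None:
    """Flip the read bit for one notification owned by this user."""
    conn.execute(
        "UPDATE notifications SET is_read = 1 WHERE id = ? AND user_id = ?",
        (notification_id, user_id),
    )
    conn.commit()


def list_notifications(conn: sqlite3.Connection, user_id: str, is_read: bool) -> list[str]:
    """Return the user's read or unread notification bodies, oldest first."""
    rows = conn.execute(
        "SELECT body FROM notifications "
        "WHERE user_id = ? AND is_read = ? ORDER BY id",
        (user_id, int(is_read)),
    )
    return [row[0] for row in rows]
```

With this in place, "all unread notifications for a user" is a single indexed query rather than a scan over a topic.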

BWStearns

1 year ago

Pulsar is similar (and I think API-compatible, but I might be remembering wrong) but has effectively no limit on topics. Maybe appropriate for their use case, depending on the number of users?

thorin

1 year ago

I would personally use a DB to manage application state and Kafka to manage the eventing part. That is in line with my experience of using Kafka mainly in microservice or integration environments, to pass different events between interested systems (via topics).

tiew9Vii

1 year ago

Kafka doesn’t scale well with partition count; throughput falls off quickly, especially if using replication and acks.

If each user has a topic, your partition count is unbounded: you have at least one partition per user.

I’d use a single notification topic, set a reasonable number of partitions on it, and partition by user ID.

Use the topic as a firehose.

You can have a consumer sending push notifications.

You can have a consumer writing to a database, with the user inbox in the DB and a read flag on each message. Users query the DB; Kafka queues your writes. You may end up with consistency issues if you also send push notifications as mentioned above, since the push can arrive before the message is in the inbox. If sending push notifications, you’d want to do CDC (change data capture) off your inbox table so the notification is only sent once it’s in the inbox.
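
To make the ordering concern concrete, here is a small sketch in which pushes are driven from rows already committed to the inbox table. It is a polling stand-in, not real CDC (which would tail the database's change log). The `pushed` column, the table layout, and the `send_push` callback are all assumptions for the example.

```python
import sqlite3
from typing import Callable


def open_db() -> sqlite3.Connection:
    """In-memory inbox table with a hypothetical 'pushed' marker column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE inbox (
               id      INTEGER PRIMARY KEY AUTOINCREMENT,
               user_id TEXT    NOT NULL,
               body    TEXT    NOT NULL,
               pushed  INTEGER NOT NULL DEFAULT 0
           )"""
    )
    return conn


def write_to_inbox(conn: sqlite3.Connection, user_id: str, body: str) -> None:
    """Step 1: the consumer persists the notification and commits."""
    conn.execute("INSERT INTO inbox (user_id, body) VALUES (?, ?)", (user_id, body))
    conn.commit()


def push_committed_rows(
    conn: sqlite3.Connection, send_push: Callable[[str, str], None]
) -> int:
    """Step 2: push only what is already committed to the inbox.

    If the process dies between send_push and the UPDATE, the row is pushed
    again on the next run, so delivery is at-least-once.
    """
    rows = conn.execute(
        "SELECT id, user_id, body FROM inbox WHERE pushed = 0 ORDER BY id"
    ).fetchall()
    for row_id, user_id, body in rows:
        send_push(user_id, body)
        conn.execute("UPDATE inbox SET pushed = 1 WHERE id = ?", (row_id,))
        conn.commit()
    return len(rows)
```

Because the push step only ever reads committed rows, a user who taps the push will find the message in their inbox.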

You don’t necessarily need Kafka for this. It’s being used as a queue here rather than a log, unless you want to keep every notification event so you can rebuild the inbox tables. But then you’ll need to publish read states too, start event sourcing, and treat the inbox table as a materialised view.
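
For a sense of what that rabbit hole looks like, here is a toy sketch of rebuilding the inbox by replaying a log that contains both "created" and "read" events. The `Event` shape and `rebuild_inbox` are hypothetical; a real system would read these events from the topic rather than from a Python list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str             # "created" or "read" (assumed event types)
    user_id: str
    notification_id: int
    body: str = ""


def rebuild_inbox(events: list[Event]) -> dict[str, dict[int, dict]]:
    """Replay events in log order to derive the inbox as a materialised view."""
    inbox: dict[str, dict[int, dict]] = {}
    for event in events:
        user_inbox = inbox.setdefault(event.user_id, {})
        if event.kind == "created":
            user_inbox[event.notification_id] = {"body": event.body, "read": False}
        elif event.kind == "read" and event.notification_id in user_inbox:
            user_inbox[event.notification_id]["read"] = True
    return inbox


log = [
    Event("created", "alice", 1, "bob mentioned you"),
    Event("created", "alice", 2, "carol mentioned you"),
    Event("read", "alice", 1),
]
view = rebuild_inbox(log)
unread = [n["body"] for n in view["alice"].values() if not n["read"]]
print(unread)
```

Note that the read state only survives a rebuild because the "read" action was itself published as an event, which is the extra work being warned about.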

Big rabbit hole. A FIFO queue sounds simplest if you want async notification handling.

Jemaclus

1 year ago

Depending on your traffic levels, this could be possible. But Kafka has retention limits based on time and disk size, so things can fall off the queue over time. If you have a small system, you can just have your consumer start from the beginning of the topic and consume everything in order to get the list. But that could be a lot of information to sift through to find nothing at all (maybe I don't have any notifications, for example). Also keep in mind that Kafka is effectively a distributed log, not a database.
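
A toy model of both problems, retention and full scans, might look like the following. `RetainedLog` is a made-up in-memory class, not a Kafka API, and it expires individual records for simplicity, whereas Kafka applies retention to whole log segments.

```python
from collections import deque


class RetainedLog:
    """Append-only log that forgets records older than a retention window."""

    def __init__(self, retention_seconds: float) -> None:
        self.retention_seconds = retention_seconds
        self._records: deque = deque()  # (timestamp, user_id, body), oldest first

    def append(self, timestamp: float, user_id: str, body: str) -> None:
        self._records.append((timestamp, user_id, body))

    def expire(self, now: float) -> None:
        """Drop records from the head once they exceed the retention window."""
        while self._records and now - self._records[0][0] > self.retention_seconds:
            self._records.popleft()

    def scan_for_user(self, user_id: str) -> list[str]:
        """Reading one user's history means walking every retained record."""
        return [body for _, uid, body in self._records if uid == user_id]


log = RetainedLog(retention_seconds=60)
log.append(0, "alice", "old mention")
log.append(100, "alice", "new mention")
log.append(100, "bob", "something else")
log.expire(now=120)
print(log.scan_for_user("alice"))  # the old mention has aged out
```

The old notification is simply gone after expiry, and finding the remaining one still required touching every record in the log.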

A more pragmatic and simple approach would be to have the consumer shove it into a database, then have your app pull from the DB. This is more persistent and is optimized for the kinds of queries you want to run.

aristofun

1 year ago

1. Kafka for temporary storage (and distribution) of hot messages. Topics based on your architecture, not on end users.

2. DB for long-term storage with frequent reads or updates.

3. S3 etc. with compressed files for long-term archival storage.

4. Flink, Beam, etc. for real-time or batch processing when some logic, transformation, or aggregation is needed on messages.

speedgoose

1 year ago

PostgreSQL.