Our goal was to build a tool that allowed us to test a range of "personal contexts" on a very focused everyday use case for us, reading HN!
We are exploring the use of personal context with LLMs: specifically, what kind of data, how much of it, and how much additional effort on the user's part is needed to get decent results. The test tool was a bit of fun on its own, so we re-skinned it and decided to post it here.
First time posting anything on HN, but folks at work encouraged me to drop a link. Keen for feedback, or pointers to other interesting projects thinking about bootstrapping personal context for LLM workflows!
The tension we have been finding is that we don't want to require people to "know how to prompt" to get value out of having a profile, hence our ongoing thinking about how to bootstrap good personal profiles from various data sources.
As Koomen notes, a good profile feels like it could be the best weapon against "AI slop" in cases where I want something sharp and specific. But getting there usually requires knowing how to prompt.
edit: ooh, I see what the swiping did:
## Analysis of user's tech interest

The user demonstrates a strong interest in advanced technical topics, particularly in the realm of artificial intelligence, machine learning, and low-level systems programming/security (e.g., kernel exploitation). They are drawn to articles that involve practical application, model creation, and deep dives into complex technical architectures. Their interest in "Show HN" articles suggests an appreciation for new, innovative projects, especially those with a technical or AI focus. They show less interest in general hardware announcements (like new microcontrollers), historical tech accounts, or very niche, non-AI/ML/security-related programming topics.
Yeah, that's pretty much spot on. Wonder if there's a way to match that against the topics I actually commented on, but at a glance it's pretty cool!
Other than quality of life stuff (multiple pages for example), I'd like to see it continually learn.
A few things got miscategorized and I'd love for it to naturally correct that with additional input from me.
The idea of having some kind of thumbs up/down on what you see after getting recs, which gets added to your preferences, or being able to do another round of preferences (rather than just re-doing them like we have now), is for sure on our list of next steps if we continue with this. We're not quite sure what the feedback loops will be yet (we did look at adding your whole web history, for example, but that felt like a bit much and pretty invasive).
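To give the first idea a concrete shape (a hypothetical sketch only; we haven't built this, and the names are made up): thumbs up/down events could be accumulated locally and folded into the next profile regeneration alongside the original swipes.

```ts
// Hypothetical sketch: store thumbs up/down signals locally so the next
// profile regeneration can use them alongside the original 30 swipes.
type FeedbackEvent = {
  postTitle: string;
  verdict: 'up' | 'down';
  at: number; // unix timestamp in ms
};

const FEEDBACK_KEY = 'hn-profile-feedback';

function recordFeedback(event: FeedbackEvent): void {
  const existing: FeedbackEvent[] = JSON.parse(
    localStorage.getItem(FEEDBACK_KEY) ?? '[]',
  );
  existing.push(event);
  localStorage.setItem(FEEDBACK_KEY, JSON.stringify(existing));
}
```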
For the miscategorizations: on a meta level, what we are generally interested in is whether they come from compression of the preferences into your user profile (essentially, whether more or better data is the path to better context for such a specific use case, or whether there is more bang for buck in optimizing the various prompts). Keen to hear if it's obvious from looking at your profile which was the case.
If we get serious with this, evals are a must as a next step. We are only 2 days in at the moment :)
In my case, none of the topics I most like to read about and discuss on HN (package management, software freedom, next-gen CLI tools, next-gen shells, philosophy, desktop Linux, functional programming, hacker history, literate programming, Emacs, bitching about common development practices, programming language design, configuration languages) managed to appear in the 30-post sample I used. The profile it wrote for me was pretty good considering that, but definitely not great.
The assessment was also mistaken about my degree of interest in "low level" technical details like binary file formats (in fact it's rather low, although it has gradually increased over time), and my degree of interest in theoretical computer science issues (in fact it's high, but all of the theoretical papers in the sample were about machine learning, which was not an area of academic focus for me).
I do really like the simplicity and customizability of this (exposing the profile as Markdown and making it editable is awesome), and the quality of the results is very good given the tiny input size. But if your primary interests are not super aligned with the mainstream on HN, you won't get a chance to demonstrate that you like them. If users could type a few terms to say what their biggest interests are before running through the samples, this could work even better for people like me.
It would also be interesting if this could work based on article contents and not just headlines. Sometimes I open something and close it immediately, or I open it undecided as to whether I will skim or read closely.
In fact I would posit that I have a couple of disparate interests or "profiles" that I would like to have greater control over and support in generating: non-overlapping sets of topics and types of content. The ability to have greater agency in creating and managing them is something we are keen to explore.
The article-comments one is a toughie, as LLM usage skyrockets when you scrape and consume content from the links. It would be awesome to include it, but it would likely need to be a paid feature, just from a cost perspective.
Really appreciate the detail here; it makes it easier to turn your examples into a test/eval/feature case.
This sounds like a great feature! My appetite for different clusters of content certainly varies according to my mood! Perhaps "mood" would actually be a cute-but-clear name for such distinct/multiple profiles. :)
> The article-comments one is a toughie, as LLM usage skyrockets when you scrape and consume content from the links. It would be awesome to include it, but it would likely need to be a paid feature, just from a cost perspective.
Hm. That is a good (and in retrospect, obvious) point. If it makes the feed a lot better, I think it could certainly be worth it for some users. If it only makes a small difference, maybe not. It might be interesting for you to experiment and write about, since what kind of difference it will make isn't obvious (at least to me) up front.
More generally, the next feature we want for ourselves is a way to add some generic text and "update" the profile with it, rather than regenerating it fresh exclusively off of the 30 examples. This circles back to us using this as a focus point to think about what data is enough to generate a good user profile, and what "good" means here.
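As a sketch of what that update step might look like (illustrative only; it assumes the profile is the editable Markdown blob, and the function name and prompt wording are made up):

```ts
import { generateText } from 'ai';
import { google } from '@ai-sdk/google';

// Illustrative sketch: revise an existing Markdown profile with a free-text
// note from the user, instead of regenerating it from the swipes alone.
async function updateProfile(currentProfile: string, userNote: string): Promise<string> {
  const { text } = await generateText({
    model: google('gemini-2.5-flash'),
    prompt: [
      'Here is a user profile describing HN reading preferences, in Markdown:',
      currentProfile,
      'The user added this note about their interests:',
      userNote,
      'Rewrite the profile to incorporate the note. Keep the same Markdown structure and stay concise.',
    ].join('\n\n'),
  });
  return text;
}
```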
The only feature I'd love to see: there are many posts where I'm more interested in the HN comments than the articles themselves. It would be great to see this incorporated somehow.
Awesome work though. Will bookmark!
I think the bit that needs the most work is classifying each post on the home page; quite a lot of posts that I would mark as "Dive", given its own classification of me, ended up as "Skim".
We aren't really sure yet how best to surface _why_ the model predicts what it does. You can hover over the skim label and there is a bit of reasoning text, which might shed some light on why for now. We will think more about how to make these relationships more clear in the process of tightening them up and generally improving them.
Once the relationships are a bit clearer, there's probably an 80/20 chunk of work to tighten up those predictions.
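For a rough idea of the shape of the per-post prediction (a simplified sketch, not our exact prompt or schema; the names are illustrative):

```ts
import { generateObject } from 'ai';
import { google } from '@ai-sdk/google';
import { z } from 'zod';

// Simplified sketch: classify one front-page post against the user's profile,
// returning both the label and the short reasoning text shown on hover.
async function classifyPost(profile: string, title: string, url: string) {
  const { object } = await generateObject({
    model: google('gemini-2.5-flash'),
    schema: z.object({
      label: z.enum(['dive', 'skim', 'skip']),
      reasoning: z.string().describe('one sentence explaining the label'),
    }),
    prompt: `User profile:\n${profile}\n\nPost: "${title}" (${url})\n\nHow closely should this user read this post?`,
  });
  return object; // { label, reasoning }
}
```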
## Analysis of user's tech interest

The user shows a strong interest in foundational computing concepts, historical perspectives on technology, and cutting-edge advancements in AI/ML, particularly those related to model architecture and efficiency. They are also drawn to low-level programming, system design, and hardware. Conversely, they seem less interested in business/startup narratives, general data manipulation tools, and consumer-oriented tech news unless it has a deep technical underpinning.
I'd love to have those assumptions challenged though, if there are examples you could point me towards.
It should _in theory_ try to pick up a content style (funny stuff??) even if the tech is seemingly random, but I wouldn't be surprised if it just failed.
On a meta level I was suuuuper conscious of writing every word of this post/comments myself, as my prior is that HN's community is very intolerant of, and highly sensitive to, low-effort content, whether via AI or not. This is despite using AI tools for lots of other parts of work (drafting, coding, summarising, brainstorming, etc.).
Do you think HN has become more accepting of AI slop, that the slop is becoming harder to detect, or that HN isn't as discerning as I assume?
Would you be willing to share some more of the architecture/tech stack?
On the LLM side of things we are using Gemini 2.5 Flash, mostly for speed, and found it to be reasonably good quality at a vibe level compared to something heavier like Claude 4, probably because we've worked hard to keep the task very simple and explicit. That said, a bunch of the comments here on quality really highlight that if we want to get serious about it we should put in some user feedback loops and evals.
It's all in JS/TS, using the Vercel AI SDK for the LLM calls. Storage is local; to really dig into quality we might start saving things, but to do that well we'd have to add auth/users etc., and we wanted to keep it light for a demo. We have recently been exploring Langfuse for tracing, are really liking it, and will probably use it for first-pass evals when we get to that for this project.
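To make that concrete, here is a minimal sketch of the kind of call involved, using the AI SDK's `generateText` with the Gemini provider (simplified and illustrative, not our actual code or prompt):

```ts
import { generateText } from 'ai';
import { google } from '@ai-sdk/google';

// Illustrative sketch: build a Markdown profile from the ~30 swiped examples.
type Swipe = { title: string; liked: boolean };

async function buildProfile(swipes: Swipe[]): Promise<string> {
  const liked = swipes.filter(s => s.liked).map(s => `- ${s.title}`).join('\n');
  const passed = swipes.filter(s => !s.liked).map(s => `- ${s.title}`).join('\n');
  const { text } = await generateText({
    model: google('gemini-2.5-flash'),
    prompt:
      `Titles the user wanted to read:\n${liked}\n\n` +
      `Titles the user passed on:\n${passed}\n\n` +
      "Write a short Markdown profile of this user's HN reading interests.",
  });
  return text;
}
```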
We also talked quite a bit about non-LLM recsys, and aside from the time to set it up and do it well, something I really like about the current approach is the sense of transparency and agency: you can see your profile, and edit it if you like to see the change in your results. I almost think we'd lean further into that rather than folding in some trad DS or recsys stuff, even if that might make the results better. Just musings at this point though.
Also, I know that depending on the day / week / mood I will want to read different content from HN, so I guess there should still be like 30% "random articles" in each category, just to create some noise.
We played around with the idea of a "fun" or "random" category, but ultimately didn't include it in this little first demo, as we found it super hard to have it not be just literally random (although that might not be a bad thing, as you say).
On the topic of different moods and headspaces: that's one of the things we are really thinking about more broadly outside of this demo, and hadn't really considered for here, but should. What different data we can use (in this case maybe just a different survey for each "profile"), and how a user can manage those different profiles and front pages, will be the questions to answer.
I'd be really interested to know if anyone has done topic-grouped or themed front pages for Hacker News, as this would map well to that concept. I'll have a look.
I had an expectation that it'd go through posts and give me stuff I'd be interested in. Like, here are 25 posts that would be interesting?
Only the front page? No second page? No sort by new, which is my preference.
When we've been testing things, we often find that if there wasn't a great match between the options shown when picking preferences and what's currently on the front page, the context it generates will result in a lot of skips (understandably, but not great UX). Right now you can try regenerating your context (and going through the process again), or manually editing it to get different results.
There's also some work for us to better select the options when picking preferences, or to ensure we always surface some deep dives.
Applying the same process to more pages, bubbling up content from multiple pages, or applying it to the new feed is a great idea. Cool to hear that's where you would look.
We are focusing right now on how comments could be used to build up a better user context, and your comment has made me think about how we can feed comments in (instead of just titles and URLs) for your selected preferences to make a better profile, without needing to scrape anything (expensive and slow).
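What makes this feasible without scraping is that the comments themselves come from public APIs; a rough sketch against the Algolia HN API (the endpoint is real, the rest is illustrative):

```ts
// Sketch: fetch a story's top-level comments from the public Algolia HN API.
// No scraping of the linked article is involved.
type HNItem = { text?: string; children?: HNItem[] };

async function fetchTopLevelComments(storyId: number): Promise<string[]> {
  const res = await fetch(`https://hn.algolia.com/api/v1/items/${storyId}`);
  const story: HNItem = await res.json();
  return (story.children ?? [])
    .map(c => c.text ?? '') // comment bodies are HTML strings
    .filter(t => t.length > 0);
}
```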
But I think that would only work because of the general quality and relevance of HN comments, which would for sure make them an interesting knowledge source for pure LLM search. Feels like something someone should build, or maybe already has! Keen to see if anyone knows of projects like that.
Yep, it very much feels like that, but it doesn't seem to have happened yet. Even a not-quite-working attempt there could be an interesting thing/discussion.
The post and comments include a bunch of other tools that feel similar, and the tool itself works.
I'll have to take some time to use it, but also see what I can learn about how they've consumed and used HN comments more generally.