It's incredibly amusing to me to read some of the comments here that are critical of AI. If you didn't know any better, they might make you think that AI is a worthless technology.
It's at least plausible that we are sufficiently complex that, even with tons of NSA and corporate data and extremely sophisticated models, you still wouldn't be able to predict someone's behavior with much accuracy.
We've effectively created a panopticon in recent years—there are cameras absolutely everywhere. Despite that, though, the effort to actually do something with all of those feeds has provided a sort of natural barrier to overreach: it'd be effectively impossible to have people constantly watching all of the millions of camera feeds available in a modern city and flagging things, but AI certainly could.
Right now the compute for that is a barrier, but it would surprise me if we don't see cameras (which currently offer a variety of fairly basic computer vision "AI" alerting features for motion and object detection) coming with free-text prompts to trigger alerts. "Alert me if you see a red Nissan drive past the house.", "Alert me if you see a neighbor letting his dog poop in my yard.", "Alert the police if you see crime taking place [default on, opt out required]."
They run simulations against N million personality models, accurately predicting the outcome of any news story/event/stimulus. They use this power to shape national and global events to their own ends. This is what privacy and digital sovereignty advocates have been warning the public about for over a decade, to no avail.
All the money in the world has been invested into trying to do it with stock markets, and they still can't do better than average.
But so much of the takeaway is that it's "impossible" for a top-down government to actually process everything happening within the system it created, and to respond appropriately and in a timely way, thus creating problems like food shortages, corrupt industries, etc. So many of the problems were traced to the monolithic information-processing buildings owned by the state.
But honestly.. with modern LLMs all the way up the chain? I could envision a system like this working much more smoothly (while still being incredibly invasive and eroding most people's fundamental rights). And without massive food and labour shortages, where would the energy for change come from?
Factory machines transmitting their current rate of production all the way up to International Govt. which, being all knowing, can help you regulate your production based on current and forecasted worldwide consumption.
And your machines being flexible enough to reconfigure to produce something else.
Stores doing the same on their sales and Central Bank Digital Currency tying it all together.
But a major failing of the Soviet economic system was that there simply wasn't good data to make decisions, because at every layer people had the means and incentive to make their data look better than it really was. If you just add AI and modern technology to the system they had, it still wouldn't work, because wrong data leads to wrong conclusions. The real game changer would be industrial IoT, comprehensive tracking with QR codes, etc. And even then you'd have to do a lot of work to make sure factories don't mislabel their goods.
If the economy were otherwise stagnant, maybe. But top-down planning just cannot take into account all the multitudes of inputs to plan anywhere near the scale that communist countries did. Bureaucrats are never going to be incentivized anywhere near the level that private decision making can be. Businesses (within a legal/regulatory framework) can "just do" things if they make economic sense via a relatively simple price signal. A top-down planner can never fully take that into account, and governments should only intervene in specific national interest situations (eg in a shortage environment legally mandating an important precursor medicine ingredient to medical companies instead of other uses).
The Soviet Union decided that defence was priority number one and shoved an enormous amount of national resources into it. In the west, the US government encouraged development that also spilled over into the civilian sector and vice-versa.
> But a major failing of the Soviet economic system was that there simply wasn't good data to make decisions, because at every layer people had the means and incentive to make their data look better than it really was.
It wasn't just data that was the problem, but also quality control, having to plan far, far ahead due to bureaucracy in the supply chain, not being able to get spare parts because wear and tear wasn't properly planned for, etc. There's an old saying even in private business that if you create a metric and measure people on it, they'll game or over-concentrate on that metric. The USSR often pumped out large numbers of various widgets, but quality would often be poor (the stories of submarine and nuclear power plant manufacturers having to repeatedly deal with and replace bad inputs describe a massive source of waste).
The interesting leverage isn’t that AI can read more stuff than you; it’s that you can cheaply instrument your system (tests, properties, contracts, little spec fragments) and then let the model grind through iterations until something passes all of that. That just shifts the hard work back where it’s always been: choosing what to assert about the world. The tokens and the code are the easy part now.
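A minimal sketch of that loop, under loud assumptions: the `candidates` list stands in for successive model proposals (no model is called here), and the two checks are hypothetical examples of "what to assert about the world". The loop itself is trivial; the checks carry the meaning.

```python
from typing import Callable, Iterable, Optional, Tuple

Candidate = Callable[[list], list]
Check = Callable[[Candidate], bool]


def sorts_correctly(f: Candidate) -> bool:
    """Property: output equals the sorted input on a few fixed cases."""
    cases = [[], [3, 1, 2], [5, 5, 1]]
    return all(f(list(xs)) == sorted(xs) for xs in cases)


def leaves_input_alone(f: Candidate) -> bool:
    """Contract: the caller's list must not be mutated."""
    xs = [3, 1, 2]
    f(xs)
    return xs == [3, 1, 2]


def first_passing(
    candidates: Iterable[Candidate], checks: Iterable[Check]
) -> Optional[Tuple[int, Candidate]]:
    """Return (attempt number, candidate) for the first candidate passing every check."""
    checks = list(checks)
    for attempt, candidate in enumerate(candidates, start=1):
        if all(check(candidate) for check in checks):
            return attempt, candidate
    return None


# Hypothetical stand-ins for three successive model attempts.
candidates = [
    lambda xs: xs,               # does not sort at all
    lambda xs: xs.sort() or xs,  # sorts, but mutates the caller's list
    lambda xs: sorted(xs),       # sorts and leaves the input untouched
]

result = first_passing(candidates, [sorts_correctly, leaves_input_alone])
print(result[0] if result else "no candidate passed")
```

Note that the second attempt only gets rejected because someone thought to assert the no-mutation contract; drop that check and the loop happily accepts it.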
This might make it into this week's https://hackernewsai.com/ newsletter.
And therefore it's impossible to test the accuracy if it's consuming your own data. AI can hallucinate on any data you feed it, and it's been proven that it doesn't summarize, but rather abridges and abbreviates data.
In the author's example:
> "What patterns emerge from my last 50 one-on-ones?" AI found that performance issues always preceded tool complaints by 2-3 weeks. I'd never connected those dots.
Maybe that's a pattern from 50 one-on-ones. Or maybe it's only in the first two and the last one.
I'd be wary of using AI to summarize like this and expecting accurate insights.
Do you have more resources on that? I'd love to read about the methodology.
> And therefore it's impossible to test the accuracy if it's consuming your own data.
Isn't that only true if the result is hard to verify? If it's a result that's hard to produce but easy to verify, a class which many problems fall into, you'd just need to look at the synthesized results.
If you ask it "given these arbitrary metrics, what is the best business plan for my company?" it'd be really hard to verify the result. It'd be hard to verify that result from anyone, for that matter, even specialists.
So I think it's less about expecting the LLM to do autonomous work and more about using LLMs to more efficiently help you search the latent space for interesting correlations, so that you and not the LLM come up with the insights.
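To make "hard to produce but easy to verify" concrete for the one-on-ones claim quoted above, here is a rough sketch. It assumes the notes have already been tagged by hand with a person, a date and a topic; the records, names and the 14 to 21 day window are invented for illustration and are not from the article.

```python
from datetime import date

# Hypothetical hand-tagged one-on-one notes: (person, date, topic).
notes = [
    ("ana", date(2024, 1, 8), "performance"),
    ("ana", date(2024, 1, 24), "tools"),   # 16 days after a performance note
    ("ben", date(2024, 2, 5), "tools"),    # no earlier performance note at all
    ("cy", date(2024, 3, 4), "performance"),
    ("cy", date(2024, 3, 20), "tools"),    # 16 days after a performance note
    ("cy", date(2024, 5, 1), "tools"),     # 58 days after, outside the window
]


def check_claim(notes, min_days=14, max_days=21):
    """Split tool complaints into those preceded by a performance issue
    for the same person within the window, and those that were not."""
    supporting, contradicting = [], []
    for person, day, topic in notes:
        if topic != "tools":
            continue
        preceded = any(
            other == person
            and other_topic == "performance"
            and min_days <= (day - other_day).days <= max_days
            for other, other_day, other_topic in notes
        )
        (supporting if preceded else contradicting).append((person, day))
    return supporting, contradicting


supporting, contradicting = check_claim(notes)
print(f"{len(supporting)} supporting, {len(contradicting)} contradicting")
```

On this made-up data the "always" in the claim does not survive: two complaints fit the pattern and two do not. The tagging is the expensive part, but it is work you can audit.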
Have you ever met a human? I think one of the biggest reasons people become bearish on AI is that their measure of whether it's good/useful is that it needs to be absolutely perfect, rather than simply superior to human effort.
Humanity has gotten amazing results from unreliable stochastic processes, managing humans in organizations is an example of that. It's ok if something new is not completely deterministic to still be incredibly useful.
I don't really know what this means, or if the distinction is meaningful for the majority of cases.
Sure, but when do you have accurate results when using an iterative process? It can happen at the beginning or at the end when you’re bored, or have exhausted your powers of interrogation. Nevertheless, your reasoning will tell you if the AI result is good, great, acceptable, or trash.
For example, you can ask Chat—Summarize all 50 with names, dates and 2-3 sentence summaries and 2-3 pull quotes. Which can be sufficient to jog your memory, and therefore validate or invalidate the Chat conclusion.
That’s the tool, and its accuracy is still TBD. I for one am not ready to blindly trust our AI overlords, but darn if a talking dog isn’t worth my time if it can make an argument with me.
It may be those un-learning the previous iteration interactions once something stable arrives that are at a disadvantage?
How are you guys dealing with this risk? I'm sure on this site nobody is naive to the potential harms of tech, but if you're able to articulate how you've figured out that the risk is worth the benefits to you I'd love to hear it. I don't think I'm being too cynical to wait for either local LLMs to get good or for me to be able to afford expensive GPUs for current local LLMs, but maybe I should be time-discounting a bit harder?
I'm happy to elaborate on why I find it dangerous, too, if this is too vague. Just really would like to have a more nuanced opinion here.
And rightfully so. I've been looking at local LLMs because of that and they are slowly getting there. They will not be as "smart" as the big models, but even a 30B model (which you can easily run on a modern MacBook!) can do some summarization.
I just hope software for this will start getting better, because at the moment there is a plethora of apps, none of which are easy to use or even work with a larger number of documents.
The results are ... okay. The biggest problem is that I can't run some of the largest models on my hardware. The ones I'm running (mostly Qwen 3 at different numbers of parameters and quantization levels) often produce hallucinations. Overall, I can't say this is a practical or useful setup, but I'm just playing around so I don't mind.
That said, I doubt SOTA models would be that much better at this task. IMO LLM generated summaries and insights are never very good or useful. They're fine for assessing whether a particular text is worth reading, but they often extract the wrong information, or miss some critical information, or over-focus on one specific part of the text.
This does mean that, useful as e.g. Claude Code is, I don't think I could recommend it over a locally hosted model for any business with NDA-type obligations. That holds even though the machine needed to run a decent local model might cost €10k (with current price increases due to demand exceeding supply), even though that machine is still slower than whatever runs the hosted models, and even though the rapid rate of improvement means a 3-month delay between SOTA in open-weights and private-weights is enough to matter*.
But until then? If I'm vibe coding a video game I'd give away for free anyway, or copy-editing a blog post that's public anyway, or using it to help with some short stories that I'd never be able to charge money for, or uploading pictures of the plants in my garden right by the public road… that's fine.
* When the music (money for training) stops, it could be just about any provider whose model is best. Whatever that model is, it is likely to still get distilled down fairly cheaply, and/or some 3-month-old open-weights model is likely to get fine-tuned for each task fairly cheaply. Independently of this, without the hyper-scalers the supply chains may shift back from DCs to PCs and make local models much more affordable.
That's fortunate, as uploading them to an LLM was you leaking them.
And it has to be unauthorised, e.g. the New York Times getting to see my ChatGPT history isn't itself a leak because that's court-ordered and hence authorised, all the >1200 "trusted partners" in GDPR popups if you give consent that's authorised, etc.
In general for chat platforms you're right though: uploading/copy-pasting long documents and asking the LLM to find not one, but multiple needles in a haystack tends to give you really poor results. You need a workflow/process for getting accuracy on those sorts of tasks.
And after that? What's next?
If it was not fun for me, I would not have bought 3 GPUs just to run better local LLMs. Actual time, effort and money spent on my local setup compared to the value I get does not justify it at all. For 99% of the things I do I could have just used an API and paid like $17 in total. Though it would not have been as fun. For the other 1% I could have just rented a machine in the cloud and run LLMs there.
If you don't have private crypto keys worth millions in your notes, but still worry about your privacy, I'd recommend just renting a machine/GPU from a smaller cloud provider (not the big 3 or 5) and doing this kind of thing there.
This is specific, but if you start replying to LLM summaries of emails, instead of reading and responding to the content of the email itself, you are quickly going to become a burden socially.
The people you are responding to __will__ be able to tell, and will dislike you for your lack of consideration.
You can really see the limitations of LLMs when you look at how poorly they do at summarization. They most often just extract a few key quotes from the text, and provide an abbreviated version of the original text (often missing key parts!)
Abbreviation is not summarization. To properly summarize you need to be able to understand higher level abstractions implied in the text. At a fundamental level this is not what LLMs are designed to do. They can interpolate and continue existing text in remarkable and powerful ways, but they aren't capable of getting the "big picture". This is likely related to why they frequently ignore very important passages when "summarizing".
> We're still thinking about AI like it's 2023.
Just a reminder that in 2023 we were all told that AI was on a path of exponential progress. Were this true, you wouldn't need to argue that we're using it "wrong": the technology would have improved dramatically more than it did from 2021-2023, so even using it "wrong" would still be a massive improvement.
Still, I find the models to be excellent synthesisers of vast quantities of data on subjects in which I have minimal prior knowledge. For instance, when I wanted to translate some Lorca and Cavafy poems into English I discovered that ChatGPT had excellent knowledge of the poems in their native languages, and the difficulties translators faced when rendering them into English. Once I was able to harness the models to assist me in translating a poem, rather than generate a translation for me (every LLM is convinced it's a Poet), I managed to write some reasonable poems that met my personal requirements.
I wrote about the experience here: https://rikverse2020.rikweb.org.uk/blog/adventures-in-poetry...
I take notes for remembrance and relevance (what is interesting for me). But linking concepts is all my thinking. Doing whatever the article is prescribing is like sending someone on a tourist trip to take pictures and then bragging that you visited the country. While knowing that some pictures are photoshopped.
Text is a very linear medium. It's just the spark while our wealth of experiences is the fuel. No amount of wrangling the word "pain" will compare to actually experiencing it.
You'd be better served by just having a spaced repetition system for the notes you've taken. In this way, you'll be reminded of the whole experience when you took the note instead of reading words that were never written by someone who has lived.
> instant and total recall of our thoughts/notes/experiences
Closest is with vector searches & RAG etc., but even that isn't total recall because it will misclassify stuff with current SOTA.
Throwing everything in a pile and hoping an LLM will sort it all out for you, is at present even more limited.
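A toy sketch of the kind of miss meant here. It uses word counts as a deliberately crude stand-in for embeddings, so it is not what any real RAG stack does, and real embedding models catch far more paraphrase than this. The notes and the query are invented. The point is only that retrieval is a ranking by similarity, and a relevant note that scores low is simply never recalled.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Crude stand-in for an embedding: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(query: str, notes: list, k: int = 1) -> list:
    """Return the k notes ranked most similar to the query."""
    q = embed(query)
    return sorted(notes, key=lambda n: cosine(q, embed(n)), reverse=True)[:k]


# Hypothetical notes; the third describes the same incident in other words.
notes = [
    "deploy failed because the build cache was stale",
    "team lunch moved to friday",
    "release broke after old artifacts were reused",
]

query = "stale build cache"
top = search(query, notes, k=1)
paraphrase_score = cosine(embed(query), embed(notes[2]))
print(top, paraphrase_score)
```

The first note is found, while the third scores exactly zero because it shares no words with the query. Better embeddings shrink that gap without closing it, which is why "total recall" overstates things.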
They're good, sure, but you're overstating them.
Still, all credit to him for creating that asset in the first place.
I think meetings is one thing I'm missing out on. How do you put meeting information into your Obsidian? Is it just transcripts?
LLMs can be thought of as one big stochastic JOIN. The new insight capabilities, thanks to their massive recall, are there. The problem is the stochasticity. They can retrieve stuff from the depths and slap it together, but in these use cases we have no clue how relevant their inner ranking results or intermediary representations were. Even with the best read of user intent they can only simulate relevance, not really compute it in a grounded and groundable way.
So I take such automatic insight generation tasks with a massive grain of salt. Their simulation is amusing and feels relevant but so does a fortune teller doing a mostly cold read with some facts sprinkled in.
> → I solve problems faster by finding similar past situations → I make better decisions by accessing forgotten context → I see patterns that were invisible when scattered across time
All of which makes me skeptical of this claim. I have no doubt they feel productive but it might just as well be a part of that simulation, with all the biases, blind spots etc originating from the machine. Which could be worse than not having used the tool. Not having augmented recall is OK, forgetting things is OK, because memory is not a passive reservoir of data but an active reranker of relevance.
LLMs can’t be the final source of insight and wisdom; they are at best sophists, or as Terence Tao put it more kindly, a mere source of cleverness. In this, they can just as well augment our capacity for self-deception, maybe even more than they counterbalance it.
Exercise: whatever amusing insight a machine produces for you, ask for a very strong counter to it. You might be equally amused.
The AI summary at the top was surprisingly good! Of course, the AI isn't doing anything original; instead, it created a summary of whatever written material is already out there. Which is exactly what I wanted.
This isn't strictly a case against AI, just a case that we have a contradiction on the definition of "well informed". We value over-consumption, to the point where we see learning 3 things in 5 minutes as better than learning 1 thing in 5 minutes, even if that means being fully unable to defend or counterpoint what we just read.
I'm specifically referring to what you said: "the speaker used some obscure technical terminology I didn't know". This is due to a lack of assumed background knowledge, which makes it hard to verify a summary on your own.
So someone who wants a war or wants Tweedledum to get more votes than Tweedledee has incentives to poison the well and disseminate fake content that makes it into the training set. Then there's a whole department of "safety" that has to manually untrain it to not be politically incorrect, racist etc. Because the whole thesis is don't think for yourself, let the AI think for you.
The 3 things in 5 minutes is even worse - it’s like taking Google Maps everywhere without even thinking about how to get from point A to point B - the odds of knowing anything at all from that are near zero.
And since it summarizes the original content, it’s an even bigger issue - we never even have contact with the thing we’re putatively learning from, so it’s even harder to tell bullshit from reality.
It’s like we never even drove the directions Google Maps was giving us.
We’re going to end up with a huge number of extremely disconnected and useless people, who all absolutely insist they know things and can do stuff. :s
It's true. I previously had no idea of the proper number of rocks to eat, but thanks to a notorious summary (https://www.bbc.com/news/articles/cd11gzejgz4o) I have all the rock-eating knowledge I need.
If you ask Google about news, world history, pop culture, current events, places of interest, etc., it will lie to you frequently and confidently. In these cases, the "low quality summary" is very often a completely idiotic and inane fabrication.
I looked up a medical term that is frequently misused (e.g. "retarded") and asked Gemini to compare it with similar conditions.
Because I have enough of a background in the subject matter, I could tell what it had concocted by mixing the many incorrect references with the far fewer correct references in the training data.
I asked it for sources, and it failed to provide anything useful. But once I am looking at sources, I would be MUCH better off searching and reading only the sources that might actually be useful.
I was sitting with a medical professional at the time (who is not also a programmer) and he completely swallowed what Gemini was feeding him. He commented that he appreciates that these summaries let him know when he is not up to date with the latest advances, and that he learnt a lot from the response.
As an aside, I am not sure I appreciate that Google's profile would now associate me with that particular condition.
Scary!
I've written my whole life story, the parts I'm willing to share that is, and posted it in Claude. It helped me way better with all kinds of things. It took me 2 days to write without formatting, pretty much how I write all my HN comments (but then 2 days straight: eat, sleep, write).
I've also exported all my notes, but it's too big for the context. That's why I wrote my life story.
From a practical standpoint I think the focus is on context management. Obsidian can help with this (I haven't used it so don't know the details). For code, it means doing things like static and dynamic analysis to see which functions call what, creating a topology of function calls, and sending that as context. Then Claude Code can more easily know what to edit, and it doesn't need to read all the code.
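For the static half of that, a rough sketch using Python's standard `ast` module. The `SOURCE` snippet and its function names are made up, and this only catches direct calls by name (method calls such as `text.split()` are ignored), so treat it as a starting point rather than a real analysis tool. How the resulting map gets handed to Claude Code is left open.

```python
import ast

# Hypothetical module to analyse; in practice this would be read from a file.
SOURCE = '''
def load(path):
    return open(path).read()

def parse(text):
    return text.split()

def main(path):
    return parse(load(path))
'''


def call_graph(source: str) -> dict:
    """Map each function name to the sorted names it calls directly."""
    graph = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            callees = {
                inner.func.id
                for inner in ast.walk(node)
                # Only plain `name(...)` calls; attribute calls are skipped.
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name)
            }
            graph[node.name] = sorted(callees)
    return graph


graph = call_graph(SOURCE)
for caller, callees in graph.items():
    print(f"{caller} -> {', '.join(callees) or '(no direct calls)'}")
```

A compact map like this is a fraction of the size of the code it describes, which is the whole appeal when the context window is the constraint.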
So yeah, I definitely add to the "AI generated" text part, but I read over all the texts, and usually they don't get sent out. Ultimately, it's still a lot quicker to do it this way.
For career planning, so far it hasn't beaten my own insights but it came close. For example, it mentioned that I should actually be a developer advocate instead of a software engineer. 2 to 3 years ago I came to that same thought. I ultimately rejected the idea due to how I am but it is a good one to think about.
What I see now, I think the best job for me would be a tech consultant. Or as I'd also like to call it: a data analyst that spots problems and then uses his software engineering or teaching skills to solve that problem. I don't think that job has a good catch all title as it is a pretty generalist job. I'm currently at a company that allows me to do this but the pay is quite low, so I'm looking for a tech company where I could do something similar. Maybe a product manager role? It really depends on the company culture.
What I also noticed it did better: it doesn't reduce me to data engineering anymore. It understands that I aspire to learn everything and anything I can get my hands on. It's my mode of living and Claude understands that.
So nothing too spectacular yet, but it'll come. It requires more prompt/context engineering and fine tuning of certain things. I didn't get around to it yet.
I'm really glad you are getting some personal growth out of these tools, but I hesitate to give Claude as much credit as you do. And I'm really cautious about saying Claude "understands" because that word has many meanings and it isn't clear which ones apply here.
What I'm hearing is that you use it like a kind of rubber-duck debugger. Except this is a special rubber duck because it can replay/rephrase what you said.
Most of "AI"'s superpower is tricking monkeys into anthropomorphizing it. It's just a giant, complicated, expensive, environmentally destructive math computer with no capability to create novel thought. If it did have one superpower, it's gaslighting and manipulation.
I'd like to be able to point a model at a news story and have it follow every fact and claim back to an origin (or lack of one). I'm not sure when they will be able to do that; they aren't up to the task yet. Reading the news would be so much different if you could separate the "we report this happened" from the "we report that someone else reported this happened".
I would love to try this out but don’t feel comfortable sharing all my personal notes with a third party.
I have a written novel draft and something like a million words of draft fiction but have struggled with how to get meaningful analytics from it.
Either way, they are D.I.C.s
"Everyone’s using AI wrong." Oh, we are? Please, enlightened us thought leader, tell us how we’ve all been doing it wrong this entire time.
"Here’s how most people use AI." No, that’s how you use AI. Can we stop projecting our own habits onto the whole world and calling it insight?
Everyone is justifiably afraid of AI because it's pretty obvious that Claude Opus 4.5 level agents replace developers.
Is it though? I really don't see it. Replacing developers requires way more than writing the right code. I can agree it can replace junior to mid level engineers at some tasks, specifically in greenfield projects and popular stacks. And, don't get me wrong, it's very helpful even for senior engineers. But to "replace" those it will require some new iterations of "Opus 4.5".
Well for most humans that's the more super of the powers too ;)
I opened Claude Code in the repo and asked it to tell me about myself based on my writing.
Claude's answer overestimated my technical skills (I take notes on stuff I don't know, not on things I know, so it assumed that I had deep expertise in things I'm currently learning, and ignored areas where I do have a fair amount of experience), but the personal side really resonated with me.