FilterHN

gchamonlive

12 days ago

[-]

I think it's interesting to juxtapose traditional coding, neural network weights and prompts because in many areas -- like the example of the self driving module having code being replaced by neural networks tuned to the target dataset representing the domain -- this will be quite useful.

However I think it's important to make it clear that given the hardware constraints of many environments the applicability of what's being called software 2.0 and 3.0 will be severely limited.

So instead of being replacements, these paradigms are more like extra tools in the tool belt. Code and prompts will live side by side, being used when convenient, but none a panacea.

karpathy

12 days ago

[-]

I kind of say it in words (agreeing with you) but I agree the versioning is a bit confusing analogy because it usually additionally implies some kind of improvement. When I’m just trying to distinguish them as very different software categories.

miki123211

12 days ago

[-]

What do you think about structured outputs / JSON mode / constrained decoding / whatever you wish to call it?

To me, it's a criminally underused tool. While "raw" LLMs are cool, they're annoying to use as anything but chatbots, as their output is unpredictable and basically impossible to parse programmatically.

Structured outputs solve that problem neatly. In a way, they're "neural networks without the training". They can be used to solve similar problems as traditional neural networks, things like image classification or extracting information from messy text, but all they require is a Zod or Pydantic type definition and a prompt. No renting GPUs, labeling data and tuning hyperparameters necessary.

They often also improve LLM performance significantly. Imagine you're trying to extract calories per 100g of product, but some product give you calories per serving and a serving size, calories per pound etc. The naive way to do this is a prompt like "give me calories per 100g", but that forces the LLM to do arithmetic, and LLMs are bad at arithmetic. With structured outputs, you just give it the fifteen different formats that you expect to see as alternatives, and use some simple Python to turn them all into calories per 100g on the backend side.

12 days ago

[-]

Even more than that. With Structured Outputs we essentially control layout of the response, so we can force LLM to go through different parts of the completion in a predefined order.

One way teams exploit that - force LLM to go through a predefined task-specific checklist before answering. This custom hard-coded chain of thought boosts the accuracy and makes reasoning more auditable.

12 days ago

[-]

I also think that structured outputs are criminally underused, but it isn't perfect... and per your example, it might not even be good, because I've done something similar.

I was trying to make a decent cocktail recipe database, and scraped the text of cocktails from about 1400 webpages. Note that this was just the text of the cocktail recipe, and cocktail recipes are comparatively small. I sent the text to an LLM for JSON structuring, and the LLM routinely miscategorized liquor types. It also failed to normalize measurements with explicit instructions and the temperature set to zero. I gave up.

https://www.boundaryml.com/blog/schema-aligned-parsing

11 days ago

[-]

have you tried schema-aligned parsing yet?

the idea is that instead of using JSON.parse, we create a custom Type.parse for each type you define.

so if you want a:

   class Job { company: string[] }

And the LLM happens to output:

   { "company": "Amazon" }

We can upcast "Amazon" -> ["Amazon"] since you indicated that in your schema.

and since its only post processing, the technique will work on every model :)

for example, on BFCL benchmarks, we got SAP + GPT3.5 to beat out GPT4o ( https://www.boundaryml.com/blog/sota-function-calling )

11 days ago

[-]

Interesting! I was using function calling in OpenAI and JSON mode in Ollama with zod. I may revisit the project with SAP.

instig007

11 days ago

[-]

    so if you want a:

       class Job { company: string[] }

    We can upcast "Amazon" -> ["Amazon"] since you indicated that in your schema.

Congratulations! You've discovered Applicative Lifting.

10 days ago

[-]

its a bit more nuanced than applicative lifting. parts of of SAP is that, but there's also supporting strings that don't have quotation marks, supporting recursive types, supporting unescaped quotes like: `"hi i wanted to say "hi""`, supporting markdown blocks inside of things that look like "json", etc.

but applicative lifting is a big part of it as well!

gloochat.notion.site/benefits-of-baml

11 days ago

[-]

Ok. Tried it, I'm not super impressed.

    Client: Ollama (phi4) - 90164ms. StopReason: stop. Tokens(in/out): 365/396
    ---PROMPT---
    user: Extract from this content:
    Grave Digger: 
     Ingredients
    
    - 1 1/2 ounces vanilla-infused brandy*
    
    - 3/4 ounce coffee liqueur
    
    - 1/2 ounce Grand Marnier
    
    - 1 ounce espresso, freshly brewed
    
    - Garnish: whipped cream
    
    - Garnish: oreo cookies, crushed
    
    Steps
    
    1.  Add all ingredients into a shaker with ice and shake until
        well-chilled.
    
    2.  Strain into a coupe.
    
    3.  Top with whipped cream and crushed Oreo cookies (discarding cream in
        center).
    
    *Vanilla-infused brandy: Cut 2 fresh vanilla pods lengthwise and place
    into a 750 mL bottle of brandy. Let sit for 3 to 5 days, shaking
    occasionally. Strain out pods and store.
    
    
    
    IngredientType
    ----
    - Alcohol
    - Sweetener
    - Sour
    - Aromatic
    - Bittering_agent
    - Food
    - Dilution
    
    LiquorType
    ----
    - Gin
    - Vodka
    - Rum
    - Whiskey
    - Tequila
    - Mezcal
    - Brandy
    - Cognac
    - Liqueur
    - Wine
    - Absinthe
    
    Answer in JSON using this schema:
    {
      Name: string,
      Ingredients: [
        {
          Text: string,
          Type: IngredientType,
          Liquor_type: LiquorType or null,
          Name_brand: string or null,
          Unit_of_measure: string,
          Measurement_or_unit_count: string,
        }
      ],
      Steps: [
        {
          Number: int,
          Instruction: string,
        }
      ],
      Garnishes: string[],
    }
    
    ---LLM REPLY---
    ```json
    {
      "Name": "Grave Digger",
      "Ingredients": [
        {
          "Text": "vanilla-infused brandy*",
          "Type": "Alcohol",
          "Liquor_type": "Brandy",
          "Name_brand": null,
          "Unit_of_measure": "ounces",
          "Measurement_or_unit_count": "1 1/2"
        },
        {
          "Text": "coffee liqueur",
          "Type": "Liqueur",
          "Liquor_type": "Liqueur",
          "Name_brand": null,
          "Unit_of_measure": "ounce",
          "Measurement_or_unit_count": "3/4"
        },
        {
          "Text": "Grand Marnier",
          "Type": "Liqueur",
          "Liquor_type": "Liqueur",
          "Name_brand": "Grand Marnier",
          "Unit_of_measure": "ounce",
          "Measurement_or_unit_count": "1/2"
        },
        {
          "Text": "espresso, freshly brewed",
          "Type": "Bittering_agent",
          "Liquor_type": null,
          "Name_brand": null,
          "Unit_of_measure": "ounces",
          "Measurement_or_unit_count": "1"
        }
      ],
      "Steps": [
        {
          "Number": 1,
          "Instruction": "Add all ingredients into a shaker with ice and shake until well-chilled."
        },
        {
          "Number": 2,
          "Instruction": "Strain into a coupe."
        },
        {
          "Number": 3,
          "Instruction": "Top with whipped cream and crushed Oreo cookies (discarding cream in center)."
        }
      ],
      "Garnishes": [
        "whipped cream",
        "oreo cookies, crushed"
      ]
    }
    ```
    ---Parsed Response (class Recipe)---
    {
      "Name": "Grave Digger",
      "Ingredients": [
        {
          "Text": "vanilla-infused brandy*",
          "Type": "Alcohol",
          "Liquor_type": "Brandy",
          "Name_brand": null,
          "Unit_of_measure": "ounces",
          "Measurement_or_unit_count": "1 1/2"
        },
        {
          "Text": "espresso, freshly brewed",
          "Type": "Bittering_agent",
          "Liquor_type": null,
          "Name_brand": null,
          "Unit_of_measure": "ounces",
          "Measurement_or_unit_count": "1"
        }
      ],
      "Steps": [
        {
          "Number": 1,
          "Instruction": "Add all ingredients into a shaker with ice and shake until well-chilled."
        },
        {
          "Number": 2,
          "Instruction": "Strain into a coupe."
        },
        {
          "Number": 3,
          "Instruction": "Top with whipped cream and crushed Oreo cookies (discarding cream in center)."
        }
      ],
      "Garnishes": [
        "whipped cream",
        "oreo cookies, crushed"
      ]
    }

Processed Recipe: { Name: 'Grave Digger', Ingredients: [ { Text: 'vanilla-infused brandy*', Type: 'Alcohol', Liquor_type: 'Brandy', Name_brand: null, Unit_of_measure: 'ounces', Measurement_or_unit_count: '1 1/2' }, { Text: 'espresso, freshly brewed', Type: 'Bittering_agent', Liquor_type: null, Name_brand: null, Unit_of_measure: 'ounces', Measurement_or_unit_count: '1' } ], Steps: [ { Number: 1, Instruction: 'Add all ingredients into a shaker with ice and shake until well-chilled.' }, { Number: 2, Instruction: 'Strain into a coupe.' }, { Number: 3, Instruction: 'Top with whipped cream and crushed Oreo cookies (discarding cream in center).' } ], Garnishes: [ 'whipped cream', 'oreo cookies, crushed' ] }

So, yeah, the main issue being that it dropped some ingredients that were present in the original LLM reply. Separately, the original LLM Reply misclassified the `Type` field in `coffee liqueur`, which should have been `Alcohol`.

10 days ago

[-]

appreciate you tyring it. the reason it dropped the day was due to your type system not being understood by the LLM you're using.

the model replied with

       {
          "Text": "coffee liqueur",
          "Type": "Liqueur",
          "Liquor_type": "Liqueur",
          "Name_brand": null,
          "Unit_of_measure": "ounce",
          "Measurement_or_unit_count": "3/4"
        },

but you expected a { Text: string, Type: IngredientType, Liquor_type: LiquorType or null, Name_brand: string or null, Unit_of_measure: string, Measurement_or_unit_count: string, }

there's no way to cast `Liqueur` -> `IngredientType`. but since the the data model is a `Ingredient[]` we attempted to give you as many ingredients as possible.

The model itself being wrong isn't something we can do much about. that depends on 2 things (the capabilities of the model, and the prompt you pass in).

If you wanted to capture all of the items with more rigor you could write it in this way:

    class Recipe {
        name string
        ingredients Ingredient[]
        num_ingredients int
        ...

        // add a constraint on the type
        @@assert(counts_match, {{ this.ingredients|length == this.num_ingredients }})
    }

And then if you want to be very wild, put this in your prompt:

   {{ ctx.output_format }}
   No quotes around strings

And it'll do some cool stuff

10 days ago

[-]

if you share your prompt with me on promptfiddle.com i can play around with it and see how i can make it better!

handfuloflight

12 days ago

[-]

Which LLM?

coderatlarge

11 days ago

[-]

note the per 100g prompt might lead the llm to reach for the part of its training distribution that is actually written in terms of the 100g standard and just lead to different recall rather than a suboptimal calculation based on non-standardized per 100g training examples.

BobbyJo

12 days ago

[-]

The versioning makes sense to me. Software has a cycle where a new tool is created to solve a problem, and the problem winds up being meaty enough, and the tool effective enough, that the exploration of the problem space the tool unlocks is essentially a new category/skill/whatever.

computers -> assembly -> HLL -> web -> cloud -> AI

Nothing on that list has disappeared, but the work has changed enough to warrant a few major versions imo.

12 days ago

[-]

For me it's even simpler:

V1.0: describing solutions to specific problems directly, precisely, for machines to execute.

V2.0: giving machine examples of good and bad answers to specific problems we don't know how to describe precisely, for machine to generalize from and solve such indirectly specified problem.

V3.0: telling machine what to do in plain language, for it to figure out and solve.

V2 was coded in V1 style, as a solution to problem of "build a tool that can solve problems defined as examples". V3 was created by feeding everything and the kitchen sink into V2 at the same time, so it learns to solve the problem of being general-purpose tool.

BobbyJo

12 days ago

[-]

That's less a versioning of software and more a versioning of AI's role in software. None -> Partial -> Total. Its a valid scale with regard to AI's role specifically, but I think Karpathy was intending to make a point about software as a whole, and even the details of how that middle "Partial" era evolves.

lymbo

11 days ago

[-]

What are some predictions people are anticipating for V4?

My Hail Mary is it’s going to be groups of machines gathering real world data, creating their own protocols or forms of language isolated to their own systems in order to optimize that particular system’s workflow and data storage.

lodovic

11 days ago

[-]

But that means AGI is going to write itself

gchamonlive

12 days ago

[-]

> versioning is a bit confusing analogy because it usually additionally implies some kind of improvement

Exactly what I felt. Semver like naming analogies bring their own set of implicit meanings, like major versions having to necessarily supersede or replace the previous version, that is, it doesn't account for coexistence further than planning migration paths. This expectation however doesn't correspond with the rest of the talk, so I thought I might point it out. Thanks for taking the time to reply!

12 days ago

[-]

Andrej, maybe Software 3.0 is not written in spoken language like code or prompts. Software 3.0 is recorded in behavior, a behavior that today's software lacks. That behavior is written and consumed by machine and annotated by human interaction. Skipping to 3.0 is premature, but Software 2.0 is a ramp.

mclau157

12 days ago

[-]

Would this also be more of a push towards robotics and getting physical AI in our every day lives

12 days ago

[-]

Very insightful! How you would describe boiling an egg is different than how a machine would describe it to another machine.

fc417fc802

11 days ago

[-]

Funny that you should use boiling an egg as an example. https://www.nature.com/articles/s44172-024-00334-w

swyx

12 days ago

[-]

no no, it actually is a good analogy in 2 ways:

1) it is a breaking change from the prior version

2) it is an improvement in that, in its ideal/ultimate form, it is a full superset of capabilities of the previous version

gyomu

11 days ago

[-]

It's not just the hardware constraints - it's also the training constraints, and the legibility constraints.

Training constraints: you need lots, and lots of data to build complex neural network systems. There are plenty of situations where the data just isn't available to you (whether for legal reasons, technical reasons, or just because it doesn't exist).

Legibility constraints: it is extremely hard to precisely debug and fix those systems. Let's say you build a software system to fill out tax forms - one the "traditional" way, and one that's a neural network. Now your system exhibits a bug where line 58(b) gets sometimes improperly filled out for software engineers who are married, have children, and also declared a source of overseas income. In a traditionally implemented system, you can step through the code and pinpoint why those specific conditions lead to a bug. In a neural network system, not so much.

So totally agreed with you that those are extra tools in the toolbelt - but their applicability is much, much more constrained than that of traditional code.

In short, they excel at situations where we are trying to model an extremely complex system - one that is impossible to nail down as a list of formal requirements - and where we have lots of data available. Signal processing (like self driving, OCR, etc) and human language-related problems are great examples of such problems where traditional programming approaches have failed to yield the kind of results we wanted (ie, beyond human performance) in 70+ years of research and where the modern, neural network approach finally got us the kind of results we wanted.

But if you can define the problem you're trying to solve as formal requirements, then those tools are probably ill-suited.

radicalbyte

12 days ago

[-]

Weights are code being replaced by data; something I've been making heavy use of since the early 00s. After coding for 10 years you start to see the benefits of it and understand where you should use it.

LLMs give us another tool only this time it's far more accessible and powerful.

dcsan

11 days ago

[-]

LLMs have already replaced some code directly for me eg NLP stuff. Previously I might write a bunch of code to do clustering now I just ask the LLM to group things. Obviously this is a very basic feature native to LLMs but there will be more first class LLM callable functions over time.

OJFord

11 days ago

[-]

I'm not sure about the 1.0/2.0/3.0 classification, but it did lead me to think about LLMs as a programming paradigm: we've had imperative & declarative, procedural & functional languages, maybe we'll come to view deterministic vs. probabilistic (LLMs) similarly.

    def __main__:
        You are a calculator. Given an input expression, you compute the result and print it to stdout, exiting 0.
        Should you be unable to do this, you print an explanation to stderr and exit 1.

(and then, perhaps, a bunch of 'DO NOT express amusement when the result is 5318008', etc.)

llflw

11 days ago

[-]

Why bother using human language to communicate with a computer? You interact with a computer using a programming language—code—which is more precise and effective. Specifically: → In 1.0, you communicate with computers using compiled code. → In 2.0, you communicate with compilers using high-level programming languages. → In 3.0, you interact with LLMs using prompts, which arguably should not be in natural human language. Nonetheless, you should communicate with AGIs using human language, just as you would with other human beings.

standeven

11 days ago

[-]

Why bother using higher-level programming languages to communicate with a computer? You interact with a computer using assembly - raw bit shifting and memory addresses - which is more precise and effective.

dustbunny

11 days ago

[-]

Using assembly is not really more precise in terms of solving the problem. You can definitely make an argument that using a higher level language is equally if not more precise. Especially since your low level assembly will be limited to which architectures it can run on, you can state that the c++ that generates that assembly is "more precisely defining a calculator program".

rictic

10 days ago

[-]

I agree with your general point, but C++ isn't a great example, as it is so underspecified. Imagine as part of our calculator we wrote the function:

    int add(int a, int b) {
      return a + b;
    }

What is the result of add(32767, 1)? C++ does not presume to define just one meaning for such an expression. Or even any meaning at all. What to do when the program tries to add ints that large is left to the personal conscience of compiler authors.

d0mine

9 days ago

[-]

Precision is not boolean (present or absent/0 or 1). There may be many numbers between 0 and 1. Compared to human languages, programming languages are much more precise that makes the results much more predictable in practice.

I can imagine OS being written in C++ and working most of the time. I don't think you can replace Linux written in C with any number of LLM prompts.

LLM can be a [bad so far] programmer but a prompt is not a program.

tim333

10 days ago

[-]

Using code may not be more precise in terms of solving a problem than english. Take the NHS. With better AI, saying build a good IT system for the NHS may have worked better than this stuff https://www.theguardian.com/society/2013/sep/18/nhs-records-...

wing-_-nuts

11 days ago

[-]

You can express dang near anything you wish to express in assembly in a higher order programming language because it is designed to allow that level of clarity and specificity. In fact most have compile time checks to stop you if you have not properly specified certain behavior.

The English language is not comparable. It is a language designed to capture all the ambiguity of human thought, and as such is not appropriate for computation.

TLDR: There's a reason why programmers still exist after the dawn of 4GL / 'no code' frameworks. Otherwise we'd all be product managers typing specs into JIRA and getting fully formed applications out the other side.

softfalcon

11 days ago

[-]

If this is what it comes to, it would explain the many, many software malfunctions in Star Trek. If everything is an LLM/LRM (or whatever super advanced version they have in the 23rd century) then everything can evolve into weird emergent behaviours.

stares at every weird holo-deck episode

semiquaver

11 days ago

[-]

LLMs are not inherently indeterministic. Batching, temperature, and other things make them appear so when run by big providers but a locally-run LLM model at zero temperature will always produce the same output given the same input.

oytis

11 days ago

[-]

That's an improvement, they are still "chaotic" though in that small changes in input can change the output unpredictably strong

behnamoh

11 days ago

[-]

Yes, this paper says exactly what you talked about: https://arxiv.org/abs/2404.01332

lmeyerov

11 days ago

[-]

That assumes they were implemented with deterministic operators, which isn't the default assumption when using neural network libs on GPUs. Imagine random seeds, cublas optimizations - like you can configure all these things, but I wouldn't assume it, esp in GPU-optimized OSS..

ai-christianson

11 days ago

[-]

Why does this remind me of COBOL.

wiz21c

11 days ago

[-]

'cos COBOL was designed to be human readable (writable ?).

dheera

11 days ago

[-]

    def __main__:
        You run main(). If there are issues, you edit __file__ to try to fix the errors and re-run it. You are determined, persistent, and never give up.

beambot

11 days ago

[-]

Output "1" if the program halts; "0" if it doesn't.

fragmede

11 days ago

[-]

funnily enough, you can give the LLM the code and ask it if the function will halt, and for some cases of input, it is able to say that the program does/does not halt.

pxc

11 days ago

[-]

The halting problem is about being able to answer this question in full generality, though. Being able to answer the question for specific cases is already feasible and always was.

OJFord

11 days ago

[-]

You know, the more I think about it, the more I like this model.

What we have today with ChatGPT and the like (and even IDE integrations and API use) is imperative right, it's like 'answer this question' or 'do this thing for me', it's a function invocation. Whereas the silly calculator program I presented above is (unintentionally) kind of a declarative probabilistic program - it's 'this is the behaviour I want, make it so' or 'I have these constraints and these unknowns, fill in the gaps'.

What if we had something like Prolog, but with the possibility of facts being kind of on-demand at runtime, powered by the LLM driving it?

crsn

11 days ago

[-]

This (sort of) is already a paradigm: https://en.m.wikipedia.org/wiki/Probabilistic_programming

stabbles

11 days ago

[-]

That's entirely orthogonal.

In probabilistic programming you (deterministically) define variables and formulas. It's just that the variables aren't instances of floats, but represent stochastic variables over floats.

This is similar to libraries for linear algebra where writing A * B * C does not immediately evaluate, but rather builds an expression tree that represent the computation; you need to do say `eval(A * B * C)` to obtain the actual value, and it gives the library room to compute it in the most efficient way.

It's more related to symbolic programming and lazy evaluation than (non-)determinism.

no_wizard

11 days ago

[-]

I wonder when companies will remove the personality out of LLMs by default, especially for tools

dingnuts

11 days ago

[-]

that would require actually curating the training data and eliminating sources that contain casual conversation

too expensive since those are all licensed sources, much easier to train on Reddit data

amelius

11 days ago

[-]

Just ask an LLM to remove the personality from the training data. Then train a new LLM on that.

omneity

10 days ago

[-]

It will work, but at the scale needed for pretraining you are bound to have many quality issues that will destroy your student model, so your data cleaning process better be very capable.

One way to think of it is that any little bias or undesirable path in your teacher model will be amplified in the resulting data and is likely to become over represented in the student model.

11 days ago

[-]

> maybe we'll come to view deterministic vs. probabilistic (LLMs) similarly

I can't believe someone would seriously write this and not realize how nonsensical it is.

"indeterministic programming", you seriously cannot come up with a bigger oxymoron.

11 days ago

[-]

Why do people keep having this reaction to something we're already used to? When you're developing against an API, you're already doing the same thing, planning for what happens when the request hangs, or fails completely, or gives a different response, and so on. Same for basically any IO.

It's almost not even new, just that it generates text instead of JSON, or whatever. But we've already been doing "indeterministic programming" for a long time, where you cannot always assume a function 100% returns what it should all the time.

11 days ago

[-]

You’re right about the trees but wrong (hear me out) about the forest.

Yes, programming isn’t always deterministic, not just due to the leftpad API endpoint being down, but by design - you can’t deterministically tell which button the user is going to click. So far so good.

But, you program for the things that you expect to happen, and handle the rest as errors. If you look at the branching topology of well-written code, the majority of paths lead to an error. Most strings are not valid json, but are handled perfectly well as errors. The paths you didn’t predict can cause bugs, and those bugs can be fixed.

Within this system, you have effective local determinism. In practice, this gives you the following guarantee: if the program executed correctly until point X, the local state is known. This state is used to build on top of that, and continue the chain of bounded determinism, which is so incredibly reliable on modern CPUs that you can run massive financial transactions and be sure it works. Or, run a weapons system or a flight control system.

So when people point out that LLMs are non-deterministic (or technically unstable, to avoid bike-shedding), they mean that it’s a fundamentally different type of component in an engineering system. It’s not like retrying an HTTP request, because when things go wrong it doesn’t produce “errors”, it produces garbage that looks like gold.

loudmax

11 days ago

[-]

Programmers aren't deterministic either. If I ask ten programmers to come up with a solution to the same problem, I'm not likely to get ten identical copies. Different programmers, even competent experienced programmers, might have different priorities that aren't in the requirements. For example, trading off program maintainability or portability over performance.

The same could apply to LLMs, or even different runs from the same LLMs.

10 days ago

[-]

> Programmers aren't deterministic either.

No but programs are. An LLM can be a programmer too, but it’s not a program the way we want and expect programs to behave: deterministically. Even if a programmer could perform a TLS handshake manually very fast, ignoring the immense waste of energy, the program is a much better engineering component, simply because it is deterministic and does the same thing every time. If there’s a bug, it can be fixed, and then the bug will not re-appear.

> If I ask ten programmers to come up with a solution to the same problem, I'm not likely to get ten identical copies.

Right, but you only want one copy. If you need different clients speaking with each other you need to define a protocol and run conformance tests, which is a lot of work. It’s certainly doable, but you don’t want a different program every time you run it.

I really didn’t expect arguing for reproducibility in engineering to be controversial. The primary way we fix bugs is by literally asking for steps to reproduction. This is not possible when you have a chaos agent in the middle, no matter how good. The only reasonable conclusion is to treat AI systems as entirely different components and isolate them such that you can keep the boring predictability of mechanistic programs. Basically separating engineering from the alchemy.

11 days ago

[-]

Not really, we have many implementations of web servers or ftp clients, but they all follow the same protocol. So you can pair any two things that talk the same protocol and have a consistent systems. If you gave ten programmers a specs, you get ten implementations that follows the specs. With LLMs, you get random things.

dax_

11 days ago

[-]

Why would we embrace that even more? In Software Development we try to keep things deterministic as much as possible. The more variables we're introducing into our software, the more complicated it becomes.

The whole notion of adding LLM prompts as a replacement for code just seems utterly insane to me. It would be a massive waste of resources as we're reprompting AI a lot more frequently than we need to. Also must be fun to debug, as it may or may not work correctly depending on how the LLM model is feeling at that moment. Compilation should always be deterministic, given the same environment.

jason_oster

8 days ago

[-]

Some algorithms are inherently probabilistic (bloom filters are a very common example, HyperLogLog is another). If we accept that probabilistic algorithms are useful, then we can extrapolate that to using LLMs (or other neural networks) for similar useful work.

You can make the LLM/NN deterministic. That was never a problem.

alganet

11 days ago

[-]

> request hangs, or fails completely, or gives a different response

I try to avoid those, not celebrate them.

12 days ago

[-]

Great talk, thanks for putting it online so quickly. I liked the idea of making the generation / verification loop go brrr, and one way to do this is to make verification not just a human task, but a machine task, where possible.

Yes, I am talking about formal verification, of course!

That also goes nicely together with "keeping the AI on a tight leash". It seems to clash though with "English is the new programming language". So the question is, can you hide the formal stuff under the hood, just like you can hide a calculator tool for arithmetic? Use informal English on the surface, while some of it is interpreted as a formal expression, put to work, and then reflected back in English? I think that is possible, if you have a formal language and logic that is flexible enough, and close enough to informal English.

Yes, I am talking about abstraction logic [1], of course :-)

So the goal would be to have English (German, ...) as the ONLY programming language, invisibly backed underneath by abstraction logic.

[1] http://abstractionlogic.com

12 days ago

[-]

> So the question is, can you hide the formal stuff under the hood, just like you can hide a calculator tool for arithmetic? Use informal English on the surface, while some of it is interpreted as a formal expression, put to work, and then reflected back in English?

The problem with trying to make "English -> formal language -> (anything else)" work is that informality is, by definition, not a formal specification and therefore subject to ambiguity. The inverse is not nearly as difficult to support.

Much like how a property in an API initially defined as being optional cannot be made mandatory without potentially breaking clients, whereas making a mandatory property optional can be backward compatible. IOW, the cardinality of "0 .. 1" is a strict superset of "1".

12 days ago

[-]

> The problem with trying to make "English -> formal language -> (anything else)" work is that informality is, by definition, not a formal specification and therefore subject to ambiguity. The inverse is not nearly as difficult to support.

Both directions are difficult and important. How do you determine when going from formal to informal that you got the right informal statement? If you can judge that, then you can also judge if a formal statement properly represents an informal one, or if there is a problem somewhere. If you detect a discrepancy, tell the user that their English is ambiguous and that they should be more specific.

amelius

11 days ago

[-]

LLMs are pretty good at writing small pieces of code, so I suppose they can very well be used to compose some formal logic statements.

12 days ago

[-]

> Use informal English on the surface, while some of it is interpreted as a formal expression, put to work, and then reflected back in English? I think that is possible, if you have a formal language and logic that is flexible enough, and close enough to informal English.

That sounds like a paradox.

Formal verification can prove that constraints are held. English cannot. mapping between them necessarily requires disambiguation. How would you construct such a disambiguation algorithm which must, by its nature, be deterministic?

11 days ago

[-]

Going from informal to formal can be done using autoformalization [1]. The real question is, how do you judge that the result is correct?

[1] Autoformalization with Large Language Models — https://papers.nips.cc/paper_files/paper/2022/hash/d0c6bc641...

andrepd

11 days ago

[-]

Not gonna lie, after skimming the website and a couple preprints for 10 minutes my crank detector is off the charts. Your very vague comments adds to it.

But maybe I just don't understand.

11 days ago

[-]

Yes, you just don't understand :-)

I am working on making it simpler to understand, and particularly, simpler to use.

PS: People keep browsing the older papers although they are really outdated. I've updated http://abstractionlogic.com to point to the newest information instead.

redbell

11 days ago

[-]

> "English is the new programming language."

For those who missed it, here's the viral tweet by Karpathy himself: https://x.com/karpathy/status/1617979122625712128

throwaway314155

11 days ago

[-]

Referenced in the video of course. Not that everyone should watch a 40 minute long video before commenting but his reaction to the "meme" that vibe coding became when his tweet was intended as more of a shower thought is worth checking out.

11 days ago

[-]

> became when his tweet was intended as more of a shower thought

That was so obvious to me, and most of the people I talked to at the time, yet the ecosystem and media seems to have run with his "vibe-coding" idea as something people should implement in production yesterday, even though it wasn't meant as a "mantra" or even "here is where we should go"...

singularity2001

12 days ago

[-]

lean 4/5 will be a rising star!

12 days ago

[-]

You would definitely think so, Lean is in a great position here!

I am betting though that type theory is not the right logic for this, and that Lean can be leapfrogged.

gylterud

12 days ago

[-]

I think type theory is exactly right for this! Being so similar to programming languages, it can piggy back on the huge amount of training the LLMs have on source code.

I am not sure lean in part is the right language, there might be challengers rising (or old incumbents like Agda or Roq can find a boost). But type theory definitely has the most robust formal systems at the moment.

12 days ago

[-]

> Being so similar to programming languages

I think it is more important to be close to English than to programming languages, because that is the critical part:

"As close to a programming language as necessary, as close to English as possible"

is the goal, in my opinion, without sacrificing constraints such as simplicity.

gylterud

12 days ago

[-]

Why? Why would the language used to express proof of correctness have anything to do with English?

English was not developed to facilitate exact and formal reasoning. In natural language ambiguity is a feature, in formal languages it is unwanted. Just look at maths. The reasons for all the symbols is not only brevity but also precision. (I dont think the symbolism of mathematics is something to strive for though, we can use sensible names in our languages, but the structure will need to be formal and specialised to the domain.)

I think there could be meaningful work done to render the statements of the results automatically into (a restricted subset of) English for ease of human verification that the results proven are actually the results one wanted. I know there has been work in this direction. This might be viable. But I think the actual language of expressing results and proofs would have to be specialised for precision. And there I think type theory has the upper hand.

12 days ago

[-]

My answer is already in my previous comment: if you have two formal languages to choose from, you want the one closer to natural language, because it will be easier to see if informal and formal statements match. Once you are in formal land, you can do transformations to other formal systems as you like, as these can be machine-verified. Does that make sense?

polivier

11 days ago

[-]

> if you have two formal languages to choose from, you want the one closer to natural language

Given the choice I'd rather use Python than COBOL even though COBOL is closer to English than Python.

11 days ago

[-]

Not really. You want the one more aligned to the domain. Think music notation. Languages have more evolved to match abstractions that help with software engineering principles than to help with layman understanding. (take SQL and the relational model, they have more relation with each other than the former with natural languages)

voidhorse

12 days ago

[-]

Why? By the completeness theorem, shouldn't first order logic already be sufficient?

The calculus of constructions and other approaches are already available and proven. I'm not sure why we'd need a special logic for LLMs unless said logic somehow accounts for their inherently stochastic tendencies.

tylerhou

11 days ago

[-]

Completeness for FOL specifically says that semantic implications (in the language of FOL) have syntactic proofs. There are many concepts that are inexpressible in FOL (for example, the class of all graphs which contain a cycle).

12 days ago

[-]

If first-order logic is already sufficient, why are most mature systems using a type theory? Because type theory is more ergonomic and practical than first-order logic. I just don't think that type theory is ergonomic and practical enough. That is not a special judgement with respect to LLMs, I want a better logic for myself as well. This has nothing to do with "stochastic tendencies". If it is easier to use for humans, it will be easier for LLMs as well.

kordlessagain

12 days ago

[-]

This thread perfectly captures what Karpathy was getting at. We're witnessing a fundamental shift where the interface to computing is changing from formal syntax to natural language. But you can see people struggling to let go of the formal foundations they've built their careers on.

uncircle

12 days ago

[-]

> This thread perfectly captures what Karpathy was getting at. We're witnessing a fundamental shift where the interface to computing is changing from formal syntax to natural language.

Yes, telling a subordinate with natural language what you need is called being a product manager. Problem is, the subordinate has encyclopedic knowledge but it's also extremely dumb in many aspects.

I guess this is good for people that got into CS and hate the craft so prefer doing management, but in many cases you still need in your team someone with a IQ higher than room temperature to deliver a product. The only "fundamental" shift here is killing the entry-level coder at the big corp tasked at doing menial and boilerplate tasks, when instead you can hire a mechanical replacement from an AI company for a few hundred dollars a month.

sponnath

11 days ago

[-]

I think the only places where the entry-level coder is being killed are corps that never cared about the junior to senior pipeline. Some of them love off-shoring too so I'm not sure much has changed.

kevinventullo

11 days ago

[-]

“Wait… junior engineers don’t have short-term positive ROI?”

“Never did.”

bobxmax

11 days ago

[-]

> Problem is, the subordinate has encyclopedic knowledge but it's also extremely dumb in many aspects.

Most PMs would say the same thin

norir

12 days ago

[-]

Have you thought through the downsides of letting go of these formal foundations that have nothing to do with job preservation? This comes across as a rather cynical interpretation of the motivations of those who have concerns.

otabdeveloper4

12 days ago

[-]

> We're witnessing a fundamental shift where the interface to computing is changing from formal syntax to natural language.

People have said this every year since the 1950's.

No, it is not happening. LLMs won't help.

Writing code is easy, it's understanding the problem domain is hard. LLMs won't help you understand the problem domain in a formal manner. (In fact they might make it even more difficult.)

andrepd

11 days ago

[-]

Exactly. It's the uncomfortable truths well xkcd.

simplify

11 days ago

[-]

Let's be real, people have said similar things about AI too. It was all fluff, until it wasn't.

otabdeveloper4

11 days ago

[-]

AI still doesn't have a valid and sustainable business use case.

People are just assuming that the hallucination and bullshitting issues will just go away with future magic releases, but they won't.

megaman821

11 days ago

[-]

Yep, that why I never write anything out using mathmatical expressions. Natural language only baby!

Eggpants

11 days ago

[-]

No. Karpathy has long embraced the Silly-con valley “Fake it until you make it” mind set. One of his slides even had a frame of Tesla self driving video that was later revealed to be faked.

It’s in his financial self interest to over inflate LLM’s beyond their “cool math bar trick” level. They are a lossy text compression technique with stolen text sources.

All this “magic” is just function calls behind the scenes doing web/database/math/etc for the LLM.

Anyone who claims LLMs have a soul either truly doesn’t understand how they work (association rules++) or has hitched their financial wagon to this grift. It’s the crypto coin bruhs looking for their next score.

12 days ago

[-]

Not really. There’s a problem to be solved, and the solution is always best exprimed in formal notation, because we can then let computers do it and not worry about it.

We already have natural languages for human systems and the only way it works is because of shared metaphors and punishment and rewards. Everyone is incentivized to do a good job.

mkleczek

12 days ago

[-]

This is why I call all this AI stuff BS.

Using a formal language is a feature, not a bug. It is a cornerstone of all human engineering and scientific activity and is the _reason_ why these disciplines are successful.

What you are describing (ie. ditching formal and using natural language) is moving humanity back towards magical thinking, shamanism and witchcraft.

[1] https://www.cs.utexas.edu/~EWD/transcriptions/EWD06xx/EWD667...

bwfan123

11 days ago

[-]

> Using a formal language is a feature, not a bug. It is a cornerstone of all human engineering and scientific activity and is the _reason_ why these disciplines are successful

A similar argument was also made by Dijkstra in this brief essay here [1] - which is timely to this debate of why "english is the new programming language" is not well-founded.

I quote a brief snippet here:

"The virtue of formal texts is that their manipulations, in order to be legitimate, need to satisfy only a few simple rules; they are, when you come to think of it, an amazingly effective tool for ruling out all sorts of nonsense that, when we use our native tongues, are almost impossible to avoid."

andrepd

11 days ago

[-]

_Amazing_ read. It's really remarkable how many nuggets of wisdom are contained in such a small text!

catoc

10 days ago

[-]

If only we could get our politicians to only express themselves using formal texts. The clarity it would bring… the honesty it would enforce… the efficiency they would achieve.

jason_oster

11 days ago

[-]

> What you are describing (ie. ditching formal and using natural language) is moving humanity back towards magical thinking ...

"Any sufficiently advanced technology is indistinguishable from magic."

discreteevent

11 days ago

[-]

indistinguishable from magic != magic

11 days ago

[-]

Exactly. Clearly LLMs are not magic, so why do people insist that using LLMs is the same as believing in magic?

12 days ago

[-]

> is the _reason_ why these disciplines

Would you say that ML isn't a successful discipline? ML is basically balancing between "formal language" (papers/algorithms) and "non-deterministic outcomes" (weights/inference) yet it seems useful in a wide range of applications, even if you don't think about LLMs at all.

> towards magical thinking, shamanism and witchcraft.

I kind of feel like if you want to make a point about how something is bullshit, you probably don't want to call it "magical thinking, shamanism and witchcraft" because no matter how good your point is, if you end up basically re-inventing the witch hunt, how is what you say not bullshit, just in the other way?

11 days ago

[-]

ML is basically greedy determinism. If we can’t get the correct answer, we try to get one that is most likely wrong, but give us enough information that we can make a decision. So the answer is not useful, but its nature is.

If we take object detection in computer vision, the detection by itself is not accurate, but it helps with resources management. instead of expensive continuous monitoring, we now have something cheaper which moves the expensive part to be discrete.

But something deterministic would be always more preferable because you only needs to do verification once.

12 days ago

[-]

> Would you say that ML isn't a successful discipline?

Not yet it isn't; all I am seeing are tools to replace programmers and artists :-/

Where are the tools to take in 400 recipes and spit out all of them in a formal structure (poster upthread literally gave up on trying to get an LLM to do this). Tools that can replace the 90% of office staff who aren't programmers?

Maybe it's a successful low-code industry right now, it's not really a successful AI industry.

12 days ago

[-]

> Not yet it isn't; all I am seeing are tools to replace programmers and artists :-/

You're missing a huge part of the ecosystem, ML is so much more than just "generative AI", which seems to be the extent of your experience so far.

Weather predictions, computer vision, speech recognition, medicine research and more are already improved by various machine learning techniques, and already was before the current LLM/generative AI. Wikipedia has a list of ~50 topics where ML is already being used, in production, today ( https://en.wikipedia.org/wiki/Machine_learning#Applications ) if you're feeling curious about exploring the ecosystem more.

12 days ago

[-]

> You're missing a huge part of the ecosystem, ML is so much more than just "generative AI", which seems to be the extent of your experience so far.

I'm not missing anything; I'm saying the current boom is being fueled by claims of "replacing workers", but the only class of AI being funded to do that are LLMs, and the only class of worker that might get replaced are programmers and artists.

Karpathy's video, and this thread, are not about the un-hyped ML stuff that has been employed in various disciplines since 2010 and has not been proposed as a replacement for workers.

mkleczek

12 days ago

[-]

> Would you say that ML isn't a successful discipline? ML is basically balancing between "formal language" (papers/algorithms) and "non-deterministic outcomes" (weights/inference) yet it seems useful in a wide range of applications

Usefulness of LLMs has yet to be proven. So far there is more marketing in it than actual, real world results. Especially comparing to civil and mechanical engineering, maths, electrical engineering and plethora of disciplines and methods that bring real world results.

12 days ago

[-]

> Usefulness of LLMs has yet to be proven.

What about ML (Machine Learning) as a whole? I kind of wrote ML instead of LLMs just to avoid this specific tangent. Are you feelings about that field the same?

mkleczek

12 days ago

[-]

> What about ML (Machine Learning) as a whole? I kind of wrote ML instead of LLMs just to avoid this specific tangent. Are you feelings about that field the same?

No - I only expressed my thoughts about using natural language for computing.

neuronic

12 days ago

[-]

It's called gatekeeping and the gatekeepers will be the ones left in the dust. This has been proven time and time again. Better learn to go with the flow - judging LLMs on linear improvements or even worse on today's performance is a fool's errand.

Even if improvements level off and start plateauing, things will still get better and for careful guided, educated use LLMs have already become a great accelerator in many ways. StackOverflow is basically dead now which in itself is a fundamental shift from just 3-4 years ago.

12 days ago

[-]

This was my favorite talk at AISUS because it was so full of concrete insights I hadn't heard before and (even better) practical points about what to build now, in the immediate future. (To mention just one example: the "autonomy slider".)

If it were up to me, which it very much is not, I would try to optimize the next AISUS for more of this. I felt like I was getting smarter as the talk went on.

kaycebasques

11 days ago

[-]

On one hand, I think Karpathy is a gifted educator in a way that's not repeatable as a science. On the other, if the conference leaders next year told every presenter to watch this talk and emulate how Karpathy focuses on concrete insights and suggests what to build now, then the overall quality of presentations would probably trend higher.

hgl

12 days ago

[-]

It’s fascinating to think about what true GUI for LLM could be like.

It immediately makes me think a LLM that can generate a customized GUI for the topic at hand where you can interact with in a non-linear way.

https://x.com/OriolVinyalsML/status/1935005985070084197

karpathy

12 days ago

[-]

Fun demo of an early idea was posted by Oriol just yesterday :)

spamfilter247

12 days ago

[-]

My takeaway from the demo is less that "it's different each time", but more a "it can be different for different users and their styles of operating" - a poweruser can now see a different Settings UI than a basic user, and it can be generated realtime based on the persona context of the user.

Example use case (chosen specifically for tech): An IDE UI that starts basic, and exposes functionality over time as the human developer's skills grow.

superfrank

12 days ago

[-]

On one hand, I'm incredibly impressed by the technology behind that demo. On the other hand, I can't think of many things that would piss me off more than a non-deterministic operating system.

I like my tools to be predictable. Google search trying to predict that I want the image or shopping tag based on my query already drives me crazy. If my entire operating system did that, I'm pretty sure I'd throw my computer out a window.

12 days ago

[-]

> incredibly impressed by the technology behind that demo

An LLM generating some HTML?

superfrank

12 days ago

[-]

At a speed that feels completely seamless to navigate through. Yeah, I'm pretty impressed by that.

11 days ago

[-]

Read the code that is actually being generated. It's only the content of the page, which itself is loaded progressively.

It takes 2 seconds to generate an extremely basic 300 characters page of content. Again, what is impressive here?

It's not fast, it gives the illusion of being fast.

superfrank

10 days ago

[-]

I know what it's doing and I'm impressed. If you understand what it's doing and aren't impressed, that's cool too. I think we just see things differently and I doubt either of us will convince the other one to change their mind on this

hackernewds

12 days ago

[-]

it's impressive but it seems like a crappier UX? that none of the patterns can really be memorized

asterisk_

11 days ago

[-]

I feel like one quickly hits a similar partial observability problem as with e.g. light sensors. How often do you wave around annoyed because the light turned off.

To get _truly_ self driving UIs you need to read the mind of your users. It's some heavy tailed distribution all the way down. Interesting research problem on its own.

We already have adaptive UIs (profiles in VSC anyone? Vim, Emacs?) they're mostly under-utilized because takes time to setup + most people are not better at designing their own workflow relative to the sane default.

aprilthird2021

12 days ago

[-]

This is crazy cool, even if not necessarily the best use case for this idea

throwaway314155

11 days ago

[-]

I would bet good money that many of the functions they chose not to drill down into (such as settings -> volume) do nothing at all or cause an error.

It's a fronted generator. It's fast. That's cool. But is being pitched as a functioning OS generator and I can't help but think it isn't given the failure rates for those sorts of tasks. Further, the success rates for HTML generation probably _are_ good enough for a Holmes-esque (perhaps too harsh) rugpull (again, too harsh) demo.

A cool glimpse into what the future might look like in any case.

superconduct123

11 days ago

[-]

That looks both cool and infuriating

suddenlybananas

12 days ago

[-]

Having different documents come up every time you go into the documents directory seems hellishly terrible.

falcor84

12 days ago

[-]

It's a brand of terribleness I've somewhat gotten used to, opening Google Drive every time, when it takes me to the "Suggested" tab. I can't recall a single time when it had the document I care about anywhere close to the top.

There's still nothing that beats the UX of Norton Commander.

12 days ago

[-]

[flagged]

https://news.ycombinator.com/newsguidelines.html

11 days ago

[-]

"Please don't fulminate."

"Don't be curmudgeonly. Thoughtful criticism is fine, but please don't be rigidly or generically negative."

"Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."

"Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize."

danielbln

12 days ago

[-]

Maybe we can collect all of this salt and operate a Thorium reactor with it, this in turn can then power AI.

12 days ago

[-]

We'll need to boil a few more lakes before we get to that stage I'm afraid, who needs water when you can have your AI hallucinate some for you after all?

12 days ago

[-]

Who needs water when all these hot takes come from sources so dense, they're about to collapse into black holes.

12 days ago

[-]

Is me not wanting the UI of my OS to shift with every mouse click a hot take? If me wanting to have the consistent "When I click here, X happens" behavior instead of the "I click here and I'm Feeling Lucky happens" behavior is equal to me being dense, so be it I guess.

12 days ago

[-]

No. But you interpreting and evaluating the demo in question as suggesting the things you described - frankly, yes. It takes a deep gravity well to miss a point this clear from this close.

It's a tech demo. It shows you it's possible to do these things live, in real time (and to back Karpathy's point about tech spread patterns, it's accessible to you and me right now). It's not saying it's a good idea - but there are obvious seeds of good ideas there. For one, it shows you a vision of an OS or software you can trivially extend yourself on the fly. "I wish it did X", bam, it does. And no one says it has to be non-deterministic each time you press some button. It can just fill what's missing and make additions permanent, fully deterministic after creation.

cjcenizal

12 days ago

[-]

My friend Eric Pelz started a company called Malleable to do this very thing: https://www.linkedin.com/posts/epelz_every-piece-of-software...

whatarethembits

11 days ago

[-]

I'm curious where this ends up going.

Personally I think its a mistake; at least at "team" level. One of the most valuable things about a software or framework dictating how things are done is to give a group of people a common language to communicate with and enforce rules. This is why we generally prefer to use a well documented framework, rather than letting a "rockstar engineer" roll their own. Only they will understand its edge cases and ways of thinking, everyone else will pay a price to adapt to that, dragging everyone's productivity down.

Secondly, most people don't know what they want or how they want to work with a specific piece of software. Its simply not important enough, in the hierarchy of other things they care about, to form opinions about how a specific piece of software ought to work. What they want, is the easiest and fastest way to get something done and move on. It takes insight, research and testing to figure out what that is in a specific domain. This is what "product people" are supposed to figure out; not farm it out to individual users.

swader999

11 days ago

[-]

You bake those rules into the folders in a Claude.md file and it becomes it's guide when building or changing anything. Ubiquitous language and all that Jazz.

jonny_eh

12 days ago

[-]

An ever-shifting UI sounds unlearnable, and therefore unusable.

12 days ago

[-]

It wouldn't be unlearnable if it fits the way the user is already thinking.

12 days ago

[-]

AI is not mind reading.

12 days ago

[-]

Behavioral patterns are not unpredictable. Who knows how far an LLM could get by pattern-matching what a user is doing and generating a UI to make it easier. Since the user could immediately say whether they liked it or not, this could turn into a rapid and creative feedback loop.

kevinventullo

11 days ago

[-]

So, if the user likes UI’s that don’t change, the LLM will figure out that it should do nothing?

One problem LLM’s don’t fix is the misalignment between app developers’ incentives and users’ incentives. Since the developer controls the LLM, I imagine that a “smart” shifting UI would quickly devolve into automated dark patterns.

10 days ago

[-]

A user who doesn't want such changes shouldn't be subjected to them in the first place, so there should be nothing for an LLM to figure out.

I'm with you on disliking dark patterns but it seems to me a separate issue.

NitpickLawyer

12 days ago

[-]

A sufficiently advanced prediction engine is indistinguishable from mind reading :D

OtherShrezzing

12 days ago

[-]

A mixed ever-shifting UI can be excellent though. So you've got some tools which consistently interact with UI components, but the UI itself is altered frequently.

Take for example world-building video games like Cities Skylines / Sim City or procedural sandboxes like Minecraft. There are 20-30 consistent buttons (tools) in the game's UX, while the rest of the game is an unbounded ever-shifting UI.

12 days ago

[-]

The rest of the game is very deterministic where its state is controlled by the buttons. The slight variation is caused by the simulation engine and follows consistent patterns (you can’t have building on fire if there’s no building yet).

9rx

12 days ago

[-]

Tools like v0 are a primitive example of what the above is talking about. The UI maintains familiar conventions, but is laid out dynamically based on surrounding context. I'm sure there are still weird edge cases, but for the most part people have no trouble figuring out how to use the output of such tools already.

sotix

12 days ago

[-]

Like Spotify ugh

dpkirchner

12 days ago

[-]

Like a HyperCard application?

necrodome

12 days ago

[-]

We (https://vibes.diy/) are betting on this

12 days ago

[-]

Border-line off-topic, but since you're flagrantly self-promoting, might as well add some more rule breakage to it.

You know websites/apps who let you enter text/details and then not displaying sign in/up screen until you submit it, so you feel like "Oh but I already filled it out, might as well sign up"?

They really suck, big time! It's disingenuous, misleading and wastes people's time. I had no interest in using your thing for real, but thought I'd try it out, potentially leave some feedback, but this bait-and-switch just made the whole thing feel sour and I'll probably try to actively avoid this and anything else I feel is related to it.

necrodome

12 days ago

[-]

Thanks for the benefit of the doubt. I typed that in a hurry, and it didn’t come out the way I intended.

We had the idea that there’s a class of apps [1] that could really benefit from our tooling - mainly Fireproof, our local-first database, along with embedded LLM calling and image generation support. The app itself is open source, and the hosted version is free.

Initially, there was no login or signup - you could just generate an app right away. We knew that came with risks, but we wanted to explore what a truly frictionless experience could look like. Unfortunately, it didn’t take long for our LLM keys to start getting scraped, so the next best step was to implement rate limiting in the hosted version.

[1] https://tools.simonwillison.net/

12 days ago

[-]

My complaint isn't about that you need to protect it with a login/signup, but where in the process you put that login/signup.

Put it before letting people enter text, rather than once they've entered text and pressed the button, and people won't feel mislead anymore.

jchrisa

11 days ago

[-]

The generation is running while you login, so this appreciable decreases wait time from idea to app, because by the time you click through the login, your app is ready. (Vibes DIY CEO here.)

If login takes 30 seconds, and app gen 90, we think this is better for users (but clearly not everyone agrees.) Thanks for the feedback!

stoisesky

12 days ago

[-]

This talk https://www.youtube.com/watch?v=MbWgRuM-7X8 explores the idea of generative / malleable personal user interfaces where LLMs can serve as the gateway to program how we want our UI to be rendered.

nbbaier

12 days ago

[-]

I love this concept and would love to know where to look for people working on this type of thing!

stuartmemo

12 days ago

[-]

It's probably Jira. https://medium.com/question-park/all-aboard-the-ai-train-b03...

semi-extrinsic

12 days ago

[-]

Humans are shit at interacting with systems in a non-linear way. Just look at Jupyter notebooks and the absolute mess that arises when you execute code blocks in arbitrary order.

bicepjai

9 days ago

[-]

What is the mess you are referring with regards to Jupyter notebooks ?

semi-extrinsic

9 days ago

[-]

If you run cells out of order, you get weird results. Thus you have efforts like marimo which replace jupyter with something that reruns all dependent cells.

nilirl

12 days ago

[-]

Where do these analogies break down?

1. Similar cost structure to electricity, but non-essential utility (currently)?

2. Like an operating system, but with non-determinism?

3. Like programming, but ...?

Where does the programming analogy break down?

PeterStuer

12 days ago

[-]

Define non-essenti

The way I see dependency in office ("knowledge") work:

- pre-(computing) history. We are at the office, we work

- dawn of the pc: my computer is down, work halts

- dawn of the lan: the network is down, work halts

- dawn of the Internet: the Internet connection is down, work halts (<- we are basically all here)

- dawn of the LLM: ChatGPT is down, work halts (<- for many, we are here already)

nilirl

12 days ago

[-]

I see your point. It's nearing essential.

rudedogg

12 days ago

[-]

> programming

The programming analogy is convenient but off. The joke has always been “the computer only does exactly what you tell it to do!” regarding logic bugs. Prompts and LLMs most certainly do not work like that.

I loved the parallels with modern LLMs and time sharing he presented though.

12 days ago

[-]

> Prompts and LLMs most certainly do not work like that.

It quite literally works like that. The computer is now OS + user-land + LLM runner + ML architecture + weights + system prompt + user prompt.

Taken together, and since you're adding in probabilities (by using ML/LLMs), you're quite literally getting "the computer only does exactly what you tell it to do!", it's just that we have added "but make slight variations to what tokens you select next" (temperature>0.0) sometimes, but it's still the same thing.

Just like when you tell the computer to create encrypted content by using some seed. You're getting exactly what you asked for.

politelemon

12 days ago

[-]

only in English, and also non-deterministic.

malux85

12 days ago

[-]

Yeah, wherever possible I try to have the llm answer me in Python rather than English (especially when explaining new concepts)

English is soooooo ambiguous

falcor84

12 days ago

[-]

For what it's worth, I've been using it to help me learn math, and I added to my rules an instruction that it should always give me an example in Python (preferably sympy) whenever possible.

[1] https://medium.com/@drewwww/the-gambler-and-the-genie-08491d...

mikewarot

12 days ago

[-]

A few days ago, I was introduced to the idea that when you're vibe coding, you're consulting a "genie", much like in the fables, you almost never get what you asked for, but if your wishes are small, you might just get what you want.

The primagen reviewed this article[1] a few days ago, and (I think) that's where I heard about it. (Can't re-watch it now, it's members only) 8(

anythingworks

12 days ago

[-]

that's a really good analogy! It feels like wicked joke that llms behave in such a way that they're both intelligent and stupid at the same time

fudged71

12 days ago

[-]

“You are an expert 10x software developer. Make me a billion dollar app.” Yeah this checks out

12 days ago

[-]

Tight feedback loops are the key in working productively with software. I see that in codebases up to 700k lines of code (legacy 30yo 4GL ERP systems).

The best part is that AI-driven systems are fine with running even more tight loops than what a sane human would tolerate.

Eg. running full linting, testing and E2E/simulation suite after any minor change. Or generating 4 versions of PR for the same task so that the human could just pick the best one.

bandoti

12 days ago

[-]

Here’s a few problems I foresee:

1. People get lazy when presented with four choices they had no hand in creating, and they don’t look over the four and just click one, ignoring the others. Why? Because they have ten more of these on the go at once, diminishing their overall focus.

2. Automated tests, end-to-end sim., linting, etc—tools already exist and work at scale. They should be robust and THOROUGHLY reviewed by both AI and humans ideally.

3. AI is good for code reviews and “another set of eyes” but man it makes serious mistakes sometimes.

An anecdote for (1), when ChatGPT tries to A/B test me with two answers, it’s incredibly burdensome for me to read twice virtually the same thing with minimal differences.

Code reviewing four things that do almost the same thing is more of a burden than writing the same thing once myself.

12 days ago

[-]

A simple rule applies: "No matter what tool created the code, you are still responsible for what you merge into main".

As such, task of verification, still falls on hands of engineers.

Given that and proper processes, modern tooling works nicely with codebases ranging from 10k LOC (mixed embedded device code with golang backends and python DS/ML) to 700k LOC (legacy enterprise applications from the mainframe era)

xpe

12 days ago

[-]

> A simple rule applies: "No matter what tool created the code, you are still responsible for what you merge into main".

Beware of claims of simple rules.

Take one subset of the problem: code reviews in an organizational environment. How well does they simple rule above work?

The idea of “Person P will take responsibility” is far from clear and often not a good solution. (1) P is fallible. (2) Some consequences are too great to allow one person to trigger them, which is why we have systems and checks. (3) P cannot necessarily right the wrong. (4) No-fault analyses are often better when it comes to long-term solutions which require a fear free culture to reduce cover-ups.

But this is bigger than one organization. The effects of software quickly escape organizational boundaries. So when we think about giving more power to AI tooling, we have to be really smart. This means understanding human nature, decision theory, political economy [1], societal norms, and law. And building smart systems (technical and organizational)

Recommending good strategies for making AI generated code safe is hard problem. I’d bet it is a much harder than even “elite” software developers people have contemplated, much less implemented. Training in software helps but is insufficient. I personally have some optimism for formal methods, defense in depth, and carefully implemented human-in-the-loop systems.

[1] Political economy uses many of the tools of economics to study the incentives of human decision making

ponector

12 days ago

[-]

> As such, task of verification, still falls on hands of engineers.

Even before LLM it was a common thing to merge changes which completely brake test environment. Some people really skip verification phase of their work.

bandoti

12 days ago

[-]

Agreed. I think engineers though following simple Test-Driven Development procedures can write the code, unit tests, integration tests, debug, etc for a small enough unit by default forces tight feedback loops. AI may assist in the particulars, not run the show.

I’m willing to bet, short of droid-speak or some AI output we can’t even understand, that when considering “the system as a whole”, that even with short-term gains in speed, the longevity of any product will be better with real people following current best-practices, and perhaps a modest sprinkle of AI.

Why? Because AI is trained on the results of human endeavors and can only work within that framework.

12 days ago

[-]

Agreed. AI is just a tool. Letting in run the show is essentially what the vibe-coding is. It is a fun activity for prototyping, but tends to accumulate problems and tech debt at an astonishing pace.

Code, manually crafted by professionals, will almost always beat AI-driven code in quality. Yet, one has still to find such professionals and wait for them to get the job done.

I think, the right balance is somewhere in between - let tools handle the mundane parts (e.g. mechanically rewriting that legacy Progress ABL/4GL code to Kotlin), while human engineers will have fun with high-level tasks and shaping the direction of the project.

eddd-ddde

12 days ago

[-]

With lazy people the same applies for everything, code they do write, or code they review from peers. The issue is not the tooling, but the hands.

chamomeal

12 days ago

[-]

I am not a lazy worker but I guarantee you I will not thoroughly read through and review four PRs for the same thing

freehorse

12 days ago

[-]

The more tedious the work is, the less motivation and passion you get for doing it, and the more "lazy" you become.

Laziness does not just come from within, there are situations that promote behaving lazy, and others that don't. Some people are just lazy most of the time, but most people are "lazy" in some scenarios and not in others.

https://blog.stackademic.com/my-new-hobby-watching-copilot-s...

bandoti

12 days ago

[-]

Seurat created beautiful works of art composed of thousands of tiny dots, painted by hand; one might find it meditational with the right mindset.

Some might also find laziness itself dreadfully boring—like all the Microsoft employees code-reviewing AI-Generated pull requests!

OvbiousError

12 days ago

[-]

I don't think the human is the problem here, but the time it takes to run the full testing suite.

tlb

12 days ago

[-]

Yes, and (some near-future) AI is also more patient and better at multitasking than a reasonable human. It can make a change, submit for full fuzzing, and if there's a problem it can continue with the saved context it had when making the change. It can work on 100s of such changes in parallel, while a human trying to do this would mix up the reasons for the change with all the other changes they'd done by the time the fuzzing result came back.

LLMs are worse at many things than human programmers, so you have to try to compensate by leveraging the things they're better at. Don't give up with "they're bad at such and such" until you've tried using their strengths.

HappMacDonald

12 days ago

[-]

You can't run N bots in parallel with testing between each attempt unless you're also running N tests in parallel.

If you could run N tests in parallel, then you could probably also run the components of one test in parallel and keep it from taking 2 hours in the first place.

To me this all sounds like snake oil to convince people to do something they were already doing, but by also spinning up N times as many compute instances and run a burn endless tokens along the way. And by the time it's demonstrated that it doesn't really offer anything more than doing it yourself, well you've already given them all of your money so their job is done.

12 days ago

[-]

Running tests is already an engineering problem.

In one of the systems (supply chain SaaS) we invested so much effort in having good tests in a simulated environment, that we could run full-stack tests at kHz. Roughly ~5k tests per second or so on a laptop.

12 days ago

[-]

Humans tend to lack inhumane patience.

12 days ago

[-]

It is kind of a human problem too, although that the full testing suite takes X hours to run is also not fun, but it makes the human problem larger.

Say you're Human A, working on a feature. Running the full testing suite takes 2 hours from start to finish. Every change you do to existing code needs to be confirmed to not break existing stuff with the full testing suite, so some changes it takes 2 hours before you have 100% understanding that it doesn't break other things. How quickly do you lose interest, and at what point do you give up to either improve the testing suite, or just skip that feature/implement it some other way?

Now say you're Robot A working on the same task. The robot doesn't care if each change takes 2 hours to appear on their screen, the context is exactly the same, and they're still "a helpful assistant" 48 hours later when they still try to get the feature put together without breaking anything.

If you're feeling brave, you start Robot B and C at the same time.

12 days ago

[-]

This is the workflow that ChatGPT Codex demonstrates nicely. Launch any number of «robotic» tasks in parallel, then go on your own. Come back later to review the results and pick good ones.

12 days ago

[-]

Well, they're demonstrating it somewhat, it's more of a prototype today. First tell is the low limit, I think the longest task for me been 15 minutes before it gives up. Second tell is still using a chat UI which is simple to implement, easy to implement and familiar, but also kind of lazy. There should be a better UX, especially with the new variations they just added. From the top of my head, some graph-like UX might have been better.

12 days ago

[-]

I guess, it depends on the case and the approach.

It works really nice with the following approach (distilled from experiences reported by multiple companies)

(1) Augment codebase with explanatory texts that describe individual modules, interfaces and interactions (something that is needed for the humans anyway)

(2) Provide Agent.MD that describes the approach/style/process that the AI agent must take. It should also describe how to run all tests.

(3) Break down the task into smaller features. For each feature - ask first to write a detailed implementation plan (because it is easier to review the plan than 1000 lines of changes. spread across a dozen files)

(4) Review the plan and ask to improve it, if needed. When ready - ask to draft an actual pull request

(5) The system will automatically use all available tests/linting/rules before writing the final PR. Verify and provide feedback, if some polish is needed.

(6) Launch multiple instances of "write me an implementation plan" and "Implement this plan" task, to pick the one that looks the best.

This is very similar to git-driven development of large codebases by distributed teams.

Edit: added newlines

12 days ago

[-]

> distilled from experiences reported by multiple companies

Distilled from my experience, I'd still say that the UX is lacking, as sequential chat just isn't the right format. I agree with Karpathy that we haven't found the right way of interacting with these OSes yet.

Even with what you say, variations were implemented in a rush. Once you've iterated with one variation you can not at the same time iterate on another variant, for example.

10 days ago

[-]

Yes. I believe, the experience will get better. Plus more AI vendors will catch up with OpenAI and offer similar experiences in their products.

It will just take a few months.

12 days ago

[-]

Worked in such a codebase for about 5 years.

No one really cares about improving test times. Everyone either suffers in private or gets convinced it's all normal and look at you weird when you suggest something needs to be done.

12 days ago

[-]

There a few of us around, but it's not a lot, agree. It really is an uphill battle trying to get development teams to design and implement test suites the same way they do with other "more important" code.

londons_explore

12 days ago

[-]

The full test suite is probably tens of thousands of tests.

But AI will do a pretty decent job of telling you which tests are most likely to fail on a given PR. Just run those ones, then commit. Cuts your test time from hours down to seconds.

Then run the full test suite only periodically and automatically bisect to find out the cause of any regressions.

Dramatically cuts the compute costs of tests too, which in big codebase can easily become whole-engineers worth of costs.

tele_ski

12 days ago

[-]

It's an interesting idea, but reactive, and could cause big delays due to bisecting and testing on those regressions. There's the 'old' saying that the sooner the bug is found the cheaper it is to fix, seems weird to intentionally push finding side effect bugs later in the process because faster CI runs. Maybe AI will get there but it seems too aggressive right now to me. But yeah, put the automation slider where you're comfortable.

Byamarro

12 days ago

[-]

I work in web dev, so people sometimes hook code formatting as a git commit hook or sometimes even upon file save. The tests are problematic tho. If you work at huge project it's a no go idea at all. If you work at medium then the tests are long enough to block you, but short enough for you not to be able to focus on anything else in the meantime.

9rx

12 days ago

[-]

Unless you are doing something crazy like letting the fuzzer run on every change (cache that shit), the full test suite taking a long time suggests that either your isolation points are way too large or you are letting the LLM cross isolated boundaries and "full testing suite" here actually means "multiple full testing suites". The latter is an easy fix: Don't let it. Force it stay within a single isolation zone just like you'd expect of a human. The former is a lot harder to fix, but I suppose ending up there is a strong indicator that you can't trust the human picking the best LLM result in the first place and that maybe this whole thing isn't a good idea for the people in your organization.

yahoozoo

12 days ago

[-]

The problem is that every time you run your full automation with linting and tests, you’re filling up the context window more and more. I don’t know how people using Claude do it with its <300k context window. I get the “your message will exceed the length of this chat” message so many times.

12 days ago

[-]

I don't know exactly how Claude works, but the way I work around this with my own stuff is prompting it to not display full outputs ever, and instead temporary redirect the output somewhere then grep from the log-file what it's looking for. So a test run outputting 10K lines of test output and one failure is easily found without polluting the context with 10K lines.

12 days ago

[-]

Claude's approach is currently a bit dated.

Cursor.sh agents or especially OpenAI Codex illustrate that a tool doesn't need to keep on stuffing context window with irrelevant information in order to make progress on a task.

And if really needed, engineers report that Gemini Pro 2.5 keeps on working fine within 200k-500k token context. Above that - it is better to reset the context.

the_mitsuhiko

12 days ago

[-]

I started to use sub agents for that. That does not pollute the context as much

elif

12 days ago

[-]

In my experience with Jules and (worse) Codex, juggling multiple pull requests at once is not advised.

Even if you tell the git-aware Jules to handle a merge conflict within the context window the patch was generated, it is like sorry bro I have no idea what's wrong can you send me a diff with the conflict?

I find i have to be in the iteration loop at every stage or else the agent will forget what it's doing or why rapidly. for instance don't trust Jules to run your full test suite after every change without handholding and asking for specific run results every time.

It feels like to an LLM, gaslighting you with code that nominally addresses the core of what you just asked while completely breaking unrelated code or disregarding previously discussed parameters is an unmitigated success.

layer8

12 days ago

[-]

> Tight feedback loops are the key in working productively with software. […] even more tight loops than what a sane human would tolerate.

Why would a sane human be averse to things happening instantaneously?

latexr

12 days ago

[-]

> Or generating 4 versions of PR for the same task so that the human could just pick the best one.

That sounds awful. A truly terrible and demotivating way to work and produce anything of real quality. Why are we doing this to ourselves and embracing it?

A few years ago, it would have been seen as a joke to say “the future of software development will be to have a million monkey interns banging on one million keyboards and submit a million PRs, then choose one”. Today, it’s lauded as a brilliant business and cost-saving idea.

We’re beyond doomed. The first major catastrophe caused by sloppy AI code can’t come soon enough. The sooner it happens, the better chance we have to self-correct.

chamomeal

12 days ago

[-]

I say this all the time!

Does anybody really want to be an assembly line QA reviewer for an automated code factory? Sounds like shit.

Also I can’t really imagine that in the first place. At my current job, each task is like 95% understanding all the little bits, and then 5% writing the code. If you’re reviewing PRs from a bot all day, you’ll still need to understand all the bits before you accept it. So how much time is that really gonna save?

12 days ago

[-]

> Does anybody really want to be an assembly line QA reviewer for an automated code factory? Sounds like shit.

On the other hand, does anyone really wanna be a code-monkey implementing CRUD applications over and over by following product specifications by "product managers" that barely seem to understand the product they're "managing"?

See, we can make bad faith arguments both ways, but what's the point?

consumer451

11 days ago

[-]

I hesitate to divide a group as diverse as software devs into two categories, but here I go:

I have a feeling that devs who love LLM coding tools are more product-driven than those who hate them.

Put another way, maybe devs with their own product ideas love LLM coding tools, whilr devs without them do not.

I am genuinely not trying to throw shade here in any way. Does this rough division ring true to anyone else? Is there any better way to put it?

chamomeal

22 hours ago

[-]

No I think that’s accurate! But maybe instead of “devs who think about product stuff vs devs who don’t”, it depends on what hat you’re wearing.

When I’m working on something that I just want it to work, I love using LLMs. Shell functions for me to stuff into my config and use without ever understanding, UI for side projects that I don’t particularly care about, boilerplate nestjs config crap. Anything where all I care about is the result, not the process or the extensibility of the code: I love LLMs for that stuff.

When it’s something that I’m going to continue working on for a while, or the whole point is the extensibility/cleanliness of the program, I don’t like to use LLMs nearly as much.

I think it might be because most codebases are built with two purposes: 1) to be used as a product 2) to be extended and turned into something else

LLMs are super good at the first purpose, but not so good at the second.

I heard an interesting interview on the playdate dev podcast by the guy who made Obra Dinn. He said something along the lines of “making a game is awesome because the code can be horrible. All that matters is that the game works and is fun, and then you are done. It can just be finished, and then the code quality doesn’t matter anymore.”

So maybe LLMs are just really good for when you need something specific to work, and the internals don’t matter too much. Which are more the values of a product manager than a developer.

So it makes sense that when you are thinking more product-oriented, LLMs are more appealing!

nevertoolate

12 days ago

[-]

Issue is if product people will do the “coding” and you have to fix it is miserable

12 days ago

[-]

Even worse would be if we asked the accountants to do the coding, then you'll learn what miserable means.

What was the point again?

nevertoolate

12 days ago

[-]

Yes

ponector

12 days ago

[-]

>That sounds awful.

Not for the cloud provider. AWS bill to the moon!

osigurdson

12 days ago

[-]

I'm not sure that AI code has to be sloppy. I've had some success with hand coding some examples and then asking codex to rigorously adhere to prior conventions. This can end up with very self consistent code.

Agree though on the "pick the best PR" workflow. This is pure model training work and you should be compensated for it.

elif

12 days ago

[-]

Yep this is what Andrej talks about around 20 minutes into this talk.

You have to be extremely verbose in describing all of your requirements. There is seemingly no such thing as too much detail. The second you start being vague, even if it WOULD be clear to a person with common sense, the LLM views that vagueness as a potential aspect of it's own creative liberty.

pja

12 days ago

[-]

> You have to be extremely verbose in describing all of your requirements. There is seemingly no such thing as too much detail.

Sounds like ... programming.

Program specification is programming, ultimately. For any given problem if you’re lucky the specification is concise & uniquely defines the required program. If you’re unlucky the spec ends up longer than the code you’d write to implement it, because the language you’re writing it in is less suited to the problem domain than the actual code.

longhaul

11 days ago

[-]

Agree, I used to say that documenting a program precisely and comprehensively ends up being code. We either need a DSL that can specify at a higher level or use domain specific LLMs.

jebarker

12 days ago

[-]

> the LLM views that vagueness as a potential aspect of it's own creative liberty.

I think that anthropomorphism actually clouds what’s going on here. There’s no creative choice inside an LLM. More description in the prompt just means more constraints on the latent space. You still have no certainty whether the LLM models the particular part of the world you’re constraining it to in the way you hope it does though.

joshuahedlund

12 days ago

[-]

> You have to be extremely verbose in describing all of your requirements. There is seemingly no such thing as too much detail

I understand YMMV, but I have yet to find a use case where this takes me less time than writing the code myself.

throw234234234

11 days ago

[-]

I've found myself personally thinking English is OK when I'm happy with a "lossy expansion" and don't need every single detail defined (i.e. the tedious boilerplate, or templating kind of code). After all to me an LLM can be seen as a lossy compression of actual detailed examples of working code - why not "uncompress it" and let it assume the gaps. As an example I want a UI to render some data but I'm not as fussed about the details of it, I don't want to specify exact co-ordinates of each button, etc

However when I want detailed changes I find it more troublesome at present than just typing in the code myself. i.e. I know exactly what I want and I can express it just as easily (sometimes easier) in code.

I find AI in some ways a generic DSL personally. The more I have to define, the more specific I have to be the more I start to evaluate code or DSL's as potentially more appropriate tools especially when the details DO matter for quality/acceptance.

9rx

12 days ago

[-]

> You have to be extremely verbose in describing all of your requirements. There is seemingly no such thing as too much detail.

If only there was a language one could use that enables describing all of your requirements in a unambiguous manner, ensuring that you have provided all the necessary detail.

Oh wait.

SirMaster

12 days ago

[-]

I'm really waiting for AI to get on par with the common sense of most humans in their respective fields.

12 days ago

[-]

I think you'll be waiting for a very long time. Right now we have programmable LLMs, so if you're not getting the results, you need to reprogram it to give the results you want.

bonoboTP

12 days ago

[-]

If it's monkeylike quality and you need a million tries, it's shit. It you need four tries and one of those is top-tier professional programmer quality, then it's good.

agos

12 days ago

[-]

if the thing producing the four PRs can't distinguish the top tier one, I have strong doubts that it can even produce it

12 days ago

[-]

Making 4 PRs for a well-known solution sounds insane, yes, but to be the devil's advocate, you could plausibly be working with an ambiguous task: "Create 4 PRs with 4 different dependency libraries, so that I can compare their implementations." Technically it wouldn't need to pick the best one.

I have apprehension about the future of software engineering, but comparison does technically seem like a valid use case.

layer8

12 days ago

[-]

The problem is, for any change, you have to understand the existing code base to assess the quality of the change in the four tries. This means, you aren’t relieved from being familiar with the code and reviewing everything. For many developers this review-only work style isn’t an exciting prospect.

And it will remain that way until you can delegate development tasks to AI with a 99+% success rate so that you don’t have to review their output and understand the code base anymore. At which point developers will become truly obsolete.

12 days ago

[-]

Top-tier professional programmer quality is exceedingly, impractically optimistic, for a few reasons.

1. There's a low probability of that in the first place.

2. You need to be a top-tier professional programmer to recognize that type of quality (i.e. a junior engineer could select one of the 3 shit PRs)

3. When it doesn't produce TTPPQ, you wasted tons of time prompting and reviewing shit code and still need to deliver, net negative.

I'm not doubting the utility of LLMs but the scattershot approach just feels like gambling to me.

12 days ago

[-]

Also as a consequence of (1) the LLMs are trained on mediocre code mostly, so they often output mediocre or bad solutions.

12 days ago

[-]

> A truly terrible and demotivating way to work and produce anything of real quality

You clearly have strong feelings about it, which is fine, but it would be much more interesting to know exactly why it would terrible and demotivating, and why it cannot produce anything of quality? And what is "real quality" and does that mean "fake quality" exists?

> million monkey interns banging on one million keyboards and submit a million PRs

I'm not sure if you misunderstand LLMs, or the famous "monkeys writing Shakespeare" part, but that example is more about randomness and infinity than about probabilistic machines somewhat working towards a goal with some non-determinism.

> We’re beyond doomed

The good news is that we've been doomed for a long time, yet we persist. If you take a look at how the internet is basically held up by duct-tape at this point, I think you'd feel slightly more comfortable with how crap absolutely everything is. Like 1% of software is actually Good Software while the rest barely works on a good day.

12 days ago

[-]

If "AI" worked (which fortunately isn't the case), humans would be degraded to passive consumers in the last domain in which they were active creators: thinking.

Moreover, you would have to pay centralized corporations that stole all of humanity's intellectual output for engaging in your profession. That is terrifying.

The current reality is also terrifying: Mediocre developers are enabled to have a 10x volume (not quality). Mediocre execs like that and force everyone to use the "AI" snakeoil. The profession becomes even more bureaucratic, tool oriented and soulless.

People without a soul may not mind.

12 days ago

[-]

> If "AI" worked (which fortunately isn't the case), humans would be degraded to passive consumers in the last domain in which they were active creators: thinking.

"AI" (depending on what you understand that to be) is already "working" for many, including myself. I've basically stopped using Google because of it.

> humans would be degraded to passive consumers in the last domain in which they were active creators: thinking

Why? I still think (I think at least), why would I stop thinking just because I have yet another tool in my toolbox?

> you would have to pay centralized corporations that stole all of humanity's intellectual output for engaging in your profession

Assuming we'll forever be stuck in the "mainframe" phase, then yeah. I agree that local models aren't really close to SOTA yet, but the ones you can run locally can already be useful in a couple of focused use cases, and judging by the speed of improvements, we won't always be stuck in this mainframe-phase.

> Mediocre developers are enabled to have a 10x volume (not quality).

In my experience, which admittedly been mostly in startups and smaller companies, this has always been the case. Most developers seem to like to produce MORE code over BETTER code, I'm not sure why that is, but I don't think LLMs will change people's mind about this, in either direction. Shitty developers will be shit, with or without LLMs.

11 days ago

[-]

The AI as it is currently, will not come up with that new app idea or that clever innovative way of implementing an application. It will endlessly rehash the training data it has ingested. Sure, you can tell an AI to spit out a CRUD, and maybe it will even eventually work in some sane way, but that's not innovative and not necessarily a good software. It is blindly copying existing approaches to implement something. That something is then maybe even working, but lacks any special sauce to make it special.

Example: I am currently building a web app. My goal is to keep it entirely static, traditional template rendering, just using the web as a GUI framework. If I had just told the AI to build this, it would have thrown tons of JS at the problem, because that is what the mainstream does these days, and what it mostly saw as training data. Then my back button would most likely no longer work, I would not be able to use bookmarks properly, it would not automatically have an API as powerful as the web UI, usable from any script, and the whole thing would have gone to shit.

If the AI tools were as good as I am at what I am doing, and I relied upon that, then I would not have spent time trying to think of the principles of my app, as I did when coming up with it myself. As it is now, the AI would not even have managed to prevent duplicate results from showing up in the UI, because I had a GPT4 session about how to prevent that, and none of the suggested AI answers worked and in the end I did what I thought I might have to do when I first discovered the issue.

11 days ago

[-]

> The AI as it is currently, will not come up with that new app idea or that clever innovative way of implementing an application

Who has claimed that they can do that sort of stuff? I don't think my comment hints at that, nor does the talk in the submission.

You're absolutely right with most of your comment, and seem to just be rehashing what Karpathy talks about but with different words. Of course it won't create good software unless you specify exactly what "good software" is for you, and tell it that. Of course it won't know you want "traditional static template rendering" unless you tell it to. Of course it won't create a API you can use from anywhere unless you say so. Of course it'll follow what's in the training data. Of course things won't automatically implement whatever you imagine your project should have, unless you tell it about those features.

I'm not sure if you're just expanding on the talk but chose my previous comment to attach it to, or if you're replying to something I said in my comment.

3dsnano

12 days ago

[-]

> And what is "real quality" and does that mean "fake quality" exists?

I think there is no real quality or fake quality, just quality. I am referencing the quality that Persig and C. Alexander have written about.

It’s… qualitative, so it’s hard to measure but easy to feel. Humans are really good at perceiving it then making objective decisions. LLMs don’t know what it is (they’ve heard about it and think they know).

12 days ago

[-]

> LLMs don’t know what it is

Of course they don't, they're probability/prediction machines, they don't "know" anything, not even that Paris is the capital of France. What they do "know" is that once someone writes "The capital of France is", the most likely tokens to come after that, is "Paris". But they don't understand the concept, nor anything else, just that probably 54123 comes after 6723 (or whatever the tokens are).

Once you understand this, I think it's easy to reason about why they don't understand code quality, why they couldn't ever understand it, and how you can make them output quality code regardless.

12 days ago

[-]

It is actually funny that current AI+Coding tools benefit a lot from domain context and other information along the lines of Domain-Driven Design (which was inspired by the pattern language of C. Alexander).

A few teams have started incorporating `CONTEXT.MD` into module descriptions to leverage this.

12 days ago

[-]

> That sounds awful. A truly terrible and demotivating way to work and produce anything of real quality

This is the right way to work with generative AI, and it already is an extremely common and established practice when working with image generation.

xphos

12 days ago

[-]

"If the only tool you have is a hammer, you tend to see every problem as a nail."

I think the worlds leaning dangerously into LLMs expecting them to solve every problem under the sun. Sure AI can solve problems but I think that domain 1 they Karpathy shows if it is the body of new knowledge in the world doesn't grow with LLMs and agents maybe generation and selection is the best method for working with domain 2/3 but there is something fundamentally lost in the rapid embrace of these AI tools.

A true challenge question for people is would you give up 10 points of IQ for access to the next gen AI model? I don't ask this in the sense that AI makes people stupid but rather that it frames the value of intelligence is that you have it. Rather than, in how you can look up or generate an answer that may or may not be correct quickly. How we use our tools deeply shapes what we will do in the future. A cautionary tale is US manufacturing of precision tools where we give up on teaching people how to use Lathes, because they could simply run CNC machines instead. Now that industry has an extreme lack of programmers for CNC machines, making it impossible to keep up with other precision instrument producing countries. This of course is a normative statement and has more complex variables but I fear in this dead set charge for AI we will lose sight of what makes programming languages and programming in general valuable

deadbabe

12 days ago

[-]

It is not. The right way to work with generative AI is to get the right answer in the first shot. But it's the AI that is not living up to this promise.

Reviewing 4 different versions of AI code is grossly unproductive. A human co-worker can submit one version of code and usually have it accepted with a single review, no other "versions" to verify. 4 versions means you're reading 75% more code than is necessary. Multiply this across every change ever made to a code base, and you're wasting a shitload of time.

RHSeeger

12 days ago

[-]

That's not really comparing apples to apples though.

> A human co-worker can submit one version of code and usually have it accepted with a single review, no other "versions" to verify.

But that human co-worker spent a lot of time generating what is being reviewed. You're trading "time saved coding" for "more time reviewing". You can't complain about the added time reviewing and then ignore all the time saved coding. THat's not to say it's necessarily a win, but it _is_ a tradeoff.

Plus that co-worker may very well have spent some time discussing various approaches to the problem (with you), with is somewhat parallel to the idea of reviewing 4 different PRs.

12 days ago

[-]

> Reviewing 4 different versions of AI code is grossly unproductive.

You can have another AI do that for you. I review manually for now though (summaries, not the code, as I said in another message).

notTooFarGone

12 days ago

[-]

I can recognize images in one look.

How about that 400 Line change that touches 7 files?

12 days ago

[-]

Exactly!

This is why there has to be "write me a detailed implementation plan" step in between. Which files is it going to change, how, what are the gotchas, which tests will be affected or added etc.

It is easier to review one document and point out missing bits, than chase the loose ends.

Once the plan is done and good, it is usually a smooth path to the PR.

bayindirh

12 days ago

[-]

So you can create a more buggy code remixed from scraped bits from the internet which you don't understand, but somehow works rather than creating a higher quality, tighter code which takes the same amount of time to type? All the while offloading all the work to something else so your skills can atrophy at the same time?

Sounds like progress to me.

11 days ago

[-]

Here is another way to look at the problem.

There is a team of 5 people that are passionate about their indigenous language and want to preserve it from disappearing. They are using AI+Coding tools to:

(1) Process and prepare a ton of various datasets for training custom text-to-speech, speech-to-text models and wake word models (because foundational models don't know this language), along with the pipelines and tooling for the contributors.

(2) design and develop an embedded device (running ESP32-S3) to act as a smart speaker running on the edge

(3) design and develop backend in golang to orchestrate hundreds of these speakers

(4) a whole bunch of Python agents (essentially glorified RAGs over folklore, stories)

(5) a set of websites for teachers to create course content and exercises, making them available to these edge devices

All that, just so that kids in a few hundred kindergartens and schools would be able to practice their own native language, listen to fairy tales, songs or ask questions.

This project was acknowledged by the UN (AI for Good programme). They are now extending their help to more disappearing languages.

None of that was possible before. This sounds like a good progress to me.

Edit: added newlines.

bayindirh

11 days ago

[-]

What you are describing is another application. My comment was squarely aimed at "vibe coding".

Protecting and preserving dying languages and culture is a great application for natural language processing.

For the record, I'm neither against LLMs, nor AI. What I'm primarily against is, how LLMs are trained and use the internet via their agents, without giving any citations, and stripping this information left and right and cry "fair use!" in the process.

Also, Go and Python are a nice languages (which I use), but there are other nice ways to build agents which also allows them to migrate, communicate and work in other cooperative or competitive ways.

So, AI is nice, LLMs are cool, but hyping something to earn money, deskill people, and pointing to something which is ethically questionable and technically inferior as the only silver bullet is not.

IOW; We should handle this thing way more carefully and stop ripping people's work in the name of "fair use" without consent. This is nuts.

Disclosure: I'm a HPC sysadmin sitting on top of a datacenter which runs some AI workloads, too.

10 days ago

[-]

I think there are two different layers that get frequently mixed.

(1) LLMs as models - just the weights and an inference engine. These are just tools like hammers. There is a wide variety of models, starting from transparent and useless IBM Granite models, to open-weights Llama/Qwen to proprietary.

(2) AI products that are built on top of LLMs (agents, RAG, search, reasoning etc). This is how people decide to use LLMs.

How these products display results - with or without citations, with or without attribution - is determined by the product design.

It takes more effort to design a system that properly attributes all bits of information to the sources, but it is doable. As long as product teams are willing to invest that effort.

mistersquid

12 days ago

[-]

> I can recognize images in one look.

> How about that 400 Line change that touches 7 files?

Karpathy discusses this discrepancy. In his estimation LLMs currently do not have a UI comparable to 1970s CLI. Today, LLMs output text and text does not leverage the human brain’s ability to ingest visually coded information, literally, at a glance.

Karpathy surmises UIs for LLMs are coming and I suspect he’s correct.

variadix

12 days ago

[-]

The thing required isn’t a GUI for LLMs, it’s a visual model of code that captures all the behavior and is a useful representation to a human. People have floated this idea before LLMs, but as far as I know there isn’t any real progress, probably because it isn’t feasible. There’s so much intricacy and detail in software (and getting it even slightly wrong can be catastrophic), any representation that can capture said detail isn’t going to be interpretable at a glance.

11 days ago

[-]

There’s no visual model for code as code isn’t 2d. There’s 2 mechanism in the turing machine model: a state machine and a linear representation of code and data. The 2d representation of state machine has no significance and the linear aspect of code and data is hiding more dimensions. We invented more abstractions, but nothing that map to a visual representation.

mistersquid

12 days ago

[-]

> The thing required isn’t a GUI for LLMs, it’s a visual model of code that captures all the behavior and is a useful representation to a human.

The visual representation that would be useful to humans is what Karpathy means by “GUI for LLMs”.

12 days ago

[-]

In my prompt I ask the LLM to write a short summary of how it solved the problem, run multiple instances of LLM concurrently, compare their summaries, and use the output of whichever LLM seems to have interpreted instructions the best, or arrived at the best solution.

elt895

12 days ago

[-]

And you trust that the summary matches what was actually done? Your experience with the level of LLMs understanding of code changes must significantly differ from mine.

12 days ago

[-]

It matched every time so far.

sothatsit

12 days ago

[-]

I find Karpathy's focus on tightening the feedback loop between LLMs and humans interesting, because I've found I am the happiest when I extend the loop instead.

When I have tried to "pair program" with an LLM, I have found it incredibly tedious, and not that useful. The insights it gives me are not that great if I'm optimising for response speed, and it just frustrates me rather than letting me go faster. Worse, often my brain just turns off while waiting for the LLM to respond.

OTOH, when I work in a more async fashion, it feels freeing to just pass a problem to the AI. Then, I can stop thinking about it and work on something else. Later, I can come back to find the AI results, and I can proceed to adjust the prompt and re-generate, to slightly modify what the LLM produced, or sometimes to just accept its changes verbatim. I really like this process.

geeunits

12 days ago

[-]

I would venture that 'tightening the feedback loop' isn't necessarily 'increasing the number of back and forth prompts'- and what you're saying you want is ultimately his argument. i.e. if integral enough it can almost guess what you're going to say next...

sothatsit

12 days ago

[-]

I specifically do not want AI as an auto-correct, doing auto-predictions while I am typing. I find this interrupts my thinking process, and I've never been bottlenecked by typing speed anyway.

I want AI as a "co-worker" providing an alternative perspective or implementing my specific instructions, and potentially filling in gaps I didn't think about in my prompt.

jwblackwell

12 days ago

[-]

Yeah I am currently enjoying giving the LLM relatively small chunks of code to write and then asking it to write accompanying tests. While I focus on testing the product myself. I then don't even bother to read the code it's written most of the time

blobbers

12 days ago

[-]

Software 3.0 is the code generated by the machine, not the prompts that generated it. The prompts don't even yield the same output; there is randomness.

The new software world is the massive amount of code that will be burped out by these agents, and it should quickly dwarf the human output.

pelagicAustral

12 days ago

[-]

I think that if you give the same task to three different developers you'll get three different implementations. It's not a random result if you do get the functionality that was expected, and at that, I do think the prompt plays an important role in offering a view of how the result was achieved.

12 days ago

[-]

> I think that if you give the same task to three different developers you'll get three different implementations.

Yes, but if you want them to be compatible you need to define a protocol and conformance test suite. This is way more work than writing a single implementation.

The code is the real spec. Every piece of unintentional non-determinism can be a hazard. That’s why you want the code to be the unit of maintenance, not a prompt.

12 days ago

[-]

I know! Let's encode the spec into a format that doesn't have the ambiguities of natural language.

12 days ago

[-]

Right. Great idea. Maybe call it ”formal execution spec for LLM reference” or something. It could even be versioned in some kind of distributed merkle tree.

blobbers

11 days ago

[-]

Interestingly, I was generating some scraping code today. I prompted it fairly generically and it decided to spit out some Selenium code. I reported the stack trace failure, and it gave me a new script with playwright. That failed also and it gave me some suggestions to fix it. I asked it to update the whole script rather than snippets, and it responded with "Hey let's not use either of these and here we'll use the site's API." and proceeded to do that.

Kind of crazy, it basically found 3 different hammers to hit the nail I wanted. The API unfortunately seems to be timeing out (I had to add the timeout=10 to the post u_u)

fritzo

11 days ago

[-]

Code is read much more often than it is written. Code generated by the machine today will be prompt read by the machine going forward. It's a closed loop.

Software is a world in motion. Software 1.0 was animated by developers pushing it around. Software 3.0 is additionally animated by AI agents.

tamersalama

12 days ago

[-]

How I understood it is that natural language will form relatively large portions of stacks (endpoint descriptions, instructions, prompts, documentations, etc…). In addition to code generated by agents (which would fall under 1.0)

12 days ago

[-]

It is not the code, which just like prompts is a written language. Software 3.0 will be branches of behaviors, by the software and by the users all documented in a feedback loop. The best behaviors will be merged by users and the best will become the new HEAD. Underneath it all will be machine code for the hardware, but it will be the results that dictate progress.

beacon294

12 days ago

[-]

What is this "clerk" library he used at this timestamp to tell him what to do? https://youtu.be/LCEmiRjPEtQ?si=XaC-oOMUxXp0DRU0&t=1991

Gemini found it via screenshot or context: https://clerk.com/

This is what he used for login on MenuGen: https://karpathy.bearblog.dev/vibe-coding-menugen/

xnx

12 days ago

[-]

That blog post is a great illustration that most of the complexity/difficulty of a web app is in the hosting and not in the useful code.

fullstackchris

11 days ago

[-]

clerk is an auth library - and finally one that doesnt require dozens of lines to do things like, i dont know, check if the user is logged in

and wild... you used gemini to process a screenshot to find the website for a 5 letter word library?

10 days ago

[-]

Not gemini but google lens. Maybe gemini already has some agentic capabilities

wjohn

12 days ago

[-]

The comparison of our current methods of interacting with LLMs (back and forth text) to old-school terminals is pretty interesting. I think there's still a lot work to be done to optimize how we interact with these models, especially for non-dev consumers.

informal007

12 days ago

[-]

Audio maybe the better option.

recursive

12 days ago

[-]

Based on my experience with voicemail, I'd say that audio is not always best, and is sometimes in the running for worst.

magicloop

11 days ago

[-]

I think this is a brilliant talk and truly captures the "zeitgeist" of our times. He sees the emergent patterns arising as software creation is changing.

I am writing a hobby app at the moment and I am thinking about its architecture in a new way now. I am making all my model structures comprehensible so that LLMs can see the inside semantics of my app. I merely provide a human friendly GUI over the top to avoid the linear wall-of-text problem you get when you want to do something complex via a chat interface.

We need to meet LLMs in the middle ground to leverage the best of our contributions - traditional code, partially autonomous AI, and crafted UI/UX.

Part of, but not all of, programming is "prompting well". It goes along with understanding the imperative aspects, developing a nose for code smells, and the judgement for good UI/UX.

I find our current times both scary and exciting.

11 days ago

[-]

I am actually working on building a semantic TypeScript server right now. It's going really good, check it out. https://github.com/screencam/typescript-mcp-server

polishdude20

10 days ago

[-]

With your tool can I hook it up to Cursor and just have it use it?

10 days ago

[-]

I’ve only used it with Claude code

I did a post in Show HN where you can see the installation instructions. I would put them here, but ware on the iPad. It’s an MCP server so it should work. I would’ve thought cursor and other IDs would have some type of sytactic analysis built-in

anythingworks

12 days ago

[-]

loved the analogies! Karpathy is consistently one of the clearest thinkers out there.

interesting that Waymo could do uninterrupted trips back in 2013, wonder what took them so long to expand? regulation? tailend of driving optimization issues?

noticed one of the slides had a cross over 'AGI 2027'... ai-2027.com :)

AlotOfReading

12 days ago

[-]

You don't "solve" autonomous driving as such. There's a long, slow grind of gradually improving things until failures become rare enough.

petesergeant

12 days ago

[-]

I wonder at what point all the self-driving code becomes replaceable with a multimodal generalist model with the prompt “drive safely”

anon7000

12 days ago

[-]

Very advanced machine learning models are used in current self driving cars. It all depends what the model is trying to accomplish. I have a hard time seeing a generalist prompt-based generative model ever beating a model specifically designed to drive cars. The models are just designed for different, specific purposes

tshaddox

12 days ago

[-]

I could see it being the case that driving is a fairly general problem, and this models intentionally designed to be general end up doing better than models designed with the misconception that you need a very particular set of driving-specific capabilities.

12 days ago

[-]

Driving is not a general problem, though. Its a contextual landscape of fast-based reactions and predictions. Both are required, and done regularly by the human element. The exact nature of every reaction, and every prediction, change vastly within the context window.

You need image processing just as much as you need scenario management, and they're orthoganol to each other, as one example.

If you want a general transport system... We do have that. It's called rail. (And can and has been automated.)

12 days ago

[-]

It partially is. You have the specialized part of maneuvering a fast moving vehicle in physical world, trying to keep it under control at all times and never colliding with anything. Then you have the general part, which is navigating the human environment. That's lanes and traffic signs and road works and schoolbuses, that's kids on the road and badly parked trailers.

Current breed of autonomous driving systems have problems with exceptional situations - but based on all I've read about so far, those are exactly of the kind that would benefit from a general system able to understand the situation it's in.

tshaddox

11 days ago

[-]

Yes, that’s exactly what I meant. I’d go even further and say the hard parts of driving are the parts where you are likely better off with a general model. And it’s not just signs, construction, police stopping traffic, etc. Even just basic navigation amongst traffic seems to require a general model of the other nearby drivers. It’s important to be able to model drivers’ intentions, and also to drive your own car in a predictable manner.

melvinmelih

12 days ago

[-]

> Driving is not a general problem, though.

But what's driving a car? A generalist human brain that has been trained for ~30 hours to drive a car.

12 days ago

[-]

Human brain's aren't generalist!

We have multiple parts of the brain that interact in vastly different ways! Your cerebellum won't be running the role of the pons.

Most parts of the brain cannot take over for others. Self-healing is the exception, not the rule. Yes, we have a degree of neuroplasticity, but there are many limits.

(Sidenote: Driver's license here is 240 hours.)

azan_

12 days ago

[-]

> We have multiple parts of the brain that interact in vastly different ways!

Yes, and thanks to that human brains are generalist

[0] https://doi.org/10.1016/j.neuroimage.2022.119673

12 days ago

[-]

Only if that was a singular system, however, it is not. [0]

For example... The nerve cells in your gut may speak to the brain, and interact with it in complex ways we are only just beginning to understand, but they are separate systems that both have control over the nervous system, and other systems. [1]

General Intelligence, the psychological theory, and General Modelling, whilst sharing words, share little else.

[1] https://doi.org/10.1126/science.aau9973

yusina

12 days ago

[-]

240 hours sounds excessive. Where is "here"?

Zanfa

12 days ago

[-]

> Human brain's aren't generalist!

What? Human intelligence is literally how AGI is defined. Brain’s physical configuration is irrelevant.

12 days ago

[-]

A human brain is not a general model. We have multiple overlapping systems. The physical configuration is extremely relevant to that.

AGI is defined in terms of "General Intelligence", a theory that general modelling is irrelevant to.

anythingworks

12 days ago

[-]

exactly! I think that was tesla's vision with self-driving to begin with... so they tried to frame it as problem general enough, that trying to solve it would also solve questions of more general intelligence ('agi') i.e. cars should use vision just like humans would

but in hindsight looks like this slowed them down quite a bit despite being early to the space...

mannicken

12 days ago

[-]

Speed and Moore's law. You don't need to just make a decision without hallucinations, you need to do it fast enough for it to propagate to the power electronics and hit the gas/brake/turn the wheel/whatever. Over and over and over again on thousands of different tests.

A big problem I am noticing is that the IT culture over the last 70 years has existed in a state of "hardware gun get faster soon". And over the last ten years we had a "hardware cant get faster bc physics sorry" problem.

The way we've been making software in the 90s and 00s just isn't gonna be happening anymore. We are used to throwing more abstraction layers (C->C++->Java->vibe coding etc) at the problem and waiting for the guys in the fab to hurry up and get their hardware faster so our new abstraction layers can work.

Well, you can fire the guys in the fab all you want but no matter how much they try to yell at the nature it doesn't seem to care. They told us the embedded c++-monkeys to spread the message. Sorry, the moore's law is over, boys and girls. I think we all need to take a second to take that in and realize the significance of that.

[1] The "guys in the fab" are a fictional character and any similarity to the real world is a coincidence.

[2] No c++-monkeys were harmed in the process of making this comment.

AlotOfReading

12 days ago

[-]

One of the issues with deploying models like that is the lack of clear, widely accepted ways to validate comprehensive safety and absence of unreasonable risk. If that can be solved, or regulators start accepting answers like "our software doesn't speed in over 95% of situations", then they'll become more common.

yokto

12 days ago

[-]

This is (in part) what "world models" are about. While some companies like Tesla bring together a fleet of small specialised models, others like CommaAI and Wayve train generalist models.

12 days ago

[-]

> Karpathy is consistently one of the clearest thinkers out there.

Eh, he ran Teslas self driving division and put them into a direction that is never going to fully work.

What they should have done is a) trained a neural net to represent sequence of frames into a physical environment, and b)leveraged Mu Zero, so that self driving system basically builds out parallel simulations into the future, and does a search on the best course of action to take.

Because thats pretty much what makes humans great drivers. We don't need to know what a cone is - we internally compute that something that is an object on the road that we are driving towards is going to result in a negative outcome when we collide with it.

AlotOfReading

12 days ago

[-]

Aren't continuous, stochastic, partial knowledge environments where you need long horizon planning with strict deadlines and limited compute exactly the sort of environments muzero variants struggle with? Because that's driving.

It's also worth mentioning that humans intentionally (and safely) drive into "solid" objects all the time. Bags, steam, shadows, small animals, etc. We also break rules (e.g. drive on the wrong side of the road), and anticipate things we can't even see based on a theory of mind of other agents. Human driving is extremely sophisticated, not reducible to rules that are easily expressed in "simple" language.

10 days ago

[-]

I didn't say use Mu Zero end to end, I said leverage it.

This is how I would do it:

First, you come up with a compressed representation of the state space of the terrain + other objects around your car that encodes the current states of everything, and its predicted evolution like ~5 seconds into the future.

The idea is that you would leverage physics, which means objects need to behave according to laws of motion, so this means you can greatly compress how this is represented. For example, a meshgrid of "terrain" other than empty road that is static, lane lines representing the road, and 3d boxes representing moving objects with a certain mass, with initial 6 dof state (xyz position, orientation), intial 6dof velocities, and 6 dof forcing functions with parameter of time that represent how these objects move.

So given this representation, you can write a program that simulates the evolution of the state space given any initial condition, and essentially simulate collisions.

Then you divide into 3 teams.

1st team trains a model to translate sensor data into this state space representation, with continuous updates on every cycle, leveraging things like Kalman filtering because of the correlation of certain things that leads to better accuracy. Overall you would get something where things like red brake lights would lead to deceleration forcing functions.

(If you wanted to get fancy, instead of a simulation, you build out probability space instead. I.e when you run the program, it would spit out a heat map of where certain objects are more likely to end up)

2nd team trains a model on real world traffic to find correlations between forcing functions of vehicles. I.e if a car slows down, the cars behind it would slow down. You could do this kinda like Tesla did - equip all your cars with sensors, assume driver inputs as the forcing function, observe the state space change given the model from team 1.

3nd team trains a Mu Zero like model given the 2 above. Given a random initial starting state, the "game" is to chose the sequence of accelerations, decelerations, and steering (quantized with finite values) that gets the highest score by a) avoiding collision b) following traffic laws, c) minimizing disturbance to other vehicles, and d) maximizing space around your own vehicle.

What all of this does is allow the model to compute not only expected behavior, but things that are realistically possible. For example, in a situation where collision is imminent, like you sitting at a red stop light, and the sensors detect a car rapidly approaching, the model would make a decision to drive into the intersection when there are no cars present to avoid getting rear ended, which is quantifiably way better than average human.

Furthermore, the models from team 2 and 3 can self improve real time, which is equivalent to humans getting used to driving habits of others in certain areas. You simply to batch training runs to improve prediction capability of other drivers. Then when your policy model makes a correct decision, you build a shortcut into the MCTS that lets you know that this works, which then means in the finite time compute span, you can search away from that tree for a more optimal solution, and if you don't find it, you already have the best one that works, and next time you search even more space. So essentially you get a processing speed up the more you use it.

visarga

12 days ago

[-]

> We don't need to know what a cone is

The counter argument is that you can't zoom in and fix a specific bug in this mode of operation. Everything is mashed together in the same neural net process. They needed to ensure safety, so testing was crucial. It is harder to test an end-to-end system than its individual parts.

impossiblefork

12 days ago

[-]

I don't think that would have worked either.

But if they'd gone for radars and lidars and a bunch of sensors and then enough processing hardware to actually fuse that, then I think they could have built something that had a chance of working.

10 days ago

[-]

Think about this. If I give you GTA 5 traffic in single player with only NPC drivers, could you manually write a policy that gets a player from point a to point b in a car, assuming you have in game positions of all cars?

suddenlybananas

12 days ago

[-]

That's absolutely not what makes humans great drivers?

10 days ago

[-]

Enlighten me please.

tayo42

12 days ago

[-]

Is that the approach that waymo uses?

10 days ago

[-]

Dunno what Waymo uses, but they definitely work in 3d space as a start, rather than trying to map sequences of pictures to action. They also need training on specific areas.

11 days ago

[-]

This DevOps friction is exactly why I'm building an open-source "Firebase for LLMs." The moment you want to add AI to an app, you're forced to build a backend just to securely proxy API calls—you can't expose LLM API keys client-side. So developers who could previously build entire apps backend-free suddenly need servers, key management, rate limiting, logging, deployment... all just to make a single OpenAI call. Anyone else hit this wall? The gap between "AI-first" and "backend-free" development feels very solvable.

https://developer.apple.com/documentation/foundationmodels

smpretzer

11 days ago

[-]

I think this lines up with Apple’s thesis of on-device models being a useful feature for developers who don’t want to deal with calling out the OpenAI

sockboy

11 days ago

[-]

Yeah, hit this exact wall building a small AI tool. Ended up spinning up a whole backend just to keep the keys safe. Feels like there should be a simpler way, but haven’t seen anything that’s truly plug-and-play yet. Curious to see what you’re working on.

dieortin

11 days ago

[-]

It’s very obvious this account was just created to promote your product…

11 days ago

[-]

I don't even have a product although I'd love people to work on something open source together. Also, I'm not nearly cool enough to earn a green username.

swyx

11 days ago

[-]

> This DevOps friction is exactly why I'm building an open-source "Firebase for LLMs."

i dont understand your earlier statement then

shwaj

11 days ago

[-]

I think they were replying to the person with the green user name :-)

androng

11 days ago

[-]

I think the way the friction could be reduced to almost zero was through OpenAI "custom GPTs" https://help.openai.com/en/articles/8554397-creating-a-gpt or "Alexa skills". how much easier can it get than the user using their own OpenAI account? Of course I'd rather have them on my own website but if were talking complete ease of use then I think that is a contender

11 days ago

[-]

Fair point. I'm no expert in custom GPTs, I wonder what limitations there would be beyond the obvious branding and UI/UX control. Like, how far can someone customize a custom GPT (ha). I imagine any multi-step/agentic flows might be a challenge or impossible as it currently exists. It also seems like custom GPTs have been completely forgotten, but I very well could be wrong and OpenAI announced a big investment in them and new features tomorrow.

jeremyjh

11 days ago

[-]

Do you think Firebase and Superbase are working on this? Good luck but to me it sounds like a platform feature, not a standalone product.

11 days ago

[-]

Probably some sort. In the meantime it doesn't currently exist and I want it for myself. I also feel like having something open source and that allows you to bring your own LLM provider might still be useful.

12 days ago

[-]

His dismissal of smaller and local models suggests he underestimates their improvement potential. Give phi4 a run and see what I mean.

mprovost

12 days ago

[-]

You can disagree with his conclusions but I don't think his understanding of small models is up for debate. This is the person who created micrograd/makemore/nanoGPT and who has produced a ton of educational materials showing how to build small and local models.

12 days ago

[-]

I’m going to edit, it was badly formulated, he underestimates their potential for growth is what I meant by that

12 days ago

[-]

> underestimates their potential for growth

As far as I understood the talk and the analogies, he's saying that local models will eventually replace the current popular "mainframe" architecture. How is that underestimating them?

12 days ago

[-]

> suggests a lack of understanding of these smaller models capabilities

If anything, you're showing a lack of understanding of what he was talking about. The context is this specific time, where we're early in a ecosystem and things are expensive and likely centralized (ala mainframes) but if his analogy/prediction is correct, we'll have a "Linux" moment in the future where that equation changes (again) and local models are competitive.

And while I'm a huge fan of local models run them for maybe 60-70% of what I do with LLMs, they're nowhere near proprietary ones today, sadly. I want them to, really badly, but it's important to be realistic here and realize the differences of what a normal consumer can run, and what the current mainframes can run.

12 days ago

[-]

He understands the technical part, of course, I was referring to his prediction that large models will be always be necessary.

There is a point where an LLM is good enough for most tasks, I don’t need a megamind AI in order to greet clients, and both large and small/medium model size are getting there, with the large models hitting a computing/energy demand barrier. The small models won’t hit that barrier anytime soon.

vikramkr

12 days ago

[-]

Did he predict they'd always be necessary? He mostly seemed to predict the opposite, that we're at the early stage of a trajectory that has yet to have it's Linux moment

11 days ago

[-]

I understand, thanks for pointing that out

12 days ago

[-]

I edited to make it clearer

sriram_malhar

12 days ago

[-]

Of all the things you could suggest, a lack of understanding is not one that can be pinned on Karpathy. He does know his technical stuff.

12 days ago

[-]

We all have blind spots

12 days ago

[-]

Sure, but maybe suggesting that the person who literally spent countless hours educating others on how to build small models locally from scratch, is lacking knowledge about local small models is going a bit beyond "people have blind spots".

12 days ago

[-]

Their potential, not how they work, it was very badly formulated, just corrected it

12 days ago

[-]

He ain't dismissing them. Comparing local/"open" model to Linux (and closed services to Windows and MacOS) is high praise. It's also accurate.

12 days ago

[-]

This is a bad comparison

12 days ago

[-]

I tried the local small models. They are slow, much less capable, and ironically much more expensive to run than the frontier cloud models.

12 days ago

[-]

Phi4-mini runs on a basic laptop CPU at 20T/s… how is that slow? Without optimization…

12 days ago

[-]

I was running Qwen3-32B locally even faster, 70T/s, still way too slow for me. I'm generating thousands of tokens of output per request (not coding), running locally I could get 6 mil tokens per day and pay electricity, or I can get more tokens per day from Google Gemini 2.5 Flash for free.

Running models locally is a privilege for the rich and those with too much disposable time.

yencabulator

1 day ago

[-]

Try Qwen3-30B-A3B. It's MoE to an extent where its use of memory bandwidth looks more like a 3B model, and thus it typically goes faster.

1: https://x.com/karpathy/status/1935077692258558443

nico

12 days ago

[-]

Thank you YC for posting this before the talk became deprecated[1]

sandslash

12 days ago

[-]

We couldn't let that happen!

eitally

12 days ago

[-]

It's going to be very interesting to see how things evolve in enterprise IT, especially but not exclusively in regulated industries. As more SaaS services are at least partly vibe coded, how are CIOs going to understand and mitigate risk? As more internal developers are using LLM-powered coding interfaces and become less clear on exactly how their resulting code works, how will that codebase be maintained and incrementally updated with new features, especially in solo dev teams (which is common)?

I easily see a huge future for agentic assistance in the enterprise, but I struggle mightily to see how many IT leaders would accept the output code of something like a menugen app as production-viable.

Additionally, if you're licensing code from external vendors who've built their own products at least partly through LLM-driven superpowers, how do you have faith that they know how things work and won't inadvertently break something they don't know how to fix? This goes for niche tools (like Clerk, or Polar.sh or similar) as much as for big heavy things (like a CRM or ERP).

I was on the CEO track about ten years ago and left it for a new career in big tech, and I don't envy the folks currently trying to figure out the future of safe, secure IT in the enterprise.

charlie0

12 days ago

[-]

It will succeed due to the same reason other sloppy strategies succeed, it has large short term gains and moves risk into the nebulous future. Management LOVES these types of things.

r2b2

12 days ago

[-]

I've found that as LLMs improve, some of their bugs become increasingly slippery - I think of it as the uncanny valley of code.

Put another way, when I cause bugs, they are often glaring (more typos, fewer logic mistakes). Plus, as the author it's often straightforward to debug since you already have a deep sense for how the code works - you lived through it.

So far, using LLMs has downgraded my productivity. The bugs LLMs introduce are often subtle logical errors, yet "working" code. These errors are especially hard to debug when you didn't write the code yourself — now you have to learn the code as if you wrote it anyway.

I also find it more stressful deploying LLM code. I know in my bones how carefully I write code, due to a decade of roughly "one non critical bug per 10k lines" that keeps me asleep at night. The quality of LLM code can be quite chaotic.

That said, I'm not holding my breath. I expect this to all flip someday, with an LLM becoming a better and more stable coder than I am, so I guess I will keep working with them to make sure I'm proficient when that day comes.

thegeomaster

12 days ago

[-]

I have been using LLMs for coding a lot during the past year, and I've been writing down my observations by task. I have a lot of tasks where my first entry is thoroughly impressed by how e.g. Claude helped me with a task, and then the second entry is a few days after when I'm thoroughly irritated by chasing down subtle and just _strange_ bugs it introduced along the way. As a rule, these are incredibly hard to find and tedious to debug, because they lurk in the weirdest places, and the root cause is usually some weird confabulation that a human brain would never concoct.

throw234234234

11 days ago

[-]

Saw a recent talk where someone described AI as making errors, but not errors that a human would naturally make and are usually "plausible but wrong" answers. i.e. the errors that these AI's make are of a different nature than what a human would do. This is the danger - that reviews now are harder; I can't trust it as much as a person coding at present. The agent tools are a little better (Claude Code, Aider, etc) in that they can at least take build and test output but even then I've noticed it does things that are wrong but are "plausible and build fine".

I've noticed it in my day-to-day: an AI PR review is different than if I get the PR from a co-worker with different kinds of problems. Unfortunately the AI issues seem to be more of the subtle kind - the things if I'm not diligent could sneak into production code. It means reviews are more important, and I can't rely on previous experience of a co-worker and the typical quality of their PR's - every new PR is a different worker effectively.

DanHulton

12 days ago

[-]

I'm curious where that expectation of the flip comes from? Your experience (and mine, frankly) would seem to indicate the opposite, so from whence comes this certainty that one day it'll change entirely and become reliable instead?

I ask (and I'll keep asking) because it really seems like the prevailing narrative is that these tools have improved substantially in a short period of time, and that is seemingly enough justification to claim that they will continue to improve until perfection because...? waves hands vaguely

Nobody ever seems to have any good justification for how we're going to overcome the fundamental issues with this tech, just a belief that comes from SOMEWHERE that it'll happen anyway, and I'm very curious to drill down into that belief and see if it comes from somewhere concrete or it's just something that gets said enough that it "becomes true", regardless of reality.

dapperdrake

12 days ago

[-]

Just like when all regulated industries started only using decision trees and ordinary least-squares regression instead of any other models.

gosub100

12 days ago

[-]

> how many IT leaders would accept the output code of something like a menugen app as production-viable.

probably all of the ones at microsoft

- https://blog.nilenso.com/blog/2025/05/29/ai-assisted-coding/

amai

12 days ago

[-]

The quite good blog post mentioned by Karpathy for working with LLMs when building software:

11 days ago

[-]

I like the idea of having a single source of truth RULES.md, however I'm wondering why you used symlinks as opposed to the ability to link/reference other files in cursor rules, CLAUDE.md, etc. I understand that functionality doesn't exist for all coding agents, but I think it gives you more flexibility when composing rules files (for example you can have the standard cursor rules headers and then point to @RULES.md lower in the file)

12 days ago

[-]

The slide at 13m claims that LLMs flip the script on technology diffusion and give power to the people. Nothing could be further from the truth.

Large corporations, which have become governments in all but name, are the only ones with the capability to create ML models of any real value. They're the only ones with access to vast amounts of information and resources to train the models. They introduce biases into the models, whether deliberately or not, that reinforces their own agenda. This means that the models will either avoid or promote certain topics. It doesn't take a genius to imagine what will happen when the advertising industry inevitably extends its reach into AI companies, if it hasn't already.

Even open weights models which technically users can self-host are opaque blobs of data that only large companies can create, and have the same biases. Even most truly open source models are useless since no individual has access to the same large datasets that corporations use for training.

So, no, LLMs are the same as any other technology, and actually make governments and corporations even more powerful than anything that came before. The users benefit tangentially, if at all, but will mostly be exploited as usual. Though it's unsurprising that someone deeply embedded in the AI industry would claim otherwise.

moffkalast

12 days ago

[-]

Well there are cases like OLMo where the process, dataset, and model are all open source. As expected though, it doesn't really compare well to the worst closed model since the dataset can't contain vast amounts of stolen copyrighted data that noticeably improves the model. Llama is not good because Meta knows what they're doing, it's good because it was pretrained on the entirety of Anna's Archive and every pirated ebook they could get their hands on. Same goes for Elevenlabs and pirated audiobooks.

Lack of compute on the Ai2's side also means the context OLMo is trained for is miniscule, the other thing that you need to throw brazillions of dollars at to make model that's maybe useful in the end if you're very lucky. Training needs high GPU interconnect bandwidth, it can't be done in distributed horde in any meaningful way even if people wanted to.

The only ones who have the power now are the Chinese, since they can easily ignore copyright for datasets, patents for compute, and have infinite state funding.

12 days ago

[-]

He sounds like Terrence Howard with his nonsense.

12 days ago

[-]

I watched Karpathy's Intro to Large Language Models[0] not so long ago and must say that I'm a bit confused by this presentation, and it's a bit unclear to me what it adds.

1,5 years ago he saw all the tool uses in agent systems as the future of LLMs, which seemed reasonable to me. There was (and maybe still is) potential for a lot of business cases to be explored, but every system is defined by its boundaries nonetheless. We still don't know all the challenges we face at that boundaries, whether these could be modelled into a virtual space, handled by software, and therefor also potentially AI and businesses.

Now it all just seems to be analogies and what role LLMs could play in our modern landscape. We should treat LLMs as encapsulated systems of their own ...but sometimes an LLM becomes the operating system, sometimes it's the CPU, sometimes it's the mainframe from the 60s with time-sharing, a big fab complex, or even outright electricity itself?

He's showing an iOS app, which seems to be, sorry for the dismissive tone, an example for a better looking counter. This demo app was in a presentable state for a demo after a day, and it took him a week to implement Googles OAuth2 stuff. Is that somehow exciting? What was that?

The only way I could interpret this is that it just shows a big divide we're currently in. LLMs are a final API product for some, but an unoptimized generative software-model with sophisticated-but-opaque algorithms for others. Both are utterly in need for real world use cases - the product side for the fresh training data, and the business side for insights, integrations and shareholder value.

Am I all of a sudden the one lacking imagination? Is he just slurping the CEO cool aid and still has his investments in OpenAI? Can we at least agree that we're still dealing with software here?

[0]: https://www.youtube.com/watch?v=zjkBMFhNj_g

bwfan123

12 days ago

[-]

> Am I all of a sudden the one lacking imagination?

No, The reality of what these tools can do is sinking in.. The rubber is meeting the road and I can hear some screaching.

The boosters are in 5 stages of grief coming to terms with what was once AGI and is now a mere co-pilot, while the haters are coming to terms with the fact that LLMs can actually be useful in a variety of usecases.

acedTrex

12 days ago

[-]

I actually quite agree with this, there is some reckoning on both sides happening. It's quite entertaining to watch, a bit painful as well of course as someone who is on the "they are useless" side and is noticing some very clear usecases where a value add is present.

natebc

12 days ago

[-]

I'm with you. I give several of 'em a shot a few times a week (thanks Kagi for the fantastic menu of choices!). Over the last quarter or so I've found that the bullshit:useful ratio is creeping to the useful side. They still answer like a high school junior writing a 5 paragraph essay but a decade of sifting through blogspam has honed my own ability to cut through that.

12 days ago

[-]

> but a decade of sifting through blogspam has honed my own ability to cut through that.

Now, a different skill need to be honed :) Add "Be concise and succinct without removing any details" to your system prompt and hopefully it can output its text slightly better.

Joel_Mckay

12 days ago

[-]

In general, the functional use-case traditionally covered by basic heuristics is viable for a reasoning LLM. These are useful for search. media processing, and language translation.

LLM is not AI, and never was... and while the definition has been twisted in marketing BS it does not mean either argument is 100% correct or in err.

LLM is now simply a cult, and a rather old one dating back to the 1960s Lisp machines.

Have a great day =3

johnxie

12 days ago

[-]

LLMs aren’t perfect, but calling them a “cult” misses the point. They’re not just fancy heuristics, they’re general-purpose function approximators that can reason, plan, and adapt across a huge range of tasks with zero task-specific code.

Sure, it’s not AGI. But dismissing the progress as just marketing ignores the fact that we’re already seeing them handle complex workflows, multi-step reasoning, and real-time interaction better than any previous system.

This is more than just Lisp nostalgia. Something real is happening.

Joel_Mckay

11 days ago

[-]

Sure, I have seen the detrimental impact on some teams, and it does not play out as Marketers suggest.

The trick is in people seeing meaning in well structured nonsense, and not understanding high dimension vector spaces simply abstracting associative false equivalency with an inescapable base error rate.

I wager Neuromorphic computing is likely more viable than LLM cults. The LLM subject is incredibly boring once your tear it apart, and less interesting than watching Opuntia cactus grow. Have a wonderful day =3

anothermathbozo

12 days ago

[-]

> The reality of what these tools can do is sinking in

It feels premature to make determinations about how far this emergent technology can be pushed.

Joel_Mckay

12 days ago

[-]

The cognitive dissonance is predictable.

Now hold my beer, as I cast a superfluous rank to this trivial 2nd order Tensor, because it looks awesome wasting enough energy to power 5000 homes. lol =3

hn_throwaway_99

11 days ago

[-]

> The boosters are in 5 stages of grief coming to terms with what was once AGI and is now a mere co-pilot, while the haters are coming to terms with the fact that LLMs can actually be useful in a variety of usecases.

I couldn't agree with this more. I often get frustrated because I feel like the loudest voices in the room are so laughably extreme. One on side you have the "AGI cultists", and on the other you have the "But the hallucinations!!!" people. I've personally been pretty amazed by the state of AI (nearly all of this stuff was the domain of Star Trek just a few years ago), and I get tons of value out of many of these tools, but at the same time I hit tons of limitations and I worry about the long-term effect on society (basically, I think this "ask AI first" approach, especially among young people, will kinda turn us all into idiots, similar to the way Google Maps made it hard for most of us to remember the simple directions). I also can't help but roll my eyes when I hear all the leaders of these AI companies going on about how AI will make a "white collar bloodbath" - there is some nuggets of truth in that, but these folks are just using scare tactics to hype their oversold products.

pera

11 days ago

[-]

Exactly! What skeptics don't get is that AGI is already here and we are now starting a new age of infinite prosperity, it's just that exponential growth looks flat at first, obviously...

Quantum computers and fusion energy are basically solved problems now. Accelerate!

hn_throwaway_99

11 days ago

[-]

This sounds like clear satire to me, but at this point I really can't tell.

_se

11 days ago

[-]

Nah this one is just a lemming.

pera

10 days ago

[-]

How dare you

westoncb

11 days ago

[-]

> and must say that I'm a bit confused by this presentation, and it's a bit unclear to me what it adds.

I think the disconnect might come from the fact that Karpathy is speaking as someone who's day-to-day computing work has already been radically transformed by this technology (and he interacts with a ton of other people for whom this is the case), so he's not trying to sell the possibility of it: that would be like trying to sell the possibility of an airplane for someone who's already just cruising around in one every day. Instead the mode of the presentation is more: well, here we are at the dawn of a new era of computing, it really happened. Now how can we relate this to the history of computing to anticipate where we're headed next?

> ...but sometimes an LLM becomes the operating system, sometimes it's the CPU, sometimes it's the mainframe from the 60s with time-sharing, a big fab complex, or even outright electricity itself?

He uses these analogies in clear and distinct ways to characterize separate facets of the technology. If you were unclear on the meanings of the separate analogies it seems like the talk may offer some value for you after all but you may be missing some prerequisites.

> This demo app was in a presentable state for a demo after a day, and it took him a week to implement Googles OAuth2 stuff. Is that somehow exciting? What was that?

The point here was that he'd built the core of the app within a day without knowing the Swift language or ios app dev ecosystem by leveraging LLMs, but that part of the process remains old-fashioned and blocks people from leveraging LLMs as they can when writing code—and he goes on to show concretely how this could be improved.

Workaccount2

12 days ago

[-]

The fundamental mistake I see is people applying LLMs to the current paradigm of software; enormous hulking codebases made to have as many features as possible to appeal to as many users as possible.

LLMs are excellent at helping non-programmers write narrow use case, bespoke programs. LLMs don't need to be able to one-shot excel.exe or Plantio.apk so that Christine can easily track when she watered and fed her plants nutrients.

The change that LLMs will bring to computing is much deeper than Garden Software trying to slot in some LLM workers to work on their sprawling feature-pack Plantio SaaS.

I can tell you first hand I have already done this numerous times as a non-programmer working a non-tech job.

11 days ago

[-]

The thing is that there’s a need to integrate all these little tools because the problems they solve is part of the same domain. And that’s where problems lie. Something like Excel have an advantage as being a common platform for both data and procedures. Unix adopted text and pipes for integration.

demosthanos

12 days ago

[-]

What you're missing is the audience.

This talk is different from his others because it's directed at aspiring startup founders. It's about how we conceptualize the place of an LLM in a new business. It's designed to provide a series of analogies any one of which which may or may not help a given startup founder to break out of the tired, binary talking points they've absorbed from the internet ("AI all the things" vs "AI is terrible") in favor of a more nuanced perspective of the role of AI in their plans. It's soft and squishy rhetoric because it's not about engineering, it's about business and strategy.

I honestly left impressed that Karpathy has the dynamic range necessary to speak to both engineers and business people, but it also makes sense that a lot of engineers would come out of this very confused at what he's on about.

11 days ago

[-]

I get that, motivating young founders is difficult, and I think he has a charming geeky way of provoking some thoughts. But on the other hand: Why mainframes with time-sharing from the 60s? Why operating systems? LLMs to tell you how to boil an egg, seriously?

Putting my engineering hat on, I understand his idea of the "autonomy slider" as lazy workaround for a software implementation that deals with one system boundary. He should aspire people there to seek out for unknown boundaries, not provide implementation details to existing boundaries. His MenuGen app would probably be better off using a web image search instead of LLM image generation. Enhancing deployment pipelines with LLM setups is something for the last generation of DevOps companies, not the next one.

Please mention just once the value proposition and responsibilities when handling large quantities of valuable data - LLMs wouldn't exist without them! What makes quality data for an LLM, or personal data?

nodesocket

12 days ago

[-]

llms.txt makes a lot of sense, especially for LLMs to interact with http APIs autonomously.

Seems like you could set a LLM loose and like the Google Bot have it start converting all html pages into llms.txt. Man, the future is crazy.

12 days ago

[-]

Couldn’t believe my eyes. The www is truly bankrupt. If anyone has a browser plugin which automatically redirects to llms.txt sign me up.

Website too confusing for humans? Add more design, modals, newsletter pop ups, cookie banners, ads, …

Website too confusing for LLMs? Add an accessible, clean, ad-free, concise, high entropy, plain text summary of your website. Make sure to hide it from the humans!

PS: it should be /.well-known/llms.txt but that feels futile at this point..

PPS: I enjoyed the talk, thanks.

12 days ago

[-]

> If anyone has a browser plugin which automatically redirects to llms.txt sign me up.

Not a browser plugin, but you can prefix URLs with `pure.md/` to get the pure markdown of that page. It's not quite a 1:1 to llms.txt as it doesn't explain the entire domain, but works well for one-off pages. [disclaimer: I'm the maintainer]

FergusArgyll

11 days ago

[-]

I've been actually using it for my own consumption (I am not an llm...) It's great! thanks

12 days ago

[-]

The next version of the llms.txt proposal will allow an llms.txt file to be added at any level of a path, which isn't compatible with /.well-known.

(I'm the creator of the llms.txt proposal.)

12 days ago

[-]

Doesn’t this conflict with the original proposal of appending .md to any resource, e.g. /foo/bar.html.md? Or why not tell servers to respond to the Accept header when it’s set to text/markdown?

achempion

12 days ago

[-]

Even with this future approach, it still can live under the `/.well-known`, think of `/.well-known/llm/<mirrored path>` or `/.well-known/llm.json` with key/value mappings.

12 days ago

[-]

[flagged]

https://news.ycombinator.com/newsguidelines.html

12 days ago

[-]

"Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."

12 days ago

[-]

Fair

11 days ago

[-]

PS apologies to jph00. I still believe what I believe but I should have phrased it differently or not at all. Good luck on your endeavors either way.

12 days ago

[-]

The web started dying with mobile social media apps, in which hyperlinks are a poor UX choice. Then again with SEO banning outlinks. Now this. The web of interconnected pages that was the World Wide Web is dead. Not on social media? No one sees you. Run a website? more bots than humans. Unless you sell something on the side with the website it's not profitable. Hyperlinking to other websites is dead.

Gen Alpha doesn't know what a web page is and if they do, it's for stuff like neocities aka as a curiosity or art form only. Not as a source of information anymore. I don't blame them. Apps (social media apps) have less friction than web sites but have a higher barrier for people to create. We are going back to pre World Wide Web days in a way, kind of like Bulletin Board Systems on dial up without hyperlinking, and centralized (social media) Some countries mostly ones with few technical people llike the ones in Central America have moved away from the web almost entirely and into social media like Instagram.

Due to the death of the web, google search and friends now rely mostly on matching queries with titles now so just like before the internet you have to know people to learn new stuff or wait for an algorithm to show it to you or someone to comment it online or forcefully enroll in a university. Maybe that's why search results have declined and poeple search using ChatGPT or maybe perplexity. Scholarly search engines are a bit better but frankly irrelevant for most poeple.

Now I understand why Google established their own DNS server at 8.8.8.8. If you have a directory of all domains on DNS, you can still index sites without hyperlinks between them, even if the web dies. They saw it coming.

11 days ago

[-]

This. Only recently I realized, that in China for example many young people do not have a browser on their phone. Everything they do is via messenger "mini apps", which I can't imagine to be easier to create than a website. At every restaurant you get a QR code to scan, to download a so called mini app, which is a website in disguise. They run (un)social media apps like Tiktok or Xiaohongshu as their information input. I am not even sure they still use Baidu or something else as a search engine. Maybe that is another app, but they don't have a browser installed to go to a search engine's website.

I think it is an extremely debilitating situation and it results in people not even knowing what a website is. What it consists of, or how one could possibly make one oneself. They would have to go straight to app development and have it in big tech's stores, in order to make anything their peers could see or use.

10 days ago

[-]

Speaking of app development, people often use the same 25 apps (dependent on region) so this situation monopolizes hosting of information within a few apps. It might seem paranoid but what if they went down like Vine or decided to delete or downrank content to please advertisers? Or in other words uprank content to drive advertising revenue via engagement? Their UI makes it so that downranked content is inaccessible and invisible by almost any means. The chances of finding that content are very near zero.

Tiktok and Instagram already do it. The algorithm reinforces biases which might seem like a personal problem, but there's no real way to escape them because the app doesn't have any UI to do it. Bluesky is an improvement but it hasn't really taken off and that feature is kind of buried. It also doesn't really let you browse through everything posted to the site. This situation reduces discoverability of information. Like in the old days before the internet except everything is online now.

Maybe Bluesky indicates, that people don't really care. Web surfing is dead. Or is it just a consequence of not allowing hyperlinks anymore? Like in the old days when hyperlinks didn't exist? I hope to use AI to facilitate hyperlinking, based on how common a word is. What if you had to tap and hold to access a hyperlink on mobile, hopefully solving the UX problem? What if all hyperlinks were buttons on a side of the screen? There has to be a way to keep an area clear for scrolling with fingers without banning hyperlinks..

But then how would they be presented to the user without making them think that you want to keep them hooked? TT and IG give the illusion of being able to quit anytime, in an uncluttered UI. Then you have social media banning outlinks too in order to keep you on their app and drive up ad revenue. This again kills the web because it reduces linking. When they do show links like hashtags, they are hard to press since they are close together and look cluttered. How can we bring hyperlinks to a mobile-friendly, user-friendly UI? E-commerce apps do it well, are highly discoverable, but why not social media? The algorithm has been proven to drive ad revenue instead of hyperlinks and discoverability, and they still ban outlinks.

Apps are a single point of failure. It reduces information diversity in several ways: sources, hosting and contents. It's as if all books were published by the same or a few publishers (apps) which might not seem like a big deal, but it's very very hard to move to another publisher. And its very very hard to make new publishers available to people, because they don't know any better. The migration costs are very high. Everyone only knows how to read from that particular publisher and maybe 2 others.

If the publisher thinks something won't sell or will harm it, it's not revealed even if it would be very helpful and drive sales. AKA the algorithm. I've had to create and painstakingly curate new accounts because of this, and people don't even know it's possible. Since they can't read stuff published by another publisher they don't learn about it. It's very hard to find out about other publishers and migrate to them due to familiarity, muscle memory and frankly the algorithm. Existing publishers can prevent people from moving to other publishers by preventing people from learning about them. For example Instagram banned links or even mentions of pixelfed and 404 media.

Hyperlinks are probably the most underrated, revolutionary invention of the last century. They are being eliminated for the sake of UI cleanliness. But at the cost of going back to darker times.

12 days ago

[-]

If you have different representations of the same thing (llms.txt / HTML), how do you know it is actually equivalent to each other? I am wondering if there are scenarios where webpage publishers would be interested in gaming this.

12 days ago

[-]

11 days ago

[-]

Also HTTP headers Accept/Content-Type which in theory could let you serve HTML, XML and JSON all under the same URL/URI but depending on Accept values.

12 days ago

[-]

That's not what llms.txt is. You can just use a regular markdown URL or similar for that.

llms.txt is a description for an LLM of how to find the information on your site needed for an LLM to use your product or service effectively.

blixt

12 days ago

[-]

If we extrapolate these points about building tools for AI and letting the AI turn prompts into code I can’t help but reach the conclusion that future programming languages and their runtimes will be heavily influenced by the strengths and weaknesses of LLMs.

What would the code of an application look like if it was optimized to be efficiently used by LLMs and not humans?

* While LLMs do heavily tend towards expecting the same inputs/outputs as humans because of the training data I don’t think this would inhibit co-evolution of novel representations of software.

mythrwy

12 days ago

[-]

It does seem a bit silly long term to have something like Python which was developed as a human friendly language written by LLMs.

If AI is going to write all the code going forward, we can probably dispense with the user friendly part and just make everything efficient as possible for machines.

doug_durham

12 days ago

[-]

I don't agree. Important code will need to be audited. I think the language of the future will be easy to read by human reviewers but deterministic. It won't be a human language. Instead it will be computer language with horrible ergonomics. I think Python or straight up Java would be a good start. Things like templates wouldn't be necessary since you could express that deterministically in a higher level syntax (e.g. A list of elements that can accept any type). It would be an interesting exercise.

mostlysimilar

12 days ago

[-]

If humans don't understand it to write the data the LLM is trained on, how will the LLM be able to learn it?

thierrydamiba

12 days ago

[-]

Is a world driven by the strengths and weaknesses of programming languages better than the one driven by the strengths and weaknesses of LLMs?

ivape

12 days ago

[-]

Better to think of it as a world driven by the strengths and weaknesses of people. Is the world better if more people can express themselves via software? Yes.

I don’t believe in coincidences. I don’t think the universe provided AI by accident. I believe it showed up just at the moment where the universe wants to make it clear - your little society of work and status and money can go straight to living hell. And that’s where it’s going, the developer was never supposed to be a rockstar, they were always meant to be creatives who do it because they like it. Fuck this job bullshit, those days are over. You will program the same way you play video games, it’s never to be work again (it’s simply too creative).

Will the universe make it so a bunch of 12 year olds dictate software in natural language in a Roblox like environment that rivals the horeshit society sold for billions just a decade ago? Yes, and thank god. It’s been a wild ride, thank you god for ending it (like he did with nuclear bombs after ww2, our little universe of war shrunk due to that).

Anyways, always pay attention to the little details, it’s never a coincidence. The universe doesn’t just sit there and watch our fiasco believe it or not, it gets involved.

s_ting765

12 days ago

[-]

Given the plethora of programming languages that exist today, I'm not worried at all about AI taking over SWE jobs.

old_man_cato

11 days ago

[-]

The image of a bunch of children in a room gleefully playing with their computers is horror movie type stuff, but because it's in a white room with plants and not their parent's basement with the lights off, it's somehow a wonderful future.

Karpathy and his peer group are some of the most elitist and anti social people who have ever lived. I wonder how history will remember them.

whatarethembits

11 days ago

[-]

Its early days. Agree with your point that the "vision" of the future laid out by tech people doesn't have much of a chance of becoming (accepted) reality, because its necessarily a reflection of their own inner world, largely devoid of importance and interactions with other people. Prime example, see metaverse. Most of us don't want to replace the real world with a (crappy) digital one; the sooner we build things that respects that fundamental value, the sooner we can build things that actually improves our lives.

8note

11 days ago

[-]

did you not have the computer room open to flash games and the like over lunch time? competitive 4 player bmtron was a blast way back whenhttps://www.games1729.com/archive/

old_man_cato

11 days ago

[-]

I did. I also had basically unlimited access to pornography and I saw more than one video of someone having their head severed off. But yeah, I played a lot of computer games. That was fun.

mirsadm

11 days ago

[-]

I thought that video was generated. Everything about it seemed off

bicepjai

9 days ago

[-]

A lot of people reach for the “electricity” analogy whenever a tech wave crests—crypto, cloud, and now LLMs. With crypto, the comparison always felt forced: the utility was niche, and the energy cost was hard to justify. LLMs, on the other hand, are genuinely useful, but is the electricity comparison still valid ?

0xjunhao

10 days ago

[-]

Before I became a software engineer, I was a computational physicist. My days back then were pretty much tweaking some parameters, running a job, then reading papers and checking back after a few minutes or hours. Increasingly, I’m starting to think my days as a software engineer will be pretty similar.

ankurdhama

11 days ago

[-]

Where are the debugging tools for the so called "Software 3.0" ?

autobodie

11 days ago

[-]

If the prompt is good, the LLM will tell you when it's wrong, but you can use production testing if necessary like Tesla.

raffael_de

12 days ago

[-]

I'm a little surprised at how negative he is towards textual interfaces and text for representing information.

11 days ago

[-]

I didn't get the impression that he's against text per se, just that LLMs should use a format that's most concise for humans in the given scenario. Example from the video: showing the (textual) diff between old and new versions of text/code, rather than just the new version. Or converting a text-only restaurant menu to photos+text.

Waterluvian

11 days ago

[-]

This got me thinking about something…

Isn’t an LLM basically a program that is impossible to virus scan and therefore can never be safely given access to any capable APIs?

For example: I’m a nice guy and spend billions on training LLMs. They’re amazing and free and I hand out the actual models for you all to use however you want. But I’ve trained it very heavily on a specific phrase or UUID or some other activation key being a signal to <do bad things, especially if it has console and maybe internet access>. And one day I can just leak that key into the world. Maybe it’s in spam, or on social media, etc.

How does the community detect that this exists in the model? Ie. How does the community virus scan the LLM for this behaviour?

robertk

11 days ago

[-]

You may be interested in: https://www.anthropic.com/research/sleeper-agents-training-d... https://arxiv.org/abs/2404.13660

Waterluvian

11 days ago

[-]

Yes these look perfect! Thank you.

orbital-decay

11 days ago

[-]

This is what mechanistic interpretability studies are trying to achieve, and it's not yet realistically possible for a general case.

avarun

11 days ago

[-]

Similarly to how you can never guarantee that one of your trusted employees won’t be made a foreign asset.

theGnuMe

11 days ago

[-]

This is a good insight. There’s also a similar insight about compilers back in the days before AV.. we will have AV LLMs etc… basically reinvent everything for the new stack.

jedimastert

11 days ago

[-]

I was just talking to somebody at work about a "Trusting Trust" style attack from LLMs. I will remain deeply suspicious of them

autobodie

11 days ago

[-]

Profit over security, outsource liability

LZ_Khan

11 days ago

[-]

I do feel like large scale LLM vulnerabilities will be the real Y2K

tinyhouse

12 days ago

[-]

After Cursor is sold for $3B, they should transfer Karpathy 20%. (it also went viral before thanks to him tweeting about it)

Great talk like always. I actually disagree on a few things with him. When he said "why would you go to ChatGPT and copy / paste, it makes much more sense to use a GUI that is integrated to your code such as Cursor".

Cursor and the like take a lot of the control from the user. If you optimize for speed then use Cursor. But if you optimize for balance of speed, control, and correctness, then using Cursor might not be the best solution, esp if you're not an expert of how to use it.

It seems that Karpathy is mainly writing small apps these days, he's not working on large production systems where you cannot vibe code your way through (not yet at least)

mentalgear

12 days ago

[-]

Meanwhile, I asked this morning Claude 4 to write a simple EXIF normalizer. After two rounds of prompting it to double-check its code, I still had to point out that it makes no sense to load the entire image for re-orientating if the EXIF orientation is fine in the first place.

Vibe vs reality, and anyone actually working in the space daily can attest how brittle these systems are.

Maybe this changes in SWE with more automated tests in verifiable simulators, but the real world is far to complex to simulate in its vastness.

12 days ago

[-]

> Meanwhile

What do you mean "meanwhile", that's exactly (among other things) the kind of stuff he's talking about? The various frictions and how you need to approach it

> anyone actually working in the space

Is this trying to say that Karpathy doesn't "actually work" with LLMs or in the ML space?

I feel like your whole comment is just reacting to the title of the YouTube video, rather than actually thinking and reflecting on the content itself.

demaga

12 days ago

[-]

I'm pretty sure "actually work" part refers to SWE space rather than LLM/ML space

Seanambers

12 days ago

[-]

Seems to me that this is just another level of throwing compute at the problem.

Same way programs was way more efficient before and now they are "bloated" with packages, abstractions, slow implementations of algos and scaffolding.

The concept of what is good software development might be changing as well.

LLMs might not write the best code, but they sure can write a lot of it.

ApeWithCompiler

12 days ago

[-]

A manager in our company introduced Gemini as a chat bot coupled to our documentation.

> It failed to write out our company name.The rest was flawed with hallucinations also, hardly worth to mention.

I wish this is a rage bait towards others, but what should me feelings be? After all this is the tool thats sold to me, I am expected to work with.

gorbachev

12 days ago

[-]

We had exactly the opposite experience. CoPilot was able to answer questions accurately and reformatted the existing documentation to fit the context of users' questions, which made the information much easier to understand.

Code examples, which we offer as sort of reference implementations, were also adopted to fit the specific questions without much issues. Granted these aren't whole applications, but 10 - 25 line examples of doing API setup / calls.

We didn't, of course, just send users' questions directly to CoPilot. Instead there's a bit of prompt magic behind the scenes that tweaks the context so that CoPilot can produce better quality results.

ramon156

12 days ago

[-]

The real question is how long it'll take until they're not brittle

kubb

12 days ago

[-]

Or will they ever be reliable. Your question is already making an assumption.

vFunct

12 days ago

[-]

Its perfectly reliable for the things you know it to be, such as operations within its context window size.

Don't ask LLMs to "Write me Microsoft Excel".

Instead, ask it to "Write a directory tree view for the Open File dialog box in Excel".

Break your projects down into the smallest chunks you can for the LLMs. The more specific you are, the more reliable it's going to be.

The rest of this year is going to be companies figuring out how to break down large tasks into smaller tasks for LLM consumption.

12 days ago

[-]

They're reliable already if you change the way you approach them. These probabilistic token generators probably never will be "reliable" if you expect them to 100% always output exactly what you had in mind, without iterating in user-space (the prompts).

kubb

12 days ago

[-]

I also think they might never become reliable.

flir

12 days ago

[-]

There is a bar below which they are reliable.

"Write a Python script that adds three numbers together".

Is that bar going up? I think it probably is, although not as fast/far as some believe. I also think that "unreliable" can still be "useful".

12 days ago

[-]

But what does that mean? If you tell the LLM "Say just 'hi' without any extra words or explanations", do you not get "hi" back from it?

12 days ago

[-]

That's literally the wrong way to use LLMs though.

LLMs think in tokens, the less they emit the dumber they are, so asking them to be concise, or to give the answer before explanation, is extremely counterproductive.

12 days ago

[-]

I was trying to make a point regarding "reliability", not a point about how to prompt or how to use them for work.

12 days ago

[-]

This is relevant. Your example may be simple enough, but for anything more complex, letting the model have its space to think/compute is critical to reliability - if you starve it for compute, you'll get more errors/hallucinations.

12 days ago

[-]

Yeah I mean I agree with you, but I'm still not sure how it's relevant. I'd also urge people to have unit tests they treat as production code, and proper system prompts, and X and Y, but it's really beyond the original point of "LLMs aren't reliable" which is the context in this sub-tree.

kubb

12 days ago

[-]

Sometimes I get "Hi!", sometimes "Hey!".

12 days ago

[-]

Which model? Just tried a bunch of ChatGPT, OpenAI's API, Claude, Anthropic's API and DeepSeek's API with both chat and reasonee, every single one replied with a single "hi".

throwdbaaway

12 days ago

[-]

o3-mini-2025-01-31 with high reasoning effort replied with "Hi" after 448 reasoning tokens.

gpt-4.5-preview-2025-02-27 replied with "Hi!"

https://i.imgur.com/Y923KXB.png

12 days ago

[-]

> o3-mini-2025-01-31 with high reasoning effort replied with "Hi" after 448 reasoning tokens.

I got "hi", as expected. What is the full system prompt + user message you're using?

> gpt-4.5-preview-2025-02-27

Same "hi": https://i.imgur.com/VxiIrIy.png

throwdbaaway

11 days ago

[-]

Ah right, my bad. Somehow I thought the prompt was only:

    Say just 'hi'

while the "without any extra words or explanations" part was for the readers of your comment. Perhaps kubb also made a similar mistake.

I used empty system prompt.

12 days ago

[-]

I remember when people were saying here on HN that AIs will never be able to generate picture of hands with just 5 fingers because they just "don't have common sense"

yahoozoo

12 days ago

[-]

“Treat it like a junior developer” … 5 years later … “Treat it like a junior developer”

agile-gift0262

12 days ago

[-]

while True:

  print("This model that just came out changes everything. It's flawless. It doesn't have any of the issues the model from 6 months ago had. We are 1 year away from AGI and becoming jobless")
  sleep(timedelta(days=180).total_seconds)

12 days ago

[-]

Usable LLMs are 3 years old at this point. ChatGPT, not Github Copilot, is the marker.

LtWorf

12 days ago

[-]

Usable for fun yes.

12 days ago

[-]

∞

hombre_fatal

12 days ago

[-]

On the other hand, posts like this are like watching someone writing ask jeeves search queries into google 20 years ago and then gesturing how google sucks while everyone else in the room has figured out how to be productive with it and cringes at his "boomer" queries.

If you're still struggling to make LLMs useful for you by now, you should probably ask someone. Don't let other noobs on HN +1'ing you hold you back.

mirrorlake

12 days ago

[-]

Perhaps consider making some tutorials, then, and share your wealth of knowledge rather than calling people stupid.

hombre_fatal

11 days ago

[-]

I wouldn't want to rob them of the satisfaction of self-reliance.

https://theeducationist.info/everything-amazing-nobody-happy...

coreyh14444

12 days ago

[-]

belter

12 days ago

[-]

AI Snake Oil: https://press.princeton.edu/books/hardcover/9780691249131/ai...

12 days ago

[-]

There's also those instances where Microsoft unleashed Copilot on the .NET repo, and it resulted in the most hilariously terrible PRs that required the maintainers to basically tell Copilot every single step it should take to fix the issue. They were basically writing the PRs themselves at that point, except doing it through an intermediary that was much dumber, slower and less practical than them.

And don't get me started on my own experiences with these things, and no, I'm not a luddite, I've tried my damndest and have followed all the cutting-edge advice you see posted on HN and elsewhere.

Time and time again, the reality of these tools falls flat on their face while people like Andrej hype things up as if we're 5 minutes away from having Claude become Skynet or whatever, or as he puts it, before we enter the world of "Software 3.0" (coincidentally totally unrelated to Web 3.0 and the grift we had to endure there, I'm sure).

To intercept the common arguments,

- no I'm not saying LLMs are useless or have no usecases

- yes there's a possibility if you extrapolate by current trends (https://xkcd.com/605/) that they indeed will be Skynet

- yes I've tried the latest and greatest model released 7 minutes ago to the best of my ability

- yes I've tried giving it prompts so detailed a literal infant could follow along and accomplish the task

- yes I've fiddled with providing it more/less context

- yes I've tried keeping it to a single chat rather than multiple chats, as well as vice versa

- yes I've tried Claude Code, Gemini Pro 2.5 With Deep Research, Roocode, Cursor, Junie, etc.

- yes I've tried having 50 different "agents" running and only choosing the best output form the lot.

I'm sure there's a new gotcha being written up as we speak, probably something along the lines of "Well for me it doubled my productivity!" and that's great, I'm genuinely happy for you if that's the case, but for me and my team who have been trying diligently to use these tools for anything that wasn't a microscopic toy project, it has fallen apart time and time again.

The idea of an application UI or god forbid an entire fucking Operating System being run via these bullshit generators is just laughable to me, it's like I'm living on a different planet.

12 days ago

[-]

You're not the first, nor the last person, to have a seemingly vastly different experience than me and others.

So I'm curious, what am I doing differently from what you did/do when you try them out?

This is maybe a bit out there, but would you be up for sending me like a screen recording of exactly what you're doing? Or maybe even a video call sharing your screen? I'm not working in the space, have no products or services to sell, only curious is why this gap seemingly exists between you and me, and my only motive would be to understand if I'm the one who is missing something, or there are more effective ways to help people understand how they can use LLMs and what they can use them for.

My email is on my profile if you're up for it. Invitation open for others in the same boat as parent too.

bsenftner

12 days ago

[-]

I'm a greybeard, 45+ years coding, including active in AI during the mid 80's and used it when it applied throughout my entire career. That career being media and animation production backends, where the work is both at the technical and creative edge.

I currently have an AI integrated office suite, which has attorneys, professional writers, and political activists using the system. It is office software, word processing, spreadsheets, project management and about two dozen types of AI agents that act as virtual co-workers.

No, my users are not programmers, but I do have interns; college students with anything from 3 to 10 years experience writing software.

I see the same AI use problem issues with my users, and my interns. My office system bends over backwards to address this, but people are people: they do not realize that AI does not know what they are talking about. They will frequently ask questions with no preamble, no introduction to the subject. They will change topics, not bothering to start a new session or tell the AI the topic is now different. There is a huge number of things they do, often with escalating frustration evident in their prompts, that all violate the same basic issue: the LLM was not given a context to understand the subject at hand, and the user is acting like many people and when explaining they go further, past the point of confusion, now adding new confusion.

I see this over and over. It frustrates the users to anger, yet at the same time if they acted, communicated to a human, in the same manner they'd have a verbal fight almost instantly.

The problem is one of communications. ...and for a huge number of you I just lost you. You've not been taught to understand the power of communications, so you do not respect the subject. How to communication is practically everything when it comes to human collaboration. It is how one orders their mind, how one collaborates with others, AND how one gets AI to respond in the manner they desire.

But our current software development industry, and by extension all of STEM has been short changed by never been taught how to effectively communicate, no not at all. Presentations and how to sell are not effective communications, that's persuasion, about 5% of what it takes to convey understanding in others which then unblocks resistance to changes.

12 days ago

[-]

So AI is simultaneously going to take over everyone's job and do literally everything, including being used as application UI somehow... But you have to talk to it like a moody teenager at their first job lest you get nothing but garbage? I have to put just as much (and usually, more) effort talking to this non-deterministic black box as I would to an intern who joined a week ago to get anything usable out of it?

Yeah, I'd rather just type things out myself, and continue communicating with my fellow humans rather than expending my limited time on this earth appeasing a bullshit generator that's apparently going to make us all jobless Soon™

12 days ago

[-]

> But you have to talk to it like a moody teenager at their first job lest you get nothing but garbage?

No, you have to talk to it like to an adult human being.

If one's doing so and still gets garbage results from SOTA LLMs, that to me is a strong indication one also cannot communicate with other human beings effectively. It's literally the same skill. Such individual is probably the kind of clueless person we all learn to isolate and navigate around, because contrary to their beliefs, they're not the center of the world, and we cannot actually read their mind.

bsenftner

12 days ago

[-]

Consider that these AIs are trained on human communications, they mirror that communication. They are literally damaged document repair models, they use what they are given to generate a response - statistically. The fact that a question generates text that appears like an answer is an exploited coincidence.

It's a perspective shift few seem to have considered: if one wants an expert software developer from their AI, they need to create an expert software developer's context by using expert developer terminology that is present in the training data.

One can take this to an extreme, and it works: read the source code of an open source project and get and idea of both the developer and their coding style. Write prompts that mimic both the developer and their project, and you'll find that the AI's context now can discuss that project with surprising detail. This is because that project is in the training data, the project is also popular, meaning it has additional sites of tutorials and people discussing use of that project, so a foundational model ends up knowing quite a bit, if one knows how to construct the context with that information.

This is, of course, tricky with hallucination, but that can be minimized. Which is also why we will all become aware of AI context management if we continue writing software that incorporates AIs. I expect context management is what was meant by prompt engineering. Communicating within engineering disciplines has always been difficult.

12 days ago

[-]

But parent explicitly mentioned:

> - yes I've tried giving it prompts so detailed a literal infant could follow along and accomplish the task

Which you are saying that might have missed in the end regardless?

bsenftner

12 days ago

[-]

I'd like to see the prompt. I suspect that "literal infant" is expected to be a software developer without preamble. The initial sentence to an LLM carries far more relevance, it sets the context stage to understand what follows. If there is no introduction to the subject at hand, the response will be just like anyone fed a wall of words: confusion as to what all this is about.

12 days ago

[-]

You and me both :) But I always try to read the comments here with the most charitable interpretation I can come up with.

crmi

12 days ago

[-]

I've got a working theory that models perform differently when used in different timezones... As in during US working hours they dont work as well due to high load. When used at 'offpeak' hours not only are they (obviously) snappier but the outputs appear to be a higher standard. Thought this for a while but now noticing with Claude4 [thinking] recently. Textbook case of anecdata of course though.

12 days ago

[-]

Interesting thought, if nothing less. Unless I misunderstand, it would be easy to run a study to see if this is true; use the API to send the same but slightly different prompt (as to avoid the caches) which has a definite answer, then run that once per hour for a week and see if the accuracy oscillates or not.

crmi

12 days ago

[-]

Yes good idea - although it appears we would also have to account for the possibility of providers nerfing their models. I've read others also think models are being quantized after a while to cut costs.

jim180

12 days ago

[-]

Same! I did notice, a couples of months ago, that same prompt in the morning failed and then, later that day, when starting from scratch with identical prompts, the results were much better.

crmi

12 days ago

[-]

To add to this, I ran into a lot of issues too. And similar when using cursor... Until I started creating a mega list of rules for it to follow that attaches to the prompts. Then outputs improved (but fell off after the context window got too large). At that stage I then used a prompt to summarize, to continue with a new context.

kypro

12 days ago

[-]

I think part of the problem is that code quality is somewhat subjective and developers are of different skill levels.

If you're fine with things that kinda working okay and you're not the best developer yourself then you probably think coding agents work really really well because the slop they produce isn't that much worse than yourself. In fact I know a mid-level dev who believes agent AIs write better code than himself.

If you're very critical of code quality then it's much tougher... This is even more true in complex codebases where simply following some existing pattern to add a new feature isn't going to cut it.

The degree to which it helps any individual developer will vary, and perhaps it's not that useful for yourself. For me over the last few months the tech has got to the point where I use it and trust it to write a fair percentage of my code. Unit tests are an example where I find it does a really good job.

12 days ago

[-]

Listen, I won't pretend to be The God Emperor Of Writing Code or anything of the sort, I'm realistically quite mediocre/dead average in the grand scheme of things.

But literally yesterday, with Claude Code running 4 opus (aka: The latest and greatest, to intercept the "dId YoU tRy X" comment) which has full access to my entire Vue codebase at work, that has dedicated rules files I pass to it, that can see the fucking `.vue` file extension on every file in the codebase, after prompting it to "generate this vue component that does X, Y and Z" spat out React code at me.

You don't have to be Bjarne Stroustrup to get annoyed at this kinda stuff, and it happens constantly for a billion tiny things on the daily. The biggest pushers of AI have finally started admitting that it's not literally perfect, but am I really supposed to pretend that this workflow of having AIs generate dozens of PRs where a single one is somewhat acceptable is somehow efficient or good?

It's great for random one-offs, sure, but is that really deserving of this much insane, blind hype?

12 days ago

[-]

> If you're very critical of code quality then it's much tougher

I'm not sure, I'm hearing developers I know are sloppy and produce shit code both having no luck with LLMs, and some of them having lots of luck with them.

On the other side, those who really think about the design/architecture and are very strict (which is the group I'd probably put myself into, but who wouldn't?) are split in a similar way.

I don't have any concrete proof, but I'm guessing "expectations + workflow" differences would explain the vast difference in perception of usefulness.

ffsm8

12 days ago

[-]

Unironically, your comment mirrors my opinion as of last month.

Since then I've given it another try last week and was quite literally mind blown how much it improved in the context of Vibe coding (Claude code). It actually improved so much that I thought "I would like to try that on my production codebase", (mostly because I want if to fail, because that's my job ffs) but alas - that's not allowed at my dayjob.

From the limited experience I could gather over the last week as a software dev with over 10 yrs of experience (along with another 5-10 doing it as a hobby before employment) I can say that I expect our industry to get absolutely destroyed within the next 5 yrs.

The skill ceiling for devs is going to get mostly squashed for 90% of devs, this will inevitably destroy our collective bargaining positions. Including for the last 10%, because the competition around these positions will be even more fierce.

It's already starting, even if it's currently very misguided and mostly down to short-sightedness.

But considering the trajectory and looking at how naive current llms coding tools are... Once the industry adjusts and better tooling is pioneered... it's gonna get brutal.

And most certainly not limited to software engineering. Pretty much all desk jobs will get hemorrhaged as soon as a llm-player basically replaces SAP with entirely new tooling.

Frankly, I expect this to go bad, very very quickly. But I'm still hoping for a good ending.

darqis

12 days ago

[-]

when I started coding at the age of 11 in machine code and assembly on the C64, the dream was to create software that creates software. Nowadays it's almost reality, almost because the devil is always in the details. When you're used to write code, writing code is relatively fast. You need this knowledge to debug issues with generated code. However you're now telling AI to fix the bugs in the generated code. I see it kind of like machine code becomes overlaid with asm which becomes overlaid with C or whatever higher level language, which then uses dogma/methodology like MVC and such and on top of that there's now the AI input and generation layer. But it's not widely available. Affording more than 1 computer is a luxury. Many households are even struggling to get by. When you see those what 5 7 Mac Minis, which normal average Joe can afford that or does even have to knowledge to construct an LLM at home? I don't. This is a toy for rich people. Just like with public clouds like AWS, GCP I left out, because the cost is too high and running my own is also too expensive and there are cheaper alternatives that not only cost less but also have way less overhead.

What would be interesting to see is what those kids produced with their vibe coding.

kordlessagain

12 days ago

[-]

Kids? Think about all the domain experts, entrepreneurs, researchers, designers, and creative people who have incredible ideas but have been locked out of software development because they couldn't invest 5-10 years learning to code.

A 50-year-old doctor who wants to build a specialized medical tool, a teacher who sees exactly what educational software should look like, a small business owner who knows their industry's pain points better than any developer. These people have been sitting on the sidelines because the barrier to entry was so high.

The "vibe coding" revolution isn't really about kids (though that's cute) - it's about unleashing all the pent-up innovation from people who understand problems deeply but couldn't translate that understanding into software.

It's like the web democratized publishing, or smartphones democratized photography. Suddenly expertise in the domain matters more than expertise in the tools.

pton_xd

12 days ago

[-]

> Think about all the domain experts, entrepreneurs, researchers, designers, and creative people who have incredible ideas but have been locked out of software development because they couldn't invest 5-10 years learning to code.

> it's about unleashing all the pent-up innovation from people who understand problems deeply but couldn't translate that understanding into software.

This is just a fantasy. People with "incredible ideas" and "pent-up innovation" also need incredible determination and motivation to make something happen. LLMs aren't going to magically help these people gain the energy and focus needed to pursue an idea to fruition. Coding is just a detail; it's not the key ingredient all these "locked out" people were missing.

agentultra

12 days ago

[-]

100% this. There have been generations of tools built to help realize this idea and there is... not a lot of demand for it. COBOL, BASIC, Hypercard, the wasteland of no-code and low-code tools. The audience for these is incredibly small.

A doctor has an idea. Great. Takes a lot more than a eureka moment to make it reality. Even if you had a magic machine that could turn it into the application you thought of. All of the iterations, testing with users, refining, telemetry, managing data, policies and compliance... it's a lot of work. Code is such a small part. Most doctors want to do doctor stuff.

We've had mind-blowing music production software available to the masses for decades now... not a significant shift in people lining up to be the musicians they always wanted to be but were held back by limited access to the tools to record their ideas.

pphysch

12 days ago

[-]

> These people have been sitting on the sidelines because the barrier to entry was so high.

This comment is wildly out of touch. The SMB owner can now generate some Python code. Great. Where do they deploy it? How do they deploy it? How do they update it? How do they handle disaster recovery? And so on and so forth.

LLMs accelerate only the easiest part of software engineering, writing greenfield code. The remaining 80% is left as an exercise to the reader.

bongodongobob

12 days ago

[-]

All the devs I work with would have to go through me to touch the infra anyway, so I'm not sure I see the issue here. No one is saying they need to deploy fully through the stack. It's a great start for them and I can help them along the way just like I would with anyone else deploying anything.

pphysch

12 days ago

[-]

In other words, most of the barriers to leveraging custom software are still present.

bongodongobob

11 days ago

[-]

Yes, the parts we aren't talking about that have nothing to do with LLMs, ie normal business processes.

nevertoolate

12 days ago

[-]

It sounds too good to be true. Why do you think llm is better in coding then in how education software should be designed?

12 days ago

[-]

> those kids produced with their vibe coding

No one, including Karpathy in this video, is advocating for "vibe coding". If nothing more, LLMs paired with configurable tool-usage, is basically a highly advanced and contextual search engine you can ask questions. Are you not using a search engine today?

Even without LLMs being able to produce code or act as agents they'd be useful, because of that.

But it sucks we cannot run competitive models locally, I agree, it is somewhat of a "rich people" tool today. Going by the talk and theme, I'd agree it's a phase, like computing itself had phases. But you're gonna have to actually watch and listen to the talk itself, right now you're basically agreeing with the video yet wrote your comment like you disagree.

12 days ago

[-]

This is most definitely not toys for rich people. Now perhaps depending on your country it may be considered rich but I would comfortably say that for most of the developed world, the costs for these tools are absolutely attainable, there is a reason ChatGPT has such a large subscriber base.

Also the disconnect for me here is I think back on the cost of electronics, prices for the level of compute have generally gone down significantly over time. The c64 launched around the $5-600 price level, not adjusted for inflation. You can go and buy a Mac mini for that price today.

bawana

12 days ago

[-]

I suspect that economies of scale are different for software and hardware. With hardware, iteration results in optimization of the supply chain, volume discount as the marginal cost is so much less than the fixed cost, and lower prices in time. The purpose of the device remains fixed. With software, the software becomes ever more complex with technical debt - featuritis, patches, bugs, vulnerabilities, and evolution of purpose to try and capture more disparate functions under one environment in an attempt to capture and lock in users. Price tends to increase in time. (This trajectory incidentally is the opposite of the unix philosophy - having multiple small fast independent tools than can be concatenated to achieve a purpose.) This results in ever increasing profits for software and decreasing profits for hardware at equilibrium. In the development of AI we are already seeing this-first we had gpt, then chatbots, then agents, now integration with existing software architectures.Not only is each model ever larger and more complex (RNN->transformer->multihead-> add fine tuning/LoRA-> add MCP), but the bean counters will find ways to make you pay for each added feature. And bugs will multiply. Already prompt injection attacks are a concern so now another layer is needed to mitigate those.

For the general public, these increasing costs will besubsidized by advertising. I cant wait for ads to start appearring in chatGPT- it will be very insidious as the advertising will be comingled with the output so there will be no way to avoid it.

11 days ago

[-]

I’m struggling to follow your argument, it feels more speculative than evidence-based. Runtime costs have consistently fallen.

As for advertising, it’s possible, but with so many competitors and few defensible moats, there’s real pressure to stay ad-free. These tools are also positioned to command pricing power in a way search never was, given search has been free for decades.

The hardware vs. software angle seems like a distraction. My original point was in response to the claim that LLMs are “toys for the rich.” The C64 was a rich kid’s toy too—and far less capable.

kapildev

12 days ago

[-]

>What would be interesting to see is what those kids produced with their vibe coding.

I think you are referring to what those kids in the vibe coding event produced. Wasn't their output available in the video itself?

12 days ago

[-]

> This is a toy for rich people

GitHub copilot has a free tier.

Google gives you thousands of free LLM API calls per day.

There are other free providers too.

12 days ago

[-]

1st dose is free

palmfacehn

12 days ago

[-]

Agreed. It is worth noting how search has evolved over the years.

12 days ago

[-]

LLM APIs are pretty darn cheap for most of the developed worlds income levels.

12 days ago

[-]

Yeah, because they're bleeding money like crazy now.

You should consider how much it actually costs, not how much they charge.

How do people fail to consider this?

NitpickLawyer

12 days ago

[-]

No, there are 3rd party providers that run open-weights models and they are (most likely) not bleeding money. Their prices are kind of similar, and make sense in a napkin-math kind of way (we looked into this when ordering hardware).

You are correct that some providers might reduce prices for market capture, but the alternatives are still cheap, and some are close to being competitive in quality to the API providers.

Eggpants

12 days ago

[-]

Starts with “No” then follows that up with “most likely”.

So in other words you don’t know the real answer but posted anyways.

NitpickLawyer

12 days ago

[-]

That most likely is for the case where they made their investment calculations wrong and they won't be able to recoup their hw costs. So I think it's safe to say there may be the outlier 3rd party provider that may lose money in the long run.

But the majority of them are serving at ~ the same price, and that matches to the raw cost + some profit if you actually look into serving those models. And those prices are still cheap.

So yeah, I stand by what I wrote, "most likely" included.

My main answer was "no, ..." because the gp post was only considering the closed providers only (oai, anthropic, goog, etc). But youc an get open-weight models pretty cheap, and they are pretty close to SotA, depending on your needs.

Eggpants

12 days ago

[-]

Just wait for the enshitencation of LLM services.

It going to get wild when the tech bro investors demand ads be the included in responses.

It will be trivial for a version of AdWords where someone pays for response words be replaced. “Car” replaced by “Honda”, variable names like “index” by “this_index_variable_is_sponsered_by_coinbase” etc.

I’m trying to be funny with the last one but something like this will be coming sooner than later. Remember, google search used to be good and was ruined by bonus seeking executives.

bdangubic

12 days ago

[-]

how much does it cost?

12 days ago

[-]

>You should consider how much it actually costs, not how much they charge. How do people fail to consider this?

Sure, nobody can predict the long-term economics with certainty but companies like OpenAI already have compelling business fundamentals today. This isn’t some scooter startup praying for margins to appear; it’s a platform with real, scaled revenue and enterprise traction.

But yeah, tell me more about how my $200/mo plan is bankrupting them.

NoOn3

12 days ago

[-]

It's cheap now. But if you take into account all the training costs, then at such prices they cannot make a profit in any way. This is called dumping to capture the market.

12 days ago

[-]

No doubt the complete cost of training and to getting where we are today has been significant and I don’t know how the accounting will look years from now but you are just making up the rest based on feelings. We know operationally OpenAI is profitable on purely the runtime side, nobody knows how that will look when accounting for R&D but you have no qualification to say they cannot make a profit in any way.

12 days ago

[-]

Except they have to retrain constantly, so why would you not consider the cost of training?

11 days ago

[-]

In the medium to long term that R&D matters. In the short term it’s not as important of a metric. I absolutely agree from an underwriting prospective one would ideally be considering those costs but I also think it’s dishonest to simply say they are bleeding money, end of story.

They dont have to retrain constantly and that’s where opinions like yours fall short. I don’t believe anyone has a concrete vision on the economics in the medium to a long term. It’s biased ignorance to hold a strong position in the down or up case.

NoOn3

12 days ago

[-]

Yes, if you do not take into account the cost of training, I think it is very likely profitable. The cost of working models is not so high. This is just my opinion based on open models and I admit that I have not carried out accurate calculations.

12 days ago

[-]

> But if you take into account all the training costs

Not everyone has to paid that cost, as some companies are releasing weights for download and local use (like Llama) and then some other companies are going even further and releasing open source models+weights (like OLMo). If you're a provider hosting those, I don't think it makes sense to take the training cost into account when planning your own infrastructure.

Although I don't it makes much sense personally, seemingly it makes sense for other companies.

12 days ago

[-]

There is no "capture" here, it's trivial to switch LLM/providers, they all use OpenAI API. It's literally a URL change.

jamessinghal

12 days ago

[-]

This is changing; OpenAI's newer API (Responses) is required to include reasoning tokens in the context while using the API, to get the reasoning summaries, and to use some of the OpenAI provided tools. Google's OpenAI compatibility supports Chat Completions, not Responses.

As the LLM developers continue to add unique features to their APIs, the shared API which is now OpenAI will only support the minimal common subset and many will probably deprecate the compatibility API. Devs will have to rely on SDKs to offer comptibility.

12 days ago

[-]

It's still trivial to map to a somewhat different API. Google has it's Vertex/GenAI API flavors.

At least for now, LLM APIs are just JSONs with a bunch of prompts/responses in them and maybe some file URLs/IDs.

jamessinghal

11 days ago

[-]

It isn't necessarily difficult, but it's significantly more effort than swapping a URL as I originally was replying to.

11 days ago

[-]

> There is no "capture" here, it's trivial to switch LLM/providers, they all use OpenAI API. It's literally a URL change.

So? That's true for search as well, and yet Google has been top-dog for decades in spite of having worse results and a poorer interface than almost all of the competition.

lubujackson

12 days ago

[-]

Generally, people behind big revolutionary tech are the worst suited for understanding how it will do "in the wild". Forest for the trees and all that.

Some good nuggets in this talk, specifically his concept that Software 1.0, 2.0 and 3.0 will all persist and all have unique use cases. I definitely agree with that. I disagree with his belief that "anyone can vibe code" mindset - this works to a certain level of fidelity ("make an asteroids clone") but what he overlooks is his ability, honed over many years, to precisely document requirements that will translate directly to code that works in an expected way. If you can't write up a Jira epic that covers all bases of a project, you probably can't vibe code something beyond a toy project (or an obvious clone). LLM code falls apart under its own weight without a solid structure, and I don't think that will ever fundamentally change.

Where we are going next, and a lot of effort is being put behind, is figuring out exactly how to "lengthen the leash" of AI through smart framing, careful context manipulation and structured requests. We obviously can have anyone vibe code a lot further if we abstract different elements into known areas and simply allow LLMs to stitch things together. This would allow much larger projects with a much higher success rate. In other words, I expect an AI Zapier/Yahoo Pipes evolution.

Lastly, I think his concept of only having AI pushing "under 1000 line PRs" that he carefully reviews is more short-sighted. We are very, very early in learning how to control these big stupid brains. Incrementally, we will define sub-tasks that the AI can take over completely without anyone ever having to look at the code, because the output will always be within an accepted and tested range. The revolution will be at the middleware level.

superconduct123

12 days ago

[-]

Where was he was saying you could vibe code beyond a simple app?

He even said it could be a gateway to actual programming

jmsdnns

12 days ago

[-]

There is another angle to this too.

Prior to LLMs, it was amusing to consider how ML folks and software folks would talk passed each other. It was amusing because both sides were great at what they do, neither side understood the other side, and they had to work together anyway.

After LLMs, we now have lots of ML folks talking about the future of software, so ething previously established to be so outside their expertise that communication with software engineers was an amusing challenge.

So I must ask, are ML folks actually qualified to know the future of software engineering? Shouldnt we be listening to software engineers instead?

tomrod

12 days ago

[-]

> So I must ask, are ML folks actually qualified to know the future of software engineering?

Probably not CRUD apps typical to back office or website software, but don't forget that ML folks come from the stock of people that built Apollo, Mars Landers, etc. Scientific computing shares some significant overlap with SWE, and ML is a subset of that.

IMHO, the average SWE and ML person are different types when it comes to how they cargocult develop, but the top 10% show significant understanding and re speed across domains.

abeppu

12 days ago

[-]

This seems to be overstating the separation. For people doing applied ML, there's often been a dual responsibility that included a significant amount of software engineering. I wouldn't necessarily listen to such declarations from an ML researcher whose primary output is papers, but from ML engineers who have built and shipped products/services/libraries I think it's much more reasonable.

AlexCoventry

12 days ago

[-]

I've seen evidence of "anyone can vibe code", but at this stage the result tends to be a 5,000-line application intricately entangled with 500,000 lines of irrelevant slop. Still, the wonder is that the bear can dance at all. That's a new thing under the sun.

nsagent

12 days ago

[-]

Having worked with game designers writing code for their missions/levels in a scripting language, I'd say this has been the case for quite a long while.

They start with the code from another level, then modify it until it seems to do what they want. During the alpha testing phase, we'd have a programmer read through the code and remove all the useless cruft and fix any associated bugs.

In some sense that's what vibe coding with an AI is like if you don't know how to code. You have the AI make some initial set of code that you can't evaluate for correctness, then slowly modify it until it seems to behave generally like you want. You might even learn to recognize a few things in the code over time, at which point you can directly change some variables or structures in the code directly.

AlexCoventry

12 days ago

[-]

I'm not kidding about the orders of magnitude, though. It's been literally roughly 100 lines to per line required to competently implement the app. It doesn't seem economically feasible to me, at this stage. I would prefer to just rewrite. (I know it's a common bias.)

fergie

12 days ago

[-]

There were some cool ideas- I particularly liked "psychology of AI"

Overall though I really feel like he is selling the idea that we are going to have to pay large corporations to be able to write code. Which is... terrifying.

Also, as a lazy developer who is always trying to make AI do my job for me, it still kind of sucks, and its not clear that it will make my life easier any time soon.

teekert

12 days ago

[-]

He says that now we are in the mainframe phase. We will hit the personal computing phase hopefully soon. He says llama (and DeepSeek?) are like Linux in a way, OpenAI and Claude are like Windows and MacOS.

So, No, he’s actually saying it may be everywhere for cheap soon.

I find the talk to be refreshingly intellectually honest and unbiased. Like the opposite of a cringey LinkedIn post on AI.

mirkodrummer

12 days ago

[-]

Being Linux is not a good thing imo, it took decades for tech like proton to run Windows games reliably, if not better as now, than Windows does. Software is still mostly develop for Windows and macOS. Not to mention the Linux Desktop that never took off, I mean one could mention Android but there is a large corporation behind it. Sure Linux is successfull in many ways, it's embedded everywhere but nowhere near being the OS of the everyday people, "traditional linux desktop" never took off

teekert

11 days ago

[-]

You mention consumer stuff, but Linux runs the world. In numbers it's more like insects vs human than any "fair balance". You probably have more Linux machines serving you than Windows or MacOS/iOS machine at any given time.

12 days ago

[-]

I think it used to be like that before the GNU people made gcc, completely destroying the market of compilers.

> Also, as a lazy developer who is always trying to make AI do my job for me, it still kind of sucks, and its not clear that it will make my life easier any time soon.

Every time I have to write a simple self contained couple of functions I try… and it gets it completely wrong.

It's easier to just write it myself rather than to iterate 50 times and hope it will work, considering iterations are also very slow.

ykonstant

12 days ago

[-]

At least proprietary compilers were software you owned and could be airgapped from any network. You didn't create software by tediously negotiating with compilers running on remote machines controlled by a tech corp that can undercut you on whatever you are trying to build (but of course they will not, it says so in the Agreement, and other tales of the fantastic).

geraneum

12 days ago

[-]

On a tangent, I find the analogies interesting as well. However, while Karpathy is an expert in Computer Science, NLP and machine vision, his understanding of how human psychology and brain work is as good as you an I (non-experts). So I take some of those comparisons as a lay person’s feelings about the subject. Still, they are fun to listen to.

j45

12 days ago

[-]

It's interesting how researchers are ahead on some insights and introducing them, and it feels like some are new to them but it might already exist and they're helping present them to the world.

A positive video all around, have got to learn a lot from Andrej's Youtube account.

LLMs are really strange, I don't know if I've seen a technology where the technology class that applies it (or can verify applicability) has been so separate or unengaged compared to the non-technical people looking to solve problems.

pera

12 days ago

[-]

Is it possible to vibe code NFT smart contracts with Software 3.0?

https://github.com/screencam/typescript-mcp-server

11 days ago

[-]

I've been working on this project. I built this in about two days, using it to build itself at the tail end of the effort. It's not perfect, but I see the promise in it. It stops the thrashing the LLMs can do when they're looking for types or trying to resolve anything like that.

11 days ago

[-]

> Traditional: Read 5000 lines → Find method → Replace → Write 5000 lines

What of today's agents work like this? None of the ones I've tried would do something like that, but instead would grep/search the file, then do a smaller edit (different tools do those in different ways).

Overall, it does feel like a strawman argument against "Traditional" when almost none of the tooling actually works like that.

10 days ago

[-]

Apologies - I had the LLM generate the readme and it looks like it got a bit overzealous. I'll get some actual benchmarks and cost usage analysis going. I've tamed the README somewhat. Please check it out!

https://github.com/EvolvingAgentsLabs/llmunix

matiasmolinas

12 days ago

[-]

An experiment to explore Kaparthy ideas

bawana

12 days ago

[-]

how do i install this thing?

maleldil

12 days ago

[-]

As far as I understand, you don't. You open Claude Code inside the repo and prompt `boot llmunix` inside Claude Code. The CLAUDE.md file tells Claude how to respond to that.

bawana

12 days ago

[-]

Thank you for the hint. I guess I need a claude API token. From the images it seems he is opening it from his default directory. I sees the 'base env' so it is unclear if any other packages were installed beyond the default linux. I see he simply typed 'boot llmunix' so he must have symlinked 'boot' to his PATH.

MoonGhost

9 days ago

[-]

He didn't mention multi-modal models. Probably because they don't fit in the oversimplified picture.

nickalex

8 days ago

[-]

I believe AI relies too heavily on logic—and that, surprisingly, can be a disadvantage. Logical solutions don’t always work in real-world situations, because logic isn't the same as creativity. And creativity is essential.

wiremine

11 days ago

[-]

I spent a lot of time thinking about this recently. Ultimately, English is not a clean, deterministic abstraction layer. This isn't to say that LLMs aren't useful, and can create some great efficiencies.

npollock

11 days ago

[-]

no, but a subset of English could be

axxto

11 days ago

[-]

You just invented programming languages, halfway

freehorse

11 days ago

[-]

Thought we already had that?

smnplk

11 days ago

[-]

Yeah, let's bring back COBOL

4gotunameagain

11 days ago

[-]

Let me introduce to you.. python ;)

yahoozoo

12 days ago

[-]

I was trying to do some reverse engineering with Claude using an MCP server I wrote for a game trainer program that supports Python scripts. The context window gets filled up _so_ fast. I think my server is returning too many addresses (hex) when Claude searches for values in memory, but it’s annoying. These things are so flaky.

11 days ago

[-]

Yeah, usually I'd steer my agents to never use the output directly from any command, and instead redirect it to a logfile, then force it to search/grep stuff directly from the log-file instead of just getting all the outputs at all times. Seems to work OK.

https://videotobe.com/play/youtube/LCEmiRjPEtQ

meerab

10 days ago

[-]

See complete transcript of Andrej Karpathy's video

sockboy

11 days ago

[-]

Definitely hit this wall too. The backend just for API proxy feels like a detour when all you want is to ship a quick prototype. Would love to see more tools that make this seamless, especially for solo builders.

fnord77

12 days ago

[-]

Him claiming govts don't use AI or are behind the curve is not accurate.

Modern military drones are very much AI agents

11 days ago

[-]

Governments obviously lead in military tech, but do you think they have access to better AI (in general) than consumers? Unless they do, I think it's fair to say that governments are behind the curve, since consumers tend to adopt things more quickly.

11 days ago

[-]

> but do you think they have access to better AI (in general) than consumers?

Absolutely. One of the top AI labs today is OpenAI, with ties to the US military, not least through Paul M. Nakasone, but also active contracts with the military, announced just a couple of days ago

> In June 2025, the U.S. Department of Defense awarded OpenAI a $200 million one-year contract to develop AI tools for military and national security applications. OpenAI announced a new program, OpenAI for Government, to give federal, state, and local governments access to its models, including ChatGPT. - https://en.wikipedia.org/wiki/OpenAI#Use_by_military

It would be foolish to assume those collaborations are just about API usage with the same models that consumer have access to, there is definitely deeper collaborations than that.

fnord77

11 days ago

[-]

what consumer AI can send a vehicle a long distance, locate and track things of interest, and then decide to take actions against those things of interest?

Imagine a consumer AI that could go to the grocery store, find your favorite loaf of bread and bring it back.

romain_batlle

12 days ago

[-]

Can't believe they wanted to postpone this video by a few weeks

11 days ago

[-]

No one wanted to! I think we might have bitten off more than we could chew in terms of video production. There is a lot of content to publish.

Once it was clear how high the demand was for this talk, the team adapted quickly.

That's how it goes sometimes! Future iterations will be different.

12 days ago

[-]

Software 3.0 is where Engineers only create the kernel or seed of an idea. Then all users are developers creating their own branch using the feedback loop of their own behavior.

ldenoue

12 days ago

[-]

Full playable transcript https://www.appblit.com/scribe?v=LCEmiRjPEtQ

swyx

12 days ago

[-]

slides: https://docs.google.com/presentation/d/1sZqMAoIJDxz79cbC5ap5...

longhaul

10 days ago

[-]

QA is what SEs will be doing - testing , followed by feedback to LLMs. Why can’t just product folks do this eventually w/o SEs?

klysm

10 days ago

[-]

Product folks don’t know what they want a lot of the time and don’t know what’s possible

himanshuy

11 days ago

[-]

why there are so many bots posting comments?

kdrvr

10 days ago

[-]

I honestly like his perspective around vibe coding. I feel like his original tweet has been taken misunderstood by the mainstream. (Proof-of-concepts churned out over the weekend will usually die or be mostly rewritten, anyways.) For programmers dipping their feet into new areas, I believe it can be useful.

Though, I do not see it being useful as a "gateway drug" (as he says) for kids learning to code. I have seen that children can understand langs and base programming concepts, given the right resources and encouragement. If kids in the 80s/early 90s learned BASIC and grew up to become software engineers; then what we have now (Scratch, Python, even Javascript + something like P5) are perfectly adequate to that task. Vibe coding really just teaches kids how to prompt LLMs properly.

belter

12 days ago

[-]

Painful to watch. The new tech generation deserves better than hyped presentations from tech evangelists.

This reminds me of the Three Amigos and Grady Booch evangelizing the future of software while ignoring the terrible output from Rational Software and the Unified Process.

At least we got acknowledgment that self-driving remains unsolved: https://youtu.be/LCEmiRjPEtQ?t=1622

And Waymo still requires extensive human intervention. Given Tesla's robotaxi timeline, this should crash their stock valuation...but likely won't.

You can't discuss "vibe coding" without addressing security implications of the produced artifacts, or the fact that you're building on potentially stolen code, books, and copyrighted training data.

And what exactly is Software 3.0? It was mentioned early then lost in discussions about making content "easier for agents."

digianarchist

11 days ago

[-]

In his defense he clearly articulated that meaningful change has not yet been achieved and could be a decade away. Even pointing to specific examples of LLMs failing to count letters and do basic arithmetic.

What I find absent is where do we go from LLMs? More hardware, more training. "This isn't the scientific breakthrough you're looking for".

https://software3.com/index.htm

goosebump

11 days ago

[-]

Amazing!!!

12 days ago

[-]

It's interesting to see people here and on Blind are more wary? of AI than people in say, Reddit or Youtube comments

sponnath

12 days ago

[-]

Reddit and YouTube are such huge social media platforms that it really depends on which bubble (read: subreddits/yt channels) you're looking at. There's the "AGI is here" people over at r/singularity and then the "AI is useless" people at r/programming. I'm simplifying arguments from both sides here but you get my point.

11 days ago

[-]

Even looking at r/programming I felt they were less wary of AI, or even comparing the comments here vs those on YouTube for this video

11 days ago

[-]

Some places are more "echo-chambery" than others, reddit is probably an extreme example in echo-chambers. At least the bigger subreddits, smaller ones can be a bit more diverse and enjoyable.

jes5199

12 days ago

[-]

okay I’m practicing my new spiel:

this focus on coding is the wrong level of abstraction

coding is no longer the problem. the problem is getting the right context to the coding agent. this is much, much harder

“vibe coding” is the new “horseless carriage”

the job of the human engineer is “context wrangling”

12 days ago

[-]

> coding is no longer the problem.

"Coding" - The art of literally using your fingers to type weird characters into a computer, was never a problem developers had.

The problem has always been understanding and communication, and neither of those have been solved at this moment. If anything, they have gotten even more important, as usually humans can infer things or pick up stuff by experience, but LLMs cannot, and you have to be very precise and exact about what you're telling them.

And so the problem remains the same. "How do I communicate what I want to this person, while keeping the context as small as possible as to not overflow, yet extensive enough to cover everything?" except you're sending it to endpoint A instead of endpoint B.

ofjcihen

12 days ago

[-]

I’d take it a step further honestly. You need to be precise and exact but you also have to have enough domain knowledge to know when the LLM is making a huge mistake.

12 days ago

[-]

> you also have to have enough domain knowledge

I'm a bit 50/50 on this. Generally I agree, how are you supposed to review it otherwise? Blindly accepting whatever the LLM tells you or gives you is bound to create trouble in the future, you still need to understand and think about what the thing you're building is, and how to design/architect it.

I love making games, but I'm also terrible at math. Sometimes, I end up out of my depth, and sometimes it could take me maybe a couple of days to solve something that probably would be trivial for a lot of people. I try my best to understand the fundamentals and the theory behind it, but also not get lost in rabbit holes, but it's still hard, for whatever reason.

So I end up using LLMs sometimes to write small utility functions used in my games for specific things. It takes a couple of minutes. I know exactly what I want to pass into it, and what I want to get back, but I don't necessarily understand 100% of the math behind it. And I think I'm mostly OK with this, as long as I can verify that the expected inputs get the expected outputs, which I usually do with unit or E2E tests.

Would I blindly accept information about nuclear reactors, another topic I don't understand much about? No, I'd still take everything a LLM outputs with a "grain of probability" because that's how they work. Would I blindly accept it if I can guarantee that for my particular use case, it gives me what I expect from it? Begrudgingly, yeah, because I just wanna create games and I'm terrible at math.

ofjcihen

12 days ago

[-]

Oh yeah definitely. The context matters.

For making CRUD apps or anything that doesn’t involve security or stores sensitive information I 100 percent agree it’s fine.

The issue I see is that we get some people storing extremely sensitive info in apps made with these and they don’t know enough to verify the security of it. They’ll ask the LLM “is it secure?” But it doesn’t matter if they don’t know it’s not BSing

throw234234234

11 days ago

[-]

I will counter this with the fact that sometimes, and depending on the abstraction level that you are trying to solve/work at code or some other determinstic language is the right and easier way language to describe the context. This doesn't just apply to SWE, but all forms of engineering (electrical, civil, mechanical, etc).

We have math notation for maths, diagrams for circuits, plans for houses, etc etc. Would hate to have to give long paragraphs of "English" to my house builder and watch what the result could be. Feels like being a lawyer at this point. English can be appropriate and now we also have that in our toolbox.

Describing context at the abstraction level and accuracy you care about has always been the issue. The context of what matters though as you grow and the same system has to deal with more requirements at once together IMV is always the challenge in ANY engineering discipline.

AIorNot

12 days ago

[-]

Love his analogies and clear eyed picture

pyman

12 days ago

[-]

"We're not building Iron Man robots. We're building Iron Man suits"

pryelluw

12 days ago

[-]

Funny thing is that in more than one of the iron man movies the suits end up being bad robots. Even the ai iron man made shows up to ruin the day in the avengers movie. So it’s a little in the nose that they’d try to pitch it this way.

wiseowise

12 days ago

[-]

That’s looking too much into this. It’s just an obvious plot twist to justify making another movie, nothing else.

reducesuffering

12 days ago

[-]

[flagged]

throwawayoldie

12 days ago

[-]

I'm old enough to remember when Twitter was new, and for a moment it felt like the old utopian promise of the Internet finally fulfilled: ordinary people would be able to talk, one-on-one and unmediated, with other ordinary people across the world, and in the process we'd find out that we're all more similar than different and mainly want the same things out of life, leading to a new era of peace and empathy.

It was a nice feeling while it lasted.

tock

12 days ago

[-]

I believe the opposite happened. People found out that there are huge groups of people with wildly differing views on morality from them and that just encouraged more hate. I genuinely think old school facebook where people only interacted with their own private friend circles is better.

prisenco

12 days ago

[-]

Broadcast networks like Twitter only make sense for influencers, celebrities and people building a brand. They're a net negative for literally anyone else.

| old school facebook where people only interacted with their own private friend circles is better.

100% agree but crazy that option doesn't exist anymore.

msgodel

11 days ago

[-]

Was Twitter ever really meant for that? As far as I can tell the primary purpose of twitter is moderated access to celebrities with the utopian ideas about communication just used to sell it.

_kb

12 days ago

[-]

Believe it or not, humans did in fact have forms of written language and communication prior to twitter.

https://news.ycombinator.com/newsguidelines.html

12 days ago

[-]

Can you please make your substantive points without snark? We're trying for something a bit different here.

_kb

11 days ago

[-]

Fair call out. The snark wasn’t intended or well placed.

throwawayoldie

12 days ago

[-]

You missed the point, but that's fine, it happens.

benob

12 days ago

[-]

You can generate 1.0 programs with 3.0 programs. But can you generate 2.0 programs the same way?

olmo23

12 days ago

[-]

2.0 programs (model weights) are created by running 1.0 programs (training runs).

I don't think it's currently possible to ask a model to generate the weights for a model.

movedx01

12 days ago

[-]

But you can generate synthetic data using a 3.0 program to train a smaller, faster, cheaper-to-run 2.0 program.

taegee

11 days ago

[-]

I can't stop thinking about these agents as Agent Smith, The Architect, etc.

politelemon

12 days ago

[-]

The beginning was painful to watch as is the cheering in this comment section.

The 1.0, 2.0, and 3.0 simply aren't making sense. They imply a kind of a succession and replacement and demonstrate a lack of how programming works. It sounds as marketing oriented as "Web 3.0" that has been born inside an echo chamber. And yet halfway through, the need for determinism/validation is now being reinvented.

The analogies make use of cherry picked properties, which could apply to anything.

mentalgear

12 days ago

[-]

The whole AI scene is starting to feel a lot like the cryptocurrency bubble before it burst. Don’t get me wrong, there’s real value in the field, but the hype, the influencers, and the flashy “salon tricks” are starting to drown out meaningful ML research (like Apple's critical research that actually improves AI robustness). It’s frustrating to see solid work being sidelined or even mocked in favor of vibe-coding.

Vibe vs reality, and anyone actually working in the space daily can attest how brittle these systems are.

rxtexit

12 days ago

[-]

I think part of the problem is that people have the wrong mental models currently.

I am a non-software engineer and I fully expect someday to be a professional "vibe coder". It will be within a domain though and not a generalist like a real software engineer.

I think "vibe coding" in this context will have a type of relationship to software engineering the way excel has a relationship to the professional mathematician.

The knocks on "vibe coding" by software engineers are like a mathematician shitting on Excel for not being able to do symbolic manipulation.

It is not wrong but missing the forest for the trees.

monsieurbanana

12 days ago

[-]

> "Because they all have slight pros and cons, and you may want to program some functionality in 1.0 or 2.0, or 3.0, or you're going to train in LLM, or you're going to just run from LLM"

He doesn't say they will fully replace each other (or had fully replaced each other, since his definition of 2.0 is quite old by now)

whiplash451

12 days ago

[-]

I think Andrej is trying to elevate the conversation in an interesting way.

That in and on itself makes it worth it.

No one has a crystal clear view of what is happening, but at least he is bringing a novel and interesting perspective to the field.

11 days ago

[-]

> The beginning was painful to watch as is the cheering in this comment section.

Yours is the second comment claiming there is "cheering" and "fanboying" in this comment section. What comments are you talking about? I've read through this submission multiple times since yesterday, yet I've seen none of that. What specific comments are the "cheering" ones?

amelius

12 days ago

[-]

The version numbers mean abrupt changes.

Analogy: how we "moved" from using Google to ChatGPT is an abrupt change, and we still use Google.

ukprogrammer

12 days ago

[-]

Why do non-users of LLM's like to despise/belittle them so much?

Just don't use them, and, outcompete those who do. Or, use them and outcompete those who don't.

Belittling/lamenting on any thread about them is not helpful and akin to spam.

djeastm

11 days ago

[-]

Some people are annoyed at the hype, some are making good faith arguments about the pros/cons, and some people are just cranky. AI is a popular subject and we've all got our hot takes.

dmitrijbelikov

12 days ago

[-]

I think that Andrej presents “Software 3.0” as a revolution, but in essence it is a natural evolution of abstractions.

Abstractions don't eliminate the need to understand the underlying layers - they just hide them until something goes wrong.

Software 3.0 is a step forward in convenience. But it is not a replacement for developers with a foundation, but a tool for acceleration, amplification and scaling.

If you know what is under the hood — you are irreplaceable. If you do not know — you become dependent on a tool that you do not always understand.

11 days ago

[-]

Foundational programmers form the base of where the seed can grow.

In a way programmers found where our roots grow, they can not find your limits.

Software 3.0 is a step into a different light, where software finds its own limits.

If we know where they are rooted, we will merge their best attempts. Only because we appreciate their resultant behavior.

dmitrijbelikov

11 days ago

[-]

The software does nothing but what you tell it to do. And if you can't figure out the limits, then it's probably a personal problem that you haven't solved for yourself yet.

bedit

12 days ago

[-]

I love the "people spirits" analogy. For casual tasks like vibecoding or boiling an egg, LLM errors aren't a big deal. But for critical work, we need rigorous checks—just like we do with human reasoning. That's the core of empirical science: we expect fallibility, so we verify. A great example is how early migration theories based on pottery were revised with better data like ancient DNA (see David Reich). Letting LLMs judge each other without solid external checks misses the point—leaderboard-style human rankings are often just as flawed.

kypro

12 days ago

[-]

I know we've had thought leaders in tech before, but am I the only one who is getting a bit fed up by practically anything a handful of people in the AI space say being circulated everywhere in tech spaces at the moment?

11 days ago

[-]

If there are lesser-known voices who are as interesting as karpathy or simonw (to mention one other example), I'd love to know who they are so we can get them into circulation on HN.

danny_codes

12 days ago

[-]

No it’s incredibly annoying I agree.

The hype hysteria is ridiculous.

kaycey2022

12 days ago

[-]

I hope this excellent talk brings some much needed sense into the discourse around vibe coding.

12 days ago

[-]

If anything I wished the conversation turned away from "vibe-coding" which was essentially coined as a "lol look at this go" thing, but media and corporations somehow picked up as "This is the new workflow all developers are adopting".

LLMs as another tool in your toolbox? Sure, use it where it makes sense, don't try to make them do 100% of everything.

LLMs as a "English to E2E product I'm charging for"? Lets maybe make sure the thing works well as a tool before letting it be responsible for stuff.

12 days ago

[-]

95% terrible expression of the landscape, 5% neatly dumbed down analogies.

English is a terrible language for deterministic outcomes in complex/complicated systems. Vibe coders won't understand this until they are 2 years into building the thing.

LLMs have their merits and he sometimes aludes to them, although it almost feels accidental.

Also, you don't spend years studying computer science to learn the language/syntax, but rather the concepts and systems, which don't magically disappear with vibe coding.

This whole direction is a cheeky Trojan horse. A dramatic problem, hidden in a flashy solution, to which a fix will be upsold 3 years from now.

I'm excited to come back to this comment in 3 years.

12 days ago

[-]

> English is a terrible language for deterministic outcomes in complex/complicated systems

I think that you seem to be under the impression that Karpathy somehow alluded to or hinted at that in his talk, which indicates you haven't actually watched the talk, which makes your first point kind of weird.

I feel like one of the stronger points he made, was that you cannot treat the LLMs as something they're explicitly not, so why would anyone expect deterministic outcomes from them?

He's making the case for coding with LLMs, not letting the LLMs go by themselves writing code ("vibe coding"), and understanding how they work before attempting to do so.

12 days ago

[-]

I watched the entire talk, quite carefully. He explicitly states how excited he was about his tweet mentioning English.

The disclaimer you mention was indeed mentioned, although it's "in one ear, out the other" with most of his audience.

If I give you a glazed donut with a brief asterisk about how sugar can cause diabetes will it stop you from eating the donut?

You also expect deterministic outcomes when making analogies with power plants and fabs.

12 days ago

[-]

I think this is the moment you're referring to? https://youtu.be/LCEmiRjPEtQ?si=QWkimLapX6oIqAjI&t=236

> maybe you've seen a lot of GitHub code is not just like code anymore there's a bunch of like English interspersed with code and so I think kind of there's a growing category of new kind of code so not only is it a new programming paradigm it's also remarkable to me that it's in our native language of English and so when this blew my mind a few uh I guess years ago now I tweeted this and um I think it captured the attention of a lot of people and this is my currently pinned tweet uh is that remarkably we're now programming computers in English now

I agree that it's remarkable that you can tell a computer "What is the biggest city in Maresme?" and it tries to answer that question. I don't think he's saying "English is the best language to make complicated systems uncomplicated with", or anything to that effect. Just like I still think "Wow, this thing is fucking flying" every time I sit onboard a airplane, LLMs are kind of incredible in some ways, yet so "dumb" in some other ways. It sounds to me like he's sharing a similar sentiment but about LLMs.

> although it's "in one ear, out the other" with most of his audience.

Did you talk with them? Otherwise this is just creating an imaginary argument against some people you just assume they didn't listen.

> If I give you a glazed donut with a brief asterisk about how sugar can cause diabetes will it stop you from eating the donut?

If I wanted to eat a donut at that point, I guess I'd eat it anyways? But my aversion to risk (or rather the lack of it) tend to be non-typical.

What does my answer mean in the context of LLMs and non-determinism?

> You also expect deterministic outcomes when making analogies with power plants and fabs.

Are you saying that the analogy should be deterministic or that power plants and fabs are deterministic? Because I don't understand if the former, and the latter really isn't deterministic by any definition I recognize that word by.

12 days ago

[-]

> That's a lot of people to talk to in a day more or less, since the talk happened. Were they all there and you too, or you all had a watch party or something?

hehe, I wish.

The topics in the talk are not new. They have been explored and pondered up for quite a while now.

As for the outcome of the donut experiment, I don't know. You tell me. Apply it repeatedly at a big scale and see if you should alter the initial offer for best outcomes (as relative as "best" might be).

12 days ago

[-]

> The topics in the talk are not new.

Sure, but your initial dismissal ("95% X, 5% Y") is literally about this talk no? And when you say 'it's "in one ear, out the other" with most of his audience' that's based on some previous experience, rather than the talk itself? I guess I got confused what applied to what event.

> As for the outcome of the donut experiment, I don't know. You tell me. Apply it repeatedly at a big scale and see if you should alter the initial offer for best outcomes (as relative as "best" might be).

Maybe I'm extra slow today, how does this tie into our conversation so far? Does it have anything to do with determinism or what was the idea behind bringing it up? I'm afraid you're gonna have to spell it out for me, sorry about that :)

12 days ago

[-]

> Did you talk with them? Otherwise this is just creating an imaginary argument against some people you just assume they didn't listen.

I have, unfortunately. Start-up founders, managers, investors who taunt the need for engineers because "AI can fix it".

Don't get me wrong, there are plenty of "stochastic parrot" engineers even without AI, but still, not enough to make blanket statements.

12 days ago

[-]

That's a lot of people to talk to in a day more or less, since the talk happened. Were they all there and you too, or you all had a watch party or something?

Still, what's the outcome of our "glazed donut" argument, you got me curious what that would lead to. Did I die of diabetes?

jbeninger

12 days ago

[-]

I think the analogy is that vibe coding is bad for you but feels good. Like a donut.

But I'd say the real situation is more akin to "if you eat this donut quickly, you might get diabetes, but if you eat it slowly, it's fine", which is a bad analogy, but a bit more accurate.

pama

12 days ago

[-]

Your experience with fabs must be somewhat limited if you think that the state of the art in fabs produces deterministic results. Please lookup (or ask friends) for the typical yields and error mitigation features of modern chips and try to visualize if you think it is possible to have determinism when the density of circuits starts to approach levels that cannot be imspected with regular optical microscopes anymore. Modern chip fabrication is closer to LLM code in even more ways than what is presented in the video.

12 days ago

[-]

Fair. No process is 100% efficient and the depths of many topics become ambiguous to the point where margins of error need to be introduced.

Chip fabs are defo far into said depths.

Must we apply this at more shallow levels too?

12 days ago

[-]

> Modern chip fabrication is closer to LLM code

As is, I don't quite understand what you're getting at here. Please just think that through and tell us what happens to the yield ratio when the software running on all those photolithography machines wouldn't be deterministic.

kadushka

12 days ago

[-]

An output of a fab, just like an output of an LLM, is non-deterministic, but is good enough, or is being optimized to be good enough.

Non-determinism is not the problem, it's the quality of the software that matters. You can repeatedly ask me to solve a particular leetcode puzzle, and every time I might output a slightly different version. That's fine as long as the code solves the problem.

The software running on the machines (or anywhere) just needs to be better (choose your metric here) than the software written by humans. Software written by GPT-4 is better than software written by GPT-3.5, and the software written by o3 is better than software written by GPT-4. That's just the improvement from the last 3 years, and there's a massive, trillion-dollar effort worldwide to continue the progress.

12 days ago

[-]

Hardware always involves some level of non-determinism, because the physical world is messier than the virtual software world. Every hardware engineer accepts that and learns how to design solutions despite those constraints. But you're right, non-determinism is not the current problem in some fabs, because the whole process has been modeled with it in mind, and it's the yield ratio that needs to be deterministic enough to offer a service. Remember the struggles in Intels fabs? Revenue reflects that at fabs.

The software quality at companies like ASML seems to be in a bad shape already, and I remember ex-employees stating that there are some team leads higher up who can at least reason about existing software procedures, their implementation, side effects and their outcomes. Do you think this software is as thoroughly documented as some open source project? The purchase costs for those machines are in the mid-3-digit million range (operating costs excluded) and are expected to run 24/7 to be somewhat worthwhile. Operators can handle hardware issues on the spot and work around them, but what do you think happens with downtime due to non-deterministic software issues?

pama

11 days ago

[-]

The output of the verilog optimizer is different every time. The output of a fab is different in every batch. Each chip in a batch is different from others in that batch. Quality control drops the fraction of truly poor chips, and hardware design features might downgrade some of the partially failed chips to be classified as lesser versions of the same initial design. The final chips work as intended, mostly, but perhaps the error tolerance to overclocking or the mean time between failures is slightly different between chips. We can all work with them just fine almost all the time. The same principles apply to complex LLM-orchestrated code projects. I dont mind if my compiler gives different code each time because it uses a stochastic optimizer, but I want my code to do what I want and to not fail more than a certain tolerance I have for this code, which depends on the application. By giving more insight into the layers of testing to more people, and by encouraging the new documentation practices that Andrej mentioned, LLM coding will change the practice of software engineering rather dramatically. Code 2.0 was flexible and could yield results that were better than human coded efforts for complex problems, but the architecture, code, data, were selected by humans. In code 3.0 humans have access to (non-deterministic) building blocks that are written in natural language, to bug fixes and feature addition that happen in a conversation style. Similar engineering principles as with code 1.0 still apply (even more so than with code2.0, unless the product is a neural net), but the emphasis on verification increased dramatically as a fraction of the total effort, even though the total effort has gone down a lot. I can’t wait to see increased help in code verification efforts from this batch of people in the AI startup school as a result of Andrej’s presentation.

fifilura

12 days ago

[-]

Either way, I am not sure it is a requirement on HN to read/view the source.

Particularly not a 40min video.

Maybe it is tongue-in-cheek, maybe I am serious. I am not sure myself. But sometimes the interesting discussions comes from what is on top of the posters mind when viewing the title. Is that bad?

12 days ago

[-]

> Is that bad?

It doesn't have to be. But it does get somewhat boring and trite after a while when you start noticing that certain subjects on HN tend to attract general and/or samey comments about $thing, rather than the submission topic within $thing, and I do think that is against the guidelines.

> Please don't post shallow dismissals [...] Avoid generic tangents. Omit internet tropes. [...]

The specific part of:

> English is a terrible language for deterministic outcomes

Strikes me as both as a generic tangent about LLMs, and the comment as a whole feels like a shallow dismissal of the entire talk, as Karpathy never claims English is a good language for deterministic outcomes, nor have I heard anyone else make that claim.

12 days ago

[-]

Might sound like a generic tangent, but it's the conclusion people will leave from the talk.

12 days ago

[-]

But is it curious? Is it thoughtful and substantive? Maybe it could have been thoughtful, if it felt like it was in response to what was mentioned in the submission.

karaterobot

12 days ago

[-]

It's odd! The guidelines don't say anything about having to read or watch what the posts linked to, all they say is it's inappropriate to accuse someone you're responding to of not having done so.

There is a community expectation that people will know what they're talking about before posting, and in most cases that means having read the article. At the same time, I suspect that in many cases a lot of people commenting have not actually read the thing they're nominally commenting on, and they get away with it because the people upvoting them haven't either.

However, I think it's a good idea to do so, at least to make a top-level comment on an article. If you're just responding to someone else's comment, I don't think it's as necessary. But to stand up and make a statement about something you know nothing about seems buffoonish and would not, in general, elevate the level of discussion.

12 days ago

[-]

I accept any equivalents of reading comprehension tests to prove thay I watched the video, as I have many of Andrej's in the past. He's generally a good communicator, defo easy to follow.

rudedogg

12 days ago

[-]

> English is a terrible language for deterministic outcomes in complex/complicated systems.

Someone here shared this ancient article by Dijkstra about this exact thing a few weeks ago: https://www.cs.utexas.edu/~EWD/transcriptions/EWD06xx/EWD667...

12 days ago

[-]

TIL. Thanks for sharing

oc1

12 days ago

[-]

AI is all about context window. If you figured out the context problem, you will see that all these "AI is bullshit, it doesn't work and can't produce working code" goes away. Same for everything else.

12 days ago

[-]

Working code or not is irelevant. Heck, even human-in-loop (Tony-in-the-Iron-Man) is not actively the point. If we're going into "it's all about" territory then it's all about:

- training data - approximation of the desired outcome

Neither support a good direction for the complexity of some of the system around us, most of which require dedicated language. Imagine doing calculus or quantum physics in English. Novels of words would barely suffice.

So a context window as big as the training data itself?

What if the training data is faulty?

I'm confident you understand that working code or not doesn't matter in this analogy. Neither does LLMs reaching out for the right tool.

LLMs has its merits. Replacing concrete systems that require a formal language and grammar is not.

`1 + 1 = 2` because that's how maths works, not because of deja vú.

gardenhedge

12 days ago

[-]

Tony is iron man, not in him

12 days ago

[-]

Sure, I wasn't sure how to call the robot layer. Is is "Iron Main Suit"?

gardenhedge

11 days ago

[-]

It's just a suit or armour. There are many and are referred to as Mark I, II, III etc

cobertos

12 days ago

[-]

Untrue. I find problems with niche knowledge, heavy math, and/or lack of good online resources to be troublesome for AI. Examples so far I've found of consistent struggle points are shaders, parsers, and streams (in Nodejs at least)

Context window will solve a class of problems, but will not solve all problems with AI.

11 days ago

[-]

I think probably the biggest help I've got from LLMs is things that are "niche" knowledge, for me. Things like "I need a function heavy in math that when I give X and Y, it returns Z" I could have struggled with for days sometimes when I'm writing games for fun, but with LLMs I can have it done and move on in a couple of minutes, the most time consuming part is writing the tests and overall testing, but I no longer spend days just trying to understand enough math to actually write the thing.

strangescript

12 days ago

[-]

Who said I wanted my outcomes to be deterministic. Why is it that the only way we accept programming is for completely deterministic outcomes, when the reality is that is an implementation detail.

I am a real user and I am on a general purpose e-commerce site and my ask is "I want a TV that is not that expensive", then by definition the user request is barely deterministic. User requests are normally like this for any application. High level and vague at best. Then developers spend all their time on edge cases, user QA, in the weeds junk that the User does not care about at all. People dont want to click filters and fill out forms for your app. They want it to be easy.

12 days ago

[-]

Agreed. This e-commerce example is quite a good highlight for LLMs.

Same can't be applied when your supplier needs 300 68 x 34 mm gaskets by the BS10 standard, to give a random, more precise example.

qjack

12 days ago

[-]

While I agree with you broadly, remember that those that employ you don't have those skills either. They accept that they are ceding control of the details and trust us to make those decisions or ask clarifying questions (LLMs are getting better at those things too). Vibe coders are clients seeking an alternative, not developers.

unshavedyak

12 days ago

[-]

Maybe i'm not "vibing" enough, but i've actually been testing this recently. So far i think the thing "vibing" helps most with for me personally is just making decisions which i'm often too tired to do after work.

I've been coming to the realization that working with LLMs offer a different set of considerations than working on your own. Notably i find that i often obsess about design, code location, etc because if i get it wrong then my precious after-work time and energy are wasted on refactoring. The larger the code base, the more crippling this becomes for me.

However refactoring is almost not an issue with LLMs. They do it very quickly and aggressively. So the areas i'm not vibing on is just reviewing, and ensuring it isn't committing any insane sins. .. because it definitely will. But the structure i'm accepting is far from what i'd make myself. We'll see how this pans out long term for me, but it's a strategy that i'm exploring.

On the downside, my biggest difficulty with LLMs is getting them to just.. not. To produce less. Choosing too large of tasks is very easy and the code can snowball before you have a chance to pump the breaks and course correct.

Still, it's been a positive experience so far. I still consider it vibing though because i'm accepting far less quality work than what i'd normally produce. In areas where it matters though, i enforce correctness, and have to review everything as a result.

12 days ago

[-]

> Vibe coders are clients seeking an alternative, not developers.

Agreed. That's genuinely a good framing for clients.

brainless

12 days ago

[-]

I am not sure I got your point about English. I thought Karpathy was talking about English being the language of prompts, not output. Outputs can be English but if the goal is to compute using the output, then we need structured output (JSON, snippets of code, etc.), not English.

12 days ago

[-]

Entertain me in an exercise:

First, instruct a friend/colleague of how to multiply two 2 digit numbers in plain English.

Secondly (ideally with a different friend, to not contaminate tests), explain the same but using only maths formulas.

Where does the prompting process start and where does it end? Is it a one-off? Is the prompt clear enough? Do all the parties involved communicate within same domain objects?

Hopefully my example is not too contrived.

brainless

11 days ago

[-]

Yes the prompts are clear enough but it depends on the capacity of the people involved. People have to internalize the math (or any other) concepts from language into some rules, syntax, etc.

This is what an agent can do with an LLM. LLMs can help take English and generate some sort of an algorithm. The agent stores algorithm not the prompt. I do not know what current commercially available agents do but this was always clear to me.

barumrho

12 days ago

[-]

I agree with your point about English, but LLMs are not limited to English. You can show them formulas, images, code, etc.

12 days ago

[-]

Time is a funny calculator, measuring how an individual is behind. And in the funny circumstance that an individual is human, they look back on this comment in 3 years and wonder why humans only see themselves.

m3kw9

12 days ago

[-]

Like biz logic requirements they need to be fine grained defined

serjester

12 days ago

[-]

I think you’re straw manning his argument.

He explicitly says that both LLMs and traditional software have very important roles to play.

LLMs though are incredibly useful when encoding the behavior of the system deterministically is impossible. Previously this fell under the umbrella of problems solved with ML. This would take a giant time investment and a highly competent team to pull off.

Now anyone can solve many of these same problems with a single API call. It’s easy to wave this off, but this a total paradigm shift.

belter

12 days ago

[-]

You just described Software 4.0...

12 days ago

[-]

Can we have it now and skip 3.0?

12 days ago

[-]

I'd like to hear from Linux kernel developers. There is no significant software that has been written (plagiarized) by "AI". Why not ask the actual experts who deliver instead of talk?

This whole thing is a religion.

mellosouls

12 days ago

[-]

There is no significant software that has been written (plagiarized) by "AI".

How do you know?

As you haven't evidenced your claim, you could start by providing explicit examples of what is significant.

Even if you are correct, the amount of llm-assisted code is increasing all the time, and we are still only a couple of years in - give it time.

Why not ask the actual experts

Many would regard Karpathy in the expert category I think?

12 days ago

[-]

I think you should not turn things around here. Up to 2021 we had a vibrant software environment that obviously had zero "AI" input. It has made companies and some developers filthy rich.

Since "AI" became a religion, it is used as an excuse for layoffs while no serious software is written by "AI". The "AI" people are making the claims. Since they invading a functioning software environment, it is their responsibility to back up their claims.

12 days ago

[-]

Still wonder what your definition of "serious software" is. I kinda concur - I consider most of the webshit to be not serious, but then, this is where software industry makes bulk of its profits, and that space is absolutely being eaten by agentic coding, right now, today.

So if we s/serious/money-making/, you are wrong - or at least about to be proven, as these things enter prod and are talked about.

rwmj

12 days ago

[-]

The AI people are the ones making the extraordinary claims here.

[1] https://github.com/dotnet/runtime/pulls

bytefish

11 days ago

[-]

Microsoft is dogfooding Copilot in their dotnet/runtime [1] and dotnet/aspnetcore [2] repositories. This is the only time I have seen a company using its own AI Tools transparently. Yes, they label it an experiment, but I am pretty sure it’s mandated use within Microsoft.

I am an “AI skeptic”, so clearly I am biased here. What I am seeing in the repositories is, that Copilot hasn’t made any substantial contributions so far. The PRs, that went through? They often contain very, very detailed feedback, up to the point line by line replacements have been suggested.

The same engineers, that went up stage at “Microsoft Build 2025” to tell how amazing Copilot is and how it made them a 100x developer? They are not using Copilot in any of their PRs.

You said it’s a religion. I’d say it’s a cult. Whatever it is, outside the distortion bubble, this whole thing looks pretty bad to me.

[2] https://github.com/dotnet/aspnetcore/pulls

12 days ago

[-]

What counts as "significant software"? Only kernels I guess?

xvilka

12 days ago

[-]

Office software, CAD systems, Web Browsers, the list is long.

12 days ago

[-]

Microsoft (famously developing somewhat popular office-like software) seems to be going in the direction of almost forcing developers to use LLMs to assist with coding, at least going by what people are willing to admit publicly and seeing some GitHub activity.

Google (made a small browser or something) also develops their own models, I don't think it's far fetched to imagine there is at least one developer on the Chrome/Chromium team that is trying to dogfood that stuff.

As for Autodesk, I have no idea what they're up to, but corporate IT seems hellbent on killing themselves, not sure Autodesk would do anything differently so they're probably also trying to jam LLMs down their employees throats.

https://www.theverge.com/2022/10/13/23402195/microsoft-us-ar...

12 days ago

[-]

Microsoft is also selling "AI", so they want headlines like "30% of our code is written by AI". So they force open source developers to babysit the tools and suffer.

It's also an advertisement for potential "AI" military applications that they undoubtedly propose after the HoloLens failure:

The HoloLens failure is a great example of overhyped technology, just like the bunker busters that are now in the headlines for overpromising.

https://news.ycombinator.com/item?id=44050152

12 days ago

[-]

> Microsoft

Very impressive indeed, not a single line of any quality to be found despite them forcing it on people.

11 days ago

[-]

Lets not change the goalpost, parent asked for any examples of software written with LLMs, and regardless if the output is quality or not, that is one example. Besides, Microsoft isn't really known for their high code quality, so I'm not even sure using even dumb LLMs/tools like Copilot would actually have a negative effect.

e3bc54b2

12 days ago

[-]

'forcing' anybody to do anything means they don't like doing it, usually because it causes them more work or headache or discomfort.

You know, the exact opposite of what AI providers are claiming it does.

rwmj

12 days ago

[-]

Can you point to any significant open source software that has any kind of significant AI contributions?

As an actual open source developer I'm not seeing anything. I am getting bogus pull requests full of AI slop that are causing problems though.

12 days ago

[-]

> Can you point to any significant open source software that has any kind of significant AI contributions?

No, but I haven't looked. Can you?

As an actual open source developer too, I do get some value from replacing search engine usage with LLMs that can do the searching and collation for me, as long as they have references I can use for diving deeper, they certainly accelerate my own workflow. But I don't do "vibe-coding" or use any LLM-connected editors, just my own written software that is mostly various CLIs and chat-like UIs.

fHr

11 days ago

[-]

big companies still already lay off

huksley

12 days ago

[-]

Vibe coding is making a LEGO furniture, getting it run on the cloud is assembling the IKEA table for a busy restaurant

12 days ago

[-]

why does vibe coding still involve any code at all? why can't an AI directly control the registers of a computer processor and graphics card, controlling a computer directly? why can't it draw on the screen directly, connected directly to the rows and columns of an LCD screen? what if an AI agent was implemented in hardware, with a processor for AI, a normal computer processor for logic, and a processor that correlates UI elements to touches on the screen? and a network card, some RAM for temporary stuff like UI elements and some persistent storage for vectors that represent UI elements and past converstations

12 days ago

[-]

I'm not sure this makes sense as a question. Registers are 'controlled' by running code for a given state. An AI can write code that changes registers, as all code does in operation. An AI can't directly 'control registers' in any other way, just as you or I can't.

12 days ago

[-]

I would like to make an AI agent that directly interfaces with a processor by setting bits in a processor register, thus eliminating the need for even assembly code or any kind of code. The only software you would ever need would be the AI.

12 days ago

[-]

This makes no sense at all. You can't set registers without assembly code. If you could set registers without assembly code then it would be pointless as the registers wouldn't be 'running' against anything.

12 days ago

[-]

That's called a JIT compiler. And ignoring how bad an idea blending those two... It wouldn't be that difficult a task.

The hardest parts of a jit is the safety aspect. And AI already violates most of that.

12 days ago

[-]

The safety part will probably be either solved or a non-issue or ignored. Similarly to how GPT3 was often seen as dangerous before ChatGPT was released. Some people who have only ever vibe coded are finding jobs today, ignoring safety entirely and lacking a notion of it or what it means. They just copy paste output from ChatGPT or an agentic IDE. To me it's JIT already with extra steps. Or they have pivoted their software engineers to vibe coding most of the time and don't even touch code anymore doing JIT with extra steps again.

12 days ago

[-]

As "jit" to you means running code, and not "building and executing machine code", maybe you could vibe code this. And enjoy the segfaults.

12 days ago

[-]

In a way he's making sense. If the "code" is the prompt, the output of the llm is an intermediate artifact, like the intermediate steps of gcc.

So why should we still need gcc?

The answer is of course, that we need it because llm's output is shit 90% of the time and debugging assembly or binary directly is even harder, so putting asides the difficulties of training the model, the output would be unusable.

[0] https://msrc.microsoft.com/update-guide/vulnerability/CVE-20...

12 days ago

[-]

Probably too much snark from me. But the gulf between interpreter and compiler can be decades of work, often discovering new mathematical principles along the way.

The idea that you're fine to risk everything, in the way agentic things allow [0], and want that messing around with raw memory is... A return to DOS' crashes, but with HAL along for the ride.

12 days ago

[-]

Ah don't worry, llms are a return to crashes as it is :)

The other day it managed to produce code that made python segfault.

11 days ago

[-]

> produce code that made python segfault

To be fair, that's pretty easy for a human to do too.

8 days ago

[-]

On purpose yes. But the entire point of languages with managed memory is that they do not segfault.

12 days ago

[-]

It's not a JIT. A JIT produces assembly. You can't "set registers" or do anything useful without assembly code running on the processor.

10 days ago

[-]

Riiight... Which was my point? If you want an AI able to set registers, you want to hook it to a JIT. Which avoids assembly by setting machine code directly into memory and executing said memory.

singularity2001

12 days ago

[-]

what he means is why are the tokens not directly machine code tokens

12 days ago

[-]

What is meant by a 'machine code token'? Ultimately a processor needs assembly code as input to do anything. Registers are set by assembly. Data is read by assembly. Hardware is managed through assembly (for example by setting bits in memory). Either I have a complete misunderstanding on what this thread is talking about, or others are commenting with some fundamental assumptions that aren't correct.

birn559

12 days ago

[-]

Because any precise description of what the computer is supposed to do is already code as we know it. AI can fill in the gaps between natural language and programming by guessing and because you don't always care about the "how" only about the "what". The more you care about the "how" you have to become more precise in your language to reduce the guess work of the AI to the point that your input to the AI is already code.

The question is: how much do we really care about the "how", even when we think we care about it? Modern programming language don't do guessing work, but they already abstract away quite a lot of the "how".

I believe that's the original argument in favor of coding in assembler and that it will stay relevant.

Following this argument, what AI is really missing is determinism to a far extend. I can't just save my input I have given to an AI and can be sure that it will produce the exact same output in a year from now on.

12 days ago

[-]

With vibe coding, I am under the impression that the only thing that matters for vibe coders is whether the output is good enough in the moment to fullfill a desire. For companies going AI first that's how it seems to be done. I see people in other places and those people have lost interest in the "how"

birn559

10 days ago

[-]

Which is fine in general. It has been a selling point for SQL or C, for example. What I wanted to say is that for AI output becoming a replacement for code, a necessary requirement is that the output becomes deterministic. While LLMs provide that technically, I am not sure the "culture" that has evolved the technology will lead to product that provide determinism.

therein

12 days ago

[-]

All you need is a framebuffer and AI.

abhaynayar

12 days ago

[-]

Nice try, AI.

ast0708

12 days ago

[-]

Should we not treat LLMs more as a UX feature to interact with a domain specific model (highly contextual), rather than expecting LLMs to provide the intelligence needed for software to act as partner to Humans.

12 days ago

[-]

He's selling something.

11 days ago

[-]

What, exactly? Educational courses?

7 days ago

[-]

He's the cofounder of openai… you don't think there is some kind of monetary incentive here?

rvz

11 days ago

[-]

Someone is thinking.

Aeroi

12 days ago

[-]

the fanboying for this dudes opinion is insane.

11 days ago

[-]

Maybe so, but please don't post unsubstantive comments to Hacker News.

(Thoughtful criticism that we can learn from is welcome, of course. This is in the site guidelines: https://news.ycombinator.com/newsguidelines.html.)

Aeroi

11 days ago

[-]

I'd argue that expressing contrarian sentiment can be substantive. Otherwise this would be an echo-chamber of agreement.

11 days ago

[-]

For sure, expressing contrarian sentiment can be substantive, but you'd need to do it substantively.

cedws

11 days ago

[-]

I'm about half way through the video and I'm really not seeing what all the praise is about, it just seems to be an AI optimism word salad.

edit: a lot of the comments giving praise on YouTube look like bots...

11 days ago

[-]

> it just seems to be an AI optimism word salad

Maybe it was missed, but in the beginning he said he was told the audience are mostly students about to enter the industry, so I feel like a lot of the talk is just establishing vocabulary, basic information about what LLMs are, analogies to get people to wrap their head around where in the workflow they can fit, and so on.

So while most of it seems obvious or relatively abstract, I think that's because of the target audience of the talk. I had that lens while watching the talk and while I cannot say my worldview has done a large change because of it, I understand it could be valuable to newer members of the ecosystem.

mupuff1234

12 days ago

[-]

Yeah, not sure I ever saw anything similar on HN before, feels very odd.

I mean the talk is fine and all but that's about it?

11 days ago

[-]

> Yeah, not sure I ever saw anything similar on HN before, feels very odd.

What exactly have you been seeing here on HN? I've been reading through most of the comments in this submission, since it was submitted yesterday, and none of it seems to be "fanboying" (maybe I misunderstand the term?) but discussions about where LLMs fit in the software development workflow.

Some people find some parts interesting, others obvious, others think he's selling something, others find the analogies lacking, but I've seen no "fanboy" comments like what parent seemed to exclusively see here.

mrmansano

12 days ago

[-]

It's pastor preaching for the already converted, not new in the area. The only thing new is that they are selling the kool-aid this time.

Aeroi

12 days ago

[-]

It's been a multi-day like conversation where multiple people are trying to obtain the transcripts, publish the text as gospel, and now the video. Like, yes thank you but, holy shit.

greybox

12 days ago

[-]

He's talking about "LLM Utility companies going down and the world becoming dumber" as a sign of humanity's progress.

This if anything should be a huge red flag

bryanh

12 days ago

[-]

Replace with "Water Utility going down and the world becoming less sanitary", etc. Still a red flag?

greybox

12 days ago

[-]

You're making leap of logic.

Before water sanitization technology we had no way of sanitizing water on a large scale.

Before LLMs, we could still write software. Arguably we were collectively better at it.

12 days ago

[-]

LLMs are general-purpose tools used for great many tasks, most of them not related to writing code.

12 days ago

[-]

He lives in a GenAI bubble where everyone is self-congratulating about the usage of LLMs.

The reality is that there's not a single critical component anywhere that is built on LLMs. There's absolutely no reliance on models, and ChatGPT being down has absolutely no impact on anything beside teenagers not being able to cheat on their homeworks and LLM wrappers not being able to wrap.

nlawalker

12 days ago

[-]

Adults everywhere are using it to "cheat" at work, except there it's not cheating, it's celebrated and welcomed as a performance enhancement because results are the only thing that matter, and over time that will result in new expectations for productivity.

It's going to take a while for those new expectations to develop, and they won't develop evenly, just like how even today there's plenty of low-hanging fruit in the form of roles or businesses that aren't using what anyone here would identify as simple opportunities for automation, and the main benefit that accrues to the one guy in the office who knows how to cheat with Excel and VBA is that he gets to slack off most of the time. But there certainly are places where the people in charge expect more, and are quick to perceive when and how much that bar can be raised. They don't care if you're cheating, but you'll need to keep up with the people who are.

bwfan123

12 days ago

[-]

> The reality is that there's not a single critical component anywhere that is built on LLMs.

Remember that there are billion dollar usecases where being correct is not important. For example, shopping recommendations, advertizing, search results, image captioning, etc. All of these usecases have humans consuming the output, and LLMs can play a useful role as productivity boosters.

12 days ago

[-]

And none of those are crucial.

His point is that the world is RELIANT on GenAI. This isn't true.

11 days ago

[-]

The full quote from 7:40 in the video: "I think it's kind of fascinating to me that when the state-of-the-art LLMs go down, it's actually kind of like an intelligence brownout in the world. It's kind of like when the voltage is unreliable in the grid, and the planet just gets dumber. The more reliance we have on these models, which already is really dramatic and I think will continue to grow."

I don't think his point was that LLMs are as crucial as the power grid, or even close. He's just saying that he finds the comparison interesting, for whatever reason. If you find it stupid instead, that's okay.

11 days ago

[-]

I'm just saying that the statement "when the state-of-the-art LLMs go down, it's actually kind of like an intelligence brownout in the world" is entirely false.

11 days ago

[-]

All analogies are false, but some are useful.

ukprogrammer

12 days ago

[-]

Even an LLM could tell you that that's an unknowable thing, perhaps you should rely on them more.

12 days ago

[-]

Has a critical service that you used meaningfully changed to seemingly integrate non-deterministic "intelligence" in the past 3 years in one of its critical paths? I'd bet good money that the answer to literally everyone is no.

My company uses GenAI a lot in a lot of projects. Would it have some impact if all models suddenly stopped working? Sure. But the oncalls wouldn't even get paged.

jeffnappi

12 days ago

[-]

Tesla FSD, Waymo are good examples.

12 days ago

[-]

It's fascinating to see his gears grinding at 22:55 when acknowledging that a human still has to review the thousand lines of LLM-generated code for bugs and security issues if they're "actually trying to get work done". Yet these are the tools that are supposed to make us hyperproductive? This is "Software 3.0"? Give me a break.

rwmj

12 days ago

[-]

Plus coding is the fun bit, reviewing code is the hard and not fun bit, arguing with an overconfident machine sound like it'll be worse even than that. Thankfully I'm going to retire soon.

12 days ago

[-]

Agreed. Hell, even reviewing code can be fun and engaging, especially if done in person. But it helps when the other party can actually think, instead of automatically responding with "You're right!", followed by changes that may or may not make things worse.

It's as if software developers secretly hated their jobs and found most tasks a chore, so they hired someone else to poorly do the mechanical tasks for them, while ignoring the tasks that actually matter. That's not software engineering, programming, nor coding. It's some process of producing shitty software for which we need new terminology to describe.

I envy you for retiring. Good luck!

11 days ago

[-]

> Plus coding is the fun bit, reviewing code is the hard and not fun bit

To you. For others, it looks differently. And for yet others, they don't care about the coding nor the reviewing, they want to solve a particular problem.

I'd probably say I'm a programmer by accident. It's not that I love producing binaries by writing and compiling code, but I need to solve some particular problem that either is best solved by programming, or can only be solved by programming. "Programming by need" maybe is a fitting definition.

Doesn't mean I don't care about code quality, or good abstractions and having a reasonable design/architecture. But I'm focused on the end goal, having a particular problem solved, and coding is just the way there (sometimes).

11 days ago

[-]

I can respect that. But reading and writing code, and discussing code with your colleagues, are pretty essential tasks to software development. If you don't enjoy either, then you probably would not enjoy working in the industry.

Which is fine, don't get me wrong. But that would be like if someone wants to work as an automotive engineer, but they only enjoy driving a car. It doesn't work that way. You should enjoy the entire process of manufacturing a car if you want to drive a good one. Sure, you may enjoy some tasks more than others, and this is fine, but you can't ignore the ones you don't. Otherwise you're only doing a disservice to yourself, your team, and the users of what you build.

> Doesn't mean I don't care about code quality, or good abstractions and having a reasonable design/architecture. But I'm focused on the end goal, having a particular problem solved, and coding is just the way there (sometimes).

But coding is just the mechanical part of building software. It's the last step of the process after everything you mentioned is taken into consideration. Everything else is how you ensure that you reach the end goal successfully. So saying that the end goal is your main focus doesn't make sense if you want to actually reach it.

This is why I think that people who enjoy vibe coding today, are not, and will never become software engineers. They want to fast track to the end goal by jumping over the parts that are actually important. Blindly accepting whatever a code generation tool spits out if it passes a quick manual happy path test is not engineering. It's something else that produces much inferior results. At least until these tools get much, much better at it, which still seems far away, and unlikely with the current tech.

11 days ago

[-]

> I can respect that. But reading and writing code, and discussing code with your colleagues, are pretty essential tasks to software development. If you don't enjoy either, then you probably would not enjoy working in the industry.

I've enjoyed all my time in the software industry, especially compared to other professions I did before, like strawberry-picking, or roof-snow removal, or elder-case. It's easily the most relaxing job I've had, even when everything is on fire and you need to bring up production database again, it's so much better than most jobs out there. That the pay is just over-the-top compared to what most of us do, is just a plus.

> This is why I think that people who enjoy vibe coding today, are not, and will never become software engineers

I think I kind of agree with that, I see some people who have zero interest in understanding code, but they want to produce code somehow, today via LLMs/agents and yesterday via no-code platforms. I don't think they're interested in knowing programming, any parts of it, so they try to find workarounds.

What I was trying to say, is that there is maybe a group of developers, like myself, that sit somewhere in-between. If I can solve a problem by not using code, and the trade-offs are OK considering the context, then that's probably my ideal approach. I try to only use code when there is no way around it, or it's the best way.

But I agree that people who will just accept whatever an LLM gives you, are bound to end up in trouble in the future, regardless of improvements of the tooling/models, because spaghetti always sucks, no matter who writes/consumes it.

12 days ago

[-]

Because we are still using code as a proof that needs to be proven. Software 3.0 will not be about reviewing legible code, with its edge-cases and exploits and trying to impersonate hardware.

William_BB

12 days ago

[-]

[flagged]

12 days ago

[-]

It's an interesting presentation, no doubt. The analogies eventually fail as analogies usually do.

A recurring theme presented, however, is that LLM's are somehow not controlled by the corporations which expose them as a service. The presenter made certain to identify three interested actors (governments, corporations, "regular people") and how LLM offerings are not controlled by governments. This is a bit disingenuous.

Also, the OS analogy doesn't make sense to me. Perhaps this is because I do not subscribe to LLM's having reasoning capabilities nor able to reliably provide services an OS-like system can be shown to provide.

A minor critique regarding the analogy equating LLM's to mainframes:

  Mainframes in the 1960's never "ran in the cloud" as it did
  not exist.  They still do not "run in the cloud" unless one
  includes simulators.

  Terminals in the 1960's - 1980's did not use networks.  They
  used dedicated serial cables or dial-up modems to connect
  either directly or through stat-mux concentrators.

  "Compute" was not "batched over users."  Mainframes either
  had jobs submitted and ran via operators (indirect execution)
  or supported multi-user time slicing (such as found in Unix).

distalx

12 days ago

[-]

Hang in there! Your comment makes some really good points about the limits of analogies and the real control corporations have over LLMs.

Plus, your historical corrections were spot on. Sometimes, good criticisms just get lost in the noise online. Don't let it get to you!

furyofantares

12 days ago

[-]

> The presenter made certain to identify three interested actors (governments, corporations, "regular people") and how LLM offerings are not controlled by governments. This is a bit disingenuous.

I don't think that's what he said, he was identifying the first customers and uses.

12 days ago

[-]

>> A recurring theme presented, however, is that LLM's are somehow not controlled by the corporations which expose them as a service. The presenter made certain to identify three interested actors (governments, corporations, "regular people") and how LLM offerings are not controlled by governments. This is a bit disingenuous.

> I don't think that's what he said, he was identifying the first customers and uses.

The portion of the presentation I am referencing starts at or near 12:50[0]. Here is what was said:

  I wrote about this one particular property that strikes me
  as very different this time around.  It's that LLM's like
  flip they flip the direction of technology diffusion that
  is usually present in technology.

  So for example with electricity, cryptography, computing,
  flight, internet, GPS, lots of new transformative that have
  not been around.

  Typically it is the government and corporations that are
  the first users because it's new expensive etc. and it only
  later diffuses to consumer.  But I feel like LLM's are kind
  of like flipped around.

  So maybe with early computers it was all about ballistics
  and military use, but with LLM's it's all about how do you
  boil an egg or something like that.  This is certainly like
  a lot of my use.  And so it's really fascinating to me that
  we have a new magical computer it's like helping me boil an
  egg.

  It's not helping the government do something really crazy
  like some military ballistics or some special technology.

Note the identification of historic government interest in computing along with a flippant "regular person" scenario in the context of "technology diffusion."

You are right in that the presenter identified "first customers", but this is mentioned in passing when viewed in context. Perhaps I should not have characterized this as "a recurring theme." Instead, a better categorization might be:

  The presenter minimized the control corporations have by
  keeping focus on governmental topics and trivial customer
  use-cases.

0 - https://youtu.be/LCEmiRjPEtQ?t=770

furyofantares

12 days ago

[-]

Yeah that's explicitly about first customers and first uses, not about who controls it.

I don't see how it minimizes the control corporations have to note this. Especially since he's quite clear about how everything is currently centralized / time share model, and obviously hopeful we can enter an era that's more analogous to the PC era, even explicitly telling the audience maybe some of them will work on making that happen.

11 days ago

[-]

I took away from this a different message than what I think you did. I respect your perspective of same and that we respectfully disagree.

jppope

12 days ago

[-]

Well that showed up significantly faster than they said it would.

12 days ago

[-]

The team adapted quickly, which is a good sign. I believe getting the videos out sooner (as in why-not-immediately) is going to be a priority in the future.

seneca

12 days ago

[-]

Classic under promise and over deliver.

I'm glad they got it out quickly.

12 days ago

[-]

Me too. It was my favorite talk of the ones I saw.

11 days ago

[-]

I did like it to, but haven't seen any of the others, and usually don't like sitting through talks rather than reading transcripts.

But you got me curious, what other talks from the day/event would be worth watching in your mind?

10 days ago

[-]

I also liked Chelsea Finn's robot talk.

(I didn't see all the talks, so please don't take absence of recommendation as recommendation of absence!)

https://en.m.wikipedia.org/wiki/Well-known_URI

sneak

12 days ago

[-]

Can we please stop standardizing on putting things in the root?

/.well-known/ exists for this purpose.

example.com/.well-known/llms.txt

12 days ago

[-]

You can't just put things there any time you want - the RFC requires that they go through a registration process.

Having said that, this won't work for llms.txt, since in the next version of the proposal they'll be allowed at any level of the path, not only the root.

https://github.com/AnswerDotAI/llms-txt/issues/2#issuecommen...

politelemon

12 days ago

[-]

> You can't just put things there any time you want - the RFC requires that they go through a registration process.

Actually, I can for two reasons. First is of course the RFC mentions that items can be registered after the fact, if it's found that a particular well-known suffix is being widely used. But the second is a bit more chaotic - website owners are under no obligation to consult a registry, much like port registrations; in many cases they won't even know it exists and may think of it as a place that should reflect their mental model.

It can make things awkward and difficult though, that is true, but that comes with the free text nature of the well-known space. That's made evident in the Github issue linked, a large group of very smart people didn't know that there was a registry for it.

12 days ago

[-]

There was no "large group of very smart people" behind llms.txt. It was just me. And I'm very familiar with the registry, and it doesn't work for this particular case IMO (although other folks are welcome to register it if they feel otherwise, of course).

sneak

12 days ago

[-]

I put stuff in /.well-known/ all the time whenever I want. They’re my servers.

dncornholio

12 days ago

[-]

> You can't just put things there any time you want - the RFC requires that they go through a registration process.

Excuse me???

12 days ago

[-]

From the RFC:

""" A well-known URI is a URI [RFC3986] whose path component begins with the characters "/.well-known/", and whose scheme is "HTTP", "HTTPS", or another scheme that has explicitly been specified to use well- known URIs.

Applications that wish to mint new well-known URIs MUST register them, following the procedures in Section 5.1. """

11 days ago

[-]

Keyword being "mint" there. You can still put whatever you want in there, but in order to "register" it, you need to"mint" it by registering it. But you're in no way obligated to register random stuff you put in /.well-known, that'd be bananas to put in a specification like that.

11 days ago

[-]

Applications, not the websites, web services, or such. I read that as: "If you are making an application and you want it to introduce a new convention, then sign up here." (otherwise do whatever you want)

https://github.com/AnswerDotAI/llms-txt/issues/2

12 days ago

[-]

researchai

12 days ago

[-]

I can't believe I googled most of the dishes on the menu every time I went to the Thai restaurant. I've just realised how painful that was when I saw MenuGen!