Python 3.15's JIT is now back on track
430 points
19 hours ago
| 15 comments
| fidget-spinner.github.io
| HN
mattclarkdotnet
11 hours ago
[-]
Python really needs to take the Typescript approach of "all valid Python4 is valid Python3". And then add value types so we can have int64 etc. And allow object refs to be frozen after instantiation to avoid the indirection tax.

Sensible type-annotated python code could be so much faster if it didn't have to assume everything could change at any time. Most things don't change, and if they do they change on startup (e.g. ORM bindings).

reply
BerislavLopac
1 hour ago
[-]
> python code could be so much faster if it didn't have to assume everything could change at any time

Definitely, but then it wouldn't be Python. One of the core principles of Python's design is to be extremely dynamic, and that anything can change at any time.

There are many other, pretty good, strictly dynamically typed languages which work just as well if not better than Python, for many purposes.

reply
oblio
1 hour ago
[-]
I feel that this excuse is being trotted out too much. Most engineers never get to choose the programming language used for 90% of their professional projects.

And when Python is a mainstream language on top of which large, globally known websites, AI tools, core system utilities, etc are built, we should give up the purity angle and be practical.

Even the new performance push in Python land is a reflection of this. A long time ago some optimizations were refused in order to not complicate the default Python implementation.

reply
drob518
1 hour ago
[-]
You’re always free to create your own Python-like language that caters more toward your goals. No excuses, then.
reply
oblio
52 minutes ago
[-]
If you're a contributor to Python, my apologies.
reply
drob518
21 minutes ago
[-]
I’m not a Python contributor, so no need to apologize to me. But if you have strong ideas about what Python should be, perhaps you should step up and contribute that code rather than saying that others are offering excuses for why they won’t deliver what you want. I have worked on other open source projects where users were very entitled, to the point of demanding that the project team deliver them certain features. It’s not fun. It’s ironic that open source often brings out both the best and the worst in people. Suggesting changes and new features is fine, even critical to a strong roadmap. But we all need to realize that maintainers may have other goals and there’s no obligation on their part to implement anything. The beauty of open source is that you can customize or fork as much as you want to match your goals. But then you’re responsible for doing the work and if your changes are public you may have your own set of users demanding their own favorite changes.
reply
mattclarkdotnet
10 hours ago
[-]
To clarify, it is nuts that in an object method, there is a performance enhancement through caching a member value.

  class SomeClass:
    def __init__(self):
      self.x = 0
    def SomeMethod(self):
      q = self.x
      ## do stuff with q, because otherwise you're dereferencing self.x all the damn time
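
For anyone who wants to measure this themselves, here is a minimal sketch of the pattern, assuming CPython. The names (`Accumulator`, `sum_direct`, `sum_cached`) are made up for illustration, and no timings are claimed because they vary by interpreter version and machine.

```python
import timeit


class Accumulator:
    def __init__(self):
        self.x = 3

    def sum_direct(self, n):
        total = 0
        for _ in range(n):
            total += self.x  # attribute lookup on every iteration
        return total

    def sum_cached(self, n):
        x = self.x  # one lookup, then a plain local variable
        total = 0
        for _ in range(n):
            total += x
        return total


acc = Accumulator()
# Both versions agree as long as nothing rebinds acc.x mid-loop.
direct_result = acc.sum_direct(1000)
cached_result = acc.sum_cached(1000)

# Timings are printed rather than asserted; they depend on the interpreter.
print("direct:", timeit.timeit(lambda: acc.sum_direct(1000), number=200))
print("cached:", timeit.timeit(lambda: acc.sum_cached(1000), number=200))
```

The two methods are only interchangeable when nothing else touches `self.x` during the loop, which is exactly the guarantee the language does not give.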
reply
1718627440
3 hours ago
[-]
This is not just a performance concern; it describes completely different behaviour. You forgot that self.x is really type(self).__getattribute__(self, 'x') (with __getattr__ as the fallback when normal lookup fails), and that a class can implement those hooks however it likes. There is no object identity across the values they return.
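
A small sketch of what that means in practice, with made-up class names (`Ticker`, `Lazy`): both a property and a `__getattr__` hook run code on every read, so two reads of the same attribute need not give the same object.

```python
import itertools


class Ticker:
    """Every read of .x runs code and can yield a different object."""

    def __init__(self):
        self._counter = itertools.count()

    @property
    def x(self):
        return next(self._counter)


class Lazy:
    """__getattr__ is only consulted when normal lookup fails."""

    def __getattr__(self, name):
        return [name]  # a fresh list on every access


t = Ticker()
first, second = t.x, t.x  # two reads, two different values

lazy = Lazy()
same_object = lazy.anything is lazy.anything  # two distinct lists
```

Hoisting `t.x` into a local here would change what the program computes, not just how fast it runs.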
reply
dekelpilli
9 hours ago
[-]
Java also has a performance cost to accessing class fields, as exampled by this (now-replaced) code in the JDK itself - https://github.com/openjdk/jdk/blob/jdk8-b120/jdk/src/share/...
reply
anematode
8 hours ago
[-]
Any decent JIT compiler (and HotSpot's is world class) will optimize this out. Likely this was done very early on in development, or was just to reduce bytecode size to promote inlining heuristics that use it
reply
vanderZwan
3 hours ago
[-]
String is also a pretty damn fundamental object, and I'm sure trim() calls are extremely common too. I wouldn't be surprised if making sure that seemingly small optimizations like this are applied in the interpreter, before the JIT kicks in, is not premature optimization in that context.

There might be common scenarios where this had a real, significant performance impact, e.g. use cases where it's such a bottleneck in the interpreter that it measurably affects warm-up time. Also, string manipulation seems like the kind of thing you see in small scripts that end before a JIT even kicks in but that are also called very often (although I don't know how many people would reach for Java in that case).

EDIT: also, if you're a commercial entity trying to get people to use your programming language, it's probably a good idea to make the language perform less badly with the most common terrible code. And accidentally quadratic or worse string manipulation involving excessive calls to trim() seems like a very likely scenario in that context.

reply
LtWorf
7 hours ago
[-]
But what if whatever you call is also accessing and changing the attribute?
reply
anematode
7 hours ago
[-]
If what you call gets inlined, then the compiler can see that it either does or doesn't modify the attribute and optimize it accordingly. Even virtual calls can often be inlined via, e.g., class hierarchy analysis and inline caches.

If these analyses don't apply and the callee could do anything, then of course the compiler can't keep the value hoisted. But a function call has to occur anyway, so the hoisted value will be pushed/popped from the stack and you might as well reload it from the object's field anyway, rather than waste a stack slot.

reply
LtWorf
16 minutes ago
[-]
Another thread can access it and do that, how could the compiler possibly know about it?
reply
Kamii0909
5 hours ago
[-]
That was a niche optimization primarily targeting code running in the interpreter. Even the most basic optimizing compiler in the HotSpot tiered compilation chain at that time (the client compiler, or C1) would be able to keep that value in a register. Since String is such an important class, even small things like this get done.
reply
duskdozer
6 hours ago
[-]
You mean even if x is not a property?
reply
mathisfun123
9 hours ago
[-]
> it is nuts that in an object method, there is a performance enhancement through caching a member value

i don't understand what you think is nuts about this. it's an interpreted language and the word `self` is not special in any way (it's just convention - you can call the first param to a method anything you want). so there's no way for the interpreter/compiler/runtime to know you're accessing a field of the class itself (let alone that that field isn't a computed property or something like that).

lots of hottakes that people have (like this one) are rooted in just a fundamental misunderstanding of the language and programming languages in general <shrugs>.

reply
mattclarkdotnet
9 hours ago
[-]
What's nuts is that the language doesn't guarantee that successive references to the same member value within the same function body are stable. You can look it up once, go off and do something else, and look it up again and it's changed. It's dynamism taken to an unnecessary extreme. Nobody in the real world expects this behaviour. Making it just a bit less dynamic wouldn't change the fundamentals of the language but it would make it a lot more tractable.
reply
1718627440
4 hours ago
[-]
> What's nuts is that the language doesn't guarantee that successive references to the same member value within the same function body are stable. You can look it up once, go off and do something else, and look it up again and it's changed.

There is no such thing as 'successive references to the same member value' here. It's not that you look up the same object and it can change, it's that you are not referring to the same object at all.

self.x is actually type(self).__getattribute__(self, 'x') (falling back to __getattr__ when normal lookup fails), which can in fact return a different thing each time. `self.x` IS a string lookup, and that is not an implementation detail but a major design goal. This dynamism is one of the selling points of Python: it allows you to change and modify interfaces to reflect state. It's nice for some things and it is what makes Python Python. If you don't want that, use another language.

reply
gpderetta
1 hour ago
[-]
ok, then it is nuts that __getattr__ (itself a specially blessed function) is not required to be pure at least from the caller point of view.
reply
rtpg
8 hours ago
[-]
In Python, attribute accesses aren't stable! `self.x` where `x` is a property is not guaranteed to refer to the same thing.

And getting rid of descriptors would be a _fundamental change to the language_. An immense one. Loads of features are built off of descriptors or descriptor-like things.

And what you're complaining about is also not true in Javascript world either... I believe you can build descriptor-like things in JS now as well.

_But_ if you want that you can use stuff like mypyc + annotations to get that for you. There are tools that let you get to where you want. Just not out of the box because Python isn't that language.

Remember, this is a scripting language, not a compiled language. Every optimization for things you talk about would be paid on program load (you have pyc stuff but still..)

Gotta show up with proof that what you're saying is verifiable and works well. Up until ~6 or 7 years ago CPython had a concept of being easy to onboard onto. Dataflow analyses make the codebase harder to deal with.

Having said all of that.... would be nice to just inline RPython-y code and have it all work nicely. I don't need it on everything and proving safety is probably non-trivial but I feel like we've got to be closer to doing this than in the past.

I ... think in theory the JIT can solve for that too. In theory

reply
Someone
7 hours ago
[-]
> What's nuts is that the language doesn't guarantee that successive references to the same member value within the same function body are stable.

The language supports multiple threads and doesn’t have private fields (https://docs.python.org/3/tutorial/classes.html#private-vari...), so the runtime cannot rule out that the value gets changed in-between.

And yes, it often is obvious to humans that’s not intended to happen, and almost never what happens, but proving that is often hard or even impossible.
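
To illustrate the "no private fields" part, here is a sketch with a hypothetical `Account` class. Double-underscore names are only mangled, so code outside the class can still reach and rebind them.

```python
class Account:
    def __init__(self):
        self.__balance = 100  # stored as _Account__balance

    def balance(self):
        return self.__balance


acct = Account()
# The "private" name is only mangled, so outside code can still rebind it.
acct._Account__balance = -1
tampered = acct.balance()
```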

reply
gpderetta
1 hour ago
[-]
wouldn't a concurrent change without synchronization be UB anyway? Also parent wants to cache the address, not the value (but you have to cache the value if you want to optimize manually)
reply
fulafel
8 hours ago
[-]
> Nobody in the real world expects this behaviour.

For example, numbers and strings are immutable objects in Python. If self.x is a number and its numeric value is changed by a method call, self.x will be a different object after that. I'd dare say people expect this to work.
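
A quick sketch of that, using a made-up `Counter` class. The value is chosen large so that CPython's small-integer cache plays no part.

```python
class Counter:
    def __init__(self):
        self.x = 10**20  # large enough to avoid any small-integer caching

    def bump(self):
        self.x += 1  # builds a new int and rebinds the attribute


c = Counter()
before = c.x
c.bump()
after = c.x

rebound = before is not after        # the attribute now names a different object
old_value_intact = before == 10**20  # the old int itself was never modified
```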

reply
codesnik
8 hours ago
[-]
basically all object oriented languages work like that. You access a member; you call a method which changes that member; you expect that change is visible lower in the code, and there're no statically computable guarantees that particular member is not touched in the called method (which is potentially shadowed in a subclass). It's not dynamism, even c++ works the same, it's an inherent tax on OOP. All you can do is try to minimize cost of that additional dereference. I'm not even touching threads here.

now, functional languages don't have this problem at all.

reply
cherryteastain
5 hours ago
[-]
OOP has nothing to do with it. In your C++ example, foo(bar const&); is basically the same as bar.foo();. At the end of the day, whether passing it in as an argument or accessing this via the method call syntax it's just a pointer to a struct. Not to mention, a C++ compiler can, and often does, choose to put even references to member variables in registers and access them that way within the method call.

This is a Python specific problem caused by everything being boxed by default and the interpreter does not even know what's in the box until it dereferences it, which is a problem that extends to the "self" object. In contrast in C++ the compiler knows everything there's to know about the type of this which avoids the issue.

reply
adrian17
4 hours ago
[-]
That's not true. I mean: it's true that it has little to do with OOP, but most imperative languages (only exception I know is Rust) have the issue, it's not "Python specific". For example (https://godbolt.org/z/aobz9q7Y9):

    #include <cstdio>

    struct S {
        const int x;
        int f() const;
    };

    int S::f() const {
        int a = x;
        printf("hello\n");
        int b = x;
        return a - b;
    }

The compiler can't reuse 'x' unless it's able to prove that it definitely couldn't have changed during the `printf()` call, and it's unable to prove that, so the member is loaded twice. C++ compilers can usually only prove it for trivial code with completely inlined functions that either don't mutate any external state or mutate it in a definitely-not-aliasing way (strict aliasing). (And the `const`s make no difference here at all.)

In Python the difference is that it can basically never prove it at all.
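
The Python side of this can be sketched as follows. `Sensor` and `sneaky` are hypothetical names; the point is only that an opaque call between two reads may rebind the attribute through another reference.

```python
class Sensor:
    def __init__(self):
        self.x = 1

    def read_twice(self, callback):
        a = self.x
        callback()  # opaque call: may do anything, including touching self
        b = self.x
        return a, b


s = Sensor()


def sneaky():
    s.x = 99  # mutates the same object through another reference


unchanged = Sensor().read_twice(lambda: None)
changed = s.read_twice(sneaky)
```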

reply
1718627440
4 hours ago
[-]
> This is a Python specific problem caused by everything being boxed by default and the interpreter does not even know what's in the box until it dereferences it

That's not the whole story. Every attribute access is a call to a lookup hook (__getattribute__, with __getattr__ as the fallback), which can return whatever object it wants.

bar.foo(...) is actually type(bar).__getattribute__(bar, 'foo')(...)

This dynamism is what makes Python Python and it allows you to wrap domain state in interface structure.

reply
josefx
4 hours ago
[-]
> This is a Python specific problem caused by everything being boxed

I would say it is part python being highly dynamic and part C++ being full of undefined behavior.

A c++ compiler will only optimize member access if it can prove that the member isn't overwritten in the same thread. Compatible pointers, opaque method calls, ... the list of reasons why that optimization can fail is near endless, C even added the restrict keyword because just having write access to two pointers of compatible types can force the compiler to reload values constantly. In python anything is a function call to some unknown code and any function could get access to any variable on the stack (manipulating python stack frames is fun).
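
As a sketch of the stack-frame point, assuming CPython (`sys._getframe` is an implementation detail, and the function names are made up), a callee can inspect the locals of whoever called it:

```python
import sys


def peek_at_caller():
    # sys._getframe is a CPython implementation detail, not a language guarantee.
    caller = sys._getframe(1)
    return dict(caller.f_locals)  # snapshot of the caller's local variables


def work():
    secret = 42
    label = "local"
    return peek_at_caller()


seen = work()
```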

Then there is the fun the C++ compiler gets up to with variables that are modified by different threads: while(!done) turning into while(true) because you didn't tell the compiler that done needs to be thread-safe is always fun.

reply
1718627440
3 hours ago
[-]
What is going on here is not just that an attribute might be changed concurrently, so that the interpreter can't optimize the access. That is also a consideration, but the major issue is that an attribute doesn't really refer to a single thing at all; instead it means whatever object is returned by a function call that implements a string lookup. __getattr__ is not an implementation detail of the language, but something that an object can implement how it wants to, just like __len__ or __gt__. It's part of the object behaviour, not part of the static interface. This is a fundamental design goal of the Python language.
reply
mathisfun123
7 hours ago
[-]
> same member value within the same function body are stable

Did you miss the part where I explained to you there's no way to identify that it's a member variable?

> Nobody in the real world expects this behaviour

As has already been explained to you by a sibling comment you are in fact wrong and there are in fact plenty of people in the real world who do actually expect this behavior.

So I'll repeat myself: lots of hottakes from just pure, unadulterated, possibly willful, ignorance.

reply
coldtea
5 hours ago
[-]
The above is a very thick response that doesn't address the parent's points, just sweeps them under the rug with "that's just how it was designed/it works".

"Did you miss the part where I explained to you there's no way to identify that it's a member variable?"

No, you did miss the case where that in itself can be considered nuts - or at least an unfortunate early decision.

"this just how things are dunn around diz here parts" is not an argument.

reply
1718627440
3 hours ago
[-]
> No, you did miss the case where that in itself can be considered nuts - or at least an unfortunate early decision.

This is not a side implementation detail that they got wrong; this is a fundamental design goal of Python. You can find that nuts, but then just don't use Python, because that is (one of) the things that make Python Python.

reply
mathisfun123
3 hours ago
[-]
> considered nuts - or at least an unfortunate early decision

Please explain to us then how exactly you would infer a variable with an arbitrary name is actually a reference to the class instance in an interpreted language.

reply
coldtea
2 hours ago
[-]
>Please explain to us then how exactly you would infer a variable with an arbitrary name is actually a reference to the class instance in an interpreted language.

Did I stutter when I wrote about "an unfortunate early decision"? Who said it has to be "an arbitrary name"?

Even so, you could add a bloody marker announcing an arbitrary name (which 99% would be self anyway) as so, as an instruction to the interpreter. If it fails, it fails, like countless other things that can fail during runtime in Python today.

reply
NetMageSCW
14 minutes ago
[-]
But now you are no longer talking about the way Python works, but the way you want Python to work - and that has nothing to do with Python.
reply
EE84M3i
4 hours ago
[-]
> the word `self` is not special in any way (it's just convention - you can call the first param to a method anything you want).

The name `self` is a convention, yes, but interestingly in Python methods the first parameter is special beyond the standard "bound method" stuff. See for example PEP 3135 (New Super, originally numbered PEP 367) for how `super()` resolution works (TL;DR the super function is a special builtin that generates extra code referencing the first parameter and the lexically defining class)
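
A minimal sketch of this, with made-up class names. Zero-argument `super()` works regardless of what the first parameter is called, and the hidden `__class__` cell refers to the class where the method was written.

```python
class Base:
    def greet(self):
        return "base"


class Child(Base):
    def greet(this):  # first parameter deliberately not named "self"
        # Zero-argument super() uses the first positional argument and the
        # hidden __class__ cell created because this method mentions super.
        return "child+" + super().greet()

    def defining_class(this):
        return __class__  # the lexically enclosing class, not type(this)


class GrandChild(Child):
    pass


result = GrandChild().greet()
cell_class = GrandChild().defining_class()
```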

reply
bmitc
9 hours ago
[-]
I don't think it's a hot take to say much of Python's design is nuts. It's a very strange language.
reply
stabbles
6 hours ago
[-]
That was how the Mojo language started. And then soon after the hype they said that being a superset of Python was no longer the goal. Probably because being a superset of Python is not a guarantee for performance either.
reply
Hendrikto
1 hour ago
[-]
Being a superset would mean all valid Python 3 is valid Python 4. A valuable property for sure, but not what OP suggested. In fact, it is the exact opposite.
reply
giancarlostoro
59 minutes ago
[-]
I went sort of this route in an experiment with Claude. I really want Python for .NET, but I said, damn the expense, prioritize .NET compatibility, remove anything that isn't feasibly supported. It means 0 Python libs, but all of NuGet is supported. The rules are that all signatures need types, and if you declare a type, it is that type, no exceptions, just like in C# (if you squint when looking at var in a funny way). I wound up with reasonable results, just a huge trade of the entire Python ecosystem for .NET with an insanely Python-esque syntax.

Still churning on it, will probably publish it and do a proper blog post once I've built something interesting with the language itself.

reply
coredog64
12 minutes ago
[-]
IronPython -> TitaniumPython?
reply
fermigier
2 hours ago
[-]
I have made some experiments with P2W, my experimental Python (subset) to WASM compiler. Initial figures are encouraging (5x speedup, on specific programs).

https://github.com/abilian/p2w

NB: some preliminary results:

  p2w is 4.03x SLOWER than gcc (geometric mean)

  p2w is 5.50x FASTER than cpython (geometric mean)

  p2w is 1.24x FASTER than pypy (geometric mean)
reply
wolvesechoes
5 hours ago
[-]
> Python really needs to take the Typescript approach of "all valid Python4 is valid Python3"

It is called type hints, and is already there. TS typing doesn't bring any perf benefits over plain JS.

reply
stabbles
5 hours ago
[-]
You really need dedicated types for `int64` and something like `final`. Consider:

    class Foo:
      __slots__ = ("a", "b")
      a: int
      b: float
there are multiple issues with Python that prevent optimizations:

* a user can define subtype `class my_int(int)`, so you cannot optimize the layout of `class Foo`

* the builtin `int` and `float` are big-int like numbers, so operations on them are branchy and allocating.

and the fact that Foo is mutable and that `id(foo.a)` has to produce something complicates things further.
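
To make the bullets concrete, here is a sketch reusing `Foo` and `my_int` from above. The `__add__` override is an invented example of what a subclass is allowed to do.

```python
class my_int(int):
    """User-defined subtype of int, as in the bullet above."""

    def __add__(self, other):
        return "surprise"  # subclasses may change arithmetic entirely


class Foo:
    __slots__ = ("a", "b")
    a: int
    b: float


foo = Foo()
foo.a = my_int(1)      # accepted: it is an int subclass
foo.b = "not a float"  # also accepted: annotations are not enforced at runtime
added = foo.a + 1

big = 2**64            # the builtin int grows past 64 bits without overflow
```

So neither the annotations nor `__slots__` tell the runtime what is actually stored in `a` and `b`.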

reply
wolvesechoes
4 hours ago
[-]
Maybe, but I quoted specific part I was replying to. TS has no impact on runtime performance of JS. Type hints in Python have no impact on runtime performance of Python (unless you try things like mypyc etc; actually, mypy provides `from mypy_extensions import i64`)

Therefore Python has no use for TS-like superset, because it already has facilities for static analysis with no bearing on runtime, which is what TS provides.

reply
wiseowise
4 hours ago
[-]
What OP means is that they need to:

1) Add TS like language on top of Python in backwards compatible way

2) Introduce frozen/final runtime types

3) Use 1 and 2 to drive runtime optimizations

reply
wolvesechoes
2 hours ago
[-]
Still makes no sense. OP demands introduction of different runtime semantics, but this doesn't require adding more language constructs (TS-like superset). Current type hints provide all necessary info on the language level, and it is a matter of implementation to use them or not.

From all posts it looks like what OP wants is a different language that looks somewhat like Python syntax-wise, so calling for "backwards-compatible" superset is pointless, because stuff that is being demanded would break compatibility by necessity.

reply
bloppe
10 hours ago
[-]
But that's just not what python is for. Move your performance-critical logic into a native module.
reply
wiseowise
4 hours ago
[-]
I’ll be happy if overnight all Python code in the world can reap 10-100x performance benefits without changing much of a codebase; you can continue having a soup of multiple languages.
reply
bloppe
8 minutes ago
[-]
Me too, but changing the referential semantics would be a massive breaking change. That doesn't qualify as "without changing much of a codebase". And tacking on a giant new orthogonal type system to avoid breaking existing code would be akin to creating a new language. Why bother when you can just write Python modules in Rust?
reply
drob518
43 minutes ago
[-]
I’d like to be good looking and drive a Ferrari. But that probably isn’t going to happen, either.
reply
BerislavLopac
1 hour ago
[-]
Any program written in Python of any significant size is literally a soup of multiple languages.
reply
LtWorf
7 minutes ago
[-]
There's no project that isn't like that.
reply
mattclarkdotnet
9 hours ago
[-]
Performance is one part of the discussion, but cleanliness is another. A Python4 that actually used typing in the interpreter, had value types, had a comptime phase to allow most metaprogramming to work (like monkey patching for tests) would be great! It would be faster, cleaner, easier to reason about, and still retain the great syntax and flexibility of the language.
reply
BerislavLopac
1 hour ago
[-]
> A Python4 that actually used typing in the interpreter, had value types, had a comptime phase to allow most metaprogramming to work (like monkey patching for tests) would be great! It would be faster, cleaner, easier to reason about, and still retain the great syntax and flexibility of the language.

And what prevents someone from designing such a language?

reply
LtWorf
5 minutes ago
[-]
PSF has full time employees. If someone else does it as a personal project it would remain a personal project and we'd never hear about it.
reply
mechsy
8 hours ago
[-]
I too see potential in this - it started feeling a bit weird in recent years switching between Go, Python and Rust codebases with Python code looking more and more like a traditional statically typed language and not getting the performance benefits. I know I know, there are libraries and frameworks which make heavy use of fun stuff you can do with strings (leading to the breakdown of even the latest and greatest IDE tooling and red squiggly lines all over your code) and don’t get me started on async etc.

Funnily enough I’ve found Python to be excellent for modelling my problem domain with Pydantic (so far basically unparalleled, open for suggestions in Go/Rust), while the language also gets out of my way when I get creative with list expressions and the like. So overall, still it is extremely productive for the work I’m doing, I just need to spin up more containers in prod.

reply
panzi
10 hours ago
[-]
Isn't rpython doing that, allowing changes on startup and then it's basically statically typed? Does it still exist? Was it ever production ready? I only once read a paper about it decades ago.
reply
mattclarkdotnet
10 hours ago
[-]
RPython is great, but it changes semantics in all sorts of ways. No sets for example. WTF? The native Set type is one of the best features of Python. Tuples also get mangled in RPython.
reply
rich_sasha
10 hours ago
[-]
I think sadly a lot of Python in the wild relies heavily, somewhere, on the crazy unoptimisable stuff. For example pytest monkey patches everything everywhere all the time.

You could make this clean break and call it Python 4 but frankly I fear it won't be Python anymore.
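
For readers who haven't seen it, this is the kind of runtime patching in question. The sketch uses the standard library's `unittest.mock` as a stand-in rather than pytest's own machinery, and `serialize` is a made-up function.

```python
import json
from unittest import mock


def serialize(payload):
    return json.dumps(payload)  # looks up json.dumps at call time


normal = serialize({"a": 1})

# Temporarily replace an attribute of the json module, as test tools often do.
with mock.patch.object(json, "dumps", return_value="patched"):
    patched = serialize({"a": 1})

restored = serialize({"a": 1})
```

Because `json.dumps` is looked up on every call, an optimizer that wanted to bind it once would have to detect and undo that binding when a patch like this is applied.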

reply
NetMageSCW
12 minutes ago
[-]
Perl 6 showed what happens when you do something like that.
reply
fyrn_
7 hours ago
[-]
As a person who has spent a lot of time with pytest, I'm ready for a testing framework that doesn't do any of that non-obvious stuff. I generally use unittest as much as I can these days; it's so much less _weird_ about how it does things. Like jeez, pytest, do you _really_ need to stress test every obscure language feature? Your job is to call tests.
reply
mattclarkdotnet
9 hours ago
[-]
Allowing metaprogramming at module import (or another defined phase) would cover most monkey patching use cases. From __future__ import python4 would allow developers to declare their code optimisable.
reply
musicale
8 hours ago
[-]
> Python really needs to take the Typescript approach of "all valid Python4 is valid Python3"

Great idea, but I'm not convinced that they learned anything from the Python 2 to 3 transition, so I wouldn't hold my breath.

If you want a language system without contempt for backward compatibility, you're probably better off with Java/C++/JavaScript/etc. (though using JS libraries is like building on quicksand.) Bit of a shame since I want to like Python/Rust/Swift/other modern-ish languages, but it turns out that formal language specifications were actually a pretty good idea. API stability is another.

reply
musicale
7 hours ago
[-]
is that you, python core dev team? ;-)
reply
dobremeno
4 hours ago
[-]
SPy [1] is a new attempt at something like this.

TL;DR: SPy is a variant of Python specifically designed to be statically compilable while retaining a lot of the "useful" dynamic parts of Python.

The effort is led by Antonio Cuni, Principal Software Engineer at Anaconda. Still very early days but it seems promising to me.

[1] https://github.com/spylang/spy

reply
BiteCode_dev
4 hours ago
[-]
There will be no Python 4, and 3.X policy requires forward compat, so we are already there.
reply
mattclarkdotnet
10 hours ago
[-]
Oh, and while we're at it, fix the "empty array is instantiated at parse time so all your functions with a default empty array argument share the same object" bullshit.
reply
zahlman
7 hours ago
[-]
We don't call them "arrays".

It has nothing to do with whether the list is empty. It has nothing to do with lists at all. It's the behaviour of default arguments.

It happens at the time that the function object is created, which is during runtime.

You only notice because lists are mutable. You should already prefer not to mutate parameters, and it especially doesn't make sense to mutate a parameter that has a default value because the point of mutating parameters is that the change can be seen by the caller, but a caller that uses a default value can't see the default value.

The behaviour can be used intentionally. (I would argue that it's overused intentionally; people use it to "bind" loop variables to lambdas when they should be using `functools.partial`.)

If you're getting got by this, you're fundamentally expecting Python to work in a way that Pythonistas consider not to make sense.
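
A short sketch of the points above, with made-up function names: the default is created once when `def` runs, and the same mechanism is what people lean on to bind loop variables, with `functools.partial` as the alternative.

```python
import functools


def append_to(item, bucket=[]):  # the default list is created once, at def time
    bucket.append(item)
    return bucket


first = append_to(1)
second = append_to(2)  # same list object as before

# Intentional use: freeze the current loop value in a default argument.
by_default = [lambda i=i: i * 10 for i in range(3)]


# The alternative suggested above: functools.partial.
def times_ten(i):
    return i * 10


by_partial = [functools.partial(times_ten, i) for i in range(3)]

# Without either trick, every lambda sees the final value of i.
late = [lambda: i * 10 for i in range(3)]
```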

reply
Revisional_Sin
5 hours ago
[-]
It's best practice to avoid mutable defaults even if you're not planning to mutate the argument.

It's just slightly annoying having to work around this by defaulting to None.
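
For reference, the usual workaround looks like this (`add_tag` is a hypothetical function):

```python
def add_tag(tag, tags=None):
    if tags is None:  # identity check against the None singleton
        tags = []     # a fresh list for every call that omits the argument
    tags.append(tag)
    return tags


a = add_tag("x")
b = add_tag("y")
shared = add_tag("z", a)  # an explicitly passed list is still mutated in place
```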

reply
Izkata
10 hours ago
[-]
Execution time, not parse time. It's a side effect of function declarations being statements that are executed, not the list/dict itself. It would happen with any object.
reply
mattclarkdotnet
9 hours ago
[-]
It's still ridiculous. A hypothetical Python4 would treat function declarations as declarations not executable statements, with no impact on real world code except to remove all the boilerplate checks.
reply
zahlman
7 hours ago
[-]
There is no such thing as a "function declaration" in Python. The keyword is "def", which is the first three letters of the word "define" (and not a prefix of "declare"), for a reason.

The entire point of it being an executable statement is to let you change things on the fly. This is key to how the REPL works. If I have `def foo(): ...` twice, the second one overwrites the first. There's no need to do any checks ahead of time, and it works the same way in the REPL as in a source file, without any special logic, for the exact same reason that `foo = 1` works when done twice. It's actually very elegant.
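
A sketch of what "executable statement" buys you, with made-up names: a `def` can sit inside an `if`, and running a second `def` simply rebinds the name while the first function object lives on.

```python
import sys

if sys.maxsize > 2**32:
    def word_size():  # which body gets bound depends on runtime state
        return 64
else:
    def word_size():
        return 32


def foo():
    return "first"


original = foo  # keep a reference to the first function object


def foo():  # executing def again simply rebinds the name
    return "second"
```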

People who don't like these decisions have plenty of other options for languages they can use. Only Python is Python. Python should not become not-Python in order to satisfy people who don't like Python and don't understand what Python is trying to be.

reply
1718627440
3 hours ago
[-]
You are describing a completely different language, that differs in very major ways from Python. You can of course create that, but please don't call it Python 4 !
reply
boxed
8 hours ago
[-]
You think so but then you write a function with a default argument pointing to some variable that is a list and now suddenly the semantics of that are... what?
reply
codesnik
7 hours ago
[-]
you could just treat argument initialization as an executable expression which is called every time you call a function. If you have a=[], then it's a new [] every time. If a=MYLIST then it's a reference to the same MYLIST. Simple. And most sane languages do it this way, I really don't know why python has (and maintains) this quirk.
reply
1718627440
3 hours ago
[-]
What are the semantics of the following:

    b = ComplexObject (...)
    # do things with b

    def foo (self, arg=b):
        # use b

    return foo
Should it create a copy of b every time the function is invoked? If you want that right now, you can just call b.copy(); but if a copy were always created, you could not implement the current behaviour.

Should the semantic of this be any different? :

    def foo (self, arg=ComplexObject (...)):
Now imagine a:

    ComplexObject = list
reply
codesnik
2 hours ago
[-]
I wonder, why that kind of ambiguity or complexity even comes to your mind at all. Just because python is weird?

def foo(self, arg=expression):

could, and should work as if it was written like this (pseudocode)

    def foo(self, arg?):
        if is_not_given(arg):
            arg = expression

if "expression" is a literal or a constructor, it'd be called right there and produce new object, if "expression" is a reference to an object in outer scope, it'd be still the same object.

it's a simple code transformation, very, very predictable behavior, and most languages with closures and default values for arguments do it this way. Except python.

reply
1718627440
2 hours ago
[-]
What you want is for an assignment in a function definition to be a lambda.

  def foo (self, arg=lambda : expression):
Assignment of unevaluated expressions is not a thing yet in Python and would be really surprising. If you really want that, that is what you get with a lambda.
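
A sketch of that lambda workaround, under the assumption that the function body calls the default itself and that callers who override it also pass a callable.

```python
def foo(arg=lambda: []):
    # The default is a zero-argument function; calling it builds a fresh list.
    value = arg()
    value.append(1)
    return value

a = foo()
b = foo()
print(a, b, a is b)  # two equal lists that are distinct objects
```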

> most languages with closures and default values for arguments do it this way.

Do these also evaluate function definitions at runtime?

reply
codesnik
1 hour ago
[-]
yes they do. check ruby for example.
reply
mattclarkdotnet
9 hours ago
[-]
Let's not get started on the cached shared object refs for small integers....
reply
zahlman
7 hours ago
[-]
What realistic use case do you have for caring about whether two integers of the same value are distinct objects? Modern versions of Python warn about doing unpredictable things with `is` exactly because you are not supposed to do those things. Valid use cases for `is` at all are rare.
reply
thaumasiotes
5 hours ago
[-]
> Valid use cases for `is` at all are rare.

There might not be that many of them, depending on how you count, but they're not rare in the slightest. For example, you have to use `is` in the common case where you want the default value of a function argument to be an empty list.
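
The idiom being referred to, as a minimal sketch. The names `append_item` and `bucket` are hypothetical.

```python
def append_item(item, bucket=None):
    # `is None` is an identity check against the None singleton;
    # a fresh list is created on each call that omits `bucket`.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket

print(append_item(1))  # [1]
print(append_item(2))  # [2], not [1, 2]
```

Testing `if not bucket` instead would also replace an empty list that the caller passed in deliberately, which is why the identity check is used.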

reply
exyi
1 hour ago
[-]
If you change this you break a common optimization:

https://github.com/python/cpython/blob/3.14/Lib/json/encoder...

Default value is evaluated once, and accessing parameter is much cheaper than global
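
A generic sketch of this kind of optimization, not the code behind the link above. The function `norm` and the parameter `_sqrt` are made up for illustration.

```python
import math

def norm(x, y, _sqrt=math.sqrt):
    # `_sqrt` is bound once, at definition time, and is then a local variable
    # inside the function instead of a global lookup plus an attribute lookup.
    return _sqrt(x * x + y * y)

print(norm(3, 4))  # 5.0
```

If defaults were re-evaluated on every call, the lookup of `math.sqrt` would move back into the call path and the trick would stop paying off.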

reply
zeratax
4 hours ago
[-]
there is PEP 671 for that, which introduces extra syntax for the behavior you want. people rely on the current behavior so you can't really change it
reply
adrian17
17 hours ago
[-]
I've been occasionally glancing at the PR/issue tracker to keep up to date with things happening with the JIT, but I've never seen where the high level discussions were happening; the issues and PRs always jumped right to the gritty details. Is there anywhere a high-level introduction/example of how trace projection vs recording work and differ? Googling for the terms often returns the CPython issue tracker as the first result, and the repo's jit.md is relatively barebones and rarely updated :(

Similarly, I don't entirely understand refcount elimination; I've seen the codegen difference, but since the codegen happens at build time, does this mean each opcode is possibly split into two (or more?) stencils, with and without removed increfs/decrefs? With so many opcodes and their specialized variants, how many stencils are there now?

reply
kenjin4096
3 hours ago
[-]
> I've never seen where the high level discussions were happening

Thanks for your interest. This is something we could improve on. We were supposed to document the JIT better in 3.15, but right now we're crunching for the 3.15 release. I'll try to get to updating the docs soon if there's enough interest. PEP 744 does not document the new frontend.

I wrote a somewhat high-level overview here in a previous blog post https://fidget-spinner.github.io/posts/faster-jit-plan.html#...

> does this mean each opcode is possibly split into two (or more?) stencils, with and without removed increfs/decrefs?

This is a great question, the answer is not exactly! The key is to expose the refcount ops in the intermediate representation (IR) as one single op. For example, BINARY_OP becomes BINARY_OP, POP_TOP (DECREF), POP_TOP (DECREF). That way, instead of optimizing for n operations, we just need to expose refcounting of n operations and optimize only 1 op (POP_TOP). Thus, we just need to refactor the IR to expose refcounting (which was the work I divided up among the community).
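
A toy model of the idea, to make the shape of the transformation concrete. The op names BINARY_OP and POP_TOP follow the comment; everything else (the tuple-based trace, the function names, the `no_decref_needed` set) is hypothetical and is not CPython's optimizer.

```python
def expose_refcounting(trace):
    """Make the two operand decrefs of each BINARY_OP explicit as POP_TOP ops."""
    out = []
    for op, operands in trace:
        if op == "BINARY_OP":
            lhs, rhs = operands
            out.append(("BINARY_OP", (lhs, rhs)))  # now only computes the result
            out.append(("POP_TOP", (rhs,)))        # decref of the right operand
            out.append(("POP_TOP", (lhs,)))        # decref of the left operand
        else:
            out.append((op, operands))
    return out

def eliminate_decrefs(trace, no_decref_needed):
    """Drop POP_TOP ops whose operand is known not to need a decref."""
    return [
        (op, operands)
        for op, operands in trace
        if not (op == "POP_TOP" and operands[0] in no_decref_needed)
    ]

trace = [("LOAD", ("a",)), ("LOAD", ("b",)), ("BINARY_OP", ("a", "b"))]
exposed = expose_refcounting(trace)
optimized = eliminate_decrefs(exposed, no_decref_needed={"a"})
print(optimized)
```

The point of the sketch is that the elimination pass only has to understand one op, however many instructions had their refcounting exposed.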

If you have any more questions, I'm happy to answer them either in public or email.

reply
kenjin4096
37 minutes ago
[-]
Update: I put up a PR to document the trace recording interpreter https://github.com/python/cpython/pull/146110
reply
flakes
17 hours ago
[-]
You’ll probably want to look to the PEPs. Havent dug into this topic myself but looks related https://peps.python.org/pep-0744/
reply
adrian17
17 hours ago
[-]
I think CPython already had tier2 and some tracing infrastructure when the copy-and-patch JIT backend was added; it's the "JIT frontend" that's more obscure to me.
reply
rtpg
8 hours ago
[-]
discussions might be happening on the Python forums, which are pretty active.

https://discuss.python.org/t/pep-744-jit-compilation/50756/8... here's one thing

I do think you can also just outright ask questions about it on the forums and you'll get some answers.

At the end of the day there's only so many people working on this though.

reply
saikia81
16 hours ago
[-]
have you read the dev mailing list? There the developers of python discuss lots.
reply
pansa2
16 hours ago
[-]
There isn’t a dev mailing list any more, is there? Do you mean the Discourse forum?
reply
sheepscreek
16 hours ago
[-]
UPDATE: I misunderstood the question :-/ You can ignore this.

I love playing with compilers for fun, so maybe I can shed some light. I’ll explain it in a simplified way for everyone’s benefit (going to ignore the stack):

When an object is passed between functions in Python, it doesn’t get copied. Instead, a reference to the object’s memory address is sent. This reference acts as a pointer to the object’s data. Think of it like a sticky note with the object’s memory address written on it. Now, imagine throwing away one sticky note every time a function that used a reference returns.

When an object has zero references, it can be freed from memory and reused. Ensuring the number of references, or the “reference count” is always accurate is therefore a big deal. It is often the source of memory leaks, but I wouldn’t attribute it to a speed up (only if it replaces GC, then yes).

reply
yuliyp
16 hours ago
[-]
what at all does this comment have to do with what it's replying to?
reply
sheepscreek
14 hours ago
[-]
I misread the original comment, thinking it was a question about what refcount elimination is, rather than how it affects the JIT's performance(?).
reply
owaislone
15 hours ago
[-]
Oh man, Python 2 > 3 was such a massive shift. Took almost half a decade if not more, and yet it was mainly changing superficial syntax stuff. They should have allowed ABIs to break and gotten these internal things done. Probably come up with a new, tighter API for integrating with other lower level languages so going forward Python internals can be changed more freely without breaking everything.
reply
scorpioxy
15 hours ago
[-]
The text encoding stuff wasn't a small change considering what it could break, at least. And remember we're sometimes talking about software that would cost a lot of money to migrate or upgrade. I still maintain some 2.x python code-bases that will be very expensive to migrate and the customer is not willing to invest that money.

Although your general sentiment is something I agree with (if it's going to be painful do it and get it over with), I don't believe anybody knew or could've guessed what the reaction of the ecosystem would be.

Your last point about being able to change internals more freely is also great in theory but very difficult (if not impossible) to achieve in practice.

I don't know. Having maintained some small projects that were free and open source, I saw the hostility and entitlement that can come from that position. And those projects were a speck of dust next to something like Python. So I think the core team is doing the best they can. It was always going to be damned if you do, damned if you don't.

reply
eru
12 hours ago
[-]
> I still maintain some 2.x python code-bases that will be very expensive to migrate and the customer is not willing to invest that money.

Slight tangent: if Claude can decimate IBM stock price by migrating off Cobol for cheap, surely we can do Python 2 to 3 now, too?

About the internals: we sort of missed an opportunity there, but back then they also didn't quite know what they were doing (or at least we have better ideas of what's useful today). And making the step from 2 to 3 even bigger might have been a bad idea?

reply
scorpioxy
11 hours ago
[-]
I wasn't aware that migrating projects off Cobol has become cheap and it would only take a Claude subscription.

In my experience, the problem had always been maintaining the business logic and any integrations with third-party software that also may be running legacy code-bases or have been abandoned. It can get quite complicated, from what I've seen. Now of course if you're talking about well maintained code-bases with 100%, or close to 100% test coverage, and that includes the integration part along with having the ability to maintain the user experience and/or user interface then yes it becomes a relatively easy process of "just write the code". But, in my experience, this has never been the case.

For the 2.x code-bases I maintain, the customers simply don't want to pay for any of it. They might choose to at a later time, but so far it has been more cost effective for them to pay me to maintain that legacy code than pay to have it migrated. Other customers have different needs and thus budget differently.

I'll refrain from judging if 2 to 3 was a missed opportunity or not. I believe the core team does actually know what they're doing and that any decision would've been criticized.

reply
Tempest1981
11 hours ago
[-]
IBM shares fell 13% in a single day last month:

"IBM Sinks Most Since 2000 as Anthropic Touts Cobol Tool"

https://finance.yahoo.com/news/ibm-sinks-most-since-2000-210...

It may not be "cheap", but possibly cheaper than IBM's consulting.

reply
scorpioxy
11 hours ago
[-]
I skip news like that. It's an AI business hyping one of their tools in a major AI hype-cycle. Shares can go up and down based on sentiment. My point still stands.

To me, there's a big difference between saying that migration projects can now be assisted with some AI tooling and saying that it is cheap and to just get Claude to do it.

Maybe I am out of touch but the former is realistic and the latter is just magical hand-waving.

reply
PurpleRamen
3 hours ago
[-]
Share-pricing operates on illusions. Just selling a plausible claim can influence the price. Whether they will deliver at the end, doesn't matter at that moment.
reply
eru
3 hours ago
[-]
Feel free to correct the market and make oodles of money.
reply
kelipso
2 hours ago
[-]
Risk my money based on a bunch of wallstreetbets idiots yoloing their money using a random number generator and seeing the word AI on twitter posts, sure lol. I’ll let you play in that cesspool.
reply
eru
1 hour ago
[-]
I am!
reply
Marazan
6 hours ago
[-]
IBM share price is back to where it was pre-Anthropic press release.
reply
thaumasiotes
5 hours ago
[-]
Sure, but imagine how much higher it would have gone in the counterfactual world where Anthropic didn't have an automatic port-from-Cobol tool.
reply
Maxion
5 hours ago
[-]
Remember that those who trade on the stock market are not programmers with decades of experience writing cobol.
reply
eru
10 hours ago
[-]
> I believe the core team does actually know what they're doing and that any decision would've been criticized.

I agree with the latter. About the former: they probably made good decisions given the information available at the time. I mean that nowadays they know more than they did in the past.

reply
CJefferson
8 hours ago
[-]
Absolutely, I had a 2 -> 3 code base I'd mostly given up on, and Claude was amazing. It even re-wrote some libraries I used that had no py3 versions, deciding to just write the parts of the libraries I needed.

It does much better with good tests. In my case the output was a statically generated website, so I could just say 'make the same website, given these inputs'.

reply
smcl
12 hours ago
[-]
I cannot believe people are still acting like Python 2->3 was a huge fuck-up and an enormous missed opportunity. When in reality Python is by most measures the most popular language and became so AFTER that switch.

Since the switch we have seen enormous companies being built from scratch. There is no reason for anyone to be complaining about it being too hard to upgrade in 2026

reply
rtpg
8 hours ago
[-]
Living through it... Python 3 made a lot of changes for the better but 3.0 in particular included a bunch of unforced errors that made it too hard for people to upgrade in one go.

It wasn't until much later (I would say 3.4 or 3.5?) that we had good tooling to allow for migrating from Python 2 to Python 3 gradually, which is what most tools needed to do.

The final thing that made Python upgrading easy was making a bunch of changes (along with stuff like six) so that you could write code that would run identically in Python 2 and Python 3. That lets you do refactors over time, little cleanups, and not have the huge "move to Python 3" commit.

reply
badsectoracula
10 hours ago
[-]
> Python is by most measures the most popular language and became so AFTER that switch

The switch had nothing to do with Python's rise in popularity though, it was because of NumPy and later PyTorch being adopted by data scientists and later for machine learning tasks that themselves became very popular. Python's popularity rose alongside those.

> There is no reason for anyone to be complaining about it being too hard to upgrade in 2026

The "complaints" are about unnecessary and pointless breakage that made upgrading very difficult for many codebases for years. That by now most of these codebases have been either abandoned, upgraded or decided to stick with Python2 until the end of time doesn't mean these pains didn't happen, nor that the language's developers inflicting them on their users was a good idea because some largely unrelated external factors made the language popular several years later.

reply
Izkata
9 hours ago
[-]
> that was very difficult for many codebases to upgrade for years.

In case people have forgotten: python 3.3 through 3.5 (and 3.6 I think) each had to reintroduce something that was removed to make the upgrade easier. Jumping from 2.7 to 3.3 (or higher depending on what you needed) was the recommended route because of this, it was less work than going to 3.0, 3.1, or 3.2

reply
20k
11 hours ago
[-]
It took a long time for python 3 to add the necessary backwards compatibility features to allow people to switch over. Once they did it was fine, but it was a massive fuck up until then. The migration took far longer than it should have done

It's widely regarded as a disaster for good reason, that forced some corrections in python to fix it. Just because it's fine now, does not mean it was always fine

reply
LtWorf
1 minute ago
[-]
Now they just break stuff every release so we never relax.
reply
bmitc
9 hours ago
[-]
Those are unrelated.
reply
nurettin
9 hours ago
[-]
The biggest (and worst planned) change was module names. Your imports didn't work, forcing hacks like

    if sys.version_info.major == 2:
        import old
    else:
        import new
Or worse, people used try/except in their imports.
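
For readers who never saw it, a sketch of the try/except import pattern. The `configparser` rename is one real example of a module whose name changed between Python 2 and 3; it is chosen here for illustration and is not taken from the comment.

```python
try:
    import configparser                  # Python 3 name
except ImportError:
    import ConfigParser as configparser  # Python 2 name

parser = configparser.ConfigParser()
print(type(parser).__name__)
```

The drawback is that an unrelated ImportError raised inside the first module is silently treated as "wrong Python version".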
reply
jmspring
11 hours ago
[-]
still GIL
reply
marcyb5st
3 hours ago
[-]
Opt-in starting from 3.15, or am I mistaken?

Anyway you can already try freethreaded builds that have the GIL disabled, but my experience is that most of your dependencies won't work.

reply
gjvc
15 hours ago
[-]
yes. it was not a massive shift. it was barely worth the effort.
reply
pansa2
15 hours ago
[-]
The Python devs didn’t want to make huge changes because they were worried Python 3 would end up taking forever like Perl 6. Instead they went to the other extreme and broke everyone’s code for trivial reasons and minimal benefit, which meant no-one wanted to upgrade.

Even the main driver for Python 3, the bytes-Unicode split, has unfortunately turned out to be sub-optimal. Python essentially bet on UTF-32 (with space-saving optimisations), while everyone else has chosen UTF-8.

reply
diziet_sma
14 hours ago
[-]
> Python essentially bet on UTF-32 (with space-saving optimisations)

How so? Python3 strings are unicode and all the encoding/decoding functions default to utf-8. In practice this means all the python I write is utf-8 compatible unicode and I don't ever have to think about it.

reply
sheept
13 hours ago
[-]
UTF-32 allows for constant time character accesses, which means that mystr[i] isn't O(n). Most other languages can only provide constant time access for code units.
reply
msl
6 hours ago
[-]
UTF-32 allows for constant time access to code points. Neither UTF-8 nor UTF-16 can do the same (there are slightly more than 2 to the power of 20 valid code points, though not all are in use).

While most characters might be encodable as a single code point, Python does not normalize strings, so there is no guarantee that even relatively normal characters are actually stored as single code points.

Try this in Python:

  s = "a\u0308"
  print(s)
  print(s[0])
You will see:

  ä
  a
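
As a follow-up sketch, normalizing explicitly with the standard library's `unicodedata` module composes the pair into one code point:

```python
import unicodedata

s = "a\u0308"                          # 'a' followed by U+0308 COMBINING DIAERESIS
nfc = unicodedata.normalize("NFC", s)  # composes to the single code point U+00E4

print(len(s), len(nfc))  # 2 1
print(nfc == "\u00e4")   # True
```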
reply
pansa2
13 hours ago
[-]
> all the encoding/decoding functions default to utf-8

Languages that use UTF-8 natively don't need those functions at all. And the ones in Python aren't trivial - see, for example, `surrogateescape`.

As the sibling comment says, the only benefit of all this encoding/decoding is that it allows strings to support constant-time indexing of code points, which isn't something that's commonly needed.

reply
laurencerowe
13 hours ago
[-]
They absolutely do because random byte strings are not valid utf8. Safe Rust requires validating bytes when converting to strings because of this.
reply
cloudbonsai
10 hours ago
[-]
Internally Python holds a string as an array of uint32. A utf-8 representation is created on demand from it (and cached). So pansa2 is basically correct [1].

IMO, while this may not be optimal, it's far better than the more arcane choice made by other systems. For example, due to reasons only Microsoft can understand, Windows is stuck with UTF-16.

[1] Actually it's more intelligent. For example, Python automatically uses uint8 instead of uint32 for ASCII strings.

reply
zahlman
7 hours ago
[-]
There is no caching of a "utf-8 representation". You may check for example:

  >>> x = '日本語'*100000000
  >>> import time
  >>> t = time.time(); y = x.encode(); time.time() - t # takes nontrivial time
  >>> t = time.time(); y = x.encode(); time.time() - t # not cached; not any faster
Generally, the only reason this would happen implicitly is for I/O; actual operations on the string operate directly on the internal representation.

Python uses either 8, 16 or 32 bits per character according to the maximum code point found in the string; uint8 is thus used for all strings representable in Latin-1, not just "ASCII". (It does have other optimizations for ASCII strings.)
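
A rough way to observe those widths from Python, assuming CPython (this measures an implementation detail, and `bytes_per_char` is a hypothetical helper):

```python
import sys

def bytes_per_char(ch):
    # Compare two lengths of the same repeated character so the fixed
    # object header cancels out, leaving only the per-character cost.
    return (sys.getsizeof(ch * 2001) - sys.getsizeof(ch * 1001)) // 1000

for ch in ("a", "\u00e9", "\u65e5", "\U0001f600"):
    print(f"U+{ord(ch):04X}: {bytes_per_char(ch)} byte(s) per character")
```

The second sample, U+00E9, is outside ASCII but inside Latin-1, which is the case the paragraph above distinguishes.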

The reason for Windows being stuck with UTF-16 is quite easy to understand: backwards compatibility. Those APIs were introduced before there were supplementary Unicode planes, such that "UTF-16" could be equated with UCS-2; then the surrogate-pair logic was bolted on top of that. Basically the same thing that happened in Java.

reply
cloudbonsai
4 hours ago
[-]
> There is no caching of a "utf-8 representation".

No there certainly is. This is documented in the official API documentation:

    UTF-8 representation is created on demand and cached in the Unicode object.

    https://docs.python.org/3/c-api/unicode.html#unicode-objects
In particular, Python's Unicode object (PyUnicodeObject) contains a field named utf8. This field is populated when PyUnicode_AsUTF8AndSize() is first called and reused thereafter. You can check the exact code I'm talking about here:

https://github.com/python/cpython/blob/main/Objects/unicodeo...

Is it clear enough?

reply
zahlman
7 hours ago
[-]
> Python essentially bet on UTF-32 (with space-saving optimisations), while everyone else has chosen UTF-8.

It did nothing of the sort. UTF-8 is the default source file encoding and has been the target for many APIs. It likely would have been the default for all I/O stuff if we lived in a world where Windows had functioning Unicode in the terminal the whole time and didn't base all its internal APIs on UTF-16.

I assume you're referring to the internal representation of strings. Describing it as "UTF-32 with space-saving optimizations" is missing the point, and also a contradiction in terms. Yes, it is a system that uses the same number of bytes per character within a given string (and chooses that width according to the string contents). This makes random access possible. Doing anything else would have broken historical expectations about string slicing. There are good arguments that one shouldn't write code like that anyway, but it's hard to identify anything "sub-optimal" about the result except that strings like "I'm learning 日本語" use more memory than they might be able to get away with. (But there are other strings, like "ℍℯℓ℗", that can use a 2-byte width while the UTF-8 encoding would need 3 bytes per character.)
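
The two example strings can be compared directly. This is a sketch: `sizes` is a hypothetical helper, and the fixed-width figure is computed from the width rule (1, 2, or 4 bytes chosen by the largest code point) rather than read from the interpreter.

```python
def sizes(s):
    utf8_bytes = len(s.encode("utf-8"))
    top = max(map(ord, s))
    width = 1 if top < 0x100 else 2 if top < 0x10000 else 4
    return len(s), utf8_bytes, len(s) * width

for s in ("I'm learning 日本語", "ℍℯℓ℗"):
    chars, utf8_bytes, fixed_bytes = sizes(s)
    print(f"{s!r}: {chars} chars, {utf8_bytes} UTF-8 bytes, {fixed_bytes} fixed-width bytes")
```

The first string is smaller in UTF-8 (22 bytes against 32), the second is smaller in the fixed-width form (8 bytes against 12).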

reply
rjh29
14 hours ago
[-]
Ironically Perl 5 managed to do the bytes-Unicode split with a feature gate, no giant major version change.
reply
gjvc
9 hours ago
[-]
this must be right, i'm getting downvoted
reply
zahlman
7 hours ago
[-]
Please don't do this.
reply
boxed
8 hours ago
[-]
It's wrong. Python3 eliminated mountains of annoying bugs that happened all over the code base because of mixing of unicode strings and byte strings. Python2 was an absolute mess.
reply
rslashuser
15 hours ago
[-]
I'm curious if the JIT developers could mention any Python features that prevent promising JIT features. An earlier Ken Jin blog [1] mentions how __del__ complicates reference counting optimization.

There is a story that Python is harder to optimize than, say, Typescript, with Python flexibility and the C API getting mentioned. Maybe, if the list of troublesome Python features was out there, programmers could know to avoid those features with the promise of activating the JIT when it can prove the feature is not in use. This could provide a way out of the current Python hard-to-JIT trap. It's just a gist of an idea, but certainly an interesting first step would be to hear from the JIT people which Python features they find troublesome.

[1] https://fidget-spinner.github.io/posts/faster-jit-plan.html

reply
rtpg
14 hours ago
[-]
It's interesting you mention __del__ because Javascript not only doesn't have destructors, but for security reasons (that are above my pay grade) the spec _explicitly prohibits_ implementations from allowing visibility into garbage collection state, meaning that code cannot have any visibility into deallocations.

I think __del__ is tricky though. In theory __del__ is not meant to be reliable. In practice CPython reliably calls it cuz it reference counts. So people know about it and use it (though I've only really seen it used for best effort cleanup checks)

In a world where more people were using PyPy we could have pressure from that perspective to avoid leaning into it. And that would also generate more pressure to implement code that is performant in "any" system.

reply
cpgxiii
13 hours ago
[-]
> In practice CPython reliably calls it cuz it reference counts ... In a world where more people were using PyPy we could have pressure from that perspective to avoid leaning into it

A big part of the problem is that much of the power of the Python ecosystem comes specifically from extensions/bindings written in languages with manual (C) or RAII/ref-counted (C++, Rust) memory management, and having predictable Python-level cleanup behavior can be pretty necessary to making cleanup behavior in bound C/C++/Rust objects work. Breaking this behavior or causing too much of a performance hit is basically a non-starter for a lot of Python users, even if doing so would improve the performance of "pure" Python programs.

reply
mattip
9 hours ago
[-]
That cleanup can be explicit when needed by using context managers. Mixing resource handling with object lifetime is a bad design choice
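
A minimal sketch of explicit cleanup with a context manager. `Connection` is a hypothetical resource used only for illustration.

```python
class Connection:
    def __init__(self):
        self.open = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow exceptions

    def close(self):
        self.open = False

with Connection() as conn:
    assert conn.open

print(conn.open)  # False: closed at the end of the block, independent of __del__
```

The object itself is still alive after the block; only the resource has been released, so nothing here depends on when the interpreter frees memory.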
reply
1718627440
1 hour ago
[-]
Tell that to the C++ guys...
reply
nvme0n1p1
14 hours ago
[-]
> code cannot have any visibility into deallocations

Doesn't FinalizationRegistry let you do exactly that?

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...

reply
alpinisme
13 hours ago
[-]
That link itself calls out that conformant implementations can’t be relied on to call callbacks.

> A conforming JavaScript implementation, even one that does garbage collection, is not required to call cleanup callbacks. When and whether it does so is entirely down to the implementation of the JavaScript engine. When a registered object is reclaimed, any cleanup callbacks for it may be called then, or some time later, or not at all. It's likely that major implementations will call cleanup callbacks at some point during execution, but those calls may be substantially after the related object was reclaimed. Furthermore, if there is an object registered in two registries, there is no guarantee that the two callbacks are called next to each other — one may be called and the other never called, or the other may be called much later. There are also situations where even implementations that normally call cleanup callbacks are unlikely to call them:

reply
bastawhiz
13 hours ago
[-]
It's supported in all of the major engines. And you also can't rely on the garbage collector to run at a predictable time (or at all!), so the engine never calling finalizers is functionally the same as the garbage collector being unusual.
reply
sfink
8 hours ago
[-]
The only (other) visible effect of GC not running is memory exhaustion. WeakRef/FinalizationRegistry not getting triggered can have lots of script-visible effects, so can be much much worse. I wouldn't describe that as "functionally the same".
reply
rtpg
13 hours ago
[-]
Oh! While this one does mention that you don't have visibility, this + weak refs seem to change the game

I remember a couple of years ago (well probably around 2021) reading about GC exposure concerns and seeing some line in some TC39 doc like "users should not have visibility into collection" but if we've shipped weakrefs sounds like we're not thinking about that anymore

reply
sfink
8 hours ago
[-]
We still try to limit any additional exposure as much as possible, and WR/FG are specced to keep the visibility as coarse as possible. (Collections won't be visible until the current script execution finishes, though async adds a lot more places where that can happen.)

A proposal to add new ways of observing garbage collection will still be shot down immediately without a damn good justification.

reply
jonathanlydall
9 hours ago
[-]
> meaning that code cannot have any visibility into deallocations.

This is more pedantry than a serious question. JavaScript has WeakRef; sure, it'd be cumbersome and inefficient because you'd need to manually make and poll each thing you wanted to observe, but could it not be said that it does provide a view on deallocations?

reply
sfink
8 hours ago
[-]
Yes, WeakRef and FinalizationRegistry both make GC visible (the latter removes the need to poll in your example). So not pedantic at all. They were eventually added after much reluctance from the language designers and implementers, partly because they can lead to code being broken by (valid & correct) engine optimizations, which is a big no-no on the web. But some things simply cannot be implemented without them.

Note that 90% of the uses for them actually shouldn't be using them, usually for subtle reasons. It's always a big cause for debate.

reply
kstrauser
12 hours ago
[-]
Huh, I could imagine that as a set of Ruff rules:

> Using str.frobnicate prevents TurboJit on line 63

reply
adgjlsfhk1
14 hours ago
[-]
The biggest thing is BigInt by default. It makes every integer operation require an overflow check.
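
The semantics in question, as a small sketch. It only shows what a program can observe, not how the interpreter or JIT implements the check.

```python
big = 2**63 - 1   # largest value of a signed 64-bit integer
bigger = big + 1  # no wraparound: the result simply needs more bits

print(bigger)               # 9223372036854775808
print(bigger.bit_length())  # 64
```

Because any addition may cross that boundary, compiled code cannot use a plain machine add without checking the result.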
reply
ridiculous_fish
12 hours ago
[-]
JS (when using ints, which v8 does) is the same in this respect.
reply
vanderZwan
15 hours ago
[-]
> However, I misunderstood and came up with an even more extreme version: instead of tracing versions of normal instructions, I had only one instruction responsible for tracing, and all instructions in the second table point to that. Yes I know this part is confusing, I’ll hopefully try to explain better one day. This turned out to be a really really good choice. I found that the initial dual table approach was so much slower due to a doubling of the size of the interpreter, causing huge compiled code bloat, and naturally a slowdown.

> By using only a single instruction and two tables, we only increase the interpreter by a size of 1 instruction, and also keep the base interpreter ultra fast. I affectionally call this mechanism dual dispatch.

I really do hope they'll write that better explanation one day because this sounds pretty intriguing all on its own.

reply
oystersareyum
18 hours ago
[-]
> We don’t have proper free-threading support yet, but we’re aiming for that in 3.15/3.16. The JIT is now back on track.

I recently read an interview about implementing free-threading and getting modifications through the ecosystem to really enable it: https://alexalejandre.com/programming/interview-with-ngoldba...

The guy said he hopes the free-threaded build'll be the only one in "3.16 or 3.17", I wonder if that should apply to the JIT too or how the JIT and interpreter interact.

reply
zarzavat
16 hours ago
[-]
I continue to believe that free-threading hurts performance more than it helps and Python should abandon it.

Having to have thread safe code all over the place just for the 1% of users who need to have multi-threading in Python and can't use subinterpreters for some reason is nuts.

reply
cpgxiii
13 hours ago
[-]
> Having to have thread safe code all over the place just for the 1% of users who need to have multi-threading in Python and can't use subinterpreters for some reason is nuts.

Way more than 1% of the community, particularly of the community actively developing Python, wants free-threaded. The problem here is that the Python community consists of several different groups:

1. Basically pure Python code with no threading

2. Basically pure Python with appropriate thread safety

3. Basically pure Python code with already broken threaded code, just getting lucky for now

4. Mixed Python and C/C++/Rust code, with appropriate threading behavior in the C or C++ components

5. Mixed Python and C or C++ code, with C and C++ components depending on GIL behavior

Group 1 gets slightly reduced performance. Groups 2 and 4 get a major win with free-threaded Python, being able to use threading through their interfaces to C/C++/Rust components. Group 3 is already writing buggy code and will probably see worse consequences from their existing bugs. Group 5 will have to either avoid threading in their Python code or rewrite their C/C++ components.

Right now, a big portion of the Python language developer base consists of Groups 2 and 4. Group 5 is basically perceived as holding Python-the-language and Python-the-implementations back.

reply
zarzavat
6 hours ago
[-]
Where is the major win? Sorry but I just don't see the use case for free-threading.

Native code can already be multi-threaded so if you are using Python to drive parallelized native code, there's no win there. If your Python code is the bottleneck, well then you could have subinterpreters with shared buffers and locks. If you really need to have shared objects, do you actually need to mutate them from multiple interpreters? If not, what about exploring language support for frozen objects or proxies?

The only thing that free threading gives you is concurrent mutations to Python objects, which is like, whatever. In all my years of writing Python I have never once found myself thinking "I wish I could mutate the same object from two different threads".

reply
reinhash
3 hours ago
[-]
I also wonder how many people actually need free-threading. And I wonder how useful it will be, when you can already use the ABI to call multi-threaded code.

I think the GIL provides Python with a great guarantee; to be honest, I would probably prefer single-thread performance improvements over multithreading in Python.

Anyway if I need performance, Python would probably not be my first choice

reply
pansa2
15 hours ago
[-]
Maybe they could have two versions of the interpreter, one that’s thread-safe and one that’s optimised for single-threading?

Microsoft used to do this for their C runtime library.

reply
chuckadams
14 hours ago
[-]
PHP does this as well. Most distributions ship PHP without thread safety, but it's seeing more use now that FrankenPHP uses it. Speaking of which, it would be nice if PHP's JIT got a little love: it's never eked out more than marginal gains in heavily-numeric code.
reply
veber-alex
15 hours ago
[-]
That's exactly what we have now, and it looks like the Python devs want a single unified build at some point.
reply
kzrdude
16 hours ago
[-]
I don't want to go too heavy on the negatives, but what's nuts is Python going for trust-the-programmer style multithreading. The risk is that extension modules could cause a lot of crashes.
reply
gwking
12 hours ago
[-]
My understanding is that many extension modules are already written to take advantage of multithreading by releasing the GIL when calling into C code. This allows true concurrency in the extension, and also invites all the hazards of multithreading. I wonder how many bugs will be uncovered in such extensions by the free threaded builds, but it seems like the “nuts” choice actually happened a long time ago.
reply
zadikian
9 hours ago
[-]
Pure Python code always needed mutexes for thread safety with or without ol' GIL. I thought the difficulty with removing the GIL instead had to do with C extensions that rely on it.
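To make that concrete, here is a minimal sketch of the kind of locking pure Python code has always needed for compound updates, GIL or not. The `Counter` class and `run` helper are made up for illustration; only `threading.Lock` and `threading.Thread` are real standard-library APIs.

```python
import threading


class Counter:
    """Hypothetical shared counter; the lock guards the read-modify-write step."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # `self.value += 1` is a load, an add, and a store. A thread switch
        # between those steps can lose an update, with or without a GIL,
        # so the whole step is done while holding the lock.
        with self._lock:
            self.value += 1


def run(n_threads=4, n_increments=10_000):
    """Increment one shared counter from several threads and return the total."""
    counter = Counter()

    def work():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


total = run()
```

With the lock in place the total is always `n_threads * n_increments`; dropping the lock makes that a matter of luck rather than a guarantee.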
reply
pjmlp
6 hours ago
[-]
Great to see this going. Python also deserves a JIT, and given that only a few bother with PyPy or GraalPy, shipping it in CPython is the only way to have less "rewrite into XYZ".

Kudos to those involved in making it happen.

reply
a3w
1 hour ago
[-]
Over 100% speedup sounds like "the code compiled before you asked the compiler to start working".

`from future import time_travel`

reply
quietbritishjim
43 minutes ago
[-]
If the speed of a car increases by 100% does that mean that it arrives at its destination before it left? No, it just means it took 50% of the time it would have otherwise.

But I do agree that it would be a bit clearer to talk in terms of time taken rather than speedup % i.e. instead of "20% slowdown to over 100% speedup" it's clearer to say "takes between 50% and 125% of the original time". (Especially since people very often say things like "3 times faster", which technically means 4 times as fast, when they should say "3 times as fast"; "takes 1/3 of the time" is unambiguous.)
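As a rough sketch of that conversion (the helper name `time_fraction` is made up here, and it assumes "N% speedup" means the speed is multiplied by 1 + N/100, which is the reading used above):

```python
def time_fraction(speed_change_pct):
    """Fraction of the original time taken after a speed change in percent.

    +100 means twice as fast; -20 means a 20% slowdown (0.8x the speed).
    """
    speed_ratio = 1 + speed_change_pct / 100
    if speed_ratio <= 0:
        raise ValueError("a slowdown of 100% or more never finishes")
    # Time is inversely proportional to speed.
    return 1 / speed_ratio


doubled = time_fraction(100)            # "100% speedup": half the time
slowed = time_fraction(-20)             # "20% slowdown": about 125% of the time
three_times_faster = time_fraction(300)   # "3 times faster" read literally: 4x speed
three_times_as_fast = time_fraction(200)  # "3 times as fast": 3x speed
```

Under that reading, no finite speedup percentage ever gets the time to zero, let alone below it.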

reply
ekjhgkejhgk
18 hours ago
[-]
Doesn't PyPy already have a jit compiler? Why aren't we using that?
reply
olivia-banks
17 hours ago
[-]
As far as I know, PyPy doesn't support all CPython extensions, so pure Python code will probably (very likely) run fine but for other things most bets are off. I believe PyPy also only supports up to 3.11?
reply
hrmtst93837
16 hours ago
[-]
PyPy isn't CPython.

A lot of Python code still leans on CPython internals, C extensions, debuggers, or odd platform behavior, so PyPy works until some dependency or tool turns that gap into a support problem.

The JIT helps on hot loops, but for mixed workloads the warmup cost and compatibility tax are enough to keep most teams on the interpreter their deps target first.

reply
contravariant
17 hours ago
[-]
Why shouldn't the reference implementation get JIT? Just because some other implementations already have it is no reason not to. That'd be like skipping list comprehensions because they already exist in CPython.
reply
3laspa
17 hours ago
[-]
Because the same people who made a big deal about supporting PyPy and PEP 399 when it was fashionable to do so are now told by their corporations that PyPy does not matter. CPython only moves with what is currently fashionable, employer mandated and profitable.
reply
cpburns2009
17 hours ago
[-]
PyPy is limited to maintenance mode due to a lack of funding/contributors. In the past, I think a few contributors or funding is what helped push "minor" PyPy versions. It's too bad PyPy couldn't take the federal funding the PSF threw away.
reply
philipallstar
3 hours ago
[-]
> It's too bad PyPy couldn't take the federal funding the PSF threw away.

The PSF is primarily a political advocacy organisation, so it wouldn't make sense for them to use the money for Python.

reply
JoshTriplett
17 hours ago
[-]
Because PyPy seems to be defunct. It hasn't updated for quite a while.

See https://github.com/numpy/numpy/issues/30416 for example. It's not being updated for compatibility with new versions of Python.

reply
mkl
17 hours ago
[-]
reply
qy-mj
3 hours ago
[-]
What performance improvements will I see if I upgrade from version 3.10 to 3.15?
reply
ghm2199
15 hours ago
[-]
Thanks for all the amazing work! I have a noob question. Wouldn't this get the funding back? Or would that not be the preferable way to continue (as opposed to just volunteer-driven)?

Like, it's a big deal to get a project to a state where volunteers are spun up, actively breaking down tasks, and getting work done, no? It's a Python JIT, something I know next to nothing about (as is true of most application developers), which tells one how difficult this must have been.

reply
pansa2
14 hours ago
[-]
> Wouldn't this get the funding back?

The funding was Microsoft employing most of the team. They were laid off (or at least, moved onto different projects), apparently because they weren't working on AI.

reply
kelvinjps
13 hours ago
[-]
With Python being the main language for AI, isn't it more important for it to be performant? I kinda don't get Microsoft's reasoning; maybe they're just tight on money.
reply
brianwawok
12 hours ago
[-]
I don’t think Python is the main language of AI.
reply
eru
11 hours ago
[-]
Python is pretty big as glue in the AI ecosystem as far as I can tell. It also seems to be most agents' 'preferred' language to write code in, when you don't specify anything.

(The latter is probably more to do with the preferences they give it in the reinforcement learning phase than anything technical, though.)

reply
Ralfp
14 hours ago
[-]
It looks like ARM picked up plenty of those folk and pays them to continue this work.
reply
thunky
15 hours ago
[-]
I always wanted this for Python but now that machines write code instead of humans I feel like languages like Python will not be needed as much anymore. They're made for humans, not machines. If a machine is going to do the dirty work I want it to produce something lean, fast, and strictly verified.
reply
bigstrat2003
7 hours ago
[-]
> now that machines write code instead of humans

That is not remotely the case for anyone who produces quality work.

reply
thunky
2 hours ago
[-]
Look again.

If you care about quality you absolutely can guide a machine to produce that for you without writing a single line of code yourself.

And I expect the amount of guidance needed will continue to drop.

reply
zahlman
7 hours ago
[-]
We got daguerreotypes, and then photographic film, and then digital cameras, along with image editing software, and now AI image generation systems; yet there are still people who go out and apply oil paints to a canvas with natural hair brushes. I'm not willing to lose that.
reply
JodieBenitez
14 hours ago
[-]
Pretty much my thoughts the other day... now that Codex does the writing, maybe I can finally switch to Go for the web backend stuff without being annoyed by some of its archaisms and gain significant execution performance, while still having a relatively easy to read language.
reply
kccqzy
14 hours ago
[-]
You ask a machine to write your code and you still care about being easy to read?

In my experience the people who care the most about code readability tend to be the people most opinionated on having the right abstractions, which are historically not available in Go.

reply
thunky
14 hours ago
[-]
I don't think people mind reading Go as much as they mind writing it.
reply
kccqzy
13 hours ago
[-]
Nah, all the `if err != nil` is just so much noise that it obscures the real logic. And for the longest time it didn’t have generics to write map/filter/reduce on slices, forcing people to use loops where the intention is less clear.
reply
maleldil
11 hours ago
[-]
Ideally, the errors shouldn't be returned as-is, but wrapped with context instead. If that context doesn't matter for you, you can have your editor wrap the if instead, which helps a lot.
reply
brianwawok
12 hours ago
[-]
I have shifted as much as I can from Python to Go when I don’t code. It’s just faster and the compiler catches more errors, win-win.
reply
ddorian43
6 hours ago
[-]
AI, write me that sqlalchemy clone in <lang>
reply
ecshafer
17 hours ago
[-]
What is wrong with the Python code base that makes this so much harder to implement than seemingly all other code bases? Ruby, PHP, JS. They all seemed to add JITs in significantly less time. A Python JIT has been asked for for like 2 decades at this point.
reply
0cf8612b2e1e
17 hours ago
[-]
The Python C API leaks its guts. Too much of the internal representation was made available for extensions and now basically any change would be guaranteed to break backwards compatibility with something.
reply
patmorgan23
16 hours ago
[-]
Ooo, this makes sense. It's like if Linux had "don't break user space" AND a whole bunch of other purely internal APIs you also can't refactor.
reply
echelon
16 hours ago
[-]
It's a shame that the Python 2->3 transition was so painful, because Python could use a few more clean breaks with the past.

This would be a potential case for a new major version number.

reply
froobius
16 hours ago
[-]
On the other hand, taking backwards compatibility so seriously is a big part of the massive success of Python
reply
__mharrison__
16 hours ago
[-]
I would argue that the libraries, and specifically NumPy, are the reason Python is still in the picture today.

It will be interesting to see, moving forward, what languages survive. A 15% perf increase seems nice, until you realize that you get a 10x increase porting to Rust (and the AI does it for you).

Maybe library use/popularity is somewhat related to backwards compatibility.

Disclaimer: I teach Python for a living.

reply
kelvinjps
12 hours ago
[-]
Python is a language that has really good libraries for different domains, like web (Django/Flask), AI (NumPy, PyTorch), and more. There's the whole ecosystem for scripting, and it's already installed in most Linux distros and on Macs. For GUI it has really good bindings for the major frameworks: Qt and GTK.
reply
punnerud
15 hours ago
[-]
And PyTorch, and Pandas, and, and…
reply
__mharrison__
15 hours ago
[-]
Built on and/or inspired by NumPy...
reply
B1FF_PSUVM
15 hours ago
[-]
> you get a 10x increase porting to Rust (and the AI does it for you)

So, you keep reading/writing Python and push a button to get binary executables through whatever hoops are best today?

(I haven't seen the "fits your brain" tagline in the recent past ...)

reply
pansa2
16 hours ago
[-]
>> Python 2->3 transition

> taking backwards compatibility so seriously

Python’s backward compatibility story still isn’t great compared to things like the Go 1.x compatibility promise, and languages with formal specs like JS and C.

The Python devs still make breaking changes, they’ve just learned not to update the major version number when they do so.

reply
BarryMilo
15 hours ago
[-]
Indeed, Python's version format is semver but it's just aesthetics, they remove stuff in most (every?) minor version. Just yesterday I wasted hours trying to figure out a bug before realizing my colleague hadn't read the patch notes.
reply
kccqzy
15 hours ago
[-]
Python does not take backwards compatibility seriously. 2 to 3 is a big compatibility break. But things like `map(None, seq1, seq2)` also broke; such a deliberate compatibility break is motivated by no more than aesthetic purity.
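For anyone porting such code: in Python 2, `map(None, seq1, seq2)` zipped the sequences and padded the shorter one with `None`. A minimal sketch of a Python 3 stand-in follows; the helper name `map_none` is made up for illustration, and it only covers the case of two or more sequences.

```python
from itertools import zip_longest


def map_none(*seqs):
    """Rough Python 3 stand-in for Python 2's map(None, seq1, seq2, ...).

    zip_longest pads the shorter inputs with None by default, which matches
    the padding the old idiom relied on.
    """
    return list(zip_longest(*seqs))


pairs = map_none([1, 2, 3], ["a", "b"])
# pairs == [(1, "a"), (2, "b"), (3, None)]
```

Plain `zip` is not a drop-in replacement here, since it stops at the shortest input instead of padding.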
reply
IshKebab
16 hours ago
[-]
Python does not take backwards compatibility very seriously at all. Take a look at all the deprecated APIs.

I would say it's probably worth it to clean up all the junk that Python has accumulated... But it's definitely not very high up the list of languages in terms of backwards compatibility. In fact I'm struggling to think of other languages that are worse. Typescript probably? Certainly Go, C++ and Rust are significantly better.

reply
hardwaregeek
16 hours ago
[-]
For what it’s worth Ruby’s JIT took several different implementations, definitely struggled with Rails compatibility and literally used some people’s PhD research. It wasn’t a trivial affair
reply
fleetfox
4 hours ago
[-]
I can't really talk about Ruby. But PHP is much more static, the surface of things you have to care about at runtime is like an order of magnitude smaller, and there already was opcache as a starting point. And speaking of JS, the JIT in V8 is one of the most sophisticated and complicated ever built. There haven't been nearly enough man-hours and funding put into CPython to make it a fair comparison.
reply
stmw
17 hours ago
[-]
Some languages are much harder to compile well to machine code. Some big factors (for any languages) are things like: lack of static types and high "type uncertainty", other dynamic language features, established inefficient extension interfaces that have to be maintained, unusual threading models...
reply
RussianCow
17 hours ago
[-]
That makes sense if you're comparing with Java or C#, but not Ruby, which is way more dynamic than Python.

The more likely reason is that there simply hasn't been that big a push for it. Ruby was dog slow before the JIT and Rails was very popular, so there was a lot of demand and room for improvement. PHP was the primary language used by Facebook for a long time, and they had deep pockets. JS powers the web, so there's a huge incentive for companies like Google to make it faster. Python never really had that same level of investment, at least from a performance standpoint.

To your point, though, the C API has made certain types of optimizations extremely difficult, as the PyPy team has figured out.

reply
vlovich123
16 hours ago
[-]
Google, Dropbox, and Microsoft from what I can recall all tried to make Python fast so I don’t buy the “hasn’t seen a huge amount of investment”. For a long time Guido was opposed to any changes and that ossified the ecosystem.

But the main problem was actually that pypy was never adopted as “the JIT” mechanism. That would have made a huge difference a long time ago and made sure they evolved in lock step.

reply
int_19h
15 hours ago
[-]
Microsoft is the one TFA refers to cryptically when it says "the Faster CPython team lost its main sponsor in 2025".

AFAIK it was not driven by anything on the tech side. It was simply unlucky timing, the project getting caught in the middle of Microsoft's heavy-handed push to cut everything. So much so that the people who were hired by MS to work on this found out they were laid off in the middle of a conference where they were giving talks on it.

reply
flykespice
16 hours ago
[-]
> Python never really had that same level of investment, at least from a performance standpoint.

Or lack of incentive?

A lot of big Python projects that do machine learning and data processing offload the heavy data processing from pure Python code to libraries like NumPy and pandas, which use C API bindings for native execution.

reply
simonask
16 hours ago
[-]
The simplest JIT just generates the machine code instructions that the interpreter loop would execute anyway. It’s not an extremely difficult thing, but it also doesn’t give you much benefit.

A worthwhile JIT is a fully optimizing compiler, and that is the hard part. Language semantics are much less important - dynamic languages aren’t particularly harder here, but the performance roof is obviously just much lower.

reply
kelvinjps
12 hours ago
[-]
I think it's just that Python people approached the problem differently: they made working with C and other languages easier, wrote bindings for Python, and offloaded the performance-critical code to those libraries. Ex: NumPy
reply
fridder
16 hours ago
[-]
For better or for worse, they have been very consistent throughout the years that they don't want to degrade existing performance. It is why the GIL existed for so long.
reply
bawolff
16 hours ago
[-]
I thought PHP hasn't shipped its JIT yet (as in, it's behind a disabled-by-default config).
reply
SahAssar
16 hours ago
[-]
PHP 8 shipped with JIT on by default unless I'm mistaken.
reply
bawolff
11 hours ago
[-]
https://www.php.net/manual/en/opcache.configuration.php says it's off by default as of PHP 8.4 (and prior to that it was technically on but effectively off due to other configs)
reply
brokencode
17 hours ago
[-]
Are you forgetting about PyPy, which has existed for almost 2 decades at this point?
reply
RussianCow
17 hours ago
[-]
That's a completely separate codebase that purposefully breaks backwards compatibility in specific areas to achieve their goals. That's not the same as having a first-class JIT in CPython, the actual Python implementation that ~everyone uses.
reply
brokencode
16 hours ago
[-]
Definitely agree that it’s better to have JIT in the mainline Python, but it’s not like there weren’t options if you needed higher performance before.

Including simply implementing the slow parts in C, such as the high performance machine learning ecosystem that exists in Python.

reply
wat10000
17 hours ago
[-]
PHP and JS had huge tech companies pouring resources into making them fast.
reply
g947o
16 hours ago
[-]
Money.
reply
fluidcruft
17 hours ago
[-]
(what are blueberry, ripley, jones and prometheus?)
reply
mkl
17 hours ago
[-]
Yes, the graphs are incomprehensible because those are not defined in the article. They turn out to be different physical machines with different architectures: https://doesjitgobrrr.com/about

  blueberry (aarch64)
  Description: Raspberry Pi 5, 8GB RAM, 256GB SSD
  OS: Debian GNU/Linux 12 (bookworm)
  Owner: Savannah Ostrowski

  ripley (x86_64)
  Description: Intel i5-8400 @ 2.80GHz, 8GB RAM, 500GB SSD
  OS: Ubuntu 24.04
  Owner: Savannah Ostrowski

  jones (aarch64)
  Description: Apple M3 Pro, 18GB RAM, 512GB SSD
  OS: macOS
  Owner: Savannah Ostrowski

  prometheus (x86_64)
  Description: AMD Ryzen 5 3600X @ 3.80GHz, 16GB RAM
  OS: Windows 11 Pro
  Owner: Savannah Ostrowski
reply
max-m
17 hours ago
[-]
The names of the benchmark runners. https://doesjitgobrrr.com/about
reply
fluidcruft
17 hours ago
[-]
So the biggest gains so far are on Windows 11 Pro (x86_64), at ~20%? Is that because Windows was bad as a baseline (prometheus)? It doesn't seem like x86_64/Linux has improved as dramatically, at ~5% (ripley). I'm just surprised the OS has that much of an effect that can be attributed to the JIT vs. other OS issues.
reply
raddan
17 hours ago
[-]
It's hard to say whether it's Windows related since the two x86_64 machines don't just run different OSes, they also have different processors, from different manufacturers. I don't know whether an AMD Ryzen 5 3600X versus Intel i5-8400 have dramatically different features, but unlike a generic static binary for x86_64, a JIT could in principle exploit features specific to a given manufacturer.
reply
nonameiguess
16 hours ago
[-]
The immediate question has been answered, but what about the names? The latter three are obvious references to the Alien universe, but what relationship does blueberry have to them?
reply
luhn
16 hours ago
[-]
I assume Blueberry is a nod to the machine being a Raspberry Pi.
reply
killingtime74
17 hours ago
[-]
Sorry but the graphs are completely unreadable. There are four code names for each of the lines. Which is jit and which is cpython?
reply
mkl
17 hours ago
[-]
They are all JIT on different architectures, measured relative to CPython. https://doesjitgobrrr.com/about: blueberry is aarch64 Raspberry Pi, ripley is x86_64 Intel, jones is aarch64 M3 Pro, prometheus is x86_64 AMD.
reply
killingtime74
10 hours ago
[-]
Thanks
reply