FilterHN

Pandas 3.0

123 points

by jonbaer

4 days ago

| 5 comments

| pandas.pydata.org

edschofield

1 hour ago

The design of Pandas is inferior in every way to Polars: API, memory use, speed, expressiveness. Pandas has been strictly worse since late 2023 and will never close the gap. Polars is multithreaded by default, written in a low-level language, has a powerful query engine, supports lazy, out-of-core (larger-than-memory) execution, and isn’t constrained by any compatibility concerns with a warty, eager-only API and pre-Arrow data types that aren’t nullable.

It’s probably not worth incurring the pain of a compatibility-breaking Pandas upgrade. Switch to Polars instead for new projects and you won’t look back.

rich_sasha

1 hour ago

I almost fully agree. I would add that the Pandas API is poorly thought through and full of footguns.

Where I certainly disagree is the "frame as a dict of time series" setting and general time series analysis, where I think Pandas still holds up.

The feel is also different. Pandas is an interactive data analysis container, poorly suited for production use. Polars, I feel, is the other way round.
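
A minimal pandas sketch of the "frame as a dict of time series" model mentioned above; the two series and their dates are made up for illustration. Each column is a Series, building the frame from a dict aligns the series on their date indexes, and index-aware methods such as `shift()` and `resample()` then work on every column at once.

```python
import pandas as pd

# Two hypothetical daily series whose date ranges only partly overlap.
a = pd.Series([1.0, 2.0, 3.0],
              index=pd.date_range("2024-01-01", periods=3, freq="D"))
b = pd.Series([10.0, 20.0],
              index=pd.date_range("2024-01-02", periods=2, freq="D"))

# A dict of Series becomes a frame aligned on the union of the indexes;
# the date missing from "b" (2024-01-01) is filled with NaN.
frame = pd.DataFrame({"a": a, "b": b})

# Dict-style access hands back a single time series.
series_a = frame["a"]

# Index-aware operations apply to every column at once.
lagged = frame.shift(1)               # each value moves one row later
two_day = frame.resample("2D").sum()  # 2-day bins; NaN is skipped in sums

print(frame)
print(two_day)
```

The alignment step is the point: no explicit join is written, because the shared date index does that work.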

sirfz

39 minutes ago

I think that's a sane take. Indeed, I think most data analysts find pandas much easier to use than polars when playing with data (mainly because the bracket syntax is faster to type and mostly sensible).
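
For readers who haven't used it, here is a small sketch of the pandas bracket syntax being referred to, on a made-up DataFrame: `df["col"]` selects a column, `df[mask]` filters rows with a boolean Series, and `df[[...]]` selects several columns.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({
    "ticker": ["A", "B", "A", "C"],
    "price": [10.0, 20.0, 11.0, 5.0],
})

# One column by name -> Series.
prices = df["price"]

# A boolean mask in brackets filters rows; a list in brackets selects columns.
cheap = df[df["price"] < 15][["ticker", "price"]]

# Bracket assignment creates a new column.
df["double"] = df["price"] * 2

print(cheap)
```

In polars the same filter is usually written with expressions, e.g. `df.filter(pl.col("price") < 15).select("ticker", "price")`, which is wordier for quick exploration; that trade-off is what the comment is about.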

v3ss0n

1 hour ago

That sounds too much like an advertisement. We also need to be careful when diving into Polars: it is a VC-backed open-source project with a cloud offering, which may become an open-core project, and we know how those go.

gkbrk

1 hour ago

> we know how those go

They get forked and stay open source? At least this is what happens to all the popular ones. You can't really un-open-source a project if users want to keep it open-source.

stingraycharles

1 hour ago

Depends on your definition of popular; there are plenty of examples where business interests don't align well with open source.

jtrueb

45 seconds ago

That timestamp resolution discrepancy is going to cause so many problems.

postalcoder

2 hours ago

I've migrated my workflows from pandas to polars and, in my experience, reaped a 10-20x speedup on average. I can't imagine anything bringing me back short of a performance miracle. LLMs have made syntax almost a non-barrier.

lvl155

1 hour ago

Went from pandas to polars to duckdb. As mentioned elsewhere, SQL is the most readable for me, and an LLM does most of the coding on my end (I'm a quant), so I need the code to be as readable and rudimentary/step-wise as possible.

OT, but I can’t imagine data science being a job category for too long. It’s got to be one of the first to go in the AI age, especially since the market is so saturated with mediocre talent.

thibaut_barrere

1 hour ago

Polars being so fast, and embeddable into other languages, has made it a no-brainer for me to adopt it.

I have integrated Explorer https://github.com/elixir-explorer/explorer, which leverages it, into many Elixir apps, and I'm so happy to have this.

howling

2 hours ago

Same. I don't normally even use an LLM, as I find polars' syntax very intuitive. I just searched my ChatGPT history, and the only times I used it were when I was dealing with list and struct columns, which pandas didn't have.

postalcoder

2 hours ago

IIRC, part of pandas’ popularity was that it modeled some of R’s ergonomics. What a time in history, when such things mattered! (To be clear, I’m not making fun of pandas. It was the bridge I crossed that moved me from living in Excel to living in code.)

mritchie712

2 hours ago

also migrated, but to duckdb.

It's funny to look back at the tricks that were needed to get GPT-3 and GPT-3.5 to write SQL (e.g. "you are a data analyst looking at a SQL database with table [tables]"). It's almost effortless now.

gHA5

2 hours ago

Do you not experience LLM generated code constantly trying to use Pandas' methods/syntax for Polars objects?

edschofield

2 hours ago

Yes, ChatGPT 5.2 Pro absolutely still does this. Just ask it for a pivot table using Polars and it will probably spit out code with Pandas arguments that doesn’t work.
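
To make the mix-up concrete, here is a pandas pivot table on hypothetical sales data. The pandas keywords are `index`, `columns`, `values` and `aggfunc`; polars' `DataFrame.pivot` uses different parameter names, so pandas-style keywords such as `aggfunc=` aren't accepted there.

```python
import pandas as pd

# Hypothetical sales records.
sales = pd.DataFrame({
    "region": ["east", "east", "west", "west", "west"],
    "quarter": ["Q1", "Q2", "Q1", "Q1", "Q2"],
    "amount": [100, 150, 80, 20, 90],
})

# Rows = region, columns = quarter, cells = summed amount.
table = sales.pivot_table(index="region", columns="quarter",
                          values="amount", aggfunc="sum")
print(table)
```

To my knowledge, the Polars 1.x equivalent is along the lines of `sales_pl.pivot(on="quarter", index="region", values="amount", aggregate_function="sum")`, where `sales_pl` is a hypothetical polars copy of the same data. Check the `DataFrame.pivot` reference for your version, since `on` was named `columns` before 1.0, which is exactly the kind of drift that trips up LLMs.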

postalcoder

2 hours ago

There were some growing pains in the GPT-3.5 to GPT-4 era, but not nowadays (shoutout to the now-defunct Phind, which was a game changer back then).

crimsoneer

2 hours ago

The fact that they pivoted away from their very compelling core offering (an AI Stack Overflow) to compete with Lovable etc. in the giant "AI-generated apps" fight continues to baffle me. Though I guess model updates ate their lunch.

postalcoder

1 hour ago

My guess is that their pivot came after distress and was not the cause of it. It'd be great to have @rushingcreek write a post-mortem. I think it'd benefit a lot of people, because I honestly don't have a Monday-morning playbook of what could have saved them.

Like you said, perhaps the demise of Phind was inevitable, with large models displacing it, kind of like how Spotify displaced music piracy.

alex7o

2 hours ago

Same. Polars also works with TypeScript, which I used at one point to move my data from the backend to the frontend.

OutOfHere

2 hours ago

The speedup you claim is going to be contingent on how you use Pandas, with which data types, and which version of Pandas.

optimalsolver

2 hours ago

How soon will the leading LLMs ingest the updated documentation? Because I'm certainly not going to.

uncletoxa

2 hours ago

Use the Context7 MCP server. It'll do the trick.

OutOfHere

2 hours ago

In my experience, it would take a year to ingest it natively, and two years to also ingest enough coding examples.

OutOfHere

2 hours ago

s/impactfull/impactful