Pandas Exercises for Data Analysis (Interactive)
55 points | 4 days ago | 5 comments | machinelearningplus.com | HN
0x696C6961
1 hour ago
Would be nice to have a polars version of this.
rithdmc
1 hour ago
Dope. I've just started using Pandas in some personal projects, and am quickly hitting my knowledge ceiling. I think this will be useful. I'll check it out properly after work.
derriz
1 hour ago
If I were investing effort into acquiring knowledge in this domain, I'd skip straight to Polars. Before I made the switch, I had been using Pandas on and off for more than a decade. I'm not sure how representative this is, but most of the people I know who were Pandas users have also made this switch. I initially did it for the performance improvements but the API (according to my subjective opinion) is much more logical and has far fewer surprises compared to Pandas and it would be my default choice for this reason alone at this stage despite my years of Pandas experience.
benrutter
31 minutes ago
I'd second this, especially if it's just for personal use!

The data world owes a lot to pandas, but it has plenty of sharp edges and using it can sometimes involve pretty close knowledge of how things like indexing/slicing/etc work under the hood.

If I get stuck in polars, it's almost always just a "what's the name of the function to use?" type of problem rather than one that needs lots of knowledge about how things are working under the hood.
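
One concrete instance of the indexing/slicing sharp edges mentioned above, as a minimal sketch: label-based slicing with `.loc` includes both endpoints, while position-based slicing with `.iloc` excludes the end, like a Python list. The series and its values are hypothetical, chosen only so that labels and positions differ.

```python
import pandas as pd

# Hypothetical data: integer labels that do not match the positions
s = pd.Series([10, 20, 30, 40, 50], index=[1, 2, 3, 4, 5])

by_label = s.loc[1:3]      # label-based: both endpoints are included
by_position = s.iloc[1:3]  # position-based: end is excluded, like list slicing

print(by_label.tolist())     # [10, 20, 30]
print(by_position.tolist())  # [20, 30]
```

The same slice syntax returns different rows depending on the accessor, which is the kind of detail you need to know in advance rather than look up by function name.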

rithdmc
1 hour ago
Thanks, I'll look into this in the future. I don't need the most performant script, but this could change.
ertgbnm
57 minutes ago
It's less about performance and more about ecosystem lock-in. It's a bit like imperial vs metric units. Why would you ever choose to learn imperial if you had the option to only ever use metric to begin with?
rithdmc
48 minutes ago
Because these are silly personal scripts. I'm not going to make sensible architectural decisions on something I run every now and then on my laptop. That's optimising too early.
short_sells_poo
26 minutes ago
For short scripts and interactive research work, pandas is still much better than polars. Polars works well when you know what you want.

When you are still figuring out things step by step, pandas does a lot of heavy lifting for you so you don't have to think about it.

E.g. I don't have to think about timeseries alignment, pandas handles that for me implicitly because dataframes can be indexed by timestamps. Polars has timeseries support, but I need to write a paragraph of extra code to deal with it.
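
A minimal sketch of the implicit alignment described above, using two hypothetical daily series whose date ranges only partly overlap. Arithmetic between them matches values by timestamp rather than by position, and dates present in only one series come out as NaN.

```python
import pandas as pd

# Hypothetical daily series with partially overlapping dates
a = pd.Series([1.0, 2.0, 3.0],
              index=pd.date_range("2024-01-01", periods=3, freq="D"))
b = pd.Series([10.0, 20.0, 30.0],
              index=pd.date_range("2024-01-02", periods=3, freq="D"))

# Values are matched on the timestamp index, not on row position
total = a + b

print(len(total))               # 4 (union of both date ranges)
print(total.dropna().tolist())  # [12.0, 23.0] (the two overlapping days)
```

In a library without a row index, the same result would need an explicit join on the timestamp column before adding the values.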

data-ottawa
1 hour ago
You should check out the Modern Pandas series by Tom Augspurger. It's well worth reading to get clean, modern-style code.

https://tomaugspurger.net/posts/modern-1-intro/
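
For a rough idea of what "modern style" can look like, here is a small sketch of method chaining, one of the themes of that series. This is an assumption about what the comment has in mind, not code taken from the blog; the DataFrame and column names are made up for illustration.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Rome", "Rome"],
    "temp_f": [32.0, 50.0, 68.0, 86.0],
})

# One pipeline instead of a series of intermediate variables
summary = (
    df
    .assign(temp_c=lambda d: (d["temp_f"] - 32) * 5 / 9)  # derive a column
    .query("temp_c > 0")                                   # filter rows
    .groupby("city", as_index=False)["temp_c"]             # group by city
    .mean()                                                # average per city
)

print(summary)
```

Each step returns a new object, so the chain reads top to bottom and never mutates `df`.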

rithdmc
1 hour ago
Thanks. There's a special place in my heart for any blog that opens with 'Prior Work' :)
pixelispoint
1 hour ago
I second this blog post. I worked with Tom on a project several years ago and he's brilliant. I started doing Python more frequently after that project, and I found his blog very helpful in finding a good way to conceptualize pandas and Python data structures in general.
selva86
4 days ago
Built this as an interactive tool for our popular 101 Pandas exercises. The code runs entirely locally in your browser. Would love feedback on the ease of use and the editor UX.
alexpotato
1 hour ago
These are great!

Would have made my life a lot easier when I was learning Pandas.

Would also be cool to have a Polars version of this.

One suggestion:

A lot of folks come to Pandas from SQL. It might be handy to have a couple of "the equivalent of this SQL statement, but in Pandas" exercises.
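
As a sketch of what such an exercise might look like (the `orders` table and its columns are hypothetical, not from the site), here is one SQL query run against an in-memory SQLite database next to a pandas version that should give the same rows:

```python
import sqlite3

import pandas as pd

# Hypothetical table for illustration
orders = pd.DataFrame({
    "customer": ["ann", "bob", "ann", "cat", "bob"],
    "amount": [20, 5, 30, 15, 60],
})

# SQL version, run against an in-memory SQLite database
conn = sqlite3.connect(":memory:")
orders.to_sql("orders", conn, index=False)
sql_result = pd.read_sql_query(
    """
    SELECT customer, SUM(amount) AS total
    FROM orders
    WHERE amount >= 10
    GROUP BY customer
    ORDER BY total DESC
    """,
    conn,
)
conn.close()

# pandas version of the same query
pandas_result = (
    orders[orders["amount"] >= 10]          # WHERE amount >= 10
    .groupby("customer", as_index=False)    # GROUP BY customer
    .agg(total=("amount", "sum"))           # SUM(amount) AS total
    .sort_values("total", ascending=False)  # ORDER BY total DESC
    .reset_index(drop=True)
)

print(sql_result)
print(pandas_result)
```

Running the SQL through SQLite alongside the pandas code makes it easy to check that the two really agree.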

sghaz
1 hour ago
The pricing page says, "This page doesn’t seem to exist. It looks like the link pointing here was faulty. Maybe try searching?"
fud101
1 hour ago
What permission does it ask for? It seems suspicious af.