A 100x speedup with unsafe Python
318 points | 13 days ago | 15 comments | yosefk.com | HN
Animats
11 days ago
[-]
Oh, array striding.

This is a classic bikeshedding issue. When Go and Rust were first being designed, I brought up support for multidimensional arrays. In both cases, it got lost in discussions over what features arrays should have. Subarrays, like slices but multidimensional? That's the main use case for striding. Flattening in more than one dimension? And some people want sparse arrays. Stuff like that. So the problem gets pushed off onto collection classes, there's no standard, and everybody using such arrays spends time doing format conversion. This is why FORTRAN and Matlab still have strong positions in number-crunching.

reply
yosefk
11 days ago
[-]
From everything I'm seeing, it follows that Matlab & Fortran have been decimated by Python and C/C++ around 1990s and 2010s, respectively. Of course I could be wrong; any evidence of their still strong position will be greatly appreciated, doubly so evidence that this position is due to stride issues.

(Of course Python provides wrappers to Fortran code, eg FITPACK, but this code specifically was mostly written in the 80s, with small updates in this century, and is probably used more thru the numpy wrappers, stride conversion issues and all, than thru its native Fortran interface)

reply
MobiusHorizons
11 days ago
[-]
I think pretty much all linear algebra libraries are still Fortran and are unlikely to ever be C, because Fortran is legitimately faster than C for this stuff. I don't know if it has anything to do with strong opinions about how values are represented; I think it has more to do with the lower overhead of function calls. But that is just repeating what someone told me about Fortran vs C in general, not necessarily applicable to BLAS libraries.

Fortran at least used to be very common in heavy scientific computing, but I would bet that work relies on GPUs or other accelerators these days.

reply
Galanwe
11 days ago
[-]
> I think pretty much all linear algebra libraries are still Fortran and are unlikely to ever be C

That is more of an urban legend than reality in 2024. Fact is, although the original BLAS implementation was Fortran, it has been at least a decade since every Linux distribution ships either OpenBLAS or ATLAS or the MKL, all of which are written in a mix of C and assembly. Support for modern processors is only available in these implementations.

LAPACK itself is still often built from the Fortran code base, but it's just glue over BLAS, and it gets performance solely from its underlying BLAS. Fortran doesn't bring any value to LAPACK, it's just that rewriting millions of algebraic tricks and heuristics for no performance gain is not enticing.
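For what it's worth, it's easy to check which implementation your own numpy build links against:

    import numpy as np

    # Lists the BLAS/LAPACK libraries this numpy build was compiled
    # against -- typically OpenBLAS or MKL, not the reference Fortran BLAS.
    np.show_config()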

reply
yosefk
11 days ago
[-]
The question is how much new numeric code is written in Fortran vs C/C++. My guess is way below 10%, certainly if we measure by usage as opposed to LOC, and I would guess by LOC as well.

Is Fortran legitimately faster than C with the restrict keyword? Regardless of the function call cost diffs between the two - meaning, even if we assume Fortran is somehow faster here - there's no way numeric production code does enough function calls for this to matter. If Fortran was faster than C at any point in time, I can only imagine pointer aliasing having been the culprit, and I can't imagine it still being relevant, but I am of course ready to stand corrected.

reply
josefx
11 days ago
[-]
Restrict probably helps. I can't say much about Fortran, but C still has warts that can significantly impact its math library. For example, almost every math function may set errno; that is a side effect that isn't easy for the compiler to eliminate and might bloat the code significantly. With gcc, a single sqrt() call turns into a sqrt instruction, followed by several checks to see if it succeeded, followed by a call to the sqrt library function just to set errno correctly. I just started disabling math errno completely (-fno-math-errno) once I realized that C allows several alternative ways to handle math errors, which basically makes any code relying on it non-portable.
reply
eyegor
11 days ago
[-]
I don't think Fortran is faster by language virtue, but it's certainly easier to scrape together high performance numeric Fortran code. And ifort/aocc are amazing at making code that runs well on clusters, which is not a priority for any other domain. Fortran is absolutely still on top for modern numerics research code that involves clusters; talk to anyone who works in simulations at a national lab. If your code is mostly matrix math, modern Fortran is very nice to work with.

Emphasis on modern because maintaining 70s-80s numeric mathematician cowboy code is a ninth circle of hell. It has a bad rep for that reason.

reply
semi-extrinsic
11 days ago
[-]
When you speak about code from the 1970s in particular, you need to appreciate the extremely limited language features that were available.

They did not even have while-loops in the language until Fortran 77. While-loop functionality used to be implemented using DO-loops mixed with GOTOs. You can't fault the people of that era for using the only tools that were available to them.

reply
ant6n
11 days ago
[-]
At the core, Fortran uses multidimensional numerical arrays, with the shapes being defined in the code. So the compiler knows much more about the data that's being operated on, which in theory allows better optimization.

I thought BLAS/LAPACK is still written in Fortran, so most numerical code would still be built on top of Fortran.

reply
StableAlkyne
11 days ago
[-]
> any evidence of their still strong position will be greatly appreciated

Fortran still dominates certain areas of scientific computing / HPC, primarily computational chemistry and CFD (see https://fortran-lang.org/packages/scientific). You don't hear about most of these codes because they're generally run on HPC centers by scientists in niche fields. But you do get their benefit if you buy things that have the chemical sector in their supply chain.

The common thread is generally historical codes with a lot of matrix math. Fortran has some pretty great syntax for arrays and their operations. And the for-loop parallelization syntax in parallel frameworks (like OpenMP) is also easy to use. The language can even enforce function purity for you, which removes some of the footguns of parallel code that you get in other languages.

The kinds of problems those packages solve tend to bottleneck at matrix math, so it's not surprising a language that is very ergonomic for vector math found use there.

Same for Matlab, it's mostly used by niche fields and engineers who work on physical objects (chemical, mechanical, etc). Their marketing strategy is to give discounts to universities to encourage classes that use them. Like Fortran, it has good syntax for matrix operations. Plus it has a legitimately strong standard library. Great for students who aren't programmers and who don't want to be programmers. They then only know this language and ask their future employer for a license. If you don't interact with a lot of engineers at many companies, you aren't going to see Matlab.

reply
embwbam
11 days ago
[-]
I work for the National Solar Observatory, creating Level 2 data for the DKIST Telescope's observations of the sun. (For example, an image of the temperature that lines up with the observation.)

Just the other day, the solar physicist I work with said "yeah, that code that runs on the supercomputer needs to be rewritten in Fortran" (from C, I think).

He's nearing retirement, but it's not that he's behind the times. He knows his stuff, and has a lot of software experience in addition to physics.

reply
moregrist
11 days ago
[-]
While it's not as ubiquitous as it used to be, Matlab is still very heavily used within shops that do a lot of traditional engineering (ME, EE, Aero, etc.).

This surprises people who just write software, but consider:

- The documentation and support is superb. This alone can justify the cost for many orgs.

- Did I mention the support? MathWorks has teams of application support engineers to help you use the tool. The ability to pay to get someone on the phone can also justify the price.

- The toolkits tend to do what specific fields want, and they tend to have a decent API. In contrast, you might end up hacking together something out of scipy and 3-4 weirdly incompatible data science packages. That's fine for me, but your average ME/EE would rather just have a tool that does what they need.

- It also has Simulink, which can help engineers get from spec to simulation very quickly, and seems deeply embedded in certain areas (eg: it's pretty common for a job ad with control systems work to want Simulink experience).

Is Python gradually displacing it? Probably.

(Honestly, I wish it would happen faster. I've written one large program in Matlab, and I have absolutely no desire to write another.)

reply
samatman
11 days ago
[-]
Julia has gained a lot of mindshare in array-heavy coding, with a lot of hard work put into making it possible to get comparable speeds to C++ and Fortran. The manual[0] offers a good introduction to what's possible: this includes sparse and strided arrays, as well as the sort of zero-copy reinterpretation you accomplish with pointer wizardry in the Fine Article.

[0]: https://docs.julialang.org/en/v1/manual/interfaces/#man-inte...

reply
chillfox
11 days ago
[-]
If everyone is just using the Fortran libraries instead of reimplementing them in a modern language, then that's evidence that Fortran is still being used for that purpose.
reply
aragilar
11 days ago
[-]
Matlab has the issue of not being open source (similarly IDL), but I still see it popping up (though naturally, if I've got any choice in the matter, it's going to be ported to Python). I've also seen new Fortran codebases as well, mainly where the (conceptual, not computational) overhead of using helper libraries isn't worth it.

I'd suggest Python (via the various memoryview-related PEPs, plus numpy) does provide most of the required information (vs. e.g. Go or Rust or Ruby). So I'm not sure that proves much, other than: if the combined value of a more general language plus a library for multidimensional arrays beats Matlab or Fortran (likely due to other ecosystem or licensing effects), people will switch.

reply
RugnirViking
11 days ago
[-]
We still get new Matlab codebases at my work, and R as well. In my case it comes from academics in the power systems and renewables fields who move into industry and keep using the tools they used there. We try to gently ease them into Python and version control and sensible collaboration, but we get new hires all the time.
reply
xdavidliu
11 days ago
[-]
> Matlab & Fortran have been decimated by Python and C/C++ around 1990s and 2010s, respectively.

Nit: this failed to be respective; the years are the other way around

reply
kbelder
11 days ago
[-]
Interesting... maybe the respective comparison was Matlab to Python and Fortran to C/C++? This sentence actually had three parallel clauses.

But that was a great nit to find.

reply
xdavidliu
10 days ago
[-]
Yes, that's what I originally meant: Matlab to Python was the 2010s and Fortran to C/C++ was the 1990s. (Probably not the exact time frames, but that's close.)
reply
lifthrasiir
11 days ago
[-]
But format conversion is inevitable because C and FORTRAN have different axis orderings anyway, isn't it?
reply
blt
11 days ago
[-]
Not really, you can always change the indexing to account for it. For example, the GEMM matrix multiplication subroutines from BLAS can transpose their arguments [1]. So if you have A (m x n) and B (n x p) stored row-major, but you want to use a column-major BLAS to compute A*B, you can instead tell BLAS that A is n x m, B is p x n, and you want to compute A' * B'.

As the article mentions, NumPy can handle both and do all the bookkeeping. So can Eigen in C++.

[1] https://www.math.utah.edu/software/lapack/lapack-blas/dgemm....
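The identity behind the trick, sketched in numpy, where .T is a zero-copy view reinterpreting the same buffer with the opposite majorness:

    import numpy as np

    A = np.array([[0., 1., 2.], [3., 4., 5.]])  # row-major A (m x n)
    B = np.arange(12.0).reshape(3, 4)           # row-major B (n x p)

    # A column-major library reading our row-major buffers sees A.T and B.T.
    # Asking it for B.T @ A.T gives (A @ B).T stored column-major, which is
    # exactly A @ B when read back row-major.
    assert np.array_equal((B.T @ A.T).T, A @ B)
    assert np.shares_memory(A, A.T)  # .T is a view; no data was copied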

reply
lifthrasiir
11 days ago
[-]
That's conceptually still a format conversion, though the actual conversion might not happen. Users have to track which format is being used for which matrix, and I believe that is what the GP was originally complaining about.
reply
yosefk
11 days ago
[-]
Agreed, realistically you're going to either convert the data or give up on using the function rather than reimplementing it to support another data layout efficiently; while converting the code instead of the data, so to speak, is more efficient from the machine resource use POV, it can be very rough on human resources.

(The article is basically about not having to give up in some cases where you can tweak the input parameters and make things work without rewriting the function; but it's not always possible)

reply
semi-extrinsic
11 days ago
[-]
Talking about being rough on human resources, I take it you've never written iterative Krylov methods where matrices are in compressed sparse row (CSR) format?

Because that's the kind of mental model capacity scientific programmers needed to have back in the day. People still do similarly brain-twisting things today. In this context, converting your logic between C and Fortran to avoid shuffling data is trivial.

reply
DarkUranium
9 days ago
[-]
I'm working towards my own language, and I'm sort of stuck on this issue, too:

Namely:

1. Should the "default" (syntactically-sugar'd) arrays in the language (e.g. `int[]`) be restricted to simple contiguous storage, or should they also hold (potentially negative) stride, etc?

2. Should I have multidimensional arrays? (to be clear, I'm talking about contiguous blocks of data, not arrays-holding-pointers-to-arrays)

3. Should slicing result in a view, or a copy? (naturally, some of these options conflict)

I'm honestly erring towards:

1. "fat" arrays;

2. multidimensional arrays;

3. slicing being views.

This allows operations like reversing, "skipping slices" (e.g. `foo[::2]` in Python), even broadcasts (`stride=0`) to be represented by the same datastructure. It's ostensibly even possible to transpose matrices and such, all without actual copying of the data.
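numpy is a handy existence proof of how much a single (base, shape, strides) descriptor can express; a quick sketch of the views it gets you for free:

    import numpy as np
    from numpy.lib.stride_tricks import as_strided

    a = np.arange(6, dtype=np.int64)   # strides: (8,)
    print(a[::-1].strides)             # (-8,)   reversal = negative stride
    print(a[::2].strides)              # (16,)   skipping = doubled stride
    print(np.zeros((2, 3)).T.strides)  # (8, 24) transpose = swapped strides

    # Broadcast: stride 0 re-reads the same six values as four "rows".
    b = as_strided(a, shape=(4, 6), strides=(0, 8))
    assert np.array_equal(b[0], b[3])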

But there are good arguments for the reverse, too. E.g. "fat" arrays need 1 more `size_t` of storage per dimension (to store the stride), compared to "thin" arrays. And a compiler can optimize thin ones much better due to more guarantees (for example, a copy can use vector instructions instead of having to check stride first). Plus "thin" arrays are definitely the more common scenario and it simplifies the language.

So, yeah, still not quite sure despite liking the concept. One obvious option would be for `int[]` to be a thin array, but offering `DataView<int>` [1]. The problem with that is that people implementing APIs will default to the easy thing (`int[]`) instead of the more-appropriate-but-more-letters `DataView<int>`. Ah, dilemmas.

[1] (not actual syntax, but I figured I'd use familiar syntax to get the point across)

reply
sa-code
11 days ago
[-]
What kind of support would you have hoped for?
reply
infogulch
11 days ago
[-]
A way to reinterpret a slice of size N as a multidimensional array with strides that are a factorization of N, including optional reverse order strides. Basically, do the stride bookkeeping internally so I can write an algorithm only considering the logic and optimize the striding order independently.
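That's essentially numpy's as_strided, minus the safety (nothing checks that your strides stay inside the buffer). A sketch:

    import numpy as np
    from numpy.lib.stride_tricks import as_strided

    flat = np.arange(24, dtype=np.uint8)  # N = 24 = 2 * 3 * 4
    # Reinterpret the flat buffer as 2x3x4 without copying.
    cube = as_strided(flat, shape=(2, 3, 4), strides=(12, 4, 1))
    # Reverse-order stride on the innermost axis: walk each row backwards.
    rev = as_strided(flat[3:], shape=(2, 3, 4), strides=(12, 4, -1))
    assert np.array_equal(rev, cube[:, :, ::-1])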
reply
Animats
11 days ago
[-]
That's where you end up after heavy bikeshedding. Lots of features, terrible performance, as the OP points out.
reply
infogulch
11 days ago
[-]
I agree with you on sparse arrays and multidimensional slices, but this is basically the same as what you'd do manually. Saying that "track strides for me" is "lots of features" is a bit uncharitable.
reply
londons_explore
11 days ago
[-]
> What can we do about this? We can't change the layout of pygame Surface data. And we seriously don't want to copy the C++ code of cv2.resize, with its various platform-specific optimizations,

Or... you could have sent a ~25 line pull request to opencv to fix this performance problem not just for you, but for thousands of other developers and millions of users.

I think your fix would go here:

https://github.com/opencv/opencv/blob/ba65d2eb0d83e6c9567a25...

And you could have tracked down that slow code easily by running your slow code in gdb, hitting Ctrl+C while it's doing the slow thing, and then "bt" to get a stack trace of what it's doing; you'd see it constructing this new image copy because the format isn't correct.

reply
yosefk
11 days ago
[-]
25 line pull request doing what? Supporting this format efficiently is probably more LOC and unlikely to get merged as few need this and it complicates the code. Doing the thing I did in Python inside OpenCV C++ code instead (reinterpreting the data) is not really possible since you have less knowledge about the input data at this point (like, you're going to assume it's an RGBA image and access past the area defined as the allocated array data range for the last A value? The user gives a 3D array and you blithely work on the 4th dimension?)

And what about the other examples I measured, including pure numpy code? More patches to submit?

reply
londons_explore
11 days ago
[-]
inside the python opencv bindings?

Or just special case resize, since there the channel order doesn't matter. If you interpret R as A you'll still get the right answer.

reply
yosefk
11 days ago
[-]
The pygame API gives you a 3D RGB array and a separate 2D alpha array, which happen to reference the same RGBA/BGRA memory. Are you suggesting that OpenCV Python bindings (in most of their functions, and likewise other libraries including numpy operators themselves) should have code assuming that a WxHx3 RGB array really references the memory of a WxHx4 RGBA array?..
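For reference, the pygame API in question (a minimal sketch; these are real pygame.surfarray calls):

    import pygame as pg

    surf = pg.Surface((640, 480), pg.SRCALPHA)  # 32-bit RGBA/BGRA pixels
    rgb = pg.surfarray.pixels3d(surf)           # (W, H, 3) view, no copy
    alpha = pg.surfarray.pixels_alpha(surf)     # (W, H) view of the A bytes
    # Both arrays reference the surface's own pixel memory, which stays
    # locked for as long as they are alive.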

If you really want a solution which is a patch rather than a function in your own code, I guess the way to go would be to persuade the maintainers of pygame and separately, the maintainers of its fork, pygame-ce, to add a pixels4d_maybe_rgba_and_maybe_bgra() function returning a WxHx4 array. Whether they would want to add something like that I don't know. But in any case, I think that since closed source projects, open source projects, and code by sister teams exist that will not accept patches, or will not accept them in a timely manner, it's interesting what you can do without changing someone else's code.

reply
C4stor
11 days ago
[-]
All of this seems unnecessary, and easily replaced in the provided benchmark by :

    i2 = np.ascontiguousarray(pg.surfarray.pixels3d(isurf))

Which does the 100x speedup too and is a "safe" way to adjust memory access to numpy strides.

Whether the output is correct or not is left as an exercise to the author; since the provided benchmark only uses np.zeros(), it's kind of hard to verify.

reply
yosefk
11 days ago
[-]
I just measured this with the np.ascontiguousarray call (inside the loop of course, if you do it outside the loop when i2 is assigned to, then ofc you don't measure the overhead of np.ascontiguousarray) and the runtime is about the same as without this call, which I would expect given everything TFA explains. So you still have the 100x slowdown.

TFA links to a GitHub repo with code resizing non-zero images, which you could use both to easily benchmark your suggested change, and to check whether the output is still correct.

reply
ggm
11 days ago
[-]
Why does numpy do column order data?

Is it because in much of the maths domain it turns out you manipulate columns more often than rows?

(Not a mathematician, but I could believe it: if the columns represent "dimensions" or "qualities" of some dataset and you want to apply a function to a given dimension, having data in column-natural order makes that faster.)

Obviously you want to believe naively there is no difference between the X and Y in the X,Y plane, but machines are not naive, and sometimes there IS a difference between doing things to the set of verticals and the set of horizontals.

reply
Pinus
11 days ago
[-]
Hang on, doesn’t numpy use C (row major) array ordering by default? checks docs Seems like it does. However, numpy array indexing also follows the maths convention where 2D arrays are indexed by row, column (as does C, by the way), so to access pixel x, y you need to say im[y, x]. And the image libraries where I have toyed with with numpy integration (only Pillow, to be honest) seem to work just fine like this — a row of pixels is stored contiguously in memory. So I don’t quite see why the author claims that numpy stores a column of pixels contiguously, but I have only glanced at the article, so quite probably I have missed something.
reply
KeplerBoy
11 days ago
[-]
You're right. Numpy stores arrays in row-major order by default. One can always just have a look at the flags (ndarray.flags returns some information about the order and underlying buffer).
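For example:

    import numpy as np

    img = np.zeros((480, 640, 3), dtype=np.uint8)  # rows (y), cols (x), channels
    print(img.flags['C_CONTIGUOUS'])  # True: row-major is the default
    print(img.strides)                # (1920, 3, 1): a row of pixels is contiguous
    r, g, b = img[240, 320]           # pixel (x=320, y=240) is indexed [y, x]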
reply
sdeer
11 days ago
[-]
Probably because Fortran stores matrices and other multidimensional arrays in column order. Traditionally most numerical computation software was written in Fortran and numpy calls those under the hood. Storing in row order would have meant copying the data to column major order and back for any call to Fortran.
reply
Thrymr
11 days ago
[-]
NumPy can _store_ arrays in either row-major (C-style) or column-major (Fortran-style) order. Row-major is the default. Index order is always row, column.
reply
bee_rider
11 days ago
[-]
It is possible that my brain had been damaged by convention, but it looks much tidier to have your vector multiplicand on the right of the matrix when doing a matrix vector multiplication y=Ax, where A is the matrix and x is the vector, y is the result.

So, x and y must be columns. So, it is nice if our language has column-order as the default, that way a vector is also just a good old 1d array.

If we did y=xA, x and y would be rows, the actual math would be the same but… I dunno, isn’t it just hideous? Ax is like a sentence, it should start with an upper case!

It also fits well with how we tend to learn math. First we learn about functions, f(x). Then we learn that a function can be an operator. And a matrix can represent a linear operator. So, we tend to have a long history of stuff sitting to the left of x getting ready to transform it.

reply
SailorJerry
11 days ago
[-]
I think the reason I prefer columns is I do the mental expansion into large bracketed expressions. If x is a row and kept inline, then the expansion gets really wide. To keep it compact and have the symbols oriented the same as their expansion, you'd have to put the x above A and that's just silly.
reply
planede
11 days ago
[-]
Your reasoning starts out nicely with y=Ax, but you get the wrong conclusion. The layout of x and y are not affected at all by row or column order, they are just consecutive numbers in either case. So you have to look at the layout of A.

For the row-major layout the multiplication Ax ends up being more cache-efficient than for the column major layout, as for calculating each component of y you need to scan x and a row of A.

reply
bee_rider
11 days ago
[-]
I’m pretty sure BLAS will tile out the matrix multiplication internally anyway, so it doesn’t matter. At least for matmat it would; is matvec special?
reply
planede
11 days ago
[-]
However matmat is done, row-major vs column-major for both matrices shouldn't make a difference (for square matrices at least).

I don't know if tiling is done for matvec. I don't think it makes sense there, but I didn't think about it too hard.

reply
Joker_vD
11 days ago
[-]
Well, there used to be a tradition in universal algebra (and category theory) of putting functions/operators at the right side of the arguments but it seems to have ultimately died out. And "y = xA" is the standard notation in the linear coding theory, even to this day: message vectors are bit strings, not bit columns.
reply
bee_rider
11 days ago
[-]
I think there’s also some other pretty big field that tends to put the vector on the left… statistics? Economics? For some reason I couldn’t find any links, so maybe I just got this from, like, one random statistician. If only they’d told me about sample sizes.
reply
montebicyclelo
11 days ago
[-]
y = x @ W + b

Is how you'd write it in NumPy and most (/all?) deep learning libraries, with x.shape=[batch, ndim]

I'd personally prefer that to be the convention, for consistency.

reply
thayne
11 days ago
[-]
Numpy was originally designed for mathematicians and scientists, and in those domains convention often lines up better with column major order. For example, vectors are a single column, and the column index comes before the row index in many notations. So using column major order meant familar formulas and algorithms (including translating from fortran code which is column major, or matlab for that matter) are easy to translate to numpy code.

Also, numpy is built on some Fortran libraries, like BLAS and LAPACK, that assume a column major order. If it took row major input, it would need to transpose matrices or change the order in some cases, or those libraries would have to be rewritten to use row major form.
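numpy does let you opt into the Fortran layout per array, which is how such hand-offs can avoid a copy:

    import numpy as np

    a = np.asfortranarray(np.arange(6.0).reshape(2, 3))
    print(a.flags['F_CONTIGUOUS'])  # True: columns are contiguous
    print(a.strides)                # (8, 16): the first index varies fastest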

reply
kolbusa
11 days ago
[-]
Copying is usually not necessary. Often times you can swap data and/or shape arguments and get a transposed result out. While it is true that Fortran BLAS only supports col-major, CBLAS supports both row- and col-major. Internally, all the libraries I worked on use col-major, but that is just a convention.
reply
rdtsc
11 days ago
[-]
I think in math columns as vectors is a more common representation. Especially if we talk about matrix multiplication and linear systems.
reply
antirez
11 days ago
[-]
MicroPython allows you to do even more than that, and it's fantastic; I wish it were part of standard Python as well. Using a decorator you can write functions in Viper, a subset of Python that has pointers into bytearrays and only fast integer types that behave like C's (fixed size, they wrap around and so forth). It's a very nice way to speed things up 10 or 100x without resorting to C extensions.
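A sketch from memory of what Viper code looks like (MicroPython only; ptr8 is resolved by the Viper compiler, so this won't run under CPython):

    import micropython

    @micropython.viper
    def brighten(buf: ptr8, n: int, delta: int):
        # ptr8 gives raw byte-pointer access into a bytearray: C-like
        # fixed-size integer arithmetic, no bounds checks.
        for i in range(n):
            v = buf[i] + delta
            if v > 255:
                v = 255
            buf[i] = v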
reply
yosefk
11 days ago
[-]
I dunno if it's "more than that" since they're not directly comparable? What you describe sounds more like numba or Cython than what TFA describes which is a different use case?..
reply
antirez
11 days ago
[-]
Indeed, I don't mean they are directly comparable, but it's a great incarnation of the "unsafe Python" idea, because the subset of Python you can use does not stray too far from the real thing, unlike approaches that introduce a completely different syntax or the like. There are strict rules, but if you follow them you still kinda write Python that goes super fast.
reply
3abiton
7 days ago
[-]
I am curious about the quantification of "super fast". Do you have some pointers?
reply
antirez
7 days ago
[-]
Viper is normally half the speed of using the MicroPython decorator to directly write assembly.
reply
comex
11 days ago
[-]
Do you actually need ctypes here, or can you just use np.reshape? That would cut out the unsafety.
reply
Retr0id
11 days ago
[-]
You can exploit memory unsafety in cpython using only built-in methods, no imports needed at all: https://github.com/DavidBuchanan314/unsafe-python/
reply
yosefk
11 days ago
[-]
You... probably shouldn't, but, very interesting stuff in that link! I didn't know CPython had no bounds checks in the load_const bytecode op
reply
yosefk
11 days ago
[-]
I think you'll have a problem with BGR data absent ctypes, since you have a numpy array with the base pointing at the first red channel value, and you want an array with the base pointing 2 bytes before the first red channel value. This is almost definitely "unsafe"?.. or does numpy have functions knowing that due to the negative z stride, the red value before the blue value is within the bounds of the original array? I somehow doubt it though it would be very impressive. And I think the last A value is hopelessly out of reach absent ctypes since a safe API has no information that this last value is within the array bounds; more strictly speaking all the alpha values are out of bounds according to the shape and strides of the BGRA array.
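For the curious, the ctypes move this takes looks roughly like the following. A sketch with assumed names (rgb stands for the pixels3d() view), ignoring row pitch; the point is only that you step outside numpy's bounds knowledge:

    import ctypes
    import numpy as np

    w, h = rgb.shape[0], rgb.shape[1]
    base = rgb.ctypes.data - 2  # step back from the first R byte to the first B
    buf = (ctypes.c_uint8 * (w * h * 4)).from_address(base)
    bgra = np.frombuffer(buf, dtype=np.uint8).reshape(-1, 4)  # one BGRA quad per pixel
    # Nothing tells Python that bgra borrows the surface's memory: keep the
    # surface and rgb alive for as long as bgra is in use.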
reply
im3w1l
11 days ago
[-]
I think what you want is a combination of swapaxes and flip (reshape does not turn rows into columns, it turns m input rows into n output rows), but yeah.

Actually let me include a little figure

  original  reshape  swapaxes

  abc       ab       ad
  def       cd       be
            ef       cf
reply
jofer
11 days ago
[-]
I'm confused on the need for ctypes/etc here. You can directly modify somearr.strides and somearr.shape. And if you need to set them both together, there's numpy.lib.stride_tricks.as_strided. Unless I'm missing something, you can do the same assignment that ctypes is used for here directly in numpy. But I might also be missing a lot - I'm still pre-coffee.

On a different note, I'm surprised no one has mentioned Cython in this context. This is distinct from, but broadly related to, things like @cython.boundscheck(False). Cython is _really_ handy, and it's a shame that it's kinda fallen out of favor in some ways lately.

reply
yosefk
11 days ago
[-]
You need to modify the base pointer in this example, specifically to point 2 bytes before the current base pointer (moving back from the first red pixel value to the first blue value). I don't think you can do it without ctypes, but maybe I'm wrong.

What does Cython do better than numbs except static compilation? Honest q, I know little about both

reply
jofer
11 days ago
[-]
You actually can do the "offset by 2 bytes back" with a reshape + indexing + reshape back. But I suspect I'm still missing something. I need to read things in more depth and try it out locally.

By "numbs" here, I'm assuming you mean numba. If that's not correct, I apologize in advance!

There are several things where Cython is a better fit than numba (and to be fair, several cases where the opposite is true). There are two that stick out in my mind:

First off, the optimizations for Cython are explicit and a matter of how you implement the actual code. It's a separate language (technically a superset of Python). There's no "magic". That's often a significant advantage in and of itself. Personally, I often find larger speedups with Cython, but then again, I'm more familiar with it and understand what settings to turn on/off in different situations. Numba is much more of a black box, given that it's a JIT compiler. With that said, it can also do things that Cython can't _because_ it's a JIT compiler.

If you want to run python code as-is, then yeah, numba is the better choice. You won't see a speedup at all with Cython unless you change the way you've written things.

The second key thing is one that's likely the most overlooked. Cython is arguably the best way to write a python wrapper around C code where you want to expose things as numpy arrays. It takes a _huge_ amount of the boilerplate out. That alone makes it worth learning, i.m.o.

reply
jofer
11 days ago
[-]
Ah, right, you mean _outside_ the memory block of the array! Sorry, my mind was just foggy this morning.

That's not strictly possible, but "circular" references are possible with "stride tricks". Those can accomplish similar things in some circumstances. But with that said, I don't think that would work in this case.

reply
ruined
11 days ago
[-]
oh THIS is why image byte order and dimensions are so confusing every time i fuck with opencv and pygame

well, half of why. for some reason i keep doing everything directly on /dev/fb0

reply
TeMPOraL
11 days ago
[-]
Image byte order, axis directions, coordinate system handedness when in 3D... after enough trying, you eventually figure out the order of looping at any given stage of your program, and then it Just Works, and then you never touch it again.
reply
sylware
11 days ago
[-]
I wonder how it would perform with the latest openstreetmap tile renderer which was released not long ago.
reply
gugagore
11 days ago
[-]
Technically, image resizing does in general care about the color channel ordering, because color spaces are in general not linear. https://www.alanzucconi.com/2016/01/06/colour-interpolation/
reply
quietbritishjim
11 days ago
[-]
That article fails to touch on the fundamental issue with its title "The Secrets of Colour Interpolation": RGB values are a nonlinear function of the light emitted (because we are better at distinguishing dark colours, so it's better to allow representing more of those), so to interpolate properly you need to invert that function, interpolate, then reapply. The difference that makes to colour gradients is really striking.
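Concretely, with the standard sRGB transfer function (constants from the sRGB spec):

    import numpy as np

    def srgb_to_linear(c):  # c in [0, 1]
        return np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)

    def linear_to_srgb(c):
        return np.where(c <= 0.0031308, c * 12.92, 1.055 * c ** (1 / 2.4) - 0.055)

    def lerp_srgb(a, b, t):  # gamma-correct blend of two 8-bit values
        la, lb = srgb_to_linear(a / 255.0), srgb_to_linear(b / 255.0)
        return linear_to_srgb((1 - t) * la + t * lb) * 255.0

    # Naive 8-bit midpoint of black and white is 128; the correct one is ~187.5.
    print(lerp_srgb(np.float64(0.0), np.float64(255.0), 0.5))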
reply
topherclay
11 days ago
[-]
Do you mean converting the RGB value to LAB values in the CIELAB color space and doing the interpolation there?

Is there a better way to do it?

reply
CarVac
11 days ago
[-]
No, you just need to linearize the brightness.
reply
drjasonharrison
11 days ago
[-]
Typically this is done by using a look-up table to convert the 8-bit gamma encoded intensities to 10-bit (or more) linear intensities. You can use the same look-up table for R, G, and B. Alpha should be linear.
reply
yosefk
11 days ago
[-]
While this is a valid point, cv2.resize doesn't actually implement color conversion in a way addressing this issue; at least in my testing I get identical results whether I interpret the data as RGB or BGR. So if you want to use cv2.resize, AFAIK you can count on it not caring which channel is which. And if you need fast resizing, you're quite likely to settle for the straightforward interpolation implemented by cv2.resize.
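(Easy to check, e.g.:

    import cv2
    import numpy as np

    img = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
    bgr = np.ascontiguousarray(img[:, :, ::-1])  # same image, channels swapped
    a = cv2.resize(img, (32, 32))
    b = cv2.resize(bgr, (32, 32))[:, :, ::-1]
    assert np.array_equal(a, b)  # interpolation is per-channel, order-blind

)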
reply
planede
11 days ago
[-]
Even if the RGB components correspond to sRGB, to linearize you apply the same non-linear function to each component value, independently. So even if you do the interpolation in a linear colorspace, the order of the sRGB components does not matter.
reply
Joker_vD
11 days ago
[-]
Yep, welcome to the world of RGBA and ARGB storage and memory representation formats, with little- and big-endianness thrown in for the mix, it's all very bloody annoying for very little gain.
reply
qaq
11 days ago
[-]
Well with mojo it looks like you will have both safety and performance
reply
grandma_tea
11 days ago
[-]
Closed source and not relevant.
reply
ianbutler
11 days ago
[-]
Mojo was open sourced months ago iirc

https://github.com/modularml/mojo

reply
viraptor
11 days ago
[-]
Those are examples and docs. Mojo is still closed.
reply
ianbutler
11 days ago
[-]
Their standard lib is in there; seems more open to me.

There's the announcement where they open sourced the modules in their standard lib: https://www.modular.com/blog/the-next-big-step-in-mojo-open-...

There's the standard lib: https://github.com/modularml/mojo/blob/nightly/stdlib/README...

reply
viraptor
11 days ago
[-]
Yeah, that's some minimal progress, but really not that interesting. (As in, it's cool that it exists, but given we've got Python stdlib and numpy already open, it's not really new/exciting) And doesn't allow you to port to platforms they don't care enough about.
reply
qaq
11 days ago
[-]
It might not be interesting to you. Having a lot of Rust features with a much smarter compiler and Python syntax is pretty interesting to me.
reply
viraptor
11 days ago
[-]
No, mojo itself is interesting. The stdlib that's posted isn't. I mean, it's not very interesting code beyond "what are the implementation details". Any decent programmer could redo it based on Python's stdlib.
reply
Too
11 days ago
[-]
Clickbait title. The speedup only applies to one particular interaction between SDL and numpy.
reply
intelVISA
11 days ago
[-]
Isn't all Python, by design, unsafe?
reply
emmanueloga_
11 days ago
[-]
As others commented "safety" is a heavily overloaded term, for instance there's "type safety" [1] and "memory safety" [2], and then there's "static vs dynamic typing" [3], and "weak vs strong typing" [4], so talking about types and safety offered by a language can be very nuanced.

I suspect when you say "unsafe by design" you may be referring to the dynamic type checking aspect of Python, although Python has supported type annotations for a while, which can be statically checked by linter-like tools like PyRight.

--

1: https://en.wikipedia.org/wiki/Type_safety

2: https://en.wikipedia.org/wiki/Memory_safety

3: https://en.wikipedia.org/wiki/Type_system#Type_checking

4: https://en.wikipedia.org/wiki/Strong_and_weak_typing

reply
d0mine
11 days ago
[-]
Python is both type and memory safe, e.g., ""+1 in Python is a TypeError and [][0] is an IndexError.

But you can perform unsafe operations e.g.,

    import ctypes
    ctypes.string_at(0)  # access 0 address in memory -> segfault
reply
eru
10 days ago
[-]
Yes, though `ctypes' is kind of marked as `unsafe' in Python by convention.
reply
westurner
11 days ago
[-]
ctypes, c extensions, and CFFI are unsafe.

Python doesn't have an unsafe keyword like Rustlang.

In Python, you can dereference a null pointer and cause a Segmentation fault given a ctypes import.

lancedb/lance and/or pandas' dtype_backend="pyarrow" might work with pygame

reply
Retr0id
11 days ago
[-]
You don't even need a ctypes import for it, you'll get a segfault from this:

   eval((lambda:0).__code__.replace(co_consts=()))
cpython is not memory safe.
reply
westurner
11 days ago
[-]
Static analysis of Python code should include review of "unsafe" things like exec(), eval(), ctypes, C strings, and memcpy.

Containers are considered nearly sufficient to sandbox Python, which cannot be effectively sandboxed using Python itself. Isn't that actually true for all languages though?

There's a RustPython, but it does support CFFI and __builtins__.eval, so

reply
maple3142
11 days ago
[-]
The example given by parent does not need eval to trigger though. Just create a function and replace its code object then call it, it will easily segfault.
reply
jwilk
11 days ago
[-]
Complete example without eval:

  def f(): pass
  f.__code__ = f.__code__.replace(co_consts=())
  f()
reply
Retr0id
11 days ago
[-]
yup, eval was just there for golfing purposes
reply
eru
11 days ago
[-]
Is this supposed to be a joke?

Have a look at the linked article to see what meaning of 'unsafe' the author has in mind.

reply
intelVISA
11 days ago
[-]
Safety is catching (more) errors ahead of time ...for which Python is grossly unsuitable imo.

Fun lang though, just not one that ever comes to mind when I hear 'safe'.

reply
recursive
11 days ago
[-]
Memory corruption vs type safety.

"Safety" is an overloaded term, but that happens a lot in software. You'll probably get best results if you try to understand what people are talking about, rather than just assuming everyone else is an idiot.

reply
eru
11 days ago
[-]
That's one definition of safety. But it's not the one the author uses in this case.

The generic snide about Python that has nothing to do with the article ain't all that informative.

You could make the same remark you just made about all Python being unsafe about Rust: 'Isn't all Rust, by design, unsafe?'. And compared to eg Agda or Idris that would be true, but it wouldn't be a very useful nor interesting comment when talking about specifically 'unsafe' Rust vs normal Rust.

reply
intelVISA
11 days ago
[-]
I would agree, Rust is not safe. We need to encourage more formal rigor in our craft and avoid misconceptions like 'safe Rust' or 'safe Python'. Thus my original comment :P
reply
dahart
11 days ago
[-]
It’s a term of art in this case. Or someone proposed the term improper noun https://news.ycombinator.com/item?id=32673100

Fine and good to advocate rigor, whether or not it’s specifically relevant to this post, but maybe be careful with the interpretation and commentary lest people decide the misconception is on your part?

reply
eru
11 days ago
[-]
Eh, even Agda is only safe in the sense of 'does what you've proven it to do'. That doesn't mean that if you eg write a machine learning library in Agda, your AI won't come and turn us all into paperclips.

So it doesn't really make sense to pretend there's some single meaning of 'safe' vs 'unsafe' that's appropriate in all contexts.

reply
idkdotcom
11 days ago
[-]
Safety is one of Python's greatest advantages over C or C++. Why anyone would use unsafe Python, when Python doesn't have features such as type safety and all the debugging tools that have been built for C and C++ over time, is beyond me.

Python is a great language, but just as the generation of kids who got out of computer science programs in the 2000s were clueless about anything that wasn't Java, it seems this generation is clueless about anything that isn't Python.

There was life before Python and there will be life after Python!

reply
jandrese
11 days ago
[-]
Seems to me if you could do most of the work in Python and then just make the critical loop unsafe and 100x faster then that would certainly have some appeal.
reply
gibolt
11 days ago
[-]
Plenty of people would gladly not have to learn another language (especially C).

You could also benefit from testing blocks of code with safety enabled to have more confidence when safety is removed.

reply
TylerE
11 days ago
[-]
Or if you’re going to learn another language might as well learn nim. Keep most of the python syntax, ditch the performance and the packaging insanity.
reply
KeplerBoy
11 days ago
[-]
It was explained pretty well in the blog: Installing the OpenCV python package is easy, fast and painless as long as you're happy with the default binary. Building OpenCV from source to get it into a C/C++ program can quickly turn into a multi-hour adventure, if you're not familiar with building sizeable C++ projects.
reply
bongodongobob
11 days ago
[-]
I don't use Python because of type safety, I use it because it's easy to write. I couldn't give a single fuck about some missed pointers that weren't cleaned up in the 2.5 seconds my program ran. I'm not deploying public code.

I tend to prototype in Python and then just rewrite the whole thing in C with classes if I need the 10000x speedup.

reply
SunlitCat
11 days ago
[-]
That "easy to write" part is something I am not so sure about. It took me ages to understand why my first try at writing a Blender plugin didn't work.

It was because of white spaces not lining up. I was like, really? Using white spaces as a way to denote the body of a function? Really?

reply
nottorp
11 days ago
[-]
I'm curious... what rock have you been living under in the past 33 years since python was launched?
reply
SunlitCat
11 days ago
[-]
Let's see, I was living under the:

Amiga E, Assembler (on Amiga), perl, tcl/tk, javascript, vbscript, java, vba, c, c++, python

rock. Granted at some of those rocks, I just took a quick peek (more like a glance), but a rock with whitespaces being important was, at that moment, new to me!

reply
nottorp
11 days ago
[-]
Oh cmon, I can still write some z80 assembly from memory and remember the zx spectrum memory layout somewhat, but I check new programming languages now and then :)
reply
SunlitCat
11 days ago
[-]
:)

I just meant that for someone coming from a language(s) where different kinds of delimiters are used to denote function bodies, a language that puts such a huge emphasis on whitespace was kinda throwing me off to get an error message slapped at me even though I was replicating the example letter by letter (although, obviously, I missed the importance of whitespace).

reply
nottorp
11 days ago
[-]
It's all fuzzy, but I'm sure I knew about the importance of whitespace in python long before I actually tried python.

Even Raymond's article, which was one of the first, whined about it [1]. I definitely remember reading that one and thinking I have to check out that language later.

[1] https://www.linuxjournal.com/article/3882

reply
SunlitCat
11 days ago
[-]
Thank you for the article. I have to put it on my "to read" pile! :)
reply
nottorp
10 days ago
[-]
It's from 2000; python has grown up in the meantime and has added complexity. Put a (small) grain of salt on Raymond's praise.
reply
SunlitCat
11 days ago
[-]
Thing about academia (like computer science and the like) is, you need a programming language you can get across easily, quickly and can get people to produce satisfying results in no time.

Back then it might have been java, then python, till the next language comes around.

The java thing was so apparent, that when you looked at c++ code, you were able to spot the former java user pretty easily. :)

About Python: some time ago, I was looking into Arduino programming in C and found someone offering a library (I think) in C (or C++, can't remember). What I remember is that this person stopped working on it because they got many requests and questions about being able to use their library in Python.

reply
pjmlp
11 days ago
[-]
> The java thing was so apparent, that when you looked at c++ code, you were able to spot the former java user pretty easily. :)

Well, Java was designed based on C++ GUI development patterns typical in the 10 years of C++ GUI frameworks that predated Java.

Yet, people keep using these kinds of remarks.

reply
gibolt
11 days ago
[-]
"Why would anyone use" -> "when" usage generally means lots of use cases are being ignored / swept under the rug.

>1 billion people exist. Each has a unique opinion / viewpoint.

reply
omoikane
11 days ago
[-]
The "unsafe" in the title appears to be used in the sense of "exposing memory layout details", but not in the sense of "direct unbounded access to memory". It's probably not the memory safety issue you are thinking of.
reply
yosefk
11 days ago
[-]
You can absolutely get direct unbounded access to memory with ctypes, with all the bugs that come from this. I just think/hope the code I show in TFA happens to have no such bugs.
reply
eru
11 days ago
[-]
> Why anyone would use unsafe Python, when Python doesn't have features such as type safety and all the debugging tools that have been built for C and C++ over time, is beyond me.

C and C++ 'type safety' is barely there (compared to more sane languages like OCaml or Rust or even Java etc). As to why anyone would do that? The question is 'why not?' It's fun.

reply
kragen
11 days ago
[-]
while python is eminently criticable, yosef kreinin has designed his own cpu instruction set and gotten it fabbed in a shipping hardware product, and is the author of the c++fqa, which is by far the best-informed criticism of c++; criticizing him with 'this generation is clueless about anything that isn't Python' seems off the mark
reply