u = ARGV[0].to_i
r = rand(10_000)
a = Array.new(10_000, 0)
(0...10_000).each do |i|
  (0...100_000).each do |j|
    a[i] += j % u
  end
  a[i] += r
end
puts a[r]
Weird benchmark. Hand-optimized, I guess this benchmark will spend over 99% of its time in the first two lines. If you do liveness analysis on array elements you'll discover that it is possible to remove the entire outer loop, turning the program into:
u = ARGV[0].to_i
r = rand(10_000)
a = 0
(0...100_000).each do |j|
  a += j % u
end
a += r
puts a
Are there compilers that do this kind of analysis? Even though u isn't known at compile time, that inner loop can be replaced by a few instructions, too, but that's a more standard optimization that, I suspect, the likes of clang may be close to making.
I used to work on an AI compiler where liveness analysis of individual tensor elements actually would have been useful. We still didn't do it because the compilation time/memory requirements would be insane.
result = ((u * (u - 1)) / 2 * (100000 / u)) + (100000 % u * (100000 % u - 1) / 2) + r
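A quick sanity check of that closed form against the brute-force inner loop (a sketch; the values of u and r here are arbitrary, u assumed positive):

u, r = 7, 42                                 # arbitrary example inputs
brute = (0...100_000).sum { |j| j % u } + r  # what the inner loop computes
m = 100_000 % u
# floor(100_000 / u) full cycles, each summing 0 + 1 + ... + (u - 1),
# plus one partial cycle summing 0 + 1 + ... + (m - 1)
closed = u * (u - 1) / 2 * (100_000 / u) + m * (m - 1) / 2 + r
raise "mismatch" unless brute == closed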
Also, I'm wondering what effect Python's minimal JIT [2] will have on this type of loop. Python 3.13 needs to be built with the JIT enabled, so it would be interesting if someone who has built it could run the benchmarks.
[1] https://www.ruby-lang.org/en/downloads/releases/
[2] https://drew.silcock.dev/blog/everything-you-need-to-know-ab...
But perf improvements can and do drop in point releases too, afair.
I find myself using `#succ` most often for readability reasons, not just for performance. Here's an example where I use it twice in my UUID library's `#bytes` method to keep my brain in “bit slicing mode” when reading the code. I need to loop 16 times (`0xF.succ`) and then within that loop divide things by 256 (`0xFF.succ`): https://github.com/okeeblow/DistorteD/blob/ba48d10/Globe%20G...
irb> 0xFFFFFFFF_FFFFFFFF_FFFFFFFF_FFFFFFFF.bit_length => 128
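A minimal sketch of that idiom (not the library's actual code; the value of n is just an example):

n = 0x01234567_89ABCDEF_01234567_89ABCDEF
bytes = Array.new(0xF.succ) do     # 16 iterations, one per byte
  n, byte = n.divmod(0xFF.succ)    # divide by 256; the remainder is the low byte
  byte                             # bytes come out least-significant first
end
p bytes.size  # => 16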
I am not sure if the benchmarks you provided showing the speed of TruffleRuby were made after the changes you made.
I would really appreciate it if I could verify the benchmark, and maybe try to add it to the main https://github.com/bddicken/languages as a commit as well, because the TruffleRuby implementation is actually faster than Node.js and gets close to Bun or even Go for that matter, which is nuts.
This was a fun post to skim through; definitely bookmarking it.
Making it easily fork-able should Oracle choose to do something users dislike.
Which is disappointing, since it has the highest likelihood of making the biggest impact on Ruby perf.
TruffleRuby is not 100% compatible with MRI 3.2 yet
Rails: Rails 8 will require Ruby 3.2.0 or newer
https://github.com/oracle/truffleruby

[0]: https://shopify.engineering/porting-yjit-ruby-compiler-to-ru...
The chart axis labels and bar labels overlap each other, and there are no vertical grid lines.
Oh for a simple HTML table!
Interesting that there seems to be a correlation between a language being slow and it being popular.
I say this as a pretty deep rust fanatic. All languages (and runtimes, interpreters, and compilers) are tools. Different problems and approaches to solving them benefit from having a good set at your disposal.
If you're building something that may only run a handful of times (which a lot of python, R, et al programs include) slow execution doesn't matter.
By and large, Ruby is slow, but damn is it nice to code with, which makes it more appealing for newcomers.
Work has been done to make faster Ruby language implementations.
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
Program performance is associated with the specific programming language implementation and the specific program implementation.
Different language implementation, so different performance.
> too seriously
Take those measurements just seriously enough.
> Ran each three times and used the lowest timing for each. Timings taken on an M3 Macbook pro with 16 gb RAM using the /usr/bin/time command. Input value of 40 given to each.
Not even using JMH. I highly doubt the accuracy of the “benchmark”.
You only have to check the additional bytecode that gets generated to work around the features not natively supported.
https://www.oracle.com/java/technologies/javameoverview.html
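On the Ruby side, the generated bytecode is easy to inspect with the stock (CRuby-only) RubyVM::InstructionSequence API; a quick sketch, compiling a stand-in for the benchmark's inner-loop body:

code = "u = 3; a = 0; (0...10).each { |j| a += j % u }"
puts RubyVM::InstructionSequence.compile(code).disasm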
Do you mean CPython or PyPy or MicroPython or ?
Yes, compared to Python.
> Do you mean CPython or PyPy
Python's standard virtual machine is called CPython; just look at the official web page.
This actually makes me feel sad because it reminded me of Chris Seaton. The idea isn't new and Chris has been promoting it during his time working on TruffleRuby. I think the idea goes back even further to Rubinius.
It is also nice to see TruffleRuby being very fast, and YJIT still has lots of headroom to grow. I remember one obstacle with it running Rails was memory usage. I wonder if that is still the case.
This paves the way for JITting C code to make it run faster than the author wrote it.
It's possible to write gems which will use underlying C on MRI or Java when running on JRuby.
It would be interesting to know if a “pure Ruby” version would help JRuby too.
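A hypothetical sketch of that pattern (the gem and file names are invented for illustration):

# mygem.rb -- load the fast backend for the current platform
if RUBY_PLATFORM == "java"
  require "mygem/java_backend"  # JRuby: backed by a Java extension
else
  require "mygem/c_backend"     # MRI: backed by a C extension
end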
First, this indicates some sort of deep confusion about the purpose of benchmarks in the first place. Benchmarks are performance tests, not popularity tests. And I don't think I'm just jumping on a bit of bad wording, because I see this idea, in its various forms, poking out in a lot of conversations. Python is popular because it has many positive aspects; yes, it really is a rather slow language, but the positives outweigh that for many purposes. They don't cancel it. Python's other positive aspects do not speed it up; indeed, they're critically tied to why it is slow in the first place. If they were not, Python would not be slow. It has had a lot of work done on it over the years, after all.
Secondly, I think people sort of chant "microbenchmarks are useless", but they aren't useless. I find that this microbenchmark actually gives a fairly realistic picture of the relative performance of those various languages. What they are not is totally determinative. You can't divide one language's result on this test by another's to get a "Python is 160x slower than C". That is, in fact, not an accurate assessment; if you want a single unified number, 40-50x is much closer. But "useless" is way too strong.

No language is so wonderful on all other dimensions that it can have something as basic as a function call be dozens of times slower than in some other language and yet keep up with that other language in general. (Assuming both languages have had production-quality optimizations applied to them and one of them isn't some very, very young language.) It is a real fact about these languages, it is not a huge outlier, and it is a problem I've encountered in real codebases before, when I needed to literally optimize out function calls in a dynamic scripting language to speed up certain code to acceptable levels, because function calls in dynamic scripting languages really are expensive in a way that really can matter.

It shouldn't be overestimated and used to derive silly "x times faster/slower" values, but at the same time, if you're dismissing these sorts of things, you're throwing away real data. There are no languages that are just as fast as C, except gee golly they just happen to have this one thing where function calls are 1000 times slower for no reason even though everything else is C-speed. These performance differences are reasonably correlated.
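A rough Ruby illustration of the call-overhead point (a sketch only; absolute numbers vary by Ruby version and JIT settings, the ratio is what matters):

require "benchmark"

def add(a, b)
  a + b
end

n = 5_000_000
Benchmark.bm(8) do |x|
  x.report("call")   { n.times { |i| add(i, 1) } }  # a method call per iteration
  x.report("inline") { n.times { |i| i + 1 } }      # same work with the call removed
end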
I don't think it indicates a deep confusion. I think it leaves a simple point unsaid because it's so strongly implied (related to what you say):
Python may be very low in benchmarks, but clearly it has acceptable performance for a very large subset of applications. As a result, a whole lot of us can ignore the benchmarks.
Even in domains where one would have shuddered at this before. My students are launching a satellite into low earth orbit that has its primary flight computer running python. Yes, sometimes this does waste a few hundred milliseconds and it wastes several milliwatts on average. But even in the constrained environment of a tiny microcontroller in low earth orbit, language performance doesn't really matter to us.
We wouldn't pay any kind of cost (financial or giving up any features) to make it 10x better.
Fuzzy one-dimensional thinking that classifies languages on a "good" and "bad" axis is quite endemic in this industry. And for those people, you can counter "X is slow" with "X has good library support", and disprove "X lacks good tooling" with "But X has a good type system", because all they hear is that you said something is "good" but they have a reason why it's "bad", or vice versa.
Keep an eye out for it.
I've built an autonomous drone using Matlab. It worked but it was a research project, so when it came down to making the thing real and putting our reputation on the line, we couldn't keep going down that route -- we couldn't afford the interpreter overhead, the GC pauses, and all the other nonsense. That aircraft was designed to be as efficient as possible, so we could literally measure the inefficiency from the choice of language in terms of how much it cost in extra battery weight and therefore decreased range.
If you can afford that, great, you have the freedom to run your satellite in whatever language. If not, then yeah you're going to choose a different language if it means extra performance, more runtime, greater range, etc.
Years of effort from a large team is worth something, as is the tens of thousands of dollars we're spending. We expect a return on that investment of data and mission success. We're spending a lot of money to improve odds of success.
But even in this power constrained application, a few milliwatts is nothing. (Nearly half the time, it's literally nothing, because we'd have to use power to run heaters anyways. Most of the rest of the time, we're in the sun, so there's a lot of power around, too). The marginal benefit to saving a milliwatt is zero, so unless the marginal cost is also zero we're not doing it.
> That aircraft was designed to be as efficient as possible, so we could literally measure the inefficiency from the choice of language in terms of how much it cost in extra battery weight and therefore decreased range
If this is a rotorcraft of some sort, that seems silly. It's hard to waste enough power to be more than rounding error compared to what large brushless motors take.
Let me ask you, why do you think most real-time mission critical projects are not typically done in Python?
> If this is a rotorcraft of some sort, that seems silly. It's hard to waste enough power to be more than rounding error compared to what large brushless motors take.
It was a glider trying to fly as long as possible, so no motors, and no solar power either. It got to the point that we could not even execute the motion planner fast enough in Matlab given the performance demands of the craft, so we had to resort to MEX, and at that point we might as well have been writing in C. Which we did.
OTOH, when the title is "Speeding up Ruby" we are kind of presuming it matters.
Never mind performance, would it not be good to at least machine check some static properties? A dynamic language is not a good choice for anything mission critical IMHO.
They might be!
(They aren't necessarily useless. It depends. It depends what one is looking for. It depends etc etc)
> You can't divide one language's microbenchmark on this test by another to get a "Python is 160x slower than C".
Sure you can!
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
— and —
Table 4, page 139
https://dl.acm.org/doi/pdf/10.1145/3687997.3695638
— and then one has — "[A] Python is 160x slower than C" not "[THE] Python is 160x slower than C".
Something multiple and tentative not something singular and definitive.
But presumably they're meant to test something that matters. And the popularity suggests that what's being tested in this case doesn't.
> But "useless" is way too strong. No language is so wonderful on all other dimensions that it can have something as basic as a function call be dozens of times slower than some other language and yet keep up with that other language in general.
And yet Python does keep up with C in general. You might object that when a Python-based system outperforms a C-based system it's not running the same algorithm, or it's not really Python, and that would be technically true, but seemingly not in a way that matters.
> if you're dismissing these sorts of things, you're throwing away real data
Everything is data. The most important part of programming is often ignoring the things that aren't important.
Also, for a lot of the areas where languages like Python or Ruby aren't great choices because of performance, they would also not be great choices because of the cost of maintaining untyped code, or in Python's case the cost of maintaining code in a language that keeps making breaking changes in minor versions.
Script with scripting languages, build other things in other languages
Amber and Lucky are two mature frameworks that give Rails a run for its money, and Kemal is your Sinatra.
Mentioning Crystal would be odd since it has nothing to do with the article.
[0] https://crystal-lang.org/2020/02/02/alpine-based-docker-imag...
https://benchmarksgame-team.pages.debian.net/benchmarksgame/... seems like the latest iteration of what used to be a pretty popular one, now with fewer languages and more self-deprecation.
Maybe you've only noticed the dozen in-your-face ones on the home page?
The charts have shown ~27 for a decade or so.
There's another half-dozen more in the site map.
The effect is similar to dragging a string past a cat: complete distraction — unable to avoid focusing on the movement — unable to extract any information from the movement.
To understand the measurements, cover the "fun visualization" and read the numbers in the single column data table.
(Unfortunately we aren't able to scan down the column of numbers, because the language implementation name is shown first.)
Previously: <blink>
https://developer.mozilla.org/en-US/docs/Glossary/blink_elem...
You would have no problem doing that with a bar chart.
:-)
We could try to count how many times the Java circle crosses left-to-right and right-to-left in the time it takes for the PHP circle to cross left-to-right once.
That's error prone but should be approximately correct after a couple of attempts.
That's work we're forced to do because the "fun visualization" is uninformative.
Java was so fast it glowed orange!
I wonder if the distraction of the animation actually makes people slower at reading the information that is in the text column.
The animation serves its purpose -- it grabs attention.
Also, would have loved to see LuaJIT (a JIT-compiled Lua implementation) & Crystal (a static, Ruby-like language) included, just for comparison's sake.
https://en.m.wikipedia.org/wiki/Lars_Bak_(computer_programme...
It was stuck with a bad rep as the language that was never going to replace JavaScript in the browser, and then as merely a transpiler no one was going to use, before it found a new life as the language for Flutter. Flutter has driven a lot of its syntax and semantics improvements since, with built-in VM support for extremely efficient object templating (used by the reactive UI framework).
Nearly 25 years ago, nested loops and fibs.
https://web.archive.org/web/20010424150558/http://www.bagley...
https://web.archive.org/web/20010124092800/http://www.bagley...
It's been a long time since the benchmarks game showed those.
On x86_64 I expect the numbers would have been much closer and within measurement error. The top half is within 0.5-0.59s - there really isn't much you can do inside such a loop, almost nothing happens there.
As Isaac pointed out in a sibling comment, it's best to pick specific microbenchmarks and a selection of languages and implementations that interest you, and dissect those; that will tell you much more.
Perhaps the tiny, tiny programs nonetheless took enough time that startup was amortized.
It's fast and flexible.
No, hot paths are seldom evenly distributed, even in non-numeric applications. In most cases they will be concentrated in a small number of locations.
And this kind of benchmark is the one that tells you why this is different across different languages.