What is RCU, Fundamentally? - https://lwn.net/Articles/262464/
Amusingly, the first several hits for "Golang RCU" point to GitHub repos which contain a bunch of Mutexes (locks), import "unsafe", and haven't been updated in 7+ years. Perhaps this is not the most well-understood area of computer science.
Let's face it, when was the last time you needed RCU for a web app?
The Linux kernel's RCU waits for all CPU cores to pass through a context switch, and it disables preemption while reading an RCU-protected data structure. This is not usable in normal application programming.
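For reference, the classic kernel-side pattern looks roughly like this - a minimal sketch using the real kernel primitives (struct foo, gp, and the function names are made up for illustration, and the update side assumes a single updater or external locking). synchronize_rcu() is the "wait for every CPU to pass through a quiescent state, e.g. a context switch" step described above:

```c
struct foo {
    int a;
};

static struct foo __rcu *gp;    /* RCU-protected global pointer */

int reader(void)
{
    struct foo *p;
    int v = -1;

    rcu_read_lock();            /* on non-preemptible kernels, disables preemption */
    p = rcu_dereference(gp);    /* safely fetch the protected pointer */
    if (p)
        v = p->a;
    rcu_read_unlock();
    return v;
}

void updater(int a)
{
    struct foo *new = kmalloc(sizeof(*new), GFP_KERNEL);
    struct foo *old;

    if (!new)
        return;
    new->a = a;
    old = rcu_dereference_protected(gp, 1); /* update-side access */
    rcu_assign_pointer(gp, new);            /* publish the new version */
    synchronize_rcu();  /* wait until all pre-existing readers are done */
    kfree(old);         /* no reader can still see the old version */
}
```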
Is that actually true?
Mathematically: O(f(n)) is an asymptotic bound as n approaches infinity, and it can be used to describe worst, best, average, and other cases.
Have programmers redefined O to specifically mean worst case?
Certainly, as I understand it, your definition of Big O is incorrect - it provides exclusively an upper bound. Big Omega is used for a lower bound and Big Theta for an upper and lower bound.
The function “quickSortExecutionTime(permutation)” depends on the exact input (each permutation/shuffle has a different execution time), and the parameter “permutation” cannot go to infinity, so it can’t be used with O(n) directly. (The parameter needs to be a single real number.)
quickSortWorstCaseTime(n) is a single-valued function, so it can be used with O(n) notation. The same is true for quickSortAvgCaseTime(n) and quickSortBestCaseTime(n).
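Spelled out in symbols (my notation, not the commenter's): if T(p) is the running time on a particular input p, each case collapses it into a single-valued function of the input size n:

```latex
% T_avg assumes all n! permutations of size n are equally likely
T_{\mathrm{worst}}(n) = \max_{|p| = n} T(p), \qquad
T_{\mathrm{avg}}(n) = \frac{1}{n!} \sum_{|p| = n} T(p), \qquad
T_{\mathrm{best}}(n) = \min_{|p| = n} T(p)
```

Each of these can then be bounded with O, Omega, or Theta.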
But you answered my question. (Very indirect “Yes” :) )
Perhaps it is accurate to say that when Big O is used to describe the time complexity of a function in computer science, the variable n in O(f(n)) usually describes the size of the input, which may not be common knowledge. But if the question is:
> Have programmers redefined O to specifically mean worst case?
Then in general I would say no. I suppose to answer more concretely I would have to ask: "Which programmers?"
Big O is an upper bound on a function's growth - being O(f(n)) means growing asymptotically no faster than f(n) (up to multiplication by a constant).
Big Omega is a lower bound on a function's growth - being Ω(f(n)) means growing asymptotically at least as fast as f(n) (up to multiplication by a constant).
Big Theta means the function grows exactly that fast - being Θ(f(n)) means the function is both O(f(n)) and Ω(f(n)), so it is exactly f(n) up to multiplication by a constant. In practice, due to typographical limitations, people often write O(n) where Θ(n) would be more precise.
Average case, best case, worst case, amortized, etc. complexities are a wholly different thing. For example, quicksort is Θ(n^2) worst case and Θ(n log n) average case.
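For concreteness, the standard textbook definitions behind the above (a restatement, not from the thread):

```latex
% upper bound: f grows no faster than g
f(n) = O(g(n)) \iff \exists\, c > 0,\ n_0 : \forall n \ge n_0,\ f(n) \le c \cdot g(n)
% lower bound: f grows at least as fast as g
f(n) = \Omega(g(n)) \iff \exists\, c > 0,\ n_0 : \forall n \ge n_0,\ f(n) \ge c \cdot g(n)
% tight bound: both at once
f(n) = \Theta(g(n)) \iff f(n) = O(g(n)) \ \text{and}\ f(n) = \Omega(g(n))
```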
In C it is much more likely that these allocations are spread around memory and land in different cache lines.
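To illustrate the contrast (my sketch; the names and the 20-slot bucket size are assumptions based on the discussion, not taken from the patch):

```c
/* Chained hash table: every node is a separate allocation, so walking
 * a chain typically costs one cache miss per node. */
struct chain_node {
    unsigned long hash;
    void *value;
    struct chain_node *next;    /* pointer chase on every step */
};

/* Bucket-array style (as rosebush is described): a bucket's hashes are
 * contiguous, so a lookup scans consecutive cache lines instead of
 * chasing pointers. */
struct flat_bucket {
    unsigned long hashes[20];   /* scanned first, no dereference needed */
    void *entries[20];
};
```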
> Reviewers were, as ever, skeptical in the face of assertions about performance without corresponding measurements. Peng Zhang asked "how much performance improvement is expected if it is applied to dcache?" Wilcox didn't reply to Zhang's question.
Am I reading this right that the author supplied no actual (benchmarking) evidence to support their proposed performance optimization?
My days as an LWN subscriber are now far behind me, but gosh, reading the comments there was a blast. I had forgotten how good proper technical discussions used to be.
[1] https://lore.kernel.org/lkml/20240222203726.1101861-1-willy@...
Is it common to add an unused data structure to the kernel without understanding its performance characteristics? If such numbers are available, they are not obvious to someone who has read the article, read the commit message, read lkml, gone looking for benchmarks, and still come up with no concept of how this might perform in realistic or synthetic scenarios.
> Seriously, a simple google search or taking a minute to follow the first link in the article would have shown you that the patch set obviously came with a benchmark [1].
If you're referring to the table with headers "number xarray maple rhash rosebush", it's an estimate of memory usage, not a benchmark. And the author (Matthew Wilcox) says "I've assumed an average chain length of 10 for rhashtable in the above memory calculations", which folks questioned. So there's reason to doubt its correctness.
> It’s someone posting a patch to the kernel mailing list, not some random kid on GitHub; there is no point in assuming they are an idiot just to make yourself look clever.
Let's just let the work speak for itself rather than flagging the author as an idiot or a genius... at present rosebush is unproven and based on questionable assumptions. That might change in future rounds.
Once you're in a bucket, it does a full hash match, then, I assume (I didn't look at the patch), an actual key comparison.
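That description corresponds to something like this hypothetical sketch (the layout, slot count, and names are my assumptions, not the patch):

```c
#include <stdbool.h>
#include <stddef.h>

#define SLOTS 20

struct bucket {
    unsigned long hashes[SLOTS];
    void *entries[SLOTS];       /* NULL marks an empty slot */
};

/* Compare the stored full hashes first; only on a hash match
 * dereference the entry and compare the actual key. */
static void *bucket_lookup(const struct bucket *b, unsigned long hash,
                           const void *key,
                           bool (*keys_equal)(const void *entry,
                                              const void *key))
{
    for (size_t i = 0; i < SLOTS; i++) {
        if (b->entries[i] && b->hashes[i] == hash &&
            keys_equal(b->entries[i], key))
            return b->entries[i];
    }
    return NULL;
}
```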
What happens when more than 20 objects end up with the exact same hash?
If I'm wrong, this code doesn't do a good enough job of explaining what's going on, and if I'm correct, it's defective and should not ship.
Even if this situation "shouldn't" happen, it clearly can happen - you just have to get unlucky - and it's difficult to compute how unlikely that is for real systems, so it's better to ensure that "we got unlucky" isn't a disaster for the kernel.
It would just be a disaster if they follow Miller's networking code, which uses slower hashes to prevent seed attacks. But every other popular language/library does the very same wrong thing.
So a "twenty object" bucket would actually store 19 objects, and if the last one wasn't null it would contain a pointer to the next bucket.
I'm willing to presume these folks know what they're doing, so that's probably what's happening here.
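A sketch of that conjecture (purely illustrative, not taken from the patch):

```c
/* Conjectured layout: the 20th slot is repurposed as an overflow
 * pointer, so a full bucket chains to another bucket. */
struct bucket {
    unsigned long hashes[19];
    void *entries[19];
    struct bucket *next;    /* NULL unless this bucket has spilled over */
};
```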
So we grow the top-level hash table and put all twenty items into the same new bucket, because they have the same hash, and then we... oh, we don't have enough room. Try again? (The code presented does.) Too bad, that can't help.
In your understanding, why does this not happen? That's the question. How does this code ensure this can't happen? If it doesn't, if it's just hoping to get lucky, the code must not be used.
Changing which bits we use changes nothing for this problem case.
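The arithmetic behind that objection, as I read it (illustrative; assumes the usual power-of-two bucket count):

```c
/* Keys with identical full hashes get identical bucket indices at
 * every table size, so resizing moves them together and can never
 * split them apart. */
static unsigned long bucket_index(unsigned long hash,
                                  unsigned long nr_buckets)
{
    return hash & (nr_buckets - 1); /* nr_buckets is a power of two */
}
/* For two colliding keys k1, k2 with hash(k1) == hash(k2):
 * bucket_index(hash(k1), n) == bucket_index(hash(k2), n) for all n. */
```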
I guess I'd argue that if you have 20 items whose full hashes collide, then something is pretty seriously wrong with the hashing algorithm - raise an error/exception/panic/(whatever is appropriate for the situation) and surface it.
In fact it makes that problem acute. A linked-list structure degrades gradually as we shove more entries with the same hash into it - every look-up incurs a pointer chase down the entire list, and slower is eventually unacceptable, but we do get warning signs first - whereas with arrays we're just screwed: in C it's pretty likely we end up with an illegal dereference, or we lose data once there's no room.
If it's designed to be used for other purposes as well, not so much.
(I wasn't entirely sure which from reading the article)