The handling of the "tail" of the key (the last < 32 bytes) is slightly odd (the "if (l & 1) { mix 1 byte of key }" happens before 8-byte chunks and 2-byte chunks), but if it passes SMHasher it should be fine for general use.
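Roughly the shape being described, as an illustration of the ordering only (this is not the actual code from the post; the constant and the multiply mix are placeholders I picked):

    #include <stdint.h>
    #include <stddef.h>

    /* Illustration of the tail ordering described above, NOT the post's code:
       the single stray byte (l & 1) gets mixed in before the remaining
       8-byte and 2-byte chunks. The multiply-by-constant mix is a placeholder. */
    static uint64_t mix_tail(uint64_t h, const uint8_t *p, size_t l) {
        const uint64_t K = 0x9E3779B97F4A7C15ull; /* arbitrary odd constant */
        if (l & 1) { h = (h ^ p[l - 1]) * K; l--; }        /* 1 stray byte first  */
        for (; l >= 8; l -= 8, p += 8) {                   /* then 8-byte chunks  */
            uint64_t v = 0;
            for (int i = 0; i < 8; i++) v |= (uint64_t)p[i] << (8 * i);
            h = (h ^ v) * K;
        }
        for (; l >= 2; l -= 2, p += 2)                     /* then 2-byte chunks  */
            h = (h ^ ((uint64_t)p[0] | ((uint64_t)p[1] << 8))) * K;
        return h;
    }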
Author of the post writes:
> I kept making changes until the tests passed.
If SMHasher has become a target, shouldn't it be considered a bad measure now? [1]
OTOH, I would expect that tests for hashing and random number generation in particular would be naturally resistant to overfitting, due to the nature of the problem. But I'm not an expert on this topic, so I'd love to hear someone else's take on that.
When a hash function has weak mixing, it tends to fail catastrophically on one of those key sets, so you can think of SMHasher more as "attempts to test the hash function to the point of failure" than as a "target" in the Goodhart's Law sense.
Also here is a hash function I wrote a while ago now: https://github.com/Keith-Cancel/k-hashv/tree/main
> The way it loads the 8 bytes is also important. The correct way is to load via shift+or
> This is free of any UB, works on any alignment and on any machine regardless of its endianness. It's also fast: gcc and clang recognize this pattern and optimize it into a single mov instruction on x86 targets.
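For reference, the shift+or load being described usually looks something like this (function name is mine, just a sketch of the pattern):

    #include <stdint.h>

    /* Portable little-endian 64-bit load: no alignment or strict-aliasing UB,
       works regardless of host endianness, and gcc/clang recognize the pattern
       and fold it into a single unaligned mov on x86 targets. */
    static uint64_t load_u64_le(const uint8_t *p) {
        return  (uint64_t)p[0]        | ((uint64_t)p[1] << 8)  |
               ((uint64_t)p[2] << 16) | ((uint64_t)p[3] << 24) |
               ((uint64_t)p[4] << 32) | ((uint64_t)p[5] << 40) |
               ((uint64_t)p[6] << 48) | ((uint64_t)p[7] << 56);
    }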
Is a single MOV instruction still fast when the 8 bytes begin on an odd address?
On x86, yes. There is no performance penalty for misaligned loads, except when the misaligned load also happens to straddle a cache line boundary, in which case it is slower, but only marginally so.
I don't think so. At the start k will be equal to keyIn, which can point to any arbitrary memory location.
The case where you allocate a buffer, populate it with random contents, and hash the whole thing is an artificial setup for benchmarking that is far from real use cases.
CRCs allow you to precisely tune the error-detection guarantees to the application. You just pick the combination of Hamming distance and polynomial size that you like best from the CRC Zoo:
https://users.ece.cmu.edu/~koopman/crc/index.html
A hash function is essentially just a statistical version of this: the avalanche property works acceptably well the majority of the time, instead of giving a hard guarantee.
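To make the "pick a polynomial" point concrete, here is a minimal bit-at-a-time CRC-32 sketch using the common reflected polynomial 0xEDB88320 (in practice you'd use a table-driven or hardware CRC; the point is that swapping in a different polynomial/width from the zoo is how you tune the guarantees):

    #include <stdint.h>
    #include <stddef.h>

    /* Minimal bit-at-a-time CRC-32 (reflected polynomial 0xEDB88320), for
       illustration only. A different polynomial/width from the CRC Zoo gives
       different Hamming-distance guarantees for a given message length. */
    static uint32_t crc32_simple(const uint8_t *p, size_t n) {
        uint32_t crc = 0xFFFFFFFFu;
        for (size_t i = 0; i < n; i++) {
            crc ^= p[i];
            for (int b = 0; b < 8; b++)
                crc = (crc & 1) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
        }
        return ~crc;
    }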
> It’s the fastest hash function we know of, and we have benchmarked all the ones we could find. On modern Intel x64 CPUs, it hashes 16 bytes per cycle single-threaded. This means in cache it can hash at a rate of 64 gigabytes per second on a 4.2gHz machine. Out of cache, it hashes at whatever speed your main memory bus can provide to a single core, since that is usually the limiting factor on modern x64 CPUs.
> It has also now been tuned to be the fastest hash on small inputs, too. Despite the fact that it is a full 128-bit hash, it still outperforms “fast” 64-bit hashes across all input sizes.
— https://archive.is/CQOVm (originally https://mollyrocket.com/meowhash)
Discussion on HN: https://news.ycombinator.com/item?id=29038813 & https://news.ycombinator.com/item?id=18262627
https://rurban.github.io/smhasher/doc/table.html
In particular, rapidhash is three times as fast for small inputs and is portable (unlike meow_hash). Plus meow fails one of the SMHasher quality tests.
It shines for large inputs (it’s far up the SMHasher list you linked in this case).
It offers a good compromise; it’s not bad (good, even) for both large and small inputs.
Granted, rapidhash is quite good on both fronts, too.
> To our surprise, we found a lack of published, well-optimized, large-data hash functions.
Murmur was released in 2008. Murmur3 in 2011. Release dates for higher-quality functions are not easy to find, but I am sure there were more around.
This type of thing is why I take Casey's claims with a huge dose of salt.
My impression in general is that Casey is a fairly decent engineer who has somehow been elevated into godhood by a subset of users.
Meow hash may very well be fast on large data, but it completely fails the "small code" criterion, which is one of Chibi's clearly stated design goals.
Generally speaking, "code as small as possible" is IMO a design constraint that is very often underestimated or even ignored for these types of algorithms, including the cryptographically hard ones.
I personally find the "code fits on half a page" metric very important: beyond satisfying some very personal sense of aesthetics, it buys you quite a lot of nice properties:
- fits in head
- inlines like a dream
- easy to integrate
- easy to do header only
- easy to security audit
- pretty hard to introduce bugs
- nothing-up-my-sleeve numbers are pretty much obvious
For these reasons, I used to love TEA (the Tiny Encryption Algorithm) [1] before cryptanalysis unfortunately showed it to be weak. Its successors (XXTEA, etc.) are already almost too big for my taste. The Speck cipher [2] is quite nice in that regard, in spite of its rather untrustworthy provenance.
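For a sense of just how small: the classic TEA encryption function, as I remember the commonly published reference code (quoted from memory, so treat it as a sketch rather than a vetted implementation):

    #include <stdint.h>

    /* TEA block encryption as commonly published (from memory, unverified):
       encrypts the 64-bit block v[0..1] in place with the 128-bit key k[0..3]. */
    void tea_encrypt(uint32_t v[2], const uint32_t k[4]) {
        uint32_t v0 = v[0], v1 = v[1], sum = 0;
        const uint32_t delta = 0x9E3779B9; /* the nothing-up-my-sleeve constant */
        for (int i = 0; i < 32; i++) {
            sum += delta;
            v0 += ((v1 << 4) + k[0]) ^ (v1 + sum) ^ ((v1 >> 5) + k[1]);
            v1 += ((v0 << 4) + k[2]) ^ (v0 + sum) ^ ((v0 >> 5) + k[3]);
        }
        v[0] = v0; v[1] = v1;
    }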