For example, I have a RAID0 with 4 SSDs (Samsung 990 PRO, so consumer, but quite good for reads). And this is what fio says:
# random reads, 8K, direct IO, depth=1
fio --filename=/dev/md127 --direct=1 --rw=randread --bs=8k --ioengine=libaio --iodepth=1 --runtime=30 --numjobs=1 --time_based --group_reporting --name=iops-test-job --eta-newline=1 --readonly
-> read: IOPS=19.1k, BW=149MiB/s (156MB/s)(4473MiB/30001msec)
# sequential reads, 8K, direct IO, depth=1
fio --filename=/dev/md127 --direct=1 --rw=read --bs=8k --ioengine=io_uring --iodepth=1 --runtime=30 --numjobs=1 --time_based --group_reporting --name=random-1 --eta-newline=1 --readonly
-> read: IOPS=85.5k, BW=668MiB/s (700MB/s)(19.6GiB/30001msec)
With buffered I/O, random reads stay at ~19k IOPS, while sequential reads get to ~1M IOPS (thanks to read-ahead, either at the OS level or in the SSD).
So part of this is sequential reads benefiting from implicit "prefetching", which reduces the observed cost of a page. For random I/O there's no such benefit, so each page looks more expensive.
It's more complex than that (e.g. sequential access also allows issuing larger reads), of course.
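As a quick sanity check on those numbers: for a fixed block size, bandwidth is just IOPS times block size, which is how 19.1k and 85.5k IOPS at 8 KiB line up with ~149 and ~668 MiB/s. A tiny helper (plain awk, nothing device-specific):

```shell
# Bandwidth = IOPS x block size. Convert IOPS at a given block size (KiB)
# into MiB/s to cross-check fio's reported numbers.
iops_to_mib() {
  awk -v iops="$1" -v bs_kib="$2" 'BEGIN { printf "%.0f\n", iops * bs_kib / 1024 }'
}

iops_to_mib 19100 8   # random reads above: ~149 MiB/s
iops_to_mib 85500 8   # sequential reads above: ~668 MiB/s
```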
Manufacturers use many tricks, like caching writes on the drive itself. In my experience, it is rare to find an SSD that actually behaves the way you'd expect.
A solid way of measuring this is using fio with different configurations.
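A minimal sketch of such a sweep, varying access pattern and queue depth (dry run: it only prints the fio commands, and the device path is a placeholder; drop the echo to actually run the jobs):

```shell
# Hypothetical device path; point this at your own disk or array.
DEV=/dev/nvme0n1

# Print one read-only fio invocation per (pattern, queue depth) combo.
sweep() {
  for rw in randread read; do
    for qd in 1 32 256; do
      echo fio --filename="$DEV" --direct=1 --rw="$rw" --bs=8k \
        --ioengine=libaio --iodepth="$qd" --runtime=30 --time_based \
        --numjobs=1 --group_reporting --readonly --name="$rw-qd$qd"
    done
  done
}
sweep
```

Comparing randread vs read, and depth 1 vs 256, is usually enough to see whether a drive (or whatever sits between it and the host) behaves the way its spec sheet claims.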
It does. Just differently.
E.g. a lot of SSDs nowadays cheap out by pairing slower, lower-quality NAND with a smaller cache of faster, higher-quality NAND. Random reads miss that cache a lot more often.
SSDs can also be connected to a machine through RAID cards, enclosures, and so on, and all of this makes a massive difference compared to a proper on-board PCIe connection.
Also, different SSD models obviously have vastly different performance characteristics.
For example, an SSD might look very good at writes until you keep writing past some volume without a break, and then it becomes extremely slow.
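One hedged way to surface that write cliff with fio: write continuously while logging bandwidth over time, and watch for the drop once the drive's fast cache fills. The path and size here are placeholders, and this is destructive to the target file:

```shell
# CAUTION: writes 200 GiB to the target path. Use a scratch file, never a
# device or file you care about.
fio --filename=/mnt/scratch/fio-write-test \
    --size=200G --rw=write --bs=1M --direct=1 \
    --ioengine=libaio --iodepth=8 --numjobs=1 \
    --name=write-cliff \
    --write_bw_log=write-cliff --log_avg_msec=1000
# The resulting bandwidth log shows throughput over time; on cache-heavy
# drives it typically starts high and drops sharply partway through.
```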
I've observed exactly this behavior on a wide range of hardware/environments; it's not specific to particular SSD models (at least not for reads, which is what the blog post was measuring). That's why I showed results from three very different systems.
Some information for the two physical machines:
1) ryzen: Ryzen 9 9900X, RAID0 with 4x Samsung 990 PRO 1TB (in Asus Hyper M.2 Gen5 card)
2) xeon: E5-2699v4, WD Ultrastar DC SN640 960GB (U.3)
I don't know what exactly is backing the SSD storage on the Azure instance.
In general, there are a dizzying number of parameters for both MySQL and Postgres (I assume Oracle and SQL Server as well, but I don’t have experience with them), and many of them can have surprising results. One such example for MySQL is innodb_io_capacity[_max]. The docs [0] say that you should set it to the number of IOPS your system is capable of, and that InnoDB will then use that to guide its background operations. As of version 8.4, the default value has been raised from 200 to 10000. Granted, I haven’t used 8.4 (or 9.x for that matter) in prod, but with 5.7 and 8.0, the advice from Percona [1], and what I’ve found with my own workloads, is to leave it alone - going higher can reduce performance by generating additional write load (and, as the post points out, prematurely wearing out SSDs if you’re running your own).
0: https://dev.mysql.com/doc/refman/8.0/en/innodb-configuring-i...
1: https://www.percona.com/blog/give-love-to-your-ssds-reduce-i...
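For anyone who wants to see where they stand before touching it, a small sketch (assumes a local mysql client with appropriate privileges; the SET is commented out deliberately):

```shell
# Inspect the current InnoDB background-I/O budget. The 8.0 defaults are
# innodb_io_capacity=200 and innodb_io_capacity_max=2000.
mysql -e "SHOW GLOBAL VARIABLES LIKE 'innodb_io_capacity%'"

# Only after measuring (e.g. with fio) and testing under your own
# workload - and per the advice above, probably not even then:
# mysql -e "SET GLOBAL innodb_io_capacity = 1000"
```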
Embarking right now on a long-term embedded storage project and wondering what people actually monitor (apart from SMART and latency/throughput at app or db-level).
I'm curious about ways to automatically tune this live. You can use SET LOCAL to scope a value to a transaction, but that would mean managing it entirely in your application.
That’s actually an interesting idea, now that I think about it. You could run the queries as EXPLAIN in the background, then occasionally test a change out with EXPLAIN ANALYZE before adjusting the settings to use going forward.
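A rough sketch of what one iteration of that could look like, assuming Postgres, with random_page_cost purely as an example knob (the table, value, and connection string are all hypothetical). SET LOCAL keeps the candidate value scoped to the transaction, so nothing leaks out:

```shell
# Compare a query under current settings vs. a candidate value, inside
# one transaction. ROLLBACK discards the SET LOCAL either way.
psql "$DATABASE_URL" <<'SQL'
BEGIN;
-- baseline plan/timing under the current settings
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
-- candidate setting, visible only inside this transaction
SET LOCAL random_page_cost = 1.1;
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
ROLLBACK;
SQL
```

An automated version would diff the two plans/timings and only then consider changing the setting globally.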