Is there a reason the direct definition would be slow, if we cache the prior harmonic number to calculate the next?
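For reference, the incremental version being asked about is just H_n = H_{n-1} + 1/n with the previous value kept around; a minimal sketch (the function name is only illustrative):

    def harmonic_numbers(limit):
        # Yield H_1, H_2, ..., H_limit by caching the previous partial sum
        # and adding the next reciprocal to it.
        h = 0.0
        for n in range(1, limit + 1):
            h += 1.0 / n
            yield h

    # H_10 = 1 + 1/2 + ... + 1/10 ≈ 2.92897
    print(list(harmonic_numbers(10))[-1])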
Not least because for modestly sized target sums the number of terms you need to sum is more than is actually feasible. For instance, if you're interested in approximating a sum of 100, then you need something on the order of exp(100), or about 10^43, terms. You can't just say "well, it's not slow to add up 10^43 numbers, because it's quick if you've already done the first 10^43-1 of them".
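To put a number on that, a rough sketch using the standard asymptotic H_n ≈ ln(n) + γ (Euler–Mascheroni constant); the figure matches the exp(100) estimate above:

    import math

    # H_n ≈ ln(n) + γ, so reaching a target sum S takes roughly exp(S - γ) terms.
    gamma = 0.5772156649015329  # Euler–Mascheroni constant
    target = 100.0
    n_needed = math.exp(target - gamma)
    print(f"terms needed for H_n ≈ {target}: about {n_needed:.2e}")
    # -> about 1.51e+43, far too many to add up one at a time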
You could compute in the reverse direction, starting from 1/n instead of starting from 1; this would produce a more stable floating-point sum, but this method is slow.
Edit: Of course, for very large n, 1/n becomes unrepresentable in floating point.
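A small sketch of the reverse-order idea (smallest terms first, so they aren't rounded away against an already-large partial sum); n is kept modest here since, as noted, this is slow:

    def harmonic_forward(n):
        # Largest terms first: the tiny later terms can be partly lost to rounding.
        s = 0.0
        for k in range(1, n + 1):
            s += 1.0 / k
        return s

    def harmonic_backward(n):
        # Smallest terms first: the partial sum stays closer in magnitude to the
        # next term for longer, so less precision is lost.
        s = 0.0
        for k in range(n, 0, -1):
            s += 1.0 / k
        return s

    n = 10**7
    print(harmonic_forward(n))
    print(harmonic_backward(n))
    # The two results typically differ in the last few digits, with the
    # backward sum being the more accurate one.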
For n > 1, there isn't a lower bound. None of the harmonic numbers is ever an integer again (https://en.wikipedia.org/wiki/Harmonic_series_(mathematics)#...), and because the difference between successive partial sums goes to zero while the series grows without bound, you're bound to find a difference smaller than 1/(2n) somewhere beyond n.
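A quick sketch of that argument (the helper name and the choice n = 100 are only illustrative): once m > n, the step 1/m is below 1/n, so the partial sums can't skip over an integer by more than that, and one of the two sums straddling the next integer crossing lands within 1/(2n) of it.

    def first_close_harmonic(n):
        # Find the first m > n whose harmonic number H_m is within 1/(2n)
        # of an integer, by scanning past the next integer crossing.
        bound = 1.0 / (2 * n)
        h = sum(1.0 / k for k in range(1, n + 1))  # H_n
        m = n
        while True:
            m += 1
            h += 1.0 / m
            if abs(h - round(h)) < bound:
                return m, h, abs(h - round(h))

    # With n = 100 the bound is 0.005; the scan stops around m = 225,
    # just before H_m reaches 6.
    print(first_close_harmonic(100))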