and one has a complete hardware solution?
Congrats, now you have capture synchronized to pulse start "providing the sub-microsecond timing needed for ultrasound acquisition".
If your microcontroller has a parallel port interface, you would use the clock setup you described. This works, and I've done it before, but it left very little CPU to do anything useful with the data.
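To put rough numbers on why the CPU ends up starved, here is a back-of-the-envelope sketch. Every figure in it (clock speed, sample rate, cost of the capture loop) is an assumption I picked for illustration, not something measured or taken from the article, and the function names are made up.

```python
def cycles_per_sample(cpu_hz: float, sample_hz: float) -> float:
    """CPU clock cycles that elapse between two consecutive samples."""
    return cpu_hz / sample_hz


def spare_cycles(cpu_hz: float, sample_hz: float, capture_cycles: float) -> float:
    """Cycles left per sample after the capture loop itself.

    A negative result means the CPU cannot keep up at all.
    """
    return cycles_per_sample(cpu_hz, sample_hz) - capture_cycles


# Assumed figures, for illustration only.
CPU_HZ = 100e6        # hypothetical 100 MHz core clock
SAMPLE_HZ = 10e6      # hypothetical 10 MS/s ADC on the parallel port
CAPTURE_CYCLES = 8    # assumed cost of read + store + loop overhead

if __name__ == "__main__":
    print(spare_cycles(CPU_HZ, SAMPLE_HZ, CAPTURE_CYCLES))
```

With those assumed numbers the core gets ten cycles per sample and spends most of them just moving the data, which is the situation where offloading capture to a peripheral starts to matter.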
It's neat that they used the PIO. It demonstrates how that peripheral fills a niche where things that might have been impossible without an FPGA suddenly become doable on a microcontroller.