SPI peripheral library. At this speed SPI requires ~60% CPU time at -Og
optimisation level. This could be further improved by trimming down the
SPI interrupt. But...
Speed is now limited by Downstream's single-packet-per-URB restriction,
to about 460kB/s. USB Middleware does not implement TX FIFO empty
interrupt, so a bit of work is required here.
Although I am not entirely convinced this is necessary, as the SPI data
stall issue only appeared with optimisation off (-O0). Perhaps re-visit
this if Upstream needs more free CPU time later...