correctly now, either because SPI is in 16-bit mode, or because I found
all the other bugs!
Doubled SPI baudrate to 10.5Mbps. Transfer speed now limited (again) by
Downstream's lack of FIFO buffering in the USB host controller.
Also disabled DMA transaction half-complete interrupt in
stm32f4xx_hal_dma.c, as it wasn't doing anything useful.
Although I am not entirely convinced this is necessary, as the SPI data
stall issue only appeared with optimisation off (-O0). Perhaps re-visit
this if Upstream needs more free CPU time later...