Linux itself had some issues with _its_ DMA API recently: people have
been writing drivers using the Linux DMA API making broken assumptions
that happen to work on x86, and work some of the time on other
architectures. These things don't show up during driver tests, and
are very difficult to track down later. I suspect "overwrites the
remaining buffer with randomness/zeros but only under bounce-buffer
conditions" will be similarly unlikely to trigger, and very difficult
to track down if it causes a problem anywhere.

The recent solution in Linux is to add some debugging options which
check that the API is used correctly, even on x86.

Here's a final thought, to do with performance:
Suppose e1000, configured for jumbo frames, receives lots of small
packets, and the DMA subsystem has to bounce-copy the data for some
reason (Ian suggested maybe always doing that with Xen for DMA to
guest RAM?).
Without a length passed to unmap, won't it copy 65536 bytes per packet
(most from cold cache) because that's the amount set up for DMA to
receive from /dev/tap, instead of 256 or 1514 bytes per packet which
is their actual size?
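
To make the question concrete, here's a toy sketch of a bounce-buffer
unmap with and without a length.  None of this is the real QEMU or
Linux API: BounceMap, bounce_map(), bounce_unmap_full(),
bounce_unmap_len() and the bytes_copied counter are all made-up names,
and I'm assuming the bounce buffer starts out zeroed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical bounce mapping, for illustration only. */
typedef struct {
    uint8_t *guest;   /* final destination (stand-in for guest RAM) */
    uint8_t *bounce;  /* temporary buffer the device writes into */
    size_t   map_len; /* length requested at map time */
} BounceMap;

/* Running total of bytes copied back at unmap, for comparison. */
static size_t bytes_copied;

/* Set up a zeroed bounce buffer of len bytes; returns 0 on success. */
static int bounce_map(BounceMap *m, uint8_t *guest, size_t len)
{
    m->bounce = calloc(1, len);
    if (m->bounce == NULL)
        return -1;
    m->guest = guest;
    m->map_len = len;
    return 0;
}

/* Unmap with no length: it has to assume the whole buffer was
 * written, so it copies map_len bytes and overwrites the tail of
 * the guest buffer with whatever the bounce buffer held (zeros). */
static void bounce_unmap_full(BounceMap *m)
{
    memcpy(m->guest, m->bounce, m->map_len);
    bytes_copied += m->map_len;
    free(m->bounce);
    m->bounce = NULL;
}

/* Unmap with the number of bytes the device actually wrote: only
 * that much is copied, and the rest of the guest buffer is left
 * untouched. */
static void bounce_unmap_len(BounceMap *m, size_t access_len)
{
    assert(access_len <= m->map_len);
    memcpy(m->guest, m->bounce, access_len);
    bytes_copied += access_len;
    free(m->bounce);
    m->bounce = NULL;
}
```

With a 65536-byte mapping and a 256-byte packet, bounce_unmap_full()
adds 65536 to bytes_copied and clobbers guest bytes 256..65535, while
bounce_unmap_len(&m, 256) adds 256 and leaves them alone.  That's the
same shape as both the performance worry here and the "overwrites the
remaining buffer" worry above.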