Really? Can you provide any documentation to support this claim?
My impression is that there is no difference between translated and
untranslated devices, and the translation is explicitly disabled by software.
ATS allows an I/O device to request a translation from the IOMMU.
The device can then cache that translation and use the translated address
in a PCIe memory transaction. PCIe uses a couple of previously reserved
bits in the transaction layer packet header to describe the address
type for memory transactions. The default (00) maps to legacy PCIe and
describes the memory address as untranslated. This is the normal mode,
and such a transaction can still incur a translation as it passes
through the host bridge, if an IOMMU is present and programmed with
page tables.
Another type is simply a transaction requesting a translation. This is
new, and allows a device to request (and cache) a translation from the
IOMMU for subsequent use.
The third type is a memory transaction tagged as already translated.
This is the type of transaction an ATS capable I/O device generates
when it is able to translate the memory address from its own cache.
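
To make the three address types concrete, here is a rough C sketch of how
the field could be decoded and how a host bridge might decide whether to
translate. The names (at_type, tlp_at_type, host_bridge_translates) are
made up for illustration. The encodings 01 and 10 for the two new types,
and the position of the field at bits 11:10 of the first header DWORD,
are my reading of the ATS spec and should be checked against it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Address Type (AT) field values for memory transactions. */
enum at_type {
    AT_UNTRANSLATED        = 0x0, /* default, legacy PCIe behaviour   */
    AT_TRANSLATION_REQUEST = 0x1, /* device asks IOMMU to translate   */
    AT_TRANSLATED          = 0x2, /* device already translated it     */
    AT_RESERVED            = 0x3
};

/* Assumed field position: bits 11:10 of header DWORD 0
 * (byte 0 of the header in the most significant byte). */
#define AT_SHIFT 10u
#define AT_MASK  0x3u

/* Extract the AT field from the first DWORD of a TLP header. */
static inline enum at_type tlp_at_type(uint32_t dw0)
{
    return (enum at_type)((dw0 >> AT_SHIFT) & AT_MASK);
}

/* Simplified host-bridge policy: only untranslated addresses go
 * through the IOMMU page tables, and only if the IOMMU is enabled.
 * A real IOMMU also checks whether the device is permitted to send
 * translated requests at all; that check is omitted here. */
static inline bool host_bridge_translates(enum at_type at, bool iommu_enabled)
{
    return iommu_enabled && at == AT_UNTRANSLATED;
}
```

Note that an untranslated transaction with no IOMMU enabled and a
translated transaction both skip the page-table walk, but for different
reasons: the first is policy set by software, the second is a property
of the individual transaction.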
Of course, there's also an invalidation request that the IOMMU can send
to ATS capable I/O devices to invalidate the cached translation.
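
The device side of this (cache a translation, use it, drop it on
invalidation) can be sketched as a toy direct-mapped cache. This is only
an illustration under simplifying assumptions: 4 KiB pages, eight
entries, single-page invalidation, and no completion message back to the
IOMMU. All names (atc, atc_fill, atc_lookup, atc_invalidate) are
hypothetical and not taken from any spec or driver.

```c
#include <stdbool.h>
#include <stdint.h>

#define ATC_ENTRIES    8u
#define ATC_PAGE_SHIFT 12u /* assumes 4 KiB pages */

struct atc_entry {
    bool     valid;
    uint64_t io_pfn;   /* untranslated (I/O virtual) page number */
    uint64_t phys_pfn; /* translated page number from the IOMMU  */
};

struct atc {
    struct atc_entry e[ATC_ENTRIES];
};

/* Store a translation returned by the IOMMU for a translation request. */
static void atc_fill(struct atc *c, uint64_t io_addr, uint64_t phys_addr)
{
    uint64_t pfn = io_addr >> ATC_PAGE_SHIFT;
    struct atc_entry *e = &c->e[pfn % ATC_ENTRIES];

    e->valid = true;
    e->io_pfn = pfn;
    e->phys_pfn = phys_addr >> ATC_PAGE_SHIFT;
}

/* On a hit the device can send a transaction tagged as translated;
 * on a miss it must send a translation request (or an untranslated one). */
static bool atc_lookup(const struct atc *c, uint64_t io_addr, uint64_t *phys_addr)
{
    uint64_t pfn = io_addr >> ATC_PAGE_SHIFT;
    const struct atc_entry *e = &c->e[pfn % ATC_ENTRIES];

    if (!e->valid || e->io_pfn != pfn)
        return false;

    *phys_addr = (e->phys_pfn << ATC_PAGE_SHIFT) |
                 (io_addr & ((1ull << ATC_PAGE_SHIFT) - 1));
    return true;
}

/* Handle an invalidation request from the IOMMU for one page. */
static void atc_invalidate(struct atc *c, uint64_t io_addr)
{
    uint64_t pfn = io_addr >> ATC_PAGE_SHIFT;
    struct atc_entry *e = &c->e[pfn % ATC_ENTRIES];

    if (e->valid && e->io_pfn == pfn)
        e->valid = false;
}
```

After atc_invalidate() the next lookup for that page misses, so the
device has to go back to the IOMMU before it can tag transactions for
that page as translated again.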