[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [RFC PATCH 2/5] system/memory: support unaligned access
From: |
Tomoyuki HIROSE |
Subject: |
Re: [RFC PATCH 2/5] system/memory: support unaligned access |
Date: |
Wed, 11 Dec 2024 18:35:57 +0900 |
User-agent: |
Mozilla Thunderbird |
Sorry for late reply.
On 2024/12/07 1:42, Peter Xu wrote:
On Fri, Dec 06, 2024 at 05:31:33PM +0900, Tomoyuki HIROSE wrote:
In this email, I explain what this patch set will resolve and an
overview of this patch set. I will respond to your specific code
review comments in a separate email.
Yes, that's OK.
On 2024/12/03 6:23, Peter Xu wrote:
On Fri, Nov 08, 2024 at 12:29:46PM +0900, Tomoyuki HIROSE wrote:
The previous code ignored 'impl.unaligned' and handled unaligned
accesses as is. But this implementation could not emulate specific
registers of some devices that allow unaligned access such as xHCI
Host Controller Capability Registers.
I have some comment that can be naive, please bare with me..
Firstly, could you provide an example in the commit message, of what would
start working after this patch?
Sorry, I'll describe what will start working in the next version of
this patch set. I'll also provide an example here. After applying
this patch set, a read(addr=0x2, size=2) in the xHCI Host Controller
Capability Registers region will work correctly. For example, the read
result will return 0x0110 (version 1.1.0). Previously, a
read(addr=0x2, size=2) in the Capability Register region would return
0, which is incorrect. According to the xHCI specification, the
Capability Register region does not prohibit accesses of any size or
unaligned accesses.
Thanks for the context, Tomoyuki.
I assume it's about xhci_cap_ops then. If you agree we can also mention
xhci_cap_ops when dscribing it, so readers can easily reference the MR
attributes from the code alongside with understanding the use case.
Does it mean that it could also work if xhci_cap_ops.impl.min_access_size
can be changed to 2 (together with additional xhci_cap_read/write support)?
Note that I'm not saying it must do so even if it would work for xHCI, but
if the memory API change is only for one device, then it can still be
discussed about which option would be better on changing the device or the
core.
Meanwhile, if there's more use cases on the impl.unaligned, it'll be nice
to share together when describing the issue. That will be very persuasive
input that a generic solution is needed.
OK, I understand. I will try to describe 'xhci_cap_ops' and related topics.
Currently, the actual 'xhci_cap_ops' code is as follows:
```
static const MemoryRegionOps xhci_cap_ops = {
.read = xhci_cap_read,
.write = xhci_cap_write,
.valid.min_access_size = 1,
.valid.max_access_size = 4,
.impl.min_access_size = 4,
.impl.max_access_size = 4,
.endianness = DEVICE_LITTLE_ENDIAN,
};
```
According to the above code, the guest can access this MemoryRegion
with 1-4 bytes. 'valid.unaligned' is also not explicitly defined, so
it is treated as 'false'. This means the guest can access this MR with
1-4 bytes, as long as the access is aligned. However, the xHCI
specification does not prohibit unaligned accesses.
Simply adding '.valid.unaligned = true' will not resolve this problem
because 'impl.unaligned' is also 'false'. In this situation, where
'valid.unaligned' is 'true' but 'impl.unaligned' is 'false', we need
to emulate unaligned accesses by splitting them into multiple aligned
accesses.
An alternative solution would be to fix 'xhci_cap_{read,write}',
update '.impl.min_access_size = 1', and set '.impl.unaligned = true'
to allow the guest to perform unaligned accesses with 1-4 bytes. With
this solution, we wouldn't need to modify core memory code.
However, applying this approach throughout the QEMU codebase would
increase the complexity of device implementations. If a device allows
unaligned guest access to its register region, the device implementer
would needs to handle unaligned accesses explicitly. Additionally,
the distinction between 'valid' and 'impl' would become almost
meaningless, making it unclear why they are separated.
"Ideally", we could consider one of the following changes:
1. Introduce an emulation mechanism for unaligned accesses using
multiple aligned accesses.
2. Remove either 'valid' or 'impl' and unify these functionality.
Solution 2 would require extensive changes to the codebase and memory
API, making it impractical. Solution 1 seems to align with QEMU's
original intentions. Actually, there is a comment in 'memory.c' that
states:
`/* FIXME: support unaligned access? */`
This patch set implements solution 1. If there is a better way to
resolve these issues, I would greatly appreciate your suggestions.
Thanks,
Tomoyuki HIROSE
IIUC things like read(addr=0x2, size=8) should already working before but
it'll be cut into 4 times read() over 2 bytes for unaligned=false, am I
right?
Yes, I also think so. I think the operation read(addr=0x2, size=8) in
a MemoryRegion with impl.unaligned==false should be split into
multiple aligned read() operations. The access size should depends on
the region's 'impl.max_access_size' and 'impl.min_access_size'
. Actually, the comments in 'include/exec/memory.h' seem to confirm
this behavior:
```
/* If true, unaligned accesses are supported. Otherwise all accesses
* are converted to (possibly multiple) naturally aligned accesses.
*/
bool unaligned;
```
MemoryRegionOps struct in the MemoryRegion has two members, 'valid'
and 'impl' . I think 'valid' determines the behavior of the
MemoryRegion exposed to the guest, and 'impl' determines the behavior
of the MemoryRegion exposed to the QEMU memory region manager.
Consider the situation where we have a MemoryRegion with the following
parameters:
```
MemoryRegion mr = (MemoryRegion){
//...
.ops = (MemoryRegionOps){
//...
.read = ops_read_function;
.write = ops_write_function;
.valid.min_access_size = 4;
.valid.max_access_size = 4;
.valid.unaligned = true;
.impl.min_access_size = 2;
.impl.max_access_size = 2;
.impl.unaligned = false;
};
};
```
With this MemoryRegion 'mr', the guest can read(addr=0x1, size=4)
because 'valid.unaligned' is true. But 'impl.unaligned' is false, so
'mr.ops->read()' function does not support addr=0x1, which is
unaligned. In this situation, we need to convert the unaligned access
to multiple aligned accesses, such as:
- mr.ops->read(addr=0x0, size=2)
- mr.ops->read(addr=0x2, size=2)
- mr.ops->read(addr=0x4, size=2)
After that, we should return a result of read(addr=0x1, size=4) from
above mr.ops->read() results, I think.
Yes. I agree with your analysis and understanding.
Thanks,