On 27.04.2023 at 15:22, zhoushl wrote:
Hi Kevin:
I’m sorry for the missing commit message; next time I will be more careful. When an application in the guest VM executes fsync, QEMU will execute fsync too. But when aio + dio is enabled, the page cache is bypassed

As far as I can tell, you don't need AIO for that, only DIO.

and we could assure the data is on disk

No.

(at least on the disk cache),

In some cases, for a local file system on a physical disk, yes. But this is not enough. The promise when a guest application calls fsync() is not that the data is in a potentially volatile disk cache, but on disk. If the image is on a network file system, there are other places where the data could still be cached, like the page cache of the server.
Just as you mentioned, when the image is on a network file system, the fsync operation still can’t assure that the data is really flushed to disk.
so there is no need to sync anymore. For example, we could execute the following Python script in the VM:

#!/usr/bin/python
import os

fo = os.open("test.txt", os.O_RDWR|os.O_CREAT)
while True:
    os.write(fo, b"123\n")
    os.fsync(fo)

os.close(fo)
In this case, each write is followed by an fsync operation, which searches for dirty pages in the page cache and forces the metadata and data to be flushed to disk. This is often useless, wastes I/O resources, and may cause write amplification in the filesystem.
Yes, if you request an fsync(), you get an fsync(). This is necessary to fulfill the guarantees that fsync() makes. If a guest application doesn't want fsync() semantics, it shouldn't call it.
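For illustration only (this is not code from the thread): a minimal sketch of how a guest application could reduce the number of flushes itself, by calling fsync() only at the points where it actually needs durability. The function name write_records and the sync_every parameter are hypothetical, chosen for this example.

```python
import os

def write_records(path, records, sync_every):
    """Append records to path, calling fsync() once per sync_every records.

    Returns the number of fsync() calls made. Records written since the
    last fsync() are not guaranteed to survive a crash.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    syncs = 0
    try:
        for i, rec in enumerate(records, start=1):
            os.write(fd, rec)
            if i % sync_every == 0:
                os.fsync(fd)
                syncs += 1
        # Flush the tail that did not fill a complete batch.
        if len(records) % sync_every != 0:
            os.fsync(fd)
            syncs += 1
    finally:
        os.close(fd)
    return syncs
```

With sync_every=1 this behaves like the script above (one fsync() per write). Larger values trade durability of the most recent writes for fewer flushes, which is a decision only the application can make.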
In this extreme scenario (the fsync Python script), could we do something to avoid the write amplification in the filesystem? Sometimes the VM user doesn’t have a clear understanding of the backend storage, and we don’t know what kind of application will be run in the VM, but in QEMU we could filter or ignore some improper operations.
QEMU has an option cache.no-flush=on for block backends (cache=unsafe contains this), which will skip flushes. This is unsafe and if your host crashes, you may get a corrupted file system in the guest. But at the risk of losing your filesystem, it does save the overhead of these operations that you want to avoid.
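To make the relationship between the cache= shorthand and the individual cache.* options concrete, here is a small sketch. The table is reproduced from memory of the QEMU documentation for -drive and should be checked against the QEMU version in use. The helper names modes_where and expand are hypothetical.

```python
# Assumed mapping of cache= modes to the underlying cache.* options,
# as described in the QEMU -drive documentation.
CACHE_MODES = {
    "writeback":    {"cache.writeback": True,  "cache.direct": False, "cache.no-flush": False},
    "none":         {"cache.writeback": True,  "cache.direct": True,  "cache.no-flush": False},
    "writethrough": {"cache.writeback": False, "cache.direct": False, "cache.no-flush": False},
    "directsync":   {"cache.writeback": False, "cache.direct": True,  "cache.no-flush": False},
    "unsafe":       {"cache.writeback": True,  "cache.direct": False, "cache.no-flush": True},
}

def modes_where(option, value):
    """Return the sorted cache= modes in which `option` has `value`."""
    return sorted(m for m, opts in CACHE_MODES.items() if opts[option] == value)

def expand(mode):
    """Spell out a cache= mode as explicit cache.* option settings."""
    opts = CACHE_MODES[mode]
    return ",".join(f"{k}={'on' if v else 'off'}" for k, v in opts.items())

if __name__ == "__main__":
    print("modes that skip flushes:", modes_where("cache.no-flush", True))
    print("modes that use O_DIRECT:", modes_where("cache.direct", True))
    print("cache=unsafe means:", expand("unsafe"))
```

Under this mapping only cache=unsafe skips flushes, and only none and directsync bypass the host page cache.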
When AIO is enabled, the cache mode should be set to none or directsync. Even if fsync() is called after each I/O, the data in the disk cache will still be lost when the host crashes.