[Bug-ddrescue] Alignment of I/O buffer for direct disk access could fail
From: Heikki Tauriainen
Subject: [Bug-ddrescue] Alignment of I/O buffer for direct disk access could fail on some memory addresses?
Date: Sat, 02 Jan 2016 02:27:32 +0200

Hi,
First, thanks for this very useful tool! I've been using it on my
Linux system with great success.
However, I've had trouble using the direct disk access mode (-d), which
I wanted to try just to check whether it would slow down or speed up
data recovery in my case. When running ddrescue in two phases as
suggested in Example 1 of the "Direct disc access" chapter of the
manual,
- the first invocation of ddrescue (with the -n switch) runs as
expected (taking some time to work through the various phases in
trying to rescue data, and finally leaving some sectors for the
scraping phase), but
- the second invocation of ddrescue (with the -d switch, and, of
course, the same input location, output location, and mapfile), will
complete almost immediately (in less than 1 second), seemingly
without trying to access the input drive at all, even though the
default sector size (512) should be correct for my case (I'm using
a hard disk as the input source). Increasing the retry count for
this phase shows that the input position will move back and forth
through the retried sectors (as expected), but it seems as if all
reads just fail instantly; the input drive makes no apparent
signals of being accessed while ddrescue is running.
To try and understand what was happening, I ran ddrescue (with the -d
switch) under strace, and got output with a lot of lines of the
following kind repeated (with just the seek position changing):
...
_llseek(3, 4679865344, [4679865344], SEEK_SET) = 0
read(3, 0xb8186ef0, 512) = -1 EINVAL (Invalid argument)
access("/dev/sdc1", F_OK) = 0
...
From this output it appears that all calls to read(2) failed in direct
disk access mode. According to the read(2) manual page, the error code
means that
EINVAL fd is attached to an object which is unsuitable for
reading; or the file was opened with the O_DIRECT flag,
and either the address specified in buf, the value
specified in count, or the current file offset is not
suitably aligned.
In this case, both the current file offset (4679865344) and the maximum
number of bytes to read (512) are multiples of the sector size (512),
but the buffer address (0xb8186ef0) is not. This is a plausible cause
of the failures.
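
The three alignment conditions can be checked mechanically. The
following is a minimal sketch, not code from ddrescue: the helper
"is_aligned" and the constant names are hypothetical, and the values are
copied from the strace output above. The static_asserts (C++11) make the
compiler verify each claim.

```cpp
#include <cstdint>

// Hypothetical helper (not part of ddrescue): true if 'value' is a whole
// multiple of 'alignment'.
constexpr bool is_aligned( const std::uint64_t value,
                           const std::uint64_t alignment )
  { return alignment != 0 && value % alignment == 0; }

// Values copied from the strace output.
constexpr std::uint64_t sector_size = 512;
constexpr std::uint64_t file_offset = 4679865344ULL;  // _llseek position
constexpr std::uint64_t byte_count = 512;             // read(2) count
constexpr std::uint64_t buffer_addr = 0xb8186ef0ULL;  // read(2) buffer

// Offset and count satisfy the O_DIRECT requirement; the buffer does not.
static_assert( is_aligned( file_offset, sector_size ), "offset aligned" );
static_assert( is_aligned( byte_count, sector_size ), "count aligned" );
static_assert( !is_aligned( buffer_addr, sector_size ),
               "buffer address is not a multiple of the sector size" );

// The buffer starts 240 bytes (0xf0) past a 512-byte boundary.
static_assert( buffer_addr % sector_size == 240, "remainder is 0xf0" );
```

Only the buffer address fails the check, which is consistent with the
EINVAL description quoted from the manual page.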
Here's my analysis of the buffer misalignment, which I believe could be
caused by a bug in the address arithmetic of the buffer alignment code:
According to the ddrescue manual, "Ddrescue aligns its I/O buffer to
the sector size so that it can be used for direct disc access or to
read from raw devices". From the v1.20 source code, it looks like this
is supposed to happen in the Mapbook class constructor (in mapbook.cc),
where a pointer to the buffer is stored in the "iobuf_" member, which
appears to eventually get passed to the "readblock" routine in io.cc
(the only place that I could find where failing calls to read(2) could
originate).
I believe that aligning the "iobuf_" pointer to the sector size could
indeed fail on some occasions due to casting it to a signed long in the
assignment
const int disp = alignment - ( reinterpret_cast<long> (iobuf_) %
alignment );
in the Mapbook constructor if the cast operation to a signed type
happens to produce a negative number out of the address of the buffer.
(In this case, if "alignment" is positive, "disp" will exceed the
original value of "alignment", so the code that follows will skip
making any adjustment to "iobuf_".)
While I can't prove that this is the only reason which could have
prevented direct disk access from working in my case (*), I can just
say that at least on the 32-bit platform that I used, where
sizeof(long)=4, using page size 0x1000 (4096), sector size 0x200 (512),
and buffer address 0xb8186ef0 (== -1206358288 as a 32-bit signed
decimal long integer) *will* trigger the above scenario where the
Mapbook constructor will not make any adjustment to "iobuf_" after
allocating the buffer.
(*) In my case, I ran into the problem consistently with direct disk access
when running the version of ddrescue (1.19) installed via my Linux
distribution's package manager, until, after looking at the strace output, I
decided to compile the same version of the tool from the official sources,
adding some debug prints to display the address of the I/O buffer after
allocation - it just happened that the locally compiled binary allocated the
buffer into an address that didn't trigger any problems with the signed casts,
so direct disk access finally worked. While it could be that the pre-built
binary installed via the package manager could have bugs of its own which
contributed to the original occurrence of the problem, I nevertheless decided
to report this observation about the potentially dangerous use of a signed cast
in ddrescue source code just in case it hasn't been noticed before (the buffer
alignment code hasn't changed in v1.20).
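
The arithmetic for the address quoted above can be reproduced with a
minimal sketch. This is not ddrescue's code: "disp_signed" and
"disp_unsigned" are hypothetical names, and std::int32_t is used as an
assumption to stand in for a 4-byte signed long so that the result is
the same on any platform. Both candidate alignments mentioned above (512
and 4096) are shown.

```cpp
#include <cstdint>

// The quoted expression, evaluated with a 32-bit signed type in place of
// 'long' (assumption: sizeof(long) == 4, as on the reporter's platform).
constexpr std::int32_t disp_signed( const std::int32_t addr,
                                    const std::int32_t alignment )
  { return alignment - ( addr % alignment ); }

// The same expression evaluated with unsigned arithmetic.
constexpr std::uint32_t disp_unsigned( const std::uint32_t addr,
                                       const std::uint32_t alignment )
  { return alignment - ( addr % alignment ); }

// 0xb8186ef0 reinterpreted as a 32-bit signed integer.
constexpr std::int32_t signed_addr = -1206358288;
constexpr std::uint32_t unsigned_addr = 0xb8186ef0u;

// Since C++11 the remainder takes the sign of the dividend, so the
// signed remainder is -272 and "disp" ends up larger than the alignment.
static_assert( signed_addr % 512 == -272, "negative remainder" );
static_assert( disp_signed( signed_addr, 512 ) == 784, "784 > 512" );
static_assert( disp_signed( signed_addr, 4096 ) == 4368, "4368 > 4096" );

// Unsigned arithmetic gives the intended displacement of 272 bytes,
// which moves the buffer to 0xb8187000.
static_assert( disp_unsigned( unsigned_addr, 512 ) == 272, "272 < 512" );
static_assert( disp_unsigned( unsigned_addr, 4096 ) == 272, "272 < 4096" );
static_assert( unsigned_addr + 272u == 0xb8187000u, "aligned address" );
```

With the signed type, "disp" exceeds the alignment for either value, so
no adjustment is made. With the unsigned type, the displacement is 272
in both cases.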
----
To protect the code from a buffer alignment failure whose occurrence
depends on the buffer's address (which is not guaranteed to remain
consistent between invocations of the program), I think it would be
safer to compute "disp" using unsigned arithmetic on an integral type
large enough to hold the address of the buffer. Alternatively, the
program could (as a sanity check) abort with an error in case the
buffer cannot be adjusted to a multiple of the sector size for direct
disk access, or give more debug information about the failing read(2)
requests, instead of failing silently in this case.
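
As an illustration of the first suggestion combined with the sanity
check, here is a minimal sketch (not a patch against ddrescue). It
assumes the platform provides std::uintptr_t and that the caller has
allocated at least "alignment - 1" spare bytes beyond the usable buffer
size. The names "padding_to_align" and "align_buffer" are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Bytes to add to 'addr' to reach the next multiple of 'alignment'
// (0 if 'addr' is already aligned). 'alignment' must be nonzero.
constexpr std::uintptr_t padding_to_align( const std::uintptr_t addr,
                                           const std::uintptr_t alignment )
  { return ( alignment - ( addr % alignment ) ) % alignment; }

// Returns the first aligned address at or after 'buf', or a null pointer
// if the result cannot be verified, so that the caller can abort with an
// error instead of issuing reads that fail with EINVAL.
inline std::uint8_t * align_buffer( std::uint8_t * const buf,
                                    const std::size_t alignment )
  {
  if( !buf || alignment == 0 ) return nullptr;
  const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>( buf );
  std::uint8_t * const aligned = buf + padding_to_align( addr, alignment );
  // Sanity check: verify the result before handing it to read(2).
  if( reinterpret_cast<std::uintptr_t>( aligned ) % alignment != 0 )
    return nullptr;
  return aligned;
  }

// The address from the strace output needs 272 bytes of padding.
static_assert( padding_to_align( 0xb8186ef0u, 512 ) == 272, "padding" );
static_assert( padding_to_align( 0xb8187000u, 512 ) == 0, "no padding" );
```

Because the address is handled as an unsigned value throughout, the
result no longer depends on where the allocator happens to place the
buffer.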
Best regards,
Heikki Tauriainen