qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [RFC PATCH 1/2] qcow2: Allow checking and repairing cor


From: Max Reitz
Subject: Re: [Qemu-devel] [RFC PATCH 1/2] qcow2: Allow checking and repairing corrupted internal snapshots
Date: Mon, 26 Feb 2018 14:40:08 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0

On 2018-02-15 17:30, Alberto Garcia wrote:
> The L1 table parameters of internal snapshots are generally not
> checked by QEMU. This patch allows 'qemu-img check' to detect broken
> snapshots and to skip them when doing the refcount consistency check.
> 
> Since without an L1 table we don't have a reliable way to recover the
> data from the snapshot, when 'qemu-img check' runs in repair mode this
> patch simply removes the corrupted snapshots.
> 
> Signed-off-by: Alberto Garcia <address@hidden>
> ---
>  block/qcow2-snapshot.c | 53 
> ++++++++++++++++++++++++++++++++++++++++++++++++++
>  block/qcow2.c          |  7 ++++++-
>  block/qcow2.h          |  2 ++
>  3 files changed, 61 insertions(+), 1 deletion(-)

I think shouldn't delete things in qemu-img check.  I think we do need a
new mode (-r lossy? -r destructive?), although I'd personally even
prefer indeed asking the user before every destructive change.  The only
reason I'm not strongly in favor of this is because we don't have an
infrastructure for that (yet).

> diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
> index cee25f582b..7a36073e3e 100644
> --- a/block/qcow2-snapshot.c
> +++ b/block/qcow2-snapshot.c
> @@ -736,3 +736,56 @@ int qcow2_snapshot_load_tmp(BlockDriverState *bs,
>  
>      return 0;
>  }
> +
> +/* Check the snapshot table and optionally delete the corrupted entries */
> +int qcow2_snapshot_table_check(BlockDriverState *bs, BdrvCheckResult *result,
> +                               BdrvCheckMode fix)
> +{
> +    BDRVQcow2State *s = bs->opaque;
> +    bool keep_checking;
> +    int ret, i;
> +
> +    do {
> +        keep_checking = false;
> +
> +        for (i = 0; i < s->nb_snapshots; i++) {> +            QCowSnapshot 
> *sn = s->snapshots + i;
> +            bool found_corruption = false;
> +
> +            if (offset_into_cluster(s, sn->l1_table_offset)) {
> +                fprintf(stderr, "%s snapshot %s (%s) l1_offset=%#" PRIx64 ": 
> "
> +                        "L1 table is not cluster aligned; snapshot table 
> entry "
> +                        "corrupted\n",
> +                        (fix & BDRV_FIX_ERRORS) ? "Deleting" : "ERROR",
> +                        sn->id_str, sn->name, sn->l1_table_offset);
> +                found_corruption = true;
> +            } else if (sn->l1_size > QCOW_MAX_L1_SIZE / sizeof(uint64_t)) {
> +                fprintf(stderr, "%s snapshot %s (%s) l1_size=%#" PRIx32 ": "
> +                        "L1 table is too large; snapshot table entry 
> corrupted\n",
> +                        (fix & BDRV_FIX_ERRORS) ? "Deleting" : "ERROR",
> +                        sn->id_str, sn->name, sn->l1_size);
> +                found_corruption = true;
> +            }

This code assumes the snapshot table itself has been valid.  Why should
it be when it contains garbage entries?

> +
> +            if (found_corruption) {
> +                result->corruptions++;
> +                sn->l1_size = 0; /* Prevent this L1 table from being used */
> +                if (fix & BDRV_FIX_ERRORS) {
> +                    ret = qcow2_snapshot_delete(bs, sn->id_str, sn->name, 
> NULL);

So calling this is actually very dangerous.  It modifies the snapshot
table which I wouldn't trust is actually just a snapshot table.  It
could intersect any other structure in the qcow2 image.  Yes, we do an
overlap check, but that only protects metadata, and I don't really want
to see an overlap check corruption when repairing the image; especially
since this means you cannot fix the corruption.

I don't quite know myself what to do instead, but I guess my main point
would be:  Before any (potentially) destructive changes are made, the
user should have the chance of still opening the image read-only and
copying all the data off somewhere else.  Which of course again means we
shouldn't prevent the user from opening an image because a snapshot is
broken.

(This would at least allow the user to convert the image to raw, then
invoke -r destructive, and then compare the result to see whether
anything has visibly changed.)

Max

> +                    if (ret < 0) {
> +                        return ret;
> +                    }
> +                    result->corruptions_fixed++;
> +                    /* If we modified the snapshot table we can't keep
> +                     * iterating. We have to start again from the
> +                     * beginning instead. */
> +                    keep_checking = true;
> +                    break;
> +                }
> +            }
> +        }
> +
> +    } while (keep_checking);
> +
> +    return 0;
> +}
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 2c6c33b67c..20e16ea602 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -546,7 +546,12 @@ int qcow2_mark_consistent(BlockDriverState *bs)
>  static int qcow2_check(BlockDriverState *bs, BdrvCheckResult *result,
>                         BdrvCheckMode fix)
>  {
> -    int ret = qcow2_check_refcounts(bs, result, fix);
> +    int ret = qcow2_snapshot_table_check(bs, result, fix);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    ret = qcow2_check_refcounts(bs, result, fix);
>      if (ret < 0) {
>          return ret;
>      }
> diff --git a/block/qcow2.h b/block/qcow2.h
> index 1a84cc77b0..19f14bfc1e 100644
> --- a/block/qcow2.h
> +++ b/block/qcow2.h
> @@ -637,6 +637,8 @@ int qcow2_snapshot_load_tmp(BlockDriverState *bs,
>                              const char *snapshot_id,
>                              const char *name,
>                              Error **errp);
> +int qcow2_snapshot_table_check(BlockDriverState *bs, BdrvCheckResult *result,
> +                               BdrvCheckMode fix);
>  
>  void qcow2_free_snapshots(BlockDriverState *bs);
>  int qcow2_read_snapshots(BlockDriverState *bs);
> 


Attachment: signature.asc
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]