Re: [Qemu-devel] Image probing: how it can be insecure, and what we coul

qemu-devel

From:	Markus Armbruster
Subject:	Re: [Qemu-devel] Image probing: how it can be insecure, and what we could do about it
Date:	Fri, 07 Nov 2014 16:21:38 +0100
User-agent:	Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux)

Kevin Wolf <address@hidden> writes:

> Am 06.11.2014 um 14:57 hat Markus Armbruster geschrieben:
>> Kevin Wolf <address@hidden> writes:
>> 
>> > Am 04.11.2014 um 19:45 hat Markus Armbruster geschrieben:
>> >> I'll try to explain all solutions fairly.  Isn't easy when you're as
>> >> biased towards one of them as I am.  Please bear with me.
>> >> 
>> >> 
>> >> = The trust boundary between image contents and meta-data =
>> >> 
>> >> A disk image consists of image contents and meta-data.
>> >> 
>> >> Example: all of a raw image's contents is image contents.  Leaves just
>> >> file name and attributes for meta-data.
>> >
>> > Better: Leaves only protocol-specific metadata (e.g. file name and
>> > attributes for raw-posix).
>> 
>> Can you give examples for other protocols?
>
> Max already gave the example of NBD always implying raw.
>
> I can also imagine that protocols that take URLs would use the file name
> from the URL - which is not necessarily the end of the string, because
> query options could follow.
>
> Even though I don't think we have it today, it's also not entirely
> unthinkable that some network protocol specifically made for VM images
> (perhaps something like Sheepdog) could be storing the image format as
> metadata on the server. Actually, I think that would make a whole lot of
> sense for them.

Thanks.

>> >> = Insecure usage is easy, secure usage is hard =
>> >> 
>> >> The oldest stratum of user interfaces doesn't let you specify the image
>> >> format.  Use of raw images with these is insecure by design.  These
>> >> interfaces are still recommended for human users.
>> >> 
>> >> Example of insecure usage: -hda foo.img, where foo.img is raw.
>> >> 
>> >> With the next generation of interfaces, specifying the image format is
>> >> optional.  Use of raw images with these is insecure by default.
>> >> 
>> >> Example of insecure usage: -drive file=foo.img,index=0,media=disk,
>> >> where foo.img is raw.  The -hda above is actually sugar for this.
>> >> 
>> >> Equivalent secure usage: add format=raw.
>> >> 
>> >> Note that specifying just the top image's format is not enough, you also
>> >> have to specify any backing images' formats.  QCOW2 can optionally store
>> >> the backing image format in the image.  The other COW formats can't.
>> >> 
>> >> Example of insecure usage: -hda bar.vmdk, where bar.vmdk is a VMDK image
>> >> with a raw backing file.
>> >
>> > Usually this is mitigated by the fact that backing files are read-only.
>> > Trouble is starting when you use things like commit.
>> 
>> Yes.
>> 
>> >> Equivalent secure usage: Beats me.  Maybe there's a funky -drive
>> >> backing.whatever to specify the backing image's format.
>> >
>> > Yes, you can override the backing file driver (backing.driver=raw should
>> > do the trick). Not really user-friendly, especially with long backing
>> > file chains, but it happens to be there.
>> >
>> > And of course, libvirt should be using it for non-qcow2 or qcow2 without
>> > the backing format header extension (but doesn't yet).
>> 
>> I'm glad it's there.  Too bad libvirt doesn't use it, yet.  Supports my
>> point that secure usage is too hard now.
>
> I don't know whether it's related to being too hard or just too new. I
> won't disagree when you say that it isn't obvious, but the libvirt
> authors are experts and probably know better than the average command
> line user what they should be doing ideally.

I probably know better than average, too.  Yet I wouldn't bet on me
being able to avoid insecure format probing 100%, because I have to ask
for security for every image and backing image separately, and some of
the fancy stuff QEMU can do taxes my poor old mind enough for me not to
trust it not to slip.

Likewise, I'm reluctant to trust even competently written software to
get it right 100%.  It's just too complex.

A global "insecure probing on/off" switch would help.
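To make concrete what such a switch would have to check, here is a minimal sketch in Python (not QEMU code).  The Layer type and both function names are hypothetical; the only assumption is that every image in a backing chain either has an explicitly specified format or falls back to probing.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Layer:
    """One image in a backing chain (hypothetical model, not a QEMU type)."""
    filename: str
    explicit_format: Optional[str] = None  # None means "would be probed"


def layers_needing_probe(chain: List[Layer]) -> List[str]:
    """Return the file names of all layers whose format was not specified."""
    return [layer.filename for layer in chain if layer.explicit_format is None]


def open_chain(chain: List[Layer], allow_probing: bool) -> None:
    """Refuse to open the chain if probing is off and any layer needs it."""
    unsafe = layers_needing_probe(chain)
    if unsafe and not allow_probing:
        raise ValueError("format not specified for: " + ", ".join(unsafe))
```

With probing switched off, a chain such as [Layer("bar.vmdk", "vmdk"), Layer("base.img")] is rejected with a message naming base.img, instead of silently probing the backing file.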

>> >> I proposed something less radical, namely to keep guessing the image
>> >> format, but base the guess on trusted meta-data only: file name and
>> >> attributes.  Block and character special files are raw.  For other
>> >> files, find the file name extension, and look up the format claiming it.
>> >> 
>> >> PRO: Plugs the hole.
>> >> 
>> >> CON: Breaks existing usage when the new guess differs from the old
>> >>     guess.  Common usage should be fine:
>> >> 
>> >>     * -hda test.qcow2
>> >> 
>> >>       Fine as long as test.qcow2 is really QCOW2 (as it should!), and
>> >>       either specifies a backing format (as it arguably should), or the
>> >>       backing file name is sane.
>> >> 
>> >>     * -hda disk.img
>> >> 
>> >>       Fine as long as disk.img is really a disk image (as it should).
>> >
>> > .img is not as clear, I've seen people using it for other formats. It's
>> > still a disk image, but not a raw one.
>> 
>> Is this usage common?
>
> More common than writing a qcow2 header to your boot sector. ;-)
>
> But seriously, one of the problems in this discussion is that we don't
> have any actual data for more exotic use cases. I can only say that I've
> seen it before, even though that doesn't mean much.
>
> If you want me to guess: Not really common, but probably one of the most
> common corner cases from those that we've been discussing here.

Plausible, given the anecdotal evidence we have.

>> >>     * -hda /dev/mapper/vg0-virtdisk
>> >> 
>> >>       Fine as long as the logical volume is raw.
>> >> 
>> >>     Less common usage can break:
>> >> 
>> >>     * -hda nbd://localhost
>> >> 
>> >>       Socket provides no clue, so no guess.
>> >> 
>> >>     Weird usage can conceivably break hard:
>> >> 
>> >>     * -hdd disk.img
>> >> 
>> >>       Breaks hard when disk.img is actually QCOW2, the guest boots
>> >>       anyway from another drive, then proceeds to overwrite this one.
>> >> 
>> >> Mitigation: lengthy transition period where we warn "this usage is
>> >> insecure, and we'll eventually break it; here's a hint on secure usage".
>> >> 
>> >> CON: We delay plugging the hole one more time.  But at least we no
>> >> longer expose our users to it silently.
>> >
>> > CON: Relies on metadata that is protocol-specific. Each protocol that
>> >      should support probing needs extra code. Essentially means that
>> >      probing will be disabled on anything except raw-posix (and if we're
>> >      lucky enough that someone pays attention during review, raw-win32)
>> 
>> Terminology: I use "probing" and "guessing from trusted meta-data".  The
>> former is for probing raw image contents.  The latter may only examine
>> trusted meta-data.
>> 
>> I suspect you mean "each protocol that should support guessing from
>> trusted meta-data needs extra code".  Do you?
>
> Yes.
>
> I'll try to remember using "probing" and "guessing" with your meaning.
>
>> >> == Prevent "bad" guest writes ==
>> >> 
>> >> Again, several variations, but this time, only the last one is serious,
>> >> the others are just for illustration.
>> >> 
>> >> Fail guest writes to those parts of the image that probing may examine.
>> >> Can fail only writes to the first few sectors (at worst) of raw images.
>> >> 
>> >> PRO: Plugs the hole.
>
> PRO: Fix is small, local to raw block driver, and obviously complete (in
> the sense that it catches every usage of the raw format).

Point taken.

> PRO: Only affects images actually opened as raw; can't possibly break
> non-raw use cases

Related, but point taken.  Both apply to the variations, too.

> CON: Can affect raw images even with an explicit format=raw

For me, this one is covered by my CON:

>> >> CON: The virtual hardware is defective.  Breaks common guest software
>> >> that writes to the first few sectors, such as boot loaders and
>> >> partitioning tools.  Breaks guest software using the whole device, which
>> >> isn't common, but certainly not unheard of.
>> >> 
>> >> Variation: fail only writes of patterns that actually can make probing
>> >> guess something other than raw.
>> >> 
>> >> PRO: Still plugs the hole.
>> >> 
>> >> CON: Except when you upgrade to a version that recognizes more patterns.
>> >> 
>> >> CON: The virtual hardware is still defective, but the defects are
>> >> minimized.  We can hope that partition tables, boot sectors and such
>> >> won't match the patterns, so common guest software hopefully works.
>> >> Guest software using the whole device still breaks, only now it breaks
>> >> later rather than sooner.
>> >> 
>> >> Variation: fail writes only on *probed* raw images.
>
> PRO: Plugs the hole in the most common case (user relies consistently on
> probing)

This is my first CON with a different baseline.  A partial fix is better
than nothing, so if "nothing" is the baseline, file under PRO.  It's
worse than a full fix, so if that's the baseline (and it consistently is
in my memo), file it under CON.

In short, I don't disagree with you, I just wrote it up differently, and
possibly suboptimally.

> PRO: Users explicitly specifying format=raw can't possibly be affected
> any more, just like non-raw formats in the basic variant.

This is my second CON with a different baseline: "defective in some
configurations" is better than "defective in all configurations", but
worse than "not defective".

>> >> CON: Doesn't fully plug the hole: mixing probed usage (user doesn't
>> >> specify format) with non-probed usage (user specifies format) remains
>> >> insecure.  The guest's write succeeds in non-probed usage, and the guest
>> >> escapes isolation in the next probed usage.
>> >> 
>> >> CON: The virtual hardware is still defective, but it now comes with a
>> >> "defective on/off" switch, factory default "defective on".  We could add
>> >> a warning to guide users to switch defective off but then that warning
>> >> would annoy people who don't care to switch it off (sometimes with
>> >> reason), and we can't have that.  So we leave users who would care if
>> >> they knew in the dark.
>> 
>> Replace by
>> 
>> CON: The virtual hardware is still defective, but it now comes with a
>> "defective on/off" switch, factory default "defective on".  We could add
>> a warning to guide users to switch defective off.
>> 
>> >> The two variations can be combined.  This is Kevin's proposal.
>> >> 
>> >> CON: Doesn't fully plug the hole: union of both variations' flaws.
>> >> 
>> >> CON: The virtual hardware is still defective: intersection of both
>> >> variations' defects.
>
> PRO: Union of both variations' advantages wrt false positives, which are
> minimised as much as possible: Explicit format=... or usage of non-raw
> formats doesn't trigger the check. This limits it to cases that are
> already broken today (even though the failure mode changes - can
> possibly even be called a bonus bug fix).

This is worth spelling out, thanks.
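For reference, the combined check could look roughly like the following Python sketch.  It is not the actual raw driver code.  It assumes that probing examines only the first 512 bytes, and it uses the QCOW2 magic as the only "non-raw" pattern; the function names and the single-pattern list are illustrative.

```python
SECTOR_SIZE = 512
QCOW2_MAGIC = b"QFI\xfb"  # first four bytes of a QCOW2 header


def looks_non_raw(first_sector: bytes) -> bool:
    """Illustrative stand-in for "some non-raw probe accepts this"."""
    return first_sector.startswith(QCOW2_MAGIC)


def write_allowed(sector0: bytes, offset: int, data: bytes,
                  format_was_probed: bool) -> bool:
    """Decide whether a guest write to a raw image may proceed.

    sector0 is the image's current first sector (SECTOR_SIZE bytes),
    offset is the byte offset of the write within the image.
    """
    if not format_was_probed:
        return True   # explicit format=raw: never interfere
    if offset >= SECTOR_SIZE or not data:
        return True   # write does not touch the probed region
    # Build what the first sector would look like after the write.
    merged = bytearray(sector0)
    chunk = data[:SECTOR_SIZE - offset]
    merged[offset:offset + len(chunk)] = chunk
    return not looks_non_raw(bytes(merged))
```

The check runs on the sector as it would look after the write, so a header assembled from several small writes is caught as well.  The pattern list is the part that changes when a newer version recognizes more formats.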

> PRO: Still plugs the hole in the most common case (user relies
> consistently on probing)

This is my first CON with a different baseline.  Again, I'm not
disagreeing, just explaining the thinking behind my writing.

> PRO: The fix is still small, local to raw block driver, and obviously
> complete (in the sense that it catches every usage of the raw format).
>
>> > I like how you took care to avoid finding any PROs. :-)
>> 
>> Come on, this is line 281 of 327, cut me some slack :)
>> 
>> > I'll leave commenting on this section to others for now. I feel I have
>> > already said enough about it in the other threads, and defending it here
>> > at this point wouldn't help the discussion.
>> 
>> The purpose of this document is to summarize our thoughts.  Need yours
>> to achieve it.  Can you give me your concise PROs?
>
> Added them above.

Thanks!

Are you ready to write up your conclusion, similar to how I did?

Before you do, let me refine / vary the hybrid approach I mentioned
under "Don't guess format from untrusted image contents" some.  I think
I can trace some inspiration to Max here.

Say we use trusted meta-data to compute a set of admissible formats, and
if the set has multiple members, use probing to pick one of them.

Example: foo.qcow2 -> { qcow2 }, no probing

Example: foo.qcow -> { qcow, qcow2 }, probe to pick one

Likewise for foo.vhdx and foo.vhd.

To ensure this actually knocks out condition (b), every member of the
set must have the image contents used by the other members' probes
within its trust boundary.

Example: { qcow, qcow2 } is fine, because both formats have a header,
and each header covers the bytes either probe examines.

Example: { raw } is fine, because there is no probing.

Counterexample: { raw, qcow2 } is not possible, because qcow2 probes
outside raw's trusted metadata.
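A minimal Python sketch of this hybrid, under stated assumptions: the extension table is hypothetical (in particular the ".raw" entry), the probe is passed in as a callback rather than being QEMU's real probing code, and the trust-boundary rule is reduced to "raw never shares a set with a probed format".

```python
import os
from typing import Callable, Dict, FrozenSet, Optional

# Hypothetical table: file name extension -> admissible formats.
ADMISSIBLE: Dict[str, FrozenSet[str]] = {
    ".qcow2": frozenset({"qcow2"}),
    ".qcow": frozenset({"qcow", "qcow2"}),
    ".raw": frozenset({"raw"}),
}


def admissible_formats(filename: str) -> FrozenSet[str]:
    """Compute the admissible set from trusted meta-data (the name) only."""
    ext = os.path.splitext(filename)[1].lower()
    return ADMISSIBLE.get(ext, frozenset())


def pick_format(filename: str,
                probe: Callable[[FrozenSet[str]], Optional[str]]) -> str:
    """Pick a format, probing image contents only to choose within the set."""
    candidates = admissible_formats(filename)
    if not candidates:
        raise ValueError("cannot guess format of %s, specify it" % filename)
    if len(candidates) == 1:
        return next(iter(candidates))  # single member: no probing at all
    if "raw" in candidates:
        raise ValueError("raw must not share a set with probed formats")
    picked = probe(candidates)
    if picked not in candidates:
        raise ValueError("probe result is not admissible for %s" % filename)
    return picked
```

Here foo.qcow2 never reaches the probe, foo.qcow is probed but can only come out as qcow or qcow2, and a name without a known extension is refused rather than guessed.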

Note my careful wording "contents used by [other] probes".  Right now we
simply assume that the first 2048 bytes can be trusted.  This is not
obviously the case!  If I remember correctly, you proposed to cut it to
512 bytes, which feels a lot safer, since any sane format probably
aligns (untrusted) image contents to at least a 512 byte boundary, but
is still theoretically unsound.

PRO and CON like my proposal to guess from trusted meta-data only.  The
difference is in what existing usage exactly it breaks.

Can be combined with "refuse to use a format without an explicit format=
when any other non-raw format probe accepts", just like everything else
proposed so far.

>                   Of course, you could also add the CONs of the other
> proposal as PROs here, like "works with any filename and even protocols
> that don't have anything filename-like", but I think they are already
> covered well enough in the other section.

Yes, one alternative's PRO is often another alternative's CON.  Covering
each issue in all places explicitly could perhaps be clearer.  Instead,
I chose to pick a baseline, and PRO/CON off that, for (relative)
brevity.

>> > And yes, it also doesn't help when you accidentally type format=qcow2
>> > instead of format=raw. When you have images of both types, things like
>> > this happen with manual typing.
>> 
>> I consider mixing probed and non-probed usage a more plausible bad habit
>> than accidental use of format=qcow2, because the former works just fine,
>> but the latter fails (unless your guest has "helpfully" written a QCOW2
>> header).
>
> I'll admit that mixing probed and non-probed usage might be more common
> (though I think that I for one am pretty consistent in not using
> format=... with my trusted images - why would I?), but I consider both
> cases plausible. I've mistyped enough -f options for qemu-img. And that
> it fails doesn't prevent the typo.

My point wasn't that accidental misuse doesn't matter, only that
mistakes that have no immediately visible consequences can easily become
bad habits.
