qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH] Migration+TLS: Fix crash due to double cleanup


From: Daniel P . Berrangé
Subject: Re: [Qemu-devel] [PATCH] Migration+TLS: Fix crash due to double cleanup
Date: Tue, 1 May 2018 12:21:18 +0100
User-agent: Mutt/1.9.2 (2017-12-15)

On Tue, May 01, 2018 at 11:57:25AM +0100, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrangé (address@hidden) wrote:
> > On Tue, May 01, 2018 at 11:00:35AM +0100, Daniel P. Berrangé wrote:
> > > On Mon, Apr 30, 2018 at 07:59:43PM +0100, Dr. David Alan Gilbert (git) 
> > > wrote:
> > > > From: "Dr. David Alan Gilbert" <address@hidden>
> > > > 
> > > > During a TLS connect we see:
> > > >   migration_channel_connect calls
> > > >   migration_tls_channel_connect
> > > >   (calls after TLS setup)
> > > >   migration_channel_connect
> > > > 
> > > > My previous error handling fix made migration_channel_connect
> > > > call migrate_fd_connect in all cases; unfortunately the above
> > > > means it gets called twice and crashes doing double cleanup.
> > > > 
> > > > Fixes: 688a3dcba98
> > > 
> > > This fixes the crash, but we're still seeing error messages duplicated
> > > 
> > > (qemu) migrate_set_parameter tls-creds tls0
> > > (qemu) migrate tcp:localhost:9000
> > > qemu-system-x86_64: Certificate does not match the hostname localhost
> > > qemu-system-x86_64: Certificate does not match the hostname localhost
> > > 
> > > git bisect points to 688a3dcba98 as the cause of these double
> > > errors still.
> 
> Can you give me the full sequence to trigger that; I did try yesterday
> and have failed to get that error out of the TLS code.

Essentially I follow this:

  https://qemu.weilnetz.de/doc/qemu-doc.html#network_005ftls

when generating the server-cert.pem file I use this input template:

$ cat server.info
organization = My Org
cn = 127.0.0.1
# Commented out to force failure with 'localhost' name
#dns_name = "localhost"
dns_name = "localhost.localdomain"
ip_address = 127.0.0.1
ip_address = ::1
tls_www_server
encryption_key
signing_key

$ certtool --generate-certificate \
   --load-ca-certificate ca-cert.pem \
   --load-ca-privkey ca-key.pem \
   --load-privkey server-key.pem \
   --template server.info \
   --outfile server-cert.pem

On the server (mig target) side I run:

$ ./x86_64-softmmu/qemu-system-x86_64 \
    -object 
tls-creds-x509,id=tls0,endpoint=server,dir=/home/berrange/security/qemutls,verify-peer=no
 \
    -incoming defer -monitor stdio -display none
QEMU 2.11.50 monitor - type 'help' for more information
(qemu) migrate_set_parameter tls-creds tls0
(qemu) migrate_incoming tcp:localhost:9000

Notice I've set  verify-peer=no on the server, since I've not bothered to
issue any client-cert.pem on my dev setup here.

On the client (mig source) side I run:

$  ./x86_64-softmmu/qemu-system-x86_64 \
    -object 
tls-creds-x509,id=tls0,endpoint=client,dir=/home/berrange/security/qemutls\
    -monitor stdio  -display none 
QEMU 2.11.50 monitor - type 'help' for more information
(qemu) migrate_set_parameter tls-creds tls0
(qemu) migrate tcp:localhost:9000
qemu-system-x86_64: Certificate does not match the hostname localhost
qemu-system-x86_64: Certificate does not match the hostname localhost
(qemu) info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off 
compress: off events: off postcopy-ram: off x-colo: off release-ram: off block: 
off return-path: off pause-before-switchover: off x-multifd: off 
Migration status: failed (Certificate does not match the hostname localhost)
total time: 0 milliseconds


Notice that when generating the server certificate I did *not* include
"localhost" as a DNS name, nor in the Common Name.

So when I connect with a URI of "tcp:localhost:9000" we correctly see
that the "localhost" name doesn't match any DNS name or common name
in the server cert.

If I instead set the tls-hostname parameter it will succeed:

(qemu) migrate_set_parameter tls-hostname 127.0.0.1
(qemu) migrate_set_parameter tls-creds tls0
(qemu) migrate tcp:localhost:9000
(qemu) info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off 
compress: off events: off postcopy-ram: off x-colo: off release-ram: off block: 
off return-path: off pause-before-switchover: off x-multifd: off 
Migration status: completed
total time: 131 milliseconds
downtime: 5 milliseconds
setup: 1 milliseconds
transferred ram: 1369 kbytes
throughput: 105.76 mbps
remaining ram: 0 kbytes
total ram: 148296 kbytes
duplicate: 36817 pages
skipped: 0 pages
normal: 261 pages
normal bytes: 1044 kbytes
dirty sync count: 2
page size: 4 kbytes


Or likewise if I used "tcp:127.0.0.1:9000"  it would succeed too,
because "127.0.0.1" is listed as a IP address in the server cert.

>
> > The second stack trace is the error reporting context that I added 
> > originally
> > in
> > 
> >   commit d59ce6f34434bf47a9b26138c908650bf9a24be1
> >   Author: Daniel P. Berrange <address@hidden>
> >   Date:   Wed Apr 27 11:05:00 2016 +0100
> > 
> >     migration: add reporting of errors for outgoing migration
> > 
> > 
> > So the first stack trace is the new duplicate.
> >
> > Which error reporting context is "better" though, I don't know ?
> 
> 
> OK, I see why I didn't see this - I almost always use migrate -d  and
> that timer only happens without the -d.

Oh, yes, I forget to mention I left migration in the foreground. If you
background it, then my original patch would only show the error message
when you did "info migrate"

> > My patch was based on the view that, although alot of code uses 
> > error_report,
> > long term all migration would eventually need to be able to filter an
> > 'Error *errp' back up the stack, so that we can pass it back to QMP / HMP 
> > via
> > 'info migrate' / query-migrate. So I decided to leave the error_report_err
> > call to the hmp.c code, as long term that's the only place that would need
> > to print to the console.
> 
> I wanted it to land in the /var/log/libvirt/qemu/whatever so we could
> see what happened.

In general QMP commands are expected to send errors back to the QMP client,
not write them to stderr. Of course we have substantial legacy code so some
commands do end up only using error_report, but the general expectation with
QMP is that we'd eliminate error_report in codepaths triggered from monitor
commands. The only reason we print to stderr with QMP is that it would be
impossible to diagnose failures for code that is not converted to "Error *"
yet. Places where we have converted to "Error *" don't need to use stderr.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



reply via email to

[Prev in Thread] Current Thread [Next in Thread]