[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v2 16/53] migration/rdma: Fix or document problematic uses of
|
From: |
Markus Armbruster |
|
Subject: |
Re: [PATCH v2 16/53] migration/rdma: Fix or document problematic uses of errno |
|
Date: |
Wed, 04 Oct 2023 13:12:00 +0200 |
|
User-agent: |
Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux) |
Fabiano Rosas <farosas@suse.de> writes:
> Markus Armbruster <armbru@redhat.com> writes:
>
>> We use errno after calling Libibverbs functions that are not
>> documented to set errno (manual page does not mention errno), or where
>> the documentation is unclear ("returns [...] the value of errno on
>> failure"). While this could be read as "sets errno and returns it",
>> a glance at the source code[*] kills that hope:
>>
>> static inline int ibv_post_send(struct ibv_qp *qp, struct ibv_send_wr
>> *wr,
>> struct ibv_send_wr **bad_wr)
>> {
>> return qp->context->ops.post_send(qp, wr, bad_wr);
>> }
>>
>> The callback can be
>>
>> static int mana_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr,
>> struct ibv_send_wr **bad)
>> {
>> /* This version of driver supports RAW QP only.
>> * Posting WR is done directly in the application.
>> */
>> return EOPNOTSUPP;
>> }
>>
>> Neither of them touches errno.
>>
>> One of these errno uses is easy to fix, so do that now. Several more
>> will go away later in the series; add temporary FIXME commments.
>> Three will remain; add TODO comments. TODO, not FIXME, because the
>> bug might be in Libibverbs documentation.
>>
>> [*] https://github.com/linux-rdma/rdma-core.git
>> commit 55fa316b4b18f258d8ac1ceb4aa5a7a35b094dcf
>>
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>> migration/rdma.c | 45 +++++++++++++++++++++++++++++++++++++++------
>> 1 file changed, 39 insertions(+), 6 deletions(-)
>>
>> diff --git a/migration/rdma.c b/migration/rdma.c
>> index 28097ce604..bba8c99fa9 100644
>> --- a/migration/rdma.c
>> +++ b/migration/rdma.c
>> @@ -853,6 +853,12 @@ static int qemu_rdma_broken_ipv6_kernel(struct
>> ibv_context *verbs, Error **errp)
>>
>> for (x = 0; x < num_devices; x++) {
>> verbs = ibv_open_device(dev_list[x]);
>> + /*
>> + * ibv_open_device() is not documented to set errno. If
>> + * it does, it's somebody else's doc bug. If it doesn't,
>> + * the use of errno below is wrong.
>> + * TODO Find out whether ibv_open_device() sets errno.
>> + */
>> if (!verbs) {
>> if (errno == EPERM) {
>> continue;
>
> This function can call into glibc, so it's not unreasonable to expect
> errno to be set at some point. We're not relying on errno to be set,
> just taking an action if it happens to be.
errno may well be set on some failures. But it needs to be set on *all*
failures to be reliable. If it's not, then its value on such failures
comes from some unrelated, prior errno-setting failure.
> I don't think someone would just decide to handle EPERM at this point
> for no reason. Specially since the manual makes no mention to
> errno. This was probably introduced after someone got bit by it.
>
> ... indeed the commit 5b61d57521 ("rdma: Fix qemu crash when IPv6
> address is used for migration") introduced this to fix a crash.
I don't doubt the error recovery added there works in the case described
by the commit message. I suspect it can break on unrelated failures.
- Re: [PATCH v2 16/53] migration/rdma: Fix or document problematic uses of errno,
Markus Armbruster <=