[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH 1/8] cutils: unsigned int parsing functions
From: |
Eduardo Habkost |
Subject: |
Re: [Qemu-devel] [PATCH 1/8] cutils: unsigned int parsing functions |
Date: |
Fri, 18 Jan 2013 11:26:46 -0200 |
User-agent: |
Mutt/1.5.21 (2010-09-15) |
On Fri, Jan 18, 2013 at 11:01:14AM +0100, Markus Armbruster wrote:
> Eduardo Habkost <address@hidden> writes:
>
> > There are lots of duplicate parsing code using strto*() in QEMU, and
> > most of that code is broken in one way or another. Even the visitors
> > code have duplicate integer parsing code[1]. This introduces functions
> > to help parsing unsigned int values: parse_uint() and parse_uint_full().
> >
> > Parsing functions for signed ints and floats will be submitted later.
> >
> > parse_uint_full() has all the checks made by opts_type_uint64() at
> > opts-visitor.c:
> >
> > - Check for NULL (returns -EINVAL)
> > - Check for negative numbers (returns -ERANGE)
> > - Check for empty string (returns -EINVAL)
> > - Check for overflow or other errno values set by strtoll() (returns
> > -errno)
> > - Check for end of string (reject invalid characters after number)
> > (returns -EINVAL)
> >
> > parse_uint() does everything above except checking for the end of the
> > string, so callers can continue parsing the remainder of string after
> > the number.
> >
> > Unit tests included.
> >
> > [1] string-input-visitor.c:parse_int() could use the same parsing code
> > used by opts-visitor.c:opts_type_int(), instead of duplicating that
> > logic.
> [...]
> > diff --git a/util/cutils.c b/util/cutils.c
> > index 80bb1dc..7f09251 100644
> > --- a/util/cutils.c
> > +++ b/util/cutils.c
> > @@ -270,6 +270,85 @@ int64_t strtosz(const char *nptr, char **end)
> > return strtosz_suffix(nptr, end, STRTOSZ_DEFSUFFIX_MB);
> > }
> >
> > +/* Try to parse an unsigned integer
> > + *
> > + * Error checks done by the function:
> > + * - NULL pointer will return -EINVAL.
> > + * - Empty strings will return -EINVAL.
>
> Same for strings containing only whitespace.
Oh, you're right.
>
> > + * - Overflow errors or other errno values set by strtoull() will
>
> Extra space before 'set'.
Oops.
>
> > + * return -errno (-ERANGE in case of overflow).
> > + * - Differently from strtoull(), values starting with a minus sign are
> > + * rejected (returning -ERANGE).
> > + *
> > + * Sets endptr to point to the first invalid character.
>
> Mention that unlike strtol() & friends, endptr must not be null?
> Probably not necessary.
I believe it is implicit if I document it as "Sets *endptr". But worth
documenting as it is different from strtoull() behavior.
>
> The two ERANGE cases treat endptr differently: it's either set to point
> just behind the out-of-range number, or to point to the beginning of the
> out-of-range number. Ugh.
This should be fixed in the newer version I sent.
>
> > + Callers may rely
> > + * on *value and *endptr to be always set by the function, even in case of
> > + * errors.
>
> You neglect to specify what gets assigned to *value in error cases.
>
Undefined. :-)
Anyway, I agree it not very useful to document that "*value and *endptr
will be always set" if it is undefined. I will try to reword it.
> Your list of error checks isn't quite complete. Here's my try:
>
> /**
> * Parse unsigned integer
> *
> * @s: String to parse
> * @value: Destination for parsed integer value
> * @endptr: Destination for pointer to first character not consumed
> * @base: integer base, between 2 and 36 inclusive, or 0
> *
> * Parsed syntax is just like strol()'s: arbitrary whitespace, a single
> * optional '+' or '-', an optional "0x" if @base is 0 or 16, one or
> * more digits.
The newer version has '-' as _not_ part of the syntax.
> *
> * If @s is null, or @base is invalid, or @s doesn't start with an
> * integer in the syntax above, set address@hidden to 0, address@hidden to
> @s, and
> * return -EINVAL.
This matches the behavior of the newer version. But I would like to keep
*value documented as undefined in case of errors.
> *
> * Set @endptr to point right beyond the parsed integer.
> *
> * If the integer is negative, set address@hidden to 0, and return -ERANGE.
This isn't the case in the newer version.
> * If the integer overflows unsigned long long, set address@hidden to
> * ULLONG_MAX, and return -ERANGE.
> * Else, set address@hidden to the parsed integer, and return 0.
> */
Thanks for taking the time to write this. I hate having to look up
what's the right syntax to use in docstrings, so I often just write
plain comments before functions. :-)
I have a minor problem with the documentation above: as a developer, the
most important question I have is: what's the difference between using
parse_uint() and using strtoull() directly? But maybe it is a good
thing: instead of referencing the strtoull() specification (and an
implementation detail), now we have a well-defined and well-documented
behavior.
>
> > + *
> > + * The 'base' parameter has the same meaning of 'base' on strtoull().
> > + *
> > + * Returns 0 on success, negative errno value on error.
> > + */
> > +int parse_uint(const char *s, unsigned long long *value, char **endptr,
> > + int base)
> > +{
> > + int r = 0;
> > + char *endp = (char *)s;
> > + unsigned long long val = 0;
> > +
> > + if (!s) {
> > + r = -EINVAL;
> > + goto out;
> > + }
> > +
> > + /* make sure we reject negative numbers: */
> > + while (isspace((unsigned char)*s)) {
> > + ++s; endp++;
>
> Mixing pre- and post-increment like that is ugly. Recommend to stick to
> post-increment.
Agreed. I blame the copy & paste I made from opts-visitor. Later I added
the postfix increment without noticing how ugly it looked like
Anyway, this was changed in the newer patch version.
>
> I'd set endp after the loop. Right before goto.
Fixed in the newer version.
>
> Or simply change the specification to set *endptr to point beyond the
> integer instead of to the '-'. I took the liberty to do exactly that in
> my proposed function comment.
The newer version was changed to return -EINVAL and set endptr to the
beginning of the string.
>
> > + }
> > + if (*s == '-') {
> > + r = -ERANGE;
> > + goto out;
> > + }
>
> This creates the special case that made me go "ugh" above. Suggest to
> drop the if here, and check right after strtoull() instead, as shown
> below. That way, we get the same endptr behavior for all out-of-range
> numbers.
Is returning -EINVAL acceptable for you, as well? In your proposal we
consider "-1234" part of the language but out-of-range. Returning
-EINVAL, we consider "-1234" not part of the language, and invalid
input. The newer version returns -EINVAL.
>
> > +
> > + errno = 0;
> > + val = strtoull(s, &endp, base);
>
> if (*s = '-') {
> r = -ERANGE;
> val = 0;
> goto out;
> }
This has another bug: makes endptr point to the middle of the string in
case we are parsing " xxx" or " -1234".
>
> > + if (errno) {
> > + r = -errno;
> > + goto out;
> > + }
> > +
> > + if (endp == s) {
> > + r = -EINVAL;
> > + goto out;
> > + }
> > +
> > +out:
> > + *value = val;
> > + *endptr = endp;
> > + return r;
> > +}
> > +
> > +/* Try to parse an unsigned integer, making sure the whole string is parsed
> > + *
> > + * Have the same behavior of parse_uint(), but with an additional check
> > + * for additional data after the parsed number (in that case, the function
> > + * will return -EINVAL).
>
> What's assigned to *value then?
Undefined. :-)
>
> > + */
>
> Alternatively, you could make into parse_uint() do that check when
> endptr is null, and drop this function. Matter of taste.
I considered doing that, but I believe it would be too subtle.
>
> > +int parse_uint_full(const char *s, unsigned long long *value, int base)
> > +{
> > + char *endp;
> > + int r;
> > +
> > + r = parse_uint(s, value, &endp, base);
> > + if (r < 0) {
> > + return r;
> > + }
> > + if (*endp) {
>
> Answering my own question: the parsed integer is assigned to *value then.
I prefer to document it as undefined. If you want to use the parsed
integer even if there's additional data after it, you should use
parse_int() instead.
>
> > + return -EINVAL;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > int qemu_parse_fd(const char *param)
> > {
> > int fd;
--
Eduardo
- [Qemu-devel] [PATCH 8/8] vl.c: validate -numa "cpus" parameter properly, (continued)
- [Qemu-devel] [PATCH 8/8] vl.c: validate -numa "cpus" parameter properly, Eduardo Habkost, 2013/01/16
- [Qemu-devel] [PATCH 1/8] cutils: unsigned int parsing functions, Eduardo Habkost, 2013/01/16
- Re: [Qemu-devel] [PATCH 1/8] cutils: unsigned int parsing functions, Laszlo Ersek, 2013/01/17
- Re: [Qemu-devel] [PATCH 1/8] cutils: unsigned int parsing functions, Eduardo Habkost, 2013/01/17
- [Qemu-devel] [PATCH 1/8 v4] cutils: unsigned int parsing functions, Eduardo Habkost, 2013/01/17
- Re: [Qemu-devel] [PATCH 1/8 v4] cutils: unsigned int parsing functions, Laszlo Ersek, 2013/01/17
- Re: [Qemu-devel] [PATCH 1/8 v4] cutils: unsigned int parsing functions, Eric Blake, 2013/01/17
- Re: [Qemu-devel] [PATCH 1/8 v4] cutils: unsigned int parsing functions, Blue Swirl, 2013/01/17
- Re: [Qemu-devel] [PATCH 1/8 v4] cutils: unsigned int parsing functions, Eduardo Habkost, 2013/01/17
- Re: [Qemu-devel] [PATCH 1/8] cutils: unsigned int parsing functions, Markus Armbruster, 2013/01/18
- Re: [Qemu-devel] [PATCH 1/8] cutils: unsigned int parsing functions,
Eduardo Habkost <=
- Re: [Qemu-devel] [PATCH 1/8] cutils: unsigned int parsing functions, Andreas Färber, 2013/01/18
- [Qemu-devel] [PATCH 1/8 v5] cutils: unsigned int parsing functions, Eduardo Habkost, 2013/01/18
- Re: [Qemu-devel] [PATCH 1/8 v5] cutils: unsigned int parsing functions, Eric Blake, 2013/01/18
- [Qemu-devel] [PATCH 1/8 v6] cutils: unsigned int parsing functions, Eduardo Habkost, 2013/01/18
- Re: [Qemu-devel] [PATCH 1/8 v6] cutils: unsigned int parsing functions, Eric Blake, 2013/01/18
- Re: [Qemu-devel] [PATCH 1/8 v6] cutils: unsigned int parsing functions, Laszlo Ersek, 2013/01/23
- Re: [Qemu-devel] [PATCH 1/8] cutils: unsigned int parsing functions, Markus Armbruster, 2013/01/18
Re: [Qemu-devel] [PATCH 0/8] -numa option parsing fixes (v3), Eric Blake, 2013/01/16
Re: [Qemu-devel] [PATCH for-1.4 0/8] -numa option parsing fixes (v3), Eduardo Habkost, 2013/01/28