qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v12 21/36] qapi: Tighten the regex on valid name


From: Eric Blake
Subject: Re: [Qemu-devel] [PATCH v12 21/36] qapi: Tighten the regex on valid names
Date: Wed, 18 Nov 2015 09:57:16 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0

On 11/18/2015 04:51 AM, Markus Armbruster wrote:
> Eric Blake <address@hidden> writes:
> 
>> We already documented that qapi names should match specific
>> patterns (such as starting with a letter unless it was an enum
>> value or a downstream extension).  Tighten that from a suggestion
>> into a hard requirement, which frees up names beginning with a
>> single underscore for qapi internal usage.
> 
> If we care enough about naming conventions to document them, enforcing
> them makes only sense.
> 
>>                                             Also restrict things
>> to avoid 'a__b' or 'a--b' (that is, dash and underscore must not
>> occur consecutively).
> 
> I feel this is entering "foolish names" territory, where things like
> "IcAnFiNdNaMeSeVeNlEsSrEaDaBlEtHaNStudlyCaps" live.  Catching fools

IhAdToOmUcHfUnReAdInGtHaT

[It's amazing that scholars can ever manage to correctly interpret old
written languages, thanks to omitted word breaks (scriptio continua) or
omitted vowels (such as in Hebrew)]

> automatically is generally futile, they're too creative :)
> 
> Let's drop this part.

Sure, I can do that. I'll post a fixup patch, as it will affect docs too.

> 
>> Add a new test, reserved-member-underscore, to demonstrate the
>> tighter checking.
>>
>> Signed-off-by: Eric Blake <address@hidden>
>>
>> ---
>> v12: new patch
>> ---

>> +incompatibly in a future release.  All names must begin with a letter,
>> +and contain only ASCII letters, digits, dash, and underscore, where
>> +dash and underscore cannot occur consecutively.  There are two

Namely, right here.

>> +# Names must be letters, numbers, -, and _.  They must start with letter,
>> +# except for downstream extensions which must start with __RFQDN_.
>> +# Dots are only valid in the downstream extension prefix.
>> +valid_name = re.compile('^(__[a-zA-Z][a-zA-Z0-9.]*_)?'
>> +                        '[a-zA-Z][a-zA-Z0-9]*([_-][a-zA-Z0-9]+)*$')
> 
> This regexp consists of two parts: optional __RFQDN_ prefix and the name
> proper.
> 
> The latter stays simpler if we don't attempt to catch adjacent [-_].
> 
> The former isn't quite right.  According to RFC 822 Appendix 1 - Domain
> Name Syntax Specification:
> 
> * We must accept '-' in addition to digits.
> 
> * We require RFQDN to start with a letter.  This is still a loose match.
>   The tight match is "labels separated by dots", where labels consist of
>   letters, digits and '-', starting with a letter.  I wouldn't bother
>   checking first characters are letters at all.
> 
> Recommend
> 
>    valid_name = re.compile('^(__[a-zA-Z0-9.-]+_)?'
>                            '[a-zA-Z][a-zA-Z0-9_-]*$')

Sure, that works for me. It's tighter than what we had before, but
looser than what I proposed so that it allows more RFQDN.  It
potentially lets users confuse us by abusing 'foo__max' or similar, but
I'm less worried about that abuse - it's okay to stop the blatantly
obvious mistakes without worrying about the corner cases, if it makes
the maintenance easier for the cases we do catch.

> 
>>
>>
>>  def check_name(expr_info, source, name, allow_optional=False,
>> @@ -374,8 +376,8 @@ def check_name(expr_info, source, name, 
>> allow_optional=False,
>>                                  % (source, name))
>>      # Enum members can start with a digit, because the generated C
>>      # code always prefixes it with the enum name
>> -    if enum_member:
>> -        membername = '_' + membername
>> +    if enum_member and membername[0].isdigit():
> 
> What's wrong with the old condition?
> 
>> +        membername = 'D' + membername

The old code prepended a lone '_', which doesn't work any more with the
tighter valid_name regex, so we have to use some other prefix (I picked
'D').

>>      # Reserve the entire 'q_' namespace for c_name()
>>      if not valid_name.match(membername) or \
>>         c_name(membername, False).startswith('q_'):

The other thing is that I realized that an enum value of 'q-int' was
getting munged to '_q-int' which no longer gets flagged by the c_name()
filter.  But neither would 'Dq-int' get flagged.  So limiting the
munging to just enum values that start with a digit (where we know it
doesn't start with q) means we don't weaken the second condition.

I guess I need to call that out in the commit message; all the more
reason for me to send a fixup.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

Attachment: signature.asc
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]