From: Rik
Subject: [Octave-bug-tracker] [bug #35683] regexp crash (named subexp/nested paren; maybe related to #29438)
Date: Sun, 11 Mar 2012 18:03:04 +0000
User-agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:10.0.2) Gecko/20100101 Firefox/10.0.2

Update of bug #35683 (project octave):

                Severity:              3 - Normal => 4 - Important          

    _______________________________________________________

Follow-up Comment #3:

Interestingly, my system does not behave the same as yours, and I don't see
segfaults for your reduced test cases.  This is probably due to subtle
differences in the way things are laid out in memory.  I'm sure the memory
corruption is occurring, but it doesn't always lead to an actual segfault.

The following string and pattern combination always segfaults for me (tested
on versions 3.2.4, 3.4.0, 3.4.3, 3.6.1, and the development version).


str = 'char short xyz';
ptn = '(?<label>(char|short)\s+)';
regexp (str, ptn, 'names')


The problem seems to be the mixing of named and unnamed match buffers.
The outer parentheses in the pattern above create a match buffer, which
happens to be named label.  The inner parentheses, which I know you
intend to use only for grouping, also create a match buffer, this one unnamed.
When you ask for the named match tokens, Octave indexes into an array of names.
It finds the first name, label, but when it goes to find a second name it
oversteps the array bounds.
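
To make the mismatch concrete, here is a minimal sketch in Python rather than Octave. Python's re module is only a stand-in: it spells named groups (?P<name>...), and the \s+ assumes the pattern is meant to match whitespace. It just counts capture groups versus names and says nothing definite about Octave's internals.

```python
import re

# Same structure as the crashing pattern: one named outer group and
# one unnamed inner group (Python syntax for the name is ?P<label>).
ptn = re.compile(r'(?P<label>(char|short)\s+)')

total = ptn.groups                # number of capture groups, named or not
names = dict(ptn.groupindex)      # name -> group number

print(total)   # 2
print(names)   # {'label': 1}

# Two capture groups but only one name: code that walks all groups and
# looks each one up in a names array of length 1 would run past its end.
unnamed = total - len(names)
print(unnamed)  # 1
```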

A solution in this case is to use alternation inside a non-capturing group,
which does not create a match buffer.  The syntax for this is '(?:A|B)'.
While the example above segfaults, this one works.


str = 'char short xyz';
ptn = '(?<label>(?:char|short)\s+)';
regexp (str, ptn, 'names')
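
As a rough cross-check of why the non-capturing form avoids the mismatch, the sketch below uses Python's re module as a stand-in for Octave (assumptions: Python's (?P<name>...) spelling, and \s+ for the whitespace match). With (?:...) the only capture group is the named one, so the number of groups equals the number of names.

```python
import re

text = 'char short xyz'

# (?:...) groups the alternation without creating a capture group.
ptn = re.compile(r'(?P<label>(?:char|short)\s+)')

# Collect the named token from every match in the string.
labels = [m.group('label') for m in ptn.finditer(text)]

print(ptn.groups)            # 1
print(len(ptn.groupindex))   # 1
print(labels)                # ['char ', 'short ']
```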


Another possibility is to explicitly name every capture group.  I
tried that with your original pattern and it no longer segfaults.  (I don't
think it's giving you what you want, but that is a different question about
what the regexp should be picking out from the string.)

The pattern that doesn't segfault is


pattern =
'(?<typestr>(?<unname1>(?<unname2>char|short|int|long|signed|unsigned|float|double|int8|uint8|int16|uint16|int32|uint32|int64|uint64|int8_t|uint8_t|int16_t|uint16_t|int32_t|uint32_t|int64_t|uint64_t|BYTE|UBYTE|WORD|DWORD|QWORD)\s+)+)(?<name>[a-zA-Z_][a-zA-Z0-9_]*)\s*(?<length>\[[0-9]+\])?\s*;'


Here I have used 'unname1' and 'unname2' to name the previously implicit
capture buffers.

Obviously, there is still a deeper problem: bad input, no matter how
malformed, shouldn't cause the program to segfault.  Octave should produce a
warning or error message and return to the prompt.


    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?35683>
