Re: [Grammatica-users] Having problems with my grammar
From: Per Cederberg
Subject: Re: [Grammatica-users] Having problems with my grammar
Date: Wed, 16 Mar 2005 19:30:54 +0100
OK, the problem has already been solved. But I noted a few other
things you might consider fixing while you're at it:
aLetter = <<[A-Z]>>
...
h = "H"
As the <h> token is defined below <aLetter> and both can match the
same character sequence, the <aLetter> token will always be chosen.
I suggest you move the <aLetter> (and <aDigit>) tokens down in the
list, below <h>.
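
To make the ordering rule concrete, here is a minimal Java sketch of the tie-break described above: the longest match wins, and when two tokens match the same text, the one defined first is chosen. This is a toy model written under that assumption, not Grammatica's actual tokenizer; the class TokenOrderDemo and its methods are hypothetical names used only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model of token selection: longest match wins; on a tie, the token
// defined first wins. An illustration only, not Grammatica's tokenizer.
class TokenOrderDemo {

    // Returns the name of the token chosen at the start of input,
    // or null if no token matches there.
    static String choose(Map<String, Pattern> tokens, String input) {
        String best = null;
        int bestLen = 0;
        for (Map.Entry<String, Pattern> e : tokens.entrySet()) {
            Matcher m = e.getValue().matcher(input);
            // lookingAt() anchors the match at the start of the input.
            // Only a strictly longer match replaces the current choice,
            // so an earlier definition keeps the win on a tie.
            if (m.lookingAt() && m.end() > bestLen) {
                best = e.getKey();
                bestLen = m.end();
            }
        }
        return best;
    }

    // Token order as in the posted grammar: aLetter comes before oh and h.
    static Map<String, Pattern> originalOrder() {
        Map<String, Pattern> t = new LinkedHashMap<>();
        t.put("aLetter", Pattern.compile("[A-Z]"));
        t.put("oh", Pattern.compile("OH"));
        t.put("h", Pattern.compile("H"));
        return t;
    }

    // Suggested order: aLetter moved below h.
    static Map<String, Pattern> suggestedOrder() {
        Map<String, Pattern> t = new LinkedHashMap<>();
        t.put("oh", Pattern.compile("OH"));
        t.put("h", Pattern.compile("H"));
        t.put("aLetter", Pattern.compile("[A-Z]"));
        return t;
    }

    public static void main(String[] args) {
        System.out.println(choose(originalOrder(), "H"));   // aLetter
        System.out.println(choose(suggestedOrder(), "H"));  // h
        System.out.println(choose(originalOrder(), "OH"));  // oh (longer match)
    }
}
```

In this model <oh> is unaffected by the ordering because it is the longer match; only the single-character <h> is shadowed by <aLetter>.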
WHITESPACE = <<[ \t]+>> %ignore%
Your second issue was probably caused by your <WHITESPACE> token
not including linefeeds. Add \n and \r to the token, or trim()
the input text before calling the parser. Note that if you add
linefeeds to the <WHITESPACE> token, your parser will also accept
linefeeds inside document numbers, which might not be what you
want.
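
As a rough illustration of the two options, the following Java sketch checks which characters each whitespace pattern would skip and shows what trim() does to a line ending in a linefeed. The class WhitespaceDemo and its members are hypothetical names, and the patterns are plain java.util.regex patterns standing in for the <WHITESPACE> token, so treat it as a sketch rather than a description of how Grammatica evaluates the token.

```java
import java.util.regex.Pattern;

// Compares the posted whitespace pattern with one that also covers
// linefeeds, and shows the trim() alternative.
class WhitespaceDemo {
    // Pattern from the posted grammar: spaces and tabs only.
    static final Pattern ORIGINAL = Pattern.compile("[ \\t]+");
    // Extended pattern: also covers \n and \r.
    static final Pattern EXTENDED = Pattern.compile("[ \\t\\n\\r]+");

    // True if the whole string would be skipped as whitespace.
    static boolean isIgnorable(Pattern ws, String s) {
        return ws.matcher(s).matches();
    }

    public static void main(String[] args) {
        String line = "MIL-DTL-0053133/47B SUPPLEMENT 1\n";

        // The trailing linefeed is not covered by the original pattern.
        System.out.println(isIgnorable(ORIGINAL, "\n"));    // false
        // The extended pattern covers it, including inside a document number.
        System.out.println(isIgnorable(EXTENDED, "\r\n"));  // true
        // Alternative: strip leading and trailing whitespace before parsing.
        System.out.println(line.trim().endsWith("1"));      // true
    }
}
```

Trimming keeps the grammar strict about linefeeds in the middle of a document number, whereas the extended pattern accepts them anywhere whitespace is ignored.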
Cheers,
/Per
On Wed, 2005-03-16 at 11:07 -0700, Anant Mistry wrote:
>
> Please ignore this posting. I figured it out ..... duh!!! I was being
> really stupid ..... cut'n'paste is not always a good thing!!!
>
> Thanks
>
> Anant
>
> On Wed, 2005-03-16 at 07:48 -0700, Anant Mistry wrote:
> >
> > I'm trying to create a grammar to parse a single line (a doc number
> > actually). Here is my grammar
> >
> > %header%
> >
> > GRAMMARTYPE = "LL"
> >
> > %tokens%
> >
> > dash = '-'
> > period = '.'
> > slash = '/'
> > rev1 = "REVISION"
> > rev2 = "REV"
> > part1 = "PART"
> > part2 = "PT"
> > chapter1 = "CHAPTER"
> > chapter2 = "CHAP"
> > volume1 = "VOLUME"
> > volume2 = "VOL"
> > validNotice1 = "VALID NOTICE"
> > validNotice2 = "VAL NOTICE"
> > interimChg1 = "INTERIM CHANGE"
> > interimChg2 = "INT CHG"
> > supplement1 = "SUPPLEMENT"
> > supplement2 = "SUPP"
> > aLetter = <<[A-Z]>>
> > aDigit = <<[0-9]>>
> > mil = "MIL"
> > dod = "DOD"
> > jan = "JAN"
> > prf = "PRF"
> > dtl = "DTL"
> > oo = "00"
> > oh = "OH"
> > h = "H"
> > WHITESPACE = <<[ \t]+>> %ignore%
> >
> > %productions%
> >
> > reference = milspec [suffix] ;
> >
> > suffix = ( rev1 | rev2 | part1 | part2 | chapter1 | chapter2
> > | volume1 | volume2 | validNotice1 | validNotice2
> > | interimChg1 | interimChg2 | supplement1 |
> > supplement2 ) [singleNumber] ;
> >
> > milspec = milPrefix baseNum [slashNum] ;
> >
> > milPrefix = ( mil | dod | jan ) dash [middleAlpha] [delimiter]
> > [indicator] baseNum [slash slashNum];
> > middleAlpha = prf | dtl | alpha ;
> >
> > baseNum = singleNumber [impliedRev] ;
> > slashNum = singleNumber [impliedRev] ;
> >
> > alpha = aLetter+ ;
> > singleNumber = aDigit+ ;
> >
> > delimiter = dash | period | slash ;
> >
> > indicator = oo | oh | h ;
> >
> > impliedRev = (dash singleNumber [alpha]) | alpha ;
> >
> > The problem I'm having is when I try to parse the line
> >
> > MIL-DTL-0053133/47B SUPPLEMENT 1
> >
> > I get
> >
> > bandikoot$ java -jar lib/grammatica-1.4.jar struct_key.grammar.g --parse ./inpfile
> > Parse tree from ./inpfile:
> > reference(2001)
> >   milspec(2003)
> >     milPrefix(2004)
> >       mil(1020): "MIL", line: 1, col: 1
> >       dash(1001): "-", line: 1, col: 4
> >       middleAlpha(2005)
> >         dtl(1024): "DTL", line: 1, col: 5
> >       delimiter(2010)
> >         dash(1001): "-", line: 1, col: 8
> >       indicator(2011)
> >         oo(1025): "00", line: 1, col: 9
> >       baseNum(2006)
> >         singleNumber(2009)
> >           aDigit(1019): "5", line: 1, col: 11
> >           aDigit(1019): "3", line: 1, col: 12
> >           aDigit(1019): "1", line: 1, col: 13
> >           aDigit(1019): "3", line: 1, col: 14
> >           aDigit(1019): "3", line: 1, col: 15
> >       slash(1003): "/", line: 1, col: 16
> >       slashNum(2007)
> >         singleNumber(2009)
> >           aDigit(1019): "4", line: 1, col: 17
> >           aDigit(1019): "7", line: 1, col: 18
> >         impliedRev(2012)
> >           alpha(2008)
> >             aLetter(1018): "B", line: 1, col: 19
> > Error: in ./inpfile: line 1:
> > unexpected token "SUPPLEMENT", expected <aDigit>
> >
> > MIL-DTL-0053133/47B SUPPLEMENT 1
> >                     ^
> > I'm not sure why it's expecting an <aDigit> token. If I move the
> > [suffix] to the end of the slashNum production, it works OK, i.e.
> >
> > slashNum = singleNumber [impliedRev] [suffix] ;
> >
> > and removing [suffix] from the reference line, gives me an output of
> >
> > bandikoot$ java -jar lib/grammatica-1.4.jar struct_key.grammar.g --parse ./inpfile
> > Parse tree from ./inpfile:
> > reference(2001)
> >   milspec(2003)
> >     milPrefix(2004)
> >       mil(1020): "MIL", line: 1, col: 1
> >       dash(1001): "-", line: 1, col: 4
> >       middleAlpha(2005)
> >         dtl(1024): "DTL", line: 1, col: 5
> >       delimiter(2010)
> >         dash(1001): "-", line: 1, col: 8
> >       indicator(2011)
> >         oo(1025): "00", line: 1, col: 9
> >       baseNum(2006)
> >         singleNumber(2009)
> >           aDigit(1019): "5", line: 1, col: 11
> >           aDigit(1019): "3", line: 1, col: 12
> >           aDigit(1019): "1", line: 1, col: 13
> >           aDigit(1019): "3", line: 1, col: 14
> >           aDigit(1019): "3", line: 1, col: 15
> >       slash(1003): "/", line: 1, col: 16
> >       slashNum(2007)
> >         singleNumber(2009)
> >           aDigit(1019): "4", line: 1, col: 17
> >           aDigit(1019): "7", line: 1, col: 18
> >         impliedRev(2012)
> >           alpha(2008)
> >             aLetter(1018): "B", line: 1, col: 19
> >         suffix(2002)
> >           supplement1(1016): "SUPPLEMENT", line: 1, col: 21
> >           singleNumber(2009)
> >             aDigit(1019): "1", line: 1, col: 32
> > Error: in ./inpfile: line 1:
> > unexpected character '
> > '
> >
> > MIL-DTL-0053133/47B SUPPLEMENT 1
> >                                 ^
> >
> > Not quite perfect but at least it gets the [suffix] part correctly.
> >
> > Any thoughts why the first one doesn't work?
> >
> > Thanks in advance
> >
> > Anant
> >
> >
> >
> >
> > _______________________________________________
> > Grammatica-users mailing list
> > address@hidden
> > http://lists.nongnu.org/mailman/listinfo/grammatica-users