[Grammatica-users] How to treat nested comments?
From: Oliver Gramberg
Subject: [Grammatica-users] How to treat nested comments?
Date: Wed, 24 Jun 2009 17:22:44 +0200
Hello Grammatica users,
I want to write a parser for a language
that allows nested comments: /* ... /* ... */ ... */ is valid, but
/* ... /* ... */ is not. Obviously, I cannot cover that with a regular
expression alone. I started by defining tokens similar to the following
(don't bother with correctness here ;-) ):
COMMENT_START = "/*"
COMMENT_END = "*/"
NESTED_COMMENT_CONTENTS = << ... (ugly regexp matching
anything except COMMENT_START or COMMENT_END) >>
One big problem with this is that NESTED_COMMENT_CONTENTS,
as intended, matches anything except COMMENT_START or COMMENT_END, so a
single match can extend from the current position all the way to the end
of the input file! That changes the running time from close to O(n) to
something like O(n^2): 102 sec. on a 34k input file.
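As a point of reference for the running time: recognizing one nested comment by itself needs only a single left-to-right pass with a depth counter, so it is linear in the length of the comment. The Python sketch below is illustrative only and is not Grammatica code; the function name skip_nested_comment is made up for this example.

```python
def skip_nested_comment(text, start):
    """Return the index just past the nested comment that opens at `start`.

    Hypothetical helper, not part of Grammatica. Raises ValueError if no
    comment opens at `start` or if the comment is never closed.
    """
    if not text.startswith("/*", start):
        raise ValueError("no comment starts at index %d" % start)
    depth = 0
    i = start
    while i < len(text):
        if text.startswith("/*", i):      # COMMENT_START: one level deeper
            depth += 1
            i += 2
        elif text.startswith("*/", i):    # COMMENT_END: one level up
            depth -= 1
            i += 2
            if depth == 0:                # outermost comment just closed
                return i
        else:
            i += 1                        # ordinary comment character
    raise ValueError("unterminated comment")
```

Every character is looked at once, which is the close-to-O(n) behaviour the tokenizer should ideally retain.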
Before NFAs were introduced for tokenizing
(Grammatica up to 1.5 alpha 2, if I'm right), my solution was to
- add an "enabled" flag to the token patterns,
- hack the tokenizer to not match a token pattern that is not enabled,
- keep track of the number of COMMENT_START and COMMENT_END tokens
encountered, and
- enable NESTED_COMMENT_CONTENTS only when "inside" a comment.
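The steps above can be sketched roughly as follows. This is a stand-alone Python illustration of the idea, not the actual Grammatica tokenizer or its API: the TokenPattern class, the tokenize function, the OTHER pattern (a stand-in for the language's real tokens) and the first-enabled-pattern-wins matching rule are all assumptions made for this example.

```python
import re


class TokenPattern:
    """A token pattern with the extra "enabled" flag (hypothetical class)."""

    def __init__(self, name, regex, enabled=True):
        self.name = name
        self.regex = re.compile(regex, re.DOTALL)
        self.enabled = enabled


def tokenize(text):
    """Split text into (name, lexeme) pairs, tracking comment depth."""
    start = TokenPattern("COMMENT_START", r"/\*")
    end = TokenPattern("COMMENT_END", r"\*/")
    # Text up to, but not including, the next "/*" or "*/".
    not_delim = r"(?:(?!/\*|\*/).)+"
    contents = TokenPattern("NESTED_COMMENT_CONTENTS", not_delim, enabled=False)
    other = TokenPattern("OTHER", not_delim)  # stand-in for real tokens
    patterns = [start, end, contents, other]

    depth = 0  # COMMENT_START count minus COMMENT_END count
    pos = 0
    tokens = []
    while pos < len(text):
        for p in patterns:
            if not p.enabled:
                continue  # disabled patterns are never tried
            m = p.regex.match(text, pos)
            if m:
                break
        else:
            raise ValueError("no token matches at index %d" % pos)
        tokens.append((p.name, m.group()))
        pos = m.end()
        if p is start:
            depth += 1
        elif p is end:
            depth -= 1
            if depth < 0:
                raise ValueError("COMMENT_END without COMMENT_START")
        # Enable the contents pattern only while "inside" a comment.
        contents.enabled = depth > 0
    if depth != 0:
        raise ValueError("unterminated comment")
    return tokens
```

For example, tokenize("a /* b /* c */ d */ e") yields OTHER for "a " and " e", and NESTED_COMMENT_CONTENTS for " b ", " c " and " d ". Because the contents pattern is only tried inside a comment, it never scans ahead through ordinary source text.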
Since the Grammatica 1.5 release, NESTED_COMMENT_CONTENTS
is recognized by the new NFA implementation, where I cannot find an
easy way to disable a token pattern.
Any suggestions?
Regards
Oliver
Oliver Gramberg
ABB AG
Forschungszentrum Deutschland
DECRC/I2
Wallstadter Str. 59
D-68526 Ladenburg
Phone: +49 6203/71-6461
Fax: +49 6203/71-6253
E-mail: address@hidden