Re: bison for nlp
From: r0ller
Subject: Re: bison for nlp
Date: Mon, 19 Nov 2018 16:09:14 +0100 (CET)
Hi Akim,
I managed to take the first step and get it running but it wasn't as easy as I
thought. First, I wanted to take the approach that the 'Simple C++ Example'
demonstrates in the bison manual. However, I could not figure out what my
yylex() should return when defining api.value.type variant. In the C parser I
used to return an int, so to avoid turning everything upside down in one big
step, I stuck to that concept and tried what happens if I define my tokens as
%token <int> mytoken number, where 'mytoken' is just a token name and 'number'
is the number assigned to it. As a real example:
%token <int> t_Con 1
The question is: with as many tokens as I have, how do I decide which one to
return? Bison now generates a symbol_type make_TOKEN() for each token, which I
am supposed to return from yylex(), but I'd rather not put a huge switch() in
yylex(). Is there any other solution, like defining a "dummy" token such as
%token <int> INT;
whose constructor make_INT(const int&) would simply return the int passed to
it? Or shall I simply try to cast the integers of my tokens to symbol_type?
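One way to avoid the large switch() is a lookup table from the integer token code to the matching factory function. The sketch below is only an illustration under stated assumptions: Symbol stands in for the generated symbol_type, make_t_Con and make_t_Verb stand in for the generated make_TOKEN() functions, and t_Verb, factories() and make_symbol() are hypothetical names that do not come from the attached grammar.

```cpp
#include <cassert>
#include <unordered_map>

// Stand-in for the generated yy::parser::symbol_type. The real class is
// emitted by bison and is NOT defined like this.
struct Symbol {
    int kind;   // token code
    int value;  // semantic value (the tokens are declared as <int>)
};

// Stand-ins for the generated make_<TOKEN>() functions.
Symbol make_t_Con(const int& v) { return Symbol{1, v}; }   // %token <int> t_Con 1
Symbol make_t_Verb(const int& v) { return Symbol{2, v}; }  // hypothetical token

// Pointer to a factory that builds a symbol from an int value.
using Factory = Symbol (*)(const int&);

// Table from the lexer's integer token code to the matching factory.
const std::unordered_map<int, Factory>& factories() {
    static const std::unordered_map<int, Factory> table = {
        {1, make_t_Con},
        {2, make_t_Verb},
    };
    return table;
}

// What a yylex() body could call once it knows the integer token code.
Symbol make_symbol(int code, int value) {
    auto it = factories().find(code);
    assert(it != factories().end() && "unknown token code");
    return it->second(value);
}
```

With many tokens the table could be generated together with the grammar, so yylex() itself stays a single lookup.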
The other problem I ran into was related to the non-terminals: wherever I
wanted to read the value of a symbol in an action via e.g. $1, I got a type
conversion error, as the value could no longer be converted to an integer as in
the C parser. I have only one guess here, namely that each non-terminal
needs a %type declaration like
%type <int> ENG_Con;
and that even the = operator needs to be defined for it, right? So here I got
stuck, at least with regard to api.value.type variant.
Then I decided to take a step back and not to use complete symbols but split
symbols for a first try. This I managed to figure out and make it work but with
a small hack as I declared yylex as:
int yylex(int* yylval);
If I did it like:
int yylex(semantic_type* yylval);
the compiler kept complaining about not knowing semantic_type (nor
parser::semantic_type, nor yy::parser::semantic_type). So I took clang's hint
when it said semantic_type* is aka int* and it worked. In the end, to make my
hack a bit nicer, I added %define api.value.type {int}. Still, the
question in this case would be, how shall yylex() be correctly declared?
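One possible cause, assuming the declaration sat in a %{ ... %} prologue that is emitted ahead of the generated class definition, is simply ordering: the name yy::parser::semantic_type only resolves once the parser class is visible. The manual's Simple C++ Example puts its yylex in an unqualified %code block, which is emitted after the class. The sketch below mimics that ordering with a stand-in class; the real yy::parser is generated by bison and looks different.

```cpp
#include <cassert>

// Stand-in for the class bison generates. With
// %define api.value.type {int} the real yy::parser::semantic_type is int.
namespace yy {
class parser {
public:
    typedef int semantic_type;
};
}  // namespace yy

// Declared AFTER the class is visible, so the qualified name resolves.
int yylex(yy::parser::semantic_type* yylval);

// Trivial definition so the sketch links: one token with value 42, then EOF.
int yylex(yy::parser::semantic_type* yylval) {
    assert(yylval != nullptr);
    static bool done = false;
    if (done) return 0;  // 0 is the end-of-input token
    done = true;
    *yylval = 42;
    return 1;            // token code 1, as in %token <int> t_Con 1
}
```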
I also attached the source I ended up with. It was originally generated for a
C parser, but I manually modified it and played around with it till I got it
working.
Thanks for any hints!
Best regards,
r0ller
-------- Original message --------
From: r0ller <address@hidden>
Date: 12 November 2018 11:43:47
Subject: Re: bison for nlp
To: Akim Demaille <address@hidden>
Hi Akim,
Sorry for the delay, I had to go through my own code to be able to answer your
question about the tokens:) But to begin with your first observation, you're
right: I should wrap that conditional ternary op for logging.
After going through the code, I concluded that currently it's only used to make
sure that a constant (or unknown word) is mapped to the same symbol in each
language having the token value 1. But it could be solved in a different way
e.g. that each language can define its own symbol for constants in a newly
introduced (customizing) db table. As you can guess, currently if the program
bumps into a token with value 1, then it assumes that it's an unknown
word/morpheme/constant. It seemed OK 8 years ago, but now it has its price to
turn back the wheels. However, I think I'll do it even though it seems that
individual numbering does not cause any problem as there are only two numbers
to avoid conflicts (0 and 256). The other place where it'd be used is the
symbol prediction (where I need to remap a token to a symbol) in case of an
error but that method is currently not called at all as it does not yet work
well and now I just return the bison error message about the expected symbols.
Mine would have the additional functionality on top that it'd not only tell
what's syntactically expected but would return a subset of those symbols which
are semantically expected.
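The convention described above (token value 1 means "unknown word/morpheme/constant") could be sketched as a lexicon lookup with a fallback. This is only an illustration: token_for, kUnknownToken and the lexicon map are hypothetical names, and the real project reads its lexicon from db tables rather than from an in-memory map.

```cpp
#include <string>
#include <unordered_map>

// Token value shared by all languages for constants and unknown words.
const int kUnknownToken = 1;

// Returns the token code for a word, or kUnknownToken if the word is not
// in the lexicon. The lexicon type is an assumption made for this sketch.
int token_for(const std::string& word,
              const std::unordered_map<std::string, int>& lexicon) {
    auto it = lexicon.find(word);
    return it == lexicon.end() ? kUnknownToken : it->second;
}
```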
Concerning the C++ bison wrapper, what I mentioned is simply that in 2010, when
I started the project, I read an article somewhere which made that statement,
and I never validated it. But now I'm pretty curious about bison's C++
features, so I'll definitely go through the documentation you sent and try
to turn mine into a C++ parser:)
Best regards,
r0ller
-------- Original message --------
From: Akim Demaille <address@hidden>
Date: 9 November 2018 06:13:29
Subject: Re: bison for nlp
To: r0ller <address@hidden>
Hi!
> Le 7 nov. 2018 à 10:09, r0ller <address@hidden> a écrit :
>
> Hi Akim,
>
> The file hi_nongen.y is just left there as the last version that I wrote
> manually:) If you check out any other hi.y files in the platform specific
> directories (e.g. the one for the online demo is
> https://github.com/r0ller/alice/blob/master/hi_js/hi.y but you can have a
> look in hi_android or hi_desktop as well) you’ll see what they look like
> nowadays.
You have tons of
logger::singleton()==NULL?(void)0:logger::singleton()->log(2,"vm is NULL!");
you could introduce logger::log, or whatever free function,
that does that for you instead of having to deal with that
in every call site.
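A minimal sketch of such a helper follows. Only singleton() and log(level, message) are taken from the call quoted above; the logger class here is a stand-in, and log_message, install() and the entries member are hypothetical names chosen for illustration.

```cpp
#include <string>
#include <vector>

// Stand-in for the project's logger. The real class is not shown in this
// thread, so everything except singleton() and log() is assumed.
class logger {
public:
    static logger* singleton() { return slot(); }
    static void install(logger* l) { slot() = l; }  // hypothetical setter

    void log(int level, const std::string& message) {
        entries.push_back(std::to_string(level) + ": " + message);
    }

    std::vector<std::string> entries;  // records messages for this sketch

private:
    static logger*& slot() {
        static logger* instance = nullptr;
        return instance;
    }
};

// Free function that hides the NULL check from every call site.
inline void log_message(int level, const std::string& message) {
    if (logger* l = logger::singleton()) {
        l->log(level, message);
    }
}
```

A call site then shrinks to log_message(2, "vm is NULL!"); and does nothing when no logger is installed.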
> Numbering tokens was introduced in the very beginning, and I have questioned
> quite a few times whether it's still needed. I didn’t try hard
> to get rid of it mainly due to one reason: I want to have an error handling
> that tells in case of an error which symbols could be accepted instead of the
> erroneous one just as bison itself does it but in a structured way (as bison
> returns that info in an error message string).
Where are these numbers used?
> Though, I could not come up with any better idea when it comes to remapping a
> token to a symbol. As far as I know bison uses internally the tokens and not
> the symbols for the terminals and it's not possible to get back a symbol
> belonging to a certain token. That's it roughly but I'd be glad to get rid of
> it. However, if it's not possible and poses no problems then I can live with
> it. By the way, are there any number ranges or specific numbers that are
> reserved?
Some numbers are reserved, yes: 0 for eof and 256 for error (per POSIX). For
error, Bison can accommodate if you use 256. EOF must be 0.
> Not using the C++ features of bison has historical reasons: I started writing
> the project in C and even back then I used yacc which I later replaced with
> bison. When I started to shift the project to C++ I was glad that it still
> worked with the generated C parser and since then I never had time to make
> such an excursion but it'd be great. I also must admit that I wasn't really
> aware of it. The only thing I read somewhere was that bison has a C++ wrapper
> but have never taken any steps into that direction.
I don’t know what you mean here: this is bison itself, there’s
no need for a wrapper, and the deterministic parser itself is
genuine C++, not C++ wrapping C. The GLR parser in C++ though _is_
a wrapper for the C GLR parser.
> Now I think I'll find some time for it -at least to check it out:) Could you
> give me any links pointing to any tutorial or something like that? It’d be
> very kind if you could help me in taking the first steps, thanks!
I would very much like to have your opinion on the open section of the
documentation about C++. It’s recent, and it probably needs polishing.
https://www.gnu.org/software/bison/manual/bison.html#A-Simple-C_002b_002b-Example
Attachment: hi.y (binary data)