Re: bison for nlp
From: r0ller
Subject: Re: bison for nlp
Date: Wed, 7 Nov 2018 10:09:37 +0100 (CET)
Hi Akim,
The file hi_nongen.y is just left there as the last version that I wrote
manually:) If you check out any of the other hi.y files in the platform specific
directories (e.g. the one for the online demo is
https://github.com/r0ller/alice/blob/master/hi_js/hi.y but you can have a look
in hi_android or hi_desktop as well) you'll see what they look like nowadays.
Numbering the tokens was introduced at the very beginning, and I have asked
myself quite a few times whether it's still needed. I haven't made a serious
attempt to get rid of it, mainly for one reason: I want error handling that, in
case of an error, tells which symbols could be accepted instead of the
erroneous one, just as bison itself does, but in a structured way (bison
returns that info in an error message string). However, I could not come up
with any better idea for remapping a token to a symbol. As far as I know, bison
internally uses the token numbers and not the symbols for the terminals, and
it's not possible to get back the symbol belonging to a certain token. That's
it, roughly, but I'd be glad to get rid of the numbering. On the other hand, if
that's not possible and it poses no problems, I can live with it. By the way,
are there any number ranges or specific numbers that are reserved?
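Just to make the remapping concrete, here is a minimal sketch in plain C++ of
the kind of side table I mean, using the symbol names from the %token list you
quoted; it's only an illustration, not the actual generated code:

// Illustration only -- not the real generated code.  Maps the explicit token
// numbers (as in "%token t_ENG_N_stem 5") back to human-readable symbol
// names, so an error handler can report symbols instead of raw numbers.
#include <map>
#include <string>

static const std::map<int, std::string> token_to_symbol = {
    {1, "t_Con"},
    {2, "t_ENG_A"},
    {3, "t_ENG_Adv"},
    {4, "t_ENG_Det"},
    {5, "t_ENG_N_stem"},
    {6, "t_ENG_N_lfea_Pl"},
};

// Returns the symbol name for a token number, or an empty string if unknown.
std::string symbol_of(int token)
{
    const auto it = token_to_symbol.find(token);
    return it != token_to_symbol.end() ? it->second : std::string{};
}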
Concerning the missing features, I can currently only mention the error
message about the expected tokens in case of an error. It'd be great to be able
to get an array of expected tokens instead of, or besides, the error message.
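The best workaround I can think of for now is to pick the verbose error message
apart. A rough sketch of that (it assumes the usual "syntax error, unexpected
X, expecting A or B" wording of the verbose messages; again only an
illustration, not project code):

// Illustrative workaround: pull the expected token names out of a verbose
// bison error message such as
//   "syntax error, unexpected t_ENG_Det, expecting t_ENG_N_stem or t_ENG_A"
#include <string>
#include <vector>

std::vector<std::string> expected_tokens(std::string msg)
{
    std::vector<std::string> result;
    const std::string marker = "expecting ";
    const auto pos = msg.find(marker);
    if (pos == std::string::npos)
        return result;                     // no expectation list in the message
    msg.erase(0, pos + marker.size());     // keep only "A or B or C"
    // Normalize the separators (" or ", and defensively ", ") to a single ';'.
    const std::string seps[] = {" or ", ", "};
    for (const std::string& sep : seps)
        for (auto p = msg.find(sep); p != std::string::npos; p = msg.find(sep))
            msg.replace(p, sep.size(), ";");
    // Split on the normalized separator.
    std::size_t start = 0;
    for (auto end = msg.find(';'); ; end = msg.find(';', start))
    {
        result.push_back(msg.substr(start, end - start));
        if (end == std::string::npos)
            break;
        start = end + 1;
    }
    return result;
}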
Not using the C++ features of bison is down to historical reasons: I started
writing the project in C, and even back then I used yacc, which I later
replaced with bison. When I started to move the project to C++ I was glad that
it still worked with the generated C parser, and since then I never had the
time for such an excursion, though it'd be great. I must also admit that I
wasn't really aware of it: the only thing I read somewhere was that bison has a
C++ wrapper, but I never took any steps in that direction. Now I think I'll
find some time for it - at least to check it out:) Could you give me links to a
tutorial or something like that? It'd be very kind if you could help me take
the first steps, thanks!
Using std::endl is something I never really thought about. But is
std::cout<<mystring+"\n" really faster than std::cout<<mystring<<std::endl?
I'll check out the links you sent, thanks!
Concerning foma, I cannot really do anything besides letting Mans (author of
foma) know that he'd better check that regex.y out:)
Best regards,
r0ller
-------- Original message --------
From: Akim Demaille <address@hidden>
Date: 7 November 2018 07:30:29
Subject: Re: bison for nlp
To: r0ller <address@hidden>
Hi!
> On 6 Nov 2018, at 17:32, r0ller <address@hidden> wrote:
>
> To whoever it may concern,
>
> I’m using bison to create an open source nlp tool and it's kind of useable
> already (see online demo at https://r0ller.github.io/alice).
Wow! That’s impressive.
> The bison source itself is generated from the grammar rules (which can at
> most be binary branching) that are stored in an sqlite db file along with all
> other specs of the language being modeled.
So a file like hi_nongen.y is generated, or not? Why do you number your tokens?
%token t_Con 1
%token t_ENG_A 2
%token t_ENG_Adv 3
%token t_ENG_Det 4
%token t_ENG_N_stem 5
%token t_ENG_N_lfea_Pl 6
?
Are there missing features in Bison that could help?
I see that this file is C++, but you’re not using the C++ features of Bison, is
that on purpose? Would you want some assistance to move to C++?
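Roughly, the move starts with asking for the lalr1.cc skeleton. A minimal,
untested sketch (the token names are just taken from your grammar as examples;
the "C++ Parsers" chapter of the documentation has the full story):

/* Minimal, untested sketch of a grammar using the C++ skeleton.  */
%skeleton "lalr1.cc"            /* generate a C++ parser class (yy::parser) */
%require "3.0"
%define api.token.constructor   /* yylex returns complete symbols           */
%define api.value.type variant  /* real C++ types instead of a %union       */
%define parse.error verbose

%code {
  #include <iostream>
  /* The scanner must use the C++ signature.  */
  yy::parser::symbol_type yylex ();
}

%token t_ENG_Det t_ENG_N_stem   /* example tokens, no explicit numbers */

%%
start: t_ENG_Det t_ENG_N_stem;
%%

/* Errors arrive through this member function.  */
void
yy::parser::error (const std::string& m)
{
  std::cerr << m << '\n';
}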
I also see you use std::endl. Really, you shouldn't: stick to '\n'. It's nicer
to read, and faster (you are unlikely to need to flush each line). There's an
urban legend in the C++ world that std::endl is needed on Windows for
portability (i.e., to produce \r\n), but that already happens for plain \n.
https://gitlab.lrde.epita.fr/vcsn/vcsn/commit/733163ad406125460e8deffb6508b16f570b9bff
https://www.youtube.com/watch?v=lHGR_kH0PNA
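To make the point concrete, a tiny illustration (nothing project specific;
std::endl is exactly a newline plus a flush):

#include <iostream>
#include <string>

int main()
{
  std::string mystring = "hello";

  // std::endl writes '\n' and then flushes the stream:
  std::cout << mystring << std::endl;           // newline + flush
  std::cout << mystring << '\n' << std::flush;  // exactly equivalent

  // mystring + "\n" additionally builds a temporary std::string before
  // printing; streaming the newline separately avoids that as well.
  std::cout << mystring << '\n';                // no flush, no temporary
}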
> The rules of course cover the grammar while the actions carry out semantic
> checks. The phonological and morphological analyses are covered by foma
> (https://fomafst.github.io) which also uses bison.
https://github.com/mhulden/foma/blob/4e98d61ce67babe1555e1e1b04d5a65bc3b5a03a/foma/regex.y#L205
%expect 686
OMG! And it’s not GLR, this is plain old LALR(1). How can one trust such a
number of conflicts? If it is really meant, then that’s probably a place
where a feature such as
https://lists.gnu.org/archive/html/bison-patches/2013-02/msg00105.html would
shine.
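(For anyone following along who hasn’t met %expect: it declares how many
shift/reduce conflicts the grammar is supposed to have, and bison only
complains if the actual count differs. The textbook case is the dangling else;
a sketch, unrelated to foma’s actual grammar:)

%expect 1   /* the dangling-else grammar has exactly one shift/reduce conflict */
%token IF THEN ELSE EXPR OTHER

%%
stmt:
  IF EXPR THEN stmt            /* conflicts with the rule below: on ELSE, */
| IF EXPR THEN stmt ELSE stmt  /* bison shifts, binding the ELSE to the   */
| OTHER                        /* innermost IF                            */
;
%%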
> As I mentioned, the whole thing is open source and free, so please don't take
> this as an ad. I just wanted to say thank you to all who develop/maintain
> bison by showing what you made possible.
That’s very kind, thanks a lot!