Re: bison for nlp
From: r0ller
Subject: Re: bison for nlp
Date: Wed, 7 Nov 2018 10:09:37 +0100 (CET)
Hi Akim,
The file hi_nongen.y is just left there as the last version that I wrote
manually:) If you check out any of the other hi.y files in the platform specific
directories (e.g. the one for the online demo is
https://github.com/r0ller/alice/blob/master/hi_js/hi.y but you can have a look
in hi_android or hi_desktop as well) you'll see what they look like nowadays.
Numbering the tokens was introduced at the very beginning, and I have asked
myself quite a few times whether it's still needed. I haven't made a serious
attempt to get rid of it, mainly for one reason: I want error handling that, in
case of an error, tells which symbols could be accepted instead of the
erroneous one, just as bison itself does, but in a structured way (bison
returns that info in an error message string). However, I could not come up
with any better idea for remapping a token to a symbol. As far as I know, bison
internally uses the token numbers and not the symbols for the terminals, and
it's not possible to get back the symbol belonging to a certain token. That's
it, roughly, but I'd be glad to get rid of the numbering. On the other hand, if
that's not possible and it poses no problems, I can live with it. By the way,
are there any number ranges or specific numbers that are reserved?
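Just to make the remapping concrete, here is a minimal sketch in plain C++ of
the kind of side table I mean, using the symbol names from the %token list you
quoted; it's only an illustration, not the actual generated code:

// Illustration only -- not the real generated code.  Maps the explicit token
// numbers (as in "%token t_ENG_N_stem 5") back to human-readable symbol
// names, so an error handler can report symbols instead of raw numbers.
#include <map>
#include <string>

static const std::map<int, std::string> token_to_symbol = {
    {1, "t_Con"},
    {2, "t_ENG_A"},
    {3, "t_ENG_Adv"},
    {4, "t_ENG_Det"},
    {5, "t_ENG_N_stem"},
    {6, "t_ENG_N_lfea_Pl"},
};

// Returns the symbol name for a token number, or an empty string if unknown.
std::string symbol_of(int token)
{
    const auto it = token_to_symbol.find(token);
    return it != token_to_symbol.end() ? it->second : std::string{};
}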
Concerning the missing features, I can currently only mention the error
message about the expected tokens in case of an error. It'd be great to be able
to get an array of expected tokens instead of, or besides, the error message.
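The best workaround I can think of for now is to pick the verbose error message
apart. A rough sketch of that (it assumes the usual "syntax error, unexpected
X, expecting A or B" wording of the verbose messages; again only an
illustration, not project code):

// Illustrative workaround: pull the expected token names out of a verbose
// bison error message such as
//   "syntax error, unexpected t_ENG_Det, expecting t_ENG_N_stem or t_ENG_A"
#include <string>
#include <vector>

std::vector<std::string> expected_tokens(std::string msg)
{
    std::vector<std::string> result;
    const std::string marker = "expecting ";
    const auto pos = msg.find(marker);
    if (pos == std::string::npos)
        return result;                     // no expectation list in the message
    msg.erase(0, pos + marker.size());     // keep only "A or B or C"
    // Normalize the separators (" or ", and defensively ", ") to a single ';'.
    const std::string seps[] = {" or ", ", "};
    for (const std::string& sep : seps)
        for (auto p = msg.find(sep); p != std::string::npos; p = msg.find(sep))
            msg.replace(p, sep.size(), ";");
    // Split on the normalized separator.
    std::size_t start = 0;
    for (auto end = msg.find(';'); ; end = msg.find(';', start))
    {
        result.push_back(msg.substr(start, end - start));
        if (end == std::string::npos)
            break;
        start = end + 1;
    }
    return result;
}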
Not using the C++ features of bison is down to historical reasons: I started
writing the project in C, and even back then I used yacc, which I later
replaced with bison. When I started to move the project to C++ I was glad that
it still worked with the generated C parser, and since then I never had the
time for such an excursion, though it'd be great. I must also admit that I
wasn't really aware of it: the only thing I read somewhere was that bison has a
C++ wrapper, but I never took any steps in that direction. Now I think I'll
find some time for it - at least to check it out:) Could you give me links to a
tutorial or something like that? It'd be very kind if you could help me take
the first steps, thanks!
Using std::endl is something I never really thought about. But is
std::cout<<mystring+"\n" really faster than std::cout<<mystring<<std::endl?
I'll check out the links you sent, thanks!
Concerning foma, I cannot really do anything besides letting Mans (author of
foma) know that he'd better check that regex.y out:)
Best regards,
r0ller
-------- Original message --------
From: Akim Demaille <address@hidden>
Date: 7 November 2018 07:30:29
Subject: Re: bison for nlp
To: r0ller <address@hidden>
Hi!
> On 6 Nov 2018, at 17:32, r0ller <address@hidden> wrote:
>
> To whoever it may concern,
>
> I’m using bison to create an open source nlp tool and it's kind of useable
> already (see online demo at https://r0ller.github.io/alice).
Wow! That’s impressive.
> The bison source itself is generated from the grammar rules (which can at
> most be binary branching) that are stored in an sqlite db file along with all
> other specs of the language being modeled.
So a file like hi_nongen.y is generated, or not? Why do you number your tokens?
%token t_Con 1
%token t_ENG_A 2
%token t_ENG_Adv 3
%token t_ENG_Det 4
%token t_ENG_N_stem 5
%token t_ENG_N_lfea_Pl 6
?
Are there missing features in Bison that could help?
I see that this file is C++, but you’re not using the C++ features of Bison, is
that on purpose? Would you want some assistance to move to C++?
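Roughly, the move starts with asking for the lalr1.cc skeleton. A minimal,
untested sketch (the token names are just taken from your grammar as examples;
the "C++ Parsers" chapter of the documentation has the full story):

/* Minimal, untested sketch of a grammar using the C++ skeleton.  */
%skeleton "lalr1.cc"            /* generate a C++ parser class (yy::parser) */
%require "3.0"
%define api.token.constructor   /* yylex returns complete symbols           */
%define api.value.type variant  /* real C++ types instead of a %union       */
%define parse.error verbose

%code {
  #include <iostream>
  /* The scanner must use the C++ signature.  */
  yy::parser::symbol_type yylex ();
}

%token t_ENG_Det t_ENG_N_stem   /* example tokens, no explicit numbers */

%%
start: t_ENG_Det t_ENG_N_stem;
%%

/* Errors arrive through this member function.  */
void
yy::parser::error (const std::string& m)
{
  std::cerr << m << '\n';
}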
I also see you use std::endl. Really, you shouldn't: stick to '\n'. It's nicer
to read, and faster (you are unlikely to need to flush each line). There's an
urban legend in the C++ world that std::endl is needed on Windows for
portability (i.e., to produce \r\n), but that already happens for plain \n.
https://gitlab.lrde.epita.fr/vcsn/vcsn/commit/733163ad406125460e8deffb6508b16f570b9bff
https://www.youtube.com/watch?v=lHGR_kH0PNA
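To make the point concrete, a tiny illustration (nothing project specific;
std::endl is exactly a newline plus a flush):

#include <iostream>
#include <string>

int main()
{
  std::string mystring = "hello";

  // std::endl writes '\n' and then flushes the stream:
  std::cout << mystring << std::endl;           // newline + flush
  std::cout << mystring << '\n' << std::flush;  // exactly equivalent

  // mystring + "\n" additionally builds a temporary std::string before
  // printing; streaming the newline separately avoids that as well.
  std::cout << mystring << '\n';                // no flush, no temporary
}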
> The rules of course cover the grammar while the actions carry out semantic
> checks. The phonological and morphological analyses are covered by foma
> (https://fomafst.github.io) which also uses bison.
https://github.com/mhulden/foma/blob/4e98d61ce67babe1555e1e1b04d5a65bc3b5a03a/foma/regex.y#L205
%expect 686
OMG! And it’s not GLR, this is plain old LALR(1). How can one trust such a
number of conflicts? If it is really meant, then that’s probably a place
where a feature such as
https://lists.gnu.org/archive/html/bison-patches/2013-02/msg00105.html would
shine.
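(For anyone following along who hasn’t met %expect: it declares how many
shift/reduce conflicts the grammar is supposed to have, and bison only
complains if the actual count differs. The textbook case is the dangling else;
a sketch, unrelated to foma’s actual grammar:)

%expect 1   /* the dangling-else grammar has exactly one shift/reduce conflict */
%token IF THEN ELSE EXPR OTHER

%%
stmt:
  IF EXPR THEN stmt            /* conflicts with the rule below: on ELSE, */
| IF EXPR THEN stmt ELSE stmt  /* bison shifts, binding the ELSE to the   */
| OTHER                        /* innermost IF                            */
;
%%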
> As I mentioned, the whole thing is open source and free, so please don't take
> this as an ad. I just wanted to say thank you to all who develop/maintain
> bison by showing what you made possible.
That’s very kind, thanks a lot!