Re: [Grammatica-users] HTML grammar??
From: Per Cederberg
Subject: Re: [Grammatica-users] HTML grammar??
Date: Sun, 18 Dec 2005 13:47:41 +0100

Well, I guess it would be possible to write an HTML
grammar for Grammatica. The real question is whether
it would be a good fit. The trouble with HTML is that
*lots* of real-world web pages are syntactically
invalid.

So I think that to write a good HTML parser, one really
needs to do it by hand, adding special code everywhere
to recover from common problems and issues.

Also, HTML has a very lenient syntax, allowing new
unknown tags to be used, end tags to be omitted, etc.
So it is very hard to create a correct BNF grammar
that covers all of that and still provides something
more than a pure tokenizer.
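
To make the point about hand-written recovery concrete, here is a
minimal sketch in Python. It is not Grammatica code and not a real
HTML parser: the names parse_lenient, outline, VOID_TAGS and
AUTO_CLOSE are hypothetical, and the rule tables are deliberately
tiny. It only illustrates the three cases mentioned above: unknown
tags are accepted, omitted end tags are implied, and stray end tags
are ignored.

```python
import re

# Hypothetical, deliberately incomplete rule tables (assumptions for this sketch).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}
# Opening the key tag implicitly closes an open element named in the value set.
AUTO_CLOSE = {"p": {"p"}, "li": {"li"}}

# A start or end tag with any name (known or not), or a run of text.
# A lone "<" that does not start a tag is treated as text.
TOKEN_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>|([^<]+|<)")


def parse_lenient(html):
    """Build a tree of dicts from possibly invalid HTML, never raising."""
    root = {"tag": "#root", "children": []}
    stack = [root]  # currently open elements, innermost last
    for m in TOKEN_RE.finditer(html):
        closing, name, text = m.group(1), m.group(2), m.group(3)
        if text is not None:
            stack[-1]["children"].append(text)
            continue
        name = name.lower()
        if closing:
            # Recovery: close the nearest open element with this name,
            # together with everything still open inside it.
            for i in range(len(stack) - 1, 0, -1):
                if stack[i]["tag"] == name:
                    del stack[i:]
                    break
            # No match found: a stray end tag, silently ignored.
            continue
        # Recovery: an omitted end tag is implied by the next start tag.
        while len(stack) > 1 and stack[-1]["tag"] in AUTO_CLOSE.get(name, ()):
            stack.pop()
        node = {"tag": name, "children": []}
        stack[-1]["children"].append(node)
        if name not in VOID_TAGS and not m.group(0).endswith("/>"):
            stack.append(node)
    return root


def outline(node):
    """Render the tree as a compact parenthesized string for inspection."""
    if isinstance(node, str):
        return node.strip()
    parts = [node["tag"]] + [s for s in map(outline, node["children"]) if s]
    return "(" + " ".join(parts) + ")"
```

For example, outline(parse_lenient("<p>one<p>two <blink>three</i></p>"))
gives "(#root (p one) (p two (blink three)))": the first <p> is closed
implicitly, the unknown <blink> tag is kept, and the stray </i> is
dropped. Every rule here is a special case coded by hand, which is
exactly the kind of logic that is awkward to express in a BNF grammar.
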
Cheers,
/Per
On Thu, 2005-12-15 at 11:33 -0800, John Kleven wrote:
> Hi all,
>
> Curious if anybody has used Grammatica to create an
> HTML parser?
>
> Not sure if that's a good fit for Grammatica or not but
> it seemed like it might be. The existing C# HTML
> parsers out there all seem to leave something (or
> quite a bit) to be desired.
>
> Any info appreciated!
> Thanks
> John