[Freecats-Dev] HTML support (was: A look at JCAT?)


From: Thierry Sourbier
Subject: [Freecats-Dev] HTML support (was: A look at JCAT?)
Date: Fri, 7 Feb 2003 23:51:47 +0100

Let me jump on a comment I saw while quickly browsing through Henri's long
email :).

> - another closely related issue is that JCAT only understands
> well-formed HTML and XML. Due to this, it won't be able to work on at
> least 80% of existing HTML files. This is why we prefer a "dumb" approach.

1. It is easier to have a "dumb" parser read well-formed HTML than a "smart"
parser able to read "dumb" HTML. Indeed, with malformed HTML the problem is
not only tags being misplaced or missing, but also knowing what is a tag and
what is not (e.g. "<b> This character "<" can mess up everything </b>").

2. Most malformed HTML files can be made compliant with the standard by
running them through Tidy. See http://tidy.sourceforge.net/. In a web l10n
product I worked on before, Tidy was part of the workflow.
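
To make point 1 concrete, here is a rough Python sketch of how the stray "<"
in the example above trips up a naive tag scanner. The two regular expressions
and the find_tags helper are my own illustrations, not anything taken from
JCAT or Tidy, and the "stricter" pattern is only a heuristic, not a real HTML
parser.

```python
import re

# The malformed sample from point 1: a literal "<" inside the text.
SAMPLE = '<b> This character "<" can mess up everything </b>'

# Naive rule (illustrative): everything from "<" up to the next ">" is a tag.
NAIVE_TAG = re.compile(r"<[^>]*>")

# Stricter heuristic (illustrative): "<" or "</" must be followed by a letter,
# and a tag may not contain another "<".
STRICTER_TAG = re.compile(r"</?[A-Za-z][^<>]*>")


def find_tags(pattern, text):
    """Return every substring of text that the pattern takes for a tag."""
    return pattern.findall(text)


naive = find_tags(NAIVE_TAG, SAMPLE)
stricter = find_tags(STRICTER_TAG, SAMPLE)

print(naive)     # the stray "<" swallows the text and the closing </b>
print(stricter)  # only the two real tags are reported
```

With the naive rule, the text after the stray "<" and the closing tag are
reported as one bogus tag, so the real </b> is lost. The stricter rule copes
with this one sample, but each new kind of malformation needs another special
case, which is why cleaning the input first is the simpler route.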

> Thierry, what would you think about establishing a contact for us with
> Yves?

I've already pointed him to the project home page.

Cheers,
Thierry.