bug-texinfo

Re: Unescaped content in HTML attributes in EPUB output of texi2any


From: Benjamin Kalish
Subject: Re: Unescaped content in HTML attributes in EPUB output of texi2any
Date: Wed, 25 Sep 2024 19:43:10 -0400

Thanks. I can see it both ways, but I still lean towards it being a bug. Headings in HTML can contain HTML, and I don't think the user has any reason to expect that the content of a heading would end up anywhere else. If it were up to me, the deciding factor would be whether the literal content of the headings needs to show up in HTML attributes (and I don't see why it would). If it is necessary, then yes, it is up to the user to avoid using @inlineraw in headings (and a warning would be most welcome!). On the other hand, if it is not necessary and that content doesn't need to show up in attributes, or if an escaped version could be used instead, then the user should be allowed to use @inlineraw here, and a change to the code is necessary to prevent noncompliant HTML output.

Benjamin Kalish
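
The "escaped version" option mentioned above can be sketched in a few lines. This is only an illustration in Python (texi2any itself is written in Perl, and this is not its code); the helper name `attribute_value` is hypothetical. It relies on the standard library's `html.escape`, which replaces `&`, `<`, `>` and both quote characters.

```python
import html


def attribute_value(heading: str) -> str:
    """Return heading text that is safe inside a double-quoted HTML attribute.

    Hypothetical helper for illustration; not part of texi2any.
    """
    # quote=True also escapes " and ', so the value cannot end the
    # attribute early.
    return html.escape(heading, quote=True)


# Heading content taken from the minimal example in this thread.
heading = '<span class="test">One</span>'
print('<meta name="description" content="%s"/>' % attribute_value(heading))
```

With this approach the markup would appear literally (as escaped text) in the attribute, which keeps the document well-formed but may not be what a reader wants to see in a description or link title.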

On Sun, Sep 22, 2024 at 6:12 PM Patrice Dumas <pertusus@free.fr> wrote:
On Sun, Sep 22, 2024 at 05:27:34PM -0400, Benjamin Kalish wrote:
> It looks like the problem occurs only with the use of raw HTML, directly
> (as in the minimal example here), or indirectly through a macro (as I first
> encountered it):

I can reproduce with your example (in inline_in_chap.texi file), with
USE_NODES set to 0 (as is done for epub):

$srcdir/texi2any.pl --html -c 'USE_NODES 0' inline_in_chap.texi

I am not sure that this is a bug, though. It looks like a feature to me,
as @inlineraw turns off escaping of HTML characters.  The situation is
not ideal, because there is no way to specify one thing for attributes
(called 'string' context in texi2any), such as the content attribute of
<meta name="description">, and something different for the main output,
for instance the chapter heading <h2>.  When Texinfo code is used, the
formatting to HTML can be different in 'string' context and in normal
context, but I can't see how this could be specified for @inlineraw raw
HTML.
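
To make the two contexts concrete, here is a rough Python sketch. It is not texi2any's actual implementation: the part kinds ("text", "raw_html") and the function `render_heading` are made up for the illustration. It shows one conceivable policy, in which raw HTML passes through unchanged in normal context but is dropped in 'string' context, so that only the Texinfo text reaches the attribute.

```python
import html

# A heading as a list of (kind, content) parts; the kinds are hypothetical.
HEADING = [
    ("raw_html", '<span class="test">'),
    ("text", "One"),
    ("raw_html", "</span>"),
]


def render_heading(parts, context):
    """Render parts for 'normal' (element content) or 'string' (attribute)."""
    out = []
    for kind, content in parts:
        if kind == "text":
            # Ordinary text is escaped in both contexts.
            out.append(html.escape(content, quote=True))
        elif kind == "raw_html" and context == "normal":
            # Raw HTML is passed through untouched in the main output.
            out.append(content)
        # In 'string' context raw HTML is skipped (an assumed policy).
    return "".join(out)


print(render_heading(HEADING, "normal"))
print(render_heading(HEADING, "string"))
```

Dropping raw HTML loses any visible text that was placed inside @inlineraw itself; escaping it instead would keep that text but expose the markup in the attribute. Either choice is a trade-off rather than an obviously right answer.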

Maybe we could say something in the documentation, for instance that
HTML elements should not be used in @inlineraw in @node or sectioning
commands?

> \input texinfo
>
> @node Top
> @top
>
> @node Cap 1
> @chapter @inlineraw{html,<span class="test">}One@inlineraw{html,</span>}
>
> @bye
>
> Benjamin Kalish
>
>
> On Sun, Sep 22, 2024 at 4:47 PM Gavin Smith <gavinsmith0123@gmail.com> wrote:
>
> > On Sun, Sep 22, 2024 at 02:20:36PM -0400, Benjamin Kalish wrote:
> > > EPUB output contains unescaped content in a number of HTML
> > > attributes. I'm seeing this with:
> > >
> > > - The content attribute for <meta> with name="description"
> > > - The content attribute for <meta> name="keywords"
> > > - The title attribute of the <link> elements with rel="next" and
> > >   rel="prev"
> > >
> > > HTML output also has these same tags and attributes, but the content
> > > seems fine in my case. This may not actually be due to better
> > > escaping, as it looks like entirely different content is being used
> > > for the attribute values when generating HTML, and the content is,
> > > in this case at least, safe without escaping.
> > >
> > > Changing the values to be the same as those used when generating
> > > HTML would solve the problem in my case, but it is probably best to
> > > make sure that attribute values are always escaped.
> > >
> > > What should be escaped? Quotation marks must be. Ambiguous
> > > ampersands must be. But it is probably prudent to escape all
> > > ampersands and all occurrences of < or >.
> > >
> > > I'm sorry I can't suggest a fix in the code—I'm not familiar with the
> > > Texinfo codebase and it's been decades since I've coded in Perl or C.
> > >
> > > I'm using texi2any 7.1.1
> >
> > I tried testing this on the master development branch and it looked
> > ok:
> >
> > $ cat test.texi
> > \input texinfo
> >
> > @node Top
> > @top
> >
> > @node Cap 1
> > @chapter One "<>
> >
> > @bye
> >
> > After running "texi2any --epub3 test.texi" and extracting the
> > resulting "test.epub" file, the output file in the ZIP archive had, in
> > "test/EPUB/xhtml/Cap-1.xhtml", the " < and > escaped (see output below).
> > Can you please explain how to reproduce the problem?
> >
> >
> > <?xml version="1.0" encoding="UTF-8"?>
> > <!DOCTYPE html>
> > <html xmlns="http://www.w3.org/1999/xhtml">
> > <!-- Created by GNU Texinfo 7.1.1, https://www.gnu.org/software/texinfo/ -->
> > <head>
> > <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
> > <title>1 One &quot;&lt;&gt; (Untitled Document)</title>
> >
> > <meta name="description" content="1 One &quot;&lt;&gt; (Untitled Document)"/>
> > <meta name="keywords" content="1 One &quot;&lt;&gt; (Untitled Document)"/>
> > <meta name="resource-type" content="document"/>
> > <meta name="distribution" content="global"/>
> > <meta name="Generator" content="texi2any"/>
> > <meta name="viewport" content="width=device-width,initial-scale=1"/>
> >
> > <link href="" rel="start" title=""/>
> > <link href="" rel="index" title="1 One &quot;&lt;&gt;"/>
> > <link href="" rel="up" title=""/>
> > <link href="" rel="prev" title=""/>
> >
> >
> > </head>
> >
> > <body lang="en">
> > <div class="chapter-level-extent" id="Cap-1">
> >
> > <h2 class="chapter" id="One-_0022_003c_003e">1 One &quot;&lt;&gt;</h2>
> >
> >
> >
> > </div>
> >
> >
> >
> > </body>
> > </html>
> >
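
Because EPUB content documents are XHTML, one quick way to check whether an extracted file has an attribute-escaping problem is to run it through an XML parser. The following Python sketch uses only the standard library; `is_well_formed` is a hypothetical helper name, the first sample is adapted from the output quoted above, and the second sample is a hand-written illustration of an unescaped value, not real texi2any output.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xhtml: str) -> bool:
    """Return True if the string parses as XML, False otherwise."""
    try:
        ET.fromstring(xhtml)
        return True
    except ET.ParseError:
        return False


# Escaped attribute value, as in the output quoted in this thread.
ESCAPED = (
    '<head><meta name="description" '
    'content="1 One &quot;&lt;&gt; (Untitled Document)"/></head>'
)

# Hand-written illustration of an unescaped value (an assumption, not
# copied from real output): the inner quotes and "<" break the attribute.
UNESCAPED = (
    '<head><meta name="description" '
    'content="1 <span class="test">One</span> (Untitled Document)"/></head>'
)

print(is_well_formed(ESCAPED))
print(is_well_formed(UNESCAPED))
```

A well-formedness check like this only detects that something is wrong; a dedicated EPUB validator would still be needed to confirm that a file is fully compliant.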
