From: Benjamin Kalish
Subject: Re: Unescaped content in HTML attributes in EPUB output of texi2any
Date: Wed, 25 Sep 2024 19:43:10 -0400
On Sun, Sep 22, 2024 at 05:27:34PM -0400, Benjamin Kalish wrote:
> It looks like the problem occurs only with the use of raw HTML, directly
> (as in the minimal example here), or indirectly through a macro (as I first
> encountered it):
I can reproduce this with your example (saved as inline_in_chap.texi), with
USE_NODES set to 0 (as is done for EPUB):
$srcdir/texi2any.pl --html -c 'USE_NODES 0' inline_in_chap.texi
I am not sure that this is a bug, though. It looks like a feature to me, as
@inlineraw turns off escaping of HTML characters. The situation is not
ideal, because there is no way to specify different output for attributes
(called 'string' context in texi2any), such as the content attribute of
<meta name="description">, and for the main output, such as the chapter
heading <h2>. When Texinfo code is used, the formatting to HTML can differ
between 'string' context and normal context, but I can't see how this could
be specified for raw HTML in @inlineraw.
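
To illustrate the point above, here is a minimal sketch in Python. It is a
toy model, not texi2any's actual Perl code: the function name
format_fragments and the (kind, text) fragment representation are
hypothetical, and it simply assumes that raw fragments bypass escaping in
both contexts.

```python
from html import escape


def format_fragments(fragments):
    """Join (kind, text) fragments into an HTML string.

    Hypothetical model: 'raw' fragments (as from @inlineraw{html,...})
    are passed through untouched, while 'text' fragments are escaped.
    """
    out = []
    for kind, text in fragments:
        out.append(text if kind == "raw" else escape(text, quote=True))
    return "".join(out)


# Models: @chapter @inlineraw{html,<span class="test">}One@inlineraw{html,</span>}
chapter = [("raw", '<span class="test">'), ("text", "One"), ("raw", "</span>")]

# Normal context: the raw markup is fine inside the heading element.
heading = "<h2>" + format_fragments(chapter) + "</h2>"

# 'string' context: the same raw markup ends up inside an attribute value,
# where the unescaped quotes and angle brackets break the attribute.
meta = '<meta name="description" content="' + format_fragments(chapter) + '"/>'

print(heading)
print(meta)
```

Ordinary text such as One "<> is escaped in this model, so only the raw
fragments cause trouble in the attribute.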
Maybe we could say something in the documentation, for instance that
HTML elements should not be used in @inlineraw in @node or sectioning
commands?
> \input texinfo
>
> @node Top
> @top
>
> @node Cap 1
> @chapter @inlineraw{html,<span class="test">}One@inlineraw{html,</span>}
>
> @bye
>
> Benjamin Kalish
>
>
> On Sun, Sep 22, 2024 at 4:47 PM Gavin Smith <gavinsmith0123@gmail.com>
> wrote:
>
> > On Sun, Sep 22, 2024 at 02:20:36PM -0400, Benjamin Kalish wrote:
> > > EPUB output contains unescaped content in a number of HTML attributes. I'm
> > > seeing this with:
> > >
> > > - The content attribute for <meta> with name="description"
> > > - The content attribute for <meta> name="keywords"
> > > - The title attribute of the <link> elements with rel="next" and rel="prev"
> > >
> > > HTML output also has these same tags and attributes, but the content seems
> > > fine in my case. This may not actually be due to better escaping, as it
> > > looks like entirely different content is being used for the attribute
> > > values when generating HTML, and the content is, in this case at least,
> > > safe without escaping.
> > >
> > > Changing the values to be the same as those used when generating HTML would
> > > solve the problem in my case, but it is probably best to make sure that
> > > attribute values are always escaped.
> > >
> > > What should be escaped? Quotation marks must be. Ambiguous ampersands must
> > > be. But it is probably prudent to escape all ampersands and all
> > > occurrences of < or >.
> > >
> > > I'm sorry I can't suggest a fix in the code—I'm not familiar with the
> > > Texinfo codebase and it's been decades since I've coded in Perl or C.
> > >
> > > I'm using texi2any 7.1.1
> >
> > I tried testing this on the master development branch and it looked
> > ok:
> >
> > $ cat test.texi
> > \input texinfo
> >
> > @node Top
> > @top
> >
> > @node Cap 1
> > @chapter One "<>
> >
> > @bye
> >
> > After running "texi2any --epub3 test.texi" and extracting the
> > resulting "test.epub" file, the output file in the ZIP archive had, in
> > "test/EPUB/xhtml/Cap-1.xhtml", the " < and > escaped (see output below).
> > Can you please explain how to reproduce the problem?
> >
> >
> > <?xml version="1.0" encoding="UTF-8"?>
> > <!DOCTYPE html>
> > <html xmlns="http://www.w3.org/1999/xhtml">
> > <!-- Created by GNU Texinfo 7.1.1, https://www.gnu.org/software/texinfo/ -->
> > <head>
> > <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
> > <title>1 One &quot;&lt;&gt; (Untitled Document)</title>
> >
> > <meta name="description" content="1 One &quot;&lt;&gt; (Untitled Document)"/>
> > <meta name="keywords" content="1 One &quot;&lt;&gt; (Untitled Document)"/>
> > <meta name="resource-type" content="document"/>
> > <meta name="distribution" content="global"/>
> > <meta name="Generator" content="texi2any"/>
> > <meta name="viewport" content="width=device-width,initial-scale=1"/>
> >
> > <link href="" rel="start" title=""/>
> > <link href="" rel="index" title="1 One &quot;&lt;&gt;"/>
> > <link href="" rel="up" title=""/>
> > <link href="" rel="prev" title=""/>
> >
> >
> > </head>
> >
> > <body lang="en">
> > <div class="chapter-level-extent" id="Cap-1">
> >
> > <h2 class="chapter" id="One-_0022_003c_003e">1 One &quot;&lt;&gt;</h2>
> >
> >
> >
> > </div>
> >
> >
> >
> > </body>
> > </html>
> >
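
One way to check extracted EPUB pages for this kind of problem is to test
whether they are well-formed XML, since an unescaped < or " inside an
attribute value makes the XHTML unparseable. The sketch below uses Python's
standard xml.etree.ElementTree parser; the helper names is_well_formed and
page, and the trimmed-down page template, are assumptions made for
illustration and not part of texi2any.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def is_well_formed(document):
    """Return True if the string parses as well-formed XML."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


def page(description):
    """Build a tiny XHTML page with the given description attribute value.

    The value is inserted verbatim, so the caller decides whether it is
    already escaped.
    """
    return (
        f'<html xmlns="{XHTML_NS}"><head>'
        f'<meta name="description" content="{description}"/>'
        "<title>t</title></head><body/></html>"
    )


# Escaped attribute value, as in the Cap-1.xhtml output quoted above.
escaped = page("1 One &quot;&lt;&gt;")

# Raw HTML from @inlineraw copied into the attribute value unescaped.
raw = page('1 <span class="test">One</span>')

print(is_well_formed(escaped))  # True
print(is_well_formed(raw))      # False
```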