Re: Loss of search facility in info in newer releases of Texinfo
From: Alan Mackenzie
Subject: Re: Loss of search facility in info in newer releases of Texinfo
Date: Mon, 11 Oct 2021 20:47:40 +0000
Hello, Gavin.
On Mon, Oct 11, 2021 at 16:43:21 +0100, Gavin Smith wrote:
> On Mon, Oct 11, 2021 at 11:35:06AM +0000, Alan Mackenzie wrote:
> > If there are any other formatting characters above 0x7f inserted by
> > Texinfo, I would also like their "ASCII" equivalents to be used instead.
> I've checked with a test file and the output without @documentencoding
> is close to what you ask for.
OK, thanks. But this appears not to be documented in the Texinfo
manual. In fact, the effect of omitting @documentencoding is entirely
undocumented on the page @documentencoding. This is not good.
I think it is also undocumented that texi2any puts Unicode punctuation
characters around things like @code{foo}. It sometimes uses ASCII
punctuation characters instead. Which it uses and when, I think is also
undocumented. If so, that is also not good.
@documentencoding appears to be doing three jobs, which I think really
ought to be done by separate directives: (i) It specifies the encoding
used in the .texi file; (ii) It specifies the encoding to be used in the
.info file; (iii) It specifies whether to use Unicode or ASCII markers
around @code{foo}, etc. I'm not sure it does any of these jobs well.
I think my request is really to separate out (iii) from (i) and (ii).
After spending a lot of the weekend and today on this topic I am now
thoroughly confused about character encodings in Texinfo.
> \input texinfo
> @c @documentencoding UTF-8
> @dfn{foo}
> @code{code}
> `bar'
> `hello'
> ``oompa''
> a---b
> c--d
> Herr M@"uller will Sie sprechen.
> @bye
> $ ./texi2any.pl test.texi -c OPEN_QUOTE_SYMBOL=\` -c CLOSE_QUOTE_SYMBOL=\'
> test.texi: warning: document without nodes
> $ cat test.info
> This is test.info, produced by texi2any version 6.8dev+dev from
> test.texi.
> "foo"
> `code'
> 'bar'
> 'hello'
> "oompa"
> a--b
> c-d
> Herr Müller will Sie sprechen.
> Tag Table:
> End Tag Table
> Local Variables:
> coding: utf-8
> End:
> $
Where did this "coding: utf-8" in the Local Variables: come from? Is
UTF-8 now some sort of default in Texinfo? This coding: setting doesn't
appear at the end of my copy of texinfo.info, even though texinfo.texi
also lacks a @documentencoding command.
What would have happened if the ü had appeared in its utf-8 encoding
0xc3, 0xbc rather than @"u, given that there's no @documentencoding
directive in the source? This seems also to be undocumented in the
manual. I should really try this out myself, but I'm too tired at the
moment.
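The byte values mentioned above can be checked independently of Texinfo.
The following is a minimal Python sketch, not part of the original
exchange, confirming that ü (U+00FC) is the two bytes 0xc3 0xbc in UTF-8
and the single byte 0xfc in ISO-8859-1. It only illustrates why the
assumed input encoding matters; it says nothing about what texi2any
actually does with such input when @documentencoding is absent.

```python
# Illustrative check only; this does not call texi2any.
u_umlaut = "\u00fc"  # LATIN SMALL LETTER U WITH DIAERESIS

utf8_bytes = u_umlaut.encode("utf-8")
latin1_bytes = u_umlaut.encode("iso-8859-1")

print([hex(b) for b in utf8_bytes])    # ['0xc3', '0xbc']
print([hex(b) for b in latin1_bytes])  # ['0xfc']
```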
> Notice the OPEN_QUOTE_SYMBOL wasn't used in some of the cases.
> With the @documentencoding line not commented out it is:
> This is test.info, produced by texi2any version 6.8dev+dev from
> test.texi.
> “foo”
> `code'
> ‘bar’
> ‘hello’
> “oompa”
> a—b
> c–d
> Herr Müller will Sie sprechen.
> Tag Table:
> End Tag Table
> Local Variables:
> coding: utf-8
> End:
> again with the OPEN_QUOTE_SYMBOL and CLOSE_QUOTE_SYMBOL not affecting the
> output for ` and ' - arguably a bug.
> > > If you remove "@documentencoding UTF-8" from a file, the file is still
> > > assumed to be in UTF-8, but less Unicode is used in the output where it
> > > is not necessary. Does that help?
It helps my understanding a bit. It doesn't help me in running texi2any
/ makeinfo, where the files.texi are going to have @documentencoding
UTF-8 in them. What I really need is a command line switch to tell
texi2any which sort of textual markers to use when the output encoding
is UTF-8.
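To make the difference between the two kinds of markers concrete, here
is a minimal Python sketch of the mapping that can be read off the two
sample outputs quoted above. It is not texi2any's code and not a
proposed implementation of the switch; the names ASCII_EQUIVALENTS and
to_ascii_markers are hypothetical, and the table covers only the
characters that appear in the sample.

```python
# Hypothetical mapping, derived only from the two sample outputs above.
ASCII_EQUIVALENTS = {
    "\u201c": '"',   # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',   # RIGHT DOUBLE QUOTATION MARK
    "\u2018": "'",   # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",   # RIGHT SINGLE QUOTATION MARK
    "\u2014": "--",  # EM DASH
    "\u2013": "-",   # EN DASH
}

# Build the translation table once; other characters pass through unchanged.
_TABLE = str.maketrans(ASCII_EQUIVALENTS)


def to_ascii_markers(text: str) -> str:
    """Replace the Unicode punctuation above with its ASCII form."""
    return text.translate(_TABLE)


sample = "\u201cfoo\u201d \u2018bar\u2019 a\u2014b c\u2013d M\u00fcller"
print(to_ascii_markers(sample))
```

Accented letters such as ü are left alone, matching the sample output.
Running something like this over an existing Info file would not be a
safe workaround, since the tag table records byte offsets that change
when multi-byte characters are replaced; the sketch is only meant to
show which characters are at issue.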
> > Not really. I've got too many info files on my system (Gentoo
> > GNU/Linux) to remove that directive from them all each time there's a
> > new version of the file.texi.
> > So, I'm asking you to implement such an option in the next version of
> > Texinfo, or perhaps accept a patch from me which would do this.
> Yes I think it is a valid desire to have such an option, especially as such
> an output is already available by changing the use of @documentencoding.
> (That's why I made @documentencoding have this effect in the first place,
> to give the chance to avoid having unnecessary UTF-8 sequences in Info files.)
> Look at where the 'no_extra_unicode' flag is set in
> Texinfo/Convert/Plaintext.pm - any option should use the same code as this.
OK, I understand that bit of the code now, thanks!
--
Alan Mackenzie (Nuremberg, Germany).