bug#21055: Info reader fails to follow xrefs to anchors


From: Eli Zaretskii
Subject: bug#21055: Info reader fails to follow xrefs to anchors
Date: Tue, 14 Jul 2015 17:57:56 +0300

Redirecting the Emacs part to bug-gnu-emacs; see
http://lists.gnu.org/archive/html/bug-texinfo/2015-07/msg00051.html
for the related Texinfo discussion.

CC to Juri, who made the offending change.

> From: ludo@gnu.org (Ludovic Courtès)
> Cc: bug-texinfo@gnu.org
> Date: Mon, 13 Jul 2015 22:16:02 +0200
> 
> The standalone Info reader in Texinfo 6.0 fails to follow
> cross-references to anchors: Following such a link leads to an unrelated
> place in the document.  This is a regression compared to Texinfo 5.2
> (guix.texi is one example that illustrates the bug.)
> 
> Unfortunately the Emacs Info reader has had the same problem for a long
> time, but I suppose this one should go to bug-emacs?
> 
> That’s with 24.5.1, and I remember experiencing that with earlier
> versions too.

There are two issues here.  One is that Emacs 24.4 introduced a
change, as part of fixing bug #14125 (see
http://debbugs.gnu.org/cgi/bugreport.cgi?bug=14125), which caused the
Emacs Info reader to go to the wrong place when it follows
cross-references to anchors (as opposed to references to nodes).  The
other problem is the generous use of UTF-8 encoded characters in
guix.info, including in the preamble, which makes Emacs's job even
harder, because references in Info files are given in bytes, not
characters.
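
To illustrate the byte-vs-character mismatch, here is a minimal sketch
(not code from info.el) that uses the standard function position-bytes
on a scratch buffer.  The function name and the sample text are made
up for illustration.

```elisp
;; Illustration only: the buffer text is invented, not taken from guix.info.
(defun my-char-and-byte-position ()
  "Return (CHARPOS BYTEPOS) just after the \"@\" in a small sample buffer."
  (with-temp-buffer
    (set-buffer-multibyte t)
    ;; U+00E9 is one character, but two bytes both in UTF-8 and in
    ;; Emacs's internal representation.
    (insert "\u00e9@anchor")
    (goto-char (point-min))
    (search-forward "@")
    ;; Character position vs. byte position of the same place.
    (list (point) (position-bytes (point)))))
```

The two values differ by one here, the extra byte taken by the single
non-ASCII character; every such character before a reference target
widens the gap further.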

The second problem needs infrastructure, part of which was introduced
only recently: a way to convert a file byte offset to an Emacs buffer
position (which counts characters), accounting correctly for the
file's encoding and EOL format.  It sounds like fixing the present
problem would also need the reverse conversion; see below.

As for the first part: I've read the discussions in bug #14125, and
tried playing with the test file provided there, and I must say that I
understand neither the problem nor its solution.  The analysis of the
problem (see http://debbugs.gnu.org/cgi/bugreport.cgi?bug=14125#11)
was this:

> Makeinfo 4.13 produced the character positions of indirect subfiles
> relative to the beginning of the first node, but Makeinfo 5.0 produces the
> positions relative to the beginning of the subfile.  The Emacs Info reader
> fails when the distance between the beginning of the subfile and
> the beginning of its first node is longer than a thousand characters.
> [...]
> The expression (+ (- nodepos lastfilepos) (point)) in `Info-read-subfile'
> assumes that `lastfilepos' in `Info-read-subfile' is the beginning of the
> first node, so for Info files produced by Makeinfo 4.13 it returns the
> length of the summary segment, but for Makeinfo 5.0 it returns
> two lengths of the summary segment.

Perhaps I don't understand what this says, but the conclusion sounds
incorrect to me.

The actual difference between makeinfo 4.13 and makeinfo 5.0 and later
is that with makeinfo 5 the starting position of the 2nd, 3rd,
etc. subfile includes the length of the preamble text that precedes
the first node in the subfile.  In makeinfo 4, only the beginning of
the first subfile included the preamble, and all the rest excluded it.

But that doesn't matter, IMO, because with both versions of makeinfo,
if a subfile's beginning is recorded in the tag table as byte position
N, the first node in that subfile is also recorded to start at byte
position N.  Therefore, to find the byte offset of a node/anchor from
the beginning of a subfile, one needs to do this:

   (+ (- nodepos lastfilepos) preamble-length)

in both the old and the new versions.  To find the length of the
preamble, one needs to search from the beginning of the subfile for
the start of the first node, and then compute the file's byte number
of that position.  Therefore, the original code in Info-read-subfile,
viz.:

  (+ (- nodepos lastfilepos) (point))

was an approximation that did TRT for ASCII Info files.  It is easy to
extend this to UTF-8 encoded files:

  (+ (- nodepos lastfilepos) (position-bytes (point)))

Other encodings, as well as DOS end-of-line format, will need a
dedicated function similar to filepos-to-bufferpos, but in the
reverse direction.  (We also need to subtract 1 from the above
expression, since we need a zero-based offset.)
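
Putting the pieces above together, a sketch of the computation for the
UTF-8 case might look like this.  The function name is hypothetical,
and the sketch assumes that the current buffer holds the subfile,
decoded as UTF-8 with Unix end-of-lines; it is not the code installed
in info.el.

```elisp
(defun my-info-subfile-byte-offset (nodepos lastfilepos)
  "Return the zero-based byte offset of NODEPOS within the current subfile.
NODEPOS and LASTFILEPOS are byte positions taken from the tag table.
Assumes UTF-8 text with Unix end-of-lines (hypothetical helper)."
  (save-excursion
    (goto-char (point-min))
    ;; Skip the preamble: move to just after the first node separator.
    (if (looking-at "\^_")
        (forward-char 1)
      (search-forward "\n\^_"))
    ;; Measure the preamble in bytes, not characters, and subtract 1
    ;; because byte positions in a buffer are 1-based.
    (+ (- nodepos lastfilepos) (1- (position-bytes (point))))))
```

For a pure-ASCII subfile this yields the same value as the
character-based expression less one; the two diverge only when the
preamble contains multibyte characters.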

Juri, do you see any flaws in the above description?  I couldn't
reproduce the problem reported in bug #14125, so I'm not sure why the
fix you installed was even needed, or where my reasoning is wrong.  I
tried both Emacs 24.3 (for which the bug was filed) and later
versions, and they all work correctly with the Info file produced from
the Texinfo source attached to that bug report, no matter if I produce
the Info file with makeinfo 4.13 or makeinfo 5.1 or 6.0.  So I'm
unsure what problems you saw with the original code in
Info-read-subfile.  Could you describe those problems in more detail
than you did in the bug discussions?

Why are these problems invisible when following references to nodes,
you ask?  Because in that case we search for the node's header line
after going to the recorded position.  So going to a position that
undershoots (which is what that change caused) doesn't do any visible
harm.  But for references to anchors, we don't have any text to
search, so the position where we place the reader should be reasonably
exact.
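
The asymmetry can be sketched as follows.  This is a simplified
illustration with hypothetical function names and a simplified header
regexp, not the actual logic in info.el: a node lookup can recover
from a guess that falls short by searching forward for the header
line, while an anchor lookup has to trust the computed position.

```elisp
(defun my-goto-node-near (name guesspos)
  "Go to the header line of node NAME, searching forward from GUESSPOS.
Return the position of the header line, or nil if it is not found."
  (goto-char (max (point-min) (min guesspos (point-max))))
  ;; An undershooting GUESSPOS is harmless: the search corrects it.
  (when (re-search-forward
         (concat "^File:.*Node: *" (regexp-quote name) " *[,\t\n]") nil t)
    (goto-char (match-beginning 0))))

(defun my-goto-anchor-at (guesspos)
  "Go to GUESSPOS directly; an anchor leaves no text to search for."
  (goto-char (max (point-min) (min guesspos (point-max)))))
```

With the first function, any guess at or before the header line ends
up in the same place; with the second, an error in the guess becomes
an error in where the reader lands.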





