Re: Lilypond's error column printer confuses bytes and characters
From: Patrick McCarty
Subject: Re: Lilypond's error column printer confuses bytes and characters
Date: Mon, 26 Oct 2009 09:25:43 -0700
User-agent: Mutt/1.5.20 (2009-06-14)
On 2009-10-22, David Kastrup wrote:
> Patrick McCarty <address@hidden> writes:
>
> > On 2009-10-18, David Kastrup wrote:
> >>
> >> GNU LilyPond 2.13.4
> >> Processing `bad.ly'
> >> Parsing...
> >> bad.ly:4:16: error: syntax error, unexpected MUSIC_IDENTIFIER
> >> MÃÃÃ A\342\231
> >> \257 Bâ \break
> >> error: failed files: "bad.ly"
> >>
> >> Apparently, the error column is being tracked by counting characters,
> >> but is displayed by counting bytes. The indicator appears too early
> >> because of that (which caused me to look for the wrong bug in an input
> >> file of mine).
> >
> > This patch seems to correct the issue, but I don't know if it's the
> > correct fix (or if there are any side effects I'm unaware of).
>
> The code before states:
>
>   while (left > 0)
>     {
>       /*
>         FIXME, this is apparently locale dependent.
>       */
> #if HAVE_MBRTOWC
>       wchar_t multibyte[2];
>       size_t thislen = mbrtowc (multibyte, line_chars, left, &state);
> #else
>       size_t thislen = 1;
> #endif /* !HAVE_MBRTOWC */
>
> The question is what we do about locales. I think that in this case the
> behavior is arguably correct, since we are talking about column numbers
> on the terminal/locale, and even when Lilypond is using UTF-8, those
> will correspond to the interpretation of the locale.
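The mismatch described in the original report, a column counted in characters but then applied as a byte offset, can be illustrated with a minimal sketch. It assumes the line is valid, NUL-terminated UTF-8; `byte_offset_for_column` and `is_continuation` are hypothetical helpers for illustration, not LilyPond functions. Each step advances past one lead byte and any following continuation bytes (bit pattern `10xxxxxx`), so it moves by one whole character.

```c
#include <stddef.h>

/* Hypothetical helper: nonzero if b is a UTF-8 continuation byte (10xxxxxx). */
static int is_continuation (unsigned char b)
{
  return (b & 0xC0) == 0x80;
}

/* Hypothetical helper: convert a 0-based column counted in characters into
   a byte offset into `line`, assuming valid NUL-terminated UTF-8.  Stops at
   the end of the line if `column` is past it. */
size_t byte_offset_for_column (const char *line, size_t column)
{
  const unsigned char *p = (const unsigned char *) line;
  size_t offset = 0;

  while (column > 0 && p[offset] != '\0')
    {
      offset++;                 /* step over the lead byte */
      while (p[offset] != '\0' && is_continuation (p[offset]))
        offset++;               /* ...and its continuation bytes */
      column--;
    }
  return offset;
}
```

For the line `Määä A` (each ä encoded as the two bytes `C3 A4`), character column 5 (the `A`) maps to byte offset 8. Using 5 directly as a byte offset would instead split the line after `Mää`, two characters too early; with other input the split can land in the middle of a multibyte character, as in the `\342\231` / `\257` output quoted above.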
Sorry about the delay. The output looks okay to me when invoking
xterm with various locales.
Also, the point-and-click functionality still seems to work correctly,
so this *might* fix the problem Harmath reported a few weeks ago:
http://lists.gnu.org/archive/html/bug-lilypond/2009-10/msg00001.html
> By the way: when I switch into POSIX locale, the error message will
> occur before the first Umlaut which is then no longer considered text
> apparently. So we already have some built-in locale dependencies
> elsewhere.
Yes, I'm pretty sure this is coming from glibc.
After stepping through Source_file::get_counts() with LC_ALL=POSIX, I
noticed that mbrtowc() returned (size_t) -1 when it processed the
ä. As a result, the following check stops the loop, so no further
characters are counted:
  /* Stop converting at invalid character;
     this can mean we have read just the first part
     of a valid character.  */
  if (thislen == (size_t) -1)
    break;
It seems that non-ASCII characters are not valid characters when the
locale is POSIX. The glibc docs aren't very clear on this point, and
only mention the fact that mbrtowc() is locale-dependent.
BTW, as the comment states, it would be nice to use a function that is
not locale-dependent, since the only information we need is the size
(in bytes) of the current UTF-8 character.
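A locale-independent way to get the byte length of the current UTF-8 character is to inspect the lead byte directly. The following is a minimal sketch under that assumption; `utf8_char_len` is a hypothetical name, not an existing LilyPond or libc function. It checks only the byte structure (a lead byte followed by the right number of `10xxxxxx` continuation bytes) and does not reject overlong encodings or surrogate code points, which is enough for stepping through a line but not for full validation.

```c
#include <stddef.h>

/* Hypothetical, locale-independent stand-in for the mbrtowc() call:
   returns the byte length (1-4) of the UTF-8 sequence starting at s,
   or 0 if it is invalid or truncated within the `left` bytes available. */
size_t utf8_char_len (const unsigned char *s, size_t left)
{
  size_t len, i;

  if (left == 0)
    return 0;
  if (s[0] < 0x80)
    len = 1;                    /* 0xxxxxxx: ASCII */
  else if ((s[0] & 0xE0) == 0xC0)
    len = 2;                    /* 110xxxxx */
  else if ((s[0] & 0xF0) == 0xE0)
    len = 3;                    /* 1110xxxx */
  else if ((s[0] & 0xF8) == 0xF0)
    len = 4;                    /* 11110xxx */
  else
    return 0;                   /* stray continuation byte or invalid lead byte */

  if (len > left)
    return 0;                   /* truncated: only part of the character is here */
  for (i = 1; i < len; i++)
    if ((s[i] & 0xC0) != 0x80)
      return 0;                 /* expected a 10xxxxxx continuation byte */
  return len;
}
```

In the loop quoted earlier, the `mbrtowc` call could then hypothetically be replaced by `thislen = utf8_char_len ((const unsigned char *) line_chars, left);`, breaking out of the loop when it returns 0 instead of checking for `(size_t) -1`. Because it never consults the locale, the result would be the same under LC_ALL=POSIX as under a UTF-8 locale.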
> My vote is on getting it merged, but it probably would do no harm if
> somebody checked this on Windows where the old version purportedly
> worked.
I'll apply it and make a note to check the next devel release on
Windows.
Thanks,
Patrick