bug-lilypond

Re: Bug report: U+3000 IDEOGRAPHIC SPACE isn't treated as whitespace


From: Werner LEMBERG
Subject: Re: Bug report: U+3000 IDEOGRAPHIC SPACE isn't treated as whitespace
Date: Thu, 15 Feb 2018 23:20:26 +0100 (CET)

>> LilyPond intentionally uses exclusively the ASCII character range
>> for syntactic purposes.
> 
> ...except it doesn't, as stated.  Lilypond source files aren't
> encoded in ASCII, and anything in a source file is (potentially)
> syntactic, at least as I understand that word.

No, no.  The possibility of using arbitrary UTF-8 characters for
identifier names doesn't make them syntactic.  You still need an ASCII
backslash and an ASCII space character (in most situations) to start
and end a call to an identifier, for example.

> Lilypond source files can contain (AFAIK) any UTF-8 character.  And
> that fact, I believe, means that Lilypond has to have at least a
> modicum of awareness of the properties of those characters as
> guaranteed by the Unicode standard.

LilyPond is essentially a programming language with precisely defined
tokens, not intended for writing plain text prose.  Allowing whole
character classes is counterproductive IMHO.  It's far simpler to say
that ASCII whitespace – which already is a bunch of characters, as you
correctly state – delimits tokens.
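
To illustrate the difference between the two rules, here is a minimal
Python sketch.  It is not LilyPond's actual lexer; the names
`split_ascii` and `split_unicode` are made up for this example, and
the ASCII whitespace set used is an assumption (space, tab, newline,
carriage return, form feed, vertical tab).

```python
import re

# Assumed ASCII whitespace set: space, tab, LF, CR, form feed, vertical tab.
ASCII_WS = re.compile(r"[ \t\n\r\f\v]+")


def split_ascii(source: str) -> list[str]:
    """Split on ASCII whitespace only; other Unicode spaces stay inside tokens."""
    return [tok for tok in ASCII_WS.split(source) if tok]


def split_unicode(source: str) -> list[str]:
    """Split on every character Python treats as whitespace (includes U+3000)."""
    return source.split()


# A backslash command followed by U+3000 IDEOGRAPHIC SPACE instead of U+0020.
sample = "\\relative\u3000c'"
print(split_ascii(sample))    # one token: the U+3000 is part of it
print(split_unicode(sample))  # two tokens: U+3000 acts as a delimiter
```

With the ASCII-only rule the set of delimiters is small and fixed; with
the Unicode-aware rule it depends on a whole character class.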

> 2. U+3000 IDEOGRAPHIC SPACE has essentially the same semantics as
> U+0020 SPACE (the differences are presentational, and the two
> characters are separate in Unicode largely due to historical
> accident).

I think you are mixing up the behaviour of whitespace characters
within strings and general whitespace between LilyPond and/or Scheme
tokens.  For the former it might indeed be useful to consider
character properties while doing text formatting (for example, to
allow line breaks after CJK characters if a string block gets formatted
as a justified paragraph, say).  For the latter, it confuses
everything.  Just imagine using U+200A (HAIR SPACE) instead of a
standard space – the source might become completely unreadable if
viewed with a proportional font...
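
Such characters are hard to spot by eye, so a small check can help
locate them in a source file.  The following Python sketch is only an
illustration; `find_non_ascii_whitespace` is a hypothetical helper and
not part of LilyPond or any of its tools.

```python
import unicodedata


def find_non_ascii_whitespace(source: str) -> list[tuple[int, str]]:
    """Return (index, Unicode name) for each whitespace character outside ASCII."""
    hits = []
    for index, ch in enumerate(source):
        # str.isspace() follows Python's Unicode whitespace definition;
        # the ord() test skips the ordinary ASCII whitespace characters.
        if ch.isspace() and ord(ch) > 0x7F:
            hits.append((index, unicodedata.name(ch, "UNKNOWN")))
    return hits


# U+200A HAIR SPACE and U+3000 IDEOGRAPHIC SPACE hidden between tokens.
for index, name in find_non_ascii_whitespace("c4\u200ad4\u3000e4"):
    print(index, name)
```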

> 3. Given 1. and 2., I think that it's silly to treat U+3000
> semantically differently from U+0020 just because it happens not to
> match a certain 7-bit legacy encoding. :)

I strongly disagree, and I see zero benefit in allowing it generally.


    Werner
