emacs-devel


From: David Kastrup
Subject: Re: [Emacs-diffs] master 9ce1d38: Use curved quotes in core elisp diagnostics
Date: Sun, 30 Aug 2015 17:49:09 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.0.50 (gnu/linux)

Eli Zaretskii <address@hidden> writes:

>> From: Stefan Monnier <address@hidden>
>> Date: Sat, 29 Aug 2015 22:00:35 -0400
>> Cc: address@hidden, address@hidden
>> 
>> > Then how are we supposed to handle similar issues, if no one else
>> > knows this, and never will?
>> 
>> By designing a better solution, I guess.
>
> I'm not sure I understand: are you saying that it is fundamentally
> wrong or unclean to have a syntax category for word-constituent
> characters that cannot appear at word beginning or end?  If so, please
> explain why you think so, because this situation happens with many
> characters in human languages, and is not really different from other
> similar syntax categories.
>
> If the idea is OK, and only its current implementation is not clean,
> then I see no reason to refrain from documenting the Lisp-level
> feature, because it will remain unchanged even when the implementation
> will be cleaned up.

For the record: LilyPond's definition of the lexical category "word" is
any sequence of ASCII letters and arbitrary non-ASCII (multibyte)
characters, optionally interrupted by isolated hyphens and underscores.

"c--d" is a note c with a dash-separated accent "-", followed by a note d.
"c-d" is a word of its own.

There is a bit of history to this: LilyPond used to have too many
different definitions of "word", depending on its current lexical mode.

The respective definitions in the (Flex-defined) lexer are:

A               [a-zA-Z\200-\377]
WORD            {A}([-_]{A}|{A})*
COMMAND         \\{WORD}
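
To make the effect of these definitions concrete, here is a rough
byte-level translation into Python regular expressions.  This is only a
sketch: the helper name leading_word is made up for illustration, and
Python's re module is standing in for Flex, so it shows the pattern
semantics rather than how the real lexer is wired up.

```python
import re

# Byte-level translation of the Flex definitions quoted above.
# Flex's octal range \200-\377 is 0x80-0xFF, i.e. every byte that can
# occur inside a multibyte UTF-8 sequence.
A = rb"[a-zA-Z\x80-\xff]"
WORD = re.compile(rb"%b(?:[-_]%b|%b)*" % (A, A, A))
COMMAND = re.compile(rb"\\" + WORD.pattern)


def leading_word(text: str) -> str:
    """Return the longest WORD prefix of text, or '' if there is none.

    Hypothetical helper for illustration; not part of LilyPond.
    """
    m = WORD.match(text.encode("utf-8"))
    return m.group(0).decode("utf-8") if m else ""


print(leading_word("c-d"))   # c-d  (isolated hyphen stays inside the word)
print(leading_word("c--d"))  # c    (doubled hyphen ends the word)
```

A hyphen or underscore only counts as part of the word when it is
immediately followed by another letter, which is why "c--d" stops after
the "c".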

The lexer works on UTF-8 encoded bytes as input.  Whenever a
pattern accepts anything outside of the ASCII range, a checking routine
makes sure that only proper UTF-8 is passed on.
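
The checking step can be pictured roughly as follows.  This is a
minimal sketch in Python, not LilyPond's actual routine (which is not
shown here); the function name checked_token is an assumption made for
the example.

```python
def checked_token(raw: bytes) -> str:
    """Decode a matched token, rejecting malformed UTF-8.

    Hypothetical stand-in for the lexer's checking routine: pure ASCII
    tokens pass straight through, anything else must decode as strict
    UTF-8 or a UnicodeDecodeError is raised.
    """
    if raw.isascii():
        return raw.decode("ascii")
    return raw.decode("utf-8", errors="strict")


print(checked_token(b"c-d"))                  # c-d
print(checked_token("é-è".encode("utf-8")))   # é-è
try:
    checked_token(b"c\xff")                   # 0xFF never occurs in UTF-8
except UnicodeDecodeError:
    print("rejected malformed input")
```

The byte class [\200-\377] by itself accepts arbitrary high bytes, so
some validation of this kind is needed before the token is treated as
text.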

At any rate, it would be cool if words could be matched solely by syntax
table.  The "any non-ASCII character" bit might be impractical to
implement, but at least the word syntax inside of the ASCII range would
be nice.

-- 
David Kastrup


