
Re: [Emacs-diffs] master db828f6: Don't rely on defaults in decoding UTF


From: Eli Zaretskii
Subject: Re: [Emacs-diffs] master db828f6: Don't rely on defaults in decoding UTF-8 encoded Lisp files
Date: Sun, 27 Sep 2015 13:13:01 +0300

> From: Rustom Mody <address@hidden>
> Date: Sun, 27 Sep 2015 14:50:48 +0530
> 
> I've been trying to understand this stuff and was looking at, e.g.,
> lisp/language/indian.el
> 
> In there I find that:
> (defconst bengali-composable-pattern
>   (let ((table
>      '(("a" . "\u0981")        ; SIGN CANDRABINDU
>        ("A" . "[\u0982-\u0983]")    ; SIGN ANUSVARA .. VISARGA
>        ("V" . "[\u0985-\u0994\u09E0-\u09E1]") ; independent vowel
>        ("C" . "[\u0995-\u09B9\u09DC-\u09DF\u09F1]") ; consonant
>        ("B" . "[\u09AC\u09AF-\u09B0\u09F0]")        ; BA, YA, RA
>        ("R" . "[\u09B0\u09F0]")        ; RA
>        ("n" . "\u09BC")        ; NUKTA
>        ("v" . "[\u09BE-\u09CC\u09D7\u09E2-\u09E3]") ; vowel sign
>        ("H" . "\u09CD")        ; HALANT
>        ("T" . "\u09CE")        ; KHANDA TA
>        ("N" . "\u200C")        ; ZWNJ
>        ("J" . "\u200D")        ; ZWJ
>        ("X" . "[\u0980-\u09FF]"))))    ; all coverage
> etc etc

This is unrelated: it specifies which character sequences should be
composed and displayed as a single grapheme cluster.
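
As a rough sketch of where such patterns end up: Emacs keeps its
composition rules in the char-table `composition-function-table',
indexed by character.  The helper name below is made up for
illustration, and what the lookup returns depends on the Emacs
version and on which language files have been loaded.

```elisp
;; Hypothetical helper (the name is not from Emacs itself): fetch the
;; composition rules registered for a single character.
(defun demo-composition-rules (char)
  "Return the `composition-function-table' entry for CHAR."
  (aref composition-function-table char))

;; U+0995 is BENGALI LETTER KA, one of the consonants covered by the
;; "C" class in the quoted pattern.
(demo-composition-rules #x0995)
```

None of this depends on how the Lisp file itself is encoded.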

> So then I checked why the file was showing as UTF-8 encoded.
> 
> Found this one non-ASCII line:
> 
> (set-language-info-alist
>  "Kannada" '((charset unicode)
>          (coding-system mule-utf-8)
>          (coding-priority mule-utf-8)
>          (input-method . "kannada-itrans")
>          (sample-text . "Kannada (ಕನ್ನಡ)    ನಮಸ್ಕಾರ")
>          (documentation . "\
> Kannada language and script is supported in this language
> environment."))
>  '("Indian"))
> 
> It strikes me that this sample text should be there for the other
> languages also, but it does not seem to be there.

You cannot base encoding decisions on the language or script alone,
unless that language exists in a single locale.  Many languages and
scripts serve several different locales with several different default
encodings.
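
To illustrate with a script that is not from this thread: Cyrillic
has been written in KOI8-R, ISO-8859-5 and UTF-8, depending on the
locale.  The sketch below encodes one letter in each of the three;
the helper name and the choice of letter are only assumptions made
for the example.

```elisp
;; Hypothetical helper: encode STRING and return its bytes as a list.
(defun demo-encoded-bytes (string coding-system)
  "Return STRING encoded in CODING-SYSTEM as a list of byte values."
  (append (encode-coding-string string coding-system) nil))

;; U+044F is CYRILLIC SMALL LETTER YA, written as an escape so that
;; this snippet itself stays pure ASCII.
(list (demo-encoded-bytes "\u044f" 'koi8-r)      ; (209)
      (demo-encoded-bytes "\u044f" 'iso-8859-5)  ; (239)
      (demo-encoded-bytes "\u044f" 'utf-8))      ; (209 143)
```

Same letter, three different byte sequences, so knowing the script
does not tell you how the bytes of a file should be decoded.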

> Just for context: if I can understand what's going on, I would like to
> help improve this/these docs:
> 
> 
> (info "(elisp)input methods")
> 
>   | How to define input methods is not yet documented in this manual,
>   | but here we describe how to use them.

Again unrelated.  Input methods are about typing characters not
directly supported by the user's keyboard.



