bug-gnu-emacs
bug#20499: C-x 8 shorthands for curved quotes, Euro, etc.


From: Ivan Shmakov
Subject: bug#20499: C-x 8 shorthands for curved quotes, Euro, etc.
Date: Thu, 07 May 2015 10:00:38 +0000
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux)

>>>>> Paul Eggert <eggert@cs.ucla.edu> writes:

[…]

 >> … Also, did you consider generating this list automatically, based
 >> on the codepoint properties already known to Emacs?  Something along
 >> the lines of the function MIMEd, which readily produces a list of
 >> entries for the following 133 characters.  (Three spaces added for
 >> symmetry purposes.)

 >> À Á Â Ã Ä È É Ê Ë Ì Í Î Ï Ñ Ò Ó Ô Õ Ö Ù Ú Û Ü Ý
 >> à á â ã ä è é ê ë ì í î ï ñ ò ó ô õ ö ù ú û ü ý
 >> ÿ   Ā ā Ć ć Ĉ ĉ Č č Ď ď Ē ē Ě ě Ĝ ĝ Ĥ ĥ Ĩ ĩ Ī ī Ĵ ĵ Ĺ ĺ
 >> Ľ ľ Ń ń Ň ň Ō ō Ŕ ŕ Ř ř Ś ś Ŝ ŝ Š š Ť ť Ũ ũ Ū ū Ŵ ŵ Ŷ ŷ
 >> Ÿ   Ź ź Ž ž Ǎ ǎ Ǐ ǐ Ǒ ǒ Ǔ ǔ Ǧ ǧ Ǩ ǩ   ǰ Ǵ ǵ Ǹ ǹ Ș ș Ț ț
 >> Ȟ ȟ Ȳ ȳ

 > Sorry, I don't really follow the code that you attached.

        Which part, specifically?

        It just iterates over the range given (or U+00A8 through U+02AF
        by default) and maps “LATIN + COMBINING” decompositions to
        'iso-transl entries.  For example, it maps the (?g #x327)
        decomposition (U+0327 being COMBINING CEDILLA) for U+0123 into
        an (",g" . ģ) entry.

        Or, rather, it /should/, for my code has an obvious typo:

                    (`(,c #x30c) (string ?v c))
                    (`(,c #x326) (string 59 c))
-                   (`(,c #x326) (string ?, c)))))
+                   (`(,c #x327) (string ?, c)))))

        Other possible additions (assuming we’ll agree on C-x 8 u,
        C-x 8 .) are:

                   (`(,c #x304) (string ?= c))
+                  (`(,c #x306) (string ?u c))
+                  (`(,c #x307) (string ?. c))
                   (`(,c #x308) (string 34 c))
+                  (`(,c #x30b) (string ?2 c))
                   (`(,c #x30c) (string ?v c))
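
        The mapping described above can be sketched outside Emacs,
        too.  The following is only an illustration in Python, using
        its unicodedata module as a stand-in for the decomposition
        table known to Emacs; it is not the code attached earlier.
        The names PREFIX_KEYS and entries are made up for this sketch,
        and the table lists only the mark/key pairs that appear in
        this message.

```python
import unicodedata

# Combining mark -> C-x 8 prefix key.  Only the pairs visible in this
# message are listed; the full table of the original code is not shown.
PREFIX_KEYS = {
    0x0304: "=",   # COMBINING MACRON
    0x0306: "u",   # COMBINING BREVE (proposed addition)
    0x0307: ".",   # COMBINING DOT ABOVE (proposed addition)
    0x0308: '"',   # COMBINING DIAERESIS (character code 34)
    0x030B: "2",   # COMBINING DOUBLE ACUTE ACCENT (proposed addition)
    0x030C: "v",   # COMBINING CARON
    0x0326: ";",   # COMBINING COMMA BELOW (character code 59)
    0x0327: ",",   # COMBINING CEDILLA (the corrected typo)
}


def entries(first=0x00A8, last=0x02AF):
    """Yield (keys, char) for each char that is ASCII letter + one known mark."""
    for cp in range(first, last + 1):
        parts = unicodedata.decomposition(chr(cp)).split()
        # Skip characters without a decomposition, compatibility
        # decompositions (tagged like "<compat>"), and anything that is
        # not exactly a base character plus one combining mark.
        if len(parts) != 2 or parts[0].startswith("<"):
            continue
        base, mark = (int(p, 16) for p in parts)
        letter = chr(base)
        if letter.isascii() and letter.isalpha() and mark in PREFIX_KEYS:
            yield PREFIX_KEYS[mark] + letter, chr(cp)


table = dict(entries())
print(table[",g"])   # the U+0123 example from the text
```

        Passing another range, such as entries(0x1E00, 0x1EFF), scans
        a different block with the same table.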

 > Although I suppose it comes from a decomposition table, I don't know
 > what the table was designed for, and it's not clear to me how it's
 > relevant.

        I hope someone more knowledgeable could comment on this.  Still,
        this (ab)use of the data seems to work well in practice.

 > Anyway, most of those letters are either in iso-transl.el now,

        The point is to /remove/ them from 'iso-transl, as these entries
        duplicate, in a way, a part of the decomposition table already
        present in Emacs.

[…]

 >> Ǎ ǎ Ǐ ǐ Ǒ ǒ Ǔ ǔ Ǹ ǹ

 > These are for toned Pinyin but this list is incomplete.  If we wanted
 > to cover toned Pinyin, we'd also need Ǖ ǖ Ǘ ǘ Ǚ ǚ Ǜ ǜ.  Coming up
 > with two-character abbreviations for all these might be tricky.

        But are we actually limited to two-character abbreviations only?
        Why not allow for, say, C-x 8 " ' u?

[…]

 >> ǰ

 > What language uses this?  I couldn't find one.

        To quote NamesList.txt:

01F0    LATIN SMALL LETTER J WITH CARON
        * IPA and many languages

 >> Ǵ ǵ

 > Good catch.  These are used for transliteration from Serbian and
 > Macedonian.  We should also include Ḱ ḱ as they are also needed.
 > Included in the attached patch.

        The code I’ve suggested could be used to scan the U+1Exx range
        just as well, thus resulting in the following set.

    Ḑ ḑ Ḡ ḡ Ḧ ḧ Ḩ ḩ Ḱ ḱ Ḿ ḿ Ṕ ṕ Ṽ ṽ Ẁ ẁ Ẃ ẃ Ẅ ẅ Ẍ ẍ Ẑ ẑ ẗ Ẽ ẽ Ỳ ỳ Ỹ ỹ

[…]

 > Anyway, part of what's going on here is that the proposed list
 > doesn't cover every Latin character in the ISO 10646 repertoire
 > (that'd be a large set), but instead is limited to what appear to be
 > reasonably common letters.  Admittedly this is not universal but
 > one must cut things off somewhere, and it would be odd to add only
 > partial coverage for toned Pinyin, Livonian, etc.

        When it comes to the LATIN … LETTER WITH … letters, my proposal
        for such a cutoff would be to require /both/ of the following
        criteria:

        • only cover specific Unicode ranges; such as, for instance,
          U+00A8 through U+02AF, U+1E00 … U+1EFF, perhaps U+2C60 … U+2C7F;

        • only cover the letters which can be represented with a
          sufficiently general C-x 8 ⟨diacritic⟩+ ⟨ASCII-latin⟩ pattern.

        Other characters deemed common may be added to the list.
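
        As a rough sketch of how the two criteria could combine, here
        is a Python illustration (again with unicodedata standing in
        for the Emacs tables).  The names RANGES, MARK_KEYS and
        key_sequence are hypothetical.  The acute entry assumes the
        existing C-x 8 ' prefix, and typing the diacritic keys in
        canonical (NFD) order is likewise an assumption, not something
        settled in this thread.

```python
import unicodedata

# Criterion 1: the candidate ranges named above.
RANGES = [(0x00A8, 0x02AF), (0x1E00, 0x1EFF), (0x2C60, 0x2C7F)]

# Criterion 2: combining mark -> key.  Illustrative and incomplete.
MARK_KEYS = {
    0x0301: "'",   # COMBINING ACUTE ACCENT (assumed: existing C-x 8 ')
    0x0304: "=",   # COMBINING MACRON
    0x0308: '"',   # COMBINING DIAERESIS
    0x030C: "v",   # COMBINING CARON
    0x0326: ";",   # COMBINING COMMA BELOW
    0x0327: ",",   # COMBINING CEDILLA
}


def key_sequence(char):
    """Return the keys to type after C-x 8 for CHAR, or None if it fails
    either criterion."""
    cp = ord(char)
    if not any(lo <= cp <= hi for lo, hi in RANGES):
        return None          # outside the covered ranges
    # Full canonical decomposition: one base, then zero or more marks.
    base, *marks = unicodedata.normalize("NFD", char)
    if not (base.isascii() and base.isalpha() and marks):
        return None          # not an ASCII Latin letter with diacritics
    if any(ord(m) not in MARK_KEYS for m in marks):
        return None          # some diacritic has no key assigned
    return "".join(MARK_KEYS[ord(m)] for m in marks) + base


print(key_sequence("\u01d8"))   # the C-x 8 " ' u case discussed earlier
```

        Since the full decomposition is used, a letter with several
        diacritics needs no table entry of its own: it qualifies
        whenever each of its marks has a key.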

 >>> --------------090904020002020306060104
 >>> Content-Type: text/x-patch;
 >>>  name="0001-C-x-8-shorthands-for-curved-quotes-Euro-etc.patch"

 >> This MIME part sure wants ‘; charset=UTF-8’.  Otherwise, Gnus does
 >> no decoding, and Emacs shows the contents with the likes of
 >> \304\260.

 > Hmm, it works for me.  I use Thunderbird to read the top level
 > message, and it spins off an Emacs to display the attachment with no
 > problem.

        I can “spin off” cat(1) to read the offending MIME part, too:
        Emacs will feed it raw-text, and interpret the result as UTF-8
        (the default).

        It still does /not/ comply with the MIME specification.
        Consider section 4.1.2 of RFC 2046:

 RFC> […] The default character set, which must be assumed in the
 RFC> absence of a charset parameter, is US-ASCII.

        RFC 6657 updates this as follows:

 RFC> Each subtype of the "text" media type that uses the "charset"
 RFC> parameter can define its own default value for the "charset"
 RFC> parameter, including the absence of any default.

        However, given that ‘text/x-patch’ is not a /registered/ MIME
        type, I believe the RFC 6657 provision does not apply, and the
        US-ASCII default of RFC 2046 still holds.

 > The web-site archive at <http://bugs.gnu.org/20499#60> also works for
 > me with Firefox.

 > It's common for people to send the output of "git send-email" as
 > attachments;

        If Thunderbird /knows/ the encoding (“character set”) of the
        contents of the MIME part, it /should/ specify it in the MIME
        part header.  If the said contents are strictly 7-bit, it /could/
        omit that (given that it’s more than likely to be US-ASCII).
        Otherwise, I guess Thunderbird should either ask the user for
        the encoding /or/ send the part as application/octet-stream.
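
        The three cases above could be sketched as follows.  This is a
        Python illustration built on the standard email package, not a
        description of what Thunderbird does; make_patch_part is a
        hypothetical helper.  For simplicity, the 7-bit case declares
        US-ASCII explicitly rather than omitting the parameter, which
        would be just as valid.

```python
from email.mime.application import MIMEApplication
from email.mime.text import MIMEText
from typing import Optional


def make_patch_part(data: bytes, charset: Optional[str] = None):
    """Build a MIME part for patch DATA, labelling it only as far as known."""
    if charset is not None:
        # Encoding known: state it in the Content-Type header.
        return MIMEText(data.decode(charset), "x-patch", charset)
    if data.isascii():
        # Strictly 7-bit: US-ASCII is a safe label (and also what a
        # missing charset parameter means per RFC 2046).
        return MIMEText(data.decode("ascii"), "x-patch", "us-ascii")
    # 8-bit contents of unknown encoding: do not claim it is text.
    return MIMEApplication(data)


# \304\260 from the quoted complaint is U+0130 encoded as UTF-8.
part = make_patch_part("\u0130\n".encode("utf-8"), "utf-8")
print(part["Content-Type"])
```

        A reader such as Gnus can then decode the part from the header
        alone, without guessing.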

[…]

-- 
FSF associate member #7257  np. Satellite one — Purple Motion  B6A0 230E 334A




