Re: ".*utf\\(-?8\\)\\>" versus ".*[._]utf" versus "address@hidden>"
From: Paul Eggert
Subject: Re: ".*utf\\(-?8\\)\\>" versus ".*[._]utf" versus "address@hidden>"
Date: Tue, 1 Jan 2002 10:30:31 -0800 (PST)

> From: Dave Love <address@hidden>
> Date: 01 Jan 2002 17:07:37 +0000
>
> > No, the preceding entry "address@hidden>" has a delimiter, and the other
> > entries (e.g. ".*8859[-_]?1\\>") are special cases because ISO 8859
> > locale names in practice could have almost anything before the
> > '8859'.
>
> I don't understand why utf-8 should be any different.
Because utf-8 should be the normal case. In the normal case, the
encoding name should be delimited, to prevent incorrect matches when
one encoding name is a suffix of another.
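
To illustrate the suffix problem, here is a minimal Python sketch. The two patterns and the "JIS"/"SJIS" locale names are hypothetical examples chosen for illustration; they are not taken from the Emacs sources.

```python
import re

# Hypothetical patterns, not taken from Emacs: one requires a "." or "_"
# delimiter immediately before the codeset name, the other accepts any prefix.
DELIMITED = re.compile(r".*[._]jis$", re.IGNORECASE)
UNDELIMITED = re.compile(r".*jis$", re.IGNORECASE)


def matches(pattern, locale_name):
    """Return True if the whole locale name is accepted by the pattern."""
    return pattern.match(locale_name) is not None


if __name__ == "__main__":
    for name in ("ja_JP.JIS", "ja_JP.SJIS"):
        print(name, matches(DELIMITED, name), matches(UNDELIMITED, name))
```

The undelimited pattern accepts "ja_JP.SJIS" as if it were a JIS locale, because "JIS" is a suffix of "SJIS"; the delimited pattern does not.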
> > I've never seen a locale by that name, and I doubt whether we'll
> > run into one. Locale names like 'iso_8859_1' are still around for
> > backward compatibility reasons, but modern locale names give more
> > than just the character encoding.
>
> Why are only modern names necessarily relevant (and only modern
> Unix-like systems)? Emacs has long been documented to accept just
> that in the environment variables and at least some modern systems
> seem to be happy with it.
I'm not sure I follow your point, but I'll try to answer. The code in
question is using a heuristic to guess the coding system from the
locale name. All other things being equal, it's better to keep the
heuristic simple and easy to explain. The heuristic I was trying to
use is:

   Emacs looks at the codeset part of the locale name (e.g. the "UTF-8"
   in "address@hidden"), except that there is a special case for
   old-fashioned 8859-style locale names like "iso_8859_1".

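
A rough Python sketch of that heuristic follows. It is not the Emacs code. The function name guess_codeset, both regular expressions, and the "ISO-8859-N" return format are assumptions made for illustration, as is the POSIX-style layout language[_territory][.codeset][@modifier].

```python
import re

# Assumed layout: language[_territory][.codeset][@modifier].
# The codeset is whatever sits between the first "." and an optional "@".
CODESET = re.compile(r"^[^.@]*\.(?P<codeset>[^@]+)")

# Special case (illustrative): old-fashioned names such as "iso_8859_1",
# where almost anything may precede the "8859".
OLD_8859 = re.compile(r"8859[-_]?(?P<part>\d+)(?![0-9A-Za-z])")


def guess_codeset(locale_name):
    """Guess a codeset from a locale name, or return None if there is no hint."""
    m = CODESET.match(locale_name)
    if m:
        return m.group("codeset")
    m = OLD_8859.search(locale_name)
    if m:
        return "ISO-8859-" + m.group("part")
    return None


if __name__ == "__main__":
    for name in ("en_US.UTF-8", "de_DE.ISO8859-15@euro", "iso_8859_1", "C"):
        print(name, guess_codeset(name))
```

The general rule needs only one sentence of documentation, and the 8859 branch is the single exception to it.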
> I've seen/used considerable variations, so I aimed to be permissive
> like the existing cases. Could this actually lose?
I don't know of any wins or losses in practice, but I think the more
aggressive match would make the documentation a bit more complicated.
This is a fairly minor issue; I wouldn't object much to the more
aggressive match.
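
For comparison, here is a small Python sketch of how the two non-obfuscated patterns from the subject line behave. It rests on two assumptions: matching is case-insensitive, and Python's \b stands in for Emacs's \> (the two are not identical, since they depend on different notions of a word character). The names "xutf-8" and "en_US.utf" are made-up test inputs.

```python
import re

# Python renderings of two patterns from the subject line. Assumptions:
# case-insensitive matching, and \b used as an approximation of Emacs's \>.
PERMISSIVE = re.compile(r".*utf(-?8)\b", re.IGNORECASE)
DELIMITED = re.compile(r".*[._]utf", re.IGNORECASE)


def accepted_by(locale_name):
    """Return the labels of the patterns that match the start of locale_name."""
    hits = []
    if PERMISSIVE.match(locale_name):
        hits.append("permissive")
    if DELIMITED.match(locale_name):
        hits.append("delimited")
    return hits


if __name__ == "__main__":
    for name in ("en_US.UTF-8", "utf8", "xutf-8", "en_US.utf"):
        print(name, accepted_by(name))
```

Under these assumptions the permissive pattern accepts a bare "utf8", which the delimited one rejects, but it also accepts any name that merely ends in "utf-8" or "utf8".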