Re: per-buffer language environments

emacs-devel

From:	Stephen J. Turnbull
Subject:	Re: per-buffer language environments
Date:	Fri, 17 Dec 2010 06:10:44 +0900

Eli Zaretskii writes:

 > The emphasis on *reading* takes what I originally wrote out of its
 > context.  I didn't comment on reading alone, I commented on the entire
 > issue of coding-systems being tied up to the language:

I know you were talking about something else, but I can't figure out
what or why.  You said, "don't associate coding priorities with
language"; I gave a good reason why there should be coding priorities
associated with language.  The rest of what you write is irrelevant,
since none of it points out a real problem with that association.

 > If the ambiguity you are talking about is that there are more settings
 > than just for reading,

Of course that's not the ambiguity I'm talking about.  The ambiguity
I'm talking about is in the reading, and that is sufficient reason to
associate priorities with language.

 > I agree that it would be useful to have a language as per-buffer
 > setting.  This discussion is about what should that include.

It should include priorities for encoding detection.

 > > Of course a significant fraction is possible.  That's precisely what
 > > the priority lists have been achieving since the early 1990s.
 > 
 > Evidently, your examples try to show that the fraction is not
 > significant enough.

No, my examples show what you will lose by removing the association of
encoding priority with language environment.

 > > If your complaint is that we should do better, "patches welcome" is
 > > the only thing I can think of to say.
 > 
 > No, I'm saying we shouldn't try to do better _automatically_.  Users
 > have enough facilities to affect the defaults according to their
 > specific use-cases.

Handa-san was not talking about trying to do better.  He was talking
about how we achieve the success rates we currently get.  Removing the
association of language with encoding priority would drastically
decrease that for anybody who needs to deal with multiple languages
and multiple associated encodings in their environment.

 > Exactly my point: the user can override the automated selections if
 > she needs.  So the current automation doesn't need to do better.

Well, your point is just plain wrong, then, because nobody is
proposing a change w.r.t. the current automation.  All that has been
suggested is that we keep doing the same things we've been doing to
achieve a reasonable degree of automatic recognition for people in
environments with multiple encodings.

 > > A completely different purpose (handling exceptions)
 > > from the language environment itself (handling the unmarked case).
 > 
 > Except that set-language-environment calls prefer-coding-system under
 > the hood to do most of its job...

Yes, this works for Europeans, Arabs, and Israelis, because basically
what you need to do is disambiguate ISO-8859-X, and just putting the
right ISO coding system (or perhaps a Windows-125x coding system) at
the head of the list (i.e., just using prefer-coding-system) does what
you need.  It's not good enough for Han users because they need to
disambiguate the EUC encodings from each other and from 8-bit ISO, and
to choose among the bogus Microsoft encodings (Shift JIS and Big5).
That means manipulating the priority lists at positions other than
head of list.
I'm not sure about Cyrillic users.
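
To make the ambiguity concrete, here is a rough sketch in Python.  It
is not Emacs Lisp and not how Emacs or XEmacs actually implement
detection; the Python codec names merely stand in for coding systems,
and PRIORITIES, detect, and prefer are hypothetical names invented for
the illustration.  It shows the same bytes being accepted by more than
one EUC decoder, so that only the order of a per-language priority
list decides the result.

```python
# Illustrative sketch only: Python codecs stand in for Emacs coding
# systems.  PRIORITIES, detect() and prefer() are hypothetical names.

# One priority list per language environment.  latin_1 accepts any
# byte string, so it has to sit behind the multibyte candidates.
PRIORITIES = {
    "Japanese": ["euc_jp", "shift_jis", "latin_1"],
    "Korean": ["euc_kr", "latin_1"],
    "Western": ["latin_1"],
}


def detect(data, priority):
    """Return the first codec in PRIORITY that decodes DATA cleanly."""
    for codec in priority:
        try:
            data.decode(codec)
        except UnicodeDecodeError:
            continue
        return codec
    return None


def prefer(priority, codec):
    """Move CODEC to the head of the list, leaving the rest in order."""
    return [codec] + [c for c in priority if c != codec]


# Unmarked bytes, as they would arrive from a file with no coding cookie.
sample = "日本語".encode("euc_jp")

for language, priority in PRIORITIES.items():
    codec = detect(sample, priority)
    print(language, codec, repr(sample.decode(codec)))
```

Each language environment picks a different decoder for the identical
bytes, and only one of them yields the intended text.  Note that
prefer() touches only the head of the list; the relative order of
everything behind it is untouched, which is the part that matters once
several multibyte encodings are candidates.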

 > > That's an honest question; the way you are going, I have to wonder.
 > 
 > Knowing me for as long as you do, I wonder how can such a question be
 > honest.  But I digress.

Usually you don't miss a point like "nobody is proposing anything new
here for how language environments work".  (All that is being proposed
is making them buffer-local.)  Since you did miss it, I have to wonder
if you know anything about how encoding detection works internally.

 > I wasn't talking about any bugs at all.  Werner suggested to add a new
 > _feature_; I was talking about what that feature should and shouldn't
 > include.

Well, you're wrong about manipulating the coding priorities.  It is
not new, and it is needed.

 > > And of course in this case, locale is a heuristic.  *Emacs is a
 > > multilingual* (well, technically, multiscript) *application*, and any
 > > setting of the language environment that doesn't take into account the
 > > current text we're working with is surely heuristic.
 > 
 > If so, it's a heuristic that is external to Emacs.  Emacs just abides
 > by it, because users expect that.  Anyway, this aspect is entirely
 > unrelated to the issue at hand.

Of course it's not unrelated.  Referring to the locale is an external
heuristic and therefore unreliable.  If the user sets a language
environment, that is surely better information than what you get from
the locale.  However, it's probably a good idea to merge information
from the new language environment with that from the old one, giving
precedence to the new.

 > But this is not the main issue I wanted to discuss.  The main issue is
 > what constitutes a "language environment" as far as Emacs is
 > concerned, after we factor out the effects of the locale?

What are you talking about, "factor out"?  If the user sets a language
environment, that will override the locale on all points where it
specifies behavior.

 > Perhaps a useful starting point would be to ask: what exactly is a
 > "language name" string? should it specify only a language, or should
 > it also try to specify the preferred encodings?

It should specify only the language, IMO.  Determining the preferred
encodings is complex but fairly mature at this point.  If the user
doesn't want the default priorities associated with a language, I
don't see why they shouldn't use prefer-coding-system or
set-coding-priority-list rather than piggyback on the language
environment itself.

 > I'm not sure it's as black and white as you make it sound.  For
 > example, users of the same language on GNU/Linux and on MS-Windows
 > might very well disagree wrt to the preferred encodings.  So some
 > aspects of the locale still affect language-specific choices.

Huh?  That's not "locale", that's system convention.  Locale is
something else entirely.  It's true that you can override that
heuristic via locale, but (at least in XEmacs) we take the system type
into account when computing the startup priorities, even if the locale
specifies an encoding.  I would imagine Emacs does the same.

 > But again, I think talking about the locale just muddies the waters
 > in this discussion.

Then why do you keep talking about it?

Can we agree that (1) the locale is a good heuristic for the initial
language environment for *scratch*, (2) when an encoding is specified
in the locale, it should be prefer-coding-system'd, and (3) after doing
(1) and (2) we don't care about the locale any more?