emacs-devel

Re: Unicode support for the MS Windows clipboard


From: Benjamin Riefenstahl
Subject: Re: Unicode support for the MS Windows clipboard
Date: Fri, 28 May 2004 15:26:10 +0200
User-agent: Gnus/5.1001 (Gnus v5.10.1) Emacs/21.3.50 (gnu/linux)

Hi all,


I have been doing some testing, research and thinking.  First here is
a quick response on a few points.  Next I'll do a new patch based on
my current ideas.


> "Eli Zaretskii" <address@hidden> writes:
>> Couldn't this be done without introducing Windows-specific options?

Jason Rumney <address@hidden> writes:
> Also, we should set (and read) CF_LOCALE when we are using CF_TEXT,
> to indicate the coding we have used.

Thanks for the pointer.  Researching locales actually led me to a
solution for deriving codepage properties (OEM vs "ANSI") via locales.

I think I have an algorithm that works.  It makes a few assumptions
about the coding system names and based on that it derives the
requested clipboard type automatically.

How does this sound:

- If selection-coding-system has the form /(.*-)?utf-16.*/, I assume
  CF_UNICODETEXT is wanted.

- If selection-coding-system has the form /cp[0-9]+.*/ or
  /windows-[0-9]+.*/, I derive the codepage from that.

    - Check if the codepage is identical to GetACP() or GetOEMCP().
      If it is, use CF_TEXT or CF_OEMTEXT accordingly. 

    - Else get a corresponding LCID (reverse mapping via
      EnumSystemLocales()) which has the codepage as OEM or "ANSI".
      In this case we also need to set CF_LOCALE accordingly.

The last step takes a small performance hit, but the results can
easily be cached.

I am also thinking of custom coding systems, e.g. for automatic
remapping of private characters or locale-specific
pre-/postprocessing.  This is why I am not completely comfortable with
hardcoding coding systems or using heuristics based on the coding
system symbol names.  If such concerns are completely misplaced,
please just tell me ;-).

Anyway, I have no problem with dictating the above naming conventions
for selection-coding-system for now.


Jason Rumney <address@hidden> writes:
> Andrew Innes always had the intention to make the clipboard work
> on-demand, the same way it does on X. So the memory would only be
> used if the clipboard text was actually pasted (and then only for
> the format the client wanted).

We could do that using WM_RENDERFORMAT.  But then we absolutely need a
valid HWND to get a target for that message.  I don't know anything
about the Emacs message loop and the windows that are available.  It
would probably be best to allocate a custom hidden window for this.
I'll postpone that idea for now and just assume that we don't use
Unicode on 9x/Me.

Jason Rumney <address@hidden> writes:
> Another thing worth considering, if we are making major changes to the
> clipboard code, is that Kenichi Handa pointed out some time ago that
> the encoding part of the X clipboard support is now done in Lisp
> (xselect.el). Windows could do this too.

At the moment this is done via {de,en}code_coding() and a couple of
friends.  Is that the same thing?


Benjamin Riefenstahl <address@hidden> writes:
>> Anyway, what happens to the MULE problem in this unified scenario?
>> Do all problems go away with unify-8859-on-{de,en}coding?

Jason Rumney <address@hidden> writes:
> What MULE problem?

Disjoint charsets leading to the introduction of unwanted characters
(similar to the SHIFT-JIS <-> Chinese confusion that you just
mentioned).  One of the last times this discussion came up, somebody
mentioned that this could still be a serious problem.


Jason Rumney <address@hidden> writes:
> The encoding of CF_UNICODETEXT does not vary, so utf-16-le (or maybe
> -be) is the only coding-system that is appropriate.

Actually at the moment that would be utf-16le-dos, not utf-16-le-dos.
The latter includes a BOM, which we really don't want here.  The
non-intuitive naming difference makes me wonder, though, whether this
is just some unintended confusion.  There are also currently
utf-16-le-with-signature-* and mule-utf-16-*.
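
To illustrate what the BOM difference means at the byte level, here is
a small sketch of the layout CF_UNICODETEXT expects: UTF-16
little-endian code units, CR-LF line ends, a terminating NUL and no
signature.  It is not how the Emacs coding systems are implemented.
The function name encode_utf16le_dos is hypothetical, and the sketch
assumes ASCII-only input to stay short.

```c
#include <stddef.h>

/* Encode NUL-terminated ASCII TEXT as UTF-16LE with CR-LF line ends
   and a terminating NUL code unit.  If WITH_BOM is nonzero, prepend
   the signature bytes FF FE, which is what we do not want on the
   clipboard.  Return the number of bytes written to OUT, or 0 if
   OUT_SIZE is too small.  Hypothetical helper, ASCII input only.  */
static size_t
encode_utf16le_dos (const char *text, int with_bom,
                    unsigned char *out, size_t out_size)
{
  size_t n = 0;
  const char *p;

  if (with_bom)
    {
      if (out_size < 2)
        return 0;
      out[n++] = 0xFF;
      out[n++] = 0xFE;
    }

  for (p = text; *p != '\0'; p++)
    {
      if (*p == '\n')
        {
          /* Insert the CR of the CR-LF pair.  */
          if (out_size - n < 2)
            return 0;
          out[n++] = '\r';
          out[n++] = 0;
        }
      if (out_size - n < 2)
        return 0;
      out[n++] = (unsigned char) *p;   /* Low byte first.  */
      out[n++] = 0;                    /* High byte is 0 for ASCII.  */
    }

  /* Terminating NUL code unit.  */
  if (out_size - n < 2)
    return 0;
  out[n++] = 0;
  out[n++] = 0;
  return n;
}
```

With WITH_BOM set, a reader that takes the data as plain UTF-16LE
would see U+FEFF as an extra first character, which is why the
variant without a signature is the one to use here.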


> "Eli Zaretskii" <address@hidden> writes:
>> Also, AFAIK CF_UNICODETEXT _can_ be used on Windows 9x, as any
>> program like clipbrd.exe or ClipConvert will show you.

I tested Win95 and Win98SE.  On both systems, the clipboard viewer and
Notepad couldn't make use of CF_UNICODETEXT.  Cut-and-paste between
two Emacs instances via CF_UNICODETEXT works, so I assume other
applications that support CF_UNICODETEXT would work, too.  No
automatic conversion by Windows, though.


Benjamin Riefenstahl <address@hidden> writes:
>>> - Drop optimizations for ASCII-only text.

> "Eli Zaretskii" <address@hidden> writes:
>> Is that optimization indeed an optimization?

Getting data from the clipboard is indeed quite a bit faster with this
optimization.  Putting something on the clipboard doesn't benefit, but
that's probably because the detection of this case is inefficient: it
uses find_charset_in_text(), although the result is not really used.
So that can probably be made better, too.  I'll try to get this
integrated in the next version of the patch.
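
As a rough idea of what a cheaper detection could look like, here is a
sketch of a plain byte scan.  It assumes that any non-ASCII character
in the text contains at least one byte >= 0x80, so a buffer without
such bytes is ASCII-only.  The name is_ascii_only is hypothetical, and
this is not necessarily what the patch will do.

```c
#include <stddef.h>

/* Return 1 if the LEN bytes at BUF are all below 0x80, else 0.
   Hypothetical helper; assumes non-ASCII text always contains at
   least one byte >= 0x80.  */
static int
is_ascii_only (const unsigned char *buf, size_t len)
{
  size_t i;

  for (i = 0; i < len; i++)
    if (buf[i] >= 0x80)
      return 0;
  return 1;
}
```

The scan stops at the first non-ASCII byte, so it costs at most one
pass over the text and collects no charset information that would be
thrown away afterwards.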


benny





