Re: Unicode support for the MS Windows clipboard

emacs-devel

From:	Benjamin Riefenstahl
Subject:	Re: Unicode support for the MS Windows clipboard
Date:	Fri, 28 May 2004 15:26:10 +0200
User-agent:	Gnus/5.1001 (Gnus v5.10.1) Emacs/21.3.50 (gnu/linux)

Hi all,

I have been doing some testing, research and thinking.  First here is
a quick response on a few points.  Next I'll do a new patch based on
my current ideas.

> "Eli Zaretskii" <address@hidden> writes:
>> Couldn't this be done without introducing Windows-specific options?

Jason Rumney <address@hidden> writes:
> Also, we should set (and read) CF_LOCALE when we are using CF_TEXT,
> to indicate the coding we have used.

Thanks for the pointer, researching locales actually led me to a
solution for deriving codepage properties (OEM vs "ANSI") via locales.

I think I have an algorithm that works.  It makes a few assumptions
about the coding system names and based on that it derives the
requested clipboard type automatically.

How does this sound:

- If selection-coding-system has the form /(.*-)?utf-16.*/, I assume
  CF_UNICODETEXT is wanted.

- If selection-coding-system has the form /cp[0-9]+.*/ or
  /windows-[0-9]+.*/, I derive the codepage from that.

    - Check if the codepage is identical to GetACP() or GetOEMCP().
      If it is, use CF_TEXT or CF_OEMTEXT accordingly. 

    - Else get a corresponding LCID (reverse mapping via
      EnumSystemLocales()) which has the codepage as OEM or "ANSI".  In
      this case we also need to set CF_LOCALE accordingly.

The last step takes a small performance hit, but the results can
easily be cached.
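
The caching could look roughly like the following sketch.  The cache
layout, cached_lcid() and the lookup callback are assumptions for
illustration only; fake_lookup() is a hard-coded stand-in for the real
(slow) locale enumeration, so that the example is self-contained.

```c
#include <stddef.h>

#define CACHE_SLOTS 8

/* Hypothetical signature for the slow reverse lookup from a codepage
   to an LCID.  Returns 0 if no locale uses the codepage.  */
typedef unsigned long (*lcid_lookup_fn) (unsigned codepage);

struct lcid_cache
{
  unsigned codepage[CACHE_SLOTS];
  unsigned long lcid[CACHE_SLOTS];
  size_t used;
};

/* Return the LCID for CODEPAGE, calling SLOW only on a cache miss.
   Once the table is full, further misses are simply not stored.  */
unsigned long
cached_lcid (struct lcid_cache *cache, unsigned codepage, lcid_lookup_fn slow)
{
  size_t i;
  unsigned long lcid;

  for (i = 0; i < cache->used; i++)
    if (cache->codepage[i] == codepage)
      return cache->lcid[i];

  lcid = slow (codepage);
  if (cache->used < CACHE_SLOTS)
    {
      cache->codepage[cache->used] = codepage;
      cache->lcid[cache->used] = lcid;
      cache->used++;
    }
  return lcid;
}

/* Stand-in for the real enumeration, for demonstration only: it knows
   a single mapping (codepage 1251 -> LCID 0x0419, Russian) and counts
   how often it is called.  */
int fake_lookup_calls;

unsigned long
fake_lookup (unsigned codepage)
{
  fake_lookup_calls++;
  return codepage == 1251 ? 0x0419UL : 0;
}
```

Since the set of installed locales doesn't normally change while
Emacs is running, a small fixed table like this is probably enough.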

I am also thinking of custom coding systems, like e.g. for doing
automatic remapping of private characters or locale specific
pre-/postprocessing.  This is why I am not completely comfortable with
hardcoding coding systems or using heuristics based on the coding
system symbol names.  If such concerns are completely misplaced,
please just tell me ;-).

Anyway, I have no problem with dictating the above naming conventions
for selection-coding-system for now.

Jason Rumney <address@hidden> writes:
> Andrew Innes always had the intention to make the clipboard work
> on-demand, the same way it does on X. So the memory would only be
> used if the clipboard text was actually pasted (and then only for
> the format the client wanted).

We could do that using WM_RENDERFORMAT.  But then we absolutely need a
valid HWND to get a target for that message.  I don't know anything
about the Emacs message loop and the windows that are available.  It
would probably be best to allocate a custom hidden window for this.
I'll postpone that idea for now and just assume that we don't use
Unicode on 9x/Me.

Jason Rumney <address@hidden> writes:
> Another thing worth considering, if we are making major changes to the
> clipboard code, is that Kenichi Handa pointed out some time ago that
> the encoding part of the X clipboard support is now done in Lisp
> (xselect.el). Windows could do this too.

At the moment this is done via {de,en}code_coding() and a couple of
friends.  Is that the same thing?

Benjamin Riefenstahl <address@hidden> writes:
>> Anyway, what happens to the MULE problem in this unified scenario?
>> Do all problems go away with unify-8859-on-{de,en}coding?

Jason Rumney <address@hidden> writes:
> What MULE problem?

Disjoint charsets leading to the introduction of unwanted characters
(similar to that SHIFT-JIS <-> Chinese confusion that you just
mentioned).  One of the last times this discussion came up, somebody
mentioned that this could still be a serious problem.

Jason Rumney <address@hidden> writes:
> The encoding of CF_UNICODETEXT does not vary, so utf-16-le (or maybe
> -be) is the only coding-system that is appropriate.

Actually at the moment that would be utf-16le-dos, not utf-16-le-dos.
The latter includes a BOM, which we really don't want here.  The
non-intuitive naming difference makes me wonder, though, whether this
is just an unintended inconsistency.  There are also currently
utf-16-le-with-signature-* and mule-utf-16-*.
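
For reference, this is roughly the byte layout I mean for
CF_UNICODETEXT: UTF-16 little-endian, no BOM, CR LF line ends and a
terminating 16-bit NUL.  The sketch below is plain C, not Emacs code;
encode_utf16le_dos() and put16() are made-up names, and the input is
assumed to be an array of Unicode code points with bare LF line ends.

```c
#include <stddef.h>
#include <stdint.h>

/* Append one 16-bit unit to OUT in little-endian byte order.
   Return 0 if fewer than two bytes of capacity are left.  */
static int
put16 (unsigned char *out, size_t cap, size_t *len, uint16_t unit)
{
  if (cap - *len < 2)
    return 0;
  out[(*len)++] = (unsigned char) (unit & 0xFF);
  out[(*len)++] = (unsigned char) (unit >> 8);
  return 1;
}

/* Encode N code points from CP into OUT (capacity CAP bytes) as
   UTF-16LE: no BOM, each LF expanded to CR LF, 16-bit NUL at the end.
   Return the number of bytes written, or 0 if OUT is too small or an
   input value is not a valid Unicode scalar value.  */
size_t
encode_utf16le_dos (const uint32_t *cp, size_t n,
                    unsigned char *out, size_t cap)
{
  size_t len = 0;
  size_t i;

  for (i = 0; i < n; i++)
    {
      uint32_t c = cp[i];
      int ok;

      if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;
      if (c == '\n')
        ok = put16 (out, cap, &len, '\r') && put16 (out, cap, &len, '\n');
      else if (c < 0x10000)
        ok = put16 (out, cap, &len, (uint16_t) c);
      else
        {
          /* Characters beyond the BMP become a surrogate pair.  */
          c -= 0x10000;
          ok = put16 (out, cap, &len, (uint16_t) (0xD800 | (c >> 10)))
               && put16 (out, cap, &len, (uint16_t) (0xDC00 | (c & 0x3FF)));
        }
      if (!ok)
        return 0;
    }
  return put16 (out, cap, &len, 0) ? len : 0;
}
```

A coding system that writes a signature would put the bytes FF FE in
front of this, and a receiving application that doesn't expect them
would see them as an extra character.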

> "Eli Zaretskii" <address@hidden> writes:
>> Also, AFAIK CF_UNICODETEXT _can_ be used on Windows 9x, as any
>> program like clipbrd.exe or ClipConvert will show you.

I tested Win95 and Win98SE.  On both systems, the clipboard viewer and
Notepad couldn't make use of CF_UNICODETEXT.  Cut-and-paste between
two Emacs instances via CF_UNICODETEXT works, so I assume other
applications that support CF_UNICODETEXT would work, too.  No
automatic conversion by Windows, though.

Benjamin Riefenstahl <address@hidden> writes:
>>> - Drop optimizations for ASCII-only text.

> "Eli Zaretskii" <address@hidden> writes:
>> Is that optimization indeed an optimization?

Getting data from the clipboard is indeed quite a bit faster with this
optimization.  Putting something on the clipboard doesn't benefit, but
that's probably because the detection of this case is inefficient: it
uses find_charset_in_text(), although the result is not really
used.  So that can probably be improved, too.  I'll try to get this
integrated in the next version of the patch.
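
A cheaper detection might be a plain byte scan like the sketch below.
This is only an illustration, not what the current code does:
ascii_only_p() is a made-up name, and it assumes a text representation
in which every non-ASCII character contains at least one byte with the
high bit set.

```c
#include <stddef.h>

/* Return 1 if the LEN bytes at TEXT are all 7-bit ASCII, else 0.
   The scan stops at the first byte with the high bit set.  */
int
ascii_only_p (const unsigned char *text, size_t len)
{
  size_t i;

  for (i = 0; i < len; i++)
    if (text[i] & 0x80)
      return 0;
  return 1;
}
```

If this returns 1, the encoding step could be skipped and only the
line end conversion would remain.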

benny
