Re: Unicode, ports and encoding
From: Ludovic Courtès
Subject: Re: Unicode, ports and encoding
Date: Tue, 17 Feb 2009 22:54:36 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.0.90 (gnu/linux)
Hello!
Mike Gran <address@hidden> writes:
> 1. To move to a Unicode-enabled guile, text information needs to be
> converted to an internal representation when read and converted
> back to the locale when written. Most reading and writing for
> ports passes through scm_getc (input) and scm_lfwrite (output).
> Conversion between locale strings and internal strings should
> happen there.
One strategy could be to have a new C port API, e.g., roughly based on
R6RS', with transcoders and all, and somehow arrange to have the current
port "API" mapped to that new shiny API. It might be a bit ambitious,
though.
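
To make the idea concrete, here is a minimal sketch in Python (not Guile code) of a port that carries its own transcoder: bytes are decoded as they are read and encoded as they are written. The class `TranscodedPort` and its methods `getc` and `write` are hypothetical names chosen for illustration; they only stand in for the roles that scm_getc and scm_lfwrite play above.

```python
import codecs
import io


class TranscodedPort:
    """Hypothetical port wrapper: bytes on the outside, Unicode text inside."""

    def __init__(self, binary_stream, encoding="utf-8", errors="strict"):
        self._stream = binary_stream
        # Incremental codecs keep state, so a multi-byte character can
        # arrive one byte at a time.
        self._decoder = codecs.getincrementaldecoder(encoding)(errors)
        self._encoder = codecs.getincrementalencoder(encoding)(errors)
        self._pending = ""  # decoded characters not yet handed out

    def getc(self):
        """Return the next decoded character, or None at end of input."""
        while not self._pending:
            byte = self._stream.read(1)
            if not byte:
                # End of input: flush the decoder. In strict mode this
                # raises if the stream ends in the middle of a character.
                self._pending = self._decoder.decode(b"", final=True)
                if not self._pending:
                    return None
                break
            self._pending = self._decoder.decode(byte)
        ch, self._pending = self._pending[0], self._pending[1:]
        return ch

    def write(self, text):
        """Encode internal text into the port's external encoding."""
        self._stream.write(self._encoder.encode(text))


if __name__ == "__main__":
    port = TranscodedPort(io.BytesIO("λx".encode("utf-8")))
    print(port.getc(), port.getc(), port.getc())
```

The point of the sketch is only that callers see characters while the conversion stays inside the port; how the existing C port functions would map onto such a layer is left open.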
> This implies that a source code file should have syntax to
> indicate its own encoding, if it is not ASCII. Something akin to
> the <?xml encoding="utf-8"?> line in HTML files.
One could imagine special treatment of, say, the first 10 lines of a
file, with the ability to recognize Emacs file variables like
"-*- coding: utf-8 -*-" and to change the current port transcoder
accordingly, something like that.
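
As a rough illustration in Python (again, not Guile code), the scan over the first 10 lines could look like the sketch below. The function name `detect_coding`, the regular expression, and the fallback argument `port_encoding` are all assumptions made for this example; the fallback stands for whatever encoding the port already has.

```python
import codecs
import re

# Matches Emacs-style file variables such as "-*- coding: utf-8 -*-".
CODING_RE = re.compile(rb"-\*-.*?coding:\s*([-\w.]+).*?-\*-")


def detect_coding(raw, port_encoding, max_lines=10):
    """Return the encoding declared in the first `max_lines` lines of `raw`.

    `raw` is the undecoded file content. If no declaration is found, or
    the declared name is not a known codec, `port_encoding` is returned.
    """
    for line in raw.split(b"\n")[:max_lines]:
        match = CODING_RE.search(line)
        if match:
            name = match.group(1).decode("ascii")
            try:
                codecs.lookup(name)  # check that the codec exists
            except LookupError:
                return port_encoding
            return name
    return port_encoding


if __name__ == "__main__":
    source = b";; -*- coding: latin-1 -*-\n(display \"caf\xe9\")\n"
    encoding = detect_coding(source, port_encoding="utf-8")
    print(encoding, source.decode(encoding))
```

Scanning raw bytes works here because the declaration itself is plain ASCII; once the name is known, the port's transcoder can be switched before the rest of the file is decoded.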
By default, the encoding used by `read' would be determined by the
input port's transcoder.
> 3. The text encoding of a port needs to be associated with the port.
> R6RS has the idea of transcoders for ports that require
> conversion. It is daunting, but, having played with some ideas for a
> few weeks, it seems that at least a subset of the transcoder
> functionality needs to be implemented for this to make any sense.
Yes.
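
For reference, an R6RS transcoder bundles three things: a codec, an end-of-line style, and an error-handling mode (ignore, raise or replace). A possible subset could be sketched as follows in Python (not Guile code). The class `Transcoder`, its field names, and the restriction to the "none", "lf" and "crlf" line-ending styles are assumptions for this example, not a proposal for the actual API.

```python
from dataclasses import dataclass

# Map R6RS-style error-handling modes onto Python codec error handlers.
ERRORS = {"raise": "strict", "ignore": "ignore", "replace": "replace"}


@dataclass(frozen=True)
class Transcoder:
    """Hypothetical subset of an R6RS-like transcoder."""

    codec: str = "utf-8"
    eol_style: str = "none"         # subset: "none", "lf", "crlf"
    handling_mode: str = "replace"  # "ignore", "raise" or "replace"

    def decode(self, data):
        """External bytes -> internal text."""
        text = data.decode(self.codec, ERRORS[self.handling_mode])
        if self.eol_style != "none":
            # On input, CRLF and lone CR are read as a single linefeed.
            text = text.replace("\r\n", "\n").replace("\r", "\n")
        return text

    def encode(self, text):
        """Internal text -> external bytes."""
        if self.eol_style == "crlf":
            # On output, linefeeds are written in the port's style.
            text = text.replace("\n", "\r\n")
        return text.encode(self.codec, ERRORS[self.handling_mode])


if __name__ == "__main__":
    t = Transcoder("latin-1", "crlf", "raise")
    print(repr(t.decode(b"caf\xe9\r\n")), t.encode("ok\n"))
```

Keeping the three settings in one immutable value means a port only has to hold a single transcoder object, which can be swapped as a whole when the encoding changes.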
> I sent in my copyright assignment last week, so you should have it
> now.
Cool!
IIRC, the first step you suggested was the implementation of wide
string/char types. Did you also work on this?
Thanks,
Ludo'.