Re: [bug-libunistring] Changing the appearance of escapes
From: Bruno Haible
Subject: Re: [bug-libunistring] Changing the appearance of escapes
Date: Thu, 16 Sep 2010 22:39:32 +0200
User-agent: KMail/1.9.9
Hi Ludo,
> Now to actually design and implement something along these lines...
The way I recommend to do it is:
- For ports with an input direction, store in the port an iconv_t descriptor
from the given encoding to UTF-8. Similarly, for ports with an output
direction, store in it an iconv_t descriptor from UTF-8 to the encoding.
(Why UTF-8 and not UTF-32 = UCS-4? Because on all platforms you can convert
between UTF-8 and any other encoding, but not every platform can convert
between UTF-32 and any other encoding. Solaris, for example, cannot.)
- In the input direction you'll also need a small buffer (up to 6 bytes or so)
for bytes that have already been read from the stream but not yet converted
to characters. Alongside it, you'll also need a character or a flag bit that
is used to implement the CRLF -> LF conversion.
- The most tricky thing is to handle all possible errors and return values
from iconv() correctly.
- In the output direction, an iconv_t can produce a few bytes at the end
that you need to output before closing the stream. This is needed only if
you want to support stateful encodings such as CP1258, UTF-7, or UTF-16
(with BOM) at all. All encodings used by locales are stateless.
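
To make the points above concrete, here is a minimal sketch in C of the
pending-byte buffer, the iconv() error cases, and the final flush. The names
port_decoder, conv_status, decode_pending and converter_flush are made up for
illustration, and the 16-byte buffer size is an assumption. The sketch also
assumes an iconv() whose second parameter is char ** (as in glibc); some
platforms declare it const char ** instead. CRLF handling is left out.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-port input state; all names here are illustrative. */
struct port_decoder {
    iconv_t cd;         /* converts from the port's encoding to UTF-8 */
    char pending[16];   /* bytes already read but not yet converted */
    size_t npending;    /* number of valid bytes in pending[] */
};

enum conv_status {
    CONV_OK,            /* everything pending was converted */
    CONV_NEED_MORE,     /* EINVAL: incomplete sequence, read more bytes */
    CONV_INVALID,       /* EILSEQ: invalid byte sequence in the input */
    CONV_OUTPUT_FULL,   /* E2BIG: no room left in the output buffer */
    CONV_ERROR          /* any other failure */
};

/* Try to convert the pending bytes, appending UTF-8 at *out. */
static enum conv_status
decode_pending(struct port_decoder *d, char **out, size_t *outleft)
{
    char *in = d->pending;
    size_t inleft = d->npending;
    size_t r = iconv(d->cd, &in, &inleft, out, outleft);
    int err = errno;    /* save errno before calling anything else */

    /* Keep whatever iconv() did not consume at the front of the buffer. */
    memmove(d->pending, in, inleft);
    d->npending = inleft;

    if (r != (size_t)-1)
        return CONV_OK;
    switch (err) {
    case EINVAL: return CONV_NEED_MORE;
    case EILSEQ: return CONV_INVALID;
    case E2BIG:  return CONV_OUTPUT_FULL;
    default:     return CONV_ERROR;
    }
}

/* Before closing: emit any trailing bytes (e.g. a shift sequence) and
   reset the descriptor to its initial state. */
static enum conv_status
converter_flush(iconv_t cd, char **out, size_t *outleft)
{
    if (iconv(cd, NULL, NULL, out, outleft) != (size_t)-1)
        return CONV_OK;
    return errno == E2BIG ? CONV_OUTPUT_FULL : CONV_ERROR;
}
```

A caller would append each byte read from the stream to pending[], call
decode_pending(), and keep reading while it returns CONV_NEED_MORE. If the
buffer fills up and the result is still CONV_NEED_MORE, the input is best
treated as invalid. For an output port, converter_flush() would be called once
on the UTF-8-to-encoding descriptor before the stream is closed.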
Bruno