RE: [pdf-devel] Proposal of API for the Encoded Text module
From: Leonard Rosenthol
Subject: RE: [pdf-devel] Proposal of API for the Encoded Text module
Date: Mon, 28 Jan 2008 05:28:57 -0800
One other thing to remember: PDF Names are either a subset of
PDFDocEncoding _OR_ valid UTF-8 strings (see PDFRef 1.7, 3.2.4).
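To illustrate the UTF-8 half of that rule, here is a minimal sketch of a
well-formedness check over the raw bytes of a name. The function name
name_is_valid_utf8 is hypothetical and not part of the proposed API; it
only shows one way such a check could look.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of the proposed API: returns true when
   buf[0..len) is well-formed UTF-8 (no overlong forms, no surrogates,
   nothing above U+10FFFF). An empty buffer counts as valid. */
bool
name_is_valid_utf8 (const unsigned char *buf, size_t len)
{
  size_t i = 0;

  while (i < len)
    {
      unsigned char c = buf[i];
      size_t extra;       /* continuation bytes expected after the lead */
      unsigned long cp;   /* code point decoded so far */
      unsigned long min;  /* smallest code point legal for this length */

      if (c < 0x80)
        { extra = 0; cp = c; min = 0; }
      else if ((c & 0xE0) == 0xC0)
        { extra = 1; cp = c & 0x1F; min = 0x80; }
      else if ((c & 0xF0) == 0xE0)
        { extra = 2; cp = c & 0x0F; min = 0x800; }
      else if ((c & 0xF8) == 0xF0)
        { extra = 3; cp = c & 0x07; min = 0x10000; }
      else
        return false;     /* stray continuation or invalid lead byte */

      if (extra > len - i - 1)
        return false;     /* sequence truncated by end of buffer */

      for (size_t k = 1; k <= extra; k++)
        {
          if ((buf[i + k] & 0xC0) != 0x80)
            return false; /* not a continuation byte */
          cp = (cp << 6) | (buf[i + k] & 0x3F);
        }

      /* Reject overlong encodings, out-of-range values and surrogates. */
      if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return false;

      i += extra + 1;
    }

  return true;
}
```

A name that fails this check would then have to be treated under the
PDFDocEncoding case instead.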
Leonard
-----Original Message-----
From: address@hidden
[mailto:address@hidden On Behalf Of
Aleksander Morgado
Sent: Monday, January 28, 2008 7:30 AM
To: address@hidden
Subject: [pdf-devel] Proposal of API for the Encoded Text module
Hi all,
Find attached my changes to the proposed API for the Encoded Text
module. It's a diff to the gnupdf.texi file.
Some comments on the changes:
- Host encoding management will probably need a second round. We need
to determine clearly which OSes lack iconv support (excluding Windows
OSes, which have their own way of handling host encodings).
- Regarding the `best encoding' for a given character string, I really
think UTF-8 could be the default best encoding on every OS that supports
iconv (GNU/Linux, Unix, Mac OS X...) and even on Windows (AFAIK, UTF-8
is available in all modern Windows versions... should we support older
versions?). UTF-8 is, in fact, one of the encoding conversions that will
be built into the library.
- Maybe we should decide now on the full list of supported OSes (and OS
versions, if needed), rather than working it out during the development
phase. This will help not only to identify the OS-dependent functions
needed for host encoding support (which OSes don't handle UTF-8, for
example), but also to identify platforms lacking some required feature
(e.g. 64-bit integers, discussed in another thread). I could start a new
Wiki page on this issue.
- All the functions involving encoding conversion (even those which
create and initialize a new text object) return the status of the
conversion, which should always be checked.
- I renamed functions involving PDF Strings
(pdf_text_new_from_pdf_string) and PDF Doc Encoding
(pdf_text_get_pdfdocenc, pdf_text_set_pdfdocenc), so that it is clear
which one is being considered (PDF Strings can be in PDF Doc Encoding or
in UTF-16BE with BOM).
- pdf_text_concat won't work on text objects with different
country/language codes, so the status returned by this function should
always be checked.
- In addition to the `best encoding', another function is given to get
the default host encoding configured in the user's locale
(pdf_text_get_host_encoding).
- Finally, a new function, pdf_text_init, `initializes' the text module.
It must be called at program startup and is not thread-safe. It will be
used to load the user's locale information.
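As a rough illustration of the last two points, this is how the locale
could be loaded and the host encoding queried on a POSIX system. It is
only a sketch: example_text_init and example_host_encoding are
hypothetical names, not the proposed pdf_text_init and
pdf_text_get_host_encoding, and nl_langinfo is not available on Windows,
which would need its own code path.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical stand-in for an init function: adopt the user's locale
   from the environment. Returns 0 on success, -1 if the requested
   locale cannot be honoured. setlocale changes process-wide state, so
   this is meant to be called once, at startup, before any other thread
   exists. */
int
example_text_init (void)
{
  return (setlocale (LC_ALL, "") != NULL) ? 0 : -1;
}

/* Hypothetical stand-in for a host-encoding query: returns the name of
   the character encoding of the current locale. The returned string
   belongs to the C library and may be overwritten by later calls. */
const char *
example_host_encoding (void)
{
  return nl_langinfo (CODESET);
}
```

The exact encoding name is platform-dependent, so a caller would
typically pass it straight to iconv_open rather than compare it against
fixed strings.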
Additional comments are welcome,
--
Aleksander