RE: [pdf-devel] Proposal of API for the Encoded Text module
From: Leonard Rosenthol
Subject: RE: [pdf-devel] Proposal of API for the Encoded Text module
Date: Mon, 28 Jan 2008 06:48:19 -0800
It has been possible to encode PDF Names in host encodings, but if you
do so you can NOT expect them to work correctly. For example, some
producers tried to encode Japanese font names in SJIS, which could be
done (using the #-escape mechanism) but wouldn't be processed by a
conforming reader.
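
To make the SJIS case concrete, here is a minimal sketch in Python (used only for illustration; the helper names `decode_pdf_name` and `is_valid_utf8` are hypothetical and not part of any library API). It expands the #-escapes of a name and checks whether the resulting bytes form valid UTF-8. The sample names are assumed to hold the two characters U+65E5 U+672C ("Japan"), once in Shift-JIS and once in UTF-8.

```python
import re


def decode_pdf_name(raw: bytes) -> bytes:
    """Expand #xx escapes in a PDF name given without its leading slash."""
    return re.sub(
        rb"#([0-9A-Fa-f]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        raw,
    )


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The same two characters, escaped from Shift-JIS bytes and from UTF-8 bytes.
sjis_name = decode_pdf_name(b"#93#FA#96#7B")
utf8_name = decode_pdf_name(b"#E6#97#A5#E6#9C#AC")

print(is_valid_utf8(sjis_name))  # the Shift-JIS bytes are not valid UTF-8
print(is_valid_utf8(utf8_name))
```

The #-escape layer round-trips any byte sequence, so the SJIS name is syntactically fine; it is only the interpretation of the decoded bytes that a reader expecting UTF-8 cannot recover.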
ISO 32000 (the ISO version of PDF 1.7) does not provide compliance for
anything other than PDFDocEncoding and UTF-8.
Leonard
-----Original Message-----
From: Aleksander Morgado [mailto:address@hidden]
Sent: Monday, January 28, 2008 9:43 AM
To: Leonard Rosenthol
Cc: address@hidden
Subject: Re: [pdf-devel] Proposal of API for the Encoded Text module
> Just one other thing to remember is that PDF Names are either a subset
> of PDFDocEncoding _OR_ they are valid UTF-8 strings. (See PDFRef 1.7,
> 3.2.4).
>
> Leonard
In fact, I think that this is one of the reasons to have the UTF-8
built-in support in the library. I suppose that PDF Name and PDF String
types in the `object library' will be based on the pdf_text_t from the
`base library', which directly supports UTF-8.
Anyway, are you sure that these two encodings are the only ones allowed
for PDF Names? In older Acrobat versions PDF Names could be encoded in
specific `host encodings', like Shift-JIS or Big Five for Asian
languages (PDFRef 1.7, H.3).
If this is the case, how can we detect the encoding being used in the
PDF Name? For example, a PDF with a Japanese encoding for PDF names
which is read in a US-localized system... What the present text module
API provides so far is a function to detect the best encoding for a
given Unicode string, and not a function to detect the encoding being
used in a given multibyte string. Something like this could also be
needed, but I am not sure if this is possible to implement.
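
As a rough illustration of what such a detection function could look like, here is a heuristic sketch in Python (illustration only; `guess_encoding` and its candidate list are assumptions, not part of the text module API). It tries a list of candidate encodings in order and returns the first one under which the bytes decode strictly. This cannot be fully reliable, because the same byte sequence is often valid in several multibyte encodings, so the order of the candidates decides the answer.

```python
from typing import Optional, Sequence


def guess_encoding(
    data: bytes,
    candidates: Sequence[str] = ("utf-8", "shift_jis", "big5"),
) -> Optional[str]:
    """Return the first candidate encoding that decodes data without error.

    This is a heuristic: a successful decode only proves that the bytes
    are well-formed in that encoding, not that the producer used it.
    """
    for name in candidates:
        try:
            data.decode(name)  # strict decoding raises on malformed input
            return name
        except UnicodeDecodeError:
            continue
    return None


# UTF-8 is tried first because random multibyte text rarely happens to be
# well-formed UTF-8, which makes a successful UTF-8 decode a strong signal.
print(guess_encoding(b"\xe6\x97\xa5\xe6\x9c\xac"))
print(guess_encoding(b"\x93\xfa\x96\x7b"))
```

A practical detector would probably need extra hints (the producer, the font's declared character collection, or statistical scoring) to choose between host encodings that all accept the same bytes.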
--
Aleksander