
[pdf-devel] Re: Comments to the Encoded Text API


From: Aleksander Morgado
Subject: [pdf-devel] Re: Comments to the Encoded Text API
Date: Sat, 19 Jan 2008 22:01:52 +0100

Hi again,

>    5. pdf_text_get_best_encoding function will need specific system
>    functions to get the range of unicode covered by each host encoding,
>    and if no such function is available in a given operating system, a
>    default unicode encoding will be returned.
>
> Remember that this function should return an encoding _actually
> supported_ by the host. If the host supports a Unicode encoding, then
> that will always be the best encoding available. If it does not, the
> function cannot return a Unicode encoding.
>
> I think it would be good to investigate the availability of the
> functions you need to determine the range covered by a given host
> encoding in Unix, GNU, Mac OS X and Windows (we need to determine the
> allowed values for pdf_text_host_encoding_t anyway; an email about
> this follows).

I don't really understand why someone would want to create a
pdf_text_t in a host encoding other than the one currently in use by
the user/system. Is this really needed? I am talking about functions
such as `pdf_text_new_from_host' and `pdf_text_get_best_encoding',
where a specific host encoding is passed to or returned by the
function. Shouldn't the function detect which host encoding the system
is using and just use it? AFAIK, the host encoding is only used to
receive strings from the user and send strings to the user (not to
store anything in the PDF file, at least not in the user's encoding),
and the user expects strings in a single host encoding, which can even
be detected just once. Am I right?

The easiest way to handle these host encoding conversions in GNU/Linux
is with the wchar_t type and the multi-byte functions. The problem is
that they cannot convert to/from encodings other than the one
specified in the user's locale. To get those other conversions, either
the locale would have to be changed at runtime (not a good idea) or
another utility such as GNU libiconv would have to be used explicitly.
Why not just detect the host encoding once, when the program starts,
and use that single encoding in all operations involving host
encodings (get/set)? That would let us use wchar_t and the multi-byte
functions, with no need to call iconv.
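
To make the locale-only idea concrete, here is a minimal sketch of
what I have in mind. The helper names (host_encoding_init,
host_to_wide) are made up for illustration and are not part of the
proposed API; only setlocale and mbstowcs are real library calls.
Calling mbstowcs with a NULL destination to get the required length is
POSIX behaviour, which I am assuming is available.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: adopt the user's locale once, at program start.
   Returns 0 on success, -1 if the environment names an unsupported
   locale (the program then stays in the default "C" locale). */
static int host_encoding_init(void)
{
    return setlocale(LC_CTYPE, "") != NULL ? 0 : -1;
}

/* Hypothetical helper: convert a string in the locale's encoding to a
   newly allocated wide string. Returns NULL on an invalid multi-byte
   sequence or on allocation failure. The caller frees the result. */
static wchar_t *host_to_wide(const char *src)
{
    assert(src != NULL);

    /* With a NULL destination, POSIX mbstowcs returns the number of
       wide characters needed, excluding the terminator. */
    size_t n = mbstowcs(NULL, src, 0);
    if (n == (size_t)-1)
        return NULL;

    wchar_t *dst = malloc((n + 1) * sizeof *dst);
    if (dst == NULL)
        return NULL;

    if (mbstowcs(dst, src, n + 1) == (size_t)-1) {
        free(dst);
        return NULL;
    }
    return dst;
}
```

Note that nothing here takes an encoding argument: the conversion
always follows whatever LC_CTYPE was set to at startup, which is
exactly the limitation (or simplification) I am describing.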

The approach given in the Encoded Text API is quite similar to the
way things are done in Windows, where you first have to ask for the
ANSI code page in use (GetACP) and then pass that identifier to the
MultiByteToWideChar or WideCharToMultiByte functions. The equivalent
two-step approach in GNU/Linux would use nl_langinfo (to get the name
of the encoding set in the locale) and GNU libiconv for the
conversions, so it's possible, but I am not sure it's really needed.
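
For comparison, the two-step approach on GNU/Linux could look roughly
like the sketch below. Again, host_codeset and to_utf8 are
hypothetical names of mine, and UTF-8 as the target is just an
assumption for the example; nl_langinfo, iconv_open, iconv and
iconv_close are the real calls. The output buffer size is a generous
guess for typical encodings, and running out of space is simply
treated as a failure.

```c
#include <assert.h>
#include <iconv.h>
#include <langinfo.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Step 1 (hypothetical helper): name of the encoding used by the
   current locale. Call setlocale(LC_CTYPE, "") first to pick up the
   user's settings; otherwise this reports the "C" locale's codeset. */
static const char *host_codeset(void)
{
    return nl_langinfo(CODESET);
}

/* Step 2 (hypothetical helper): convert src from the named encoding
   to a newly allocated, NUL-terminated UTF-8 string. Returns NULL on
   any failure. The caller frees the result. */
static char *to_utf8(const char *from, const char *src)
{
    assert(from != NULL && src != NULL);

    iconv_t cd = iconv_open("UTF-8", from);
    if (cd == (iconv_t)-1)
        return NULL;

    size_t in_left = strlen(src);
    size_t out_size = in_left * 4 + 1;   /* assumed upper bound */
    char *out = malloc(out_size);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *in_ptr = (char *)src;          /* iconv does not modify input */
    char *out_ptr = out;
    size_t out_left = out_size - 1;      /* keep room for the NUL */

    size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);
    if (rc == (size_t)-1) {
        free(out);
        return NULL;
    }

    *out_ptr = '\0';
    return out;
}
```

With this, to_utf8(host_codeset(), str) covers the user's own
encoding, while something like to_utf8("ISO-8859-1", str) covers an
encoding different from the locale's, which is the case the
wchar_t-only approach cannot handle.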

What do you think?

Regards,
Aleksander



