pdf-devel

Re: [pdf-devel] Text module unit tests failing


From: jemarch
Subject: Re: [pdf-devel] Text module unit tests failing
Date: Mon, 15 Sep 2008 20:39:04 +0200
User-agent: Wanderlust/2.14.0 (Africa) SEMI/1.14.6 (Maruoka) FLIM/1.14.8 (Shijō) APEL/10.6 Emacs/23.0.60 (i686-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO)

Hi Aleks.

   >     The problem here is that we need to know the user's lang/country info 
   >     plus the user's encoding, 
   > 
   > Ok, but why?

   Lang/Country information is used right now in some Unicode
   uppercase/lowercase conversions with special rules, mainly for Turkish
   and Azeri. It can also be used to create 'PDF strings' encoded in
   UTF-16BE with embedded language and country information.

   User's encoding is used in some text module functions, like
   `pdf_text_get_host()' and `pdf_text_set_host()', to manipulate
   pdf_text_t variables based on the specific user's encoding.

You can get the country code for the C/POSIX locale using:

  $ LC_ALL=C locale territory
  ISO

and the encoding:

  $ LC_ALL=C locale charmap
  ANSI_X3.4-1968

that is, ASCII (the US variant of ISO 646).

The language is the empty string:

  $ LC_ALL=C locale language

So, to fix the problem, it would suffice to set both the country and
the language to the empty string ("") when the text module works with
the C/POSIX locale (the existing code assumes that the country is a
2-letter code, so we cannot store "ISO" in it).

You can find a patch doing just that at the end of this email. Please
review it. If you agree, I will install it in the trunk.

   > Also, the locale very often does not specify encoding info.  It's far
   > more common to use "de_DE" than to use "de_DE.UTF-8" or whatever.  What
   > do you do in that case, when the information is partial?

   Well, right now if that whole information is not available, the text
   initialization function completely fails :-|

Fortunately that is not true :)

You are calling gnulib's 'locale_charset' in order to get the charset
associated with the current locale, and it does The Right
Thing(TM), calling nl_langinfo.

For example, the default encoding for the "en_GB" locale is
"ISO-8859-1", even though the name does not specify it, and
'locale_charset' will return that encoding.

For the default C/POSIX locale, locale_charset returns ASCII, which
should be just fine.

   > The important thing is what the software does for users (either
   > end-users or library users).  Whether tests fail or not is basically
   > inconsequential.  Of course if you're testing some lang/enc-specific
   > thing, it can't work in C/POSIX.  So if the test reports failure, that
   > is expected, and therefore the test was successful :).  Just fix the
   > test framework to take care of that.

   Well, that's another point. Right now we're getting the locale equal to
   'C' in the testing framework, even if the user's locale is fully
   configured with lang, country and encoding. That should be changed.
   Jose, did you talk to gnulib maintainers about this?

It is not a problem. In fact, we should explicitly set the desired
locale (using setlocale) in each test case: the text module behaves
differently depending on the encoding, so our unit tests for the text
module are currently non-deterministic, depending on the locale
configured on the user's machine (which may be a "klingorian-alien"
locale made with localedef :'D).

I suggest explicitly setting the locale to C/POSIX in the text module
unit tests (rather than relying on maint.mk to do it), except when
testing specific features that need another locale (such as the
Turkish uppercase/lowercase and comparison scenario).


=== modified file 'src/base/pdf-text-context.c'
--- src/base/pdf-text-context.c 2008-09-09 01:41:01 +0000
+++ src/base/pdf-text-context.c 2008-09-15 18:18:54 +0000
@@ -1,4 +1,4 @@
-/* -*- mode: C -*- Time-stamp: "08/09/09 03:26:05 jemarch"
+/* -*- mode: C -*- Time-stamp: "08/09/15 20:18:54 jemarch"
  *
  *       File:         pdf-text-context.c
  *       Date:         Fri Feb 25 23:58:56 2008
@@ -131,23 +131,27 @@
   
   /* Get system default locale name and check it */
   locale_name = gl_locale_name(LC_CTYPE, "LC_CTYPE");
-  if((locale_name == NULL) || \
-     (strlen(locale_name) < 2))
+  if (locale_name == NULL)
     {
       PDF_DEBUG_BASE("Invalid locale info detected! '%s'",
                      ((locale_name!=NULL) ? locale_name : "null"));
       return PDF_ETEXTENC;
     }
 
-  /* Store language ID */
-  strncpy((char *)&(text_context.host_language_id[0]), locale_name,
-          PDF_TEXT_HLL-1);
-  /* If available, store country ID */
-  if((strlen(locale_name) >= 5) && \
-     (locale_name[2] == '_'))
+  if ((strcmp (locale_name, "C") != 0) &&
+      (strcmp (locale_name, "POSIX") != 0))
     {
-      strncpy((char *)&(text_context.host_country_id[0]), &locale_name[3],
+      /* Store language ID */
+      strncpy((char *)&(text_context.host_language_id[0]), locale_name,
               PDF_TEXT_HLL-1);
+
+      /* If available, store country ID */
+      if((strlen(locale_name) >= 5) &&          \
+         (locale_name[2] == '_'))
+        {
+          strncpy((char *)&(text_context.host_country_id[0]), &locale_name[3],
+                  PDF_TEXT_HLL-1);
+        }
     }
 
   PDF_DEBUG_BASE("TextContext: Locale name is '%s'", locale_name);



