[no subject]

texinfo-commits

From:	Patrice Dumas
Date:	Fri, 21 Jul 2023 15:50:35 -0400 (EDT)
branch: master
commit 089f1e71ab0d646766536df72dbb75e18f1051c7
Author: Patrice Dumas <pertusus@free.fr>
AuthorDate: Fri Jul 21 16:53:11 2023 +0200

    Use any input encoding known by iconv in the XS parser
    
    * doc/texinfo.texi (@code{@@documentencoding}),
    tp/Texinfo/XS/parsetexi/api.c (reset_parser_except_conf),
    tp/Texinfo/XS/parsetexi/end_line.c (end_line_misc_line),
    tp/Texinfo/XS/parsetexi/input.c (set_input_encoding)
    (convert_to_utf8, encode_file_name, reset_encoding_list):
    use a list of encodings and iconv handlers based on the
    @documentencoding found in the manual.  Always set utf-8 at
    the first position in the list.  Use any encoding known by iconv
    instead of a fixed list of known encodings.  Remove enum
    character_encoding.
    
    * tp/Texinfo/ParserNonXS.pm (_end_line_misc_line),
    tp/Texinfo/XS/parsetexi/end_line.c (end_line_misc_line)
    <documentencoding>: trim non-ascii characters and keep only
    alphanumeric characters, - and _ in encoding names. iconv also seems
    to trim non alphanumeric non - _ characters.  Consider an encoding not
    handled by iconv or not found in perl parser during parsing to be
    unhandled.
    
    * tp/Texinfo/Common.pm (get_perl_encoding)
    (element_extra_encoding_for_perl), tp/Texinfo/ParserNonXS.pm
    (%parser_state_initialization, parse_texi_text, _input_push_file)
    (get_parser_info, parse_texi_file, _encode_file_name),
    tp/Texinfo/XS/parsetexi/Parsetexi.pm (get_parser_info): add
    get_perl_encoding function in Texinfo::Common to set
    {'info'}->{'input_perl_encoding'} based on XS parser code.  Use this
    function in both parsers.  In the perl parser, add get_parser_info
    function to call get_perl_encoding instead of doing it while parsing,
    and call get_parser_info in parse_* functions as required, similar to
    code in the XS parser.  Consider an encoding not found by perl to
    be unrecognized.
    
    * tp/Makefile.am (test_files), tp/Makefile.tres,
    tp/t/08misc_commands.t (documentencoding_zero), tp/t/info_tests.t
    (chinese_mixed_with_en_EUC_CN),
    tp/t/input_files/chinese_mixed_with_en_EUC_CN.texi: two new tests
    related to encodings, one using EUC-CN, an ASCII-compatible encoding
    that was not available before in the XS parser.
---
 ChangeLog                                          |  43 +++++
 NEWS                                               |   4 +
 doc/texinfo.texi                                   |   6 +-
 tp/Makefile.am                                     |   1 +
 tp/Makefile.tres                                   |   3 +
 tp/TODO                                            |   9 --
 tp/Texinfo/Common.pm                               |  28 +++-
 tp/Texinfo/ParserNonXS.pm                          | 100 ++++++++----
 tp/Texinfo/XS/parsetexi/Parsetexi.pm               |  32 ++--
 tp/Texinfo/XS/parsetexi/api.c                      |   4 +
 tp/Texinfo/XS/parsetexi/end_line.c                 | 173 +++++++++++++--------
 tp/Texinfo/XS/parsetexi/input.c                    | 167 ++++++++++----------
 tp/Texinfo/XS/parsetexi/input.h                    |   3 +-
 tp/t/08misc_commands.t                             |   3 +
 tp/t/info_tests.t                                  |   7 +-
 tp/t/input_files/chinese_mixed_with_en.texi        |   2 +-
 ...h_en.texi => chinese_mixed_with_en_EUC_CN.texi} |  20 +--
 tp/t/input_files/sample_EUC_CN.texi                |   4 +
 .../macro_and_commands_in_early_commands.pl        |  12 +-
 .../chinese_mixed_with_en_EUC_CN.pl}               |  66 ++++----
 .../res_info/chinese_mixed_with_en_EUC_CN.info     |  57 +++++++
 tp/t/results/info_tests/unknown_encoding.pl        |   4 +-
 .../macro/macro_in_invalid_documentencoding.pl     |   4 +-
 .../results/misc_commands/documentencoding_zero.pl |  82 ++++++++++
 .../misc_commands/invalid_documentencoding.pl      |  38 ++---
 tp/t/results/misc_commands/many_lines.pl           |   4 +-
 .../plaintext_tests/chinese_mixed_with_en.pl       |   8 +-
 .../res_plaintext/chinese_mixed_with_en.txt        |   4 +-
 .../value/value_in_invalid_documentencoding.pl     |   4 +-
 29 files changed, 579 insertions(+), 313 deletions(-)
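The encoding-name normalization described in the commit message (lower-casing, dropping non-ASCII bytes, keeping only alphanumerics, `-` and `_`) can be sketched in isolation as follows. This is a minimal illustration, not the code from the commit: `normalize_encoding_name` is a hypothetical helper name, whereas the patch performs the equivalent steps inline in `end_line_misc_line`. The sketch also compares against explicit ASCII ranges instead of calling `isalnum`/`tolower`, which is an assumption made here to keep the result independent of the current locale.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of the commit): return a newly allocated,
   normalized copy of NAME, or NULL if NAME contains no ASCII alphanumeric
   character (a "bad encoding name") or if allocation fails.
   The caller frees the result. */
char *
normalize_encoding_name (const char *name)
{
  char *result, *q;
  const char *p;
  int possible_encoding = 0;

  assert (name != NULL);

  /* The normalized name is never longer than the input. */
  result = malloc (strlen (name) + 1);
  if (!result)
    return NULL;

  q = result;
  for (p = name; *p; p++)
    {
      unsigned char c = (unsigned char) *p;

      if (c > 0x7f)
        continue;                       /* drop non-ASCII bytes */

      if ((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9'))
        {
          possible_encoding = 1;
          *q++ = (char) c;
        }
      else if (c >= 'A' && c <= 'Z')
        {
          possible_encoding = 1;
          *q++ = (char) (c - 'A' + 'a'); /* lower-case ASCII letters */
        }
      else if (c == '_' || c == '-')
        *q++ = (char) c;                 /* kept, but not enough alone */
      /* any other ASCII character is dropped */
    }
  *q = '\0';

  if (!possible_encoding)
    {
      free (result);
      return NULL;
    }
  return result;
}
```

Under these assumptions, `EUC-CN` normalizes to `euc-cn`, while a name made only of `-` and `_` yields NULL, which corresponds to the "bad encoding name" warning in the patch.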

diff --git a/ChangeLog b/ChangeLog
index 4eacfc722a..92dd33fca1 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,46 @@
+2023-07-21  Patrice Dumas  <pertusus@free.fr>
+
+       Use any input encoding known by iconv in the XS parser 
+
+       * doc/texinfo.texi (@code{@@documentencoding}),
+       tp/Texinfo/XS/parsetexi/api.c (reset_parser_except_conf),
+       tp/Texinfo/XS/parsetexi/end_line.c (end_line_misc_line),
+       tp/Texinfo/XS/parsetexi/input.c (set_input_encoding)
+       (convert_to_utf8, encode_file_name, reset_encoding_list):
+       use a list of encodings and iconv handlers based on the
+       @documentencoding found in the manual.  Always set utf-8 at
+       the first position in the list.  Use any encoding known by iconv
+       instead of a fixed list of known encodings.  Remove enum
+       character_encoding.
+
+       * tp/Texinfo/ParserNonXS.pm (_end_line_misc_line),
+       tp/Texinfo/XS/parsetexi/end_line.c (end_line_misc_line)
+       <documentencoding>: trim non-ascii characters and keep only
+       alphanumeric characters, - and _ in encoding names. iconv also seems
+       to trim non alphanumeric non - _ characters.  Consider an encoding not
+       handled by iconv or not found in perl parser during parsing to be
+       unhandled.
+
+       * tp/Texinfo/Common.pm (get_perl_encoding)
+       (element_extra_encoding_for_perl), tp/Texinfo/ParserNonXS.pm
+       (%parser_state_initialization, parse_texi_text, _input_push_file)
+       (get_parser_info, parse_texi_file, _encode_file_name),
+       tp/Texinfo/XS/parsetexi/Parsetexi.pm (get_parser_info): add
+       get_perl_encoding function in Texinfo::Common to set
+       {'info'}->{'input_perl_encoding'} based on XS parser code.  Use this
+       function in both parsers.  In the perl parser, add get_parser_info
+       function to call get_perl_encoding instead of doing it while parsing,
+       and call get_parser_info in parse_* functions as required, similar to
+       code in the XS parser.  Consider an encoding not found by perl to
+       be unrecognized.
+
+       * tp/Makefile.am (test_files), tp/Makefile.tres,
+       tp/t/08misc_commands.t (documentencoding_zero), tp/t/info_tests.t
+       (chinese_mixed_with_en_EUC_CN),
+       tp/t/input_files/chinese_mixed_with_en_EUC_CN.texi: two new tests
+       related to encodings, one using EUC-CN, an ASCII-compatible encoding
+       that was not available before in the XS parser.
+
 2023-07-20  Patrice Dumas  <pertusus@free.fr>
 
        * tp/t/65linemacro.t (end_conditional_in_linemacro)
diff --git a/NEWS b/NEWS
index 1bc0aa9092..4c40f41942 100644
--- a/NEWS
+++ b/NEWS
@@ -9,6 +9,10 @@ See the manual for detailed information.
 
 ------------------------------------------------------------------------------
 
+* texi2any
+ . use any input encoding known by iconv and Perl in the XS (default) parser.
+
+
 * Language
  . new generic definition commands, @defblock, @defline and @deftypeline,
    that do not automatically create index entries
diff --git a/doc/texinfo.texi b/doc/texinfo.texi
index e3c9a14aa0..51de13cb2d 100644
--- a/doc/texinfo.texi
+++ b/doc/texinfo.texi
@@ -12535,7 +12535,11 @@ if your document encoding is not the default encoding.
 
 UTF-8 should always be the best choice for the encoding.
 Texinfo still supports additional encodings, mainly for compatibility with
-older manuals:
+older manuals@footnote{@command{texi2any} supports more encodings for Texinfo
+manuals, potentially all the encodings supported by both Perl and iconv
+(@pxref{Generic Charset Conversion,,, libc, The GNU C Library}).
+The support in output formats may be lacking, however, especially for @LaTeX{}
+output.}:
 
 @table @code
 @item US-ASCII
diff --git a/tp/Makefile.am b/tp/Makefile.am
index 04aaaefffb..bb4f1cd928 100644
--- a/tp/Makefile.am
+++ b/tp/Makefile.am
@@ -191,6 +191,7 @@ test_files = \
  t/input_files/char_latin2_latin2_in_refs.texi \
  t/input_files/character_and_spaces_in_refs_text.texi \
  t/input_files/chinese_mixed_with_en.texi \
+ t/input_files/chinese_mixed_with_en_EUC_CN.texi \
  t/input_files/complex_sectioning_case.texi \
  t/input_files/cond.texi \
  t/input_files/contents_at_document_begin.texi \
diff --git a/tp/Makefile.tres b/tp/Makefile.tres
index a8acb75e34..82ab6f2373 100644
--- a/tp/Makefile.tres
+++ b/tp/Makefile.tres
@@ -884,6 +884,8 @@ test_files_generated_list = $(test_tap_files_generated_list) \
   t/results/info_tests/anchor_in_command.pl \
   t/results/info_tests/before_node_and_section.pl \
   t/results/info_tests/center_flush.pl \
+  t/results/info_tests/chinese_mixed_with_en_EUC_CN.pl \
+  t/results/info_tests/chinese_mixed_with_en_EUC_CN/res_info \
   t/results/info_tests/colon_in_index_entry.pl \
   t/results/info_tests/colons_in_index_entries_and_node.pl \
   t/results/info_tests/colons_in_index_entries_and_node/res_info \
@@ -1422,6 +1424,7 @@ test_files_generated_list = $(test_tap_files_generated_list) \
   t/results/misc_commands/definfoenclose.pl \
   t/results/misc_commands/definfoenclose_nestings.pl \
   t/results/misc_commands/definfoenclose_with_empty_arg.pl \
+  t/results/misc_commands/documentencoding_zero.pl \
   t/results/misc_commands/double_exdent.pl \
   t/results/misc_commands/empty_center.pl \
   t/results/misc_commands/empty_center_with_arg.pl \
diff --git a/tp/TODO b/tp/TODO
index d5f9e0bdc1..bba708b5c5 100644
--- a/tp/TODO
+++ b/tp/TODO
@@ -46,15 +46,6 @@ For converter writers,
 Delayed bugs
 ============
 
-Change the implementation, in the XS parser, of encoding support.  Instead of a
-fixed list of supported encodings, which all have their conversion initialized,
-the input encoding actually used could simply be used to initialize iconv (for
-instance when encountering @documentencoding), using a list of initialized
-conversion (which would only have two elements, one for utf-8 and one for
-another documentencoding in general as there is no more than one
-documentencoding in most manuals).  See input.c set_input_encoding and
-convert_to_utf8.
-
 See message/thread from Reißner Ernst: Feature request: api docs
 
 hyphenation: should only appear in toplevel.
diff --git a/tp/Texinfo/Common.pm b/tp/Texinfo/Common.pm
index 90a4ca44c8..c416a12867 100644
--- a/tp/Texinfo/Common.pm
+++ b/tp/Texinfo/Common.pm
@@ -923,6 +923,32 @@ sub _add_preamble_before_content($)
   unshift (@{$before_node_section->{'contents'}}, @first_types);
 }
 
+sub get_perl_encoding($$$)
+{
+  my $commands_info = shift;
+  my $registrar = shift;
+  my $configuration_information = shift;
+
+  my $result;
+  if (defined($commands_info->{'documentencoding'})) {
+    foreach my $element (@{$commands_info->{'documentencoding'}}) {
+      my $perl_encoding = element_extra_encoding_for_perl($element);
+      if (!defined($perl_encoding)) {
+        my $encoding = $element->{'extra'}->{'input_encoding_name'}
+          if ($element->{'extra'});
+        if (defined($encoding)) {
+          $registrar->line_warn($configuration_information,
+                     sprintf(__("unrecognized encoding name `%s'"), $encoding),
+                                          $element->{'source_info'});
+        }
+      } else {
+        $result = $perl_encoding;
+      }
+    }
+  }
+  return $result;
+}
+
 # for Parser and main program
 sub warn_unknown_language($) {
   my $lang = shift;
@@ -1232,7 +1258,7 @@ sub element_extra_encoding_for_perl($)
   my $encoding = $element->{'extra'}->{'input_encoding_name'}
     if ($element->{'extra'});
 
-  if ($encoding) {
+  if (defined($encoding) and $encoding ne '') {
     my $Encode_encoding_object = Encode::find_encoding($encoding);
     if (defined($Encode_encoding_object)) {
       $perl_encoding = $Encode_encoding_object->name();
diff --git a/tp/Texinfo/ParserNonXS.pm b/tp/Texinfo/ParserNonXS.pm
index ce3685074c..57f6511fa8 100644
--- a/tp/Texinfo/ParserNonXS.pm
+++ b/tp/Texinfo/ParserNonXS.pm
@@ -147,6 +147,8 @@ my %parser_state_initialization = (
   'merged_indices' => {},     # the key is merged in the value
   'sections_level' => 0,      # modified by raise/lowersections
   'targets' => [],            # array of elements used to build 'labels'
+  'input_file_encoding' => 'utf-8', # perl encoding name used for the input
+                                    # file
   # initialization of information returned by global_information()
   'info' => {
     'input_encoding_name' => 'utf-8',
@@ -757,6 +759,8 @@ sub parse_texi_piece($$;$)
      = _setup_document_root_and_before_node_section();
   my $tree = $self->_parse_texi($document_root, $before_node_section);
 
+  get_parser_info($self);
+
   return $tree;
 }
 
@@ -789,7 +793,11 @@ sub parse_texi_text($$;$)
 
   _input_push_text($self, $text, $line_nr);
 
-  return $self->_parse_texi_document();
+  my $tree = $self->_parse_texi_document();
+
+  get_parser_info($self);
+
+  return $tree;
 }
 
 # $INPUT_FILE_PATH the name of the opened file should be a binary string.
@@ -803,8 +811,8 @@ sub _input_push_file
     return 0, undef, undef, $!;
   }
 
-  if (defined($self->{'info'}->{'input_perl_encoding'})) {
-    if ($self->{'info'}->{'input_perl_encoding'} eq 'utf-8') {
+  if (defined($self->{'input_file_encoding'})) {
+    if ($self->{'input_file_encoding'} eq 'utf-8') {
       binmode($filehandle, ":utf8");
       # Use :utf8 instead of :encoding(utf-8), as the latter does
       # error checking and has (unreliably) led to fatal errors
@@ -813,7 +821,7 @@ sub _input_push_file
       # Evidently Perl is checking ahead in the file.
     } else {
       binmode($filehandle,
-              ":encoding($self->{'info'}->{'input_perl_encoding'})");
+              ":encoding($self->{'input_file_encoding'})");
     }
   }
   my ($file_name, $directories, $suffix) = fileparse($input_file_path);
@@ -837,6 +845,17 @@ sub _input_push_file
   return 1, $file_name, $directories, undef;
 }
 
+sub get_parser_info($)
+{
+  my $self = shift;
+
+  my $perl_encoding
+    = Texinfo::Common::get_perl_encoding($self->{'commands_info'},
+                                         $self->{'registrar'}, $self);
+  $self->{'info'}->{'input_perl_encoding'} = $perl_encoding
+     if (defined($perl_encoding));
+}
+
 # parse a texi file
 # $INPUT_FILE_PATH is the name of the parsed file and should be a binary string.
 sub parse_texi_file($$)
@@ -862,7 +881,10 @@ sub parse_texi_file($$)
   $self->{'info'}->{'input_file_name'} = $file_name;
   $self->{'info'}->{'input_directory'} = $directories;
 
-  return $self->_parse_texi_document();
+  my $tree = $self->_parse_texi_document();
+  get_parser_info($self);
+
+  return $tree;
 }
 
 sub _parse_texi_document($)
@@ -2285,7 +2307,7 @@ sub _encode_file_name($$)
   if ($input_file_name_encoding) {
     $encoding = $input_file_name_encoding;
   } elsif ($self->get_conf('DOC_ENCODING_FOR_INPUT_FILE_NAME')) {
-    $encoding = $self->{'info'}->{'input_perl_encoding'};
+    $encoding = $self->{'input_file_encoding'};
   } else {
     $encoding = $self->get_conf('LOCALE_ENCODING');
   }
@@ -3517,39 +3539,49 @@ sub _end_line_misc_line($$$)
                         = $self->{'info'}->{'input_encoding_name'}
           if defined $self->{'info'}->{'input_encoding_name'};
       } elsif ($command eq 'documentencoding') {
+        # lower case, trim non-ascii characters and keep only alphanumeric
+        # characters, - and _.  iconv also seems to trim non alphanumeric
+        # non - _ characters
+        my $normalized_text = lc($text);
+        $normalized_text =~ s/[^[:alnum:]_\-]//;
 
-        # Warn if the encoding is not one of the encodings supported as an
-        # argument to @documentencoding, documented in Texinfo manual
-        unless ($canonical_texinfo_encodings{lc($text)}) {
+        if ($normalized_text !~ /[[:alnum:]]/) {
           $self->_command_warn($current, $source_info,
-                   __("encoding `%s' is not a canonical texinfo encoding"),
-                               $text)
-        }
+                               __("bad encoding name `%s'"), $text);
+        } else {
+          # Warn if the encoding is not one of the encodings supported as an
+          # argument to @documentencoding, documented in Texinfo manual
+          unless ($canonical_texinfo_encodings{lc($text)}) {
+            $self->_command_warn($current, $source_info,
+                     __("encoding `%s' is not a canonical texinfo encoding"),
+                                 $text)
+          }
 
-        # Set $perl_encoding  -- an encoding name suitable for perl;
-        #     $input_encoding -- for output within an HTML file, used
-        #                        in most output formats
-        my ($perl_encoding, $input_encoding);
-        my $Encode_encoding_object = find_encoding($text);
-        if (defined($Encode_encoding_object)) {
-          $perl_encoding = $Encode_encoding_object->name();
-          # mime_name() is upper-case, our keys are lower case, set to lower case
-          $input_encoding = lc($Encode_encoding_object->mime_name());
-        }
+          # Set $perl_encoding  -- an encoding name suitable for perl;
+          #     $input_encoding -- for output within an HTML file, used
+          #                        in most output formats
+          my ($perl_encoding, $input_encoding);
+          my $Encode_encoding_object = find_encoding($normalized_text);
+          if (defined($Encode_encoding_object)) {
+            $perl_encoding = $Encode_encoding_object->name();
+            # mime_name() is upper-case, our keys are lower case, set to lower case
+            $input_encoding = lc($Encode_encoding_object->mime_name());
+          }
 
-        if ($input_encoding) {
-          $current->{'extra'}->{'input_encoding_name'} = $input_encoding;
-          $self->{'info'}->{'input_encoding_name'} = $input_encoding;
-        }
+          if (!$perl_encoding) {
+            $self->_command_warn($current, $source_info,
+                 __("unhandled encoding name `%s'"), $text);
+          } else {
+            if ($input_encoding) {
+              $current->{'extra'}->{'input_encoding_name'} = $input_encoding;
+              $self->{'info'}->{'input_encoding_name'} = $input_encoding;
+            }
 
-        if (!$perl_encoding) {
-          $self->_command_warn($current, $source_info,
-               __("unrecognized encoding name `%s'"), $text);
-        } else {
-          $self->{'info'}->{'input_perl_encoding'} = $perl_encoding;
-          foreach my $input (@{$self->{'input'}}) {
-            binmode($input->{'fh'}, ":encoding($perl_encoding)")
-              if ($input->{'fh'});
+            $self->{'input_file_encoding'} = $perl_encoding;
+            foreach my $input (@{$self->{'input'}}) {
+              binmode($input->{'fh'}, ":encoding($perl_encoding)")
+                if ($input->{'fh'});
+            }
           }
         }
       } elsif ($command eq 'documentlanguage') {
diff --git a/tp/Texinfo/XS/parsetexi/Parsetexi.pm b/tp/Texinfo/XS/parsetexi/Parsetexi.pm
index 60ed6e2087..63f9dad186 100644
--- a/tp/Texinfo/XS/parsetexi/Parsetexi.pm
+++ b/tp/Texinfo/XS/parsetexi/Parsetexi.pm
@@ -251,31 +251,17 @@ sub get_parser_info {
   $self->{'info'} = $GLOBAL_INFO;
   $self->{'commands_info'} = $GLOBAL_INFO2;
 
-  $self->{'info'}->{'input_perl_encoding'} = 'utf-8';
+  _set_errors_node_lists_labels_indices($self);
 
-  if (defined($self->{'commands_info'}->{'documentencoding'})) {
-    foreach my $element (@{$self->{'commands_info'}->{'documentencoding'}}) {
-      my $perl_encoding
-        = Texinfo::Common::element_extra_encoding_for_perl($element);
-      # Note that the following condition cannot happen as long as
-      # the encodings handled in the XS parser are all known by perl.
-      if (!$perl_encoding) {
-        my $encoding = $element->{'extra'}->{'input_encoding_name'}
-          if ($element->{'extra'});
-        if ($encoding) {
-          my ($registrar, $configuration_information)
-            = _get_error_registrar($self);
-          $registrar->line_warn($configuration_information,
-                     sprintf(__("unrecognized encoding name `%s'"), $encoding),
-                                          $element->{'source_info'});
-        }
-      } else {
-        $self->{'info'}->{'input_perl_encoding'} = $perl_encoding;
-      }
-    }
-  }
+  my ($registrar, $configuration_information)
+     = _get_error_registrar($self);
 
-  _set_errors_node_lists_labels_indices($self);
+  $self->{'info'}->{'input_perl_encoding'} = 'utf-8';
+  my $perl_encoding
+    = Texinfo::Common::get_perl_encoding($self->{'commands_info'},
+                              $registrar, $configuration_information);
+  $self->{'info'}->{'input_perl_encoding'} = $perl_encoding
+     if (defined($perl_encoding));
 }
 
 sub parse_texi_file ($$)
diff --git a/tp/Texinfo/XS/parsetexi/api.c b/tp/Texinfo/XS/parsetexi/api.c
index d5b11e5e82..5bd600da9c 100644
--- a/tp/Texinfo/XS/parsetexi/api.c
+++ b/tp/Texinfo/XS/parsetexi/api.c
@@ -140,6 +140,10 @@ reset_parser_except_conf (void)
   reset_floats ();
   wipe_global_info ();
   set_input_encoding ("utf-8");
+  /* it is not totally obvious that it is better to reset the
+     list to avoid memory leaks rather than reuse the iconv
+     opened handlers */
+  reset_encoding_list ();
   reset_internal_xrefs ();
   reset_labels ();
   input_reset_input_stack ();
diff --git a/tp/Texinfo/XS/parsetexi/end_line.c b/tp/Texinfo/XS/parsetexi/end_line.c
index 143b579e4d..5fff51b73e 100644
--- a/tp/Texinfo/XS/parsetexi/end_line.c
+++ b/tp/Texinfo/XS/parsetexi/end_line.c
@@ -18,6 +18,7 @@
 #include <stdlib.h>
 #include <string.h>
 #include <ctype.h>
+#include <stdio.h>
 
 #include "parser.h"
 #include "debug.h"
@@ -1311,94 +1312,128 @@ end_line_misc_line (ELEMENT *current)
             }
           else if (current->cmd == CM_documentencoding)
             {
-              int i; char *p, *text2;
+              int i; char *p, *normalized_text, *q;
+              int encoding_set;
               char *input_encoding = 0;
+              int possible_encoding = 0;
+
+              normalized_text = strdup (text);
+              q = normalized_text;
+              /* lower case, trim non-ascii characters and keep only alphanumeric
+                 characters, - and _.  iconv also seems to trim non alphanumeric
+                 non - _ characters */
+              for (p = text; *p; p++)
+                {
+                  /* check if ascii */
+                  if ((*p & ~0x7f) == 0)
+                    {
+                      if (isalnum (*p))
+                        {
+                          possible_encoding = 1;
+                          *q = tolower (*p);
+                          q++;
+                        }
+                      else if (*p == '_' || *p == '-')
+                        {
+                          *q = *p;
+                          q++;
+                        }
+                    }
+                }
+              *q = '\0';
 
-              text2 = strdup (text);
-              for (p = text2; *p; p++)
-                *p = tolower (*p);
+              if (! possible_encoding)
+                command_warn (current, "bad encoding name `%s'",
+                              text);
+              else
+                {
 
             /* Warn if the encoding is not one of the encodings supported as an
                argument to @documentencoding, documented in Texinfo manual */
-              {
-                char *texinfo_encoding = 0;
-                static char *canonical_encodings[] = {
-                  "us-ascii", "utf-8", "iso-8859-1",
-                  "iso-8859-15","iso-8859-2","koi8-r", "koi8-u",
-                  0
-                };
-
-                for (i = 0; (canonical_encodings[i]); i++)
                   {
-                    if (!strcmp (text2, canonical_encodings[i]))
+                    char *texinfo_encoding = 0;
+                    static char *canonical_encodings[] = {
+                      "us-ascii", "utf-8", "iso-8859-1",
+                      "iso-8859-15","iso-8859-2","koi8-r", "koi8-u",
+                      0
+                    };
+                    char *text_lc;
+
+                    text_lc = strdup (text);
+                    for (p = text_lc; *p; p++)
+                      *p = tolower (*p);
+
+                    for (i = 0; (canonical_encodings[i]); i++)
                       {
-                        texinfo_encoding = canonical_encodings[i];
-                        break;
+                        if (!strcmp (text_lc, canonical_encodings[i]))
+                          {
+                            texinfo_encoding = canonical_encodings[i];
+                            break;
+                          }
+                      }
+                    free (text_lc);
+                    if (!texinfo_encoding)
+                      {
+                        command_warn (current, "encoding `%s' is not a "
+                                    "canonical texinfo encoding", text);
                       }
                   }
-                if (!texinfo_encoding)
-                  {
-                    command_warn (current, "encoding `%s' is not a "
-                                "canonical texinfo encoding", text);
-                  }
-              }
 
               /* Set input_encoding -- for output within an HTML file, used
                                        in most output formats */
-              {
-                struct encoding_map {
-                    char *from; char *to;
-                };
-
-              /* In the perl parser,
-                 lc(Encode::find_encoding()->mime_name()) is used */
-                static struct encoding_map map[] = {
-                      "utf-8", "utf-8",
-                      "ascii",  "us-ascii",
-                      "shiftjis", "shift_jis",
-                      "latin1", "iso-8859-1",
-                      "latin-1", "iso-8859-1",
-                      "iso-8859-1",  "iso-8859-1",
-                      "iso-8859-2",  "iso-8859-2",
-                      "iso-8859-15", "iso-8859-15",
-                      "koi8-r",      "koi8-r",
-                      "koi8-u",      "koi8-u",
-                };
-                for (i = 0; i < sizeof map / sizeof *map; i++)
                   {
-                   /* Elements in first column map to elements in
-                      second column.  Elements in second column map
-                      to themselves. */
-                    if (!strcasecmp (text2, map[i].from)
-                         || !strcasecmp (text2, map[i].to))
+                    struct encoding_map {
+                        char *from; char *to;
+                    };
+
+                  /* In the perl parser,
+                     lc(Encode::find_encoding()->mime_name()) is used */
+                  /* the Perl Parser calls Encode::find_encoding, so knows
+                     about more encodings than what we know about here.
+                   */
+                    static struct encoding_map map[] = {
+                          "utf-8", "utf-8",
+                          "ascii",  "us-ascii",
+                          "shiftjis", "shift_jis",
+                          "latin1", "iso-8859-1",
+                          "latin-1", "iso-8859-1",
+                          "iso-8859-1",  "iso-8859-1",
+                          "iso-8859-2",  "iso-8859-2",
+                          "iso-8859-15", "iso-8859-15",
+                          "koi8-r",      "koi8-r",
+                          "koi8-u",      "koi8-u",
+                    };
+                    for (i = 0; i < sizeof map / sizeof *map; i++)
                       {
-                        input_encoding = map[i].to;
-                        break;
+                       /* Elements in first column map to elements in
+                          second column.  Elements in second column map
+                          to themselves. */
+                        if (!strcasecmp (normalized_text, map[i].from)
+                             || !strcasecmp (normalized_text, map[i].to))
+                          {
+                            input_encoding = map[i].to;
+                            break;
+                          }
                       }
                   }
-              }
-              free (text2);
-
-              if (input_encoding)
-                {
-                  add_extra_string_dup (current, "input_encoding_name",
-                                        input_encoding);
+                  if (!input_encoding)
+                    {
+                      input_encoding = normalized_text;
+                    }
 
-                  global_info.input_encoding_name = strdup (input_encoding);
-                  set_input_encoding (input_encoding);
-                }
-              else
-                {
-                  command_warn (current, "unrecognized encoding name `%s'",
-                                text);
+                  encoding_set = set_input_encoding (input_encoding);
+                  if (encoding_set)
+                    {
+                      add_extra_string_dup (current, "input_encoding_name",
+                                            input_encoding);
 
-                  /* the Perl Parser calls Encode::find_encoding, so knows
-                     about more encodings than what we know about here.
-                     TODO: accept encoding not in encoding_map as long as
-                     an iconv conversion to UTF-8 is possible?
-                     Maybe we should check if an iconv conversion is
-                     possible from this encoding to UTF-8. */
+                      global_info.input_encoding_name = strdup (input_encoding);
+                    }
+                  else
+                    command_warn (current, "unhandled encoding name `%s'",
+                                  text);
                 }
+              free (normalized_text);
             }
           else if (current->cmd == CM_documentlanguage)
             {
diff --git a/tp/Texinfo/XS/parsetexi/input.c b/tp/Texinfo/XS/parsetexi/input.c
index 1669a5109f..5546c6fd14 100644
--- a/tp/Texinfo/XS/parsetexi/input.c
+++ b/tp/Texinfo/XS/parsetexi/input.c
@@ -31,16 +31,6 @@
 
 enum input_type { IN_file, IN_text };
 
-enum character_encoding {
-    ce_latin1,
-    ce_latin2,
-    ce_latin15,
-    ce_utf8,
-    ce_shiftjis,
-    ce_koi8r,
-    ce_koi8u
-};
-
 typedef struct {
     enum input_type type;
 
@@ -60,14 +50,33 @@ typedef struct {
 
 static char *input_pushback_string;
 
-enum character_encoding input_encoding;
-
 static char *input_encoding_name;
 static iconv_t reverse_iconv; /* used in encode_file_name */
 
-void
+typedef struct {
+  char *encoding_name;
+  iconv_t iconv;
+} ENCODING;
+
+static ENCODING *encodings_list = 0;
+int encoding_number = 0;
+int encoding_space = 0;
+
+static ENCODING *current_encoding = 0;
+
+/* ENCODING should always be lower cased */
+/* WARNING: it is very important for the first call to
+   set_input_encoding to be for "utf-8" as the codes assume
+   a conversion to UTF-8 in encodings_list[0]. */
+int
 set_input_encoding (char *encoding)
 {
+  int encoding_index = -1;
+  int encoding_set = 0;
+
+  if (!strcmp (encoding, "us-ascii"))
+    encoding = "iso-8859-1";
+
   free (input_encoding_name); input_encoding_name = strdup (encoding);
   if (reverse_iconv)
     {
@@ -75,23 +84,48 @@ set_input_encoding (char *encoding)
       reverse_iconv = (iconv_t) 0;
     }
 
-  if (!strcasecmp (encoding, "utf-8"))
-    input_encoding = ce_utf8;
-  else if (!strcmp (encoding, "iso-8859-1")
-          || !strcmp (encoding, "us-ascii"))
-    input_encoding = ce_latin1;
-  else if (!strcmp (encoding, "iso-8859-2"))
-    input_encoding = ce_latin2;
-  else if (!strcmp (encoding, "iso-8859-15"))
-    input_encoding = ce_latin15;
-  else if (!strcmp (encoding, "shift_jis"))
-    input_encoding = ce_shiftjis;
-  else if (!strcmp (encoding, "koi8-r"))
-    input_encoding = ce_koi8r;
-  else if (!strcmp (encoding, "koi8-u"))
-    input_encoding = ce_koi8u;
+  if (!strcmp (encoding, "utf-8"))
+    {
+      if (encoding_number > 0)
+        encoding_index = 0;
+    }
+  else if (encoding_number > 1)
+    {
+      int i;
+      for (i = 1; i < encoding_number; i++)
+        {
+          if (!strcmp (encoding, encodings_list[i].encoding_name))
+            {
+              encoding_index = i;
+              break;
+            }
+        }
+    }
+
+  if (encoding_index == -1)
+    {
+      if (encoding_number >= encoding_space)
+        {
+          encodings_list = realloc (encodings_list,
+                                    (encoding_space += 3) * sizeof (ENCODING));
+        }
+      encodings_list[encoding_number].encoding_name = strdup (encoding);
+      /* Initialize conversions for the first time.  iconv_open returns
+         (iconv_t) -1 on failure so these should only be called once. */
+      encodings_list[encoding_number].iconv = iconv_open ("UTF-8", encoding);
+      encoding_index = encoding_number;
+      encoding_number++;
+    }
+
+  if (encodings_list[encoding_index].iconv == (iconv_t) -1)
+    current_encoding = 0;
   else
-    fprintf (stderr, "warning: unhandled encoding %s\n", encoding);
+    {
+      current_encoding = &encodings_list[encoding_index];
+      encoding_set = 1;
+    }
+
+  return encoding_set;
 }
 
 
@@ -139,14 +173,6 @@ new_line (ELEMENT *current)
 }
 
 
-static iconv_t iconv_from_latin1;
-static iconv_t iconv_from_latin2;
-static iconv_t iconv_from_latin15;
-static iconv_t iconv_from_shiftjis;
-static iconv_t iconv_from_koi8u;
-static iconv_t iconv_from_koi8r;
-static iconv_t iconv_validate_utf8;
-
 /* Run iconv using text buffer as output buffer. */
 size_t
 text_buffer_iconv (TEXT *buf, iconv_t iconv_state,
@@ -235,49 +261,7 @@ convert_to_utf8 (char *s)
      file, then we'd have to keep track of which strings needed the UTF-8 flag
      and which didn't. */
 
-  /* Initialize conversions for the first time.  iconv_open returns
-     (iconv_t) -1 on failure so these should only be called once. */
-  if (iconv_validate_utf8 == (iconv_t) 0)
-    iconv_validate_utf8 = iconv_open ("UTF-8", "UTF-8");
-  if (iconv_from_latin1 == (iconv_t) 0)
-    iconv_from_latin1 = iconv_open ("UTF-8", "ISO-8859-1");
-  if (iconv_from_latin2 == (iconv_t) 0)
-    iconv_from_latin2 = iconv_open ("UTF-8", "ISO-8859-2");
-  if (iconv_from_latin15 == (iconv_t) 0)
-    iconv_from_latin15 = iconv_open ("UTF-8", "ISO-8859-15");
-  if (iconv_from_shiftjis == (iconv_t) 0)
-    iconv_from_shiftjis = iconv_open ("UTF-8", "SHIFT-JIS");
-  if (iconv_from_koi8r == (iconv_t) 0)
-    iconv_from_koi8r = iconv_open ("UTF-8", "KOI8-R");
-  if (iconv_from_koi8u == (iconv_t) 0)
-    iconv_from_koi8u = iconv_open ("UTF-8", "KOI8-U");
-
-  switch (input_encoding)
-    {
-    case ce_utf8:
-      our_iconv = iconv_validate_utf8;
-      break;
-    case ce_latin1:
-      our_iconv = iconv_from_latin1;
-      break;
-    case ce_latin2:
-      our_iconv = iconv_from_latin2;
-      break;
-    case ce_latin15:
-      our_iconv = iconv_from_latin15;
-      break;
-    case ce_shiftjis:
-      our_iconv = iconv_from_shiftjis;
-      break;
-    case ce_koi8r:
-      our_iconv = iconv_from_koi8r;
-      break;
-    case ce_koi8u:
-      our_iconv = iconv_from_koi8u;
-      break;
-    }
-
-  if (our_iconv == (iconv_t) -1)
+  if (current_encoding == 0)
     {
       /* In case the converter couldn't be initialised.
          Danger: this will cause problems if the input is not in UTF-8 as
@@ -285,7 +269,7 @@ convert_to_utf8 (char *s)
       return s;
     }
 
-  ret = encode_with_iconv (our_iconv, s);
+  ret = encode_with_iconv (current_encoding->iconv, s);
   free (s);
   return ret;
 }
@@ -323,7 +307,7 @@ encode_file_name (char *filename)
         }
       else if (doc_encoding_for_input_file_name)
         {
-          if (input_encoding != ce_utf8 && input_encoding_name)
+          if (input_encoding_name && strcmp (input_encoding_name, "utf-8"))
             {
               reverse_iconv = iconv_open (input_encoding_name, "UTF-8");
             }
@@ -683,6 +667,25 @@ input_reset_input_stack (void)
   value_expansion_nr = 0;
 }
 
+void
+reset_encoding_list (void)
+{
+  int i;
+  /* never reset the utf-8 encoding in position 0 */
+  for (i = 1; i < encoding_number; i++)
+    {
+      free (encodings_list[i].encoding_name);
+      if (encodings_list[i].iconv != (iconv_t) -1)
+        iconv_close (encodings_list[i].iconv);
+    }
+  /* in theory, it could also be 0, but the function is called right
+     after set_input_encoding ("utf-8"); */
+  encoding_number = 1;
+  current_encoding = 0;
+  free (input_encoding_name);
+  input_encoding_name = 0;
+}
+
 int
 top_file_index (void)
 {
diff --git a/tp/Texinfo/XS/parsetexi/input.h b/tp/Texinfo/XS/parsetexi/input.h
index 1749660e3a..b2168ba4e8 100644
--- a/tp/Texinfo/XS/parsetexi/input.h
+++ b/tp/Texinfo/XS/parsetexi/input.h
@@ -15,13 +15,14 @@ int input_push_file (char *filename);
 void input_pushback (char *line);
 void set_input_source_mark (SOURCE_MARK *source_mark);
 void input_reset_input_stack (void);
+void reset_encoding_list (void);
 int expanding_macro (char *macro);
 int top_file_index (void);
 
 char *locate_include_file (char *filename);
 char *encode_file_name (char *filename);
 char *convert_to_utf8 (char *s);
-void set_input_encoding (char *encoding);
+int set_input_encoding (char *encoding);
 void add_include_directory (char *filename);
 void clear_include_directories (void);
 
diff --git a/tp/t/08misc_commands.t b/tp/t/08misc_commands.t
index 19ea6e193d..d1eb5ba221 100644
--- a/tp/t/08misc_commands.t
+++ b/tp/t/08misc_commands.t
@@ -54,6 +54,9 @@ Test text after finalout
 @finalout a word after finalout
 Line after finalout
 '],
+['documentencoding_zero',
+'@documentencoding 0
+'],
 ['also_not_line',
 '
 
diff --git a/tp/t/info_tests.t b/tp/t/info_tests.t
index 68db18421d..3ab958c408 100644
--- a/tp/t/info_tests.t
+++ b/tp/t/info_tests.t
@@ -278,7 +278,7 @@ ref to anchor1@footnote{another footnote}, which is before @@node Top: @ref{anch
 
 @image{text_only_image,,,alt}
 '],
-['image_quotes', 
+['image_quotes',
 '@node Top
 
 @image{f--ile,,,alt""\\}
@@ -1082,6 +1082,9 @@ text @* f     nl Something? @* After punct
 * what @* is: ankh p.
 @end menu
 '],
+['chinese_mixed_with_en_EUC_CN',
+undef, {'test_file' => 'chinese_mixed_with_en_EUC_CN.texi'}
+],
 );
 
 my $colons_in_index_entries_and_node = 
@@ -1114,7 +1117,7 @@ node one
 
 ';
 
-push @file_tests, 
+push @file_tests,
 ['colons_in_index_entries_and_node',
 $colons_in_index_entries_and_node,
 undef, {'INFO_SPECIAL_CHARS_QUOTE' => 1,
diff --git a/tp/t/input_files/chinese_mixed_with_en.texi b/tp/t/input_files/chinese_mixed_with_en.texi
index a4796841b7..057d7f7170 100644
--- a/tp/t/input_files/chinese_mixed_with_en.texi
+++ b/tp/t/input_files/chinese_mixed_with_en.texi
@@ -3,7 +3,7 @@
 @settitle chinese mixed with english
 
 @node Top
-@top Mixed in UTF-8
+@top Mixed chinese and english
 
 Example of english and chinese, chinese aligned or not.
 
diff --git a/tp/t/input_files/chinese_mixed_with_en.texi b/tp/t/input_files/chinese_mixed_with_en_EUC_CN.texi
similarity index 50%
copy from tp/t/input_files/chinese_mixed_with_en.texi
copy to tp/t/input_files/chinese_mixed_with_en_EUC_CN.texi
index a4796841b7..76bddbbf78 100644
--- a/tp/t/input_files/chinese_mixed_with_en.texi
+++ b/tp/t/input_files/chinese_mixed_with_en_EUC_CN.texi
@@ -1,9 +1,9 @@
 \input texinfo
-@documentencoding utf-8
+@documentencoding EUC-CN
 @settitle chinese mixed with english
 
 @node Top
-@top Mixed in UTF-8
+@top Mixed chinese and english
 
 Example of english and chinese, chinese aligned or not.
 
@@ -20,20 +20,20 @@ standard Emacs features when programming in Ada.
 
 2. chinese already aligned in source(this result)
 
-这常用于修饰多个线程会访问或修改的全局变量，让编译器保证每次都从内存读取
-变量的值，而不是作某些优化。（这些优化有可能导致程序不能获得正确的值）
+�ⳣ�������ζ���̻߳���ʻ��޸ĵ�ȫ�ֱ������ñ�������֤ÿ�ζ����ڴ��ȡ
+������ֵ����������ĳЩ�Ż�������Щ�Ż��п��ܵ��³����ܻ����ȷ��ֵ��
 
 3. chinese not aligned in source
 
-这常用于修饰多个线程会访问或修改的全局变量，让编译器保证每次都从内存
-读取
-变量的值，而不是作某些优化。
-（这些优化有可能导致程序不能获得正确的值）
+�ⳣ�������ζ���̻߳���ʻ��޸ĵ�ȫ�ֱ������ñ�������֤ÿ�ζ����ڴ�
+��ȡ
+������ֵ����������ĳЩ�Ż���
+����Щ�Ż��п��ܵ��³����ܻ����ȷ��ֵ��
 
 4. a mix of chinese and english
 
-restrict 表示在当前 scope 内不允许其它变量指向它。用处，比如防止 memory
-overlap。
+restrict ��ʾ�ڵ�ǰ scope �ڲ�������������ָ�������ô��������ֹ memory
+overlap��
 
 
 @bye
diff --git a/tp/t/input_files/sample_EUC_CN.texi b/tp/t/input_files/sample_EUC_CN.texi
new file mode 100644
index 0000000000..9684c08459
--- /dev/null
+++ b/tp/t/input_files/sample_EUC_CN.texi
@@ -0,0 +1,4 @@
+\input texinfo   @c -*-texinfo-*-
+@c %**start of header
+@setfilename sample_utf8.info
+@settitle Sample ʾ�� 
\ No newline at end of file
diff --git a/tp/t/results/include/macro_and_commands_in_early_commands.pl b/tp/t/results/include/macro_and_commands_in_early_commands.pl
index f5d628c74f..c061f4d435 100644
--- a/tp/t/results/include/macro_and_commands_in_early_commands.pl
+++ b/tp/t/results/include/macro_and_commands_in_early_commands.pl
@@ -231,6 +231,7 @@ $result_trees{'macro_and_commands_in_early_commands'} = {
           ],
           'cmdname' => 'documentencoding',
           'extra' => {
+            'input_encoding_name' => 'iso-8859-1',
             'text_arg' => 'ISO-8859-1@'
           },
           'info' => {
@@ -688,7 +689,7 @@ $result_trees{'macro_and_commands_in_early_commands'} = {
           ],
           'cmdname' => 'verbatiminclude',
           'extra' => {
-            'input_encoding_name' => 'utf-8',
+            'input_encoding_name' => 'iso-8859-1',
             'text_arg' => 'inc_@f--ile.texi'
           },
           'info' => {
@@ -835,15 +836,6 @@ $result_errors{'macro_and_commands_in_early_commands'} = [
     'macro' => '',
     'text' => 'encoding `ISO-8859-1@\' is not a canonical texinfo encoding',
     'type' => 'warning'
-  },
-  {
-    'error_line' => 'warning: unrecognized encoding name `ISO-8859-1@\'
-',
-    'file_name' => '',
-    'line_nr' => 11,
-    'macro' => '',
-    'text' => 'unrecognized encoding name `ISO-8859-1@\'',
-    'type' => 'warning'
   }
 ];
 
diff --git a/tp/t/results/plaintext_tests/chinese_mixed_with_en.pl b/tp/t/results/info_tests/chinese_mixed_with_en_EUC_CN.pl
similarity index 83%
copy from tp/t/results/plaintext_tests/chinese_mixed_with_en.pl
copy to tp/t/results/info_tests/chinese_mixed_with_en_EUC_CN.pl
index 73787826c0..7d996935a7 100644
--- a/tp/t/results/plaintext_tests/chinese_mixed_with_en.pl
+++ b/tp/t/results/info_tests/chinese_mixed_with_en_EUC_CN.pl
@@ -5,7 +5,7 @@ use vars qw(%result_texis %result_texts %result_trees %result_errors
 
 use utf8;
 
-$result_trees{'chinese_mixed_with_en'} = {
+$result_trees{'chinese_mixed_with_en_EUC_CN'} = {
   'contents' => [
     {
       'contents' => [
@@ -26,7 +26,7 @@ $result_trees{'chinese_mixed_with_en'} = {
                 {
                   'contents' => [
                     {
-                      'text' => 'utf-8'
+                      'text' => 'EUC-CN'
                     }
                   ],
                   'info' => {
@@ -40,8 +40,8 @@ $result_trees{'chinese_mixed_with_en'} = {
               ],
               'cmdname' => 'documentencoding',
               'extra' => {
-                'input_encoding_name' => 'utf-8',
-                'text_arg' => 'utf-8'
+                'input_encoding_name' => 'euc-cn',
+                'text_arg' => 'EUC-CN'
               },
               'info' => {
                 'spaces_before_argument' => {
@@ -49,7 +49,7 @@ $result_trees{'chinese_mixed_with_en'} = {
                 }
               },
               'source_info' => {
-                'file_name' => 'chinese_mixed_with_en.texi',
+                'file_name' => 'chinese_mixed_with_en_EUC_CN.texi',
                 'line_nr' => 2,
                 'macro' => ''
               }
@@ -78,7 +78,7 @@ $result_trees{'chinese_mixed_with_en'} = {
                 }
               },
               'source_info' => {
-                'file_name' => 'chinese_mixed_with_en.texi',
+                'file_name' => 'chinese_mixed_with_en_EUC_CN.texi',
                 'line_nr' => 3,
                 'macro' => ''
               }
@@ -121,7 +121,7 @@ $result_trees{'chinese_mixed_with_en'} = {
         }
       },
       'source_info' => {
-        'file_name' => 'chinese_mixed_with_en.texi',
+        'file_name' => 'chinese_mixed_with_en_EUC_CN.texi',
         'line_nr' => 5,
         'macro' => ''
       }
@@ -131,7 +131,7 @@ $result_trees{'chinese_mixed_with_en'} = {
         {
           'contents' => [
             {
-              'text' => 'Mixed in UTF-8'
+              'text' => 'Mixed chinese and english'
             }
           ],
           'info' => {
@@ -172,7 +172,7 @@ $result_trees{'chinese_mixed_with_en'} = {
         }
       },
       'source_info' => {
-        'file_name' => 'chinese_mixed_with_en.texi',
+        'file_name' => 'chinese_mixed_with_en_EUC_CN.texi',
         'line_nr' => 6,
         'macro' => ''
       }
@@ -204,7 +204,7 @@ $result_trees{'chinese_mixed_with_en'} = {
         }
       },
       'source_info' => {
-        'file_name' => 'chinese_mixed_with_en.texi',
+        'file_name' => 'chinese_mixed_with_en_EUC_CN.texi',
         'line_nr' => 10,
         'macro' => ''
       }
@@ -394,7 +394,7 @@ $result_trees{'chinese_mixed_with_en'} = {
         }
       },
       'source_info' => {
-        'file_name' => 'chinese_mixed_with_en.texi',
+        'file_name' => 'chinese_mixed_with_en_EUC_CN.texi',
         'line_nr' => 11,
         'macro' => ''
       }
@@ -413,12 +413,12 @@ $result_trees{'chinese_mixed_with_en'} = {
   'type' => 'document_root'
 };
 
-$result_texis{'chinese_mixed_with_en'} = '\\input texinfo
-@documentencoding utf-8
+$result_texis{'chinese_mixed_with_en_EUC_CN'} = '\\input texinfo
+@documentencoding EUC-CN
 @settitle chinese mixed with english
 
 @node Top
-@top Mixed in UTF-8
+@top Mixed chinese and english
 
 Example of english and chinese, chinese aligned or not.
 
@@ -455,9 +455,9 @@ overlap。
 ';
 
 
-$result_texts{'chinese_mixed_with_en'} = '
-Mixed in UTF-8
-**************
+$result_texts{'chinese_mixed_with_en_EUC_CN'} = '
+Mixed chinese and english
+*************************
 
 Example of english and chinese, chinese aligned or not.
 
@@ -492,7 +492,7 @@ overlap。
 
 ';
 
-$result_sectioning{'chinese_mixed_with_en'} = {
+$result_sectioning{'chinese_mixed_with_en_EUC_CN'} = {
   'structure' => {
     'section_childs' => [
       {
@@ -536,12 +536,12 @@ $result_sectioning{'chinese_mixed_with_en'} = {
     'section_level' => -1
   }
 };
-$result_sectioning{'chinese_mixed_with_en'}{'structure'}{'section_childs'}[0]{'structure'}{'section_childs'}[0]{'structure'}{'section_up'} = $result_sectioning{'chinese_mixed_with_en'}{'structure'}{'section_childs'}[0];
-$result_sectioning{'chinese_mixed_with_en'}{'structure'}{'section_childs'}[0]{'structure'}{'section_childs'}[0]{'structure'}{'toplevel_prev'} = $result_sectioning{'chinese_mixed_with_en'}{'structure'}{'section_childs'}[0];
-$result_sectioning{'chinese_mixed_with_en'}{'structure'}{'section_childs'}[0]{'structure'}{'section_childs'}[0]{'structure'}{'toplevel_up'} = $result_sectioning{'chinese_mixed_with_en'}{'structure'}{'section_childs'}[0];
-$result_sectioning{'chinese_mixed_with_en'}{'structure'}{'section_childs'}[0]{'structure'}{'section_up'} = $result_sectioning{'chinese_mixed_with_en'};
+$result_sectioning{'chinese_mixed_with_en_EUC_CN'}{'structure'}{'section_childs'}[0]{'structure'}{'section_childs'}[0]{'structure'}{'section_up'} = $result_sectioning{'chinese_mixed_with_en_EUC_CN'}{'structure'}{'section_childs'}[0];
+$result_sectioning{'chinese_mixed_with_en_EUC_CN'}{'structure'}{'section_childs'}[0]{'structure'}{'section_childs'}[0]{'structure'}{'toplevel_prev'} = $result_sectioning{'chinese_mixed_with_en_EUC_CN'}{'structure'}{'section_childs'}[0];
+$result_sectioning{'chinese_mixed_with_en_EUC_CN'}{'structure'}{'section_childs'}[0]{'structure'}{'section_childs'}[0]{'structure'}{'toplevel_up'} = $result_sectioning{'chinese_mixed_with_en_EUC_CN'}{'structure'}{'section_childs'}[0];
+$result_sectioning{'chinese_mixed_with_en_EUC_CN'}{'structure'}{'section_childs'}[0]{'structure'}{'section_up'} = $result_sectioning{'chinese_mixed_with_en_EUC_CN'};
 
-$result_nodes{'chinese_mixed_with_en'} = {
+$result_nodes{'chinese_mixed_with_en_EUC_CN'} = {
   'cmdname' => 'node',
   'extra' => {
     'associated_section' => {
@@ -571,10 +571,10 @@ $result_nodes{'chinese_mixed_with_en'} = {
     }
   }
 };
-$result_nodes{'chinese_mixed_with_en'}{'structure'}{'node_next'}{'structure'}{'node_prev'} = $result_nodes{'chinese_mixed_with_en'};
-$result_nodes{'chinese_mixed_with_en'}{'structure'}{'node_next'}{'structure'}{'node_up'} = $result_nodes{'chinese_mixed_with_en'};
+$result_nodes{'chinese_mixed_with_en_EUC_CN'}{'structure'}{'node_next'}{'structure'}{'node_prev'} = $result_nodes{'chinese_mixed_with_en_EUC_CN'};
+$result_nodes{'chinese_mixed_with_en_EUC_CN'}{'structure'}{'node_next'}{'structure'}{'node_up'} = $result_nodes{'chinese_mixed_with_en_EUC_CN'};
 
-$result_menus{'chinese_mixed_with_en'} = {
+$result_menus{'chinese_mixed_with_en_EUC_CN'} = {
   'cmdname' => 'node',
   'extra' => {
     'normalized' => 'Top'
@@ -582,10 +582,20 @@ $result_menus{'chinese_mixed_with_en'} = {
   'structure' => {}
 };
 
-$result_errors{'chinese_mixed_with_en'} = [];
+$result_errors{'chinese_mixed_with_en_EUC_CN'} = [
+  {
+    'error_line' => 'warning: encoding `EUC-CN\' is not a canonical texinfo encoding
+',
+    'file_name' => 'chinese_mixed_with_en_EUC_CN.texi',
+    'line_nr' => 2,
+    'macro' => '',
+    'text' => 'encoding `EUC-CN\' is not a canonical texinfo encoding',
+    'type' => 'warning'
+  }
+];
 
 
-$result_floats{'chinese_mixed_with_en'} = {};
+$result_floats{'chinese_mixed_with_en_EUC_CN'} = {};
 
 
 1;
diff --git a/tp/t/results/info_tests/chinese_mixed_with_en_EUC_CN/res_info/chinese_mixed_with_en_EUC_CN.info b/tp/t/results/info_tests/chinese_mixed_with_en_EUC_CN/res_info/chinese_mixed_with_en_EUC_CN.info
new file mode 100644
index 0000000000..17380710d6
--- /dev/null
+++ b/tp/t/results/info_tests/chinese_mixed_with_en_EUC_CN/res_info/chinese_mixed_with_en_EUC_CN.info
@@ -0,0 +1,57 @@
+This is chinese_mixed_with_en_EUC_CN.info, produced from
+chinese_mixed_with_en_EUC_CN.texi.
+
+
+File: chinese_mixed_with_en_EUC_CN.info,  Node: Top,  Next: Mixed english and chinese,  Up: (dir)
+
+Mixed chinese and english
+*************************
+
+Example of english and chinese, chinese aligned or not.
+
+* Menu:
+
+* Mixed english and chinese::
+
+
+File: chinese_mixed_with_en_EUC_CN.info,  Node: Mixed english and chinese,  Prev: Top,  Up: Top
+
+1 Mixed english and chinese
+***************************
+
+1.  english only
+
+   The Emacs mode for programming in Ada 95 with GNAT helps the user in
+understanding existing code and facilitates writing new code.  It
+furthermore provides some utility functions for easier integration of
+standard Emacs features when programming in Ada.
+
+   2.  chinese already aligned in source(this result)
+
+   �ⳣ�������ζ���̻߳���ʻ��޸ĵ�ȫ�ֱ������ñ�������֤ÿ�ζ����ڴ�
+��ȡ ������ֵ����������ĳЩ�Ż�������Щ�Ż��п��ܵ��³����ܻ����ȷ��
+ֵ��
+
+   3.  chinese not aligned in source
+
+   �ⳣ�������ζ���̻߳���ʻ��޸ĵ�ȫ�ֱ������ñ�������֤ÿ�ζ����ڴ�
+��ȡ ������ֵ����������ĳЩ�Ż��� ����Щ�Ż��п��ܵ��³����ܻ����ȷ��
+ֵ��
+
+   4.  a mix of chinese and english
+
+   restrict ��ʾ�ڵ�ǰ scope �ڲ�������������ָ�������ô��������ֹ
+memory overlap��
+
+
+
+Tag Table:
+Node: Top93
+Node: Mixed english and chinese344
+
+End Tag Table
+
+
+Local Variables:
+coding: euc-cn
+End:
diff --git a/tp/t/results/info_tests/unknown_encoding.pl b/tp/t/results/info_tests/unknown_encoding.pl
index a38184bb56..130d916bf0 100644
--- a/tp/t/results/info_tests/unknown_encoding.pl
+++ b/tp/t/results/info_tests/unknown_encoding.pl
@@ -135,12 +135,12 @@ $result_errors{'unknown_encoding'} = [
     'type' => 'warning'
   },
   {
-    'error_line' => 'warning: unrecognized encoding name `ggg\'
+    'error_line' => 'warning: unhandled encoding name `ggg\'
 ',
     'file_name' => '',
     'line_nr' => 2,
     'macro' => '',
-    'text' => 'unrecognized encoding name `ggg\'',
+    'text' => 'unhandled encoding name `ggg\'',
     'type' => 'warning'
   }
 ];
diff --git a/tp/t/results/macro/macro_in_invalid_documentencoding.pl b/tp/t/results/macro/macro_in_invalid_documentencoding.pl
index 6e83402f7e..9e2fa765fc 100644
--- a/tp/t/results/macro/macro_in_invalid_documentencoding.pl
+++ b/tp/t/results/macro/macro_in_invalid_documentencoding.pl
@@ -152,12 +152,12 @@ $result_errors{'macro_in_invalid_documentencoding'} = [
     'type' => 'warning'
   },
   {
-    'error_line' => 'warning: unrecognized encoding name `badm\'
+    'error_line' => 'warning: unhandled encoding name `badm\'
 ',
     'file_name' => '',
     'line_nr' => 4,
     'macro' => '',
-    'text' => 'unrecognized encoding name `badm\'',
+    'text' => 'unhandled encoding name `badm\'',
     'type' => 'warning'
   }
 ];
diff --git a/tp/t/results/misc_commands/documentencoding_zero.pl b/tp/t/results/misc_commands/documentencoding_zero.pl
new file mode 100644
index 0000000000..5e8959f205
--- /dev/null
+++ b/tp/t/results/misc_commands/documentencoding_zero.pl
@@ -0,0 +1,82 @@
+use vars qw(%result_texis %result_texts %result_trees %result_errors 
+   %result_indices %result_sectioning %result_nodes %result_menus
+   %result_floats %result_converted %result_converted_errors 
+   %result_elements %result_directions_text %result_indices_sort_strings);
+
+use utf8;
+
+$result_trees{'documentencoding_zero'} = {
+  'contents' => [
+    {
+      'contents' => [
+        {
+          'args' => [
+            {
+              'contents' => [
+                {
+                  'text' => '0'
+                }
+              ],
+              'info' => {
+                'spaces_after_argument' => {
+                  'text' => '
+'
+                }
+              },
+              'type' => 'line_arg'
+            }
+          ],
+          'cmdname' => 'documentencoding',
+          'extra' => {
+            'text_arg' => '0'
+          },
+          'info' => {
+            'spaces_before_argument' => {
+              'text' => ' '
+            }
+          },
+          'source_info' => {
+            'file_name' => '',
+            'line_nr' => 1,
+            'macro' => ''
+          }
+        }
+      ],
+      'type' => 'before_node_section'
+    }
+  ],
+  'type' => 'document_root'
+};
+
+$result_texis{'documentencoding_zero'} = '@documentencoding 0
+';
+
+
+$result_texts{'documentencoding_zero'} = '';
+
+$result_errors{'documentencoding_zero'} = [
+  {
+    'error_line' => 'warning: encoding `0\' is not a canonical texinfo encoding
+',
+    'file_name' => '',
+    'line_nr' => 1,
+    'macro' => '',
+    'text' => 'encoding `0\' is not a canonical texinfo encoding',
+    'type' => 'warning'
+  },
+  {
+    'error_line' => 'warning: unhandled encoding name `0\'
+',
+    'file_name' => '',
+    'line_nr' => 1,
+    'macro' => '',
+    'text' => 'unhandled encoding name `0\'',
+    'type' => 'warning'
+  }
+];
+
+
+$result_floats{'documentencoding_zero'} = {};
+
+
+1;
diff --git a/tp/t/results/misc_commands/invalid_documentencoding.pl b/tp/t/results/misc_commands/invalid_documentencoding.pl
index 9d070df09a..cefb4823a4 100644
--- a/tp/t/results/misc_commands/invalid_documentencoding.pl
+++ b/tp/t/results/misc_commands/invalid_documentencoding.pl
@@ -456,12 +456,12 @@ $result_errors{'invalid_documentencoding'} = [
     'type' => 'warning'
   },
   {
-    'error_line' => 'warning: unrecognized encoding name `YS-ASCII\'
+    'error_line' => 'warning: unhandled encoding name `YS-ASCII\'
 ',
     'file_name' => '',
     'line_nr' => 5,
     'macro' => '',
-    'text' => 'unrecognized encoding name `YS-ASCII\'',
+    'text' => 'unhandled encoding name `YS-ASCII\'',
     'type' => 'warning'
   },
   {
@@ -483,12 +483,12 @@ $result_errors{'invalid_documentencoding'} = [
     'type' => 'warning'
   },
   {
-    'error_line' => 'warning: unrecognized encoding name `bad encoding name\'
+    'error_line' => 'warning: unhandled encoding name `bad encoding name\'
 ',
     'file_name' => '',
     'line_nr' => 6,
     'macro' => '',
-    'text' => 'unrecognized encoding name `bad encoding name\'',
+    'text' => 'unhandled encoding name `bad encoding name\'',
     'type' => 'warning'
   },
   {
@@ -501,48 +501,30 @@ $result_errors{'invalid_documentencoding'} = [
     'type' => 'warning'
   },
   {
-    'error_line' => 'warning: unrecognized encoding name `1\'
+    'error_line' => 'warning: unhandled encoding name `1\'
 ',
     'file_name' => '',
     'line_nr' => 7,
     'macro' => '',
-    'text' => 'unrecognized encoding name `1\'',
+    'text' => 'unhandled encoding name `1\'',
     'type' => 'warning'
   },
   {
-    'error_line' => 'warning: encoding `%\' is not a canonical texinfo encoding
+    'error_line' => 'warning: bad encoding name `%\'
 ',
     'file_name' => '',
     'line_nr' => 8,
     'macro' => '',
-    'text' => 'encoding `%\' is not a canonical texinfo encoding',
+    'text' => 'bad encoding name `%\'',
     'type' => 'warning'
   },
   {
-    'error_line' => 'warning: unrecognized encoding name `%\'
-',
-    'file_name' => '',
-    'line_nr' => 8,
-    'macro' => '',
-    'text' => 'unrecognized encoding name `%\'',
-    'type' => 'warning'
-  },
-  {
-    'error_line' => 'warning: encoding `@\' is not a canonical texinfo encoding
-',
-    'file_name' => '',
-    'line_nr' => 9,
-    'macro' => '',
-    'text' => 'encoding `@\' is not a canonical texinfo encoding',
-    'type' => 'warning'
-  },
-  {
-    'error_line' => 'warning: unrecognized encoding name `@\'
+    'error_line' => 'warning: bad encoding name `@\'
 ',
     'file_name' => '',
     'line_nr' => 9,
     'macro' => '',
-    'text' => 'unrecognized encoding name `@\'',
+    'text' => 'bad encoding name `@\'',
     'type' => 'warning'
   },
   {
diff --git a/tp/t/results/misc_commands/many_lines.pl b/tp/t/results/misc_commands/many_lines.pl
index 21bada485a..7843e90132 100644
--- a/tp/t/results/misc_commands/many_lines.pl
+++ b/tp/t/results/misc_commands/many_lines.pl
@@ -1668,12 +1668,12 @@ $result_errors{'many_lines'} = [
     'type' => 'warning'
   },
   {
-    'error_line' => 'warning: unrecognized encoding name `US-ascii encoding name\'
+    'error_line' => 'warning: unhandled encoding name `US-ascii encoding name\'
 ',
     'file_name' => '',
     'line_nr' => 30,
     'macro' => '',
-    'text' => 'unrecognized encoding name `US-ascii encoding name\'',
+    'text' => 'unhandled encoding name `US-ascii encoding name\'',
     'type' => 'warning'
   },
   {
diff --git a/tp/t/results/plaintext_tests/chinese_mixed_with_en.pl b/tp/t/results/plaintext_tests/chinese_mixed_with_en.pl
index 73787826c0..f2437623a6 100644
--- a/tp/t/results/plaintext_tests/chinese_mixed_with_en.pl
+++ b/tp/t/results/plaintext_tests/chinese_mixed_with_en.pl
@@ -131,7 +131,7 @@ $result_trees{'chinese_mixed_with_en'} = {
         {
           'contents' => [
             {
-              'text' => 'Mixed in UTF-8'
+              'text' => 'Mixed chinese and english'
             }
           ],
           'info' => {
@@ -418,7 +418,7 @@ $result_texis{'chinese_mixed_with_en'} = '\\input texinfo
 @settitle chinese mixed with english
 
 @node Top
-@top Mixed in UTF-8
+@top Mixed chinese and english
 
 Example of english and chinese, chinese aligned or not.
 
@@ -456,8 +456,8 @@ overlap。
 
 
 $result_texts{'chinese_mixed_with_en'} = '
-Mixed in UTF-8
-**************
+Mixed chinese and english
+*************************
 
 Example of english and chinese, chinese aligned or not.
 
diff --git a/tp/t/results/plaintext_tests/chinese_mixed_with_en/res_plaintext/chinese_mixed_with_en.txt b/tp/t/results/plaintext_tests/chinese_mixed_with_en/res_plaintext/chinese_mixed_with_en.txt
index a8ef1ee6d3..8ea380acce 100644
--- a/tp/t/results/plaintext_tests/chinese_mixed_with_en/res_plaintext/chinese_mixed_with_en.txt
+++ b/tp/t/results/plaintext_tests/chinese_mixed_with_en/res_plaintext/chinese_mixed_with_en.txt
@@ -1,5 +1,5 @@
-Mixed in UTF-8
-**************
+Mixed chinese and english
+*************************
 
 Example of english and chinese, chinese aligned or not.
 
diff --git a/tp/t/results/value/value_in_invalid_documentencoding.pl b/tp/t/results/value/value_in_invalid_documentencoding.pl
index b60ac27003..b66d64bf48 100644
--- a/tp/t/results/value/value_in_invalid_documentencoding.pl
+++ b/tp/t/results/value/value_in_invalid_documentencoding.pl
@@ -108,12 +108,12 @@ $result_errors{'value_in_invalid_documentencoding'} = [
     'type' => 'warning'
   },
   {
-    'error_line' => 'warning: unrecognized encoding name `bad\'
+    'error_line' => 'warning: unhandled encoding name `bad\'
 ',
     'file_name' => '',
     'line_nr' => 2,
     'macro' => '',
-    'text' => 'unrecognized encoding name `bad\'',
+    'text' => 'unhandled encoding name `bad\'',
     'type' => 'warning'
   }
 ];