[no subject]
From: Patrice Dumas
Date: Fri, 21 Jul 2023 15:50:35 -0400 (EDT)
branch: master
commit 089f1e71ab0d646766536df72dbb75e18f1051c7
Author: Patrice Dumas <pertusus@free.fr>
AuthorDate: Fri Jul 21 16:53:11 2023 +0200
Use any input encoding known by iconv in the XS parser
* doc/texinfo.texi (@code{@@documentencoding}),
tp/Texinfo/XS/parsetexi/api.c (reset_parser_except_conf),
tp/Texinfo/XS/parsetexi/end_line.c (end_line_misc_line),
tp/Texinfo/XS/parsetexi/input.c (set_input_encoding)
(convert_to_utf8, encode_file_name, reset_encoding_list):
use a list of encodings and iconv handlers based on the
@documentencoding found in the manual. Always set utf-8 at
the first position in the list. Use any encoding known by iconv
instead of a fixed list of known encodings. Remove enum
character_encoding.
* tp/Texinfo/ParserNonXS.pm (_end_line_misc_line),
tp/Texinfo/XS/parsetexi/end_line.c (end_line_misc_line)
<documentencoding>: trim non-ascii characters and keep only
alphanumeric characters, - and _ in encoding names. iconv also seems
to trim non alphanumeric non - _ characters. Consider an encoding not
handled by iconv or not found in perl parser during parsing to be
unhandled.
* tp/Texinfo/Common.pm (get_perl_encoding)
(element_extra_encoding_for_perl), tp/Texinfo/ParserNonXS.pm
(%parser_state_initialization, parse_texi_text, _input_push_file)
(get_parser_info, parse_texi_file, _encode_file_name),
tp/Texinfo/XS/parsetexi/Parsetexi.pm (get_parser_info): add
get_perl_encoding function in Texinfo::Common to set
{'info'}->{'input_perl_encoding'} based on XS parser code. Use this
function in both parsers. In the perl parser, add get_parser_info
function to call get_perl_encoding instead of doing it while parsing,
and call get_parser_info in parse_* functions as required, similar to
code in the XS parser. Consider an encoding not found by perl to
be unrecognized.
* tp/Makefile.am (test_files), tp/Makefile.tres,
tp/t/08misc_commands.t (documentencoding_zero), tp/t/info_tests.t
(chinese_mixed_with_en_EUC_CN),
tp/t/input_files/chinese_mixed_with_en_EUC_CN.texi: two new tests
related to encodings, one using EUC-CN, an ASCII-compatible encoding
that was not available before in the XS parser.
---
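The encoding name normalization described in the log for
@documentencoding can be sketched in isolation as follows. This is only
an illustration, not code from the commit: normalize_encoding_name is a
hypothetical helper that mirrors the loop added to end_line_misc_line,
under the assumption that a name left without any ASCII alphanumeric
character is the one reported as a bad encoding name.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of the commit).  Return a newly
   allocated copy of NAME that is lower cased and keeps only ASCII
   alphanumeric characters, '-' and '_'.  Return NULL if no ASCII
   alphanumeric character remains, or if allocation fails.  */
static char *
normalize_encoding_name (const char *name)
{
  char *result;
  char *q;
  const char *p;
  int possible_encoding = 0;

  assert (name != NULL);

  result = malloc (strlen (name) + 1);
  if (!result)
    return NULL;
  q = result;

  for (p = name; *p; p++)
    {
      unsigned char c = (unsigned char) *p;

      if (c > 0x7f)
        continue;               /* drop non-ASCII bytes */

      if (isalnum (c))
        {
          possible_encoding = 1;
          *q++ = (char) tolower (c);
        }
      else if (c == '_' || c == '-')
        *q++ = (char) c;
      /* any other ASCII character is dropped */
    }
  *q = '\0';

  if (!possible_encoding)
    {
      free (result);
      return NULL;
    }
  return result;
}
```

With this sketch, "EUC-CN" becomes "euc-cn" and "utf 8!" becomes "utf8".
A name such as "--" yields NULL, while "0" (the argument used in the
documentencoding_zero test) is kept as a possible name and is left for
the iconv or Perl lookup to accept or reject.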
ChangeLog | 43 +++++
NEWS | 4 +
doc/texinfo.texi | 6 +-
tp/Makefile.am | 1 +
tp/Makefile.tres | 3 +
tp/TODO | 9 --
tp/Texinfo/Common.pm | 28 +++-
tp/Texinfo/ParserNonXS.pm | 100 ++++++++----
tp/Texinfo/XS/parsetexi/Parsetexi.pm | 32 ++--
tp/Texinfo/XS/parsetexi/api.c | 4 +
tp/Texinfo/XS/parsetexi/end_line.c | 173 +++++++++++++--------
tp/Texinfo/XS/parsetexi/input.c | 167 ++++++++++----------
tp/Texinfo/XS/parsetexi/input.h | 3 +-
tp/t/08misc_commands.t | 3 +
tp/t/info_tests.t | 7 +-
tp/t/input_files/chinese_mixed_with_en.texi | 2 +-
...h_en.texi => chinese_mixed_with_en_EUC_CN.texi} | 20 +--
tp/t/input_files/sample_EUC_CN.texi | 4 +
.../macro_and_commands_in_early_commands.pl | 12 +-
.../chinese_mixed_with_en_EUC_CN.pl} | 66 ++++----
.../res_info/chinese_mixed_with_en_EUC_CN.info | 57 +++++++
tp/t/results/info_tests/unknown_encoding.pl | 4 +-
.../macro/macro_in_invalid_documentencoding.pl | 4 +-
.../results/misc_commands/documentencoding_zero.pl | 82 ++++++++++
.../misc_commands/invalid_documentencoding.pl | 38 ++---
tp/t/results/misc_commands/many_lines.pl | 4 +-
.../plaintext_tests/chinese_mixed_with_en.pl | 8 +-
.../res_plaintext/chinese_mixed_with_en.txt | 4 +-
.../value/value_in_invalid_documentencoding.pl | 4 +-
29 files changed, 579 insertions(+), 313 deletions(-)
diff --git a/ChangeLog b/ChangeLog
index 4eacfc722a..92dd33fca1 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,46 @@
+2023-07-21 Patrice Dumas <pertusus@free.fr>
+
+ Use any input encoding known by iconv in the XS parser
+
+ * doc/texinfo.texi (@code{@@documentencoding}),
+ tp/Texinfo/XS/parsetexi/api.c (reset_parser_except_conf),
+ tp/Texinfo/XS/parsetexi/end_line.c (end_line_misc_line),
+ tp/Texinfo/XS/parsetexi/input.c (set_input_encoding)
+ (convert_to_utf8, encode_file_name, reset_encoding_list):
+ use a list of encodings and iconv handlers based on the
+ @documentencoding found in the manual. Always set utf-8 at
+ the first position in the list. Use any encoding known by iconv
+ instead of a fixed list of known encodings. Remove enum
+ character_encoding.
+
+ * tp/Texinfo/ParserNonXS.pm (_end_line_misc_line),
+ tp/Texinfo/XS/parsetexi/end_line.c (end_line_misc_line)
+ <documentencoding>: trim non-ascii characters and keep only
+ alphanumeric characters, - and _ in encoding names. iconv also seems
+ to trim non alphanumeric non - _ characters. Consider an encoding not
+ handled by iconv or not found in perl parser during parsing to be
+ unhandled.
+
+ * tp/Texinfo/Common.pm (get_perl_encoding)
+ (element_extra_encoding_for_perl), tp/Texinfo/ParserNonXS.pm
+ (%parser_state_initialization, parse_texi_text, _input_push_file)
+ (get_parser_info, parse_texi_file, _encode_file_name),
+ tp/Texinfo/XS/parsetexi/Parsetexi.pm (get_parser_info): add
+ get_perl_encoding function in Texinfo::Common to set
+ {'info'}->{'input_perl_encoding'} based on XS parser code. Use this
+ function in both parsers. In the perl parser, add get_parser_info
+ function to call get_perl_encoding instead of doing it while parsing,
+ and call get_parser_info in parse_* functions as required, similar to
+ code in the XS parser. Consider an encoding not found by perl to
+ be unrecognized.
+
+ * tp/Makefile.am (test_files), tp/Makefile.tres,
+ tp/t/08misc_commands.t (documentencoding_zero), tp/t/info_tests.t
+ (chinese_mixed_with_en_EUC_CN),
+ tp/t/input_files/chinese_mixed_with_en_EUC_CN.texi: two new tests
+ related to encodings, one using EUC-CN, an ASCII-compatible encoding
+ that was not available before in the XS parser.
+
2023-07-20 Patrice Dumas <pertusus@free.fr>
* tp/t/65linemacro.t (end_conditional_in_linemacro)
diff --git a/NEWS b/NEWS
index 1bc0aa9092..4c40f41942 100644
--- a/NEWS
+++ b/NEWS
@@ -9,6 +9,10 @@ See the manual for detailed information.
------------------------------------------------------------------------------
+* texi2any
+ . use any input encoding known by iconv and Perl in the XS (default) parser.
+
+
* Language
. new generic definition commands, @defblock, @defline and @deftypeline,
that do not automatically create index entries
diff --git a/doc/texinfo.texi b/doc/texinfo.texi
index e3c9a14aa0..51de13cb2d 100644
--- a/doc/texinfo.texi
+++ b/doc/texinfo.texi
@@ -12535,7 +12535,11 @@ if your document encoding is not the default encoding.
UTF-8 should always be the best choice for the encoding.
Texinfo still supports additional encodings, mainly for compatibility with
-older manuals:
+older manuals@footnote{@command{texi2any} supports more encodings for Texinfo
+manuals, potentially all the encodings supported by both Perl and iconv
+(@pxref{Generic Charset Conversion,,, libc, The GNU C Library}).
+The support in output formats may be lacking, however, especially for @LaTeX{}
+output.}:
@table @code
@item US-ASCII
diff --git a/tp/Makefile.am b/tp/Makefile.am
index 04aaaefffb..bb4f1cd928 100644
--- a/tp/Makefile.am
+++ b/tp/Makefile.am
@@ -191,6 +191,7 @@ test_files = \
t/input_files/char_latin2_latin2_in_refs.texi \
t/input_files/character_and_spaces_in_refs_text.texi \
t/input_files/chinese_mixed_with_en.texi \
+ t/input_files/chinese_mixed_with_en_EUC_CN.texi \
t/input_files/complex_sectioning_case.texi \
t/input_files/cond.texi \
t/input_files/contents_at_document_begin.texi \
diff --git a/tp/Makefile.tres b/tp/Makefile.tres
index a8acb75e34..82ab6f2373 100644
--- a/tp/Makefile.tres
+++ b/tp/Makefile.tres
@@ -884,6 +884,8 @@ test_files_generated_list = $(test_tap_files_generated_list) \
t/results/info_tests/anchor_in_command.pl \
t/results/info_tests/before_node_and_section.pl \
t/results/info_tests/center_flush.pl \
+ t/results/info_tests/chinese_mixed_with_en_EUC_CN.pl \
+ t/results/info_tests/chinese_mixed_with_en_EUC_CN/res_info \
t/results/info_tests/colon_in_index_entry.pl \
t/results/info_tests/colons_in_index_entries_and_node.pl \
t/results/info_tests/colons_in_index_entries_and_node/res_info \
@@ -1422,6 +1424,7 @@ test_files_generated_list = $(test_tap_files_generated_list) \
t/results/misc_commands/definfoenclose.pl \
t/results/misc_commands/definfoenclose_nestings.pl \
t/results/misc_commands/definfoenclose_with_empty_arg.pl \
+ t/results/misc_commands/documentencoding_zero.pl \
t/results/misc_commands/double_exdent.pl \
t/results/misc_commands/empty_center.pl \
t/results/misc_commands/empty_center_with_arg.pl \
diff --git a/tp/TODO b/tp/TODO
index d5f9e0bdc1..bba708b5c5 100644
--- a/tp/TODO
+++ b/tp/TODO
@@ -46,15 +46,6 @@ For converter writers,
Delayed bugs
============
-Change the implementation, in the XS parser, of encoding support. Instead of a
-fixed list of supported encodings, which all have their conversion initialized,
-the input encoding actually used could simply be used to initialize iconv (for
-instance when encountering @documentencoding), using a list of initialized
-conversion (which would only have two elements, one for utf-8 and one for
-another documentencoding in general as there is no more than one
-documentencoding in most manuals). See input.c set_input_encoding and
-convert_to_utf8.
-
See message/thread from Reißner Ernst: Feature request: api docs
hyphenation: should only appear in toplevel.
diff --git a/tp/Texinfo/Common.pm b/tp/Texinfo/Common.pm
index 90a4ca44c8..c416a12867 100644
--- a/tp/Texinfo/Common.pm
+++ b/tp/Texinfo/Common.pm
@@ -923,6 +923,32 @@ sub _add_preamble_before_content($)
unshift (@{$before_node_section->{'contents'}}, @first_types);
}
+sub get_perl_encoding($$$)
+{
+ my $commands_info = shift;
+ my $registrar = shift;
+ my $configuration_information = shift;
+
+ my $result;
+ if (defined($commands_info->{'documentencoding'})) {
+ foreach my $element (@{$commands_info->{'documentencoding'}}) {
+ my $perl_encoding = element_extra_encoding_for_perl($element);
+ if (!defined($perl_encoding)) {
+ my $encoding = $element->{'extra'}->{'input_encoding_name'}
+ if ($element->{'extra'});
+ if (defined($encoding)) {
+ $registrar->line_warn($configuration_information,
+ sprintf(__("unrecognized encoding name `%s'"), $encoding),
+ $element->{'source_info'});
+ }
+ } else {
+ $result = $perl_encoding;
+ }
+ }
+ }
+ return $result;
+}
+
# for Parser and main program
sub warn_unknown_language($) {
my $lang = shift;
@@ -1232,7 +1258,7 @@ sub element_extra_encoding_for_perl($)
my $encoding = $element->{'extra'}->{'input_encoding_name'}
if ($element->{'extra'});
- if ($encoding) {
+ if (defined($encoding) and $encoding ne '') {
my $Encode_encoding_object = Encode::find_encoding($encoding);
if (defined($Encode_encoding_object)) {
$perl_encoding = $Encode_encoding_object->name();
diff --git a/tp/Texinfo/ParserNonXS.pm b/tp/Texinfo/ParserNonXS.pm
index ce3685074c..57f6511fa8 100644
--- a/tp/Texinfo/ParserNonXS.pm
+++ b/tp/Texinfo/ParserNonXS.pm
@@ -147,6 +147,8 @@ my %parser_state_initialization = (
'merged_indices' => {}, # the key is merged in the value
'sections_level' => 0, # modified by raise/lowersections
'targets' => [], # array of elements used to build 'labels'
+ 'input_file_encoding' => 'utf-8', # perl encoding name used for the input
+ # file
# initialization of information returned by global_information()
'info' => {
'input_encoding_name' => 'utf-8',
@@ -757,6 +759,8 @@ sub parse_texi_piece($$;$)
= _setup_document_root_and_before_node_section();
my $tree = $self->_parse_texi($document_root, $before_node_section);
+ get_parser_info($self);
+
return $tree;
}
@@ -789,7 +793,11 @@ sub parse_texi_text($$;$)
_input_push_text($self, $text, $line_nr);
- return $self->_parse_texi_document();
+ my $tree = $self->_parse_texi_document();
+
+ get_parser_info($self);
+
+ return $tree;
}
# $INPUT_FILE_PATH the name of the opened file should be a binary string.
@@ -803,8 +811,8 @@ sub _input_push_file
return 0, undef, undef, $!;
}
- if (defined($self->{'info'}->{'input_perl_encoding'})) {
- if ($self->{'info'}->{'input_perl_encoding'} eq 'utf-8') {
+ if (defined($self->{'input_file_encoding'})) {
+ if ($self->{'input_file_encoding'} eq 'utf-8') {
binmode($filehandle, ":utf8");
# Use :utf8 instead of :encoding(utf-8), as the latter does
# error checking and has (unreliably) led to fatal errors
@@ -813,7 +821,7 @@ sub _input_push_file
# Evidently Perl is checking ahead in the file.
} else {
binmode($filehandle,
- ":encoding($self->{'info'}->{'input_perl_encoding'})");
+ ":encoding($self->{'input_file_encoding'})");
}
}
my ($file_name, $directories, $suffix) = fileparse($input_file_path);
@@ -837,6 +845,17 @@ sub _input_push_file
return 1, $file_name, $directories, undef;
}
+sub get_parser_info($)
+{
+ my $self = shift;
+
+ my $perl_encoding
+ = Texinfo::Common::get_perl_encoding($self->{'commands_info'},
+ $self->{'registrar'}, $self);
+ $self->{'info'}->{'input_perl_encoding'} = $perl_encoding
+ if (defined($perl_encoding));
+}
+
# parse a texi file
# $INPUT_FILE_PATH is the name of the parsed file and should be a binary string.
sub parse_texi_file($$)
@@ -862,7 +881,10 @@ sub parse_texi_file($$)
$self->{'info'}->{'input_file_name'} = $file_name;
$self->{'info'}->{'input_directory'} = $directories;
- return $self->_parse_texi_document();
+ my $tree = $self->_parse_texi_document();
+ get_parser_info($self);
+
+ return $tree;
}
sub _parse_texi_document($)
@@ -2285,7 +2307,7 @@ sub _encode_file_name($$)
if ($input_file_name_encoding) {
$encoding = $input_file_name_encoding;
} elsif ($self->get_conf('DOC_ENCODING_FOR_INPUT_FILE_NAME')) {
- $encoding = $self->{'info'}->{'input_perl_encoding'};
+ $encoding = $self->{'input_file_encoding'};
} else {
$encoding = $self->get_conf('LOCALE_ENCODING');
}
@@ -3517,39 +3539,49 @@ sub _end_line_misc_line($$$)
= $self->{'info'}->{'input_encoding_name'}
if defined $self->{'info'}->{'input_encoding_name'};
} elsif ($command eq 'documentencoding') {
+ # lower case, trim non-ascii characters and keep only alphanumeric
+ # characters, - and _. iconv also seems to trim non alphanumeric
+ # non - _ characters
+ my $normalized_text = lc($text);
+ $normalized_text =~ s/[^a-zA-Z0-9_\-]//g;
- # Warn if the encoding is not one of the encodings supported as an
- # argument to @documentencoding, documented in Texinfo manual
- unless ($canonical_texinfo_encodings{lc($text)}) {
+ if ($normalized_text !~ /[[:alnum:]]/) {
$self->_command_warn($current, $source_info,
- __("encoding `%s' is not a canonical texinfo encoding"),
- $text)
- }
+ __("bad encoding name `%s'"), $text);
+ } else {
+ # Warn if the encoding is not one of the encodings supported as an
+ # argument to @documentencoding, documented in Texinfo manual
+ unless ($canonical_texinfo_encodings{lc($text)}) {
+ $self->_command_warn($current, $source_info,
+ __("encoding `%s' is not a canonical texinfo encoding"),
+ $text)
+ }
- # Set $perl_encoding -- an encoding name suitable for perl;
- # $input_encoding -- for output within an HTML file, used
- # in most output formats
- my ($perl_encoding, $input_encoding);
- my $Encode_encoding_object = find_encoding($text);
- if (defined($Encode_encoding_object)) {
- $perl_encoding = $Encode_encoding_object->name();
- # mime_name() is upper-case, our keys are lower case, set to lower case
- $input_encoding = lc($Encode_encoding_object->mime_name());
- }
+ # Set $perl_encoding -- an encoding name suitable for perl;
+ # $input_encoding -- for output within an HTML file, used
+ # in most output formats
+ my ($perl_encoding, $input_encoding);
+ my $Encode_encoding_object = find_encoding($normalized_text);
+ if (defined($Encode_encoding_object)) {
+ $perl_encoding = $Encode_encoding_object->name();
+ # mime_name() is upper-case, our keys are lower case, set to lower case
+ $input_encoding = lc($Encode_encoding_object->mime_name());
+ }
- if ($input_encoding) {
- $current->{'extra'}->{'input_encoding_name'} = $input_encoding;
- $self->{'info'}->{'input_encoding_name'} = $input_encoding;
- }
+ if (!$perl_encoding) {
+ $self->_command_warn($current, $source_info,
+ __("unhandled encoding name `%s'"), $text);
+ } else {
+ if ($input_encoding) {
+ $current->{'extra'}->{'input_encoding_name'} = $input_encoding;
+ $self->{'info'}->{'input_encoding_name'} = $input_encoding;
+ }
- if (!$perl_encoding) {
- $self->_command_warn($current, $source_info,
- __("unrecognized encoding name `%s'"), $text);
- } else {
- $self->{'info'}->{'input_perl_encoding'} = $perl_encoding;
- foreach my $input (@{$self->{'input'}}) {
- binmode($input->{'fh'}, ":encoding($perl_encoding)")
- if ($input->{'fh'});
+ $self->{'input_file_encoding'} = $perl_encoding;
+ foreach my $input (@{$self->{'input'}}) {
+ binmode($input->{'fh'}, ":encoding($perl_encoding)")
+ if ($input->{'fh'});
+ }
}
}
} elsif ($command eq 'documentlanguage') {
diff --git a/tp/Texinfo/XS/parsetexi/Parsetexi.pm b/tp/Texinfo/XS/parsetexi/Parsetexi.pm
index 60ed6e2087..63f9dad186 100644
--- a/tp/Texinfo/XS/parsetexi/Parsetexi.pm
+++ b/tp/Texinfo/XS/parsetexi/Parsetexi.pm
@@ -251,31 +251,17 @@ sub get_parser_info {
$self->{'info'} = $GLOBAL_INFO;
$self->{'commands_info'} = $GLOBAL_INFO2;
- $self->{'info'}->{'input_perl_encoding'} = 'utf-8';
+ _set_errors_node_lists_labels_indices($self);
- if (defined($self->{'commands_info'}->{'documentencoding'})) {
- foreach my $element (@{$self->{'commands_info'}->{'documentencoding'}}) {
- my $perl_encoding
- = Texinfo::Common::element_extra_encoding_for_perl($element);
- # Note that the following condition cannot happen as long as
- # the encodings handled in the XS parser are all known by perl.
- if (!$perl_encoding) {
- my $encoding = $element->{'extra'}->{'input_encoding_name'}
- if ($element->{'extra'});
- if ($encoding) {
- my ($registrar, $configuration_information)
- = _get_error_registrar($self);
- $registrar->line_warn($configuration_information,
- sprintf(__("unrecognized encoding name `%s'"), $encoding),
- $element->{'source_info'});
- }
- } else {
- $self->{'info'}->{'input_perl_encoding'} = $perl_encoding;
- }
- }
- }
+ my ($registrar, $configuration_information)
+ = _get_error_registrar($self);
- _set_errors_node_lists_labels_indices($self);
+ $self->{'info'}->{'input_perl_encoding'} = 'utf-8';
+ my $perl_encoding
+ = Texinfo::Common::get_perl_encoding($self->{'commands_info'},
+ $registrar, $configuration_information);
+ $self->{'info'}->{'input_perl_encoding'} = $perl_encoding
+ if (defined($perl_encoding));
}
sub parse_texi_file ($$)
diff --git a/tp/Texinfo/XS/parsetexi/api.c b/tp/Texinfo/XS/parsetexi/api.c
index d5b11e5e82..5bd600da9c 100644
--- a/tp/Texinfo/XS/parsetexi/api.c
+++ b/tp/Texinfo/XS/parsetexi/api.c
@@ -140,6 +140,10 @@ reset_parser_except_conf (void)
reset_floats ();
wipe_global_info ();
set_input_encoding ("utf-8");
+ /* it is not totally obvious that it is better to reset the
+ list to avoid memory leaks rather than reuse the opened
+ iconv handlers */
+ reset_encoding_list ();
reset_internal_xrefs ();
reset_labels ();
input_reset_input_stack ();
diff --git a/tp/Texinfo/XS/parsetexi/end_line.c b/tp/Texinfo/XS/parsetexi/end_line.c
index 143b579e4d..5fff51b73e 100644
--- a/tp/Texinfo/XS/parsetexi/end_line.c
+++ b/tp/Texinfo/XS/parsetexi/end_line.c
@@ -18,6 +18,7 @@
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
+#include <stdio.h>
#include "parser.h"
#include "debug.h"
@@ -1311,94 +1312,128 @@ end_line_misc_line (ELEMENT *current)
}
else if (current->cmd == CM_documentencoding)
{
- int i; char *p, *text2;
+ int i; char *p, *normalized_text, *q;
+ int encoding_set;
char *input_encoding = 0;
+ int possible_encoding = 0;
+
+ normalized_text = strdup (text);
+ q = normalized_text;
+ /* lower case, trim non-ascii characters and keep only alphanumeric
+ characters, - and _. iconv also seems to trim non alphanumeric
+ non - _ characters */
+ for (p = text; *p; p++)
+ {
+ /* check if ascii */
+ if ((*p & ~0x7f) == 0)
+ {
+ if (isalnum (*p))
+ {
+ possible_encoding = 1;
+ *q = tolower (*p);
+ q++;
+ }
+ else if (*p == '_' || *p == '-')
+ {
+ *q = *p;
+ q++;
+ }
+ }
+ }
+ *q = '\0';
- text2 = strdup (text);
- for (p = text2; *p; p++)
- *p = tolower (*p);
+ if (! possible_encoding)
+ command_warn (current, "bad encoding name `%s'",
+ text);
+ else
+ {
/* Warn if the encoding is not one of the encodings supported as an
argument to @documentencoding, documented in Texinfo manual */
- {
- char *texinfo_encoding = 0;
- static char *canonical_encodings[] = {
- "us-ascii", "utf-8", "iso-8859-1",
- "iso-8859-15","iso-8859-2","koi8-r", "koi8-u",
- 0
- };
-
- for (i = 0; (canonical_encodings[i]); i++)
{
- if (!strcmp (text2, canonical_encodings[i]))
+ char *texinfo_encoding = 0;
+ static char *canonical_encodings[] = {
+ "us-ascii", "utf-8", "iso-8859-1",
+ "iso-8859-15","iso-8859-2","koi8-r", "koi8-u",
+ 0
+ };
+ char *text_lc;
+
+ text_lc = strdup (text);
+ for (p = text_lc; *p; p++)
+ *p = tolower (*p);
+
+ for (i = 0; (canonical_encodings[i]); i++)
{
- texinfo_encoding = canonical_encodings[i];
- break;
+ if (!strcmp (text_lc, canonical_encodings[i]))
+ {
+ texinfo_encoding = canonical_encodings[i];
+ break;
+ }
+ }
+ free (text_lc);
+ if (!texinfo_encoding)
+ {
+ command_warn (current, "encoding `%s' is not a "
+ "canonical texinfo encoding", text);
}
}
- if (!texinfo_encoding)
- {
- command_warn (current, "encoding `%s' is not a "
- "canonical texinfo encoding", text);
- }
- }
/* Set input_encoding -- for output within an HTML file, used
in most output formats */
- {
- struct encoding_map {
- char *from; char *to;
- };
-
- /* In the perl parser,
- lc(Encode::find_encoding()->mime_name()) is used */
- static struct encoding_map map[] = {
- "utf-8", "utf-8",
- "ascii", "us-ascii",
- "shiftjis", "shift_jis",
- "latin1", "iso-8859-1",
- "latin-1", "iso-8859-1",
- "iso-8859-1", "iso-8859-1",
- "iso-8859-2", "iso-8859-2",
- "iso-8859-15", "iso-8859-15",
- "koi8-r", "koi8-r",
- "koi8-u", "koi8-u",
- };
- for (i = 0; i < sizeof map / sizeof *map; i++)
{
- /* Elements in first column map to elements in
- second column. Elements in second column map
- to themselves. */
- if (!strcasecmp (text2, map[i].from)
- || !strcasecmp (text2, map[i].to))
+ struct encoding_map {
+ char *from; char *to;
+ };
+
+ /* In the perl parser,
+ lc(Encode::find_encoding()->mime_name()) is used */
+ /* the Perl Parser calls Encode::find_encoding, so knows
+ about more encodings than what we know about here.
+ */
+ static struct encoding_map map[] = {
+ "utf-8", "utf-8",
+ "ascii", "us-ascii",
+ "shiftjis", "shift_jis",
+ "latin1", "iso-8859-1",
+ "latin-1", "iso-8859-1",
+ "iso-8859-1", "iso-8859-1",
+ "iso-8859-2", "iso-8859-2",
+ "iso-8859-15", "iso-8859-15",
+ "koi8-r", "koi8-r",
+ "koi8-u", "koi8-u",
+ };
+ for (i = 0; i < sizeof map / sizeof *map; i++)
{
- input_encoding = map[i].to;
- break;
+ /* Elements in first column map to elements in
+ second column. Elements in second column map
+ to themselves. */
+ if (!strcasecmp (normalized_text, map[i].from)
+ || !strcasecmp (normalized_text, map[i].to))
+ {
+ input_encoding = map[i].to;
+ break;
+ }
}
}
- }
- free (text2);
-
- if (input_encoding)
- {
- add_extra_string_dup (current, "input_encoding_name",
- input_encoding);
+ if (!input_encoding)
+ {
+ input_encoding = normalized_text;
+ }
- global_info.input_encoding_name = strdup (input_encoding);
- set_input_encoding (input_encoding);
- }
- else
- {
- command_warn (current, "unrecognized encoding name `%s'",
- text);
+ encoding_set = set_input_encoding (input_encoding);
+ if (encoding_set)
+ {
+ add_extra_string_dup (current, "input_encoding_name",
+ input_encoding);
- /* the Perl Parser calls Encode::find_encoding, so knows
- about more encodings than what we know about here.
- TODO: accept encoding not in encoding_map as long as
- an iconv conversion to UTF-8 is possible?
- Maybe we should check if an iconv conversion is
- possible from this encoding to UTF-8. */
+ global_info.input_encoding_name = strdup (input_encoding);
+ }
+ else
+ command_warn (current, "unhandled encoding name `%s'",
+ text);
}
+ free (normalized_text);
}
else if (current->cmd == CM_documentlanguage)
{
diff --git a/tp/Texinfo/XS/parsetexi/input.c b/tp/Texinfo/XS/parsetexi/input.c
index 1669a5109f..5546c6fd14 100644
--- a/tp/Texinfo/XS/parsetexi/input.c
+++ b/tp/Texinfo/XS/parsetexi/input.c
@@ -31,16 +31,6 @@
enum input_type { IN_file, IN_text };
-enum character_encoding {
- ce_latin1,
- ce_latin2,
- ce_latin15,
- ce_utf8,
- ce_shiftjis,
- ce_koi8r,
- ce_koi8u
-};
-
typedef struct {
enum input_type type;
@@ -60,14 +50,33 @@ typedef struct {
static char *input_pushback_string;
-enum character_encoding input_encoding;
-
static char *input_encoding_name;
static iconv_t reverse_iconv; /* used in encode_file_name */
-void
+typedef struct {
+ char *encoding_name;
+ iconv_t iconv;
+} ENCODING;
+
+static ENCODING *encodings_list = 0;
+int encoding_number = 0;
+int encoding_space = 0;
+
+static ENCODING *current_encoding = 0;
+
+/* ENCODING should always be lower cased */
+/* WARNING: it is very important for the first call to
+ set_input_encoding to be for "utf-8" as the code assumes
+ a conversion to UTF-8 in encodings_list[0]. */
+int
set_input_encoding (char *encoding)
{
+ int encoding_index = -1;
+ int encoding_set = 0;
+
+ if (!strcmp (encoding, "us-ascii"))
+ encoding = "iso-8859-1";
+
free (input_encoding_name); input_encoding_name = strdup (encoding);
if (reverse_iconv)
{
@@ -75,23 +84,48 @@ set_input_encoding (char *encoding)
reverse_iconv = (iconv_t) 0;
}
- if (!strcasecmp (encoding, "utf-8"))
- input_encoding = ce_utf8;
- else if (!strcmp (encoding, "iso-8859-1")
- || !strcmp (encoding, "us-ascii"))
- input_encoding = ce_latin1;
- else if (!strcmp (encoding, "iso-8859-2"))
- input_encoding = ce_latin2;
- else if (!strcmp (encoding, "iso-8859-15"))
- input_encoding = ce_latin15;
- else if (!strcmp (encoding, "shift_jis"))
- input_encoding = ce_shiftjis;
- else if (!strcmp (encoding, "koi8-r"))
- input_encoding = ce_koi8r;
- else if (!strcmp (encoding, "koi8-u"))
- input_encoding = ce_koi8u;
+ if (!strcmp (encoding, "utf-8"))
+ {
+ if (encoding_number > 0)
+ encoding_index = 0;
+ }
+ else if (encoding_number > 1)
+ {
+ int i;
+ for (i = 1; i < encoding_number; i++)
+ {
+ if (!strcmp (encoding, encodings_list[i].encoding_name))
+ {
+ encoding_index = i;
+ break;
+ }
+ }
+ }
+
+ if (encoding_index == -1)
+ {
+ if (encoding_number >= encoding_space)
+ {
+ encodings_list = realloc (encodings_list,
+ (encoding_space += 3) * sizeof (ENCODING));
+ }
+ encodings_list[encoding_number].encoding_name = strdup (encoding);
+ /* Initialize conversions for the first time. iconv_open returns
+ (iconv_t) -1 on failure so these should only be called once. */
+ encodings_list[encoding_number].iconv = iconv_open ("UTF-8", encoding);
+ encoding_index = encoding_number;
+ encoding_number++;
+ }
+
+ if (encodings_list[encoding_index].iconv == (iconv_t) -1)
+ current_encoding = 0;
else
- fprintf (stderr, "warning: unhandled encoding %s\n", encoding);
+ {
+ current_encoding = &encodings_list[encoding_index];
+ encoding_set = 1;
+ }
+
+ return encoding_set;
}
@@ -139,14 +173,6 @@ new_line (ELEMENT *current)
}
-static iconv_t iconv_from_latin1;
-static iconv_t iconv_from_latin2;
-static iconv_t iconv_from_latin15;
-static iconv_t iconv_from_shiftjis;
-static iconv_t iconv_from_koi8u;
-static iconv_t iconv_from_koi8r;
-static iconv_t iconv_validate_utf8;
-
/* Run iconv using text buffer as output buffer. */
size_t
text_buffer_iconv (TEXT *buf, iconv_t iconv_state,
@@ -235,49 +261,7 @@ convert_to_utf8 (char *s)
file, then we'd have to keep track of which strings needed the UTF-8 flag
and which didn't. */
- /* Initialize conversions for the first time. iconv_open returns
- (iconv_t) -1 on failure so these should only be called once. */
- if (iconv_validate_utf8 == (iconv_t) 0)
- iconv_validate_utf8 = iconv_open ("UTF-8", "UTF-8");
- if (iconv_from_latin1 == (iconv_t) 0)
- iconv_from_latin1 = iconv_open ("UTF-8", "ISO-8859-1");
- if (iconv_from_latin2 == (iconv_t) 0)
- iconv_from_latin2 = iconv_open ("UTF-8", "ISO-8859-2");
- if (iconv_from_latin15 == (iconv_t) 0)
- iconv_from_latin15 = iconv_open ("UTF-8", "ISO-8859-15");
- if (iconv_from_shiftjis == (iconv_t) 0)
- iconv_from_shiftjis = iconv_open ("UTF-8", "SHIFT-JIS");
- if (iconv_from_koi8r == (iconv_t) 0)
- iconv_from_koi8r = iconv_open ("UTF-8", "KOI8-R");
- if (iconv_from_koi8u == (iconv_t) 0)
- iconv_from_koi8u = iconv_open ("UTF-8", "KOI8-U");
-
- switch (input_encoding)
- {
- case ce_utf8:
- our_iconv = iconv_validate_utf8;
- break;
- case ce_latin1:
- our_iconv = iconv_from_latin1;
- break;
- case ce_latin2:
- our_iconv = iconv_from_latin2;
- break;
- case ce_latin15:
- our_iconv = iconv_from_latin15;
- break;
- case ce_shiftjis:
- our_iconv = iconv_from_shiftjis;
- break;
- case ce_koi8r:
- our_iconv = iconv_from_koi8r;
- break;
- case ce_koi8u:
- our_iconv = iconv_from_koi8u;
- break;
- }
-
- if (our_iconv == (iconv_t) -1)
+ if (current_encoding == 0)
{
/* In case the converter couldn't be initialised.
Danger: this will cause problems if the input is not in UTF-8 as
@@ -285,7 +269,7 @@ convert_to_utf8 (char *s)
return s;
}
- ret = encode_with_iconv (our_iconv, s);
+ ret = encode_with_iconv (current_encoding->iconv, s);
free (s);
return ret;
}
@@ -323,7 +307,7 @@ encode_file_name (char *filename)
}
else if (doc_encoding_for_input_file_name)
{
- if (input_encoding != ce_utf8 && input_encoding_name)
+ if (input_encoding_name && strcmp (input_encoding_name, "utf-8"))
{
reverse_iconv = iconv_open (input_encoding_name, "UTF-8");
}
@@ -683,6 +667,25 @@ input_reset_input_stack (void)
value_expansion_nr = 0;
}
+void
+reset_encoding_list (void)
+{
+ int i;
+ /* never reset the utf-8 encoding in position 0 */
+ for (i = 1; i < encoding_number; i++)
+ {
+ free (encodings_list[i].encoding_name);
+ if (encodings_list[i].iconv != (iconv_t) -1)
+ iconv_close (encodings_list[i].iconv);
+ }
+ /* in theory, it could also be 0, but the function is called right
+ after set_input_encoding ("utf-8"); */
+ encoding_number = 1;
+ current_encoding = 0;
+ free (input_encoding_name);
+ input_encoding_name = 0;
+}
+
int
top_file_index (void)
{
diff --git a/tp/Texinfo/XS/parsetexi/input.h b/tp/Texinfo/XS/parsetexi/input.h
index 1749660e3a..b2168ba4e8 100644
--- a/tp/Texinfo/XS/parsetexi/input.h
+++ b/tp/Texinfo/XS/parsetexi/input.h
@@ -15,13 +15,14 @@ int input_push_file (char *filename);
void input_pushback (char *line);
void set_input_source_mark (SOURCE_MARK *source_mark);
void input_reset_input_stack (void);
+void reset_encoding_list (void);
int expanding_macro (char *macro);
int top_file_index (void);
char *locate_include_file (char *filename);
char *encode_file_name (char *filename);
char *convert_to_utf8 (char *s);
-void set_input_encoding (char *encoding);
+int set_input_encoding (char *encoding);
void add_include_directory (char *filename);
void clear_include_directories (void);
diff --git a/tp/t/08misc_commands.t b/tp/t/08misc_commands.t
index 19ea6e193d..d1eb5ba221 100644
--- a/tp/t/08misc_commands.t
+++ b/tp/t/08misc_commands.t
@@ -54,6 +54,9 @@ Test text after finalout
@finalout a word after finalout
Line after finalout
'],
+['documentencoding_zero',
+'@documentencoding 0
+'],
['also_not_line',
'
diff --git a/tp/t/info_tests.t b/tp/t/info_tests.t
index 68db18421d..3ab958c408 100644
--- a/tp/t/info_tests.t
+++ b/tp/t/info_tests.t
@@ -278,7 +278,7 @@ ref to anchor1@footnote{another footnote}, which is before
@@node Top: @ref{anch
@image{text_only_image,,,alt}
'],
-['image_quotes',
+['image_quotes',
'@node Top
@image{f--ile,,,alt""\\}
@@ -1082,6 +1082,9 @@ text @* f nl Something? @* After punct
* what @* is: ankh p.
@end menu
'],
+['chinese_mixed_with_en_EUC_CN',
+undef, {'test_file' => 'chinese_mixed_with_en_EUC_CN.texi'}
+],
);
my $colons_in_index_entries_and_node =
@@ -1114,7 +1117,7 @@ node one
';
-push @file_tests,
+push @file_tests,
['colons_in_index_entries_and_node',
$colons_in_index_entries_and_node,
undef, {'INFO_SPECIAL_CHARS_QUOTE' => 1,
diff --git a/tp/t/input_files/chinese_mixed_with_en.texi b/tp/t/input_files/chinese_mixed_with_en.texi
index a4796841b7..057d7f7170 100644
--- a/tp/t/input_files/chinese_mixed_with_en.texi
+++ b/tp/t/input_files/chinese_mixed_with_en.texi
@@ -3,7 +3,7 @@
@settitle chinese mixed with english
@node Top
-@top Mixed in UTF-8
+@top Mixed chinese and english
Example of english and chinese, chinese aligned or not.
diff --git a/tp/t/input_files/chinese_mixed_with_en.texi b/tp/t/input_files/chinese_mixed_with_en_EUC_CN.texi
similarity index 50%
copy from tp/t/input_files/chinese_mixed_with_en.texi
copy to tp/t/input_files/chinese_mixed_with_en_EUC_CN.texi
index a4796841b7..76bddbbf78 100644
--- a/tp/t/input_files/chinese_mixed_with_en.texi
+++ b/tp/t/input_files/chinese_mixed_with_en_EUC_CN.texi
@@ -1,9 +1,9 @@
\input texinfo
-@documentencoding utf-8
+@documentencoding EUC-CN
@settitle chinese mixed with english
@node Top
-@top Mixed in UTF-8
+@top Mixed chinese and english
Example of english and chinese, chinese aligned or not.
@@ -20,20 +20,20 @@ standard Emacs features when programming in Ada.
2. chinese already aligned in source(this result)
-这常用于修饰多个线程会访问或修改的全局变量,让编译器保证每次都从内存读取
-变量的值,而不是作某些优化。(这些优化有可能导致程序不能获得正确的值)
+�ⳣ�������ζ���̻߳���ʻ��ĵ�ȫ�ֱ������ñ�������֤ÿ�ζ����ڴ��ȡ
+������ֵ����������ijЩ�Ż�������Щ�Ż��п��ܵ��³����ܻ����ȷ��ֵ��
3. chinese not aligned in source
-这常用于修饰多个线程会访问或修改的全局变量,让编译器保证每次都从内存
-读取
-变量的值,而不是作某些优化。
-(这些优化有可能导致程序不能获得正确的值)
+�ⳣ�������ζ���̻߳���ʻ��ĵ�ȫ�ֱ������ñ�������֤ÿ�ζ����ڴ�
+��ȡ
+������ֵ����������ijЩ�Ż���
+����Щ�Ż��п��ܵ��³����ܻ����ȷ��ֵ��
4. a mix of chinese and english
-restrict 表示在当前 scope 内不允许其它变量指向它。用处,比如防止 memory
-overlap。
+restrict ��ʾ�ڵ�ǰ scope �ڲ�������������ָ�������ô��������ֹ memory
+overlap��
@bye
diff --git a/tp/t/input_files/sample_EUC_CN.texi b/tp/t/input_files/sample_EUC_CN.texi
new file mode 100644
index 0000000000..9684c08459
--- /dev/null
+++ b/tp/t/input_files/sample_EUC_CN.texi
@@ -0,0 +1,4 @@
+\input texinfo @c -*-texinfo-*-
+@c %**start of header
+@setfilename sample_utf8.info
+@settitle Sample ʾ��
\ No newline at end of file
diff --git a/tp/t/results/include/macro_and_commands_in_early_commands.pl b/tp/t/results/include/macro_and_commands_in_early_commands.pl
index f5d628c74f..c061f4d435 100644
--- a/tp/t/results/include/macro_and_commands_in_early_commands.pl
+++ b/tp/t/results/include/macro_and_commands_in_early_commands.pl
@@ -231,6 +231,7 @@ $result_trees{'macro_and_commands_in_early_commands'} = {
],
'cmdname' => 'documentencoding',
'extra' => {
+ 'input_encoding_name' => 'iso-8859-1',
'text_arg' => 'ISO-8859-1@'
},
'info' => {
@@ -688,7 +689,7 @@ $result_trees{'macro_and_commands_in_early_commands'} = {
],
'cmdname' => 'verbatiminclude',
'extra' => {
- 'input_encoding_name' => 'utf-8',
+ 'input_encoding_name' => 'iso-8859-1',
'text_arg' => 'inc_@f--ile.texi'
},
'info' => {
@@ -835,15 +836,6 @@ $result_errors{'macro_and_commands_in_early_commands'} = [
'macro' => '',
'text' => 'encoding `ISO-8859-1@\' is not a canonical texinfo encoding',
'type' => 'warning'
- },
- {
- 'error_line' => 'warning: unrecognized encoding name `ISO-8859-1@\'
-',
- 'file_name' => '',
- 'line_nr' => 11,
- 'macro' => '',
- 'text' => 'unrecognized encoding name `ISO-8859-1@\'',
- 'type' => 'warning'
}
];
diff --git a/tp/t/results/plaintext_tests/chinese_mixed_with_en.pl b/tp/t/results/info_tests/chinese_mixed_with_en_EUC_CN.pl
similarity index 83%
copy from tp/t/results/plaintext_tests/chinese_mixed_with_en.pl
copy to tp/t/results/info_tests/chinese_mixed_with_en_EUC_CN.pl
index 73787826c0..7d996935a7 100644
--- a/tp/t/results/plaintext_tests/chinese_mixed_with_en.pl
+++ b/tp/t/results/info_tests/chinese_mixed_with_en_EUC_CN.pl
@@ -5,7 +5,7 @@ use vars qw(%result_texis %result_texts %result_trees %result_errors
use utf8;
-$result_trees{'chinese_mixed_with_en'} = {
+$result_trees{'chinese_mixed_with_en_EUC_CN'} = {
'contents' => [
{
'contents' => [
@@ -26,7 +26,7 @@ $result_trees{'chinese_mixed_with_en'} = {
{
'contents' => [
{
- 'text' => 'utf-8'
+ 'text' => 'EUC-CN'
}
],
'info' => {
@@ -40,8 +40,8 @@ $result_trees{'chinese_mixed_with_en'} = {
],
'cmdname' => 'documentencoding',
'extra' => {
- 'input_encoding_name' => 'utf-8',
- 'text_arg' => 'utf-8'
+ 'input_encoding_name' => 'euc-cn',
+ 'text_arg' => 'EUC-CN'
},
'info' => {
'spaces_before_argument' => {
@@ -49,7 +49,7 @@ $result_trees{'chinese_mixed_with_en'} = {
}
},
'source_info' => {
- 'file_name' => 'chinese_mixed_with_en.texi',
+ 'file_name' => 'chinese_mixed_with_en_EUC_CN.texi',
'line_nr' => 2,
'macro' => ''
}
@@ -78,7 +78,7 @@ $result_trees{'chinese_mixed_with_en'} = {
}
},
'source_info' => {
- 'file_name' => 'chinese_mixed_with_en.texi',
+ 'file_name' => 'chinese_mixed_with_en_EUC_CN.texi',
'line_nr' => 3,
'macro' => ''
}
@@ -121,7 +121,7 @@ $result_trees{'chinese_mixed_with_en'} = {
}
},
'source_info' => {
- 'file_name' => 'chinese_mixed_with_en.texi',
+ 'file_name' => 'chinese_mixed_with_en_EUC_CN.texi',
'line_nr' => 5,
'macro' => ''
}
@@ -131,7 +131,7 @@ $result_trees{'chinese_mixed_with_en'} = {
{
'contents' => [
{
- 'text' => 'Mixed in UTF-8'
+ 'text' => 'Mixed chinese and english'
}
],
'info' => {
@@ -172,7 +172,7 @@ $result_trees{'chinese_mixed_with_en'} = {
}
},
'source_info' => {
- 'file_name' => 'chinese_mixed_with_en.texi',
+ 'file_name' => 'chinese_mixed_with_en_EUC_CN.texi',
'line_nr' => 6,
'macro' => ''
}
@@ -204,7 +204,7 @@ $result_trees{'chinese_mixed_with_en'} = {
}
},
'source_info' => {
- 'file_name' => 'chinese_mixed_with_en.texi',
+ 'file_name' => 'chinese_mixed_with_en_EUC_CN.texi',
'line_nr' => 10,
'macro' => ''
}
@@ -394,7 +394,7 @@ $result_trees{'chinese_mixed_with_en'} = {
}
},
'source_info' => {
- 'file_name' => 'chinese_mixed_with_en.texi',
+ 'file_name' => 'chinese_mixed_with_en_EUC_CN.texi',
'line_nr' => 11,
'macro' => ''
}
@@ -413,12 +413,12 @@ $result_trees{'chinese_mixed_with_en'} = {
'type' => 'document_root'
};
-$result_texis{'chinese_mixed_with_en'} = '\\input texinfo
-@documentencoding utf-8
+$result_texis{'chinese_mixed_with_en_EUC_CN'} = '\\input texinfo
+@documentencoding EUC-CN
@settitle chinese mixed with english
@node Top
-@top Mixed in UTF-8
+@top Mixed chinese and english
Example of english and chinese, chinese aligned or not.
@@ -455,9 +455,9 @@ overlap。
';
-$result_texts{'chinese_mixed_with_en'} = '
-Mixed in UTF-8
-**************
+$result_texts{'chinese_mixed_with_en_EUC_CN'} = '
+Mixed chinese and english
+*************************
Example of english and chinese, chinese aligned or not.
@@ -492,7 +492,7 @@ overlap。
';
-$result_sectioning{'chinese_mixed_with_en'} = {
+$result_sectioning{'chinese_mixed_with_en_EUC_CN'} = {
'structure' => {
'section_childs' => [
{
@@ -536,12 +536,12 @@ $result_sectioning{'chinese_mixed_with_en'} = {
'section_level' => -1
}
};
-$result_sectioning{'chinese_mixed_with_en'}{'structure'}{'section_childs'}[0]{'structure'}{'section_childs'}[0]{'structure'}{'section_up'} = $result_sectioning{'chinese_mixed_with_en'}{'structure'}{'section_childs'}[0];
-$result_sectioning{'chinese_mixed_with_en'}{'structure'}{'section_childs'}[0]{'structure'}{'section_childs'}[0]{'structure'}{'toplevel_prev'} = $result_sectioning{'chinese_mixed_with_en'}{'structure'}{'section_childs'}[0];
-$result_sectioning{'chinese_mixed_with_en'}{'structure'}{'section_childs'}[0]{'structure'}{'section_childs'}[0]{'structure'}{'toplevel_up'} = $result_sectioning{'chinese_mixed_with_en'}{'structure'}{'section_childs'}[0];
-$result_sectioning{'chinese_mixed_with_en'}{'structure'}{'section_childs'}[0]{'structure'}{'section_up'} = $result_sectioning{'chinese_mixed_with_en'};
+$result_sectioning{'chinese_mixed_with_en_EUC_CN'}{'structure'}{'section_childs'}[0]{'structure'}{'section_childs'}[0]{'structure'}{'section_up'} = $result_sectioning{'chinese_mixed_with_en_EUC_CN'}{'structure'}{'section_childs'}[0];
+$result_sectioning{'chinese_mixed_with_en_EUC_CN'}{'structure'}{'section_childs'}[0]{'structure'}{'section_childs'}[0]{'structure'}{'toplevel_prev'} = $result_sectioning{'chinese_mixed_with_en_EUC_CN'}{'structure'}{'section_childs'}[0];
+$result_sectioning{'chinese_mixed_with_en_EUC_CN'}{'structure'}{'section_childs'}[0]{'structure'}{'section_childs'}[0]{'structure'}{'toplevel_up'} = $result_sectioning{'chinese_mixed_with_en_EUC_CN'}{'structure'}{'section_childs'}[0];
+$result_sectioning{'chinese_mixed_with_en_EUC_CN'}{'structure'}{'section_childs'}[0]{'structure'}{'section_up'} = $result_sectioning{'chinese_mixed_with_en_EUC_CN'};
-$result_nodes{'chinese_mixed_with_en'} = {
+$result_nodes{'chinese_mixed_with_en_EUC_CN'} = {
'cmdname' => 'node',
'extra' => {
'associated_section' => {
@@ -571,10 +571,10 @@ $result_nodes{'chinese_mixed_with_en'} = {
}
}
};
-$result_nodes{'chinese_mixed_with_en'}{'structure'}{'node_next'}{'structure'}{'node_prev'} = $result_nodes{'chinese_mixed_with_en'};
-$result_nodes{'chinese_mixed_with_en'}{'structure'}{'node_next'}{'structure'}{'node_up'} = $result_nodes{'chinese_mixed_with_en'};
+$result_nodes{'chinese_mixed_with_en_EUC_CN'}{'structure'}{'node_next'}{'structure'}{'node_prev'} = $result_nodes{'chinese_mixed_with_en_EUC_CN'};
+$result_nodes{'chinese_mixed_with_en_EUC_CN'}{'structure'}{'node_next'}{'structure'}{'node_up'} = $result_nodes{'chinese_mixed_with_en_EUC_CN'};
-$result_menus{'chinese_mixed_with_en'} = {
+$result_menus{'chinese_mixed_with_en_EUC_CN'} = {
'cmdname' => 'node',
'extra' => {
'normalized' => 'Top'
@@ -582,10 +582,20 @@ $result_menus{'chinese_mixed_with_en'} = {
'structure' => {}
};
-$result_errors{'chinese_mixed_with_en'} = [];
+$result_errors{'chinese_mixed_with_en_EUC_CN'} = [
+ {
+ 'error_line' => 'warning: encoding `EUC-CN\' is not a canonical texinfo encoding
+',
+ 'file_name' => 'chinese_mixed_with_en_EUC_CN.texi',
+ 'line_nr' => 2,
+ 'macro' => '',
+ 'text' => 'encoding `EUC-CN\' is not a canonical texinfo encoding',
+ 'type' => 'warning'
+ }
+];
-$result_floats{'chinese_mixed_with_en'} = {};
+$result_floats{'chinese_mixed_with_en_EUC_CN'} = {};
1;
diff --git a/tp/t/results/info_tests/chinese_mixed_with_en_EUC_CN/res_info/chinese_mixed_with_en_EUC_CN.info b/tp/t/results/info_tests/chinese_mixed_with_en_EUC_CN/res_info/chinese_mixed_with_en_EUC_CN.info
new file mode 100644
index 0000000000..17380710d6
--- /dev/null
+++ b/tp/t/results/info_tests/chinese_mixed_with_en_EUC_CN/res_info/chinese_mixed_with_en_EUC_CN.info
@@ -0,0 +1,57 @@
+This is chinese_mixed_with_en_EUC_CN.info, produced from
+chinese_mixed_with_en_EUC_CN.texi.
+
+
+File: chinese_mixed_with_en_EUC_CN.info, Node: Top, Next: Mixed english and chinese, Up: (dir)
+
+Mixed chinese and english
+*************************
+
+Example of english and chinese, chinese aligned or not.
+
+* Menu:
+
+* Mixed english and chinese::
+
+
+File: chinese_mixed_with_en_EUC_CN.info, Node: Mixed english and chinese, Prev: Top, Up: Top
+
+1 Mixed english and chinese
+***************************
+
+1. english only
+
+ The Emacs mode for programming in Ada 95 with GNAT helps the user in
+understanding existing code and facilitates writing new code. It
+furthermore provides some utility functions for easier integration of
+standard Emacs features when programming in Ada.
+
+ 2. chinese already aligned in source(this result)
+
+ �ⳣ�������ζ���̻߳���ʻ��ĵ�ȫ�ֱ������ñ�������֤ÿ�ζ����ڴ�
+��ȡ ������ֵ����������ijЩ�Ż�������Щ�Ż��п��ܵ��³����ܻ����ȷ��
+ֵ��
+
+ 3. chinese not aligned in source
+
+ �ⳣ�������ζ���̻߳���ʻ��ĵ�ȫ�ֱ������ñ�������֤ÿ�ζ����ڴ�
+��ȡ ������ֵ����������ijЩ�Ż��� ����Щ�Ż��п��ܵ��³����ܻ����ȷ��
+ֵ��
+
+ 4. a mix of chinese and english
+
+ restrict ��ʾ�ڵ�ǰ scope �ڲ�������������ָ�������ô��������ֹ
+memory overlap��
+
+
+
+Tag Table:
+Node: Top93
+Node: Mixed english and chinese344
+
+End Tag Table
+
+
+Local Variables:
+coding: euc-cn
+End:
diff --git a/tp/t/results/info_tests/unknown_encoding.pl b/tp/t/results/info_tests/unknown_encoding.pl
index a38184bb56..130d916bf0 100644
--- a/tp/t/results/info_tests/unknown_encoding.pl
+++ b/tp/t/results/info_tests/unknown_encoding.pl
@@ -135,12 +135,12 @@ $result_errors{'unknown_encoding'} = [
'type' => 'warning'
},
{
- 'error_line' => 'warning: unrecognized encoding name `ggg\'
+ 'error_line' => 'warning: unhandled encoding name `ggg\'
',
'file_name' => '',
'line_nr' => 2,
'macro' => '',
- 'text' => 'unrecognized encoding name `ggg\'',
+ 'text' => 'unhandled encoding name `ggg\'',
'type' => 'warning'
}
];
diff --git a/tp/t/results/macro/macro_in_invalid_documentencoding.pl b/tp/t/results/macro/macro_in_invalid_documentencoding.pl
index 6e83402f7e..9e2fa765fc 100644
--- a/tp/t/results/macro/macro_in_invalid_documentencoding.pl
+++ b/tp/t/results/macro/macro_in_invalid_documentencoding.pl
@@ -152,12 +152,12 @@ $result_errors{'macro_in_invalid_documentencoding'} = [
'type' => 'warning'
},
{
- 'error_line' => 'warning: unrecognized encoding name `badm\'
+ 'error_line' => 'warning: unhandled encoding name `badm\'
',
'file_name' => '',
'line_nr' => 4,
'macro' => '',
- 'text' => 'unrecognized encoding name `badm\'',
+ 'text' => 'unhandled encoding name `badm\'',
'type' => 'warning'
}
];
diff --git a/tp/t/results/misc_commands/documentencoding_zero.pl b/tp/t/results/misc_commands/documentencoding_zero.pl
new file mode 100644
index 0000000000..5e8959f205
--- /dev/null
+++ b/tp/t/results/misc_commands/documentencoding_zero.pl
@@ -0,0 +1,82 @@
+use vars qw(%result_texis %result_texts %result_trees %result_errors
+ %result_indices %result_sectioning %result_nodes %result_menus
+ %result_floats %result_converted %result_converted_errors
+ %result_elements %result_directions_text %result_indices_sort_strings);
+
+use utf8;
+
+$result_trees{'documentencoding_zero'} = {
+ 'contents' => [
+ {
+ 'contents' => [
+ {
+ 'args' => [
+ {
+ 'contents' => [
+ {
+ 'text' => '0'
+ }
+ ],
+ 'info' => {
+ 'spaces_after_argument' => {
+ 'text' => '
+'
+ }
+ },
+ 'type' => 'line_arg'
+ }
+ ],
+ 'cmdname' => 'documentencoding',
+ 'extra' => {
+ 'text_arg' => '0'
+ },
+ 'info' => {
+ 'spaces_before_argument' => {
+ 'text' => ' '
+ }
+ },
+ 'source_info' => {
+ 'file_name' => '',
+ 'line_nr' => 1,
+ 'macro' => ''
+ }
+ }
+ ],
+ 'type' => 'before_node_section'
+ }
+ ],
+ 'type' => 'document_root'
+};
+
+$result_texis{'documentencoding_zero'} = '@documentencoding 0
+';
+
+
+$result_texts{'documentencoding_zero'} = '';
+
+$result_errors{'documentencoding_zero'} = [
+ {
+ 'error_line' => 'warning: encoding `0\' is not a canonical texinfo encoding
+',
+ 'file_name' => '',
+ 'line_nr' => 1,
+ 'macro' => '',
+ 'text' => 'encoding `0\' is not a canonical texinfo encoding',
+ 'type' => 'warning'
+ },
+ {
+ 'error_line' => 'warning: unhandled encoding name `0\'
+',
+ 'file_name' => '',
+ 'line_nr' => 1,
+ 'macro' => '',
+ 'text' => 'unhandled encoding name `0\'',
+ 'type' => 'warning'
+ }
+];
+
+
+$result_floats{'documentencoding_zero'} = {};
+
+
+1;
diff --git a/tp/t/results/misc_commands/invalid_documentencoding.pl b/tp/t/results/misc_commands/invalid_documentencoding.pl
index 9d070df09a..cefb4823a4 100644
--- a/tp/t/results/misc_commands/invalid_documentencoding.pl
+++ b/tp/t/results/misc_commands/invalid_documentencoding.pl
@@ -456,12 +456,12 @@ $result_errors{'invalid_documentencoding'} = [
'type' => 'warning'
},
{
- 'error_line' => 'warning: unrecognized encoding name `YS-ASCII\'
+ 'error_line' => 'warning: unhandled encoding name `YS-ASCII\'
',
'file_name' => '',
'line_nr' => 5,
'macro' => '',
- 'text' => 'unrecognized encoding name `YS-ASCII\'',
+ 'text' => 'unhandled encoding name `YS-ASCII\'',
'type' => 'warning'
},
{
@@ -483,12 +483,12 @@ $result_errors{'invalid_documentencoding'} = [
'type' => 'warning'
},
{
- 'error_line' => 'warning: unrecognized encoding name `bad encoding name\'
+ 'error_line' => 'warning: unhandled encoding name `bad encoding name\'
',
'file_name' => '',
'line_nr' => 6,
'macro' => '',
- 'text' => 'unrecognized encoding name `bad encoding name\'',
+ 'text' => 'unhandled encoding name `bad encoding name\'',
'type' => 'warning'
},
{
@@ -501,48 +501,30 @@ $result_errors{'invalid_documentencoding'} = [
'type' => 'warning'
},
{
- 'error_line' => 'warning: unrecognized encoding name `1\'
+ 'error_line' => 'warning: unhandled encoding name `1\'
',
'file_name' => '',
'line_nr' => 7,
'macro' => '',
- 'text' => 'unrecognized encoding name `1\'',
+ 'text' => 'unhandled encoding name `1\'',
'type' => 'warning'
},
{
- 'error_line' => 'warning: encoding `%\' is not a canonical texinfo encoding
+ 'error_line' => 'warning: bad encoding name `%\'
',
'file_name' => '',
'line_nr' => 8,
'macro' => '',
- 'text' => 'encoding `%\' is not a canonical texinfo encoding',
+ 'text' => 'bad encoding name `%\'',
'type' => 'warning'
},
{
- 'error_line' => 'warning: unrecognized encoding name `%\'
-',
- 'file_name' => '',
- 'line_nr' => 8,
- 'macro' => '',
- 'text' => 'unrecognized encoding name `%\'',
- 'type' => 'warning'
- },
- {
- 'error_line' => 'warning: encoding `@\' is not a canonical texinfo encoding
-',
- 'file_name' => '',
- 'line_nr' => 9,
- 'macro' => '',
- 'text' => 'encoding `@\' is not a canonical texinfo encoding',
- 'type' => 'warning'
- },
- {
- 'error_line' => 'warning: unrecognized encoding name `@\'
+ 'error_line' => 'warning: bad encoding name `@\'
',
'file_name' => '',
'line_nr' => 9,
'macro' => '',
- 'text' => 'unrecognized encoding name `@\'',
+ 'text' => 'bad encoding name `@\'',
'type' => 'warning'
},
{
diff --git a/tp/t/results/misc_commands/many_lines.pl b/tp/t/results/misc_commands/many_lines.pl
index 21bada485a..7843e90132 100644
--- a/tp/t/results/misc_commands/many_lines.pl
+++ b/tp/t/results/misc_commands/many_lines.pl
@@ -1668,12 +1668,12 @@ $result_errors{'many_lines'} = [
'type' => 'warning'
},
{
- 'error_line' => 'warning: unrecognized encoding name `US-ascii encoding name\'
+ 'error_line' => 'warning: unhandled encoding name `US-ascii encoding name\'
',
'file_name' => '',
'line_nr' => 30,
'macro' => '',
- 'text' => 'unrecognized encoding name `US-ascii encoding name\'',
+ 'text' => 'unhandled encoding name `US-ascii encoding name\'',
'type' => 'warning'
},
{
diff --git a/tp/t/results/plaintext_tests/chinese_mixed_with_en.pl b/tp/t/results/plaintext_tests/chinese_mixed_with_en.pl
index 73787826c0..f2437623a6 100644
--- a/tp/t/results/plaintext_tests/chinese_mixed_with_en.pl
+++ b/tp/t/results/plaintext_tests/chinese_mixed_with_en.pl
@@ -131,7 +131,7 @@ $result_trees{'chinese_mixed_with_en'} = {
{
'contents' => [
{
- 'text' => 'Mixed in UTF-8'
+ 'text' => 'Mixed chinese and english'
}
],
'info' => {
@@ -418,7 +418,7 @@ $result_texis{'chinese_mixed_with_en'} = '\\input texinfo
@settitle chinese mixed with english
@node Top
-@top Mixed in UTF-8
+@top Mixed chinese and english
Example of english and chinese, chinese aligned or not.
@@ -456,8 +456,8 @@ overlap。
$result_texts{'chinese_mixed_with_en'} = '
-Mixed in UTF-8
-**************
+Mixed chinese and english
+*************************
Example of english and chinese, chinese aligned or not.
diff --git a/tp/t/results/plaintext_tests/chinese_mixed_with_en/res_plaintext/chinese_mixed_with_en.txt b/tp/t/results/plaintext_tests/chinese_mixed_with_en/res_plaintext/chinese_mixed_with_en.txt
index a8ef1ee6d3..8ea380acce 100644
--- a/tp/t/results/plaintext_tests/chinese_mixed_with_en/res_plaintext/chinese_mixed_with_en.txt
+++ b/tp/t/results/plaintext_tests/chinese_mixed_with_en/res_plaintext/chinese_mixed_with_en.txt
@@ -1,5 +1,5 @@
-Mixed in UTF-8
-**************
+Mixed chinese and english
+*************************
Example of english and chinese, chinese aligned or not.
diff --git a/tp/t/results/value/value_in_invalid_documentencoding.pl b/tp/t/results/value/value_in_invalid_documentencoding.pl
index b60ac27003..b66d64bf48 100644
--- a/tp/t/results/value/value_in_invalid_documentencoding.pl
+++ b/tp/t/results/value/value_in_invalid_documentencoding.pl
@@ -108,12 +108,12 @@ $result_errors{'value_in_invalid_documentencoding'} = [
'type' => 'warning'
},
{
- 'error_line' => 'warning: unrecognized encoding name `bad\'
+ 'error_line' => 'warning: unhandled encoding name `bad\'
',
'file_name' => '',
'line_nr' => 2,
'macro' => '',
- 'text' => 'unrecognized encoding name `bad\'',
+ 'text' => 'unhandled encoding name `bad\'',
'type' => 'warning'
}
];