From: Patrice Dumas
Subject: branch master updated: Simpler more consistent UTF-8 and unicode handling, stricter UTF-8 conversion
Date: Wed, 26 Jul 2023 17:22:06 -0400
This is an automated email from the git hooks/post-receive script.
pertusus pushed a commit to branch master
in repository texinfo.
The following commit(s) were added to refs/heads/master by this push:
new 6557161e7c Simpler more consistent UTF-8 and unicode handling, stricter UTF-8 conversion
6557161e7c is described below
commit 6557161e7c4ad6d1ad2e919ea022e3aab3f8ff8e
Author: Patrice Dumas <pertusus@free.fr>
AuthorDate: Wed Jul 26 23:20:19 2023 +0200
Simpler more consistent UTF-8 and unicode handling, stricter UTF-8 conversion
* tp/Texinfo/XS/parsetexi/end_line.c (end_line_misc_line): map utf8 to
utf-8 for input_encoding, to get the same output as the perl parser
with mime_name and also because it is better.
* tp/Texinfo/Common.pm (%encoding_name_conversion_map): map utf8 to
utf-8 to always use the same conversion in perl, and prefer the strict
conversion.
* tp/Texinfo/Common.pm (encode_file_name, count_bytes): do not use
utf-8 specific conversion, always use Encode encode and also use the
strict conversion.
* tp/Texinfo/Convert/Unicode.pm (_format_eight_bit_accents_stack),
tp/Texinfo/ParserNonXS.pm (_new_text_input, _next_text): use the
utf-8 encoding not utf8 for Encode encode strict conversion.
* tp/Texinfo/Convert/HTML.pm (converter_initialize),
tp/Texinfo/Convert/Unicode.pm: use charnames::vianame to obtain
characters based on a string representation of unicode codepoints, as
it is simple and this is what is described in the documentation.
* tp/Texinfo/Convert/Unicode.pm (unicode_point_decoded_in_encoding):
handle hex strings in the ascii range for 8bit encodings.
* tp/Makefile.tres, tp/t/08misc_commands.t (documentencoding_utf8):
new test with documentencoding utf8.
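The strictness distinction driving this change can be demonstrated directly with Encode. A minimal sketch, not part of the patch: the lax 'utf8' encoding accepts a UTF-16 surrogate (an invalid Unicode scalar value), while strict 'utf-8' with a croaking check refuses it.

```perl
use strict;
use Encode;

# chr(0xD800) is a UTF-16 surrogate, not a valid Unicode scalar value.
my $surrogate = chr(0xD800);

# The lax "utf8" encoding encodes it anyway (Perl's internal extended UTF-8).
my $lax_bytes = Encode::encode('utf8', $surrogate);

# The strict "utf-8" encoding croaks when asked to check its input.
my $strict_ok = eval {
  Encode::encode('utf-8', $surrogate, Encode::FB_CROAK);
  1;
};

printf "lax: %s, strict: %s\n",
       defined $lax_bytes ? 'accepted' : 'rejected',
       $strict_ok ? 'accepted' : 'rejected';
```

This is the behaviour difference described in perlunifaq under "What's the difference between UTF-8 and utf8?".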
---
ChangeLog | 31 ++++
tp/Makefile.tres | 1 +
tp/Texinfo/Common.pm | 38 ++---
tp/Texinfo/Convert/HTML.pm | 7 +-
tp/Texinfo/Convert/Unicode.pm | 58 +++----
tp/Texinfo/ParserNonXS.pm | 4 +-
tp/Texinfo/XS/parsetexi/end_line.c | 1 +
tp/t/08misc_commands.t | 8 +
.../results/misc_commands/documentencoding_utf8.pl | 166 +++++++++++++++++++++
9 files changed, 257 insertions(+), 57 deletions(-)
diff --git a/ChangeLog b/ChangeLog
index 3df4e615bc..cb5decfc66 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,34 @@
+2023-07-26 Patrice Dumas <pertusus@free.fr>
+
+ Simpler more consistent UTF-8 and unicode handling, stricter UTF-8 conversion
+
+ * tp/Texinfo/XS/parsetexi/end_line.c (end_line_misc_line): map utf8 to
+ utf-8 for input_encoding, to get the same output as the perl parser
+ with mime_name and also because it is better.
+
+ * tp/Texinfo/Common.pm (%encoding_name_conversion_map): map utf8 to
+ utf-8 to always use the same conversion in perl, and prefer the strict
+ conversion.
+
+ * tp/Texinfo/Common.pm (encode_file_name, count_bytes): do not use
+ utf-8 specific conversion, always use Encode encode and also use the
+ strict conversion.
+
+ * tp/Texinfo/Convert/Unicode.pm (_format_eight_bit_accents_stack),
+ tp/Texinfo/ParserNonXS.pm (_new_text_input, _next_text): use the
+ utf-8 encoding not utf8 for Encode encode strict conversion.
+
+ * tp/Texinfo/Convert/HTML.pm (converter_initialize),
+ tp/Texinfo/Convert/Unicode.pm: use charnames::vianame to obtain
+ characters based on a string representation of unicode codepoints, as
+ it is simple and this is what is described in the documentation.
+
+ * tp/Texinfo/Convert/Unicode.pm (unicode_point_decoded_in_encoding):
+ handle hex strings in the ascii range for 8bit encodings.
+
+ * tp/Makefile.tres, tp/t/08misc_commands.t (documentencoding_utf8):
+ new test with documentencoding utf8.
+
2023-07-26 Gavin Smith <gavinsmith0123@gmail.com>
* doc/texinfo.tex (\summarycontents): Set \extrasecnoskip to
diff --git a/tp/Makefile.tres b/tp/Makefile.tres
index 6efcfb59f4..d3ab682858 100644
--- a/tp/Makefile.tres
+++ b/tp/Makefile.tres
@@ -1437,6 +1437,7 @@ test_files_generated_list =
$(test_tap_files_generated_list) \
t/results/misc_commands/definfoenclose.pl \
t/results/misc_commands/definfoenclose_nestings.pl \
t/results/misc_commands/definfoenclose_with_empty_arg.pl \
+ t/results/misc_commands/documentencoding_utf8.pl \
t/results/misc_commands/documentencoding_zero.pl \
t/results/misc_commands/double_exdent.pl \
t/results/misc_commands/empty_center.pl \
diff --git a/tp/Texinfo/Common.pm b/tp/Texinfo/Common.pm
index 80bec30501..6727566083 100644
--- a/tp/Texinfo/Common.pm
+++ b/tp/Texinfo/Common.pm
@@ -32,7 +32,7 @@ use 5.008001;
# to determine the null file
use Config;
use File::Spec;
-# for find_encoding, resolve_alias and maybe utf8 related functions
+# for find_encoding, resolve_alias
use Encode;
# debugging
@@ -569,6 +569,16 @@ sub valid_tree_transformation ($)
our %encoding_name_conversion_map;
%encoding_name_conversion_map = (
'us-ascii' => 'iso-8859-1',
+ # The mapping to utf-8 is important for perl code, as it means using a strict
+ # conversion to utf-8 and not a lax conversion:
+ #
+ # https://perldoc.perl.org/perlunifaq#What's-the-difference-between-UTF-8-and-utf8?
+ # In more detail, we want to use utf-8 only for two different reasons
+ # 1) if input is malformed it is better to error out as soon as possible
+ # 2) we do not want to have different behaviour and hard to find bugs
+ # depending on whether the user used @documentencoding utf-8
+ # or @documentencoding utf8. There is a warning with utf8, but
+ # we want to be clear in any case.
+ 'utf8' => 'utf-8',
);
@@ -1318,12 +1328,11 @@ sub encode_file_name($$)
if (not defined($input_encoding));
if ($input_encoding eq 'utf-8' or $input_encoding eq 'utf-8-strict') {
- utf8::encode($file_name);
$encoding = 'utf-8';
} else {
- $file_name = Encode::encode($input_encoding, $file_name);
$encoding = $input_encoding;
}
+ $file_name = Encode::encode($encoding, $file_name);
return ($file_name, $encoding);
}
@@ -1752,23 +1761,7 @@ sub count_bytes($$;$)
$encoding = $self->get_conf('OUTPUT_PERL_ENCODING');
}
- if ($encoding eq 'utf-8'
- or $encoding eq 'utf-8-strict') {
- if (Encode::is_utf8($string)) {
- # Get the number of bytes in the underlying storage. This may
- # be slightly faster than calling Encode::encode_utf8.
- use bytes;
- return length($string);
-
- # Here's another way of doing it.
- #Encode::_utf8_off($string);
- #my $length = length($string);
- #Encode::_utf8_on($string);
- #return $length
- } else {
- return length(Encode::encode_utf8($string));
- }
- } elsif ($encoding and $encoding ne 'ascii') {
+ if ($encoding and $encoding ne 'ascii') {
if (!defined($last_encoding) or $last_encoding ne $encoding) {
# Look up and save encoding object for next time. This is
# slightly faster than calling Encode::encode.
@@ -1781,11 +1774,6 @@ sub count_bytes($$;$)
return length($Encode_encoding_object->encode($string));
} else {
return length($string);
- #my $length = length($string);
- #$string =~ s/\n/\\n/g;
- #$string =~ s/\f/\\f/g;
- #print STDERR "Count($length): $string\n";
- #return $length;
}
}
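The simplified count_bytes above reduces to "encode, then take the length". A small illustration, not from the patch, of the character/byte distinction it is counting, using the same find_encoding object caching the function relies on:

```perl
use strict;
use Encode;

my $string = "caf\x{E9}";   # "café": 4 characters

# Cache the encoding object once, as count_bytes does, then measure the
# byte length of the encoded string.
my $encoding_object = Encode::find_encoding('utf-8');
my $byte_count = length($encoding_object->encode($string));

# length() on the decoded string counts characters, not bytes.
printf "%d characters, %d bytes\n", length($string), $byte_count;
```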
diff --git a/tp/Texinfo/Convert/HTML.pm b/tp/Texinfo/Convert/HTML.pm
index 27d4bfd759..81e975fa37 100644
--- a/tp/Texinfo/Convert/HTML.pm
+++ b/tp/Texinfo/Convert/HTML.pm
@@ -31,7 +31,8 @@
package Texinfo::Convert::HTML;
-use 5.00405;
+# charnames::vianame is not documented in 5.6.0.
+use 5.008;
# See 'The "Unicode Bug"' under 'perlunicode' man page. This means
# that regular expressions will treat characters 128-255 in a Perl string
@@ -54,6 +55,7 @@ use File::Copy qw(copy);
use Storable;
use Encode qw(find_encoding decode encode);
+use charnames ();
use Texinfo::Commands;
use Texinfo::Common;
@@ -7765,7 +7767,8 @@ sub converter_initialize($)
if ($self->get_conf('OUTPUT_CHARACTERS')
and Texinfo::Convert::Unicode::unicode_point_decoded_in_encoding(
$output_encoding, $unicode_point)) {
- $special_characters_set{$special_character} = chr(hex($unicode_point));
+ $special_characters_set{$special_character}
+ = charnames::vianame("U+$unicode_point");
} elsif ($self->get_conf('USE_NUMERIC_ENTITY')) {
$special_characters_set{$special_character} =
'&#'.hex($unicode_point).';';
} else {
diff --git a/tp/Texinfo/Convert/Unicode.pm b/tp/Texinfo/Convert/Unicode.pm
index 2dfb7511ec..03d15f5433 100644
--- a/tp/Texinfo/Convert/Unicode.pm
+++ b/tp/Texinfo/Convert/Unicode.pm
@@ -19,10 +19,9 @@
package Texinfo::Convert::Unicode;
-# Seems to be the Perl version required for Encode:
-# http://cpansearch.perl.org/src/DANKOGAI/Encode-2.47/Encode/README.e2x
-# http://coding.derkeiler.com/Archive/Perl/comp.lang.perl.misc/2005-12/msg00833.html
-use 5.007_003;
+# Documentation of earlier releases for perluniintro is missing.
+# charnames::vianame is not documented in 5.6.0.
+use 5.008;
use strict;
# To check if there is no erroneous autovivification
@@ -33,6 +32,9 @@ use Carp qw(cluck);
use Encode;
use Unicode::Normalize;
use Unicode::EastAsianWidth;
+# To obtain unicode characters based on code points represented as
+# strings
+use charnames ();
use Texinfo::MiscXS;
@@ -563,19 +565,12 @@ our %extra_unicode_map = (
%unicode_map = (%unicode_map, %extra_unicode_map);
# set the %unicode_character_brace_no_arg_commands value to the character
-# corresponding to the hex value in %unicode_map.
+# corresponding to the textual hex value in %unicode_map.
our %unicode_character_brace_no_arg_commands;
foreach my $command (keys(%unicode_map)) {
if ($unicode_map{$command} ne '') {
- my $char_nr = hex($unicode_map{$command});
- if ($char_nr > 126 and $char_nr < 255) {
- # this is very strange, indeed. The reason lies certainly in the
- # magic backward compatibility support in Perl for 8bit encodings.
- $unicode_character_brace_no_arg_commands{$command} =
- Encode::decode("iso-8859-1", chr($char_nr));
- } else {
- $unicode_character_brace_no_arg_commands{$command} = chr($char_nr);
- }
+ $unicode_character_brace_no_arg_commands{$command}
+ = charnames::vianame("U+$unicode_map{$command}");
}
}
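For reference, a short sketch of the charnames interface used here, not part of the patch. Given a "U+" string, vianame() returns the ordinal code point, while string_vianame() returns the character itself, a distinction that matters whenever a character rather than a number is wanted:

```perl
use strict;
use charnames ();   # no imports needed for fully qualified calls

# Look up U+00E9 (LATIN SMALL LETTER E WITH ACUTE) both ways.
my $code_point = charnames::vianame('U+00E9');          # ordinal: 0xE9
my $character  = charnames::string_vianame('U+00E9');   # one-character string

printf "%04X %04X %d\n", $code_point, ord($character), length($character);
```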
@@ -697,6 +692,12 @@ foreach my $command (keys(%unicode_accented_letters)) {
}
}
+# Note that the values are not actually used anywhere; they are there
+# to mark unicode codepoints that exist in the encoding. It is important
+# to get them right, though, as the values are shown when debugging.
+# Also note that code points below A0, which correspond to the ascii
+# range, are not in the hash and therefore need to be handled separately
+# by the code using the hash.
my %unicode_to_eight_bit = (
'iso-8859-1' => {
'00A0' => 'A0',
@@ -1332,7 +1333,7 @@ sub unicode_text {
return $text;
}
-# return the 8 bit, if it exists, and the unicode codepoint
+# return the hexadecimal 8 bit string, if it exists, and the unicode codepoint
sub _eight_bit_and_unicode_point($$)
{
my $char = shift;
@@ -1428,36 +1429,36 @@ sub _format_eight_bit_accents_stack($$$$$;$)
my $command = 'TEXT';
$command = $partial_result->[1]->{'cmdname'} if ($partial_result->[1]);
if (defined($partial_result->[0])) {
- print STDERR " -> ".Encode::encode('utf8', $partial_result->[0])
+ print STDERR " -> ".Encode::encode('utf-8', $partial_result->[0])
."|$command\n";
} else {
- print STDERR " -> NO UTF8 |$command\n";
+ print STDERR " -> NO accented character |$command\n";
}
}
}
- # At this point we have the utf8 encoded results for the accent
+ # At this point we have the unicode character results for the accent
# commands stack, with all the intermediate results.
# For each one we'll check if it is possible to encode it in the
# current eight bit output encoding table and, if so set the result
# to the character.
- my $eight_bit = '';
+ my $prev_eight_bit = '';
while (@results_stack) {
my $char = $results_stack[0]->[0];
last if (!defined($char));
- my ($new_eight_bit, $new_codepoint)
+ my ($new_eight_bit, $codepoint)
= _eight_bit_and_unicode_point($char, $encoding);
if ($debug) {
my $command = 'TEXT';
$command = $results_stack[0]->[1]->{'cmdname'}
if ($results_stack[0]->[1]);
- my $new_eight_bit_txt = 'UNDEF';
- $new_eight_bit_txt = $new_eight_bit if (defined($new_eight_bit));
- print STDERR "" . Encode::encode('utf8', $char)
- . " ($command) new_codepoint: $new_codepoint 8bit: $new_eight_bit_txt old: $eight_bit\n";
+ print STDERR "" . Encode::encode('utf-8', $char) . " ($command) "
+ . "codepoint: $codepoint "
+ ."8bit: ". (defined($new_eight_bit) ? $new_eight_bit : 'UNDEF')
+ . " prev: $prev_eight_bit\n";
}
# no corresponding eight bit character found for a composed character
@@ -1472,7 +1473,7 @@ sub _format_eight_bit_accents_stack($$$$$;$)
# appending or prepending a character. For example this happens for
# @={@,{@~{n}}}, where @,{@~{n}} is expanded to a 2 character:
# n with a tilde, followed by a ,
- # In that case, the additional utf8 diacritic is appended, which
+ # In that case, the additional diacritic is appended, which
# means that it is composed with the , and leaves n with a tilde
# untouched.
# -> the diacritic is appended but the normal form doesn't lead
@@ -1480,11 +1481,11 @@ sub _format_eight_bit_accents_stack($$$$$;$)
# of the string is unchanged. This, for example, happens for
# @ubaraccent{a} since there is no composed accent with a and an
# underbar.
- last if ($new_eight_bit eq $eight_bit
+ last if ($new_eight_bit eq $prev_eight_bit
and !($results_stack[0]->[1]->{'cmdname'} eq 'dotless'
and $char eq 'i'));
$result = $results_stack[0]->[0];
- $eight_bit = $new_eight_bit;
+ $prev_eight_bit = $new_eight_bit;
shift @results_stack;
}
@@ -1545,7 +1546,8 @@ sub unicode_point_decoded_in_encoding($$) {
return 1 if ($encoding eq 'utf-8'
or ($unicode_to_eight_bit{$encoding}
- and $unicode_to_eight_bit{$encoding}->{$unicode_point}));
+ and ($unicode_to_eight_bit{$encoding}->{$unicode_point}
+ or hex($unicode_point) < 128)));
}
return 0;
}
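The new ascii-range fallback can be sketched in isolation. The table below is a hypothetical one-entry stand-in for %unicode_to_eight_bit, whose real tables only start at 00A0; anything below 0x80 is valid in every ASCII-superset 8-bit encoding, so it is accepted without a table entry.

```perl
use strict;

# Hypothetical excerpt standing in for %unicode_to_eight_bit{$encoding}.
my %eight_bit_table = ('00A0' => 'A0');

# Mirror of the updated test in unicode_point_decoded_in_encoding():
# accept a code point if the table maps it, or if it is plain ascii.
sub point_in_eight_bit_encoding {
  my ($unicode_point) = @_;
  return 1 if ($eight_bit_table{$unicode_point}
               or hex($unicode_point) < 128);
  return 0;
}

print point_in_eight_bit_encoding('0041'), "\n";   # ascii range
print point_in_eight_bit_encoding('00A0'), "\n";   # mapped by the table
print point_in_eight_bit_encoding('0100'), "\n";   # neither
```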
diff --git a/tp/Texinfo/ParserNonXS.pm b/tp/Texinfo/ParserNonXS.pm
index cbab8ae95a..8eb5fd1b91 100644
--- a/tp/Texinfo/ParserNonXS.pm
+++ b/tp/Texinfo/ParserNonXS.pm
@@ -678,7 +678,7 @@ sub _new_text_input($$)
my $texthandle = do { local *FH };
# In-memory scalar strings are considered a stream of bytes, so need
# to encode/decode.
- $text = Encode::encode("utf8", $text);
+ $text = Encode::encode('utf-8', $text);
# Could fail with error like
# Strings with code points over 0xFF may not be mapped into in-memory file handles
if (!open ($texthandle, '<', \$text)) {
@@ -2364,7 +2364,7 @@ sub _next_text($;$)
my $next_line = <$texthandle>;
if (defined($next_line)) {
# need to decode to characters
- $next_line = Encode::decode('utf8', $next_line);
+ $next_line = Encode::decode('utf-8', $next_line);
$input->{'input_source_info'}->{'line_nr'} += 1
unless ($input->{'input_source_info'}->{'macro'} ne ''
or defined($input->{'value_flag'}));
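The pattern used by _new_text_input and _next_text, encode to bytes before opening an in-memory handle, then decode each line after reading, can be sketched on its own; as in the patch, the strict 'utf-8' encoding is used on both sides.

```perl
use strict;
use Encode;

# In-memory file handles operate on bytes, so encode the text first.
my $text = Encode::encode('utf-8', "premi\x{E8}re ligne\n");
open(my $texthandle, '<', \$text)
  or die "cannot open in-memory handle: $!";

# Each line read back is a byte string and must be decoded to characters.
my $next_line = <$texthandle>;
$next_line = Encode::decode('utf-8', $next_line);
close($texthandle);
```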
diff --git a/tp/Texinfo/XS/parsetexi/end_line.c b/tp/Texinfo/XS/parsetexi/end_line.c
index 26f5cc5fb7..bd44533fe7 100644
--- a/tp/Texinfo/XS/parsetexi/end_line.c
+++ b/tp/Texinfo/XS/parsetexi/end_line.c
@@ -1397,6 +1397,7 @@ end_line_misc_line (ELEMENT *current)
*/
static struct encoding_map map[] = {
"utf-8", "utf-8",
+ "utf8", "utf-8",
"ascii", "us-ascii",
"shiftjis", "shift_jis",
"latin1", "iso-8859-1",
diff --git a/tp/t/08misc_commands.t b/tp/t/08misc_commands.t
index eb1a5e8d13..4bef76cfe4 100644
--- a/tp/t/08misc_commands.t
+++ b/tp/t/08misc_commands.t
@@ -228,6 +228,13 @@ my @converted_test_cases = (
@setfilename @ @verb{: name :}@
', {'full_document' => 1}],
+# this test seems somewhat pointless, but it is not, as in perl
+# utf8 may mean a lax handling of UTF-8. We want to avoid using
+# that lax handling of UTF-8; better to get errors early.
+['documentencoding_utf8',
+'@documentencoding utf8
+
+'],
['definfoenclose',
'
definfoenclose phoo,//,\\ @definfoenclose phoo,//,\\
@@ -597,6 +604,7 @@ in example
my %info_tests = (
'comment_space_command_on_line' => 1,
'setfilename' => 1,
+ 'documentencoding_utf8' => 1,
);
my %xml_tests = (
diff --git a/tp/t/results/misc_commands/documentencoding_utf8.pl b/tp/t/results/misc_commands/documentencoding_utf8.pl
new file mode 100644
index 0000000000..019f09de4a
--- /dev/null
+++ b/tp/t/results/misc_commands/documentencoding_utf8.pl
@@ -0,0 +1,166 @@
+use vars qw(%result_texis %result_texts %result_trees %result_errors
+ %result_indices %result_sectioning %result_nodes %result_menus
+ %result_floats %result_converted %result_converted_errors
+ %result_elements %result_directions_text %result_indices_sort_strings);
+
+use utf8;
+
+$result_trees{'documentencoding_utf8'} = {
+ 'contents' => [
+ {
+ 'contents' => [
+ {
+ 'args' => [
+ {
+ 'contents' => [
+ {
+ 'text' => 'utf8'
+ }
+ ],
+ 'info' => {
+ 'spaces_after_argument' => {
+ 'text' => '
+'
+ }
+ },
+ 'type' => 'line_arg'
+ }
+ ],
+ 'cmdname' => 'documentencoding',
+ 'extra' => {
+ 'input_encoding_name' => 'utf-8',
+ 'text_arg' => 'utf8'
+ },
+ 'info' => {
+ 'spaces_before_argument' => {
+ 'text' => ' '
+ }
+ },
+ 'source_info' => {
+ 'file_name' => '',
+ 'line_nr' => 1,
+ 'macro' => ''
+ }
+ },
+ {
+ 'text' => '
+',
+ 'type' => 'empty_line'
+ }
+ ],
+ 'type' => 'before_node_section'
+ }
+ ],
+ 'type' => 'document_root'
+};
+
+$result_texis{'documentencoding_utf8'} = '@documentencoding utf8
+
+';
+
+
+$result_texts{'documentencoding_utf8'} = '
+';
+
+$result_errors{'documentencoding_utf8'} = [
+ {
+ 'error_line' => 'warning: encoding `utf8\' is not a canonical texinfo encoding
+',
+ 'file_name' => '',
+ 'line_nr' => 1,
+ 'macro' => '',
+ 'text' => 'encoding `utf8\' is not a canonical texinfo encoding',
+ 'type' => 'warning'
+ }
+];
+
+
+$result_floats{'documentencoding_utf8'} = {};
+
+
+
+$result_converted{'plaintext'}->{'documentencoding_utf8'} = '';
+
+
+$result_converted{'html_text'}->{'documentencoding_utf8'} = '
+';
+
+
+$result_converted{'latex'}->{'documentencoding_utf8'} = '\\documentclass{book}
+\\usepackage{amsfonts}
+\\usepackage{amsmath}
+\\usepackage[gen]{eurosym}
+\\usepackage{textcomp}
+\\usepackage{graphicx}
+\\usepackage{etoolbox}
+\\usepackage{titleps}
+\\usepackage[utf8]{inputenc}
+\\usepackage[T1]{fontenc}
+\\usepackage{float}
+% use hidelinks to remove boxes around links to be similar to Texinfo TeX
+\\usepackage[hidelinks]{hyperref}
+
+\\makeatletter
+\\newcommand{\\Texinfosettitle}{No Title}%
+
+% redefine the \\mainmatter command such that it does not clear page
+% as if in double page
+\\renewcommand\\mainmatter{\\clearpage\\@mainmattertrue\\pagenumbering{arabic}}
+\\newenvironment{Texinfopreformatted}{%
+\par\GNUTobeylines\obeyspaces\frenchspacing\parskip=\z@\parindent=\z@}{}
+{\catcode`\^^M=13 \gdef\GNUTobeylines{\catcode`\^^M=13 \def^^M{\null\par}}}
+\\newenvironment{Texinfoindented}{\\begin{list}{}{}\\item\\relax}{\\end{list}}
+
+% used for substitutions in commands
+\\newcommand{\\Texinfoplaceholder}[1]{}
+
+\newpagestyle{single}{\sethead[\chaptername{} \thechapter{} \chaptertitle{}][][\thepage]
+ {\chaptername{} \thechapter{} \chaptertitle{}}{}{\thepage}}
+
+% allow line breaking at underscore
+\\let\\Texinfounderscore\\_
+\\renewcommand{\\_}{\\Texinfounderscore\\discretionary{}{}{}}
+\\renewcommand{\\includegraphics}[1]{\\fbox{FIG \\detokenize{#1}}}
+
+\\makeatother
+% set default for @setchapternewpage
+\\makeatletter
+\patchcmd{\chapter}{\if@openright\cleardoublepage\else\clearpage\fi}{\Texinfoplaceholder{setchapternewpage placeholder}\clearpage}{}{}
+\\makeatother
+\\pagestyle{single}%
+
+
+\\end{document}
+';
+
+
+$result_converted{'info'}->{'documentencoding_utf8'} = 'This is , produced from .
+
+
+
+Tag Table:
+
+End Tag Table
+
+
+Local Variables:
+coding: utf-8
+End:
+';
+
+$result_converted_errors{'info'}->{'documentencoding_utf8'} = [
+ {
+ 'error_line' => 'warning: document without nodes
+',
+ 'text' => 'document without nodes',
+ 'type' => 'warning'
+ }
+];
+
+
+
+$result_converted{'xml'}->{'documentencoding_utf8'} = '<documentencoding encoding="utf8" spaces=" ">utf8</documentencoding>
+
+';
+
+1;