bug-gnu-libiconv

From: Bruno Haible
Subject: Re: [bug-gnu-libiconv] difference in iconv for EBCDIC SBCS conversion from z/OS OS-provided iconv
Date: Sun, 02 Apr 2023 03:08:43 +0200

Hi,

Mike Fulton wrote:
> I have hit an issue where the conversion for EBCDIC SBCS conversion is not
> consistent between the two utilities, and am wondering if:
> - this has come up before

It has not been reported before.

> - there is interest in providing consistent behaviour

Only in case of blatant mistakes.

There are many variations of encoding tables. For the non-EBCDIC ones, I
created this archive:
  https://haible.de/bruno/charsets/conversion-tables/
These many variations pose problems mostly for East Asian charsets.

As the implementor of GNU libiconv, I am careful to choose mappings that
are
  1) as close to standards, de-facto standards, and glibc iconv mapping
     tables as possible,
  2) not going to cause big practical trouble.

Some differences are practically unimportant, for example this one:

$ ./table-diff ebcdic/glibc-iconv/IBM-273.TXT ebcdic/mine/IBM-273.TXT
***************
*** 188,190 ****
  0xBB  0x007C  #       VERTICAL LINE
! 0xBC  0x203E  #       OVERLINE
  0xBD  0x00A8  #       DIAERESIS
--- 188,190 ----
  0xBB  0x007C  #       VERTICAL LINE
! 0xBC  0x00AF  #       MACRON
  0xBD  0x00A8  #       DIAERESIS

This is unimportant, because OVERLINE and MACRON are interchangeable for
most users; you need to be a Unicode expert in order to understand the
difference.
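
For readers who want to reproduce this kind of comparison, here is a minimal
sketch in Python. It is not the table-diff tool used above; the function names
parse_table and diff_tables are hypothetical, and the two tables are only the
three-line excerpts quoted in the diff, not the full IBM-273 tables.

```python
import unicodedata


def parse_table(text):
    """Parse lines like '0xBC  0x203E  # NAME' into {byte: code point}."""
    table = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop the trailing comment
        if len(fields) >= 2:
            table[int(fields[0], 16)] = int(fields[1], 16)
    return table


def diff_tables(a, b):
    """Return (byte, code point in a, code point in b) where the tables disagree."""
    return [(k, a.get(k), b.get(k))
            for k in sorted(set(a) | set(b))
            if a.get(k) != b.get(k)]


def describe(cp):
    """Format a code point with its Unicode name, or mark it as unmapped."""
    if cp is None:
        return "(unmapped)"
    return f"U+{cp:04X} {unicodedata.name(chr(cp), '(unnamed)')}"


# Excerpts copied from the diff above (assumption: the rest is identical).
GLIBC = """\
0xBB  0x007C  #       VERTICAL LINE
0xBC  0x203E  #       OVERLINE
0xBD  0x00A8  #       DIAERESIS
"""
MINE = """\
0xBB  0x007C  #       VERTICAL LINE
0xBC  0x00AF  #       MACRON
0xBD  0x00A8  #       DIAERESIS
"""

for byte, u1, u2 in diff_tables(parse_table(GLIBC), parse_table(MINE)):
    print(f"0x{byte:02X}: {describe(u1)} vs {describe(u2)}")
```

On these excerpts the only entry reported is 0xBC, with OVERLINE on one side
and MACRON on the other.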

> I have comparisons for the various code pages, but if we look at the most
> common code page conversion, it is likely IBM-1047 to/from ISO8859-1.
> Here is what I see:

> The numbers are the output from 'cmp -l' so the output is read as:
>   <byte-number> <byte value file 1> <byte value file 2>

Your first column appears to be decimal, the second and third columns octal.
I prefer to use hexadecimal throughout.
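
A small sketch of that conversion in Python, using the three 'cmp -l' lines
quoted in this message. The helper name cmp_l_to_hex is hypothetical. Reading
the byte number as the input byte value assumes the test file lists the bytes
in order starting from 0x01, which is how the lines are interpreted in this
reply.

```python
def cmp_l_to_hex(line):
    """Convert one 'cmp -l' line (decimal byte number, two octal byte
    values) to a tuple of three hexadecimal strings."""
    number, first, second = line.split()
    return (f"0x{int(number):02X}",
            f"0x{int(first, 8):02X}",
            f"0x{int(second, 8):02X}")


for line in [" 21 205  12", " 37  12 205", "133  25  45"]:
    print(*cmp_l_to_hex(line))
```

In hexadecimal, the first two lines concern bytes 0x15 and 0x25 with the
values 0x85 and 0x0A exchanged, and the third line concerns byte 0x85 with
the values 0x15 versus 0x25.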

> Convert from IBM-1047 to ISO8859-1: compare open source iconv to IBM z/OS
> iconv
>  21 205  12
>  37  12 205

Does EBCDIC 0x15 map to U+0085 or to U+000A ? The table entry for NL in
https://en.wikipedia.org/wiki/EBCDIC#Definitions_of_non-ASCII_EBCDIC_controls
is not conclusive:
  "Line break. Default mapping (0085) matches ISO/IEC 6429's NEL.
   Mappings sometimes swapped with Line Feed (EBCDIC 0x25) in accordance
   with UNIX line breaking convention."

I made a guess as to which choice will create the fewest interoperability
problems. It is the same choice that glibc makes, by the way.

> Convert from ISO8859-1 to IBM-1047: compare open source iconv to IBM z/OS
> iconv
> 133  25  45

That's the mapping of U+0085: 0x15 or 0x25? It's just the inverse facet
of the difference discussed above.
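
To make the "inverse facet" concrete, here is a minimal sketch in Python. The
two dictionaries are hypothetical two-entry excerpts, not real conversion
tables: VARIANT_A corresponds to the first value column of the quoted
'cmp -l' output, VARIANT_B to the second.

```python
# Hypothetical excerpts: EBCDIC byte -> Unicode code point.
VARIANT_A = {0x15: 0x0085, 0x25: 0x000A}  # 0x15 -> NEL, 0x25 -> LINE FEED
VARIANT_B = {0x15: 0x000A, 0x25: 0x0085}  # the two mappings swapped


def invert(table):
    """Build the reverse mapping: Unicode code point -> EBCDIC byte."""
    return {cp: byte for byte, cp in table.items()}


for name, table in (("A", VARIANT_A), ("B", VARIANT_B)):
    encode = invert(table)
    print(name,
          f"U+0085 -> 0x{encode[0x0085]:02X}",
          f"U+000A -> 0x{encode[0x000A]:02X}")
```

Each variant round-trips cleanly on its own. The trouble only appears when
data decoded with one variant is encoded with the other: bytes 0x15 and 0x25
then come out exchanged.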

> z/OS is not the only platform that has EBCDIC files. IBM i also uses EBCDIC
> as does z/VM, z/VSE.

Good to know. But as I said above, I'm interested in the differences between
them only if they have significant practical relevance.

Bruno





