bug-gnu-libiconv


From: Mike Fulton
Subject: Re: [bug-gnu-libiconv] difference in iconv for EBCDIC SBCS conversion from z/OS OS-provided iconv
Date: Sun, 2 Apr 2023 12:06:33 -0700



On Sun, Apr 2, 2023 at 3:11 AM Bruno Haible <bruno@clisp.org> wrote:
[Re-adding the mailing list in CC. Please keep the mailing list in CC.]
Will do 

Mike Fulton wrote:
> > https://en.wikipedia.org/wiki/EBCDIC#Definitions_of_non-ASCII_EBCDIC_controls
> > is not conclusive:
> >   "Line break. Default mapping (0085) matches ISO/IEC 6429's NEL.
> >    Mappings sometimes swapped with Line Feed (EBCDIC 0x25) in accordance
> >    with UNIX line breaking convention."
> >
> I agree it's confusing.
> Here's a reasonably good IBM doc on what the EBCDIC code pages want for NL:
> https://www.ibm.com/docs/en/zos/2.1.0?topic=server-different-end-line-characters-in-text-files
>
> We definitely need EBCDIC 0x15 to map to ASCII 0x0A. Here is the 2 line
> file:
> Hello
> World
>
> dumped out in hex:
>
> FULTONM@ZOSCAN2B bash ~> hexdump hw.ibm1047.txt
> 000000 c8859393 9615e696 99938415
> 00000c
>
> FULTONM@ZOSCAN2B bash ~> hexdump hw.iso8859-1.txt
> 000000 48656c6c 6f0a576f 726c640a
> 00000c
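As a cross-check of the two dumps above, here is a minimal Python sketch that translates the IBM-1047 bytes to ISO 8859-1 with 0x15 mapped to 0x0A. The table is hand-written and covers only the bytes in this example; the names `IBM1047_SUBSET` and `to_latin1` are made up for illustration, and this is not the libiconv table.

```python
# Hand-written subset of IBM-1047 -> ISO 8859-1, covering only the bytes
# that occur in hw.ibm1047.txt. Illustration only, not the libiconv table.
IBM1047_SUBSET = {
    0xC8: 0x48,  # H
    0x85: 0x65,  # e
    0x93: 0x6C,  # l
    0x96: 0x6F,  # o
    0xE6: 0x57,  # W
    0x99: 0x72,  # r
    0x84: 0x64,  # d
    0x15: 0x0A,  # EBCDIC NL -> LF, the mapping requested in the message above
}


def to_latin1(data: bytes) -> bytes:
    """Translate byte by byte; raises KeyError for bytes outside the subset."""
    return bytes(IBM1047_SUBSET[b] for b in data)


# Contents of hw.ibm1047.txt, copied from the hexdump above.
ebcdic = bytes.fromhex("c88593939615e69699938415")
print(to_latin1(ebcdic).hex())
```

With this mapping the result is byte-for-byte the same as hw.iso8859-1.txt.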

If it was so simple, that EBCDIC 0x15 always needs to map to U+000A, then

  * Why does this documentation — also from z/OS 2.1.0 — say that EBCDIC 0x25
    maps to U+000A ?
    https://www.ibm.com/docs/en/zos/2.1.0?topic=acif-fileformat
I will reach out to the AFP team to understand where they are seeing 0x25 come from.
Stream input on z/OS should typically come from the UNIX System Services environment,
and I don't know why that wouldn't always be 0x15. Perhaps these are EBCDIC files from
another EBCDIC OS?


  * Why does the glibc/localedata/charmaps/IBM1047, which has a
    "source: IBM Character Data Representation Architecture" annotation,
    map 0x25 to U+000A, since its initial revision in 1997?
I don't know of glibc being in use on z/OS. That is something we should investigate,
but at present we use the underlying z/OS C services, and where we have gaps we
have been using an open source library called zoslib: https://github.com/ibmruntimes/zoslib
which is used by various open source language ports such as Python, Node.js, and Go.
I'm not sure whether clang uses it.

  * Why does Wikipedia say "sometimes swapped"?
    https://en.wikipedia.org/wiki/EBCDIC#Definitions_of_non-ASCII_EBCDIC_controls
I don't know - thank you for the link. This is a great reference.  
 
  * Why was the bug report that wanted glibc's IBM1047 mapping table changed
    closed as "NOT A BUG"?
    https://bugzilla.redhat.com/show_bug.cgi?id=170072
Based on Eric and Jakub's discussion, I agree with Jakub that we unfortunately seem
to have two incompatible 'standards' here, and it would be good for the user community
if we supported both.


  * Why does PCRE have two configure options --enable-ebcdic and
    --enable-ebcdic-nl25 ?
    https://opensource.apple.com/source/pcre/pcre-12/pcre/configure.ac.auto.html
I wonder how we determine who builds with --enable-ebcdic-nl25 ? 


  * Why did msbrown write "Note that "line feed" is 0x25 in EBCDIC/IBM-1047, but
    the C language '\n' is 0x15 (EBCDIC "new line")." ?
    https://www.austingroupbugs.net/view.php?id=251
This is the crux of the situation. A huge number of tools are either written in C/C++ or
the tools are built with other tools written in C/C++ and the '\n' in all the code is 0x15.
So choosing a different value for a file means that none of those tools work. In particular,
if you iconv a file and try to use 'less' it won't work because it won't 'see' the newlines. 
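To illustrate the point, here is a small Python sketch of what a '\n'-based tool sees. It assumes, as stated above, that '\n' is 0x15 in C code built for z/OS; the function and variable names are made up for this example.

```python
# Value of '\n' in C code compiled for EBCDIC on z/OS, per the note above.
C_NEWLINE_ZOS = 0x15


def count_line_ends(data: bytes, newline: int = C_NEWLINE_ZOS) -> int:
    """Count line terminators the way a tool that looks for '\\n' would."""
    return data.count(bytes([newline]))


# "Hello" / "World" in IBM-1047 with 0x15 line ends (from the hexdump earlier).
hello_nl15 = bytes.fromhex("c88593939615e69699938415")
# The same text as a converter that emits 0x25 for LF would write it.
hello_nl25 = hello_nl15.replace(b"\x15", b"\x25")

print(count_line_ends(hello_nl15))
print(count_line_ends(hello_nl25))
```

The first file has two line ends from the tool's point of view; the second has none, so it looks like one long line.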


I'll keep doing what glibc does, in this respect.

But you can certainly, in the libiconv version that you build for z/OS, map
0x15 and 0x25 the other way around, and then get feedback from your users
about it. Just make sure, please, that you modify the "iconv --version"
output, so that when users report a bug, it's clear whether it's the
original libiconv or a modified one.
Would it be possible to have either a compile-time configuration option like PCRE
or an environment variable that we could have in the official code so that we don't
end up with 2 copies of the code? I really don't want z/OS to be 'different' and would
like things to 'just work' for customers. 

In this situation of contradicting statements, a better solution for you
is maybe to make the behaviour dependent on an environment variable. I'm
being told that some environment variables are needed on z/OS anyway, see
https://lists.gnu.org/archive/html/bug-gnulib/2019-11/msg00036.html .
Therefore another environment variable should be acceptable to your users.
Yes, we unfortunately have environment variables. They make things more
complicated for people, but I think you've clearly articulated that we have competing
'standards' in play here. For our z/OS UNIX System Services customers, I would think
we could 'compile in' this value (like PCRE), which would be my preference over an
environment variable.
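The two options discussed here, a compiled-in default and an environment variable override, could be combined roughly as in the following Python sketch. Everything in it is hypothetical: the variable name `EXAMPLE_EBCDIC_NL`, the values "nl15" and "nl25", and the function name are invented for illustration and are not part of libiconv or PCRE.

```python
import os

# Hypothetical names, for illustration only.
ENV_VAR = "EXAMPLE_EBCDIC_NL"
COMPILED_IN_DEFAULT = "nl25"  # stands in for a build-time configure choice


def newline_bytes(environ=os.environ):
    """Return (EBCDIC byte mapped to U+000A, EBCDIC byte mapped to U+0085)."""
    choice = environ.get(ENV_VAR, COMPILED_IN_DEFAULT)
    if choice == "nl15":
        # z/OS UNIX convention: 0x15 is the newline, 0x25 takes the other slot.
        return 0x15, 0x25
    # glibc-style table: 0x25 is LF and 0x15 is NEL.
    return 0x25, 0x15


lf_byte, nel_byte = newline_bytes()
print(hex(lf_byte), hex(nel_byte))
```

A z/OS build would set the compiled-in default to "nl15", so users would get the 0x15 behaviour without setting anything.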

Bruno



