bug-gnu-libiconv



From: Mike Fulton
Subject: Re: [bug-gnu-libiconv] ability to tag generated files with the right ccsid on z/OS
Date: Mon, 3 Apr 2023 14:41:42 -0700



On Mon, Apr 3, 2023 at 2:28 PM Bruno Haible <bruno@clisp.org> wrote:
> Mike Fulton wrote:
>> One of the things I would like to do on z/OS is be able to exploit our file
>> tagging capabilities in the file system for iconv.
>>
>> For most cases, C programs can use 'auto-conversion' to convert files from
>> various EBCDIC SBCS code pages to ISO8859-1, but this only works if the
>> file is tagged with a CCSID, otherwise the file is treated as binary.
>
> You are saying that many files on that system have an out-of-band indication
> of the charset (like the xattrs on Linux or the resource fork on macOS
> <https://en.wikipedia.org/wiki/Resource_fork>)?
Right. We have chtag as a command as well as corresponding library functions
for setting and querying the CCSID of a stream.
In this example, I have 2 files - one ASCII and one EBCDIC:

FULTONM@ZOSCAN2B bash /tmp/tagged> ls -T
t ISO8859-1   T=on  FileA.txt
t IBM-1047    T=on  FileB.txt
FULTONM@ZOSCAN2B bash /tmp/tagged> cat FileA.txt FileB.txt
This is File A
This is File B 
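A program can read the same tag that 'ls -T' displays. The following is a minimal C sketch, assuming that on z/OS struct stat reports the tag in an st_tag member (a struct file_tag whose ft_ccsid field is 0 for an untagged file) and that the _OPEN_SYS_FILE_EXT feature-test macro exposes it; check both against the system's <sys/stat.h>. The helper name file_tag_ccsid() is hypothetical. On other platforms it just fails with ENOSYS.

```c
#ifdef __MVS__
#define _OPEN_SYS_FILE_EXT 1  /* assumed to expose the z/OS file-tag fields */
#endif
#include <errno.h>
#include <sys/stat.h>

/* Hypothetical helper: return the CCSID that PATH is tagged with
   (0 = untagged), or -1 on error with errno set. */
int file_tag_ccsid(const char *path)
{
#ifdef __MVS__
  struct stat st;
  if (stat(path, &st) != 0)
    return -1;                    /* errno set by stat() */
  return st.st_tag.ft_ccsid;      /* e.g. 819 for ISO8859-1, 1047 for IBM-1047 */
#else
  (void) path;
  errno = ENOSYS;                 /* file tags are a z/OS feature */
  return -1;
#endif
}
```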

The system has an environment variable you can set: _BPXAUTOCVT=ON
and it will do 'autoconversion' for you. There are a variety of environment variables
that I describe briefly in my blog: 
https://makingdeveloperslivesbetter.wordpress.com/2022/01/07/is-z-os-ascii-or-ebcdic-yes/


> What happens when a user does
>   $ cat file1 file2 > file3
> and file1 and file2 have different encodings specified? Does 'cat' do
> the conversion in its source code, or is the open() / fopen() call
> triggering the conversion?
The latter: the underlying C open/write code in cat is aware of the environment
variables. Not all C code is. One of the reasons we are porting the various
low-level tools is to improve this experience across the board for z/OS users,
so that it 'just works'.
 

> And has 'cat' been modified to add a charset indicator on file3
> upon close() / fclose()?
Yes.  
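Setting such a tag on an output file, as cat now does, can be done on z/OS with __fchattr(). The sketch below assumes the attrib_t layout documented for the z/OS XL C/C++ runtime (att_filetagchg, att_filetag.ft_ccsid, att_filetag.ft_txtflag) exposed via _OPEN_SYS_FILE_EXT; verify these names against the local headers. tag_fd_ccsid() is a hypothetical name, and this is not necessarily how cat or the libiconv patch does it. It would be called on the output descriptor before close().

```c
#ifdef __MVS__
#define _OPEN_SYS_FILE_EXT 1  /* exposes attrib_t and __fchattr() on z/OS */
#endif
#include <errno.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: tag the open file FD with CCSID and mark it as
   text (the "T=on" shown by ls -T), so auto-conversion does not treat it
   as binary. Returns 0 on success, -1 on failure with errno set. */
int tag_fd_ccsid(int fd, unsigned short ccsid)
{
#ifdef __MVS__
  attrib_t attr;
  memset(&attr, 0, sizeof attr);     /* change nothing else */
  attr.att_filetagchg = 1;           /* we are changing the file tag */
  attr.att_filetag.ft_ccsid = ccsid;
  attr.att_filetag.ft_txtflag = 1;   /* contents are text in that CCSID */
  return __fchattr(fd, &attr, sizeof attr);
#else
  (void) fd;
  (void) ccsid;
  errno = ENOSYS;                    /* file tags are a z/OS feature */
  return -1;
#endif
}
```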

>> I created a first 'proof of concept' patch for just IBM-1047 that works
>> fine, but only for 1047:
>> https://github.com/ZOSOpenTools/libiconvport/blob/main/tarball-patches/iconv.c.patch
>> It would need to be fleshed out to properly support the other CCSIDs.
>> I expect someone on z/OS has already done the mapping of iconv 'to' pages to
>> integral CCSIDs, but if not, I could provide that.
>>
>> Is a z/OS specific enhancement something that would be considered for
>> libiconv?
>
> Yes, that could be considered.
>
> The patch you showed looks reasonable.
>
> For upstreaming, there are three important guidelines:
>   - Do assign the copyright to the FSF as soon as it is of legally
>     relevant size:
>     https://www.gnu.org/prep/maintain/html_node/Legally-Significant.html
>   - Use the same coding style as the surrounding package.
>   - Test the changes before you submit them.
>
> If there is a lot of code for a specific platform to be integrated, I
> _might_ request that it be separated out into a .h file.
It's not very much, although depending on how I do the fix 'right' for the
encoding mapping, that part might belong in a separate file. That's your call.
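The encoding-to-CCSID mapping mentioned above could be a small lookup table. A minimal sketch: the CCSID numbers are the standard IBM assignments (37, 1047, 819, 1208), but the encoding-name spellings are illustrative, and a real table would need to cover every alias libiconv accepts. ccsid_for_encoding() is a hypothetical name; returning 0 is meant to mean "leave the output untagged".

```c
#include <stddef.h>
#include <strings.h>  /* strcasecmp (POSIX) */

struct ccsid_entry { const char *name; unsigned short ccsid; };

/* Illustrative subset; names are compared case-insensitively. */
static const struct ccsid_entry ccsid_table[] = {
  { "IBM-037",      37 },
  { "IBM-1047",   1047 },
  { "ISO8859-1",   819 },
  { "ISO-8859-1",  819 },
  { "UTF-8",      1208 },
};

/* Hypothetical helper: map an iconv 'to' encoding name to a CCSID,
   or return 0 if the name is not in the table. */
unsigned short ccsid_for_encoding(const char *name)
{
  size_t i;
  for (i = 0; i < sizeof ccsid_table / sizeof ccsid_table[0]; i++)
    if (strcasecmp(name, ccsid_table[i].name) == 0)
      return ccsid_table[i].ccsid;
  return 0;
}
```

With such a table, the 1047-only proof of concept generalizes to: look up the CCSID for the target encoding and, if it is nonzero, tag the output file with it.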

> Also, I might request adding a unit test, since I don't want to write
> a unit test for your code if, two years from now, someone reports a bug.
Will do. Is there a particular doc I should read that describes the process for
a unit test or should I just read the test harness code?

> Bruno



