bug-gawk



From: Eli Zaretskii
Subject: Re: odd behavior of length(), match() and field splitting with multi-byte characters
Date: Mon, 01 Jul 2024 15:20:50 +0300

> Date: Mon, 1 Jul 2024 05:56:02 -0500
> From: Ed Morton <mortoneccc@comcast.net>
> 
>          If we output 4 multi-byte characters as 10 bytes using:
> 
>              $ echo '61F09F948DF09F948E62' | xxd -r -p > file1
>              $
> 
>          and run the following gawk command on it we get the output shown:
> 
>              $ LC_ALL=en_US.utf8 gawk '{print(length($0))}' file1
>              6
>              $
> 
>          i.e. 6 instead of 4.

I cannot reproduce this with

  GNU Awk 5.3.0, API 4.0, PMA Avon 8-g1, (GNU MPFR 4.1.0, GNU MP 6.2.1)
  Copyright (C) 1989, 1991-2023 Free Software Foundation.

running on

  Linux maintain0p.gnu.org 5.15.0-113-generic #123+11.0trisquel30 SMP Wed Jun 26 05:33:28 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

I get 4, as expected.

So I presume this is specific to the Cygwin port of Gawk, and suggest
taking this up with the maintainers of that port.
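
For reference, a rough sketch of where a count of 6 could come from.
It assumes, without this having been verified on Cygwin, that the
affected build counts 16-bit wide-character units rather than whole
characters. The two non-ASCII characters in the sample lie outside the
Basic Multilingual Plane, so each takes two UTF-16 units. The helper
name sample is made up for illustration, and iconv is assumed to be
installed:

```shell
# The same 10 bytes as file1, written with POSIX octal escapes
# (hex 61 F0 9F 94 8D F0 9F 94 8E 62). "sample" is a hypothetical helper.
sample() { printf '\141\360\237\224\215\360\237\224\216\142'; }

# Raw byte count of the UTF-8 input.
bytes=$(( $(sample | wc -c) ))

# UTF-16 code units: 2 bytes each, no BOM with the explicit LE variant.
utf16_units=$(( $(sample | iconv -f UTF-8 -t UTF-16LE | wc -c) / 2 ))

# Unicode code points: UTF-32 uses exactly 4 bytes per character.
code_points=$(( $(sample | iconv -f UTF-8 -t UTF-32LE | wc -c) / 4 ))

echo "bytes=$bytes utf16_units=$utf16_units code_points=$code_points"
```

This prints bytes=10 utf16_units=6 code_points=4. The 4 matches the
result above on GNU/Linux, and the 6 matches the reported output, which
is consistent with the assumption but does not prove it.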


