From: Eli Zaretskii
Subject: Re: odd behavior of length(), match() and field splitting with multi-byte characters
Date: Mon, 01 Jul 2024 15:20:50 +0300
> Date: Mon, 1 Jul 2024 05:56:02 -0500
> From: Ed Morton <mortoneccc@comcast.net>
>
> If we output 4 multi-byte characters as 10 bytes using:
>
> $ echo '61F09F948DF09F948E62' | xxd -r -p > file1
> $
>
> and run the following gawk command on it we get the output shown:
>
> $ LC_ALL=en_US.utf8 gawk '{print(length($0))}' file1
> 6
> $
>
> i.e. 6 instead of 4.
I cannot reproduce this with
GNU Awk 5.3.0, API 4.0, PMA Avon 8-g1, (GNU MPFR 4.1.0, GNU MP 6.2.1)
Copyright (C) 1989, 1991-2023 Free Software Foundation.
running on
Linux maintain0p.gnu.org 5.15.0-113-generic #123+11.0trisquel30 SMP Wed Jun
26 05:33:28 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
I get 4, as expected.
So I presume this is specific to the Cygwin port of Gawk, and suggest
taking this up with the maintainers of that port.
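
For readers who want to check the numbers in the quoted report, here is a minimal Python sketch that decodes the same 10 bytes and counts them three ways. The variable names are illustrative only. The UTF-16 count is included as one possible explanation for the reported 6, on the assumption that the affected build counts 16-bit wide-character units (Cygwin's wchar_t is 16 bits wide); nothing in this message confirms that this is the actual cause.

```python
# The same bytes that the quoted xxd command writes to file1.
data = bytes.fromhex("61F09F948DF09F948E62")

# Decode as UTF-8: 'a', U+1F50D, U+1F50E, 'b'.
text = data.decode("utf-8")

n_bytes = len(data)   # raw byte count
n_chars = len(text)   # Unicode code points, what length($0) should report in a UTF-8 locale

# Assumption: a build with a 16-bit wchar_t would see UTF-16 code units,
# where each character outside the BMP takes two units (a surrogate pair).
n_utf16_units = len(text.encode("utf-16-le")) // 2

print(n_bytes, n_chars, n_utf16_units)
```

The two non-ASCII characters each take 4 bytes in UTF-8 and 2 code units in UTF-16, so the three counts differ: 10 bytes, 4 characters, 6 UTF-16 units.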