bug-gawk

Re: [bug-gawk] gawk 4.x series mmap attempts to alocates 32GB of memory


From: green fox
Subject: Re: [bug-gawk] gawk 4.x series mmap attempts to alocates 32GB of memory and fails when using printf("%c") supplied with large floating point value.
Date: Sat, 12 Jul 2014 15:41:32 +0900

On 7/11/14, Aharon Robbins <address@hidden> wrote:
>> Date: Fri, 11 Jul 2014 09:47:52 +0900
>> From: green fox <address@hidden>
> Do you have a real name?  Just wondering.

Yes, and my name is Gentaro (written in the closest sounding English
possible, but not quite right) if that matters.
See [http://pastebin.com/Ewv0BFij] for the issues _I_ PERSONALLY have.
But leaving that aside, let's get down to business...

>> A) Routines to address the issue for handling utf-8 string when -b is at
>> effect.
>>
>> B) Provide length(),substr(),index(),print() with extended capability to
>>    handle raw single byte data. (even when one is on a utf-8 system)
>>
>> The reason asking this, is when one is reading from a ( disk / server )
>> that does not match the locale character set, the current gawk setup
>> fails really badly.
>
> I'm aware of this. I don't have a good solution to this very thorny
> problem.
>
> I would actually prefer that instead of a patch, you write a loadable
> extension using the API defined for that purpose in the 4.1 release.
> You could then contribute it to the gawkextlib project.
>

_If_ the routines needed for code page / binary data handling were
_ABSOLUTELY_ALWAYS_ distributed _alongside_ gawk itself,
there would never have been such a problem in the first place.
Engrave it in gawk, and people will carry it.

There is also the problem that you cannot check for the existence of
a function from within the script. Just gotta hope and pray for mercy...

Looking at history, it seems absolutely impossible for such a thing
(distribution of said support libs) to happen. People strip them out
for no reason. I could write utf-8 / code-page handling code all day
long, but if the libs never get distributed alongside gawk,
then it is totally pointless.

Anyone heard of ICONV, NKF cp detection, TCS, the sorting algorithm
issue with Thai names, and such?
(I am not saying gawk should include said list of libs. That is wrong.)

It also means that the change must be the bare minimum required and as
non-invasive as possible (for it to get accepted and stay there for good).

So...

The _improvised_ approach was,

I) gawk gets the ability to read/dump data in the range 0x80-0xff
   I.A) Europe/America can accept that as a _useful_feature_to_handle_binary_data
      and allow said patch for inclusion.
      (less than 2kB compiled size increase, 10~200 lines code size increase.
        easily justifiable)

II) The rest of the world gets a _standard_always_there_assured_guaranteed_
  API to handle binary i/o,
  AND uses that to write locale code page <-> utf-8 conversion code
  _IN_AWK_SCRIPT_,
    so while there is a performance penalty, at least
    II.A) The job gets done (slow, but still; this is the most important bit).
    II.B) Said AWK code only gets included in the scripts that need it, so
    American/European admins can leave it out; they will have no reason
    to complain about _code_bloat_.
    II.C) The rest of the world can use it as a gawk function from a script.
    II.D) It runs as long as gawk runs.
    II.E) It has a better chance than trying to get
    iconv_open(), iconv(), iconv_close() into gawk.
    II.F) For really messed up situations, it's always good to have binary i/o
    so you can do the calculation yourself.
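
To make II) a bit more concrete, here is a minimal sketch of a code page ->
utf-8 conversion written in plain awk. It is only an illustration, not the
patch under discussion: ISO-8859-1 is assumed as the source code page because
its bytes map directly to code points, the function name latin1_to_utf8 is
made up for this example, and LC_ALL=C is used to force one-byte characters
(with gawk, -b is meant to give the same behaviour).

```shell
# Hypothetical helper, for illustration only: ISO-8859-1 -> UTF-8 in awk.
# LC_ALL=C makes awk treat every input byte as a single character.
latin1_to_utf8() {
    LC_ALL=C awk '
    BEGIN {
        # awk has no ord(), so build a byte -> number lookup table.
        for (i = 1; i < 256; i++)
            ord[sprintf("%c", i)] = i
    }
    {
        out = ""
        n = length($0)
        for (k = 1; k <= n; k++) {
            c = substr($0, k, 1)
            b = ord[c]
            if (b < 128)
                out = out c     # ASCII passes through unchanged
            else                # 0x80-0xff becomes a two-byte UTF-8 sequence
                out = out sprintf("%c%c", 192 + int(b / 64), 128 + b % 64)
        }
        print out
    }'
}
```

For example, `printf 'caf\351\n' | latin1_to_utf8` should turn the single
byte 0xe9 into the pair 0xc3 0xa9. Other code pages would need a lookup table
instead of the arithmetic, which is where the performance penalty comes in.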

The reason for doing it this way is as follows. Please bear with
me, it is long:
[http://pastebin.com/KeFtZP8i]
I wrote most of it off the top of my head, so there are other
workarounds people have used in the past that I missed / forgot to include...

But to sum it up, it just explains how to work when you are inside the tin
can, filled with worms.
I'm not saying to open the can, just asking for enough of an air pipe so the
man inside the can can breathe while locked in and submerged in worms.

GreenFox


