bug-gawk

Re: [bug-gawk] feature request: iconv/recode dynamic extension


From: Eli Zaretskii
Subject: Re: [bug-gawk] feature request: iconv/recode dynamic extension
Date: Sat, 22 Dec 2018 12:08:07 +0200

> From: Wolfgang Laun <address@hidden>
> Date: Sat, 22 Dec 2018 09:17:07 +0100
> 
> The most general case of transliteration is handled by defining individual
> characters.

As I tried to explain, it isn't transliteration that is being sought
here, it's removal of combining accents and diacritics.

> You can add such a transliteration function ("dedia(str)") to
> any awk program ("foo.awk") using a simple generator like genf.awk:
>      gawk -- "`gawk -f genf.awk <<<"üöóäěščřžýáíéúů uooaescrzyaieuu
> foo.awk"`

Yes, of course.  But coming up with the list of such translations on
one's own is a huge job, and the Unicode database already has all that
figured out.  So my suggestion would be to import their tables, rather
than create them from scratch manually.
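
To illustrate what importing the Unicode tables could look like, here is a
minimal sketch in Python, whose standard unicodedata module ships the
Unicode Character Database.  It is not a gawk extension and not anything
that exists in gawk today; the function names strip_marks and awk_table
are made up for this example.  The idea is to decompose each character
(NFD) and drop the combining marks, so the mapping comes from the
database rather than from a hand-written list.

```python
import unicodedata


def strip_marks(text):
    """Remove combining accents/diacritics using Unicode decomposition data."""
    # NFD splits a precomposed character into base character + combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # combining() returns a non-zero class for combining marks; drop those.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def awk_table(chars, array_name="dedia_map"):
    """Emit awk assignments for the characters that actually change.

    The array name is a hypothetical choice for this sketch.
    """
    lines = []
    for ch in chars:
        plain = strip_marks(ch)
        if plain != ch:
            lines.append('%s["%s"] = "%s"' % (array_name, ch, plain))
    return "\n".join(lines)


if __name__ == "__main__":
    print(strip_marks("üöóäěščřžýáíéúů"))
    print(awk_table("üé"))
```

Note that this only handles characters that have a canonical
decomposition.  Letters such as "ø" or "ł" are not decomposable in the
Unicode data and pass through unchanged, so they would still need
separate handling.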

Of course, for one-off jobs that need to handle only a small set of
accented characters, what you suggest is sufficient.  My
interpretation of the question was that a solution for a more general
problem was sought.


