Re: script to convert separators for CSV processing
From: Ed Morton
Subject: Re: script to convert separators for CSV processing
Date: Sat, 11 Nov 2023 10:57:18 -0600
User-agent: Mozilla Thunderbird
Looks like the script got crushed onto 1 line in transit so trying again:
----------
$ cat changeSeps.awk
BEGIN {
    FS = OFS = "\""
    if ( (old == "") || (new == "") ) {
        printf "Error: old=\047%s\047 and/or new=\047%s\047 separator string missing.\n", old, new >"/dev/stderr"
        printf "Usage: awk -v old=\047;\047 -v new=\047,\047 -f changeSeps.awk infile [> outfile]\n" >"/dev/stderr"
        err = 1
        exit
    }
    sanitized_old = old
    sanitized_new = new
    # Ensure all regexp and replacement chars get treated as literal
    gsub(/[^^\\]/,"[&]",sanitized_old)   # regexp: char other than ^ or \ -> [char]
    gsub(/\\/,"\\\\",sanitized_old)      # regexp: \ -> \\
    gsub(/\^/,"\\^",sanitized_old)       # regexp: ^ -> \^
    gsub(/[&]/,"\\\\&",sanitized_new)    # replacement: & -> \\&
}
{
    $0 = prev ors $0
    prev = $0
    ors = ORS
}
NF%2 {
    for ( i=1; i<=NF; i+=2 ) {
        cnt += gsub(sanitized_old,sanitized_new,$i)
    }
    print
    prev = ors = ""
}
END {
    if ( !err ) {
        printf "Converted %d \047%s\047 field separators to \047%s\047s.\n", cnt+0, old, new >"/dev/stderr"
    }
    exit err
}
---------
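In case it helps anyone reading the sanitizing step in the BEGIN block, here is a small sketch that runs just those three gsub() calls on a separator string and prints the resulting regexp. The function name `sanitize_old` is made up for this illustration and is not part of changeSeps.awk. Note that `-v` assignments process escape sequences, so `'\t'` arrives in awk as a literal tab.

```shell
# Hypothetical helper (not part of changeSeps.awk): print the regexp the
# script would build from a given old-separator string.
sanitize_old() {
    awk -v old="$1" 'BEGIN {
        s = old
        gsub(/[^^\\]/, "[&]", s)   # any char other than ^ or \ -> [char]
        gsub(/\\/, "\\\\", s)      # \ -> \\
        gsub(/\^/, "\\^", s)       # ^ -> \^
        print s
    }'
}

sanitize_old '|'    # prints [|]
sanitize_old '.'    # prints [.]
```

Wrapping each character in a bracket expression is what stops regexp metachars such as `|` or `.` in the separator from being interpreted; `^` and `\` are escaped with a backslash instead because they cannot be handled that way.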
On 11/11/2023 10:54 AM, Ed Morton wrote:
The new `--csv` processing mode is great, but it only handles commas
as the separator, so I expect many people will want to know how to
convert their TSV, `;`-separated, `|`-separated, etc. files to/from
`,`-separated so they can use the new functionality. Here's a
suggested script that you could include in the documentation so that
people don't each have to figure it out for themselves. It converts
string-separated input into CSV (or other string-separated) output
without reading all of the input into memory at once, for input files
that otherwise follow CSV quoting/separator rules:
-------
[changeSeps.awk was crushed onto one line in transit; see the re-sent copy above]
-------
You'd call it as:
-----
awk -v old='<old separator>' -v new='<new separator>' -f changeSeps.awk file
-----
e.g. to convert TSV to CSV:
-----
$ printf '"foo\tbar"\tetc\n'
"foo bar" etc
$ printf '"foo\tbar"\tetc\n' | awk -v old='\t' -v new=',' -f changeSeps.awk
"foo bar",etc
Converted 1 ' ' field separators to ','s.
-----
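The same approach covers quoted fields that contain newlines. The sketch below is a cut-down version of the script's main loop with the separators hard-coded (`;` to `,`) so it can be pasted and run without the script file; the function name `join_and_convert` is made up for this example. With `"` as FS, an odd NF means the quotes seen so far are balanced, so the record is complete, and the odd-numbered fields are the parts outside quotes.

```shell
# Cut-down illustration of the record-joining logic, ';' -> ',' only.
join_and_convert() {
    awk '
    BEGIN { FS = OFS = "\"" }
    {
        $0 = prev ors $0          # prepend any pending partial record
        prev = $0
        ors = ORS
    }
    NF%2 {                        # odd field count: quotes are balanced
        for ( i=1; i<=NF; i+=2 ) {
            gsub(/;/, ",", $i)    # odd-numbered fields lie outside quotes
        }
        print
        prev = ors = ""
    }'
}

printf 'a;"b;c\nd";e\nf;g\n' | join_and_convert
# Output:
# a,"b;c
# d",e
# f,g
```

Only the lines of the current incomplete record are held in `prev`, which is why the input never has to be read into memory as a whole.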
Regards,
Ed.