bug-gnu-utils

MacOS X: Redirection performance problem


From: mailinglist
Subject: MacOS X: Redirection performance problem
Date: Sun, 21 Sep 2008 01:47:19 -0400
User-agent: Thunderbird 2.0.0.16 (Macintosh/20080707)

Hello,

I'm facing a performance problem under MacOS X when using gawk's output redirection: it's very slow.

I have to process CSV files (~5G lines each) that must be split into separate files (~300) based on a field value, so performance is critical. For now my old PIII outperforms my MacPro, so something clearly isn't right under MacOS. Here is what I'm using:

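BEGIN { FS = "," }              # set the separator once, before the first record is split
{
    row = $0
    var = $5                    # the field the output file is chosen by
    gsub(/"/, "", var)          # strip the quotes around the value
    path = dir "/" var ".csv"   # dir is set outside this block
    print row >> path
    close(path)                 # the file is closed again after every record
}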
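
One variation that may be worth timing is to close an output file only when the key changes, instead of after every record. The sketch below is an illustration under assumptions, not a tested fix for the slowdown: the sample records, the temporary directory, and the `prev` variable are all made up for the example, and the approach only saves work when records with the same key tend to be adjacent in the input.

```shell
#!/bin/sh
# Sketch only: the sample data and directory layout are hypothetical.
dir=$(mktemp -d)
printf '%s\n' \
    '1,a,b,c,"red",x' \
    '2,a,b,c,"red",y' \
    '3,a,b,c,"blue",z' > "$dir/sample.csv"
mkdir "$dir/out"

awk -v dir="$dir/out" '
BEGIN { FS = "," }
{
    var = $5
    gsub(/"/, "", var)
    path = dir "/" var ".csv"
    # Close the previous file only when the key changes,
    # so at most one output file is open at a time.
    if (path != prev) {
        if (prev != "") close(prev)
        prev = path
    }
    print $0 >> path
}' "$dir/sample.csv"

# Show how many records ended up in each output file.
wc -l "$dir/out/red.csv" "$dir/out/blue.csv"
```

Because the files are opened with `>>`, a key that reappears later in the input is appended to its existing file rather than overwriting it.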

Below are some simple test cases that compare the performance of my MacPro to that of an old IBM server. Any idea how the redirection could be optimized under MacOS? I'm not a programmer, but I can run tests if necessary, so please don't hesitate to ask; simply let me know exactly what you want me to do.

Best regards,

Ben.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-> MacOS 10.5.5 on a MacPro (Xeon 2.8GHz) with default bin:

$ awk -V
awk version 20040207

$ time awk '{ print > "/tmp/output.txt" }'  /tmp/input.txt
real    0m12.071s
user    0m5.171s
sys    0m6.171s

$ time awk '{ print }' < /tmp/input.txt  > /tmp/output.txt
real    0m3.648s
user    0m2.561s
sys    0m0.665s

-- MacOS 10.5.5 on a MacPro (Xeon 2.8GHz) with default bin using /dev/null

$ time awk '{ print > "/dev/null" }'  /tmp/input.txt
real    0m7.068s
user    0m4.752s
sys    0m2.314s

$ time awk '{ print }' < /tmp/input.txt  > /dev/null
real    0m2.602s
user    0m2.425s
sys    0m0.177s


$ wc -l /tmp/output.txt
2000000 /tmp/output.txt
$ wc -l /tmp/input.txt
2000000 /tmp/input.txt
$ ls -lh /tmp/output.txt
-rw-rw-r-- 1 abc abc 129M Sep 21 00:58 /tmp/output.txt


-> MacOS 10.5.5 on a MacPro (Xeon 2.8GHz) with gawk 3.1.6 (Built: ./configure --prefix=/usr/local/gawk-3.1.6):

$ /usr/local/gawk-3.1.6/bin/awk -W version
GNU Awk 3.1.6

$ time /usr/local/gawk-3.1.6/bin/awk '{ print > "/tmp/output.txt"}' /tmp/input.txt

real    0m6.657s
user    0m3.968s
sys    0m2.107s

$ time /usr/local/gawk-3.1.6/bin/awk '{ print }' /tmp/input.txt > /tmp/output.txt

real    0m6.475s
user    0m3.757s
sys    0m2.136s


-- MacOS 10.5.5 on a MacPro (Xeon 2.8GHz) with gawk 3.1.6 using /dev/null

$ time /usr/local/gawk-3.1.6/bin/awk '{ print > "/dev/null"}' /tmp/input.txt

real    0m5.341s
user    0m3.779s
sys    0m1.561s

$ time /usr/local/gawk-3.1.6/bin/awk '{ print }' /tmp/input.txt > /dev/null

real    0m5.192s
user    0m3.620s
sys    0m1.570s


Here is an example with gawk 3.1.6 on an old IBM address@hidden server running CentOS 5:

$ time /usr/src/gawk-3.1.6/gawk '{ print > "/tmp/output.txt" }' < /tmp/input.txt

real    0m3.334s
user    0m2.184s
sys    0m1.150s

$ time /usr/src/gawk-3.1.6/gawk '{ print }' < /tmp/input.txt > /tmp/output.txt

real    0m2.969s
user    0m1.727s
sys    0m1.243s

-> IBM address@hidden using /dev/null

$ time /usr/src/gawk-3.1.6/gawk '{ print > "/dev/null" }' /tmp/input.txt

real    0m2.614s
user    0m2.271s
sys    0m0.343s

$ time /usr/src/gawk-3.1.6/gawk '{ print }' /tmp/input.txt > /dev/null

real    0m2.520s
user    0m2.144s
sys    0m0.358s





