MacOS X: Redirection performance problem
From: mailinglist
Subject: MacOS X: Redirection performance problem
Date: Sun, 21 Sep 2008 01:47:19 -0400
User-agent: Thunderbird 2.0.0.16 (Macintosh/20080707)
Hello,
I'm facing a performance problem under MacOS X when using gawk's output
redirection: it's very slow.
I have to process CSV files (~5G lines each) that must be split into
separate files (~300) based on a field value, so performance is
critical. For now my old PIII outperforms my MacPro, so something
clearly isn't right under MacOS. Here is what I'm using:
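BEGIN { FS="," }          # set the separator before the first record is read
{
    row=$0
    var=$5
    gsub(/\"/,"",var)     # strip the quotes around the field value
    path=dir"/"var".csv"
    print row >> path
    close(path)
}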
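One variation that may be worth timing: the script above calls close(path) after every record, so each output file is reopened for every input line. The sketch below leaves the files open until awk exits. It assumes the awk in use can keep roughly 300 output files open at once, which should be checked before relying on it. The sample rows, the temporary directory and the variable names are made up for illustration.

```shell
#!/bin/sh
# Sketch only: the sample rows and the temporary directory are assumptions.
dir=$(mktemp -d)
mkdir "$dir/out"
printf '%s\n' \
  '1,a,b,c,"red",x' \
  '2,a,b,c,"blue",x' \
  '3,a,b,c,"red",x' > "$dir/input.csv"

awk -v dir="$dir/out" '
BEGIN { FS = "," }            # separator is set before the first record is read
{
    var = $5
    gsub(/"/, "", var)        # strip the quotes around the field value
    path = dir "/" var ".csv"
    print $0 >> path          # no close(): the file stays open between records
}' "$dir/input.csv"           # awk closes all open files when it exits
```

With the three sample rows this leaves two files in "$dir/out": red.csv with rows 1 and 3, and blue.csv with row 2.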
Below are some simple test cases that compare the performance of my MacPro
to an old IBM server. Any idea how the redirection could be optimized
under MacOS? I'm not a programmer, but I can run tests if necessary,
so please don't hesitate to ask. Simply let me know exactly what you
want me to do.
Best regards,
Ben.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-> MacOS 10.5.5 on a MacPro (Xeon 2.8Ghz) with default bin:
$ awk -V
awk version 20040207
$ time awk '{ print > "/tmp/output.txt" }' /tmp/input.txt
real 0m12.071s
user 0m5.171s
sys 0m6.171s
$ time awk '{ print }' < /tmp/input.txt > /tmp/output.txt
real 0m3.648s
user 0m2.561s
sys 0m0.665s
-- MacOS 10.5.5 on a MacPro (Xeon 2.8Ghz) with default bin using /dev/null
$ time awk '{ print > "/dev/null" }' /tmp/input.txt
real 0m7.068s
user 0m4.752s
sys 0m2.314s
$ time awk '{ print }' < /tmp/input.txt > /dev/null
real 0m2.602s
user 0m2.425s
sys 0m0.177s
$ wc -l /tmp/output.txt
2000000 /tmp/output.txt
$ wc -l /tmp/input.txt
2000000 /tmp/input.txt
$ ls -lh /tmp/output.txt
-rw-rw-r-- 1 abc abc 129M Sep 21 00:58 /tmp/output.txt
-> MacOS 10.5.5 on a MacPro (Xeon 2.8Ghz) with gawk 3.1.6 (built with ./configure --prefix=/usr/local/gawk-3.1.6):
$ /usr/local/gawk-3.1.6/bin/awk -W version
GNU Awk 3.1.6
$ time /usr/local/gawk-3.1.6/bin/awk '{ print > "/tmp/output.txt"}' /tmp/input.txt
real 0m6.657s
user 0m3.968s
sys 0m2.107s
$ time /usr/local/gawk-3.1.6/bin/awk '{ print }' /tmp/input.txt > /tmp/output.txt
real 0m6.475s
user 0m3.757s
sys 0m2.136s
-- MacOS 10.5.5 on a MacPro (Xeon 2.8Ghz) with gawk 3.1.6 using /dev/null
$ time /usr/local/gawk-3.1.6/bin/awk '{ print > "/dev/null"}' /tmp/input.txt
real 0m5.341s
user 0m3.779s
sys 0m1.561s
$ time /usr/local/gawk-3.1.6/bin/awk '{ print }' /tmp/input.txt > /dev/null
real 0m5.192s
user 0m3.620s
sys 0m1.570s
Here is an example with gawk 3.1.6 on an old IBM address@hidden server
running CentOS 5:
$ time /usr/src/gawk-3.1.6/gawk '{ print > "/tmp/output.txt" }' < /tmp/input.txt
real 0m3.334s
user 0m2.184s
sys 0m1.150s
$ time /usr/src/gawk-3.1.6/gawk '{ print }' < /tmp/input.txt > /tmp/output.txt
real 0m2.969s
user 0m1.727s
sys 0m1.243s
-> IBM address@hidden using /dev/null
$ time /usr/src/gawk-3.1.6/gawk '{ print > "/dev/null" }' /tmp/input.txt
real 0m2.614s
user 0m2.271s
sys 0m0.343s
$ time /usr/src/gawk-3.1.6/gawk '{ print }' /tmp/input.txt > /dev/null
real 0m2.520s
user 0m2.144s
sys 0m0.358s
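The comparisons above could be rerun on another machine with a small script along these lines. It is only a sketch: the row layout of the generated input, the N variable and the temporary paths are assumptions, since the contents of /tmp/input.txt are not shown. N defaults to 200000 to keep the run short, while the runs above used 2000000 lines. Timing uses whole seconds from date, which is coarse; the shell's time command gives finer figures where available.

```shell
#!/bin/sh
# Sketch only: the row layout, N and the temporary paths are assumptions.
N=${N:-200000}
tmp=$(mktemp -d)

# Generate N CSV rows with a quoted fifth field.
awk -v n="$N" 'BEGIN {
    for (i = 1; i <= n; i++)
        print i ",a,b,c,\"v" (i % 300) "\",padding"
}' > "$tmp/input.txt"

# Run a command line through sh and report the whole seconds elapsed.
run() {
    start=$(date +%s)
    sh -c "$1"
    end=$(date +%s)
    echo "$((end - start))s  $1"
}

# Case 1: awk opens the output file itself.
run "awk '{ print > \"$tmp/out_awk.txt\" }' $tmp/input.txt"
# Case 2: the shell redirects awk's standard output.
run "awk '{ print }' < $tmp/input.txt > $tmp/out_shell.txt"

# Both cases should produce byte-identical output.
cmp "$tmp/out_awk.txt" "$tmp/out_shell.txt" && echo "outputs identical"
```

Setting AWK-specific paths (for example /usr/local/gawk-3.1.6/bin/awk) in place of awk in the two run lines would compare a particular build.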