bug-bash

Re: using mapfile is extremely slow compared to old-fashioned ways to read files


From: Lennart Schultz
Subject: Re: using mapfile is extremely slow compared to old-fashioned ways to read files
Date: Fri, 27 Mar 2009 12:54:29 +0100

Chris,
I agree with you about using the right tool at the right time, and mapfile
seems not to be the right tool for my problem, but I will give you some
facts from my observations:

Using a fast tool like egrep just to find a simple string in my data file
gives the following times:

time egrep '<pro' >/dev/null < dr.xml

real    0m54.628s
user    0m27.310s
sys     0m0.036s

My original bash script:

time xml2e2-loadepg

real    1m53.264s
user    1m22.145s
sys     0m30.674s

Since the discussion seems to be about spawning subshells and their cost, I
have checked my script. It calls only one external command, date, which is
called a little less than 20000 times in total. Just for this test, I have
changed the call to date into an assignment of a constant, and now the
timing looks like this:

time xml2e2-loadepg

real    1m3.826s
user    1m2.700s
sys     0m1.004s
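
For reference, a minimal sketch of getting a timestamp without forking date
at all. It assumes bash 4.2 or newer, where the printf builtin accepts a
%(fmt)T conversion; the helper name now_stamp and the format string are
hypothetical, since the format my script passes to date is not shown here.

```shell
#!/usr/bin/env bash
# Assumption: bash 4.2+, where printf supports %(fmt)T.
# The argument -1 means "the current time"; -v stores the result in a
# variable, so no command substitution (and no fork) is involved.
now_stamp() {
    # Hypothetical helper: writes a YYYYmmddHHMMSS stamp into the variable named by $1.
    printf -v "$1" '%(%Y%m%d%H%M%S)T' -1
}

now_stamp stamp
echo "$stamp"
```

Called 20000 times in a loop, this stays inside the shell process, which is
the same effect the constant assignment had in the test above.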

I also made the same change to the version of the program using mapfile, and
changed line=$(echo $i) to
line=${i##+([[:space:]])}
so the main loop is absolutely free of subshell spawns:

time xml2e2-loadepg.new

real    65m2.378s
user    63m16.717s
sys     0m1.124s
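
A small sketch of the whitespace stripping used above, for anyone who wants
to reproduce it. The +(...) pattern is only recognized when the extglob shell
option is enabled; the second form is an alternative that should give the
same result without extglob. The sample line is hypothetical and is not taken
from dr.xml.

```shell
#!/usr/bin/env bash
shopt -s extglob   # needed for the +(...) pattern below

# Hypothetical input line with leading blanks and a tab.
i=$'  \t <item id="1">  '

# Form used in the script: remove the longest leading run of whitespace.
line=${i##+([[:space:]])}

# Alternative without extglob: take the leading whitespace run
# (everything before the first non-space character), then strip it.
lead=${i%%[![:space:]]*}
line2=${i#"$lead"}

printf '[%s]\n' "$line" "$line2"
```

Both forms leave trailing blanks in place, unlike line=$(echo $i), which also
collapses inner runs of whitespace because $i is unquoted.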



Lennart


2009/3/26 Chris F.A. Johnson <cfaj@freeshell.org>

> On Thu, 26 Mar 2009, Lennart Schultz wrote:
>
>> I have a bash script which reads about 250000 lines of xml code generating
>> about 850 files with information extracted from the xml file.
>> It uses the construct:
>>
>> while read line
>> do
>>  case "$line" in
>>  ....
>> done < file
>>
>> and this takes a little less than 2 minutes
>>
>> Trying to use mapfile I changed the above construct to:
>>
>> mapfile  < file
>> for i in "${MAPFILE[@]}"
>> do
>>  line=$(echo $i) # strip leading blanks
>>  case "$line" in
>>  ....
>> done
>>
>> With this change the job now takes more than 48 minutes. :(
>>
>
>   As has already been suggested, the time is almost certainly taken
>   up in the command substitution which you perform on every line.
>
>   If you want to remove leading spaces, it would be better to use a
>   single command to do that before reading with mapfile, e.g.:
>
> mapfile < <(sed 's/^ *//' file)
>
>   If you want to remove trailing spaces as well:
>
> mapfile < <(sed -e 's/^ *//' -e 's/ *$//' file)
>
>   Chet, how about an option to mapfile that strips leading and/or
>   trailing spaces?
>
>   Another useful option would be to remove newlines.
>
> --
>   Chris F.A. Johnson, webmaster         <http://woodbine-gerrard.com>
>   ========= Do not reply to the From: address; use Reply-To: ========
>   Author:
>   Shell Scripting Recipes: A Problem-Solution Approach (2005, Apress)
>
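
The suggestion quoted above can be sketched as a complete loop. This is only
an illustration under assumptions: the temporary file and its three lines are
hypothetical stand-ins for the real XML input, the array name lines is
arbitrary, and [[:space:]] is used instead of a literal space so that tabs
are stripped too. The -t option of mapfile removes the trailing newline from
each element.

```shell
#!/usr/bin/env bash
# Hypothetical sample input standing in for the real file.
tmp=$(mktemp)
printf '   <a>\n\t<b>  \n<c>\n' > "$tmp"

# A single sed process strips leading and trailing whitespace for the
# whole file; mapfile -t drops the newline at the end of each line.
mapfile -t lines < <(sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$tmp")
rm -f "$tmp"

# The loop body needs no per-line command substitution.
for line in "${lines[@]}"; do
    case "$line" in
        '<b>') echo "matched: $line" ;;
    esac
done
```

The original while read line loop did not need this step because read itself
strips leading and trailing IFS whitespace from each line.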

