[bug-gawk] awk big file dead loop
From: dragan legic
Subject: [bug-gawk] awk big file dead loop
Date: Fri, 31 Oct 2014 17:21:58 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Icedove/24.8.1
The utility works properly until the output file reaches a certain
size, at which point it fills up the memory. Checking with the top
command shows that the CPU is almost idle while memory usage is at 80%.
I have 8 GB of RAM and the input file is 670 MB, which is certainly a
lot smaller than 8 GB.

The program does a trivial operation: it eliminates duplicate records.
Such operations were in IBM tests a long time ago. The program does the
following: given that the file is sorted, it loads one record, loads the
next one, and compares them. If they are the same, it loads the next
record and compares it with the first one, which it keeps saved. If the
next loaded record is not the same, the saved record is written to the
output file, and the new record is put in a buffer and used for the next
comparison.

This is where the utility seems to go wrong: it takes up all the RAM and
then gets bogged down in swapping, which isn't its strong point. The
essential mistake is that it takes up all the RAM for a file that is
much smaller than the available RAM. Most likely awk eats up all the
memory with the buffers it uses for comparing, because it is not
releasing the memory. awk has clearly been running for 2 hours on a PC
with 8 GB of RAM, a six-core AMD FX-6100 CPU, and a 2 TB SATA-3 HDD.
Something like this could have been completed with 512 KB of memory on
an old Facom.

I have to kill the awk program because it runs in a dead loop. When I
use sort -u instead, the operation finishes in 3 minutes.
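
The adjacent-record comparison described above could be sketched in awk roughly as follows. This is only an illustration under assumptions: the input must already be sorted, one record is one line, and the file names sample.txt and sample_dedup.txt are hypothetical. It keeps just the previous line in memory, whereas an expression such as '!seen[$0]++' adds an array entry for every distinct line it reads.

```shell
# Hypothetical sample input, already sorted (assumption: one record per line).
printf 'a\na\nb\nc\nc\nc\n' > sample.txt

# Print a line only when it differs from the line before it.
# Appending "" forces a string comparison, so "1" and "1.0" stay distinct.
# Only the previous line (prev) is held in memory at any time.
awk 'NR == 1 || ($0 "") != (prev "") { print } { prev = $0 }' sample.txt > sample_dedup.txt

cat sample_dedup.txt
```

On unsorted input this sketch would remove only adjacent duplicates, so it is not a drop-in replacement for sort -u.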
cat izlaz.txt | awk '!seen[$0]++' >> izlaz1.txt
top
Tasks: 214 total, 2 running, 212 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0,8 us, 0,6 sy, 0,0 ni, 83,7 id, 14,9 wa, 0,0 hi, 0,0 si, 0,0 st
KiB Mem: 8059252 total, 7921212 used, 138040 free, 1800 buffers
KiB Swap: 10251260 total, 4649756 used, 5601504 free. 201920 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2831 dragan 20 0 3110044 48940 33448 S 4,3 0,6 12:00.57 kwin
2854 dragan 20 0 4385988 80660 19788 S 3,3 1,0 15:14.00 plasma-des+
1580 root 20 0 378232 129836 95588 S 2,3 1,6 16:18.19 Xorg
11157 dragan 20 0 523628 17104 6552 S 0,7 0,2 0:05.19 konsole
1954 dirmngr 20 0 21980 120 48 S 0,3 0,0 0:01.88 dirmngr
2881 dragan 20 0 1840204 11128 0 S 0,3 0,1 0:14.25 mysqld
10936 dragan 20 0 825844 7812 3108 S 0,3 0,1 0:05.40 dolphin
10969 dragan 20 0 8294648 6,772g 164 D 0,3 88,1 1:31.92 awk
lsb_release -a
LSB Version: core-2.0-amd64:core-2.0-noarch:core-3.0-amd64:core-3.0-noarch:core-3.1-amd64:core-3.1-noarch:core-3.2-amd64:core-3.2-noarch:core-4.0-amd64:core-4.0-noarch:core-4.1-amd64:core-4.1-noarch:security-4.0-amd64:security-4.0-noarch:security-4.1-amd64:security-4.1-noarch
Distributor ID: LinuxMint
Description: Linux Mint 17 Qiana
Release: 17
Codename: qiana
inxi -F
System: Host: dragan-MS-7693 Kernel: 3.14.21-031421-generic x86_64 (64 bit)
Desktop: KDE 4.13.3 Distro: Linux Mint 17 Qiana
Machine: Mobo: MSI model: 970A-G46 (MS-7693) version: 2.0 Bios: American
Megatrends version: V2.6 date: 10/08/2013
CPU: Hexa core AMD FX-6100 Six-Core (-MCP-) cache: 12288 KB flags: (lm nx
sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm)
Clock Speeds: 1: 3300.00 MHz 2: 3300.00 MHz 3: 1400.00 MHz 4:
3300.00 MHz 5: 1400.00 MHz 6: 3300.00 MHz
Graphics: Card: Advanced Micro Devices [AMD/ATI] Bonaire XTX [Radeon R7 260X]
X.Org: 1.15.1 driver: fglrx Resolution: address@hidden
GLX Renderer: AMD Radeon R7 200 Series GLX Version: 4.4.13084 - CPC
14.301.1001
Audio: Card-1: Advanced Micro Devices [AMD/ATI] Device aac0 driver:
snd_hda_intel
Card-2: Advanced Micro Devices [AMD/ATI] SBx00 Azalia (Intel HDA)
driver: snd_hda_intel
Card-3: Logitech Portable Webcam C905 driver: USB Audio
Sound: Advanced Linux Sound Architecture ver: k3.14.21-031421-generic
Network: Card-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet
Controller driver: r8169
IF: eth0 state: up speed: 1000 Mbps duplex: full mac:
8c:89:a5:9c:d9:b3
Card-2: Ovislink AirLive WL-1600USB 802.11g Adapter [Realtek
RTL8187L] driver: rtl8187
IF: wlan0 state: down mac: 00:4f:78:01:5b:4a
Drives: HDD Total Size: 4060.8GB (35.0% used) 1: id: /dev/sda model:
Patriot_Pyro size: 60.0GB
2: id: /dev/sdb model: ST1000DL002 size: 1000.2GB 3: id: /dev/sdc
model: WDC_WD10EZEX size: 1000.2GB
4: id: /dev/sdd model: ST2000DM001 size: 2000.4GB
Partition: ID: / size: 69G used: 17G (25%) fs: ext4 ID: /home size: 70G used:
22G (34%) fs: ext4
ID: swap-1 size: 10.50GB used: 0.00GB (0%) fs: swap
RAID: No RAID devices detected - /proc/mdstat and md_mod kernel raid
module present
Sensors: System Temperatures: cpu: 54.0C mobo: 37.1C gpu: 54.00C
Fan Speeds (in rpm): cpu: 1118 fan-1: 3685 fan-3: 1639
Info: Processes: 222 Uptime: 3:55 Memory: 2021.9/7870.4MB Client: Shell
inxi: 1.8.4