[Bug-tar] use optimal file system block size
From: Christian Krause
Subject: [Bug-tar] use optimal file system block size
Date: Wed, 18 Jul 2018 14:58:13 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1
Dear tar Community,
We are using **tar** on the High-Performance Computing (HPC) cluster at our
research institute, iDiv. The networked file system serving (scientific) data on
our cluster uses a block size of 2 MiB:
```
$ mkdir data
$ dd if=/dev/zero bs=2M count=42 of=data/blob status=none
$ stat -c %o data/blob
2097152
```
**tar** does not use the block size of the file system that holds the files.
Instead, for a reason I don't know (feel free to educate me), it reads and
writes in requests of 10 KiB:
```
$ tar --version | head -1
tar (GNU tar) 1.30
$ strace -T -ttt -ff -o tar-1.30.strace tar cf data.tar data
$ strace-analyzer io tar-1.30.strace.59539 | grep data | column -t
read 84M in 444.041 ms (~ 189M / s) with 8602 ops (~ 10K / op,
~ 10K request size) data/blob
write 84M in 404.483 ms (~ 208M / s) with 8602 ops (~ 10K / op,
~ 10K request size) data.tar.gz
```
If you're interested, you can find strace-analyzer
[here](https://github.com/wookietreiber/strace-analyzer). It is, more or less,
just doing some stats over the strace log.
Especially on a networked file system, such a small request size means many
more I/O operations for the same amount of data, which hurts performance. Using
the native file system block size would generally perform better.
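To put a number on that, here is a small sketch of the arithmetic for the 84 MiB test file from above. It only counts how many requests are needed at each request size; the variable names are made up for illustration, and it says nothing about the actual latency per request.

```shell
#!/bin/sh
# Count the requests needed to move data/blob at two request sizes.
total=$((42 * 2 * 1024 * 1024))   # size of data/blob in bytes (42 x 2 MiB)
small=$((10 * 1024))              # 10 KiB request size observed with tar
large=$((2 * 1024 * 1024))        # 2 MiB file system block size

# Round up so that a trailing partial request is counted as well.
ops_small=$(( (total + small - 1) / small ))
ops_large=$(( (total + large - 1) / large ))

echo "10 KiB requests: $ops_small"   # 8602, matching the strace numbers
echo "2 MiB requests:  $ops_large"   # 42
```

With 2 MiB requests the same file would need 42 operations instead of 8602.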
I would like to propose using the native file system block size instead of the
currently used 10 KiB. The block size can be queried with the `stat` syscall
(the `st_blksize` field), which is what the `stat` command above reports. If the
syscall does not return a usable block size, e.g. because the file system does
not support it, the current default of 10 KiB could still be applied as a
fallback.
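As a rough sketch of that fallback logic, and of how it could be approximated today from the outside, the following shell helpers query the preferred I/O size and turn it into a value for tar's existing `-b` (blocking factor) option, which is given in units of 512 bytes. The function names `io_size` and `blocking_factor` are hypothetical, `stat -c %o` assumes GNU coreutils, and I have not measured how much this changes the read pattern on our file system.

```shell
#!/bin/sh
# Hypothetical helper: print the preferred I/O size for a path, in bytes.
# Falls back to 10 KiB when stat fails or reports nothing usable.
io_size() {
    size=$(stat -c %o "$1" 2>/dev/null) || size=
    case "$size" in
        ''|*[!0-9]*|0) size=10240 ;;   # empty, non-numeric, or zero
    esac
    echo "$size"
}

# Hypothetical helper: convert that size into a tar blocking factor.
# tar's -b option counts in units of 512 bytes, so 10 KiB becomes 20.
blocking_factor() {
    echo $(( $(io_size "$1") / 512 ))
}

# Possible usage (not run here): archive data/ with matching records.
# tar -b "$(blocking_factor data)" -cf data.tar data
```

For the 2 MiB file system above this would yield a blocking factor of 4096. A change inside tar itself would do the equivalent lookup via the syscall rather than relying on the user to pass `-b`.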
What do you think about an improvement like this?
I am happy to try implementing this myself and provide a patch. I'm fairly new
to GNU Savannah, so I'm still a bit fuzzy on the preferred way to submit
patches to the project (I'm used to the fork plus pull request / merge
request model found on GitHub/GitLab).
Best Regards
--
Christian Krause
Scientific Computing Administration and Support
-----------------------------------------------------------------------------
Email: address@hidden
Office: BioCity Leipzig 5e, Room 3.201.3
Phone: +49 341 97 33144
-----------------------------------------------------------------------------
German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig
Deutscher Platz 5e
04103 Leipzig
Germany
-----------------------------------------------------------------------------
iDiv is a research centre of the DFG – Deutsche Forschungsgemeinschaft
iDiv is a central institution of Universität Leipzig within the meaning of § 92
(1) SächsHSFG and is operated jointly with Martin-Luther-Universität
Halle-Wittenberg and Friedrich-Schiller-Universität Jena, in cooperation with
the Helmholtz Centre for Environmental Research GmbH - UFZ. The participating
cooperation partners are the following non-university research institutions:
the Helmholtz Centre for Environmental Research GmbH - UFZ, the Max Planck
Institute for Biogeochemistry (MPI BGC), the Max Planck Institute for Chemical
Ecology (MPI CE), the Max Planck Institute for Evolutionary Anthropology (MPI
EVA), the Leibniz Institute German Collection of Microorganisms and Cell
Cultures (DSMZ), the Leibniz Institute of Plant Biochemistry (IPB), the Leibniz
Institute of Plant Genetics and Crop Plant Research (IPK), and the Leibniz
Institute Senckenberg Museum of Natural History Görlitz (SMNG).
VAT ID: DE 141510383