Re: [h5md-user] particle number


From: Felix Höfling
Subject: Re: [h5md-user] particle number
Date: Fri, 08 Feb 2013 15:18:17 +0100
User-agent: Opera Mail/12.12 (Linux)

On 08.02.2013, 08:59, Pierre de Buyl <address@hidden> wrote:

Hi Felix,

On Fri, Feb 01, 2013 at 03:50:40PM +0100, Felix Höfling wrote:
On 25.01.2013, 20:19, Pierre de Buyl <address@hidden> wrote:
>On Fri, Jan 25, 2013 at 04:10:49PM +0100, Felix Höfling wrote:
>>1) The space dimension shall be stored in /parameters+dimension (as
>>integer attribute).
>>
>>In principle, it can be deduced from the size of the box offset, but
>>this appears pretty cumbersome. Apart from handiness, the box may be
>>found either in observables or in trajectory, requiring a
>>distinction of cases—just to obtain the space dimension.
>
>This seems reasonable enough :-)
>

I've included the space dimension in the draft. Since I can't push
to git://git.savannah.nongnu.org/h5md.git (why?) I have attached the
patch.
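
For concreteness, a minimal h5py sketch of what the quoted item 1) amounts to
(the file name and the value 3 are only illustrative, not taken from the patch):

import h5py

f = h5py.File("example.h5", "w")
param = f.require_group("parameters")
param.attrs.create("dimension", 3, dtype="int32")  # space dimension as an integer attribute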

I checked your patch. Wouldn't it be easier to make "dimension" a dataset
instead of an attribute? It is an important parameter, so I don't think it
should be made less visible.

Good question, see below.

Just nitpicking; if you prefer an attribute, I'll commit it anyway.

Here's the URL I have in my .git/config:
address@hidden:/srv/git/h5md.git
git:// is read-only. The address@hidden URL uses ssh, and Savannah uses ssh-key login.

(This was my first use of "git am"; it is not that complex to use.)


Thanks, I had used git://git.savannah.nongnu.org/h5md.git; using my login should work.

>>2) If data are present only in /observables, the number of particles
>>can not be inferred. My suggestion is to supplement each observable
>>group with an attribute indicating the number of particles that lead
>>to this specific average. (So far, all macroscopic observables
>>result from an average over particles.) Thereby, also partial
>>observables of particle subgroups are handled correctly. The
>>attribute may be attached either to the top groups ('all', 'A', and
>>so on), or to individual data groups like 'total_energy'.
>
>I have mixed feelings about this one. For the moment (this does not mean
>that this should be the final solution, btw) I run multispecies
>simulations. The number of particles is in /observables/solvent_N and is
>a [:,N_species] dataset. In all generality, the number of particles
>depends on time and on the species. The variety of situations makes me
>think that this should not go into the first published version (FPV).
>
>>I believe that both attributes are of sufficient generality to
>>deserve a place in H5MD, and I will add them if there are no urgent
>>objections.
>>
>>BTW, what is missing as well is an (optional) error field (=standard
>>deviation) for the observables. What do you think?
>
>Is this critical for the FPV?
>

I see that we're running into similar trouble as with the
static/fluctuating box size. Nevertheless, I think storing the
particle number of averaged quantities is an essential feature; two
examples:

Example 1: given two subsets of particles, the total potential energy
per particle cannot be computed from the subgroups unless the particle
number of each subgroup is known.

Example 2: computing simple response coefficients like the specific
heat from energy fluctuations requires the particle number.

The crucial point is to add the possibility to convert a per-particle
quantity into a total quantity. If the particle group contains a
mixture (as in your case, and also in some of my simulations), the
_total_ number of particles is stored. (Information about the
fluctuating composition of the mixture may go in a separate
observable.)
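
To illustrate both examples, a small h5py sketch (file name and dataset paths
are hypothetical, and it assumes the "count" entry proposed below):

import h5py

with h5py.File("example.h5", "r") as f:
    obs = f["observables/A/potential_energy"]
    epot = obs["value"][:]       # per-particle potential energy vs. time
    n = obs["count"][()]         # number of particles behind the average
    epot_total = n * epot        # Example 1: recover the extensive quantity
    # Example 2: the specific heat follows from energy fluctuations,
    # c_V ~ var(n * epot) / (n * kB * T**2), which again needs n.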

My suggestion is to store the particle number either in an attribute
attached to the observable if it is fixed in time, or in a dataset
next to value/time/step if it is fluctuating. The naming may be
"number", "particle_number", or simply "count".

"count" is more generic. It could be simpler to parse if it remains a dataset either way. An attribute is not really necessary as "all that jazz" is enclosed
in a group anyway.

"count" is fine.

observables
 \-- obs_1
      \-- step [var]
      \-- time [var]
      \-- value [var] (or [var,d1,d2,...])
      \-- count SCALAR


observables
 \-- obs_1
      \-- step [var]
      \-- time [var]
      \-- value [var]
      \-- count [var]
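
For what it's worth, a minimal h5py sketch of writing the two layouts above
("obs_1" and all values are only placeholders):

import h5py
import numpy as np

f = h5py.File("example.h5", "w")
obs = f.create_group("observables/obs_1")
obs.create_dataset("step", data=np.arange(10))
obs.create_dataset("time", data=0.01 * np.arange(10))
obs.create_dataset("value", data=np.zeros(10))

# first layout: fixed particle number as a scalar dataset ...
obs.create_dataset("count", data=1000)
# ... second layout: fluctuating number as a time series parallel to step/time/value
# obs.create_dataset("count", data=np.full(10, 1000))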

What do you think?

OK, just check what you think of the attribute/dataset issue.

Best,

Pierre


Indeed, we have no rule yet on whether a piece of data is stored as an attribute or as a dataset. In the h5md group, everything is an attribute, while otherwise only the box type is stored as an attribute. It seems that in the current draft, scalar values are stored as attributes and one- or multi-dimensional arrays as datasets, but this distinction appears arbitrary.

Technically, what do the HDF5 docs say about them?

http://www.hdfgroup.org/HDF5/doc/H5.intro.html#Intro-OAttributes
http://www.hdfgroup.org/HDF5/doc/UG/UG_frame13Attributes.html

As far as I can see, the main differences are:

1) attributes may be attached to datasets (which we do not use; in H5MD they are always attached to groups),

2) attributes cannot be chunked, compressed, etc.; mainly they represent a single piece of data of fixed size (a number, a string, or even a small array),

3) attributes do not come with their own header (which may be negligible).

Using attributes for scalars and datasets for arrays might also help to distinguish the fixed/variable cases. It may be easier to check for the presence of a dataset "count" and to fall back to the attribute "count" than to check the dimensions of the dataset "count". [In h5py, a scalar dataset is characterised by "shape" being the empty tuple, ().]
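
In h5py, such a lookup could be sketched as follows (the helper name is made
up; "obs" stands for an already opened observable group):

def particle_count(obs):
    """Return the particle number of an observable group, or None if absent."""
    if "count" in obs:                  # dataset: scalar (fixed) or [var] (fluctuating)
        ds = obs["count"]
        return ds[()] if ds.shape == () else ds[:]
    if "count" in obs.attrs:            # fall back to the attribute variant
        return obs.attrs["count"]
    return None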

On the other hand, I agree that storing central information as an attribute somewhat hides this information. But then, the entries in the h5md group should become regular datasets as well.

It might be worth playing a bit with h5py; the use of the scalar attribute appears a bit simpler to me:

import h5py
f = h5py.File("scalar_test.h5", "w")    # any file, just for trying this out

f.attrs.create("scalar", 1)
f.create_dataset("scalar", data=1)

In [39]: type(f.attrs['scalar'])
Out[39]: numpy.int64

In [40]: f.attrs['scalar']
Out[40]: 1

In [42]: type(f['scalar'])
Out[42]: h5py._hl.dataset.Dataset

In [49]: f['scalar'].__array__()
Out[49]: array(1)

Actually, I don't have a strong preference here. We may avoid attributes as much as possible, or we may use them for scalars throughout.

Any other opinions?

Regards,

Felix


