
[h5md-user] Variable-size particle groups


From: Peter Colberg
Subject: [h5md-user] Variable-size particle groups
Date: Sat, 26 May 2012 09:55:55 -0400
User-agent: Mutt/1.5.21 (2010-09-15)

Dear H5MD community,

Let's break the silence with a new extension for H5MD :-).

While finishing support for particle groups in HALMD, which allow
selecting a subset of the particles of the system for observation,
I have been pondering how to store variable-size trajectory data in H5MD.

This would become necessary once I track, e.g., particles in the
neighbourhood of a particle, while avoiding sampling an entire
system of millions of solvent particles (or at least sampling it
with a significantly lower frequency).

One idea I had in mind was to keep the existing trajectory dataset
structure and fill unused placeholder slots with an invalid value (NaN).
While the storage overhead should be negligible thanks to compression,
this has a serious disadvantage: the number of placeholders must be
fixed in advance, and if it is chosen too small, a lengthy simulation
may have to abort once the number of particles exceeds it.
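To make the placeholder idea and its failure mode concrete, here is a
minimal NumPy sketch. The names pad_sample, unpad_sample and CAPACITY
are hypothetical and only illustrate the approach; they are not part of
H5MD or HALMD. It assumes that a valid row never contains NaN.

```python
import numpy as np

# Hypothetical illustration of the placeholder approach: every sample is
# padded to a fixed capacity that must be chosen before the run starts.
CAPACITY = 4   # assumed maximum number of observed particles
DIM = 3        # spatial dimension D


def pad_sample(positions, capacity=CAPACITY):
    """Pad an (n, D) array of positions to (capacity, D) with NaN rows."""
    n = positions.shape[0]
    if n > capacity:
        # The failure mode: too few placeholders for the current sample.
        raise OverflowError(f"{n} particles exceed capacity {capacity}")
    padded = np.full((capacity, positions.shape[1]), np.nan)
    padded[:n] = positions
    return padded


def unpad_sample(padded):
    """Recover the valid rows, assuming valid rows never contain NaN."""
    valid = ~np.isnan(padded).any(axis=1)
    return padded[valid]
```

For example, pad_sample(np.zeros((2, DIM))) yields a (4, 3) array whose
last two rows are NaN, while a sample with five particles raises
OverflowError, which in a running simulation means aborting the run.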

Instead, I propose a better scheme:

H5MD would define an optional dataset “range” inside each trajectory
subgroup, next to the existing datasets “step” and “time”.

The dataset “range” is two-dimensional, with the first dimension
as the [variable] dimension (in H5MD lingo “to accumulate time steps”),
and the second dimension of size 2. The dataset stores an array of
half-open ranges [first, last), which index into the variable dimension
of the datasets position/sample, velocity/sample, …

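The datasets position/sample, velocity/sample, … are reduced by one
dimension, i.e. [variable][N][D] is reduced to [variable × N][D]: the
first dimension now holds the particles of all samples concatenated,
so its length is the sum of the per-sample particle counts.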
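A writer could build both datasets from a list of variable-size samples
roughly as follows. This is a minimal in-memory sketch with a
hypothetical helper pack_samples; a real writer would instead append to
extensible HDF5 datasets after each sample, but the bookkeeping of
[first, last) is the same.

```python
import numpy as np


def pack_samples(samples, dim=3):
    """Concatenate variable-size samples into the proposed layout.

    samples: list of (N_i, dim) arrays, one per observed time step.
    Returns (data, ranges): data has shape (sum of N_i, dim), and ranges
    has shape (len(samples), 2) holding half-open [first, last) rows.
    """
    counts = np.array([s.shape[0] for s in samples], dtype=np.int64)
    last = np.cumsum(counts)          # end row of each sample (exclusive)
    first = last - counts             # start row of each sample
    ranges = np.stack([first, last], axis=1)
    data = np.concatenate(samples, axis=0) if samples else np.empty((0, dim))
    return data, ranges
```

With three samples of 2, 3 and 1 particles, data has shape (6, 3) and
ranges is [[0, 2], [2, 5], [5, 6]].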

For readers, this adds one level of indirection when looking up
particle data. For example, to read the position sample at step s, the
reader first looks up the range [first, last) stored for step s, and
then selects that range of rows from the position/sample dataset.

As an example, a lookup by range [first, last) maps directly onto
NumPy slicing, array[first:last], which also uses half-open intervals:

  first, last = ranges[i]          # i: index of step s in the "step" dataset
  sample = positions[first:last]   # all particles recorded at step s
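Including the step lookup, the full two-stage read could look like the
following sketch. The function read_sample is hypothetical; it operates
on NumPy arrays, and assumes that h5py Dataset objects, which support
the same slicing, could be passed in their place.

```python
import numpy as np


def read_sample(step_ds, range_ds, sample_ds, s):
    """Return the rows of sample_ds recorded at simulation step s.

    step_ds:   1-D array of the steps at which samples were taken
    range_ds:  (n, 2) array of half-open [first, last) ranges
    sample_ds: (sum of N_i, D) array of concatenated particle data
    """
    # First indirection: map step s to its index along the variable dimension.
    (matches,) = np.nonzero(np.asarray(step_ds) == s)
    if matches.size == 0:
        raise KeyError(f"no sample at step {s}")
    i = matches[0]
    # Second indirection: select the rows belonging to that sample.
    first, last = range_ds[i]
    return sample_ds[first:last]
```

For steps [0, 100, 200] and ranges [[0, 2], [2, 5], [5, 6]], reading
step 100 returns rows 2, 3 and 4 of the sample dataset.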


Of course, with a fluctuating number of particles, one would probably
also store a trajectory subgroup “tag” to identify particles, but this
is a separate issue from my proposal.


What do you think of this proposal?

Should such an extension be optional, or mandatory? Do you see even
more complex use cases which could not be handled by this scheme?

One detail for discussion would be the redundancy of the value of
“last”, which could also be determined from the subsequent value
of “first”. However, this would require special-case handling in
readers for the last sample, which has no subsequent “first”. Thus I
believe the redundancy is a good thing, since it simplifies the
implementation of readers.
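To illustrate the special case, here is a sketch of what a reader would
need if only “first” were stored. The helper last_from_first is
hypothetical, and it assumes samples are stored contiguously and in
order, so that each sample ends where the next one begins.

```python
import numpy as np


def last_from_first(first, total_rows):
    """Derive 'last' for every sample when only 'first' is stored.

    Assumes contiguous, ordered storage. The final sample has no
    successor, so it must end at the total number of rows in the
    sample dataset: this is the special case a reader has to handle.
    """
    first = np.asarray(first)
    last = np.empty_like(first)
    last[:-1] = first[1:]      # each sample ends where the next begins
    last[-1:] = total_rows     # special case for the final sample
    return last
```

For first = [0, 2, 5] and a sample dataset of 6 rows, this yields
last = [2, 5, 6]; storing “last” explicitly removes the need for the
total_rows argument altogether.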

Regards,
Peter

-- 
Peter Colberg
Chemical Physics Theory Group
Department of Chemistry
University of Toronto
Canada
http://www.chem.utoronto.ca/~pcolberg/


