[Gzz-commits] manuscripts/storm article.rst
From: Hermanni Hyytiälä
Subject: [Gzz-commits] manuscripts/storm article.rst
Date: Fri, 31 Jan 2003 05:58:56 -0500
CVSROOT: /cvsroot/gzz
Module name: manuscripts
Changes by: Hermanni Hyytiälä <address@hidden> 03/01/31 05:58:56
Modified files:
storm : article.rst
Log message:
Comments, suggestions etc.
CVSWeb URLs:
http://savannah.gnu.org/cgi-bin/viewcvs/gzz/manuscripts/storm/article.rst.diff?tr1=1.60&tr2=1.61&r1=text&r2=text
Patches:
Index: manuscripts/storm/article.rst
diff -u manuscripts/storm/article.rst:1.60 manuscripts/storm/article.rst:1.61
--- manuscripts/storm/article.rst:1.60 Fri Jan 31 01:57:16 2003
+++ manuscripts/storm/article.rst Fri Jan 31 05:58:56 2003
@@ -28,8 +28,8 @@
computers, being sent as e-mail attachments, carried around on disks,
published on the web, moved between desktop and laptop systems,
downloaded for off-line reading or copied between computers in a LAN.
-Often, the same document will be independently modified
-on two unconnected systems. In this paper, we address two issues
+Often, the same document is independently modified
+on two unconnected, separate systems. We address two issues
raised by this *data mobility*: Dangling links, and keeping track
of alternative versions. Resolvable location-independent identifiers
make these issues much easier to deal with, since data
@@ -43,7 +43,7 @@
private data and documents published on the Internet by
using the same identifiers for both.
Storm has been partially implemented as a part of the Gzz project [ref],
-which uses it exclusively for all disk storage. On top of Storm,
+which uses Storm exclusively for all disk storage. On top of Storm,
we have built a system for storing mutable, versioned data
and an implementation of Xanalogical storage [ref].
@@ -213,6 +213,9 @@
3. Block storage
================
+[Do we need a figure, which shows the overall structure of block storage
+with pointers and diffs ? -Hermanni]
+
In our system, Storm (for *storage module*), all data is stored
as *blocks*, byte sequences identified by a SHA-1 cryptographic content-hash
[ref SHA-1 and our ht'02 paper]. Blocks often have a similar granularity
@@ -221,8 +224,8 @@
Mutable data structures are built on top of the immutable blocks
(see Section 6).
-hemppah: Or should these lines be inserted to some other section and tell more about these
-systems, e.g. 5.2 ?
+[Or should these lines be inserted to some other section and tell more about these
+systems, e.g. 5.2 ? -Hermanni]
CFS [ref], which is built upon the Chord routing layer [ref], stores data as blocks.
However, CFS *splits* files into several miniblocks and spreads blocks over the
@@ -230,15 +233,18 @@
files into blocks, since they store data as whole files. All previously mentioned
systems lack the immutability property of Storm blocks.
-Immutable blocks has several benefits...
+Immutable blocks have several benefits over existing systems...
-Block storage makes it easy to replicate data between systems.
+1) Storm's block storage makes it easy to replicate data between systems.
Different versions of the same document can easily coexist at this level,
-stored in different blocks. To replicate all data from computer A
+stored in different blocks.
+[Previous sentence doesn't parse to me (what level ?) :( -Hermanni]
+To replicate all data from computer A
on computer B, it suffices to copy all blocks from A to B that B
does not already store.
+[Example of Lotus Notes' replication conflicts ? -Hermanni]
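The replication rule above can be illustrated with a minimal Python sketch. Purely as an assumption for illustration, each computer's block storage is modeled as a dict mapping hexadecimal SHA-1 ids to block bytes; `block_id` and `replicate` are hypothetical helpers, not part of Storm's actual interface.

```python
import hashlib


def block_id(data: bytes) -> str:
    """Content-hash id of a block: the hex SHA-1 of its bytes."""
    return hashlib.sha1(data).hexdigest()


def replicate(src: dict, dst: dict) -> int:
    """Copy every block in src that dst does not already store.

    Ids are content hashes, so a block already present under the same
    id holds identical bytes and never needs to be compared or merged.
    Returns the number of blocks copied.
    """
    missing = src.keys() - dst.keys()
    for bid in missing:
        dst[bid] = src[bid]
    return len(missing)


# Two versions of one document coexist as two distinct blocks on A.
computer_a = {block_id(x): x for x in (b"draft v1", b"draft v2")}
computer_b = {block_id(b"draft v1"): b"draft v1"}
print(replicate(computer_a, computer_b))  # 1 (only "draft v2" was missing)
print(replicate(computer_a, computer_b))  # 0 (B is now up to date)
```

Because no block is ever modified in place, replication reduces to a set difference on ids; there is no conflict resolution at this level.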
-Storm blocks are MIME messages [ref MIME], i.e., objects with
+2) Storm blocks are MIME messages [ref MIME], i.e., objects with
a header and body as used in Internet mail or HTTP.
This allows them to carry any metadata that can be carried
in a MIME header, most importantly a content type.
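As a hedged illustration of this block format, the following Python sketch serializes a header and a body into one byte sequence and derives the block's id from the whole thing. The `make_block` helper and the exact serialization (a single `Content-Type` header, CRLF line endings) are assumptions made for this sketch; the text does not specify Storm's byte-level format.

```python
import hashlib
from email import message_from_bytes
from typing import Tuple


def make_block(content_type: str, body: bytes) -> Tuple[str, bytes]:
    """Serialize a MIME-style block and return (id, bytes).

    The id is the hex SHA-1 of the entire block, header included,
    so the same body under a different content type is a new block.
    """
    header = b"Content-Type: " + content_type.encode("ascii") + b"\r\n"
    data = header + b"\r\n" + body   # blank line separates header and body
    return hashlib.sha1(data).hexdigest(), data


bid, data = make_block("text/plain", b"Hello, Storm")
msg = message_from_bytes(data)   # a standard MIME parser can read it back
print(msg.get_content_type())    # text/plain
print(msg.get_payload())         # Hello, Storm
```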
@@ -250,34 +256,37 @@
get(id) -> block
add(block)
delete(block)
+
+[analogy to regular Hash Table/DHT ? -Hermanni]
-Implementations may store blocks in RAM, in individual files,
+3) Implementations may store blocks in RAM, in individual files,
in a Zip archive, in a database or through other means.
We have implemented the first three (using hexadecimal
representations of the block ids for file names).
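The following Python sketch shows how the `get`/`add`/`delete` interface might map onto two of the storage options mentioned: an in-memory pool and a directory pool that uses the hexadecimal block id as the file name. The class names `RamPool` and `DirPool` are hypothetical, and this is an illustration under those assumptions, not the Gzz implementation.

```python
import hashlib
import os


class RamPool:
    """Blocks kept in memory, keyed by hex SHA-1 id."""

    def __init__(self):
        self._blocks = {}

    def add(self, block: bytes) -> str:
        bid = hashlib.sha1(block).hexdigest()
        self._blocks[bid] = block   # idempotent: same bytes, same id
        return bid

    def get(self, bid: str) -> bytes:
        return self._blocks[bid]    # raises KeyError if absent

    def delete(self, block: bytes) -> None:
        self._blocks.pop(hashlib.sha1(block).hexdigest(), None)


class DirPool:
    """One file per block; the file name is the block's hex id."""

    def __init__(self, directory: str):
        self.dir = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, bid: str) -> str:
        return os.path.join(self.dir, bid)

    def add(self, block: bytes) -> str:
        bid = hashlib.sha1(block).hexdigest()
        path = self._path(bid)
        if not os.path.exists(path):   # never overwrite existing data
            with open(path, "wb") as f:
                f.write(block)
        return bid

    def get(self, bid: str) -> bytes:
        with open(self._path(bid), "rb") as f:
            data = f.read()
        # The id doubles as a checksum, so corrupted files are detected.
        if hashlib.sha1(data).hexdigest() != bid:
            raise ValueError("block %s is corrupted" % bid)
        return data

    def delete(self, block: bytes) -> None:
        path = self._path(hashlib.sha1(block).hexdigest())
        if os.path.exists(path):
            os.remove(path)
```

Since `add` only ever creates new files, a buggy session can add malformed blocks but cannot damage blocks written earlier.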
-Storing all data in Storm blocks provides *reliability*:
+4) Storing all data in Storm blocks provides *reliability*:
When saving a document, an application will only *add* blocks,
never overwrite existing data. When a bug causes an application
to write malformed data, only the changes from one session
will be lost; the previous version of the data will still
-be accessible. This makes Storm well suited as a basis
-for implementing experimental projects (such as ours).
+be accessible. (Footnote: This makes Storm well suited as a basis
+for implementing experimental projects (such as ours).)
-When used in a network environment, Storm ids do not provide
+5) When used in a network environment, Storm ids do not provide
a hint as to where in the network the matching block can be found.
However, current peer-to-peer systems could be used to
-find blocks in a distributed fashion; for example, Freenet [ref],
-a few recent Gnutella clients [e.g. ref: shareaza] , Overnet/eDonkey2000 [ref]
-also use SHA-1-based identifiers [e.g. ref: magnet uri].
-However, we have not put a network implementation into regular use
+find blocks efficiently in a distributed fashion; for example,
+Freenet [ref], a few recent Gnutella clients [e.g. ref: shareaza],
+and Overnet/eDonkey2000 [ref] also use SHA-1-based identifiers
+[e.g. ref: magnet uri].
+(Footnote: However, we have not put a network implementation into regular use
yet and thus can only describe our design, not report on
-implementation experience.
+implementation experience.)
We discuss peer-to-peer implementations in Section 7, below.
-The immutability of blocks should make caching trivial, since it is
+6) The immutability of blocks should make caching trivial, since it is
never necessary to check for new versions of blocks.
-Since the same namespace is used for local data and data
+Since the same namespace [mention urn-5 ? -Hermanni] is used for local data and data
retrieved from the network, online documents that have been
permanently downloaded to the local harddisk can also be found
by the caching mechanism. This is convenient for offline browsing,
@@ -285,25 +294,28 @@
while they are online, store them locally, and be sure that
their software will be able to access them as if downloaded
from the net, without broken links.
+[Previous sentence doesn't parse to me: more simple :( -Hermanni]
Given a peer-to-peer distribution mechanism, it would be possible
to retrieve blocks from any peer online that has a copy
in its cache or permanent storage. This is similar to the Squirrel
-web cache [ref], but does not require trust between the peers,
-since it is possible to check the blocks' cryptographic hashes.
-Since much-requested blocks would be cached on many systems,
-such a network could deal with XXX much more easily.
-On the other hand, there are privacy concerns with exposing
-one's browser cache to the outside world.
-
+web cache [ref] [more refs? -Hermanni], but does not require trust
+between the peers, since it is possible to check the blocks' integrity by using
+cryptographic hashes. Since much-requested blocks would be
+cached on many systems, such a network could deal with XXX
+much more easily. On the other hand, there are privacy
+concerns with exposing one's browser cache to the outside world.
+[Merge this paragraph with 5) ? -Hermanni]
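Because a block's id is the hash of its contents, a client can accept a block from any peer and verify it locally, and can cache it without ever revalidating. The sketch below models peers as plain dicts and uses a hypothetical `fetch` helper. It tries peers in turn, discards any response whose SHA-1 does not match the requested id, and caches verified blocks. How peers are actually located (Section 7) is outside its scope.

```python
import hashlib


def fetch(bid, peers, cache):
    """Return the block with hex SHA-1 id `bid`, trusting no peer.

    Blocks are immutable, so a cached copy never goes stale and
    needs no check for newer versions.
    """
    if bid in cache:
        return cache[bid]
    for peer in peers:
        data = peer.get(bid)
        if data is not None and hashlib.sha1(data).hexdigest() == bid:
            cache[bid] = data
            return data
        # Missing or tampered with: try the next peer.
    raise KeyError(bid)


good = b"some published document"
bid = hashlib.sha1(good).hexdigest()
evil_peer = {bid: b"forged content"}   # wrong bytes under the right id
honest_peer = {bid: good}
cache = {}
print(fetch(bid, [evil_peer, honest_peer], cache) == good)  # True
```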
That all data is stored in blocks means that links to it
are completely independent of location; when data is moved
-between servers, references to it do not break. (Of course, this
-requires that the blocks can be found no matter what server
+between servers, references to it do not break. (Footnote: Of course,
+this requires that the blocks can be found no matter what server
they are on. Again, see Section 7.)
+[Are there disadvantages/issues which we are aware of ? -Hermanni]
+
4. Xanalogical storage
======================
@@ -321,23 +333,25 @@
=============
Clearly, for block storage to be useful, there has to be a way to
-efficiently update documents. We archieve this by a combination of
-two mechanisms. Firstly, a *pointer* is an updatable reference to a block;
+efficiently update documents/maintain different versions of documents.
+We achieve this by a combination of two mechanisms. Firstly, a
+*pointer* is an updatable reference to a block;
pointers can be updated by creating a specific kind of Storm block
representing an assertion of the form, "pointer ``P`` now points
to block ``B``." Pointers are resolved with the help of a Storm index
mapping pointer identifiers to blocks providing targets for that pointer.
Through this mechanism, we can keep old versions of documents
along with the current versions.
+[Figure ? -Hermanni]
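A minimal sketch of pointer resolution, under stated assumptions: each pointer block is modeled as a small record naming the pointer, its new target and a timestamp, and the index picks the most recent assertion as the current target. The record layout, the `PointerIndex` class and the latest-timestamp rule are all illustrative assumptions. The text above does not say how Storm chooses among several targets for one pointer.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PointerBlock:
    """Assertion: "pointer `pointer` now points to block `target`"."""
    pointer: str
    target: str      # id of the block holding this version
    timestamp: int   # assumed ordering key, not taken from the article


class PointerIndex:
    """Maps pointer ids to every pointer block asserting a target."""

    def __init__(self):
        self._by_pointer = defaultdict(list)

    def add(self, pb: PointerBlock) -> None:
        self._by_pointer[pb.pointer].append(pb)

    def resolve(self, pointer: str) -> str:
        """Current target: the most recent assertion (assumed rule).

        Raises ValueError for a pointer with no assertions.
        """
        latest = max(self._by_pointer[pointer], key=lambda pb: pb.timestamp)
        return latest.target

    def history(self, pointer: str) -> list:
        """Old versions stay reachable, oldest first."""
        ordered = sorted(self._by_pointer[pointer], key=lambda pb: pb.timestamp)
        return [pb.target for pb in ordered]


idx = PointerIndex()
idx.add(PointerBlock("doc-P", "block-v1", 1))
idx.add(PointerBlock("doc-P", "block-v2", 2))
print(idx.resolve("doc-P"))   # block-v2
print(idx.history("doc-P"))   # ['block-v1', 'block-v2']
```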
Secondly, in the spirit of version control systems like CVS,
-we do not store each version, but only the differences between versions.
+we do not store *each version*, but only the differences between versions.
However, we still refer to each full version by the id of a block
containing that version, even though we do not store this block.
When we want to access a particular version, we reconstruct it
using the differences, and then check the result using
the cryptographic hash in the full version's block id.
-
+[Figure ? -Hermanni]
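The reconstruct-then-verify step can be sketched in Python as follows. As an assumption for illustration, a diff is represented as a list of `(start, end, replacement)` edits computed with the standard library's `difflib.SequenceMatcher`; Storm's actual diff format is not described here. Only the id of the full version is kept, and the reconstructed bytes are checked against it.

```python
import difflib
import hashlib


def make_diff(old: bytes, new: bytes) -> list:
    """Edits turning old into new, as (start, end, replacement) on old."""
    sm = difflib.SequenceMatcher(None, old, new, autojunk=False)
    return [(i1, i2, new[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]


def apply_diff(old: bytes, diff: list) -> bytes:
    out, pos = [], 0
    for start, end, replacement in diff:   # edits are in ascending order
        out.append(old[pos:start])
        out.append(replacement)
        pos = end
    out.append(old[pos:])
    return b"".join(out)


def reconstruct(base: bytes, diffs: list, expected_id: str) -> bytes:
    """Apply diffs in order, then check against the full version's id."""
    data = base
    for d in diffs:
        data = apply_diff(data, d)
    if hashlib.sha1(data).hexdigest() != expected_id:
        raise ValueError("reconstructed version does not match its id")
    return data


v1 = b"Storm stores blocks."
v2 = b"Storm stores immutable blocks."
v3 = b"Storm stores immutable blocks and diffs."
diffs = [make_diff(v1, v2), make_diff(v2, v3)]
v3_id = hashlib.sha1(v3).hexdigest()   # only the id of v3 is kept
print(reconstruct(v1, diffs, v3_id) == v3)  # True
```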
6.1. Pointers
-------------
@@ -388,9 +402,9 @@
is structured. On the other hand, the overlay connectivity graph of broadcasting
approach is formed more or less (depending on the implementation) in a random manner.
-When performing queries, in broadcasting approach peer sends a query request to a
+When performing queries in the broadcasting approach, a peer sends a query request to a
subset of its neighbors and these peers to their subsequent neighbors. The
-process will continue as long as query's time-to-live (TTL) hasn't been reached.
+process will continue as long as the query's time-to-live (TTL) value hasn't been reached.
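The broadcasting approach can be illustrated with a small simulation. The overlay graph, the fan-out (how many neighbors each peer forwards to) and the TTL values are arbitrary assumptions for this sketch, and `flood_query` is a hypothetical helper. Real systems differ in details such as neighbor selection and duplicate suppression.

```python
import random


def flood_query(graph, start, has_item, ttl, fanout, rng):
    """Forward a query to up to `fanout` neighbors per hop until TTL hits 0.

    Returns the set of reached peers that hold the item. Peers that
    have already seen the query do not forward it again.
    """
    hits, seen = set(), {start}
    frontier = [start]
    while frontier and ttl > 0:
        ttl -= 1
        next_frontier = []
        for peer in frontier:
            neighbors = graph[peer]
            for n in rng.sample(neighbors, min(fanout, len(neighbors))):
                if n not in seen:
                    seen.add(n)
                    next_frontier.append(n)
                    if n in has_item:
                        hits.add(n)
        frontier = next_frontier
    return hits


# A small overlay: the item lives on peer 5, three hops from peer 0.
graph = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1, 5], 4: [2], 5: [3]}
print(flood_query(graph, 0, {5}, ttl=2, fanout=2, rng=random.Random(0)))  # set()
print(flood_query(graph, 0, {5}, ttl=3, fanout=2, rng=random.Random(0)))  # {5}
```

The first query fails only because its TTL expires one hop short of the data. In the DHT approach, by contrast, routing is steered by the key rather than by a hop budget.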
In the DHT approach, a query request is deterministically routed towards the peer
which hosts a specific data item. Routing is based on 'hints' (based on
differences between data item's key and peer's key), which each peer provides
@@ -451,7 +465,9 @@
Future directions: of course, we should implement a prototype
-Open issue/Future directions: implement multisource downloading
+Open issue/Future directions: implement multisource downloading
+
+Future directions: Implement home node model or directory model ?
9. Conclusions
==============