[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Gzz] hemppah's research problems document
From: |
B. Fallenstein |
Subject: |
Re: [Gzz] hemppah's research problems document |
Date: |
Sat, 14 Dec 2002 21:08:22 +0100 |
User-agent: |
Mozilla/5.0 (X11; U; Linux ppc; en-US; rv:1.2) Gecko/20021204 Debian/1.2.1-1 |
Hi Hermanni,
we seem to be miscommunicating again and again... sorry. *sigh* Please,
be as explicit as possible when answering, to avoid misunderstandings...
:-/
address@hidden wrote:
-DHTs require O(log n) hops to reach arbitrary destinations, assuming that
each node maintains
information about O(log n) nodes
This is not true. *Some* DHT routing algorithms require O(log n) nodes
to be stored, but CAN or a small-world assumption-based routing
algorithm only require O(1) neighbours (where CAN routing takes square
root time). You say that yourself, below.
For CAN, though, for only *oneĆ dimension is a "special" case. For CAN, when
there is a only one dimesion, CAN performance is far for other DHTs.
I don't understand these two sentences...
Indeed,
authors suggest that setting as d = log n, CAN has O(log n) neighbors and O(log
n) pathlengths like other DHT algorithms.
So? That still means that *some* DHT algorithms perform as you said,
*but not all* (CAN with say, 3 dimensions).
-SWNs require O(log^2 n) hops to reach arbitrary destinations, assuming
(*only and only if* !!!) that
links between nodes are constructed in the way that they are uniformly
distributed over all distances
in the network (Kleinberg)
"only and only if"? :)
Yes! There have to be *very* careful (aka "clever") when constructing links
between nodes in SWNs. See Jon Kleinberg's works for details (google jon
kleinberg)
"only and only if" is still not a meaningful English expression, AFAIK ;-)
-Example systems: SWAN, Freenet
Freenet? Freenet definitely does not have the properties of storing
mappings on your machine-- in Freenet, *everything* is distributed
through the network.
Umm..perhaps I should be more specific. Basically I have categorized these
systems based on the routing mechanism. I should, however, update the section
names for better clearance.
No, you should make clear that the characterizations you list do not
apply to all examples you give.
Insert/Delete Space
Search
Pastry: O(log^2 n) O(nlog n) O(log n)
Tapestry: O(log^2 n) O(nlog n) O(log n)
In Pastry and Tapestry, each node has to know *more neighbours than
there are nodes in the system* (n * log n)? Ahem.
Ok, in this we have to be careful :). The summary table is far from complete
(from "truth"). However, I copied part of this table from the article named
"Distributed Object Location in a Dynamic Network",
http://oceanstore.cs.berkeley.edu/publications/papers/pdf/SPAA02.pdf.
I have a strong suspicion that they mean space required in the *whole*
network-- O(log n) neighbours per node = O(n log n) neighbours in the
whole system. The O(n log n) per node which you gave doesn't make sense,
anyway... They also give space for CAN as O(nr), where r is the number
of dimensions-- neighbours in CAN per node are O(r), AFAIK (O(1) if you
consider the number of dimensions to be fixed-- which makes sense,
because it's a parameter picked at implementation time, not modified
when the network grows). And so on.
2.1.1 Facts
-specific Storm block can be identified with an ID, generated by SHA-1
algorithm
-block ID is globally *unique*, e.g. "Front page of New York Times
newspaper on 10.10.2002"
But note that's it's a *number*, which you cannot immediately see has
anything to do with the New York Times, newspapers, or 10.10.2002.
Yes, *random* number.
Random-looking. It is a deterministic function of the data in the block,
so it isn't really random.
-approach specific pseudo thinking in SWAN: "based on block's ID, go to the
node, whose (euclidean) distance is closest
to block's ID in identity space"
Here the simplification bites. There are two ways to interpret this
'pseudo thinking':
1. This is used as a DHT routing algorithm: that node will tell us which
other node has the block (i.e., we 'map our resources into the network'
as you described it above)
2. The node we found is the node to download the block from. This means
that a computer would have to run a node for every block it stores
(making number of neighbours to store an extremely unattractive O(k)
with k = number of blocks on the local computer).
My problem: If it's 1), this is just another type of DHT. If it's 2),
it's about as attractive as flooding broadcast networks, because it
simply doesn't scale.
I think the first approach is more close ("somekind of DHT")...
Sorry if this sounds harsh :-( but your work is of no use to me as long
as you don't clearly explain the algorithms you're talking about. I said
that I see only two interpretations of what you said: DHT or
one-node-per-block. I see no others, so if you say one of these is
"close," but not quite there, I haven't been helped at all. Please,
either explain how this works, or say that you can't and why.
(In case you don't really know how it works: That's not a problem. But
then, please *say* so instead of making asserions about this sysatem and
its properties...)
Sorry, but this is really important for your work to be useful to us
others in the group...
...however, in SWNs, the most important thing how the structure (links between
nodes) is constructed; so that certain rules are are fulfilled (again, see
Kleingberg).
This doesn't tell me how SWAN works, and how that's different from a
DHT. I have a rough understanding of how it links nodes; I do not
understand how blocks are published and looked up. I see that
small-world networks could provide an interesting DHT routing algorithm;
you're saying SWAN is doing something different, but not *what*. :(
c) FBN
-In this case, there is no benefit from the block's ID at all
What do you mean by this?
Hm..that there is not "hints" in the network where block is located in the
network. On the other hand, in DHTs (and in somehow in SWNs also) based on block
ID's hash, nodes gives "hints" (routes more close to ID in the key space)
constantly to a query lookup until the specific node is found which hosts the
block.
Ok.
-finding key blocks for data block can be performed locally (in logarithmic
time)
- "...can be performed locally"? It's like loading blocks: If you don't
have the data, you must request it from the network.
- Why logarithmic time? Locally, we can use a hashtable -> O(1)?
Also, look above :). Ok, to be more specific: worste case is locarithmic time,
usually constant time.
Sorry, I understand neither of these hints. Could you reply to my
questions directly-- what do you mean by "can be performed locally," and
what do you mean by 'logarithmic time'? (If you're giving 'logarithmic
time' as the worst-case performance of hashtables, I think that should
be 'linear time'?)
-simple pseudo thinking: "find all Storm blocks from the network,
> which uses specific urn-5 name. Compare blocks and
return the most recent block, if the signing key is "valid"."
Well... you would probably only request the most current from the
network and check its signature, since if the network is trying to fool
you into believing an old version is the current one, it could simply
hide the blocks containing the newer versions from you in the first place.
How do we *know* to request the most current version from the distributed
network (key ?) ?
Ok, right: the network has to do some cooperation on this one-- in a
flat DHT, we would have to request all blocks. To be faster, we would
need to be able to do some computation on the node that stores the
mappings, to determine the newest block, and then return only that--
GISP is planned to support XML queries over mapping values in the
future, IIRC...
Hm, here's another idea: As the mappings, store
(uri -> [blockid, date]). I.e., the value is a record containing
a block id and the timestamp of that block. Then, we could request all
*mappings* and select the newest one, without actually downloading the
*blocks*. (We would then of course check that the block we select does
actually have that date and urn-5 and is signed...) This works on top of
a plain DHT.
I may have understood this all wrong, but if we want all blocks associated with
a specific urn-5, we have to check all blocks (since from this point of view,
urn-5 and block IDs are independent from each other: urn-5 names has no
information about the blocks which may associated with the specific urn -5 ??) ?
Ok, that's right-- we have to scan each block and put it into the index.
2.2.4. Open questions:
-in this case, is it sensible to maintain two different key-value mappings
key-value based systems (DHT, SWN) ?
-one fore block IDs
-one for urn-5 names, which are associated with block IDs
-is this approach too difficult to maintain ?
Why would one want to? I don't see any benefits.
This is based on my preasumption below.
Which one?
> If we know all the block's associated
with a given urn-5, we can find the block(s) more efficiently.
Well, duh. But your question was (or I understood it as) whether it
makes sense to maintain *two different* key-value based systems for
block IDs and urn-5 names. I don't see why you would want to-- you can
store both mappings in a single DHT (which is probably more efficient
than having two parallel DHTs, and also easier to maintain).
-is there possibility, in the specific urn-5 name, to maintain
information
about most recent block's ID for better search performance (or moreover,
tree based structure for all blocks for specific urn-5 name) ?
No. You create the urn-5 *before* you create any of the blocks, so you
cannot make it depend on the blocks (that's why we can't use
cryptographical hashes).
Heh...here's the answer for my two answers above :).
What do you mean?
Sorry for having such a hard time understanding you :-(
- Benja
- Re: [Gzz] hemppah's research problems document, (continued)
- Re: [Gzz] hemppah's research problems document, Tuomas Lukka, 2002/12/16
- Re: [Gzz] hemppah's research problems document, hemppah, 2002/12/16
- Re: [Gzz] hemppah's research problems document, Tuomas Lukka, 2002/12/16
- Re: [Gzz] hemppah's research problems document, hemppah, 2002/12/16
- Re: [Gzz] hemppah's research problems document, Tuomas Lukka, 2002/12/16
- Re: [Gzz] hemppah's research problems document, hemppah, 2002/12/16
- Re: [Gzz] hemppah's research problems document, b . fallenstein, 2002/12/16
- Re: [Gzz] hemppah's research problems document, hemppah, 2002/12/17
- Re: [Gzz] hemppah's research problems document, b . fallenstein, 2002/12/17
- Re: [Gzz] hemppah's research problems document, hemppah, 2002/12/18
Re: [Gzz] hemppah's research problems document,
B. Fallenstein <=