[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Gzz] hemppah's research problems document
From: |
B. Fallenstein |
Subject: |
[Gzz] hemppah's research problems document |
Date: |
Fri, 13 Dec 2002 00:52:00 +0100 |
User-agent: |
Mozilla/5.0 (X11; U; Linux ppc; en-US; rv:1.2) Gecko/20021204 Debian/1.2.1-1 |
Please notice: in this section + = pro, - = con
Probably this'll make diffs to that section hard to read...
1.1. Distributed Hash Tables (DHT)
...
-own resources are mapped into the network
I still don't see this as a problem. You should have an explanation if
you do... :)
-keyword/fuzzy search not possible yet
-hotspots
-Reliability depends on replicas. If it is possible for an adversary to
control a sufficient number of replicas for a given key, they can spoof
that key easily (both giving wrong data and hiding correct data).
-DHTs require O(log n) hops to reach arbitrary destinations, assuming that each node maintains
information about O(log n) nodes
This is not true. *Some* DHT routing algorithms require O(log n) nodes
to be stored, but CAN or a small-world assumption-based routing
algorithm only require O(1) neighbours (where CAN routing takes square
root time). You say that yourself, below.
-Of couse, there is the possibility to route in constant times, but it requires
that *each** node
maintains information about all the nodes in the network. Therefore , practically, this method
impossible
(Not 100% correct: for example, it suffices to have an O(1) route to one
computer that knows each node. But you're right about constant time
routing not being workable in any case.)
-Example systems: Chord, CAN, Kademlia, Pastry, Tapestry
GISP and the Circle. Mentioning them here because they contrast your
examples in not being academic research projects.
1.2. Small World Networks (SWN)
...
-SWNs require O(log^2 n) hops to reach arbitrary destinations, assuming (*only and only if* !!!) that
links between nodes are constructed in the way that they are uniformly distributed over all distances
in the network (Kleinberg)
"only and only if"? :)
-Example systems: SWAN, Freenet
Freenet? Freenet definitely does not have the properties of storing
mappings on your machine-- in Freenet, *everything* is distributed
through the network.
1.3. Flooding Broadcast Networks (FBN)
...
-not fast routing (in fact, there is no upper limit, because there are multiple
simultaneous
breadh-first searches in the network)
I don't understand this (why there is no upper limit). I would've
thought that there's usually an upper limit of O(n^2)-- if all nodes
know each other, and thus every node forwards the request to every other
node (where it is cancelled, because a node will not process the same
lookup twice).
Insert/Delete Space Search
Pastry: O(log^2 n) O(nlog n) O(log n)
Tapestry: O(log^2 n) O(nlog n) O(log n)
In Pastry and Tapestry, each node has to know *more neighbours than
there are nodes in the system* (n * log n)? Ahem.
Flooding: N/A N/A No limit!
Space for flooding is O(1), I presume. You generally set a number in the
preferences how much other computers your computer will connect to.
2. Thesis research problems
2.1 "Searching for a specific Storm block"
2.1.1 Facts
-specific Storm block can be identified with an ID, generated by SHA-1 algorithm
-block ID is globally *unique*, e.g. "Front page of New York Times newspaper on
10.10.2002"
But note that's it's a *number*, which you cannot immediately see has
anything to do with the New York Times, newspapers, or 10.10.2002.
2.1.2. Objectives
-if block exists in the network, algorithm *will* find it
Yep, exactly.
-algorithm will find the specific Storm block as quicly as possible (and return
it to the user) based
on the the block's ID
-simple pseudo thinking: "based on block ID, find the specific block fast and return
it."
You're missing a step here:
1. Find the block
2. *Download the block*
3. Return the block
Maybe you want to leave it out of the pseudo thinking, but it's quite
essential to the point above that :-)
2.1.3. Existing approaches and this specific research problem
a) DHT
-natural choice, since in DHTs are based on key-value associations
-in our case, keys are block IDs (SHA-1)
-if exists in the network, specific block will be located based on block ID
-Efficient, O(log n) in existing DHT systems
-approach specific pseudo thinking: "based on block's ID, go to the node, whose ID
is the nearest to block's ID in key space"
Ok, but--
b) SWN
...
-approach specific pseudo thinking in SWAN: "based on block's ID, go to the node, whose (euclidean) distance is closest
to block's ID in identity space"
Here the simplification bites. There are two ways to interpret this
'pseudo thinking':
1. This is used as a DHT routing algorithm: that node will tell us which
other node has the block (i.e., we 'map our resources into the network'
as you described it above)
2. The node we found is the node to download the block from. This means
that a computer would have to run a node for every block it stores
(making number of neighbours to store an extremely unattractive O(k)
with k = number of blocks on the local computer).
My problem: If it's 1), this is just another type of DHT. If it's 2),
it's about as attractive as flooding broadcast networks, because it
simply doesn't scale.
c) FBN
-In this case, there is no benefit from the block's ID at all
What do you mean by this?
-not very efficient, since every Storm node has to "ask" every neighbor node it
has the specific block ID
-there is no guarantee that block will be found in the network
-obviously a lot of extra traffic arises in the network
-approach specific pseudo thinking: "ask repeating from each node if it has block,
whose ID value is X"
Yep.
2.2. "Searching for most recent Storm block associated with specific urn-5
name, where
the block has been signed with a given key"
This is not specific to urn-5; any URI used for identification works.
For example, the proposed tag URIs (tag:address@hidden:2002-12:foo
or some such) would work just as well.
2.2.1. Facts
-urn-5 is random "unique keyword", e.g. "Front page of New York Times newspaper"
-urn-5 can be updated
No, a urn-5 cannot be updated. We can associate a new block with the
urn-5, but there is no such thing as 'updating' it.
-urn-5 name is saved in the header of Storm block (|block urn-5: "Front page of New
York Times newspaper"|)
This is not what we have been doing so far. Your approach may have merit
though, especially for network publishing (like 'Front page of NYT'):
the one we currently use has some problems with determining what is
'current' if we delete some old versions, but not all (even older
versions may suddenly appear to be current).
-finding key blocks for data block can be performed locally (in logarithmic
time)
- "...can be performed locally"? It's like loading blocks: If you don't
have the data, you must request it from the network.
- Why logarithmic time? Locally, we can use a hashtable -> O(1)?
2.2.2. Objectives
-Find the specific (and the most recent) Storm block as quicly as possible
> (and return it to the user) based on the given urn-5
And key.
-simple pseudo thinking: "find all Storm blocks from the network,
> which uses specific urn-5 name. Compare blocks and
return the most recent block, if the signing key is "valid"."
Well... you would probably only request the most current from the
network and check its signature, since if the network is trying to fool
you into believing an old version is the current one, it could simply
hide the blocks containing the newer versions from you in the first place.
2.2.3. Existing approaches and this specific research problem
-First, *all* blocks (with the specific urn-5 name) have to checked and analysed
-In this case (urn-5), there is no benefit from the block's ID at all (DHT, SWN)
What do you mean by this? I don't understand at all.
2.2.4. Open questions:
-in this case, is it sensible to maintain two different key-value mappings
key-value based systems (DHT, SWN) ?
-one fore block IDs
-one for urn-5 names, which are associated with block IDs
-is this approach too difficult to maintain ?
Why would one want to? I don't see any benefits.
-is there possibility, in the specific urn-5 name, to maintain information
> about most recent block's ID for better search performance (or moreover,
tree based structure for all blocks for specific urn-5 name) ?
No. You create the urn-5 *before* you create any of the blocks, so you
cannot make it depend on the blocks (that's why we can't use
cryptographical hashes).
2.3. "Searching for Storm blocks associated with specific urn-5 name, where
specific date has been defined (or date range), and where Storm block
has been signed with a given key"
2.3.1. Facts
-in addition to section 2.2.1, in every block's header, there is a timestamp for
date&time
Huh? This isn't needed in 2.2.1? How would one then compare two blocks
to determine which of the two is more current (should be shown)?!?
Hope these comments help,
- Benja
- [Gzz] hemppah's research problems document,
B. Fallenstein <=
- Re: [Gzz] hemppah's research problems document, hemppah, 2002/12/13
- Re: [Gzz] hemppah's research problems document, Tuomas Lukka, 2002/12/13
- Re: [Gzz] hemppah's research problems document, B. Fallenstein, 2002/12/14
- Re: [Gzz] hemppah's research problems document, Tuomas Lukka, 2002/12/14
- Re: [Gzz] hemppah's research problems document, B. Fallenstein, 2002/12/14
- Re: [Gzz] hemppah's research problems document, Tuomas Lukka, 2002/12/14
- Re: [Gzz] hemppah's research problems document, B. Fallenstein, 2002/12/14
- Re: [Gzz] hemppah's research problems document, Tuomas Lukka, 2002/12/15
- Re: [Gzz] hemppah's research problems document, B. Fallenstein, 2002/12/15
- Re: [Gzz] hemppah's research problems document, hemppah, 2002/12/16