gzz-dev
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Gzz] Hemppah's PEG round 2 comments


From: Hermanni Hyytiälä
Subject: Re: [Gzz] Hemppah's PEG round 2 comments
Date: 05 Jun 2003 11:20:19 +0300

On Wed, 2003-06-04 at 18:11, Tuomas Lukka wrote:


> ISSUE: what is known or easily deducible about the effect of a single
> hostile node? Following variants:
> 
>     - Node just wants to make the network work slower
> 
>     - Node wants to make searches for some particular informations difficult
> 
>     - Node wants to make life difficult for some particular nodes,
>       slowing down their queries
> 
>     - Node wants to make search for particular information by particular nodes
>       difficult
> 
>     - Node wants to spam a particular identifier or xu block
> 
> What about a constant fraction of nodes?
> 
> These questions are of utmost importance to us.


Directly related to DHTs, there are two publications about this topic:

"Security Considerations for Peer-to-Peer Distributed Hash Tables"
by Emil Sit and Robert Morris

"Security for structured peer-to-peer overlay networks"
by M. Castro, P. Druschel, A. Ganesh, A. Rowstron and and D. Wallach

Both papers discusses scenarions you mentioned above, e.g., Sit's
paper describes:

- incorrect lookup routing
- incorrect routing updates
- partition
- storage and retrieval attacks
- inconsistent behavior
- overload of targeted nodes
- rapid joins and leaves
- unsolicited messages

(these are the ones that Benja listed yesterday at #fenfire)

Castro's paper discusses:

-secure nodeId assignment methods
-secure routing table maintenance
-secure message forwarding
-self-certifying data

Castro et al. first describe briefly different kinds of attacks
ans possible solutions to them.

Additionally, "Controlling the Cost of Reliability in Peer-to-Peer
Overlays" focuses DHTs' capability to self-organise in highly
adverse conditions (e.g., a network partition). The paper
is written by Ratul Mahajan, Miguel Castro and Antony Rowston.

> 
> ISSUE: Do we want to keep GISP or move to Kademlia or some other?
> 
> > ISSUE:
> >         
> >     What experiments / simulations / methods are used in the literature?
> >     
> >     RESOLVED:
> >     
> >     - mainly theoretical/best case experiments/simulations
> >     - small amount of uncontrolled (real life), great amount of
> >       controlled (simulation environment)
> >     - simulations have been rather static and are often favored the 
> >       proposed algorithm    
> >     - in "real life" experiments, there have been very fast connections 
> >       between peers
> >     - some form of formal analyses ("proofs") are used, 
> >       e.g., with Chord and Kademlia
> >     - distribution of number of peers used in simulations: from 2^7 to 2^20 
> 
> A summary of what the actual findings are would be good, i.e. which
> things work and how. 

Hm, how detailed information do you want ? Could you give an example of
"which thing" ? :)

In general, based on the published papers, it's quite hard say for sure
what things *really* work and what works "in the article".


> 
> Any hostile-node stuff?

Yes, look above.


> 
> > Plan
> > ====
> > 
> > First of all, we will create a PEG document (this document) which 
> > discusses general and theoretical aspects of the research problems. Then, 
> > if necessary, we program (rather short) test cases which will test 
> > the GISP/Storm P2P properties, as discussed in this document. Finally, we 
> > will 
> > collect and analyse test cases' information and publish any interesting 
> > discoveries. 
> > 
> > The GISP protocol
> > =================
> > 
> > According to Daishi Kato, the author of GISP, GISP ''...intends leverage 
> > those (DHT) algoritms and make a practical protocol.''. GISP's distance 
> > function is based on the XOR-metric, which is similar to Kademlia's 
> > XOR-metric 
> > (unsigned integer of XOR). The GISP protocol specification paper describes 
> > the 
> > XOR-metric but the original GISP publication describes only numerical 
> > metric (which is little confusing since Chord uses numerical distance
> > metric and it's evident that a numerical metric is different from a 
> > XOR-metric).
> > 
> > GISP extends Chord's routing table to have more peer information as a cache.
> > Thus, log^2 messages are required to join/leave operations (according to the
> > original Chord publication). Additionally, Chord's routing table is 
> > asymmetric 
> > (requires stabilization protocol) and lookup process is unidirectional 
> > (in a virtual overlay ring). Original Kademlia publication states that 
> > Chord's routing table is rigid compared to Kademlia's routing table.
> > 
> > Since GISP lacks of Kademlia's binary-tree-like abstraction of the routing 
> > table, it is evident that the benefits of Kademlia's lookup properties 
> > (over 
> > other DHTs) are not available in the GISP protocol, e.g., no concurrent and
> > asynchronous lookups (no free of choice) and when a peer leaves the system
> > no messages are required.
> 
> Uhh, sentence unclear: in GISP, are messages required or not?

Since original GISP publication talks about Chord-like routing table, we
can assume that GISP doesn't use Kademlia's routing table (abstraction).
Thus, GISP requires O(log^2 n) messages when a peer joins/leaves the
system.


> 
> If they are, then we should move away from it immediately!


Hm, do have any options ? I still recommend Kademlia ;).


> 
> > Original GISP publication describes "peer strength" as a measure for peer 
> > heteregeneity but do not describe what properties are used and how this
> > value is calculated.
> > 
> > For network communication the GISP protocol uses XML-RPC.
> > 
> > Research problems
> > =================
> >  
> > For determining whether Storm with unmodified GISP is practical, we want 
> > the 
> > answers to the following questions.
> > 
> > - How well GISP can scale if there are lot of concurrent peer joins and
> >   leaves in the system ? 
> >   
> > - What about GISP's lookup effieciency when the network grows ?
> > 
> > - GISP does not use Kademlia's binary-tree abstraction - does it have 
> > influences 
> >   and if it does what are the influences ? 
> 
> does it... --> good or bad?


No, it does not. This a bad thing.

> 
> > - How much better/worse pure Kademlia implementation (e.g. Kashmir) is over 
> > the GISP 
> >   protocol in face performance and fault-tolerance ?
> 
> in face -> w.r.t.


I don't understand this :).


> 
> > - How well GISP is able to perform in adverse conditions, e.g., a
> >   network partition occurs ?
> >    
> > - How well GISP is able to perform against different kind of
> >   security attacks and what are the impacts ?
> >    
> > 
> > Simulations
> > ===========
> > 
> > If needed, using a simulation process we try solve research problems. Also, 
> > with a simulation process we are able test the GISP protocol without having 
> > to deploy real life experiments.
> 
> Not protocol - the core algorithms.


Well, I see both the protocol and algorithm important. The algortihm (a
routing table) gives a general abstraction and properties for the 
system and the protocol defines how well the system is able to benefit
from the abstraction (e.g., this is the case with Kademlia).

> 
> > During simulation process, we assume that simulation network is rather 
> > ideal, 
> > e.g., there are no network latencies in the simulation network. In the 
> > future,
> > however, we plan to perform simulations in a non-ideal network.
> > 
> >    
> > Hypothesis
> > ==========
> > 
> > In this section we introduce few hypothesis related to research problems.
> > Hypothesis' claims (goodness/badness) are based on the features of other 
> > DHTs 
> > and their simulation results (which can be found from the literature).
> > Also, values presented here (e.g., 1000-10000) are widely used in DHTs' 
> > simulation processes. 
> > 
> > 
> > - GISP's overlay can scale rather well when peers join and leave the system 
> > at a 
> >   constant rate or constant fraction (both are used) for a given time 
> > period 
> >   and cost of joining/leaving is logarithmic (e.g. Start with 1000 blocks 
> > and 
> >   1000 Storm-servers, 10 peer(s) joins/leaves every 5 seconds).
> 
> Umm, the 1000 blocks.. example does not say what you expect the result to be
> and why!
> 
> You should have a concrete estimate, including how many queries are made,
> how many packets used in the network, how much delay for queries, ...
> 
> logarithmic w.r.t. what?
> 
> What's the difference between the peers sending "I'm leaving" messages
> and just taking the network cable out of the wall?
> 
> Needs a LOT of work to make more exact.


Yes. As I said, I didn't do much work on hypothesis yesterday ;).


> > - A hostile entity is able to reroute a data lookup to a incorrect 
> >   destination peer during a data lookup process (e.g., e.g., 1000 Storm 
> >   blocks are insterted into a 1000 Storm-server system in which a a fraction
> >   of peers are hostile. Perform data lookups 1000 lookups randomly so that 
> >   in every lookup process, one forwarding request is rerouted incorrectly 
> > towards
> >   randomly chosen destionation peer; calculate average block request 
> > failure, 
> >   average lookup length, number of timed-out lookups and the distribution 
> > of 
> >   lookup messages processed per peer).  
> 
> Also, hostile node can direct towards non-peers, or empty network spots
> where it'll take a long time for the query to time out.

In the papers, mentioned above, there are some proposals for this kind
of scenario. 

OTOH, Kademlia has a nice feature which makes Kademlia
(rather) resistant to this attack: the "free choice" feature can be used
to select peers with low latency or few network hops. Additionally, it
can be used to select on arbitrary other criteria, such as avoiding
peers that are unreachable (due to an incompletely connected underlying
network), avoiding peers that are untrusted etc.



-Hermanni





reply via email to

[Prev in Thread] Current Thread [Next in Thread]