[Gzz-commits] gzz/Documentation/misc/hemppah-progradu Misc

gzz-commits

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Gzz-commits] gzz/Documentation/misc/hemppah-progradu Misc_No...

From:	Hermanni Hyytiälä
Subject:	[Gzz-commits] gzz/Documentation/misc/hemppah-progradu Misc_No...
Date:	Mon, 18 Nov 2002 07:31:53 -0500

CVSROOT:        /cvsroot/gzz
Module name:    gzz
Changes by:     Hermanni Hyytiälä <address@hidden>      02/11/18 07:31:53

Modified files:
        Documentation/misc/hemppah-progradu: Misc_Notes 
                                             existing_systems_overview 

Log message:
        UCLA seminars about P2P systems (Shenker and Karp)

CVSWeb URLs:
http://savannah.gnu.org/cgi-bin/viewcvs/gzz/gzz/Documentation/misc/hemppah-progradu/Misc_Notes.diff?tr1=1.4&tr2=1.5&r1=text&r2=text
http://savannah.gnu.org/cgi-bin/viewcvs/gzz/gzz/Documentation/misc/hemppah-progradu/existing_systems_overview.diff?tr1=1.10&tr2=1.11&r1=text&r2=text

Patches:
Index: gzz/Documentation/misc/hemppah-progradu/Misc_Notes
diff -u gzz/Documentation/misc/hemppah-progradu/Misc_Notes:1.4 
gzz/Documentation/misc/hemppah-progradu/Misc_Notes:1.5
--- gzz/Documentation/misc/hemppah-progradu/Misc_Notes:1.4      Mon Nov 11 
08:46:24 2002
+++ gzz/Documentation/misc/hemppah-progradu/Misc_Notes  Mon Nov 18 07:31:52 2002
@@ -116,4 +116,146 @@
 Comments ?
 
 
+3. Analysis of the existing systems
+
+Since P2P concept was invented, a huge collection of academic p2p
+systems had been proposed (CAN, chord, hamming tree, pastry et al) and
+several practical p2p systems had been implemented (napster, Gnutella,
+Freenet et al). So the coexistence of different approaches confuse the
+research society and industry for a long time. They (speakers and other
+researchers at CMU, MIT, industry et al) right now initiate a new project
+to analyze the different approaches and try to combine them into one. The
+main objectives are:
+  1. Study the common building blocks of p2p system, and analyze the
+difference between different approaches.
+  2. Create a new(or modified) p2p system to provide both high performance
+and implementability.
+
+Dr. Karp mainly focused on the "analysis of common component" part while
+Dr. Shenker was paying attention to the new pratical system "Gnutella++"
+in his talk. Unfortunately, I do not find any information about this
+ongoing project online or in any digital library although it seems to be a
+huge and collaborative project. If some people knows something about it, I
+will appreciate your valuable information and comment.
+
+The script for Dr. Shenker's talk :
+
+<Objective> -- Find a uniform approach as the best platform for file sharing.
+
+ Currently P2P traffic is about 80% volume of Internet traffic on campus, but
+
+ the various research proposals and real applications confused us a lot.
+
+<Choice> -- Gnutella++
+
+Right now three major approaches for distributed file sharing are
+presented to us:
+1. Centralized index (Napster): Single point of failure, unscalable.
+2. Decentralized index (Gnutella): ineffective, also unscalable.
+3. Distributed Hash Table (CAN, Chord, Tapestry et al). Excellent logN
+storage -LogN lookup tradeoff.
+  Certainly, centralized approach is not a choice due to its expensive
+effort to improve scalability; To our surprise, Dr. Shenker argue that the
+modified Decentralized index (Gnutella with scalability improvement) is
+better than the academic distributed hash table approach (CAN, Chord et
+al) because of the following reasons:
+(1) DHT (distribute hash table) uses the complex routing protocol and huge
+processing/storage overhead, but Gnutella just adapt to simple flooding
+scheme;
+(2) DHT require fixed storage on each disk, Gnutella only need what the
+user want;
+(3) Gnutella's best-effort semantic is more fit for the desire of human
+being;
+(4) DHT only support the exact match of request file, but Gnutella smart
+local arbitrary search;
+(5) Currently millions of persons installed Gnutella while only dozens of
+CAN or Chord nodes are used in real world.
+  In short, if Gnutella can be refined to be scalable (He call it as
+Gnutella++ ), it will be more useful than the Napster approach or academic
+CAN/Chord approach.
+
+<How to create Gnutella++> -- Use existing schemes to reduce/control/
+
+direct/anticipate workload
+  The following schemes are used to improve the scalability of Gnutella,
+none of them is new thing (all are borrowed from elsewhere).
+(1) Reduce the workload: use TTL-controlled flooding or random walk to
+control TTL;
+(2) Control the workload: give each neighbor a quota of workload that is
+adaptive over time;
+(3) Direct the workload: construct a capacity-sensitive overlay network
+over heterogeneous network, can direct the workload to the super-capacity
+node;
+(4) Anticipate the workload: anticipate how intense the workload will be
+and smartly replicate the data on the different nodes in the networks.
+
+<DHT's usage> -- need to think about new application for CAN/Chord
+approach.
+  Since Gnutella++ is best for the file sharing, so where can we use the
+DHT(CAN/Chord) approach?  Anyway, DHT is inherently scalable, robust,
+easily deploymentable, i.e., DHT is a wonderful academic research
+approach, we should think about something!  Unfortunately, this is still
+on progress.
+
+The script for Dr. R. Karp's talk at UCLA:
+
+    In the effort to combine the different p2p approaches such as CAN,
+Chord, Hamming tree and Pastry, researchers thought one plausible way is
+to summarize the common point and components of various approaches. In Dr.
+Karp's talk, he presented three approaches: CAN, Chord and Hamming Tree.
+Then he try to find the common building blocks. I will list them in the
+following section:
+
+<Structure>
+
+
+1. Base model: Distributed Hashing Table. Each proposal is with some variation.
+--In CAN cell is retangular box while the cell is an interval on a cicle for 
Chord.
+
+In Hamming tree approach, a cell corresponding to a leaf in the B-tree.
+2. Shortcut: In P2P system structured shortcut is used to store the fast
+access to remote destination.
+--In CAN the shorrcut is the "neighboring region" of a box; In Chord, The
+shortcut is the path to 1st, 2nd,4th ,8th 2^nth node; In Hamming tree, the
+shortcut is to the leaf at the same peer.
+
+<Semantic>
+
+
+3. Find Operation: Find the key k with hash address h(k).
+4. Join Operation: choose a random virtual address x and find its location,
+
+then insert the cell containing x into the existing infrastructure.
+5. Departure Operation: reverse operation for join operation.
+
+<Performance Enhancement>
+
+6. Broadcast and summing (aggregate) the information for better management.
+7. Volume Balancing: Use some neighboring-consulting insertion algorithm
+to insert the new cell into an appropriate position.
+8. Load Balancing: Not overthrow all task on one node. Two solutions:
+
+
+(1) Use a centralized load directory to keep the workload of all nodes,
+shift the workload from the heavy-use node into the light-use node;
+
+(2) Hot spot node spreads the workload to the neighboring cells.
+
+9. Fault-tolerant communication: Each cell maintains a certified link to
+its closet active successor /predecessor in case of the failure of
+intermediate node.
+10. Reducing Latency:  A bunch of schemes:
+
+
+(1) use the proximity-sensitive node assignment scheme
+(2) assign several processor to one cell and choose the optimal one.
+(3) Et al. See the CAN/Chord paper.
+
+In summary, His talk tried to summarize the common structure, basic
+operation and performance enhancement of p2p system and applied to the
+design of some uniform platform (maybe it's Gnutella++ in Dr. Shenker's
+talk). Although nothing made you feel revolutionary, we can feel his
+thought is clear when he pursue his aim and their work is done elegantly.
+
+
 
Index: gzz/Documentation/misc/hemppah-progradu/existing_systems_overview
diff -u gzz/Documentation/misc/hemppah-progradu/existing_systems_overview:1.10 
gzz/Documentation/misc/hemppah-progradu/existing_systems_overview:1.11
--- gzz/Documentation/misc/hemppah-progradu/existing_systems_overview:1.10      
Tue Nov 12 06:07:45 2002
+++ gzz/Documentation/misc/hemppah-progradu/existing_systems_overview   Mon Nov 
18 07:31:53 2002
@@ -287,3 +287,21 @@
 
 -Object's popularity: studies have shown that Napster, Gnutella and Web 
 queries follow Zipf-like distributions (1/2, 1/3, 1/4 etc.)
+
+
+5. Security
+
+5.1 The Merkle Hash Tree
+
+The Merkle Hash Tree, invented by Ralph Merkle, is a hash construct that has 
very nice properties for 
+verifying the integrity of files and file subranges in an incremental or 
out-of-order fashion. This 
+document describes a binary serialization format for hash trees that is 
compact and optimized for both 
+sequential and random access. This memo has two goals:
+
+   1. To describe Merkle Hash Trees and how they are used for file integrity 
verification.
+   2. To describe a serialization format for storage and transmission of hash 
trees.
+   
+See http://open-content.net/specs/draft-jchapweske-thex-01.html for details
+
+
+

[Prev in Thread]

Current Thread

[Next in Thread]

[Gzz-commits] gzz/Documentation/misc/hemppah-progradu Misc_No..., Hermanni Hyytiälä, 2002/11/07
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu Misc_No..., Hermanni Hyytiälä, 2002/11/11
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu Misc_No..., Hermanni Hyytiälä <=

Prev by Date: [Gzz-commits] gzz/test/gzz/view boxcellview.test
Next by Date: [Gzz-commits] gzz/Documentation/misc/hemppah-progradu existin...
Previous by thread: [Gzz-commits] gzz/Documentation/misc/hemppah-progradu Misc_No...
Next by thread: [Gzz-commits] gzz ./TODO doc/gzz.css doc/pegboard/.cvsignore ...
Index(es):
- Date
- Thread