[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Gzz-commits] gzz/Documentation/misc/hemppah-progradu Misc_No...
From: |
Hermanni Hyytiälä |
Subject: |
[Gzz-commits] gzz/Documentation/misc/hemppah-progradu Misc_No... |
Date: |
Mon, 18 Nov 2002 07:31:53 -0500 |
CVSROOT: /cvsroot/gzz
Module name: gzz
Changes by: Hermanni Hyytiälä <address@hidden> 02/11/18 07:31:53
Modified files:
Documentation/misc/hemppah-progradu: Misc_Notes
existing_systems_overview
Log message:
UCLA seminars about P2P systems (Shenker and Karp)
CVSWeb URLs:
http://savannah.gnu.org/cgi-bin/viewcvs/gzz/gzz/Documentation/misc/hemppah-progradu/Misc_Notes.diff?tr1=1.4&tr2=1.5&r1=text&r2=text
http://savannah.gnu.org/cgi-bin/viewcvs/gzz/gzz/Documentation/misc/hemppah-progradu/existing_systems_overview.diff?tr1=1.10&tr2=1.11&r1=text&r2=text
Patches:
Index: gzz/Documentation/misc/hemppah-progradu/Misc_Notes
diff -u gzz/Documentation/misc/hemppah-progradu/Misc_Notes:1.4
gzz/Documentation/misc/hemppah-progradu/Misc_Notes:1.5
--- gzz/Documentation/misc/hemppah-progradu/Misc_Notes:1.4 Mon Nov 11
08:46:24 2002
+++ gzz/Documentation/misc/hemppah-progradu/Misc_Notes Mon Nov 18 07:31:52 2002
@@ -116,4 +116,146 @@
Comments ?
+3. Analysis of the existing systems
+
+Since P2P concept was invented, a huge collection of academic p2p
+systems had been proposed (CAN, chord, hamming tree, pastry et al) and
+several practical p2p systems had been implemented (napster, Gnutella,
+Freenet et al). So the coexistence of different approaches confuse the
+research society and industry for a long time. They (speakers and other
+researchers at CMU, MIT, industry et al) right now initiate a new project
+to analyze the different approaches and try to combine them into one. The
+main objectives are:
+ 1. Study the common building blocks of p2p system, and analyze the
+difference between different approaches.
+ 2. Create a new(or modified) p2p system to provide both high performance
+and implementability.
+
+Dr. Karp mainly focused on the "analysis of common component" part while
+Dr. Shenker was paying attention to the new pratical system "Gnutella++"
+in his talk. Unfortunately, I do not find any information about this
+ongoing project online or in any digital library although it seems to be a
+huge and collaborative project. If some people knows something about it, I
+will appreciate your valuable information and comment.
+
+The script for Dr. Shenker's talk :
+
+<Objective> -- Find a uniform approach as the best platform for file sharing.
+
+ Currently P2P traffic is about 80% volume of Internet traffic on campus, but
+
+ the various research proposals and real applications confused us a lot.
+
+<Choice> -- Gnutella++
+
+Right now three major approaches for distributed file sharing are
+presented to us:
+1. Centralized index (Napster): Single point of failure, unscalable.
+2. Decentralized index (Gnutella): ineffective, also unscalable.
+3. Distributed Hash Table (CAN, Chord, Tapestry et al). Excellent logN
+storage -LogN lookup tradeoff.
+ Certainly, centralized approach is not a choice due to its expensive
+effort to improve scalability; To our surprise, Dr. Shenker argue that the
+modified Decentralized index (Gnutella with scalability improvement) is
+better than the academic distributed hash table approach (CAN, Chord et
+al) because of the following reasons:
+(1) DHT (distribute hash table) uses the complex routing protocol and huge
+processing/storage overhead, but Gnutella just adapt to simple flooding
+scheme;
+(2) DHT require fixed storage on each disk, Gnutella only need what the
+user want;
+(3) Gnutella's best-effort semantic is more fit for the desire of human
+being;
+(4) DHT only support the exact match of request file, but Gnutella smart
+local arbitrary search;
+(5) Currently millions of persons installed Gnutella while only dozens of
+CAN or Chord nodes are used in real world.
+ In short, if Gnutella can be refined to be scalable (He call it as
+Gnutella++ ), it will be more useful than the Napster approach or academic
+CAN/Chord approach.
+
+<How to create Gnutella++> -- Use existing schemes to reduce/control/
+
+direct/anticipate workload
+ The following schemes are used to improve the scalability of Gnutella,
+none of them is new thing (all are borrowed from elsewhere).
+(1) Reduce the workload: use TTL-controlled flooding or random walk to
+control TTL;
+(2) Control the workload: give each neighbor a quota of workload that is
+adaptive over time;
+(3) Direct the workload: construct a capacity-sensitive overlay network
+over heterogeneous network, can direct the workload to the super-capacity
+node;
+(4) Anticipate the workload: anticipate how intense the workload will be
+and smartly replicate the data on the different nodes in the networks.
+
+<DHT's usage> -- need to think about new application for CAN/Chord
+approach.
+ Since Gnutella++ is best for the file sharing, so where can we use the
+DHT(CAN/Chord) approach? Anyway, DHT is inherently scalable, robust,
+easily deploymentable, i.e., DHT is a wonderful academic research
+approach, we should think about something! Unfortunately, this is still
+on progress.
+
+The script for Dr. R. Karp's talk at UCLA:
+
+ In the effort to combine the different p2p approaches such as CAN,
+Chord, Hamming tree and Pastry, researchers thought one plausible way is
+to summarize the common point and components of various approaches. In Dr.
+Karp's talk, he presented three approaches: CAN, Chord and Hamming Tree.
+Then he try to find the common building blocks. I will list them in the
+following section:
+
+<Structure>
+
+
+1. Base model: Distributed Hashing Table. Each proposal is with some variation.
+--In CAN cell is retangular box while the cell is an interval on a cicle for
Chord.
+
+In Hamming tree approach, a cell corresponding to a leaf in the B-tree.
+2. Shortcut: In P2P system structured shortcut is used to store the fast
+access to remote destination.
+--In CAN the shorrcut is the "neighboring region" of a box; In Chord, The
+shortcut is the path to 1st, 2nd,4th ,8th 2^nth node; In Hamming tree, the
+shortcut is to the leaf at the same peer.
+
+<Semantic>
+
+
+3. Find Operation: Find the key k with hash address h(k).
+4. Join Operation: choose a random virtual address x and find its location,
+
+then insert the cell containing x into the existing infrastructure.
+5. Departure Operation: reverse operation for join operation.
+
+<Performance Enhancement>
+
+6. Broadcast and summing (aggregate) the information for better management.
+7. Volume Balancing: Use some neighboring-consulting insertion algorithm
+to insert the new cell into an appropriate position.
+8. Load Balancing: Not overthrow all task on one node. Two solutions:
+
+
+(1) Use a centralized load directory to keep the workload of all nodes,
+shift the workload from the heavy-use node into the light-use node;
+
+(2) Hot spot node spreads the workload to the neighboring cells.
+
+9. Fault-tolerant communication: Each cell maintains a certified link to
+its closet active successor /predecessor in case of the failure of
+intermediate node.
+10. Reducing Latency: A bunch of schemes:
+
+
+(1) use the proximity-sensitive node assignment scheme
+(2) assign several processor to one cell and choose the optimal one.
+(3) Et al. See the CAN/Chord paper.
+
+In summary, His talk tried to summarize the common structure, basic
+operation and performance enhancement of p2p system and applied to the
+design of some uniform platform (maybe it's Gnutella++ in Dr. Shenker's
+talk). Although nothing made you feel revolutionary, we can feel his
+thought is clear when he pursue his aim and their work is done elegantly.
+
+
Index: gzz/Documentation/misc/hemppah-progradu/existing_systems_overview
diff -u gzz/Documentation/misc/hemppah-progradu/existing_systems_overview:1.10
gzz/Documentation/misc/hemppah-progradu/existing_systems_overview:1.11
--- gzz/Documentation/misc/hemppah-progradu/existing_systems_overview:1.10
Tue Nov 12 06:07:45 2002
+++ gzz/Documentation/misc/hemppah-progradu/existing_systems_overview Mon Nov
18 07:31:53 2002
@@ -287,3 +287,21 @@
-Object's popularity: studies have shown that Napster, Gnutella and Web
queries follow Zipf-like distributions (1/2, 1/3, 1/4 etc.)
+
+
+5. Security
+
+5.1 The Merkle Hash Tree
+
+The Merkle Hash Tree, invented by Ralph Merkle, is a hash construct that has
very nice properties for
+verifying the integrity of files and file subranges in an incremental or
out-of-order fashion. This
+document describes a binary serialization format for hash trees that is
compact and optimized for both
+sequential and random access. This memo has two goals:
+
+ 1. To describe Merkle Hash Trees and how they are used for file integrity
verification.
+ 2. To describe a serialization format for storage and transmission of hash
trees.
+
+See http://open-content.net/specs/draft-jchapweske-thex-01.html for details
+
+
+