gzz-commits
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert...


From: Hermanni Hyytiälä
Subject: [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert...
Date: Thu, 20 Feb 2003 04:27:16 -0500

CVSROOT:        /cvsroot/gzz
Module name:    gzz
Changes by:     Hermanni Hyytiälä <address@hidden>      03/02/20 04:27:15

Modified files:
        Documentation/misc/hemppah-progradu: masterthesis.tex 
                                             progradu.bib 

Log message:
        more text

CVSWeb URLs:
http://savannah.gnu.org/cgi-bin/viewcvs/gzz/gzz/Documentation/misc/hemppah-progradu/masterthesis.tex.diff?tr1=1.50&tr2=1.51&r1=text&r2=text
http://savannah.gnu.org/cgi-bin/viewcvs/gzz/gzz/Documentation/misc/hemppah-progradu/progradu.bib.diff?tr1=1.76&tr2=1.77&r1=text&r2=text

Patches:
Index: gzz/Documentation/misc/hemppah-progradu/masterthesis.tex
diff -u gzz/Documentation/misc/hemppah-progradu/masterthesis.tex:1.50 
gzz/Documentation/misc/hemppah-progradu/masterthesis.tex:1.51
--- gzz/Documentation/misc/hemppah-progradu/masterthesis.tex:1.50       Wed Feb 
19 09:14:19 2003
+++ gzz/Documentation/misc/hemppah-progradu/masterthesis.tex    Thu Feb 20 
04:27:15 2003
@@ -1,7 +1,7 @@
 %***********************
 %   Käytetään gradu2-tyyliluokkaa
 %***********************
-\documentclass[a4paper,12pt]{gradu2}
+\documentclass[a4paper,12pt, english]{gradu2}
 
 \usepackage[T1]{fontenc}
 \usepackage[english]{babel}
@@ -71,9 +71,9 @@
 footnote:use of the plural is customary even if research paper is authored 
solely
 
 
-\cite{p2pworkinggroup}
-\cite{graham02lecture}
-\cite{winer00whatisp2p}
+definition \cite{p2pworkinggroup}
+definition \cite{graham02lecture}
+definition \cite{winer00whatisp2p}
 
 \chapter{Peer-to-Peer approaches}
 
@@ -101,9 +101,43 @@
 
 \section{Centralized}
 
+-this category: e.g. Napster
+-resources are kept locally
+-centralized index of all available resources in a a system
+-a service request is totally based on centralized index
+-system doe find the service, if it exists
+
+
+\subsection{Formal definition}
+
+-let S be the aggregate of all services s in system (data, service, computing 
power)
+-let P be the aggregate of all peers (providers) p in system (all physical 
entities participating)
+-let SE be the aggregate of all servers se in system (all physical central 
server participating)
+-let I be the aggregate of all index entries ie in system (index of all 
services in system)
+-for each service s 'mathematical belongs to' S, there is a index entry in 
some se 'mathematical belongs to' SE , expressed as 'ie = indexentry(s)'
+-for each service s 'mathematical belongs to' S, there is a provider of the 
service, expressed as 'p = provider(s)'
+%-every p has only one neighbor, named as p_centralized_neigbor, which is P = 
{p 'mathematical belongs to' P: 'mathematical there exists at least one' 
p_centralized_neighbor, which is some se 'mathematical belongs to' SE}
+
 \cite{napsterurl}
 
-\section{Unstructured}
+\section{Loosely structured}
+
+
++own resources are not mapped into the network
++keyword/fuzzy search possible
+-based on FBNt technique, 
++solves some of the Gnutella's scalability issues by introducing ``Super 
nodes'' (a superNode acts like a local hub, building an index 
+of the resources being shared by each node connected to it and proxying lookup 
queries on behalf of other nodes)
++this kind of structure reduces network traffic in comparison to a original 
broadcast query algorithm employed on the Gnutella system
+-not scalable
+-huge network traffic
+-not fast routing 
+-no guarantee that all data will be located
+
+-this category: Gnutella, Freenet, hierarchical Gnutella's (superpeers, 
clusters)
+-a service request is routed randomly to all/specific neighbors
+-system doesn't have *any* knowledge, where service is located (opposite to 
centralized and decentralized but structured) (Gnutellas)
+-system doesn't not necessary find the service, if it exists
 
 \begin{figure}
 \centering
@@ -112,12 +146,39 @@
 \label{fig:gnutella_overlay}
 \end{figure}
 
+
+\cite{yang02comparinghybrid}
+
+\subsection{Formal definition}
+
+-let S be the aggregate of all services s in system (data, service, computing 
power)
+-let P be the aggregate of all peers (providers) p in system (all physical 
entities participating)
+-for each service s 'mathematical belongs to' S, there is a provider of the 
service, expressed as 'p = provider(s)'
+-hierarchical gnutellas: let DI be the aggregate of all decentralized index 
entries die in system (decentralized index of all services in system)
+-hierarchical gnutellas: define super peer, which hosts the indices of other 
peers, as a 'sp = summaryindex(provider(s))'
+%-every p has neighbor(s), named as p_neigbor, which is P = {p 'mathematical 
belongs to' P: 'mathematical there exists at least one' p_neighbor, which is 
'randomly' chosen from p_neighbor 'mathematical belongs to'}
+%-hierarchical gnutellas: for each peer reqular peer, p_regular, there is 
super peer, p_super, P = {p 'mathematical belongs to' P: 'mathematical there 
exists at least one' p_super, where p_super = summaryindex(provider(s)) 
'boolean AND' (p_regular = provider(s))}
+
+
 \subsection{Protocols}
 
-\cite{clarke00freenet}
-\cite{zhang02using}
-\cite{milgram67smallworld}
-\cite{adamic99small}
+
+%-SWNs require O(log^2 n) hops to reach arbitrary destinations, assuming 
(*only and only if* !!!) that 
+links between nodes are constructed in the way that they are uniformly 
distributed over all distances 
+in the network (Kleinberg)
+
+1.5  Social Discovery Systems (SDS)
+Notice: pros and cons are not presented here
+-nodes continually discover new nodes to communicate with and determine which 
properties each node have. 
+-every node has a total control over the connections in the
+-as in real social life, nodes who have returned relevant results in the past, 
will have a high quality value in future query lookups
+-with every lookup query, a node determines how proficient a given node is to 
another node's objectives
+
+
+Freenet \cite{clarke00freenet}
+Improve Freenet performance with small worlds \cite{zhang02using}
+Milgram's small world experiment \cite{milgram67smallworld}
+Small worlds \cite{adamic99small}
 \cite{ramanathan02goodpeers}
 \cite{kleinberg99small}
 \cite{watts00dynamics}
@@ -166,7 +227,62 @@
 
 
 
-\section{Structured}
+\section{Tightly structured}
+
+-service is data block, node/peer is a physical computer
+-*servers* self-organize towards a lookup network
+-DHTs can be thought as a 'structured overlay random graphs'
+-There is a explicit metric space in every DHT. Term 'closeness' differs 
between existing DHTs: it can be XOR/numerical/eucklidean difference between 
identifiers
+-Resource can *be* a resource, or a *pointer* to a resource
+-a service request is routed towards service based on node's local knowledge
+-identifiers are expected to be distributed uniformly
+-DHTs require a knowledge of identifier space's size initially
+-resembles a balanced tree structure
+SWAN and Skip Graphs
+-in this scheme, node = service
+-*key-value pairs* self-organise towards a lookup network
+-There is a explicit metric space. Term 'closeness' differs between existing 
DHTs: it can be XOR/numerical/eucklidean difference between identifiers
+-Resource can *be* a resource, or a *pointer* to a resource
+-a service request is routed towards service based on node's local knowledge
+-services can be hosted locally (opposite to DHTs)
+-identifiers are expected to be distributed uniformly
+-system does find the service, if it exists
+
+
++fast routing (aka searching)
+%+scalable (10^9 users, 10^14 data items)
++robust
++little network traffic
+-own resources are mapped into the network (not necessary!!)
+-keyword/fuzzy search not possible yet
+-routing/query hotspots
+-ASSUME THAT ALL NODES HAVE IDENTICAL CABABILITIES! However, in real life, p2p 
enviroment is extremely heterogeneous!
+
+-the basic idea behind many DHTs is the fact that they perform operations in a 
binary-like tree
+-more space/node --> the search arity is higher --> k is higher in k-ary trees
+-DHTs' performance efficiency is derived from these tree based operations 
(e.g. split to half the previous scope)
+
+Skip graphs
++fast routing (aka searching)
++scalable
++little network traffic (however, more than DHTs)
++support *data* locality, opposite to DHTs where data is spread out evenly, 
destroying locality. In skip graphs, hashing is not requires, only *keys* 
matters (which we already have)
++support for partial repair mechanism/self-stabilization (DHTs lack of repair 
mechanism/self-stabilization)
++the most robust algorithm up to date, tolerance of adversial faults (DHTs 
support only of random faults)
++adaptive: doesn't need to know keyspace size before hand, instead of many 
DHTs, which require a priori knowledge about the size of the system or its 
keyspace
++support for range queries, e.g. natural for version control: 'find latest 
news from yesterday', 'find largest key < news:12/12/2002', 'find all related 
objects nearby'
+(in DHTs this is a open problem)
++There are not hotspots in skip graphs (query/routing hotsports)
+-not network/geographical locality
+-for our purposes, Skip Graps could be used (sensible) only for block IDs, not 
for URNs: in Skip Graps, there have to be total order for data elements --> for 
URNs, there cannot be a total order
+-for range queries, we would have to use local sensitive hashing, *if* we want 
fetch blocks like 'next block = i + 1'. SHA-1 is a (secure) random hash of 
content (it won't help for us in this case)
+
+Peernet
+-Peernet is a p2p based *network layer* for large networks
+-Peernet makes an explicit distinction between node identity and address
+-requires no manual configuration (plug-and-play)
+-can support wireless or wireline infrastructure
+-log(n) routing table, log(n) search  
 
 \cite{aspnes02faultrouting}
 \cite{ratnasamy02ght}
@@ -177,6 +293,19 @@
 
 \subsection{Protocols}
 
+Measures:
+
+degree:
+
+hop count: 
+
+fault-tolerance
+
+maintenance overhead
+
+load balance
+
+
 \cite{zhao01tapestry}
 
 \cite{rowston01pastry}
@@ -223,6 +352,21 @@
 \cite{debruijn46graph}
 
 
+\subsection{Formal definition}
+
+-let S be the aggregate of all services s in system (data, service, computing 
power)
+-let P be the aggregate of all peers (providers) p in system (all physical 
entities participating)
+-let I be the aggregate of all identifiers i in system (All possible unique 
identifiers, based on e.g. SHA-1)
+-let IS be the aggregate of all identifier points ip in system (entity, where 
'closeness' of services are calculated, e.g. XOR/numerical metrics, based on 
identifiers)
+-for each service s 'mathematical belongs to' S, there is a provider of the 
service, expressed as 'p = provider(s)'
+-service's identifier is defined as 'i = identifier(s)' (in our case, 
SHA-1(content of data block))
+-metric space is defined as a pair '(IS,d)', where d is the distance between 
two coordinate points ip in IS space
+-mapping function is defined as 'map: I -> IS', and coordinate point as 'ip = 
map(identifier(s))', which maps service, expressed by a identifier to 
coordinate point ip in '(IS,d)'
+%-In DHT, peer's p resources are mapped onto a set IS = {ip 'mathematical 
belongs to' IS: 'mathematical there exists at least one' s 'mathematical 
belongs to' S, ip = map(identifier(s)) 'boolean AND' (provider(s) = p)}, which 
means
+that resources that a peer provides into the system, are not kept locally. 
This is a important feature of DHTs (to be specific, feature of 'map: I -> 
IS')! In SWAN and Skip Graphs, resources are can be kept locally, if wanted!
+%-every p has neighbor(s), named as p_neighbor, which are P = {p 'mathematical 
belongs to' P: 'mathematical there exists at least one' p_neighbor, where 
'difference(p,p_neighbor)= 'close'', where 'close' is minimal difference d in 
'(IS,d'}
+
+
 
 
 \subsection{Distributed Hash Table (DHT)}
@@ -237,6 +381,9 @@
 
 \cite{rowstron01storage}
 
+-CFS splits files into blocks (<50Kb), PAST distributed whole files
+
+
 \subsection{Decentralized Object Location and Routing Networks (DOLR)}
 
 \cite{kubiatowicz00oceanstore}
@@ -275,10 +422,11 @@
 
 \scriptsize
 \begin{longtable}{|l|c|c|c|c|l|}
-\caption[Different Peer-to-Peer lookup protocols]{Different Peer-to-Peer 
lookup protocols} 
-\label{table_Peer-to-Peer_protocols} \\
 
-\hline
+\caption[Different Peer-to-Peer lookup protocols]{Different Peer-to-Peer 
lookup protocols} 
+\label{table_Peer-to-Peer_protocols}
+ 
+\\ \hline
 \multicolumn{1}{|c|}{\textbf{Protocol}} &
 \multicolumn{1}{c|}{\textbf{Insert/Delete}} & 
 \multicolumn{1}{c|}{\textbf{Space}} & 
@@ -453,9 +601,9 @@
 \parbox{85pt}{System's performance may decrease if nodes are not homogeneous 
and nodes join and leave the system in a dynamic manner, not necessarily 
fault-tolerant because of constant degree of neighbors}
 \\ \hline
 
-
+ 
 \end{longtable}
-
+\normalsize
 Insert/Delete: 
 Number of messages when a node joins or leaves the network.
 
@@ -470,10 +618,12 @@
 
 \scriptsize
 \begin{longtable}{|l|l|l|}
+
 \caption[Comparison of Broadicasting and Structured approaches]{Comparison of 
Broadicasting and Structured approaches} 
-\label{table_comparison_approach} \\
+\label{table_comparison_approach} 
 
-\hline 
+
+\\ \hline 
 \multicolumn{1}{|c|}{\textbf{Property}} &
 \multicolumn{1}{c|}{\textbf{Unstructured}} & 
 \multicolumn{1}{c|}{\textbf{Structured}}  
@@ -554,8 +704,9 @@
 \parbox{100pt}{High}
 \\ \hline
 
-\end{longtable}
 
+\end{longtable}
+\normalsize
 
 
 
@@ -564,156 +715,383 @@
 
 \section{Security problems in Peer-to-Peer}
 
-\section{Miscellaneous problems in Peer-to-Peer}
 
-\section{Performance and usability problems in Peer-to-Peer}
+-Could we use SDSI/SPKI in our system (it's hierarchical), like in ConChord 
\cite{ajmani02conchord}
+-is there any other implementations of SDSI/SPKI-like systems ?
+-SDSI/SPKI is not optimal for us, but somewhat working solution
+-in our model: trust = trust no one
+-give a brief explanation of current techiques in accountability and trust
+
+%Tuomas' example scenario #1(in finnish):
+
+jos joku sanoo, ett? urn-5 foo on blokeissa X
+ja j?tt?? blokit Y pois
+miten systeemi selviytyy ?
+eli kuinka pahasti pystyy mit?kin spooffaamaan, mill? todenn?k?isyyksill?
+tai jos vaan ei haluta antaa joillekin koneille jotakin blokkia
+niin miten helppoa on organisoida sellainen hy?kk?ys
+tai halutaan hukata verkosta tietyt blokit
+floodaamalla indeksi
+mutta sitten poistamalla ne serverit
+
+Solutions:
+1) periodically (cross) checking the consistency of routing tables with 
randomly chosen other nodes
+
+%Tuomas' example scenario #2:
+'What will happen if, jyu's network connection is broken ? Inside jyu network, 
how do we/system will self-organise ?'
+
+Solution: 
+-SkipNet is able to self-organise after a organization has been disconnected 
from the network
+-However, SkipNet uses Pastry as routing layer. Kademlia/Symphony would be 
better choice
+-network partitions: techniques for maintaining DHT structure in real and 
dynamic environment \cite{rowston03controlloingreliability}
+
+%Tuomas' example scenario #3:
+'What will happen if a malicious node quantities large amounts of data in the 
system ?'
+
+Solutions:
+1) centralized: Per node based quotas based on reliable identification of 
publishers (usually requires CAs to work correctly)
+2) decentralized: Per node based quotas based on the ip address of the node
+
+It seems to be that DHTs are unable to self-reorganise rapidly after a sudden 
partitioning, because
+a) every node has a fixed space in identifier space
+b) system has a fixed identifier space, e.g. Pastry and Chord require a priori 
knowledge about the size of the system
+c) even if system is able self-reorganise, the key space is not uniformly 
distributed anymore -> all nodes have to leave/rejoin --> lot of traffic
+
+To ensure data availability:
+System myst maintain the desired level of replcation as nodes join and leave 
the system in order to. If nodes often join/leave the system
+for a short time before leaving (i.e. short 'half-life'), replicas will be 
wasted. Therefore, system should have a support foe lazy replica
+copying
+
+
+Network proximity: 
+-\cite{pias03lighthouse}, \cite{ng02predicting}
+
+Efficient searching:
+-result caching and view trees \cite{Bhattacharjee03resultcache} (insight: 
store and and retrieve prior results from view tree)
+-bloom filters \cite{362692}
+
+Better load balancing techiques:
+-Better and simple load balancing \cite{byers03dhtbalancing} (insight: "power 
of two choices" whereby an item is stored at the less loaded of two (or more) 
random alternatives)
+
+%One question is: how well SWAN can self-organise, e.g. in the case of Tuomas' 
scanario #2 ?
+
+Half-life phenomenon \cite{libennowell01observations}
+%Current decentralized, but structured \cite{chord, can, pastry, Tapestry 
etc.) ignores the fact 
+that a P2P system is never in 'ideal' state. P2P system is always evolving 
system.
+
+-techniques for maintaining DHT structure in real and dynamic environment 
\cite{rowston03controlloingreliability}
+-this is not problem any more for structured p2p overlay networks
+-results can be directly applied to other DHTs
+
+Half-life concept:
+'Let there be N live nodes at time t. The doubling from time t is the time 
that pass before
+N new additional nodes arrive into the system. The halving time from time t is 
the time
+requires for half of the living nodes at time t to leave the system. The 
half-life from 
+time t is smaller of the properties stated above. The half-life of the entire 
system is the 
+minimum half-life overl all times t.'
+
+Heterogeneity
+%Current decentralized, but structured \cite{chord, can, pastry, Tapestry 
etc.) designs assume
+that all participating peers are equal, in terms of processing power, memory 
and network bandwidth.
+However, recent measurement study \cite{saroiu02measurementstudyp2p} exhibits 
the fact that in p2p networks, 
+there is a great amount of heterogeneity among participating peers.
+
+Proposal, Brocade: Landmark routing on overlay networks \cite{zhao02brocade}
+-Secondary overlay layer on top of routing layer, which exploits knowledge of
+underlying network properties
+
+
+Security issues related to p2p, create by Ross Lee Graham:
+http://www.ida.liu.se/~rosgr/p2psecurity.html
+
+Symphony seems to be the first DHT system which support hetergeneity 
\cite{gurmeet03symphony}
+
+Notes from from \cite{hearn02mojonation}
+Open problems in p2p data-sharing:
+1) Attack resistance
+2) malicious nodes
+3) mutual distrust
+4) motivation to cooperate
+
+Other:
+a) Frequent join / leave caused poor data availability; discriminate againts 
newly joined zones
+b) Splitting data into a set of shares decreased data robustness
+
+
+Decentralized, but structured
+a) Censorship Resistant Peer-to-Peer Content Addressable Networks 
\cite{fiat02censorship},
+-system is resilient to adversial and controlled attacks
+-however, they assume that number of deleted peers is constant
+-not effiecient methods for maintaining dynamic netoworks 
+b) Dynamically Fault-Tolerant Content Addressable Networks 
\cite{saia02dynamicfaultcontentnetwork}
+-system is resilient to adversial and controlled attacks (partial support for 
dynamic deletions, see below)
+-however, still assume a constant number of participating peers
+-not effiecient methods for maintaining dynamic netoworks
+c) Butterflies and Peer-to-Peer Networks \cite{datar02butterflie}
+-system is resilient to adversial and controlled attacks
+-support for dynamic deletions and dynamic number of participants
+-not effiecient methods for maintaining dynamic netoworks
+
+Open problems, which remain to be addressed for fault tolerant decentralized, 
but structured strategies
+
+a) Is it possible to and efficient and dynamic fault tolerant decentralized, 
but structured system, which
+allows e.g. multiple rounds of adversary attack ?
+b) Could multi-butterflier be used in and efficient manner to construct a span 
resistant network ?
+c) Are there lower bounds for average degree of nodes, query path length etc. 
for a network that is
+fault tolerant to linear number of adversial faults ?
 
-\cite{harren02complex}
+ 
+5.6. Attack models/behavior of faulty nodes
 
-\cite{ratnasamy02routing}
+1) Sybil attack \cite{douceur02sybil} 
+2) Fail-stop
+3) Spam generating model \cite{naor03simpledht}
+4) Byzantine problem \cite{357176}, p2p domain \cite{296824}
 
-\cite{hildrum02distributedobject}
+Solutions for Sybil Attack: 
+1) data replication among several peers
+2) data fragmentation among several peer
 
-\cite{oram01harnessingpower}
+BUT: 
+-in either case, both approaches assumes that two different remote entities 
are actually different; sybil attacks are still possible --> need for 
centralized authority
+-in p2p environment, trusting to collective assurance of multiple signatories 
(like PGP) is not safe/undermines the authenticity of system (because of sybil 
attacks)
+-\cite{douceur02sybil} argues that Sybil attacks are always possible except 
under extreme and unrealistic assumptions of resource parity and coordination 
among entities
 
-\cite{sloppy:iptps03}
+Current research with regard to p2p security:
+-lot of done has been done on persistence
+-little has been done on distinctness (sybil attack)
+-computational puzzles for preventing DDOS attacks (force attacker perform 
more work than victim)
+-puzzles can be used for accountability (Dingeline, in Peer-to-Peer: 
Harnessing...), but can dangerous
+-some research has been done on on-line identities for humans. However, they 
often has a direct relation to phychical world
 
-\cite{yang02improvingsearch}
 
-\cite{daswani03openproblems}
+5.3. General security issues related to decentralized, but structured 
strategies
 
-\cite{sit02securitycons}
+Secure routing requires \cite{castro02securerouting}:
+1) a secure assignment of node identifiers
+2) secure routing table maintenance
+3) secure message forwarding
 
-\cite{lv02searchreplication}
+General security considerations \cite{sit02securitycons}
+1) Define verifiable system invariants (and verify them!)
+2) Allow querier to observe lookup progress
+3) Assign keys to nodes in a verifiable way
+4) Server selection in routing may be abused
+5) Cross-check routing tables using random queries
+6) Avoid single points of respinsability
 
-\cite{libennowell01observations}
+5.2. Open Problems in p2p data-sharing \cite{daswani03openproblems}
 
-\cite{krishnamurthy01earlymeasurements}
+Search
+a) Query rigidity
 
-\cite{golle01incentivesp2p}
+b) Autonomy, efficiency and Robustness
 
-\cite{karger02findingnearest}
+c) Quality of Service, QoS
 
-\cite{cornelli02reputableservents}
+Security
+a) Availability (Gzz: priority 3)
 
-\cite{aberer01trust}
+b) Data Authenticity (Gzz: priority 1)
 
-\cite{adamic02localsearch}
+c) Anonymity (Gzz: priority 4)
 
-\cite{adamic01powerlawsearch}
+d) Access Control (Gzz: priority 2)
 
-\cite{saroiu02measurementstudyp2p}
 
-\cite{ripeanu02mappinggnutella}
+\cite{daswani03openproblems}
 
-\cite{kronfol02fasdsearch}
+\cite{sit02securitycons}
 
-\cite{brinkmann02compactplacement}
+\cite{aberer01trust}
 
 \cite{ajmani02conchord}
 
-\cite{362692}
-
-\cite{CuencaAcuna2002DSIWorkshop}
-
 \cite{reiter98crowds}
 
-\cite{352607}
-
-\cite{293447}
-
 \cite{tarzan:ccs9}
 
 \cite{pub00}
 
-\cite{rhea02probabilistic}
+\cite{grahamp2psecurity}
 
-\cite{502002}
+\cite{nejdl03accesscontrol}
 
 %dup
 \cite{castro02securitystructured}
 \cite{castro02securerouting}
 
-\cite{zhao02brocade}
-
 \cite{datar02butterflies}
 
-\cite{crespo02semanticoverlay}
+\cite{fiat02censorship}
 
-\cite{lv02gnutellascalable}
+\cite{juels99clientpuzzles}
 
-\cite{keleher-02-p2p}
+Anonymoys \cite{352607}
 
-\cite{saia02dynamicfaultcontentnetwork}
+Anonymoys \cite{293447}
 
-\cite{yang02efficientsearch}
+Censorship \cite{502002}
 
-%dup
-\cite{liben-nowell02observatorionsp2p}
-\cite{571863}
+\cite{douceur02sybil}
+
+\cite{saia02dynamicfaultcontentnetwork}
 
 \cite{lynch02atomicdataaccess}
 
-\cite{yang02comparinghybrid}
 
-\cite{ledlie02selfp2p}
+\section{Performance and usability problems in Peer-to-Peer}
 
-\cite{frise02p2pframework}
+
+
+1) Which one is more important: short path length or overhead associated with 
keeping routing tables updated, e.g. number of state updates whenever 
join/leave occurs
+ (number of neighbors)
+2) Are we able to achieve reasonably pathlenghts with less neigbors (Viceroy) ?
+3) How big is the difference between optimal path length and worst case path 
length ?
+4) How difficult is to recover from total routing mislead and the cost of it ?
+5) Can we choose better neighbors by using network latencies instead of 
closeness of IDs in the ID space ? What are the effects doing so ?
+6) Can we choose IDs (globally) based on the geographical location/distance ? 
Is there a working model for doing so ?
+7) How do we should work with node heterogeneity; how big changes have to be 
made to existing algorithms for better support to heterogeneity ?
+
+Principles on scalable search in decentralized, and unstructured networks 
\cite{lv02searchreplication}:
+1) system must support adaptive termination 
+2) message duplication should be minimized
+3) each additional step during search should not significantly increase the 
number of nodes visited
+
+
+\cite{harren02complex}
+
+\cite{ratnasamy02routing}
+
+\cite{hildrum02distributedobject}
+
+\cite{sloppy:iptps03}
+
+\cite{yang02improvingsearch}
+
+\cite{lv02searchreplication}
+
+\cite{libennowell01observations}
+
+\cite{karger02findingnearest}
+
+\cite{adamic02localsearch}
+
+\cite{adamic01powerlawsearch}
+
+\cite{ripeanu02mappinggnutella}
+
+\cite{brinkmann02compactplacement}
 
 \cite{joseph02p2players}
 
-\cite{andrzejak02rangequeries}
+\cite{lv02gnutellascalable}
 
-\cite{babaoglu02anthill}
+\cite{rao03loadbalancing}
 
-\cite{fiat02censorship}
+\cite{zhang03somo}
 
-\cite{hearn02mojonation}
+\cite{bhagwan03availability}
 
-\cite{osokine02distnetworks}
+\cite{li03feasibility}
 
-\cite{harvey03skipnet1}
+\cite{zhao02brocade}
+
+\cite{crespo02semanticoverlay}
+
+\cite{rhea02probabilistic}
 
 \cite{ansaryefficientbroadcast03}
 
-\cite{ng02predicting}
+Bloom filters \cite{362692}
 
-\cite{douceur02sybil}
+\cite{CuencaAcuna2002DSIWorkshop}
+
+Locality \cite{keleher-02-p2p}
+
+Practical Byzantine fault tolerance \cite{296824}
+Byzantine Generals \cite{357176}
 
 \cite{castro02networkproximity}
 
-\cite{296824}
+\cite{yang02efficientsearch}
 
-\cite{juels99clientpuzzles}
-\cite{357176}
+%dup
+\cite{liben-nowell02observatorionsp2p}
+\cite{571863}
 
-\cite{grahamp2psecurity}
+\cite{ledlie02selfp2p}
 
-\cite{zhao03api}
+\cite{andrzejak02rangequeries}
 
-\cite{nejdl03accesscontrol}
+\cite{hearn02mojonation}
 
-\cite{bhagwan03availability}
+\cite{osokine02distnetworks}
 
-\cite{li03feasibility}
+\cite{harvey03skipnet1}
 
-\cite{zhang03somo}
+\cite{ng02predicting}
 
-\cite{rao03loadbalancing}
+\cite{chord:om_p-meng}
+
+
+\section{Miscellaneous problems in Peer-to-Peer}
+
+Research on *usage patterns* haven't been done. However, there have been some 
research
+focusing on file popularity in in www and p2p networks: file popularity 
follows the zipf/harmonical distribution)
+
+\cite{kronfol02fasdsearch}
+
+\cite{oram01harnessingpower}
+
+\cite{krishnamurthy01earlymeasurements}
+
+\cite{golle01incentivesp2p}
+
+\cite{cornelli02reputableservents}
+
+\cite{saroiu02measurementstudyp2p}
 
 \cite{rhea03benchmarks}
 
-\cite{chord:om_p-meng}
+\cite{zhao03api}
+
+\cite{frise02p2pframework}
+
+\cite{babaoglu02anthill}
+
+
 
 \section{Summary}
 
+Main problems in different approaches (own conclusions, based on research 
efforts):
+1) Decentralized, but structured
+-make routing/system more flexible againts adversial attacks
+       -e.g. -there is a single point of routing failure in these approaches 
(h hops, little amount f of system are adversial) 
+-make searching/query model more flexible (keyword searching)
+       -use keywords/range searches instead of exact keys
+       -proposal for range searches: Scalable, Efficient Range Queries for 
Grid Information Services \cite{andrzejak02rangequeries}
+       -proposal for broadcast operation in DHT-based systems 
\cite{ansaryefficientbroadcast03} (insight: k-ary tree is a spanning tree, 
traverse spanning tree)
+       -proposal for complex queries in DHTs \cite{harren02complex} (insight: 
use SQL-like mechanisms)
+
+2) Decentralized, but unstructured
+-make routing more scalable: reach more nodes, create less traffic
+-remember mention: when people talk about Gnutella's scalability issues
+they jumble apples and oranges: *Gnutella* itself is scalable, but query model 
+is not scalable, since searching creates a lot of traffic!
+
+3) Centralized
+-no recent research efforts/no interests
+-not suitable for real p2p, as Napster case showed
+
+
 \scriptsize
 \begin{longtable}{|l|l|l|l|}
-\caption[Security problems in Peer-to-Peer]{Security problems in Peer-to-Peer} 
\label{table_security_problems_Peer-to-Peer} \\
 
+\caption[Security problems in Peer-to-Peer]{Security problems in Peer-to-Peer} 
\label{table_security_problems_Peer-to-Peer}
 
 
-\hline 
+\\ \hline 
 \multicolumn{1}{|c|}{\textbf{Problem}} & 
 \multicolumn{1}{c|}{\textbf{Problem description}} & 
 \multicolumn{1}{c|}{\textbf{Solutions}} &
@@ -826,11 +1204,11 @@
 
 
 \end{longtable}
-
+\normalsize
                
                
 
-
+\scriptsize
 \begin{longtable}{|l|l|l|l|}
 \caption[Performance and usability problems in Peer-to-Peer]{Performance and 
usability problems in Peer-to-Peer} 
\label{table_performanceusability_problems_Peer-to-Peer} \\
 
@@ -854,10 +1232,10 @@
 
 \endfoot
                
-\parbox{90pt}{Searching} &
-\parbox{110pt}{Efficient searching requires exploiting of previous queries} &
-\parbox{110pt}{View trees, bloom filters and its variations} &
-\parbox{110pt}{Effective, but not flexible}
+\parbox{90pt}{Web indexing and searching} &
+\parbox{110pt}{Perform Web like searches in Peer-to-Peer network} &
+\parbox{110pt}{Data compression, view trees, bloom filters and its variations, 
gap compression, index intersection optimizations, clustering} &
+\parbox{110pt}{Effective but complex solutions, some compromises have to be 
done (decrease result quality, modify overlay's structure), more research 
needed}
 \\ \hline
 
 
@@ -937,10 +1315,10 @@
 
 
 \end{longtable}
+\normalsize
 
 
-
-
+\scriptsize
 \begin{longtable}{|l|l|l|l|}
 \caption[Miscellaneous problems in Peer-to-Peer]{Miscellaneous problems in 
Peer-to-Peer} \label{table_Miscellaneous_problems_Peer-to-Peer} \\
 
@@ -1034,20 +1412,7 @@
 
 
 \end{longtable}
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+\normalsize
 
 \begin{figure}
 \centering
@@ -1074,6 +1439,29 @@
 
 \section{Xanalogical model}
 
+4.1. Xanalocical model (some parts from our article, modify)
+Xanalocical model [cite] is a simple, but rather different model for 
representing content. 
+Project Xanadu [cite] has been a pioneer implementing xanalogical model. In 
xanalocical model
+links are not between documents, but individual characters.
+For instance, when a character is first typed in xanalogical, it acquires a 
permanent ID
+("the character 'D' typed by Janne Kujala on 10/8/97 8:37:18"),
+which it retains when copied to a different document, distinguishing
+it from all similar characters typed in independently.
+A link is shown between any two documents containing the characters
+that the link connects. Xanalogical links are external and bidirectional.
+
+In addition to content links, xanalogical storage keeps an index of
+transclusions: identical characters copied into different documents.
+Through this mechanism, the system can show to the user all documents
+that share text with the current document.
+
+    * Enfilade: a list of fluid media contents, a "virtual file" or part of one
+    * Transclusion: the inclusion in an enfilade of contents already used in 
another enfilade
+    * Xanadu link: an association of two enfilades, such as an annotation to a 
part of a document
+    * pdf files are stored in image blocks
+    * Documents can be created by listing new content or transcluding existing 
content in an enfilade.
+    * Annotations can be entered as new content and Xanadu-linked to a part of 
document.
+
 \cite{nelson99xanalogicalneeded}
 
 \section{Storm}
@@ -1087,14 +1475,110 @@
 
 \chapter{Evaluation of Peer-to-Peer for Gzz}
 
+
+-BitTorrent is not a lookup system-- it's a tool for *distributing* data
+-MIT License, written in Python
+-resembles Overnet's/ed2k's MFTP
+-supports resumed downloads
+-supports upload-download ratios (if this is a built in feature, this might be 
a problem for us (if used as out-of-box))
+-however, upload-download ratios reduces DoS attacks: one cannot publish a 
large amount hokum data in the network if ratios are used
+-reduces query hotspots (DHTs), however routing hotspots remain
+-when more than one person is downloading from a server/node/computer, 
BitTorrent sends blocks of the files to downloaders
+-this reserves the server's/node's/computer's bandwidth
+-supports simultaenous downloads
+-since each new downloader introduces a new upload capacity, eventually the 
upload overhead for the original server/node/computer bandwidth is reasonable 
small
+-uses SHA-1 for checking data integrity (suits for us very well!)
+-we don't have to create 'mini-blocks' for Gzz p2p, since bitTorrent itself 
partitions data into several blocks for us
+
 \section{Motivation}
 
 \section{Objectives}
 
+2.1  "Searching for a specific Storm block"
+
+2.1.1 Facts
+-specific Storm block can be identified with an ID, generated by SHA-1 
algorithm
+-block ID is globally *unique*, e.g. "Front page of New York Times newspaper 
on 10.10.2002"
+-each Storm node can host multiple blocks
+-in my thesis, we can assume that there is working PKI
+
+2.1.2. Objectives
+-if block exists in the network, algorithm *will* find it
+-algorithm will find the specific Storm block as quicly as possible (and 
return it to the user) based
+on the the block's ID
+-simple pseudo thinking: "based on block ID, find the specific block fast and 
return it."
+
+2.2. "Searching for most recent Storm block associated with specific urn-5 
name, where
+     the block has been signed with a given key"
+
+2.2.1. Facts     
+-urn-5 is random "unique keyword", e.g. "Front page of New York Times 
newspaper"
+-urn-5 can't be udated, but we can associate a new block with the urn-5 name
+-urn-5 name is saved in the header of Storm block (|block urn-5: "Front page 
of New York Times newspaper"|)
+-urn-5 names can be created before Storm blocks
+-key signings are saved in separate Storm blocks
+-finding key blocks for data block can be performed locally ("fast enough")
+-node has an internal index for key blocks and associated data blocks
+-for every urn-5 name, there is zero, one or more Storm blocks associated with 
it
+%-in every block's header, there is a timestamp for date&time
+
+2.2.2. Objectives
+-Find the specific (and the most recent) Storm block as quicly as possible 
(and return it to the user) based on the given urn-5 with a block's ID
+-simple pseudo thinking: "find all Storm blocks from the network, which uses 
specific urn-5 name. Compare blocks and 
+return the most recent block, if the signing key is "valid"."
+
+2.3.1. Facts
+-same as for 2.2.1.
+
+2.3.2. Objectives
+-Find the specific (most recent) Storm block as quicly as possible (and return 
it to the user) based on the given urn-5 with a block's ID
+*and* a given date (range ?)
+-simple pseudo thinking: "find all Storm blocks from the network, which uses 
specific urn-5 name. Compare blocks and 
+return the most recent block, if the signing key and block's timestamp are 
"valid"."
+
+
 \section{Benefits over existing Peer-to-Peer file sharing systems}
 
+- Easy syncing:
+  - Just copy a bunch of blocks
+  %- Documents can be synced & merged
+  %- Inter-document structures can be synced & merged
+  - Syncing can be done without merging immediately,
+    leaving two alternative versions current
+    (so e.g. an automated process is entirely possible,
+    even when there are conflicts)
+- Versioning
+
+From Benja's (plus antont and me) article:
+- Reliability (old versions, links work always, accessibility, 
append-and-delete)
+- Usability in the face of intermittent connectivity 
+  (includes syncing, finding a document if available...)  
+- Xanalogical structure 
+  (includes versioning, non-breaking links etc.)
+
+-current p2p systems don't support all of these properties together
+
 \section{Special needs}
 
+Gzz's requirements
+-append-and-delete (when a modification has been commited, a new version must 
be created)
+-the same data block can be stored to many different locations (can we use 
'How share a secret ?', like in Publius and Tangler ?)
+-old version(s) must remain accessible in the network
+-must support xanalogical structure
+-links must work always
+-*selective* replication/synking
+-do we have a support for transcopyright ?
+
+P2P network requirements
+-system must find the service, if it exists in the network
+-following properties: scalable, efficient, adaptive, robust, self-organising
+-is possible, it would be benefitial if p2p system representing named 
resources as keys 
+-p2p system must support the use of block IDs (SHA-1) and urn-5s (random 
string) when obtaining data from the network
+-one or more blocks are associated with a specific urn-5
+-support for location independent routing (urn-5 -> block(s))
+-data block can represent a link, document/part of document, transclusion
+-must perform well under DDoS attacks
+
 \cite{bittorrenturl}
 \cite{maymounkov03ratelesscodes}
 
@@ -1104,7 +1588,16 @@
 
 \cite{gribble01p2pdatabase}
 
-\chapter{Conclusion} 
+\chapter{Conclusion}
+
+-Currently, DHT is the best alternative for Gzz P2P, because:
+       +blocks have unique IDs, they can be used as keys in DHT, it's a 
natural choice
+       +DHT will find a resource, if it exists in a network
+       +for a specific urn-5, we could find all associated block information 
from a single peer
+       -for every urn-5, there is one permanent ID
+       -for every version of block, there is one permanent ID
+       -for every version of xu link, there is one permanent ID (does 
transclusion have different versions ?)
+       -for every version xu transclusion, there is one permanent ID (does 
links have different versions ?) 
 
 
 \bibliographystyle{gradu}
Index: gzz/Documentation/misc/hemppah-progradu/progradu.bib
diff -u gzz/Documentation/misc/hemppah-progradu/progradu.bib:1.76 
gzz/Documentation/misc/hemppah-progradu/progradu.bib:1.77
--- gzz/Documentation/misc/hemppah-progradu/progradu.bib:1.76   Wed Feb 19 
09:14:19 2003
+++ gzz/Documentation/misc/hemppah-progradu/progradu.bib        Thu Feb 20 
04:27:15 2003
@@ -916,6 +916,7 @@
 %definition of p2p
 @misc{p2pworkinggroup,
        key = {Peer-to-Peer Working Group},
+       author = {Ross Lee Graham},
        title = {Peer-to-Peer Working Group},
        howpublished = {www.p2pwg.org}
 }




reply via email to

[Prev in Thread] Current Thread [Next in Thread]