[lsd0003] branch master updated: chapter 4

gnunet-svn
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[lsd0003] branch master updated: chapter 4

From:	gnunet
Subject:	[lsd0003] branch master updated: chapter 4
Date:	Sat, 12 Jun 2021 20:37:36 +0200
This is an automated email from the git hooks/post-receive script.

grothoff pushed a commit to branch master
in repository lsd0003.

The following commit(s) were added to refs/heads/master by this push:
     new fa6c0d6  chapter 4
fa6c0d6 is described below

commit fa6c0d634c3d16e70eb57c589b3f32d8248e2f20
Author: Christian Grothoff <christian@grothoff.org>
AuthorDate: Sat Jun 12 20:34:52 2021 +0200

    chapter 4
---
 draft-summermatter-set-union.xml | 201 +++++++++++++++++++++------------------
 1 file changed, 110 insertions(+), 91 deletions(-)

diff --git a/draft-summermatter-set-union.xml b/draft-summermatter-set-union.xml
index 8e70da5..d8dbb49 100644
--- a/draft-summermatter-set-union.xml
+++ b/draft-summermatter-set-union.xml
@@ -498,15 +498,18 @@ FUNCTION get_bucket_id (key, k, L)
                     <name>Insert Element</name>
                     <t>
                       To add an element to an IBF, the element is mapped to a 
subset of k buckets using
-                      the injective mapping function M as described in section 
<xref target="ibf_m" format="title" /> section introducing
-                      BFs. For the buckets selected by the mapping function, 
the counter is increased by one and the
-                      IDSUM field is set to the XOR of the element ID and the 
previously stored IDSUM. Furthermore,
-                      the HASHSUM is set to the XOR of the hash of the element 
ID and the previously stored HASHSUM.
+                      the injective mapping function M as described in section 
<xref target="ibf_m" format="title" />. For the buckets selected by the mapping 
function, the counter is increased by one and the
+                      IDSUM field is set to the XOR of the element ID
+                      computed as described in section <xref 
target="ibf_format_id_generation" format="title" />
+                      and the previously stored IDSUM. Furthermore,
+                      the HASHSUM is set to the XOR of the previously stored 
HASHSUM
+                      and the hash of the element ID computed as described
+                      in section <xref target="ibf_format_HASH_calculation" 
format="title" />.
                     </t>
                     <t>
                       In the following example, the insert operation is 
illustrated using an element with the
-                      ID 0x0102 and a hash of 0x4242, and a second element 
with the ID 0x0304 and
-                      a hash of 0x0101.
+                      ID 0x0102 mapped to {1,3} with a hash of 0x4242, and a 
second element with the
+                      ID 0x0304 mapped to {0,1} and a hash of 0x0101.
                     </t>
                     <t>Empty IBF:</t>
                     <figure anchor="figure_ibf_insert_0">
@@ -521,7 +524,7 @@ hashSum |    0x0000   |    0x0000   |    0x0000   |    
0x0000   |
         +-------------+-------------+-------------+-------------+
                  ]]></artwork>
                     </figure>
-                    <t>Insert first element: [0101] with ID 0x0102 and hash 
0x4242:</t>
+                    <t>Insert first element with ID 0x0102 and hash 0x4242 
into {1,3}:</t>
                     <figure anchor="figure_ibf_insert_1">
                         <artwork name="" type="" align="left" alt=""><![CDATA[
             bucket-0     bucket-1       bucket-2      bucket-3
@@ -534,7 +537,7 @@ hashSum |    0x0000   |   0x4242    |    0x0000   |   
0x4242    |
         +-------------+-------------+-------------+-------------+
                  ]]></artwork>
                     </figure>
-                    <t>Insert second element: [1100] with ID 0x0304 and hash 
0101:</t>
+                    <t>Insert second element with ID 0x0304 and hash 0101 into 
{0,1}:</t>
                     <figure anchor="figure_ibf_insert_2">
                         <artwork name="" type="" align="left" alt=""><![CDATA[
             bucket-0     bucket-1       bucket-2      bucket-3
@@ -557,9 +560,9 @@ hashSum |    0x0101   |   0x4343    |    0x0000   |   
0x4242    |
                         HASHSUM is similarly replaced with the XOR of the old 
HASHSUM and the hash of the ID.
                     </t>
                     <t>
-                        In the following example the remove operation for the 
element [1100] with the hash 0x0101 is demonstrated.
+                        In the following example the remove operation is 
illustrated.
                     </t>
-                <t>IBF with encoded elements:</t>
+                <t>IBF with two encoded elements:</t>
                 <figure anchor="figure_ibf_remove_0">
                     <artwork name="" type="" align="left" alt=""><![CDATA[
             bucket-0     bucket-1       bucket-2      bucket-3
@@ -572,7 +575,7 @@ hashSum |   0x0101    |   0x4343    |    0x0000   |   
0x4242    |
         +-------------+-------------+-------------+-------------+
                  ]]></artwork>
                 </figure>
-                    <t>Remove element [1100] with ID 0x0304 and hash 0x0101 
from the IBF:</t>
+                    <t>After removal of element with ID 0x0304 and hash 0x0101 
mapped to {0,1} from the IBF:</t>
                     <figure anchor="figure_ibf_remove_1">
                         <artwork name="" type="" align="left" alt=""><![CDATA[
             bucket-0     bucket-1       bucket-2      bucket-3
@@ -599,9 +602,9 @@ hashSum |    0x0000   |   0x4242    |    0x0000   |   
0x4242    |
                     </t>
                     <t>
                       Buckets in an IBF with a counter of 1 or -1 are crucial 
for decoding an IBF, as
-                      they might represent only a single element, with the 
IDSUM being the ID of that element.
-                      Following Eppstein (CITE), we will call buckets that 
only represent a single
-                      element pure buckets.
+                      they MIGHT represent only a single element, with the 
IDSUM being the ID of that element.
+                      Following Eppstein (FIXME-CITE), we will call buckets 
that only represent a single
+                      element <em>pure buckets</em>.
                       Note that due to the possibility of multiple insertion 
and removal operations
                       affecting the same bucket, not all buckets with a 
counter of 1 or -1 are
                       actually pure buckets.  Sometimes a counter can be 1 or 
-1 because N elements
@@ -611,51 +614,55 @@ hashSum |    0x0000   |   0x4242    |    0x0000   |   
0x4242    |
                 </section>
 
                 <section anchor="ibf_operations_decode" numbered="true" 
toc="default">
-                    <name>Decode IBF</name>
+                    <name>Extracting elements</name>
                     <t>
-                      Decoding an IBF yields the HASH of an element from the 
IBF, or failure.
+                      Extracting elements from an IBF yields IDs of elements 
from the IBF.
+                      Elements are extracted from an IBF by repeatedly 
performing a
+                      decode operation on the IBF.
                     </t>
                     <t>
-                        A decode operation requires a pure bucket, that is a 
bucket to which M
-                        only mapped a single element, to succeed.  Thus, if 
there is no bucket with
-                        a counter of 1 or -1, decoding fails. However, as a 
counter of 1 or -1 is
-                        not a guarantee that the bucket is pure, there is also 
a chance that the
-                        decoder returns an IDSUM value that is actually the 
XOR of several IDSUMs.
-                        This is primarily detected by checking that the 
HASHSUM is the hash of the IDSUM.
-                        Only if the HASHSUM also matches, the bucket could be 
pure.  Additionally,
-                        one MUST check that the IDSUM value actually would be 
mapped by M to
-                        the respective bucket. If not, there was a hash 
collision.
+                      A decode operation requires a pure bucket, that is a 
bucket to which M
+                      only mapped a single element, to succeed.  Thus, if 
there is no bucket with
+                      a counter of 1 or -1, decoding fails. However, as a 
counter of 1 or -1 is
+                      not a guarantee that the bucket is pure, there is also a 
chance that the
+                      decoder returns an IDSUM value that is actually the XOR 
of several IDSUMs.
+                      This is primarily detected by checking that the HASHSUM 
is the hash of the IDSUM.
+                      Only if the HASHSUM also matches, the bucket could be 
pure.  Additionally,
+                      one MUST check that the IDSUM value actually would be 
mapped by M to
+                      the respective bucket. If not, there was a hash 
collision and the bucket
+                      is also not pure.
                     </t>
                     <t>
-                        The very rare case that after all these checks a 
bucket is still
-                        falsely identified as pure MUST be detected (say by 
determining that
-                        extracted element IDs do not match any actual 
elements), and addressed
-                        at a higher level in the protocol. As these failures 
are probabilistic
-                        and depend on element IDs and the IBF construction, 
they can typically
-                        be avoided by retrying with different parameters, such 
as a different
-                        way to assign element IDs to elements, using a larger 
value for L, or
-                        a different mapping function M.
-                        A more common scenario (especially if L was too small) 
is that
-                        IBF decoding fails because there is no pure bucket. In 
this case, the
-                        higher-level protocol SHOULD also retry using 
different parameters.
+                      The very rare case that after all these checks a bucket 
is still
+                      falsely identified as pure MUST be detected (say by 
determining that
+                      extracted element IDs do not match any actual elements), 
and addressed
+                      at a higher level in the protocol. As these failures are 
probabilistic
+                      and depend on element IDs and the IBF construction, they 
can typically
+                      be avoided by retrying with different parameters, such 
as a different
+                      way to assign element IDs to elements (by varying the 
IBF-salt),
+                      using a larger value for L, or a different mapping 
function M.
+                      A more common scenario (especially if L was too small) 
is that
+                      IBF decoding fails because there is no pure bucket. In 
this case, the
+                      higher-level protocol generally MUST also retry using 
different
+                      parameters (except if an attack is detected).
                     </t>
                     <t>
-                        Suppose the IBF contains a pure bucket. In this case, 
the IDSUM in the
-                        bucket identifies a single element.  Furthermore, it 
is then possible
-                        to remove that element from the IBF (by inserting it 
if the counter
-                        was negative, and by removing it if the counter was 
positive). This
-                        is likely to cause other buckets to become pure, 
allowing further
-                        elements to be decoded.  Eventually, decoding ought to 
succeed with
-                        all counters and IDSUM and HASHSUM values reach zero. 
However, it is also
-                        possible that an IBF only partly decodes and then 
decoding fails after
-                        obtaining some elements.
+                      Suppose the IBF contains a pure bucket. In this case, 
the IDSUM in the
+                      bucket is the ID of an element.  Furthermore, it is then 
possible
+                      to remove that element from the IBF (by inserting it if 
the counter
+                      was negative, and by removing it if the counter was 
positive). This
+                      is likely to cause other buckets to become pure, 
allowing further
+                      elements to be decoded.  Eventually, decoding ought to 
finish with
+                      all counters and IDSUM and HASHSUM values reach zero. 
However, it is also
+                      possible that an IBF only partly decodes and then 
decoding fails due
+                      to the lack of pure buckets after extracting some 
element IDs.
                     </t>
                     <t>
-                        In the following example the successful decoding of an 
IBF containing
-                        the two elements previously added in our running 
example.
+                      In the following example the successful decoding of an 
IBF containing
+                      the two elements previously added in our running example.
                     </t>
                     <t>
-                        IBF with the two encoded elements:
+                       We begin with an IBF with two elements added:
                     </t>
                     <figure anchor="figure_ibf_decode_0">
                         <artwork name="" type="" align="left" alt=""><![CDATA[
@@ -670,8 +677,11 @@ hashSum |   0x0101    |   0x4343    |    0x0000   |   
0x4242    |
                  ]]></artwork>
                     </figure>
                     <t>
-                        In the IBF are two pure buckets to decode (bit-1 and 
bit-4) we choose to start with decoding bucket 1,
-                        we decode the element with the hash 1010 and we see 
that there is a new pure bucket created (bit-2)
+                      In the IBF are two pure buckets to decode (buckets 0 and 
3) we choose to start with
+                      decoding bucket 0. This yields the element with the hash 
ID 0x0304 and
+                      hash 1010. This element ID is mapped to buckets
+                      {0,1}.
+                      Subtracting this element results in bucket 1 becoming 
pure:
                     </t>
                     <figure anchor="figure_ibf_decode_1">
                         <artwork name="" type="" align="left" alt=""><![CDATA[
@@ -686,9 +696,10 @@ hashSum |    0x0000   |   0x4242    |    0x0000   |   
0x4242    |
                  ]]></artwork>
                     </figure>
                     <t>
-                        In the IBF only pure buckets are left, we choose to 
continue decoding bucket 2 and decode element
-                        with the hash 0x4242. Now the IBF is empty (all 
buckets have count 0) that means the IBF has been successfully
-                        decoded.
+                        We can now decoding bucket 2 and extract the element
+                        with the ID 0x0102 and hash 0x4242. Now the IBF is
+                        empty. Extraction completes with the status that
+                        the IBF has been successfully decoded.
                     </t>
                     <figure anchor="figure_ibf_decode_2">
                         <artwork name="" type="" align="left" alt=""><![CDATA[
@@ -710,10 +721,12 @@ hashSum |    0x0000   |    0x0000   |    0x0000   |    
0x0000   |
                         Given addition and removal as defined above, it is 
possible to define an operation on IBFs
                         that computes an IBF representing the set difference.  
Suppose IBF1 represents set A, and
                         IBF2 represents set B.  Then this set difference 
operation will compute IBF3 which
-                        represents the set A - B --- without having to 
transfer the elements from set A or B.
+                        represents the set A - B. Note that this computation 
can be done only on the IBFs,
+                        and does not require access to the elements from set A 
or B.
 
                         To calculate the IBF representing this set difference, 
both IBFs MUST have the same
-                        length L, the same number of buckets per element k and 
use the same map M.  Given this,
+                        length L, the same number of buckets per element k and 
use the same map M.
+                        Naturally, all IDs must have been computed using the 
same IBF-salt.  Given this,
                         one can compute the IBF representing the set 
difference by taking the XOR of the IDSUM and HASHSUM values
                         of the respective buckets and subtracting the 
respective counters.  Care MUST be taken
                         to handle overflows and underflows by setting the 
counter to "infinity" as necessary.
@@ -729,7 +742,8 @@ hashSum |    0x0000   |    0x0000   |    0x0000   |    
0x0000   |
                         To demonstrate the set difference operation we compare 
IBF-A with IBF-B and generate as described
                         IBF-AB
                     </t>
-                    <t>IBF-A containing elements with hashes 0x0101 and 
0x4242:</t>
+                    <t>IBF-A contains the elements with ID 0x0304 and hash 
0x0101 mapped to {0,1},
+                       and ID 0x0102 and hash 0x4242 mapped to {1,3}:</t>
                     <figure anchor="figure_ibf_setdiff_A">
                         <artwork name="" type="" align="left" alt=""><![CDATA[
             bucket-0     bucket-1       bucket-2      bucket-3
@@ -742,36 +756,38 @@ hashSum |    0x0101   |   0x4343    |    0x0000   |   
0x4242    |
         +-------------+-------------+-------------+-------------+
                  ]]></artwork>
                     </figure>
-                    <t>IBF-B containing elements with hashes 0x4242 and 
0x5050</t>
+                    <t>IBF-B also contains the element with ID 0x0102 and
+                     and another element with ID 0x1345 and hash 0x5050
+                     mapped to {1,2}.</t>
                     <figure anchor="figure_ibf_setdiff_B">
                         <artwork name="" type="" align="left" alt=""><![CDATA[
             bucket-0     bucket-1       bucket-2      bucket-3
         +-------------+-------------+-------------+-------------+
   count |      0      |      1      |      1      |      1      |
         +-------------+-------------+-------------+-------------+
-  idSum |    0x0000   |    0x0102   |    0x1345   |    0x0102    |
+  idSum |    0x0000   |    0x1447   |    0x1345   |    0x0102   |
         +-------------+-------------+-------------+-------------+
-hashSum |    0x0000   |    0x4242   |    0x5050   |    0x4242   |
+hashSum |    0x0000   |    0x9292   |    0x5050   |    0x4242   |
         +-------------+-------------+-------------+-------------+
                  ]]></artwork>
                     </figure>
-                <t>IBF-AB XOR value and subtract count:</t>
+                <t>IBF-A minus IBF-B is then:</t>
                 <figure anchor="figure_ibf_setdiff_AB">
                     <artwork name="" type="" align="left" alt=""><![CDATA[
             bucket-0     bucket-1       bucket-2      bucket-3
         +-------------+-------------+-------------+-------------+
-  count |      1      |      1      |      -1     |      0      |
+  count |      1      |      0      |      -1     |      0      |
         +-------------+-------------+-------------+-------------+
-  idSum |    0x0304   |    0x0304   |    0x1345   |    0x0000   |
+  idSum |    0x0304   |    0x1049   |    0x1345   |    0x0000   |
         +-------------+-------------+-------------+-------------+
-hashSum |    0x0101   |    0x0101   |    0x5050   |    0x0000   |
+hashSum |    0x0101   |    0x5151   |    0x5050   |    0x0000   |
         +-------------+-------------+-------------+-------------+
                  ]]></artwork>
                 </figure>
                 <t>After calculating and decoding the IBF-AB shows clear that 
in IBF-A the element with the hash 0x5050
-                    is missing (-1 in bit-3) while in IBF-B the element with 
the hash 0101 is missing
-                    (1 in bit-1 and bit-2). The element with hash 0x4242 is 
present in IBF-A and IBF-B and is
-                    removed by the set difference operation (bit-4).
+                    is missing (-1 in bucket 2) while in IBF-B the element 
with the hash 0101 is missing
+                    (1 in bucket 0). The element with hash 0x4242 is present 
in IBF-A and IBF-B and is
+                    removed by the set difference operation.  Bucket 2 is not 
empty.
                 </t>
             </section>
             </section>
@@ -779,12 +795,18 @@ hashSum |    0x0101   |    0x0101   |    0x5050   |    
0x0000   |
             <section anchor="ibf_format" numbered="true" toc="default">
                 <name>Wire format</name>
                 <t>
-                    The counter size transmitted over the wire
-                    varies between 1 and 64 bit, depending on the
-                    maximum counter in the IBF. This variable counter
-                    should cover most areas of application.
+                    For the counter field, we use a variable-size encoding to 
ensure
+                    that even for very large sets the counter should never 
reach
+                    "infinity", while also ensuring that the encoding is 
compact for
+                    small sets.
+                    Hence, the counter size transmitted over the wire
+                    varies between 1 and 64 bits, depending on the
+                    maximum counter in the IBF. A range of 1 to 64 bits
+                    should cover most areas of application and can be
+                    efficiently implemented on most contemporary CPU
+                    architectures and programming languages.
                     The bit length for the transmitted IBF
-                    is defined in the header of the
+                    will be communicated in the header of the
                     <em><xref target="messages_ibf" format="title" /></em> 
message
                     in the "IMCS" field as unsigned 8-bit integer.
                     For implementation details see section <xref 
target="performance_counter_variable_size" format="title" />.
@@ -806,18 +828,14 @@ hashSum |    0x0101   |    0x0101   |    0x5050   |    
0x0000   |
                     For the "HASHSUM", we always use a 32-bit
                     representation.  Here, it is most important to
                     avoid collisions, where different elements are
-                    mapped to the same hash.  However, we note that
-                    by design only a few elements (certainly less than
-                    127) should ever be mapped
-                    to the same bucket, a small number of bits
-                    should suffice.  Furthermore, our protocol is designed
-                    to handle occasional collisions, so while with
-                    32-bits there remains a chance of
-                    accidental collisions, at 32 bit the chance is
-                    generally believed to be sufficiently small
-                    for the protocol to handle those cases efficiently
-                    for a wide range of use-cases.  Smaller hash
-                    values would safe bandwidth, but also drastically
+                    mapped to the same hash, possibly resulting in
+                    a bucket being falsely classified as pure.
+                    While with 32 bits there remains a non-negligible chance of
+                    accidental collisions, our protocol is designed
+                    to handle occasional collisions. Hence, at 32 bit the 
chance is
+                    believed to be sufficiently small enough
+                    for the protocol to handle those cases efficiently.  
Smaller hash
+                    values would safe bandwidth, but also substantially
                     increase the chance of collisions. 32 bits are
                     also again a reasonable size for many CPU
                     architectures.
@@ -833,7 +851,7 @@ hashSum |    0x0101   |    0x0101   |    0x5050   |    
0x0000   |
                     a good value for L.
                 </t>
                 <t>
-                    Basically a Strata Estimator (SE) is a series of IBFs 
(with a rather small value of L)
+                    Basically a Strata Estimator (SE) is a series of IBFs 
(with a rather small value of L=79)
                     in which increasingly large subsets of the full set
                     of elements are added to each IBF.  For the n-th IBF, the 
function selecting the
                     subset of elements MUST sample to select 
(probabilistically) 1/(2^n) of all
@@ -843,18 +861,19 @@ hashSum |    0x0101   |    0x0101   |    0x5050   |    
0x0000   |
                     IBF being statistically expected to contain 1/(2^n) 
elements.
                 </t>
                 <t>
-                    Given two SEs, the set size difference can be estimated by 
trying to decode all of the
-                    IBFs.  Given that L was set to a rather small value, IBFs 
containing large strata
-                    will likely fail to decode.  For those IBFs that failed to 
decode, one simply
+                    Given two SEs, the set size difference can be estimated by 
attempting to decode all of the
+                    IBFs.  Given that L is set to a fixed and rather small 
value, IBFs containing large strata
+                    will likely fail to decode.  For IBFs that failed to 
decode, one simply
                     extrapolates the number of elements by scaling the numbers 
obtained from the
                     other IBFs that did decode.  If none of the IBFs of the SE 
decoded (which given
-                    a reasonable choice of L should be highly unlikely), one 
can retry using a different
-                    mapping function M.
+                    a reasonable number of IBFs in the SE should be highly 
unlikely), one can theoretically
+                    retry using a different IBF-salt.
                 </t>
                 <t>
-                    In addition, when decoding the IBFs in the strata 
estimator, it is possible to determine
+                    When decoding the IBFs in the strata estimator, it is 
possible to determine
                     on which side which part of the difference is. For this 
purpose, the pure buckets with
-                    counter 1 and -1 must be distinguished and assigned to the 
respective side when decoding the IBFs.
+                    counter 1 and -1 must be distinguished and assigned to the 
respective side when decoding
+                    the IBFs.
                 </t>
         </section>
 

-- 
To stop receiving notification emails like this one, please contact
gnunet@gnunet.org.
[Prev in Thread]
Current Thread
[Next in Thread]
[lsd0003] branch master updated: chapter 4, gnunet <=
Prev by Date: [lsd0003] branch master updated: chapter 3.1-3.4
Next by Date: [lsd0003] branch master updated: work on sec 5
Previous by thread: [lsd0003] branch master updated: chapter 3.1-3.4
Next by thread: [lsd0003] branch master updated: work on sec 5
Index(es):
- Date
- Thread