[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[lsd0003] branch master updated: chapter 4
From: |
gnunet |
Subject: |
[lsd0003] branch master updated: chapter 4 |
Date: |
Sat, 12 Jun 2021 20:37:36 +0200 |
This is an automated email from the git hooks/post-receive script.
grothoff pushed a commit to branch master
in repository lsd0003.
The following commit(s) were added to refs/heads/master by this push:
new fa6c0d6 chapter 4
fa6c0d6 is described below
commit fa6c0d634c3d16e70eb57c589b3f32d8248e2f20
Author: Christian Grothoff <christian@grothoff.org>
AuthorDate: Sat Jun 12 20:34:52 2021 +0200
chapter 4
---
draft-summermatter-set-union.xml | 201 +++++++++++++++++++++------------------
1 file changed, 110 insertions(+), 91 deletions(-)
diff --git a/draft-summermatter-set-union.xml b/draft-summermatter-set-union.xml
index 8e70da5..d8dbb49 100644
--- a/draft-summermatter-set-union.xml
+++ b/draft-summermatter-set-union.xml
@@ -498,15 +498,18 @@ FUNCTION get_bucket_id (key, k, L)
<name>Insert Element</name>
<t>
To add an element to an IBF, the element is mapped to a
subset of k buckets using
- the injective mapping function M as described in section
<xref target="ibf_m" format="title" /> section introducing
- BFs. For the buckets selected by the mapping function,
the counter is increased by one and the
- IDSUM field is set to the XOR of the element ID and the
previously stored IDSUM. Furthermore,
- the HASHSUM is set to the XOR of the hash of the element
ID and the previously stored HASHSUM.
+ the injective mapping function M as described in section
<xref target="ibf_m" format="title" />. For the buckets selected by the mapping
function, the counter is increased by one and the
+ IDSUM field is set to the XOR of the element ID
+ computed as described in section <xref
target="ibf_format_id_generation" format="title" />
+ and the previously stored IDSUM. Furthermore,
+ the HASHSUM is set to the XOR of the previously stored
HASHSUM
+ and the hash of the element ID computed as described
+ in section <xref target="ibf_format_HASH_calculation"
format="title" />.
</t>
<t>
In the following example, the insert operation is
illustrated using an element with the
- ID 0x0102 and a hash of 0x4242, and a second element
with the ID 0x0304 and
- a hash of 0x0101.
+ ID 0x0102 mapped to {1,3} with a hash of 0x4242, and a
second element with the
+ ID 0x0304 mapped to {0,1} and a hash of 0x0101.
</t>
<t>Empty IBF:</t>
<figure anchor="figure_ibf_insert_0">
@@ -521,7 +524,7 @@ hashSum | 0x0000 | 0x0000 | 0x0000 |
0x0000 |
+-------------+-------------+-------------+-------------+
]]></artwork>
</figure>
- <t>Insert first element: [0101] with ID 0x0102 and hash
0x4242:</t>
+ <t>Insert first element with ID 0x0102 and hash 0x4242
into {1,3}:</t>
<figure anchor="figure_ibf_insert_1">
<artwork name="" type="" align="left" alt=""><![CDATA[
bucket-0 bucket-1 bucket-2 bucket-3
@@ -534,7 +537,7 @@ hashSum | 0x0000 | 0x4242 | 0x0000 |
0x4242 |
+-------------+-------------+-------------+-------------+
]]></artwork>
</figure>
- <t>Insert second element: [1100] with ID 0x0304 and hash
0101:</t>
+ <t>Insert second element with ID 0x0304 and hash 0101 into
{0,1}:</t>
<figure anchor="figure_ibf_insert_2">
<artwork name="" type="" align="left" alt=""><![CDATA[
bucket-0 bucket-1 bucket-2 bucket-3
@@ -557,9 +560,9 @@ hashSum | 0x0101 | 0x4343 | 0x0000 |
0x4242 |
HASHSUM is similarly replaced with the XOR of the old
HASHSUM and the hash of the ID.
</t>
<t>
- In the following example the remove operation for the
element [1100] with the hash 0x0101 is demonstrated.
+ In the following example the remove operation is
illustrated.
</t>
- <t>IBF with encoded elements:</t>
+ <t>IBF with two encoded elements:</t>
<figure anchor="figure_ibf_remove_0">
<artwork name="" type="" align="left" alt=""><![CDATA[
bucket-0 bucket-1 bucket-2 bucket-3
@@ -572,7 +575,7 @@ hashSum | 0x0101 | 0x4343 | 0x0000 |
0x4242 |
+-------------+-------------+-------------+-------------+
]]></artwork>
</figure>
- <t>Remove element [1100] with ID 0x0304 and hash 0x0101
from the IBF:</t>
+ <t>After removal of element with ID 0x0304 and hash 0x0101
mapped to {0,1} from the IBF:</t>
<figure anchor="figure_ibf_remove_1">
<artwork name="" type="" align="left" alt=""><![CDATA[
bucket-0 bucket-1 bucket-2 bucket-3
@@ -599,9 +602,9 @@ hashSum | 0x0000 | 0x4242 | 0x0000 |
0x4242 |
</t>
<t>
Buckets in an IBF with a counter of 1 or -1 are crucial
for decoding an IBF, as
- they might represent only a single element, with the
IDSUM being the ID of that element.
- Following Eppstein (CITE), we will call buckets that
only represent a single
- element pure buckets.
+ they MIGHT represent only a single element, with the
IDSUM being the ID of that element.
+ Following Eppstein (FIXME-CITE), we will call buckets
that only represent a single
+ element <em>pure buckets</em>.
Note that due to the possibility of multiple insertion
and removal operations
affecting the same bucket, not all buckets with a
counter of 1 or -1 are
actually pure buckets. Sometimes a counter can be 1 or
-1 because N elements
@@ -611,51 +614,55 @@ hashSum | 0x0000 | 0x4242 | 0x0000 |
0x4242 |
</section>
<section anchor="ibf_operations_decode" numbered="true"
toc="default">
- <name>Decode IBF</name>
+ <name>Extracting elements</name>
<t>
- Decoding an IBF yields the HASH of an element from the
IBF, or failure.
+ Extracting elements from an IBF yields IDs of elements
from the IBF.
+ Elements are extracted from an IBF by repeatedly
performing a
+ decode operation on the IBF.
</t>
<t>
- A decode operation requires a pure bucket, that is a
bucket to which M
- only mapped a single element, to succeed. Thus, if
there is no bucket with
- a counter of 1 or -1, decoding fails. However, as a
counter of 1 or -1 is
- not a guarantee that the bucket is pure, there is also
a chance that the
- decoder returns an IDSUM value that is actually the
XOR of several IDSUMs.
- This is primarily detected by checking that the
HASHSUM is the hash of the IDSUM.
- Only if the HASHSUM also matches, the bucket could be
pure. Additionally,
- one MUST check that the IDSUM value actually would be
mapped by M to
- the respective bucket. If not, there was a hash
collision.
+ A decode operation requires a pure bucket, that is a
bucket to which M
+ only mapped a single element, to succeed. Thus, if
there is no bucket with
+ a counter of 1 or -1, decoding fails. However, as a
counter of 1 or -1 is
+ not a guarantee that the bucket is pure, there is also a
chance that the
+ decoder returns an IDSUM value that is actually the XOR
of several IDSUMs.
+ This is primarily detected by checking that the HASHSUM
is the hash of the IDSUM.
+ Only if the HASHSUM also matches, the bucket could be
pure. Additionally,
+ one MUST check that the IDSUM value actually would be
mapped by M to
+ the respective bucket. If not, there was a hash
collision and the bucket
+ is also not pure.
</t>
<t>
- The very rare case that after all these checks a
bucket is still
- falsely identified as pure MUST be detected (say by
determining that
- extracted element IDs do not match any actual
elements), and addressed
- at a higher level in the protocol. As these failures
are probabilistic
- and depend on element IDs and the IBF construction,
they can typically
- be avoided by retrying with different parameters, such
as a different
- way to assign element IDs to elements, using a larger
value for L, or
- a different mapping function M.
- A more common scenario (especially if L was too small)
is that
- IBF decoding fails because there is no pure bucket. In
this case, the
- higher-level protocol SHOULD also retry using
different parameters.
+ The very rare case that after all these checks a bucket
is still
+ falsely identified as pure MUST be detected (say by
determining that
+ extracted element IDs do not match any actual elements),
and addressed
+ at a higher level in the protocol. As these failures are
probabilistic
+ and depend on element IDs and the IBF construction, they
can typically
+ be avoided by retrying with different parameters, such
as a different
+ way to assign element IDs to elements (by varying the
IBF-salt),
+ using a larger value for L, or a different mapping
function M.
+ A more common scenario (especially if L was too small)
is that
+ IBF decoding fails because there is no pure bucket. In
this case, the
+ higher-level protocol generally MUST also retry using
different
+ parameters (except if an attack is detected).
</t>
<t>
- Suppose the IBF contains a pure bucket. In this case,
the IDSUM in the
- bucket identifies a single element. Furthermore, it
is then possible
- to remove that element from the IBF (by inserting it
if the counter
- was negative, and by removing it if the counter was
positive). This
- is likely to cause other buckets to become pure,
allowing further
- elements to be decoded. Eventually, decoding ought to
succeed with
- all counters and IDSUM and HASHSUM values reach zero.
However, it is also
- possible that an IBF only partly decodes and then
decoding fails after
- obtaining some elements.
+ Suppose the IBF contains a pure bucket. In this case,
the IDSUM in the
+ bucket is the ID of an element. Furthermore, it is then
possible
+ to remove that element from the IBF (by inserting it if
the counter
+ was negative, and by removing it if the counter was
positive). This
+ is likely to cause other buckets to become pure,
allowing further
+ elements to be decoded. Eventually, decoding ought to
finish with
+ all counters and IDSUM and HASHSUM values reach zero.
However, it is also
+ possible that an IBF only partly decodes and then
decoding fails due
+ to the lack of pure buckets after extracting some
element IDs.
</t>
<t>
- In the following example the successful decoding of an
IBF containing
- the two elements previously added in our running
example.
+ In the following example the successful decoding of an
IBF containing
+ the two elements previously added in our running example.
</t>
<t>
- IBF with the two encoded elements:
+ We begin with an IBF with two elements added:
</t>
<figure anchor="figure_ibf_decode_0">
<artwork name="" type="" align="left" alt=""><![CDATA[
@@ -670,8 +677,11 @@ hashSum | 0x0101 | 0x4343 | 0x0000 |
0x4242 |
]]></artwork>
</figure>
<t>
- In the IBF are two pure buckets to decode (bit-1 and
bit-4) we choose to start with decoding bucket 1,
- we decode the element with the hash 1010 and we see
that there is a new pure bucket created (bit-2)
+ In the IBF are two pure buckets to decode (buckets 0 and
3) we choose to start with
+ decoding bucket 0. This yields the element with the hash
ID 0x0304 and
+ hash 1010. This element ID is mapped to buckets
+ {0,1}.
+ Subtracting this element results in bucket 1 becoming
pure:
</t>
<figure anchor="figure_ibf_decode_1">
<artwork name="" type="" align="left" alt=""><![CDATA[
@@ -686,9 +696,10 @@ hashSum | 0x0000 | 0x4242 | 0x0000 |
0x4242 |
]]></artwork>
</figure>
<t>
- In the IBF only pure buckets are left, we choose to
continue decoding bucket 2 and decode element
- with the hash 0x4242. Now the IBF is empty (all
buckets have count 0) that means the IBF has been successfully
- decoded.
+ We can now decoding bucket 2 and extract the element
+ with the ID 0x0102 and hash 0x4242. Now the IBF is
+ empty. Extraction completes with the status that
+ the IBF has been successfully decoded.
</t>
<figure anchor="figure_ibf_decode_2">
<artwork name="" type="" align="left" alt=""><![CDATA[
@@ -710,10 +721,12 @@ hashSum | 0x0000 | 0x0000 | 0x0000 |
0x0000 |
Given addition and removal as defined above, it is
possible to define an operation on IBFs
that computes an IBF representing the set difference.
Suppose IBF1 represents set A, and
IBF2 represents set B. Then this set difference
operation will compute IBF3 which
- represents the set A - B --- without having to
transfer the elements from set A or B.
+ represents the set A - B. Note that this computation
can be done only on the IBFs,
+ and does not require access to the elements from set A
or B.
To calculate the IBF representing this set difference,
both IBFs MUST have the same
- length L, the same number of buckets per element k and
use the same map M. Given this,
+ length L, the same number of buckets per element k and
use the same map M.
+ Naturally, all IDs must have been computed using the
same IBF-salt. Given this,
one can compute the IBF representing the set
difference by taking the XOR of the IDSUM and HASHSUM values
of the respective buckets and subtracting the
respective counters. Care MUST be taken
to handle overflows and underflows by setting the
counter to "infinity" as necessary.
@@ -729,7 +742,8 @@ hashSum | 0x0000 | 0x0000 | 0x0000 |
0x0000 |
To demonstrate the set difference operation we compare
IBF-A with IBF-B and generate as described
IBF-AB
</t>
- <t>IBF-A containing elements with hashes 0x0101 and
0x4242:</t>
+ <t>IBF-A contains the elements with ID 0x0304 and hash
0x0101 mapped to {0,1},
+ and ID 0x0102 and hash 0x4242 mapped to {1,3}:</t>
<figure anchor="figure_ibf_setdiff_A">
<artwork name="" type="" align="left" alt=""><![CDATA[
bucket-0 bucket-1 bucket-2 bucket-3
@@ -742,36 +756,38 @@ hashSum | 0x0101 | 0x4343 | 0x0000 |
0x4242 |
+-------------+-------------+-------------+-------------+
]]></artwork>
</figure>
- <t>IBF-B containing elements with hashes 0x4242 and
0x5050</t>
+ <t>IBF-B also contains the element with ID 0x0102 and
+ and another element with ID 0x1345 and hash 0x5050
+ mapped to {1,2}.</t>
<figure anchor="figure_ibf_setdiff_B">
<artwork name="" type="" align="left" alt=""><![CDATA[
bucket-0 bucket-1 bucket-2 bucket-3
+-------------+-------------+-------------+-------------+
count | 0 | 1 | 1 | 1 |
+-------------+-------------+-------------+-------------+
- idSum | 0x0000 | 0x0102 | 0x1345 | 0x0102 |
+ idSum | 0x0000 | 0x1447 | 0x1345 | 0x0102 |
+-------------+-------------+-------------+-------------+
-hashSum | 0x0000 | 0x4242 | 0x5050 | 0x4242 |
+hashSum | 0x0000 | 0x9292 | 0x5050 | 0x4242 |
+-------------+-------------+-------------+-------------+
]]></artwork>
</figure>
- <t>IBF-AB XOR value and subtract count:</t>
+ <t>IBF-A minus IBF-B is then:</t>
<figure anchor="figure_ibf_setdiff_AB">
<artwork name="" type="" align="left" alt=""><![CDATA[
bucket-0 bucket-1 bucket-2 bucket-3
+-------------+-------------+-------------+-------------+
- count | 1 | 1 | -1 | 0 |
+ count | 1 | 0 | -1 | 0 |
+-------------+-------------+-------------+-------------+
- idSum | 0x0304 | 0x0304 | 0x1345 | 0x0000 |
+ idSum | 0x0304 | 0x1049 | 0x1345 | 0x0000 |
+-------------+-------------+-------------+-------------+
-hashSum | 0x0101 | 0x0101 | 0x5050 | 0x0000 |
+hashSum | 0x0101 | 0x5151 | 0x5050 | 0x0000 |
+-------------+-------------+-------------+-------------+
]]></artwork>
</figure>
<t>After calculating and decoding the IBF-AB shows clear that
in IBF-A the element with the hash 0x5050
- is missing (-1 in bit-3) while in IBF-B the element with
the hash 0101 is missing
- (1 in bit-1 and bit-2). The element with hash 0x4242 is
present in IBF-A and IBF-B and is
- removed by the set difference operation (bit-4).
+ is missing (-1 in bucket 2) while in IBF-B the element
with the hash 0101 is missing
+ (1 in bucket 0). The element with hash 0x4242 is present
in IBF-A and IBF-B and is
+ removed by the set difference operation. Bucket 2 is not
empty.
</t>
</section>
</section>
@@ -779,12 +795,18 @@ hashSum | 0x0101 | 0x0101 | 0x5050 |
0x0000 |
<section anchor="ibf_format" numbered="true" toc="default">
<name>Wire format</name>
<t>
- The counter size transmitted over the wire
- varies between 1 and 64 bit, depending on the
- maximum counter in the IBF. This variable counter
- should cover most areas of application.
+ For the counter field, we use a variable-size encoding to
ensure
+ that even for very large sets the counter should never
reach
+ "infinity", while also ensuring that the encoding is
compact for
+ small sets.
+ Hence, the counter size transmitted over the wire
+ varies between 1 and 64 bits, depending on the
+ maximum counter in the IBF. A range of 1 to 64 bits
+ should cover most areas of application and can be
+ efficiently implemented on most contemporary CPU
+ architectures and programming languages.
The bit length for the transmitted IBF
- is defined in the header of the
+ will be communicated in the header of the
<em><xref target="messages_ibf" format="title" /></em>
message
in the "IMCS" field as unsigned 8-bit integer.
For implementation details see section <xref
target="performance_counter_variable_size" format="title" />.
@@ -806,18 +828,14 @@ hashSum | 0x0101 | 0x0101 | 0x5050 |
0x0000 |
For the "HASHSUM", we always use a 32-bit
representation. Here, it is most important to
avoid collisions, where different elements are
- mapped to the same hash. However, we note that
- by design only a few elements (certainly less than
- 127) should ever be mapped
- to the same bucket, a small number of bits
- should suffice. Furthermore, our protocol is designed
- to handle occasional collisions, so while with
- 32-bits there remains a chance of
- accidental collisions, at 32 bit the chance is
- generally believed to be sufficiently small
- for the protocol to handle those cases efficiently
- for a wide range of use-cases. Smaller hash
- values would safe bandwidth, but also drastically
+ mapped to the same hash, possibly resulting in
+ a bucket being falsely classified as pure.
+ While with 32 bits there remains a non-negligible chance of
+ accidental collisions, our protocol is designed
+ to handle occasional collisions. Hence, at 32 bit the
chance is
+ believed to be sufficiently small enough
+ for the protocol to handle those cases efficiently.
Smaller hash
+ values would safe bandwidth, but also substantially
increase the chance of collisions. 32 bits are
also again a reasonable size for many CPU
architectures.
@@ -833,7 +851,7 @@ hashSum | 0x0101 | 0x0101 | 0x5050 |
0x0000 |
a good value for L.
</t>
<t>
- Basically a Strata Estimator (SE) is a series of IBFs
(with a rather small value of L)
+ Basically a Strata Estimator (SE) is a series of IBFs
(with a rather small value of L=79)
in which increasingly large subsets of the full set
of elements are added to each IBF. For the n-th IBF, the
function selecting the
subset of elements MUST sample to select
(probabilistically) 1/(2^n) of all
@@ -843,18 +861,19 @@ hashSum | 0x0101 | 0x0101 | 0x5050 |
0x0000 |
IBF being statistically expected to contain 1/(2^n)
elements.
</t>
<t>
- Given two SEs, the set size difference can be estimated by
trying to decode all of the
- IBFs. Given that L was set to a rather small value, IBFs
containing large strata
- will likely fail to decode. For those IBFs that failed to
decode, one simply
+ Given two SEs, the set size difference can be estimated by
attempting to decode all of the
+ IBFs. Given that L is set to a fixed and rather small
value, IBFs containing large strata
+ will likely fail to decode. For IBFs that failed to
decode, one simply
extrapolates the number of elements by scaling the numbers
obtained from the
other IBFs that did decode. If none of the IBFs of the SE
decoded (which given
- a reasonable choice of L should be highly unlikely), one
can retry using a different
- mapping function M.
+ a reasonable number of IBFs in the SE should be highly
unlikely), one can theoretically
+ retry using a different IBF-salt.
</t>
<t>
- In addition, when decoding the IBFs in the strata
estimator, it is possible to determine
+ When decoding the IBFs in the strata estimator, it is
possible to determine
on which side which part of the difference is. For this
purpose, the pure buckets with
- counter 1 and -1 must be distinguished and assigned to the
respective side when decoding the IBFs.
+ counter 1 and -1 must be distinguished and assigned to the
respective side when decoding
+ the IBFs.
</t>
</section>
--
To stop receiving notification emails like this one, please contact
gnunet@gnunet.org.
[Prev in Thread] |
Current Thread |
[Next in Thread] |
- [lsd0003] branch master updated: chapter 4,
gnunet <=