[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[lsd0003] branch master updated: chapter 2
From: |
gnunet |
Subject: |
[lsd0003] branch master updated: chapter 2 |
Date: |
Sat, 12 Jun 2021 17:22:43 +0200 |
This is an automated email from the git hooks/post-receive script.
grothoff pushed a commit to branch master
in repository lsd0003.
The following commit(s) were added to refs/heads/master by this push:
new 2e15bfd chapter 2
2e15bfd is described below
commit 2e15bfdc45ea02d516126f2947e8d3ffb409d084
Author: Christian Grothoff <christian@grothoff.org>
AuthorDate: Sat Jun 12 17:19:59 2021 +0200
chapter 2
---
draft-summermatter-set-union.xml | 114 +++++++++++++++++++++------------------
1 file changed, 63 insertions(+), 51 deletions(-)
diff --git a/draft-summermatter-set-union.xml b/draft-summermatter-set-union.xml
index 66fdfab..87793e6 100644
--- a/draft-summermatter-set-union.xml
+++ b/draft-summermatter-set-union.xml
@@ -174,16 +174,17 @@
<section anchor="background" numbered="true" toc="default">
<name>Background</name>
<section anchor="bf" numbered="true" toc="default">
- <name>Bloom Filters</name>
+ <name>Bloom Filter</name>
<t>
- A Bloom filter (BF) is a space-efficient datastructure to
test if an element is part of a set of elements.
- Elements are identified by an element ID.
- Since a BF is a probabilistic datastructure, it is
possible to have false-positives: when asked
- if an element is in the set, the answer from a BF is
either "no" or "maybe".
+ A Bloom filter (BF) is a space-efficient probabilistic
+ datastructure to test if an element is part of a set of
elements.
+ Elements are identified by an element ID.
+ Since a BF is a probabilistic datastructure, it is possible
to have false-positives: when asked
+ if an element is in the set, the answer from a BF is either
"no" or "maybe".
</t>
<t>
A BF consists of L buckets. Every bucket is a binary value
that can be either 0 or 1. All buckets are initialized
- to 0. A mapping function M is used to map each ID of each
element from the set to a subset of k buckets. M is non-injective
+ to 0. A mapping function M is used to map each ID of each
element from the set to a subset of k buckets. In the original proposal by
Bloom, M is non-injective
and can thus map the same element multiple times to the
same bucket.
The type of the mapping function can thus be described by
the following mathematical notation:
</t>
@@ -211,13 +212,11 @@
To check if an element may be in the set, one tests if all
buckets under the map M are set to 1.
</t>
<t>
- Further in this document a bitstream output by the mapping
function is represented by
- a set of numeric values for example (0101) = (2,4).
In the BF the buckets are set to 1 if the corresponding
bit in the bitstream is 1.
- If there is a collision and a bucket is already set to 1,
the bucket stays 1.
+ If there is a collision and a bucket is already set to 1,
the bucket stays at 1.
</t>
<t>
- In the following example the element M(element) = (1,3)
has been added:
+ In the following example the element e0 with M(e0) = {1,3}
has been added:
</t>
<figure anchor="figure_bf_insert_0">
<artwork name="" type="" align="left" alt=""><![CDATA[
@@ -228,8 +227,10 @@
]]></artwork>
</figure>
<t>
- It is easy to see that the M(element) = (0,3) could be in
the BF below and M(element) = (0,2) cannot be
- in the BF below:
+ It is easy to see that an element e1 with M(e1) = {0,3}
+ could have been added to the BF below, while an element e2
+ with M(e2) = {0,2} cannot be in the set represented by the
+ BF below:
</t>
<figure anchor="figure_bf_contains">
@@ -255,14 +256,18 @@
<section anchor="cbf" numbered="true" toc="default">
<name>Counting Bloom Filter</name>
<t>
- A Counting Bloom Filter (CBF) is an extension of the <xref
target="bf" format="title" />. In the CBF, buckets are
- unsigned numbers instead of binary values. This allows the
removal of an element from the CBF.
+ A Counting Bloom Filter (CBF) is a variation on the idea
+ of a <xref target="bf" format="title" />. With a CBF,
buckets are
+ unsigned numbers instead of binary values.
+ This allows the removal of an element from the CBF.
</t>
<t>
- Adding an element to the CBF is similar to the adding
operation of the BF. However, instead of setting the bucket on hit to 1 the
- numeric value stored in the bucket is increased by 1. For
example if two colliding elements M(element1) = (1,3) and
- M(element2) = (0,3) are added to the CBF, bucket 0 and 1
are set to 1 and bucket 3 (the colliding bucket) is set
- to 2:
+ Adding an element to the CBF is similar to the adding
operation of the BF.
+ However, instead of setting the buckets to 1 the
+ numeric value stored in the bucket is increased by 1.
+ For example, if two colliding elements M(e1) = {1,3} and
+ M(e2) = {0,3} are added to the CBF, bucket 0 and 1 are set
+ to 1 and bucket 3 (the colliding bucket) is set to 2:
</t>
<figure anchor="figure_cbf_insert_0">
<artwork name="" type="" align="left" alt=""><![CDATA[
@@ -273,13 +278,15 @@
]]></artwork>
</figure>
<t>
- The counter stored in the bucket is also called the order
of the bucket.
+ The counter stored in the bucket is also called the order of
the bucket.
</t>
<t>
- To remove an element form the CBF the counters of all
buckets the element is mapped to are decreased by 1.
+ To remove an element form the CBF the counters of all buckets
+ the element is mapped to are decreased by 1.
</t>
<t>
- Removing M(element2) = (1,3) from the CBF above:
+ For example, removing M(e2) = {1,3} from the CBF above
+ results in:
</t>
<figure anchor="figure_cbf_remove_0">
<artwork name="" type="" align="left" alt=""><![CDATA[
@@ -290,15 +297,19 @@
]]></artwork>
</figure>
<t>
- In practice, the number of bits available for the counters
is usually finite. For example, given a 4-bit
- counter, a CBF bucket would overflow 16 elements are mapped
to the same bucket. To efficiently
- handle this case, the maximum value (15 in our example) is
considered to represent "infinity". Once the
+ In practice, the number of bits available for the counters
+ is often finite. For example, given a 4-bit
+ counter, a CBF bucket would overflow 16 elements are mapped
+ to the same bucket. To handle this case, the maximum value
+ (15 in our example) is considered to represent "infinity".
Once the
order of a bucket reaches "infinity", it is no longer
incremented or decremented.
</t>
<t>
- The parameters L and k and the number of bits allocated to
the counters depend on the set size.
- An IBF will degenerate when subjected to insert and remove
iterations of different elements, and eventually all
- buckets will reach "infinity". The speed of the degradation
will depend on the choice of L and k in
+ The parameters L and k and the number of bits allocated to
the counters
+ SHOULD depend on the set size.
+ A CBF will degenerate when subjected to insert and remove
iterations of
+ different elements, and eventually all buckets will reach
"infinity".
+ The speed of the degradation will depend on the choice of L
and k in
relation to the number of elements stored in the IBF.
</t>
</section>
@@ -309,34 +320,34 @@
<t>
An Invertible Bloom Filter (IBF) is a further extension of the
<xref target="cbf" format="title" />.
An IBF extends the <xref target="cbf" format="title" /> with
two more operations:
- decode and set difference. This two extra operations are
useful to efficiently extract
+ decode and set difference. This two extra operations are key
to efficiently obtain
small differences between large sets.
</t>
<section anchor="ibf_structure" numbered="true" toc="default">
<name>Structure</name>
<t>
- An IBF consists of a mapping function M and
- L buckets that each store a signed
- counter and an XHASH. An XHASH is the XOR of various
- hash values. As before, the
- values used for k, L and the number of bits used
- for the signed counter and the XHASH depend
- on the set size and various other trade-offs,
- including the CPU architecture.
- </t>
- <t>
- If the IBF size is too small or the mapping
- function does not spread out the elements
- uniformly, the signed counter can overflow or
- underflow. As with the CBF, the "maximum" value is
- thus used to represent "infinite". As there is no
- need to distinguish between overflow and
- underflow, the most canonical representation of
- "infinite" would be the minimum value of the
- counter in the canonical 2-complement
- interpretation. For example, given a 4-bit
- counter a value of -8 would be used to represent
- "infinity".
+ An IBF consists of an injective mapping function M mapping
+ elements to k out of L buckets. Each of the L buckets stores
+ a signed COUNTER, an IDSUM and an XHASH.
+ An IDSUM is the XOR of various element IDs.
+ An XHASH is the XOR of various hash values.
+ As before, the values used for k, L and the number of bits
used
+ for the signed counter and the XHASH depend
+ on the set size and various other trade-offs.
+ </t>
+ <t>
+ If the IBF size is too small or the mapping
+ function does not spread out the elements
+ uniformly, the signed counter can overflow or
+ underflow. As with the CBF, the "maximum" value is
+ thus used to represent "infinite". As there is no
+ need to distinguish between overflow and
+ underflow, the most canonical representation of
+ "infinite" would be the minimum value of the
+ counter in the canonical 2-complement
+ interpretation. For example, given a 4-bit
+ counter a value of -8 would be used to represent
+ "infinity".
</t>
<figure anchor="figure_ibf_structure">
<artwork name="" type="" align="left" alt=""><![CDATA[
@@ -737,7 +748,8 @@ FUNCTION id_calculation (element,ibf_salt):
<section anchor="ibf_format_bucket_identification"
numbered="true" toc="default">
<name>Mapping Function</name>
<t>
- The mapping function M as described above in the
figure <xref target="bf_mapping_function_math" format="default" />
+ For an IBF, it is beneficial to use an injective mapping
function M.
+ The mapping function M as described above in the figure
<xref target="bf_mapping_function_math" format="default" />
decides in which buckets the ID and HASH have to be
binary XORed to. In practice
the following algorithm is used:
</t>
--
To stop receiving notification emails like this one, please contact
gnunet@gnunet.org.
[Prev in Thread] |
Current Thread |
[Next in Thread] |
- [lsd0003] branch master updated: chapter 2,
gnunet <=