RE: Trustworthiness of build farms (was Re: CDN performance)

guix-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: Trustworthiness of build farms (was Re: CDN performance)

From:	Jeremiah
Subject:	RE: Trustworthiness of build farms (was Re: CDN performance)
Date:	Sat, 22 Dec 2018 17:23:00 +0000

> I agree that a mathematical proof is what we should be aiming for, and
> the only kind of proof that I could trust in, in this scenario.

A formal proof would be just one piece used in building layers of trust,
each of them indpendent and reinforcing of each other like layers of
kevlar in a bullet proof vest; thus even if some of the layers break,
the bullet doesn't get in to do damage.

> However, it's important to note several caveats:
Of course, we always miss things.

> * A mathematical proof showing that the binary conforms to the source
>   requires a proof of correctness of the language implementation, which
>     in turn requires formal semantics for both the source language and
>   the
>    underlying machine code.  As far as I know, the current
>       bootstrappable.org effort does not include anything like this.
>         Existing projects that provide this include CakeML and Jitawa.
Well hex2's entire language is:
# line comment - ignore all text until lf
; line comment - ignore all text until lf
[0-F, 0-F] - The 2 chars are nibbles combined into a byte (Big Endian)
:label - The absolute address of the label used by pointers
!label{>label} - the 8bit relative address to label
@label{>label} - the 16bit relative address to label
$label - the 16bit absolute address of label
%label{>label} - the 32bit relative address to label
&label - the 32bit absolute address of label

the ?label>label form changes the displacement from current IP to the IP
of the label (aka calculates the length between labels) and outputs the
value in the format inidicated by the leading char (address@hidden)
With the explicit requirement of specifying the architecture (which
specifies the format and displacement calculation) and the base address
(aka what address does the IP start at)

The M1-macro language is even smaller:
# line comment - ignore all text until lf
; line comment - ignore all text until lf
DEFINE STRING1 STRING2 - Replace all STRING1 with STRING2
!number - the 8bit value of number in hex form
@number - the 16bit value of number in hex form
%number - the 32bit value of number in hex form
{:address@hidden&}string - do not change and just output
"RAW STRING" - Convert to Hex (Big Endian)
'HEX' - Do not change and just output

So the specification for correctness is defined by the languages
themselves but you are correct in that someone has not picked up the
task of making a formally verified version of the tools yet.


> * One must assume that the hardware behaves according to its formal
>   specification.  The veracity of this assumption is not something we
>     can currently verify, and even if we could, it would be invalidated
>   if
>     someone gained physical access to the machine and modified it.
Actually no, I am expecting arbitrary hardware platforms to cross verify
each other and thus provided one system is uncompromised or trusted, the
rest can be validated.


> * The hardware initialization program (e.g. coreboot) and the kernel
>   used to perform the bootstrap must still be trusted, and unless I'm
>     mistaken, the bootstrappable.org effort does not currently address
>       these issues.
Stage0 assumes no operating system nor firmware and runs on bare metal
(literally TTL when done). Mescc-tools is just a port of the
specification to Linux to allow developers to leverage it to simplify
the later task of finishing.


> * The software running on the substitute servers could be compromised by
>   stealing SSH keys from someone with root access.
Yes that is true but I don't want you to trust the substitute
servers. They exist for the purpose of convience not security


> * If the private signing key of the substitute server can be stolen,
>   e.g. by gaining physical access to the machine for a short time, then
>     a man-in-the-middle can deliver to users compromised binaries that
>       appear to come from the substitute server itself.
I believe that is a duplicate of the previous point but yes you are
correct that is a risk of having a single point of trust. Which is why
stage0 by design can not have a single point of trust because the
bootstrap steps can be done on arbitrary hardware and rewritten from
scratch with minimal effort. Thus one must need only review the source
code (which focuses on clarity and understandability and any issues with
it is a bug that needs to be reported).


> * Not only the substitute server, but also all of its build slaves, must
>   be trusted as well.
Assume all machines could be compromised, then design assuming that any
of them can be compromised at any time but not all of them all of the
time. Then you have the solution to trust.


> So, while I certainly agree that it will be a *major* improvement to
> avoid the need to blindly trust precompiled bootstrap binaries
In terms of size that is true but in number of people who understand the
seed pieces, that unfortunately has reduced and that issue needs to be
addressed.

> we should be clear that the current efforts fall far short of a proof,
Absolutely agree at this port, we only have the specification for which
the proof must be written and someone will need to write a formally
verified version to verify that our roots of truth which are
approximations of the specification limited by the constraints of time
and human effort have not missed a subtle error.

> and there still remain several valid reasons not to place one's trust in
> substitute servers.
Personally I believe in terms of security and trust in absolute senses
of the words, you are beyound a shadow of a doubt correct but the
substitutes have never been about trust or security but convience which
many guix users want. But that doesn't mean that there are not things we
should start doing to make the task of compromising that trust harder.

For example:
Cryptographic port knocking for administration of servers requiring
access to a physically seperate hard token.

Cryptographically signed software white-listing on servers with the
signing keys being on a seperate system that is offline and the binaries
being signed being built from source on machines that have no internal
state and running only software explicitly specified and required for
the build.

Random system spot checks and wipes.

In short anything that makes single point compromises worthless needs to
be actively considered.

> Does that make sense?
Yes and hopefully my perspective makes sense as well.

-Jeremiah

[Prev in Thread]

Current Thread

[Next in Thread]

RE: Trustworthiness of build farms (was Re: CDN performance), Jeremiah <=
- Re: Trustworthiness of build farms (was Re: CDN performance), Mark H Weaver, 2018/12/23
- Re: Trustworthiness of build farms (was Re: CDN performance), Jeremiah, 2018/12/25
  - Re: Trustworthiness of build farms (was Re: CDN performance), Mark H Weaver, 2018/12/28

Prev by Date: Re: New Gash build Bash without Bash, Coreutils, and a few others
Next by Date: Re: New Gash build Bash without Bash, Coreutils, and a few others
Previous by thread: warning: SQLite database is busy
Next by thread: Re: Trustworthiness of build farms (was Re: CDN performance)
Index(es):
- Date
- Thread