Tag Archives: probabilistic method

computer science expository math

Shorter Certificates for Set Disjointness


Suppose two players, Alice and Bob, each hold equal sized subsets of \([n] = \{1, 2, \ldots, n\}\). A third party, Carole, wishes to convince Alice and Bob that their subsets are disjoint, i.e., their sets have no elements in common. How efficiently can Carole prove to Alice and Bob that their sets are disjoint?

To formalize the situation, suppose \(k\) is an integer with \(1 < k < n/2\). Alice holds a set \(A \subseteq [n]\) with \(|A| = k\) and similarly Bob holds \(B \subseteq [n]\) with \(|B| = k\). We assume that Carole sees the sets \(A\) and \(B\), but Alice and Bob have no information about each others' sets. We allow Carole to send messages to Alice and Bob, and we allow Alice and Bob to communicate with each other. Our goal is to minimize the total amount of communication between Carole, Alice, and Bob. In particular, we will consider communication protocols of the following form:

  1. Carole produces a certificate or proof that \(A \cap B = \emptyset\) and sends this certificate to Alice and Bob.
  2. Alice and Bob individually verify that Carole’s certificate is valid with respect to their individual inputs. Alice and Bob then send each other (very short) messages saying whether or not they accept Carole’s proof. If they both accept the proof, they can be certain that their sets are disjoint
  3. In order for the proof system described above to be valid, it must satisfy the following two properties:

    Completeness If \(A \cap B = \emptyset\) then Carole can send a certificate that Alice and Bob both accept.

    Soundness If \(A \cap B \neq \emptyset\) then any certificate that Carole sends must be rejected by at least one of Alice and Bob.

    Before giving a “clever” solution to this communication problem, we describe a naive solution. Since Carole sees \(A\) and \(B\), her proof of their disjointness could simply be to send Alice and Bob the (disjoint) pair \(C = (A, B)\). Then Alice verifies the validity of the certificate \(C\) by checking that her input \(A\) is equal to the first term in \(C\), and similarly Bob checks that \(B\) is equal to the second term. Clearly, if \(A\) and \(B\) are disjoint, Alice and Bob will both accept \(C\), while if \(A\) and \(B\) intersect, Alice or Bob will reject every certificate that Carole could send.

    Let us quickly analyze the efficiency of this protocol. The certificate that Carole sends consists of a pair of \(k\)-subsets of \([n]\). The naive encoding of simply listing the elements of \(A\) and \(B\) requires \(2 k \log n\) bits — each \(i \in A \cup B\) requires \(\log n\) bits, and there are \(2 k\) such indices in the list. In fact, even if Carole is much cleverer in her encoding of the sets \(A\) and \(B\), she cannot compress the proof significantly for information theoretic reasons. Indeed, there are
    $$
    {n \choose k}{n-k \choose k}
    $$
    distinct certificates that Carole must be able to send, hence her message must be of length at least
    $$
    \log {n \choose k} \geq \log ((n/k)^k) = k \log n – k \log k.
    $$
    Is it possible for Carole, Alice, and Bob to devise a more efficient proof system for disjointness?

    read more »