Tag Archives: streaming algorithms

computer science expository math

Estimating the Second Frequency Moment

\(\)
I just uploaded an updated version of an essay, Estimating the Second Frequency Moment. The frequency moment problem was addressed in the seminal paper of Alon, Matias and Szegedy. In my essay, I give a couple of results from that paper and give indications of how the results can be generalized.

Suppose you are shown a stream of numbers \(s_1, s_2, \ldots, s_n\) where each \(s_j \in \{1,2, \ldots, m\}\) for \(j = 1, 2, \ldots, n\). If \(f_i\) denotes the number of times \(i \in \{1, 2, \ldots, m\}\) (that is the frequency of \(i\) in the stream), then  we define

$$F_2 = \sum_{i = 1}^m f_i^2 .$$

To compute \(F_2\) naively, we can maintain counters for \(f_1, f_2, \ldots, f_m\), but this approach requires a lot of memory as \(m\) may be large. Instead, we can estimate \(F_2\) by defining a random variable that can be computed from the stream whose expected value is \(F_2\). By maintaining several independent versions of this random variable, we can obtain a good approximation of \(F_2\) using an amount of memory that is only logarithmic in \(m\) and \(n\).