# Tag Archives: essays

## Information Theory

I have just uploaded the beginnings of an essay on information theory to my website. You can see the essay (in its currently incomplete state) here. Information theory was first described by Claude Shannon in his groundbreaking 1948 paper, A Mathematical Theory of Communication. What is particularly surprising about Shannon’s truly remarkable paper is its completeness. Not only does Shannon suggest a mathematical model for (digital) communication and information, but he produces a huge array of fundamental results for his model. It is exceptionally rare that such a complete theory comes into being so fully formed, like the birth of Athena. A decade ago, UCSD produced documentary about Shannon’s life and work, which I have posted below.

## Estimating the Second Frequency Moment


I just uploaded an updated version of an essay, Estimating the Second Frequency Moment. The frequency moment problem was addressed in the seminal paper of Alon, Matias and Szegedy. In my essay, I give a couple of results from that paper and give indications of how the results can be generalized.

Suppose you are shown a stream of numbers $$s_1, s_2, \ldots, s_n$$ where each $$s_j \in \{1,2, \ldots, m\}$$ for $$j = 1, 2, \ldots, n$$. If $$f_i$$ denotes the number of times $$i \in \{1, 2, \ldots, m\}$$ (that is the frequency of $$i$$ in the stream), then  we define

$$F_2 = \sum_{i = 1}^m f_i^2 .$$

To compute $$F_2$$ naively, we can maintain counters for $$f_1, f_2, \ldots, f_m$$, but this approach requires a lot of memory as $$m$$ may be large. Instead, we can estimate $$F_2$$ by defining a random variable that can be computed from the stream whose expected value is $$F_2$$. By maintaining several independent versions of this random variable, we can obtain a good approximation of $$F_2$$ using an amount of memory that is only logarithmic in $$m$$ and $$n$$.