Recall that a context tree source is similar to a Markov source, but with a greatly reduced number of states. Let $T$ be the set of leaves of a context tree source; then the redundancy is

$$\frac{\left|T\right|}{2}\log\left(\frac{n}{\left|T\right|}\right),$$
where $\left|T\right|$ is the number of leaves, and we have $\log\left(\frac{n}{\left|T\right|}\right)$ instead of $\log\left(n\right)$, because each state generates $\frac{n}{\left|T\right|}$ symbols, on average. In contrast, the redundancy for a Markov representation of the tree source $T$ is much larger, because a depth-$D$ Markov model requires a full complement of $|\alpha{|}^{D}$ states rather than $\left|T\right|$. Therefore, tree sources are greatly preferable in practice: they offer a significant reduction in redundancy.
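To make the comparison concrete, the two redundancies can be evaluated numerically. The sketch below is illustrative only: the sequence length, leaf count, and Markov order are hypothetical values chosen by us, and the per-state cost of $\frac{1}{2}\log$ of the block length follows the Krichevsky-Trofimov redundancy used throughout this discussion.

```python
import math

# Illustrative redundancy comparison (all numbers are hypothetical):
# a context tree source with |T| leaves costs roughly (|T|/2) * log2(n/|T|)
# bits, while a full depth-D Markov model costs roughly
# (2**D / 2) * log2(n / 2**D) bits.
n = 1_000_000   # sequence length (assumed)
T_size = 5      # number of leaves in the context tree (assumed)
D = 10          # Markov order needed to capture the same dependencies

tree_redundancy = (T_size / 2) * math.log2(n / T_size)
markov_redundancy = (2 ** D / 2) * math.log2(n / 2 ** D)

print(f"tree:   {tree_redundancy:.1f} bits")
print(f"markov: {markov_redundancy:.1f} bits")
```

Even for moderate $D$, the Markov redundancy dwarfs that of the tree source, which is the point of the comparison above.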
How can we compress universally over the parametric class of tree sources? Suppose that we knew $T$ , that is, we knew the set of leaves. Then we could process $x$ sequentially, where for each ${x}_{i}$ we can determine what state its context is in, that is, the unique suffix of ${x}_{1}^{i-1}$ that belongs to the set of leaf labels in $T$ . Having determined that we are in some state $s$ , $Pr({x}_{i}=0|s,{x}_{1}^{i-1})$ can be computed by examining all previous times that we were in state $s$ and computing the probability with the Krichevsky-Trofimov approach, based on the number of times that the following symbol (after $s$ ) was 0 or 1. In fact, we can store symbol counts ${n}_{x}(s,0)$ and ${n}_{x}(s,1)$ for all $s\in T$ , update them sequentially as we process $x$ , and compute $Pr({x}_{i}=0|s,{x}_{1}^{i-1})$ efficiently. (The actual translation to bits is performed with an arithmetic encoder.)
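The sequential procedure above can be sketched as follows for a binary alphabet. The function names and the padding of the initial context are our own choices, and the code reports the ideal code length $-\log_2 Pr(x_i)$ summed over the sequence; a real coder would feed these probabilities to an arithmetic encoder instead.

```python
import math

def kt_prob(n0, n1, sym):
    """Krichevsky-Trofimov estimate of Pr(next symbol = sym), given
    that the current state has emitted n0 zeros and n1 ones so far."""
    total = n0 + n1 + 1.0
    return (n0 + 0.5) / total if sym == 0 else (n1 + 0.5) / total

def codelength_known_tree(x, T, past="11"):
    """Ideal code length in bits for the bit sequence x when the leaf
    set T of the context tree is known in advance.  `past` pads the
    initial context (an assumed convention)."""
    counts = {s: [0, 0] for s in T}   # per-state symbol counts n_x(s, a)
    history = past
    bits = 0.0
    for sym in x:
        # the state is the unique suffix of the history that is a leaf of T
        state = next(s for s in T if history.endswith(s))
        bits += -math.log2(kt_prob(counts[state][0], counts[state][1], sym))
        counts[state][sym] += 1       # sequential update of the counts
        history += str(sym)
    return bits
```

As a sanity check, with the trivial single-state tree $T=\{\lambda\}$ (the empty context), encoding the sequence 0, 0 costs $-\log_2\left(\frac{1}{2}\cdot\frac{3}{4}\right)\approx 1.415$ bits, the usual KT code length.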
While promising, the approach above requires knowing $T$ . How do we compute the optimal ${T}^{*}$ from the data?
Semi-predictive coding: The semi-predictive approach to encoding for context tree sources [link] is to scan the data twice, where in the first scan we estimate ${T}^{*}$ and in the second scan we encode $x$ using ${T}^{*}$ , as described above. Let us describe a procedure for computing the optimal ${T}^{*}$ among tree sources whose depth is bounded by $D$ . This procedure is visualized in [link] . As suggested above, we count ${n}_{x}(s,a)$ , the number of times that each possible symbol appeared in context $s$ , for all $s\in {\alpha}^{D},a\in \alpha $ . Having computed all the symbol counts, we process the depth- $D$ tree in a bottom-up fashion, from the leaves to the root, where for each internal node $s$ of the tree (that is, $s\in {\alpha}^{d}$ where $d<D$ ), we track ${T}_{s}^{*}$ , the optimal tree structure rooted at $s$ for encoding symbols whose context ends with $s$ , and $\text{MDL}\left(s\right)$ , the minimum description length (MDL) required for encoding these symbols.
Without loss of generality, consider the simple case of a binary alphabet $\alpha =\{0,1\}$ . When processing $s$ we have already computed the symbol counts ${n}_{x}(0s,0)$ , ${n}_{x}(0s,1)$ , ${n}_{x}(1s,0)$ , and ${n}_{x}(1s,1)$ ; the optimal trees ${T}_{0s}^{*}$ and ${T}_{1s}^{*}$ ; and the minimum description lengths $\text{MDL}\left(0s\right)$ and $\text{MDL}\left(1s\right)$ . We have two options for state $s$ .
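The bottom-up computation of ${T}^{*}$ can be sketched as follows for the binary case. This is our own formulation under stated assumptions: the count-gathering helper, the padding of the initial context, and the convention of charging one structure bit per decision node to describe the tree shape are illustrative choices; the cited references may account for the tree description differently.

```python
import math

def kt_codelength(n0, n1):
    """Ideal KT code length in bits for a block with n0 zeros, n1 ones."""
    bits, a, b = 0.0, 0, 0
    for _ in range(n0):
        bits -= math.log2((a + 0.5) / (a + b + 1.0)); a += 1
    for _ in range(n1):
        bits -= math.log2((b + 0.5) / (a + b + 1.0)); b += 1
    return bits

def gather_counts(x, D, past=None):
    """Count n_x(s, a) for every context s of length <= D.  Contexts are
    written most-recent-symbol last, so 0s prepends 0 further in the past."""
    history = past if past is not None else "1" * D  # assumed padding
    counts = {}
    for sym in x:
        for d in range(D + 1):
            s = history[len(history) - d:] if d else ""
            counts.setdefault(s, [0, 0])[sym] += 1
        history += str(sym)
    return counts

def prune(s, counts, D):
    """Return (MDL(s), T_s*): the description length and optimal leaf set
    for symbols whose context ends with s, among trees of depth <= D."""
    n0, n1 = counts.get(s, [0, 0])
    leaf_cost = kt_codelength(n0, n1)
    if len(s) == D:                      # maximal depth: s must be a leaf
        return leaf_cost, [s]
    mdl0, t0 = prune("0" + s, counts, D)  # already computed in bottom-up order
    mdl1, t1 = prune("1" + s, counts, D)
    split_cost = mdl0 + mdl1
    # one structure bit tells the decoder whether s is a leaf or splits
    if leaf_cost <= split_cost:
        return leaf_cost + 1, [s]        # option 1: keep s as a leaf
    return split_cost + 1, t0 + t1       # option 2: split s into 0s and 1s
```

For example, on the alternating sequence $0101\cdots$ the pruning recovers the order-1 structure ${T}^{*}=\{0,1\}$, since splitting the root pays for itself while deeper splits do not.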