# 3.1 Null space conditions

This module introduces the spark and the null space property, two common conditions related to the null space of a measurement matrix that ensure the success of sparse recovery algorithms. Furthermore, the null space property is shown to be a necessary condition for instance optimal or uniform recovery guarantees.

A natural place to begin in establishing conditions on $\Phi$ in the context of designing a sensing matrix is by considering the null space of $\Phi$ , denoted

$\mathcal{N}\left(\Phi \right)=\left\{z:\Phi z=0\right\}.$

If we wish to be able to recover all sparse signals $x$ from the measurements $\Phi x$ , then it is immediately clear that for any pair of distinct vectors $x,{x}^{\text{'}}\in {\Sigma }_{K}=\left\{x:{∥x∥}_{0}\le K\right\}$ , we must have $\Phi x\ne \Phi {x}^{\text{'}}$ , since otherwise it would be impossible to distinguish $x$ from ${x}^{\text{'}}$ based solely on the measurements $y=\Phi x$ . More formally, by observing that if $\Phi x=\Phi {x}^{\text{'}}$ then $\Phi \left(x-{x}^{\text{'}}\right)=0$ with $x-{x}^{\text{'}}\in {\Sigma }_{2K}$ , we see that $\Phi$ uniquely represents all $x\in {\Sigma }_{K}$ if and only if $\mathcal{N}\left(\Phi \right)$ contains no nonzero vectors in ${\Sigma }_{2K}$ . There are many equivalent ways of characterizing this property; one of the most common is known as the spark [link] .

## The spark

The spark of a given matrix $\Phi$ is the smallest number of columns of $\Phi$ that are linearly dependent.

This definition allows us to pose the following straightforward guarantee.

## (corollary 1 of [link] )

For any vector $y\in {\mathbb{R}}^{M}$ , there exists at most one signal $x\in {\Sigma }_{K}$ such that $y=\Phi x$ if and only if $\mathrm{spark}\left(\Phi \right)>2K$ .

We first assume that, for any $y\in {\mathbb{R}}^{M}$ , there exists at most one signal $x\in {\Sigma }_{K}$ such that $y=\Phi x$ . Now suppose for the sake of a contradiction that $\mathrm{spark}\left(\Phi \right)\le 2K$ . This means that there exists some set of at most $2K$ columns that are linearly dependent, which in turn implies that there exists a nonzero $h\in \mathcal{N}\left(\Phi \right)$ such that $h\in {\Sigma }_{2K}$ . In this case, since $h\in {\Sigma }_{2K}$ we can split its support into two sets of at most $K$ indices each and write $h=x-{x}^{\text{'}}$ , where $x,{x}^{\text{'}}\in {\Sigma }_{K}$ and $x\ne {x}^{\text{'}}$ because $h\ne 0$ . Thus, since $h\in \mathcal{N}\left(\Phi \right)$ we have that $\Phi \left(x-{x}^{\text{'}}\right)=0$ and hence $\Phi x=\Phi {x}^{\text{'}}$ . But this contradicts our assumption that there exists at most one signal $x\in {\Sigma }_{K}$ such that $y=\Phi x$ . Therefore, we must have that $\mathrm{spark}\left(\Phi \right)>2K$ .

Now suppose that $\mathrm{spark}\left(\Phi \right)>2K$ . Assume that for some $y$ there exist $x,{x}^{\text{'}}\in {\Sigma }_{K}$ such that $y=\Phi x=\Phi {x}^{\text{'}}$ . We therefore have that $\Phi \left(x-{x}^{\text{'}}\right)=0$ . Letting $h=x-{x}^{\text{'}}$ , we can write this as $\Phi h=0$ , where $h\in {\Sigma }_{2K}$ because it is the difference of two $K$ -sparse vectors. Since $\mathrm{spark}\left(\Phi \right)>2K$ , all sets of up to $2K$ columns of $\Phi$ are linearly independent, and therefore $h=0$ . This in turn implies $x={x}^{\text{'}}$ , proving the theorem.

It is easy to see that $\mathrm{spark}\left(\Phi \right)\in \left[2,M+1\right]$ . Therefore, [link] yields the requirement $M\ge 2K$ .
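The definition of the spark can be checked directly on small matrices. The following is a minimal brute-force sketch, not an efficient algorithm: it tests every subset of columns, so its cost grows combinatorially with $N$ . The function name `spark`, the tolerance `tol`, and the convention of returning `None` when no dependent subset exists are assumptions made for this illustration.

```python
from itertools import combinations

import numpy as np


def spark(Phi, tol=1e-10):
    """Smallest number of linearly dependent columns of Phi (brute force).

    Returns None if every subset of columns is linearly independent
    (a convention assumed here for illustration).
    """
    Phi = np.asarray(Phi, dtype=float)
    n_cols = Phi.shape[1]
    for k in range(1, n_cols + 1):
        for cols in combinations(range(n_cols), k):
            sub = Phi[:, list(cols)]
            # k columns are dependent exactly when their rank is below k
            if np.linalg.matrix_rank(sub, tol=tol) < k:
                return k
    return None


# M = 2, N = 4: every pair of columns is independent, every triple is dependent
Phi_a = np.array([[1.0, 0.0, 1.0, 1.0],
                  [0.0, 1.0, 1.0, -1.0]])
# The first two columns are parallel, so the spark is 2
Phi_b = np.array([[1.0, 2.0, 0.0],
                  [0.0, 0.0, 1.0]])

spark_a = spark(Phi_a)  # equals M + 1 = 3, so spark > 2K holds for K = 1
spark_b = spark(Phi_b)  # equals 2, so even 1-sparse signals are not unique
```

For `Phi_a` the spark attains the upper bound $M+1$ , which is the best case for a matrix with $M$ rows.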

## The null space property

When dealing with exactly sparse vectors, the spark provides a complete characterization of when sparse recovery is possible. However, when dealing with approximately sparse signals we must introduce somewhat more restrictive conditions on the null space of $\Phi$ [link] . Roughly speaking, we must also ensure that $\mathcal{N}\left(\Phi \right)$ does not contain any vectors that are too compressible in addition to vectors that are sparse.

In order to state the formal definition we define the following notation that will prove to be useful throughout much of this course. Suppose that $\Lambda \subset \left\{1,2,\cdots ,N\right\}$ is a subset of indices and let ${\Lambda }^{c}=\left\{1,2,\cdots ,N\right\}\setminus \Lambda$ . By ${x}_{\Lambda }$ we typically mean the length $N$ vector obtained by setting the entries of $x$ indexed by ${\Lambda }^{c}$ to zero. Similarly, by ${\Phi }_{\Lambda }$ we typically mean the $M\times N$ matrix obtained by setting the columns of $\Phi$ indexed by ${\Lambda }^{c}$ to zero. We note that this notation will occasionally be abused to refer to the length $|\Lambda |$ vector obtained by keeping only the entries corresponding to $\Lambda$ or the $M\times |\Lambda |$ matrix obtained by only keeping the columns corresponding to $\Lambda$ . The usage should be clear from the context, but typically there is no substantive difference between the two.
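The two conventions for ${x}_{\Lambda }$ and ${\Phi }_{\Lambda }$ can be compared in a short sketch. The helper names `restrict_vector` and `restrict_columns` are hypothetical, and the indices are 0-based as is usual in NumPy, whereas the text uses 1-based indices.

```python
import numpy as np


def restrict_vector(x, Lambda):
    """Length-N version of x_Lambda: entries outside Lambda are set to zero."""
    x = np.asarray(x, dtype=float)
    idx = np.asarray(list(Lambda), dtype=int)
    out = np.zeros_like(x)
    out[idx] = x[idx]
    return out


def restrict_columns(Phi, Lambda):
    """M x N version of Phi_Lambda: columns outside Lambda are set to zero."""
    Phi = np.asarray(Phi, dtype=float)
    idx = np.asarray(list(Lambda), dtype=int)
    out = np.zeros_like(Phi)
    out[:, idx] = Phi[:, idx]
    return out


x = np.array([3.0, -1.0, 0.5, 2.0])
Phi = np.arange(8.0).reshape(2, 4)
Lambda = [0, 3]  # 0-based index set

x_L = restrict_vector(x, Lambda)       # zero-padded, length N
Phi_L = restrict_columns(Phi, Lambda)  # zero-padded, M x N

# The "abused" short forms keep only the selected entries or columns
x_short = x[Lambda]         # length |Lambda|
Phi_short = Phi[:, Lambda]  # M x |Lambda|

# Both conventions produce the same measurement vector
y_padded = Phi_L @ x_L
y_short = Phi_short @ x_short
```

The products agree because the zeroed entries and columns contribute nothing, which is why the text treats the two usages as interchangeable.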

A matrix $\Phi$ satisfies the null space property (NSP) of order $K$ if there exists a constant $C>0$ such that,

${∥{h}_{\Lambda }∥}_{2}\le C\frac{{∥{h}_{{\Lambda }^{c}}∥}_{1}}{\sqrt{K}}$

holds for all $h\in \mathcal{N}\left(\Phi \right)$ and for all $\Lambda$ such that $|\Lambda |\le K$ .

The NSP quantifies the notion that vectors in the null space of $\Phi$ should not be too concentrated on a small subset of indices. For example, if a vector $h$ is exactly $K$ -sparse, then there exists a $\Lambda$ such that ${∥{h}_{{\Lambda }^{c}}∥}_{1}=0$ and hence [link] implies that ${h}_{\Lambda }=0$ as well. Thus, if a matrix $\Phi$ satisfies the NSP then the only $K$ -sparse vector in $\mathcal{N}\left(\Phi \right)$ is $h=0$ .
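To make the inequality concrete, the sketch below evaluates the smallest constant $C$ that works for a single null space vector $h$ . It relies on the observation that, among all $\Lambda$ with $|\Lambda |\le K$ , the ratio is largest when $\Lambda$ holds the $K$ largest-magnitude entries of $h$ . The function name `nsp_ratio` and the example matrix are assumptions for illustration. Computing the NSP constant of a general matrix requires a maximum over the entire null space, which this sketch does not attempt.

```python
import numpy as np


def nsp_ratio(h, K):
    """Largest value of sqrt(K) * ||h_Lambda||_2 / ||h_{Lambda^c}||_1
    over all index sets Lambda with |Lambda| <= K, for one fixed h."""
    mags = np.sort(np.abs(np.asarray(h, dtype=float)))[::-1]
    head = np.linalg.norm(mags[:K], 2)  # l2 norm of the K largest magnitudes
    tail = np.sum(mags[K:])             # l1 norm of the remaining entries
    if tail == 0:
        # h is K-sparse: no finite C works unless h is the zero vector
        return 0.0 if head == 0 else np.inf
    return np.sqrt(K) * head / tail


# Example matrix whose null space is spanned by a single vector h
Phi = np.array([[1.0, 0.0, -1.0],
                [0.0, 1.0, -1.0]])
h = np.array([1.0, 1.0, 1.0])
residual = Phi @ h  # zero vector, confirming that h lies in the null space

C_spread = nsp_ratio(h, K=1)                     # mass spread evenly: small ratio
C_sparse = nsp_ratio([2.0, 0.0, 0.0, 0.0], K=1)  # 1-sparse vector: infinite ratio
```

Because the ratio does not change when $h$ is rescaled, `C_spread` is the smallest valid NSP constant of order 1 for this particular `Phi`, whose null space is one-dimensional.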

To fully illustrate the implications of the NSP in the context of sparse recovery, we now briefly discuss how we will measure the performance of sparse recovery algorithms when dealing with general non-sparse $x$ . Towards this end, let $\Delta :{\mathbb{R}}^{M}\to {\mathbb{R}}^{N}$ represent our specific recovery method. We will focus primarily on guarantees of the form

${∥\Delta \left(\Phi x\right)-x∥}_{2}\le C\frac{{\sigma }_{K}{\left(x\right)}_{1}}{\sqrt{K}}$

for all $x$ , where we recall that

${\sigma }_{K}{\left(x\right)}_{p}=\underset{\widehat{x}\in {\Sigma }_{K}}{\min }{∥x-\widehat{x}∥}_{p}.$

This guarantees exact recovery of all possible $K$ -sparse signals, but also ensures a degree of robustness to non-sparse signals that directly depends on how well the signals are approximated by $K$ -sparse vectors. Such guarantees are called instance-optimal since they guarantee optimal performance for each instance of $x$ [link] . This distinguishes them from guarantees that only hold for some subset of possible signals, such as sparse or compressible signals: the quality of an instance-optimal guarantee adapts to the particular choice of $x$ . These are also commonly referred to as uniform guarantees since they hold uniformly for all $x$ .
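The quantity ${\sigma }_{K}{\left(x\right)}_{p}$ is simple to evaluate, since the best $K$ -term approximation keeps the $K$ largest-magnitude entries of $x$ and the error is the ${\ell }_{p}$ norm of what remains. The following is a minimal sketch; the function name `sigma_k` and the example vector are assumptions for illustration.

```python
import numpy as np


def sigma_k(x, K, p=1):
    """l_p error of the best K-term approximation of x."""
    mags = np.sort(np.abs(np.asarray(x, dtype=float)))[::-1]
    tail = mags[K:]  # entries discarded by the best K-sparse approximation
    if tail.size == 0:
        return 0.0
    return float(np.linalg.norm(tail, ord=p))


x = np.array([4.0, -3.0, 0.5, 0.1, 0.0])
K = 2

err_l1 = sigma_k(x, K, p=1)  # 0.5 + 0.1 + 0.0
err_l2 = sigma_k(x, K, p=2)  # sqrt(0.5**2 + 0.1**2)

# Right-hand side of the guarantee, up to the unspecified constant C
bound_without_C = err_l1 / np.sqrt(K)

# A K-sparse vector has zero approximation error, so the guarantee
# forces exact recovery in that case
err_sparse = sigma_k([1.0, 0.0, 2.0], K=2)
```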

Our choice of norms in  [link] is somewhat arbitrary. We could easily measure the reconstruction error using other ${\ell }_{p}$ norms. The choice of $p$ , however, will limit what kinds of guarantees are possible, and will also potentially lead to alternative formulations of the NSP. See, for instance,  [link] . Moreover, the form of the right-hand-side of [link] might seem somewhat unusual in that we measure the approximation error as ${\sigma }_{K}{\left(x\right)}_{1}/\sqrt{K}$ rather than simply something like ${\sigma }_{K}{\left(x\right)}_{2}$ . However, we will see later in this course that such a guarantee is actually not possible without taking a prohibitively large number of measurements, and that [link] represents the best possible guarantee we can hope to obtain (see "Instance-optimal guarantees revisited" ).

Later in this course, we will show that the NSP of order $2K$ is sufficient to establish a guarantee of the form [link] for a practical recovery algorithm (see "Noise-free signal recovery" ). Moreover, the following adaptation of a theorem in  [link] demonstrates that if there exists any recovery algorithm satisfying [link] , then $\Phi$ must necessarily satisfy the NSP of order $2K$ .

## (theorem 3.2 of [link] )

Let $\Phi :{\mathbb{R}}^{N}\to {\mathbb{R}}^{M}$ denote a sensing matrix and $\Delta :{\mathbb{R}}^{M}\to {\mathbb{R}}^{N}$ denote an arbitrary recovery algorithm. If the pair $\left(\Phi ,\Delta \right)$ satisfies [link] then $\Phi$ satisfies the NSP of order $2K$ .

Suppose $h\in \mathcal{N}\left(\Phi \right)$ and let $\Lambda$ be the indices corresponding to the $2K$ largest-magnitude entries of $h$ . We next split $\Lambda$ into ${\Lambda }_{0}$ and ${\Lambda }_{1}$ , where $|{\Lambda }_{0}|=|{\Lambda }_{1}|=K$ . Set $x={h}_{{\Lambda }_{1}}+{h}_{{\Lambda }^{c}}$ and ${x}^{\text{'}}=-{h}_{{\Lambda }_{0}}$ , so that $h=x-{x}^{\text{'}}$ . Since by construction ${x}^{\text{'}}\in {\Sigma }_{K}$ , we have ${\sigma }_{K}{\left({x}^{\text{'}}\right)}_{1}=0$ , so we can apply [link] to obtain ${x}^{\text{'}}=\Delta \left(\Phi {x}^{\text{'}}\right)$ . Moreover, since $h\in \mathcal{N}\left(\Phi \right)$ , we have

$\Phi h=\Phi \left(x-{x}^{\text{'}}\right)=0$

so that $\Phi {x}^{\text{'}}=\Phi x$ . Thus, ${x}^{\text{'}}=\Delta \left(\Phi x\right)$ . Finally, we have that

${∥{h}_{\Lambda }∥}_{2}\le {∥h∥}_{2}={∥x-{x}^{\text{'}}∥}_{2}={∥x-\Delta \left(\Phi x\right)∥}_{2}\le C\frac{{\sigma }_{K}{\left(x\right)}_{1}}{\sqrt{K}}=\sqrt{2}C\frac{{∥{h}_{{\Lambda }^{c}}∥}_{1}}{\sqrt{2K}},$

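where the last inequality follows from [link] . Any other index set ${\Lambda }^{\text{'}}$ with $|{\Lambda }^{\text{'}}|\le 2K$ satisfies ${∥{h}_{{\Lambda }^{\text{'}}}∥}_{2}\le {∥h∥}_{2}$ and ${∥{h}_{{\Lambda }^{c}}∥}_{1}\le {∥{h}_{{\left({\Lambda }^{\text{'}}\right)}^{c}}∥}_{1}$ , so the same bound holds for every such set.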
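The splitting used in this proof can be checked numerically. The sketch below draws an arbitrary vector as a stand-in for $h$ (it is not taken from the null space of any particular $\Phi$ , since only the bookkeeping of the index sets is being verified) and confirms that $h=x-{x}^{\text{'}}$ , that ${x}^{\text{'}}$ is $K$ -sparse, and that ${\sigma }_{K}{\left(x\right)}_{1}={∥{h}_{{\Lambda }^{c}}∥}_{1}$ . The helper `keep`, the seed, and the dimensions are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 10, 2
h = rng.standard_normal(N)  # stand-in vector; only the index splitting is checked

# Indices sorted by decreasing magnitude of h
order = np.argsort(-np.abs(h))
Lambda = order[:2 * K]    # the 2K largest-magnitude entries
Lambda0 = Lambda[:K]      # first half of the split
Lambda1 = Lambda[K:]      # second half of the split
Lambda_c = order[2 * K:]  # complement of Lambda


def keep(v, idx):
    """Zero-padded restriction of v to the index set idx."""
    out = np.zeros_like(v)
    out[idx] = v[idx]
    return out


x = keep(h, Lambda1) + keep(h, Lambda_c)
x_prime = -keep(h, Lambda0)

# sigma_K(x)_1: l1 norm of x after removing its K largest-magnitude entries
sigma_K_x = np.sort(np.abs(x))[::-1][K:].sum()
tail_l1 = np.abs(h[Lambda_c]).sum()

decomposition_ok = np.allclose(h, x - x_prime)
x_prime_sparsity = np.count_nonzero(x_prime)
```

The equality between `sigma_K_x` and `tail_l1` holds because every entry of $h$ on ${\Lambda }_{1}$ is at least as large in magnitude as every entry on ${\Lambda }^{c}$ , so the best $K$ -term approximation of $x$ is ${h}_{{\Lambda }_{1}}$ .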

Source:  OpenStax, An introduction to compressive sensing. OpenStax CNX. Apr 02, 2011 Download for free at http://legacy.cnx.org/content/col11133/1.5