压缩感知与单像素照相机

【译文】

【作者 Terence Tao;译者 山寨盲流,他的更多译作在;校对 木遥】

最近有不少人问我究竟"压缩感知"是什么意思(特别是随着最近这个概念名声大噪),所谓“单像素相机”又是怎样工作的(又怎么能在某些场合比传统相机有优势呢)。这个课题已经有了大量文献,不过对于这么一个相对比较新的领域,还没有一篇优秀的非技术性介绍。所以笔者在此小做尝试,希望能够对非数学专业的读者有所帮助。

具体而言我将主要讨论摄像应用,尽管压缩传感作为测量技术应用于比成像广泛得多的领域(例如天文学,核磁共振,统计选取,等等),我将在帖子结尾简单谈谈这些领域。

相机的用途,自然是记录图像。为了简化论述,我们把图像假设成一个长方形阵列,比如说一个1024x2048像素的阵列(这样就总共是二百万像素)。为了省略彩色的问题(这个比较次要),我们就假设只需要黑白图像,那么每个像素就可以用一个整型的灰度值来计量其亮度(例如用八位整型数表示0到 255,16位表示0到65535)。

接下来,按照最最简化的说法,传统相机会测量每一个像素的亮度(在上述例子中就是二百万个测量值),结果得到的图片文件就比较大(用8位灰度值就是2MB,16位灰度就是4MB)。数学上就认为这个文件是用超高维矢量值描绘的(在本例中就是约二百万维)。(Daniel注:1Byte=8bit(位),1024*2048*8bit=2MByte)

在我开始讲“压缩感知”这个新故事之前,必须先快速回顾一下“老式压缩”的旧故事。(已经了解图像压缩算法的读者可以跳过这几段。)

上述的图片会占掉相机的很多存储空间(上传到计算机里还占磁盘空间),在各种介质之间传输的时候也要浪费时间。于是,相机带有显著压缩图像的功能就顺理成章了(通常能从2MB那么大压缩到十分之一——200KB的一小坨)。关键是尽管“所有图片”所构成的空间要占用2MB的“自由度”或者说“熵”,(Daniel注:在统计学中,自由度指的是计算某一统计量时,取值不受限制的变量个数。如有3个变量x、y、z,但x+y+z=18,因此其自由度等于2。)由“有意义的图片”所构成的空间其实要小得多,尤其是如果人们愿意降低一点图像质量的话。(实际上,如果一个人真的利用所有的自由度随机生成一幅图片,他不大可能得到什么有意义的图像,而是得到相当于电视荧屏上的静电雪花那样的随机噪声之类。)

怎么样压缩图像?方式多种多样,其中有些非常先进,不过我来试试用一种不太高科技的(而且也不太精确的)说法来描述一下这些先进技术。图像通常都含有大片无细节部分--比如在风景照里面,将近一半的画面都可能被单色的天空背景占据。我们假设提取一个大方块,比方说100x100像素,其中完全是同一颜色的——假设是全白的吧。无压缩时,这个方块要占10000字节存储空间(按照8位灰度算);但是我们可以只记录这个方块的维度和坐标,还有填充整个方块的单一颜色;这样总共也只要记录四五个字节,省下了可观的空间。不过在现实中,压缩效果没有这么好,因为表面看来没有细节的地方其实是有着细微的色差的。所以,给定一个无细节方块,我们记录其平均色值,就把图片中这一块区域抽象成了单色色块,只留下微小的残余误差。接下来就可以继续选取更多色彩可见的方块,抽象成单色色块。最后剩下的是亮度(色彩强度)很小的,肉眼无法察觉的细节。于是就可以抛弃这些剩余的细节,只需要记录那些“可见”色块的大小,位置和亮度。日后则可以反向操作,重建出比原始图像质量稍低一些,占空间却小得多的复制图片。

其实上述的算法并不适合处理颜色剧烈变动的情况,所以在实际应用中不很有效。事实上,更好的办法不是用均匀色块,而是用“不均匀”的色块——比方说右半边色彩强度平均值大于左半边这样的色块。这种情况可以用(二维)Haar小波系统来描述。后来人们又发现一种"更平滑的"小波系统更能够避免误差,不过这都是技术细节,我们就不深入讨论了。然而所有这些系统的原理都是相同的:把原始图像表示为不同“小波(类似于上文中的色块)”的线性叠加,记录显著的(高强度的)小波的系数,放弃掉(或者用阈值排除掉)剩下的小波系数。这种“小波系数硬阈值”压缩算法没有实际应用的算法(比如JPEG 2000标准中所定义的)那么精细,不过多少也能描述压缩的普遍原理。

总体来讲(也是非常简化的说法),原始的1024x2048图像可能含有两百万自由度,想要用小波来表示这个图像的人需要两百万个不同小波才能完美重建。但是典型的有意义的图像,从小波理论的角度看来是非常稀疏的,也就是可压缩的:可能只需要十万个小波就已经足够获取图像所有的可见细节了,其余一百九十万小波只贡献很少量的,大多数观测者基本看不见的“随机噪声”。(这也不是永远适用:含有大量纹理的图像--比如毛发、毛皮的图像——用小波算法特别难压缩,也是图像压缩算法的一大挑战。不过这是另一个故事了。)

接下来呢,如果我们(或者不如说是相机)事先知道两百万小波系数里面哪十万个是重要的,那就可以只计量这十万个系数,别的就不管了。(在图像上设置一种合适的“过滤器”或叫“滤镜”,然后计量过滤出来的每个像素的色彩强度,是一种可行的系数计量方法。)但是,相机是不会知道哪个系数是重要的,所以它只好计量全部两百万个像素,把整个图像转换成基本小波,找出需要留下的那十万个主导基本小波,再删掉其余的。(这当然只是真正的图像压缩算法的一个草图,不过为了便于讨论我们还是就这么用吧。)

(Daniel注:小波(Wavelet)这一术语,顾名思义,“小波”就是小的波形。所谓“小”是指它具有衰减性;而称之为“波”则是指它的波动性,其振幅正负相间的震荡形式。与Fourier变换相比,小波变换是时间(空间)频率的局部化分析,它通过伸缩平移运算对信号(函数)逐步进行多尺度细化,最终达到高频处时间细分,低频处频率细分,能自动适应时频信号分析的要求,从而可聚焦到信号的任意细节,解决了Fourier变换的困难问题,成为继Fourier变换以来在科学方法上的重大突破。有人把小波变换称为“数学显微镜”。傅立叶变换通过把空间图像转变为频率域图像,并根据傅立叶级数平移性,将原点移到图像中心,使图像成放射状。中心是低频信号,周围是高频信号。用于滤除高频噪声或选取文理操作。那小波可能就是利用了这种性质分析低频信号,像作者所说的那样把他们当作“块”来处理,然后高频的,比如“头发”就直接在空间尺度保留,那样的话就实现了低频和高频信号的保留,同时实现图像压缩存储。???)

那么,如今的数码相机当然已经很强大了,没什么问题干吗还要改进?事实上,上述的算法,需要收集大量数据,但是只需要存储一部分,在消费摄影中是没有问题的。尤其是随着数据存储变得很廉价,现在拍一大堆完全不压缩的照片也无所谓。而且,尽管出了名地耗电,压缩所需的运算过程仍然算得上轻松。但是,在非消费领域的某些应用中,这种数据收集方式并不可行,特别是在传感器网络中。如果打算用上千个传感器来收集数据,而这些传感器需要在固定地点呆上几个月那么长的时间,那么就需要尽可能地便宜和节能的传感器——这首先就排除了那些有强大运算能力的传感器(然而——这也相当重要——我们在接收处理数据的接收端仍然需要现代科技提供的奢侈的运算能力)。在这类应用中,数据收集方式越“傻瓜”越好(而且这样的系统也需要很强壮,比如说,能够忍受10%的传感器丢失或者各种噪声和数据缺损)。

这就是压缩传感的用武之地了。其理论依据是:如果只需要10万个分量就可以重建绝大部分的图像,那何必还要做所有的200万次测量,只做10万次不就够了吗?(在实际应用中,我们会留一个安全余量,比如说测量30万像素,以应付可能遭遇的所有问题,从干扰到量化噪声,以及恢复算法的故障。)这样基本上能使节能上一个数量级,这对消费摄影没什么意义,对传感器网络而言却有实实在在的好处。

不过,正像我前面说的,相机自己不会预先知道两百万小波系数中需要记录哪十万个。要是相机选取了另外10万(或者30万),反而把图片中所有有用的信息都扔掉了怎么办?

解决的办法简单但是不太直观。就是用非小波的算法来做30万个测量——尽管我前面确实讲过小波算法是观察和压缩图像的最佳手段。实际上最好的测量其实应该是(伪)随机测量——比如说随机生成30万个“滤镜”图像并测量真实图像与每个滤镜的相关程度。这样,图像与滤镜之间的这些测量结果(也就是“相关性”)很有可能是非常小非常随机的。但是——这是关键所在——构成图像的2百万种可能的小波函数会在这些随机的滤镜的测量下生成自己特有的“特征”,它们每一个都会与某一些滤镜成正相关,与另一些滤镜成负相关,但是与更多的滤镜不相关。可是(在极大的概率下)2百万个特征都各不相同;更有甚者,其中任意十万个的线性组合仍然是各不相同的(以线性代数的观点来看,这是因为一个30万维线性子空间中任意两个10万维的子空间极有可能互不相交)。因此,基本上是有可能从这30万个随机数据中恢复图像的(至少是恢复图像中的10万个主要细节)。简而言之,我们是在讨论一个哈希函数的线性代数版本。

(Daniel注:理想的情况是希望不经过任何比较,一次存取便能够得到所查记录,那就必须在记录的存储位置和它的关键字之间建立一个确定的对应关系f,使每个关键字和结构中一个唯一的存储位置相对应。因而在查找时,只要根据这个对应关系f找到给定值k的像f(K)。若结构中存在关键字和K相等的记录,则必定在f(K)的存储位置上,由此,不需要进行比较便可直接取得所查记录。在此,我们称这个对应关系f为哈希(Hash)函数,按这个思想建立的表为哈希表。了解了哈希函数的定义,其实哈希函数是建立两组完全不相关(独立)的数的映射关系f,那两组数据一组作为x索引(自变量),一组作为y值(因变量),然后建立他们间的函数关系。然而若x和y相互独立,则通过关系f就能根据x得到唯一对应的y。作者也是这个意思,他的两个10万维子空间互不相交也就是独立,那就满足了哈希函数的定义,那么就可以根据建立的哈希表恢复这10W个像素的位置。???)

然而这种方式仍然存在两个技术问题。首先是噪声问题:10万个小波系数的叠加并不能完全代表整幅图像,另190万个系数也有少许贡献。这些小小贡献有可能会干扰那10万个小波的特征,这就是所谓的“失真”问题。第二个问题是如何运用得到的30万测量数据来重建图像。

我们先来关注后一个问题。如果我们知道了2百万小波中哪10万个是有用的,那就可以使用标准的线性代数方法(高斯消除法,最小二乘法等等)来重建信号。(这正是线性编码最大的优点之一——它们比非线性编码更容易求逆。大多数哈希变换实际上是不可能求逆的——这在密码学上是一大优势,在信号恢复中却不是。)可是,就像前面说的那样,我们事前并不知道哪些小波是有用的。怎么找出来呢?一个单纯的最小二乘近似法会得出牵扯到全部2百万系数的可怕结果,生成的图像也含有大量颗粒噪点。要不然也可以代之以一种强力搜索,为每一组可能的10万关键系数都做一次线性代数处理,不过这样做的耗时非常恐怖(总共要考虑大约10的17万次方个组合!),而且这种强力搜索通常是NP完备的(其中有些特例是所谓的“子集合加总”问题)。不过还好,还是有两种可行的手段来恢复数据:

(Daniel注:NP的英文全称是Non-deterministic Polynomial的问题,即多项式复杂程度的非确定性问题。)

• 匹配追踪:找到一个其标记看上去与收集到的数据相关的小波;在数据中去除这个标记的所有印迹;不断重复直到我们能用小波标记“解释”收集到的所有数据。

• 基追踪(又名L1模最小化):在所有与录得数据匹配的小波组合中,找到一个“最稀疏的”,也就是其中所有系数的绝对值总和越小越好。(这种最小化的结果趋向于迫使绝大多数系数都消失了。)这种最小化算法可以利用单纯形法之类的凸规划算法,在合理的时间内计算出来。

(Daniel注:线性规划中最基本的概念之一。是由系数矩阵A中的线性无关的列向量构成的可逆方阵。用来构成的列向量称为该向量。)

需要注意到的是,这类图像恢复算法还是需要相当的运算能力的(不过也还不是太变态),不过在传感器网络这样的应用中这不成问题,因为图像恢复是在接收端(这端有办法连接到强大的计算机)而不是传感器端(这端就没办法了)进行的。

现在已经有严密的结果显示,对原始图像设定不同的压缩率或稀疏性,这两种算法完美或近似完美地重建图像的成功率都很高。匹配追踪法通常比较快,而基追踪算法在考虑到噪声时则显得比较准确。这些算法确切的适用范围问题在今天仍然是非常热门的研究领域。(说来遗憾,目前还没有出现对P不等于NP问题的应用;如果一个重建问题(在考虑到测量矩阵时)是NP完备的,那它刚好就不能用上述算法解决。)

由于压缩传感还是一个相当新的领域(尤其是严密的数学结果刚刚出现),现在就期望这个技术应用到实用的传感器上还为时尚早。不过已经有概念验证模型出现了,其中最著名的是Rice大学研制的单像素相机。

最后必须提到的是,压缩传感技术是一种抽象的数学概念,而不是具体的操作方案,它可以应用到成像以外的许多领域。以下只是其中几个例子:

• 磁共振成像(MRI)。在医学上,磁共振的工作原理是做许多次(但次数仍是有限的)测量(基本上就是对人体图像进行离散拉东变换(也叫X光变换)),再对数据进行加工来生成图像(在这里就是人体内水的密度分布图像)。由于测量次数必须很多,整个过程对患者来说太过漫长。压缩传感技术可以显著减少测量次数,加快成像(甚至有可能做到实时成像,也就是核磁共振的视频而非静态图像)。此外我们还可以以测量次数换图像质量,用与原来一样的测量次数可以得到好得多的图像分辨率。

• 天文学。许多天文现象(如脉冲星)具有多种频率震荡特性,使其在频域上是高度稀疏也就是可压缩的。压缩传感技术将使我们能够在时域内测量这些现象(即记录望远镜数据)并能够精确重建原始信号,即使原始数据不完整或者干扰严重(原因可能是天气不佳,上机时间不够,或者就是因为地球自传使我们得不到全时序的数据)。

• 线性编码。压缩传感技术提供了一个简单的方法,让多个传送者可以将其信号带纠错地合并传送,这样即使输出信号的一大部分丢失或毁坏,仍然可以恢复出原始信号。例如,可以用任意一种线性编码把1000比特信息编码进一个3000比特的流;那么,即使其中300位被(恶意)毁坏,原始信息也能完全无损失地完美重建。这是因为压缩传感技术可以把破坏动作本身看作一个稀疏的信号(只集中在3000比特中的300位)。

许多这种应用都还只停留在理论阶段,可是这种算法能够影响测量和信号处理中如此之多的领域,其潜力实在是振奋人心。笔者自己最有成就感的就是能看到自己在纯数学领域的工作(例如估算傅立叶子式的行列式或单数值)最终具备造福现实世界的前景。

 

【原文】

Compressed sensing and single-pixel cameras

by Terence Tao

I’ve had a number of people ask me (especially in light of some recent publicity) exactly what “compressed sensing” means, and how a “single pixel camera” could possibly work (and how it might be advantageous over traditional cameras in certain circumstances). There is a large literature on the subject, but as the field is relatively recent, there does not yet appear to be a good non-technical introduction to the subject. So here’s my stab at the topic, which should hopefully be accessible to a non-mathematical audience.

For sake of concreteness I’ll primarily discuss the camera application, although compressed sensing is a more general measurement paradigm which is applicable to other contexts than imaging (e.g. astronomy, MRI, statistical selection, etc.), as I’ll briefly remark upon at the end of this post.

The purpose of a camera is, of course, to record images. To simplify the discussion, let us think of an image as a rectangular array, e.g. a 1024 x 2048 array of pixels (thus there are 2 megapixels in all). To ignore the (minor) issue of colour, let us assume that we are just taking a black-and-white picture, so that each pixel is measured in grayscale as an integer (e.g. an 8-bit integer from 0 to 255, or a 16-bit integer from 0 to 65535) which signifies the intensity of each pixel.

Now, to oversimplify things quite a bit, a traditional digital camera would take one measurement of intensity for each of its pixels (so, about 2 million measurements in the above example), resulting in a relatively large image file (2MB if one uses 8-bit grayscale, or 4MB if one uses 16-bit grayscale). Mathematically, this file can be represented by a very high-dimensional vector of numbers (in this example, the dimension is about 2 million).

Before I get to the new story of “compressed sensing”, I have to first quickly review the somewhat older story of plain old “compression”. (Those who already know how image compression works can skip forward a few paragraphs.)

The images described above can take up a lot of disk space on the camera (or on some computer where the images are later uploaded), and also take a non-trivial amount of time (and energy) to transfer from one medium to another. So, it is common practice to get the camera to compress the image, from an initial large size (e.g. 2MB) to a much smaller size (e.g. 200KB, which is 10% of the size). The thing is that while the space of all images has 2MB worth of “degrees of freedom” or “entropy”, the space of all interesting images is much smaller, and can be stored using much less space, especially if one is willing to throw away some of the quality of the image. (Indeed, if one generates an image at random, one will almost certainly not get an interesting image; instead, one will just get random noise looking much like the static one can get on TV screens.)

How can one compress an image? There are many ways, some of which are rather technical, but let me try to give a non-technical (and slightly inaccurate) sketch of how it is done. It is quite typical for an image to have a large featureless component - for instance, in a landscape, up to half of the picture might be taken up by a monochromatic sky background. Suppose for instance that we locate a large square, say 100 \times 100 pixels, which are all exactly the same colour - e.g. all white. Without compression, this square would take 10,000 bytes to store (using 8-bit grayscale); however, instead, one can simply record the dimensions and location of the square, and note a single colour with which to paint the entire square; this will require only four or five bytes in all to record, leading to a massive space saving. Now in practice, we don’t get such an impressive gain in compression, because even apparently featureless regions have some small colour variation between them. So, given a featureless square, what one can do is record the average colour of that square, and then subtract that average off from the image, leaving a small residual error. One can then locate more squares where the average colour is significant, and subtract those off as well. If one does this a couple times, eventually the only stuff left will be very small in magnitude (intensity), and not noticeable to the human eye. So we can throw away the rest of the image and record only the size, location, and intensity of the “significant” squares of the image. We can then reverse this process later and reconstruct a slightly lower-quality replica of the original image, which uses much less space.

Now, the above algorithm is not all that effective in practice, as it does not cope well with sharp transitions from one colour to another. It turns out to be better to work not with average colours in squares, but rather with average colour imbalances in squares - the extent to which the intensity on (say) the right half of the square is higher on average than the intensity on the left. One can formalise this by using the (two-dimensional) Haar wavelet system. It then turns out that one can work with “smoother” wavelet systems which are less susceptible to artefacts, but this is a technicality which we will not discuss here. But all of these systems lead to similar schemes: one represents the original image as a linear superposition of various “wavelets” (the analogues of the coloured squares in the preceding paragraph), stores all the significant (large magnitude) wavelet coefficients, and throws away (or “thresholds”) all the rest. This type of “hard wavelet coefficient thresholding” compression algorithm is not nearly as sophisticated as the ones actually used in practice (for instance in the JPEG 2000 standard) but it is somewhat illustrative of the general principles in compression.

To summarise (and to oversimplify somewhat), the original 1024 \times 2048 image may have two million degrees of freedom, and in particular if one wants to express this image in terms of wavelets then one would need thus need two million different wavelets in order to reconstruct all images perfectly. However, the typical interesting image is very sparse or compressible in the wavelet basis: perhaps only a hundred thousand of the wavelets already capture all the notable features of the image, with the remaining 1.9 million wavelets only contributing a very small amount of “random noise” which is largely invisible to most observers. (This is not always the case: heavily textured images - e.g. images containing hair, fur, etc. - are not particularly compressible in the wavelet basis, and pose a challenge for image compression algorithms. But that is another story.)

Now, if we (or the camera) knew in advance which hundred thousand of the 2 million wavelet coefficients are going to be the important ones, then the camera could just measure those coefficients and not even bother trying to measure the rest. (It is possible to measure a single coefficient by applying a suitable “filter” or “mask” to the image, and making a single intensity measurement to what comes out.) However, the camera does not know which of the coefficients are going to be the key ones, so it must instead measure all 2 million pixels, convert the image to a wavelet basis, locate the hundred thousand dominant wavelet coefficients to keep, and throw away the rest. (This is of course only a caricature of how the image compression algorithm really works, but we will use it for sake of discussion.)

Now, of course, modern digital cameras work pretty well, and why should we try to improve on something which isn’t obviously broken? Indeed, the above algorithm, in which one collects an enormous amount of data but only saves a fraction of it, works just fine for consumer photography. Furthermore, with data storage becoming quite cheap, it is now often feasible to use modern cameras to take many images with no compression whatsoever. Also, the computing power required to perform the compression is manageable, even if it does contribute to the notoriously battery-draining energy consumption level of these cameras. However, there are non-consumer imaging applications in which this type of data collection paradigm is infeasible, most notably in sensor networks. If one wants to collect data using thousands of sensors, which each need to stay in situ for long periods of time such as months, then it becomes necessary to make the sensors as cheap and as low-power as possible - which in particular rules out the use of devices which require heavy computer processing power at the sensor end (although - and this is important - we are still allowed the luxury of all the computer power that modern technology affords us at the receiver end, where all the data is collected and processed). For these types of applications, one needs a data collection paradigm which is as “dumb” as possible (and which is also robust with respect to, say, the loss of 10% of the sensors, or with respect to various types of noise or data corruption).

This is where compressed sensing comes in. The main philosophy is this: if one only needs a 100,000 components to recover most of the image, why not just take a 100,000 measurements instead of 2 million? (In practice, we would allow a safety margin, e.g. taking 300,000 measurements, to allow for all sorts of issues, ranging from noise to aliasing to breakdown of the recovery algorithm.) In principle, this could lead to a power consumption saving of up to an order of magnitude, which may not mean much for consumer photography but can be of real importance in sensor networks.

But, as I said before, the camera does not know in advance which hundred thousand of the two million wavelet coefficients are the important ones that one needs to save. What if the camera selects a completely different set of 100,000 (or 300,000) wavelets, and thus loses all the interesting information in the image?

The solution to this problem is both simple and unintuitive. It is to make 300,000 measurements which are totally unrelated to the wavelet basis - despite all that I have said above regarding how this is the best basis in which to view and compress images. In fact, the best types of measurements to make are (pseudo-)random measurements - generating, say, 300,000 random “mask” images and measuring the extent to which the actual image resembles each of the masks. Now, these measurements (or “correlations”) between the image and the masks are likely to be all very small, and very random. But - and this is the key point - each one of the 2 million possible wavelets which comprise the image will generate their own distinctive “signature” inside these random measurements, as they will correlate positively against some of the masks, negatively against others, and be uncorrelated with yet more masks. But (with overwhelming probability) each of the 2 million signatures will be distinct; furthermore, it turns out that arbitrary linear combinations of up to a hundred thousand of these signatures will still be distinct from each other (from a linear algebra perspective, this is because two randomly chosen 100,000-dimensional subspaces of a 300,000 dimensional ambient space will be almost certainly disjoint from each other). Because of this, it is possible in principle to recover the image (or at least the 100,000 most important components of the image) from these 300,000 random measurements. In short, we are constructing a linear algebra analogue of a hash function.

There are however two technical problems with this approach. Firstly, there is the issue of noise: an image is not perfectly the sum of 100,000 wavelet coefficients, but also has small contributions from the other 1.9 million coefficients also. These small contributions could conceivably disguise the contribution of the 100,000 wavelet signatures as coming from a completely unrelated set of 100,000 wavelet signatures; this is a type of “aliasing” problem. The second problem is how to use the 300,000 measurements obtained to recover the image.

Let us focus on the latter problem first. If we knew which 100,000 of the 2 million wavelets involved were, then we could use standard linear algebra methods (Gaussian elimination, least squares, etc.) to recover the signal. (Indeed, this is one of the great advantages of linear encodings - they are much easier to invert than nonlinear ones. Most hash functions are practically impossible to invert - which is an advantage in cryptography, but not in signal recovery.) However, as stated before, we don’t know in advance which wavelets are involved. How can we find out? A naive least-squares approach gives horrible results which involve all 2 million coefficients and thus lead to very noisy and grainy images. One could perform a brute-force search instead, applying linear algebra once for each of the possible set of 100,000 key coefficients, but this turns out to take an insanely impractical amount of time (there are roughly 10^{170,000} combinations to consider!) and in any case this type of brute-force search turns out to be NP-complete in general (it contains problems such as subset-sum as a special case). Fortunately, however, there are two much more feasible ways to recover the data:

    * Matching pursuit: locate a wavelet whose signature seems to correlate with the data collected; remove all traces of that signature from the data; and repeat until we have totally “explained” the data collected in terms of wavelet signatures.
    * Basis pursuit (or l^1 minimisation): Out of all the possible combinations of wavelets which would fit the data collected, find the one which is “sparsest” in the sense that the total sum of the magnitudes of all the coefficients is as small as possible. (It turns out that this particular minimisation tends to force most of the coefficients to vanish.) This type of minimisation can be computed in reasonable time via convex optimisation methods such as the simplex method.

Note that these image recovery algorithms do require a non-trivial (though not ridiculous) amount of computer processing power, but this is not a problem for applications such as sensor networks since this recovery is done on the receiver end (which has access to powerful computers) rather than the sensor end (which does not).

There are now rigorous results which show that these approaches can reconstruct the original signals perfectly or almost-perfectly with very high probability of success, given various compressibility or sparsity hypotheses on the original image. The matching pursuit algorithm tends to be somewhat faster, but the basis pursuit algorithm seems to be more robust with respect to noise. Exploring the exact range of applicability of these methods is still a highly active current area of research. (Sadly, there does not seem to be an application to P\neq NP; the type of sparse recovery problems which are NP-complete are the total opposite (as far as the measurement matrix is concerned) with the type of sparse recovery problems which can be treated by the above methods.)

As compressed sensing is still a fairly new field (especially regarding the rigorous mathematical results), it is still a bit premature to expect developments here to appear in actual sensors. However, there are proof-of-concept prototypes already, most notably the single-pixel camera developed at Rice.

Finally, I should remark that compressed sensing, being an abstract mathematical idea rather than a specific concrete recipe, can be applied to many other types of contexts than just imaging. Some examples include:

    * Magnetic resonance imaging (MRI). In medicine, MRI attempts to recover an image (in this case, the water density distribution in a human body) by taking a large but finite number of measurements (basically taking a discretised Radon transform (or x-ray transform) of the body), and then reprocessing the data. Because of the large number of measurements needed, the procedure is lengthy for the patient. Compressed sensing techniques can reduce the number of measurements required significantly, leading to faster imaging (possibly even to real-time imaging, i.e. MRI videos rather than static MRI). Furthermore, one can trade off the number of measurements against the quality of the image, so that by using the same number of measurements as one traditionally does, one may be able to get much finer scales of resolution.
    * Astronomy. Many astronomical phenomena (e.g. pulsars) have various frequency oscillation behaviours which make them very sparse or compressible in the frequency domain. Compressed sensing techniques then allow one to measure these phenomena in the time domain (i.e. by recording telescope data) and being able to reconstruct the original signal accurately even from incomplete and noisy data (e.g. if weather, lack of telescope time, or simply the rotation of the earth prevents a complete time-series of data).
    * Linear coding. Compressed sensing also gives a simple way for multiple transmitters to combine their output in an error-correcting way, so that even if a significant fraction of the output is lost or corrupted, the original transmission can still be recovered. For instance, one can transmit 1000 bits of information by encoding them using a random linear code into a stream of 3000 bits; and then it will turn out that even if, say, 300 of the bits (chosen adversarially) are then corrupted, the original message can be reconstructed perfectly with essentially no chance of error. The relationship with compressed sensing arises by viewing the corruption itself as the sparse signal (it is only concentrated on 300 of the 3000 bits).

Many of these applications are still only theoretical, but nevertheless the potential of these algorithms to impact so many types of measurement and signal processing is rather exciting. From a personal viewpoint, it is particularly satisfying to see work arising from pure mathematics (e.g. estimates on the determinant or singular values of Fourier minors) end up having potential application to the real world.

 

【有感】

前沿的科学都是与数学息息相关的,用到了代数理论、离散数学、数字图像处理、算法等多领域的科技,压缩感知理论的确振奋人心,有很多东西也看不懂,一遍看一遍做注释理解,一定有理解不到位之处。同时在看此文之中也了解了不少新知识,也对原来的知识有了新的理解。眨眼

posted @ 2012-01-15 16:42  Ю詺菛╀時代  阅读(906)  评论(0编辑  收藏  举报