Recognizing simple digit-only images (e.g. phone numbers, numeric CAPTCHAs)

http://www.cnblogs.com/CoreCaiNiao/archive/2011/12/26/2302141.html

 

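It's the end of the year again and everyone is busy earning their last bit of money before it closes. I'm no exception: I'm busy scraping data, though sometimes that takes a little extra time.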


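In this case, the phone number I needed to scrape was pure digits but had been rendered as an image. On the original page it is shown as <img src="the address of a JSP file plus an encoded string" />. Someone suggested I sidestep the problem and simply scrape the image itself, but for the sake of quality I felt the number should be converted to text as part of the scrape.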

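My approach is this: first prepare black-and-white images of 0-9 and "-" and save them to local disk as 0.gif, 1.gif ... 9.gif and -.gif. Then fetch the image stream into memory, convert it to black and white, cut it into equal-width slices, and compare each slice against the local images in a loop. This only works for "simple, digit-only" images. Note that this case involves no pattern recognition, only pixel-by-pixel comparison.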

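So the experiment begins:
First, get the URL of the remote image and use it to fetch the image from the response. (Note: this is a stream, not an actual image file, just like the CAPTCHA images on websites.)

I used HttpWebRequest here, but requesting the image URL directly returned a blank result. After setting .Referer = "http://the site's domain", the remote server gave me the image stream.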


The pseudocode for this step:

// requires: using System.Net;
HttpWebRequest objRequest = (HttpWebRequest)WebRequest.Create("image URL");
objRequest.Timeout = 10000; // 10-second timeout
objRequest.Referer = "http://domain of the scraped site/";
HttpWebResponse objResponse = (HttpWebResponse)objRequest.GetResponse();
System.IO.Stream resStream = objResponse.GetResponseStream();
// The code above gives me the image.
// Code for saving the image, in case it is useful:
//System.Drawing.Image i = System.Drawing.Image.FromStream(resStream);
//i.Save(@"c:\x.gif", System.Drawing.Imaging.ImageFormat.Gif);
//resStream.Close();
//i.Dispose();

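Step two: with the image stream in hand, the next job is to process the image. Before that, though, there is one important thing to do. Because the image contains only digits, possibly with a "-" between the area code and the number, I need black-and-white samples of these characters saved locally. The process is simple: desaturate the image in Photoshop, then raise brightness and contrast by 100 each, which gives a black-and-white image. Then cut it into 10*20 tiles and name them 0.gif, 1.gif ... 9.gif and -.gif, 11 images in total.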

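After that, my processing pipeline is: image stream into memory ==> convert to grayscale ==> raise brightness and contrast ==> convert to black and white ==> slice and compare against the 11 local images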

Bitmap iGetPhoto = new Bitmap(resStream);
// Step 1: convert to grayscale
iGetPhoto = ToGray(iGetPhoto);
// Step 2: raise brightness by 100
iGetPhoto = KiLighten(iGetPhoto, 100);
// Step 3: raise contrast by 100
iGetPhoto = KiContrast(iGetPhoto, 100);
// Step 4: convert to black and white
iGetPhoto = ToBlackWhite(iGetPhoto);

The four function bodies:

        // requires: using System; using System.Drawing; using System.Drawing.Imaging;
        /// <summary>
        /// Converts the image to grayscale.
        /// </summary>
        /// <param name="b">Bitmap to convert (modified in place)</param>
        /// <returns>The same bitmap, now grayscale</returns>
        public Bitmap ToGray(Bitmap b) {
            for (int x = 0; x < b.Width; x++)
            {
                for (int y = 0; y < b.Height; y++)
                {
                    Color c = b.GetPixel(x, y);
                    int luma = (int)(c.R * 0.3 + c.G * 0.59 + c.B * 0.11); // weighted grayscale formula
                    b.SetPixel(x, y, Color.FromArgb(luma, luma, luma));
                }
            }
            return b;
        }
        /// <summary>
        /// Converts the image to pure black and white.
        /// </summary>
        /// <param name="b">Bitmap to convert (modified in place)</param>
        /// <returns>The same bitmap, now black and white</returns>
        public Bitmap ToBlackWhite(Bitmap b) {
            for (int x = 0; x < b.Width; x++)
            {
                for (int y = 0; y < b.Height; y++)
                {
                    Color c = b.GetPixel(x, y);
                    // Any pixel that is not pure white becomes black
                    if (c.R < (byte)255)
                    {
                        b.SetPixel(x, y, Color.FromArgb(0, 0, 0));
                    }
                }
            }
            return b;
        }
        /// <summary>
        /// Adjusts image brightness. Uses pointers, so the project must allow unsafe code.
        /// </summary>
        /// <param name="b">Source bitmap (modified in place)</param>
        /// <param name="degree">Brightness change, clamped to [-255, 255]</param>
        /// <returns>The same bitmap, or null on failure</returns>
        public Bitmap KiLighten(Bitmap b, int degree)
        {
            if (b == null)
            {
                return null;
            }

            if (degree < -255) degree = -255;
            if (degree > 255) degree = 255;

            try
            {
                int width = b.Width;
                int height = b.Height;
                int pix = 0;

                BitmapData data = b.LockBits(new Rectangle(0, 0, width, height), ImageLockMode.ReadWrite, PixelFormat.Format24bppRgb);

                unsafe
                {
                    byte* p = (byte*)data.Scan0;
                    int offset = data.Stride - width * 3; // padding bytes at the end of each row

                    for (int y = 0; y < height; y++)
                    {
                        for (int x = 0; x < width; x++)
                        {
                            // Adjust the brightness of each of this pixel's three channels
                            for (int i = 0; i < 3; i++)
                            {
                                pix = p[i] + degree;
                                if (degree < 0) p[i] = (byte)Math.Max(0, pix);
                                if (degree > 0) p[i] = (byte)Math.Min(255, pix);
                            } // i
                            p += 3;
                        } // x
                        p += offset;
                    } // y
                }

                b.UnlockBits(data);
                return b;
            }
            catch
            {
                return null;
            }
        }
        /// <summary>
        /// Adjusts image contrast. Uses pointers, so the project must allow unsafe code.
        /// </summary>
        /// <param name="b">Source bitmap (modified in place)</param>
        /// <param name="degree">Contrast change, clamped to [-100, 100]</param>
        /// <returns>The same bitmap, or null on failure</returns>
        public Bitmap KiContrast(Bitmap b, int degree)
        {
            if (b == null)
            {
                return null;
            }

            if (degree < -100) degree = -100;
            if (degree > 100) degree = 100;

            try
            {
                double pixel = 0;
                double contrast = (100.0 + degree) / 100.0;
                contrast *= contrast;
                int width = b.Width;
                int height = b.Height;

                BitmapData data = b.LockBits(new Rectangle(0, 0, width, height), ImageLockMode.ReadWrite, PixelFormat.Format24bppRgb);

                unsafe
                {
                    byte* p = (byte*)data.Scan0;
                    int offset = data.Stride - width * 3; // padding bytes at the end of each row

                    for (int y = 0; y < height; y++)
                    {
                        for (int x = 0; x < width; x++)
                        {
                            // Adjust the contrast of each of this pixel's three channels
                            for (int i = 0; i < 3; i++)
                            {
                                pixel = ((p[i] / 255.0 - 0.5) * contrast + 0.5) * 255;
                                if (pixel < 0) pixel = 0;
                                if (pixel > 255) pixel = 255;
                                p[i] = (byte)pixel;
                            } // i
                            p += 3;
                        } // x
                        p += offset;
                    } // y
                }

                b.UnlockBits(data);
                return b;
            }
            catch
            {
                return null;
            }
        }
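
To see what the four steps do to a single pixel, the per-channel arithmetic of KiLighten, KiContrast and ToBlackWhite can be traced on one gray value. The sketch below re-implements only that arithmetic; the names PipelineTrace, Lighten, Contrast and EndsUpWhite are made up for illustration and are not part of the original code. With both degrees set to 100, it suggests the whole chain behaves like a single threshold: gray levels of 60 and above end up white, anything darker ends up black.

```csharp
using System;

static class PipelineTrace
{
    // Mirrors KiLighten for one channel: add degree, clamp to [0, 255].
    public static int Lighten(int v, int degree)
    {
        return Math.Max(0, Math.Min(255, v + degree));
    }

    // Mirrors KiContrast for one channel.
    public static int Contrast(int v, int degree)
    {
        double contrast = (100.0 + degree) / 100.0;
        contrast *= contrast;
        double pixel = ((v / 255.0 - 0.5) * contrast + 0.5) * 255;
        if (pixel < 0) pixel = 0;
        if (pixel > 255) pixel = 255;
        return (byte)pixel; // truncates, like the cast in KiContrast
    }

    // ToBlackWhite keeps a pixel white only if its value is exactly 255.
    public static bool EndsUpWhite(int gray)
    {
        return Contrast(Lighten(gray, 100), 100) == 255;
    }

    static void Main()
    {
        Console.WriteLine(EndsUpWhite(59)); // False
        Console.WriteLine(EndsUpWhite(60)); // True
    }
}
```

This is also a quick way to check whether the Photoshop-prepared samples and the in-memory image are likely to agree: both need to put the black/white boundary in the same place for an exact pixel comparison to succeed.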

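Step three: all the preparation is done, so on to the comparison! In memory there is now a black-and-white digit image of 140*20 with 7 pixels of empty space on the left. The right margin depends on how many digits the phone number has: it may be 3 pixels or 13 pixels. On local disk there are 0.gif through 9.gif and -.gif, each 10*20. All that remains is the comparison, and the code for it follows: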

                // Load the sample files from disk into memory. Note that .ToServerPath() is an extension method that wraps HttpContext.Server.MapPath("xxx")
                Bitmap[] numColl = {
                    new Bitmap(("/Temp/0.gif").ToServerPath()),
                    new Bitmap(("/Temp/1.gif").ToServerPath()),
                    new Bitmap(("/Temp/2.gif").ToServerPath()),
                    new Bitmap(("/Temp/3.gif").ToServerPath()),
                    new Bitmap(("/Temp/4.gif").ToServerPath()),
                    new Bitmap(("/Temp/5.gif").ToServerPath()),
                    new Bitmap(("/Temp/6.gif").ToServerPath()),
                    new Bitmap(("/Temp/7.gif").ToServerPath()),
                    new Bitmap(("/Temp/8.gif").ToServerPath()),
                    new Bitmap(("/Temp/9.gif").ToServerPath()),
                    new Bitmap(("/Temp/-.gif").ToServerPath())
                };


        /// <summary>
        /// Compares the source image with each sample tile and returns the recognized characters.
        /// </summary>
        /// <param name="iGetPhoto">Source image, already black and white</param>
        /// <param name="numColl">Sample tiles: indexes 0-9 are the digits, index 10 is "-"</param>
        /// <returns>The recognized string</returns>
        public string ComparePic(Bitmap iGetPhoto /* source image */, Bitmap[] numColl /* sample tiles */) {
            int numCount = 13;
            string result = string.Empty;
            for (int i = 0; i < numCount; i++)
            {
                int x = i * 10 + 7; // x position in the source image where this cell starts
                Bitmap perBmp = new Bitmap(10, 20);
                Graphics gPhoto = Graphics.FromImage(perBmp);
                gPhoto.Clear(Color.White);
                gPhoto.DrawImage(iGetPhoto /* source image */, 0, 0, /* destination position */ new Rectangle(new Point(x, 0), new Size(10, 20)) /* source region */, GraphicsUnit.Pixel);
                for (int j = 0; j < 11; j++) // loop over the sample tiles
                {
                    // Compare every pixel of the sample with the cell cut from the source image; stop at the first mismatch
                    bool isTrue = true;
                    for (int n = 0; n < 20 && isTrue; n++)
                    {
                        for (int m = 0; m < 10 && isTrue; m++)
                        {
                            Color point1 = perBmp.GetPixel(m, n);
                            Color point2 = numColl[j].GetPixel(m, n);
                            if (point1.ToArgb() != point2.ToArgb())
                            {
                                isTrue = false;
                            }
                        }
                    }
                    if (isTrue)
                    {
                        result += j == 10 ? "-" : j.ToString();
                        break;
                    }
                }
                gPhoto.Dispose();
                perBmp.Dispose();
            }
            return result;
        }
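
The slicing in ComparePic depends on the layout described in step three: a 7-pixel left margin and 10-pixel-wide cells in a 140-pixel-wide image. As a small worked check of that arithmetic, the sketch below lists the x offset of every full cell that fits; CellLayout and CellOffsets are illustrative names, and the two constants are simply the values quoted in the text.

```csharp
using System;
using System.Collections.Generic;

static class CellLayout
{
    public const int LeftMargin = 7;  // empty pixels on the left, as stated in the text
    public const int CellWidth = 10;  // width of one sample tile, as stated in the text

    // Returns the x offset of every cell that fits entirely inside the image.
    public static List<int> CellOffsets(int imageWidth)
    {
        var offsets = new List<int>();
        for (int x = LeftMargin; x + CellWidth <= imageWidth; x += CellWidth)
        {
            offsets.Add(x);
        }
        return offsets;
    }

    static void Main()
    {
        List<int> offsets = CellOffsets(140);
        Console.WriteLine(offsets.Count);               // 13
        Console.WriteLine(offsets[offsets.Count - 1]);  // 127
    }
}
```

Thirteen cells end at x = 137, which leaves the 3-pixel right margin mentioned above. With a 12-character number the thirteenth cell is presumably blank, so it matches none of the samples and adds nothing to the result, which would account for the 13-pixel margin.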

Finally, wrap the scattered calls into a single entry-point function:

        // Entry-point function
        public string GetPicTel(string url)
        {
            HttpWebRequest objRequest = (HttpWebRequest)WebRequest.Create(url);
            objRequest.Timeout = 10000; // 10-second timeout
            objRequest.Referer = "http://domain of the site you are taking the image from";
            try
            {
                HttpWebResponse objResponse = (HttpWebResponse)objRequest.GetResponse();
                //System.IO.Stream resStream = objResponse.GetResponseStream();
                //System.Drawing.Image i = System.Drawing.Image.FromStream(resStream);
                //i.Save(@"c:\x.gif", System.Drawing.Imaging.ImageFormat.Gif);
                //resStream.Close();
                //i.Dispose();

                System.IO.Stream resStream = objResponse.GetResponseStream();
                Bitmap iGetPhoto = new Bitmap(resStream);
                // Step 1: convert to grayscale
                iGetPhoto = ToGray(iGetPhoto);
                // Step 2: raise brightness by 100
                iGetPhoto = KiLighten(iGetPhoto, 100);
                // Step 3: raise contrast by 100
                iGetPhoto = KiContrast(iGetPhoto, 100);
                // Step 4: convert to black and white
                iGetPhoto = ToBlackWhite(iGetPhoto);

                // Save the processed image to inspect the result
                //iGetPhoto.Save(@"c:\x.gif", System.Drawing.Imaging.ImageFormat.Gif);
                //resStream.Close();
                //return string.Empty;
                Bitmap[] numColl = {
                    new Bitmap(("/Temp/0.gif").ToServerPath()),
                    new Bitmap(("/Temp/1.gif").ToServerPath()),
                    new Bitmap(("/Temp/2.gif").ToServerPath()),
                    new Bitmap(("/Temp/3.gif").ToServerPath()),
                    new Bitmap(("/Temp/4.gif").ToServerPath()),
                    new Bitmap(("/Temp/5.gif").ToServerPath()),
                    new Bitmap(("/Temp/6.gif").ToServerPath()),
                    new Bitmap(("/Temp/7.gif").ToServerPath()),
                    new Bitmap(("/Temp/8.gif").ToServerPath()),
                    new Bitmap(("/Temp/9.gif").ToServerPath()),
                    new Bitmap(("/Temp/-.gif").ToServerPath())
                };
                return ComparePic(iGetPhoto, numColl);
            }
            catch (Exception)
            {
                // Any network or image error yields an empty result
                return string.Empty;
            }
        }

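One more note: some CAPTCHAs without distortion, where the characters are slanted or bounce up and down, can also be handled with this method; it just takes a little more thought. CAPTCHAs with warping effects, however, are a job for real specialists. Since we only use this for scraping and aren't cracking experts, this approach should basically cover the need.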
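
One possible way to handle characters that bounce up and down, offered here as an assumption rather than as the method used above, is to replace the exact comparison with a mismatch count and to try each sample at a few vertical shifts, keeping the best one. The sketch below works on plain bool grids (true = black pixel) instead of Bitmap objects so that it stays self-contained; TolerantMatcher, Mismatches and BestMatch are hypothetical names, and it assumes every sample has the same dimensions as the cell.

```csharp
using System;
using System.Collections.Generic;

static class TolerantMatcher
{
    // Counts differing pixels between a cell and a sample shifted down by dy rows.
    // Sample rows shifted outside the cell are treated as white (false).
    public static int Mismatches(bool[,] cell, bool[,] sample, int dy)
    {
        int h = cell.GetLength(0), w = cell.GetLength(1);
        int diff = 0;
        for (int y = 0; y < h; y++)
        {
            for (int x = 0; x < w; x++)
            {
                int sy = y - dy;
                bool s = sy >= 0 && sy < h && sample[sy, x];
                if (cell[y, x] != s) diff++;
            }
        }
        return diff;
    }

    // Returns the index of the best-matching sample over shifts in [-maxShift, maxShift],
    // or -1 if even the best match differs in more than maxMismatch pixels.
    public static int BestMatch(bool[,] cell, IList<bool[,]> samples, int maxShift, int maxMismatch)
    {
        int best = -1, bestDiff = int.MaxValue;
        for (int i = 0; i < samples.Count; i++)
        {
            for (int dy = -maxShift; dy <= maxShift; dy++)
            {
                int d = Mismatches(cell, samples[i], dy);
                if (d < bestDiff) { bestDiff = d; best = i; }
            }
        }
        return bestDiff <= maxMismatch ? best : -1;
    }
}
```

With maxShift = 0 and maxMismatch = 0 this reduces to the exact comparison in ComparePic. Slanted characters would need more than a vertical shift, so this only covers the up-and-down case.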

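I hope this post serves as a starting point and helps you collect the data you need. And of course, respect other people's copyright as much as possible! It is also because of copyright that I can't give the original image URL.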

posted @ 2011-12-27 14:49  冰封的心