Ruby + HttpWatch spider script
HttpWatch official page: http://www.httpwatch.com/rubywatir/
Ruby + HttpWatch example: http://www.httpwatch.com/rubywatir/site_spider.zip (the example on the official site may have been updated since)
After downloading the example I added some comments and trimmed parts of the code. The main changes are:
1. Added ($*[0].nil?)?(url = url):(url = $*[0]) above url = gets.chomp!, so the URL can now be supplied on the command line as well as fixed in the script. Command-line usage: ruby script_name site_name. See the comments in the script for details, and note that the URL must not be prefixed with http://.
2. Commented out two break statements. They work on Ruby 1.8.6, but raise an error on newer versions such as Ruby 1.9.2, so they have to be commented out.
3. Commented out plugin.Container.Quit(); so that IE is not closed when the run finishes; testers need to go in afterwards and inspect the results.
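The command-line override in change 1 boils down to "use the first argument if one was given, otherwise keep the default fixed in the script". A minimal sketch of that idiom (the default URL here is a placeholder, not the one from the script):

```ruby
# Sketch of the command-line override from change 1.
# DEFAULT_URL is a placeholder standing in for the URL fixed in the script.
DEFAULT_URL = "www.example.com"

# $*[0] and ARGV[0] are the same thing; || falls back to the
# default when no command-line argument is given.
url = ARGV[0] || DEFAULT_URL

puts "Spidering #{url}"
```

The `||` fallback does the same job as the original ternary, just without the no-op `url = url` branch.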
Runtime issue: on a slow network connection the script may time out and exit:
C:/Ruby192/lib/ruby/gems/1.9.1/gems/watir-classic-3.0.0/lib/watir-classic/ie-class.rb:374:in `method_missing': (in OLE method `navigate': ) (WIN32OLERuntimeError)
    OLE error code:800C000E in <Unknown>
      <No Description>
    HRESULT error code:0x80020009
      发生意外。
        from C:/Ruby192/lib/ruby/gems/1.9.1/gems/watir-classic-3.0.0/lib/watir-classic/ie-class.rb:374:in `goto'
        from C:/Documents and Settings/Administrator/桌面/site_spider/site_spider.rb:55:in `<main>'
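One way to soften this failure is to wrap the page load in a rescue with a bounded retry, so a single slow page does not kill the whole run. A sketch using a hypothetical goto_with_retry helper (demonstrated here against a stub browser so it runs standalone; in the real script the rescue would target WIN32OLERuntimeError and `browser` would be the Watir::IE instance):

```ruby
# Hypothetical helper: retry a page load a few times before giving up.
# In the spider itself the rescue would catch WIN32OLERuntimeError
# rather than the generic StandardError used here.
def goto_with_retry(browser, url, attempts = 3)
  tries = 0
  begin
    tries += 1
    browser.goto(url)
    true
  rescue StandardError => e
    retry if tries < attempts
    puts "Giving up on #{url} after #{attempts} attempts: #{e.message}"
    false
  end
end

# Stub browser that fails twice then succeeds, to exercise the retry path.
class FlakyBrowser
  attr_reader :calls
  def initialize; @calls = 0; end
  def goto(url)
    @calls += 1
    raise "timeout" if @calls < 3
  end
end

browser = FlakyBrowser.new
ok = goto_with_retry(browser, "www.example.com")
```

In the main loop, a false return could simply `next` past the URL instead of aborting the whole spider.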
site_spider.rb
# A Site Spider that uses HttpWatch, Ruby and Watir
#
# For more information about this example please refer to http://www.httpwatch.com/rubywatir/
#
MAX_NO_PAGES = 200 # maximum number of pages to visit in one run, controlled by MAX_NO_PAGES

require 'win32ole' # win32ole drives HttpWatch; HttpWatch versions below 6.0 cannot be scripted
require 'rubygems'
require 'watir'
require './url_ops.rb' # url_ops.rb must sit in the same directory as this script
url = "www.gaopeng.com/?ADTAG=beijing_from_beijing" # URL to test; can also be read from the command line. Do not prepend http://

# Create HttpWatch
control = WIN32OLE.new('HttpWatch.Controller')
httpWatchVer = control.Version
if httpWatchVer[0...1] == "4" or httpWatchVer[0...1] == "5"
  puts "\nERROR: You are running HttpWatch #{httpWatchVer}. This sample requires HttpWatch 6.0 or later. Press Enter to exit..."; $stdout.flush
  gets
  #break # fine on Ruby 1.8.6, but an error on Ruby 1.9.2 and later, so commented out
end

# Get the domain name to spider; a command-line argument takes priority over the URL fixed above
url = $*[0] unless $*[0].nil? # command-line usage: ruby site_spider.rb site_name
#url = gets.chomp! # uncomment to read the URL from the keyboard instead
hostName = url.HostName
if hostName.empty?
  puts "\nPlease enter a valid domain name. Press Enter to exit..."; $stdout.flush
  gets
  #break # fine on Ruby 1.8.6, but an error on Ruby 1.9.2 and later, so commented out
end

# Start IE
ie = Watir::IE.new
ie.logger.level = Logger::ERROR

# Attach HttpWatch to the IE window
plugin = control.ie.Attach(ie.ie)

# Start recording HTTP traffic
plugin.Clear()
plugin.Log.EnableFilter(false)
plugin.Record()

url = url.CanonicalUrl
urlsVisited = Array.new; urlsToVisit = Array.new( 1, url )

# Start visiting pages
while urlsToVisit.length > 0 && urlsVisited.length < MAX_NO_PAGES

  nextUrl = urlsToVisit.pop
  puts "Loading " + nextUrl + "..."; $stdout.flush
  ie.goto(nextUrl)           # get Watir to load the URL
  urlsVisited.push( nextUrl ) # store this URL in the list that has been visited

  begin
    # Look at each link on the page and decide whether it needs to be visited
    ie.links.each do |link|

      linkUrl = link.href.CanonicalUrl
      # Skip if the URL is from a different domain, points at a download, or has already been seen
      if !url.IsSubDomain( linkUrl.HostName ) ||
         linkUrl.Path.include?( ".exe" ) || linkUrl.Path.include?( ".zip" ) || linkUrl.Path.include?( ".csv" ) ||
         linkUrl.Path.include?( ".pdf" ) || linkUrl.Path.include?( ".png" ) ||
         urlsToVisit.find{ |aUrl| aUrl == linkUrl } != nil ||
         urlsVisited.find{ |aUrl| aUrl == linkUrl } != nil
        # Don't add this URL to the list
        next
      end
      # Add this URL to the list
      urlsToVisit.push(linkUrl)
    end
  rescue
    puts "Failed to find links in #{nextUrl}: #{$!}"; $stdout.flush # interpolation avoids a TypeError from String + exception
  end

end

if urlsVisited.length == MAX_NO_PAGES
  puts "\nThe spider has stopped because #{MAX_NO_PAGES} pages have been visited. (Change MAX_NO_PAGES if you want to increase this limit)"; $stdout.flush
end

# Stop recording HTTP data in HttpWatch
plugin.Stop()

puts "\nAnalyzing HTTP data.."; $stdout.flush

# Look at each HTTP request in the log to compile a list of URLs for each error
errorUrls = Hash.new
plugin.Log.Entries.each do |entry|
  if (!entry.Error.empty? && entry.Error != "Aborted") || entry.StatusCode >= 400
    if !errorUrls.has_key?( entry.Result )
      errorUrls[entry.Result] = Array.new( 1, entry.Url )
    else
      if errorUrls[entry.Result].find{ |aUrl| aUrl == entry.Url } == nil
        errorUrls[entry.Result].push( entry.Url )
      end
    end
  end
end

# Display summary statistics for the whole log
summary = plugin.Log.Entries.Summary

printf "Total time to load page (secs): %.3f\n", summary.Time
printf "Number of bytes received on network: %d\n", summary.BytesReceived
printf "HTTP compression saving (bytes): %d\n", summary.CompressionSavedBytes
printf "Number of round trips: %d\n", summary.RoundTrips
printf "Number of errors: %d\n", summary.Errors.Count

# Print out errors
summary.Errors.each do |error|
  numErrors = error.Occurrences
  description = error.Description
  puts "#{numErrors} URL(s) caused a #{description} error:"
  errorUrls[error.Result].each do |aUrl|
    puts "-> #{aUrl}"
  end
end
# Quit IE: commented out so that testers can inspect the results after the run finishes
#plugin.Container.Quit();

puts "\r\nPress Enter to exit"; $stdout.flush
#gets
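The error-analysis pass in the listing is just "group URLs by result code, keeping each URL at most once per group". A standalone sketch of the same pattern over made-up data (the entries hash here is illustrative, not the HttpWatch log API):

```ruby
# Same grouping pattern as the log-analysis pass above: bucket URLs by
# result code, keeping each URL at most once per bucket. The data is made up.
entries = [
  { result: 404, url: "http://example.com/a" },
  { result: 404, url: "http://example.com/a" }, # duplicate, kept once
  { result: 404, url: "http://example.com/b" },
  { result: 500, url: "http://example.com/c" },
]

# A default-empty-array Hash replaces the has_key? branching in the listing.
errorUrls = Hash.new { |h, k| h[k] = [] }
entries.each do |entry|
  bucket = errorUrls[entry[:result]]
  bucket.push(entry[:url]) unless bucket.include?(entry[:url])
end
```

Using `Hash.new { ... }` with a default block is the idiomatic way to avoid the explicit `has_key?` check.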
url_ops.rb
# Helper functions used to parse URLs
class String
  def HostName
    matches = scan(/^(?:https?:\/\/)?([^\/]*)/)
    if matches.length > 0 && matches[0].length > 0
      return matches[0][0].downcase
    else
      return ""
    end
  end
  def IsSubDomain( hostName )
    thisHostName = self.HostName
    if thisHostName.slice(0..3) == "www."
      thisHostName = thisHostName.slice(4..-1)
    end
    if thisHostName == hostName ||
       (hostName.length > thisHostName.length &&
        hostName.slice( -thisHostName.length ..-1) == thisHostName)
      return true
    end
    return false
  end
  def Protocol
    matches = scan(/^(https?:\/\/)/)
    if matches.length > 0 && matches[0].length > 0
      return matches[0][0].downcase
    else
      return "http://"
    end
  end
  def Path
    if scan(/^(https?:\/\/)/).length > 0
      matches = scan(/^https?:\/\/[^\/]+\/([^#]+)$/)
    else
      matches = scan(/^[^\/]+\/([^#]+)$/)
    end
    if matches != nil && matches.length == 1 && matches[0].length == 1
      return matches[0][0].downcase
    else
      return ""
    end
  end
  def CanonicalUrl
    return self.Protocol + self.HostName + "/" + self.Path
  end
end
Put both scripts in the same directory (url_ops.rb is unchanged from the original example) and run the spider from cmd.
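To see what the url_ops.rb helpers do, here is a standalone snippet with the two most-used methods copied inline (so it runs without the rest of the spider):

```ruby
# Standalone demo of the url_ops.rb String helpers; HostName and Protocol
# are copied inline from the listing above so the snippet is self-contained.
class String
  def HostName
    matches = scan(/^(?:https?:\/\/)?([^\/]*)/)
    matches.length > 0 && matches[0].length > 0 ? matches[0][0].downcase : ""
  end
  def Protocol
    matches = scan(/^(https?:\/\/)/)
    matches.length > 0 && matches[0].length > 0 ? matches[0][0].downcase : "http://"
  end
end

puts "www.gaopeng.com/?ADTAG=x".HostName  # => "www.gaopeng.com"
puts "https://Example.COM/a/b".Protocol   # => "https://"
puts "example.com".Protocol               # => "http://" (default when no scheme)
```

Note the scheme regex is case-sensitive, so an uppercase "HTTPS://" prefix would fall back to the "http://" default.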
