Extracting JavaScript blocks and download links from web pages

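  There is a good video tutorial online in SWF format that I wanted to download. It is spread over many pages, though, and opening each page, viewing the source, locating the JavaScript block, and copying and pasting the link was far too tedious. So I decided to write a program to cut down the work.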

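What the program does: it extracts the JavaScript blocks from a batch of web pages, then extracts the video download links from those blocks.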

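The program first requests each page over the network and reads the response stream as text. It then uses a regular expression to extract the JavaScript blocks, and a second regular expression to pull the download URLs out of each block.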

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using System.Net;
using System.IO;
using System.Text.RegularExpressions;
namespace CsWebBrower
{
    public partial class GetUri : Form
    {
        public GetUri()
        {
            InitializeComponent();
        }

        private void button1_Click(object sender, EventArgs e)
        {
            // textBox1 holds one page URL per line.
            foreach (string str in textBox1.Lines)
            {
                // Skip blank lines; new Uri("") would throw.
                if (string.IsNullOrWhiteSpace(str)) continue;

                Uri uri = new Uri(str.Trim());
                WebRequest req = WebRequest.Create(uri);

                // The using blocks release the response and reader even if a step throws.
                using (WebResponse result = req.GetResponse())
                using (Stream receiveStream = result.GetResponseStream())
                using (StreamReader readerOfStream = new StreamReader(receiveStream, Encoding.UTF8))
                {
                    string temp = readerOfStream.ReadToEnd();

                    // Match every <script ...>...</script> block, including multi-line bodies.
                    MatchCollection mc = Regex.Matches(temp, @"<script[^>]*>[\s\S]*?</script>", RegexOptions.IgnoreCase);
                    foreach (Match m in mc)
                    {
                        // Match URLs ending in ".swf". The dot is escaped, and whitespace, quotes
                        // and angle brackets are excluded so a match cannot run across two URLs.
                        MatchCollection mc2 = Regex.Matches(m.Value, @"https?://[^\s""'<>]*?\.swf", RegexOptions.IgnoreCase);
                        foreach (Match m2 in mc2)
                        {
                            richTextBox1.AppendText(m2.Value + "\n\n");
                        }
                    }
                }
            }
        }
    }
}
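
The two matching steps can also be tried without a form or a network request. The following is only a minimal console sketch under some assumptions: the class name `SwfLinkExtractor`, the method `Extract`, and the sample page with its example.com URLs are all made up for illustration. The URL pattern escapes the dot before `swf` and excludes whitespace, quotes and angle brackets, which is a choice made for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SwfLinkExtractor
{
    // Returns every .swf URL found inside the <script> blocks of the given HTML.
    public static List<string> Extract(string html)
    {
        var links = new List<string>();

        // Step 1: find each <script ...>...</script> block, including multi-line bodies.
        foreach (Match script in Regex.Matches(html, @"<script[^>]*>[\s\S]*?</script>", RegexOptions.IgnoreCase))
        {
            // Step 2: inside the block, find URLs that end in ".swf".
            foreach (Match link in Regex.Matches(script.Value, @"https?://[^\s""'<>]*?\.swf", RegexOptions.IgnoreCase))
            {
                links.Add(link.Value);
            }
        }
        return links;
    }

    public static void Main()
    {
        // Hypothetical page content; the example.com URLs are placeholders.
        string html =
            "<html><body><a href=\"http://example.com/page2.html\">next</a>\n" +
            "<script type=\"text/javascript\">\n" +
            "var videoUrl = 'http://example.com/video/lesson01.swf';\n" +
            "</script></body></html>";

        foreach (string link in Extract(html))
        {
            Console.WriteLine(link);
        }
    }
}
```

Run on the sample string, this prints only the `lesson01.swf` URL. The ordinary page link is skipped because it sits outside any script block.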

posted @ 2012-11-04 20:29  太一吾鱼水