Getting the HTML Document from IE with C#
This article shows how to extract the HTML of the web page currently open in IE.
- Create a console application in any version of Visual Studio, targeting .NET Framework 1, 2, 3, or 3.5.
- Add two COM object references, which allow us to manipulate IE: Microsoft Internet Controls (the SHDocVw namespace) and Microsoft HTML Object Library (the mshtml namespace).
- The code sample below refers to the COM types by their fully qualified names, so it needs no using directives for them (only System and System.IO, for Console and Path); add the code as is.
- Then find the running instances of IE and extract the document:
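// ShellWindows lists every open shell window, Windows Explorer and IE alike.
SHDocVw.ShellWindows shellWindows = new SHDocVw.ShellWindowsClass();
string filename;

foreach (SHDocVw.InternetExplorer ie in shellWindows)
{
    // Keep only the windows hosted by iexplore.exe.
    filename = Path.GetFileNameWithoutExtension(ie.FullName).ToLower();
    if (filename.Equals("iexplore"))
    {
        Console.WriteLine("Web Site : {0}", ie.LocationURL);

        // The cast yields null when the window is not showing an HTML page.
        mshtml.IHTMLDocument2 htmlDoc = ie.Document as mshtml.IHTMLDocument2;
        string html = (htmlDoc != null && htmlDoc.body != null) ? htmlDoc.body.outerHTML : null;

        // Print at most 40 characters; Substring(0, 40) would throw on a shorter string.
        Console.WriteLine(" Document Snippet: {0}",
            (html != null) ? html.Substring(0, Math.Min(40, html.Length)) : "***Failed***");
        Console.WriteLine("{0}{0}", Environment.NewLine);
    }
}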
Here is a screenshot of the output:
[Screenshot of the console output]
Complete code:
using System;
using System.IO;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            // ShellWindows lists every open shell window, Windows Explorer and IE alike.
            SHDocVw.ShellWindows shellWindows = new SHDocVw.ShellWindowsClass();
            string filename;

            foreach (SHDocVw.InternetExplorer ie in shellWindows)
            {
                // Keep only the windows hosted by iexplore.exe.
                filename = Path.GetFileNameWithoutExtension(ie.FullName).ToLower();
                if (filename.Equals("iexplore"))
                {
                    Console.WriteLine("Web Site : {0}", ie.LocationURL);

                    // The cast yields null when the window is not showing an HTML page.
                    mshtml.IHTMLDocument2 htmlDoc = ie.Document as mshtml.IHTMLDocument2;
                    string html = (htmlDoc != null && htmlDoc.body != null) ? htmlDoc.body.outerHTML : null;

                    // Print at most 40 characters; Substring(0, 40) would throw on a shorter string.
                    Console.WriteLine(" Document Snippet: {0}",
                        (html != null) ? html.Substring(0, Math.Min(40, html.Length)) : "***Failed***");
                    Console.WriteLine("{0}{0}", Environment.NewLine);
                }
            }
        }
    }
}
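The program above prints only a short snippet of the page body. If the whole page markup is wanted, one possible variation is to cast the document to mshtml.IHTMLDocument3 and read documentElement.outerHTML. The following is a minimal sketch under that assumption, not part of the original sample: the class name FullHtmlDump and the helper GetFullHtml are hypothetical, it needs the same two COM references, and it is meant to be compiled in place of Program rather than alongside it.

```csharp
using System;
using System.IO;

namespace ConsoleApplication1
{
    class FullHtmlDump
    {
        // Hypothetical helper (not in the original sample): returns the markup of the
        // whole page, or null if the window is still loading or holds no HTML document.
        static string GetFullHtml(SHDocVw.InternetExplorer ie)
        {
            if (ie.Busy) return null; // navigation or download still in progress

            mshtml.IHTMLDocument3 doc = ie.Document as mshtml.IHTMLDocument3;
            if (doc == null || doc.documentElement == null) return null;

            // outerHTML of the root element spans <html>...</html>, head included.
            return doc.documentElement.outerHTML;
        }

        static void Main(string[] args)
        {
            SHDocVw.ShellWindows shellWindows = new SHDocVw.ShellWindowsClass();
            foreach (SHDocVw.InternetExplorer ie in shellWindows)
            {
                string exe = Path.GetFileNameWithoutExtension(ie.FullName);
                if (!string.Equals(exe, "iexplore", StringComparison.OrdinalIgnoreCase))
                    continue; // skip Windows Explorer windows

                string html = GetFullHtml(ie);
                Console.WriteLine("{0}: {1}", ie.LocationURL,
                    html != null ? html.Length + " characters" : "***Failed***");
            }
        }
    }
}
```

Note that outerHTML is the DOM as IE serializes it at that moment, so it can differ from the source the server originally sent (for example after scripts have modified the page).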
Posted on 2008-07-14 09:14 by 布衣(Dream2008)