Getting the HTML Document from Internet Explorer with C#

This article describes how to extract the HTML document of the web page currently open in Internet Explorer (IE) using C#.

  1. Create a console application in any version of Visual Studio, targeting .NET Framework 1.x, 2.0, 3.0, or 3.5.
  2. Add two COM object references, which allow us to manipulate IE: Microsoft Internet Controls (the SHDocVw namespace used in the code) and Microsoft HTML Object Library (the mshtml namespace).

    [image]

  3. The code sample below refers to these objects by their fully qualified names, so it needs no using directive for them; just add the code as is. Path comes from System.IO.
  4. Open IE, then find the running IE instances and extract the document:
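// Enumerate all open shell windows (Windows Explorer as well as IE).
SHDocVw.ShellWindows shellWindows = new SHDocVw.ShellWindowsClass();

string filename;

foreach (SHDocVw.InternetExplorer ie in shellWindows)
{
    // Keep only the windows hosted by iexplore.exe.
    filename = Path.GetFileNameWithoutExtension(ie.FullName).ToLower();

    if (filename.Equals("iexplore"))
    {
        Console.WriteLine("Web Site   : {0}", ie.LocationURL);

        mshtml.IHTMLDocument2 htmlDoc = ie.Document as mshtml.IHTMLDocument2;

        // Take at most 40 characters; Substring(0, 40) throws on shorter markup.
        string snippet = "***Failed***";
        if (htmlDoc != null && htmlDoc.body != null)
        {
            string html = htmlDoc.body.outerHTML ?? "";
            snippet = html.Substring(0, Math.Min(40, html.Length));
        }

        Console.WriteLine("   Document Snippet: {0}", snippet);
        Console.WriteLine("{0}{0}", Environment.NewLine);
    }
}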
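The two pure string steps in the loop above (matching the host executable name and trimming the snippet) can be exercised without IE running. The following is a minimal sketch, not part of the original article: the class name IeHelpers and the methods IsInternetExplorer and Snippet are hypothetical names chosen for illustration, and the "***Failed***" fallback simply mirrors the original output.

```csharp
using System;
using System.IO;

// Hypothetical helpers (names are illustrative, not from the original article).
static class IeHelpers
{
    // Returns true when the host executable path (the value of ie.FullName)
    // names iexplore, compared case-insensitively.
    public static bool IsInternetExplorer(string fullName)
    {
        if (string.IsNullOrEmpty(fullName))
        {
            return false;
        }

        string name = Path.GetFileNameWithoutExtension(fullName);
        return string.Equals(name, "iexplore", StringComparison.OrdinalIgnoreCase);
    }

    // Returns at most maxLength leading characters of html.
    // Unlike a bare Substring(0, maxLength), this does not throw on short input.
    public static string Snippet(string html, int maxLength)
    {
        if (html == null)
        {
            return "***Failed***";
        }

        return html.Length <= maxLength ? html : html.Substring(0, maxLength);
    }
}
```

Inside the loop, IeHelpers.IsInternetExplorer(ie.FullName) would stand in for the ToLower/Equals pair, and IeHelpers.Snippet(html, 40) for the Substring call. Comparing with StringComparison.OrdinalIgnoreCase avoids culture-dependent lowercasing.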

Here is a screenshot of the output:

[image]

Full code:

using System;
using System.Collections.Generic;
using System.Text;
using System.IO;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            // Enumerate all open shell windows (Windows Explorer as well as IE).
            SHDocVw.ShellWindows shellWindows = new SHDocVw.ShellWindowsClass();
            string filename;

            foreach (SHDocVw.InternetExplorer ie in shellWindows)
            {
                // Keep only the windows hosted by iexplore.exe.
                filename = Path.GetFileNameWithoutExtension(ie.FullName).ToLower();

                if (filename.Equals("iexplore"))
                {
                    Console.WriteLine("Web Site    : {0}", ie.LocationURL);

                    mshtml.IHTMLDocument2 htmlDoc = ie.Document as mshtml.IHTMLDocument2;

                    // Take at most 40 characters; Substring(0, 40) throws on shorter markup.
                    string snippet = "***Failed***";
                    if (htmlDoc != null && htmlDoc.body != null)
                    {
                        string html = htmlDoc.body.outerHTML ?? "";
                        snippet = html.Substring(0, Math.Min(40, html.Length));
                    }

                    Console.WriteLine("    Document Snippet: {0}", snippet);
                    Console.WriteLine("{0}{0}", Environment.NewLine);
                }
            }
        }
    }
}

Posted on 2008-07-14 09:14 by 布衣 (Dream2008)