第3课:网络编程基础
课程目标
通过本课程学习,你将能够:
- 深入理解HTTP协议的工作原理
- 掌握DNS协议和域名解析过程
- 熟练使用requests库进行HTTP请求
- 理解SSL/TLS加密通信
- 掌握代理和认证的配置方法
- 学会处理网络超时和异常
- 为后续学习OneForAll的网络请求功能打下基础
3.1 HTTP协议详解
3.1.1 HTTP协议概述
什么是HTTP?
HTTP(HyperText Transfer Protocol,超文本传输协议)是应用层协议,用于在客户端和服务器之间传输超文本数据。它是万维网数据通信的基础。
HTTP的特点:
- 无状态:服务器不保存客户端的状态信息
- 请求/响应模型:客户端发起请求,服务器返回响应
- 基于TCP:使用TCP作为传输层协议
- 灵活:可以传输任意类型的数据
HTTP版本历史:
- HTTP/0.9(1991):最初的版本,只支持GET方法
- HTTP/1.0(1996):增加了POST、HEAD等方法,引入了头部
- HTTP/1.1(1997):引入持久连接、分块传输等
- HTTP/2.0(2015):多路复用、头部压缩、服务器推送
- HTTP/3.0(2022):基于QUIC协议,解决队头阻塞问题
3.1.2 HTTP请求
HTTP请求的结构:
<方法> <请求URL> <HTTP版本>
<头部1>: <值1>
<头部2>: <值2>
...
<空行>
<请求体>
示例:
GET /api/users HTTP/1.1
Host: api.example.com
User-Agent: Mozilla/5.0
Accept: application/json
Authorization: Bearer token123
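上面的请求报文结构可以用几行 Python 字符串拼接直观地演示出来(仅为理解格式的草图,实际开发请使用 requests 等库):

```python
def build_request(method, path, host, headers=None, body=""):
    """按 HTTP/1.1 报文格式拼接请求字符串(仅作演示)"""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # 头部与请求体之间必须用一个空行(\r\n\r\n)分隔
    return "\r\n".join(lines) + "\r\n\r\n" + body

req = build_request("GET", "/api/users", "api.example.com",
                    {"Accept": "application/json"})
print(req)
```

注意 HTTP 规定行以 `\r\n` 结尾,头部与请求体之间是一个空行。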
3.1.3 HTTP请求方法
常见的HTTP方法:
# 1. GET - 获取资源
"""
GET /api/users HTTP/1.1
Host: api.example.com
用途:请求获取指定资源
特点:参数在URL中,有长度限制,可被缓存
"""
# 2. POST - 创建资源
"""
POST /api/users HTTP/1.1
Host: api.example.com
Content-Type: application/json
{"name": "Alice", "email": "alice@example.com"}
用途:向指定资源提交数据进行处理
特点:参数在请求体中,无长度限制,默认不被缓存
"""
# 3. PUT - 更新资源(完整更新)
"""
PUT /api/users/1 HTTP/1.1
Host: api.example.com
Content-Type: application/json
{"id": 1, "name": "Alice", "email": "alice@new.com"}
用途:更新指定资源的全部内容
特点:幂等性,多次执行结果相同
"""
# 4. PATCH - 更新资源(部分更新)
"""
PATCH /api/users/1 HTTP/1.1
Host: api.example.com
Content-Type: application/json
{"email": "alice@new.com"}
用途:更新指定资源的部分内容
特点:只更新提供的字段
"""
# 5. DELETE - 删除资源
"""
DELETE /api/users/1 HTTP/1.1
Host: api.example.com
用途:删除指定资源
特点:幂等性
"""
# 6. HEAD - 获取响应头
"""
HEAD /api/users/1 HTTP/1.1
Host: api.example.com
用途:获取资源的响应头,不返回响应体
特点:类似GET,但不返回实际内容
"""
# 7. OPTIONS - 获取支持的方法
"""
OPTIONS /api/users HTTP/1.1
Host: api.example.com
用途:查询服务器支持的方法
特点:用于CORS预检请求
"""
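根据 RFC 7231 对各方法语义的定义,可以把“安全性”(不修改服务器状态)和“幂等性”(重复执行效果相同)整理成一张速查表,以下字典仅作示意:

```python
# (安全, 幂等):安全=不修改服务器状态;幂等=重复执行效果相同
METHOD_PROPS = {
    "GET":     (True,  True),
    "HEAD":    (True,  True),
    "OPTIONS": (True,  True),
    "PUT":     (False, True),
    "DELETE":  (False, True),
    "POST":    (False, False),
    "PATCH":   (False, False),
}

def is_idempotent(method):
    """判断某个HTTP方法是否幂等"""
    safe, idem = METHOD_PROPS[method.upper()]
    return idem

print(is_idempotent("put"))   # True
print(is_idempotent("post"))  # False
```

幂等性在设计重试逻辑时很重要:幂等的方法可以安全地自动重试,POST 则需要谨慎。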
3.1.4 HTTP响应
HTTP响应的结构:
<HTTP版本> <状态码> <状态描述>
<头部1>: <值1>
<头部2>: <值2>
...
<空行>
<响应体>
示例:
HTTP/1.1 200 OK
Date: Mon, 17 Apr 2026 12:00:00 GMT
Content-Type: application/json
Content-Length: 123
Server: nginx/1.18.0
{
"id": 1,
"name": "Alice",
"email": "alice@example.com"
}
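响应报文同样可以手动解析。下面是一个简化的解析草图(未处理分块传输、重复头部等细节,仅用于理解结构):

```python
def parse_response(raw: str):
    """把原始HTTP响应文本拆成状态码、头部和响应体(简化演示)"""
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # 状态行:<HTTP版本> <状态码> <状态描述>
    version, status, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(": ")
        headers[name] = value
    return int(status), headers, body

raw = ("HTTP/1.1 200 OK\r\n"
       "Content-Type: application/json\r\n"
       "\r\n"
       '{"id": 1}')
status, headers, body = parse_response(raw)
print(status, headers["Content-Type"], body)
```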
3.1.5 HTTP状态码
状态码分类:
# 1xx 信息性响应
"""
100 Continue - 继续请求
101 Switching Protocols - 切换协议
"""
# 2xx 成功响应
"""
200 OK - 请求成功
201 Created - 资源创建成功
202 Accepted - 请求已接受,正在处理
204 No Content - 请求成功,无返回内容
206 Partial Content - 部分内容
"""
# 3xx 重定向
"""
301 Moved Permanently - 永久重定向
302 Found - 临时重定向
304 Not Modified - 资源未修改
307 Temporary Redirect - 临时重定向(保持请求方法)
308 Permanent Redirect - 永久重定向(保持请求方法)
"""
# 4xx 客户端错误
"""
400 Bad Request - 请求错误
401 Unauthorized - 未认证
403 Forbidden - 禁止访问
404 Not Found - 资源不存在
405 Method Not Allowed - 方法不允许
409 Conflict - 请求冲突
429 Too Many Requests - 请求过多(限流)
"""
# 5xx 服务器错误
"""
500 Internal Server Error - 服务器内部错误
501 Not Implemented - 功能未实现
502 Bad Gateway - 网关错误
503 Service Unavailable - 服务不可用
504 Gateway Timeout - 网关超时
"""
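实际代码中常按状态码的首位数字归类处理,比如下面这个小工具函数:

```python
def status_class(code: int) -> str:
    """按首位数字归类HTTP状态码"""
    classes = {
        1: "信息性响应",
        2: "成功",
        3: "重定向",
        4: "客户端错误",
        5: "服务器错误",
    }
    return classes.get(code // 100, "未知")

print(status_class(200))  # 成功
print(status_class(404))  # 客户端错误
print(status_class(503))  # 服务器错误
```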
3.1.6 HTTP请求头
常用的请求头:
# 常见请求头示例
headers = {
# Host - 目标主机
"Host": "api.example.com",
# User-Agent - 客户端标识
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
# Accept - 可接受的响应类型
"Accept": "application/json, text/html",
# Content-Type - 请求体类型
"Content-Type": "application/json",
# Authorization - 认证信息
"Authorization": "Bearer token123",
# Cookie - Cookie信息
"Cookie": "session_id=abc123; user_id=456",
# Referer - 来源页面
"Referer": "https://www.example.com/page",
# 注意:字典的键不能重复;如需自定义User-Agent(例如 "OneForAll/0.4.5"),直接替换上面的值即可
# Connection - 连接控制
"Connection": "keep-alive",
# Cache-Control - 缓存控制
"Cache-Control": "no-cache",
# If-Modified-Since - 条件请求
"If-Modified-Since": "Mon, 17 Apr 2026 12:00:00 GMT",
}
3.1.7 HTTP响应头
常用的响应头:
# 常见响应头示例
response_headers = {
# Content-Type - 响应体类型
"Content-Type": "application/json; charset=utf-8",
# Content-Length - 响应体长度
"Content-Length": "1234",
# Content-Encoding - 内容编码
"Content-Encoding": "gzip",
# Server - 服务器信息
"Server": "nginx/1.18.0",
# Set-Cookie - 设置Cookie
"Set-Cookie": "session_id=xyz789; Path=/; HttpOnly",
# Location - 重定向位置
"Location": "https://www.example.com/new-page",
# Cache-Control - 缓存控制
"Cache-Control": "max-age=3600",
# Expires - 过期时间
"Expires": "Mon, 17 Apr 2026 13:00:00 GMT",
# ETag - 资源标识
"ETag": "\"33a64df551425fcc55e4d42a148795d9f25f89d4\"",
# Last-Modified - 最后修改时间
"Last-Modified": "Mon, 17 Apr 2026 11:00:00 GMT",
}
3.1.8 HTTPS与SSL/TLS
什么是HTTPS?
HTTPS(HTTP Secure)是HTTP的安全版本,通过SSL/TLS协议加密通信内容。
SSL/TLS握手过程:
"""
1. 客户端发送ClientHello
- 支持的SSL/TLS版本
- 支持的加密套件
- 随机数
2. 服务器回复ServerHello
- 选择的SSL/TLS版本
- 选择的加密套件
- 服务器证书
- 随机数
3. 客户端验证证书
- 验证证书链
- 验证证书有效期
- 验证域名匹配
4. 密钥交换
- 客户端生成预主密钥
- 使用服务器证书公钥加密
- 发送给服务器
5. 生成会话密钥
- 双方使用随机数和预主密钥
- 生成对称加密的会话密钥
6. 完成握手
- 双方发送完成消息
- 开始加密通信
"""
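这些握手步骤由 Python 的 ssl 模块自动完成。下面的小例子不发起网络连接,只查看默认安全上下文的验证配置:

```python
import ssl

context = ssl.create_default_context()
# 默认上下文会验证证书链(CERT_REQUIRED)并检查域名匹配(check_hostname)
print(context.verify_mode == ssl.CERT_REQUIRED)  # True
print(context.check_hostname)                    # True
# minimum_version 是允许协商的最低TLS版本
print(context.minimum_version)
```

在安全相关的代码中,应始终使用 `create_default_context()` 而不是手动构造 `SSLContext`,以获得这些安全默认值。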
证书验证:
import ssl
import socket
from urllib.request import urlopen
def verify_certificate(hostname, port=443):
"""验证SSL证书"""
context = ssl.create_default_context()
with socket.create_connection((hostname, port)) as sock:
with context.wrap_socket(sock, server_hostname=hostname) as ssock:
cert = ssock.getpeercert()
print(f"Certificate for {hostname}:")
print(f"Subject: {cert['subject']}")
print(f"Issuer: {cert['issuer']}")
print(f"Version: {cert['version']}")
print(f"Serial Number: {cert['serialNumber']}")
print(f"Not Before: {cert['notBefore']}")
print(f"Not After: {cert['notAfter']}")
# 使用示例
verify_certificate("www.google.com")
3.2 DNS协议原理
3.2.1 DNS概述
什么是DNS?
DNS(Domain Name System,域名系统)是将域名转换为IP地址的分布式数据库系统。
DNS的作用:
- 域名解析:将域名转换为IP地址
- 反向解析:将IP地址转换为域名
- 邮件路由:查找邮件服务器
- 负载均衡:通过DNS实现负载均衡
DNS的层次结构:
                        根域名 (.)
                            |
            +---------------+---------------+
            |               |               |
           com             net             org
            |               |               |
       +----+----+     +----+----+     +----+----+
       |         |     |         |     |         |
    example   google  php    speedtest wikipedia apache
       |
    +--+--+
    |     |
   www   mail
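域名的解析层次与书写顺序相反:从最右侧的顶级域开始逐级向左。下面的小函数演示 www.example.com 对应的查询层次:

```python
def resolution_order(domain: str):
    """返回从顶级域到完整主机名的查询层次"""
    labels = domain.rstrip(".").split(".")
    zones = []
    # 从最右侧的标签开始,逐级向左拼出各级域名
    for i in range(len(labels) - 1, -1, -1):
        zones.append(".".join(labels[i:]))
    return zones

print(resolution_order("www.example.com"))
# ['com', 'example.com', 'www.example.com']
```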
3.2.2 DNS查询过程
递归查询 vs 迭代查询:
"""
递归查询(Recursive Query):
客户端 → DNS服务器(由DNS服务器完成所有查询)
- 客户端只发起一次请求
- DNS服务器负责查找并返回最终结果
- 减轻客户端负担
- 增加DNS服务器负担
迭代查询(Iterative Query):
客户端 → DNS服务器 → DNS服务器 → ...(客户端多次请求)
- 客户端需要多次请求
- 每个DNS服务器返回下一跳地址
- 减轻DNS服务器负担
- 增加客户端负担
"""
# 典型的DNS查询流程
def dns_query_flow():
"""
1. 客户端查询 www.example.com
2. 本地DNS服务器查询
3. 本地DNS服务器查询根域名服务器 (.)
4. 根域名服务器返回 com 域名服务器地址
5. 本地DNS服务器查询 com 域名服务器
6. com 域名服务器返回 example.com 域名服务器地址
7. 本地DNS服务器查询 example.com 域名服务器
8. example.com 域名服务器返回 www.example.com 的IP地址
9. 本地DNS服务器返回结果给客户端
"""
pass
3.2.3 DNS记录类型
常见的DNS记录类型:
# 1. A记录 - 将域名映射到IPv4地址
"""
主机记录类型: A
记录值: 192.0.2.1
示例: www.example.com → 192.0.2.1
"""
# 2. AAAA记录 - 将域名映射到IPv6地址
"""
主机记录类型: AAAA
记录值: 2001:db8::1
示例: www.example.com → 2001:db8::1
"""
# 3. CNAME记录 - 别名记录
"""
主机记录类型: CNAME
记录值: www.example.com
示例: blog.example.com → www.example.com
"""
# 4. MX记录 - 邮件交换记录
"""
主机记录类型: MX
记录值: 10 mail.example.com
示例: example.com → mail.example.com
优先级: 数字越小优先级越高
"""
# 5. NS记录 - 域名服务器记录
"""
主机记录类型: NS
记录值: ns1.example.com
示例: example.com → ns1.example.com
"""
# 6. TXT记录 - 文本记录
"""
主机记录类型: TXT
记录值: "v=spf1 include:_spf.google.com ~all"
示例: example.com → "验证信息"
用途: SPF记录、DKIM记录、域名验证
"""
# 7. SOA记录 - 起始授权记录
"""
主机记录类型: SOA
记录值: ns1.example.com. admin.example.com. (
2026041701 ; 序列号
3600 ; 刷新时间
1800 ; 重试时间
604800 ; 过期时间
86400 ; 最小TTL
)
示例: example.com → SOA记录
用途: 定义域名的授权信息
"""
# 8. PTR记录 - 反向DNS记录
"""
主机记录类型: PTR
记录值: www.example.com
示例: 192.0.2.1 → www.example.com
用途: IP地址到域名的反向解析
"""
# 9. SRV记录 - 服务记录
"""
主机记录类型: SRV
记录值: 10 60 5060 sipserver.example.com
示例: _sip._tcp.example.com → sipserver.example.com:5060
格式: 优先级 权重 端口 目标
用途: 定义特定服务的服务器
"""
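以 MX 记录为例,优先级数字越小越优先,可以直接按元组排序。下面使用假想的记录数据演示:

```python
# 假想的MX记录:(优先级, 邮件服务器)
mx_records = [
    (20, "backup-mail.example.com"),
    (10, "mail.example.com"),
    (30, "fallback.example.com"),
]

# 数字越小优先级越高,按元组第一个元素升序排列即可
for preference, exchange in sorted(mx_records):
    print(f"{preference:>3}  {exchange}")
```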
3.2.4 DNS缓存
DNS缓存的作用:
# DNS缓存层次结构
def dns_cache_hierarchy():
"""
1. 浏览器缓存
- 存储时间:几分钟到几小时
- 优先级:最高
- 清除方法:清除浏览器缓存
2. 操作系统缓存
- 存储时间:由TTL决定
- 优先级:次之
- 清除方法:重启或命令清除
3. 路由器缓存
- 存储时间:由TTL决定
- 优先级:再次
- 清除方法:重启路由器
4. ISP DNS缓存
- 存储时间:由TTL决定
- 优先级:较低
- 清除方法:等待TTL过期
"""
pass
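“由 TTL 决定存储时间”的逻辑可以用一个极简的内存缓存来示意(演示用的假想实现,并非真实 DNS 缓存):

```python
import time

class SimpleDNSCache:
    """按TTL过期的极简DNS缓存(演示用)"""
    def __init__(self):
        self._store = {}  # domain -> (ip, 过期时间戳)

    def set(self, domain, ip, ttl):
        self._store[domain] = (ip, time.monotonic() + ttl)

    def get(self, domain):
        entry = self._store.get(domain)
        if entry is None:
            return None
        ip, expires = entry
        if time.monotonic() >= expires:
            # TTL已过期:删除条目并视为未命中
            del self._store[domain]
            return None
        return ip

cache = SimpleDNSCache()
cache.set("www.example.com", "192.0.2.1", ttl=60)
print(cache.get("www.example.com"))  # 192.0.2.1
```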
清除DNS缓存:
import subprocess
import platform
def clear_dns_cache():
"""清除DNS缓存"""
system = platform.system()
if system == "Windows":
# Windows系统
try:
subprocess.run(["ipconfig", "/flushdns"], check=True)
print("DNS cache cleared successfully on Windows")
except subprocess.CalledProcessError as e:
print(f"Failed to clear DNS cache: {e}")
elif system == "Darwin":
# macOS系统
try:
subprocess.run(["sudo", "dscacheutil", "-flushcache"], check=True)
subprocess.run(["sudo", "killall", "-HUP", "mDNSResponder"], check=True)
print("DNS cache cleared successfully on macOS")
except subprocess.CalledProcessError as e:
print(f"Failed to clear DNS cache: {e}")
elif system == "Linux":
# Linux系统
try:
            # 尝试systemd-resolved(较新的系统用resolvectl,旧系统可用 systemd-resolve --flush-caches)
            subprocess.run(["sudo", "resolvectl", "flush-caches"], check=True)
            print("DNS cache cleared successfully on Linux (systemd-resolved)")
except (subprocess.CalledProcessError, FileNotFoundError):
try:
# 尝试nscd
subprocess.run(["sudo", "/etc/init.d/nscd", "restart"], check=True)
print("DNS cache cleared successfully on Linux (nscd)")
except (subprocess.CalledProcessError, FileNotFoundError):
print("Could not clear DNS cache. Please restart your DNS service.")
# 使用示例
# clear_dns_cache()
3.2.5 使用dnspython进行DNS查询
安装dnspython:
pip install dnspython
基本DNS查询:
import dns.resolver
import dns.message
def query_dns_record(domain, record_type="A"):
"""
查询DNS记录
Args:
domain: 域名
record_type: 记录类型 (A, AAAA, CNAME, MX, NS, TXT, SOA, SRV)
Returns:
查询结果列表
"""
try:
# 创建DNS解析器
resolver = dns.resolver.Resolver()
# 设置DNS服务器(可选)
# resolver.nameservers = ['8.8.8.8', '8.8.4.4']
# 设置超时时间
resolver.timeout = 5
resolver.lifetime = 10
# 查询DNS记录
answers = resolver.resolve(domain, record_type)
results = []
for rdata in answers:
results.append(str(rdata))
return results
except dns.resolver.NoAnswer:
print(f"No {record_type} record found for {domain}")
return []
except dns.resolver.NXDOMAIN:
print(f"Domain {domain} does not exist")
return []
except Exception as e:
print(f"Error querying DNS: {e}")
return []
# 使用示例
if __name__ == "__main__":
# 查询A记录
a_records = query_dns_record("www.google.com", "A")
print(f"A Records: {a_records}")
# 查询MX记录
mx_records = query_dns_record("gmail.com", "MX")
print(f"MX Records: {mx_records}")
# 查询TXT记录
txt_records = query_dns_record("google.com", "TXT")
print(f"TXT Records: {txt_records}")
# 查询NS记录
ns_records = query_dns_record("google.com", "NS")
print(f"NS Records: {ns_records}")
查询所有类型的DNS记录:
import dns.resolver
def query_all_records(domain):
"""查询域名的所有DNS记录"""
record_types = ["A", "AAAA", "CNAME", "MX", "NS", "TXT", "SOA", "SRV"]
all_records = {}
for record_type in record_types:
try:
resolver = dns.resolver.Resolver()
answers = resolver.resolve(domain, record_type)
all_records[record_type] = [str(rdata) for rdata in answers]
        except Exception:
            # 无该类型记录或查询失败时跳过
            continue
return all_records
# 使用示例
records = query_all_records("google.com")
for record_type, values in records.items():
print(f"{record_type}: {values}")
DNS解析详细信息:
import dns.resolver

def detailed_dns_query(domain):
    """详细的DNS查询"""
    print(f"\n=== DNS Query for {domain} ===\n")
    resolver = dns.resolver.Resolver()
    # 依次查询各记录类型,统一处理"无记录"的情况
    queries = [
        ("A", "A Records (IPv4)"),
        ("AAAA", "AAAA Records (IPv6)"),
        ("MX", "MX Records (Mail Exchange)"),
        ("NS", "NS Records (Name Servers)"),
        ("TXT", "TXT Records"),
    ]
    for record_type, title in queries:
        try:
            answers = resolver.resolve(domain, record_type)
            print(f"\n{title}:")
            for rdata in answers:
                if record_type == "MX":
                    print(f"  Priority {rdata.preference}: {rdata.exchange}")
                else:
                    print(f"  {rdata}")
        except Exception:
            print(f"  No {record_type} records found")

# 使用示例
detailed_dns_query("google.com")
3.3 requests库使用
3.3.1 requests库简介
为什么选择requests?
"""
requests的优势:
1. 简洁易用的API
2. 自动处理连接池和会话
3. 自动处理编码
4. 支持多种认证方式
5. 支持文件上传
6. 支持Cookie和Session
7. 支持代理
8. 完善的文档和社区支持
对比urllib:
- requests API更简洁
- requests自动处理更多细节
- requests支持连接保持
- requests更好的异常处理
"""
# 安装requests
# pip install requests
3.3.2 基本HTTP请求
GET请求:
import requests
# 简单的GET请求
def simple_get_request():
"""发送简单的GET请求"""
url = "https://httpbin.org/get"
try:
response = requests.get(url)
# 检查响应状态
print(f"Status Code: {response.status_code}")
print(f"Status: {response.status_code == requests.codes.ok}")
# 获取响应内容
print(f"Content: {response.text}")
# 获取JSON数据
data = response.json()
print(f"JSON Data: {data}")
# 获取响应头
print(f"Headers: {response.headers}")
return response
except requests.exceptions.RequestException as e:
print(f"Request failed: {e}")
return None
# 带参数的GET请求
def get_with_params():
"""发送带参数的GET请求"""
url = "https://httpbin.org/get"
# 方法1:使用params参数
params = {
"name": "Alice",
"age": 25,
"city": "Beijing"
}
response = requests.get(url, params=params)
print(f"URL: {response.url}")
print(f"Response: {response.json()}")
# 方法2:直接在URL中拼接
url = "https://httpbin.org/get?name=Bob&age=30"
response = requests.get(url)
print(f"Response: {response.json()}")
# 带请求头的GET请求
def get_with_headers():
"""发送带请求头的GET请求"""
url = "https://httpbin.org/headers"
headers = {
"User-Agent": "MyApp/1.0",
"Accept": "application/json",
"Accept-Language": "zh-CN,zh;q=0.9",
"Authorization": "Bearer token123"
}
response = requests.get(url, headers=headers)
print(f"Response: {response.json()}")
# 使用示例
if __name__ == "__main__":
simple_get_request()
get_with_params()
get_with_headers()
POST请求:
import requests
import json
# 简单的POST请求
def simple_post_request():
"""发送简单的POST请求"""
url = "https://httpbin.org/post"
# 发送表单数据
data = {
"name": "Alice",
"email": "alice@example.com"
}
response = requests.post(url, data=data)
print(f"Response: {response.json()}")
# 发送JSON数据
def post_json_data():
"""发送JSON格式的POST请求"""
url = "https://httpbin.org/post"
# 方法1:使用json参数(自动设置Content-Type)
json_data = {
"name": "Bob",
"email": "bob@example.com",
"age": 30
}
response = requests.post(url, json=json_data)
print(f"Response: {response.json()}")
# 方法2:手动设置Content-Type
headers = {
"Content-Type": "application/json"
}
json_str = json.dumps(json_data)
response = requests.post(url, data=json_str, headers=headers)
print(f"Response: {response.json()}")
# 发送文件
def post_file():
    """发送文件"""
    url = "https://httpbin.org/post"
    # 用with打开文件,请求结束后自动关闭,无需手动close
    with open('test.txt', 'rb') as f:
        files = {
            'file': ('test.txt', f, 'text/plain')
        }
        response = requests.post(url, files=files)
    print(f"Response: {response.json()}")
# 使用示例
if __name__ == "__main__":
simple_post_request()
post_json_data()
# post_file() # 需要test.txt文件
其他HTTP方法:
import requests
def other_http_methods():
"""演示其他HTTP方法"""
# PUT请求
url = "https://httpbin.org/put"
data = {"name": "Alice"}
response = requests.put(url, json=data)
print(f"PUT Response: {response.json()}")
# DELETE请求
url = "https://httpbin.org/delete"
response = requests.delete(url)
print(f"DELETE Response: {response.json()}")
# HEAD请求
url = "https://httpbin.org/get"
response = requests.head(url)
print(f"HEAD Headers: {response.headers}")
# OPTIONS请求
url = "https://httpbin.org/get"
response = requests.options(url)
print(f"OPTIONS Headers: {response.headers}")
# 使用示例
other_http_methods()
3.3.3 响应处理
获取响应内容:
import requests
def handle_response():
"""处理响应内容"""
url = "https://httpbin.org/get"
response = requests.get(url)
# 1. 获取文本内容
text_content = response.text
print(f"Text Content:\n{text_content}\n")
# 2. 获取二进制内容
binary_content = response.content
print(f"Binary Content Length: {len(binary_content)} bytes\n")
# 3. 获取JSON内容
try:
json_content = response.json()
print(f"JSON Content:\n{json_content}\n")
except ValueError:
print("Response is not JSON\n")
# 4. 获取原始响应
raw_content = response.raw
print(f"Raw Response: {raw_content}\n")
# 5. 获取响应状态码
status_code = response.status_code
print(f"Status Code: {status_code}\n")
# 6. 判断请求是否成功
is_success = response.ok
print(f"Is Success: {is_success}\n")
# 7. 获取响应头
headers = response.headers
print(f"Headers:\n{headers}\n")
# 8. 获取特定的响应头
content_type = response.headers.get('Content-Type')
print(f"Content-Type: {content_type}\n")
# 9. 获取响应的编码
encoding = response.encoding
print(f"Encoding: {encoding}\n")
# 10. 获取响应的URL
url = response.url
print(f"URL: {url}\n")
# 11. 获取响应的cookies
cookies = response.cookies
print(f"Cookies: {cookies}\n")
# 12. 获取响应的耗时
elapsed = response.elapsed
print(f"Elapsed Time: {elapsed}\n")
# 使用示例
handle_response()
状态码检查:
import requests
def check_status_codes():
"""检查不同的状态码"""
# 成功响应
response = requests.get("https://httpbin.org/status/200")
if response.status_code == 200:
print("Request successful!")
# 使用raise_for_status()抛出异常
try:
response = requests.get("https://httpbin.org/status/404")
response.raise_for_status() # 如果状态码不是2xx,抛出HTTPError
except requests.exceptions.HTTPError as e:
print(f"HTTP Error: {e}")
# 使用response.ok检查
response = requests.get("https://httpbin.org/status/200")
if response.ok:
print("Request OK!")
# 检查特定状态码
response = requests.get("https://httpbin.org/status/404")
if response.status_code == 404:
print("Resource not found!")
response = requests.get("https://httpbin.org/status/500")
if response.status_code == 500:
print("Server error!")
# 使用示例
check_status_codes()
3.3.4 会话(Session)
为什么使用Session?
"""
Session的优势:
1. 保持连接(Connection: keep-alive)
2. 自动处理Cookies
3. 保持某些参数(headers、auth等)
4. 提高性能(避免重复建立连接)
5. 实现登录状态保持
"""
import requests
def use_session():
"""使用Session进行多次请求"""
# 创建Session对象
session = requests.Session()
# 设置会话级别的请求头
session.headers.update({
"User-Agent": "MyApp/1.0",
"Accept": "application/json"
})
# 第一次请求
response1 = session.get("https://httpbin.org/get")
print(f"First Request: {response1.json()['headers']['User-Agent']}")
# 第二次请求(会保持相同的headers和cookies)
response2 = session.get("https://httpbin.org/get")
print(f"Second Request: {response2.json()['headers']['User-Agent']}")
# 查看Session的cookies
print(f"Session Cookies: {session.cookies}")
# 关闭Session
session.close()
def login_with_session():
"""使用Session模拟登录"""
session = requests.Session()
    # 1. 访问会设置Cookie的页面(此处用httpbin模拟;真实站点通常还需先获取CSRF token)
    login_page = session.get("https://httpbin.org/cookies/set/session/abc123")
print(f"Login Page Cookies: {session.cookies}")
# 2. 提交登录表单
login_data = {
"username": "alice",
"password": "password123"
}
login_response = session.post(
"https://httpbin.org/post",
data=login_data
)
print(f"Login Response: {login_response.json()}")
# 3. 访问需要登录的页面
protected_response = session.get("https://httpbin.org/cookies")
print(f"Protected Page Cookies: {protected_response.json()}")
session.close()
# 使用示例
use_session()
# login_with_session()
3.3.5 Cookie处理
Cookie的基本使用:
import requests
def handle_cookies():
"""处理Cookie"""
# 1. 获取响应中的Cookie
response = requests.get("https://httpbin.org/cookies/set/session/abc123")
print(f"Response Cookies: {response.cookies}")
print(f"Session Cookie: {response.cookies.get('session')}")
# 2. 发送请求时携带Cookie
# 方法1:使用cookies参数
cookies = {
"session": "abc123",
"user_id": "456"
}
response = requests.get("https://httpbin.org/cookies", cookies=cookies)
print(f"Request Cookies: {response.json()}")
    # 方法2:使用RequestsCookieJar
    jar = requests.cookies.RequestsCookieJar()
jar.set('session', 'abc123', domain='httpbin.org', path='/')
jar.set('user_id', '456', domain='httpbin.org', path='/')
response = requests.get("https://httpbin.org/cookies", cookies=jar)
print(f"CookieJar Cookies: {response.json()}")
# 3. 使用Session自动管理Cookie
session = requests.Session()
# 第一次请求,设置Cookie
session.get("https://httpbin.org/cookies/set/session/xyz789")
# 第二次请求,自动携带Cookie
response = session.get("https://httpbin.org/cookies")
print(f"Session Cookies: {response.json()}")
# 4. 保存和加载Cookie
# 保存Cookie到文件
import pickle
with open('cookies.pkl', 'wb') as f:
pickle.dump(session.cookies, f)
# 从文件加载Cookie
with open('cookies.pkl', 'rb') as f:
loaded_cookies = pickle.load(f)
session = requests.Session()
session.cookies.update(loaded_cookies)
response = session.get("https://httpbin.org/cookies")
print(f"Loaded Cookies: {response.json()}")
# 清理文件
import os
os.remove('cookies.pkl')
# 使用示例
handle_cookies()
3.3.6 超时和重试
设置超时:
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
def set_timeout():
"""设置请求超时"""
url = "https://httpbin.org/delay/5"
# 1. 连接超时和读取超时
try:
# connect_timeout=3, read_timeout=5
response = requests.get(url, timeout=(3, 5))
print(f"Request successful: {response.status_code}")
except requests.exceptions.Timeout:
print("Request timed out!")
# 2. 统一超时时间
try:
response = requests.get(url, timeout=3) # 连接和读取都是3秒
print(f"Request successful: {response.status_code}")
except requests.exceptions.Timeout:
print("Request timed out!")
# 3. 不设置超时(不推荐,可能永久等待)
# response = requests.get(url)
def retry_request():
"""实现请求重试"""
url = "https://httpbin.org/status/500"
# 创建Session
session = requests.Session()
# 配置重试策略
retry_strategy = Retry(
total=3, # 总重试次数
backoff_factor=1, # 重试间隔因子
status_forcelist=[500, 502, 503, 504], # 需要重试的状态码
allowed_methods=["HEAD", "GET", "POST", "PUT", "DELETE", "OPTIONS", "TRACE"]
)
# 创建适配器
adapter = HTTPAdapter(
max_retries=retry_strategy,
pool_connections=10,
pool_maxsize=10
)
# 挂载适配器
session.mount("http://", adapter)
session.mount("https://", adapter)
# 发送请求
try:
response = session.get(url, timeout=10)
print(f"Request successful: {response.status_code}")
except requests.exceptions.RequestException as e:
print(f"Request failed after retries: {e}")
finally:
session.close()
def custom_retry():
"""自定义重试逻辑"""
import time
url = "https://httpbin.org/get"
max_retries = 3
retry_delay = 1
for attempt in range(max_retries):
try:
response = requests.get(url, timeout=5)
response.raise_for_status()
print(f"Request successful on attempt {attempt + 1}")
return response
except requests.exceptions.RequestException as e:
print(f"Attempt {attempt + 1} failed: {e}")
if attempt < max_retries - 1:
time.sleep(retry_delay)
retry_delay *= 2 # 指数退避
print("All attempts failed")
return None
# 使用示例
set_timeout()
retry_request()
custom_retry()
3.4 代理和认证
3.4.1 使用代理
基本代理配置:
import requests
def use_proxy():
"""使用代理发送请求"""
# 1. HTTP代理
http_proxy = {
"http": "http://proxy.example.com:8080",
"https": "https://proxy.example.com:8080"
}
# 2. 带认证的代理
proxy_with_auth = {
"http": "http://username:password@proxy.example.com:8080",
"https": "https://username:password@proxy.example.com:8080"
}
# 3. SOCKS代理(需要安装requests[socks])
# pip install requests[socks]
socks_proxy = {
"http": "socks5://proxy.example.com:1080",
"https": "socks5://proxy.example.com:1080"
}
# 使用代理发送请求
url = "https://httpbin.org/ip"
try:
# response = requests.get(url, proxies=http_proxy)
# print(f"Response: {response.json()}")
pass
except requests.exceptions.ProxyError as e:
print(f"Proxy error: {e}")
except requests.exceptions.RequestException as e:
print(f"Request error: {e}")
def proxy_with_session():
"""在Session中使用代理"""
session = requests.Session()
# 设置代理
session.proxies = {
"http": "http://proxy.example.com:8080",
"https": "https://proxy.example.com:8080"
}
# 所有请求都会使用代理
try:
# response = session.get("https://httpbin.org/ip")
# print(f"Response: {response.json()}")
pass
except requests.exceptions.RequestException as e:
print(f"Request error: {e}")
finally:
session.close()
def proxy_pool():
"""代理池示例"""
proxy_list = [
{"http": "http://proxy1.example.com:8080"},
{"http": "http://proxy2.example.com:8080"},
{"http": "http://proxy3.example.com:8080"}
]
url = "https://httpbin.org/ip"
for proxy in proxy_list:
try:
response = requests.get(url, proxies=proxy, timeout=5)
print(f"Proxy {proxy} works: {response.json()}")
break
except requests.exceptions.RequestException as e:
print(f"Proxy {proxy} failed: {e}")
continue
# 使用示例
# use_proxy()
# proxy_with_session()
# proxy_pool()
3.4.2 认证方式
基本认证:
import requests
from requests.auth import HTTPBasicAuth
def basic_auth():
"""HTTP基本认证"""
url = "https://httpbin.org/basic-auth/user/passwd"
# 方法1:使用auth参数
response = requests.get(url, auth=('user', 'passwd'))
print(f"Basic Auth Response: {response.json()}")
# 方法2:使用HTTPBasicAuth对象
auth = HTTPBasicAuth('user', 'passwd')
response = requests.get(url, auth=auth)
print(f"Basic Auth Response: {response.json()}")
# 方法3:手动设置Authorization头
import base64
credentials = base64.b64encode(b'user:passwd').decode('utf-8')
headers = {
'Authorization': f'Basic {credentials}'
}
response = requests.get(url, headers=headers)
print(f"Basic Auth Response: {response.json()}")
def digest_auth():
"""摘要认证"""
from requests.auth import HTTPDigestAuth
url = "https://httpbin.org/digest-auth/auth/user/passwd"
auth = HTTPDigestAuth('user', 'passwd')
response = requests.get(url, auth=auth)
print(f"Digest Auth Response: {response.json()}")
def bearer_token_auth():
"""Bearer Token认证"""
url = "https://httpbin.org/bearer"
# 设置Authorization头
headers = {
'Authorization': 'Bearer your_token_here'
}
response = requests.get(url, headers=headers)
print(f"Bearer Token Response: {response.json()}")
def api_key_auth():
"""API Key认证"""
url = "https://httpbin.org/headers"
# 方法1:在请求头中
headers = {
'X-API-Key': 'your_api_key_here'
}
response = requests.get(url, headers=headers)
print(f"API Key in Headers: {response.json()}")
# 方法2:在查询参数中
params = {
'api_key': 'your_api_key_here'
}
response = requests.get(url, params=params)
print(f"API Key in Params: {response.json()}")
def oauth2_auth():
"""OAuth2认证(需要安装requests-oauthlib)"""
# pip install requests-oauthlib
try:
from requests_oauthlib import OAuth2Session
# OAuth2授权流程
client_id = "your_client_id"
client_secret = "your_client_secret"
redirect_uri = "http://localhost:8000/callback"
# 创建OAuth2会话
oauth = OAuth2Session(client_id, redirect_uri=redirect_uri)
# 获取授权URL
authorization_url, state = oauth.authorization_url(
"https://provider.com/oauth/authorize"
)
print(f"Visit this URL to authorize: {authorization_url}")
# 用户授权后,获取回调码
# redirect_response = input('Paste the full redirect URL here: ')
# token = oauth.fetch_token(
# "https://provider.com/oauth/token",
# client_secret=client_secret,
# authorization_response=redirect_response
# )
# 使用access token访问API
# response = oauth.get("https://api.provider.com/user")
# print(response.json())
except ImportError:
print("requests-oauthlib not installed")
# 使用示例
basic_auth()
digest_auth()
bearer_token_auth()
api_key_auth()
# oauth2_auth()
3.4.3 自定义认证
import requests
from requests.auth import AuthBase
class CustomAuth(AuthBase):
"""自定义认证类"""
def __init__(self, token):
self.token = token
def __call__(self, r):
"""在发送请求前调用"""
# 添加自定义认证头
r.headers['X-Custom-Auth'] = self.token
return r
def use_custom_auth():
"""使用自定义认证"""
url = "https://httpbin.org/headers"
# 创建自定义认证对象
auth = CustomAuth("my_custom_token")
# 发送请求
response = requests.get(url, auth=auth)
print(f"Custom Auth Response: {response.json()}")
class TimestampAuth(AuthBase):
"""带时间戳的自定义认证"""
def __init__(self, api_key):
self.api_key = api_key
def __call__(self, r):
import time
import hashlib
# 生成时间戳
timestamp = str(int(time.time()))
# 生成签名
message = f"{timestamp}{r.method}{r.path_url}"
signature = hashlib.sha256(
f"{message}{self.api_key}".encode()
).hexdigest()
# 添加认证头
r.headers['X-Timestamp'] = timestamp
r.headers['X-Signature'] = signature
return r
def use_timestamp_auth():
"""使用带时间戳的认证"""
url = "https://httpbin.org/headers"
auth = TimestampAuth("my_api_key")
response = requests.get(url, auth=auth)
print(f"Timestamp Auth Response: {response.json()}")
# 使用示例
use_custom_auth()
use_timestamp_auth()
3.5 网络超时处理
3.5.1 超时类型
import requests
def timeout_types():
"""演示不同类型的超时"""
url = "https://httpbin.org/delay/10"
# 1. 连接超时
"""
连接超时:建立TCP连接的最大等待时间
- 如果在指定时间内无法建立连接,抛出ConnectTimeout异常
- 适用于:网络慢、服务器响应慢的情况
"""
    try:
        # 元组形式为 (连接超时, 读取超时);读取超时为None表示只限制连接阶段
        response = requests.get(url, timeout=(3, None))
    except requests.exceptions.ConnectTimeout:
        print("Connection timeout!")
# 2. 读取超时
"""
读取超时:等待服务器发送数据的最大时间
- 如果在指定时间内服务器没有发送数据,抛出ReadTimeout异常
- 适用于:服务器处理慢、数据传输慢的情况
"""
try:
response = requests.get(url, timeout=(10, 3)) # 连接10秒,读取3秒
except requests.exceptions.ReadTimeout:
print("Read timeout!")
# 3. 总超时
"""
总超时:连接和读取的总时间
- 连接和读取共享这个时间
- 适用于:简单的超时控制
"""
try:
response = requests.get(url, timeout=5) # 总共5秒
except requests.exceptions.Timeout:
print("Total timeout!")
# 使用示例
timeout_types()
3.5.2 超时处理策略
import requests
import time
def fail_fast(url):
    """策略1:快速失败——适用于需要快速响应的场景"""
    try:
        return requests.get(url, timeout=3)
    except requests.exceptions.Timeout:
        print("Request timed out, giving up")
        return None

def retry_with_backoff(url, max_retries=3, base_delay=1):
    """策略2:指数退避重试——适用于网络不稳定、需要重试的场景"""
    for attempt in range(max_retries):
        try:
            return requests.get(url, timeout=5)
        except requests.exceptions.Timeout:
            if attempt < max_retries - 1:
                delay = base_delay * (2 ** attempt)
                print(f"Timeout, retrying in {delay} seconds...")
                time.sleep(delay)
    print("All retries failed")
    return None

def progressive_timeout(url, timeouts=(3, 5, 10)):
    """策略3:渐进式超时——适用于不确定网络状况的场景"""
    for timeout in timeouts:
        try:
            return requests.get(url, timeout=timeout)
        except requests.exceptions.Timeout:
            print(f"Timeout with {timeout}s, trying longer timeout...")
    print("All timeouts failed")
    return None

def timeout_with_fallback():
    """超时后的降级处理"""
    url = "https://httpbin.org/delay/10"
    try:
        response = requests.get(url, timeout=5)
        return response.json()
    except requests.exceptions.Timeout:
        print("Primary source timed out, using fallback")
        # 使用备用数据源或缓存
        return {"status": "fallback", "data": "cached_data"}

# 使用示例
# fail_fast("https://httpbin.org/delay/10")
# timeout_with_fallback()
3.5.3 超时监控和日志
import requests
import time
import logging
logging.basicConfig(level=logging.INFO)
def timeout_monitor():
"""监控请求超时"""
url = "https://httpbin.org/delay/5"
start_time = time.time()
try:
response = requests.get(url, timeout=3)
elapsed = time.time() - start_time
logging.info(f"Request completed in {elapsed:.2f}s")
return response
except requests.exceptions.Timeout:
elapsed = time.time() - start_time
logging.warning(f"Request timed out after {elapsed:.2f}s")
return None
def timeout_statistics():
"""统计超时情况"""
urls = [
"https://httpbin.org/delay/1",
"https://httpbin.org/delay/2",
"https://httpbin.org/delay/10",
"https://httpbin.org/delay/1",
"https://httpbin.org/delay/10"
]
success_count = 0
timeout_count = 0
total_time = 0
for url in urls:
start_time = time.time()
try:
response = requests.get(url, timeout=5)
elapsed = time.time() - start_time
success_count += 1
total_time += elapsed
print(f"✓ Success: {url} ({elapsed:.2f}s)")
except requests.exceptions.Timeout:
elapsed = time.time() - start_time
timeout_count += 1
total_time += elapsed
print(f"✗ Timeout: {url} ({elapsed:.2f}s)")
print(f"\nStatistics:")
print(f" Total requests: {len(urls)}")
print(f" Success: {success_count}")
print(f" Timeout: {timeout_count}")
print(f" Success rate: {success_count/len(urls)*100:.1f}%")
print(f" Average time: {total_time/len(urls):.2f}s")
# 使用示例
# timeout_monitor()
# timeout_statistics()
3.6 综合示例
3.6.1 实现一个简单的HTTP客户端
import requests
import time
import logging
from typing import Optional, Dict, Any
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
logging.basicConfig(level=logging.INFO)
class HTTPClient:
"""简单的HTTP客户端"""
def __init__(
self,
base_url: str = None,
timeout: int = 10,
max_retries: int = 3,
headers: Dict[str, str] = None
):
"""
初始化HTTP客户端
Args:
base_url: 基础URL
timeout: 超时时间(秒)
max_retries: 最大重试次数
headers: 默认请求头
"""
self.base_url = base_url.rstrip('/') if base_url else None
self.timeout = timeout
self.max_retries = max_retries
self.default_headers = headers or {}
# 创建Session
self.session = self._create_session()
def _create_session(self) -> requests.Session:
"""创建带有重试策略的Session"""
session = requests.Session()
# 配置重试策略
retry_strategy = Retry(
total=self.max_retries,
backoff_factor=1,
status_forcelist=[429, 500, 502, 503, 504],
allowed_methods=["HEAD", "GET", "POST", "PUT", "DELETE", "OPTIONS"]
)
# 创建适配器
adapter = HTTPAdapter(
max_retries=retry_strategy,
pool_connections=10,
pool_maxsize=10
)
# 挂载适配器
session.mount("http://", adapter)
session.mount("https://", adapter)
# 设置默认请求头
session.headers.update(self.default_headers)
return session
def _build_url(self, path: str) -> str:
"""构建完整的URL"""
if self.base_url:
return f"{self.base_url}/{path.lstrip('/')}"
return path
def get(
self,
path: str,
params: Dict[str, Any] = None,
headers: Dict[str, str] = None
) -> Optional[requests.Response]:
"""发送GET请求"""
url = self._build_url(path)
merged_headers = {**self.default_headers, **(headers or {})}
try:
response = self.session.get(
url,
params=params,
headers=merged_headers,
timeout=self.timeout
)
response.raise_for_status()
return response
except requests.exceptions.RequestException as e:
logging.error(f"GET {url} failed: {e}")
return None
def post(
self,
path: str,
data: Dict[str, Any] = None,
json: Dict[str, Any] = None,
headers: Dict[str, str] = None
) -> Optional[requests.Response]:
"""发送POST请求"""
url = self._build_url(path)
merged_headers = {**self.default_headers, **(headers or {})}
try:
response = self.session.post(
url,
data=data,
json=json,
headers=merged_headers,
timeout=self.timeout
)
response.raise_for_status()
return response
except requests.exceptions.RequestException as e:
logging.error(f"POST {url} failed: {e}")
return None
def put(
self,
path: str,
data: Dict[str, Any] = None,
json: Dict[str, Any] = None,
headers: Dict[str, str] = None
) -> Optional[requests.Response]:
"""发送PUT请求"""
url = self._build_url(path)
merged_headers = {**self.default_headers, **(headers or {})}
try:
response = self.session.put(
url,
data=data,
json=json,
headers=merged_headers,
timeout=self.timeout
)
response.raise_for_status()
return response
except requests.exceptions.RequestException as e:
logging.error(f"PUT {url} failed: {e}")
return None
def delete(
self,
path: str,
headers: Dict[str, str] = None
) -> Optional[requests.Response]:
"""发送DELETE请求"""
url = self._build_url(path)
merged_headers = {**self.default_headers, **(headers or {})}
try:
response = self.session.delete(
url,
headers=merged_headers,
timeout=self.timeout
)
response.raise_for_status()
return response
except requests.exceptions.RequestException as e:
logging.error(f"DELETE {url} failed: {e}")
return None
def close(self):
"""关闭Session"""
self.session.close()
# 使用示例
def http_client_example():
"""HTTP客户端使用示例"""
# 创建客户端
client = HTTPClient(
base_url="https://httpbin.org",
timeout=5,
max_retries=3,
headers={
"User-Agent": "MyHTTPClient/1.0",
"Accept": "application/json"
}
)
try:
# GET请求
print("=== GET Request ===")
response = client.get("/get", params={"name": "Alice"})
if response:
print(f"Status: {response.status_code}")
print(f"Data: {response.json()}")
# POST请求
print("\n=== POST Request ===")
response = client.post("/post", json={"name": "Bob", "age": 30})
if response:
print(f"Status: {response.status_code}")
print(f"Data: {response.json()}")
# PUT请求
print("\n=== PUT Request ===")
response = client.put("/put", data={"name": "Charlie"})
if response:
print(f"Status: {response.status_code}")
print(f"Data: {response.json()}")
# DELETE请求
print("\n=== DELETE Request ===")
response = client.delete("/delete")
if response:
print(f"Status: {response.status_code}")
print(f"Data: {response.json()}")
finally:
client.close()
# 运行示例
if __name__ == "__main__":
http_client_example()
3.6.2 子域名查询工具
import requests
import dns.resolver
import dns.exception
from typing import List, Set
import logging
logging.basicConfig(level=logging.INFO)
class SubdomainScanner:
"""子域名扫描器"""
def __init__(self, domain: str):
"""
初始化子域名扫描器
Args:
domain: 目标域名
"""
self.domain = domain
self.subdomains = set()
self.session = requests.Session()
self.session.headers.update({
'User-Agent': 'SubdomainScanner/1.0'
})
def check_subdomain(self, subdomain: str) -> bool:
"""
检查子域名是否存在
Args:
subdomain: 子域名
Returns:
bool: 子域名是否存在
"""
full_domain = f"{subdomain}.{self.domain}"
        try:
            # 方法1:DNS查询
            answers = dns.resolver.resolve(full_domain, 'A')
            if answers:
                logging.info(f"✓ {full_domain} - {answers[0]}")
                return True
        except dns.exception.DNSException:
            # 解析失败,继续尝试HTTP探测
            pass
        try:
            # 方法2:HTTP请求
            url = f"http://{full_domain}"
            response = self.session.head(url, timeout=5)
            # 只要服务器有响应(无论状态码),即认为子域名存在
            logging.info(f"✓ {full_domain} - HTTP {response.status_code}")
            return True
        except requests.exceptions.RequestException:
            pass
        return False
def scan(self, subdomains: List[str]) -> Set[str]:
"""
扫描子域名
Args:
subdomains: 子域名列表
Returns:
Set[str]: 存在的子域名集合
"""
logging.info(f"Starting scan for {self.domain}")
logging.info(f"Total subdomains to check: {len(subdomains)}")
for subdomain in subdomains:
if self.check_subdomain(subdomain):
self.subdomains.add(subdomain)
logging.info(f"Found {len(self.subdomains)} valid subdomains")
return self.subdomains
def close(self):
"""关闭Session"""
self.session.close()
# 使用示例
def subdomain_scanner_example():
"""子域名扫描器使用示例"""
# 常见子域名列表
common_subdomains = [
"www", "mail", "ftp", "admin", "blog",
"api", "dev", "test", "staging", "prod",
"m", "mobile", "app", "portal", "secure"
]
# 创建扫描器
scanner = SubdomainScanner("example.com")
try:
# 扫描子域名
found_subdomains = scanner.scan(common_subdomains)
print("\n=== Results ===")
if found_subdomains:
for subdomain in found_subdomains:
print(f" {subdomain}.{scanner.domain}")
else:
print(" No subdomains found")
finally:
scanner.close()
# 运行示例(会使用真实域名,需要网络连接)
# if __name__ == "__main__":
# subdomain_scanner_example()
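上面的扫描器按顺序逐个检测,字典较大时会很慢。下面给出一个基于 concurrent.futures 线程池的并发扫描草图(假设示例,非 OneForAll 的实现;检测函数 check 由调用方传入,便于替换为 DNS 或 HTTP 探测):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, List, Set

def concurrent_scan(domain: str, prefixes: List[str],
                    check: Callable[[str], bool],
                    max_workers: int = 10) -> Set[str]:
    """并发检测 prefix.domain 是否存在,返回存活的子域名前缀集合"""
    found: Set[str] = set()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # 提交所有检测任务,并把 future 映射回对应的前缀
        futures = {pool.submit(check, f"{p}.{domain}"): p for p in prefixes}
        for fut in as_completed(futures):
            if fut.result():
                found.add(futures[fut])
    return found
```

实际使用时可把 SubdomainScanner.check_subdomain 的检测逻辑作为 check 传入;并发编程的细节将在下一课系统讲解。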
3.7 实践任务
任务1:使用requests发送各种HTTP请求
目标: 熟练使用requests库发送不同类型的HTTP请求。
要求:
- 发送GET请求,带参数和请求头
- 发送POST请求,发送表单数据和JSON数据
- 发送PUT、DELETE、HEAD、OPTIONS请求
- 处理响应状态码和响应内容
- 实现请求重试机制
代码框架:
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
def practice_http_requests():
# 在这里实现你的代码
pass
if __name__ == "__main__":
practice_http_requests()
任务2:实现DNS查询功能
目标: 使用dnspython实现完整的DNS查询功能。
要求:
- 查询A、AAAA、MX、NS、TXT记录
- 实现DNS缓存
- 处理查询异常
- 显示查询详细信息
- 批量查询多个域名
代码框架:
import dns.resolver
from typing import Dict, List
import time
def practice_dns_query():
# 在这里实现你的代码
pass
if __name__ == "__main__":
practice_dns_query()
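任务2要求的 DNS 缓存,本质是"域名 → (写入时间, 结果)"的映射,超过 TTL 即失效。下面是一个不依赖具体 DNS 库的最小草图(假设示例,解析函数由调用方传入):

```python
import time
from typing import Any, Callable, Dict, Tuple

class DNSCache:
    """带 TTL 的简单 DNS 缓存草图"""
    def __init__(self, ttl: float = 300):
        self.ttl = ttl
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, domain: str, resolve: Callable[[str], Any]) -> Any:
        """缓存命中且未过期则直接返回,否则调用 resolve 并写入缓存"""
        now = time.time()
        entry = self._store.get(domain)
        if entry and now - entry[0] < self.ttl:
            return entry[1]  # 缓存命中
        result = resolve(domain)
        self._store[domain] = (now, result)
        return result
```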
任务3:配置代理和认证
目标: 实现代理配置和各种认证方式。
要求:
- 配置HTTP/HTTPS/SOCKS代理
- 实现基本认证和摘要认证
- 实现Bearer Token认证
- 实现自定义认证
- 实现代理池
代码框架:
import requests
from requests.auth import HTTPBasicAuth, HTTPDigestAuth
def practice_proxy_auth():
# 在这里实现你的代码
pass
if __name__ == "__main__":
practice_proxy_auth()
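任务3中提到的代理池,核心是按某种策略轮换代理。下面是一个最小的轮询代理池草图(假设示例,代理地址为占位符):

```python
import itertools
from typing import Dict, List

class ProxyPool:
    """简单的轮询代理池:每次调用 next() 返回下一个代理配置"""
    def __init__(self, proxy_urls: List[str]):
        if not proxy_urls:
            raise ValueError("代理列表不能为空")
        self._cycle = itertools.cycle(proxy_urls)

    def next(self) -> Dict[str, str]:
        # 返回 requests 的 proxies 参数所需的格式
        url = next(self._cycle)
        return {"http": url, "https": url}
```

使用时形如 `requests.get(url, proxies=pool.next())`;更完整的代理池还应加入可用性检测和失败剔除。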
任务4:实现超时处理和重试
目标: 实现完善的超时处理和重试机制。
要求:
- 实现不同类型的超时设置
- 实现指数退避重试
- 实现超时监控和日志
- 实现超时统计
- 实现降级处理
代码框架:
import requests
import time
import logging
def practice_timeout_retry():
# 在这里实现你的代码
pass
if __name__ == "__main__":
practice_timeout_retry()
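任务4要求的指数退避重试,可以先参考下面这个通用草图(假设示例):每次失败后的等待时间按倍数增长,避免在服务不稳定时密集重试。

```python
import time
from typing import Any, Callable

def retry_with_backoff(func: Callable[[], Any],
                       max_retries: int = 3,
                       base_delay: float = 1.0,
                       factor: float = 2.0) -> Any:
    """以指数退避重试 func;重试次数用尽后抛出最后一次的异常"""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except Exception:
            if attempt == max_retries:
                raise
            # 第 n 次失败后等待 base_delay * factor**n 秒
            time.sleep(base_delay * (factor ** attempt))
```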
任务5:综合实践
目标: 综合运用所学知识,实现一个完整的网络工具。
要求:
- 实现一个HTTP客户端类
- 支持多种认证方式
- 支持代理配置
- 支持超时和重试
- 实现日志记录
- 实现简单的子域名查询功能
代码框架:
import requests
import dns.resolver
from typing import Optional, Dict, Any
import logging
class NetworkTool:
"""综合网络工具"""
def __init__(self):
# 在这里实现你的代码
pass
    # 在这里实现你的方法(如 get/post/dns_query 等)
if __name__ == "__main__":
tool = NetworkTool()
# 测试你的工具
3.8 本课总结
本课重点内容回顾
1. HTTP协议详解
- HTTP请求和响应结构
- HTTP请求方法(GET、POST、PUT、DELETE等)
- HTTP状态码(2xx、3xx、4xx、5xx)
- HTTP请求头和响应头
- HTTPS和SSL/TLS加密
2. DNS协议原理
- DNS的作用和层次结构
- DNS查询过程(递归vs迭代)
- DNS记录类型(A、AAAA、CNAME、MX、NS、TXT等)
- DNS缓存机制
- 使用dnspython进行DNS查询
3. requests库使用
- 基本HTTP请求(GET、POST等)
- 响应处理(文本、JSON、二进制)
- Session和Cookie管理
- 超时和重试机制
4. 代理和认证
- 代理配置(HTTP、HTTPS、SOCKS)
- 认证方式(基本认证、摘要认证、Bearer Token)
- 自定义认证
5. 网络超时处理
- 超时类型(连接超时、读取超时)
- 超时处理策略(快速失败、重试、降级)
- 超时监控和统计
下节课预告
第4课:并发编程基础
- 进程与线程
- GIL锁机制
- threading模块使用
- 线程同步与锁
- asyncio基础
- 协程概念
课后思考
- HTTP和HTTPS有什么区别?为什么HTTPS更安全?
- DNS解析的递归查询和迭代查询有什么区别?
- 为什么要使用Session而不是直接发送请求?
- 如何设计一个高效的代理池?
- 在网络请求中,如何平衡性能和可靠性?
推荐阅读
- HTTP官方文档:https://developer.mozilla.org/zh-CN/docs/Web/HTTP
- DNS协议详解:https://www.cloudflare.com/learning/dns/what-is-dns/
- requests官方文档:https://docs.python-requests.org/
- dnspython文档:https://dnspython.readthedocs.io/
3.9 附录
附录A:HTTP状态码速查表
# 信息响应 (100-199)
100 Continue - 继续
101 Switching Protocols - 切换协议
102 Processing - 处理中
# 成功响应 (200-299)
200 OK - 成功
201 Created - 已创建
202 Accepted - 已接受
203 Non-Authoritative Information - 非授权信息
204 No Content - 无内容
205 Reset Content - 重置内容
206 Partial Content - 部分内容
# 重定向 (300-399)
300 Multiple Choices - 多种选择
301 Moved Permanently - 永久移动
302 Found - 临时移动
303 See Other - 查看其他
304 Not Modified - 未修改
305 Use Proxy - 使用代理
307 Temporary Redirect - 临时重定向
308 Permanent Redirect - 永久重定向
# 客户端错误 (400-499)
400 Bad Request - 错误请求
401 Unauthorized - 未授权
402 Payment Required - 需要付款
403 Forbidden - 禁止访问
404 Not Found - 未找到
405 Method Not Allowed - 方法不允许
406 Not Acceptable - 不可接受
407 Proxy Authentication Required - 需要代理认证
408 Request Timeout - 请求超时
409 Conflict - 冲突
410 Gone - 已删除
411 Length Required - 需要长度
412 Precondition Failed - 前置条件失败
413 Payload Too Large - 请求实体过大
414 URI Too Long - URI太长
415 Unsupported Media Type - 不支持的媒体类型
416 Range Not Satisfiable - 范围不满足
417 Expectation Failed - 期望失败
418 I'm a teapot - 我是个茶壶
426 Upgrade Required - 需要升级
429 Too Many Requests - 请求过多
# 服务器错误 (500-599)
500 Internal Server Error - 内部服务器错误
501 Not Implemented - 未实现
502 Bad Gateway - 网关错误
503 Service Unavailable - 服务不可用
504 Gateway Timeout - 网关超时
505 HTTP Version Not Supported - HTTP版本不支持
511 Network Authentication Required - 需要网络认证
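Python 标准库的 http.HTTPStatus 枚举内置了上表中的状态码与英文短语,可以直接查询,无需自己维护映射表:

```python
from http import HTTPStatus

# 按数值查找状态码对应的短语
status = HTTPStatus(404)
print(status.phrase)  # Not Found

# 枚举成员可直接与整数比较
assert HTTPStatus.OK == 200
assert HTTPStatus.TOO_MANY_REQUESTS == 429
```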
附录B:DNS记录类型速查表
# 常见DNS记录类型
A - IPv4地址记录
格式: domain → IPv4地址
示例: www.example.com → 192.0.2.1
AAAA - IPv6地址记录
格式: domain → IPv6地址
示例: www.example.com → 2001:db8::1
CNAME - 别名记录
格式: alias → target
示例: blog.example.com → www.example.com
MX - 邮件交换记录
格式: domain → priority:mail_server
示例: example.com → 10:mail.example.com
NS - 域名服务器记录
格式: domain → name_server
示例: example.com → ns1.example.com
TXT - 文本记录
格式: domain → "text"
示例: example.com → "v=spf1 include:_spf.google.com ~all"
SOA - 起始授权记录
格式: domain → mname, rname, serial, refresh, retry, expire, minimum
示例: example.com → (ns1.example.com. admin.example.com. 2026041701 3600 1800 604800 86400)
PTR - 反向DNS记录
格式: IP → domain
示例: 192.0.2.1 → www.example.com
SRV - 服务记录
格式: _service._proto.domain → priority, weight, port, target
示例: _sip._tcp.example.com → 10 60 5060 sipserver.example.com
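附录中 PTR 记录的查询名使用反向域名格式(in-addr.arpa / ip6.arpa),标准库 ipaddress 可以直接生成这种查询名:

```python
import ipaddress

# IPv4 的反向查询名:字节反序后加 in-addr.arpa 后缀
addr = ipaddress.ip_address("192.0.2.1")
print(addr.reverse_pointer)  # 1.2.0.192.in-addr.arpa
```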
附录C:requests常用配置
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
# 创建带有完整配置的Session
def create_configured_session():
"""创建配置完善的Session"""
session = requests.Session()
# 1. 配置重试策略
retry_strategy = Retry(
total=3, # 总重试次数
backoff_factor=1, # 重试间隔因子
status_forcelist=[429, 500, 502, 503, 504], # 需要重试的状态码
        # 注意:重试 POST 可能导致重复提交,按业务需要取舍
        allowed_methods=["HEAD", "GET", "POST", "PUT", "DELETE", "OPTIONS", "TRACE"]
)
# 2. 配置适配器
adapter = HTTPAdapter(
max_retries=retry_strategy,
pool_connections=10, # 连接池大小
pool_maxsize=10, # 每个主机的最大连接数
pool_block=False # 连接池是否阻塞
)
# 3. 挂载适配器
session.mount("http://", adapter)
session.mount("https://", adapter)
# 4. 配置默认请求头
session.headers.update({
'User-Agent': 'MyApp/1.0',
'Accept': 'application/json',
'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8',
'Connection': 'keep-alive'
})
# 5. 配置代理(如果需要)
# session.proxies = {
# 'http': 'http://proxy.example.com:8080',
# 'https': 'https://proxy.example.com:8080'
# }
# 6. 配置认证(如果需要)
# session.auth = ('username', 'password')
# 7. 配置SSL验证(生产环境不建议禁用)
# session.verify = False # 禁用SSL验证
    # 8. 注意:Session 没有全局超时属性,设置 session.timeout 不会生效
    #    超时应在每次请求时传入,例如 session.get(url, timeout=30)
return session
# 使用示例
session = create_configured_session()
try:
response = session.get("https://httpbin.org/get", timeout=10)
print(response.json())
finally:
session.close()
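如上所述,requests 的 Session 并没有全局超时属性。若希望所有请求都有默认超时,社区惯用的做法是自定义 HTTPAdapter(下面是一个草图,非 requests 内置功能):

```python
from requests.adapters import HTTPAdapter

class TimeoutHTTPAdapter(HTTPAdapter):
    """为未显式指定超时的请求补上默认超时"""
    def __init__(self, *args, timeout: float = 10, **kwargs):
        self._default_timeout = timeout
        super().__init__(*args, **kwargs)

    def send(self, request, **kwargs):
        # 调用方未传 timeout 时使用默认值
        if kwargs.get("timeout") is None:
            kwargs["timeout"] = self._default_timeout
        return super().send(request, **kwargs)
```

使用时把它挂载到 Session 上即可:`session.mount("https://", TimeoutHTTPAdapter(timeout=10))`。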
附录D:学习检查清单
恭喜你完成了第3课的学习!🎉
现在你已经掌握了网络编程的基础知识,包括HTTP协议、DNS协议、requests库的使用、代理和认证、超时处理等。这些都是开发OneForAll网络请求功能所必需的技能。
在下一课中,我们将学习并发编程基础,了解多线程、异步编程等,为后续学习OneForAll的高性能并发处理打下基础。
记住: 网络编程是实践性很强的技能,一定要动手完成所有的实践任务!
继续加油,我们下节课见!💪
