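02 The requests Library

requests is the most popular third-party HTTP client library in the Python ecosystem. It was created by Kenneth Reitz as a friendlier alternative to the standard library's urllib.

It is known for its **concise API, human-friendly design, and rich feature set**, and has become close to the default choice for HTTP work in Python, including web scraping and API calls.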

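## Overview

requests is built on top of urllib3. It wraps the low-level HTTP handling in a more intuitive interface and, compared with urllib, removes a lot of boilerplate (manually encoding parameters, handling cookies, and so on), so developers can focus on business logic.

### Key advantages

- **Minimal API**: requests.get(), requests.post(), and similar functions send a request in one call, with no request or opener objects to build first;
- **Details handled automatically**: URL encoding and response decoding are automatic, and a Session adds cookie persistence and HTTP connection pooling;
- **Full-featured**: supports all HTTP methods, sessions, file uploads, proxies, authentication, and more;
- **Friendly error handling**: a clear set of exception types makes network errors easy to catch and handle;
- **Active community**: thorough documentation ([official docs](https://requests.readthedocs.io/)) and plenty of troubleshooting resources.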

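### Typical use cases

- Web scraping (fetching page content, handling login state);
- API calls (talking to RESTful APIs, GraphQL endpoints, and so on);
- Automated testing (simulating HTTP requests to verify endpoints);
- Any other scenario that needs to send HTTP/HTTPS requests.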

## Basic workflow

Installation: requests is a third-party library and is installed with pip.

```cmd
pip install requests
```

If the package name is mistyped (for example requestss instead of requests), pip reports an error like this:

```txt
ERROR: Could not find a version that satisfies the requirement requestss (from versions: none)
ERROR: No matching distribution found for requestss
```

If the download fails or is slow, the Douban mirror can be used instead. The *-i* option selects the mirror for that one command only.

```cmd
pip install requests -i https://pypi.doubanio.com/simple/
```

Output:
```txt
Installing collected packages: urllib3, idna, charset_normalizer, certifi, requests
Successfully installed certifi-2025.8.3 charset_normalizer-3.4.3 idna-3.10 requests-2.32.5 urllib3-2.5.0
```

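Sending a request with requests takes only three steps:

1. Call the function for the HTTP method (get(), post(), and so on) to send the request;
2. Receive the response object (Response);
3. Extract data from the response object (status code, body, headers, and so on).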

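## Core features

### Sending HTTP requests

requests provides a function for each HTTP method (GET, POST, PUT, DELETE, and so on), all with the same concise calling convention.

**1. GET requests (retrieving a resource)**

GET is the most common HTTP method. It retrieves a resource, such as a web page or API data, from the server.

```python
import requests

url = "https://maolin101.com"

# Send the request
response = requests.get(url)

# Inspect the response
print(response.status_code)
print(response.url)
print(response.headers['Server'])
```

Output:

```txt
200
https://maolin101.com/
nginx
```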

**2. GET requests with query parameters**

Query parameters (the key-value pairs after ? in the URL) can be passed through the params argument, so there is no need to build the URL by hand:

```python
import requests

url = "https://maolin101.com"

params = {
    's': "api"
}

response = requests.get(url, params=params)

print(response.url)
print(response.status_code)
```

Output:

```txt
https://maolin101.com/?s=api
200
```
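
To see how params is encoded without touching the network, one option is to build the request and inspect the prepared URL. The sketch below is an illustration rather than part of the original example: it relies on requests.Request and its prepare() method, and the example.com URL and parameter names are made up.

```python
import requests

# Hypothetical search endpoint and parameters, used only for illustration.
params = {
    "q": "python requests",   # the space is percent-encoded as "+"
    "tag": ["http", "api"],   # a list becomes a repeated key
    "page": None,             # None values are left out of the URL
}

# prepare() builds the final URL without sending anything over the network.
prepared = requests.Request(
    "GET", "https://example.com/search", params=params
).prepare()

print(prepared.url)
```

The same encoding is applied when the dictionary is passed to requests.get(url, params=params).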

**3. POST requests**

POST submits data to the server (form submissions, creating resources through an API). The data travels in the request body.

For example:

```python
import requests

# Target URL (stands in for a login endpoint)
url = "https://httpbin.org/post"  # httpbin echoes requests back, which is handy for testing

# Form data (a dict; requests encodes it as a form automatically)
data = {
    "username": "test_user",
    "password": "123456",
    "remember_me": "true"
}

# Send the POST request
response = requests.post(url, data=data)

# Inspect the response (httpbin returns the details of the request)
print(response.json())
```

Output:
```txt
{'args': {}, 'data': '', 'files': {}, 'form': {'password': '123456', 'remember_me': 'true', 'username': 'test_user'}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Content-Length': '51', 'Content-Type': 'application/x-www-form-urlencoded', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.32.5', 'X-Amzn-Trace-Id': 'Root=1-68b7f580-4867f7e6407b08181228ba16'}, 'json': None, 'origin': '113.201.191.61', 'url': 'https://httpbin.org/post'}
```

**Submitting JSON data (application/json)**

APIs commonly accept JSON. Passing a dict through the json parameter sends it as JSON, with no manual serialization:

```python
import requests

url = "https://httpbin.org/post"

# JSON data (as a dict)
json_data = {
    "name": "张三",
    "age": 25,
    "hobbies": ["coding", "reading"]
}

# Send a POST request with a JSON body
response = requests.post(url, json=json_data)  # sets Content-Type: application/json automatically

print("JSON received by the server:", response.json()["json"])
```

Output:
```txt
JSON received by the server: {'age': 25, 'hobbies': ['coding', 'reading'], 'name': '张三'}
```
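
The difference between data and json is easiest to see by preparing both requests and comparing what would be sent. This is a minimal offline sketch, assuming requests.Request(...).prepare() as the inspection tool and a made-up payload; nothing is actually posted to httpbin.

```python
import json
import requests

url = "https://httpbin.org/post"  # nothing is sent; the URL is only needed to prepare
payload = {"name": "test_user", "age": 25}  # made-up payload

# data=... is form-encoded; json=... is serialized to a JSON document.
form_request = requests.Request("POST", url, data=payload).prepare()
json_request = requests.Request("POST", url, json=payload).prepare()

print(form_request.headers["Content-Type"])
print(form_request.body)
print(json_request.headers["Content-Type"])
print(json.loads(json_request.body))
```

If both data and json are given, requests uses data and ignores json, so it is best to pass only one of them.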

**4. Other HTTP methods**

requests supports all standard HTTP methods, and they are used the same way as GET and POST:

```python
import requests

url = "https://httpbin.org"

# PUT: update a resource
response_put = requests.put(f"{url}/put", data={"key": "new_value"})

# DELETE: delete a resource
response_delete = requests.delete(f"{url}/delete")

# HEAD: fetch only the response headers (no body)
response_head = requests.head(f"{url}/get")

# OPTIONS: ask which HTTP methods the server supports
response_options = requests.options(f"{url}/get")
```

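**5. requests.request()**

requests.request() takes the HTTP method as a string (GET, POST, PUT, DELETE, and so on). It suits cases where the **method is chosen at run time**, and it also covers less common methods such as PATCH and OPTIONS.

Basic syntax:
```txt
requests.request(method, url, **kwargs)
```

| Parameter  | Purpose                                                      |
| ---------- | ------------------------------------------------------------ |
| method     | String; the HTTP method ('GET', 'POST', 'PUT', 'DELETE', and so on; case-insensitive) |
| url        | String; the target URL (required)                            |
| \*\*kwargs | Optional keyword arguments carrying the request details (query parameters, headers, cookies, timeout, and so on), the same as for get()/post() |

The most common optional arguments (identical to those of requests.get(), requests.post(), and the other helpers):

- params: dict / list; query parameters, appended to the URL automatically;
- data: dict / string / bytes; the request body, form-encoded (application/x-www-form-urlencoded) when a dict is given;
- json: a JSON-serializable object such as a dict; sent as the JSON body (sets Content-Type: application/json automatically);
- headers: dict; custom request headers (such as User-Agent);
- cookies: dict / RequestsCookieJar; cookies to send;
- auth: tuple / auth object; credentials, such as (username, password);
- proxies: dict; proxy configuration;
- timeout: number / tuple; timeout in seconds;
- verify: bool, or a path to a CA bundle; whether to verify the SSL certificate (default True);
- stream: bool; whether to stream the download (default False).

```python
import requests

# Send a POST request (form data) through request()
response = requests.request(
    method='POST',
    url='https://httpbin.org/post',
    data={'username': 'admin', 'password': '123'},  # form data
    timeout=5
)

print(response.json())
```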
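
Because the method is just a string, it can come from configuration or user input. The following is a minimal sketch under stated assumptions: build_request() is a hypothetical helper, and it uses requests.Request(...).prepare() instead of requests.request() so that it runs offline. It also shows that the method name is normalized to upper case.

```python
import requests

def build_request(method, url, **kwargs):
    """Hypothetical helper: prepare (but do not send) a request for any method."""
    return requests.Request(method=method, url=url, **kwargs).prepare()

# The method names could come from a config file or a command-line flag.
for method in ("get", "Post", "PATCH"):
    prepared = build_request(method, "https://httpbin.org/anything")
    print(prepared.method, prepared.url)
```

To actually send such a request, calling requests.request(method, url, **kwargs) with the same arguments is the simpler route.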

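### The response object

Every request returns a Response object. It holds everything the server sent back and is where all data is extracted from.

| Attribute / method | Description                                                  |
| ------------------ | ------------------------------------------------------------ |
| status_code        | HTTP status code (for example 200 OK, 404 Not Found, 500 Internal Server Error) |
| headers            | Response headers (dict-like, keys are case-insensitive, for example response.headers["Content-Type"]) |
| text               | Response body decoded to a string, using the encoding in encoding |
| content            | Response body as raw bytes, for non-text data such as images and binary files |
| json()             | Parses a JSON response body (returns a dict or list; raises JSONDecodeError if the body is not valid JSON) |
| url                | Final URL of the request (may differ from the original after redirects) |
| encoding           | Encoding used to decode the body (guessed by requests from the response headers; can be overridden, for example response.encoding = "gbk") |
| cookies            | Cookies set by the response (a RequestsCookieJar that can be accessed directly) |
| history            | Redirect history (a list of the Response objects for each redirect followed) |

```python
import requests

response = requests.get("https://www.baidu.com")

# Status code and headers
print("Status code:", response.status_code)
print("Server:", response.headers.get("Server"))  # header names are case-insensitive
print("Encoding (guessed):", response.encoding)

# Body as text and as bytes
print("First 100 characters of text:", response.text[:100])
print("First 10 bytes of content:", response.content[:10])  # for an image this would look like b'\x89PNG\r\n\x1a\n\x00\x00'

# Fixing the encoding when the guess is wrong
# For example, if the page is really gbk but was guessed as iso-8859-1, set it by hand:
# response.encoding = "gbk"
# correct_text = response.text
```

Output:
```txt
Status code: 200
Server: bfe/1.0.8.18
Encoding (guessed): ISO-8859-1
First 100 characters of text: <!DOCTYPE html>
<html> <head><meta http-equiv=content-type content=text/html;charse
First 10 bytes of content: b'<!DOCTYPE '
```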
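
The effect of overriding encoding can be reproduced offline. The sketch below is an assumption-laden illustration: it builds a Response by hand and fills its private _content attribute, which is a testing trick and not something production code should do. The GBK sample text is made up.

```python
import requests

# Build a Response by hand so the example runs without a network connection.
response = requests.Response()
response.status_code = 200
response._content = "中文网页".encode("gbk")  # pretend the server sent GBK bytes

response.encoding = "ISO-8859-1"   # a wrong guess produces mojibake
garbled = response.text

response.encoding = "gbk"          # correcting the encoding fixes .text
fixed = response.text

print(repr(garbled))
print(fixed)
```

With a live response, response.encoding = response.apparent_encoding is a common alternative. It relies on content-based detection, which can guess wrong on short bodies.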

## Customizing requests

**1. Setting request headers (headers)**

Sites often identify crawlers by their User-Agent, so sending a browser User-Agent gets past basic anti-scraping checks:

```python
import requests

url = "https://www.baidu.com"

# Custom headers (imitating Chrome)
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
    "Referer": "https://www.google.com",  # pretend the visit came from Google
    "Accept-Language": "zh-CN,zh;q=0.9"  # language preference
}

# Send the request with the custom headers
response = requests.get(url, headers=headers)
print("Status code:", response.status_code)  # 200 means the server accepted the request
```

Output:
```txt
Status code: 200
```
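
The reason a site can spot an unmodified requests client is visible in the headers it sends by default. A short sketch, assuming requests.utils.default_headers() as the way to inspect them (it is not used elsewhere in this article):

```python
import requests

# Headers that requests sends when none are supplied.
defaults = requests.utils.default_headers()

for name, value in defaults.items():
    print(f"{name}: {value}")

# The default User-Agent announces the library, e.g. python-requests/<version>.
is_library_ua = defaults["User-Agent"].startswith("python-requests/")
print(is_library_ua)
```

Any key passed through headers replaces the default with the same name, and lookups ignore case, so "user-agent" and "User-Agent" refer to the same header.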

**2. Setting cookies**

There are two ways to send cookies:

- Add a Cookie field to headers by hand;
- Pass a dict or a RequestsCookieJar through the cookies parameter.

```python
import requests

url = "https://httpbin.org/cookies"

# Option 1: send cookies through headers
headers = {
    "Cookie": "user_id=123; username=test"
}
response1 = requests.get(url, headers=headers)
print("Option 1 result:", response1.json())

# Option 2: pass a dict through the cookies parameter (preferred)
cookies = {
    "user_id": "456",
    "username": "demo"
}
response2 = requests.get(url, cookies=cookies)
print("Option 2 result:", response2.json())
```

Output:
```txt
Option 1 result: {'cookies': {'user_id': '123', 'username': 'test'}}
Option 2 result: {'cookies': {'user_id': '456', 'username': 'demo'}}
```

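**3. Setting a timeout**

Without a timeout, a request can block indefinitely on a network problem. The timeout parameter sets the limit in seconds:

```python
import requests

url = "https://www.baidu.com"

try:
    # Connect timeout 1 s, read timeout 3 s; these are separate limits, not a 4 s total
    response = requests.get(url, timeout=(1, 3))  # (connect timeout, read timeout)
    print("Request succeeded, status code:", response.status_code)
except requests.exceptions.Timeout:
    print("Request timed out!")
except requests.exceptions.RequestException as e:
    print("Request failed:", e)
```

Output:
```txt
Request succeeded, status code: 200
```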

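**4. Session management**

Scraping a site that requires login means keeping session state, that is, persisting cookies across requests. requests.Session maintains cookies automatically, so they never have to be passed by hand, which is simpler than urllib's HTTPCookieProcessor.

```python
import requests

# Create a Session object (maintains cookies automatically)
session = requests.Session()

# Step 1: log in (this is where a real site would set its login cookie)
login_url = "https://httpbin.org/post"
login_data = {
    "username": "my_user",
    "password": "my_pass"
}
# Send the login request through the session; any cookies set are stored automatically
session.post(login_url, data=login_data)

# Step 2: visit a page that requires login (stored cookies are sent automatically)
profile_url = "https://httpbin.org/cookies"
response = session.get(profile_url)

print("Cookies in the current session:", response.json())  # empty here: httpbin's /post sets no cookies
```

Output:

```txt
Cookies in the current session: {'cookies': {}}
```

The result is empty because httpbin's /post endpoint only echoes the request and never sends a Set-Cookie header. Against a real login endpoint that does set cookies, the Session stores them and sends them back on later requests automatically.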

A Session can also hold settings shared by every request:

```python
session = requests.Session()
# Session-wide headers (sent with every request made through the session)
session.headers.update({
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/125.0.0.0 Safari/537.36"
})
# Later requests do not need to set headers again
session.get("https://www.baidu.com")
```
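
What a Session adds to each request can be checked offline by preparing a request through it. This is a sketch under stated assumptions: the User-Agent string and the session_id cookie are hypothetical, and the cookie is set by hand to stand in for one that a login response would normally provide.

```python
import requests

session = requests.Session()
session.headers.update({"User-Agent": "my-crawler/0.1"})  # hypothetical UA string
session.cookies.set("session_id", "abc123")               # pretend a login set this

# prepare_request() merges session-level headers and cookies into the
# outgoing request without sending it.
prepared = session.prepare_request(
    requests.Request("GET", "https://httpbin.org/cookies")
)

print(prepared.headers["User-Agent"])
print(prepared.headers["Cookie"])
```

With a live server, cookies arriving in Set-Cookie headers land in session.cookies on their own, so the manual set() call is only needed for this offline demonstration.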

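**5. Proxies**

Crawlers often route traffic through proxy IPs so that a single IP does not get banned. requests takes the proxy configuration through the proxies parameter.

Example: using HTTP/HTTPS proxies (the addresses below are placeholders).

```python
import requests

url = "https://httpbin.org/ip"  # this endpoint returns the IP the request came from

# Proxy configuration (key: protocol of the target URL; value: proxy URL)
proxies = {
    "http": "http://123.45.67.89:8080",    # proxy for HTTP requests
    "https": "https://123.45.67.89:8443"  # proxy for HTTPS requests
}

try:
    # Send the request through the proxy
    response = requests.get(url, proxies=proxies, timeout=10)
    print("Proxy IP info:", response.json())
except requests.exceptions.ProxyError:
    print("Proxy error! Check that the proxy is working")
except requests.exceptions.RequestException as e:
    print("Request failed:", e)
```

Output:

```txt
Proxy error! Check that the proxy is working
```

If the proxy requires a username and password, use the form http://user:pass@ip:port.

```python
proxies = {
    "http": "http://user:password@123.45.67.89:8080"
}
```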

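**6. SSL certificate verification**

requests verifies HTTPS certificates by default. If the target site uses a self-signed certificate (an internal system, for example), the request raises SSLError. Verification can be turned off with the verify parameter, but only for testing; it is not recommended in production.

```python
import requests
import urllib3

# With verification off, each request emits an InsecureRequestWarning;
# silence it before sending the request if the noise is unwanted
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

url = "https://self-signed.example.com"  # placeholder for a site with a self-signed certificate

# Turn off SSL verification (not recommended in production)
response = requests.get(url, verify=False, timeout=5)
```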

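## Exception handling

requests defines a rich set of exception classes (in requests.exceptions), which makes it possible to catch each kind of error precisely:

| Exception class  | Description                                                  |
| ---------------- | ------------------------------------------------------------ |
| RequestException | Base class of all the exceptions; catches any requests error |
| ConnectionError  | Network connection error (DNS failure, connection refused)   |
| Timeout          | The request timed out                                        |
| HTTPError        | HTTP error status (4xx, 5xx); raised only when raise_for_status() is called |
| TooManyRedirects | Too many redirects                                           |
| SSLError         | SSL certificate verification error                           |

```python
import requests
from requests.exceptions import (
    RequestException, ConnectionError, Timeout, HTTPError
)

url = "https://www.baidu.com"

try:
    response = requests.get(url, timeout=5)
    # Raise HTTPError explicitly when the status code is 4xx or 5xx
    response.raise_for_status()
    print("Request succeeded")
except HTTPError as e:
    print(f"HTTP error: {e} (status code: {response.status_code})")
except Timeout:
    # Checked before ConnectionError because a connect timeout is both
    print("Timeout: the request took longer than allowed")
except ConnectionError:
    print("Connection error: could not reach the server")
except RequestException as e:
    print(f"Other request error: {e}")
```

Output:
```txt
Request succeeded
```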
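
The order of except clauses matters because some of these exceptions have more than one parent. The sketch below checks the relationships and then triggers an HTTPError offline; the hand-built 404 Response is an assumption made so the example needs no network access.

```python
import requests
from requests.exceptions import (
    ConnectionError, ConnectTimeout, HTTPError, ProxyError,
    ReadTimeout, SSLError, Timeout,
)

# ConnectTimeout inherits from both ConnectionError and Timeout.
print(issubclass(ConnectTimeout, ConnectionError), issubclass(ConnectTimeout, Timeout))
# ReadTimeout is a Timeout only.
print(issubclass(ReadTimeout, ConnectionError))
# SSLError and ProxyError are special cases of ConnectionError.
print(issubclass(SSLError, ConnectionError), issubclass(ProxyError, ConnectionError))

# Hand-built 404 response (offline) to show what raise_for_status() raises.
response = requests.Response()
response.status_code = 404
response.url = "https://httpbin.org/status/404"

try:
    response.raise_for_status()
except HTTPError as e:
    status = e.response.status_code  # the failing response travels with the exception
    print("HTTP error, status code:", status)
```

One consequence: an except ConnectionError clause placed before except Timeout also catches connect timeouts, so the more specific clause should come first.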

## Other features

**1. File upload with the files parameter (sent as multipart/form-data, like an HTML form)**

```python
import requests

url = "https://httpbin.org/post"

# Open the files in a with block so they are closed even if the request fails
with open("test.txt", "rb") as f1, open("image.jpg", "rb") as f2:
    # Key: form field name; value: (filename, file object, MIME type)
    files = {
        "file1": ("test.txt", f1, "text/plain"),
        "file2": ("image.jpg", f2, "image/jpeg")
    }
    response = requests.post(url, files=files)

print("Upload result:", response.json())
```
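
To see what files produces on the wire without needing real files or a server, the request can be prepared with an in-memory file. A minimal sketch, assuming io.BytesIO as a stand-in for an opened file and a made-up description form field:

```python
import io
import requests

# In-memory file object standing in for open("test.txt", "rb").
fake_file = io.BytesIO(b"hello from test.txt")

prepared = requests.Request(
    "POST",
    "https://httpbin.org/post",
    files={"file1": ("test.txt", fake_file, "text/plain")},
    data={"description": "demo upload"},  # ordinary form fields can be mixed in
).prepare()

content_type = prepared.headers["Content-Type"]
print(content_type)                              # multipart/form-data with a generated boundary
print(b'filename="test.txt"' in prepared.body)   # the filename is part of the body
```

The Content-Type header, including its boundary, is generated by requests. Setting that header by hand when using files is a common cause of broken uploads.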

**2. Streaming downloads: for large files (videos, large CSVs), stream=True avoids loading the whole body into memory**

```python
import requests

url = "https://example.com/large_file.zip"

# Streamed request (the body is not downloaded until it is iterated over)
with requests.get(url, stream=True, timeout=30) as response:
    response.raise_for_status()  # check the status code
    # Write the file chunk by chunk
    with open("large_file.zip", "wb") as f:
        for chunk in response.iter_content(chunk_size=8192):  # read 8 KB at a time
            if chunk:  # skip empty chunks
                f.write(chunk)
print("Large file downloaded")
```
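
The chunked write loop can be wrapped in a small function and exercised offline. In the sketch below, save_stream() is a hypothetical helper, and the Response with an in-memory raw body is an assumption that stands in for requests.get(url, stream=True).

```python
import io
import os
import tempfile
import requests

def save_stream(response, path, chunk_size=8192):
    """Hypothetical helper: write a streamed response to disk, return bytes written."""
    written = 0
    with open(path, "wb") as f:
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:  # skip empty chunks
                f.write(chunk)
                written += len(chunk)
    return written

# Offline stand-in for a streamed download: a Response whose raw body is an
# in-memory buffer of 20,000 bytes.
response = requests.Response()
response.status_code = 200
response.raw = io.BytesIO(b"x" * 20000)

target = os.path.join(tempfile.mkdtemp(), "large_file.bin")
total = save_stream(response, target)
print(total, os.path.getsize(target))
```

Returning the byte count makes it easy to compare against a Content-Length header when the server provides one.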

**3. Authentication: requests has built-in support for several schemes, such as Basic Auth and Digest Auth**

```python
import requests
from requests.auth import HTTPBasicAuth, HTTPDigestAuth

# Basic authentication
url_basic = "https://httpbin.org/basic-auth/user/pass"
# Option 1: pass an HTTPBasicAuth object through the auth parameter
response_basic = requests.get(
    url_basic,
    auth=HTTPBasicAuth("user", "pass"),
    timeout=5
)
# Option 2: shorthand (pass a (username, password) tuple directly)
response_basic = requests.get(url_basic, auth=("user", "pass"), timeout=5)
print("Basic Auth result:", response_basic.json())

# Digest authentication
url_digest = "https://httpbin.org/digest-auth/auth/user/pass"
response_digest = requests.get(
    url_digest,
    auth=HTTPDigestAuth("user", "pass"),
    timeout=5
)
print("Digest Auth result:", response_digest.json())
```
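
What the auth tuple turns into can be inspected offline. The sketch below assumes requests.Request(...).prepare() as the inspection tool and rebuilds the expected header with the standard base64 module for comparison.

```python
import base64
import requests

prepared = requests.Request(
    "GET", "https://httpbin.org/basic-auth/user/pass", auth=("user", "pass")
).prepare()

# Basic auth is "user:pass" base64-encoded, not encrypted,
# so it should only be sent over HTTPS.
expected = "Basic " + base64.b64encode(b"user:pass").decode("ascii")

auth_header = prepared.headers["Authorization"]
print(auth_header)
print(auth_header == expected)
```

Digest authentication cannot be previewed this way, because its header depends on a challenge sent by the server in a first response.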

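## Comparison

| Feature                      | urllib                                                | requests                                                |
| ---------------------------- | ----------------------------------------------------- | ------------------------------------------------------- |
| Installation                 | Standard library, nothing to install                  | Third-party, needs pip install requests                 |
| GET parameters               | Must be encoded by hand with urllib.parse.urlencode() | Pass a dict via params; encoded automatically           |
| POST data                    | Must be encoded to bytes by hand                      | Pass data (form) or json (JSON); handled automatically  |
| Response encoding            | Must be specified or detected by hand                 | Guessed automatically; can be overridden via encoding   |
| JSON parsing                 | json.loads(response.read())                           | Call response.json() directly                           |
| Session management (cookies) | Manual, via HTTPCookieProcessor                       | Built-in Session object maintains them automatically    |
| Exception handling           | Catch URLError/HTTPError                              | Richer exception classes (ConnectionError and so on)    |
| Code conciseness             | Fairly verbose                                        | Minimal; a basic request fits on one line               |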
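
The POST data row can be made concrete by building the same JSON request with both libraries. This is a minimal offline sketch with a made-up payload; neither request is sent, and the requests side uses Request(...).prepare() only so the result can be inspected.

```python
import json
import urllib.request
import requests

url = "https://httpbin.org/post"
payload = {"name": "test_user", "age": 25}  # made-up payload

# urllib: serialize, encode to bytes, and set the header by hand.
urllib_request = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# requests: the json parameter covers all three steps.
requests_request = requests.Request("POST", url, json=payload).prepare()

# urllib stores header names capitalized as "Content-type".
print(urllib_request.get_header("Content-type"))
print(requests_request.headers["Content-Type"])
same_body = json.loads(urllib_request.data) == json.loads(requests_request.body)
print(same_body)
```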