We will use Python as the example language and show how to use a proxy IP in code.
First you need a proxy. If you do not have one, you can register at a proxy site such as webshare to get free proxy IPs.
Next we need a test site to confirm that requests really go out through the proxy. Here we use the ipinfo service mentioned earlier: it returns the source address of each request, so we can check whether the proxy is actually in use.
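As a quick baseline, you can query ipinfo without any proxy first and note the IP it reports; once a proxy is configured, the same request should report the proxy's address instead. A minimal sketch:

from urllib.request import urlopen

# No proxy configured: the "ip" field in the response is your own exit IP.
# After enabling a proxy, the same request should report the proxy's IP instead.
print(urlopen('http://ipinfo.io').read().decode('utf-8'))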
urllib
Let's start with the standard library urllib.
http/https
from urllib.error import URLError
from urllib.request import ProxyHandler, build_opener

proxy = 'dnpodndn:yb0xd39hqgc0@198.23.239.134:6540'

# Route both http and https traffic through the same HTTP proxy
proxy_handler = ProxyHandler({
    'http': 'http://' + proxy,
    'https': 'http://' + proxy
})
opener = build_opener(proxy_handler)

try:
    response = opener.open('http://ipinfo.io')
    print(response.read().decode('utf-8'))
except URLError as e:
    print('Proxy request failed:', e.reason)
socks5
# SOCKS support comes from the PySocks package (pip install PySocks),
# which provides both the socks module and sockshandler.SocksiPyHandler
import socks
from urllib import request
from sockshandler import SocksiPyHandler

username = "dnpodndn"
password = "yb0xd39hqgc0"
proxy_host = "198.23.239.134"
proxy_port = 6540

url = 'http://ipinfo.io'
req = request.Request(url=url, headers={})
# Open the request through a SOCKS5 handler that carries the credentials
opener = request.build_opener(
    SocksiPyHandler(socks.SOCKS5, proxy_host, proxy_port,
                    username=username, password=password))
response = opener.open(req)
print(response.read().decode('utf-8'))
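PySocks can also install the proxy process-wide by swapping out the default socket class, so that a plain urlopen call (and most other socket-based code) goes through the proxy without a custom handler. A minimal sketch using the same sample credentials:

import socket
import socks
from urllib.request import urlopen

# Every socket created after this point is tunneled through the SOCKS5 proxy
socks.set_default_proxy(socks.SOCKS5, "198.23.239.134", 6540,
                        username="dnpodndn", password="yb0xd39hqgc0")
socket.socket = socks.socksocket

print(urlopen('http://ipinfo.io').read().decode('utf-8'))

Note that this is a global side effect, so it is best kept to small scripts.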
Result
{
  "ip": "198.23.239.134",
  "hostname": "198-23-239-134-host.colocrossing.com",
  "city": "Buffalo",
  "region": "New York",
  "country": "US",
  "loc": "42.8865,-78.8784",
  "org": "AS36352 HostPapa",
  "postal": "14202",
  "timezone": "America/New_York",
  "readme": "https://ipinfo.io/missingauth"
}
As you can see, the ip in the response is already the proxy IP.
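If you want to check this programmatically rather than by eye, parse the JSON and compare the reported ip with the proxy host. A minimal sketch (only meaningful for a static proxy like this one; rotating or gateway-style proxies may report a different exit IP than the host you connect to):

import json
from urllib.request import ProxyHandler, build_opener

proxy_host = '198.23.239.134'
proxy = 'dnpodndn:yb0xd39hqgc0@' + proxy_host + ':6540'
opener = build_opener(ProxyHandler({'http': 'http://' + proxy,
                                    'https': 'http://' + proxy}))

info = json.loads(opener.open('http://ipinfo.io').read().decode('utf-8'))
# The reported source address should match the proxy host when the proxy is in use
assert info['ip'] == proxy_host, 'request did not go through the proxy'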
requests
The Requests library is built on top of urllib and is an HTTP library released under the Apache2 License. Compared with urllib, Requests is more convenient and concise, so it is the usual choice when writing crawlers.
Using a proxy with requests is simple: just pass the proxy configuration via the proxies parameter.
http/https
import requests

proxy = "dnpodndn:yb0xd39hqgc0@198.23.239.134:6540"
# Both http and https traffic go through the same HTTP proxy,
# so the proxy URL uses the http:// scheme in both entries
proxies = {
    'http': 'http://' + proxy,
    'https': 'http://' + proxy,
}

session = requests.Session()
session.proxies.update(proxies)
print(session.get('http://ipinfo.io').json())
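The proxies mapping can also be passed to a single request instead of being attached to a Session. A minimal sketch:

import requests

proxy = "dnpodndn:yb0xd39hqgc0@198.23.239.134:6540"
proxies = {'http': 'http://' + proxy, 'https': 'http://' + proxy}

# Per-request proxy configuration, no Session required
print(requests.get('http://ipinfo.io', proxies=proxies).json())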
socks5
# SOCKS support in requests requires the PySocks extra: pip install requests[socks]
import requests

proxy = "dnpodndn:yb0xd39hqgc0@198.23.239.134:6540"
proxies = {
    "http": "socks5://" + proxy,
    "https": "socks5://" + proxy,
}

session = requests.Session()
session.proxies.update(proxies)
print(session.get('http://ipinfo.io').json())
Result
{
  'ip': '198.23.239.134',
  'hostname': '198-23-239-134-host.colocrossing.com',
  'city': 'Buffalo',
  'region': 'New York',
  'country': 'US',
  'loc': '42.8865,-78.8784',
  'org': 'AS36352 HostPapa',
  'postal': '14202',
  'timezone': 'America/New_York',
  'readme': 'https://ipinfo.io/missingauth'
}
httpx
httpx is used in much the same way as requests. Note that the examples below pass the proxies argument; newer httpx releases have deprecated it in favor of proxy and mounts, so you may need to adjust the keyword for the httpx version you have installed.
http/https
import httpx

proxy = "dnpodndn:yb0xd39hqgc0@198.23.239.134:6540"
# httpx keys the proxy mapping by URL pattern ('http://', 'https://', 'all://')
proxies = {
    'http://': 'http://' + proxy,
    'https://': 'http://' + proxy,
}

response = httpx.Client(proxies=proxies).get('http://ipinfo.io')
print(response.text)
socks5
# SOCKS support in httpx requires the socks extra: pip install httpx[socks]
import httpx

proxy = "dnpodndn:yb0xd39hqgc0@198.23.239.134:6540"
response = httpx.Client(proxies='socks5://' + proxy).get("http://ipinfo.io")
print(response.text)
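The httpx-socks package used in the async example below also provides a synchronous transport, which is another way to add SOCKS5 support to a regular httpx.Client. A minimal sketch, assuming httpx-socks is installed:

import httpx
from httpx_socks import SyncProxyTransport

proxy = "dnpodndn:yb0xd39hqgc0@198.23.239.134:6540"

# Build a transport that tunnels all requests through the SOCKS5 proxy
transport = SyncProxyTransport.from_url('socks5://' + proxy)
with httpx.Client(transport=transport) as client:
    print(client.get('http://ipinfo.io').text)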
Async usage
# The async example relies on the httpx-socks package: pip install httpx-socks
import asyncio
import httpx
from httpx_socks import AsyncProxyTransport

proxy = "dnpodndn:yb0xd39hqgc0@198.23.239.134:6540"

async def fetch(url):
    # Every request made through this client is tunneled via the SOCKS5 proxy
    transport = AsyncProxyTransport.from_url('socks5://' + proxy)
    async with httpx.AsyncClient(transport=transport) as client:
        res = await client.get(url)
        print(res.text)

if __name__ == '__main__':
    asyncio.run(fetch("http://ipinfo.io"))
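If you only have an HTTP proxy, httpx.AsyncClient can point at it directly, mirroring the synchronous examples above (the same proxies-vs-proxy version caveat applies). A minimal sketch:

import asyncio
import httpx

proxy = "dnpodndn:yb0xd39hqgc0@198.23.239.134:6540"

async def fetch(url):
    # Same http:// proxy URL as in the synchronous httpx example
    async with httpx.AsyncClient(proxies='http://' + proxy) as client:
        res = await client.get(url)
        print(res.text)

if __name__ == '__main__':
    asyncio.run(fetch("http://ipinfo.io"))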
Result
{
  "ip": "198.23.239.134",
  "hostname": "198-23-239-134-host.colocrossing.com",
  "city": "Buffalo",
  "region": "New York",
  "country": "US",
  "loc": "42.8865,-78.8784",
  "org": "AS36352 HostPapa",
  "postal": "14202",
  "timezone": "America/New_York",
  "readme": "https://ipinfo.io/missingauth"
}