A Python crawler can disguise itself in several ways to avoid being blocked or rate-limited by a website:
1. Set a User-Agent header so requests look like they come from a real browser rather than a script:

import requests

url = 'https://www.example.com'  # placeholder target page
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
}
response = requests.get(url, headers=headers)
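Sending the same User-Agent on every request is itself a fingerprint, so crawlers often rotate through a small pool. A minimal sketch (the pool contents and the `random_headers` helper name are illustrative, not part of any library):

```python
import random

# Illustrative User-Agent pool; swap in whichever browsers you want to mimic.
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.0 Safari/605.1.15',
    'Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0',
]

def random_headers():
    """Return a headers dict with a randomly chosen User-Agent."""
    return {'User-Agent': random.choice(USER_AGENTS)}
```

Each call then produces a fresh headers dict to pass to `requests.get(url, headers=random_headers())`.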
2. Set a Referer header so the request appears to follow a link from a legitimate page:

import requests

url = 'https://www.example.com/page'  # placeholder target page
headers = {
    'Referer': 'https://www.example.com'
}
response = requests.get(url, headers=headers)
3. Carry cookies so the site treats the crawler as a logged-in browser session:

import requests

url = 'https://www.example.com'  # placeholder target page
headers = {
    'Cookie': 'sessionid=xxxxxx'  # placeholder value copied from a real session
}
response = requests.get(url, headers=headers)
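Instead of hand-copying a Cookie header onto every request, a requests.Session keeps a cookie jar that is sent and updated automatically. A small sketch, assuming the same placeholder `sessionid=xxxxxx` value as above:

```python
import requests

session = requests.Session()
# Seed the jar manually, e.g. with a session id copied from a logged-in
# browser ('sessionid' / 'xxxxxx' are placeholders, as above).
session.cookies.set('sessionid', 'xxxxxx')

# Every request made through this session now sends the cookie, and any
# Set-Cookie headers in responses update the jar automatically:
# response = session.get(url)
```

This is usually more robust than a static header, because sites that rotate session cookies keep working without manual intervention.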
4. Route traffic through a proxy so requests arrive from a different IP address:

import requests

url = 'https://www.example.com'  # placeholder target page
proxies = {
    'http': 'http://127.0.0.1:8888',
    'https': 'http://127.0.0.1:8888',  # HTTPS traffic is typically tunneled through an http:// proxy URL
}
response = requests.get(url, proxies=proxies)
Note that none of these disguises is absolutely reliable: some sites deploy more sophisticated anti-crawling measures. When crawling, respect the site's scraping rules, honor the robots.txt protocol, and throttle your request frequency so you do not place an excessive load on the target server.
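The robots.txt check and the throttling mentioned above can both be done with the standard library. A minimal sketch, assuming a hypothetical crawler name `my-crawler` and feeding the robots.txt body directly so it runs offline (in practice you would call `rp.set_url(...)` and `rp.read()`):

```python
import time
import urllib.robotparser

# Parse an example robots.txt body (illustrative rules).
rp = urllib.robotparser.RobotFileParser()
rp.parse([
    'User-agent: *',
    'Disallow: /private/',
    'Crawl-delay: 2',
])

def polite_fetch_allowed(path, user_agent='my-crawler'):
    """Return False if robots.txt forbids the path; otherwise sleep
    for the declared crawl delay and return True."""
    if not rp.can_fetch(user_agent, path):
        return False
    delay = rp.crawl_delay(user_agent) or 1
    time.sleep(delay)  # throttle between requests
    return True
```

Calling `polite_fetch_allowed('/private/data')` returns False without sleeping, while an allowed path pauses for the site's declared delay before the crawler proceeds.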