This program uses a POST request to 'http://httpbin.org/post' as its example.
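General pattern:
import urllib.request
import urllib.parse
URL-encode the form data, then convert it to UTF-8 bytes: data = bytes(urllib.parse.urlencode({'word': 'hello'}), encoding='utf-8')
Open the page (supplying data makes urlopen send a POST instead of a GET): response = urllib.request.urlopen('URL', data=data)
Read the page source, which comes back as bytes: html = response.read()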
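To see what the encoding step actually produces, here is a small offline sketch (nothing is sent over the network). The first dict is the form from this example; the second one, with a space and Chinese text, is a made-up illustration of how urlencode escapes such values.

```python
import urllib.parse

# The form fields used in this example
form = {'word': 'hello'}

query = urllib.parse.urlencode(form)   # a str: 'word=hello'
data = bytes(query, encoding='utf-8')  # bytes: b'word=hello'

# str.encode() is an equivalent way to get the same bytes
assert data == query.encode('utf-8')

print(repr(query))  # 'word=hello'
print(repr(data))   # b'word=hello'
print(len(data))    # 10, the Content-Length httpbin reports for this request

# Hypothetical extra values: spaces become '+',
# non-ASCII text is percent-encoded as UTF-8 bytes
print(urllib.parse.urlencode({'word': 'hello world', 'lang': '中文'}))
# word=hello+world&lang=%E4%B8%AD%E6%96%87
```

Either spelling, bytes(s, encoding='utf-8') or s.encode('utf-8'), gives the same result.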
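Printing:
1. Without decode
print(html)  # html is bytes, so this prints its repr b'...': line breaks show up as literal \n escapes and the whole page runs together on one hard-to-read line
2. With decode
print(html.decode())  # decode() turns the bytes into a str (UTF-8 by default), so the page prints with real line breaks and indentation, as neatly as well-formatted code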
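The difference between the two prints can be reproduced without a network call. The byte string below is a shortened, made-up stand-in for what response.read() returns, not real httpbin output.

```python
# Hypothetical, shortened stand-in for response.read()
raw = b'{\n  "form": {\n    "word": "hello"\n  }\n}\n'

# Printing bytes shows the repr: one line, \n as escape sequences
print(raw)  # b'{\n  "form": {\n    "word": "hello"\n  }\n}\n'

# Printing the decoded str shows real line breaks
print(raw.decode())

# decode() uses UTF-8 unless another encoding is given
assert raw.decode() == raw.decode('utf-8')
```

If a page is not UTF-8, the charset it declares can usually be read with response.headers.get_content_charset() and passed to decode().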
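Checking the result of the request:
a. response.status  # the HTTP status code, e.g. 200: request succeeded
b. response.getcode()  # returns the same code as response.status
For an error status such as 404 (page not found), urlopen does not return a response at all: it raises urllib.error.HTTPError, whose .code attribute holds the status.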
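A sketch of both cases against a throwaway local server, so it runs offline. The StatusHandler class and its /missing path are hypothetical; the opener built with an empty ProxyHandler is an assumption added so that proxy environment variables cannot redirect the local request, and otherwise it behaves like urlopen.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class StatusHandler(BaseHTTPRequestHandler):
    """Hypothetical test server: 200 for '/', 404 for any other path."""

    def do_GET(self):
        self.send_response(200 if self.path == '/' else 404)
        self.send_header('Content-Length', '0')
        self.end_headers()

    def log_message(self, *args):
        pass  # silence the default per-request logging


# Port 0 lets the OS pick a free port
server = HTTPServer(('127.0.0.1', 0), StatusHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = 'http://127.0.0.1:%d' % server.server_port

# Empty ProxyHandler: ignore proxy env vars and connect directly
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

with opener.open(base + '/') as response:
    ok_status = response.status
    print(response.status, response.getcode())  # 200 200

try:
    opener.open(base + '/missing')
except urllib.error.HTTPError as e:
    # The status code lives on the exception, not on a returned response
    missing_status = e.code
    print('HTTPError:', e.code)  # HTTPError: 404

server.shutdown()
server.server_close()
```

In a real crawler the try/except around urlopen is where failed pages are handled, since response.status is only ever seen for successful (2xx) responses.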
1. Program without decode:
import urllib.request
import urllib.parse

# URL-encode the form data, then convert the str to UTF-8 bytes
data = bytes(urllib.parse.urlencode({'word': 'hello'}), encoding='utf-8')
# Supplying data makes urlopen send a POST request
response = urllib.request.urlopen('http://httpbin.org/post', data=data)
html = response.read()  # raw bytes
print(html)
print("------------------------------------------------------------------")
print("------------------------------------------------------------------")
print(response.status)
print(response.getcode())
Output:
2. Program with decode:
import urllib.request
import urllib.parse

# URL-encode the form data, then convert the str to UTF-8 bytes
data = bytes(urllib.parse.urlencode({'word': 'hello'}), encoding='utf-8')
# Supplying data makes urlopen send a POST request
response = urllib.request.urlopen('http://httpbin.org/post', data=data)
html = response.read()  # raw bytes
print(html.decode())  # bytes -> str, so line breaks render normally
print("------------------------------------------------------------------")
print("------------------------------------------------------------------")
print(response.status)
print(response.getcode())
Output:
{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "word": "hello"
  },
  "headers": {
    "Accept-Encoding": "identity",
    "Connection": "close",
    "Content-Length": "10",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "Python-urllib/3.4"
  },
  "json": null,
  "origin": "106.14.17.222",
  "url": "http://httpbin.org/post"
}
------------------------------------------------------------------
------------------------------------------------------------------
200
200
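If httpbin.org is unreachable, the same POST can be tried against a small local server. This is a stand-in sketch, not httpbin itself: the EchoHandler class, its JSON layout and the /post path are made up for illustration, and it echoes only the parsed form and two request headers. The empty ProxyHandler is an assumption so proxy environment variables do not intercept the local request.

```python
import json
import threading
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for httpbin's /post: echoes form and headers."""

    def do_POST(self):
        length = int(self.headers['Content-Length'])
        body = self.rfile.read(length)  # the raw bytes urlopen sent
        reply = {
            'form': dict(urllib.parse.parse_qsl(body.decode('utf-8'))),
            'headers': {
                'Content-Length': self.headers['Content-Length'],
                'Content-Type': self.headers['Content-Type'],
            },
        }
        payload = json.dumps(reply, indent=2).encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Type', 'application/json')
        self.send_header('Content-Length', str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # silence the default per-request logging


server = HTTPServer(('127.0.0.1', 0), EchoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = 'http://127.0.0.1:%d/post' % server.server_port

data = bytes(urllib.parse.urlencode({'word': 'hello'}), encoding='utf-8')
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
with opener.open(url, data=data) as response:
    html = response.read()
    status = response.status

print(html.decode())
print(status)

server.shutdown()
server.server_close()
```

The echoed JSON should show the form {"word": "hello"}, Content-Type application/x-www-form-urlencoded and Content-Length "10", the same values httpbin reports, because urllib adds those two headers itself when data is supplied.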
Why convert with bytes?
Because without the conversion:
import urllib.request
import urllib.parse

data = urllib.parse.urlencode({'word': 'hello'})  # no bytes() conversion: data is a str
response = urllib.request.urlopen('http://httpbin.org/post', data=data)
html = response.read()
Error message:
Traceback (most recent call last):
  File "/usercode/file.py", line 15, in <module>
    response = urllib.request.urlopen('http://httpbin.org/post', data = data )
  File "/usr/lib/python3.4/urllib/request.py", line 153, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.4/urllib/request.py", line 453, in open
    req = meth(req)
  File "/usr/lib/python3.4/urllib/request.py", line 1104, in do_request_
    raise TypeError(msg)
TypeError: POST data should be bytes or an iterable of bytes. It cannot be of type str.
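This shows that for a POST request the body passed as data must be bytes (or an iterable of bytes), not a str, so the urlencoded string has to be encoded before it is sent.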
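A minimal offline sketch of the same rule. The str check happens while urllib prepares the request (in do_request_, as the traceback shows), before any connection is opened, so the failing call needs no network. The exact error message differs between Python versions, so only the exception type is relied on here.

```python
import urllib.request

url = 'http://httpbin.org/post'
query = 'word=hello'  # what urlencode() returns: a str

# A str body is rejected during request preparation, before connecting
try:
    urllib.request.urlopen(url, data=query)
except TypeError as e:
    err = e  # keep a reference; 'e' is cleared after the except block
    print('TypeError:', e)

# Encoding the str gives the bytes that POST data must be
data = query.encode('utf-8')

# Supplying data is also what turns the request into a POST
print(urllib.request.Request(url).get_method())             # GET
print(urllib.request.Request(url, data=data).get_method())  # POST
```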
The Python documentation describes bytes as follows:
class bytes([source[, encoding[, errors]]])
Return a new “bytes” object, which is an immutable sequence of integers in the range 0 <= x < 256. bytes is an immutable version of bytearray – it has the same non-mutating methods and the same indexing and slicing behavior. Accordingly, constructor arguments are interpreted as for bytearray().
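A short sketch of the constructor forms and the immutability described above; the sample values are arbitrary illustrations.

```python
# Constructor forms (interpreted as for bytearray())
b1 = bytes('word=hello', encoding='utf-8')  # str + encoding -> encoded bytes
b2 = bytes([104, 105])                      # iterable of ints in 0..255
b3 = bytes(3)                               # int -> that many zero bytes

print(b1)  # b'word=hello'
print(b2)  # b'hi'
print(b3)  # b'\x00\x00\x00'

# Indexing yields an int, slicing yields bytes
print(b2[0], b2[:1])  # 104 b'h'

# bytes is immutable ...
try:
    b2[0] = 72
except TypeError:
    print('bytes does not support item assignment')

# ... while bytearray is the mutable counterpart
mutable = bytearray(b2)
mutable[0] = 72
print(bytes(mutable))  # b'Hi'

# A str argument without an encoding is rejected
try:
    bytes('hello')
except TypeError:
    print('a str needs an encoding')
```

This is why the example writes bytes(..., encoding='utf-8'): the str returned by urlencode cannot be turned into bytes without naming an encoding.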