
How to Scrape JD.com Review Data with Python

Published: 2020-06-09 12:32:05  Source: Network  Reads: 822  Author: mb5c9f0bd46ed07  Category: Programming Languages


Modules used: requests, BeautifulSoup (bs4), plus the lxml parser.

import re
import time
import csv
import requests
from bs4 import BeautifulSoup

def write_a_row_in_csv(data, csv_doc):
    """Append one row of product information to a CSV file."""
    with open(csv_doc, 'a', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(data)

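As a quick sanity check, the helper can be exercised on its own. The function is repeated here so the snippet is self-contained, and `demo.csv` together with the sample row are purely illustrative values, not data from JD.com:

```python
import csv

def write_a_row_in_csv(data, csv_doc):
    """Append one row of product information to a CSV file."""
    with open(csv_doc, 'a', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(data)

# Append a header row and one made-up data row, then read them back.
write_a_row_in_csv(('id', 'name', 'price'), 'demo.csv')
write_a_row_in_csv(('id12345', 'HUAWEI P20', '3988.00'), 'demo.csv')

with open('demo.csv', newline='', encoding='utf-8') as f:
    rows = list(csv.reader(f))
print(rows)
```

Because the file is opened in append mode (`'a'`), calling the helper repeatedly keeps adding rows, which is exactly what the scraping loop below relies on.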
# add headers, download page, check status code, return page
url = 'https://search.jd.com/Search?keyword=%E5%8D%8E%E4%B8%BAp20&enc=utf-8&suggest=1.def.0.V13&wq=%E5%8D%8E%E4%B8%BA&pvid=f47b5d05bba84d9dbfabf983575a6875'
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 SE 2.X MetaSr 1.0"
}
response = requests.get(url, headers=headers)
print(response.status_code)

# save as html document
with open('html.html', 'w', encoding='utf8') as f:
    f.write(response.text)

# save as csv document
with open('phone.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    fields = ('id', 'name', 'price', 'comment_count', 'good_rate')
    writer.writerow(fields)

# find elements, such as name, item, price, comment, goodrate, comment count
soup_all = BeautifulSoup(response.content, 'lxml')
sp_all_items = soup_all.find_all('li', attrs={'class': 'gl-item'})
for soup in sp_all_items[:3]:  # limit to the first 3 items for this demo
    print('-' * 50)
    name = soup.find('div', attrs={'class': 'p-name p-name-type-2'}).find('em').text
    print('name: ', name)
    item = soup.find('div', attrs={'class': 'p-name p-name-type-2'}).find('a')
    print('item: ', item['href'], re.search(r'(\d+)', item['href']).group())
    price = soup.find_all('div', attrs={'class': 'p-price'})
    print('price:', price[0].i.string)
    comment = soup.find_all('div', attrs={'class': 'p-commit'})
    print('comment url:', comment[0].find('a').attrs['href'])
    time.sleep(0.2)

    # the comment API requires a Referer header pointing at the item page
    item_id = re.search(r'(\d+)', item['href']).group()
    url = f'https://sclub.jd.com/comment/productPageComments.action?productId={item_id}&score=0&sortType=5&page=0&pageSize=10&isShadowSku=0&fold=1'
    headers = {
        "referer": f"https://item.jd.com/{item_id}.html",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 SE 2.X MetaSr 1.0"
    }
    response = requests.get(url, headers=headers)
    with open('html.json', 'w', encoding='utf8') as f:
        f.write(response.text)
    data = response.json()
    comment_count = data['productCommentSummary']['commentCount']
    print('comment count:', comment_count)
    good_rate = data['productCommentSummary']['goodRate']
    print('good rate:', good_rate)

    # record data into CSV sheet
    write_a_row_in_csv(('id'+item_id, name, price[0].i.string, comment_count, good_rate), 'phone.csv')
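The request above fetches only the first page of comments (`page=0`). To collect more, you can increment the `page` parameter in the same URL until the API returns an empty batch. A minimal sketch, assuming the endpoint keeps accepting the same query-string layout and that `fetch_all_comments` and `max_pages` are names introduced here for illustration:

```python
import time
import requests

HEADERS_TEMPLATE = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36",
}

def comment_page_url(item_id, page, page_size=10):
    """Build the URL for one page of the comment API used above."""
    return (f'https://sclub.jd.com/comment/productPageComments.action'
            f'?productId={item_id}&score=0&sortType=5'
            f'&page={page}&pageSize={page_size}&isShadowSku=0&fold=1')

def fetch_all_comments(item_id, max_pages=5):
    """Fetch up to max_pages pages of comments; stop when a page comes back empty."""
    headers = dict(HEADERS_TEMPLATE, referer=f'https://item.jd.com/{item_id}.html')
    comments = []
    for page in range(max_pages):
        resp = requests.get(comment_page_url(item_id, page), headers=headers)
        data = resp.json()
        batch = data.get('comments', [])
        if not batch:
            break  # no more comments for this item
        comments.extend(batch)
        time.sleep(0.2)  # be polite to the server
    return comments
```

The empty-batch check matters because the API keeps answering with HTTP 200 even past the last page, so the status code alone cannot tell you when to stop.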
