How to Use a Chrome Extension for Web Scraping in Python

Published: 2020-07-17 14:21:39 | Source: 億速云 | Author: 小豬 | Category: Development

This article shows how to scrape data with a Chrome extension. It walks through the process in detail, and hopefully you will come away with something useful.

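In e-commerce, customer reviews of a product matter a great deal. But what if you cannot write code? There is a Chrome extension that handles simple data scraping without a single line of code. Here is part of the scraped data: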

As you can see, the source URL, reviewer, review text, time, and product color have all been captured. So which tools does this take? Just two:

1. The Chrome browser;

2. An extension: Web Scraper

Extension download: https://chromecj.com/productivity/2018-05/942.html
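
For readers who do know a little Python, the scraped result can be post-processed afterwards. The sketch below is only an illustration under assumptions: it supposes the data was exported from Web Scraper as CSV, that the columns are named after the selector ids used in the sitemap later in this article (user, comments, time, color), and that the export also carries the bookkeeping columns web-scraper-order and web-scraper-start-url. The sample row is a made-up placeholder, not real scraped output, and load_reviews is a hypothetical helper name.

```python
import csv
import io

# Column names assumed to match the selector ids in the sitemap.
REVIEW_FIELDS = ("user", "comments", "time", "color")


def load_reviews(fileobj):
    """Read rows from an exported CSV and keep only the review fields."""
    reader = csv.DictReader(fileobj)
    # Missing or empty cells become empty strings; whitespace is trimmed.
    return [
        {field: (row.get(field) or "").strip() for field in REVIEW_FIELDS}
        for row in reader
    ]


# Placeholder data for illustration only, NOT real scraped output.
SAMPLE_CSV = (
    "web-scraper-order,web-scraper-start-url,user,comments,time,color\n"
    "1-1,https://item.jd.com/100000680365.html#comment,"
    " user_a ,placeholder comment,2020-01-01 00:00,black\n"
)

reviews = load_reviews(io.StringIO(SAMPLE_CSV))
print(len(reviews), reviews[0]["user"])
```

With a real export, the same function could be given an open file instead of the in-memory sample, for example open("jdreview.csv", newline="", encoding="utf-8-sig"), where the file name is hypothetical.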

Finally, if you want to try scraping the data yourself, here is the detailed process used this time:

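1. First, copy the code below. True, you do not need to write any code, but copying this snippet makes it easier to get started. Later you can customize it and pick selectors yourself, still without writing code.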

{
  "_id": "jdreview",
  "startUrl": [
    "https://item.jd.com/100000680365.html#comment"
  ],
  "selectors": [
    {
      "id": "user",
      "type": "SelectorText",
      "selector": "div.user-info",
      "parentSelectors": [
        "main"
      ],
      "multiple": false,
      "regex": "",
      "delay": 0
    },
    {
      "id": "comments",
      "type": "SelectorText",
      "selector": "div.comment-column > p.comment-con",
      "parentSelectors": [
        "main"
      ],
      "multiple": false,
      "regex": "",
      "delay": 0
    },
    {
      "id": "time",
      "type": "SelectorText",
      "selector": "div.comment-message:nth-of-type(5) span:nth-of-type(4), div.order-info span:nth-of-type(4)",
      "parentSelectors": [
        "main"
      ],
      "multiple": false,
      "regex": "",
      "delay": "0"
    },
    {
      "id": "color",
      "type": "SelectorText",
      "selector": "div.order-info span:nth-of-type(1)",
      "parentSelectors": [
        "main"
      ],
      "multiple": false,
      "regex": "",
      "delay": 0
    },
    {
      "id": "main",
      "type": "SelectorElementClick",
      "selector": "div.comment-item",
      "parentSelectors": [
        "_root"
      ],
      "multiple": true,
      "delay": "10000",
      "clickElementSelector": "div.com-table-footer a.ui-pager-next",
      "clickType": "clickMore",
      "discardInitialElements": false,
      "clickElementUniquenessType": "uniqueHTMLText"
    }
  ]
}
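
Before importing a sitemap like the one above, or after editing it by hand, it can help to check that it is still valid JSON and that every selector points at a parent that exists. The following is a minimal sketch, not part of Web Scraper itself: check_sitemap is a hypothetical helper, and the embedded text is a trimmed copy of the sitemap above with only two of its selectors.

```python
import json


def check_sitemap(sitemap):
    """Return a list of problems found in a Web Scraper sitemap dict."""
    problems = []
    # The sitemap above has these three top-level keys.
    for key in ("_id", "startUrl", "selectors"):
        if key not in sitemap:
            problems.append(f"missing top-level key: {key}")
    selectors = sitemap.get("selectors", [])
    # "_root" is the implicit top-level parent; other parents must be selector ids.
    known_ids = {"_root"} | {sel.get("id") for sel in selectors}
    for sel in selectors:
        for parent in sel.get("parentSelectors", []):
            if parent not in known_ids:
                problems.append(
                    f"selector {sel.get('id')!r} has unknown parent {parent!r}"
                )
    return problems


# Trimmed copy of the sitemap above (two selectors only).
SITEMAP_TEXT = """
{
  "_id": "jdreview",
  "startUrl": ["https://item.jd.com/100000680365.html#comment"],
  "selectors": [
    {"id": "user", "type": "SelectorText", "selector": "div.user-info",
     "parentSelectors": ["main"], "multiple": false, "regex": "", "delay": 0},
    {"id": "main", "type": "SelectorElementClick", "selector": "div.comment-item",
     "parentSelectors": ["_root"], "multiple": true, "delay": "10000",
     "clickElementSelector": "div.com-table-footer a.ui-pager-next",
     "clickType": "clickMore", "discardInitialElements": false,
     "clickElementUniquenessType": "uniqueHTMLText"}
  ]
}
"""

sitemap = json.loads(SITEMAP_TEXT)  # raises json.JSONDecodeError on invalid JSON
print(check_sitemap(sitemap))
```

To scrape a different product, one could replace the entry in sitemap["startUrl"] and write the result back out with json.dumps(sitemap, indent=2) before pasting it into the extension.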

2. Next, open Chrome, press Ctrl+Shift+I on any page, and find the Web Scraper tab in the panel that opens, as shown below:
