How Scrapy handles project data in a Python crawler

Published: 2020-11-23 09:25:30 Source: Yisu Cloud (億速云) Views: 199 Author: Xiaoxin (小新) Category: Programming Languages

This article explains how Scrapy handles project data in a Python crawler, using the TweetScraper project as the example. It is shared here as a practical reference.

1. Clone the project

$ git clone https://github.com/jonbakerfish/TweetScraper.git
$ cd TweetScraper/
$ pip install -r requirements.txt  #add '--user' if you are not root
$ scrapy list
$ #If the output is 'TweetScraper', then you are ready to go.

2. Data persistence

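According to the project documentation, there are three ways to persist the scraped data: save it to files, save it to MongoDB, or save it to a MySQL database. Because the scraped data will be analyzed later, we store it in MySQL.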
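In Scrapy, storage back ends like these are implemented as item pipelines: classes whose process_item method receives every scraped item. The sketch below is not TweetScraper's code. It is a minimal, hypothetical pipeline that swaps in Python's built-in sqlite3 for MySQL so that it runs without a database server. The class name, the id and text item fields, and the table layout are assumptions made for illustration.

```python
import sqlite3


class SaveToSqlSketchPipeline:
    """Hypothetical pipeline for illustration; not TweetScraper's SavetoMySQLPipeline."""

    def __init__(self, table, connect):
        self.table = table      # table name, taken from trusted settings only
        self.connect = connect  # zero-argument callable returning a DB-API connection
        self.conn = None

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this hook and exposes settings.py through crawler.settings.
        table = crawler.settings.get("MYSQL_TABLE", "tweets")
        # Stand-in: an in-memory SQLite database. A real MySQL pipeline would
        # open its connection from the MYSQL_* settings instead.
        return cls(table, lambda: sqlite3.connect(":memory:"))

    def open_spider(self, spider):
        # Called once when the spider starts: connect and create the table.
        self.conn = self.connect()
        self.conn.execute(
            f"CREATE TABLE IF NOT EXISTS {self.table} "
            "(id TEXT PRIMARY KEY, text TEXT)"
        )

    def process_item(self, item, spider):
        # Called for every scraped item. Duplicate ids are skipped.
        # (SQLite syntax; MySQL uses INSERT IGNORE and %s placeholders.)
        self.conn.execute(
            f"INSERT OR IGNORE INTO {self.table} (id, text) VALUES (?, ?)",
            (item["id"], item["text"]),
        )
        self.conn.commit()
        return item  # hand the item on to any later pipeline

    def close_spider(self, spider):
        # Called once when the spider finishes.
        self.conn.close()
```

Scrapy only runs a pipeline that is listed in ITEM_PIPELINES in settings.py, which is why switching storage back ends comes down to a settings change.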

By default, the scraped data is saved in JSON format on disk under ./Data/tweet/, so the configuration file TweetScraper/settings.py needs to be modified.

ITEM_PIPELINES = {
    # 'TweetScraper.pipelines.SaveToFilePipeline': 100,
    # 'TweetScraper.pipelines.SaveToMongoPipeline': 100,  # replace `SaveToFilePipeline` with this to use MongoDB
    'TweetScraper.pipelines.SavetoMySQLPipeline': 100,  # replace `SaveToFilePipeline` with this to use MySQL
}

# settings for MySQL
MYSQL_SERVER = "18.126.219.16"
MYSQL_DB     = "scraper"
MYSQL_TABLE  = "tweets"       # the table will be created automatically
MYSQL_USER   = "root"         # MySQL user to use (should have INSERT access granted to the database/table)
MYSQL_PWD    = "admin123456"  # MySQL user's password
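Hard-coding a database password in settings.py makes it easy to leak through version control. One common alternative, sketched here under the assumption that environment variables with the same names as the settings are exported before the crawler runs, is to read the values from os.environ. The env_setting helper, the environment variable names, and the scraper_user account are hypothetical and not part of TweetScraper.

```python
import os


def env_setting(name, default, env=os.environ):
    """Return env[name] if it is set and non-empty, otherwise the default."""
    value = env.get(name)
    return value if value else default


# Scrapy only loads upper-case names from settings.py, so the helper above
# does not become a setting itself.
MYSQL_SERVER = env_setting("MYSQL_SERVER", "127.0.0.1")
MYSQL_DB = env_setting("MYSQL_DB", "scraper")
MYSQL_TABLE = env_setting("MYSQL_TABLE", "tweets")
MYSQL_USER = env_setting("MYSQL_USER", "scraper_user")  # placeholder account
MYSQL_PWD = env_setting("MYSQL_PWD", "")  # no password stored in the file
```

With this variant the password would be supplied at run time, for example by exporting MYSQL_PWD in the shell that starts the crawl.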

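That concludes this look at how Scrapy handles project data in a Python crawler. Hopefully the content above is helpful. If you found the article useful, feel free to share it so that more people can see it.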
