In Python, we can use web scraping libraries such as Requests, BeautifulSoup, and Scrapy to extract data from web pages. The basic steps are as follows. First, install the required libraries:
pip install beautifulsoup4
pip install scrapy
pip install requests
Next, fetch the target page with Requests and parse the returned HTML with BeautifulSoup:

import requests
from bs4 import BeautifulSoup

# Fetch the page and parse the HTML with Python's built-in parser
url = 'https://example.com'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
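In practice, a request can fail or be blocked, so real scripts usually set a User-Agent header, add a timeout, and check the response status. A minimal sketch of that hardening, where the header value and timeout are illustrative assumptions rather than anything example.com requires:

# Optional hardening of the request above; the User-Agent value and
# 10-second timeout are assumptions for illustration
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # raise an exception on 4xx/5xx responses
soup = BeautifulSoup(response.text, 'html.parser')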
Then extract the titles of all <h1> tags:

# Find every <h1> element and print its text content
titles = soup.find_all('h1')
for title in titles:
    print(title.text)
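The same find_all pattern works for other elements. For example, a short sketch that collects the URL of every link on the page; the <a>/href selection here is a common variation, not part of the original example:

# Collect the href attribute of every <a> tag;
# get() returns None when the attribute is missing
for link in soup.find_all('a'):
    print(link.get('href'))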
Finally, save the extracted data to a CSV file:

import csv

# Write the extracted titles to titles.csv, one per row
with open('titles.csv', 'w', newline='', encoding='utf-8') as csvfile:
    csv_writer = csv.writer(csvfile)
    csv_writer.writerow(['Title'])  # header row
    for title in titles:
        csv_writer.writerow([title.text])
This is just a simple example; real-world crawlers often need to handle more complex page structures and pagination. The Scrapy framework provides more features and flexibility and is better suited to building such crawlers.
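As a rough sketch of what that looks like in Scrapy, here is a minimal spider that yields the same <h1> titles and follows pagination. The spider name, the start URL, and the a.next selector for the next-page link are assumptions for illustration, not details from the original example:

import scrapy

class TitleSpider(scrapy.Spider):
    # The spider name and start URL are illustrative placeholders
    name = 'titles'
    start_urls = ['https://example.com']

    def parse(self, response):
        # Yield the text of every <h1> tag, mirroring the BeautifulSoup example
        for title in response.css('h1::text').getall():
            yield {'Title': title}

        # Follow a hypothetical "next page" link if one exists;
        # the a.next selector is an assumption about the page's markup
        next_page = response.css('a.next::attr(href)').get()
        if next_page:
            yield response.follow(next_page, callback=self.parse)

Saved as, say, title_spider.py, this can be run without a full Scrapy project via scrapy runspider title_spider.py -o titles.csv, which writes the yielded items to a CSV file.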