How to scan a website for email addresses with Python

Published: 2021-03-31 16:37:53  Source: 億速云  Views: 196  Author: Leah  Category: Development

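This article walks through how to scan a website for email addresses with Python. It is fairly detailed, and we hope readers who are interested find it a useful reference.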

Basic approach

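After a target site is passed to the tool, the input first needs a basic check and analysis, because the address may arrive in many forms, such as http://www.xxxx.com/ or http://www.xxxx.com/123/456/789.html. We split it into its parts so that the links found later can be crawled.
Fetch the content of the target address with the requests library, and search that content for email addresses with a regular expression.
Find the hyperlinks in each crawled page. Following them takes us to other pages of the same site, where we keep looking for email addresses.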
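
As a rough illustration of the first step, the sketch below splits the two example address shapes with urllib.parse.urlsplit and derives the base URL and directory prefix that later steps use to resolve links. The helper name split_target is an assumption made for this example, not part of the original tool.

```python
import urllib.parse

def split_target(url):
    """Hypothetical helper: return (base_url, path_prefix) for a target URL."""
    parts = urllib.parse.urlsplit(url)
    # scheme + domain, e.g. http://www.xxxx.com
    base_url = '{0.scheme}://{0.netloc}'.format(parts)
    # Keep everything up to and including the last '/' so relative links can be appended.
    path_prefix = url[:url.rfind('/') + 1] if '/' in parts.path else url
    return base_url, path_prefix

for target in ('http://www.xxxx.com/', 'http://www.xxxx.com/123/456/789.html'):
    print(split_target(target))
```

For the second address the prefix ends at the last directory, so a relative link found on that page would be appended to http://www.xxxx.com/123/456/.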
Let's get started:

Libraries the script needs

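from bs4 import BeautifulSoup  # BeautifulSoup parses HTML so data can be pulled out of a page; it converts the input document to Unicode automatically
import requests  # requests is a simple, easy-to-use HTTP library for Python
import requests.exceptions
import urllib.parse
from collections import deque  # deque is a double-ended queue: a good fit when you often add or remove items at either end; for random access, use a list instead
import re  # regular expression library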

Getting the scan target

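user_url = str(input('[+] Enter Target URL to Scan:'))
urls = deque([user_url])  # put the target address into a deque that serves as the crawl queue

scraped_urls = set()  # set() creates an unordered collection of unique elements, so duplicates are dropped; it also supports membership tests, intersection, difference, and union
emails = set()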
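
To make the roles of the two containers concrete, here is a small sketch of the queue-plus-seen-set pattern, using made-up URLs in place of user input. The enqueue helper is hypothetical and only shows how duplicates are kept out of the queue.

```python
from collections import deque

urls = deque(['http://example.com/'])   # crawl queue, seeded with a placeholder target
scraped_urls = set()                     # pages already taken off the queue

def enqueue(link):
    """Hypothetical helper: queue a link only if it has not been seen yet."""
    if link not in urls and link not in scraped_urls:
        urls.append(link)

enqueue('http://example.com/a')
enqueue('http://example.com/a')          # duplicate, ignored
url = urls.popleft()                     # take the oldest entry (first in, first out)
scraped_urls.add(url)
enqueue('http://example.com/')           # already scraped, ignored
print(url, list(urls))
```

Taking items from the left and appending on the right gives a breadth-first crawl order.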

Crawling pages for email addresses (100 pages)

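First, analyze the target address by splitting it into scheme, domain, and path. Then request the page with the get method of requests and use a regular expression to filter out anything that looks like an email address: r'[a-z0-9\.\-+_]+@[a-z0-9\.\-+_]+\.[a-z]+'. Any text that matches the email format is collected.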
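
The email pattern can be tried on its own before it is wired into the crawler. The sketch below runs it over a made-up snippet of HTML; the sample text and the names EMAIL_RE and extract_emails are assumptions for illustration.

```python
import re

# Email pattern with the digit range written as 0-9; re.I makes it case-insensitive.
EMAIL_RE = re.compile(r'[a-z0-9\.\-+_]+@[a-z0-9\.\-+_]+\.[a-z]+', re.I)

def extract_emails(text):
    """Hypothetical helper: return the set of email-like strings found in text."""
    return set(EMAIL_RE.findall(text))

sample = '<p>Contact Admin@Example.com or <a href="mailto:sales.team@shop.example.org">sales</a></p>'
print(sorted(extract_emails(sample)))
```

Returning a set means an address that appears several times on a page is only recorded once.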

count = 0
try:
  while len(urls):  # loop while there are still URLs in the queue
    count += 1  # counter for the number of links crawled
    if count == 101:  # stop once 100 pages have been processed
      break
    url = urls.popleft()  # popleft() removes the leftmost item from urls and returns it
    scraped_urls.add(url)

    parts = urllib.parse.urlsplit(url)  # printing parts shows: SplitResult(scheme='http', netloc='www.baidu.com', path='', query='', fragment='')
    base_url = '{0.scheme}://{0.netloc}'.format(parts)  # scheme: protocol; netloc: domain

    path = url[:url.rfind('/')+1] if '/' in parts.path else url  # extract the path prefix used to resolve relative links
    print('[%d] Processing %s' % (count, url))

    try:
      head = {'User-Agent':"Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; en) Presto/2.8.131 Version/11.11"}
      response = requests.get(url, headers=head)
    except (requests.exceptions.MissingSchema, requests.exceptions.ConnectionError):
      continue  # skip URLs that are malformed or cannot be reached
    new_emails = set(re.findall(r'[a-z0-9\.\-+_]+@[a-z0-9\.\-+_]+\.[a-z]+', response.text, re.I))  # extract email addresses from the page with a regular expression; re.I ignores case
    emails.update(new_emails)  # store the addresses found in emails

Following anchors to the next page

    soup = BeautifulSoup(response.text, features='lxml')

    for anchor in soup.find_all('a'):  # find anchors; in HTML an <a> tag is a hyperlink and its href attribute holds the link address
      link = anchor.attrs['href'] if 'href' in anchor.attrs else ''  # if the tag has an href attribute, its value is the link we want
      if link.startswith('/'):  # a link starting with / is only a path, so prepend the scheme and domain held in base_url
        link = base_url + link
      elif not link.startswith('http'):  # anything else that does not start with http is treated as relative, so prepend the path
        link = path + link
      if not link in urls and not link in scraped_urls:  # queue the link only if it has not been seen before
        urls.append(link)
except KeyboardInterrupt:
  print('[+] Closing')

for mail in emails:
  print(mail)
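
The string concatenation above covers the common cases. As an alternative sketch, not the approach the script itself uses, the standard library's urllib.parse.urljoin resolves relative links against the page URL and also handles forms such as ../ paths. The helper name resolve_link and the sample hrefs are assumptions for illustration.

```python
import urllib.parse

def resolve_link(page_url, href):
    """Hypothetical helper: turn an href found on page_url into an absolute URL."""
    return urllib.parse.urljoin(page_url, href)

page = 'http://www.xxxx.com/123/456/789.html'
for href in ('/about', 'contact.html', '../up.html', 'http://other.example/x'):
    print(resolve_link(page, href))
```

With this helper the separate base_url and path variables would not be needed for link resolution, at the cost of departing from the script as written.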

Complete code

from bs4 import BeautifulSoup
import requests
import requests.exceptions
import urllib.parse
from collections import deque
import re

user_url=str(input('[+] Enter Target URL to Scan:'))
urls =deque([user_url])

scraped_urls= set()
emails = set()


count=0
try:
  while len(urls):
    count += 1
    if count == 101:
      break
    url = urls.popleft()
    scraped_urls.add(url)

    parts = urllib.parse.urlsplit(url)
    base_url = '{0.scheme}://{0.netloc}'.format(parts)

    path = url[:url.rfind('/')+1] if '/' in parts.path else url

    print('[%d] Processing %s' % (count,url))
    try:
      head = {'User-Agent':"Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; en) Presto/2.8.131 Version/11.11"}
      response = requests.get(url,headers = head)
    except(requests.exceptions.MissingSchema,requests.exceptions.ConnectionError):
      continue
    new_emails = set(re.findall(r'[a-z0-9\.\-+_]+@[a-z0-9\.\-+_]+\.[a-z]+', response.text ,re.I))
    emails.update(new_emails)

    soup = BeautifulSoup(response.text, features='lxml')

    for anchor in soup.find_all('a'):
      link = anchor.attrs['href'] if 'href' in anchor.attrs else ''
      if link.startswith('/'):
        link = base_url + link
      elif not link.startswith('http'):
        link =path + link
      if not link in urls and not link in scraped_urls:
        urls.append(link)
except KeyboardInterrupt:
  print('[+] Closing')

for mail in emails:
  print(mail)

Experiment………………


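That is all for how to scan a website for email addresses with Python. We hope the material above is helpful and that you learned something from it. If you found the article useful, feel free to share it so more people can see it.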
