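How do you analyze website logs with Spark? This article walks through a real case: what caused the problem and how it was resolved.
Frustrating: since yesterday my personal site has been raising 504 alerts nonstop. Logging in to the machine showed that php-fpm was reporting errors. After restarting php-fpm, the alert came back within a few hours. Strange, because the site had run for almost a year without any problems.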
[28-Sep-2016 11:53:19] NOTICE: ready to handle connections
[28-Sep-2016 11:53:19] NOTICE: systemd monitor interval set to 10000ms
[28-Sep-2016 11:53:26] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it
[28-Sep-2016 13:46:35] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it
[28-Sep-2016 13:49:32] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it
I assumed this value was set too low, so I raised it in the configuration:
[28-Sep-2016 15:51:43] NOTICE: fpm is running, pid 28179
[28-Sep-2016 15:51:43] NOTICE: ready to handle connections
[28-Sep-2016 15:51:43] NOTICE: systemd monitor interval set to 10000ms
[28-Sep-2016 15:52:12] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 0 idle, and 7 total children
[28-Sep-2016 16:15:58] WARNING: [pool www] server reached pm.max_children setting (20), consider raising it
[28-Sep-2016 16:52:32] WARNING: [pool www] server reached pm.max_children setting (20), consider raising it
[28-Sep-2016 16:53:05] WARNING: [pool www] server reached pm.max_children setting (20), consider raising it
[28-Sep-2016 16:55:17] WARNING: [pool www] server reached pm.max_children setting (20), consider raising it
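To confirm that the limit is still being hit after the change, the warnings can be tallied per configured limit. The following is a minimal sketch, assuming the php-fpm log format shown above; `max_children_hits` is a hypothetical helper name, and the sample input is copied from the excerpts in this post.

```shell
#!/usr/bin/env bash
# Count "reached pm.max_children" warnings, grouped by the limit in effect.
# Reads php-fpm log lines on stdin.
max_children_hits() {
  grep 'server reached pm.max_children setting' |
    sed -E 's/.*setting \(([0-9]+)\).*/\1/' |   # keep only the limit value
    sort -n | uniq -c |
    awk '{ print "limit=" $2 " warnings=" $1 }'
}

# Sample input taken from the log excerpts above.
max_children_hits <<'EOF'
[28-Sep-2016 11:53:26] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it
[28-Sep-2016 13:46:35] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it
[28-Sep-2016 13:49:32] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it
[28-Sep-2016 15:51:43] NOTICE: ready to handle connections
[28-Sep-2016 16:15:58] WARNING: [pool www] server reached pm.max_children setting (20), consider raising it
[28-Sep-2016 16:52:32] WARNING: [pool www] server reached pm.max_children setting (20), consider raising it
[28-Sep-2016 16:53:05] WARNING: [pool www] server reached pm.max_children setting (20), consider raising it
[28-Sep-2016 16:55:17] WARNING: [pool www] server reached pm.max_children setting (20), consider raising it
EOF
```

On the sample this prints `limit=5 warnings=3` and `limit=20 warnings=4`, which matches what the logs suggest: raising the limit from 5 to 20 did not make the warnings go away.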
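The result was the same: a few hours later the 504 alert fired again. Looking at the nginx log, I noticed some odd IPs with very high request volumes. I suspected malicious traffic, so it seemed necessary to check the per-IP request counts in the access log.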
root@iZ28bhfjhgkZ:/var/log/nginx# vim access.log
121.42.53.180 - - [25/Sep/2016:06:26:29 +0800] "POST /wp-cron.php?doing_wp_cron=1474755989.0131719112396240234375 HTTP/1.0" 499 0 "-" "WordPress/4.3.1; http://zhwen.org"
182.92.148.207 - - [25/Sep/2016:06:26:29 +0800] "GET / HTTP/1.1" 200 41253 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0)"
203.208.60.226 - - [25/Sep/2016:06:28:55 +0800] "GET /?p=675 HTTP/1.1" 200 8204 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
203.208.60.226 - - [25/Sep/2016:06:28:57 +0800] "GET /wp-content/themes/sparkling/inc/css/font-awesome.min.css?ver=4.3.1 HTTP/1.1" 200 26711 "http://zhwen.org/?p=675" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
203.208.60.226 - - [25/Sep/2016:06:28:57 +0800] "GET /wp-content/plugins/wp-pagenavi/pagenavi-css.css?ver=2.70 HTTP/1.1" 200 374 "http://zhwen.org/?p=675" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
203.208.60.226 - - [25/Sep/2016:06:28:58 +0800] "GET /wp-content/plugins/yet-another-related-posts-plugin/style/widget.css?ver=4.3.1 HTTP/1.1" 200 771 "http://zhwen.org/?p=675" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
121.43.107.174 - - [25/Sep/2016:06:29:18 +0800] "GET / HTTP/1.1" 200 41253 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0)"
115.28.189.208 - - [25/Sep/2016:06:29:33 +0800] "GET / HTTP/1.1" 200 41253 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0)"
42.156.139.59 - - [25/Sep/2016:06:30:58 +0800] "GET /?paged=14 HTTP/1.1" 200 11164 "-" "YisouSpider"
182.92.148.207 - - [25/Sep/2016:06:31:29 +0800] "GET / HTTP/1.1" 200 41253 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0)"
61.135.169.81 - - [25/Sep/2016:06:34:14 +0800] "GET /?p=articles/cscope-tags HTTP/1.1" 200 10681 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12) AppleWebKit/602.1.50 (KHTML, like Gecko)"
61.135.169.81 - - [25/Sep/2016:06:34:14 +0800] "GET /apple-touch-icon-precomposed.png HTTP/1.1" 404 151 "-" "Safari/12602.1.50.0.10 CFNetwork/807.0.4 Darwin/16.0.0 (x86_64)"
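So I ran a simple count of the IPs in the access log:
1) First extract the IPs (this cuts down the data volume; alternatively, the log could simply be compressed and downloaded as-is), then download the result to the local machine.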
root@iZ28bhfjhgkZ:/var/log/nginx# cat access.log | awk '{print $1}' > tt
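If the log might contain malformed lines, the extraction step can be made a little stricter. This is a minimal sketch, assuming the client address is the first whitespace-separated field as in the excerpt above; `extract_ips` is a hypothetical helper name, and the third sample line is made up to show the filtering.

```shell
#!/usr/bin/env bash
# Print the first field of each access-log line, but only when it looks like
# a dotted IPv4 address. Reads log lines on stdin.
extract_ips() {
  awk '$1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ { print $1 }'
}

# Two lines from the excerpt above plus one made-up malformed line.
extract_ips <<'EOF'
182.92.148.207 - - [25/Sep/2016:06:26:29 +0800] "GET / HTTP/1.1" 200 41253 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0)"
42.156.139.59 - - [25/Sep/2016:06:30:58 +0800] "GET /?paged=14 HTTP/1.1" 200 11164 "-" "YisouSpider"
malformed line without an address
EOF
```

The pattern only checks the shape of the field, not that each octet is in range, and it ignores IPv6 clients; that is enough for a rough count but not for validation.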
Run the following code in spark-shell:
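// Read the extracted IPs (one per line).
val lines = sc.textFile("/data1/data/t1")
// Count requests per IP, swap to (count, ip), join IPs sharing a count with ",",
// then sort by count ascending into one partition and save.
lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
  .map(e => (e._2, e._1)).reduceByKey(_ + "," + _)
  .sortByKey(true, 1).saveAsTextFile("/data1/data/t3")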
2) The resulting t3 looks like this. A few IPs have very high request counts, especially 191.96.249.53:
...
(855,182.92.148.207)
(3100,121.8.136.75)
(3889,61.135.169.81)
(53513,191.96.249.53)
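3) Add an iptables rule to block the offending IP, and done. Spark makes this kind of statistical analysis very simple: essentially one line of code does the whole analysis.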
root@iZ28bhfjhgkZ:/var/log# iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
root@iZ28bhfjhgkZ:/var/log# iptables -A INPUT -s 191.96.249.53 -j DROP
root@iZ28bhfjhgkZ:/var/log# iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination
DROP all -- DEDICATED.SERVER anywhere
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
root@iZ28bhfjhgkZ:/var/log#
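When more than one address needs blocking, the rules can be generated from the Spark output instead of being typed by hand. This is a minimal sketch, assuming input lines shaped like the t3 output above; `block_commands` is a hypothetical helper name and the cutoff of 10000 is an arbitrary value chosen for illustration. It only prints the commands and does not run them.

```shell
#!/usr/bin/env bash
# Read lines shaped like "(count,ip[,ip...])" on stdin and print (not run)
# an iptables DROP command for every IP whose count is >= the threshold.
block_commands() {
  local threshold="$1"
  tr -d '()' |
    awk -F',' -v min="$threshold" '
      $1 + 0 >= min {
        for (i = 2; i <= NF; i++)
          print "iptables -A INPUT -s " $i " -j DROP"
      }'
}

# Counts quoted in the article; 10000 is an arbitrary cutoff.
printf '%s\n' '(855,182.92.148.207)' '(3100,121.8.136.75)' \
              '(3889,61.135.169.81)' '(53513,191.96.249.53)' |
  block_commands 10000
```

With this cutoff the only command printed is `iptables -A INPUT -s 191.96.249.53 -j DROP`, the same rule applied above. Reviewing the list before running it is worthwhile, since legitimate crawlers such as Googlebot also appear in the access log.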