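System and application problems come up frequently on Linux servers, and analyzing and troubleshooting them takes good tools. Below is a summary list of commonly used tools and trace tools; the end of the post also lists the trace tools for distributed systems that the Hadoop community has recently been developing.
Overview: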
http://www.brendangregg.com/index.html
http://www.slideshare.net/brendangregg/linux-performance-analysis-and-tools
https://github.com/brendangregg/perf-tools/
http://www.brendangregg.com/linuxperf.html
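The figures quoted from linux-performance-analysis-and-tools show which layer of the system each of these tools applies to.
Most of the tools mentioned there are in my everyday toolbox or have been used in real cases, and all of them are highly valuable; they are indexed here for convenience:
OS system commands
System information (RHEL/Fedora)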
-
uname -a or cat /proc/version # print system information
-
Linux hadoopst2.cm6 2.6.18-164.el5 #1 SMP Tue Aug 18 15:51:48 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
-
15:42:46 up 674 days, 6 min, 35 users, load average: 1.30, 5.97, 11.53
-
Red Hat Enterprise Linux Server release 5.4 (Tikanga)
-
LSB Version: :core-3.1-amd64:core-3.1-ia32:core-3.1-noarch:graphics-3.1-amd64:graphics-3.1-ia32:graphics-3.1-noarch
-
lspci - list all PCI devices
-
last, lastb - show listing of last logged in users
-
lsmod - show the status of modules in the Linux Kernel
-
modprobe - add and remove modules from the Linux Kernel
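As a rough sketch of how the system information above could be gathered from a script, the following Python collects approximately what `uname -a` and `cat /proc/version` report. The helper names `read_first_line` and `system_summary` are hypothetical, and reading `/etc/redhat-release` is an assumption about where the "Red Hat Enterprise Linux Server release" line comes from on RHEL/Fedora; missing files are simply reported as `None`.

```python
import platform


def read_first_line(path):
    """Return the first line of a text file, or None if it cannot be read."""
    try:
        with open(path, encoding="utf-8", errors="replace") as fh:
            return fh.readline().strip()
    except OSError:
        return None


def system_summary():
    """Collect roughly what `uname -a` and the release files report."""
    u = platform.uname()
    return {
        # Same pieces of information as `uname -a`, joined in a similar order.
        "uname": " ".join([u.system, u.node, u.release, u.version, u.machine]),
        # Linux only; None on systems without /proc.
        "proc_version": read_first_line("/proc/version"),
        # Assumed location of the distribution release string on RHEL/Fedora.
        "redhat_release": read_first_line("/etc/redhat-release"),
    }


if __name__ == "__main__":
    for key, value in system_summary().items():
        print(f"{key}: {value}")
```

Running this on several machines of a cluster and comparing the output is a quick way to spot kernel or release mismatches.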
Common commands/tools
-
To print a process tree: ps -ejH / ps axjf
-
To get info about threads: ps -eLf / ps axms
-
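lsof - list open files; in UNIX everything is a file
-
/var/log/yum.log # yum package update log
-
/var/log/cron # crontab log; shows how scheduled jobs were executed
-
ntpd - Network Time Protocol (NTP) daemon; keeps the clocks of the cluster machines in sync
-
squid - proxy caching server; used as a proxy for the cluster WebUI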
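The process tree printed by `ps -ejH` or `ps axjf` is essentially a walk over parent/child PID links. A minimal sketch of that idea, assuming the caller has already collected `(pid, ppid, command)` tuples (for example from `ps -eo pid,ppid,comm`); the function names are hypothetical and the layout is not meant to match the real `ps` output format:

```python
from collections import defaultdict


def build_tree(processes):
    """Map each parent PID to a sorted list of its child PIDs.

    `processes` is a list of (pid, ppid, command) tuples.
    """
    children = defaultdict(list)
    for pid, ppid, _command in processes:
        children[ppid].append(pid)
    return {ppid: sorted(pids) for ppid, pids in children.items()}


def render_tree(processes, root=0):
    """Return indented lines, one per process, with children below parents."""
    names = {pid: command for pid, _ppid, command in processes}
    tree = build_tree(processes)
    lines = []

    def walk(pid, level):
        # Visit each child of `pid`, indenting two spaces per tree level.
        for child in tree.get(pid, []):
            lines.append("  " * level + f"{child} {names[child]}")
            walk(child, level + 1)

    walk(root, 0)
    return lines


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [(1, 0, "init"), (100, 1, "sshd"), (101, 100, "bash"), (200, 1, "crond")]
    print("\n".join(render_tree(sample)))
```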
System monitoring
-
mpstat - Report processors related statistics. Watch the %sys and %iowait values
-
vmstat - Report virtual memory statistics
-
iostat - Report Central Processing Unit (CPU) statistics and input/output statistics for devices and partitions.
-
netstat - Print network connections, routing tables, interface statistics, masquerade connections, and multicast memberships
-
ganglia - a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids.
-
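sar/tsar - Collect, report, or save system activity information; tsar is Taobao's own improved version
-
Samples on a schedule (every minute) and keeps queryable history (5-minute granularity by default); complements ganglia by showing more detailed information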
-
iftop - the "top" bandwidth consumers shown. iftop wiki
-
vmtouch, Portable file system cache diagnostics and control
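To make the %sys and %iowait figures mentioned for mpstat more concrete, here is a hedged sketch of how such percentages can be derived from two snapshots of the aggregate `cpu` line in `/proc/stat`. It assumes the usual field order (user, nice, system, idle, iowait, irq, softirq, steal); the function names are hypothetical and this is not how mpstat itself is implemented.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(stat_text):
    """Return the aggregate `cpu` counters from /proc/stat text as a dict."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            values = [int(v) for v in parts[1:1 + len(FIELDS)]]
            return dict(zip(FIELDS, values))
    raise ValueError("no aggregate 'cpu' line found")


def cpu_percentages(before, after):
    """Percentage of elapsed CPU time spent in each state between two snapshots."""
    deltas = {key: after[key] - before[key] for key in before}
    total = sum(deltas.values())
    if total <= 0:
        return {key: 0.0 for key in deltas}
    return {key: 100.0 * delta / total for key, delta in deltas.items()}


def read_proc_stat():
    """Read the current /proc/stat contents (Linux only)."""
    with open("/proc/stat", encoding="ascii") as fh:
        return fh.read()
```

A typical use would be to call `parse_cpu_line(read_proc_stat())` twice, a second or so apart, and pass both results to `cpu_percentages`; the `system` and `iowait` entries then correspond roughly to %sys and %iowait.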
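Network
-
telnet/nc IP PORT - confirm that the target port is reachable; a host that answers ping does not necessarily have the port reachable, since a firewall or similar may block it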
-
ifconfig/ifup/ifdown - configure a network interface
-
traceroute - print the route packets trace to network host
-
nslookup - query Internet name servers interactively
-
tcpdump - dump traffic on a network; similar open-source tools include wireshark and netsniff-ng; more tool comparisons
-
lynx - a general purpose distributed information browser for the World Wide Web
-
tcpcp - allows cooperating applications to pass ownership of TCP connection endpoints from one Linux host to another one.
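The `telnet/nc IP PORT` check above boils down to attempting a TCP connection. A minimal Python sketch of the same check, useful when neither tool is installed; `port_is_open` is a hypothetical helper name and the 3-second default timeout is an arbitrary assumption:

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

As with telnet/nc, a `False` result only says the connection could not be established from this machine; it does not distinguish a firewall drop from a service that is not listening.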
Programs/processes
Static information
-
ldconfig - configure dynamic linker run time bindings
-
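ldconfig -p | grep SO # check whether the .so is in the link cache
-
ldd - print shared library dependencies; shows which .so files an executable or .so depends on
-
nm - list symbols from object files; grep the output to check whether a given symbol exists or is Undefined.
-
readelf - Displays information about ELF files, such as 32/64-bit, the target OS, and the processor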
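The 32/64-bit and processor information that readelf shows lives in the first bytes of the ELF header. The sketch below decodes just those fields, assuming the standard layout (class at byte 4, data encoding at byte 5, `e_machine` as a 16-bit value at offset 18). The function names are hypothetical and the machine table is deliberately partial; readelf remains the complete tool.

```python
import struct

ELF_MAGIC = b"\x7fELF"
CLASSES = {1: "32-bit", 2: "64-bit"}
# Partial table of e_machine values; anything else is reported as unknown.
MACHINES = {3: "x86", 40: "ARM", 62: "x86-64", 183: "AArch64"}


def describe_elf(header):
    """Decode class, byte order and machine from the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    elf_class = CLASSES.get(header[4], "unknown")
    # EI_DATA: 1 means little-endian, 2 means big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return {
        "class": elf_class,
        "endianness": "little" if byte_order == "<" else "big",
        "machine": MACHINES.get(machine, f"unknown ({machine})"),
    }


def describe_elf_file(path):
    """Convenience wrapper: read the header bytes from a file on disk."""
    with open(path, "rb") as fh:
        return describe_elf(fh.read(20))
```

This is handy for the common case of checking whether a JNI .so matches the bitness of the JVM that is trying to load it.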
Dynamic information
-
cat /proc/$PID/[cmdline|environ|limits|status|...] - per-process information
-
pstack - print a stack trace of a running process
-
pmap - report memory map of a process
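The files under `/proc/$PID/` are plain text (or NUL-separated bytes), so they are easy to consume from scripts. A small sketch, with hypothetical helper names, that parses the `Key: value` lines of `status` and the NUL-separated `cmdline`:

```python
def parse_status(text):
    """Turn the `Key: value` lines of /proc/<pid>/status into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def parse_cmdline(raw):
    """Split the NUL-separated bytes of /proc/<pid>/cmdline into arguments."""
    return [arg.decode("utf-8", "replace") for arg in raw.split(b"\0") if arg]


def read_process_info(pid):
    """Read status and cmdline for one PID (Linux only; may raise OSError)."""
    with open(f"/proc/{pid}/status", encoding="utf-8", errors="replace") as fh:
        status = parse_status(fh.read())
    with open(f"/proc/{pid}/cmdline", "rb") as fh:
        cmdline = parse_cmdline(fh.read())
    return status, cmdline
```

Values are kept as strings (for example `"123456 kB"`), since units differ between fields.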
Java
-
Java Troubleshooting Tools
-
jinfo - print java process information, such as the classpath and java.library.path (the JNI .so directories)
-
jstack - print a stack trace of a running java process; can be used to check for deadlocks
-
jmap - report memory map of a java process
-
jmap -histo:live can trigger a full GC
-
jmap -dump:live,file=$FILE dumps the heap, so that tools such as jhat can analyze how objects occupy the heap
-
jhat - Heap Dump Browser - Starts a web server on a heap dump file (eg, produced by jmap -dump), allowing the heap to be browsed.
-
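-J-mxXXXm: increase the heap size when analyzing large dump files
-
If some class has an unusually large number of objects or takes up too much memory, a memory leak is very likely
-
Memory Analyzer (MAT) - eclipse plugin, Java heap analyzer
-
A visual tool, but limited by the memory of the machine it runs on; it cannot analyze heap dump files that are too large
-
jdb - can run as a server, letting eclipse and other tools connect remotely for debugging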
-
jstat - Java Virtual Machine Statistics Monitoring Tool
-
jstatd - Virtual Machine jstat Daemon,可配合jvisualvm
-
jvisualvm - Java Virtual Machine Monitoring, Troubleshooting, and Profiling Tool; can connect remotely to jstatd/jmx; a visual display tool: demo
-
jvmtop - In a top-like manner, displays JVM internal metrics (e.g. memory information) of running java processes.
-
JVM performance optimization - a series of optimization articles written by a JVM developer
-
Overview
-
Compilers
-
Garbage collection
-
Concurrently compacting GC
-
Scalability
-
HPROF - Heap Profiler: java -agentlib:hprof
Trace/Debug/Profiling tools
General-purpose tools
-
strace - trace system calls and signals
-
Example: strace can account for the time spent in system calls, which helps with problems such as high %sys CPU on a machine
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 67.90 3966.320849         496   7992161   3050250 futex
 25.80 1507.326693      127093     11860           epoll_wait
....................
-
blktrace, generate traces of the i/o traffic on block devices
-
ltrace - A library call tracer
-
gprof - a performance analysis tool, sampling and call-graph profiling
-
valgrind - an instrumentation framework for building dynamic analysis tools. automatically detect many memory management and threading bugs, and profile your programs in detail
-
systemtap - a simple command line interface and scripting language for writing instrumentation for a live running kernel plus user-space applications for complex tasks that may require live analysis, programmable on-line response, and whole-system symbolic access.
-
The Linux counterpart of DTrace (which Sun developed on Solaris)
-
Powerful: covers the kernel, user-space apps, multiple languages (java, perl, python, ruby), and built-in markers (pg, mysql)
-
can write and reuse simple scripts to deeply examine the activities of a live system
-
Data can be extracted, filtered, and summarized quickly and safely, to enable diagnoses of complex performance or functional problems
-
A rich "tapset" script library
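The strace summary table shown earlier in this list (the `% time / seconds / usecs/call / calls / errors / syscall` output) is easy to post-process when comparing runs. A hedged sketch, with hypothetical function names, that parses such a summary and computes the error rate per syscall; it assumes whitespace-separated columns where the `errors` column may be empty:

```python
def parse_strace_summary(text):
    """Parse the per-syscall rows of a strace summary table into dicts."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) not in (5, 6) or parts[-1] == "total":
            continue
        try:
            pct, seconds = float(parts[0]), float(parts[1])
            usecs, calls = int(parts[2]), int(parts[3])
            # The errors column is blank when a syscall never failed.
            errors = int(parts[4]) if len(parts) == 6 else 0
        except ValueError:
            continue  # header, separator, or other non-data line
        rows.append({
            "syscall": parts[-1],
            "pct_time": pct,
            "seconds": seconds,
            "usecs_per_call": usecs,
            "calls": calls,
            "errors": errors,
        })
    return rows


def error_rates(rows):
    """Fraction of failed calls per syscall."""
    return {r["syscall"]: r["errors"] / r["calls"] for r in rows if r["calls"]}
```

Applied to the two rows quoted above, this shows that roughly 38% of the futex calls returned an error, which together with the 67.90% time share points at heavy lock contention as a plausible source of the high %sys.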
Java trace tools
-
btrace - dynamic tracing tool for the Java platform. UserGuide
-
Uses dynamic bytecode modification (Hotswap) to trace and replace code in a running java program; implementation principles
-
byteman - simplifies tracing and testing of Java programs. Can modify a running application without needing to stop and restart it.
-
define rules specifying the side effects you want to inject, whereas BTrace uses a Java-like syntax
Distributed Tracing Tools
-
Dapper, a Large-Scale Distributed Systems Tracing Infrastructure
-
x-trace, a network diagnostic tool designed to provide users and network operators with better visibility into increasingly complex Internet applications.
-
HTrace, a tracing framework intended for use with distributed systems written in java
Linux observability tools
Linux benchmarking tools
Linux tuning tools
Linux observability: sar