What Are the Common Linux Monitoring Metrics?

Published: 2021-11-30 09:26:56  Source: 億速云 (Yisu Cloud)  Views: 180  Author: iii  Category: Big Data

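This article covers the monitoring metrics most commonly used on Linux. Many people face exactly this question when working on real systems, so let's go through the metrics one category at a time.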

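1. Basic collection items for Linux operations

In operations, problems themselves are not the worst part. The worst part is when something breaks and you cannot capture what was happening, so you are left working in the dark. A capable monitoring system that collects as many metrics as possible is therefore very valuable. But which metrics actually matter? Since the best answers come from practice, the most useful guide is the experience operations engineers have built up over years of hands-on work.

Drawing on that long-term practice, we have compiled the metrics most often consulted during system operations. They fall into the following categories: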

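CPU
Load
Memory
Disk
IO
Network
Kernel parameters
ss statistics output
Port collection
Liveness of core service processes
Resource consumption of key business processes
NTP offset collection
DNS resolution collection

The detailed metrics for each category are listed below. All of them are supported directly by the open-falcon agent component: falcon-agent collects them at a fixed interval (currently 60 seconds) and reports them to the server side.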

2. CPU collection items

How it is computed: the values are derived from /proc/stat. The statistics output of the sar command is a good reference for understanding them.

cpu.idle: Percentage of time that the CPU or CPUs were idle and the system did not have an outstanding disk I/O request.
cpu.busy: The complement of cpu.idle; its value equals 100 minus cpu.idle.
cpu.guest: Percentage of time spent by the CPU or CPUs to run a virtual processor.
cpu.iowait: Percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request.
cpu.irq: Percentage of time spent by the CPU or CPUs to service hardware interrupts.
cpu.softirq: Percentage of time spent by the CPU or CPUs to service software interrupts.
cpu.nice: Percentage of CPU utilization that occurred while executing at the user level with nice priority.
cpu.steal: Percentage of time spent in involuntary wait by the virtual CPU or CPUs while the hypervisor was servicing another virtual processor.
cpu.system: Percentage of CPU utilization that occurred while executing at the system level (kernel).
cpu.user: Percentage of CPU utilization that occurred while executing at the user level (application).
cpu.cnt: Number of CPU cores.
cpu.switches: Number of CPU context switches; counter type.
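The percentages above come from the difference between two /proc/stat samples. The following is a minimal Python sketch, not falcon-agent's actual Go code. It assumes the standard proc(5) column order for the aggregate `cpu` line, and the function names are hypothetical. It derives per-state percentages and cpu.busy from two samples:

```python
# Column order of the aggregate "cpu" line in /proc/stat (see proc(5)).
FIELDS = ["user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest", "guest_nice"]


def read_cpu_line(path="/proc/stat"):
    """The first line of /proc/stat is the aggregate over all CPUs."""
    with open(path) as f:
        return f.readline()


def parse_cpu_line(line):
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:]]
    values += [0] * (len(FIELDS) - len(values))  # older kernels print fewer columns
    return dict(zip(FIELDS, values))


def cpu_percentages(prev_line, curr_line):
    """Percentages over the interval between two /proc/stat samples."""
    prev, curr = parse_cpu_line(prev_line), parse_cpu_line(curr_line)
    delta = {k: curr[k] - prev[k] for k in FIELDS}
    # guest and guest_nice are already included in user and nice,
    # so only the first eight columns make up the total.
    total = sum(delta[k] for k in FIELDS[:8])
    if total <= 0:
        raise ValueError("no CPU time elapsed between the samples")
    pct = {k: 100.0 * delta[k] / total for k in FIELDS}
    pct["busy"] = 100.0 - pct["idle"]  # cpu.busy = 100 - cpu.idle
    return pct
```

On a live system, call read_cpu_line(), sleep for the collection interval, call it again, and pass both lines to cpu_percentages().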

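3. Disk collection items

How it is computed: first read /proc/mounts to get all mount points, then use syscall.Statfs_t to obtain block and inode usage. Each metric carries a set of tags such as mount=$mount,fstype=$fstype, where $mount is the mount point (for example /home) and $fstype is the filesystem type (for example ext4).

df.bytes.free: available disk space, int64
df.bytes.free.percent: available space as a percentage of the total, float64, e.g. 32.1
df.bytes.total: total disk size, int64
df.bytes.used: used disk space, int64
df.bytes.used.percent: used space as a percentage of the total, float64
df.inodes.total: total number of inodes, int64
df.inodes.free: number of free inodes, int64
df.inodes.free.percent: free inodes as a percentage of the total, float64
df.inodes.used: number of used inodes, int64
df.inodes.used.percent: used inodes as a percentage of the total, float64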
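The agent itself calls Go's syscall.Statfs. The following Python sketch uses os.statvfs, which reads the same underlying statvfs fields. Two choices here are assumptions, not the agent's confirmed formula: "free" means space available to unprivileged users (f_bavail), and percentages are taken against the total, as the list above states. The function names are hypothetical.

```python
import os


def read_mounts(path="/proc/mounts"):
    """Return (mount_point, fstype) pairs listed in /proc/mounts."""
    mounts = []
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) >= 3:
                # Spaces in mount points appear escaped as \040 in this file.
                mounts.append((fields[1], fields[2]))
    return mounts


def df_metrics(frsize, blocks, bfree, bavail, files, ffree):
    """Build the df.* metrics from raw statvfs fields."""
    total = blocks * frsize
    used = (blocks - bfree) * frsize
    free = bavail * frsize  # space available to unprivileged users
    inodes_used = files - ffree

    def pct(part, whole):
        return 100.0 * part / whole if whole else 0.0

    # Because of root-reserved blocks, used% + free% can be below 100.
    return {
        "df.bytes.total": total,
        "df.bytes.used": used,
        "df.bytes.free": free,
        "df.bytes.used.percent": pct(used, total),
        "df.bytes.free.percent": pct(free, total),
        "df.inodes.total": files,
        "df.inodes.used": inodes_used,
        "df.inodes.free": ffree,
        "df.inodes.used.percent": pct(inodes_used, files),
        "df.inodes.free.percent": pct(ffree, files),
    }


def df_for_mount(mount):
    st = os.statvfs(mount)
    return df_metrics(st.f_frsize, st.f_blocks, st.f_bfree,
                      st.f_bavail, st.f_files, st.f_ffree)
```

Iterating over read_mounts() and calling df_for_mount() for each mount point yields one tagged set of df.* values per mount=$mount,fstype=$fstype.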

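4. megacli tool output

The megacli tool is used to read RAID information. Each metric carries a tag identifying the physical disk (PD) or virtual disk (VD) it belongs to. The PD tag has the form PD=Enclosure_ID:SLOT_ID. For example, PD=32:0 denotes the first disk, and VD=0 denotes the first logical disk.

sys.disk.lsiraid.pd.Media_Error_Count: this metric and the three below are currently collected for reference only. They do not necessarily mean the disk is damaged, only that the probability of damage has increased
sys.disk.lsiraid.pd.Other_Error_Count
sys.disk.lsiraid.pd.Predictive_Failure_Count
sys.disk.lsiraid.pd.Drive_Temperature
sys.disk.lsiraid.pd.Firmware_state: a non-zero value means this physical disk has a problem
sys.disk.lsiraid.vd.cache_policy: a non-zero value means this logical disk's cache policy does not match its configured setting
sys.disk.lsiraid.vd.state: a non-zero value means this logical disk has a problem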
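To show how the PD=Enclosure_ID:SLOT_ID tag can be assembled, here is a minimal Python parser. It assumes the "Key: Value" line layout that `MegaCli -PDList -aALL` typically prints, and it is not the agent's parser. The Firmware_state mapping, 0 when the state starts with "Online" and 1 otherwise, is a hypothetical choice, and VD metrics are not covered.

```python
import re

PREFIX = "sys.disk.lsiraid.pd."
COUNTERS = {
    "Media Error Count": "Media_Error_Count",
    "Other Error Count": "Other_Error_Count",
    "Predictive Failure Count": "Predictive_Failure_Count",
}


def parse_pdlist(text):
    """Return (metric, tags, value) tuples for every physical disk block."""
    points = []
    enclosure = pd = None
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, value = (s.strip() for s in line.split(":", 1))
        if key == "Enclosure Device ID":
            enclosure, pd = value, None  # a new PD block starts here
        elif key == "Slot Number" and enclosure is not None:
            pd = f"{enclosure}:{value}"  # PD=Enclosure_ID:SLOT_ID
        elif pd is None:
            continue
        elif key in COUNTERS:
            points.append((PREFIX + COUNTERS[key], {"PD": pd}, int(value)))
        elif key == "Drive Temperature":
            m = re.match(r"(\d+)", value)  # e.g. "35C (95.00 F)"
            if m:
                points.append((PREFIX + "Drive_Temperature", {"PD": pd}, int(m.group(1))))
        elif key == "Firmware state":
            # Hypothetical mapping: healthy "Online, ..." -> 0, anything else -> 1.
            points.append((PREFIX + "Firmware_state", {"PD": pd},
                           0 if value.startswith("Online") else 1))
    return points
```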

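5. SMART tool output

The smartctl tool is used to read disk SMART information. All of these metrics are currently collected for reference only. They do not necessarily mean the disk is damaged, only that the probability of failure has increased. Each metric carries a tag identifying the device, e.g. device=/dev/sda.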

sys.disk.smart.Reallocated_Sector_Ct
sys.disk.smart.Spin_Retry_Count
sys.disk.smart.Reallocated_Event_Count
sys.disk.smart.Current_Pending_Sector
sys.disk.smart.Offline_Uncorrectable
sys.disk.smart.Temperature_Celsius
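These attributes come from the attribute table that `smartctl -A <device>` prints, where the attribute name is the second column and the raw value is the tenth. The following is a minimal Python sketch, not the agent's code. It assumes that table layout and keeps only the leading integer of each raw value (for example "35" from "35 (Min/Max 20/45)"). The function names are hypothetical.

```python
import re
import subprocess

WATCHED = {
    "Reallocated_Sector_Ct", "Spin_Retry_Count", "Reallocated_Event_Count",
    "Current_Pending_Sector", "Offline_Uncorrectable", "Temperature_Celsius",
}


def parse_smart_attributes(text, device):
    points = []
    for line in text.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(parts) < 10 or not parts[0].isdigit():
            continue
        name, raw = parts[1], parts[9]
        m = re.match(r"\d+", raw)
        if name in WATCHED and m:
            points.append(("sys.disk.smart." + name, {"device": device}, int(m.group())))
    return points


def read_smart(device):
    # smartctl's exit status is a bit mask that can be non-zero even when the
    # attribute table was printed, so the return code is not checked here.
    out = subprocess.run(["smartctl", "-A", device], capture_output=True, text=True)
    return parse_smart_attributes(out.stdout, device)
```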

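6. Partition read/write monitoring

Every mounted partition is tested for readability and writability. Each metric carries a tag identifying the mount point, e.g. mount=/home.

sys.disk.rw: a non-zero value means reads or writes on this partition are failing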
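A simple way to implement such a probe is to write a small file under the mount point, read it back, and delete it. The Python sketch below does exactly that. The article does not say how the agent actually probes, so treat the approach as an assumption; the probe file name and payload are hypothetical. Writing to the root of most mounts requires root privileges.

```python
import os
import uuid


def check_rw(mount):
    """Return 0 if a small file can be written and read back under mount, else 1."""
    # Hypothetical probe file name; a random suffix avoids clashing with real files.
    path = os.path.join(mount, f".rw_probe_{uuid.uuid4().hex}")
    payload = b"rw-check"
    try:
        with open(path, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # force the write down to the device
        with open(path, "rb") as f:
            return 0 if f.read() == payload else 1
    except OSError:
        return 1
    finally:
        try:
            os.remove(path)
        except OSError:
            pass  # nothing was created, or it is already gone
```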

7. IO collection items

How it is computed: /proc/diskstats is sampled once per second and the difference between samples is taken; all of these are counter-type metrics. Each metric carries a tag of the form device=$device identifying the device, e.g. sda1 or sdb. The iostat documentation explains what each metric means.

disk.io.ios_in_progress: Number of actual I/O requests currently in flight.
disk.io.msec_read: Total number of ms spent by all reads.
disk.io.msec_total: Amount of time during which ios_in_progress >= 1.
disk.io.msec_weighted_total: Measure of recent I/O completion time and backlog.
disk.io.msec_write: Total number of ms spent by all writes.
disk.io.read_merged: Adjacent read requests merged in a single req.
disk.io.read_requests: Total number of reads completed successfully.
disk.io.read_sectors: Total number of sectors read successfully.
disk.io.write_merged: Adjacent write requests merged in a single req.
disk.io.write_requests: Total number of writes completed successfully.
disk.io.write_sectors: Total number of sectors written successfully.
disk.io.read_bytes: value in bytes
disk.io.write_bytes: value in bytes
disk.io.avgrq_sz: this value and the ones below are the values shown by iostat -x 1
disk.io.avgqu-sz
disk.io.await
disk.io.svctm
disk.io.util: a percentage; for example, 56.43 means 56.43%
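The iostat -x style values can be derived from two /proc/diskstats samples taken a known interval apart. The following Python sketch uses the classic iostat formulas and the first 11 counters after the device name. It is not the agent's code, and the function names are hypothetical. avgrq_sz is in 512-byte sectors, as in older iostat.

```python
FIELDS = ["read_requests", "read_merged", "read_sectors", "msec_read",
          "write_requests", "write_merged", "write_sectors", "msec_write",
          "ios_in_progress", "msec_total", "msec_weighted_total"]
SECTOR_BYTES = 512  # /proc/diskstats always counts 512-byte sectors


def parse_diskstats_line(line):
    parts = line.split()
    # parts[0:3] are major, minor and device name; newer kernels append
    # discard/flush counters after the 11 classic ones, which are ignored here.
    return parts[2], dict(zip(FIELDS, (int(v) for v in parts[3:14])))


def iostat_metrics(prev_line, curr_line, interval_ms):
    dev, prev = parse_diskstats_line(prev_line)
    _, curr = parse_diskstats_line(curr_line)
    d = {k: curr[k] - prev[k] for k in FIELDS}
    n_ios = d["read_requests"] + d["write_requests"]
    m = {
        "device": dev,
        "disk.io.ios_in_progress": curr["ios_in_progress"],  # instantaneous, not a delta
        "disk.io.read_bytes": d["read_sectors"] * SECTOR_BYTES,
        "disk.io.write_bytes": d["write_sectors"] * SECTOR_BYTES,
        "disk.io.avgqu-sz": d["msec_weighted_total"] / interval_ms,
        "disk.io.util": 100.0 * d["msec_total"] / interval_ms,
    }
    if n_ios:
        m["disk.io.avgrq_sz"] = (d["read_sectors"] + d["write_sectors"]) / n_ios
        m["disk.io.await"] = (d["msec_read"] + d["msec_write"]) / n_ios
        m["disk.io.svctm"] = d["msec_total"] / n_ios
    else:  # no completed requests in this interval
        m.update({"disk.io.avgrq_sz": 0.0, "disk.io.await": 0.0, "disk.io.svctm": 0.0})
    return m
```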

8. Machine load collection items

How it is computed: read /proc/loadavg. All values are raw values:

load.1min
load.5min
load.15min
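The three load metrics map directly to the first three fields of /proc/loadavg. The remaining fields (running/total tasks and the last PID) are ignored here. A minimal Python sketch with hypothetical function names:

```python
def parse_loadavg(text):
    """Map the first three fields of /proc/loadavg to the load.* metrics."""
    one, five, fifteen = (float(v) for v in text.split()[:3])
    return {"load.1min": one, "load.5min": five, "load.15min": fifteen}


def read_loadavg(path="/proc/loadavg"):
    with open(path) as f:
        return parse_loadavg(f.read())
```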

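9. Memory collection items

How it is computed: read the contents of /proc/meminfo, where mem.memfree = free + buffers + cached and mem.memused = mem.memtotal - mem.memfree. The output and documentation of the free command explain what each metric means.

mem.memtotal: total memory
mem.memused: amount of memory in use
mem.memused.percent: percentage of memory in use
mem.memfree
mem.memfree.percent
mem.swaptotal: total swap size
mem.swapused: amount of swap in use
mem.swapused.percent: percentage of swap in use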
mem.swapfree
mem.swapfree.percent
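The formulas above, memfree = MemFree + Buffers + Cached and memused = memtotal - memfree, can be applied directly to /proc/meminfo. The following is a Python sketch under one assumption: values stay in the kB units that /proc/meminfo reports, since the article does not say whether the agent converts them to bytes. The function names are hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo into {key: value}, values in the file's own units (kB)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def memory_metrics(text):
    info = parse_meminfo(text)
    total = info["MemTotal"]
    free = info["MemFree"] + info.get("Buffers", 0) + info.get("Cached", 0)
    used = total - free
    swap_total = info.get("SwapTotal", 0)
    swap_free = info.get("SwapFree", 0)
    swap_used = swap_total - swap_free

    def pct(part, whole):
        return 100.0 * part / whole if whole else 0.0

    return {
        "mem.memtotal": total, "mem.memfree": free, "mem.memused": used,
        "mem.memfree.percent": pct(free, total),
        "mem.memused.percent": pct(used, total),
        "mem.swaptotal": swap_total, "mem.swapfree": swap_free,
        "mem.swapused": swap_used,
        "mem.swapfree.percent": pct(swap_free, swap_total),
        "mem.swapused.percent": pct(swap_used, swap_total),
    }
```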

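10. Network collection items

How it is computed: read /proc/net/dev. Each metric carries a tag of the form iface=$iface identifying the interface, e.g. eth0. Metrics containing "in" describe inbound traffic, "out" describes outbound traffic, and "total" is the sum in + out. The supported metrics are: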

net.if.in.bytes
net.if.in.compressed
net.if.in.dropped
net.if.in.errors
net.if.in.fifo.errs
net.if.in.frame.errs
net.if.in.multicast
net.if.in.packets
net.if.out.bytes
net.if.out.carrier.errs
net.if.out.collisions
net.if.out.compressed
net.if.out.dropped
net.if.out.errors
net.if.out.fifo.errs
net.if.out.packets
net.if.total.bytes
net.if.total.dropped
net.if.total.errors
net.if.total.packets
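Each interface line in /proc/net/dev holds 8 receive counters followed by 8 transmit counters. The Python sketch below maps those columns onto the metric names listed above; the mapping (errs to errors, colls to collisions, and so on) is our reading of the names, not taken from the agent. The values are cumulative counters, so rates require the difference between two samples. The function name is hypothetical.

```python
# Column order of /proc/net/dev after "iface:" (receive block, then transmit block).
RX = ["bytes", "packets", "errors", "dropped", "fifo.errs", "frame.errs",
      "compressed", "multicast"]
TX = ["bytes", "packets", "errors", "dropped", "fifo.errs", "collisions",
      "carrier.errs", "compressed"]


def parse_net_dev(text):
    """Return {iface: {metric: value}} from the contents of /proc/net/dev."""
    result = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines have no "iface:" prefix
        iface, data = line.split(":", 1)
        values = [int(v) for v in data.split()]
        m = {}
        for name, v in zip(RX, values[:8]):
            m["net.if.in." + name] = v
        for name, v in zip(TX, values[8:16]):
            m["net.if.out." + name] = v
        for name in ("bytes", "packets", "errors", "dropped"):
            m["net.if.total." + name] = m["net.if.in." + name] + m["net.if.out." + name]
        result[iface.strip()] = m
    return result
```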

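11. Port collection items

How it is computed: ss -ln is used to determine whether a given port is in the LISTEN state. The value is a raw value: 1 means the port is listening, and 0 means it is not. Each metric carries a tag of the form port=$port, where $port is the specific port number.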

net.port.listen
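A port check can parse the "Local Address:Port" column of ss output. To keep the column layout predictable, this Python sketch deliberately narrows the command to `ss -tln` (TCP listeners only), which is a substitution for the `ss -ln` the article mentions. The function names are hypothetical.

```python
import subprocess


def listening_ports(ss_output):
    """Collect port numbers from the Local Address:Port column of `ss -tln`."""
    ports = set()
    for line in ss_output.splitlines()[1:]:  # skip the header row
        parts = line.split()
        if len(parts) < 4:
            continue
        # Handles 0.0.0.0:22, [::]:80 and 127.0.0.53%lo:53 alike.
        _, _, port = parts[3].rpartition(":")
        if port.isdigit():
            ports.add(int(port))
    return ports


def port_listen(port, ss_output):
    """net.port.listen: 1 if the port is listening, else 0."""
    return 1 if port in listening_ports(ss_output) else 0


def run_ss():
    return subprocess.run(["ss", "-tln"], capture_output=True,
                          text=True, check=True).stdout
```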

12. Kernel configuration

kernel.maxfiles: read from /proc/sys/fs/file-max
kernel.files.allocated: the first field of /proc/sys/fs/file-nr
kernel.files.left: kernel.maxfiles - kernel.files.allocated
kernel.maxproc: read from /proc/sys/kernel/pid_max
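The four kernel values need only three file reads and one subtraction. The following is a minimal Python sketch with hypothetical function names. The file contents are passed in as strings so the arithmetic can be checked without a live /proc.

```python
import os


def kernel_metrics(file_max, file_nr, pid_max):
    """Build kernel.* metrics from the text of file-max, file-nr and pid_max."""
    maxfiles = int(file_max.split()[0])
    allocated = int(file_nr.split()[0])  # file-nr: allocated, free, max
    return {
        "kernel.maxfiles": maxfiles,
        "kernel.files.allocated": allocated,
        "kernel.files.left": maxfiles - allocated,
        "kernel.maxproc": int(pid_max.split()[0]),
    }


def read_kernel_metrics(root="/proc/sys"):
    def read(rel):
        with open(os.path.join(root, rel)) as f:
            return f.read()
    return kernel_metrics(read("fs/file-max"), read("fs/file-nr"),
                          read("kernel/pid_max"))
```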

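13. NTP collection items

ntpq -pn is used to obtain the offset of the local clock relative to the NTP server.

sys.ntp.offset: offset of the local clock in ms; a value that is too large, or exactly 0, indicates an anomaly and should trigger an alert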

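14. Process monitoring

proc.num: the number of matching processes. There are two matching modes. The first matches on the process name, e.g. name=sshd. The second matches on the command line. Java application processes, for example, may all be named java, so the first mode cannot tell them apart; in that case, configure a cmdline instead, e.g. cmdline=./falcon_agent -c ./cfg.ini.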
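The two matching modes can be sketched in Python over /proc. This is an illustration, not the agent's matcher. Two rules are assumptions: name matching compares /proc/<pid>/comm exactly (the kernel truncates it to 15 characters), and cmdline matching is a substring test on the NUL-separated arguments joined with spaces. The function names are hypothetical.

```python
import os


def list_processes(proc_root="/proc"):
    """Yield (name, cmdline) for every process visible under /proc."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                name = f.read().strip()
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                # Arguments are NUL-separated; join them with spaces.
                cmdline = f.read().replace(b"\0", b" ").decode(errors="replace").strip()
        except OSError:
            continue  # the process exited while it was being read
        yield name, cmdline


def proc_num(processes, name=None, cmdline=None):
    """Count processes by exact name, or by cmdline substring."""
    if (name is None) == (cmdline is None):
        raise ValueError("specify exactly one of name or cmdline")
    if name is not None:
        return sum(1 for pname, _ in processes if pname == name)
    return sum(1 for _, pcmd in processes if cmdline in pcmd)
```

On a live host, proc_num(list(list_processes()), name="sshd") counts sshd processes.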

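15. Process resource monitoring

process.cpu.all: sys + user CPU used by the process and its children, in jiffies
process.cpu.sys: sys CPU used by the process and its children, in jiffies
process.cpu.user: user CPU used by the process and its children, in jiffies
process.swap: swap used by the process and its children, in pages
process.fd: number of file descriptors used by the process
process.mem: memory used by the process, in bytes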
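The CPU figures can be read from fields 14 to 17 of /proc/<pid>/stat (utime, stime, cutime, cstime). These are in clock ticks, which the list above calls jiffies. The child values only include children that have exited and been waited for. The following Python sketch makes two assumptions: process.mem is taken as VmRSS, and process.swap is not covered. The function names are hypothetical.

```python
import os


def parse_stat_cpu(stat_text):
    """Return (user, sys) clock ticks for a process plus its waited-for children."""
    # The comm field may contain spaces or ')', so split after the LAST ')'.
    rest = stat_text[stat_text.rindex(")") + 1:].split()
    # rest[0] is field 3 (state), so field N lives at rest[N - 3].
    utime, stime, cutime, cstime = (int(rest[i]) for i in (11, 12, 13, 14))
    return utime + cutime, stime + cstime


def process_metrics(pid, proc_root="/proc"):
    base = os.path.join(proc_root, str(pid))
    with open(os.path.join(base, "stat")) as f:
        user, system = parse_stat_cpu(f.read())
    rss_kb = 0  # kernel threads have no VmRSS line
    with open(os.path.join(base, "status")) as f:
        for line in f:
            if line.startswith("VmRSS:"):
                rss_kb = int(line.split()[1])
    return {
        "process.cpu.user": user,
        "process.cpu.sys": system,
        "process.cpu.all": user + system,
        "process.fd": len(os.listdir(os.path.join(base, "fd"))),
        "process.mem": rss_kb * 1024,  # assumption: process.mem is the RSS
    }
```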

16. ss command output

ss.orphaned
ss.closed
ss.timewait
ss.slabinfo.timewait
ss.synrecv
ss.estab
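These counters match the TCP line printed by `ss -s`. In older iproute2 versions, that line looks like "TCP:   12 (estab 4, closed 2, orphaned 0, synrecv 0, timewait 2/0), ports 0", and the second timewait number comes from slabinfo. The following Python sketch assumes that format and also accepts newer output that omits synrecv and the "/slab" part. The function name is hypothetical.

```python
import re


def parse_ss_summary(text):
    """Extract ss.* counters from the TCP line of `ss -s` output."""
    m = re.search(r"TCP:\s+\d+\s+\((.*?)\)", text)
    if not m:
        return {}
    metrics = {}
    for item in m.group(1).split(","):
        key, _, value = item.strip().partition(" ")
        if key == "timewait" and "/" in value:
            tw, slab_tw = value.split("/", 1)
            metrics["ss.timewait"] = int(tw)
            metrics["ss.slabinfo.timewait"] = int(slab_tw)
        elif value.isdigit():
            metrics["ss." + key] = int(value)
    return metrics
```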
