This article walks through how to check the health of a RabbitMQ service and cluster from the command line. I hope you come away with something useful after reading it.
1. Check the service status on each node

Log in to each RabbitMQ node and run:
rabbitmqctl status

A healthy node looks like this:

# Status of node rabbit@devxyz ...
# [{pid,13505},
#  {running_applications,
#      [{rabbitmq_management,"RabbitMQ Management Console","3.6.5"},
#       {rabbitmq_management_agent,"RabbitMQ Management Agent","3.6.5"},
#       {rabbit,"RabbitMQ","3.6.5"},
#       {os_mon,"CPO CXC 138 46","2.4"},
#       {rabbitmq_web_dispatch,"RabbitMQ Web Dispatcher","3.6.5"},
#       {webmachine,"webmachine","1.10.3"},
#       {mochiweb,"MochiMedia Web Server","2.13.1"},
#       {amqp_client,"RabbitMQ AMQP Client","3.6.5"},
#       {rabbit_common,[],"3.6.5"},
#       {mnesia,"MNESIA CXC 138 12","4.13.4"},
#       {compiler,"ERTS CXC 138 10","6.0.3"},
#       {ssl,"Erlang/OTP SSL application","7.3.3.1"},
#       {ranch,"Socket acceptor pool for TCP protocols.","1.2.1"},
#       {public_key,"Public key infrastructure","1.1.1"},
#       {xmerl,"XML parser","1.3.10"},
#       {inets,"INETS CXC 138 49","6.2.4"},
#       {asn1,"The Erlang ASN1 compiler version 4.0.2","4.0.2"},
#       {crypto,"CRYPTO","3.6.3"},
#       {syntax_tools,"Syntax tools","1.7"},
#       {sasl,"SASL CXC 138 11","2.7"},
#       {stdlib,"ERTS CXC 138 10","2.8"},
#       {kernel,"ERTS CXC 138 10","4.2"}]},
#  {os,{unix,linux}},
#  {erlang_version,
#      "Erlang/OTP 18 [erts-7.3.1.2] [source] [64-bit] [smp:8:8] [async-threads:128] [hipe] [kernel-poll:true]\n"},
#  {memory,
#      [{total,119288000},
#       {connection_readers,491304},
#       {connection_writers,33944},
#       {connection_channels,115312},
#       {connection_other,563312},
#       {queue_procs,510368},
#       {queue_slave_procs,0},
#       {plugins,1254560},
#       {other_proc,18328184},
#       {mnesia,160320},
#       {mgmt_db,2527968},
#       {msg_index,66840},
#       {other_ets,1641160},
#       {binary,55247472},
#       {code,27655723},
#       {atom,992409},
#       {other_system,9699124}]},
#  {alarms,[]},
#  {listeners,[{clustering,25672,"::"},{amqp,5672,"::"}]},
#  {vm_memory_high_watermark,0.4},
#  {vm_memory_limit,6663295795},
#  {disk_free_limit,50000000},
#  {disk_free,53003800576},
#  {file_descriptors,
#      [{total_limit,1948},
#       {total_used,23},
#       {sockets_limit,1751},
#       {sockets_used,21}]},
#  {processes,[{limit,1048576},{used,498}]},
#  {run_queue,0},
#  {uptime,47953},
#  {kernel,{net_ticktime,60}}]

This reports the state of the RabbitMQ service. In a healthy state the node is running, the output contains no "nodedown" or "error" strings, and running_applications includes rabbitmq_management and related application names (assuming the rabbitmq_management plugin is enabled).

1.1. If the node is running but rabbitmq_management is absent from running_applications, similar to:

# Status of node rabbit@devxyz ...
# [{pid,13505},
#  {running_applications,[{compiler,"ERTS CXC 138 10","6.0.3"},
#                         {ssl,"Erlang/OTP SSL application","7.3.3.1"},
#                         {ranch,"Socket acceptor pool for TCP protocols.",
#                          "1.2.1"},
#                         {public_key,"Public key infrastructure","1.1.1"},
#                         {xmerl,"XML parser","1.3.10"},
#                         {inets,"INETS CXC 138 49","6.2.4"},
#                         {asn1,"The Erlang ASN1 compiler version 4.0.2",
#                          "4.0.2"},
#                         {crypto,"CRYPTO","3.6.3"},
#                         {syntax_tools,"Syntax tools","1.7"},
#                         {sasl,"SASL CXC 138 11","2.7"},
#                         {stdlib,"ERTS CXC 138 10","2.8"},
#                         {kernel,"ERTS CXC 138 10","4.2"}]},
#  {os,{unix,linux}},
#  {erlang_version,"Erlang/OTP 18 [erts-7.3.1.2] [source] [64-bit] [smp:8:8] [async-threads:128] [hipe] [kernel-poll:true]\n"},
#  {memory,[{total,58267544},
#           {connection_readers,0},
#           {connection_writers,0},
#           {connection_channels,0},
#           {connection_other,0},
#           {queue_procs,0},
#           {queue_slave_procs,0},
#           {plugins,0},
#           {other_proc,18771312},
#           {mnesia,0},
#           {mgmt_db,0},
#           {msg_index,0},
#           {other_ets,1218464},
#           {binary,29984},
#           {code,27655723},
#           {atom,992409},
#           {other_system,9599652}]},
#  {alarms,[]},
#  {listeners,[]},
#  {processes,[{limit,1048576},{used,73}]},
#  {run_queue,0},
#  {uptime,48363},
#  {kernel,{net_ticktime,60}}]

then the RabbitMQ application itself is not started; only the base Erlang services are up. Run:

rabbitmqctl start_app

which prints:

# Starting node rabbit@devxyz ...

Then run rabbitmqctl status again to verify the service state.

1.2. If the output contains an error, for example:

# Status of node rabbit@devxyz ...
# Error: unable to connect to node rabbit@devxyz: nodedown
#
# DIAGNOSTICS
# ===========
#
# attempted to contact: [rabbit@devxyz]
#
# rabbit@devxyz:
#   * connected to epmd (port 4369) on devxyz
#   * epmd reports: node 'rabbit' not running at all
#                   no other nodes on devxyz
#   * suggestion: start the node
#
# current node details:
# - node name: 'rabbitmq-cli-07@devxyz'
# - home dir: /var/lib/rabbitmq
# - cookie hash: duuNopvOx1ChRdjrRHPo+A==

then even the base RabbitMQ service is not running. First try to start it:

rabbitmq-server -detached

which prints:

# Warning: PID file not written; -detached was passed.

Then run:

rabbitmqctl start_app

which prints:

# Starting node rabbit@devxyz ...

and verify with rabbitmqctl status. If the node still cannot reach a normal state, use the specific error messages to decide what to do next.
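The per-node check above lends itself to scripting. Below is a minimal sketch (not part of the original procedure; the function name and classification labels are my own) that classifies a captured `rabbitmqctl status` output string into the three cases discussed: node down, application stopped, or healthy.

```python
def classify_status(output: str) -> str:
    """Classify captured `rabbitmqctl status` output.

    Returns "down" when the base Erlang node is unreachable (case 1.2),
    "app_stopped" when the node runs but the `rabbit` application is not
    started (case 1.1), and "ok" otherwise.
    """
    if "nodedown" in output or "Error:" in output:
        return "down"          # nothing to talk to: start rabbitmq-server first
    if '{rabbit,"RabbitMQ"' not in output:
        return "app_stopped"   # base services only: run `rabbitmqctl start_app`
    return "ok"
```

In practice the string would come from something like `subprocess.run(["rabbitmqctl", "status"], capture_output=True, text=True).stdout`.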
2. Check the cluster status

Log in to any live RabbitMQ node and run:
rabbitmqctl cluster_status

which prints:

# Cluster status of node rabbit@HYRBT001 ...
# [{nodes,[{disc,[rabbit@HYRBT001,rabbit@HYRBT002,rabbit@HYRBT003]}]},
#  {running_nodes,[rabbit@HYRBT003,rabbit@HYRBT002,rabbit@HYRBT001]},
#  {cluster_name,<<"HYRBT001">>},
#  {partitions,[]},
#  {alarms,[{rabbit@HYRBT003,[]},{rabbit@HYRBT002,[]},{rabbit@HYRBT001,[]}]}]

This reports the cluster state. In a healthy cluster:

nodes: lists every RabbitMQ node in the cluster
running_nodes: lists every RabbitMQ node that is currently running
cluster_name: the cluster name
partitions: is empty
alarms: the [] after each node is empty

2.1. If nodes does not list every expected node, some node has not joined the cluster. For example:

# Cluster status of node rabbit@HYCTL001 ...
# [{nodes,[{disc,[rabbit@HYCTL001,rabbit@HYCTL002]}]},
#  {running_nodes,[rabbit@HYCTL002,rabbit@HYCTL001]},
#  {cluster_name,<<"rabbit@HYCTL001">>},
#  {partitions,[]},
#  {alarms,[{rabbit@HYCTL002,[]},{rabbit@HYCTL001,[]}]}]

when there should actually be three nodes. Log in to node 3, the one that has not joined.

First verify connectivity from this node to the nodes already in the cluster with ping. Then verify that .erlang.cookie (located under /var/lib/rabbitmq/) is identical; if it differs, copy the cookie contents from a cluster node to this node. Once both checks pass, confirm the RabbitMQ service is running (see step 1), then join the cluster:

rabbitmqctl stop_app
# Stopping node rabbit@HYCTL003 ...
rabbitmqctl reset
# Resetting node rabbit@HYCTL003 ...
rabbitmqctl join_cluster rabbit@<cluster-node-name>
# Clustering node rabbit@HYCTL003 with rabbit@HYCTL001 ...
rabbitmqctl start_app
# Starting node rabbit@HYCTL003 ...

Run rabbitmqctl cluster_status to verify that the node now appears under nodes, running_nodes, and alarms:

# Cluster status of node rabbit@HYCTL003 ...
# [{nodes,[{disc,[rabbit@HYCTL001,rabbit@HYCTL002,rabbit@HYCTL003]}]},
#  {running_nodes,[rabbit@HYCTL001,rabbit@HYCTL002,rabbit@HYCTL003]},
#  {cluster_name,<<"rabbit@HYCTL001">>},
#  {partitions,[]},
#  {alarms,[{rabbit@HYCTL001,[]},{rabbit@HYCTL002,[]},{rabbit@HYCTL003,[]}]}]

2.2. If running_nodes does not show every node, the RabbitMQ service is unhealthy on the missing nodes. For example:

# Cluster status of node rabbit@HYCTL001 ...
# [{nodes,[{disc,[rabbit@HYCTL001,rabbit@HYCTL002,rabbit@HYCTL003]}]},
#  {running_nodes,[rabbit@HYCTL002,rabbit@HYCTL001]},
#  {cluster_name,<<"rabbit@HYCTL001">>},
#  {partitions,[]},
#  {alarms,[{rabbit@HYCTL002,[]},{rabbit@HYCTL001,[]}]}]

Here node 3 is not running, so log in to node 3 and handle it as described in step 1. Afterwards, verify with rabbitmqctl cluster_status:

# Cluster status of node rabbit@HYCTL003 ...
# [{nodes,[{disc,[rabbit@HYCTL001,rabbit@HYCTL002,rabbit@HYCTL003]}]},
#  {running_nodes,[rabbit@HYCTL001,rabbit@HYCTL002,rabbit@HYCTL003]},
#  {cluster_name,<<"rabbit@HYCTL001">>},
#  {partitions,[]},
#  {alarms,[{rabbit@HYCTL001,[]},{rabbit@HYCTL002,[]},{rabbit@HYCTL003,[]}]}]

2.3. If partitions contains nodes, a split-brain has occurred (usually caused by a network problem that broke inter-node communication) and the cluster is in an abnormal state. For example:

# Cluster status of node rabbit@HYCTL001 ...
# [{nodes,[{disc,[rabbit@HYCTL001,rabbit@HYCTL002,rabbit@HYCTL003]}]},
#  {running_nodes,[rabbit@HYCTL001,rabbit@HYCTL002,rabbit@HYCTL003]},
#  {cluster_name,<<"rabbit@HYCTL001">>},
#  {partitions,[{rabbit@HYCTL001,rabbit@HYCTL002,[rabbit@HYCTL001]},
#               {rabbit@HYCTL003,[rabbit@HYCTL001,rabbit@HYCTL002]}]},
#  {alarms,[{rabbit@HYCTL001,[]},{rabbit@HYCTL002,[]},{rabbit@HYCTL003,[]}]}]

You need to pick one primary node to keep, then restart the RabbitMQ service on the nodes in the other partition(s). How to pick the primary falls into two cases:

2.3.1. If haproxy load-balances the RabbitMQ cluster and is configured in active/backup mode, the primary can be read from the haproxy configuration. Log in to a controller node and inspect the config file:

cat /etc/haproxy/conf.d/100-rabbitmq.cfg

which prints:

# listen rabbitmq
#   bind 192.168.0.10:5672
#   balance roundrobin
#   mode tcp
#   option tcpka
#   timeout client 48h
#   timeout server 48h
#   server HYCTL001 192.168.0.11:5673 check inter 5000 rise 2 fall 3
#   server HYCTL002 192.168.0.12:5673 backup check inter 5000 rise 2 fall 3
#   server HYCTL003 192.168.0.13:5673 backup check inter 5000 rise 2 fall 3

Servers marked backup are standby nodes; the one without backup is the primary. In this environment HYCTL001 is therefore the primary and handles the traffic. Once the primary is identified, log in to each non-primary RabbitMQ node and restart its service:

rabbitmqctl stop
# Stopping and halting node rabbit@HYCTL003 ...
rabbitmq-server -detached
# Warning: PID file not written; -detached was passed.
rabbitmqctl start_app
# Starting node rabbit@HYCTL003 ...
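The nodes/running_nodes comparison from 2.1 and 2.2 can also be done mechanically. Below is a sketch (the function names are mine, not part of any RabbitMQ API) that extracts both lists from captured `rabbitmqctl cluster_status` output and reports cluster members that are not currently running:

```python
import re

def missing_running_nodes(cluster_status: str) -> set:
    """Return cluster members listed under `nodes` but absent from
    `running_nodes` in captured `rabbitmqctl cluster_status` output."""
    def section_nodes(label: str) -> set:
        # Grab the first [...] after the label and pull out rabbit@... names.
        m = re.search(re.escape(label) + r".*?\[([^\]]*)\]", cluster_status, re.S)
        return set(re.findall(r"rabbit@\w+", m.group(1))) if m else set()

    return section_nodes("{nodes,") - section_nodes("{running_nodes,")
```

An empty result means every cluster member is running; anything returned is a node to troubleshoot per step 1.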
Check the node with rabbitmqctl status and the cluster with rabbitmqctl cluster_status. If other split-brain nodes remain, the partitions tuple containing the primary gains the node you just restarted, and that node is removed from the other tuples. Once every split-brain node has been handled, partitions is empty again:

# Cluster status of node rabbit@HYCTL003 ...
# [{nodes,[{disc,[rabbit@HYCTL001,rabbit@HYCTL002,rabbit@HYCTL003]}]},
#  {running_nodes,[rabbit@HYCTL001,rabbit@HYCTL002,rabbit@HYCTL003]},
#  {cluster_name,<<"rabbit@HYCTL001">>},
#  {partitions,[]},
#  {alarms,[{rabbit@HYCTL001,[]},{rabbit@HYCTL002,[]},{rabbit@HYCTL003,[]}]}]

2.3.2. If there is no active/backup setup, pick the node with the most client connections as the primary. Count connections per node from any RabbitMQ node:

rabbitmqctl list_connections pid | grep HYCTL001 | wc -l

(substitute each node name for HYCTL001 in turn). Take the partition tuple whose nodes hold the most connections as the primary partition, and restart the RabbitMQ service on the nodes of the other tuples, following the same restart steps as in 2.3.1.

2.4. If alarms contains nodes, memory or disk usage has crossed a threshold. For example:

# Cluster status of node rabbit@HYCTL003 ...
# [{nodes,[{disc,[rabbit@HYCTL001,rabbit@HYCTL002,rabbit@HYCTL003]}]},
#  {running_nodes,[rabbit@HYCTL001,rabbit@HYCTL002,rabbit@HYCTL003]},
#  {cluster_name,<<"rabbit@HYCTL001">>},
#  {partitions,[]},
#  {alarms,[{rabbit@HYCTL001,[]},
#           {rabbit@HYCTL002,[]},
#           {rabbit@HYCTL003,[disk,memory]}]}]

Here node 3 has raised both memory and disk alarms, which means a large number of messages have piled up on node 3. Likely causes are a malfunctioning consumer service, or stale queues that keep receiving messages with no consumer draining them.

The alarm thresholds are configurable, and the current values are visible in rabbitmqctl status:

# {vm_memory_high_watermark,0.4},              memory usage threshold (fraction of RAM)
# {vm_memory_limit,81016840192},               memory usage limit (bytes)
# {disk_free_limit,50000000},                  free-disk limit (bytes)
# {disk_free,553529729024},                    current free disk space
# {file_descriptors,
#     [{total_limit,102300},
#      {total_used,2040},
#      {sockets_limit,92068},
#      {sockets_used,2038}]},                  file descriptor and socket usage/limits
# {processes,[{limit,1048576},{used,31681}]},  Erlang process usage/limit

The size of the backlog can be measured with:

rabbitmqctl list_queues messages_ready | awk 'NR>=2{print }' | awk '{sum+=$1}END{print sum}'

which gives the total number of messages waiting to be delivered;

rabbitmqctl list_queues message_bytes_ram | awk 'NR>=2{print }' | awk '{sum+=$1}END{print sum}'

which gives the memory occupied by the backlog; and

rabbitmqctl list_queues message_bytes_persistent | awk 'NR>=2{print }' | awk '{sum+=$1}END{print sum}'

which gives the disk space occupied by the backlog. When messages pile up, first find which queues hold the most:

rabbitmqctl list_queues message_bytes_ram name | awk 'NR>=2{print }' | sort -rn | less
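The three awk pipelines above all do the same thing: sum a single numeric column while skipping the "Listing queues ..." header line. The same aggregation as a short sketch (the function name is mine):

```python
def sum_first_column(listing: str) -> int:
    """Sum the numeric first column of `rabbitmqctl list_queues <field>`
    output, skipping the "Listing queues ..." header (awk's NR>=2)."""
    total = 0
    for line in listing.splitlines()[1:]:
        fields = line.split()
        if fields and fields[0].isdigit():
            total += int(fields[0])
    return total
```

Fed the output of `rabbitmqctl list_queues messages_ready`, this returns the total backlog; fed `message_bytes_ram` or `message_bytes_persistent` listings, it returns bytes in memory or on disk respectively.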
This lists queues sorted from largest to smallest backlog along with their names. Use the queue names to track down the responsible services on the different nodes: if a service is in an abnormal state, fix the service; if the queue is stale (created by a service that is no longer in use), delete it. Queue deletion is done from the RabbitMQ management page, covered below.

2.5. Check whether any queue or connection is under flow control.

When the consumers' processing capacity falls far below the message production rate, RabbitMQ applies flow control automatically, to avoid excessive backlog and an excessive delay between a message being produced and being consumed. Whether flow control has kicked in can be checked from the command line. Log in to any RabbitMQ node and run:

rabbitmqctl list_queues name state | grep flow

Any output means the listed queues are under flow control; check their producer and consumer processes as in 2.4.

rabbitmqctl list_connections name state | grep flow

Any output means the listed connections are under flow control, in which case some queues are certainly under flow control as well; again check the producer and consumer processes as in 2.4.
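The two grep checks in 2.5 can be expressed as a small parser as well (a sketch; it assumes the tab-separated two-column form of these listings, and the helper name is mine):

```python
def flow_controlled(listing: str) -> list:
    """Return the names whose state column reads 'flow' in a
    `rabbitmqctl list_queues name state` (or `list_connections name state`)
    listing. An empty result means no flow control is active."""
    names = []
    for line in listing.splitlines()[1:]:      # skip the "Listing ..." header
        parts = line.rstrip("\n").split("\t")
        if len(parts) >= 2 and parts[-1].strip() == "flow":
            names.append(parts[0])
    return names
```

Unlike the plain grep, this only matches the state column, so a queue whose *name* happens to contain "flow" will not produce a false positive.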
3. The management page

The RabbitMQ management page requires the rabbitmq_management plugin. Log in to any RabbitMQ node and first check whether the plugin is enabled:

rabbitmq-plugins list -v -E | grep -A5 rabbitmq_management

which prints:

# [E*] rabbitmq_management
#      Version: 3.6.5
#      Dependencies: [rabbitmq_web_dispatch,amqp_client,
#                     rabbitmq_management_agent]
#      Description: RabbitMQ Management Console

meaning the rabbitmq_management plugin is already enabled. If it is not, enable it with:

rabbitmq-plugins enable rabbitmq_management

which prints:

# The following plugins have been enabled:
#   mochiweb
#   webmachine
#   rabbitmq_web_dispatch
#   amqp_client
#   rabbitmq_management_agent
#   rabbitmq_management
#
# Applying plugin configuration to rabbit@devxyz... started 6 plugins.

Once the plugin is enabled, open port 15672 in the firewall, since the plugin serves the UI on port 15672 by default:

iptables -I INPUT -p tcp --dport 15672 -j ACCEPT
service iptables save

This adds the iptables rule and saves it. Then browse to <node-ip>:15672 and log in. User names can be listed with rabbitmqctl list_users; the password is whatever was previously set for that user. Log in with a non-guest user.

After logging in you can see the state of the whole cluster and of each node: whether a split-brain exists, whether any alarms are raised, the current message backlog, and so on. To delete a queue, open the Queues tab, type the queue name into Filter, click through to the queue, scroll to the bottom of the page, and open the Delete/purge section: Delete removes the queue, Purge empties it.
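To script the plugin check, the `[E*]` marker (explicitly enabled) has to be distinguished from `[e*]` (enabled only as a dependency, as rabbitmq_management_agent would be). A sketch, with a function name of my own choosing:

```python
import re

def management_enabled(plugins_listing: str) -> bool:
    """True if `rabbitmq-plugins list` output shows rabbitmq_management
    explicitly enabled ([E*] marker); [e*] means implicitly enabled."""
    # \b after the plugin name keeps rabbitmq_management_agent from matching.
    pattern = re.compile(r"\[E\*?\]\s+rabbitmq_management\b")
    return any(pattern.search(line) for line in plugins_listing.splitlines())
```

If this returns False, `rabbitmq-plugins enable rabbitmq_management` is the fix, as shown above.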