This article shows how to perform a primary/standby switchover in PostgreSQL streaming replication, first with a trigger file and then with pg_ctl promote and pg_rewind.
Environment:
Primary IP: 192.168.40.130, hostname: postgres, port: 5442
Standby IP: 192.168.40.131, hostname: postgreshot, port: 5442
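In PostgreSQL 9.0, a streaming-replication switchover could only be triggered by creating a trigger file. This section walks through that method. The test environment is one primary and one standby using asynchronous streaming replication: the database on host postgres is the primary and the database on host postgreshot is the standby. The sessions below were captured on PostgreSQL 11 (psql reports 11.1), which still uses recovery.conf; PostgreSQL 12 and later replace that file with standby.signal and settings in postgresql.conf. The main steps of a manual, file-triggered switchover are:
1) On the standby, set the trigger_file parameter in recovery.conf to the path and name of the file that will trigger promotion.
2) Shut down the primary; the -m fast mode is recommended.
3) Create the trigger file on the standby to promote it. When recovery.conf has been renamed to recovery.done, the standby has become the primary.
4) Turn the old primary into a standby: create recovery.conf in the old primary's $PGDATA directory. If the directory has no recovery.conf, copy one from the template $PGHOME/share/recovery.conf.sample; if it contains recovery.done, rename that file to recovery.conf. Use the same settings as on the old standby, except that the IP in primary_conninfo points to the peer host.
5) Start the old primary and check that the processes on both the primary and the standby are healthy; if they are, the switchover succeeded.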
1. First configure recovery.conf on the standby, as shown below:
[postgres@postgreshot pg11]$ cat recovery.conf | grep -v '^#'
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=192.168.40.130 port=5442 user=replica application_name=pg1' # e.g. 'host=localhost port=5432'
trigger_file = '/home/postgres/pg11/trigger'
[postgres@postgreshot pg11]$
trigger_file may name a regular file or a hidden file. After changing these parameters, restart the standby so that they take effect.
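The recovery.conf shown above can also be generated by a small script, which helps when the same file has to be recreated later on the old primary with the peer's address. The following is a minimal sketch, not part of PostgreSQL: write_recovery_conf is a hypothetical helper, and placing the trigger file inside the data directory simply mirrors the layout used in this article.

```shell
# Sketch: write a minimal PostgreSQL 11-style recovery.conf for a standby.
# write_recovery_conf is a hypothetical helper, not a PostgreSQL tool.
# Usage: write_recovery_conf DATADIR PRIMARY_HOST PORT USER APP_NAME
# The body runs in a subshell "( ... )" so its variables stay local.
write_recovery_conf() (
    datadir=$1 host=$2 port=$3 user=$4 app=$5
    cat > "$datadir/recovery.conf" <<EOF
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=$host port=$port user=$user application_name=$app'
trigger_file = '$datadir/trigger'
EOF
    # The file is only meant to be read by the database OS user.
    chmod 600 "$datadir/recovery.conf"
)
```

For the standby in this article the call would be `write_recovery_conf /home/postgres/pg11 192.168.40.130 5442 replica pg1`, assuming /home/postgres/pg11 is the data directory.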
2. Shut down the primary, as shown below:
[postgres@postgres pg11]$ pg_ctl stop -m fast
waiting for server to shut down.... done
server stopped
[postgres@postgres pg11]$
3. Create the trigger file on the standby to promote it, as shown below:
[postgres@postgreshot pg11]$ ll recovery.conf
-rwx------ 1 postgres postgres 5.9K Mar 26 18:47 recovery.conf
[postgres@postgreshot pg11]$
[postgres@postgreshot pg11]$ touch /home/postgres/pg11/trigger
[postgres@postgreshot pg11]$ ll recovery*
-rwx------ 1 postgres postgres 5.9K Mar 26 18:47 recovery.done
[postgres@postgreshot pg11]$
The trigger file's name and path must match the trigger_file setting in recovery.conf. Listing the recovery file again shows that its suffix has changed from .conf to .done.
Check the standby's database log, as shown below:
2019-03-26 23:30:19.399 EDT [93162] LOG: replication terminated by primary server
2019-03-26 23:30:19.399 EDT [93162] DETAIL: End of WAL reached on timeline 3 at 0/50003D0.
2019-03-26 23:30:19.399 EDT [93162] FATAL: could not send end-of-streaming message to primary: no COPY in progress
2019-03-26 23:30:19.399 EDT [93158] LOG: invalid record length at 0/50003D0: wanted 24, got 0
2019-03-26 23:30:19.405 EDT [125172] FATAL: could not connect to the primary server: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
2019-03-26 23:30:24.410 EDT [125179] FATAL: could not connect to the primary server: could not connect to server: Connection refused
        Is the server running on host "192.168.40.130" and accepting
        TCP/IP connections on port 5442?
2019-03-26 23:31:49.505 EDT [93158] LOG: trigger file found: /home/postgres/pg11/trigger
2019-03-26 23:31:49.506 EDT [93158] LOG: redo done at 0/5000360
2019-03-26 23:31:49.506 EDT [93158] LOG: last completed transaction was at log time 2019-03-26 19:03:11.202845-04
2019-03-26 23:31:49.516 EDT [93158] LOG: selected new timeline ID: 4
2019-03-26 23:31:50.063 EDT [93158] LOG: archive recovery complete
2019-03-26 23:31:50.083 EDT [93157] LOG: database system is ready to accept connections
The standby's log first shows that it cannot connect to the primary, because the primary has been shut down. It then reports that the trigger file was found, that recovery completed, and that the database switched to read-write mode.
Now verify the result with pg_controldata, as shown below:
[postgres@postgreshot ~]$ pg_controldata | grep cluster
Database cluster state: in production
[postgres@postgreshot ~]$
The output shows that the database now has the primary role. On postgreshot, create a table named test_alived2 and insert a row, as shown below:
postgres=# CREATE TABLE test_alived2(id int4);
CREATE TABLE
postgres=# INSERT INTO test_alived2 VALUES(1);
INSERT 0 1
postgres=#
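The pg_controldata check used in step 3 can be wrapped so that a script gets a simple role label instead of free text. This is a sketch only: cluster_state and cluster_role are hypothetical helper names, and the mapping covers just the two states that appear in this article. pg_controldata reads the control file on disk, so for a running server `SELECT pg_is_in_recovery()` is an alternative check.

```shell
# cluster_state: read `pg_controldata` output on stdin and print the value
# of its "Database cluster state" line (hypothetical helper).
cluster_state() {
    sed -n 's/^Database cluster state:[[:space:]]*//p'
}

# cluster_role: map that state to a coarse role label.
# Only the two states seen in this article are mapped; anything else
# (for example a cluster that is shut down) is reported as "unknown".
cluster_role() {
    case "$(cluster_state)" in
        "in production")       echo primary ;;
        "in archive recovery") echo standby ;;
        *)                     echo unknown ;;
    esac
}
```

A typical call would be `pg_controldata "$PGDATA" | cluster_role`.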
4. Prepare to switch the old primary to the standby role by configuring recovery.conf on the old primary, as shown below:
[postgres@postgres pg11]$ cat recovery.conf | grep -v '^#'
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=192.168.40.131 port=5442 user=replica application_name=pg2' # e.g. 'host=localhost port=5432'
trigger_file = '/home/postgres/pg11/trigger'
[postgres@postgres pg11]$
This configuration is essentially the same as the recovery.done file on postgreshot, except that the host option in primary_conninfo is set to the peer host's IP.
Then create the ~/.pgpass file in the user's home directory on host postgres, as shown below:
[postgres@pghost1 ~]$ touch ~/.pgpass
[postgres@pghost1 ~]$ chmod 600 ~/.pgpass
Add the following content to the ~/.pgpass file:
[postgres@postgres ~]$ cat .pgpass
192.168.40.130:5442:replication:replica:replica
192.168.40.131:5442:replication:replica:replica
[postgres@postgres ~]
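Each .pgpass line has the form hostname:port:database:username:password, and libpq ignores the file on Unix if its permissions are looser than 0600. The sketch below adds an entry idempotently; add_pgpass_entry is a hypothetical helper, and it assumes none of the fields contain a colon or backslash, which would need backslash escaping.

```shell
# add_pgpass_entry HOST PORT DATABASE USER PASSWORD [FILE]
# Hypothetical helper: append one entry to a .pgpass file unless the exact
# line is already there. Fields are written as given (no escaping).
add_pgpass_entry() (
    file=${6:-$HOME/.pgpass}
    line="$1:$2:$3:$4:$5"
    touch "$file"
    # libpq ignores a password file that group or others can access.
    chmod 600 "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
)
```

For example, `add_pgpass_entry 192.168.40.131 5442 replication replica replica` would produce the second line of the file shown above.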
Then start the database on postgres, as shown below:
[postgres@postgres ~]$ pg_ctl start
waiting for server to start....2019-03-26 23:38:50.424 EDT [55380] LOG: listening on IPv4 address "0.0.0.0", port 5442
2019-03-26 23:38:50.424 EDT [55380] LOG: listening on IPv6 address "::", port 5442
2019-03-26 23:38:50.443 EDT [55380] LOG: listening on Unix socket "/tmp/.s.PGSQL.5442"
2019-03-26 23:38:50.465 EDT [55381] LOG: database system was shut down in recovery at 2019-03-26 23:38:20 EDT
2019-03-26 23:38:50.465 EDT [55381] LOG: entering standby mode
2019-03-26 23:38:50.483 EDT [55381] LOG: consistent recovery state reached at 0/50003D0
2019-03-26 23:38:50.483 EDT [55381] LOG: invalid record length at 0/50003D0: wanted 24, got 0
2019-03-26 23:38:50.483 EDT [55380] LOG: database system is ready to accept read only connections
done
server started
[postgres@postgres ~]$ 2019-03-26 23:38:50.565 EDT [55385] LOG: fetching timeline history file for timeline 4 from primary server
2019-03-26 23:38:50.588 EDT [55385] LOG: started streaming WAL from primary at 0/5000000 on timeline 3
2019-03-26 23:38:50.589 EDT [55385] LOG: replication terminated by primary server
2019-03-26 23:38:50.589 EDT [55385] DETAIL: End of WAL reached on timeline 3 at 0/50003D0.
2019-03-26 23:38:50.592 EDT [55381] LOG: new target timeline is 4
2019-03-26 23:38:50.594 EDT [55385] LOG: restarted WAL streaming at 0/5000000 on timeline 4
2019-03-26 23:38:50.717 EDT [55381] LOG: redo starts at 0/50003D0
[postgres@postgres ~]$ pg_controldata | grep cluster
Database cluster state: in archive recovery
[postgres@postgres ~]$
postgres=# select * from test_alived2;
 id
----
  1
(1 row)
postgres=#
At this point postgres has a WAL receiver process and postgreshot has a WAL sender process, which shows that the old primary has been switched to the standby role. These are all the steps of the switchover.
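Besides looking at the WAL sender and receiver processes, the new primary can be asked which standbys are currently streaming from it through the pg_stat_replication view. The sketch below assumes psql can reach the new primary with the usual PG* environment variables (for example PGPORT=5442) and that a database named postgres exists; streaming_standbys and standby_attached are hypothetical helper names.

```shell
# streaming_standbys: print the application_name of every standby that the
# server we connect to reports in the 'streaming' state.
streaming_standbys() {
    psql -At -d postgres -c \
        "SELECT application_name FROM pg_stat_replication WHERE state = 'streaming'"
}

# standby_attached NAME: succeed if a standby with that application_name
# is currently streaming.
standby_attached() {
    streaming_standbys | grep -qxF -- "$1"
}
```

Run on postgreshot after the switchover, `standby_attached pg2` should succeed, since pg2 is the application_name set in the old primary's recovery.conf.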
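Why must the primary be shut down cleanly in step 2? On shutdown the database first performs a checkpoint and then notifies the WAL sender process that it is about to stop. The WAL sender streams all WAL up to and including that checkpoint to the standby's WAL receiver, and the standby replays it, reaching a state consistent with the primary.
Another point to note: if the primary host crashes and the standby is promoted, is the standby's data fully consistent with the primary's? This environment uses asynchronous streaming replication, so the standby lags the primary. WAL for transactions already committed on the primary may not have been sent to the standby before the host goes down, so promoting an asynchronous standby can lose transactions.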
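How far a standby is behind is usually expressed as a difference between two WAL positions (LSNs) such as 0/50003D0. An LSN is printed as two hexadecimal numbers, the high and low 32 bits of a byte position, so the difference can be computed with plain shell arithmetic. The helpers below are a hypothetical sketch; on a running server (PostgreSQL 10 or later) the built-in function pg_wal_lsn_diff() does the same calculation.

```shell
# lsn_to_bytes LSN: convert an LSN such as 0/50003D0 to a byte position.
# The part before the slash is the high 32 bits, the part after it the low
# 32 bits, both in hexadecimal.
lsn_to_bytes() (
    hi=${1%%/*}
    lo=${1##*/}
    echo $(( (0x$hi << 32) + 0x$lo ))
)

# lsn_diff A B: number of bytes by which LSN A is ahead of LSN B.
lsn_diff() {
    echo $(( $(lsn_to_bytes "$1") - $(lsn_to_bytes "$2") ))
}
```

Taking two positions that appear in the standby log above purely as arithmetic input, `lsn_diff 0/50003D0 0/5000360` prints 112.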
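The section above covered switchover by trigger file. Starting with PostgreSQL 9.1, pg_ctl promote is also supported and is more convenient than the trigger file. The syntax of the promote command is:
pg_ctl promote [-D datadir]
-D specifies the data directory; if it is omitted, the value of the $PGDATA environment variable is used. After the promote command is issued, the running standby leaves recovery mode and becomes a read-write primary.
A switchover with pg_ctl promote follows largely the same steps as the trigger-file method, except that the trigger_file parameter in recovery.conf (step 1) is not needed and step 3 uses pg_ctl promote instead. The steps are:
1) Shut down the primary; the -m fast mode is recommended.
2) Run pg_ctl promote on the standby to promote it. When recovery.conf has been renamed to recovery.done, the standby has become the primary.
3) Turn the old primary into a standby: create recovery.conf in the old primary's $PGDATA directory. If the directory has no recovery.conf, copy one from the template $PGHOME/share/recovery.conf.sample; if it contains recovery.done, rename that file to recovery.conf. Use the same settings as on the old standby, except that the IP in primary_conninfo points to the peer host.
4) Start the old primary and check that the processes on both the primary and the standby are healthy; if they are, the switchover succeeded.
These are the main steps of a pg_ctl promote switchover. They are not demonstrated in this section; the next section, on the pg_rewind tool, includes an example of a switchover with pg_ctl promote.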
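Step 2 of the list above can be scripted together with a check that the server has really left recovery. The following is a sketch under stated assumptions: promote_and_wait is a hypothetical helper, psql is assumed to reach the standby through the usual PG* environment variables (for example PGPORT=5442), and a database named postgres is assumed to exist. Recent pg_ctl versions already wait for promotion to finish, so the polling loop is only an extra confirmation.

```shell
# promote_and_wait DATADIR [TIMEOUT_SECONDS]
# Hypothetical helper: run `pg_ctl promote`, then poll pg_is_in_recovery()
# once per second until it returns "f" (false) or the timeout expires.
# The body runs in a subshell, so "exit" only leaves the function.
promote_and_wait() (
    datadir=$1
    timeout=${2:-30}
    pg_ctl promote -D "$datadir" || exit 1
    i=0
    while [ "$i" -lt "$timeout" ]; do
        if [ "$(psql -At -d postgres -c 'SELECT pg_is_in_recovery()')" = f ]; then
            echo "promoted"
            exit 0
        fi
        sleep 1
        i=$((i + 1))
    done
    echo "timed out waiting for promotion" >&2
    exit 1
)
```

On the standby this would be called as `promote_and_wait "$PGDATA" 60`.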
pg_rewind
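pg_rewind is a very useful data synchronization tool for maintaining streaming replication. The previous section described five main steps for a switchover, of which step 2 shuts down the primary before the standby is promoted. What happens if step 2 is skipped? The example below demonstrates this. The test environment is again one primary and one standby using asynchronous streaming replication, with the database on postgres as the primary and the database on postgreshot as the standby.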
-- Standby recovery.conf configuration: run on postgreshot
The standby's recovery.conf is configured as follows:
[postgres@postgreshot pg11]$ cat recovery.conf | grep -v '^#'
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=192.168.40.130 port=5442 user=replica application_name=pg1' # e.g. 'host=localhost port=5432'
trigger_file = '/home/postgres/pg11/trigger'
[postgres@postgreshot pg11]$
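-- Promote the standby: run on postgreshot
Check the streaming replication status and, once it is confirmed healthy, run the following command on the standby host to promote the standby:
[postgres@postgreshot pg11]$ pg_ctl promote -D $PGDATA
waiting for server to promote.... done
server promoted
[postgres@postgreshot pg11]$
[postgres@postgreshot pg11]$
The standby's database log shows that the database is open and accepting connections, which means the promotion succeeded. Check the database role on postgreshot, as shown below:
[postgres@postgreshot pg11]$ pg_controldata | grep cluster
Database cluster state: in production
[postgres@postgreshot pg11]$
The pg_controldata output also shows that the database on postgreshot has become a primary. The old primary (the database on postgres) is still running at this point. We plan to convert postgres to the standby role; first check its current role, as shown below:
[postgres@postgres pg11]$ pg_controldata | grep cluster
Database cluster state: in production
[postgres@postgres pg11]$
Both clusters now report in production, so from this point their contents can diverge.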
-- After the standby is promoted, create a test table and insert data
postgres=# create table test_1(id int4);
CREATE TABLE
postgres=# insert into test_1(id) select n from generate_series(1,10) n;
INSERT 0 10
postgres=#
-- Stop the original primary: run on postgres
[postgres@postgres pg11]$ pg_controldata | grep cluster
Database cluster state: in production
[postgres@postgres pg11]$
[postgres@postgres pg11]$ pg_ctl stop -m fast -D $PGDATA
2019-03-27 01:10:46.714 EDT [64858] LOG: received fast shutdown request
waiting for server to shut down....2019-03-27 01:10:46.716 EDT [64858] LOG: aborting any active transactions
2019-03-27 01:10:46.717 EDT [64858] LOG: background worker "logical replication launcher" (PID 64865) exited with exit code 1
2019-03-27 01:10:46.718 EDT [64860] LOG: shutting down
2019-03-27 01:10:46.731 EDT [64858] LOG: database system is shut down
done
server stopped
[postgres@postgres pg11]$
-- pg_rewind: run on postgres
[postgres@postgreshot pg11]$ pg_rewind --target-pgdata $PGDATA --source-server='host=192.168.40.131 port=5442 user=replica password=replica'
target server needs to use either data checksums or "wal_log_hints = on"
Failure, exiting
[postgres@postgreshot pg11]$
Note: pg_rewind requires that the cluster was initialized with data checksums enabled (at initdb time) or that wal_log_hints = on is set. Next, set the wal_log_hints parameter on both the primary and the standby and restart the databases.
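The requirement that pg_rewind complained about can be checked in advance from pg_controldata output, which reports both the data page checksum version and the recorded wal_log_hints setting. The sketch below is a hypothetical helper, rewind_precheck, not a PostgreSQL command; it assumes that a checksum version of 0 means checksums are disabled, and it only looks at these two fields.

```shell
# rewind_precheck: read `pg_controldata` output on stdin and succeed if the
# cluster has data checksums enabled or wal_log_hints recorded as "on".
# Hypothetical helper; the body runs in a subshell so variables stay local.
rewind_precheck() (
    out=$(cat)
    checksums=$(printf '%s\n' "$out" |
        sed -n 's/^Data page checksum version:[[:space:]]*//p')
    hints=$(printf '%s\n' "$out" |
        sed -n 's/^wal_log_hints setting:[[:space:]]*//p')
    # A checksum version of 0 is taken to mean "checksums disabled".
    if [ "${checksums:-0}" != 0 ] || [ "$hints" = on ]; then
        echo "ok"
    else
        echo "pg_rewind needs data checksums or wal_log_hints = on" >&2
        exit 1
    fi
)
```

On the old primary this would be run as `pg_controldata "$PGDATA" | rewind_precheck` before attempting pg_rewind.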
[postgres@postgres pg11]$ pg_rewind --target-pgdata $PGDATA --source-server='host=192.168.40.131 port=5442 user=replica password=replica'
servers diverged at WAL location 0/70001E8 on timeline 5
rewinding from last common checkpoint at 0/6000098 on timeline 5
Done!
[postgres@postgres pg11]$
[postgres@postgres pg11]$
Note: pg_rewind succeeded.
-- Adjust the recovery.conf file: run on postgres
[postgres@postgres pg11]$ mv recovery.done recovery.conf
[postgres@postgres pg11]$
[postgres@postgres pg11]$ cat recovery.conf | grep -v '^#'
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=192.168.40.131 port=5442 user=replica application_name=pg2' # e.g. 'host=localhost port=5432'
trigger_file = '/home/postgres/pg11/trigger'
[postgres@postgres pg11]$
-- Start the original primary: run on postgres
[postgres@postgres pg11]$ pg_ctl start -D $PGDATA
waiting for server to start....2019-03-27 01:14:48.028 EDT [66323] LOG: listening on IPv4 address "0.0.0.0", port 5442
2019-03-27 01:14:48.028 EDT [66323] LOG: listening on IPv6 address "::", port 5442
2019-03-27 01:14:48.031 EDT [66323] LOG: listening on Unix socket "/tmp/.s.PGSQL.5442"
2019-03-27 01:14:48.045 EDT [66324] LOG: database system was interrupted while in recovery at log time 2019-03-27 01:08:08 EDT
2019-03-27 01:14:48.045 EDT [66324] HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
2019-03-27 01:14:48.084 EDT [66324] LOG: entering standby mode
2019-03-27 01:14:48.089 EDT [66324] LOG: redo starts at 0/6000060
2019-03-27 01:14:48.091 EDT [66324] LOG: invalid record length at 0/7024C98: wanted 24, got 0
2019-03-27 01:14:48.096 EDT [66331] LOG: started streaming WAL from primary at 0/7000000 on timeline 6
2019-03-27 01:14:48.109 EDT [66324] LOG: consistent recovery state reached at 0/7024CD0
2019-03-27 01:14:48.110 EDT [66323] LOG: database system is ready to accept read only connections
done
server started
[postgres@postgres pg11]$
[postgres@postgres pg11]$ pg_controldata | grep cluster
Database cluster state: in archive recovery
[postgres@postgres pg11]$
-- Verify the data: run on postgres
[postgres@postgres pg11]$ p
psql (11.1)
Type "help" for help.
postgres=# select count(*) from test_1;
 count
-------
    10
(1 row)
postgres=#
Note: pg_rewind succeeded. The original primary now starts in the standby role, and the table test_1 has been replicated to it.
The basic idea is to copy everything from the new cluster to the old cluster, except for the blocks that we know to be the same.
1) Scan the WAL log of the old cluster, starting from the last checkpoint before the point where the new cluster's timeline history forked off from the old cluster. For each WAL record, make a note of the data blocks that were touched. This yields a list of all the data blocks that were changed in the old cluster, after the new cluster forked off.
2) Copy all those changed blocks from the new cluster to the old cluster.
3) Copy all other files like clog, conf files etc. from the new cluster to old cluster. Everything except the relation files.
4) Apply the WAL from the new cluster, starting from the checkpoint created at failover. (Strictly speaking, pg_rewind doesn't apply the WAL, it just creates a backup label file indicating that when PostgreSQL is started, it will start replay from that checkpoint and apply all the required WAL.)
Addendum: pitfalls when setting up a PostgreSQL primary and standby
1: Socket path problem. The connection fails with an error that names a Unix socket path.
Fix: in postgresql.conf, set unix_socket_directories to the directory named in that error, then restart the server.
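2: When building a standby, its data directory must come from a base backup of the primary. The base backup can be taken with pg_basebackup, or by packing the primary's data directory with tar and unpacking it on the standby.
If the standby has already been initialized by mistake, delete everything under its data directory and start it again from a base backup of the primary.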
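Re-creating a wrongly initialized standby from the primary can be sketched with pg_basebackup as follows. This is a destructive, hypothetical helper (rebuild_standby is not a PostgreSQL command): it deletes the given data directory, so it first refuses to run while a server is still using it. The options shown (-X stream to include the required WAL, -R to write a recovery.conf, -P for progress) are standard pg_basebackup options; the host, port, and user are whatever the caller passes in.

```shell
# rebuild_standby DATADIR PRIMARY_HOST PORT USER
# Hypothetical helper: wipe DATADIR and refill it from a base backup of the
# primary. Destructive: everything in DATADIR is removed.
rebuild_standby() (
    datadir=$1 host=$2 port=$3 user=$4
    # `pg_ctl status` exits with 0 only when a server is running there.
    if pg_ctl status -D "$datadir" >/dev/null 2>&1; then
        echo "server in $datadir is running; stop it first" >&2
        exit 1
    fi
    # ${datadir:?} aborts if the variable is empty, so this can never
    # expand to a bare "rm -rf". pg_basebackup recreates the directory.
    rm -rf -- "${datadir:?}"
    pg_basebackup -h "$host" -p "$port" -U "$user" -D "$datadir" -X stream -R -P
)
```

With the addresses used earlier in this article, the call on the standby would look like `rebuild_standby "$PGDATA" 192.168.40.130 5442 replica`; the generated recovery.conf should still be reviewed before the standby is started.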
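3: The standby fails at startup with an error such as FATAL: no pg_hba.conf entry for replication connection from host "172.20.0.16", user "repl".
For example, take a master at IP *.1, a standby at IP *.2, and a replication account named repl.
The author reports that this entry alone in pg_hba.conf on the master was not enough:
host replication repl *.2 md5
and that the error went away after adding this entry before it:
host all all *.2 md5
The author's explanation is that the standby must first be able to reach the primary before the repl account may be used for replication. Note, however, that in pg_hba.conf the database value all does not match replication connections, so the replication entry is the one that has to match. Check that its address (written with a netmask, for example a /32 suffix) and its user agree with the host and user named in the error, and reload the configuration after editing.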