MySQL：如何快速的查看Innodb數據文件

發布時間：2020-08-09 13:22:09 來源：ITPUB博客閱讀：206 作者：gaopengtttt 欄目：MySQL數據庫

歡迎關注我的《深入理解MySQL主從原理 32講》，如下：

MySQL：如何快速的查看Innodb數據文件
如果圖片不能顯示可查看下面鏈接：
https://www.jianshu.com/p/d636215d767f

注意：本文輸出格式全是16進制格式。

經常有朋友問我一些如何查看Innodb數據文件的問題比如：

如果我是UTF8字符集，如果插入字符‘a’到底占用幾個字節？
主鍵和普通索引葉子節點的行數據在存儲上有哪些區別？
如何證明rowid的存在？
數據中的NULL值如何存儲的？
char和varchar在存儲上的區別？
……

如果要得到答案除了學習源碼，可能更加直觀的方式就是查看Innodb的ibd數據文件了，俗話說得好“眼見為實”，但是我們知道數據文件是二進制形式的，Innodb通過既定的訪問方式解析出其中的格式得到正確的結果。如果我們要去訪問這些ibd文件，通常的方式就是可以通過hexdump -Cv這樣的命令進行二進制的訪問，最初我也是這樣訪問的，但是看起來眼睛特別難受。因此我寫了2個工具：

innblock：一個用于解析數據塊的工具，能夠得到每行的偏移量，并且按照邏輯和物理順序排序。詳細使用方式可以參考 https://www.jianshu.com/p/5c1a99614fb8
下載地址： https://github.com/gaopengcarl/innblock 除了代碼我已經編譯好了直接使用即可
bcview：一個小工具，用于將數據文件按照既定的大小（比如16K）分塊，然后訪問每個塊的偏移量后指定的字節數，通常我們并不知道記錄到底多長，可以設置一個較大的查看字節數。
下載地址： https://github.com/gaopengcarl/bcview 除了代碼我已經編譯好了直接使用即可

有了這兩工具可能訪問ibd數據文件就更加方便一些了，下面我就使用這兩個工具來進行數據文件的查看，來解決開頭我們提出的這些問題。

一、行結構簡述

本文無意解釋詳細的Innodb文件結構，這樣的文章和書籍很多，比如：

https://blog.jcole.us/innodb/

整個系列都是講解Innodb文件結構的，我們只需要知道普通數據文件，除去塊級別的開銷后，其第一行記錄從偏移量96 開始，首先出現的是2個偽列 infimum 和 supremum，它們的位置固定在塊的94-120字節，其中94-107為infimum 相關信息，而107到120為supremum相關信息，分別的heap no 為 0和1，它們是邏輯記錄的開始和結尾，所有的實際的記錄都鏈接在這一條鏈表上。

其中普通記錄的大概格式如下：

MySQL：如何快速的查看Innodb數據文件

我暫且將黃色部分稱為‘行頭’，圖中用粉紅色標記的innblock每行數據offset的位置， 我們發現innblock工具指向的是行頭以后實際字段開啟的位置。

下面是一個innblock工具典型的部分輸出：

-----Total used rows:3 used rows list(logic):
(1) INFIMUM record offset:99 heapno:0 n_owned 1,delflag:N minflag:0 rectype:2
(2) normal record offset:128 heapno:2 n_owned 0,delflag:N minflag:0 rectype:0
(3) SUPREMUM record offset:112 heapno:1 n_owned 2,delflag:N minflag:0 rectype:3
-----Total used rows:3 used rows list(phy):
(1) INFIMUM record offset:99 heapno:0 n_owned 1,delflag:N minflag:0 rectype:2
(2) SUPREMUM record offset:112 heapno:1 n_owned 2,delflag:N minflag:0 rectype:3
(3) normal record offset:128 heapno:2 n_owned 0,delflag:N minflag:0 rectype:0

我們可以找到一行除了infimum和 supremum記錄以外的normal記錄，并且標記了這樣記錄字段的起點（offset:128），也就是圖中的粉紅色部分，但是需要注意的是聚集索引（表本身）而言，如果沒有主鍵前面3列分別為：

rowid 6字節
trx id 6字節
roll ptr 7字節

如果存在主鍵則為：

主鍵和定義有關
trx id 6字節
roll ptr 7字節

關于rowid\trx id\roll ptr的源碼中的定義如下：

#define    DATA_ROW_ID    0    /* row id: a 48-bit integer */
#define DATA_ROW_ID_LEN    6    /* stored length for row id */
#define DATA_TRX_ID    1    /* transaction id: 6 bytes */
#define DATA_TRX_ID_LEN    6
#define    DATA_ROLL_PTR    2    /* rollback data pointer: 7 bytes */
#define DATA_ROLL_PTR_LEN 7

而roll ptr的具體含義可以參考函數trx_undo_decode_roll_ptr如下：

/***********************************************************************//**
Decodes a roll pointer. */ 
//從高位到低位依次是  
//第1位是否是insert 
//第2到8位是segmentid
//第9到40位為page no 
//第41位到56位為OFFSET
UNIV_INLINE
void
trx_undo_decode_roll_ptr(
/*=====================*/
    roll_ptr_t  roll_ptr,   /*!< in: roll pointer */
    ibool*      is_insert,  /*!< out: TRUE if insert undo log */
    ulint*      rseg_id,    /*!< out: rollback segment id */
    ulint*      page_no,    /*!< out: page number */
    ulint*      offset)     /*!< out: offset of the undo
                    entry within page */
{
...
    ut_ad(roll_ptr < (1ULL << 56));
    *offset = (ulint) roll_ptr & 0xFFFF; //獲取低16位 為OFFSET
    roll_ptr >>= 16; //右移16位
    *page_no = (ulint) roll_ptr & 0xFFFFFFFF;//獲取32位為 page no
    roll_ptr >>= 32;//右移32位
    *rseg_id = (ulint) roll_ptr & 0x7F;//獲取7位為segment id
    roll_ptr >>= 7;//右移7位
    *is_insert = (ibool) roll_ptr; /* TRUE==1 *///最后一位
}

二、建立測試表

為了解決文中開頭的幾個問題，我們來建立測試表如下：

drop table baguait1;
create table baguait1(id int primary key,c1 varchar(20) ,c2 varchar(20),c3 char(20)) ENGINE=InnoDB DEFAULT CHARSET=utf8;
alter table baguait1 add key(c1);
insert into baguait1 values(1,NULL,'gaopeng','gaopeng');
insert into baguait1 values(2,'gaopeng',NULL,'gaopeng');
insert into baguait1 values(3,'gaopeng',NULL,NULL);
insert into baguait1 values(4,'a',NULL,NULL);
mysql> select * from baguait1;
+----+---------+---------+---------+
| id | c1      | c2      | c3      |
+----+---------+---------+---------+
|  1 | NULL    | gaopeng | gaopeng |
|  2 | gaopeng | NULL    | gaopeng |
|  3 | gaopeng | NULL    | NULL    |
|  4 | a       | NULL    | NULL    |
+----+---------+---------+---------+
4 rows in set (0.01 sec)

我們發現這里實際上除了rowid問題還不能包含，其他都包含了，接下來我們使用innblock進行掃描。如下：

1、掃描數據文件找到主鍵和普通索引數據塊

[root@gp1 test]# ./innblock baguait1.ibd scan 16
···
Datafile Total Size:114688
===INDEX_ID:323
level0 total block is (1)
block_no:         3,level:   0|*|
===INDEX_ID:324
level0 total block is (1)
block_no:         4,level:   0|*|

這里實際上323就是聚集索引，324就是普通索引，它們數據塊對應是3和4。

2、掃描聚集索引記錄

[root@gp1 test]# ./innblock baguait1.ibd 3 16
鏈表部分：
==== Block list info ====
-----Total used rows:6 used rows list(logic):
(1) INFIMUM record offset:99 heapno:0 n_owned 1,delflag:N minflag:0 rectype:2
(2) normal record offset:128 heapno:2 n_owned 0,delflag:N minflag:0 rectype:0
(3) normal record offset:180 heapno:3 n_owned 0,delflag:N minflag:0 rectype:0
(4) normal record offset:231 heapno:4 n_owned 0,delflag:N minflag:0 rectype:0
(5) normal record offset:262 heapno:5 n_owned 0,delflag:N minflag:0 rectype:0
(6) SUPREMUM record offset:112 heapno:1 n_owned 5,delflag:N minflag:0 rectype:3
-----Total used rows:6 used rows list(phy):
(1) INFIMUM record offset:99 heapno:0 n_owned 1,delflag:N minflag:0 rectype:2
(2) SUPREMUM record offset:112 heapno:1 n_owned 5,delflag:N minflag:0 rectype:3
(3) normal record offset:128 heapno:2 n_owned 0,delflag:N minflag:0 rectype:0
(4) normal record offset:180 heapno:3 n_owned 0,delflag:N minflag:0 rectype:0
(5) normal record offset:231 heapno:4 n_owned 0,delflag:N minflag:0 rectype:0
(6) normal record offset:262 heapno:5 n_owned 0,delflag:N minflag:0 rectype:0

3、掃描普通索引記錄
```
[root @gp1 test]# ./innblock baguait1.ibd 4 16

鏈表部分：
==== Block list info ====
——-Total used rows:6 used rows list(logic):
(1) INFIMUM record offset:99 heapno:0 n_owned 1,delflag:N minflag:0 rectype:2
(2) normal record offset:126 heapno:2 n_owned 0,delflag:N minflag:0 rectype:0
(3) normal record offset:173 heapno:5 n_owned 0,delflag:N minflag:0 rectype:0
(4) normal record offset:137 heapno:3 n_owned 0,delflag:N minflag:0 rectype:0
(5) normal record offset:155 heapno:4 n_owned 0,delflag:N minflag:0 rectype:0
(6) SUPREMUM record offset:112 heapno:1 n_owned 5,delflag:N minflag:0 rectype:3
——-Total used rows:6 used rows list(phy):
(1) INFIMUM record offset:99 heapno:0 n_owned 1,delflag:N minflag:0 rectype:2
(2) SUPREMUM record offset:112 heapno:1 n_owned 5,delflag:N minflag:0 rectype:3
(3) normal record offset:126 heapno:2 n_owned 0,delflag:N minflag:0 rectype:0
(4) normal record offset:137 heapno:3 n_owned 0,delflag:N minflag:0 rectype:0
(5) normal record offset:155 heapno:4 n_owned 0,delflag:N minflag:0 rectype:0
(6) normal record offset:173 heapno:5 n_owned 0,delflag:N minflag:0 rectype:0


我們發現不管聚集索引還是普通索引均包含了4條普通記錄，并且采集到了記錄的偏移量，我們需要注意一下這里普通索引的邏輯鏈表順序中我們可以看到第4條記錄（offset:173 ）已經排到了第3位，實際上它是最后插入的，這是因為‘a’字符的順序肯定是在‘gaopeng’這個字符串之前的。并且第一行記錄C1為NULL它在邏輯鏈表順序中依然是在第一位。好了下面我們就來將問題逐一解決。
###三、如果我是UTF8字符集，如果插入字符‘a’到底占用幾個字節 ？
在我們的語句中我們最后一條記錄插入的數據就是'a'，即：

| 4 | a | NULL | NULL |


我們使用bcview來查看一下聚集索引 （offset 262 ）的數據是啥，我們直接從塊3的（offset 262 ）后查看20個字節，如下：

[root @gp1 test]# ./bcview baguait1.ibd 16 262 30|grep 00000003
current block:00000003—Offset:00262—cnt bytes:30—data is:8000000400000005d970e000000043011061000000000000000000000000


我們來解析一樣：
- 80000004：主鍵4，8是符號位
- 400000005d970：trx id 6字節
- e0000000430110：undo ptr 7字節
- 61：字符‘a’，ASCII編碼
我們發現后面都是0了，實際上字符‘a’即便在UTF8字符下也只是占用一個字節而已。
###四、主鍵和普通索引葉子節點的行數據在存儲上有哪些區別？
下面我先總結一下：
- 主鍵會包含全部的字段，普通索引只會包含它定義的字段內容
- 主鍵會包含trx id和roll ptr，普通索引不會包含
- 即便不定義主鍵也會包含一個根據rowid排列的聚集索引，很明顯如果不定義普通索引則不會存在
- 普通索引葉子結點包含了主鍵或者rowid
下面我們驗證一下，我們來觀察第2行數據，即：


在主鍵上這條記錄存在于（offset:180）中，在普通索引這條記錄存在于（offset:137）中，下面我們分別解析：
>主鍵（block 3 offset 180 ）：

[root @gp1 test]# ./bcview baguait1.ibd 16 180 50|grep 00000003
current block:00000003—Offset:00180—cnt bytes:50—data is:8000000200000005d96adc00000042011067616f70656e6767616f70656e6720202020202020202020202020070600002000

解析一下：
- 80000002：主鍵
- 00000005d96a：trx id 6字節
- dc000000420110：undo ptr 7字節
- 67616f70656e67：第二個字段的‘gaopeng’的ASCII編碼
- 67616f70656e6720202020202020202020202020：第四個字段的‘gaopeng’的ASCII編碼，并且因為是char(20)類型因此出現了0X20補足的情況，這實際上也解決了第5個問題，我們可以實實在在的看到這種補足操作，占用了更多的空間。
這里我們發現這條記錄沒有第三個字段，因為其為NULL，其包含在NULL位圖中，后面我們會說明。
>普通索引（block 4 offset 137 ）：

[root @gp1 test]# ./bcview baguait1.ibd 16 137 20|grep 00000004
current block:00000004—Offset:00137—cnt bytes:20—data is:67616f70656e67800000020700000020ffd56761


解析如下：
- 67616f70656e67：‘gaopeng’的ASCII編碼
- 80000002：主鍵值2
后面的內容是下一行的行頭了，這一點如果不確定可以看看最后一行，最后一行的位置是（offset:173）查看如下：

[root @gp1 test]# ./bcview baguait1.ibd 16 173 20|grep 00000004
current block:00000004—Offset:00173—cnt bytes:20—data is:6180000004000000000000000000000000000000

解析為：
- 61：‘a’的ASCII編碼
- 80000004：主鍵值4
后面是0了，我們這里可以看到沒有trx id和roll ptr，除了鍵值以外普通索引還包含了主鍵。
###五、char和varchar在存儲上的區別？
這一點我在上面已經說了，下面我們還是以第二行數據為例：


其中第1個‘gaopeng’是varchar（20）第2個‘gaopeng’是char（20）下面是他們的存儲方式：
- 67616f70656e67：第二個字段的‘gaopeng’的ASCII編碼
- 67616f70656e6720202020202020202020202020：第四個字段的‘gaopeng’的ASCII編碼，并且因為是char(20)類型因此出現了0X20補足20字節的情況，我們可以實實在在的看到這種補足操作，占用了更多的空間。
不再過多熬述
###六、數據中的NULL值如何存儲的？
這一點還記得‘行頭’的NULL位圖嗎？實際上這個位圖會為每一個可以為NULL的字段預留1位的空間，用于標記是否字段的值為NULL，當然至少1字節（8位）。

+——+————-+————-+————-+
| id | c1 | c2 | c3 |
+——+————-+————-+————-+
| 1 | NULL | gaopeng | gaopeng |
| 2 | gaopeng | NULL | gaopeng |
| 3 | gaopeng | NULL | NULL |
| 4 | a | NULL | NULL |
+——+————-+————-+————-+


c1\c2\c3均可以為空，因此我們分別訪問4條記錄聚集索引（block 3）上的NULL位圖信息，計算方式如下：
- 第1行：記錄（offset:128）那么128-5（5字節固定）-1（1字節NULL位圖）= 122
- 第2行：記錄（offset:180）那么180-5（5字節固定）-1（1字節NULL位圖）= 174
- 第3行：記錄（offset:231）那么231-5（5字節固定）-1（1字節NULL位圖）= 225
- 第4行：記錄（offset:262）那么262-5（5字節固定）-1（1字節NULL位圖）= 256
好了有了偏移量我們可以使用bcview訪問這1字節的NULL位圖信息了如下：
>第1行
[root@gp1 test]# ./bcview baguait1.ibd 16 122 1 |grep 00000003
current block:00000003--Offset:00122--cnt bytes:01--data is:01
轉換為二進制為：0000 0001
>第2行
[root@gp1 test]# ./bcview baguait1.ibd 16 174 1 |grep 00000003
current block:00000003--Offset:00174--cnt bytes:01--data is:02
轉換為二進制為：0000 0010
>第3行
[root@gp1 test]# ./bcview baguait1.ibd 16 225 1 |grep 00000003
current block:00000003--Offset:00225--cnt bytes:01--data is:06
轉換為二進制為：0000 0110
>第4行
[root@gp1 test]# ./bcview baguait1.ibd 16 256 1 |grep 00000003
current block:00000003--Offset:00256--cnt bytes:01--data is:06
轉換為二進制為：0000 0110
下面就是這種表示方法，1為NULL，我們發現和我們記錄中的NULL記錄一模一樣。
||c3|c2|c1|
|-|-|-|-|
|第1行|0|0|1|
|第2行|0|1|0|
|第3行|1|1|0|
|第4行|1|1|0|
我們DDL修改字段的NULL屬性的時候并不能通過修改數據字典來快速完成，我覺得修改更改ibd文件的實際內容是其中很大的一部分原因。下面是我修改NULL屬性的記錄，具體參考官方文檔。

設置NULL和NOT NULL屬性

都是inplace方式，因為需要修改NULL位圖因此都需要重組，代價較高
ALTER TABLE tbl_name MODIFY COLUMN column_name data_type NULL, ALGORITHM=INPLACE, LOCK=NONE;
ALTER TABLE tbl_name MODIFY COLUMN column_name data_type NOT NULL, ALGORITHM=INPLACE, LOCK=NONE;


###七、如何證明rowid的存在？
這一點實際上也很好證明，我們先來建立一個不包含主鍵并且插入一條記錄如下：

drop table baguait1;
create table baguait1(id int ,c1 varchar(20) ,c2 varchar(20),c3 char(20)) ENGINE=InnoDB DEFAULT CHARSET=utf8;

insert into baguait1 values(1,NULL,’gaopeng’,’gaopeng’);

mysql> select * from baguait1;
+———+———+————-+————-+
| id | c1 | c2 | c3 |
+———+———+————-+————-+
| 1 | NULL | gaopeng | gaopeng |
+———+———+————-+————-+
1 row in set (0.00 sec)


使用innblock掃描發現其只包含了1個塊如下：

[root @gp1 test]# ./innblock baguait1.ibd scan 16
…
Datafile Total Size:98304
===INDEX_ID:325
level0 total block is (1)
block_no: 3,level: 0|*|


然后掃描這個塊如下：

[root @gp1 test]# ./innblock baguait1.ibd 3 16

…
==== Block list info ====
——-Total used rows:3 used rows list(logic):
(1) INFIMUM record offset:99 heapno:0 n_owned 1,delflag:N minflag:0 rectype:2
(2) normal record offset:128 heapno:2 n_owned 0,delflag:N minflag:0 rectype:0
(3) SUPREMUM record offset:112 heapno:1 n_owned 2,delflag:N minflag:0 rectype:3
——-Total used rows:3 used rows list(phy):
(1) INFIMUM record offset:99 heapno:0 n_owned 1,delflag:N minflag:0 rectype:2
(2) SUPREMUM record offset:112 heapno:1 n_owned 2,delflag:N minflag:0 rectype:3
(3) normal record offset:128 heapno:2 n_owned 0,delflag:N minflag:0 rectype:0


然后使用bcview查看（offset:128 ）這條記錄如下：

[root @gp1 test]# ./bcview baguait1.ibd 16 128 60 |grep 00000003
current block:00000003—Offset:00128—cnt bytes:60—data is:000001ac310000000005d97fea0000002c01108000000167616f70656e6767616f70656e672020202020202020202020202000000000000000000000
```

我們來解析一下：

000001ac3100：rowid 6字節，上面的測試中這里是主鍵定義的相關字段值
00000005d97f：trx id 6字節
ea0000002c0110：roll ptr 7字節
80000001：第1個字段值 1
67616f70656e67：第2個字段值 ‘gaopeng’的ASCII編碼
67616f70656e6720202020202020202020202020：第4個字段值‘gaopeng’的ASCII編碼，并且char有0X20補足20字節。

最后：

當然這里只是列舉了一些例子來說明工具的使用方式，可以按照你的需求方便的從ibd文件中提取出你感興趣的信息。

向AI問一下細節

91超碰碰碰碰久久久久久综合_超碰av人澡人澡人澡人澡人掠_国产黄大片在线观看画质优化_txt小说免费全本

MySQL：如何快速的查看Innodb數據文件

一、行結構簡述

二、建立測試表

最后：

猜你喜歡

91超碰碰碰碰久久久久久综合_超碰av人澡人澡人澡人澡人掠_国产黄大片在线观看画质优化_txt小说免费全本

MySQL：如何快速的查看Innodb數據文件

一、行結構簡述

二、建立測試表

最后：

猜你喜歡

最新資訊

相關推薦

相關標簽