I. HDFS
Persistent data structures
1.1 The namenode's directory structure
[root@datanode1 name]# cd /data0/hadoop/dfs/name/current/
[root@datanode1 current]# ls
edits  edits.new  fsimage  fstime  VERSION
[root@datanode1 current]# ls -l
total 56
-rw-rw-r--. 1 hadoop hadoop     789 Jan 15 16:59 edits
-rw-rw-r--. 1 hadoop hadoop 1049088 Jan 15 18:00 edits.new
-rw-rw-r--. 1 hadoop hadoop   14557 Jan 14 18:47 fsimage
-rw-rw-r--. 1 hadoop hadoop       8 Jan 14 18:47 fstime
-rw-rw-r--. 1 hadoop hadoop     100 Jan 14 18:47 VERSION
1.1.2 The VERSION file is a Java properties file that contains information about the version of HDFS that is running
[root@datanode1 current]# cat VERSION
#Thu Jan 14 18:47:15 CST 2016
namespaceID=688384215
cTime=0
storageType=NAME_NODE
layoutVersion=-32
layoutVersion is a negative integer that describes the version of HDFS's persistent data structures (also called the layout). This version number is unrelated to the version of the Hadoop release. Whenever the layout changes, the version number is decremented (for example, -19 follows -18), and HDFS must then be upgraded; otherwise, since the disk still uses the old layout, a newer namenode or datanode cannot work with it.
namespaceID: a unique identifier for the filesystem, set when the filesystem is first formatted.
cTime: marks the creation time of the namenode's storage. For newly formatted storage, this value is 0.
storageType: indicates that this storage directory contains data structures for a namenode.
1.1.3 The filesystem image and edit log
When a filesystem client performs a write operation (such as creating or moving a file), the operation is first recorded in the edit log. The namenode maintains the filesystem metadata in memory; when the edit log is modified, the relevant in-memory metadata is updated as well. The in-memory metadata serves client read requests.
The edit log is flushed and synced after every write operation, before a success code is returned to the client. When the namenode writes to multiple directories, the write must complete on all of them before success is returned, so that no operation can be lost due to a machine failure.
fsimage is a persistent checkpoint of the filesystem metadata. If the namenode fails, the most recent metadata can be reconstructed by loading the fsimage file into memory and then applying each of the operations recorded in the edit log.
The fsimage file contains a serialized form of all the directory and file inodes in the filesystem. Each inode is an internal representation of a file's or directory's metadata. For files, this includes the replication level, modification and access times, access permissions, block size, and the blocks that make up the file; for directories, it includes the modification time, access permissions, and quota metadata.
Blocks are stored on datanodes, but the fsimage file does not record the datanodes on which blocks are stored. Instead, the namenode keeps this block mapping in memory: when a datanode joins the cluster, the namenode asks it for its block list to build the mapping, and it periodically polls the datanodes afterward to keep the mapping up to date.
Running the secondary namenode creates checkpoints of the primary namenode's in-memory filesystem metadata. The checkpointing process works as follows:
(1) The secondary namenode asks the primary to stop using its edits file, so new write operations are temporarily recorded in a new file.
(2) The secondary namenode retrieves the fsimage and edits files from the primary (using HTTP GET).
(3) The secondary namenode loads fsimage into memory, applies each operation from edits, and writes a new consolidated fsimage file.
(4) The secondary namenode sends the new fsimage back to the primary (using HTTP POST).
(5) The primary namenode replaces its old fsimage with the new one received from the secondary, and the old edits file with the new one started in step 1. It also updates the fstime file to record the time the checkpoint was taken.
The schedule for creating checkpoints is controlled by two configuration parameters:
(1) The secondary namenode checkpoints every hour (set by the fs.checkpoint.period property, in seconds).
(2) A checkpoint is also created, even if an hour has not elapsed, when the edit log reaches 64 MB (set by the fs.checkpoint.size property, in bytes). The system checks the edit log size every five minutes.
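As a rough sketch, both triggers can be tuned in core-site.xml; the values shown here are the Hadoop 1.x defaults described above:
<property>
  <name>fs.checkpoint.period</name>
  <value>3600</value>       <!-- seconds between checkpoints (1 hour) -->
</property>
<property>
  <name>fs.checkpoint.size</name>
  <value>67108864</value>   <!-- edit log size, in bytes, that forces a checkpoint (64 MB) -->
</property>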
1.2 The secondary namenode's directory structure
[root@slave-two current]# pwd
/data0/hadoop/dfs/data/current
[root@slave-two current]# cat VERSION
#Fri Jan 15 15:34:22 CST 2016
namespaceID=688384215
storageID=DS-1030151558-10.1.2.216-50010-1452481280886
cTime=0
storageType=DATA_NODE
layoutVersion=-32
If the primary namenode fails (and there is no recent backup, not even on NFS), data can be recovered from the secondary namenode. There are two ways to do this:
(1) Copy the relevant storage directories to the new namenode.
(2) Start the namenode daemon with the -importCheckpoint option, so that the secondary namenode becomes the new primary. With this option, the namenode loads the latest checkpoint from the directory defined by the fs.checkpoint.dir property, but only if the directory defined by the dfs.name.dir property contains no metadata; otherwise the operation fails.
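For example, on the machine that is to become the new primary (a sketch; it assumes dfs.name.dir is empty and fs.checkpoint.dir holds the latest checkpoint):
hadoop namenode -importCheckpoint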
1.3 The datanode's directory structure
A datanode's storage directories are created automatically when it first starts up; they do not need to be formatted separately.
[root@slave-one current]# pwd
/data0/hadoop/dfs/data/current
[root@slave-one current]# ls
blk_-1342046564177101301    blk_-1342046564177101301_1004.meta
blk_-1859875086242295767    blk_-1859875086242295767_1061.meta
blk_253660519371394588      blk_253660519371394588_1014.meta
blk_2653614491429524571     blk_2653614491429524571_1066.meta
blk_3255346014128987307     blk_3255346014128987307_1010.meta
blk_3484901243420393976     blk_3484901243420393976_1067.meta
blk_-350256639016866731     blk_-350256639016866731_1077.meta
blk_-4332403947618992681    blk_-4332403947618992681_1012.meta
blk_-4378222930931288631    blk_-4378222930931288631_1065.meta
blk_5202437766650751967     blk_5202437766650751967_1072.meta
blk_5450455005443823908     blk_5450455005443823908_1076.meta
blk_6996247191717220870     blk_6996247191717220870_1064.meta
blk_7478159877522346339     blk_7478159877522346339_1002.meta
blk_7579826132350507903     blk_7579826132350507903_1080.meta
blk_774901497839428573      blk_774901497839428573_1068.meta
blk_7996063171811697628     blk_7996063171811697628_1013.meta
blk_-8475713792677154223    blk_-8475713792677154223_1063.meta
blk_-9058686418693604829    blk_-9058686418693604829_1062.meta
dncp_block_verification.log.curr    VERSION
[root@slave-one current]# cat VERSION
#Fri Jan 15 15:34:16 CST 2016
namespaceID=688384215
storageID=DS-444750413-10.1.2.215-50010-1452481260852
cTime=0
storageType=DATA_NODE
layoutVersion=-32
The other files in the datanode's current directory all carry the blk_ prefix and come in two types: HDFS block files (containing only raw data) and block metadata files (with the .meta suffix). A block file holds a portion of the raw data of a stored file; a metadata file consists of a header (with version and type information) followed by a sequence of checksums for sections of the block.
When the number of blocks in a directory grows to a certain size, the datanode creates a subdirectory to hold new blocks and their metadata. A subdirectory is created once a directory stores 64 blocks (set by the dfs.datanode.numblocks property).
If the dfs.data.dir property specifies multiple directories on different disks, blocks are written to them in a round-robin fashion. Note that blocks are not replicated across the disks of a single datanode; replicas are only placed on distinct datanodes.
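A sketch of such a configuration in hdfs-site.xml (the two mount points here are hypothetical):
<property>
  <name>dfs.data.dir</name>
  <value>/disk1/hdfs/data,/disk2/hdfs/data</value>
</property>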
2. Safe mode
When the namenode starts, it first loads fsimage into memory and applies the operations in edits. Once the in-memory image of the filesystem metadata has been successfully reconstructed, it creates a new fsimage file (without needing the secondary namenode for this operation) and an empty edit log. Only then does the namenode start listening for RPC and HTTP requests. At this point, however, the namenode is in safe mode, meaning the filesystem is read-only for clients.
Entering and leaving safe mode
[hadoop@slave-one current]$ hadoop dfsadmin -safemode get
Safe mode is ON
The HDFS web UI also shows whether the namenode is in safe mode.
To enter safe mode manually, use -safemode enter (shown below). To make the namenode remain in safe mode indefinitely, set the dfs.safemode.threshold.pct property to a value greater than 1.
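A sketch of that setting in hdfs-site.xml (any value above 1 can never be satisfied, so the namenode never leaves safe mode on its own; 1.5 is an arbitrary example):
<property>
  <name>dfs.safemode.threshold.pct</name>
  <value>1.5</value>
</property>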
[hadoop@slave-one current]$ hadoop dfsadmin -safemode enter
Safe mode is ON
To leave safe mode:
[hadoop@slave-one current]$ hadoop dfsadmin -safemode leave
Safe mode is OFF
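A script that must not touch HDFS until the namenode leaves safe mode can block on -safemode wait, for example (a sketch):
hadoop dfsadmin -safemode wait
# ...commands that read or write HDFS...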
1.4 Tools
1.4.1 The dfsadmin tool
dfsadmin can report status information for HDFS as well as perform administrative operations on it.
hadoop dfsadmin
[hadoop@slave-one current]$ hadoop dfsadmin -help
hadoop dfsadmin is the command to execute DFS administrative commands.
The full syntax is:
hadoop dfsadmin [-report] [-safemode <enter | leave | get | wait>]
[-saveNamespace]
[-refreshNodes]
[-setQuota <quota> <dirname>...<dirname>]
[-clrQuota <dirname>...<dirname>]
[-setSpaceQuota <quota> <dirname>...<dirname>]
[-clrSpaceQuota <dirname>...<dirname>]
[-refreshServiceAcl]
[-refreshUserToGroupsMappings]
[refreshSuperUserGroupsConfiguration]
[-setBalancerBandwidth <bandwidth>]
[-help [cmd]]
-report: Reports basic filesystem information and statistics.
-safemode <enter|leave|get|wait>: Safe mode maintenance command.
Safe mode is a Namenode state in which it
1. does not accept changes to the name space (read-only)
2. does not replicate or delete blocks.
Safe mode is entered automatically at Namenode startup, and
leaves safe mode automatically when the configured minimum
percentage of blocks satisfies the minimum replication
condition. Safe mode can also be entered manually, but then
it can only be turned off manually as well.
-saveNamespace: Save current namespace into storage directories and reset edits log.
Requires superuser permissions and safe mode.
-refreshNodes: Updates the set of hosts allowed to connect to namenode.
Re-reads the config file to update values defined by
dfs.hosts and dfs.host.exclude and reads the
entires (hostnames) in those files.
Each entry not defined in dfs.hosts but in
dfs.hosts.exclude is decommissioned. Each entry defined
in dfs.hosts and also in dfs.host.exclude is stopped from
decommissioning if it has aleady been marked for decommission.
Entires not present in both the lists are decommissioned.
-finalizeUpgrade: Finalize upgrade of HDFS.
Datanodes delete their previous version working directories,
followed by Namenode doing the same.
This completes the upgrade process.
-upgradeProgress <status|details|force>:
request current distributed upgrade status,
a detailed status or force the upgrade to proceed.
-metasave <filename>: Save Namenode's primary data structures
to <filename> in the directory specified by hadoop.log.dir property.
<filename> will contain one line for each of the following
1. Datanodes heart beating with Namenode
2. Blocks waiting to be replicated
3. Blocks currrently being replicated
4. Blocks waiting to be deleted
-setQuota <quota> <dirname>...<dirname>: Set the quota <quota> for each directory <dirName>.
The directory quota is a long integer that puts a hard limit
on the number of names in the directory tree
Best effort for the directory, with faults reported if
1. N is not a positive integer, or
2. user is not an administrator, or
3. the directory does not exist or is a file, or
-clrQuota <dirname>...<dirname>: Clear the quota for each directory <dirName>.
Best effort for the directory. with fault reported if
1. the directory does not exist or is a file, or
2. user is not an administrator.
It does not fault if the directory has no quota.
-setSpaceQuota <quota> <dirname>...<dirname>: Set the disk space quota <quota> for each directory <dirName>.
The space quota is a long integer that puts a hard limit
on the total size of all the files under the directory tree.
The extra space required for replication is also counted. E.g.
a 1GB file with replication of 3 consumes 3GB of the quota.
Quota can also be speciefied with a binary prefix for terabytes,
petabytes etc (e.g. 50t is 50TB, 5m is 5MB, 3p is 3PB).
Best effort for the directory, with faults reported if
1. N is not a positive integer, or
2. user is not an administrator, or
3. the directory does not exist or is a file, or
-clrSpaceQuota <dirname>...<dirname>: Clear the disk space quota for each directory <dirName>.
Best effort for the directory. with fault reported if
1. the directory does not exist or is a file, or
2. user is not an administrator.
It does not fault if the directory has no quota.
-refreshServiceAcl: Reload the service-level authorization policy file
Namenode will reload the authorization policy file.
-refreshUserToGroupsMappings: Refresh user-to-groups mappings
-refreshSuperUserGroupsConfiguration: Refresh superuser proxy groups mappings
-setBalancerBandwidth <bandwidth>:
Changes the network bandwidth used by each datanode during
HDFS block balancing.
<bandwidth> is the maximum number of bytes per second
that will be used by each datanode. This value overrides
the dfs.balance.bandwidthPerSec parameter.
--- NOTE: The new value is not persistent on the DataNode.---
-help [cmd]: Displays help for the given command or all commands if none
is specified.
Generic options supported are
-conf <configuration file> specify an application configuration file
-D <property=value> use value for given property
-fs <local|namenode:port> specify a namenode
-jt <local|jobtracker:port> specify a job tracker
-files <comma separated list of files> specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars> specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives> specify comma separated archives to be unarchived on the compute machines.
The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]
-help: displays help for the given command, or for all commands if none is specified.
[hadoop@slave-one current]$ hadoop dfsadmin -safemode -help
Usage: java DFSAdmin [-safemode enter | leave | get | wait]
-report: shows filesystem statistics (similar to those displayed in the web UI).
-metasave: dumps certain information to a file in Hadoop's log directory, including blocks being replicated or deleted and the list of connected datanodes.
-safemode: changes or queries the state of safe mode.
-saveNamespace: saves the in-memory filesystem image to a new fsimage file and resets the edits file. This operation may be performed only in safe mode.
-refreshNodes: updates the set of datanodes that are permitted to connect to the namenode.
-upgradeProgress: gets information on the progress of an HDFS upgrade, or forces an upgrade to proceed.
-finalizeUpgrade: removes the previous version of the datanodes' and namenode's storage directories. Typically run after an upgrade has completed and the cluster is running successfully on the new version.
-setQuota: sets directory quotas, i.e. limits on the number of files and directories in the tree rooted at a given directory. This is useful for stopping users from creating large numbers of small files, which protects the namenode's memory (information on every file, directory, and block in the filesystem is held in memory). See the examples after this list.
-clrQuota: clears the specified directory quotas.
-setSpaceQuota: sets space quotas on directories, limiting the total size of all the files stored in the directory tree. This is useful for giving each user a limited amount of storage.
-clrSpaceQuota: clears the specified space quotas.
-refreshServiceAcl: refreshes the namenode's service-level authorization policy file.
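For example (the directory and limits here are hypothetical):
hadoop dfsadmin -setQuota 10000 /user/alice      # at most 10,000 names under /user/alice
hadoop dfsadmin -setSpaceQuota 1t /user/alice    # at most 1 TB of disk space, counting replication
hadoop dfsadmin -clrSpaceQuota /user/alice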
1.4.2 The fsck tool
Hadoop provides the fsck utility for checking the health of files in HDFS. It looks for blocks that are missing from all datanodes, as well as under- or over-replicated blocks. Note that fsck retrieves all of its information from the namenode and does not communicate with any datanode, so it never actually retrieves block data.
hadoop fsck /
[root@hadoop1 bin]# ./hadoop fsck /
FSCK started by root from /10.1.2.184 for path / at Tue Jan 19 20:59:55 CST 2016
........................
/data/appstore/chDownloadForPlayer/2016/01/14/00/output/_logs/history/job_201601082048_0706_conf.xml: CORRUPT block blk_-1739242649335851318
....................................................................................................
/data/appstore/chRetainAndFresh/2016/01/14/00/output/_logs/history/job_201601082048_0707_conf.xml: CORRUPT block blk_5175780252882211574
....................................................................................................
FSCK ended at Tue Jan 19 20:59:56 CST 2016 in 1469 milliseconds
Permission denied: user=root, access=READ_EXECUTE, inode=".staging":hadoop:supergroup:rwx------
Fsck on path '/' FAILED
fsck's output covers the following conditions:
Over-replicated blocks
Blocks whose replica count exceeds their target replication. Strictly speaking, this is not a serious problem, as HDFS automatically deletes excess replicas.
Under-replicated blocks
Blocks whose replica count falls below their target replication. HDFS automatically creates new replicas of these blocks until they meet the target; run hadoop dfsadmin -metasave FILE for information on the blocks that are being replicated (or waiting to be replicated).
Misreplicated blocks
Blocks that violate the block replica placement policy. For example, on a multirack cluster with a replication level of three, if all three replicas of a block live on the same rack, the block is misreplicated: replicas should be spread across at least two racks to improve resilience.
Corrupt blocks
Blocks whose replicas are all corrupt. A block with at least one non-corrupt replica is not reported as corrupt; the namenode will replicate the intact replica until the target replication is met.
Missing replicas
Blocks with no replicas anywhere in the cluster.
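When fsck does find corrupt or missing blocks, it provides options for acting on the affected files (a sketch; use -delete only after giving up on recovering the data):
hadoop fsck / -move      # move files with corrupt/missing blocks to /lost+found
hadoop fsck / -delete    # delete the affected files outright
hadoop fsck /path -files -blocks -racks   # show per-file block and rack placement detail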
1.5 The balancer
The balancer program is a Hadoop daemon that redistributes blocks by moving them from busy datanodes to relatively idle ones.
When starting the balancer, the -threshold argument specifies the threshold (as a percentage) that defines what it means for the cluster to be balanced; the default is 10%.
start-balancer.sh
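For example, to consider the cluster balanced once every datanode is within 5% of the cluster's mean utilization (a sketch):
start-balancer.sh -threshold 5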
III. Maintenance
Metadata backups
If the namenode's persistent metadata is lost or damaged, the entire filesystem becomes unusable. One backup method: use a script to periodically archive the secondary namenode's previous.checkpoint subdirectory and store it offsite. Note that this subdirectory lives under the directory defined by the fs.checkpoint.dir property.
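A minimal sketch of such a backup script (the paths and the backup host are hypothetical):
#!/bin/sh
# Archive the secondary namenode's most recent checkpoint to an offsite host.
CHECKPOINT_DIR=/data0/hadoop/dfs/namesecondary/previous.checkpoint   # under fs.checkpoint.dir
STAMP=$(date +%Y%m%d%H%M)
tar czf /tmp/nn-meta-$STAMP.tar.gz -C "$CHECKPOINT_DIR" .
scp /tmp/nn-meta-$STAMP.tar.gz backup-host:/backups/hdfs-metadata/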
Data backups
distcp is an ideal backup tool: its parallel file-copying capability can copy backup files to another HDFS cluster.
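For example, to copy a directory tree to a second cluster (a sketch; the namenode addresses are hypothetical):
hadoop distcp hdfs://namenode1/data hdfs://namenode2/backups/data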
3. Adding new nodes
3.1 Commissioning new nodes
(1) Configure hdfs-site.xml to point to the namenode.
(2) Configure mapred-site.xml to point to the jobtracker.
(3) Start the datanode and tasktracker daemons.
Note: all datanodes that are permitted to connect to the namenode are listed in a file whose name is set by the dfs.hosts property. The file resides on the namenode's local filesystem, with one line per datanode network address. (To specify multiple network addresses for a datanode, put them on one line, separated by spaces.) Typically, the nodes in a cluster run both a datanode and a tasktracker, so dfs.hosts and mapred.hosts point to the same shared file, referred to as the include file.
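A sketch of the corresponding configuration (the include file path is hypothetical):
<!-- hdfs-site.xml -->
<property>
  <name>dfs.hosts</name>
  <value>/etc/hadoop/include</value>
</property>
<!-- mapred-site.xml -->
<property>
  <name>mapred.hosts</name>
  <value>/etc/hadoop/include</value>
</property>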
3.2 The file (or files) specified by the dfs.hosts and mapred.hosts properties is different from the slaves file
The former is used by the namenode and jobtracker to determine which worker nodes may connect.
The latter, the slaves file, is used by the Hadoop control scripts to perform cluster-wide operations, such as restarting the cluster.
3.3 Steps for adding new nodes to the cluster
(1) Add the network addresses of the new nodes to the include file.
(2) Update the namenode with the new set of audited datanodes:
hadoop dfsadmin -refreshNodes
(3) Update the jobtracker with the new set of audited tasktrackers:
hadoop mradmin -refreshNodes
(4) Update the slaves file with the new nodes, so that the Hadoop control scripts include them in future operations.
(5) Start the new datanodes and tasktrackers.
(6) Check that the new datanodes and tasktrackers appear in the web UI.
4. Decommissioning old nodes
4.1 The user tells the namenode which datanodes are to be retired; before taking them offline, Hadoop replicates the blocks on those datanodes to other datanodes.
4.2 The HDFS include and exclude files
Node in include file? | Node in exclude file? | Interpretation
----------------------|-----------------------|---------------
No  | No  | Node may not connect
No  | Yes | Node may not connect
Yes | No  | Node may connect
Yes | Yes | Node may connect and will be decommissioned
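The exclude file is named by the analogous dfs.hosts.exclude and mapred.hosts.exclude properties (a sketch; the path is hypothetical):
<!-- hdfs-site.xml -->
<property>
  <name>dfs.hosts.exclude</name>
  <value>/etc/hadoop/exclude</value>
</property>
<!-- mapred-site.xml -->
<property>
  <name>mapred.hosts.exclude</name>
  <value>/etc/hadoop/exclude</value>
</property>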
4.3 Steps for removing nodes from the cluster
(1) Add the network addresses of the nodes to be decommissioned to the exclude file. Do not update the include file.
(2) Update the namenode with the new set of permitted datanodes:
hadoop dfsadmin -refreshNodes
(3) Update the jobtracker with the new set of permitted tasktrackers:
hadoop mradmin -refreshNodes
(4) Go to the web UI and check whether the admin state of the datanodes being decommissioned has changed to "Decommission In Progress". They will begin copying their blocks to other datanodes.
(5) When the state of all the datanodes changes to "Decommissioned", all their blocks have been replicated. Shut down the decommissioned nodes.
(6) Remove the nodes from the include file, then run:
hadoop dfsadmin -refreshNodes
hadoop mradmin -refreshNodes
(7) Remove the nodes from the slaves file.