Contents
1. Definition of regular expressions
2. Extended regular expression metacharacters
3. Text processors
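A regular expression is usually abbreviated in code as regex, regexp, or RE. It uses a single string to describe and match a set of strings that follow a given syntactic rule. Put simply, it is a method for matching strings: with a few special symbols you can quickly find, delete, or replace specific strings.
A regular expression is a text pattern made up of ordinary characters and metacharacters. The pattern describes one or more strings to match when searching text; it acts as a template that is compared against the string being searched. Ordinary characters include upper- and lowercase letters, digits, punctuation, and some other symbols. Metacharacters are characters with a special meaning inside a regular expression; they can specify how the preceding character (the one immediately before the metacharacter) may occur in the target text.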
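To illustrate the difference between an ordinary character and a metacharacter, the sketch below matches the same text with an unescaped and an escaped dot. The two sample lines are made up for this example and are not part of test.txt.

```shell
# Sample input (hypothetical data, not taken from test.txt).
sample=$(printf '%s\n' 'PI=3.14' 'PI=3x14')

# Unescaped ".": a metacharacter that matches any single character,
# so both lines match.
any_count=$(printf '%s\n' "$sample" | grep -c '3.14')

# Escaped "\.": an ordinary, literal dot, so only the first line matches.
literal_count=$(printf '%s\n' "$sample" | grep -c '3\.14')

echo "any=$any_count literal=$literal_count"   # any=2 literal=1
```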
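Regular expression syntax comes in two flavors that differ in strictness and features: basic regular expressions and extended regular expressions. Basic regular expressions are the most fundamental part of the commonly used syntax. Among the common file-processing tools on Linux, grep and sed support basic regular expressions, while egrep and awk support extended regular expressions.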
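A minimal sketch of how the two flavors differ in practice, assuming a grep that accepts -E (the option form equivalent to egrep). The sample words are invented for the illustration.

```shell
# Sample input (hypothetical data).
sample=$(printf '%s\n' 'wd' 'wod' 'wood' 'woood')

# Basic syntax (plain grep): interval braces are escaped as \{ \}.
bre=$(printf '%s\n' "$sample" | grep 'wo\{2\}d')

# Extended syntax (grep -E): braces are written bare, and operators
# such as + need no backslash.
ere=$(printf '%s\n' "$sample" | grep -E 'wo{2}d')
ere_plus_count=$(printf '%s\n' "$sample" | grep -Ec 'wo+d')

echo "BRE: $bre"                    # BRE: wood
echo "ERE: $ere"                    # ERE: wood
echo "ERE + count: $ere_plus_count" # ERE + count: 3
```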
Prepare a test file named test.txt in advance with the following content:
[root@centos01 ~]# vim test.txt
he was short and fat.
He was wearing a blue polo shirt with black pants.
The home of Football on BBC Sport online.
the tongue is boneless but it breaks bones.12!
google is the best tools for search keyword.
The year ahead will test our political establishment to the limit.
PI=3.14148223023840-2382924893980--2383892948
a wood cross!
Actions speak louder than words
#wooood #
#woooood #
AxyzxyzxyzxyzxyzC
I bet this place is really spooky late at night!
Misfortunes never come alone/single.
I shouldn't have lett so tast.
[root@centos01 ~]# grep -n 'the' test.txt <!--Find a specific string; -n shows line numbers-->
4:the tongue is boneless but it breaks bones.12!
5:google is the best tools for search keyword.
6:The year ahead will test our political establishment to the limit.
[root@centos01 ~]# grep -in 'the' test.txt <!--Find a specific string; -i ignores case, -n shows line numbers-->
3:The home of Football on BBC Sport online.
4:the tongue is boneless but it breaks bones.12!
5:google is the best tools for search keyword.
6:The year ahead will test our political establishment to the limit.
[root@centos01 ~]# grep -vn 'the' test.txt <!--Find lines that do not contain the string; -v inverts the match, -n shows line numbers-->
1:he was short and fat.
2:He was wearing a blue polo shirt with black pants.
3:The home of Football on BBC Sport online.
7:PI=3.14148223023840-2382924893980--2383892948
8:a wood cross!
9:Actions speak louder than words
10:
11:
12:#wooood #
13:#woooood #
14:AxyzxyzxyzxyzxyzC
15:I bet this place is really spooky late at night!
16:Misfortunes never come alone/single.
17:I shouldn't have lett so tast.
[root@centos01 ~]# grep -n 'sh[io]rt' test.txt <!--Square brackets define a character set.
However many characters "[]" contains, it matches exactly one character,
so "[io]" matches either "i" or "o"-->
1:he was short and fat.
2:He was wearing a blue polo shirt with black pants.
[root@centos01 ~]# grep -n 'oo' test.txt <!--Find lines containing two consecutive "o" characters-->
3:The home of Football on BBC Sport online.
5:google is the best tools for search keyword.
8:a wood cross!
12:#wooood #
13:#woooood #
15:I bet this place is really spooky late at night!
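[root@centos01 ~]# grep -n '[^w]oo' test.txt <!--Find "oo" preceded by any character other than "w",
using the negated set "[^...]"; the "#woo..." lines still match because their "oo" can be preceded by another "o"-->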
3:The home of Football on BBC Sport online.
5:google is the best tools for search keyword.
12:#wooood #
13:#woooood #
15:I bet this place is really spooky late at night!
[root@centos01 ~]# grep -n '[^a-z]oo' test.txt <!--Find "oo" preceded by a character that is not a lowercase letter-->
3:The home of Football on BBC Sport online.
[root@centos01 ~]# grep -n '[0-9]' test.txt <!--Find lines that contain a digit-->
4:the tongue is boneless but it breaks bones.12!
7:PI=3.14148223023840-2382924893980--2383892948
[root@centos01 ~]# grep -n '^the' test.txt <!--Find lines that begin with "the"-->
4:the tongue is boneless but it breaks bones.12!
[root@centos01 ~]# grep -n '^[a-z]' test.txt <!--Find lines that begin with a lowercase letter-->
1:he was short and fat.
4:the tongue is boneless but it breaks bones.12!
5:google is the best tools for search keyword.
8:a wood cross!
[root@centos01 ~]# grep -n '^[A-Z]' test.txt <!--Find lines that begin with an uppercase letter-->
2:He was wearing a blue polo shirt with black pants.
3:The home of Football on BBC Sport online.
6:The year ahead will test our political establishment to the limit.
7:PI=3.14148223023840-2382924893980--2383892948
9:Actions speak louder than words
14:AxyzxyzxyzxyzxyzC
15:I bet this place is really spooky late at night!
16:Misfortunes never come alone/single.
17:I shouldn't have lett so tast.
[root@centos01 ~]# grep -n '^[^a-zA-Z]' test.txt <!--Find lines that do not begin with a letter-->
12:#wooood #
13:#woooood #
[root@centos01 ~]# grep -n 'w..d' test.txt <!--"." matches any single character: find "w", any two characters, then "d"-->
5:google is the best tools for search keyword.
8:a wood cross!
9:Actions speak louder than words
[root@centos01 ~]# grep -n 'ooo*' test.txt <!--"*" repeats the preceding character zero or more times: find strings with at least two consecutive "o" characters-->
3:The home of Football on BBC Sport online.
5:google is the best tools for search keyword.
8:a wood cross!
11:#woood #
13:#woooooood #
19:I bet this place is really spooky late at night!
[root@centos01 ~]# grep -n 'woo*d' test.txt <!--Find strings that start with "w", end with "d", and have at least one "o" in between-->
8:a wood cross!
11:#woood #
13:#woooooood #
[root@centos01 ~]# grep -n '[0-9][0-9]*' test.txt <!--Find lines containing a run of one or more digits-->
4:the tongue is boneless but it breaks bones.12!
7:PI=3.141592653589793238462643383249901429
[root@centos01 ~]# grep -n 'o\{2\}' test.txt <!--Find two consecutive "o" characters using the interval "\{\}"-->
3:The home of Football on BBC Sport online.
5:google is the best tools for search keyword.
8:a wood cross!
11:#woood #
13:#woooooood #
19:I bet this place is really spooky late at night!
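The line-start anchor "^" used above has an end-of-line counterpart, "$". The sketch below shows it on a few made-up lines (assumed sample data, not test.txt): once with an escaped dot to find lines ending in a period, and once as "^$" to find empty lines.

```shell
# Sample input (hypothetical data): one line ending in a period,
# one ending in "!", one empty line, and one plain line.
sample=$(printf '%s\n' 'he was short and fat.' 'a wood cross!' '' 'last line')

# "\.$": a literal dot immediately before the end of the line.
dot_end=$(printf '%s\n' "$sample" | grep -n '\.$')

# "^$": start of line immediately followed by end of line, i.e. an empty line.
empty=$(printf '%s\n' "$sample" | grep -n '^$')

echo "$dot_end"   # 1:he was short and fat.
echo "$empty"     # 3:
```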
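Linux/UNIX systems include many text processors and text editors, such as the Vim editor and grep. grep, sed, and awk in particular are used constantly in shell programming and are known as the "three swordsmen" of shell programming.
sed (Stream EDitor) is a powerful yet simple tool for parsing and transforming text. It reads text, edits it according to the specified conditions (delete, replace, add, move, and so on), and finally outputs either all lines or only the lines it processed. sed can carry out fairly complex text processing without any interaction, so it is widely used in shell scripts to automate all kinds of tasks.
The sed workflow consists of three steps: read, execute, and display.
- Read: sed reads one line from the input stream (a file, a pipe, or standard input) and stores it in a temporary buffer, also called the pattern space.
- Execute: by default, all sed commands run in order on the pattern space. Unless a line address is given, the commands are applied to every line in turn.
- Display: the modified content is sent to the output stream. After the data has been sent, the pattern space is cleared. These steps repeat until all of the input has been processed.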
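The read, execute, and display cycle described above can be imitated with a plain shell loop. This is only an illustration of the idea for the single command s/o/0/, not how sed is actually implemented; the function name emulate_sed and the sample lines are made up for this sketch.

```shell
# A rough emulation of sed's cycle for the script 's/o/0/' (illustration only).
emulate_sed() {
    # Read: fetch one line into a variable standing in for the pattern space.
    while IFS= read -r pattern_space; do
        # Execute: replace the first "o" in the pattern space, if there is one.
        case $pattern_space in
            *o*) pattern_space="${pattern_space%%o*}0${pattern_space#*o}" ;;
        esac
        # Display: write the pattern space to the output stream; the next
        # iteration overwrites it with the next input line.
        printf '%s\n' "$pattern_space"
    done
}

emulated=$(printf '%s\n' 'a wood cross!' 'fat cat' | emulate_sed)
real=$(printf '%s\n' 'a wood cross!' 'fat cat' | sed 's/o/0/')

echo "$emulated"
# a w0od cross!
# fat cat
```

Both variables hold the same text, which is the point of the comparison: each input line passes through the same three steps independently.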
sed [options] 'operation' arguments
sed [options] -f scriptfile arguments
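The most common sed options are:
- -e or --expression=: process the input text with the given command or script.
- -f or --file=: process the input text with the given script file.
- -h or --help: show help.
- -n, --quiet, or --silent: suppress the automatic printing of every line, so that only lines explicitly printed (for example with p) are shown.
- -i: edit the text file in place.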
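A short sketch of these options on a throwaway file. The file contents are invented for the example, and the -i.bak spelling (in-place edit that keeps a backup with the given suffix) is used here as an assumption about the installed sed rather than something the text above prescribes.

```shell
# Work on temporary files (hypothetical contents).
tmp=$(mktemp)
printf '%s\n' 'one the' 'two' 'three the' > "$tmp"

# -n suppresses the automatic print, so only lines printed by p appear.
only_the=$(sed -n '/the/p' "$tmp")

# -e can be repeated; the expressions run in order on each line.
two_edits=$(sed -e 's/the/THE/' -e 's/one/1/' "$tmp")

# -f reads the commands from a script file instead of the command line.
script=$(mktemp)
printf '%s\n' 's/two/2/' > "$script"
from_file=$(sed -f "$script" "$tmp")

# -i edits the file itself; the .bak suffix keeps a backup copy.
sed -i.bak 's/three/3/' "$tmp"
in_place=$(cat "$tmp")

rm -f "$tmp" "$tmp.bak" "$script"
```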
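The "operation" specifies the action to perform on the file, that is, the sed command. It usually takes the form "[n1[,n2]]action". n1 and n2 are optional and select the lines to operate on; for example, to act on lines 5 through 20, write "5,20action". Common operations include:
- a: append, add a line with the given content below the current line.
- c: change, replace the selected lines with the given content.
- d: delete, delete the selected lines.
- i: insert, add a line with the given content above the selected line.
- p: print. With an address it prints the addressed lines; without one it prints everything. It is usually combined with the "-n" option. (Showing non-printing characters in escaped form is the job of the separate l command.)
- s: substitute, replace the specified text.
- y: transliterate characters.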
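The operations that the transcripts below do not exercise (a, c, i, y) can be sketched as follows, together with an address range. The input lines are made up, and the backslash-newline form of a, c, and i is chosen here for portability; it is one way to write them, not the only one.

```shell
# Sample input (hypothetical data).
input=$(printf '%s\n' 'line1' 'line2' 'line3' 'line4')

# 2,3d: delete lines 2 through 3.
deleted=$(printf '%s\n' "$input" | sed '2,3d')

# 2c\: replace line 2 with the given text.
changed=$(printf '%s\n' "$input" | sed '2c\
CHANGED')

# 1i\ and 4a\: insert above line 1, append below line 4.
wrapped=$(printf '%s\n' "$input" | sed -e '1i\
BEGIN' -e '4a\
END')

# y: map each character of the first set to the character at the same
# position in the second set, here only on lines 1 through 2.
mapped=$(printf '%s\n' "$input" | sed '1,2y/line/LINE/')
```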
[root@centos01 ~]# sed -n '3p' test.txt <!--Print line 3-->
The home of Football on BBC Sport online.
[root@centos01 ~]# sed -n '3,5p' test.txt <!--Print lines 3 through 5-->
The home of Football on BBC Sport online.
the tongue is boneless but it breaks bones.12!
google is the best tools for search keyword.
[root@centos01 ~]# sed -n 'p;n' test.txt <!--Print all odd-numbered lines-->
he was short and fat.
The home of Football on BBC Sport online.
google is the best tools for search keyword.
PI=3.141592653589793238462643383249901429
Actions speak louder than words
#woood #
#woooooood #
I bet this place is really spooky late at night!
I shouldn't have lett so tast.
[root@centos01 ~]# sed -n 'n;p' test.txt <!--Print all even-numbered lines-->
He was wearing a blue polo shirt with black pants.
the tongue is boneless but it breaks bones.12!
The year ahead will test our political establishment to the limit.
a wood cross!
AxyzxyzxyzxyzC
Misfortunes never come alone/single.
[root@centos01 ~]# sed -n '1,5{p;n}' test.txt <!--Print the odd-numbered lines between line 1 and line 5-->
he was short and fat.
The home of Football on BBC Sport online.
google is the best tools for search keyword.
[root@centos01 ~]# sed -n '10,${n;p}' test.txt <!--From line 10 to the end of the file, print every second line counted from line 10 (lines 11, 13, 15, ...)-->
#woood #
#woooooood #
I bet this place is really spooky late at night!
I shouldn't have lett so tast.
[root@centos01 ~]# sed -n '/the/p' test.txt <!--Print lines containing "the"-->
the tongue is boneless but it breaks bones.12!
google is the best tools for search keyword.
The year ahead will test our political establishment to the limit.
[root@centos01 ~]# sed -n '4,/the/p' test.txt <!--Print from line 4 through the next line containing "the"; the search for the end of the range starts at line 5-->
the tongue is boneless but it breaks bones.12!
google is the best tools for search keyword.
[root@centos01 ~]# sed -n '/the/=' test.txt <!--Print the line numbers of lines containing "the";
the equals sign (=) prints the line number-->
4
5
6
[root@centos01 ~]# sed -n '/^PI/p' test.txt <!--Print lines that begin with "PI"-->
PI=3.141592653589793238462643383249901429
[root@centos01 ~]# sed -n '/\<wood\>/p' test.txt <!--Print lines containing the word "wood";
\< and \> mark word boundaries-->
a wood cross!
[root@centos01 ~]# nl test.txt | sed '3d' <!--Delete line 3-->
1 he was short and fat.
2 He was wearing a blue polo shirt with black pants.
4 the tongue is boneless but it breaks bones.12!
5 google is the best tools for search keyword.
6 The year ahead will test our political establishment to the limit.
7 PI=3.141592653589793238462643383249901429
8 a wood cross!
9 Actions speak louder than words
10
11 #woood #
12
13 #woooooood #
14
15
16 AxyzxyzxyzxyzC
17
18
19 I bet this place is really spooky late at night!
20 Misfortunes never come alone/single.
21 I shouldn't have lett so tast.
[root@centos01 ~]# nl test.txt | sed '3,5d' <!--Delete lines 3 through 5-->
1 he was short and fat.
2 He was wearing a blue polo shirt with black pants.
6 The year ahead will test our political establishment to the limit.
7 PI=3.141592653589793238462643383249901429
8 a wood cross!
9 Actions speak louder than words
10
11 #woood #
12
13 #woooooood #
14
15
16 AxyzxyzxyzxyzC
17
18
19 I bet this place is really spooky late at night!
20 Misfortunes never come alone/single.
21 I shouldn't have lett so tast.
[root@centos01 ~]# sed '/^[a-z]/d' test.txt <!--Delete lines that begin with a lowercase letter-->
He was wearing a blue polo shirt with black pants.
The home of Football on BBC Sport online.
The year ahead will test our political establishment to the limit.
PI=3.141592653589793238462643383249901429
Actions speak louder than words
#woood #
#woooooood #
AxyzxyzxyzxyzC
I bet this place is really spooky late at night!
Misfortunes never come alone/single.
I shouldn't have lett so tast.
[root@centos01 ~]# sed 's/the/THE/' test.txt <!--Replace the first "the" on each line with "THE"-->
he was short and fat.
He was wearing a blue polo shirt with black pants.
The home of Football on BBC Sport online.
THE tongue is boneless but it breaks bones.12!
google is THE best tools for search keyword.
The year ahead will test our political establishment to THE limit.
PI=3.141592653589793238462643383249901429
a wood cross!
Actions speak louder than words
#woood #
#woooooood #
AxyzxyzxyzxyzC
I bet this place is really spooky late at night!
Misfortunes never come alone/single.
I shouldn't have lett so tast.
[root@centos01 ~]# sed 's/l/L/2' test.txt <!--Replace the second "l" on each line with "L"-->
he was short and fat.
He was wearing a blue poLo shirt with black pants.
The home of FootbalL on BBC Sport online.
the tongue is boneless but it breaks bones.12!
google is the best tooLs for search keyword.
The year ahead wilL test our political establishment to the limit.
PI=3.141592653589793238462643383249901429
a wood cross!
Actions speak louder than words
#woood #
#woooooood #
AxyzxyzxyzxyzC
I bet this place is reaLly spooky late at night!
Misfortunes never come alone/singLe.
I shouldn't have Lett so tast.
[root@centos01 ~]# sed 's/^/#/' test.txt <!--Insert "#" at the start of every line-->
#he was short and fat.
#He was wearing a blue polo shirt with black pants.
#The home of Football on BBC Sport online.
#the tongue is boneless but it breaks bones.12!
#google is the best tools for search keyword.
#The year ahead will test our political establishment to the limit.
#PI=3.141592653589793238462643383249901429
#a wood cross!
#Actions speak louder than words
#
##woood #
#
##woooooood #
#
#
#AxyzxyzxyzxyzC
#
#
#I bet this place is really spooky late at night!
#Misfortunes never come alone/single.
#I shouldn't have lett so tast.
[root@centos01 ~]# sed '/the/s/o/0/g' test.txt <!--On every line containing "the", replace all "o" characters with "0"-->
he was short and fat.
He was wearing a blue polo shirt with black pants.
The home of Football on BBC Sport online.
the t0ngue is b0neless but it breaks b0nes.12!
g00gle is the best t00ls f0r search keyw0rd.
The year ahead will test 0ur p0litical establishment t0 the limit.
PI=3.141592653589793238462643383249901429
a wood cross!
Actions speak louder than words
#woood #
#woooooood #
AxyzxyzxyzxyzC
I bet this place is really spooky late at night!
Misfortunes never come alone/single.
I shouldn't have lett so tast.
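The substitution variants shown above differ only in the flag after the last delimiter. The sketch below puts them side by side on one made-up line, and adds one assumption-labeled extra: any character may serve as the delimiter instead of "/", which is convenient when the text itself contains a slash, as in "alone/single".

```shell
# Sample line (hypothetical data).
line='the tools of the trade'

first=$(printf '%s\n' "$line" | sed 's/the/THE/')    # no flag: first match only
second=$(printf '%s\n' "$line" | sed 's/the/THE/2')  # 2: second match only
all=$(printf '%s\n' "$line" | sed 's/the/THE/g')     # g: every match

# Using "#" as the delimiter avoids escaping the slash being replaced.
slash=$(printf '%s\n' 'alone/single' | sed 's#/# or #')

echo "$first"    # THE tools of the trade
echo "$second"   # the tools of THE trade
echo "$all"      # THE tools of THE trade
echo "$slash"    # alone or single
```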
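On Linux/UNIX systems, awk is a powerful text-processing tool. It reads the input line by line, searches according to the given pattern, and formats or filters the content that matches. It can carry out fairly complex text operations without any interaction and is widely used in shell scripts for all kinds of automated configuration tasks.
awk is normally invoked in the forms shown below. The braces "{}" inside the single quotes hold the action to perform on the data. awk can process the target files directly, or read a script with "-f" and apply it to the target files.
awk options 'pattern or condition {action}' file1 file2 ...
awk -f scriptfile file1 file2 ...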
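A minimal sketch of the two invocation forms, running the same one-line program inline and from a script file. The program, the input lines, and the temporary script file are all made up for this illustration.

```shell
# Put the awk program in a temporary script file (hypothetical program).
script=$(mktemp)
cat > "$script" <<'EOF'
# Print the first field of every line whose second field is greater than 1.
$2 > 1 { print $1 }
EOF

# Form 1: the program is given inline, inside single quotes.
inline=$(printf '%s\n' 'a 1' 'b 2' 'c 3' | awk '$2 > 1 { print $1 }')

# Form 2: the same program is read from the script file with -f.
from_file=$(printf '%s\n' 'a 1' 'b 2' 'c 3' | awk -f "$script")

rm -f "$script"
echo "$inline"
# b
# c
```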
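awk provides several special built-in variables that can be used directly:
- NF: the number of fields in the current line.
- FS: the field separator for each line of text; by default, spaces or tabs.
- NR: the number (ordinal position) of the line currently being processed.
- $0: the entire content of the current line.
- FILENAME: the name of the file being processed.
- RS: the record separator; the default is \n, so each line is one record.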
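The built-in variables listed above can be seen together in a small sketch. The colon-separated sample data is invented, and the -F and -v options are used here to set FS and to pass in a shell value.

```shell
# Sample file (hypothetical contents).
tmp=$(mktemp)
printf '%s\n' 'root:x:0' 'bin:x:1:extra' > "$tmp"

# -F: sets FS to ":". NR is the line number, NF the field count,
# and $NF the last field of the line.
report=$(awk -F: '{ print NR, NF, $1, $NF }' "$tmp")

# FILENAME holds the name of the file currently being read.
name_ok=$(awk -v expected="$tmp" \
    'NR == 1 { print ((FILENAME == expected) ? "yes" : "no") }' "$tmp")

# RS changes the record separator: with RS=":" every colon-separated
# piece of the input becomes its own record, and $0 is that piece.
records=$(printf 'a:b:c' | awk 'BEGIN { RS = ":" } { print NR "=" $0 }')

rm -f "$tmp"
echo "$report"
# 1 3 root 0
# 2 4 bin extra
```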
[root@centos01 ~]# awk '{print}' test.txt <!--Print everything-->
he was short and fat.
He was wearing a blue polo shirt with black pants.
The home of Football on BBC Sport online.
the tongue is boneless but it breaks bones.12!
google is the best tools for search keyword.
The year ahead will test our political establishment to the limit.
PI=3.141592653589793238462643383249901429
a wood cross!
Actions speak louder than words
#woood #
#woooooood #
AxyzxyzxyzxyzC
I bet this place is really spooky late at night!
Misfortunes never come alone/single.
I shouldn't have lett so tast.
[root@centos01 ~]# awk 'NR==1,NR==3{print}' test.txt <!--Print lines 1 through 3-->
he was short and fat.
He was wearing a blue polo shirt with black pants.
The home of Football on BBC Sport online.
[root@centos01 ~]# awk '(NR%2)==1{print}' test.txt <!--Print all odd-numbered lines-->
he was short and fat.
The home of Football on BBC Sport online.
google is the best tools for search keyword.
PI=3.141592653589793238462643383249901429
Actions speak louder than words
#woood #
#woooooood #
I bet this place is really spooky late at night!
I shouldn't have lett so tast.
[root@centos01 ~]# awk '(NR%2)==0{print}' test.txt <!--Print all even-numbered lines-->
He was wearing a blue polo shirt with black pants.
the tongue is boneless but it breaks bones.12!
The year ahead will test our political establishment to the limit.
a wood cross!
AxyzxyzxyzxyzC
Misfortunes never come alone/single.
[root@centos01 ~]# awk '/^root/{print}' /etc/passwd <!--Print lines that begin with "root"-->
root:x:0:0:root:/root:/bin/bash
[root@centos01 ~]# awk '{print $1 $3}' test.txt <!--Print fields 1 and 3 of each line; with no comma between them, the two fields are joined without a space-->
heshort
Hewearing
Theof
theis
googlethe
Theahead
PI=3.141592653589793238462643383249901429
across!
Actionslouder
#woood
#woooooood
AxyzxyzxyzxyzC
Ithis
Misfortunescome
Ihave
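The joined output above ("heshort", "Hewearing", ...) comes from writing the fields next to each other, which awk treats as string concatenation. A short sketch of the alternatives, using the first line of test.txt as input; the OFS variable (output field separator) is not mentioned in the text above and is included here as an extra.

```shell
line='he was short and fat.'

# Adjacent fields are concatenated.
joined=$(printf '%s\n' "$line" | awk '{ print $1 $3 }')

# A comma separates the fields with the output field separator (a space by default).
spaced=$(printf '%s\n' "$line" | awk '{ print $1, $3 }')

# Setting OFS changes what the comma inserts.
custom=$(printf '%s\n' "$line" | awk 'BEGIN { OFS = "-" } { print $1, $3 }')

echo "$joined"   # heshort
echo "$spaced"   # he short
echo "$custom"   # he-short
```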