Regular Expressions and Linux Text Search Tools

2020-06-16 17:45:28

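First, we need to understand what a regular expression is.
Put simply, a regular expression is a set of rules and methods defined for processing large amounts of text; in other words, a regular expression uses special characters and gives them a new meaning.
For example, "." is redefined to mean any single character; redefinitions like this are what we call regular expressions.
Regular expressions are used extensively in grep, so we will use grep to introduce them step by step...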
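A minimal sketch of the "." idea, assuming any POSIX grep. The sample words are hypothetical and are fed to grep on standard input, so no file is needed:

```shell
#!/bin/sh
# Hypothetical sample words, one per line.
sample='rat
rot
rt
root'

# 'r.t': an "r", then exactly one arbitrary character, then a "t".
one_dot=$(printf '%s\n' "$sample" | grep 'r.t')

# 'r..t': an "r", then exactly two arbitrary characters, then a "t".
two_dots=$(printf '%s\n' "$sample" | grep 'r..t')

printf '%s\n' "--- r.t ---" "$one_dot" "--- r..t ---" "$two_dots"
```

"rt" matches neither pattern because "." must consume exactly one character, and "root" only matches the two-dot version.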

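I. Three text search commands to know before Linux regular expressions
grep (named after the ed editor command g/re/p, "globally search for a regular expression and print"): searches globally for a regular expression and prints the matching lines

Background: the original text-matching program; it matches text using the Basic Regular Expressions (BRE) defined by POSIX.
Name: print lines matching a pattern. grep is a powerful text search tool; by default it interprets the pattern as a basic regular expression, searches the text, and prints the matching lines.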
[root@linux ~]# grep 'root' /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
[root@linux ~]#
Format:
1) grep [OPTIONS] PATTERN [FILE...]
########### The examples below are based on this file ###########
[root@linux ~]# cat test.txt
This is a beautiful girl
So do you want to know who is her?

oh! I can`t tell you?

can you tell me your phone number?
My telphone number is 15648562351...

Beat wish to you ?
#########################################################################################

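2) grep [OPTIONS] [-e PATTERN | -f FILE] [FILE...]
Description: grep searches the named FILEs (or standard input, if no file is given) for lines that match PATTERN and, by default, prints the matching lines.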
[root@linux ~]# grep 'telphone' test.txt
My telphone number is 15648562351...
[root@linux ~]#
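grep falls back to standard input when no FILE is given. The sketch below (hypothetical sample lines, any POSIX grep) relies on that, and also shows form 2): -e marks the next argument as the pattern, which matters when the pattern itself starts with "-":

```shell
#!/bin/sh
# Hypothetical sample lines; no file argument, so grep reads standard input.
sample='My telphone number is 15648562351...
-rw-r--r-- 1 root root 0 test.txt'

# Form 1): grep PATTERN
form1=$(printf '%s\n' "$sample" | grep 'telphone')

# Form 2): grep -e PATTERN. Without -e, grep would try to parse
# '-rw' as a bundle of options instead of as the pattern.
form2=$(printf '%s\n' "$sample" | grep -e '-rw')

printf '%s\n' "$form1" "$form2"
```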

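Common options:
-E: equivalent to egrep; specified by POSIX. It searches text using extended regular expressions (ERE) and prints the matching lines.
Note: in egrep, characters such as ?, +, |, () and {} act as metacharacters without a backslash, whereas in grep's basic syntax they must be escaped (\?, \+, \|, \( \), \{ \} in GNU grep) to get their special meaning. This is where the two differ:
First, a look at egrep:
[root@linux ~]# egrep 'beautiful' test.txt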
This is a beautiful girl
[root@linux ~]#
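The escaping difference from the note above, as a minimal sketch. The sample lines are hypothetical, and the patterns use only POSIX BRE/ERE syntax:

```shell
#!/bin/sh
# Hypothetical sample lines.
sample='abab
ab
x (ab){2} y'

# BRE (plain grep): grouping \( \) and the interval \{ \} need backslashes.
bre=$(printf '%s\n' "$sample" | grep '^\(ab\)\{2\}$')

# ERE (grep -E, or egrep): the same pattern without backslashes.
ere=$(printf '%s\n' "$sample" | grep -E '^(ab){2}$')

# In a BRE the unescaped ( ) { } are ordinary characters, so this
# pattern only matches the literal text "(ab){2}".
literal=$(printf '%s\n' "$sample" | grep '(ab){2}')

printf '%s\n' "$bre" "$ere" "$literal"
```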

grep -E provides the same functionality as egrep:
[root@linux ~]# grep -E '^(a|J)' /etc/passwd
adm:x:3:4:adm:/var/adm:/sbin/nologin
avahi-autoipd:x:170:170:Avahi IPv4LL Stack:/var/lib/avahi-autoipd:/sbin/nologin
abrt:x:173:173::/etc/abrt:/sbin/nologin
Jason:x:1000:1000::/home/Jason:/bin/bash
-F: equivalent to fgrep; specified by POSIX. It searches for fixed strings and does not interpret regular expressions, so it is usually the fastest.
[root@linux ~]# grep -F 'root' /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
--color=auto/never/always: highlight the matched text in color; usually set in an alias.
[root@linux ~]# alias
alias cp='cp -i'
alias egrep='egrep --color=auto'
alias fgrep='fgrep --color=auto'
alias grep='grep --color=auto'
alias l.='ls -d .* --color=auto'
alias ll='ls -l --color=auto'
alias ls='ls --color=auto'
alias mv='mv -i'
alias rm='rm -i'
alias which='alias | /usr/bin/which --tty-only --read-alias --show-dot --show-tilde'
[root@linux ~]#
[root@linux ~]# grep 'home' --color=auto /etc/passwd
Jason:x:1000:1000::/home/Jason:/bin/bash
[root@linux ~]# grep 'home' --color=never /etc/passwd
Jason:x:1000:1000::/home/Jason:/bin/bash
[root@linux ~]# grep 'home' --color=always /etc/passwd
Jason:x:1000:1000::/home/Jason:/bin/bash
[root@linux ~]#
-i: ignore case.
[root@linux ~]# cat test.txt
Good morning,zhang An!
[root@linux ~]# grep -i 'a' test.txt
Good morning,zhang An!
-o: print only the matched part of the line.
[root@linux ~]# cat test.txt
Good morning,zhang An!
[root@linux ~]# grep -o 'zhang' test.txt
zhang
[root@linux ~]#
-v, --invert-match: invert the match; print the lines that do not match the quoted pattern.
[root@linux ~]# cat test.txt
Good morning,zhang An!
nihao
[root@linux ~]# grep -v 'Good' test.txt
nihao
[root@linux ~]#
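# Here you can see that the inverted match prints the lines that do not contain 'Good'
-q, --quiet, --silent: quiet mode; print nothing and report the result only through the exit status (0 if a line matched).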
[root@linux ~]# grep -v 'Good' test.txt
nihao
[root@linux ~]# grep -qv 'Good' test.txt
[root@linux ~]#
-n: print the matching lines, each prefixed with its line number
[root@linux ~]# grep -n 'o' test.txt
1:Good morning,zhang An!
2:nihao
[root@linux ~]# grep 'o' test.txt | cat -n
1 Good morning,zhang An!
2 nihao
[root@linux ~]#
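# grep -n shows the line numbers in color, which differs slightly from cat -n; more importantly, grep -n reports line numbers in the original file, while cat -n numbers the lines of grep's output (they agree here only because both lines match)
-c: count the matching lines (not the number of times PATTERN occurs)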
[root@linux ~]# grep -c 'o' test.txt
2
[root@linux ~]#
-A n: also print the n lines after each matching line
[root@linux ~]# cat test.txt
gegVDFwer34fs43dfwerFG4g
gegVDFweSDFGertgg
23ere67fgSD5436fe
nihao,zhandge
[root@linux ~]# grep -A1 '23' test.txt
23ere67fgSD5436fe
nihao,zhandge
[root@linux ~]#
-B n: also print the n lines before each matching line
[root@linux ~]# cat test.txt
gegVDFwer34fs43dfwerFG4g
gegVDFweSDFGertgg
23ere67fgSD5436fe
nihao,zhandge
[root@linux ~]# grep -B2 '23' test.txt
gegVDFwer34fs43dfwerFG4g
gegVDFweSDFGertgg
23ere67fgSD5436fe
[root@linux ~]#
-C n: print the n lines before and after each matching line
[root@linux ~]# grep -C1 '23' test.txt
gegVDFweSDFGertgg
23ere67fgSD5436fe
nihao,zhandge
[root@linux ~]#
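A sketch that checks what -n, -c and -q actually report, assuming GNU grep (for -o) and hypothetical sample lines: the two lines of the test.txt used above are preceded by an unrelated line so the line-numbering behavior becomes visible:

```shell
#!/bin/sh
# Hypothetical input: an unrelated first line, then the two lines of test.txt.
sample='xyz
Good morning,zhang An!
nihao'

# -n reports line numbers in the input itself: 2 and 3, not 1 and 2.
numbered=$(printf '%s\n' "$sample" | grep -n 'o')

# -c counts matching LINES, even though "o" occurs 4 times in them.
line_count=$(printf '%s\n' "$sample" | grep -c 'o')

# To count occurrences, print each match on its own line (-o, GNU grep)
# and count those lines; $(( )) strips any padding wc may add.
occurrences=$(( $(printf '%s\n' "$sample" | grep -o 'o' | wc -l) ))

# -q prints nothing; only the exit status tells whether a line matched.
if printf '%s\n' "$sample" | grep -q 'nihao'; then
    found=yes
else
    found=no
fi

printf '%s\n' "$numbered" "lines: $line_count" "occurrences: $occurrences" "found: $found"
```

Using -q inside an if is the usual way to let a script branch on whether a pattern is present without printing anything.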

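-G, --basic-regexp: interpret PATTERN as a basic regular expression (the default);
-P, --perl-regexp: interpret PATTERN as a Perl-compatible regular expression (PCRE);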

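-e PATTERN, --regexp=PATTERN: use PATTERN as a pattern; the option can be repeated to match any of several patterns;
-f FILE, --file=FILE: read the patterns from FILE, one per line (a kind of grep script);
These two options are not demonstrated in the transcripts here.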
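A minimal sketch of -e and -f, assuming any POSIX grep plus the common mktemp utility. The three sample lines are copied from the /etc/passwd output shown earlier:

```shell
#!/bin/sh
# Sample lines taken from the /etc/passwd output above.
sample='root:x:0:0:root:/root:/bin/bash
Jason:x:1000:1000::/home/Jason:/bin/bash
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin'

# -e may be repeated: a line is printed if it matches ANY of the patterns.
multi=$(printf '%s\n' "$sample" | grep -e '^root' -e '^ftp')

# -f FILE reads the patterns from FILE, one per line.
patfile=$(mktemp)
printf '%s\n' '^root' '^ftp' > "$patfile"
fromfile=$(printf '%s\n' "$sample" | grep -f "$patfile")
rm -f "$patfile"

printf '%s\n' "$multi" "---" "$fromfile"
```

Both commands select the same two lines; -f is handier once the list of patterns grows.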
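egrep: extended grep; it matches text using Extended Regular Expressions (ERE).

The egrep command is equivalent to grep -E: it searches text with extended regular expressions and prints the matching lines.
fgrep: "fast grep"; it matches fixed strings rather than regular expressions, and it can search for many fixed strings at once (for example, one per line).

The fgrep command is equivalent to grep -F: it searches for fixed strings and does not interpret regular expressions, so it is usually the fastest.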
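The practical difference between a regular expression and a fixed string shows up as soon as the pattern contains a metacharacter. A small sketch with hypothetical lines, assuming any POSIX grep:

```shell
#!/bin/sh
# Hypothetical sample lines.
sample='1.5
105
1x5'

# As a regular expression, '.' matches any character: all three lines match.
as_regex=$(printf '%s\n' "$sample" | grep '1.5')

# With -F (fgrep), '.' is just a dot: only the literal "1.5" matches.
as_fixed=$(printf '%s\n' "$sample" | grep -F '1.5')

printf '%s\n' "regex:" "$as_regex" "fixed:" "$as_fixed"
```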


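II. Basic regular expressions:
Basic meaning: a string built from ordinary characters combined with certain special characters according to a defined syntax, which makes it easy to search for and match text.
Categories: basic regular expressions (BRE) and extended regular expressions (ERE).
1) Metacharacters of basic regular expressions
What is a metacharacter?
A metacharacter is a character, or group of characters, that stands for one or more other characters; in practice, metacharacters fall into the following categories.
(1) Character matching
.: matches any single character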
[root@linux ~]# grep 'r..t' /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
[root@linux ~]#
Note the following example:
[root@linux ~]# grep '.' test.txt
This is a beautiful girl
So do you want to know who is her?
oh! I can`t tell you?
can you tell me your phone number?
My telphone number is 15648562351...
Beat wish to you ?

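This matches every line that contains at least one character; the blank lines in test.txt are not printed, because an empty line has no character for "." to match.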
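Because "." needs at least one character, grep '.' skips empty lines. A quick count with a hypothetical three-line sample (one empty line in the middle) makes this visible, assuming any POSIX grep:

```shell
#!/bin/sh
# Hypothetical sample: two text lines with an empty line between them.
sample='This is a beautiful girl

Beat wish to you ?'

# Total number of lines; $(( )) strips any padding wc may add.
total=$(( $(printf '%s\n' "$sample" | wc -l) ))

# Lines containing at least one character.
nonblank=$(printf '%s\n' "$sample" | grep -c '.')

# -v inverts the match: lines containing no character at all.
blank=$(printf '%s\n' "$sample" | grep -vc '.')

printf 'total=%s nonblank=%s blank=%s\n' "$total" "$nonblank" "$blank"
```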
[]: matches any single character in the specified set:
[root@linux ~]# grep '[aj]h' test002.txt
ahjhb
[root@linux ~]#
[^]: matches any single character NOT in the specified set
[root@linux ~]# grep '[^a]h' test002.txt
ahjhb
[root@linux ~]#
[:alnum:] : letters (upper and lower case) and digits -->"A-Za-z0-9"
[root@linux ~]# grep '[[:alnum:]]' test002.txt
b
ab
acb
aaX2Ab
a[Ah?jhb
aba1baba5bab
[root@linux ~]#
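With -o (a GNU grep option, assumed here) you can see exactly which two-character pieces of the sample line "ahjhb" each bracket expression matched:

```shell
#!/bin/sh
line='ahjhb'   # the line from test002.txt shown above

# [aj]h: an "a" or a "j", followed by "h".
in_set=$(printf '%s\n' "$line" | grep -o '[aj]h')

# [^a]h: any character except "a", followed by "h".
not_a=$(printf '%s\n' "$line" | grep -o '[^a]h')

printf '%s\n' "[aj]h:" "$in_set" "[^a]h:" "$not_a"
```

Both commands print the whole line without -o, which hides the fact that [^a]h skips the leading "ah".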
#### The remaining classes are simple, so their examples are kept brief
[:digit:] : digit characters -------------->"0-9"
[root@linux ~]# grep '[[:digit:]]' test.txt
My telphone number is 15648562351...
[root@linux ~]#
This printed the line containing the phone number.

[:punct:] : punctuation characters ---------->"?.," and so on
[root@linux ~]# grep '[[:punct:]]' test.txt
So do you want to know who is her?
oh! I can`t tell you?
can you tell me your phone number?
My telphone number is 15648562351...
Beat wish to you ?
[root@linux ~]#
Every line containing a punctuation character was printed.

[:alpha:] : letters -------------->"A-Za-z"
[root@linux ~]# grep '[[:alpha:]]' test.txt
This is a beautiful girl
So do you want to know who is her?
oh! I can`t tell you?
can you tell me your phone number?
My telphone number is 15648562351...
Beat wish to you ?
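Note that grep prints whole lines, so nothing inside a line is filtered out: every line with at least one letter is printed, and only the blank lines are dropped.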
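If you want the letters or digits themselves rather than the lines that contain them, a minimal sketch uses -o (GNU grep, assumed) together with ERE's + ("one or more of the preceding item"):

```shell
#!/bin/sh
line='My telphone number is 15648562351...'   # line from test.txt

# Whole-line match: grep prints the entire line.
whole=$(printf '%s\n' "$line" | grep '[[:alpha:]]')

# -o with [[:alpha:]]+ prints each run of letters on its own line.
words=$(printf '%s\n' "$line" | grep -oE '[[:alpha:]]+')

# The same idea with [[:digit:]]+ extracts just the phone number.
number=$(printf '%s\n' "$line" | grep -oE '[[:digit:]]+')

printf '%s\n' "$whole" "$words" "$number"
```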

[:graph:] : every visible character, i.e. all printable characters except whitespace such as space and Tab
[root@linux ~]# grep '[[:graph:]]' test.txt
This is a beautiful girl
So do you want to know who is her?
oh! I can`t tell you?
can you tell me your phone number?
My telphone number is 15648562351...
Beat wish to you ?
[root@linux ~]#
Compare this with the original file: every non-blank line is printed.

[:space:] : whitespace characters, including space, Tab, and others
[root@linux ~]# grep '[[:space:]]' test.txt
This is a beautiful girl
So do you want to know who is her?
oh! I can`t tell you?
can you tell me your phone number?
My telphone number is 15648562351...
Beat wish to you ?
[root@linux ~]#
The effect of this demonstration is not obvious, because every non-blank line contains a space; you can also try "grep '[^[:space:]]' test.txt"

[:blank:] : space and Tab characters only
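The difference between [:space:] and [:blank:] appears with characters such as the carriage return at the end of a line saved with Windows (CRLF) line endings. A small sketch, assuming any POSIX grep:

```shell
#!/bin/sh
# "abc" followed by a carriage return: no space, no Tab.
crlf_space=$(printf 'abc\r\n' | grep -c '[[:space:]]')

# grep -c prints 0 but exits with status 1 when nothing matches;
# "|| true" keeps a "set -e" script from stopping here.
crlf_blank=$(printf 'abc\r\n' | grep -c '[[:blank:]]' || true)

# A Tab is matched by both classes.
tab_blank=$(printf 'a\tb\n' | grep -c '[[:blank:]]')

printf 'space=%s blank=%s tab=%s\n' "$crlf_space" "$crlf_blank" "$tab_blank"
```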
[:lower:] : lowercase letters ---------->"a-z"
[root@linux ~]# grep '[[:lower:]]' test.txt
This is a beautiful girl
So do you want to know who is her?
oh! I can`t tell you?
can you tell me your phone number?
My telphone number is 15648562351...
Beat wish to you ?

[:upper:] : uppercase letters ---------->"A-Z"
[root@linux ~]# grep '[^[:lower:]]' test.txt
This is a beautiful girl
So do you want to know who is her?
oh! I can`t tell you?
can you tell me your phone number?
My telphone number is 15648562351...
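Beat wish to you ?
Is this also correct? No: [^[:lower:]] matches any character that is not a lowercase letter, including spaces and punctuation, so it also prints "can you tell me your phone number?", which has no uppercase letter at all. Compare with the correct command: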
[root@linux ~]# grep '[[:upper:]]' test.txt
This is a beautiful girl
So do you want to know who is her?
oh! I can`t tell you?
My telphone number is 15648562351...
Beat wish to you ?
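The two commands disagree on the line "can you tell me your phone number?": the [^[:lower:]] version printed it and the [[:upper:]] version did not. A quick check on that single line, assuming any POSIX grep:

```shell
#!/bin/sh
line='can you tell me your phone number?'   # line from test.txt

# [^[:lower:]] matches the spaces and the "?", so the line counts.
not_lower=$(printf '%s\n' "$line" | grep -c '[^[:lower:]]')

# [[:upper:]] needs an uppercase letter; there is none.
# (grep -c exits with status 1 on no match, so "|| true" is added.)
upper=$(printf '%s\n' "$line" | grep -c '[[:upper:]]' || true)

printf 'not_lower=%s upper=%s\n' "$not_lower" "$upper"
```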

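[:cntrl:] : control characters, such as CR, LF, Tab, and Del
[:print:] : printable characters (like [:graph:], but also including the space)
[:xdigit:] : hexadecimal digits -> "0-9, A-F, a-f"
These three are used less often, so the transcripts do not demonstrate them.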
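For reference only, a minimal sketch shows two of these classes at work. The sample strings are hypothetical, and -o assumes GNU grep:

```shell
#!/bin/sh
# Hypothetical line containing one valid and one invalid hex color code.
colors='color: #1a2B3c; bad: #12345g'

# '#' followed by exactly six hexadecimal digits (ERE interval {6}).
hex=$(printf '%s\n' "$colors" | grep -oE '#[[:xdigit:]]{6}')

# A Tab is a control character, so [[:cntrl:]] finds it.
tab_line=$(printf 'a\tb\n' | grep -c '[[:cntrl:]]')

printf 'hex=%s cntrl=%s\n' "$hex" "$tab_line"
```

"#12345g" is rejected because only five hexadecimal digits follow the "#" before the "g".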
