A Detailed Guide to Regular Expressions in Python 3

2022-03-16 10:01:00

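1. Introduction

# Regular expressions: a powerful tool for matching strings.
# Design idea: use a descriptive language to define a rule for strings. Any string that satisfies the rule counts as a match; otherwise the string is considered invalid.
# Example: to check whether a string is a valid email address:
# 1. Write a regular expression that matches email addresses;
# 2. Match the user's input against that regular expression to decide whether it is valid.
# For example, \d matches one digit, and \w matches one letter, digit, or underscore:
# a. "00\d" matches "008" but not "00A";
# b. "\d\d\d" matches "009";
# c. "\w\w\d" matches "py3";
# Also, . matches any single character (except a newline, by default):
# a. "py." matches "pyc", "pyt", and so on.
# Matching a variable number of characters (each quantifier applies to the item before it):
# a. * means any number of repetitions, including 0;
# b. + means at least one repetition;
# c. ? means 0 or 1 repetition;
# d. {n} means exactly n repetitions;
# e. {n,m} means between n and m repetitions.
# Example: \d{2}\s+\d{3,6}
# a. \d{2} matches 2 digits, e.g. "52";
# b. \s matches one whitespace character and \s+ matches at least one, e.g. " ";
# c. \d{3,6} matches 3 to 6 digits, e.g. "584520".
# Precise matching: use [] to specify a range
# a. [0-9a-zA-Z_] matches one digit, letter, or underscore;
# b. [0-9a-zA-Z_]+ matches a string made of at least one digit, letter, or underscore, e.g. "Py20";
# c. [a-zA-Z_][0-9a-zA-Z_]* matches a string that starts with a letter or underscore, followed by any number of digits, letters, or underscores;
# d. [a-zA-Z_][0-9a-zA-Z_]{0,19} limits the variable name to 1-20 characters;
# e. A|B matches A or B, e.g. (W|w)illard matches "Willard" or "willard";
# f. ^ marks the start of a line, so ^\d means the line must start with a digit;
# g. $ marks the end of a line, so \d$ means the line must end with a digit.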
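
The rules listed above can be checked directly in Python. The following is a minimal sketch: the `check` helper and the sample strings are illustrative choices, not part of the original article. It uses `re.fullmatch` so that each pattern has to cover the whole string.

```python
import re

# Each entry: (pattern, a string that should match, a string that should not).
examples = [
    (r"00\d", "008", "00A"),                          # \d is one digit
    (r"\w\w\d", "py3", "py_"),                        # last character must be a digit
    (r"py.", "pyc", "py"),                            # . needs exactly one character
    (r"\d{2}\s+\d{3,6}", "52  584520", "52584520"),   # whitespace is required
    (r"[a-zA-Z_][0-9a-zA-Z_]{0,19}", "Py20", "2Py"),  # must not start with a digit
    (r"(W|w)illard", "willard", "Millard"),           # alternation
]

def check(pattern, text):
    """Return True when the whole string satisfies the pattern."""
    return re.fullmatch(pattern, text) is not None

for pattern, good, bad in examples:
    print(pattern, check(pattern, good), check(pattern, bad))
```

Each printed line should show `True` for the matching string and `False` for the rejected one.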

# The re module:
import re
print("Match succeeded, a Match object is returned:")
print(re.match(r"^\d{3}-\d{3,8}$", "020-6722053"))
print("----------------------------------------------------")
print("Match failed, None is returned:")
print(re.match(r"^\d{3}-\d{3,8}$", "020 6722053"))
print("----------------------------------------------------")
user_input = input("Please enter a test string: ")
# The input must start with W or w, followed by 1 to 10 word characters.
# Note that the quantifier is written {1,10}; {1-10} is not a valid quantifier.
if re.match(r"^[Ww]\w{1,10}", user_input):
    print("It's OK.")
else:
    print("Failed.")

# Output:
Match succeeded, a Match object is returned:
<re.Match object; span=(0, 11), match='020-6722053'>
----------------------------------------------------
Match failed, None is returned:
None
----------------------------------------------------
Please enter a test string: Willard584520
It's OK.
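
The introduction uses email validation as its motivating case without showing a pattern for it. A rough sketch of those two steps might look like the following. The pattern is deliberately simplified and is an assumption of this example (real email syntax is far more permissive), and `is_valid_email` is a hypothetical helper name.

```python
import re

# Step 1: a simplified email pattern (an assumption for illustration only):
# a local part of letters, digits, underscores, or dots, then "@", then a
# domain made of at least two dot-separated labels.
EMAIL_PATTERN = r"[0-9a-zA-Z_.]+@[0-9a-zA-Z-]+(\.[0-9a-zA-Z-]+)+"

def is_valid_email(text):
    """Step 2: match the input against the pattern to decide validity."""
    return re.fullmatch(EMAIL_PATTERN, text) is not None

for candidate in ["someone@example.com", "someone@example", "@example.com"]:
    print(candidate, is_valid_email(candidate))
```

Only the first candidate passes: the second has no dot-separated domain suffix, and the third has an empty local part.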

2. Splitting Strings

import re
str_input = input("Please input test string: ")
# Split the string on runs of whitespace
print(re.split(r"\s+", str_input))
# Output:
# Please input test string: Hello Python.
# ['Hello', 'Python.']

import re
str_input = input("Please input test string: ")
# Split on runs of whitespace or commas
print(re.split(r"[\s,]+", str_input))
# Output:
# Please input test string: Hello Willard,welcome to FUXI Technology.
# ['Hello', 'Willard', 'welcome', 'to', 'FUXI', 'Technology.']

import re
str_input = input("Please input test string: ")
# Split on runs of whitespace, commas, periods, or semicolons.
# The trailing period is also a separator, so the result ends with an empty string.
print(re.split(r"[\s,.;]+", str_input))
# Output:
# Please input test string: Hello;I am Willard.Welcome to FUXI Technology.
# ['Hello', 'I', 'am', 'Willard', 'Welcome', 'to', 'FUXI', 'Technology', '']
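
The last result above ends with an empty string because the input ends with a separator. One way to drop such empty pieces is to filter the result of `re.split`. This is a small sketch, and `split_words` is a hypothetical helper name.

```python
import re

def split_words(text):
    """Split on whitespace, commas, periods, or semicolons and drop empty pieces."""
    # re.split yields '' when the text starts or ends with a separator;
    # the filter removes those entries.
    return [part for part in re.split(r"[\s,.;]+", text) if part]

words = split_words("Hello;I am Willard.Welcome to FUXI Technology.")
print(words)
# ['Hello', 'I', 'am', 'Willard', 'Welcome', 'to', 'FUXI', 'Technology']
```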

3. Groups

# Parentheses () mark the groups to extract.
# ^(\d{3})-(\d{3,8})$ defines two groups.
# group(0) is always the whole match; group(1), group(2), ... are the sub-groups.
import re
match_test = re.match(r"^(\d{3})-(\d{3,8})$", "020-6722053")
print("match_test:", match_test)
print("match_group(0):", match_test.group(0))
print("match_group(1):", match_test.group(1))
print("match_group(2):", match_test.group(2))
print("---------------------------------------------------------")
# The dots are escaped as \. so that they match a literal "." only
website_match_test = re.match(r"(\w{3})\.(\w{5})\.(\w{3})", "www.baidu.com")
print("website_match_test:", website_match_test)
print("website_match_test_group(0):", website_match_test.group(0))
print("website_match_test_group(1):", website_match_test.group(1))
print("website_match_test_group(2):", website_match_test.group(2))
print("website_match_test_group(3):", website_match_test.group(3))

# Output:
match_test: <re.Match object; span=(0, 11), match='020-6722053'>
match_group(0): 020-6722053
match_group(1): 020
match_group(2): 6722053
---------------------------------------------------------
website_match_test: <re.Match object; span=(0, 13), match='www.baidu.com'>
website_match_test_group(0): www.baidu.com
website_match_test_group(1): www
website_match_test_group(2): baidu
website_match_test_group(3): com
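
The calls above assume the match succeeds. When it fails, `re.match` returns `None`, and calling `.group()` on it raises `AttributeError`. A minimal sketch of a guarded version follows; `split_phone` is a hypothetical helper, not something from the original article.

```python
import re

def split_phone(number):
    """Return (area_code, local_number), or None if the format is wrong."""
    match = re.match(r"^(\d{3})-(\d{3,8})$", number)
    if match is None:
        # No match: there are no groups to extract.
        return None
    # groups() returns all sub-groups as a tuple, i.e. (group(1), group(2)).
    return match.groups()

print(split_phone("020-6722053"))  # ('020', '6722053')
print(split_phone("020 6722053"))  # None
```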

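4. Greedy Matching

# Greedy matching: match as many characters as possible. This is the default behavior.
# Because \d+ is greedy, it consumes the trailing zeros and leaves nothing for 0* to match.
# Adding ? after the quantifier (\d+?) makes it non-greedy, so it matches as few characters as possible.
import re
string_input = input("Please input string: ")
print("Greedy matching:")
print(re.match(r"^(\d+)(0*)$", string_input).groups())
print("---------------------")
print("Non-greedy matching:")
print(re.match(r"^(\d+?)(0*)$", string_input).groups())

# Output:
Please input string: 1008600
Greedy matching:
('1008600', '')
---------------------
Non-greedy matching:
('10086', '00')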
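
The same difference shows up with other quantifiers. As a supplementary sketch (the tag-like sample string is an assumption of this example, not from the original article), compare `.+` with `.+?`:

```python
import re

text = "<a><b>"

# Greedy: .+ grabs as much as possible, so the match runs to the last ">".
greedy_tag = re.match(r"<.+>", text).group()

# Non-greedy: .+? stops at the first ">" that lets the pattern succeed.
lazy_tag = re.match(r"<.+?>", text).group()

print(greedy_tag)  # <a><b>
print(lazy_tag)    # <a>
```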

5. Compiling

# When a regular expression is used, the re module internally:
# a. compiles the regular expression, raising an error if the pattern string itself is invalid;
# b. uses the compiled regular expression to match the string.
# c. If a regular expression is going to be reused thousands of times, it is more efficient
# to precompile it, so that later uses skip the compile step and go straight to matching.
import re
# Compile
re_telephone = re.compile(r"^(\d{3})-(\d{3,8})$")
# Use
telephone_input1 = input("Willard, please input your telephone number: ")
telephone_input2 = input("Chen, please input your telephone number: ")
print("match: 020-6722053,", re_telephone.match(telephone_input1).groups())
print("match: 020-6722066,", re_telephone.match(telephone_input2).groups())

# Output:
Willard, please input your telephone number: 020-6722053
Chen, please input your telephone number: 020-6722066
match: 020-6722053, ('020', '6722053')
match: 020-6722066, ('020', '6722066')
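
To illustrate reuse, here is a short sketch that compiles the pattern once and applies it to several strings. The `parse_numbers` helper and the sample list are assumptions made for this example; inputs that do not match are recorded as `None` instead of raising an error.

```python
import re

# Compiled once at import time and reused for every call below.
RE_TELEPHONE = re.compile(r"^(\d{3})-(\d{3,8})$")

def parse_numbers(numbers):
    """Map each input string to its (area, local) groups, or None if it does not match."""
    results = {}
    for number in numbers:
        match = RE_TELEPHONE.match(number)
        results[number] = match.groups() if match else None
    return results

parsed = parse_numbers(["020-6722053", "020-6722066", "020 6722053"])
for number, groups in parsed.items():
    print(number, groups)
```

The `re` module also keeps an internal cache of recently compiled patterns, so in small scripts the speed difference is likely to be modest. A named compiled pattern mainly makes the reuse explicit.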

Summary

That is all for this article. I hope it is helpful, and I hope you will keep following it145.com for more content!
