Common Python String Methods and Their Use Cases, Explained

2022-08-06 14:00:03

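Preface
1. Uppercase and lowercase methods
2. Counting method
3. Stripping characters from the left and right
4. Splitting strings
5. Replacing substrings
6. Joining strings
7. Checking for digits
8. Checking for whitespace
9. Checking prefixes and suffixes
Supplement: more common Python string methods
Summary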

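Preface

Strings are one of Python's fundamental data types and are indispensable in data processing. Using their methods fluently can save a great deal of effort. Below we pick some commonly used methods and briefly describe where each one is useful.

1. Uppercase and lowercase methods

The string methods upper() and lower() return a copy of the string converted entirely to uppercase or lowercase. When data processing involves comparing or counting strings, especially English text, it is usual to convert everything to lowercase first; otherwise the results may be inaccurate.

For example, suppose a program decides whether to keep running based on user input and stops when the user enters n. To make the program friendlier, it should also handle the case where the user enters N, which lower() or upper() solves.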

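>>> choice = input('Continue running the program? Enter n or N to quit: ')
Continue running the program? Enter n or N to quit: N
>>> if choice == 'n' or choice == 'N':  # the conventional approach
...     print('Program ended')
...
Program ended
>>> if choice.lower() == 'n':  # the recommended approach
...     print('Program ended')
...
Program ended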

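As another example, suppose a tokenizer has already split an English passage into a list of words and we want to count how often "when" occurs. The words should generally be lowercased before counting.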

>>> words = ['When', 'you', 'fall', 'stand', 'up.', 'And', 'when', 'you', 'break', 'stand', 'tough', 'And', 'when', 'they', 'say', 'you', "can't,", 'you', 'say', 'I', 'can', 'I', 'can']
>>> count = 0
>>> sta_word = 'when'
>>> for word in words:
...     if word.lower() == sta_word:
...         count += 1
...
>>> print('{} appears {} times'.format(sta_word, count))
when appears 3 times
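
The counting loop above can also be written with collections.Counter. The sketch below is one possible variant, not the article's own code: it normalizes case with str.casefold() (a more aggressive form of lower() meant for caseless matching), and the name counts is introduced here purely for illustration.

```python
from collections import Counter

words = ['When', 'you', 'fall', 'stand', 'up.', 'And', 'when', 'you',
         'break', 'stand', 'tough', 'And', 'when', 'they', 'say', 'you',
         "can't,", 'you', 'say', 'I', 'can', 'I', 'can']

# Normalize case once, then count every word in a single pass.
counts = Counter(word.casefold() for word in words)

print(counts['when'])  # 3
print(counts['you'])   # 4
```

A Counter is convenient when several words need to be looked up, since the list is scanned only once.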

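2. Counting method

The count() method quickly counts how many times a substring occurs in a string. It is used more often on lists than on strings, though, because on strings it can easily produce errors that are hard to spot.

For example, counting "和服" (kimono) in the sentence "帽子和服裝如何搭配才好看" ("How should hats and clothing be matched to look good") finds one occurrence, but that is not the intended result: the two characters merely straddle the words "和" (and) and "服裝" (clothing). English is even more prone to this because many words have several inflected forms. For word-frequency statistics, the text should generally be tokenized first (and, for English, possibly lemmatized) before counting.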

>>> "帽子和服裝如何搭配才好看".count("和服")
1
>>> import jieba
>>> words = jieba.lcut("帽子和服裝如何搭配才好看")
>>> words
['帽子', '和', '服裝', '如何', '搭配', '才', '好看']
>>> words.count("和服")  # count after tokenizing
0
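
The same pitfall shows up in English, where a substring count also matches inside longer words. Below is a minimal sketch, assuming a simple regular-expression tokenizer is good enough (real projects would more likely use a dedicated tokenizer and lemmatizer). The sample sentence and variable names are made up for illustration.

```python
import re

text = "This is his thesis. Is this it?"

# Substring counting also matches 'is' inside 'This', 'his' and 'thesis'.
substring_hits = text.lower().count('is')

# Tokenize into words first, then count exact matches only.
tokens = re.findall(r"[a-z']+", text.lower())
word_hits = tokens.count('is')

print(substring_hits)  # 6
print(word_hits)       # 2
```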

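3. Stripping characters from the left and right

When processing text, data crawled from the web or obtained from other sources often contains "noise": meaningless characters that disturb the formatting and get in the way of extracting information. The strip(), lstrip(), and rstrip() methods remove specified characters from the beginning and end of a string. The argument is treated as a set of characters rather than a literal prefix or suffix, and when it is omitted, whitespace (spaces, tabs, newlines) is removed. lstrip() removes the characters only from the left (the beginning) and rstrip() only from the right (the end). A few examples follow.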

>>> temp_str = "  tomorrow is another day "
>>> temp_str.strip()
'tomorrow is another day'
>>> temp_str = "#  tomorrow is another day @"
>>> temp_str.strip('#')
'  tomorrow is another day @'
>>> temp_str.strip('# @')
'tomorrow is another day'
>>> temp_str = "#@  tomorrow is another day @"
>>> temp_str.lstrip('@# ')
'tomorrow is another day @'
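
Because the argument to strip() is a set of characters, it can remove more than intended when used to drop a fixed prefix or suffix. The sketch below illustrates the difference. The helper remove_suffix is hypothetical and written here for illustration (Python 3.9 and later also provide str.removesuffix()).

```python
url = 'www.example.com'

# Strips any leading/trailing 'w', '.', 'c', 'o' or 'm',
# not the literal text 'www.' or '.com'.
stripped = url.strip('w.com')


def remove_suffix(text, suffix):
    """Remove suffix only if text really ends with it (hypothetical helper)."""
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text


print(stripped)                            # example
print('demo.com'.rstrip('.com'))           # de  (too much was removed)
print(remove_suffix('demo.com', '.com'))   # demo
```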

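4. Splitting strings

When a string has a specific format, or the data has a regular structure such as rows exported from an Excel sheet or a JSON-like file, extracting one or more fields requires splitting the string first. split() splits on the given separator and returns the pieces as a list, ready for further processing. When no separator is given, it splits on runs of whitespace.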

>>> temp_str = "Whatever is worth doing is worth doing well"
>>> temp_str.split()
['Whatever', 'is', 'worth', 'doing', 'is', 'worth', 'doing', 'well']
>>> temp_str = "tomorrow#is#another#day"
>>> temp_str.split('#')
['tomorrow', 'is', 'another', 'day']
>>> temp_str = '"name":"Mike","age":18,"sex":"male","hair":"black"'
>>> temp_str.split(',')
['"name":"Mike"', '"age":18', '"sex":"male"', '"hair":"black"']
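
Splitting on commas works for this simple string, but it would break as soon as a value itself contained a comma. Since the fragment looks like the inside of a JSON object, one alternative sketch is to wrap it in braces and let the standard json module parse it. Treating the data as JSON is an assumption made here, not something the example above requires.

```python
import json

temp_str = '"name":"Mike","age":18,"sex":"male","hair":"black"'

# Manual approach: split into pairs, then split each pair once on ':'.
pairs = dict(item.split(':', 1) for item in temp_str.split(','))

# JSON approach: wrap the fragment in braces so it becomes a valid object.
record = json.loads('{' + temp_str + '}')

print(pairs['"age"'])  # the string '18'; the key keeps its quotes
print(record['age'])   # the integer 18
```

The manual version leaves quotes and types untouched, while json.loads() returns clean keys and typed values.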

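5. Replacing substrings

Replacement is another very common operation. When a typo needs to be corrected, or meaningless characters need to be removed or turned into spaces, consider replace(). Its optional third argument gives the maximum number of replacements.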

>>> temp_str = "this is really interesting, and that is boring."
>>> temp_str.replace('is','was')
'thwas was really interesting, and that was boring.'
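>>> temp_str.replace(' is ', ' was ')
'this was really interesting, and that was boring.'
>>> temp_str = 'I really really really like you.'
>>> temp_str.replace("really","",2)
'I   really like you.'

The first call shows that every occurrence of is is replaced, including the one inside the word this, which is something to keep in mind when programming. For English text, padding the search string and its replacement with spaces avoids such unwanted replacements, as the second call shows.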
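
Padding with spaces misses a word that is followed by punctuation or sits at the very start or end of the string. One alternative, sketched here with the standard re module rather than taken from the article, is to match on word boundaries with \b. The string edge is a made-up example.

```python
import re

temp_str = "this is really interesting, and that is boring."

# \b matches a word boundary, so only the standalone word 'is' is replaced.
result = re.sub(r'\bis\b', 'was', temp_str)

# Space padding fails when the word is followed by punctuation.
edge = "So it is."
padded = edge.replace(' is ', ' was ')
bounded = re.sub(r'\bis\b', 'was', edge)

print(result)   # this was really interesting, and that was boring.
print(padded)   # So it is.
print(bounded)  # So it was.
```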

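6. Joining strings

Joining can be seen as the inverse of splitting. join() concatenates the elements of an iterable, using the string it is called on as the separator, and returns a new string. The iterable can be a string, tuple, list, dict (whose keys are joined), and so on, as long as every element is a string.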

>>> seq = 'hello world'
>>> ":".join(seq)
'h:e:l:l:o: :w:o:r:l:d'
>>> seq = ('Whatever', 'is', 'worth', 'doing', 'is', 'worth', 'doing', 'well')
>>> "*".join(seq)
'Whatever*is*worth*doing*is*worth*doing*well'
>>> seq = ['Whatever', 'is', 'worth', 'doing', 'is', 'worth', 'doing', 'well']
>>> " ".join(seq)
'Whatever is worth doing is worth doing well'
>>> seq = ['"name":"Mike"', '"age":18', '"sex":"male"', '"hair":"black"']
>>> "#".join(seq)
'"name":"Mike"#"age":18#"sex":"male"#"hair":"black"'
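
Two details of join() are easy to trip over: every element must already be a string, and a dict contributes its keys. The following sketch illustrates both, plus the round trip with split(). All variable names are chosen here for illustration.

```python
numbers = [1, 2, 3]

# join() raises TypeError on non-string items, so convert them first.
joined_numbers = ",".join(str(n) for n in numbers)

# Iterating over a dict yields its keys, so join() joins the keys.
person = {'name': 'Mike', 'age': 18}
joined_keys = "/".join(person)

# split() and join() round-trip when the same separator is used.
sentence = 'tomorrow#is#another#day'
round_trip = '#'.join(sentence.split('#'))

print(joined_numbers)          # 1,2,3
print(joined_keys)             # name/age
print(round_trip == sentence)  # True
```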

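7. Checking for digits

isdigit() checks whether a string consists entirely of digits and returns a Boolean. A string containing a decimal point or a sign does not count as all digits, as shown below: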

>>> num = "13579"
>>> num.isdigit()
True
>>> num = '1.0'
>>> num.isdigit()
False
>>> num = '-1'
>>> num.isdigit()
False
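
Since isdigit() rejects signs and decimal points, a common alternative for validating numeric input is simply to attempt the conversion. The helper is_number below is a hypothetical sketch. Note that it accepts everything float() accepts, including 'nan', 'inf', and surrounding whitespace.

```python
def is_number(text):
    """Return True if text can be parsed by float() (hypothetical helper)."""
    try:
        float(text)
    except ValueError:
        return False
    return True


print(is_number('1.0'))   # True
print(is_number('-1'))    # True
print(is_number('abc'))   # False

# isdigit() is also True for some non-ASCII digits such as superscript two,
# which int() cannot parse; isdecimal() is stricter.
print('\u00b2'.isdigit(), '\u00b2'.isdecimal())  # True False
```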

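8. Checking for whitespace

isspace() checks whether a string consists entirely of whitespace characters (spaces, tabs, newlines, and so on) and returns a Boolean. Note that the empty string returns False, as shown below: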

>>> t = ''
>>> t.isspace()
False
>>> t = '  '
>>> t.isspace()
True
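
Because the empty string is not considered whitespace, a check for "nothing useful was entered" usually has to cover both cases. Here is one possible sketch; the helper is_blank is hypothetical.

```python
def is_blank(text):
    """Return True for an empty or whitespace-only string (hypothetical helper)."""
    return not text.strip()


print('\t\n '.isspace())  # True: tabs and newlines count as whitespace
print(is_blank(''))       # True
print(is_blank('   '))    # True
print(is_blank(' a '))    # False
```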

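9. Checking prefixes and suffixes

startswith() and endswith() check whether a string begins or ends with a given prefix or suffix and return a Boolean. Both accept two optional arguments, start and end, which behave like slicing the string before the check. For example: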

>>> temp_str = "Whatever is worth doing is worth doing well"
>>> temp_str.startswith("W")
True
>>> temp_str.startswith("What")
True
>>> temp_str.startswith('Whatever',2)
False
>>> temp_str.endswith("well",2)
True
>>> temp_str.endswith("we",2,-2)
True
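
Both methods also accept a tuple of candidates, which is handy for tasks such as filtering files by extension. The file names below are made up for illustration. The last line checks that the start and end arguments behave like a slice.

```python
filenames = ['report.csv', 'photo.JPG', 'notes.txt', 'diagram.png']

# A tuple matches if any one of its items matches.
image_suffixes = ('.jpg', '.jpeg', '.png')
images = [name for name in filenames if name.lower().endswith(image_suffixes)]

temp_str = "Whatever is worth doing is worth doing well"
same_as_slice = temp_str.endswith("we", 2, -2) == temp_str[2:-2].endswith("we")

print(images)         # ['photo.JPG', 'diagram.png']
print(same_as_slice)  # True
```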

Supplement: more common Python string methods

a = "hello world"
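# Strings cannot be modified through an index; a[0] = 'q' raises TypeError
 
# Slicing extracts part of a string: [start:stop:step]; step defaults to 1
print(a[0:5:])
print(a[::-1])  # a negative step walks backwards; omitting start and stop covers the whole string
print(a[::1])
print(len(a))  # len() returns the length of the string
 
# in and not in: test whether one string occurs inside another
# The result is a Boolean
print('hello' in 'hello world')
print('nihao' not in 'hello world')
 
# Adding to strings
print('nihao', 'Python')
print('nihao' + 'Python')
 
# format: a number inside the braces is the index of the argument to insert
print('==============format================')
print('my name is {}'.format(100))
print('my name is {1},my age is {0}'.format('dayv', 18))
print('my name is {0},my age is {1}'.format('dayv', 18))
 
# join combines the elements of a list into one string
str1 = 'True warriors'
str2 = 'dare to face a bleak life'
str3 = 'dare to look squarely at dripping blood'
print(''.join([str1, str2, str3]))
# The string before .join is the separator; every list element must be a string
print(', '.join([str1, str2, str3]))
 
# Deleting: del
name1 = 'nihao'
del name1  # deletes the variable; using it afterwards raises NameError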
 
# Modifying
# Change case: upper, lower
name1 = 'abc'
print('Uppercase: ' + name1.upper())
print(name1.lower())
 
# capitalize converts the first letter to uppercase
print(name1.capitalize())
 
# title capitalizes the first letter of every word
name2 = 'hello world'
print('Title case: ' + name2.title())
print('Original value of name2: ' + name2)
 
# split turns a string into a list; it splits on whitespace by default
name1 = 'a b    cd e'
print(name1.split())
# The argument passed in is used as the separator
name1 = 'a1b1cd1e'
print("Split on a custom separator:", name1.split('1'))  # returns a list
# rsplit(separator, maxsplit) splits starting from the right
print('Split from the right with rsplit:', name1.rsplit('1', 1))  # split only once, from the right
# replace(old, new, count)
print(name1.replace('1', '0'))
print(name1.replace('1', '0', 1))  # replacements are applied from left to right
aaaaa = ' sdf   kkf  k k   '
print('Remove all spaces with replace:', aaaaa.replace(" ", ''))
 
# strip removes whitespace from both ends and leaves the middle untouched
name1 = '        ni h ao     '
print(name1.strip())
 
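# Searching
# find, index
# Find the index at which a substring first occurs in a larger string
name1 = 'PythonPythonPython'
print("Index found with find:", name1.find('on'))
# find returns -1 when the substring is not found
print("Index found with index:", name1.index('on'))
# index raises ValueError when the substring is not found
 
# count returns how many times a substring occurs in a larger string
print(name1.count('qi'))
 
# isdigit checks whether a string consists only of digits; returns a Boolean
num = '156465'
print(num.isdigit())
# isalpha checks whether a string consists only of letters
num = 'ksdjflks'
print(num.isalpha())
 
# startswith checks whether the string begins with the given argument
# endswith checks whether the string ends with the given argument
mm = 'Python nihao'
print(mm.startswith('Pyth'))
print(mm.endswith('Pytho'))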
 
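# isupper checks whether all letters are uppercase; islower whether all are lowercase
 
# Escape characters: \n is a newline, \t is a tab
print('hello \nworld')
print('z\tiyu')
print('Pyth \t on')
print('Python123')
# Disabling escapes
print(r'zhai \t dada')  # prefix the string with r
print('zhai \\t dada')  # or double the backslash
 
# Limit how many characters are output
print('123456'[:5])  # outputs only the first five characters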
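
The difference between find() and index() noted in the listing matters in practice: -1 is itself a valid index, so an unchecked find() result can silently select the last character. The sketch below shows both behaviours. The helper locate is hypothetical and only illustrates one way to wrap index().

```python
def locate(text, sub):
    """Return the index of sub in text, or None if absent (hypothetical helper)."""
    try:
        return text.index(sub)
    except ValueError:
        return None


name1 = 'PythonPythonPython'

print(name1.find('on'))     # 4
print(name1.rfind('on'))    # 16
print(name1.find('qi'))     # -1
print(locate(name1, 'qi'))  # None
```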