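On Windows you have probably come across Ctrl+H or Ctrl+R for find-and-replace. On Linux, sed makes this kind of job much more convenient.
Below I will walk through a few practical text-processing examples.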
1
First, prepare a text file, example.txt:
####################################
Acetaminophen	DB00316	TYL	Prostaglandin G/H synthase 2	P35354
Acetazolamide	DB00819	AZM	Carbonic anhydrase 1	P00915
Acetazolamide	DB00819	AZM	Carbonic anhydrase 2	P00918
Acetazolamide	DB00819	AZM	Carbonic anhydrase 4	P22748
Acetazolamide	DB00819	AZM	Carbonic anhydrase 3	P07451
Acetazolamide	DB00819	AZM	Carbonic anhydrase 7	P43166
Acetazolamide	DB00819	AZM	Carbonic anhydrase 14	Q9ULX7
Acetohydroxamic Acid	DB00551	HAE	Urease alpha subunit	P18314
Acetylsalicylic acid	DB00945	AIN	Prostaglandin G/H synthase 1	P23219
Acetylsalicylic acid	DB00945	AIN	Prostaglandin G/H synthase 2	P35354
Acyclovir	DB00787	AC2	DNA polymerase	P04293
Acyclovir	DB00787	AC2	DNA polymerase	P09252
Acyclovir	DB00787	AC2	Thymidine kinase	P03176
Adenosine	DB00640	ADN	Adenosine A1 receptor	P30542
Adenosine	DB00640	ADN	Adenosine A2b receptor	P29275
Adenosine	DB00640	ADN	Adenosine A3 receptor	P33765
Adenosine	DB00640	ADN	Adenosine A2a receptor	P29274
######
2
The first field is the drug name, which may contain spaces. Fields are separated by tabs. We want to extract the first column:
awk -F'\t' '{print $1}' example.txt
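As a quick sanity check, here is a minimal sketch that builds a three-row sample and pulls out the first column in two ways. The file name sample.txt and the temporary directory are my own assumptions for illustration, not part of the example above. It uses awk with an explicit tab separator, and cut, which splits on tabs by default.

```shell
# Build a small tab-separated sample; sample.txt is a hypothetical name.
tmpdir=$(mktemp -d)
printf 'Acetaminophen\tDB00316\tTYL\n' > "$tmpdir/sample.txt"
printf 'Acetohydroxamic Acid\tDB00551\tHAE\n' >> "$tmpdir/sample.txt"
printf 'Acetylsalicylic acid\tDB00945\tAIN\n' >> "$tmpdir/sample.txt"

# With the tab as separator, a name containing a space stays in one field.
col_awk=$(awk -F'\t' '{print $1}' "$tmpdir/sample.txt")

# cut uses the tab as its default delimiter, so -f1 selects the same column.
col_cut=$(cut -f1 "$tmpdir/sample.txt")

printf '%s\n' "$col_awk"
```

Both commands should give the same column here. The point of the explicit tab separator is that awk's default whitespace splitting would cut "Acetohydroxamic Acid" in half.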
3
We want to see which distinct drugs appear in the file:
awk -F'\t' '{print $1}' example.txt | sort | uniq
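The sort step matters because uniq only collapses duplicate lines that are adjacent. The sketch below, on a made-up three-row sample (sample.txt is a hypothetical name), shows what happens without sorting, and also shows sort -u as a shorter way to get the same distinct list.

```shell
# Hypothetical sample in which the duplicate rows are not adjacent.
tmpdir=$(mktemp -d)
printf 'Acyclovir\tDB00787\nAdenosine\tDB00640\nAcyclovir\tDB00787\n' > "$tmpdir/sample.txt"

# Without sort, the two Acyclovir lines are not neighbours, so uniq keeps both.
unsorted=$(awk -F'\t' '{print $1}' "$tmpdir/sample.txt" | uniq)

# sort -u sorts and removes duplicates in a single step.
distinct=$(awk -F'\t' '{print $1}' "$tmpdir/sample.txt" | sort -u)

printf '%s\n' "$distinct"
```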
4
We want to see how many times each drug appears:
awk -F'\t' '{print $1}' example.txt | sort | uniq -c
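If you also want the most frequent drug listed first, one possible extension is to sort the counts numerically in reverse. This is only a sketch on a made-up five-row sample (sample.txt is a hypothetical name). The leading spaces that uniq -c prints vary between systems, so the last step normalises them with awk.

```shell
# Hypothetical sample: Acetazolamide appears 3 times, Acyclovir 2 times.
tmpdir=$(mktemp -d)
printf 'Acetazolamide\tP00915\nAcyclovir\tP04293\nAcetazolamide\tP00918\n' > "$tmpdir/sample.txt"
printf 'Acetazolamide\tP22748\nAcyclovir\tP09252\n' >> "$tmpdir/sample.txt"

# uniq -c prefixes each distinct line with its count; sort -rn puts the largest count first.
counts=$(awk -F'\t' '{print $1}' "$tmpdir/sample.txt" | sort | uniq -c | sort -rn)

# Take the top line and squeeze the padding so it reads "count name".
top=$(printf '%s\n' "$counts" | head -n 1 | awk '{print $1, $2}')

printf '%s\n' "$counts"
```

Note that the awk '{print $1, $2}' cleanup only works here because the sample names are single words. Names with spaces would need different handling.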
5
We want to replace every lowercase acid in the text with the capitalized Acid:
sed 's/acid/Acid/g' example.txt
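Two details are easy to miss with this command. The g flag decides whether sed replaces only the first match on each line or every match, and sed prints the result to standard output without touching the input file. Here is a small sketch of both points. The sample line is invented for illustration, and sample.txt and sample_fixed.txt are hypothetical names.

```shell
# Invented one-line sample that contains "acid" twice on the same line.
tmpdir=$(mktemp -d)
printf 'Acetylsalicylic acid\tsalicylic acid derivative\n' > "$tmpdir/sample.txt"

# Without g, only the first match on each line is replaced.
first_only=$(sed 's/acid/Acid/' "$tmpdir/sample.txt")

# With g, every match on the line is replaced.
all=$(sed 's/acid/Acid/g' "$tmpdir/sample.txt")

# sed writes to stdout, so redirect to a new file to keep the result.
sed 's/acid/Acid/g' "$tmpdir/sample.txt" > "$tmpdir/sample_fixed.txt"

printf '%s\n' "$first_only" "$all"
```

Writing to a separate file keeps the original around in case the pattern matched more than intended. For example, the pattern acid would also match inside a longer word such as placid.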