Advanced Linux Text Processing: gawk Associative Arrays

2020-06-16 17:22:14

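All arrays in awk are associative arrays: each array holds a set of index/value pairs. The indexes do not have to be a run of consecutive numbers. An index can be a string or a number, and you never have to declare the array's length.

Syntax:

arrayname[string]=value

arrayname is the name of the array
string is the array index
value is the value assigned to the array element

Accessing awk Array Elements

To access a particular element of an array, use arrayname[index], which returns the value stored under that index.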

Example 1:

[root@localhost ~]# awk '
>BEGIN{ item[101]="HD Camcorder";
>item["102"]="Refrigerator";
>item[103]="MP3 Player";
>item["na"]="Young"
>print item[101];
>print item["102"];    # quoted or not, awk treats the index as a string
>print item[103];
>print item["na"];}'   # a string index must be written in double quotes
HD Camcorder
Refrigerator
MP3 Player
Young

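Notes:

Array indexes have no inherent order, and they do not have to start at 0 or 1.
An array index can be a string: the last element assigned above uses the string index "na".
In awk you do not need to initialize or even declare an array before using it, and you do not specify its length.
Array names follow the same naming rules as awk variable names.

From awk's point of view an array index is always a string: even if you use a number as the index, awk converts it to a string. The following two statements are therefore equivalent: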

item[101]="HD Camcorder"
item["101"]="HD Camcorder"
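A quick way to confirm the equivalence is to assign through the numeric form and read the value back through the string form. The following is only a minimal sketch: the shell function name show_index_equivalence is hypothetical and the sample value is taken from the example above.

```shell
# Hypothetical helper: shows that item[101] and item["101"] name the same element.
show_index_equivalence() {
    awk 'BEGIN {
        item[101] = "HD Camcorder"      # assigned with a numeric subscript
        print item["101"]               # read back with a string subscript
        n = 0
        for (k in item) n++             # count the elements actually stored
        print "elements:", n
    }'
}
show_index_equivalence
```

If the two forms referred to different elements, the first print would produce an empty line and the count would be 2 instead of 1.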

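1. Referencing Array Elements

If you reference an array element that does not exist, awk automatically creates that element under the index you used and gives it a null (empty) value. To avoid this, it is best to check whether the element exists before using it.

An if statement with the in operator tests whether an element exists. The test is true when the index is present in the array, and the test itself does not create the element.

if ( index in array-name )

Example 2: a simple example of referencing array elements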

[root@localhost ~]# cat arr.awk 
BEGIN {
    x = item[55];  # nothing was assigned before this reference, so awk creates the element with a null value
    if ( 55 in item )
        print "Array index 55 contains",item[55];
    item[101]="HD Camcorder";
    if ( 101 in item )
        print "Array index 101 contains",item[101];
    if ( 1010 in item )  # index 1010 does not exist, so the test is false and nothing is printed
        print "Array index 1010 contains",item[1010];
}
[root@localhost ~]# awk -f arr.awk 
Array index 55 contains 
Array index 101 contains HD Camcorder
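The difference between testing with in and referencing an element directly can be made visible by counting the elements after each step. This is a sketch under the assumption that any POSIX awk is available as awk; the function name count_after_reference is made up for the illustration.

```shell
# Hypothetical helper: compares the side effects of "in" and of a plain reference.
count_after_reference() {
    awk 'BEGIN {
        if (55 in item) print "55 exists"   # false, and the test creates nothing
        n = 0; for (k in item) n++
        print "after in test:", n
        x = item[55]                        # a plain reference creates item[55]
        n = 0; for (k in item) n++
        print "after reference:", n
    }'
}
count_after_reference
```

The count goes from 0 to 1 only after the plain reference, which is why an existence check should use in rather than comparing item[55] with an empty string.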

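2. Looping Over an awk Array

To visit every element of an array, use the special form of the for statement that iterates over all of the array's indexes:

Syntax:

for ( var in arrayname )
actions

Where:

var is a variable name
in is a keyword
arrayname is the name of the array
actions is one or more awk statements to execute; multiple statements must be enclosed in { }. On each iteration awk assigns one index to var, so the loop body is applied to every element of the array.

Example 1: print every element of the array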

[root@localhost ~]# cat arr-for.awk 
BEGIN {
    item[101]="HD Camcorder";
    item[102]="Refrigerator";
    item[103]="MP3 Player";
    item[104]="Tennis Racket";
    item[105]="Laser Printer";
    item[1001]="Tennis Ball";
    item[55]="Laptop";
    item["no"]="Not Available";

    for(x in item)  # x receives each array index in turn; no loop condition is needed, awk walks the indexes itself
        print item[x];
}
[root@localhost ~]# awk -f arr-for.awk 
Not Available
Laptop
HD Camcorder
Refrigerator
MP3 Player
Tennis Racket
Laser Printer
Tennis Ball
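The output above does not follow the order of assignment, because the order in which for (x in item) visits the indexes is not specified. When a predictable order matters, one simple approach is to print the index with each value and let the external sort command order the lines. The sketch below uses a shortened copy of the item array; the function name list_items_sorted is hypothetical.

```shell
# Hypothetical helper: prints index/value pairs, ordered by numeric index.
list_items_sorted() {
    awk 'BEGIN {
        item[101]  = "HD Camcorder"
        item[55]   = "Laptop"
        item[1001] = "Tennis Ball"
        for (x in item)               # traversal order is unspecified
            print x, item[x]
    }' | sort -n                      # order the lines by the leading numeric index
}
list_items_sorted
```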

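3. Deleting Array Elements

To delete a particular array element, use the delete statement. Once an element has been deleted, its value can no longer be retrieved.

Syntax:

delete arrayname[index];

To delete every element of an array:

for (var in array)
delete array[var]

In gawk, a single delete statement removes all elements of an array:

delete array

Example 1: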

[root@localhost ~]# awk '
>BEGIN{item[101]="HD Camcorder";
>item[102]="Refrigerator";
>item[103]="MP3 Player";
>delete item[101];
>print item[101];print item[102];
>for(x in item) delete item[x]; # delete every remaining element with a for loop
>print item[102];print item[103];}'

Refrigerator


[root@localhost ~]#
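For an awk that does not accept delete with a bare array name, a commonly used portable idiom for emptying an array is split("", array): splitting an empty string produces no fields, so the target array is left with no elements. The following is a minimal sketch; the function name clear_with_split is hypothetical.

```shell
# Hypothetical helper: empties an array without a loop and without "delete array".
clear_with_split() {
    awk 'BEGIN {
        item[1] = "a"; item[2] = "b"; item[3] = "c"
        split("", item)               # an empty string has no fields, so item ends up empty
        n = 0
        for (x in item) n++
        print "elements left:", n
    }'
}
clear_with_split
```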

Example 2:

[root@localhost ~]# awk '
>BEGIN{item[1]="a"; 
>item[2]="b";item[3]="c";
>delete item;   # delete followed by only the array name removes every element
>for(x in item) print item[x];}'

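4. Multidimensional Arrays

Although awk supports only one-dimensional arrays, a one-dimensional array can be used to simulate a multidimensional one.

Example 1: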

[root@localhost ~]# cat array-multi.awk
BEGIN {
item["1,1"]=10;
item["1,2"]=20;
item["2,1"]=30;
item["2,2"]=40
for (x in item)
print item[x]
}
[root@localhost ~]# awk -f array-multi.awk
30
20
40
10

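Explanation: even though "1,1" is used as the index, it is not two indexes. It is still a single string index whose value is "1,1". So item["1,1"]=10 assigns 10 to the element of the one-dimensional array stored under the index "1,1".

Example 2: the same program without the double quotes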

[root@localhost ~]# cat array-multi2.awk
BEGIN {
item[1,1]=10;
item[1,2]=20;
item[2,1]=30;
item[2,2]=40
for (x in item)
print item[x]
}
[root@localhost ~]# awk -f array-multi2.awk
10
30
20
40

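Explanation: this version still runs, but the output differs (here the elements come out in a different order). When several subscripts are written without enclosing quotes, awk joins them with the subscript separator, which is "\034" by default.

When you write item[1,2], awk converts it to item["1\0342"]: it joins the two subscripts with "\034" into a single string.

Example 3: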

[root@localhost ~]# cat 034.awk 
BEGIN {
    item["1,1"]=10;
    item["1,2"]=20;
    item[2,1]=30;
    item[2,2]=40;
    for(x in item)
        print "Index",x,"contains",item[x];
}
[root@localhost ~]# awk -f 034.awk 
Index 1,2 contains 20
Index 21 contains 30
Index 22 contains 40
Index 1,1 contains 10

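Explanation:

The indexes "1,1" and "1,2" are enclosed in quotes, so they are treated as plain one-dimensional indexes. awk does not apply the subscript separator, and the index values are printed unchanged.

The indexes 2,1 and 2,2 are not enclosed in quotes, so they are treated as multidimensional subscripts. awk joins them with the subscript separator, so the indexes become "2\0341" and "2\0342", and the non-printing character "\034" is output between the two subscripts.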
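Because the joined key is an ordinary string, the individual subscripts can be recovered with split and SUBSEP, and membership can be tested with a parenthesized subscript list. The sketch below assumes a POSIX awk; the function name walk_two_subscripts and the array part are hypothetical, and the final sort is there only to give the display a fixed order.

```shell
# Hypothetical helper: tests membership and splits joined keys back into subscripts.
walk_two_subscripts() {
    awk 'BEGIN {
        item[1,2] = 20
        item[2,1] = 30
        if ((1,2) in item)                 # parenthesized subscript list in a membership test
            print "found 1,2"
        for (key in item) {
            split(key, part, SUBSEP)       # recover the two subscripts from the joined key
            print part[1], part[2], item[key]
        }
    }' | LC_ALL=C sort                     # fix the output order for display
}
walk_two_subscripts
```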

5. SUBSEP, the Subscript Separator

The SUBSEP variable lets you change the default subscript separator to any other character or string.

Example 1:

[root@localhost ~]# cat subsep.awk 
BEGIN {
    SUBSEP=":";
    item["1,1"]=10;
    item["1,2"]=20;
    item[2,1]=30;
    item[2,2]=40;
    for(x in item)
        print "Index",x,"contains",item[x];
}
[root@localhost ~]# awk -f subsep.awk 
Index 1,2 contains 20
Index 2:1 contains 30
Index 2:2 contains 40
Index 1,1 contains 10

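Explanation: the indexes "1,1" and "1,2" are enclosed in quotes, so SUBSEP is not applied to them.

Note: when working with multidimensional arrays, it is best not to quote the whole index. Write the subscripts separately and let SUBSEP supply the separator.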
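One thing to watch when choosing a printable separator is that it can collide with ordinary string indexes. As a small sketch of the problem, with SUBSEP set to a comma the subscript pair 1,1 and the quoted string "1,1" produce the same key. The function name subsep_collision is hypothetical.

```shell
# Hypothetical helper: shows two different-looking subscripts landing on one key.
subsep_collision() {
    awk 'BEGIN {
        SUBSEP = ","
        item[1,1]   = 10              # stored under the key "1,1"
        item["1,1"] = 99              # same key, so this overwrites the value above
        n = 0
        for (k in item) n++
        print n, item[1,1]
    }'
}
subsep_collision
```

Only one element exists afterwards and it holds 99, so a separator should be a string that cannot occur in the subscripts themselves.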

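6. Sorting an Array with asort

The gawk function asort sorts the element values and resets the indexes to the integers 1 through n, where n is the number of elements in the array.

Example 1: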

[root@localhost ~]# cat asort.awk 
BEGIN {
    item[101]="HD Camcorder";
    item[102]="Refrigerator";item[103]="MP3 Player";
    item[104]="Tennis Racket";
    item[105]="Laser Printer";
    item[1001]="Tennis Ball";
    item[55]="Laptop";
    item["na"]="Not Available";
    print "---------- Before asort -------------"
    for(x in item)
        print "Index",x,"contains",item[x]
    total = asort(item);
    print "---------- After asort -------------"
    for(x in item)
        print "Index",x,"contains",item[x]
    print "Return value from asort:",total;
}
[root@localhost ~]# awk -f asort.awk 
---------- Before asort -------------
Index 55 contains Laptop
Index 101 contains HD Camcorder
Index 102 contains Refrigerator
Index 103 contains MP3 Player
Index 104 contains Tennis Racket
Index 105 contains Laser Printer
Index na contains Not Available
Index 1001 contains Tennis Ball
---------- After asort -------------  # the indexes assigned by asort start at 1, not 0
Index 4 contains MP3 Player
Index 5 contains Not Available
Index 6 contains Refrigerator
Index 7 contains Tennis Ball
Index 8 contains Tennis Racket
Index 1 contains HD Camcorder
Index 2 contains Laptop
Index 3 contains Laser Printer
Return value from asort: 8

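Note: once asort has been called, the original index values are gone. The new indexes are 1 through 8, but for (x in item) still visits them in no particular order, which is why the output above is not listed from 1 to 8.

Example 2: print the sorted elements in index order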

[root@localhost ~]# cat asort1.awk
BEGIN {
item[101]="HD Camcorder";
item[102]="Refrigerator";item[103]="MP3 Player";
item[104]="Tennis Racket";
item[105]="Laser Printer";
item[1001]="Tennis Ball";
item[55]="Laptop";
item["na"]="Not Available";
total = asort(item);
for(i=1;i<=total;i++)  # a counting loop prints the elements in index order
print "Index",i,"contains",item[i]
}
[root@localhost ~]# awk -f asort1.awk
Index 1 contains HD Camcorder
Index 2 contains Laptop
Index 3 contains Laser Printer
Index 4 contains MP3 Player
Index 5 contains Not Available
Index 6 contains Refrigerator
Index 7 contains Tennis Ball
Index 8 contains Tennis Racket
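When the original indexes have to survive, gawk 4.0 and later offer another route: assigning a value to PROCINFO["sorted_in"] controls the order in which for (x in array) visits the indexes, without renumbering anything. The sketch below assumes gawk is installed under the name gawk (other awk implementations do not honor this setting); the function name traverse_in_index_order is hypothetical.

```shell
# Hypothetical helper: gawk-only ordered traversal that keeps the original indexes.
traverse_in_index_order() {
    gawk 'BEGIN {
        item[101]  = "HD Camcorder"
        item[55]   = "Laptop"
        item[1001] = "Tennis Ball"
        PROCINFO["sorted_in"] = "@ind_num_asc"   # visit indexes in ascending numeric order
        for (x in item)
            print "Index", x, "contains", item[x]
    }'
}
traverse_in_index_order
```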

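7. Sorting Indexes with asorti

Just as you can sort by element value, you can also take all the index values, sort them, and store them in a new array.

Notes:

The asorti function sorts the index values (not the element values) and stores the sorted index values as the element values of the result.
Calling asorti(state) therefore loses the original element values, because the index values become the element values. To be safe, asorti is usually called with two arguments, asorti(state,stateabbr), so that the original array state is not overwritten.

Example 1: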

[root@localhost ~]# cat asorti.awk
BEGIN {
state["TX"]="Texas";
state["PA"]="Pennsylvania";
state["NV"]="Nevada";
state["CA"]="California";
state["AL"]="Alabama";
print "-------------- Function: asort -----------------"
total = asort(state,statedesc);
for(i=1;i<=total;i++)
print "Index",i,"contains",statedesc[i];
print "-------------- Function: asorti -----------------"
total = asorti(state,stateabbr);
for(i=1;i<=total;i++)   # a counting loop is needed here as well to print the indexes in order
print "Index",i,"contains",stateabbr[i];
}
[root@localhost ~]# awk -f asorti.awk
-------------- Function: asort -----------------
Index 1 contains Alabama
Index 2 contains California
Index 3 contains Nevada
Index 4 contains Pennsylvania
Index 5 contains Texas
-------------- Function: asorti -----------------
Index 1 contains AL
Index 2 contains CA
Index 3 contains NV
Index 4 contains PA
Index 5 contains TX
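Since the two-argument form leaves state untouched, the sorted index array can be used to walk the original array in key order. This is a sketch that assumes gawk is available as gawk; the function name states_by_abbreviation is hypothetical and only three of the states are used to keep it short.

```shell
# Hypothetical helper: prints the original array in sorted-index order (gawk only).
states_by_abbreviation() {
    gawk 'BEGIN {
        state["TX"] = "Texas"
        state["CA"] = "California"
        state["AL"] = "Alabama"
        total = asorti(state, stateabbr)               # stateabbr[1..total] holds the sorted indexes
        for (i = 1; i <= total; i++)
            print stateabbr[i], state[stateabbr[i]]    # look up the untouched original array
    }'
}
states_by_abbreviation
```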

Supplementary example: removing duplicate lines with an array

[root@localhost ~]# cat alpha
a
a
a
b
c
b
d
d
e
e
f
f
f
f
g
[root@localhost ~]# awk '!a[$0]++' alpha 
a
b
c
d
e
f
g

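Notes:

Why does this command remove the duplicate lines? When the first line, a, is read, the associative array a has no element for that line yet, so a[$0] evaluates to 0. The logical negation of 0 is 1, which is true, so the line is printed, and the post-increment leaves the element at 1. When b is read for the first time the same thing happens: the pattern is true, b is printed, and its element becomes 1. When a is read a second time, its element is already 1, the negation is 0, which is false, so the line is not printed, and the increment raises the element to 2.

Note two points. First, the expression is parsed as !(a[$0]++): the post-increment binds more tightly than !, but it yields the old value of the element, so ! tests the count from before the current line. Second, the increment takes place while the pattern is evaluated, that is, before the default print action runs.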
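The one-liner can also be written out as an explicit action, which makes the two steps (test the old count, then increment it) easier to see. The following is a minimal sketch that reads from standard input; the function name dedupe_lines and the array name seen are hypothetical.

```shell
# Hypothetical helper: long-hand equivalent of the one-liner, reading standard input.
dedupe_lines() {
    awk '{
        if (seen[$0] == 0)            # first time this line has appeared
            print
        seen[$0]++                    # remember it for later lines
    }'
}
printf 'a\na\nb\na\nc\nb\n' | dedupe_lines
```

Each distinct line is printed once, in the order of its first appearance, which is the same behavior as the short form.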

Permanent link to this article: http://www.linuxidc.com/Linux/2017-02/140276.htm
