Parsing the Command Line Elegantly in Python, Explained

2022-06-14 18:01:23

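Naturally, we want many of the programs we write ourselves (or simply scripts) to work like native commands and other programs: their behavior should be configurable through arguments passed at run time. The alternative is digging through directories to find the right configuration file, locating the relevant entry, editing it, saving, and exiting...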

Just thinking about it is a hassle, right?

1. Manual parsing

So let's start parsing command-line arguments.

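In an earlier article about modules we mentioned the variable sys.argv. It holds the command-line arguments that were passed when the current script was invoked.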

Let's first take a look at this variable:

# test_sys.py
import sys

print(sys.argv)

Invoke it from the command line:

$ python test_sys.py -d today -t now --author justdopython --country China --auto

This produces the following output:

['test_sys.py', '-d', 'today', '-t', 'now', '--author', 'justdopython', '--country', 'China', '--auto']

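As you can see, sys.argv is simply a list of strings made by splitting the command line on spaces. The shell does this splitting, so a quoted argument that contains spaces stays a single element. The first element is the name of the script being run.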
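The standard-library shlex module gives a rough illustration of how a command line turns into argv. This sketch assumes that shlex.split's POSIX-style rules approximate what your shell does. They are close, but not identical for every shell. The command line below is made up for illustration:

```python
import shlex

# A hypothetical command line; the quoted value contains spaces.
command_line = 'python test_sys.py --author "just do python" --auto'

# shlex.split applies POSIX shell-like splitting rules, so the quoted
# value stays one element instead of being broken up on its spaces.
tokens = shlex.split(command_line)

# Drop the interpreter name: sys.argv starts at the script name.
argv = tokens[1:]
print(argv)  # ['test_sys.py', '--author', 'just do python', '--auto']
```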

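To extract each option and its value, we first have to tell long options from short ones. Long options start with "--" and short options start with "-", so that is the test we use: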

# manually_parse_argv.py
import sys


for command_arg in sys.argv[1:]:
    if command_arg.startswith('--'):
        print("%s is a long option" % command_arg)
    elif command_arg.startswith('-'):
        print("%s is a short option" % command_arg)

Test result:

$ python manually_parse_argv.py -d today -t now --author justdopython --country China --auto

-d is a short option
-t is a short option
--author is a long option
--country is a long option
--auto is a long option

Next, building on that step, we also need to extract the value belonging to each option:

# manually_parse_argv.py
import sys


# sys.argv[0] is the current script name, so skip it
for index, command_arg in enumerate(sys.argv[1:]):
    if command_arg.startswith('--'):
        try:
            value = sys.argv[1:][index+1]
            if not value.startswith('-'):
                print("%s is a long option, value: %s" % (command_arg, value))
                continue
        except IndexError:
            pass
        
        print("%s is a long option, no value" % command_arg)

    elif command_arg.startswith('-'):
        try:
            value = sys.argv[1:][index+1]
            if not value.startswith('-'):
                print("%s is a short option, value: %s" % (command_arg, value))
                continue
        except IndexError:
            pass
        
        print("%s is a short option, no value" % command_arg)

Let's test it again:

$ python manually_parse_argv.py -d today -t now --author justdopython --country China --auto

-d is a short option, value: today
-t is a short option, value: now
--author is a long option, value: justdopython
--country is a long option, value: China
--auto is a long option, no value

Looks pretty good.

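But look at the code again... The real logic hasn't even started, yet we have already written a couple dozen lines just to parse command-line arguments. That is not Pythonic at all, and it still doesn't handle various other edge cases.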

And we would have to add a block like this to every similar program.
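If you still wanted the manual approach, the least you could do is wrap it in a function that every script can share. The sketch below is hypothetical. The name parse_argv and the dict return shape are our own choices and do not come from any library. It also keeps the weaknesses of the code above. For example, a negative number such as -5 given as a value would be mistaken for an option.

```python
def parse_argv(argv):
    """Hypothetical helper: turn an argv-style list into {option: value}.

    An option followed by a token that does not start with '-' takes that
    token as its value; otherwise its value is None.
    """
    result = {}
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg.startswith('-'):
            # Strip the leading '-' or '--' to get the bare option name.
            name = arg.lstrip('-')
            next_arg = argv[i + 1] if i + 1 < len(argv) else None
            if next_arg is not None and not next_arg.startswith('-'):
                result[name] = next_arg
                i += 2  # consume both the option and its value
                continue
            result[name] = None
        # Stray non-option tokens are simply skipped in this sketch.
        i += 1
    return result


print(parse_argv(['-d', 'today', '-t', 'now', '--author', 'justdopython',
                  '--country', 'China', '--auto']))
```

Even packaged like this, it remains parsing code you have to write, test and maintain yourself.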

2. The getopt module

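The nice thing about Python is its extremely rich ecosystem. Almost any functionality you need has already been written by someone else as a ready-made module you can call.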

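Having everything handed to you on a plate is usually something you only dream about, but when you write Python it is not too much to hope for.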

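Command-line parsing is a good example. The standard library includes a module called getopt that correctly tells long options from short ones and also extracts their values properly.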

Let's take a look:

# test_getopt.py
import sys
import getopt


opts, args = getopt.getopt(sys.argv[1:], 'd:t:', ["author=", "country=", "auto"])

print(opts)
print(args)

Printed output:

$ python test_getopt.py -d today -t now --author justdopython --country China --auto
[('-d', 'today'), ('-t', 'now'), ('--author', 'justdopython'), ('--country', 'China'), ('--auto', '')]
[]

Now let's go through what each argument means.

The getopt function in the getopt module parses command-line arguments.

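It takes three arguments: args, shortopts and longopts. These are the command-line arguments to parse, the short options to accept, and the long options to accept. longopts is optional.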

args and longopts are both lists of strings, while shortopts is a single string.

As before, the first element of sys.argv is the current script name, so in most cases we pass sys.argv[1:] as args.
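However, args does not have to come from sys.argv. Any list of strings works, which makes it easy to try out an option spec without a real command line. Here is a small sketch; the argument list is made up for illustration:

```python
import getopt

# A hand-written list in place of sys.argv[1:]; handy for experiments
# and tests, since no real command line is needed.
argv = ['-d', 'today', '--author', 'justdopython', 'extra.txt']

opts, args = getopt.getopt(argv, 'd:', ['author='])

print(opts)  # [('-d', 'today'), ('--author', 'justdopython')]
print(args)  # ['extra.txt']
```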

The shortopts string specifies which short options to recognize. Each letter in it stands for one short option:

import sys
import getopt


opts, args = getopt.getopt(sys.argv[1:], 'dt')

print(opts)
print(args)

Output:

$ python test_getopt.py -d -t
[('-d', ''), ('-t', '')]
[]

Passing fewer options than declared does not make parsing fail:

$ python test_getopt.py -t
[('-t', '')]
[]

Passing an option that was not declared, however, makes the module raise an error:

$ python test_getopt.py -d  -t -k
Traceback (most recent call last):
  File "test_getopt.py", line 11, in <module>
    opts, args = getopt.getopt(sys.argv[1:], 'dt')
      ...
    raise GetoptError(_('option -%s not recognized') % opt, opt)
getopt.GetoptError: option -k not recognized

This matches what we expect from ordinary commands and can be summed up as "better to reject than to accept the unexpected."
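A script normally catches this exception instead of showing a traceback to the user. Here is a minimal sketch; the wrapper name parse is hypothetical. getopt.GetoptError and its msg and opt attributes are part of the standard library.

```python
import getopt


def parse(argv):
    """Hypothetical wrapper: return (opts, args), or None on bad input."""
    try:
        opts, args = getopt.getopt(argv, 'dt')
    except getopt.GetoptError as err:
        # err.msg is the readable message; err.opt is the offending option.
        print('error:', err.msg)
        return None
    return opts, args


print(parse(['-d', '-t', '-k']))  # prints "error: option -k not recognized", then None
print(parse(['-d', '-t']))        # ([('-d', ''), ('-t', '')], [])
```

In a real script you would typically print a usage message at that point and call sys.exit(2), which is also what the example in the Python documentation does.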

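If a short option's letter is followed by a colon (:), that option requires a value. getopt then takes the next command-line argument as its value, whatever that argument looks like: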

import sys
import getopt


opts, args = getopt.getopt(sys.argv[1:], 'd:t')

print(opts)
print(args)

# $ python test_getopt.py -d  -t
# [('-d', '-t')]
# []
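Besides the separated form shown above, getopt also accepts a value glued directly to its option letter. It also lets several short options share one dash. Here is a quick check with the same 'd:t' spec:

```python
import getopt

# Same spec as above: -d takes a value, -t does not.
SHORTOPTS = 'd:t'

# The value may be attached directly to the option letter.
attached = getopt.getopt(['-dtoday', '-t'], SHORTOPTS)
print(attached)  # ([('-d', 'today'), ('-t', '')], [])

# Several letters can share one '-'; a value-taking letter placed last
# consumes the following argument as its value.
bundled = getopt.getopt(['-td', 'today'], SHORTOPTS)
print(bundled)  # ([('-t', ''), ('-d', 'today')], [])
```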

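Also, getopt stops parsing as soon as it finds a string that does not start with "--" or "-" in a position where it expects an option. That string and everything after it are returned unparsed as the second item of the result tuple.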

$ python test_getopt.py -d d_value o --pattern -t
[('-d', 'd_value')]
['o', '--pattern', '-t']
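When options and plain arguments need to be mixed, the standard library also provides getopt.gnu_getopt, which keeps scanning past non-option arguments. The argument list below is a trimmed variant of the example above. We dropped --pattern because it is not a declared option, and gnu_getopt would reject it instead of leaving it unparsed. Also note that if the POSIXLY_CORRECT environment variable is set, gnu_getopt stops at the first non-option, just like getopt.

```python
import getopt

argv = ['-d', 'd_value', 'o', '-t']

# getopt.getopt stops at the first non-option argument ('o').
stopped = getopt.getopt(argv, 'd:t')
print(stopped)  # ([('-d', 'd_value')], ['o', '-t'])

# getopt.gnu_getopt keeps scanning (when POSIXLY_CORRECT is not set),
# collecting the non-option arguments separately.
interleaved = getopt.gnu_getopt(argv, 'd:t')
print(interleaved)  # ([('-d', 'd_value'), ('-t', '')], ['o'])

# With either function, a lone '--' ends option parsing explicitly.
terminated = getopt.getopt(['-d', 'x', '--', '-t'], 'd:t')
print(terminated)  # ([('-d', 'x')], ['-t'])
```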

Similarly, the longopts argument specifies which long options to parse.

Each string in the list represents one long option:

import sys
import getopt


opts, args = getopt.getopt(sys.argv[1:], '', ["author", "country"])

print(opts)
print(args)

# $ python test_getopt.py --author  --country
# [('--author', ''), ('--country', '')]
# []

To parse long options that take a value, append an equals sign (=) to the option name. This marks the option as requiring a value:

import sys
import getopt


opts, args = getopt.getopt(sys.argv[1:], '', ["author=", "country"])

print(opts)
print(args)

# $ python test_getopt.py --author justdopython --country
# [('--author', 'justdopython'), ('--country', '')]
# []
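A long option declared with "=" also accepts its value in the single-token form --name=value. Attaching a value to an option declared without "=" is rejected. Here is a short check using the same longopts list as above:

```python
import getopt

LONGOPTS = ['author=', 'country']

# A value-taking long option also accepts the --name=value form.
equals_form = getopt.getopt(['--author=justdopython', '--country'], '', LONGOPTS)
print(equals_form)  # ([('--author', 'justdopython'), ('--country', '')], [])

# Giving a value to an option declared without '=' raises GetoptError.
try:
    getopt.getopt(['--country=China'], '', LONGOPTS)
    error_message = None
except getopt.GetoptError as err:
    error_message = err.msg
print(error_message)  # option --country must not have an argument
```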

Putting it all together gives the parse result we saw at the beginning:

import sys
import getopt


opts, args = getopt.getopt(sys.argv[1:], 'd:t:', ["author=", "country=", "auto"])

print(opts)
print(args)

# $ python test_getopt.py -d today -t now --author justdopython --country China --auto
# [('-d', 'today'), ('-t', 'now'), ('--author', 'justdopython'), ('--country', 'China'), ('--auto', '')]
# []

Once parsing is done, all that remains is to pull the values we need out of opts.
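One common way to do this is a loop over the (option, value) pairs. This is a sketch under a few assumptions. The config dict and its keys are hypothetical. Mapping -d to "date" and -t to "time" is only our guess from the sample values today and now.

```python
import getopt

# Hypothetical defaults; keys are our own names for this article's options.
config = {'date': None, 'time': None, 'author': None,
          'country': None, 'auto': False}

argv = ['-d', 'today', '-t', 'now', '--author', 'justdopython',
        '--country', 'China', '--auto']
opts, args = getopt.getopt(argv, 'd:t:', ['author=', 'country=', 'auto'])

for opt, value in opts:
    if opt == '-d':
        config['date'] = value
    elif opt == '-t':
        config['time'] = value
    elif opt == '--author':
        config['author'] = value
    elif opt == '--country':
        config['country'] = value
    elif opt == '--auto':
        # Options without a value come back as ''; record their presence.
        config['auto'] = True

print(config)
```

getopt reports long options under their full declared names. Comparing against '--author' therefore works even when the user typed an abbreviation.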

A boon for the lazy

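getopt saves us the time and effort of writing parsing code. It can also save you a few keystrokes when you type options. Strictly speaking, though, we don't recommend relying on this. Use it with care!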

getopt supports prefix matching for long options. If what you type is a prefix of exactly one declared long option, it is parsed as that option.

$ python test_getopt.py -d today -t now --auth justdopython --coun China --auto
[('-d', 'today'), ('-t', 'now'), ('--author', 'justdopython'), ('--country', 'China'), ('--auto', '')]
[]

As you can see, we typed only part of the author and country options, yet getopt still parsed them correctly.
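This is also why caution is warranted. A prefix stops working once it matches more than one declared option. With this article's own option list, --au could mean either --author or --auto, so getopt rejects it:

```python
import getopt

LONGOPTS = ['author=', 'country=', 'auto']

# '--coun' matches only 'country=', so it is expanded to --country.
opts, rest = getopt.getopt(['--coun', 'China'], '', LONGOPTS)
print(opts)  # [('--country', 'China')]

# '--au' is a prefix of both 'author=' and 'auto', so it is ambiguous.
try:
    getopt.getopt(['--au'], '', LONGOPTS)
    ambiguity_error = None
except getopt.GetoptError as err:
    ambiguity_error = err.msg
print(ambiguity_error)  # option --au not a unique prefix
```

An abbreviation that works today can therefore break as soon as a new option with the same prefix is added.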

Summary

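This article covered two ways to parse command-line arguments in Python. The first is the somewhat clumsy manual approach, where we write our own parsing code. The second, more robust approach uses the ready-made getopt module.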

From now on we can leave tedious configuration files behind and change a program's behavior in an elegant, concise way.
