How to Read Hexadecimal Byte Data in Python

2022-05-20 13:12:10

How to read hexadecimal byte data

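While working on some network programming recently, I needed to store byte data that is not the usual bytes of printable str characters, but data such as b'\x00\xff\xfe\x01'. After looking it up, I found that this is a bytes object whose non-printable bytes are displayed as hexadecimal escapes. Converting it to str works without complaint, but converting that str back to bytes produces baffling doubled backslashes, which was a real headache.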

a = b'\x00\xef\xa2\xa0\xb3\x8b\x9d\x1e\xf8\x98\x19\x39\xd9\x9d\xfdABCDabcd'
b = str(a)
 
print(b)
>>> b'\x00\xef\xa2\xa0\xb3\x8b\x9d\x1e\xf8\x98\x199\xd9\x9d\xfdABCDabcd'
 
print(bytes(b,'utf8'))
>>> b"b'\\x00\\xef\\xa2\\xa0\\xb3\\x8b\\x9d\\x1e\\xf8\\x98\\x199\\xd9\\x9d\\xfdABCDabcd'"
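The doubled backslashes appear because str() on a bytes object returns the text of the literal (the same string as repr()), so each escape is spelled out character by character. The sketch below illustrates this on a shortened sample value and shows one possible way back, ast.literal_eval from the standard library; the variable names are only for illustration.

```python
import ast

a = b'\x00\xef\xa2\xa0ABCDabcd'

# str() on bytes gives the same text as repr(): the source form of the literal.
text = str(a)
print(text == repr(a))  # True

# A single byte such as 0xef is spelled with four characters (\, x, e, f),
# so the text is longer than the data it describes.
print(len(a), len(text))

# Encoding the text only encodes those characters; the escapes are not parsed.
wrong = bytes(text, 'utf8')
print(wrong == a)  # False

# ast.literal_eval parses the literal and returns the original bytes object.
restored = ast.literal_eval(text)
print(restored == a)  # True
```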

Writing it to a file and reading it back gives the same result, because what gets written is the str representation.

# Write to data.txt
a = b'\x00\xef\xa2\xa0\xb3\x8b\x9d\x1e\xf8\x98\x19\x39\xd9\x9d\xfdABCDabcd'
with open('data.txt','w') as p:
    p.write(str(a))
 
# Read data.txt back
with open('data.txt','r') as p:
    line = p.readline()
 
print(line, type(line) == str)
>>> b'\x00\xef\xa2\xa0\xb3\x8b\x9d\x1e\xf8\x98\x199\xd9\x9d\xfdABCDabcd' True
 
print(bytes(line,'utf8'))
>>> b"b'\\x00\\xef\\xa2\\xa0\\xb3\\x8b\\x9d\\x1e\\xf8\\x98\\x199\\xd9\\x9d\\xfdABCDabcd'"
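If the goal is only to get the same bytes back from a file, one option that avoids the str form altogether is to open the file in binary mode. A minimal sketch, assuming a hypothetical file name data.bin placed in a temporary directory:

```python
import os
import tempfile

a = b'\x00\xef\xa2\xa0\xb3\x8b\x9d\x1e\xf8\x98\x19\x39\xd9\x9d\xfdABCDabcd'

# data.bin is a hypothetical file name used only for this illustration.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'data.bin')

    # 'wb' writes the raw bytes; no text encoding or escaping is involved.
    with open(path, 'wb') as f:
        f.write(a)

    # 'rb' reads them back as a bytes object.
    with open(path, 'rb') as f:
        restored = f.read()

print(restored == a)  # True
```

This only suits files that hold nothing but the raw data; a file with one record per line, as used in the rest of this article, still needs some text representation.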

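Looking at the ASCII codes, the main cause is that each \x escape ends up in the str as a literal backslash followed by an x, so the escape marker alone occupies two characters instead of being part of a single byte.

On decoding, those characters are handled one by one, whereas the \xnn form should be read as a single byte value. So I wrote some unescaping logic for reading the file: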

def readbytetxt(filename):
    # Value of each lowercase hex digit, as produced by repr() of bytes.
    dic = {
    '0': 0,    '1': 1,    '2': 2,
    '3': 3,    '4': 4,    '5': 5,
    '6': 6,    '7': 7,    '8': 8,
    '9': 9,    'a': 10,   'b': 11,
    'c': 12,   'd': 13,   'e': 14,
    'f': 15,
    }
    with open(filename,'r') as p:
        # Only the first line is decoded and returned.
        line = p.readline()
        if line and line[-1] == '\n':
            line = line[:-1]
        i = 2    # skip the leading b'
        L = b''
        while i+1 < len(line):    # stop before the closing quote
            if line[i:i+2] == '\\x' and (line[i+2] in dic) and (line[i+3] in dic):
                # \xnn: combine the two hex digits into one byte
                L += bytes([dic[line[i+2]]*16+dic[line[i+3]]])
                i += 4
            else:
                L += bytes(line[i],'utf8')
                i += 1
        return L
 
print(readbytetxt('data.txt'))
>>> b'\x00\xef\xa2\xa0\xb3\x8b\x9d\x1e\xf8\x98\x199\xd9\x9d\xfdABCDabcd'

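Problem solved! It is basically a loop over the characters: whenever it meets \x, it converts the two hexadecimal digits to a decimal int and then turns that int into a byte. This handles the common hexadecimal escapes.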
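The hex-digit lookup table can also be replaced by the built-in int() with base 16. The helper below, decode_hex_escape, is a hypothetical name introduced only to illustrate that single step; it is a sketch, not a full parser.

```python
def decode_hex_escape(text, i):
    """Return the byte encoded by a \\xNN escape that starts at text[i]."""
    # text[i] is the backslash, text[i + 1] is 'x', the next two are hex digits.
    return bytes([int(text[i + 2:i + 4], 16)])


escaped = '\\xef'  # four characters: backslash, x, e, f
print(decode_hex_escape(escaped, 0))  # b'\xef'

# For a plain run of hex digits with no escapes, bytes.fromhex does the same job.
print(bytes.fromhex('00efa2'))  # b'\x00\xef\xa2'
```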

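Later I found that there are other escape sequences besides \x, such as \\ and \n. Without extra conversion logic they are still not recognized, so I rewrote the function to support most of the common escape sequences and turned it into a generator.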

def readbytetxt2(filename):
    # Value of each lowercase hex digit.
    dic = {
    '0': 0,    '1': 1,    '2': 2,
    '3': 3,    '4': 4,    '5': 5,
    '6': 6,    '7': 7,    '8': 8,
    '9': 9,    'a': 10,   'b': 11,
    'c': 12,   'd': 13,   'e': 14,
    'f': 15,
    }
    # Character after the backslash -> the character the escape stands for.
    dic2 = {
    'a': '\a',    'b': '\b',
    'f': '\f',    'n': '\n',
    'r': '\r',    'v': '\v',
    't': '\t',    '\'': '\'',
    '"': '"',     '\\': '\\',
    }
    with open(filename,'r') as p:
        line = p.readline()
        while line:
            if line[-1] == '\n':
                line = line[:-1]
            i = 2    # skip the leading b'
            L = b''
            while i+1 < len(line):    # stop before the closing quote
                if line[i:i+2] == '\\x' and (line[i+2] in dic) and (line[i+3] in dic):
                    L += bytes([dic[line[i+2]]*16+dic[line[i+3]]])
                    i += 4
                elif line[i] == '\\' and line[i+1] in dic2:
                    L += bytes(dic2[line[i+1]],'utf8')
                    i += 2
                elif line[i:i+4] == '\\000':
                    # octal escape for the NUL byte
                    L += b'\x00'
                    i += 4
                else:
                    L += bytes(line[i],'utf8')
                    i += 1
            yield L
            line = p.readline()
 
a = b'\x00\xef\xa2\xa0\xb3\x8b\x9d\x1e\xf8\x98\x19\x39\xd9\x9d\xfdthe first line\n\r\a\b\t\\\f\'\"\v\b\n\000'
b = b'\xa0\xdf\xa2\xa0\xb3\x8b\x9d\x1e\xf8\x98\x19\x39\xd9\x9d\xfdthe second line\n\n'
c = b'\xe0\xaf\xa2\xa0\xb3\x8b\x9d\x1e\xf8\x98\x19\x39\xd9\x9d\xfdthe third line\\'
with open('data.txt','w') as p:
    p.write(str(a)+'\n')
    p.write(str(b)+'\n')
    p.write(str(c))
 
line = readbytetxt2('data.txt')
 
print([a for a in line])
>>> [b'\x00\xef\xa2\xa0\xb3\x8b\x9d\x1e\xf8\x98\x199\xd9\x9d\xfdthe first line\n\r\x07\x08\t\\\x0c\'"\x0b\x08\n\x00', b'\xa0\xdf\xa2\xa0\xb3\x8b\x9d\x1e\xf8\x98\x199\xd9\x9d\xfdthe second line\n\n', b'\xe0\xaf\xa2\xa0\xb3\x8b\x9d\x1e\xf8\x98\x199\xd9\x9d\xfdthe third line\\']

With that, most encoded forms can be handled.
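As a point of comparison, the standard library can parse the same one-literal-per-line file format without a hand-written escape table. The sketch below uses ast.literal_eval; read_bytes_lines and the sample records are hypothetical names and data chosen for illustration, and the file lives in a temporary directory.

```python
import ast
import os
import tempfile


def read_bytes_lines(filename):
    """Yield one bytes object per non-empty line holding a bytes literal."""
    with open(filename, 'r') as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            value = ast.literal_eval(line)  # parses every escape form
            if not isinstance(value, bytes):
                raise ValueError('not a bytes literal: ' + line)
            yield value


records = [
    b'\x00\xefthe first line\n\r\t\\\'"\x00',
    b'\xa0\xdfthe second line\n\n',
    b'\xe0\xafthe third line\\',
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'data.txt')
    with open(path, 'w') as f:
        f.write('\n'.join(str(r) for r in records))
    decoded = list(read_bytes_lines(path))

print(decoded == records)  # True
```

ast.literal_eval only evaluates literals, so it does not execute arbitrary code from the file, but it accepts any literal, which is why the sketch checks the result type.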

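But... there is actually a much simpler way! The root of all the trouble is that the str form is full of escape sequences that are easy to get wrong. What I want is to store bytes in a file and read them back as bytes, and a bytes object is at bottom just a sequence of integers from 0 to 255 that happens to be displayed in hexadecimal. So all I really need is to store numbers! Hence a simpler method: convert the bytes to a list of integers and store that list.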

L = []
a = b'\x00\xef\xa2\xa0\xb3\x8b\x9d\x1e\xf8\x98\x19\x39\xd9\x9d\xfdthe first line\n\r\a\b\t\\\f\'\"\v\b\n\000'
print(a)
for each in a:
    L.append(int(each))
with open('data.txt','w') as p:
    p.write(str(L))
print(L)
>>> [0, 239, 162, 160, 179, 139, 157, 30, 248, 152, 25, 57, 217, 157, 253, 116, 104, 101, 32, 102, 105, 114, 115, 116, 32, 108, 105, 110, 101, 10, 13, 7, 8, 9, 92, 12, 39, 34, 11, 8, 10, 0]
 
 
with open('data.txt','r') as p:
    line = p.readline()
print(b''.join([bytes([int(i)]) for i in line[1:-1].split(',')]))
>>> b'\x00\xef\xa2\xa0\xb3\x8b\x9d\x1e\xf8\x98\x199\xd9\x9d\xfdthe first line\n\r\x07\x08\t\\\x0c\'"\x0b\x08\n\x00'

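What gets stored is a list of numbers, and reading it back only takes a split. There are no confusing escapes to deal with: each number is read back as exactly the byte it represents.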
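The text of a list of integers is also valid JSON, so the parsing step could be delegated to the json module and the result passed straight to bytes(). A small sketch of that variant, using a shortened sample value:

```python
import json

a = b'\x00\xef\xa2the first line\n\r\t\\\'"\x00'

# list(a) gives the integer value of every byte; json.dumps turns it into text.
text = json.dumps(list(a))

# json.loads returns the list of ints, and bytes() rebuilds the data from it.
restored = bytes(json.loads(text))
print(restored == a)  # True

# str() of a list of ints produces the same text, so files written with
# str(L) can be read back the same way.
print(str(list(a)) == text)  # True
```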

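Hexadecimal numbers in Python

Conversions

Use hex() to convert a decimal integer to hexadecimal, and int() to convert a hexadecimal value back to a decimal integer.

Similarly, use bin() to convert a decimal integer to binary and oct() to convert it to octal.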
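A short round trip through these built-ins, using 255 as an arbitrary sample value:

```python
n = 255

# hex(), bin() and oct() all return strings with a base prefix.
as_hex, as_bin, as_oct = hex(n), bin(n), oct(n)
print(as_hex, as_bin, as_oct)  # 0xff 0b11111111 0o377

# int() with the matching base turns each string back into the same integer.
back = [int(as_hex, 16), int(as_bin, 2), int(as_oct, 8)]
print(back)  # [255, 255, 255]

# A hexadecimal literal in source code is already an ordinary int.
print(0xff == 255)  # True
```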

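The hex() function

Description: hex() converts a base-10 integer to hexadecimal, returned as a string.

Syntax:

hex(x)

Parameters: x – a base-10 integer

Return value: the hexadecimal number as a string, prefixed with 0x.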

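The int() function

Description: int() converts a string or a number to an integer.

Syntax:

class int(x, base=10)

Parameters: x – a string or a number. base – the base used to interpret x when x is a string; defaults to 10.

Return value: an integer.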
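A few calls showing how the base argument changes the result; the sample strings are arbitrary:

```python
parsed = [
    int('ff', 16),    # hexadecimal digits without a prefix
    int('0xff', 0),   # base 0 infers the base from the 0x prefix
    int('101', 2),    # binary digits
    int(3.9),         # a float is truncated toward zero
]
print(parsed)  # [255, 255, 5, 3]

# With the default base 10, hexadecimal digits are rejected.
try:
    int('ff')
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```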

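Arithmetic

For a hexadecimal value held as a string, first convert it to an integer with int(), do the arithmetic on the integers, then convert the result back to hexadecimal with hex().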
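For instance, adding two hexadecimal strings could look like this (the values are arbitrary samples):

```python
x = int('0x1f', 16)  # 31
y = int('0xa', 16)   # 10

# Arithmetic happens on plain ints; the base only matters for input and output.
total = x + y
print(total)       # 41
print(hex(total))  # 0x29

# Hexadecimal literals need no conversion, because they are already ints.
print(hex(0x1f + 0xa))  # 0x29
```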
