OpenCV Python身份證資訊識別過程詳解

2022-04-11 10:01:37

本篇文章使用OpenCV-Python和CnOcr來實現身份證資訊識別的案例。想要識別身份證中的文字資訊，總共分為三大步驟：一、通過預處理身份證區域檢測查詢；二、身份證文字資訊提取；三、身份證文字資訊識別。下面來看一下識別的具體過程CnOcr官網。識別過程視訊

前置環境

這裡的環境需要安裝OpenCV-Python，Numpy和CnOcr。本篇文章使用的Python版本為3.6，OpenCV-Python版本為3.4.1.15，如果是4.x版本的同學，可能會有一些Api操作不同。這些依賴的安裝和介紹，我就不在這裡贅述了，均是使用Pip進行安裝。

識別過程

首先，匯入所需要的依賴cv2，numpy，cnocr並建立一個show影象的函數，方便後面使用：

import cv2
import numpy as np
from cnocr import CnOcr
def show(image, window_name):
    cv2.namedWindow(window_name, 0)
    cv2.imshow(window_name, image)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
# 載入CnOcr的模型
ocr = CnOcr(model_name='densenet_lite_136-gru')

身份證區域查詢

通過對載入影象的灰度處理–>濾波處理–>二值處理–>邊緣檢測–>膨脹處理–>輪廓查詢–>透視變換（校正）–>影象旋轉–>固定影象大小一系列處理之後，我們便可以清晰的裁剪出身份證的具體區域。

原始影象

使用OpenCV的imread方法讀取本地圖片。

image = cv2.imread('card.png')
show(image, "image")

灰度處理

將三通道BGR影象轉化為灰度影象，因為一下OpenCV操作都是需要基於灰度影象進行的。

gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
show(gray, "gray")

中值濾波

使用濾波處理，也就是模糊處理，這樣可以減少一些不需要的噪點。

blur = cv2.medianBlur(gray, 7)
show(blur, "blur")

二值處理

二值處理，非黑即白。這裡通過cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU，使用OpenCV的大津法二值化，對影象進行處理，經過處理後的影象，更加清晰的分辨出了背景和身份證的區域。

threshold = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
show(threshold, "threshold")

邊緣檢測

使用OpenCV中最常用的邊緣檢測方法，Canny，檢測出影象中的邊緣。

canny = cv2.Canny(threshold, 100, 150)
show(canny, "canny")

邊緣膨脹

為了使上一步邊緣檢測的邊緣更加連貫，使用膨脹處理，對白色的邊緣膨脹，即邊緣線條變得更加粗一些。

kernel = np.ones((3, 3), np.uint8)
dilate = cv2.dilate(canny, kernel, iterations=5)
show(dilate, "dilate")

輪廓檢測

使用findContours對邊緣膨脹過的圖片進行輪廓檢測，可以清晰的看到背景部分還是有很多噪點的，所需要識別的身份證部分也被輪廓圈了起來。

binary, contours, hierarchy = cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
image_copy = image.copy()
res = cv2.drawContours(image_copy, contours, -1, (255, 0, 0), 20)
show(res, "res")

輪廓排序

經過對輪廓的面積排序，我們可以準確的提取出身份證的輪廓。

contours = sorted(contours, key=cv2.contourArea, reverse=True)[0]
image_copy = image.copy()
res = cv2.drawContours(image_copy, contours, -1, (255, 0, 0), 20)
show(res, "contours")

透視變換

通過對輪廓近似提取出輪廓的四個頂點，並按順序進行排序，之後通過warpPerspective對所選影象區域進行透視變換，也就是對所選的影象進行校正處理。

epsilon = 0.02 * cv2.arcLength(contours, True)
approx = cv2.approxPolyDP(contours, epsilon, True)
n = []
for x, y in zip(approx[:, 0, 0], approx[:, 0, 1]):
    n.append((x, y))
n = sorted(n)
sort_point = []
n_point1 = n[:2]
n_point1.sort(key=lambda x: x[1])
sort_point.extend(n_point1)
n_point2 = n[2:4]
n_point2.sort(key=lambda x: x[1])
n_point2.reverse()
sort_point.extend(n_point2)
p1 = np.array(sort_point, dtype=np.float32)
h = sort_point[1][1] - sort_point[0][1]
w = sort_point[2][0] - sort_point[1][0]
pts2 = np.array([[0, 0], [0, h], [w, h], [w, 0]], dtype=np.float32)

# 生成變換矩陣
M = cv2.getPerspectiveTransform(p1, pts2)
# 進行透視變換
dst = cv2.warpPerspective(image, M, (w, h))
# print(dst.shape)
show(dst, "dst")

固定影象大小

將影象變正，通過對影象的寬高進行判斷，如果寬<高，就將影象旋轉90°。並將影象resize到指定大小。方便之後對影象進行處理。

if w < h:
    dst = np.rot90(dst)
resize = cv2.resize(dst, (1084, 669), interpolation=cv2.INTER_AREA)
show(resize, "resize")

檢測身份證文字位置

經過灰度，二值濾波和開閉運算後，將影象中的文字區域主鍵顯現出來。

temp_image = resize.copy()
gray = cv2.cvtColor(resize, cv2.COLOR_BGR2GRAY)
show(gray, "gray")
threshold = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
show(threshold, "threshold")
blur = cv2.medianBlur(threshold, 5)
show(blur, "blur")
kernel = np.ones((3, 3), np.uint8)
morph_open = cv2.morphologyEx(blur, cv2.MORPH_OPEN, kernel)
show(morph_open, "morph_open")

極度膨脹

給定一個比較大的折積盒，進行膨脹處理，使白色的區域加深加大。更加顯現出文字的區域。

kernel = np.ones((7, 7), np.uint8)
dilate = cv2.dilate(morph_open, kernel, iterations=6)
show(dilate, "dilate")

輪廓查詢文字區域

使用輪廓查詢，將白色塊狀區域查詢出來。

binary, contours, hierarchy = cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
resize_copy = resize.copy()
res = cv2.drawContours(resize_copy, contours, -1, (255, 0, 0), 2)
show(res, "res")

篩選出文字區域

經過上一步輪廓檢測，我們發現，選中的輪廓中有一些噪點，通過對影象的觀察，使用近似輪廓，然後用以下邏輯篩選出文字區域。並定義文字描述資訊，將文字區域位置資訊加入到指定集合中。到這一步，可以清晰的看到，所需要的文字區域統統都被提取了出來。

labels = ['姓名', '性別', '民族', '出生年', '出生月', '出生日', '住址', '公民身份證號碼']
positions = []
data_areas = {}
resize_copy = resize.copy()
for contour in contours:
    epsilon = 0.002 * cv2.arcLength(contour, True)
    approx = cv2.approxPolyDP(contour, epsilon, True)
    x, y, w, h = cv2.boundingRect(approx)
    if h > 50 and x < 670:
        res = cv2.rectangle(resize_copy, (x, y), (x + w, y + h), (0, 255, 0), 2)
        area = gray[y:(y + h), x:(x + w)]
        blur = cv2.medianBlur(area, 3)
        data_area = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
        positions.append((x, y))
        data_areas['{}-{}'.format(x, y)] = data_area

show(res, "res")

對文字區域進行排序

發現文字的區域是由下到上的順序，並且x軸從左到右的的區域是無序的，所以使用以下邏輯，對文字區域進行排序

positions.sort(key=lambda p: p[1])
result = []
index = 0
while index < len(positions) - 1:
    if positions[index + 1][1] - positions[index][1] < 10:
        temp_list = [positions[index + 1], positions[index]]
        for i in range(index + 1, len(positions)):
            if positions[i + 1][1] - positions[i][1] < 10:
                temp_list.append(positions[i + 1])
            else:
                break
        temp_list.sort(key=lambda p: p[0])
        positions[index:(index + len(temp_list))] = temp_list
        index = index + len(temp_list) - 1
    else:
        index += 1

識別文字

對文字區域使用CnOcr一一進行識別，最後將識別結果進行輸出。

positions.sort(key=lambda p: p[1])
result = []
index = 0
while index < len(positions) - 1:
    if positions[index + 1][1] - positions[index][1] < 10:
        temp_list = [positions[index + 1], positions[index]]
        for i in range(index + 1, len(positions)):
            if positions[i + 1][1] - positions[i][1] < 10:
                temp_list.append(positions[i + 1])
            else:
                break
        temp_list.sort(key=lambda p: p[0])
        positions[index:(index + len(temp_list))] = temp_list
        index = index + len(temp_list) - 1
    else:
        index += 1

結語

通過以上的步驟，便成功的將身份證資訊進行了提取，過程中的一些數位引數，可能會在不同的場景中有些許的調整。
以下放上所有的程式碼：

程式碼

import cv2
import numpy as np
from cnocr import CnOcr

def show(image, window_name):
    cv2.namedWindow(window_name, 0)
    cv2.imshow(window_name, image)
    # 0任意鍵終止視窗
    cv2.waitKey(0)
    cv2.destroyAllWindows()


ocr = CnOcr(model_name='densenet_lite_136-gru')

image = cv2.imread('card.png')
show(image, "image")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
show(gray, "gray")
blur = cv2.medianBlur(gray, 7)
show(blur, "blur")
threshold = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
show(threshold, "threshold")
canny = cv2.Canny(threshold, 100, 150)
show(canny, "canny")
kernel = np.ones((3, 3), np.uint8)
dilate = cv2.dilate(canny, kernel, iterations=5)
show(dilate, "dilate")
binary, contours, hierarchy = cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
image_copy = image.copy()
res = cv2.drawContours(image_copy, contours, -1, (255, 0, 0), 20)
show(res, "res")
contours = sorted(contours, key=cv2.contourArea, reverse=True)[0]
image_copy = image.copy()
res = cv2.drawContours(image_copy, contours, -1, (255, 0, 0), 20)
show(res, "contours")
epsilon = 0.02 * cv2.arcLength(contours, True)
approx = cv2.approxPolyDP(contours, epsilon, True)
n = []
for x, y in zip(approx[:, 0, 0], approx[:, 0, 1]):
    n.append((x, y))
n = sorted(n)
sort_point = []
n_point1 = n[:2]
n_point1.sort(key=lambda x: x[1])
sort_point.extend(n_point1)
n_point2 = n[2:4]
n_point2.sort(key=lambda x: x[1])
n_point2.reverse()
sort_point.extend(n_point2)
p1 = np.array(sort_point, dtype=np.float32)
h = sort_point[1][1] - sort_point[0][1]
w = sort_point[2][0] - sort_point[1][0]
pts2 = np.array([[0, 0], [0, h], [w, h], [w, 0]], dtype=np.float32)

M = cv2.getPerspectiveTransform(p1, pts2)
dst = cv2.warpPerspective(image, M, (w, h))
# print(dst.shape)
show(dst, "dst")
if w < h:
    dst = np.rot90(dst)
resize = cv2.resize(dst, (1084, 669), interpolation=cv2.INTER_AREA)
show(resize, "resize")
temp_image = resize.copy()
gray = cv2.cvtColor(resize, cv2.COLOR_BGR2GRAY)
show(gray, "gray")
threshold = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
show(threshold, "threshold")
blur = cv2.medianBlur(threshold, 5)
show(blur, "blur")
kernel = np.ones((3, 3), np.uint8)
morph_open = cv2.morphologyEx(blur, cv2.MORPH_OPEN, kernel)
show(morph_open, "morph_open")
kernel = np.ones((7, 7), np.uint8)
dilate = cv2.dilate(morph_open, kernel, iterations=6)
show(dilate, "dilate")
binary, contours, hierarchy = cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
resize_copy = resize.copy()
res = cv2.drawContours(resize_copy, contours, -1, (255, 0, 0), 2)
show(res, "res")
labels = ['姓名', '性別', '民族', '出生年', '出生月', '出生日', '住址', '公民身份證號碼']
positions = []
data_areas = {}
resize_copy = resize.copy()
for contour in contours:
    epsilon = 0.002 * cv2.arcLength(contour, True)
    approx = cv2.approxPolyDP(contour, epsilon, True)
    x, y, w, h = cv2.boundingRect(approx)
    if h > 50 and x < 670:
        res = cv2.rectangle(resize_copy, (x, y), (x + w, y + h), (0, 255, 0), 2)
        area = gray[y:(y + h), x:(x + w)]
        blur = cv2.medianBlur(area, 3)
        data_area = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
        positions.append((x, y))
        data_areas['{}-{}'.format(x, y)] = data_area

show(res, "res")

positions.sort(key=lambda p: p[1])
result = []
index = 0
while index < len(positions) - 1:
    if positions[index + 1][1] - positions[index][1] < 10:
        temp_list = [positions[index + 1], positions[index]]
        for i in range(index + 1, len(positions)):
            if positions[i + 1][1] - positions[i][1] < 10:
                temp_list.append(positions[i + 1])
            else:
                break
        temp_list.sort(key=lambda p: p[0])
        positions[index:(index + len(temp_list))] = temp_list
        index = index + len(temp_list) - 1
    else:
        index += 1
for index in range(len(positions)):
    position = positions[index]
    data_area = data_areas['{}-{}'.format(position[0], position[1])]
    ocr_data = ocr.ocr(data_area)
    ocr_result = ''.join([''.join(result[0]) for result in ocr_data]).replace(' ', '')
    # print('{}：{}'.format(labels[index], ocr_result))
    result.append('{}：{}'.format(labels[index], ocr_result))
    show(data_area, "data_area")

for item in result:
    print(item)
show(res, "res")

到此這篇關於OpenCV Python身份證資訊識別的文章就介紹到這了,更多相關Python身份證資訊識別內容請搜尋it145.com以前的文章或繼續瀏覽下面的相關文章希望大家以後多多支援it145.com！