
Implementing an Eye-Controlled Mouse with OpenCV

2022-02-17 16:01:31

How can you control the mouse with your eyes? This project takes a machine-learning approach to eye-gaze estimation from a single forward-facing view. Every time you click the mouse, we run code that crops an image of your eyes. With this data, we can then train a model in reverse: predicting the mouse position from your eyes. Before starting the project, we need to import the third-party libraries.

# For monitoring the web camera and performing image manipulations
import cv2
# For performing array operations
import numpy as np
# For creating and removing directories
import os
import shutil
# For recognizing and performing actions on mouse presses
from pynput.mouse import Listener

First, let's understand how pynput's Listener works. pynput.mouse.Listener creates a background thread that records mouse movements and mouse clicks. Here is a simplified piece of code that prints the mouse coordinates whenever you press a mouse button:

from pynput.mouse import Listener

def on_click(x, y, button, pressed):
  """
  Args:
    x: the x-coordinate of the mouse
    y: the y-coordinate of the mouse
    button: a pynput.mouse.Button value (e.g. Button.left or Button.right)
    pressed: True if the button was pressed, False if it was released
  """
  if pressed:
    print(x, y)

with Listener(on_click=on_click) as listener:
  listener.join()

Now let's extend this framework for our purposes. First, however, we need to write the code that crops the bounding box around the eyes. We will call this function later inside on_click. We use Haar cascade object detection to find the bounding boxes of the user's eyes. The detector file, haarcascade_eye.xml, ships with OpenCV. Here is a quick demo showing how it works:

import cv2
import numpy as np

# Load the cascade classifier detection object
cascade = cv2.CascadeClassifier("haarcascade_eye.xml")
# Turn on the web camera
video_capture = cv2.VideoCapture(0)
# Read data from the web camera (get the frame)
_, frame = video_capture.read()
# Convert the image to grayscale
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
# Predict the bounding box of the eyes
boxes = cascade.detectMultiScale(gray, 1.3, 10)
# Filter out images taken from a bad angle with errors
# We want to make sure both eyes were detected, and nothing else
if len(boxes) == 2:
  eyes = []
  for box in boxes:
    # Get the rectangle parameters for the detected eye
    x, y, w, h = box
    # Crop the bounding box from the frame
    eye = frame[y:y + h, x:x + w]
    # Resize the crop to 32x32
    eye = cv2.resize(eye, (32, 32))
    # Normalize
    eye = (eye - eye.min()) / (eye.max() - eye.min())
    # Further crop to just around the eyeball
    eye = eye[10:-10, 5:-5]
    # Scale between [0, 255] and convert to int datatype
    eye = (eye * 255).astype(np.uint8)
    # Add the current eye to the list of 2 eyes
    eyes.append(eye)
  # Concatenate the two eye images into one
  eyes = np.hstack(eyes)
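
To check the result visually, you might display the concatenated strip in a window (this display snippet is not in the original):

# Show the concatenated eye strip (press any key to close the window)
cv2.imshow("eyes", eyes)
cv2.waitKey(0)
cv2.destroyAllWindows()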

Now let's use this knowledge to write the function that crops the eye images. First, we need a helper function for normalization:

def normalize(x):
  minn, maxx = x.min(), x.max()
  return (x - minn) / (maxx - minn)

Here is our eye-cropping function. It returns the image if eyes were found; otherwise, it returns None:

def scan(image_size=(32, 32)):
  _, frame = video_capture.read()
  gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
  boxes = cascade.detectMultiScale(gray, 1.3, 10)
  if len(boxes) == 2:
    eyes = []
    for box in boxes:
      x, y, w, h = box
      eye = frame[y:y + h, x:x + w]
      eye = cv2.resize(eye, image_size)
      eye = normalize(eye)
      eye = eye[10:-10, 5:-5]
      eyes.append(eye)
    return (np.hstack(eyes) * 255).astype(np.uint8)
  else:
    return None

Now let's write the automation that will run every time a mouse button is pressed (this assumes we have already defined a variable root earlier in the code as the directory where we want to save the images):

def on_click(x, y, button, pressed):
  # If the action was a mouse PRESS (not a RELEASE)
  if pressed:
    # Crop the eyes
    eyes = scan()
    # If the function returned None, something went wrong
    if eyes is not None:
      # Save the image, encoding the coordinates and button in the filename
      filename = os.path.join(root, "{} {} {}.jpeg".format(x, y, button))
      cv2.imwrite(filename, eyes)

Now we can bring back the pynput Listener and put together the complete implementation:

import cv2
import numpy as np
import os
import shutil
from pynput.mouse import Listener


root = input("Enter the directory to store the images: ")
if os.path.isdir(root):
  resp = ""
  while resp not in ["Y", "N"]:
    resp = input("This directory already exists. If you continue, the contents of the existing directory will be deleted. If you would still like to proceed, enter [Y]. Otherwise, enter [N]: ")
  if resp == "Y":
    shutil.rmtree(root)
  else:
    exit()
os.mkdir(root)


# Normalization helper function
def normalize(x):
  minn, maxx = x.min(), x.max()
  return (x - minn) / (maxx - minn)


# Eye cropping function
def scan(image_size=(32, 32)):
  _, frame = video_capture.read()
  gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
  boxes = cascade.detectMultiScale(gray, 1.3, 10)
  if len(boxes) == 2:
    eyes = []
    for box in boxes:
      x, y, w, h = box
      eye = frame[y:y + h, x:x + w]
      eye = cv2.resize(eye, image_size)
      eye = normalize(eye)
      eye = eye[10:-10, 5:-5]
      eyes.append(eye)
    return (np.hstack(eyes) * 255).astype(np.uint8)
  else:
    return None


def on_click(x, y, button, pressed):
  # If the action was a mouse PRESS (not a RELEASE)
  if pressed:
    # Crop the eyes
    eyes = scan()
    # If the function returned None, something went wrong
    if eyes is not None:
      # Save the image, encoding the coordinates and button in the filename
      filename = os.path.join(root, "{} {} {}.jpeg".format(x, y, button))
      cv2.imwrite(filename, eyes)


cascade = cv2.CascadeClassifier("haarcascade_eye.xml")
video_capture = cv2.VideoCapture(0)


with Listener(on_click=on_click) as listener:
  listener.join()

When you run this, every time you click the mouse (provided both eyes are in view), it automatically crops the webcam frame and saves the image to the chosen directory. The filename carries the mouse coordinates as well as whether it was a right or left click, e.g. 385 686 Button.left.jpeg.

Here is a sample image: for this one, I left-clicked at coordinates (385, 686) on a monitor with a resolution of 2560x1440.

The cascade classifier is quite accurate, and so far I have not seen any errors in my own data directory. Now let's write the code to train a neural network that predicts the mouse position given an image of your eyes.

import numpy as np
import os
import cv2
import pyautogui
from tensorflow.keras.models import *
from tensorflow.keras.layers import *
from tensorflow.keras.optimizers import *

Now let's load the cascade classifier and open the web camera:

cascade = cv2.CascadeClassifier("haarcascade_eye.xml")
video_capture = cv2.VideoCapture(0)

Normalization:

def normalize(x):
  minn, maxx = x.min(), x.max()
  return (x - minn) / (maxx - minn)

Eye capture:

def scan(image_size=(32, 32)):
  _, frame = video_capture.read()
  gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
  boxes = cascade.detectMultiScale(gray, 1.3, 10)
  if len(boxes) == 2:
    eyes = []
    for box in boxes:
      x, y, w, h = box
      eye = frame[y:y + h, x:x + w]
      eye = cv2.resize(eye, image_size)
      eye = normalize(eye)
      eye = eye[10:-10, 5:-5]
      eyes.append(eye)
    return (np.hstack(eyes) * 255).astype(np.uint8)
  else:
    return None

Let's define the dimensions of our monitor. You will have to change these parameters to match your own screen resolution:

# Note that there are actually 2560x1440 pixels on my screen
# I am simply recording one less, so that when we divide by these
# numbers, we will normalize between 0 and 1. Note that mouse
# coordinates are reported starting at (0, 0), not (1, 1)
width, height = 2559, 1439

Now let's load the data (again, assuming you have already defined root). We don't care whether it was a right or left click, because our goal is only to predict the mouse position:

filepaths = os.listdir(root)
X, Y = [], []
for filepath in filepaths:
  x, y, _ = filepath.split(' ')
  x = float(x) / width
  y = float(y) / height
  X.append(cv2.imread(os.path.join(root, filepath)))
  Y.append([x, y])
X = np.array(X) / 255.0
Y = np.array(Y)
print(X.shape, Y.shape)
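
With the crops above, each saved image is 12x44 pixels with 3 channels (two 12x22 eye crops stacked side by side), so the print statement should report X with shape (num_images, 12, 44, 3) and Y with shape (num_images, 2). That is where the input_shape in the next step comes from.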

Let's define our model architecture:

model = Sequential()
model.add(Conv2D(32, 3, 2, activation = 'relu', input_shape = (12, 44, 3)))
model.add(Conv2D(64, 2, 2, activation = 'relu'))
model.add(Flatten())
model.add(Dense(32, activation = 'relu'))
model.add(Dense(2, activation = 'sigmoid'))
model.compile(optimizer = "adam", loss = "mean_squared_error")
model.summary()

Here is the model summary:
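
The original screenshot of the summary is not reproduced here; the following is a reconstruction derived from the architecture above (assuming Keras defaults, i.e. 'valid' padding):

Layer (type)          Output Shape         Param #
conv2d (Conv2D)       (None, 5, 21, 32)    896
conv2d_1 (Conv2D)     (None, 2, 10, 64)    8256
flatten (Flatten)     (None, 1280)         0
dense (Dense)         (None, 32)           40992
dense_1 (Dense)       (None, 2)            66
Total params: 50,210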

The next task is to train the model. We will add some noise to the image data:

epochs = 200
for epoch in range(epochs):
  model.fit(X, Y, batch_size = 32)
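
The loop above refits the model for one epoch per iteration, but the snippet as given never actually adds the noise mentioned. A minimal sketch of what that augmentation might look like, assuming zero-mean Gaussian noise (the 0.05 scale is an arbitrary choice, not from the original):

epochs = 200
for epoch in range(epochs):
  # Jitter each image slightly so the model sees a new variant every epoch
  noisy_X = X + np.random.normal(0, 0.05, X.shape)
  model.fit(noisy_X, Y, batch_size=32)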

Now let's use our model to move the mouse in real time. Note that this requires a lot of data to work well. However, as a proof of concept, you will notice that with only about 200 images it really does move the mouse to the general region you are looking at. Of course, unless you have much more data, this is not truly controllable.

while True:
  eyes = scan()
  if eyes is not None:
    eyes = np.expand_dims(eyes / 255.0, axis=0)
    x, y = model.predict(eyes)[0]
    pyautogui.moveTo(x * width, y * height)

Here is a proof-of-concept example. Note that we trained on very little data before this screen recording: it shows our mouse automatically moving to a terminal application window based on the eyes. As I said, it is jumpy because the data is scarce. With more data, it will hopefully become stable enough to allow control with higher specificity. With only a few hundred images, you can only move it within the general region of your gaze. Also, if no images were captured in a particular region of the screen (say, the edges) during data collection, the model is unlikely to ever predict within that region.
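
One simple way to damp the jitter (not part of the original article) is to smooth successive predictions with an exponential moving average before moving the cursor. The smoothing factor below is an arbitrary choice:

smooth_x, smooth_y = 0.5, 0.5  # start at the screen center (normalized coordinates)
alpha = 0.3  # weight given to the newest prediction
while True:
  eyes = scan()
  if eyes is not None:
    eyes = np.expand_dims(eyes / 255.0, axis=0)
    x, y = model.predict(eyes)[0]
    # Blend the new prediction with the running average
    smooth_x = alpha * x + (1 - alpha) * smooth_x
    smooth_y = alpha * y + (1 - alpha) * smooth_y
    pyautogui.moveTo(smooth_x * width, smooth_y * height)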

This concludes our walkthrough of implementing an eye-controlled mouse with OpenCV.

