Image Recognition with TensorFlow

2020-06-16 17:38:07

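This tutorial teaches you how to use Inception-v3. You will learn how to classify images into 1000 classes in Python or C++, and how to extract higher-level features from the model that may be reused in other vision tasks. The focus throughout is on applying TensorFlow to image recognition.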

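Our brains make vision seem easy. It takes no effort for humans to tell apart a lion and a jaguar, read a sign, or recognize a face. But these are hard problems for a computer: they only seem easy because our brains are incredibly good at understanding images.

In the last few years, machine learning has made tremendous progress on these problems. In particular, a kind of model called a deep convolutional neural network achieves strong performance on hard visual recognition tasks, matching or exceeding human performance in some domains.

Researchers have demonstrated steady progress in computer vision by validating their work against ImageNet, a standard benchmark for computer vision. Successive models continue to show improvements, each time achieving a new state-of-the-art result: QuocNet, AlexNet, Inception (GoogLeNet), BN-Inception-v2. Researchers both inside and outside Google have published papers describing these models, but the results are still hard to reproduce. We are now taking the next step by releasing code for running image recognition on our latest model, Inception-v3.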

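Inception-v3 is trained for the ImageNet Large Visual Recognition Challenge using the data from 2012. This is a standard task in computer vision, where models try to classify entire images into 1000 classes, like "Zebra", "Dalmatian", and "Dishwasher". The figure shows some of AlexNet's classification results:

[Figure: sample classification results from AlexNet]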

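To compare models, we examine how often the model fails to include the correct answer among its top 5 guesses, termed the "top-5 error rate". On the 2012 validation data set, AlexNet achieved a top-5 error rate of 15.3%; BN-Inception-v2 achieved 6.66%; Inception-v3 reaches 3.46%.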
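The metric described above can be written down in a few lines. The following is a minimal sketch in plain Python, not the evaluation code used for the published numbers; the function name `top5_error_rate` and the toy scores are assumptions made for illustration.

```python
def top5_error_rate(predictions, true_labels, k=5):
    """Fraction of examples whose true label is NOT among the k highest scores."""
    if len(predictions) != len(true_labels):
        raise ValueError("predictions and true_labels must have the same length")
    misses = 0
    for scores, label in zip(predictions, true_labels):
        # Indices of the k largest scores, highest first.
        top_k = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(true_labels)


# Toy data (hypothetical, not ImageNet results): 4 examples, 6 classes.
toy_scores = [
    [0.40, 0.25, 0.15, 0.10, 0.06, 0.04],
    [0.04, 0.06, 0.10, 0.15, 0.25, 0.40],
    [0.30, 0.25, 0.20, 0.15, 0.07, 0.03],
    [0.10, 0.40, 0.06, 0.25, 0.04, 0.15],
]
toy_labels = [0, 0, 4, 4]

print(top5_error_rate(toy_scores, toy_labels))  # 0.5
```

With only six classes, a top-5 guess misses only when the true label has the single lowest score, which happens for the second and fourth examples here.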

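How well do humans do on the ImageNet Challenge? Andrej Karpathy wrote a blog post measuring his own performance. He reached a top-5 error rate of 5.1%.

This tutorial will teach you how to use Inception-v3. You will learn how to classify images into 1000 classes in Python or C++. We will also discuss how to extract higher-level features from this model, which may be reused for other vision tasks.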

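Usage with the Python API

The first time the classify_image.py script runs, it downloads the trained model from tensorflow.org. You will need about 200 MB of free space on your hard disk.

The following instructions assume you installed TensorFlow from a PIP package and that your terminal is in the TensorFlow root directory.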

cd tensorflow/models/image/imagenet

python classify_image.py

The above command classifies a supplied image of a panda bear.

If the script runs correctly, it produces the following output:

giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca (score = 0.88493)
indri, indris, Indri indri, Indri brevicaudatus (score = 0.00878)
lesser panda, red panda, panda, bear cat, cat bear, Ailurus fulgens (score = 0.00317)
custard apple (score = 0.00149)
earthstar (score = 0.00127)

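If you wish to classify other JPEG images, you may do so by editing the --image_file argument.

If you downloaded the model data to a different directory, point --model_dir to that directory.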
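To make the role of these two flags concrete, here is a small sketch of how a script might parse them with Python's standard argparse module. It is not the code from classify_image.py: the function name `build_parser`, the default values, and the help texts are placeholders assumed for illustration; only the flag names come from the text above.

```python
import argparse


def build_parser():
    """Toy flag parser; defaults are placeholders, not the real script's values."""
    parser = argparse.ArgumentParser(description="Illustrative flag parsing only.")
    parser.add_argument("--image_file", default="",
                        help="Path to the JPEG image to classify.")
    parser.add_argument("--model_dir", default="models",
                        help="Directory that holds the downloaded model.")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is repeatable.
args = build_parser().parse_args(
    ["--image_file", "my_photo.jpg", "--model_dir", "/data/models"])
print(args.image_file, args.model_dir)
```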

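Usage with the C++ API

You can run the same Inception-v3 model in C++ for use in production environments. Download the archive containing the GraphDef that defines the model like this (running from the root directory of the TensorFlow repository):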

wget https://storage.googleapis.com/download.tensorflow.org/models/inception_dec_2015.zip -O tensorflow/examples/label_image/data/inception_dec_2015.zip

unzip tensorflow/examples/label_image/data/inception_dec_2015.zip -d tensorflow/examples/label_image/data/

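Next, we need to compile the C++ binary that includes the code to load and run the graph. If you have followed the instructions to download and install TensorFlow for your platform, you should be able to build the example by running this command from your shell terminal: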

bazel build tensorflow/examples/label_image/...

That should create a binary executable that you can then run like this:

bazel-bin/tensorflow/examples/label_image/label_image

This uses the default example image that ships with the framework, and should output something similar to this:

I tensorflow/examples/label_image/main.cc:200] military uniform (866): 0.647296
I tensorflow/examples/label_image/main.cc:200] suit (794): 0.0477196
I tensorflow/examples/label_image/main.cc:200] academic gown (896): 0.0232411
I tensorflow/examples/label_image/main.cc:200] bow tie (817): 0.0157356
I tensorflow/examples/label_image/main.cc:200] bolo tie (940): 0.0145024

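In this case, we are using the default image of Admiral Grace Hopper, and you can see the network correctly identifies that she is wearing a military uniform, with a high score of about 0.65.

Next, try it out on your own images by supplying the --image= argument, for example: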

bazel-bin/tensorflow/examples/label_image/label_image --image=my_image.png

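If you look inside the tensorflow/examples/label_image/main.cc file, you can find out how it works. We hope this code will help you integrate TensorFlow into your own applications, so we will walk step by step through the main functions:

The command line flags control where the files are loaded from, and properties of the input images. The model expects 299x299 RGB images, so those are the input_width and input_height flags. We also need to convert the pixel values from integers between 0 and 255 to the floating point values that the graph operates on. We control the scaling with the input_mean and input_std flags: we first subtract input_mean from each pixel value, then divide the result by input_std.

These values probably look somewhat magical, but they were simply chosen by the original model author based on what he or she wanted to use as input for training. If you have a graph that you have trained yourself, you just need to adjust the values to match whatever you used during your training process.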
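As a rough illustration of that scaling step, the sketch below applies (value - input_mean) / input_std to a few pixel values in plain Python. The helper name `normalize_pixels` is hypothetical, and the mean and standard deviation of 128 are illustrative assumptions rather than values stated in this article; use whatever your own model was trained with.

```python
def normalize_pixels(pixels, input_mean, input_std):
    """Convert integer pixel values to floats via (value - input_mean) / input_std."""
    if input_std == 0:
        raise ValueError("input_std must be non-zero")
    return [(float(p) - input_mean) / input_std for p in pixels]


# Assumed example values: mean 128 and std 128 map 0..255 into roughly [-1, 1).
example = normalize_pixels([0, 128, 255], input_mean=128.0, input_std=128.0)
print(example)  # [-1.0, 0.0, 0.9921875]
```

If your own image pipeline already produces floats in a different range, the same formula with different constants brings them in line with what the graph expects.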

You can see how they are applied to an image in the ReadTensorFromImageFile() function.

// Given an image file name, read in the data, try to decode it as an image,
// resize it to the requested size, and then scale the values as desired.
Status ReadTensorFromImageFile(string file_name, const int input_height,
                               const int input_width, const float input_mean,
                               const float input_std,
                               std::vector<Tensor>* out_tensors) {
  tensorflow::GraphDefBuilder b;

We start by creating a GraphDefBuilder, which is an object we can use to specify a model to run or load.

  string input_name = "file_reader";
  string output_name = "normalized";
  tensorflow::Node* file_reader =
      tensorflow::ops::ReadFile(tensorflow::ops::Const(file_name, b.opts()),
                                b.opts().WithName(input_name));

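We then start creating nodes for the small model we want to run, which loads the image, resizes it, and scales the pixel values so that the result matches what the main model expects as its input. The first node we create is just a Const op that holds a tensor with the file name of the image we want to load. That is then passed as the first input to the ReadFile op. You might notice we pass b.opts() as the last argument to all the op creation functions. The argument ensures that the node is added to the model definition held in the GraphDefBuilder. We also name the ReadFile operator by making the WithName() call on b.opts(). Naming a node is not strictly necessary, since an automatic name is assigned if you do not provide one, but it does make debugging a bit easier.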

  // Now try to figure out what kind of file it is and decode it.
  const int wanted_channels = 3;
  tensorflow::Node* image_reader;
  if (tensorflow::StringPiece(file_name).ends_with(".png")) {
    image_reader = tensorflow::ops::DecodePng(
        file_reader,
        b.opts().WithAttr("channels", wanted_channels).WithName("png_reader"));
  } else {
    // Assume if it's not a PNG then it must be a JPEG.
    image_reader = tensorflow::ops::DecodeJpeg(
        file_reader,
        b.opts().WithAttr("channels", wanted_channels).WithName("jpeg_reader"));
  }
  // Now cast the image data to float so we can do normal math on it.
  tensorflow::Node* float_caster = tensorflow::ops::Cast(
      image_reader, tensorflow::DT_FLOAT, b.opts().WithName("float_caster"));
  // The convention for image ops in TensorFlow is that all images are expected
  // to be in batches, so that they're four-dimensional arrays with indices of
  // [batch, height, width, channel]. Because we only have a single image, we
  // have to add a batch dimension of 1 to the start with ExpandDims().
  tensorflow::Node* dims_expander = tensorflow::ops::ExpandDims(
      float_caster, tensorflow::ops::Const(0, b.opts()), b.opts());
  // Bilinearly resize the image to fit the required dimensions.
  tensorflow::Node* resized = tensorflow::ops::ResizeBilinear(
      dims_expander, tensorflow::ops::Const({input_height, input_width},
                                            b.opts().WithName("size")),
      b.opts());
  // Subtract the mean and divide by the scale.
  tensorflow::ops::Div(
      tensorflow::ops::Sub(
          resized, tensorflow::ops::Const({input_mean}, b.opts()), b.opts()),
      tensorflow::ops::Const({input_std}, b.opts()),
      b.opts().WithName(output_name));

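We then keep adding more nodes, to decode the file data as an image, to cast the integers into floating point values, to resize it, and then finally to run the subtraction and division operations on the pixel values.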
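For readers who want to see the batch and resize steps without the graph machinery, here is a small pure-Python sketch on a single-channel grid. It is not TensorFlow's implementation: the helper names are hypothetical, and the sampling convention (source coordinate = output index times input size / output size, clamped at the border) is an assumption that may not match every resize option TensorFlow offers.

```python
import math


def resize_bilinear(image, out_height, out_width):
    """Bilinearly resize a 2-D grid (list of rows) of floats."""
    in_height, in_width = len(image), len(image[0])
    y_scale = in_height / out_height
    x_scale = in_width / out_width
    out = []
    for y in range(out_height):
        src_y = y * y_scale
        y0 = int(math.floor(src_y))
        y1 = min(y0 + 1, in_height - 1)  # clamp at the bottom border
        wy = src_y - y0
        row = []
        for x in range(out_width):
            src_x = x * x_scale
            x0 = int(math.floor(src_x))
            x1 = min(x0 + 1, in_width - 1)  # clamp at the right border
            wx = src_x - x0
            # Interpolate horizontally on the two rows, then vertically.
            top = image[y0][x0] * (1 - wx) + image[y0][x1] * wx
            bottom = image[y1][x0] * (1 - wx) + image[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


def add_batch_dimension(image):
    """Wrap one image in a list so the data has a leading batch axis of size 1."""
    return [image]


small = [[0.0, 2.0], [4.0, 6.0]]
resized = resize_bilinear(small, 4, 4)
batch = add_batch_dimension(resized)
print(resized[1])  # [2.0, 3.0, 4.0, 4.0]
```

The C++ code above adds the batch dimension before resizing because the resize op expects batched input; the order does not matter for this toy version.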

  // This runs the GraphDef network definition that we've just constructed, and
  // returns the results in the output tensor.
  tensorflow::GraphDef graph;
  TF_RETURN_IF_ERROR(b.ToGraphDef(&graph));

At the end of this we have a model definition stored in the b variable, which we turn into a full graph definition with the ToGraphDef() function.

  std::unique_ptr<tensorflow::Session> session(
      tensorflow::NewSession(tensorflow::SessionOptions()));
  TF_RETURN_IF_ERROR(session->Create(graph));
  TF_RETURN_IF_ERROR(session->Run({}, {output_name}, {}, out_tensors));
  return Status::OK();
}

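Then we create a Session object, which is the interface to actually running the graph, and run it, specifying which node we want to get the output from, and where to put the output data.

This gives us a vector of Tensor objects, which in this case we know will hold only a single element (there is only one input image). You can think of a Tensor as a multi-dimensional array in this context, and it holds a 299 pixel high, 299 pixel wide, 3 channel image as float values. If you already have your own image-processing framework in your product, you should be able to use that instead, as long as you apply the same transformations before you feed images into the main graph.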
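One way to picture that multi-dimensional array is as a flat buffer of floats indexed by [batch, height, width, channel]. The sketch below computes the offset of one element, assuming a dense row-major layout; the function name `flat_index` is hypothetical and the layout is an assumption for illustration, not a statement about TensorFlow's internals.

```python
def flat_index(batch, y, x, channel, height=299, width=299, channels=3):
    """Offset of element [batch, y, x, channel] in a dense row-major buffer."""
    return ((batch * height + y) * width + x) * channels + channel


# One 299x299 RGB image holds 299 * 299 * 3 float values.
total_values = 1 * 299 * 299 * 3
print(total_values)                 # 268203
print(flat_index(0, 0, 1, 0))       # 3: the next pixel starts 3 channels later
print(flat_index(0, 298, 298, 2))   # 268202: the last value in the buffer
```

If your own framework stores images in a different order, for example channel first, the data needs to be rearranged before it is fed to the main graph.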

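This is a simple example of creating a small TensorFlow graph dynamically in C++, but for the pre-trained Inception model we want to load a much larger definition from a file. You can see how we do that in the LoadGraph() function.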

// Reads a model graph definition from disk, and creates a session object you
// can use to run it.
Status LoadGraph(string graph_file_name,
                 std::unique_ptr<tensorflow::Session>* session) {
  tensorflow::GraphDef graph_def;
  Status load_graph_status =
      ReadBinaryProto(tensorflow::Env::Default(), graph_file_name, &graph_def);
  if (!load_graph_status.ok()) {
    return tensorflow::errors::NotFound("Failed to load compute graph at '",
                                        graph_file_name, "'");
  }

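If you have looked through the image loading code, a lot of the terms should seem familiar. Rather than using a GraphDefBuilder to produce a GraphDef object, we load a protobuf file that directly contains the GraphDef.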

  session->reset(tensorflow::NewSession(tensorflow::SessionOptions()));
  Status session_create_status = (*session)->Create(graph_def);
  if (!session_create_status.ok()) {
    return session_create_status;
  }
  return Status::OK();
}

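Then we create a Session object from that GraphDef and pass it back to the caller so that they can run it at a later time.

The GetTopLabels() function is a lot like the image loading, except that in this case we want to take the results of running the main graph and turn them into a sorted list of the highest-scoring labels. Just like the image loader, it creates a GraphDefBuilder, adds a couple of nodes to it, and then runs the short graph to get a pair of output tensors. In this case they represent the sorted scores and the index positions of the highest results.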

// Analyzes the output of the Inception graph to retrieve the highest scores and
// their positions in the tensor, which correspond to categories.
Status GetTopLabels(const std::vector<Tensor>& outputs, int how_many_labels,
                    Tensor* indices, Tensor* scores) {
  tensorflow::GraphDefBuilder b;
  string output_name = "top_k";
  tensorflow::ops::TopK(tensorflow::ops::Const(outputs[0], b.opts()),
                        how_many_labels, b.opts().WithName(output_name));
  // This runs the GraphDef network definition that we've just constructed, and
  // returns the results in the output tensors.
  tensorflow::GraphDef graph;
  TF_RETURN_IF_ERROR(b.ToGraphDef(&graph));
  std::unique_ptr<tensorflow::Session> session(
      tensorflow::NewSession(tensorflow::SessionOptions()));
  TF_RETURN_IF_ERROR(session->Create(graph));
  // The TopK node returns two outputs, the scores and their original indices,
  // so we have to append :0 and :1 to specify them both.
  std::vector<Tensor> out_tensors;
  TF_RETURN_IF_ERROR(session->Run({}, {output_name + ":0", output_name + ":1"},
                                  {}, &out_tensors));
  *scores = out_tensors[0];
  *indices = out_tensors[1];
  return Status::OK();
}

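The PrintTopLabels() function takes those sorted results and prints them out in a friendly way. The CheckTopLabel() function is very similar, but just makes sure that the top label is the one we expect, for debugging purposes.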
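The top-k selection and the label printing can be sketched without a graph at all. The following plain Python version is an illustration under assumptions, not a port of the C++ functions: the names `top_k` and `format_top_labels`, the four-entry label list, and the scores are all made up for the example.

```python
def top_k(scores, k):
    """Return (values, indices) of the k largest scores, highest first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return [scores[i] for i in order], order


def format_top_labels(scores, labels, k=5):
    """Build lines of the form 'label (index): score' for the k best classes."""
    values, indices = top_k(scores, k)
    return ["%s (%d): %g" % (labels[i], i, v) for v, i in zip(values, indices)]


# Hypothetical labels and scores standing in for the 1000-class output.
toy_labels = ["cat", "dog", "panda", "zebra"]
toy_scores = [0.10, 0.03, 0.80, 0.07]

for line in format_top_labels(toy_scores, toy_labels, k=2):
    print(line)
# panda (2): 0.8
# cat (0): 0.1
```

Like the TopK node in the C++ code, `top_k` returns two results, the scores and their original positions, so the caller can map positions back to label names.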

At the end, main() ties together all of these calls.

int main(int argc, char* argv[]) {
  // We need to call this to set up global state for TensorFlow.
  tensorflow::port::InitMain(argv[0], &argc, &argv);
  Status s = tensorflow::ParseCommandLineFlags(&argc, argv);
  if (!s.ok()) {
    LOG(ERROR) << "Error parsing command line flags: " << s.ToString();
    return -1;
  }

  // First we load and initialize the model.
  std::unique_ptr<tensorflow::Session> session;
  string graph_path = tensorflow::io::JoinPath(FLAGS_root_dir, FLAGS_graph);
  Status load_graph_status = LoadGraph(graph_path, &session);
  if (!load_graph_status.ok()) {
    LOG(ERROR) << load_graph_status;
    return -1;
  }

We load the main graph.

  // Get the image from disk as a float array of numbers, resized and normalized
  // to the specifications the main graph expects.
  std::vector<Tensor> resized_tensors;
  string image_path = tensorflow::io::JoinPath(FLAGS_root_dir, FLAGS_image);
  Status read_tensor_status = ReadTensorFromImageFile(
      image_path, FLAGS_input_height, FLAGS_input_width, FLAGS_input_mean,
      FLAGS_input_std, &resized_tensors);
  if (!read_tensor_status.ok()) {
    LOG(ERROR) << read_tensor_status;
    return -1;
  }
  const Tensor& resized_tensor = resized_tensors[0];

Load, resize, and process the input image.

  // Actually run the image through the model.
  std::vector<Tensor> outputs;
  Status run_status = session->Run({{FLAGS_input_layer, resized_tensor}},
                                   {FLAGS_output_layer}, {}, &outputs);
  if (!run_status.ok()) {
    LOG(ERROR) << "Running model failed: " << run_status;
    return -1;
  }

Here we run the loaded graph with the image as an input.

  // This is for automated testing to make sure we get the expected result with
  // the default settings. We know that label 866 (military uniform) should be
  // the top label for the Admiral Hopper image.
  if (FLAGS_self_test) {
    bool expected_matches;
    Status check_status = CheckTopLabel(outputs, 866, &expected_matches);
    if (!check_status.ok()) {
      LOG(ERROR) << "Running check failed: " << check_status;
      return -1;
    }
    if (!expected_matches) {
      LOG(ERROR) << "Self-test failed!";
      return -1;
    }
  }

For testing purposes we can check to make sure we get the output we expect here.

  // Do something interesting with the results we've generated.
  Status print_status = PrintTopLabels(outputs, FLAGS_labels);

Finally we print the labels we found.

  if (!print_status.ok()) {
    LOG(ERROR) << "Running print failed: " << print_status;
    return -1;
  }

  return 0;
}

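The error handling here uses TensorFlow's Status object, which is very convenient because it lets you know whether any error has occurred with the ok() checker, and can then be printed out to give a readable error message.

In this case we are demonstrating object recognition, but you should be able to use very similar code on other models you have found or trained yourself, across all sorts of domains. We hope this small example gives you some ideas on how to use TensorFlow within your own products.

EXERCISE: Transfer learning is the idea that, if you know how to solve a task well, you should be able to transfer some of that understanding to solving related problems. One way to perform transfer learning is to remove the final classification layer of the network and extract the next-to-last layer of the CNN, in this case a 2048 dimensional vector. This can be specified by setting --output_layer=pool_3 in the C++ API example, and then changing the output tensor handling. Try extracting this feature on a set of images and see if you can predict new categories that ImageNet does not know about.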
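As a starting point for the exercise, the sketch below shows one very simple way to classify new categories from extracted feature vectors: a nearest-centroid classifier in plain Python. This is a swapped-in toy technique, not a method prescribed by the tutorial; the function names, the class names, and the 3-dimensional vectors standing in for the 2048-dimensional pool_3 features are all assumptions for illustration.

```python
import math


def train_centroids(features, labels):
    """Average the feature vectors of each class into one centroid per class."""
    sums, counts = {}, {}
    for vec, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def predict(centroids, vec):
    """Return the class whose centroid is closest in Euclidean distance."""
    def distance(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(centroids, key=lambda label: distance(centroids[label], vec))


# Toy 3-d vectors standing in for 2048-d features (hypothetical data).
train_features = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0],
                  [0.0, 1.0, 0.0], [0.1, 0.9, 0.0]]
train_labels = ["tulip", "tulip", "daisy", "daisy"]

centroids = train_centroids(train_features, train_labels)
print(predict(centroids, [0.8, 0.2, 0.1]))  # tulip
print(predict(centroids, [0.2, 0.7, 0.0]))  # daisy
```

In practice you would replace the toy vectors with features extracted from your own images and would likely try a stronger classifier, but the overall shape stays the same: extract features once, then train a small model on top of them.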

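Resources for Learning More

To learn about neural networks in general, Michael Nielsen's free online book is an excellent resource. For convolutional neural networks in particular, Chris Olah has some nice blog posts, and Michael Nielsen's book has a great chapter covering them.

To find out more about implementing convolutional neural networks, you can jump to the TensorFlow deep convolutional networks tutorial, or start a bit more gently with the ML beginner or ML expert MNIST starter tutorials. Finally, if you want to get up to speed on research in this area, you can read the papers referenced in this tutorial.

Original article: Image Recognition

Permanent link to this article: http://www.linuxidc.com/Linux/2016-07/133227.htm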
