Y.A.M の雑記帳: I/O Recap: ML Kit summary (for Android)

* The information below is current as of May 25, 2018.

ML Kit for Firebase

ML Kit is currently in beta and provides the following features on Android and iOS.

Text recognition
Face detection
Barcode scanning
Image labeling
Landmark recognition
Running custom TensorFlow Lite models (TensorFlow Lite model serving)
(A Google I/O 2018 session introduced a high density face contour feature and the Smart Reply API as coming soon. Both apparently run on-device in real time.)

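These features are offered as on-device APIs, which run on the device, and cloud-based APIs, which run in the cloud.
The on-device APIs process data in real time, work offline, and are free to use.
The cloud-based APIs (Cloud Vision API) return more detailed information than the on-device APIs, but are billed once usage exceeds a certain number of calls.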

For example, in text recognition the on-device API can only recognize Latin-based languages; to recognize other languages you need the cloud-based API.
In image labeling, the on-device API supports 400+ labels, while the cloud-based API supports 1000+ labels.

To use the Cloud Vision API, you need to switch your Firebase billing plan to Blaze (pay as you go).
Firebase Pricing Plans

For each feature, the first 1000 API calls per month are free.
Cloud Vision API Pricing

Just by uploading your own TensorFlow Lite model to Firebase, your app becomes able to run that model. Hosting and serving the model are free.

Some APIs are available on-device only, some in the cloud only, and some in both.

Feature	on-device	Cloud
Text recognition	o	o
Face detection	o	-
Barcode scanning	o	-
Image labeling	o	o
Landmark recognition	-	o
Custom model execution	o	-
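
As a rough sketch, the availability table above can be encoded in plain Kotlin so that an app can check up front whether a feature works offline or needs the Blaze plan. All names here (Feature, Availability, worksOffline, requiresBlazePlan) are made up for illustration and are not part of ML Kit.

```kotlin
// Hypothetical names throughout; this only encodes the availability table above.
enum class Feature {
    TEXT_RECOGNITION,
    FACE_DETECTION,
    BARCODE_SCANNING,
    IMAGE_LABELING,
    LANDMARK_RECOGNITION,
    CUSTOM_MODEL
}

data class Availability(val onDevice: Boolean, val cloud: Boolean)

val availability: Map<Feature, Availability> = mapOf(
    Feature.TEXT_RECOGNITION to Availability(onDevice = true, cloud = true),
    Feature.FACE_DETECTION to Availability(onDevice = true, cloud = false),
    Feature.BARCODE_SCANNING to Availability(onDevice = true, cloud = false),
    Feature.IMAGE_LABELING to Availability(onDevice = true, cloud = true),
    Feature.LANDMARK_RECOGNITION to Availability(onDevice = false, cloud = true),
    Feature.CUSTOM_MODEL to Availability(onDevice = true, cloud = false)
)

// True when the feature has an on-device API and so can run without a network connection.
fun worksOffline(feature: Feature): Boolean = availability.getValue(feature).onDevice

// True when the feature is cloud-only, which means the Blaze plan is unavoidable.
fun requiresBlazePlan(feature: Feature): Boolean {
    val a = availability.getValue(feature)
    return a.cloud && !a.onDevice
}

fun main() {
    println(worksOffline(Feature.FACE_DETECTION))            // true
    println(requiresBlazePlan(Feature.LANDMARK_RECOGNITION)) // true
    println(requiresBlazePlan(Feature.TEXT_RECOGNITION))     // false
}
```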

Text recognition

https://firebase.google.com/docs/ml-kit/recognize-text

Recognizes text in images.

Both an on-device API and a cloud-based API are available. The on-device API recognizes Latin-based languages only; the cloud-based API also supports other languages.

The on-device API is free, and the cloud-based API is free for the first 1000 API calls per month.
To use the cloud-based API, you need to switch your Firebase billing plan to Blaze (pay as you go).

The latest version is 16.0.0.


dependencies {
    implementation 'com.google.firebase:firebase-ml-vision:16.0.0'
}

In addition to FirebaseVisionCloudTextDetector for ordinary text recognition, the cloud-based API provides FirebaseVisionCloudDocumentTextDetector for recognizing dense text such as documents.
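
To make the on-device / cloud split concrete, here is a minimal sketch of calling both text detectors from Kotlin. It assumes an Android project with firebase-ml-vision 16.0.0 and a Bitmap you already have; the function names and the log tag are made up for illustration. The API names are the ones I know from the 16.0.0 release and may differ in later versions.

```kotlin
import android.graphics.Bitmap
import android.util.Log
import com.google.firebase.ml.vision.FirebaseVision
import com.google.firebase.ml.vision.common.FirebaseVisionImage

private const val TAG = "MlKitSample" // hypothetical log tag

// On-device recognition: Latin-based languages only, works offline.
fun recognizeTextOnDevice(bitmap: Bitmap) {
    val image = FirebaseVisionImage.fromBitmap(bitmap)
    val detector = FirebaseVision.getInstance().visionTextDetector
    detector.detectInImage(image)
        .addOnSuccessListener { result ->
            // The result is organized into blocks of text.
            for (block in result.blocks) {
                Log.d(TAG, "on-device: ${block.text}")
            }
        }
        .addOnFailureListener { e -> Log.e(TAG, "on-device recognition failed", e) }
}

// Cloud-based recognition: needs the Blaze plan and a network connection.
fun recognizeTextInCloud(bitmap: Bitmap) {
    val image = FirebaseVisionImage.fromBitmap(bitmap)
    val detector = FirebaseVision.getInstance().visionCloudTextDetector
    detector.detectInImage(image)
        .addOnSuccessListener { result -> Log.d(TAG, "cloud: ${result.text}") }
        .addOnFailureListener { e -> Log.e(TAG, "cloud recognition failed", e) }
}
```

Both calls are asynchronous and return a Task, so the results arrive in the listeners rather than as a return value.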

Face detection

https://firebase.google.com/docs/ml-kit/detect-faces

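Detects faces in images.
You can get the positions of the eyes, ears, cheeks, nose, and mouth.
You can get whether the face is smiling (as a smiling probability).
You can get whether the eyes are closed (as a probability).
Each detected face gets an identifier, so you can track the same face across video frames.

On-device API only. Free to use.

The latest version is 16.0.0.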


dependencies {
    implementation 'com.google.firebase:firebase-ml-vision:16.0.0'
}
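
The landmark, classification, and tracking features described above are opt-in through detector options. The following is a sketch under the assumption of firebase-ml-vision 16.0.0; the function name and log tag are made up for illustration.

```kotlin
import android.graphics.Bitmap
import android.util.Log
import com.google.firebase.ml.vision.FirebaseVision
import com.google.firebase.ml.vision.common.FirebaseVisionImage
import com.google.firebase.ml.vision.face.FirebaseVisionFace
import com.google.firebase.ml.vision.face.FirebaseVisionFaceDetectorOptions

// Hypothetical helper that logs smile probability and tracking id for each face.
fun detectFaces(bitmap: Bitmap) {
    val options = FirebaseVisionFaceDetectorOptions.Builder()
        // Positions of eyes, ears, cheeks, nose, and mouth.
        .setLandmarkType(FirebaseVisionFaceDetectorOptions.ALL_LANDMARKS)
        // Smiling and eye-open probabilities.
        .setClassificationType(FirebaseVisionFaceDetectorOptions.ALL_CLASSIFICATIONS)
        // Per-face identifier that stays stable across frames.
        .setTrackingEnabled(true)
        .build()

    val detector = FirebaseVision.getInstance().getVisionFaceDetector(options)
    detector.detectInImage(FirebaseVisionImage.fromBitmap(bitmap))
        .addOnSuccessListener { faces ->
            for (face in faces) {
                val smile = face.smilingProbability
                if (smile != FirebaseVisionFace.UNCOMPUTED_PROBABILITY) {
                    Log.d("MlKitSample", "face ${face.trackingId}: smiling=$smile")
                }
            }
        }
        .addOnFailureListener { e -> Log.e("MlKitSample", "face detection failed", e) }
}
```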

Barcode scanning

https://firebase.google.com/docs/ml-kit/read-barcodes

Reads barcodes from images.

Supported formats

Linear formats: Codabar, Code 39, Code 93, Code 128, EAN-8, EAN-13, ITF, UPC-A, UPC-E
2D formats: Aztec, Data Matrix, PDF417, QR Code

Barcodes are recognized regardless of their orientation.

On-device API only. Free to use.

The latest version is 16.0.0.


dependencies {
    implementation 'com.google.firebase:firebase-ml-vision:16.0.0'
}
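
As a sketch of how the supported formats come into play, the detector can be restricted to the formats you actually expect. This assumes firebase-ml-vision 16.0.0; the function name, the chosen formats, and the log tag are only examples.

```kotlin
import android.graphics.Bitmap
import android.util.Log
import com.google.firebase.ml.vision.FirebaseVision
import com.google.firebase.ml.vision.barcode.FirebaseVisionBarcode
import com.google.firebase.ml.vision.barcode.FirebaseVisionBarcodeDetectorOptions
import com.google.firebase.ml.vision.common.FirebaseVisionImage

// Hypothetical helper that only looks for QR Code and EAN-13 barcodes.
fun scanBarcodes(bitmap: Bitmap) {
    val options = FirebaseVisionBarcodeDetectorOptions.Builder()
        .setBarcodeFormats(
            FirebaseVisionBarcode.FORMAT_QR_CODE,
            FirebaseVisionBarcode.FORMAT_EAN_13
        )
        .build()

    val detector = FirebaseVision.getInstance().getVisionBarcodeDetector(options)
    detector.detectInImage(FirebaseVisionImage.fromBitmap(bitmap))
        .addOnSuccessListener { barcodes ->
            for (barcode in barcodes) {
                // rawValue is the undecoded content of the barcode.
                Log.d("MlKitSample", "format=${barcode.format} value=${barcode.rawValue}")
            }
        }
        .addOnFailureListener { e -> Log.e("MlKitSample", "barcode scanning failed", e) }
}
```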

Image labeling

https://firebase.google.com/docs/ml-kit/label-images

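Recognizes entities in an image (people, things, places, activities, and so on) without any additional metadata, and returns them as a list.

Both an on-device API and a cloud-based API are available; the on-device API supports 400+ labels and the cloud-based API supports 1000+ labels.
The on-device API is free, and the cloud-based API is free for the first 1000 API calls per month.
To use the cloud-based API, you need to switch your Firebase billing plan to Blaze (pay as you go).

The latest version is 16.0.0; the on-device model library is 15.0.0.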


dependencies {
    implementation 'com.google.firebase:firebase-ml-vision:16.0.0'
    // on-device
    implementation 'com.google.firebase:firebase-ml-vision-image-label-model:15.0.0'
}
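
A minimal sketch of on-device labeling, assuming both dependencies above (firebase-ml-vision 16.0.0 and the 15.0.0 image label model). The function name and log tag are made up for illustration.

```kotlin
import android.graphics.Bitmap
import android.util.Log
import com.google.firebase.ml.vision.FirebaseVision
import com.google.firebase.ml.vision.common.FirebaseVisionImage

// Hypothetical helper that logs each label with its confidence.
fun labelImageOnDevice(bitmap: Bitmap) {
    val detector = FirebaseVision.getInstance().visionLabelDetector
    detector.detectInImage(FirebaseVisionImage.fromBitmap(bitmap))
        .addOnSuccessListener { labels ->
            // The result is a list of entities found in the image.
            for (label in labels) {
                Log.d("MlKitSample", "${label.label} (confidence=${label.confidence})")
            }
        }
        .addOnFailureListener { e -> Log.e("MlKitSample", "image labeling failed", e) }
}
```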

Landmark recognition

https://firebase.google.com/docs/ml-kit/recognize-landmarks

Recognizes landmarks (Tokyo Tower, for example) in images.

Cloud-based API only, so you need to switch your Firebase billing plan to Blaze (pay as you go). The first 1000 API calls per month are free.

The latest version is 16.0.0.


dependencies {
    implementation 'com.google.firebase:firebase-ml-vision:16.0.0'
}
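
Since this feature is cloud-only, the call below only works with the Blaze plan and the Cloud Vision API enabled. It is a sketch assuming firebase-ml-vision 16.0.0; the function name and log tag are made up for illustration.

```kotlin
import android.graphics.Bitmap
import android.util.Log
import com.google.firebase.ml.vision.FirebaseVision
import com.google.firebase.ml.vision.common.FirebaseVisionImage

// Hypothetical helper that logs the name and confidence of each recognized landmark.
fun recognizeLandmarks(bitmap: Bitmap) {
    val detector = FirebaseVision.getInstance().visionCloudLandmarkDetector
    detector.detectInImage(FirebaseVisionImage.fromBitmap(bitmap))
        .addOnSuccessListener { landmarks ->
            for (landmark in landmarks) {
                Log.d("MlKitSample", "${landmark.landmark} (confidence=${landmark.confidence})")
            }
        }
        // Fails when offline or when the project is not on the Blaze plan.
        .addOnFailureListener { e -> Log.e("MlKitSample", "landmark recognition failed", e) }
}
```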

Running custom TensorFlow Lite models

https://firebase.google.com/docs/ml-kit/use-custom-models

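Custom TensorFlow Lite models are uploaded from the Firebase console.

Firebase downloads the model to the app dynamically, so you do not need to bundle the model in the APK. This reduces the install size of the app.
In addition, because the app release process and the model release process (uploading to Firebase) are decoupled, each team can handle its own releases.

Combined with Firebase Remote Config, you can also run A/B tests.
A feature that converts and compresses a full TensorFlow model into a lightweight TensorFlow Lite model was announced at I/O as coming soon.

You can also bundle the model in the APK, or host it on your own server and download it to the app (without uploading it to the Firebase console), and then use it through the ML Kit API.

The latest version is 16.0.0.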


dependencies {
    implementation 'com.google.firebase:firebase-ml-model-interpreter:16.0.0'
}

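To use an uploaded custom model, pass the model name to FirebaseCloudModelSource.Builder.
(Note that the code in the I/O session video is outdated.)
You specify the model name when you upload the model to Firebase. It cannot be changed afterwards.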


val cloudSource = FirebaseCloudModelSource.Builder("my_model_v1")
        ...
        .build()

If you switch the model name through Remote Config, you can use a different model for each target.


val modelName = firebaseRemoteConfig.getString("my_model")
val cloudSource = FirebaseCloudModelSource.Builder(modelName)
        ...
        .build()
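
One thing to watch with this approach: as far as I know, getString returns an empty string when the key has no value yet (for example before the first fetch and with no defaults set). A small guard like the following keeps a known-good model name in that case. resolveModelName and the model names are made up for illustration.

```kotlin
// Hypothetical helper: use the Remote Config value when present, otherwise a fallback name.
fun resolveModelName(remoteValue: String?, fallback: String): String =
    remoteValue?.trim()?.takeIf { it.isNotEmpty() } ?: fallback

fun main() {
    println(resolveModelName("my_model_v2", "my_model_v1")) // my_model_v2
    println(resolveModelName("", "my_model_v1"))            // my_model_v1
    println(resolveModelName(null, "my_model_v1"))          // my_model_v1
}
```

The result of resolveModelName(firebaseRemoteConfig.getString("my_model"), "my_model_v1") can then be passed to FirebaseCloudModelSource.Builder in place of the raw Remote Config value.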

High density face contour feature

From the I/O session video:

It reportedly detects more than 100 points and can run at 60 fps. It is said to be "coming pretty soon".

ML Kit console

You can open the ML Kit console from [DEVELOP] - [ML Kit] in the left-hand menu of the Firebase console.

Custom models are uploaded from the Custom tab there.

Codelabs

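I recommend trying the Codelabs apps first to see how things behave, and then reading the documentation.
I actually did both of them, and I was impressed by how fast on-device recognition is.

There are two ML Kit codelabs for Android.
The sample code is in Java; there is no Kotlin version.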

Recognize text in images with ML Kit for Firebase

Recognizes text from images bundled with the app
Uses both the on-device API and the cloud-based API
To try the Cloud Vision API (https://console.cloud.google.com/apis/library/vision.googleapis.com/), you have to switch your Firebase billing plan to Blaze (pay as you go)

Left: recognition result from the on-device API; right: recognition result from the cloud-based API