How to unleash the potential of the MLKit document scanner on your Android phone

  • ML Kit provides integrated solutions for image scanning and analysis on Android, making it easy to detect barcodes, objects, text, and faces efficiently and on the device itself.
  • Integration with CameraX and proper image preparation ensure fast and accurate results, optimizing real-time performance even on resource-limited devices.
  • Taking advantage of advanced features such as auto-zoom, handling multiple image formats, and customizing the user experience lets you build more professional and complete scanning apps.

In the world of Android development, taking full advantage of today's tools makes the difference between a functional app and a truly cutting-edge one. One of the most versatile components is the ML Kit-based scanner, capable of transforming the way an app interacts with its surroundings through the camera. From reading barcodes and QR codes to detecting objects or recognizing text and faces, the possibilities are virtually endless with the right approach.

Many developers stick to the basics, failing to fully exploit the capabilities of ML Kit on Android. Proper integration isn't just about including dependencies and testing examples; the key lies in optimizing performance, configuring each parameter for specific cases, and understanding all the options and tricks offered by both the official documentation and the experience of other professionals. In this article, we'll go over, step by step and in depth, everything you need to know to get the most out of the ML Kit scanner on Android, from installation to more advanced settings, including image analysis, coordinate management, and essential tips for working in real time.

What is ML Kit and why it revolutionized scanning on Android

ML Kit is an SDK developed by Google that brings machine learning to computer vision tasks on mobile devices. Not only does it make these techniques easy to apply, it also runs them on the device itself, without needing an internet connection for inference. This makes it a reliable, fast, and useful tool for applications that require code scanning, face detection, text reading, or object identification.

ML Kit's modular architecture lets you include only the essentials or customize down to the smallest detail. For many APIs you can choose between bundled models (linked into the app, which makes it larger but ready to use immediately) and dynamically downloaded models (which save space in the app but require an initial download through Google Play services). This duality lets you prioritize app size or immediate availability depending on your project's needs.
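
As a rough illustration for barcode scanning, the two options might look like this in a module-level build.gradle.kts. The artifact coordinates follow Google's published naming, but the version numbers are only examples; check the ML Kit release notes for current ones. You would normally keep just one of the two lines.

```kotlin
dependencies {
    // Option 1: model bundled in the APK (larger app, usable from the first launch, fully offline)
    implementation("com.google.mlkit:barcode-scanning:17.3.0")

    // Option 2: model delivered through Google Play services (smaller app, downloaded before first use)
    // implementation("com.google.android.gms:play-services-mlkit-barcode-scanning:18.3.1")
}
```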

Integrating ML Kit with CameraX: The Winning Combination

If you're looking for a robust integration between the Android camera and ML Kit capabilities, CameraX is your best ally. This library greatly facilitates access to camera hardware and image stream management, and also allows for overlaying interface elements and machine learning results over the camera preview.

Through the MlKitAnalyzer class, you can connect CameraX's output to ML Kit's detectors. This analyzer implements the ImageAnalysis.Analyzer interface and takes care of image resolution, coordinate transformations, and result delivery, which greatly simplifies the development of advanced scanning features.

Using CameraController and PreviewView simplifies both displaying the camera interface and receiving results. With MlKitAnalyzer, you only need to specify which detectors you want (for example, a barcode scanner) and the coordinate system in which you want the results. CameraX also handles details like image rotation and aspect ratio, avoiding errors that are common in more manual implementations.
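
A minimal sketch of that wiring, assuming an Activity that already holds the CAMERA permission, a PreviewView in its layout, and the androidx.camera:camera-view and androidx.camera:camera-mlkit-vision artifacts plus an ML Kit barcode dependency. The extension function startBarcodePreview and its onValue callback are hypothetical names chosen for illustration.

```kotlin
import androidx.appcompat.app.AppCompatActivity
import androidx.camera.core.ImageAnalysis
import androidx.camera.mlkit.vision.MlKitAnalyzer
import androidx.camera.view.LifecycleCameraController
import androidx.camera.view.PreviewView
import androidx.core.content.ContextCompat
import com.google.mlkit.vision.barcode.BarcodeScanning

// Hypothetical helper: binds preview + ML Kit barcode analysis to this Activity's lifecycle.
fun AppCompatActivity.startBarcodePreview(
    previewView: PreviewView,
    onValue: (String) -> Unit
): LifecycleCameraController {
    val mainExecutor = ContextCompat.getMainExecutor(this)
    val scanner = BarcodeScanning.getClient()
    val controller = LifecycleCameraController(this)

    controller.setImageAnalysisAnalyzer(
        mainExecutor,
        MlKitAnalyzer(
            listOf(scanner),
            // Results are mapped to PreviewView coordinates, so boxes can be drawn over the preview.
            ImageAnalysis.COORDINATE_SYSTEM_VIEW_REFERENCED,
            mainExecutor
        ) { result: MlKitAnalyzer.Result? ->
            // getValue returns null when this detector produced no result for the frame.
            val barcodes = result?.getValue(scanner).orEmpty()
            barcodes.firstOrNull()?.rawValue?.let(onValue)
        }
    )
    controller.bindToLifecycle(this)
    previewView.controller = controller
    return controller
}
```

Because the results are delivered on the main executor here, onValue can update the UI directly; for heavier post-processing you could pass a background executor as the third argument instead.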

Detecting and decoding barcodes: beyond the basic example

Barcode scanning is one of ML Kit's flagship use cases on Android. Its API goes far beyond detecting the typical QR code: from retail formats such as EAN-13 (a GS1 standard) and PDF417 to Data Matrix or Aztec, it is highly versatile, opening the door to applications in logistics, commerce, product identification, and much more.

When configuring the detector, you can restrict it to the formats you actually need, which increases speed and lowers resource consumption: by limiting the search, processing is more efficient and responds better in real time.
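
For example, an app that only expects EAN-13 and QR codes could configure the scanner as below. The format choice and the function name buildRetailScanner are illustrative; the import path for Barcode assumes a recent library version, where it lives in the common subpackage.

```kotlin
import com.google.mlkit.vision.barcode.BarcodeScanner
import com.google.mlkit.vision.barcode.BarcodeScannerOptions
import com.google.mlkit.vision.barcode.BarcodeScanning
import com.google.mlkit.vision.barcode.common.Barcode

// Only look for the formats the app actually handles; all other formats are ignored.
fun buildRetailScanner(): BarcodeScanner {
    val options = BarcodeScannerOptions.Builder()
        .setBarcodeFormats(Barcode.FORMAT_EAN_13, Barcode.FORMAT_QR_CODE)
        .build()
    return BarcodeScanning.getClient(options)
}
```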

Among the highlighted options are:

  • enableAllPotentialBarcodes(): detects all potential codes present in an image, even those that cannot be decoded yet. This is useful when the user needs to move closer or refocus the camera.
  • setZoomSuggestionOptions(): enables auto-zoom, so the detector can suggest how much the camera should zoom in to maximize the readability of the detected code.

Zoom auto-suggestion is a key new feature that improves the user experience, especially when the distance to the codes or their size varies greatly. You implement your own callback, which receives the zoom ratio suggested by the ML Kit detector and applies it to the camera.
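
A sketch of such a callback, assuming a CameraX CameraController that is already bound (so its zoom state is known) and a recent version of the ML Kit barcode library that provides ZoomSuggestionOptions and enableAllPotentialBarcodes(). The function name buildZoomingScanner is hypothetical.

```kotlin
import androidx.camera.view.CameraController
import com.google.mlkit.vision.barcode.BarcodeScanner
import com.google.mlkit.vision.barcode.BarcodeScannerOptions
import com.google.mlkit.vision.barcode.BarcodeScanning
import com.google.mlkit.vision.barcode.ZoomSuggestionOptions
import com.google.mlkit.vision.barcode.common.Barcode
import java.util.concurrent.Executor

fun buildZoomingScanner(controller: CameraController, mainExecutor: Executor): BarcodeScanner {
    // Upper bound for ML Kit's suggestions; falls back to 1x if the camera isn't bound yet.
    val maxZoom = controller.zoomState.value?.maxZoomRatio ?: 1f

    val zoomCallback = ZoomSuggestionOptions.ZoomCallback { suggestedRatio ->
        // CameraController must be driven from the main thread.
        mainExecutor.execute { controller.setZoomRatio(suggestedRatio) }
        true // report the zoom as applied (optimistically, since it is posted asynchronously)
    }

    val options = BarcodeScannerOptions.Builder()
        .setBarcodeFormats(Barcode.FORMAT_QR_CODE)
        .enableAllPotentialBarcodes() // also report codes that can't be decoded yet
        .setZoomSuggestionOptions(
            ZoomSuggestionOptions.Builder(zoomCallback)
                .setMaxSupportedZoomRatio(maxZoom)
                .build()
        )
        .build()
    return BarcodeScanning.getClient(options)
}
```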

Correctly preparing input images

One of the most critical factors for the performance of the ML Kit scanner is the quality and size of the input images. The official documentation stresses adequate resolution, because detection accuracy depends directly on how many pixels represent the code's data.

For example, for EAN-13 codes, the narrowest bars and spaces should be at least 2 pixels wide, so the entire code should be no less than 190 pixels wide. Denser formats like PDF417 need larger dimensions still: a single row should ideally be at least 1156 pixels wide.
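
The arithmetic behind those figures is simple: multiply the number of modules (units of the narrowest bar or space) across the code by the 2 pixels each one needs. A minimal sketch, where the helper name is ours and the module counts are stated assumptions: 95 modules for a full EAN-13 symbol and, for PDF417, up to 34 codewords of 17 modules each in one row, which is one way to arrive at the 1156-pixel figure.

```kotlin
// Hypothetical helper: minimum width in pixels for a code whose width spans
// `totalModules` units of the narrowest bar/space, at `pxPerModule` pixels each.
fun minBarcodeWidthPx(totalModules: Int, pxPerModule: Int = 2): Int {
    require(totalModules > 0 && pxPerModule > 0) { "counts must be positive" }
    return totalModules * pxPerModule
}

// Assumed module counts (see the lead-in above).
val EAN13_MODULES = 95            // full EAN-13 symbol width
val PDF417_ROW_MODULES = 34 * 17  // up to 34 codewords x 17 modules per row
```

With the default of 2 pixels per module, minBarcodeWidthPx(EAN13_MODULES) gives 190 and minBarcodeWidthPx(PDF417_ROW_MODULES) gives 1156, matching the figures above.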

Focus and resolution are crucial. If the user captures a blurry or low-resolution image, the results may be erratic. A good practice is to capture images at 1280 x 720 or 1920 x 1080, as long as the device's performance allows. If latency is an issue, you can lower the resolution, but make sure the code fills as much of the image as possible.

ML Kit's API represents every image source as an InputImage. You can build this object from:

  • a media.Image delivered by the camera.
  • Bitmap, ByteBuffer or ByteArray.
  • File URI (useful for loading images from the gallery).

In the case of CameraX, many of these conversions are handled automatically, especially image rotation and real-time frame delivery. This takes a lot of work off the developer's shoulders and ensures that the processed image is correctly oriented.
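
A sketch of building an InputImage from each of the sources listed above, assuming the caller already knows the rotation to apply (0, 90, 180 or 270 degrees). The wrapper function names are hypothetical; the InputImage factory methods they call are ML Kit's own.

```kotlin
import android.content.Context
import android.graphics.Bitmap
import android.media.Image
import android.net.Uri
import com.google.mlkit.vision.common.InputImage
import java.io.IOException
import java.nio.ByteBuffer

// From a camera frame (android.media.Image), e.g. obtained through CameraX or Camera2.
fun inputFromMediaImage(image: Image, rotationDegrees: Int): InputImage =
    InputImage.fromMediaImage(image, rotationDegrees)

// From a Bitmap already in memory.
fun inputFromBitmap(bitmap: Bitmap, rotationDegrees: Int): InputImage =
    InputImage.fromBitmap(bitmap, rotationDegrees)

// From a raw NV21 buffer, as delivered by the older Camera API.
fun inputFromNv21(buffer: ByteBuffer, width: Int, height: Int, rotationDegrees: Int): InputImage =
    InputImage.fromByteBuffer(buffer, width, height, rotationDegrees, InputImage.IMAGE_FORMAT_NV21)

// From a content or file URI (e.g. a gallery pick); throws IOException if it can't be read.
@Throws(IOException::class)
fun inputFromUri(context: Context, uri: Uri): InputImage =
    InputImage.fromFilePath(context, uri)
```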

Processing images and managing results

Once the input image is prepared, the ML Kit detector processes it and returns a list of results. For barcodes, you receive Barcode objects that give you access to the code's coordinates in the image, the decoded value, the data type (URL, contact, text, etc.), and additional structured fields when available.

The management of these results is flexible. You can overlay information on the image, interact with the user (for example, by opening a web link if it's a QR code), or save the detected values to a database. With success and failure listeners, you can handle decoded results and errors alike; an empty result list means nothing was recognized.
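
A sketch of such a handler, assuming a BarcodeScanner and an InputImage built as described earlier. The function name handleBarcodes and the openUrl callback are hypothetical; the Barcode accessors used (boundingBox, cornerPoints, valueType, url, wifi, contactInfo, rawValue) are part of ML Kit's barcode API.

```kotlin
import android.util.Log
import com.google.mlkit.vision.barcode.BarcodeScanner
import com.google.mlkit.vision.barcode.common.Barcode
import com.google.mlkit.vision.common.InputImage

private const val TAG = "BarcodeDemo"

// Hypothetical handler: dispatch each decoded barcode according to its value type.
fun handleBarcodes(scanner: BarcodeScanner, image: InputImage, openUrl: (String) -> Unit) {
    scanner.process(image)
        .addOnSuccessListener { barcodes ->
            if (barcodes.isEmpty()) Log.d(TAG, "No barcode recognized in this image")
            for (barcode in barcodes) {
                val bounds = barcode.boundingBox   // Rect? in input-image coordinates
                val corners = barcode.cornerPoints // Array<Point>?
                when (barcode.valueType) {
                    Barcode.TYPE_URL -> barcode.url?.url?.let(openUrl)
                    Barcode.TYPE_WIFI -> Log.d(TAG, "Wi-Fi network: ${barcode.wifi?.ssid}")
                    Barcode.TYPE_CONTACT_INFO ->
                        Log.d(TAG, "Contact: ${barcode.contactInfo?.name?.formattedName}")
                    else -> Log.d(TAG, "Raw value ${barcode.rawValue} at $bounds (${corners?.size ?: 0} corners)")
                }
            }
        }
        .addOnFailureListener { e -> Log.e(TAG, "Barcode scanning failed", e) }
}
```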

When integrating with CameraX, it is essential to close each ImageProxy once you have finished processing it; otherwise camera buffers are not released, memory use grows, and the real-time analysis stalls.
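
If you write your own analyzer instead of using MlKitAnalyzer, the pattern looks roughly like this: wrap the ImageProxy's underlying media.Image in an InputImage and close the proxy only once ML Kit's task completes, whether it succeeds or fails. The class name BarcodeAnalyzer and its onBarcodes callback are hypothetical.

```kotlin
import androidx.annotation.OptIn
import androidx.camera.core.ExperimentalGetImage
import androidx.camera.core.ImageAnalysis
import androidx.camera.core.ImageProxy
import com.google.mlkit.vision.barcode.BarcodeScanner
import com.google.mlkit.vision.barcode.common.Barcode
import com.google.mlkit.vision.common.InputImage

class BarcodeAnalyzer(
    private val scanner: BarcodeScanner,
    private val onBarcodes: (List<Barcode>) -> Unit
) : ImageAnalysis.Analyzer {

    @OptIn(ExperimentalGetImage::class) // ImageProxy.image is marked experimental
    override fun analyze(imageProxy: ImageProxy) {
        val mediaImage = imageProxy.image
        if (mediaImage == null) {
            imageProxy.close() // nothing to analyze, release the frame right away
            return
        }
        val input = InputImage.fromMediaImage(mediaImage, imageProxy.imageInfo.rotationDegrees)
        scanner.process(input)
            .addOnSuccessListener { barcodes -> onBarcodes(barcodes) }
            // Runs after success or failure, so the frame is always released
            // and CameraX can deliver the next one.
            .addOnCompleteListener { imageProxy.close() }
    }
}
```

Closing inside addOnCompleteListener also acts as a natural throttle: with the STRATEGY_KEEP_ONLY_LATEST backpressure strategy, CameraX does not hand the analyzer a new frame until the previous one is closed.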

Optimization for real-time analysis

The full potential of ML Kit is unleashed when you process images in real time, for example, from a video stream. To maintain a smooth, lag-free experience, there are several key tips:

  • Don't use your camera's maximum native resolution unless absolutely necessary. In many cases, 2-megapixel images are more than sufficient and improve processing speed.
  • Reduce the resolution if speed is a priority, but keep it above the minimum needed for accurate decoding.
  • In video streams, the detector may produce different results between consecutive frames. It's recommended to wait for several consecutive identical detections before considering a result valid.
  • Limit the number of calls to the detector. For example, if an image is already being processed, ignore subsequent frames until it finishes.
  • If you use CameraX, make sure to set the ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST backpressure strategy so the analyzer always receives the most recent image and the app stays responsive.
  • To overlay graphics on the image (e.g. bounding boxes), first get the result from ML Kit, then render both the image and the overlay in a single pass, so the display is drawn only once per frame.
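
Two of these ideas, waiting for consecutive identical results and skipping frames while one is still being processed, can be kept independent of ML Kit. A minimal sketch in plain Kotlin; both class names and the default of three frames are hypothetical choices, not ML Kit APIs.

```kotlin
import java.util.concurrent.atomic.AtomicBoolean

/**
 * Reports a detection only after it has appeared in [requiredHits] consecutive frames.
 * Hypothetical helper: feed it one value per analyzed frame (null = nothing found).
 */
class ConsecutiveResultFilter<T>(private val requiredHits: Int = 3) {
    init {
        require(requiredHits >= 1) { "requiredHits must be at least 1" }
    }

    private var last: T? = null
    private var hits = 0

    /** Returns the value while it has been stable for at least [requiredHits] frames, else null. */
    fun submit(value: T?): T? {
        if (value == null) {
            last = null
            hits = 0
            return null
        }
        if (value == last) {
            hits++
        } else {
            last = value
            hits = 1
        }
        return if (hits >= requiredHits) value else null
    }
}

/** Hypothetical gate: lets one frame through at a time and drops the rest until released. */
class FrameGate {
    private val busy = AtomicBoolean(false)

    /** True if the caller may start processing this frame; false means skip it. */
    fun tryAcquire(): Boolean = busy.compareAndSet(false, true)

    /** Call when processing finishes (for example, in a task's completion listener). */
    fun release() = busy.set(false)
}
```

In an analyzer you would call tryAcquire() before process(), release() when the task completes, and pass each frame's decoded value through submit(). With the default of three, submit("A") returns null twice and then "A" on the third consecutive frame.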

The image format also matters: with the Camera2 API, capture frames in YUV_420_888; with the older Camera API, use NV21.
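
For apps that use the ImageAnalysis use case directly rather than a CameraController, the settings discussed in this section map onto its builder roughly as follows. This sketch assumes CameraX 1.3 or later (for ResolutionSelector); the 1280 x 720 target is simply the lower of the two resolutions suggested earlier, and buildAnalysisUseCase is a hypothetical name.

```kotlin
import android.util.Size
import androidx.camera.core.ImageAnalysis
import androidx.camera.core.resolutionselector.ResolutionSelector
import androidx.camera.core.resolutionselector.ResolutionStrategy
import java.util.concurrent.Executor

fun buildAnalysisUseCase(
    analyzer: ImageAnalysis.Analyzer,
    analysisExecutor: Executor // e.g. a single background thread owned by the caller
): ImageAnalysis {
    // Aim for 1280x720; if unsupported, take the closest larger size, then the closest smaller one.
    val resolutionSelector = ResolutionSelector.Builder()
        .setResolutionStrategy(
            ResolutionStrategy(
                Size(1280, 720),
                ResolutionStrategy.FALLBACK_RULE_CLOSEST_HIGHER_THEN_LOWER
            )
        )
        .build()

    return ImageAnalysis.Builder()
        .setResolutionSelector(resolutionSelector)
        // Drop stale frames: the analyzer always gets the most recent one.
        .setBackpressureStrategy(ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST)
        // YUV_420_888 is the format recommended for Camera2-based pipelines (and CameraX's default).
        .setOutputImageFormat(ImageAnalysis.OUTPUT_IMAGE_FORMAT_YUV_420_888)
        .build()
        .also { it.setAnalyzer(analysisExecutor, analyzer) }
}
```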

Object detection and tracking in ML Kit

The potential of the ML Kit scanner is not limited to barcodes: object detection and tracking is another key feature. This API can identify up to five objects in an image or frame, returning their positions and assigning each one a unique tracking ID. When working with real-time video, you can follow an object's movement across successive frames.

Object detection settings allow you to:

  • Choose between stream mode (STREAM_MODE), optimized for low latency and tracking, and single image mode (SINGLE_IMAGE_MODE).
  • Enable or disable classification of objects into broad categories (fashion goods, food, home goods, places, plants, or unknown).
  • Set whether you want to process multiple objects simultaneously or just the most prominent one.
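
A sketch of both configurations, assuming the com.google.mlkit:object-detection dependency; the builder calls are ML Kit's, while the function names are hypothetical.

```kotlin
import com.google.mlkit.vision.objects.ObjectDetection
import com.google.mlkit.vision.objects.ObjectDetector
import com.google.mlkit.vision.objects.defaults.ObjectDetectorOptions

// Live camera feed: low latency, tracking IDs, up to five objects, coarse categories.
fun buildStreamingObjectDetector(): ObjectDetector {
    val options = ObjectDetectorOptions.Builder()
        .setDetectorMode(ObjectDetectorOptions.STREAM_MODE)
        .enableMultipleObjects() // omit to get only the most prominent object
        .enableClassification()  // omit to skip the coarse category labels
        .build()
    return ObjectDetection.getClient(options)
}

// Still photos: one most-prominent object, no tracking IDs, no classification.
fun buildSingleImageObjectDetector(): ObjectDetector =
    ObjectDetection.getClient(
        ObjectDetectorOptions.Builder()
            .setDetectorMode(ObjectDetectorOptions.SINGLE_IMAGE_MODE)
            .build()
    )
```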

Applications range from product recognition in stores, plant or animal analysis in educational apps, to advanced logistics and warehouse assistance systems.

Image preparation and rotation management follow the same principles described for code scanning, and it is recommended to use an InputImage created directly from supported formats to maximize performance.

Text recognition and face detection

Text recognition (OCR) and real-time face detection are two other major pillars of ML Kit. The text recognition API locates blocks, lines, and elements within an image and returns their exact position and recognized content, for Latin script and, with additional recognizers, for scripts such as Chinese, Devanagari, Japanese, and Korean.
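
A minimal sketch of walking that block, line, and element hierarchy, assuming the Latin-script recognizer (com.google.mlkit:text-recognition); other scripts use their own recognizer options and dependencies. The function name is hypothetical.

```kotlin
import android.util.Log
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.text.TextRecognizer
import com.google.mlkit.vision.text.latin.TextRecognizerOptions

// Create once and reuse, e.g.:
// val recognizer = TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS)
// and call recognizer.close() when the screen goes away.
fun logRecognizedText(recognizer: TextRecognizer, image: InputImage) {
    recognizer.process(image)
        .addOnSuccessListener { visionText ->
            for (block in visionText.textBlocks) {
                Log.d("OCR", "Block at ${block.boundingBox}: ${block.text}")
                for (line in block.lines) {
                    for (element in line.elements) {
                        // An element is roughly a word, with its own position.
                        Log.d("OCR", "  '${element.text}' at ${element.boundingBox}")
                    }
                }
            }
        }
        .addOnFailureListener { e -> Log.e("OCR", "Text recognition failed", e) }
}
```

The TextRecognizerOptions import is only needed for the commented-out creation line; it is kept so that line compiles when uncommented.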

Face detection adds the ability to locate contours and specific features (eyes, mouth, nose, etc.), estimate probabilities for expressions such as smiling or eyes being open, and draw on the image to visualize the results. By integrating custom classes, you can, for example, overlay boxes or dots on each detected face and display various information to the user.
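
A sketch of a detector configured for the features mentioned above (landmarks, contours, and smile/eye-open probabilities), assuming the com.google.mlkit:face-detection dependency. Per ML Kit's documentation, contours are only computed for the most prominent face in the image. Function names are hypothetical.

```kotlin
import android.util.Log
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.face.FaceContour
import com.google.mlkit.vision.face.FaceDetection
import com.google.mlkit.vision.face.FaceDetector
import com.google.mlkit.vision.face.FaceDetectorOptions
import com.google.mlkit.vision.face.FaceLandmark

fun buildFaceDetector(): FaceDetector {
    val options = FaceDetectorOptions.Builder()
        .setPerformanceMode(FaceDetectorOptions.PERFORMANCE_MODE_FAST)      // favor speed for live preview
        .setLandmarkMode(FaceDetectorOptions.LANDMARK_MODE_ALL)             // eyes, ears, nose, mouth, cheeks
        .setContourMode(FaceDetectorOptions.CONTOUR_MODE_ALL)               // outline points
        .setClassificationMode(FaceDetectorOptions.CLASSIFICATION_MODE_ALL) // smiling / eyes open
        .build()
    return FaceDetection.getClient(options)
}

fun logFaces(detector: FaceDetector, image: InputImage) {
    detector.process(image).addOnSuccessListener { faces ->
        for (face in faces) {
            // Probabilities are null when classification is disabled or couldn't be computed.
            val smiling = face.smilingProbability
            val leftEyeOpen = face.leftEyeOpenProbability
            val noseBase = face.getLandmark(FaceLandmark.NOSE_BASE)?.position
            val outline = face.getContour(FaceContour.FACE)?.points.orEmpty()
            Log.d("Faces", "box=${face.boundingBox} smiling=$smiling leftEyeOpen=$leftEyeOpen " +
                "nose=$noseBase outlinePoints=${outline.size}")
        }
    }
}
```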

This type of implementation requires some extra work on the graphical side, but the integration with CameraX and a GraphicOverlay view (a helper class from Google's ML Kit sample apps) makes the process easier. Developers can fully customize the interface and adapt the results of the automatic analysis to their desired visual style.

Tips for an optimal user experience

The success of an advanced scanning feature doesn't depend solely on the algorithms; it's also key to taking care of the user experience. Here are some helpful guidelines:

  • Make sure the objects you want to detect are sufficiently prominent and have visual details. Elements with low contrast or ambiguous details may require the user to move them closer to the camera.
  • When using object classification, be prepared to handle unknown or ambiguous objects, providing clear feedback to the user.
  • Include visual or textual cues in the interface to help the user focus correctly, avoiding frustration due to erroneous results.
  • If multiple objects are detected, consider options for the user to select the relevant result (for example, in educational or inventory apps).
  • Remember to adapt your interface and error messages for offline contexts, especially if you use dynamically downloaded models.

Analysis of results and practical examples

ML Kit's capabilities let you deliver very rich, structured information to the end user. Barcode scanning gives you positions, values, and types; object detection adds categories with confidence levels. This makes it possible to display tables or summaries, or to perform automatic actions (such as opening links or storing information in the background).

For example, in the case of object detection, you can present the user with the identified category and its confidence level, the tracking ID to compare the same object across multiple frames, and the boundaries of the detected area. This way, in an inventory app, it's easy to highlight the main object and track it frame by frame to process actions or trigger automations.
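
A sketch of turning one frame's DetectedObject list into that kind of summary, assuming a detector built in STREAM_MODE with classification enabled. The ObjectSummary class and the choice of the largest box as the "main" object are hypothetical illustrations, not ML Kit behavior.

```kotlin
import android.graphics.Rect
import com.google.mlkit.vision.objects.DetectedObject

// Hypothetical UI model for one detected object.
data class ObjectSummary(
    val trackingId: Int?,   // stable across frames in STREAM_MODE, null otherwise
    val category: String,   // best coarse label, or "Unknown"
    val confidence: Float?, // confidence of that label, if any
    val bounds: Rect
)

fun summarize(objects: List<DetectedObject>): List<ObjectSummary> =
    objects.map { obj ->
        val best = obj.labels.maxByOrNull { it.confidence }
        ObjectSummary(obj.trackingId, best?.text ?: "Unknown", best?.confidence, obj.boundingBox)
    }

// Treat the object with the largest on-screen area as the one to highlight.
fun mainObject(summaries: List<ObjectSummary>): ObjectSummary? =
    summaries.maxByOrNull { it.bounds.width() * it.bounds.height() }
```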

The integration between the different ML Kit modules allows you to combine features. Imagine reading a barcode on a detected object, obtaining its value, and classifying the object, all in real time from the camera image. This kind of synergy requires careful configuration and a good command of the available options.
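
With CameraX's MlKitAnalyzer, such a combination can be expressed by passing several detectors at once and reading each one's result for the same frame. A sketch under these assumptions: a LifecycleCameraController already bound to the screen, view-referenced coordinates so both result sets share one coordinate space, and a hypothetical onMatch callback that receives a barcode value together with the label of the object containing it.

```kotlin
import androidx.camera.core.ImageAnalysis
import androidx.camera.mlkit.vision.MlKitAnalyzer
import androidx.camera.view.LifecycleCameraController
import com.google.mlkit.vision.barcode.BarcodeScanning
import com.google.mlkit.vision.objects.ObjectDetection
import com.google.mlkit.vision.objects.defaults.ObjectDetectorOptions
import java.util.concurrent.Executor

fun attachObjectAndBarcodeAnalysis(
    controller: LifecycleCameraController,
    mainExecutor: Executor,
    onMatch: (barcodeValue: String, objectLabel: String?) -> Unit
) {
    val objectDetector = ObjectDetection.getClient(
        ObjectDetectorOptions.Builder()
            .setDetectorMode(ObjectDetectorOptions.STREAM_MODE)
            .enableMultipleObjects()
            .enableClassification()
            .build()
    )
    val barcodeScanner = BarcodeScanning.getClient()

    controller.setImageAnalysisAnalyzer(
        mainExecutor,
        MlKitAnalyzer(
            listOf(objectDetector, barcodeScanner),
            ImageAnalysis.COORDINATE_SYSTEM_VIEW_REFERENCED,
            mainExecutor
        ) { result: MlKitAnalyzer.Result? ->
            val objects = result?.getValue(objectDetector).orEmpty()
            val barcodes = result?.getValue(barcodeScanner).orEmpty()
            for (barcode in barcodes) {
                val value = barcode.rawValue ?: continue
                val box = barcode.boundingBox ?: continue
                // Pair the barcode with the object whose box contains the barcode's center.
                val owner = objects.firstOrNull { it.boundingBox.contains(box.centerX(), box.centerY()) }
                onMatch(value, owner?.labels?.maxByOrNull { it.confidence }?.text)
            }
        }
    )
}
```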

How to keep everything up to date and avoid problems

ML Kit libraries are updated frequently, so it's important to stay on top of each new release: performance, compatibility, and feature improvements are introduced often.

Always check the dependencies in your build.gradle file and make sure the minimum SDK level is correctly defined (usually API 21 or higher). Also review the initialization methods, as recent versions have optimized some processes and introduced new features, such as automatically downloading models when the app is installed.

Another recommendation is to consult the sample apps and teaching materials available in both the official Google documentation and open source repositories. These resources often include performance tests, integration examples with advanced interfaces, and solutions to common errors.

Maintain compatibility and optimization across diverse devices

One of the challenges of developing Android apps is the huge variety of devices, resolutions, and cameras available on the market. ML Kit is optimized for a wide range of devices, but it's always a good idea to test the app on multiple devices, both high-end and low-end, to ensure that the processing and visual quality are adequate in all cases.

If your audience primarily uses low-power devices, prioritize efficiency and speed, sacrificing resolution or functionality range if necessary to avoid compromising the user experience.

We conclude with an overview that refreshes the key points for getting the most out of the ML Kit scanner on Android: choose the best integration strategy between CameraX and ML Kit, prepare your input images well, properly manage formats and resolutions according to your usage, and customize your interface and workflows to provide the most robust and advanced experience possible for both users and developers.
