Intelligent Picture Teller Mobile App with Computer Vision API

Introduction

We all love taking photos, but we often struggle to come up with a short, fun caption for them. So today we are going to build an intelligent picture teller mobile app that uses the Computer Vision API of Cognitive Services to generate a short caption.

Before we start building the app, you may want to ask, “What’s Cognitive Services?” or “What’s the Computer Vision API?”

What’s Cognitive Service?

Microsoft Cognitive Services (formerly Project Oxford) are a set of APIs, SDKs and services available to developers to make their applications more intelligent, engaging and discoverable. You will need a Cognitive Services account to obtain the API key used to call the API later, so make sure you know how to create one.

What’s Computer Vision API?

The Computer Vision API of Cognitive Services can extract rich information from images to categorize and process visual data, and it supports machine-assisted moderation of images to help curate your services.

The Computer Vision API provides several API endpoints that can:
-Analyze an image,
-Read text in images,
-Read handwritten text from images,
-Recognize celebrities and landmarks,
-Analyze video in near real-time, or
-Generate a thumbnail.

For this app, we are going to use the endpoint that analyzes an image, because it can return a caption describing the picture.
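To make the shape of that call concrete before we build the app, here is a minimal sketch of the request that the later steps end up sending. The URL pattern, the header name and the content type are the ones used further down in this tutorial. The class name AnalyzeRequestSketch and the sample bytes are only illustrative, and the request is built but never sent.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;

public static class AnalyzeRequestSketch
{
    // Builds (but does not send) a POST request for the image analysis endpoint.
    // "region" and "apiKey" are placeholders supplied by the caller.
    public static HttpRequestMessage Build(string region, string apiKey, byte[] imageBytes)
    {
        // visualFeatures=Description asks the service for tags and captions.
        var url = $"https://{region}.api.cognitive.microsoft.com/vision/v1.0/analyze?visualFeatures=Description&language=en";

        var request = new HttpRequestMessage(HttpMethod.Post, url);
        request.Headers.Add("Ocp-Apim-Subscription-Key", apiKey);

        // The image travels in the body as raw bytes.
        request.Content = new ByteArrayContent(imageBytes);
        request.Content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
        return request;
    }

    public static void Main()
    {
        // Two arbitrary bytes stand in for a real photo here.
        var request = Build("westus", "<YOUR_API_KEY_HERE>", new byte[] { 0xFF, 0xD8 });
        Console.WriteLine(request.Method);
        Console.WriteLine(request.RequestUri);
    }
}
```

In the app itself the same pieces are spread over AppConstants (URL and header name) and ResultTellerPage (HttpClient and request body).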

Getting Started with Building an Intelligent Mobile App with Computer Vision

  1. First and foremost, we need to log in with a Cognitive Services account and add Computer Vision to obtain the key. If you do not have an account yet, create one first.
  2. Secondly, we need to create a Xamarin cross-platform mobile app with .NET Standard 2.0.
  3. Thirdly, we need to add a few NuGet packages. Right-click the Solution in the Solution Explorer -> Manage NuGet Packages for Solution…
  4. Next, search for Xam.Plugin.Media and install it in all projects.
  5. Then, also search for Newtonsoft.Json and install it in all projects.
  6. Now, we need to add a new ContentPage XAML file to the .Core project.
  7. Select Content Page, name it PicSelectPage.xaml and click Add.
  8. Replace the contents of PicSelectPage.xaml with the following UI code.
    <?xml version="1.0" encoding="utf-8" ?>
    <ContentPage xmlns="http://xamarin.com/schemas/2014/forms"
                 xmlns:x="http://schemas.microsoft.com/winfx/2009/xaml"
                 x:Class="HmhengPicTeller.Core.PicSelectPage"
                 NavigationPage.HasNavigationBar="False"
                 NavigationPage.HasBackButton="True"
                 NavigationPage.BackButtonTitle="Photo">
        <ContentPage.Content>
            <StackLayout Margin="20,30,30,20">
                <Label Text="From this page you can use the Computer Vision API to generate a description of a picture."
                       FontSize="Large"
                       Margin="0,0,0,20"/>
                <Label Text="Press Take Photo or Import Photo to begin."
                       FontSize="Large"
                       Margin="0,0,0,20"/>
                <Button x:Name="ButtonTakePrinted"
                        Text="Take Photo"
                        TextColor="White"
                        BackgroundColor="LightSeaGreen"
                        Clicked="TakePhotoButtonClickEventHandler" />
                <Button x:Name="ButtonImportPrinted"
                        Text="Import Photo"
                        TextColor="White"
                        BackgroundColor="LightSeaGreen"
                        Clicked="ImportPhotoButtonClickEventHandler"/>
            </StackLayout>
        </ContentPage.Content>
    </ContentPage>
  9. Now, we need to add some code to the code-behind file PicSelectPage.xaml.cs. Add the following references, then add the CrossMedia.Current.Initialize() line to the constructor PicSelectPage().
    using Plugin.Media;
    using Plugin.Media.Abstractions;
    using System;
    using System.IO;
    using System.Threading.Tasks;
    using Xamarin.Forms;
    using Xamarin.Forms.Xaml;
    
    ….
    
    public PicSelectPage ()
    {
      InitializeComponent ();
    
      CrossMedia.Current.Initialize();
    }
  10. Next, add the code below to declare the event handler TakePhotoButtonClickEventHandler() for the button ButtonTakePrinted.
    /// <summary>
    /// Called when Take Photo is pressed.
    /// </summary>
    
    async void TakePhotoButtonClickEventHandler(object sender, EventArgs e)
    {
    
      byte[] photoByteArray = null;
    
      try
      {
         photoByteArray = await TakePhoto();
      }
      catch (Exception exc)
      {
          Console.WriteLine(exc.Message);
      }
    
      if (photoByteArray != null)
      {
         await Navigation.PushAsync(new ResultTellerPage(photoByteArray));
      }
    }
  11. Then, add another event handler, ImportPhotoButtonClickEventHandler(), for the button ButtonImportPrinted.
    /// <summary>
    /// Called when Import Photo is pressed.
    /// </summary>
    async void ImportPhotoButtonClickEventHandler(object sender, EventArgs e)
    {
      bool error = false;
      MediaFile photoMediaFile = null;
      byte[] photoByteArray = null;

      try
      {
         photoMediaFile = await CrossMedia.Current.PickPhotoAsync(new PickMediaOptions
         {
            PhotoSize = PhotoSize.Medium
         });

         photoByteArray = MediaFileToByteArray(photoMediaFile);
      }
      catch (Exception exception)
      {
         Console.WriteLine($"ERROR: {exception.Message}");
         error = true;
      }

      if (error)
      {
         await DisplayAlert("Error", "Error importing photo", "OK");
      }
      else if (photoByteArray != null)
      {
         await Navigation.PushAsync(new ResultTellerPage(photoByteArray));
      }
    }
  12. After that, let’s also create a function named TakePhoto in PicSelectPage.xaml.cs.
    /// <summary>
    /// Uses the Xamarin Media Plugin to take photos using the native camera
    /// application
    /// </summary>
    async Task<byte[]> TakePhoto()
    {
      MediaFile photoMediaFile = null;
      byte[] photoByteArray = null;
      if (CrossMedia.Current.IsCameraAvailable)
      {
         var mediaOptions = new StoreCameraMediaOptions
         {
            PhotoSize = PhotoSize.Medium,
            AllowCropping = true,
            SaveToAlbum = true,
            // Use a fixed timestamp format; the default DateTime string contains
            // characters such as '/' and ':' that are not valid in file names.
            Name = $"{DateTime.UtcNow:yyyyMMdd_HHmmss}.jpg"
         };
         photoMediaFile = await CrossMedia.Current.TakePhotoAsync(mediaOptions);
         photoByteArray = MediaFileToByteArray(photoMediaFile);
      }
      else
      {
         await DisplayAlert("Error", "No camera found", "OK");
         Console.WriteLine("ERROR: No camera found");
      }
      return photoByteArray;
    }
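  13. Create a function named MediaFileToByteArray to convert a media file to a byte array.
    /// <summary>
    /// Convert the media file to a byte array.
    /// </summary>
    byte[] MediaFileToByteArray(MediaFile photoMediaFile)
    {
       // The plugin returns null when the user cancels, so there is nothing to convert.
       if (photoMediaFile == null)
       {
          return null;
       }

       using (var photoStream = photoMediaFile.GetStream())
       using (var memStream = new MemoryStream())
       {
          photoStream.CopyTo(memStream);
          return memStream.ToArray();
       }
    }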
  14. Now, we need to add another new ContentPage XAML file to the .Core project.
  15. Select Content Page, name it ResultTellerPage.xaml and click Add.
  16. Add the following UI code to ResultTellerPage.xaml.
    <?xml version="1.0" encoding="utf-8" ?>
    <ContentPage xmlns="http://xamarin.com/schemas/2014/forms"
                 xmlns:x="http://schemas.microsoft.com/winfx/2009/xaml"
                 x:Class="HmhengPicTeller.Core.ResultTellerPage"
                 NavigationPage.BackButtonTitle="Words"
                 Title="Photo Teller">
        <ContentPage.Content>
            <StackLayout>
                <ActivityIndicator Color="DeepSkyBlue"
                               IsRunning="True"
                               HorizontalOptions="CenterAndExpand"
                               VerticalOptions="CenterAndExpand"
                               IsVisible="True"
                               x:Name="LoadingIndicator"
                               Margin="0,20,0,0" />
                <Image x:Name="imgView" Source="" />
                <Label x:Name="lblDescription" Text="" IsVisible="false" FontSize="40" HorizontalTextAlignment="Center"/>
                
            </StackLayout>
        </ContentPage.Content>
    </ContentPage>
  17. In the code-behind file ResultTellerPage.xaml.cs, add the references and the following fields to the class.
    using HmhengPicTeller.Core.Models;
    using Newtonsoft.Json;
    using System;
    using System.Net.Http;
    using System.Net.Http.Headers;
    using System.Threading.Tasks;
    using Xamarin.Forms;
    using Xamarin.Forms.Xaml;
    
    …
    public partial class ResultTellerPage : ContentPage
    {
            HttpClient visionApiClient;
            byte[] photo;
            DescriptionResult values;
    …
    }
  18. Replace the constructor ResultTellerPage() with the following code.
    public ResultTellerPage(byte[] photo)
    {
      InitializeComponent();
      this.photo = photo;
      visionApiClient = new HttpClient();
      visionApiClient.DefaultRequestHeaders.Add(AppConstants.OcpApimSubscriptionKey, AppConstants.ComputerVisionApiKey);
    }
  19. Add the following override function OnAppearing().
    protected override async void OnAppearing()
    {
      base.OnAppearing();
      if (values == null)
      {
        await LoadData();
      }
    }
    
    
  20. Then, create a function named LoadData().
    async Task LoadData()
    {
      // Try loading the results, show error message if necessary.
      bool error = false;
      try
      {
         values = await FetchDescription();
      }
      catch
      {
         error = true;
      }
      // Hide the spinner, show the description label
      LoadingIndicator.IsVisible = false;
      LoadingIndicator.IsRunning = false;
      lblDescription.IsVisible = true;
      if (error)
      {
         await ErrorAndPop("Error", "Error fetching description", "OK");
      }
      // Guard against a response without captions before reading the first one.
      else if (values?.description?.captions != null && values.description.captions.Count > 0)
      {
         lblDescription.Text = values.description.captions[0].text;
      }
      else
      {
         await ErrorAndPop("Error", "No description found", "OK");
      }
    }
  21. After that, create a function named FetchDescription().
    async Task<DescriptionResult> FetchDescription()
    {
        DescriptionResult descriptionResult = new DescriptionResult();
        if (photo != null)
        {
           HttpResponseMessage response = null;
           using (var content = new ByteArrayContent(photo))
           {
               // The media type of the body sent to the API.
               // "application/octet-stream" defines an image represented
               // as a byte array
               content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
               response = await visionApiClient.PostAsync(AppConstants.ComputerVisionApiDescriberUrl, content);
           }

           // Throw on a non-success status (for example an invalid key or region)
           // so that LoadData() reports the failure.
           response.EnsureSuccessStatusCode();

           string responseString = await response.Content.ReadAsStringAsync();
           DescriptionResult result = JsonConvert.DeserializeObject<DescriptionResult>(responseString);

           if (result != null)
           {
               descriptionResult = result;
           }
        }
        return descriptionResult;
    }
  22. Lastly, in ResultTellerPage.xaml.cs, create a function named ErrorAndPop() with the following code.
    /// <summary>
    /// Shows an error message, navigates back after it is dismissed.
    /// </summary>
    
    protected async Task ErrorAndPop(string title, string text, string button)
    {
        await DisplayAlert(title, text, button);
        Console.WriteLine($"ERROR: {text}");
        await Task.Delay(TimeSpan.FromSeconds(0.1d));
        await Navigation.PopAsync(true);
    }
    
    
  23. In addition, let’s create a folder named Models in the .Core project.
  24. Right-click the newly created folder and add a new class named DescriptionResult.cs.
  25. In DescriptionResult.cs, add the following code. The classes mirror the JSON structure that Computer Vision returns. List<T> needs using System.Collections.Generic; at the top of the file.
    public class Caption
    {
       public string text { get; set; }
       public double confidence { get; set; }
    }
    
    public class Description
    {
       public List<string> tags { get; set; }
       public List<Caption> captions { get; set; }
    }
    public class Metadata
    {
       public int width { get; set; }
       public int height { get; set; }
       public string format { get; set; }
    }
    
    public class DescriptionResult
    {
       public Description description { get; set; }
       public string requestId { get; set; }
       public Metadata metadata { get; set; }
    }
    
    
  26. Now, add a new class to the .Core project and name it AppConstants.cs.
  27. In AppConstants.cs, add the following code and save it.
    public static class AppConstants
    {
      public const string OcpApimSubscriptionKey = "Ocp-Apim-Subscription-Key";

      /// <summary>
      /// Url of the Computer Vision API analyze method.
      /// [visualFeatures=Description] Return a description (tags and captions) of the image.
      /// [language=en] Return the description in English.
      /// </summary>
      public static string ComputerVisionApiDescriberUrl = "";

      public static void SetLocation(string location)
      {
         ComputerVisionApiDescriberUrl = $"https://{location}.api.cognitive.microsoft.com/vision/v1.0/analyze?visualFeatures=Description&language=en";
      }

      /// <summary>
      /// User's API key for the Computer Vision API. Not a constant because it is
      /// assigned at startup in App.xaml.cs.
      /// </summary>
      public static string ComputerVisionApiKey = "";
    }
  28. Lastly, we need to edit App.xaml.cs.
  29. In App.xaml.cs, replace the original constructor with the following, filling in the key that you obtained in step 1.
    public App()
    {
      InitializeComponent();
      AppConstants.ComputerVisionApiKey = "<YOUR_API_KEY_HERE>";

      // Applicable Computer Vision locations (at time of writing) are: westus, eastus2, westcentralus, westeurope, southeastasia
      AppConstants.SetLocation("<YOUR_SELECTED_REGION_HERE>");
      MainPage = new NavigationPage(new PicSelectPage());
    }
  30. Finally, let’s compile and run to try out your Picture Teller app.
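
If you want to check the model classes without running the whole app, the sketch below deserializes a sample response with Newtonsoft.Json and picks a caption from it. The JSON values (caption text, confidence, request id) are made up for illustration and only follow the structure of the DescriptionResult classes from step 25. The helper BestCaption and the class CaptionSketch are hypothetical names, not part of the app, and the Metadata class is left out to keep it short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Newtonsoft.Json;

// Trimmed copies of the model classes from step 25.
public class Caption
{
    public string text { get; set; }
    public double confidence { get; set; }
}

public class Description
{
    public List<string> tags { get; set; }
    public List<Caption> captions { get; set; }
}

public class DescriptionResult
{
    public Description description { get; set; }
    public string requestId { get; set; }
}

public static class CaptionSketch
{
    // Returns the caption with the highest confidence, or null when the
    // response carries no captions (for example an error payload).
    public static string BestCaption(string json)
    {
        var result = JsonConvert.DeserializeObject<DescriptionResult>(json);
        var captions = result?.description?.captions;
        if (captions == null || captions.Count == 0)
        {
            return null;
        }
        return captions.OrderByDescending(c => c.confidence).First().text;
    }

    public static void Main()
    {
        // Illustrative sample only; a real response comes from the service.
        const string sampleJson = @"{
          ""description"": {
            ""tags"": [""dog"", ""indoor""],
            ""captions"": [ { ""text"": ""a dog sitting on a couch"", ""confidence"": 0.92 } ]
          },
          ""requestId"": ""sample-request-id""
        }";

        Console.WriteLine(BestCaption(sampleJson) ?? "No description found");
        Console.WriteLine(BestCaption("{}") ?? "No description found");
    }
}
```

The app simply takes the first caption; choosing by confidence is one possible variation, and the null check covers responses that come back without a description.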

Grab your sample code here.
Don’t forget to follow me @
Twitter: @hmheng
More slides @ SlideShare: https://www.slideshare.net/HiangMengHengMarvin
Blog: http://hmheng.pinsland.com
