Receipt Recognition Proof of Concept using Microsoft Cognitive Services

In the early stages of a technology, best practices are still being discovered through experimentation. This gives developers maximum flexibility in designing systems, but it also forces them to do everything themselves, often relearning lessons that others have already learned. AI, and machine learning in particular, has been in this state for quite some time, requiring developers to assemble the necessary computational resources and algorithms on their own. A technology shows its maturity when it can be provided “as a service” (think “software-as-a-service,” “platform-as-a-service,” or “infrastructure-as-a-service”). Microsoft Cognitive Services (MSCS) and other analytics providers can be seen as demonstrating the maturity of machine learning by providing “analytics-as-a-service”; in Microsoft’s case, by making its Cortana analytics platform available. That’s not to say that further improvements aren’t inevitable, but Microsoft believes the basic operations are now understood well enough to be performed behind the curtain of an API (application programming interface). Basic services might just process search queries, as can already be done from any browser. But Microsoft Cognitive Services can go much further.

The Breadth of Use Cases for Microsoft Cognitive Services

To take a now commonplace example, it is possible to create a standardized recommendation system using Microsoft Cognitive Services. An application would upload its catalog data (describing products) and usage data (specifying user interactions with products). The application can then request product recommendations for any user. Additional APIs allow users to constrain the recommendation algorithm to include or exclude particular recommendations. This API is used, for example, by allrecipes.com and Orkestra.
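To make the catalog-plus-usage idea concrete, here is a minimal sketch of a co-occurrence recommender in Python. It is not Microsoft’s API or algorithm: the catalog, the user and product IDs, and the function names are all invented for illustration, and a hosted service would do considerably more behind the scenes.

```python
from collections import Counter, defaultdict

# Hypothetical catalog and usage data; all IDs and names are invented.
catalog = {"p1": "Pasta", "p2": "Tomato sauce", "p3": "Parmesan", "p4": "Olive oil"}
usage = [
    ("alice", "p1"), ("alice", "p2"), ("alice", "p3"),
    ("bob", "p1"), ("bob", "p2"),
    ("carol", "p2"), ("carol", "p4"),
    ("dave", "p1"),
]

def build_cooccurrence(usage):
    """Count how often each pair of items was used by the same user."""
    items_by_user = defaultdict(set)
    for user, item in usage:
        items_by_user[user].add(item)
    counts = defaultdict(Counter)
    for items in items_by_user.values():
        for a in items:
            for b in items:
                if a != b:
                    counts[a][b] += 1
    return items_by_user, counts

def recommend(user, items_by_user, counts, exclude=(), top_n=3):
    """Recommend unseen items that co-occur with the user's items.

    `exclude` plays the role of the constraint described above:
    items listed there are never recommended.
    """
    seen = items_by_user.get(user, set())
    scores = Counter()
    for item in seen:
        for other, n in counts[item].items():
            if other not in seen and other not in exclude:
                scores[other] += n
    # Highest score first; ties broken by item ID for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]

items_by_user, counts = build_cooccurrence(usage)
for item_id in recommend("dave", items_by_user, counts):
    print(item_id, catalog[item_id])
```

Passing `exclude={"p2"}` to `recommend` removes that product from the results, which mirrors the include/exclude constraint in spirit only.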

Is this really learning? Perhaps not yet. But Microsoft Cognitive Services also has an API that would allow the system to get feedback about its recommendations (perhaps by whether or not users click on them). Now, the recommendations are not based only on some preexisting theory or examples of user behavior, but can be shaped by that behavior as it manifests itself.
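One simple way to picture such a feedback loop is to re-rank candidate recommendations by how often users actually click them. The sketch below is an illustration under our own assumptions, not a description of how Microsoft’s service works: the `FeedbackRanker` class, its smoothing priors, and the product IDs are all hypothetical.

```python
from collections import defaultdict

class FeedbackRanker:
    """Re-rank candidates by smoothed click-through rate (hypothetical helper)."""

    def __init__(self, prior_clicks=1.0, prior_shows=2.0):
        # The priors keep items with little data near a neutral score of 0.5.
        self.prior_clicks = prior_clicks
        self.prior_shows = prior_shows
        self.shows = defaultdict(int)
        self.clicks = defaultdict(int)

    def record(self, item, clicked):
        """Record that `item` was shown, and whether the user clicked it."""
        self.shows[item] += 1
        if clicked:
            self.clicks[item] += 1

    def score(self, item):
        """Smoothed click-through rate for `item`."""
        return (self.clicks[item] + self.prior_clicks) / (
            self.shows[item] + self.prior_shows
        )

    def rank(self, candidates):
        """Order candidates by score, highest first; ties broken by item ID."""
        return sorted(candidates, key=lambda item: (-self.score(item), item))

ranker = FeedbackRanker()
before = ranker.rank(["p2", "p3"])

# Simulated feedback: p2 is shown three times and ignored, p3 is clicked once.
for _ in range(3):
    ranker.record("p2", clicked=False)
ranker.record("p3", clicked=True)

after = ranker.rank(["p2", "p3"])
print(before, after)
```

With no feedback the two candidates tie and keep their original order; once the clicks are recorded, the ignored item drops below the clicked one.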

Pros and cons of analytics-as-a-service, such as Microsoft Cognitive Services

Pros:

  • Microsoft takes care of the details. This makes it easier to create an application, and thus to try out new ideas.
  • Microsoft will (hopefully) update the algorithms on the basis of the latest research, so you don’t have to!
  • As with any public cloud-based solution, MSCS enables flexible use of resources.

Cons:

  • The ease of use gained through APIs comes at the cost of some expressivity. For applications such as trading, which are so central to an organization’s value that outperforming competitors is critical, analytics-as-a-service may not be ideal unless it provides adequate support for fine-tuning the algorithms.

How Softjourn Used Microsoft Cognitive Services for a Receipt Recognition POC

Our Client’s Problem

Our client manages prepaid cards for its customers, making it possible for them to track their expenses. Often, corporate customers wish to track the expenses of their employees. In the old days, traveling employees laid out funds for their expenses and submitted expense reports for review and, ultimately, reimbursement. Our client allows these expenses to be prepaid by the company, removing the need for employees to use their own funds and to sift through old receipts. Softjourn previously created a mobile app for expense tracking that allows administrators to monitor spending policies, capture receipts, transfer funds to a card, and view specific transactions, account balances, or transaction summary information. But how can these expenses be validated when employees might, through error or malice, submit receipts that should not qualify for spending through the card? Most immediately, how can information even be pulled from the receipts?

Softjourn’s MSCS Solution

For this problem, the balance tipped in favor of using analytics-as-a-service, and in particular Microsoft Cognitive Services. Roman Kosiuk, a mid-level .NET developer at Softjourn, implemented the proof of concept and explained to us how it all works.

The corporate customer’s employee takes a picture of their receipt with a smartphone and uploads it. Softjourn’s proof of concept (POC) sends the image to MSCS’s Computer Vision API to perform OCR (optical character recognition). This pulls out editable lines of text from the receipt image, which are returned to the POC along with indications of Microsoft’s level of confidence in the result. Text with a low confidence level must be sent to be read by a human; other text contains errors that Softjourn can correct automatically. Next, templates encoding standard receipt formats are selected and applied to extract the important pieces of information from the receipt, including transaction total, amount of tax, card charged, and address of the establishment. This allows for initial validation of transaction totals. Receipts from designated vendors may already have an approval code; for others, there may be more validation work to do. To this end, MSCS also returns a “raciness” indicator (checking for a situation in which the expense ought to be rejected for lack of relevance/appropriateness).
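The steps after OCR can be sketched in a few lines of Python. This is only an illustration of the flow described above, not Softjourn’s implementation (which was built in .NET) and not the Computer Vision API’s response format: the `OcrLine` structure, the 0.80 confidence threshold, the regex-based template, and the sample receipt are all assumptions made for the example.

```python
import re
from dataclasses import dataclass
from decimal import Decimal

@dataclass
class OcrLine:
    """One recognized line of text (hypothetical shape, not the real API response)."""
    text: str
    confidence: float  # assumed to range from 0.0 to 1.0

CONFIDENCE_THRESHOLD = 0.80  # assumed cut-off; a real system would tune this

# Hypothetical template for one receipt format: one regex per field.
TEMPLATE = {
    "subtotal": re.compile(r"^SUBTOTAL\s+\$?(\d+\.\d{2})$", re.IGNORECASE),
    "tax": re.compile(r"^TAX\s+\$?(\d+\.\d{2})$", re.IGNORECASE),
    "total": re.compile(r"^TOTAL\s+\$?(\d+\.\d{2})$", re.IGNORECASE),
}

def split_by_confidence(lines, threshold=CONFIDENCE_THRESHOLD):
    """Separate lines that can be processed from lines a human must read."""
    accepted = [line for line in lines if line.confidence >= threshold]
    needs_review = [line for line in lines if line.confidence < threshold]
    return accepted, needs_review

def extract_fields(lines, template=TEMPLATE):
    """Apply the template and return the first match found for each field."""
    fields = {}
    for line in lines:
        for name, pattern in template.items():
            match = pattern.match(line.text.strip())
            if match and name not in fields:
                fields[name] = Decimal(match.group(1))
    return fields

def totals_consistent(fields):
    """Initial validation: subtotal plus tax must equal the total."""
    if not all(key in fields for key in ("subtotal", "tax", "total")):
        return False
    return fields["subtotal"] + fields["tax"] == fields["total"]

# Invented sample data standing in for an OCR result.
ocr_lines = [
    OcrLine("CORNER CAFE", 0.97),
    OcrLine("SUBTOTAL $18.50", 0.95),
    OcrLine("TAX $1.48", 0.93),
    OcrLine("TOTAL $19.98", 0.96),
    OcrLine("C4RD **** 12E4", 0.41),
]

accepted, needs_review = split_by_confidence(ocr_lines)
fields = extract_fields(accepted)
print("fields:", fields)
print("totals consistent:", totals_consistent(fields))
print("for human review:", [line.text for line in needs_review])
```

Amounts are parsed as `Decimal` rather than `float` so that the subtotal-plus-tax check is exact. In practice each vendor’s receipt format would need its own template, and the automatic error-correction step mentioned above is left out of this sketch.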

Other analytics-as-a-service providers

Microsoft is not the only game in town providing analytics services for financial and other applications. Another is IBM. Its Bluemix offering leverages Watson (yes, the Jeopardy player) to provide analytics in domains including investment management. Its Wealth Portfolio Management Bot, for example, might predict how certain events would affect the value of each investment in a portfolio. Bottlenose draws on a wide variety of data streams to provide analytic insights in areas including finance, competitive intelligence, and risk estimation. Domo provides a platform through which organizations can access analytics not via APIs (at least not yet), but via numerous third-party apps.

Because this client is a Microsoft shop, using .NET, Azure, and VSTS (Visual Studio Team Services), it made sense to go with the Microsoft product for this project. Roman also cited Microsoft’s excellent documentation for Cognitive Services as a factor in selecting that tool.

Source: SOFTJOURN