SAN FRANCISCO: Microsoft (NASDAQ: MSFT) has launched Cognitive Services, a suite of APIs that developers can use to make their apps ‘smarter’. At last year’s Build Conference, Microsoft announced Project Oxford, a similar set of APIs and services meant to help developers make their applications more intelligent and engaging for users.
This year the company renamed Project Oxford to Microsoft Cognitive Services and expanded its availability to developers around the world. The Cortana Intelligence Suite, running on the Microsoft Bot Framework, has also been integrated with the Cognitive Services.
When integrated into apps and services, the APIs help them see, hear, process, and interpret human needs through natural methods of communication. The Cognitive Services expand the existing perceptual intelligence capabilities of Vision, Face Detection, Text and Speech with new capabilities such as Emotion and Language Understanding, coupled with Bing Search.
Satya Nadella, CEO of Microsoft, in his keynote speech said, “We want to take the power of human language and apply it more pervasively to all of the computing interface and interactions.”
To “give your solutions a human side,” Microsoft has introduced a brand-new object recognition engine that builds on Project Oxford. Another new API is a custom voice-recognition tool that can transcribe audio and help recognize speech in low-quality recordings. A further example of how the Cognitive Services can be used was shown at Build Conference 2016. To exhibit the capabilities of the APIs, Microsoft also created Captionbot.ai.
As described on the official site, CaptionBot uses computer vision and natural language processing to describe the contents of images. It combines the Computer Vision API, the Emotion API and the Bing Image Search API.
The site has a clean interface and works reasonably well. However, it does not describe every photo correctly, and it struggles in particular with age and emotion. This suggests that Microsoft still has some work to do on the underlying models.
Optical Character Recognition (OCR) is the technical term; in layman’s terms, it means reading the text present in images. CaptionBot can detect text in an image and extract the recognized words into a machine-readable character stream.
Developers can use Microsoft’s services and incorporate the APIs into their apps for free. For developers with heavier usage, there will be paid plans based on the number of transactions, typically priced per 1,000 transactions.
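To make the per-1,000-transaction pricing concrete, here is a minimal Python sketch of how a developer might estimate a monthly bill. The function name, the free quota, the price, and the choice to round up to whole blocks of 1,000 are all illustrative assumptions, not Microsoft's published figures or billing rules.

```python
import math


def estimate_monthly_cost(transactions, free_quota, price_per_1000):
    """Estimate a bill for a plan priced per 1,000 transactions.

    All inputs are hypothetical placeholders; real quotas, prices and
    rounding rules must come from the provider's pricing page.
    """
    # Only transactions beyond the free allowance are billable.
    billable = max(0, transactions - free_quota)
    # Assumption: usage is billed in whole blocks of 1,000, rounded up.
    blocks = math.ceil(billable / 1000)
    return blocks * price_per_1000


if __name__ == "__main__":
    # Hypothetical numbers: 5,000 free calls, 1.50 per extra 1,000 calls.
    print(estimate_monthly_cost(4500, 5000, 1.50))  # within the free quota
    print(estimate_monthly_cost(7500, 5000, 1.50))  # 2,500 billable calls
```

With these made-up numbers, an app that stays under the free allowance pays nothing, while 2,500 extra calls are charged as three blocks of 1,000.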
Microsoft’s rebranding of Project Oxford as Cognitive Services positions it to compete with IBM’s Watson, which has been touted as a cognitive computing product.
So let’s see how developers put these APIs into action, and whether they ultimately lead to a future that is about “man with machines” instead of “man versus machine”, as shown in the popular ‘Terminator’ series.