Voice control is increasingly being built into smart devices, and adoption continues to grow. It provides a more natural way of interacting with connected apps and devices, from reading out news feeds and traffic information to acting as a personal assistant in the home. These intelligent devices respond to spoken commands and act immediately.
This post was originally published on the IBM Cloud Blog.
In this tutorial, we will show you how to develop a home automation app that controls devices in the home using voice commands, with the help of the IBM Watson Speech to Text service. We will use a model smart home based on an SVG image of a home’s floor plan and a few simulated light bulbs that can be switched on and off with voice commands.
The Watson Speech to Text service provides machine intelligence and knowledge of grammar and language structure, making it easy to enhance your apps with voice recognition. The service can convert streaming audio to text in real time or convert speech to text in a single request. It can also recognize different speakers and label the transcript accordingly. It has a few extra features such as profanity filtering, formatting and word confidence.
You can speak into a microphone and get the converted text back, or convert a recorded audio file to text. It is also possible to customize the service to improve accuracy for a specific language or type of content using your own set of keywords. Training it with your own language model lets it accurately transcribe unique accents, specific words, or uncommon dialects.
You can try a demo of this service.
Who wouldn’t want the luxury of controlling their home appliances from their couch? The idea of commanding all the devices around you isn’t new. Typical home automation systems have intelligent connected devices which can be controlled from a mobile app. There is a multitude of home automation products on the market, each with its own set of features.
With a mobile app, however, you lose all the intuitiveness and spontaneity. You need to unlock the phone, find and open the app, then navigate to the appropriate screen to control a device.
A voice-assisted system for controlling appliances is much more intuitive. It is like a personal assistant that listens to your spoken commands and controls the devices in your house.
The system uses two major components: the Watson Speech to Text API and the PubNub Data Stream Network.
Watson Speech to Text service is accessed in this project via HTTP REST APIs to convert speech commands to text. A locally running lightweight server generates authentication tokens using the service credentials for accessing the service. A client Web page served by this server listens to the microphone on your PC and sends the speech to the Watson Speech to Text service. The service returns converted text which is then parsed to extract control commands to send to the home appliances or devices. All of this is orchestrated via the PubNub Data Stream Network.
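To make the "service returns converted text" step concrete, here is a minimal sketch of pulling the final transcript out of a recognition response. It assumes the JSON layout the service documents for recognition results (a `results` list whose items carry a `final` flag and a list of `alternatives`). The helper name `final_transcripts` and the sample payload are illustrative and not taken from the original project.

```python
from typing import Any, Dict, List, Tuple


def final_transcripts(response: Dict[str, Any]) -> List[Tuple[str, float]]:
    """Return (transcript, confidence) for each final result in a response."""
    found: List[Tuple[str, float]] = []
    for result in response.get("results", []):
        if not result.get("final", False):
            continue  # skip interim hypotheses that may still change
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue
        best = alternatives[0]  # use the first alternative returned for this result
        transcript = best.get("transcript", "").strip()
        confidence = float(best.get("confidence", 0.0))
        found.append((transcript, confidence))
    return found


# Hand-written sample payload for illustration, not captured from the service.
sample_response = {
    "result_index": 0,
    "results": [
        {
            "final": True,
            "alternatives": [
                {"transcript": "turn on the kitchen light ", "confidence": 0.93}
            ],
        }
    ],
}

print(final_transcripts(sample_response))
```

Filtering on `final` matters with streaming audio, where interim results may be revised before the service settles on a transcript.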
Here are the commands supported by this voice activated home automation app:
The app accepts “turn on” and “turn off” commands as well as the equivalent “switch on” and “switch off” commands.
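As a rough illustration of how recognized text might be turned into a control command, the sketch below matches the “turn on/off” and “switch on/off” phrasings with a regular expression. The device names in `DEVICES`, the function `parse_command`, and the shape of the returned dictionary are all assumptions made for this example; the original app may parse commands differently.

```python
import re
from typing import Dict, Optional

# Hypothetical device names, chosen only for this example.
DEVICES = ("kitchen", "bedroom", "living room")

# Matches "turn on", "turn off", "switch on" and "switch off".
_COMMAND = re.compile(r"\b(turn|switch)\s+(on|off)\b")


def parse_command(transcript: str) -> Optional[Dict[str, str]]:
    """Return {"device": ..., "state": "on"|"off"}, or None if nothing matches."""
    text = transcript.lower()
    match = _COMMAND.search(text)
    if match is None:
        return None  # no supported verb phrase was spoken
    device = next((name for name in DEVICES if name in text), None)
    if device is None:
        return None  # verb recognized, but no known device was named
    return {"device": device, "state": match.group(2)}


print(parse_command("Turn on the kitchen light"))
print(parse_command("switch off the living room light"))
```

This simple pattern only handles the verb and the on/off word appearing together; a phrasing such as “turn the kitchen light on” would need a looser pattern.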
The software components and cloud services used to build this app are listed below:
This project utilizes the following major SDKs and libraries:
To build and run this app, you will need to create accounts on IBM Cloud and PubNub. Both services offer a free tier.
Once you have created the required services and built the app, you can try out the virtual home automation system controlled by voice commands. Here is a quick video demo.
The video shows a local Web page that listens to the speech commands and another Web page that simulates the smart home and its devices. The page listening to speech commands shows the text returned by the Watson Speech to Text service before sending the control commands to the smart home. It also shows device status feedback received from the smart home. This Web page runs on a local server so that it can authenticate to the Watson Speech to Text service.
Here is a brief description of how this voice controlled home automation system works.
The smart home is simulated as a Web page that uses an SVG image with bulb icons. This Web page listens to PubNub channels to receive the control commands and simulate switching of the devices as per the commands.
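The behavior of the simulated home can be sketched without any network code. In the example below, an in-memory object stands in for the Web page: it receives a command message, updates the bulb state, and returns a status message of the kind that would be sent back as feedback. The class `SimulatedHome`, the message fields `device`, `state` and `status`, and the device names are hypothetical; in the real app the messages travel over PubNub channels and the state is shown on the SVG floor plan.

```python
from typing import Dict, Iterable


class SimulatedHome:
    """In-memory stand-in for the smart home page (no PubNub, no SVG)."""

    def __init__(self, devices: Iterable[str]) -> None:
        # Every simulated bulb starts switched off.
        self.state: Dict[str, str] = {name: "off" for name in devices}

    def on_message(self, message: Dict[str, str]) -> Dict[str, str]:
        """Apply one control command and return a status feedback message."""
        device = message.get("device")
        state = message.get("state")
        if device not in self.state or state not in ("on", "off"):
            # Unknown device or malformed command: leave all bulbs untouched.
            return {"device": str(device), "status": "error"}
        self.state[device] = state
        return {"device": device, "status": state}


home = SimulatedHome(["kitchen", "bedroom"])
print(home.on_message({"device": "kitchen", "state": "on"}))
print(home.state)
```

Keeping the state change in one handler makes it easy to attach the same logic to a real channel subscription later.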
Note: By default, Watson Speech to Text can recognize certain speech accents but can also be trained to understand your native speech accent. To save time and effort, recorded audio clips have been used for voice commands in this demo.
There are a number of ways you can extend the capabilities of this system.
Talking to a virtual assistant gives the user a better sense of control. So, in addition to listening to commands, you can add the ability for the system to talk back and make it a talking home automation system. To do this, you can combine this project with the IBM Watson Text to Speech service to add audible feedback for the user. Refer to this blog post about building a Text to Speech based app using IBM Watson and PubNub.
PubNub plays a central role in orchestrating the messages across the application components. Apart from data streaming, PubNub provides historical data for the messages published through its network. This is an extremely useful feature and can be used to develop data analytics applications that provide insights into usage patterns and other aspects of the voice-controlled home automation system.
The Watson Speech to Text service from IBM provides a powerful API for adding speech recognition capabilities to your application. One area where it can have the most impact is applications for elderly or physically disabled people, who can use their voice for tasks that may otherwise be difficult for them. A number of other applications can benefit from this service. For example, a call center can transcribe voice calls, or a teacher can have the notes ready right after finishing a class.
So get inspired and start building awesome applications capable of speech recognition. The documentation here will help you get started with the Watson Speech to Text service. Please do share your feedback with us!
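As a small illustration of the analytics idea, the sketch below counts how often each device was switched on, given a list of past command messages. It assumes the messages have already been retrieved from message history and that each one is a dictionary with `device` and `state` fields. Both the message shape and the sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def switch_on_counts(history: Iterable[Dict[str, str]]) -> Counter:
    """Count "on" commands per device in a sequence of past messages."""
    return Counter(
        message["device"]
        for message in history
        if message.get("state") == "on" and "device" in message
    )


# Hand-written sample history for illustration only.
sample_history = [
    {"device": "kitchen", "state": "on"},
    {"device": "kitchen", "state": "off"},
    {"device": "bedroom", "state": "on"},
    {"device": "kitchen", "state": "on"},
]

print(switch_on_counts(sample_history).most_common())
```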
Gopal has over 20 years of experience in embedded systems, digital/analog design, microcontrollers, and programming. Long ago he learned to program the classic 8051 microcontroller in assembly language, and he now works on cutting-edge technologies around Cloud, Multimedia and IoT. Outside the professional world, Gopal loves photography, listening to music and spending time with his family.