
Question in: Cloud Platforms

Integrating voice control into IoT and connected devices - what are the main challenges?


Chatbots, artificial intelligence and natural language interfaces feature in almost every tech news outlet these days.
Many of these discussions are driven by web and mobile integration capabilities, but I think there are even more applications where integrating natural language interfaces directly into connected devices makes sense! Think of healthcare, Industrial IoT, consumer products and more.
Providing a chatbot on a digital channel (web or mobile) is a comparatively easy task compared with integrating it into a real hardware product. Besides the hardware integration, the complete infrastructure, including all data processing, must be reliable and meet enterprise-grade requirements for uninterrupted operation.
Where do you see the main challenges when integrating voice control into IoT and connected devices?
Please share your thoughts!

Artificial Intelligence
Enterprise Software
Sascha Poggemann
10 months ago

2 answers


Hi Sascha,
Alexa, Google Home, etc. are all IoT devices that do this job quite nicely, but voice control is expanding into other areas of the B2B and B2C sectors, ranging from cars, conference rooms and digital assistants to connected homes.
What we have to dissect here, from an architecture perspective, is which components need to work in conjunction to deliver the end-user experience you are looking for.
With that being said, it really depends on the requirements and context (location, device hardware, intent, etc.) in which you want to deliver this service.
Generally speaking you need to look at:

  1. Endpoint physical hardware capabilities - in this case speaker, mic, sensors and power source
  2. Connectivity - is this a consumer device or an interface for a machine in a B2B environment? Bandwidth and throughput are key factors here for a seamless user experience. For ease of setup, a mobile app will be required to help the consumer connect the device.
  3. Edge Analytics - Is there any information that needs to be analyzed on the edge device itself?
  4. Data storage - Does any information need to be stored on the device for offline processing? Besides speech, is there any additional sensor data that we want to capture in conjunction with the spoken question or order?
  5. Voice recognition specific natural language processing - this is likely going to happen in the cloud instance that is connected to the device. Alexa, for example, does this in the cloud, and custom functionality is added through skills built with the Alexa Skills Kit; skill backends are commonly hosted as AWS Lambda functions (often written in Node.js) that can query data sources such as DynamoDB and log to CloudWatch.
  6. Rules & recommendation engine - once speech recognition has translated speech into actionable queries against a knowledge database, this engine needs to determine the best possible answers and solutions to the question.
  7. Artificial intelligence and machine learning - to train and improve the answers and solutions provided, using machine learning algorithms.
  8. Presentation layer - should the user interface only be a spoken answer, or does the endpoint have a display to show answers? For example, some smart fridges have built-in displays that show content in the context of your questions.
  9. Reporting and analytics - is there any data the IoT device vendor wants to capture, e.g. consumption behaviour or other sensor data for better customer service, predictive maintenance, etc.?
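Items 5 and 6 can be illustrated with a small backend sketch. The example below is a hypothetical skill backend in Python written in the AWS Lambda handler style (`lambda_handler(event, context)`; Lambda supports Python as well as Node.js), assuming an Alexa-style JSON request/response envelope. The intent names (`GetTemperatureIntent`, `GetHumidityIntent`), the `Room` slot and the in-memory `DEVICE_STATE` dictionary are illustrative assumptions; a real skill would look values up in a device cloud or a store such as DynamoDB.

```python
"""Hypothetical voice-skill backend in the AWS Lambda handler style."""

# Hypothetical in-memory device state; a real skill would query a device
# cloud or a database such as DynamoDB instead.
DEVICE_STATE = {
    "living room": {"temperature_c": 21.5, "humidity_pct": 45},
    "bedroom": {"temperature_c": 19.0, "humidity_pct": 50},
}


def build_response(text, end_session=True):
    """Wrap plain text in an Alexa-style JSON response envelope."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def room_from_slots(slots):
    """Read the hypothetical 'Room' slot, defaulting to the living room."""
    value = slots.get("Room", {}).get("value") or "living room"
    return value.lower()


def handle_temperature(slots):
    room = room_from_slots(slots)
    state = DEVICE_STATE.get(room)
    if state is None:
        return f"Sorry, I have no sensor data for the {room}."
    return f"The {room} is {state['temperature_c']} degrees Celsius."


def handle_humidity(slots):
    room = room_from_slots(slots)
    state = DEVICE_STATE.get(room)
    if state is None:
        return f"Sorry, I have no sensor data for the {room}."
    return f"Humidity in the {room} is {state['humidity_pct']} percent."


# Rules table: maps each recognized intent to the handler that answers it.
INTENT_RULES = {
    "GetTemperatureIntent": handle_temperature,
    "GetHumidityIntent": handle_humidity,
}


def lambda_handler(event, context):
    """Entry point: route the parsed intent to a handler and build the reply."""
    request = event.get("request", {})
    if request.get("type") == "LaunchRequest":
        return build_response("Ask me about temperature or humidity.", end_session=False)
    if request.get("type") != "IntentRequest":
        return build_response("Sorry, I can't handle that request.")
    intent = request.get("intent", {})
    handler = INTENT_RULES.get(intent.get("name"))
    if handler is None:
        return build_response("Sorry, I didn't understand that.")
    return build_response(handler(intent.get("slots") or {}))
```

For example, an `IntentRequest` for `GetTemperatureIntent` with the `Room` slot set to `"Bedroom"` yields the spoken text "The bedroom is 19.0 degrees Celsius.". Speech-to-text and intent recognition happen upstream in the voice platform, so the backend only sees an already-parsed intent, which is why the rules table can stay a plain dictionary in this sketch.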


Carsten Krause
10 months ago
Dear Carsten, thanks for your reply. This is a great summary of the components required to build such a system. Wouldn't it be great to be provided with all of them by a single point of contact, say a one-stop-shop supplier for speech-enabled interfaces and devices?! - Sascha 10 months ago

This is an interesting thought.
Let's say you have climate sensors in your living room: a temperature sensor and a humidity sensor.
The sensor is a small device and needs to respond rapidly, so I believe that, at least today, we can't build the voice recognition intelligence into the sensors. Secondly, you'd have several sensors, so the cost of putting the "intelligence" onto each sensor (the "thing" in IoT) would be high.
But you want voice control. So let's say you speak out loud, either to the TV remote or just into the room, and ask "How's the climate?" or "How warm is the room?". The AI sits in the TV (or the remote), a single, heavier/larger device that does the processing: it sends out commands to the sensors, receives their inputs and gives you an appropriate response.
I guess that's the best way to implement it today.
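The hub pattern described above, where lightweight sensors only report readings and a single heavier device does the voice processing, can be sketched as follows. This is a minimal Python illustration under stated assumptions: `ClimateSensor` and `VoiceHub` are hypothetical names, the sensor `read()` call stands in for a real network request, and simple keyword matching stands in for real speech recognition and natural language understanding.

```python
from dataclasses import dataclass


@dataclass
class ClimateSensor:
    """A small, low-cost sensor node: it only reports readings, no voice processing."""
    sensor_id: str
    temperature_c: float
    humidity_pct: float

    def read(self) -> dict:
        # Stand-in for a network request from the hub to the sensor (assumption).
        return {"temperature_c": self.temperature_c, "humidity_pct": self.humidity_pct}


class VoiceHub:
    """The single heavier device (e.g. the TV or its remote) that handles voice."""

    def __init__(self, sensors: list):
        self.sensors = sensors

    def _average(self, key: str) -> float:
        # Query every sensor and average the requested reading.
        readings = [sensor.read()[key] for sensor in self.sensors]
        return sum(readings) / len(readings)

    def handle_utterance(self, text: str) -> str:
        if not self.sensors:
            return "No sensors are connected."
        # Keyword matching is a placeholder for real speech recognition/NLU.
        text = text.lower()
        if "warm" in text or "temperature" in text:
            return f"The room is {self._average('temperature_c'):.1f} degrees Celsius."
        if "climate" in text:
            temp = self._average("temperature_c")
            humidity = self._average("humidity_pct")
            return f"It is {temp:.1f} degrees Celsius with {humidity:.0f} percent humidity."
        return "Sorry, I didn't catch that."
```

For example, with two sensors reporting 21.0 and 22.0 °C, `hub.handle_utterance("How warm is the room?")` returns "The room is 21.5 degrees Celsius.". Keeping the sensors this simple keeps the per-sensor cost low, which is the trade-off described in the answer above.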

Rajesh Siraskar
10 months ago
Dear Rajesh, thanks for your reply. As more and more devices get connected to the cloud, interacting with them via dedicated smartphone apps becomes more and more complicated. We see the potential for Alexa and other interactive devices in the home to become the main interaction interface. Our company now helps companies bridge the gap between their connected IoT devices and Alexa. - Sascha 10 months ago
