Summer is well underway, and after a long cold winter here in Philadelphia, I’ve been enjoying long hikes and trips to local swimming holes with my pup. Sometimes I worry, as the days have gotten hotter, that my hound will overheat, dehydrate, or simply overdo it. It’s often hard to tell when something is wrong with our four-legged friends until they are in a crisis.
“The tendency of a companion animal is to disguise a problem. It is in their nature because to show it is to become vulnerable. They hide everything until the last moment and then it’s too late,” said Avi Menkes, chief executive officer (CEO) of PetPace.
PetPace is one player in a hot new trend: the creation of smart collars for our canine and feline friends. Many smart pet collars work like human activity trackers. Slap one on a beagle (a breed known as the “canine escape artist”) and you can track where the dog is and even set up a geofence to receive alerts if it wanders outside a designated area. Owners can also track a pet’s activity level, calories burned, and sleep quality. PetPace adds health monitoring on top of all this, making its collar an activity tracker and medical device rolled into one, and the design approach and technology the organization packs into such a small space is interesting.
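PetPace’s own implementation isn’t public, but the core of a circular geofence check, the great-circle distance from the latest GPS fix to a home point, can be sketched in a few lines (the function names, home coordinates, and radius below are purely illustrative):

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two GPS fixes."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def outside_geofence(fix, center, radius_m):
    """True if the latest (lat, lon) fix falls outside a circular geofence."""
    return haversine_m(fix[0], fix[1], center[0], center[1]) > radius_m
```

With a 200-meter fence centered on a Philadelphia back yard, a fix 40-odd meters away stays quiet, while a fix a kilometer away would trigger an alert.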
Menkes has a lot of remote sensing experience, but it was in the business-to-business supply chain world. He explained that he was looking to use his knowledge to create something that served a greater good. He wanted to give someone a voice who previously didn’t have one. That someone turned out to be cats and dogs, and his device allows them to tell their humans how they are feeling.
While the collar is an engineering feat, Menkes points out that it was not born from engineering but from a notion of how to give our pets better lives and their owners peace of mind. Instead of asking what we want to know, the PetPace team asked: “What is a pet telling us?”
“We took electrical engineering, veterinary medicine, physics, and computer science and married them together. Each one is a different discipline, yes, it's a piece of hardware, but you have to understand the pet’s biology. The collar uses some physics algorithms and on the back end it's signal processing analytics and machine learning.”
With the help of cofounder and chief veterinarian Dr. Asaf Dagan, PetPace determined that it would need to monitor body temperature, pulse, respiration, heart rate variability (HRV), activity level, calories burned, and positions. Initially, they chose not to monitor the dog’s global positioning, and they decided instead to concentrate on whether a dog is sitting or lying down (in which position or on what side).
Dogs are creatures of habit in subtle ways. They often lie on the same side of their body. Should a dog suddenly start lying on the opposite side, the PetPace collar, by way of a three-axis accelerometer, notices the change and determines if it’s an indication of a change in behavior or just an anomaly. If the algorithm determines the change is significant, it alerts the owner. That level of monitoring allows the pet’s owner to seek medical advice long before a problem manifests itself in a way they can easily see or feel, such as in the form of a tumor-producing lump.
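PetPace has not published its algorithm, but the idea described here, classify which side the pet is lying on from the accelerometer’s gravity vector and flag a sustained shift against the pet’s own baseline, can be sketched as follows. The axis convention, window size, and threshold are assumptions chosen for illustration:

```python
from collections import deque

def lying_side(ax, ay, az, g=9.81, tilt=0.7):
    """Classify posture from one 3-axis accelerometer reading (m/s^2).

    Assumes the collar's y-axis points out of the dog's left side when
    it is upright, so the axis gravity dominates reveals the posture.
    """
    if ay > tilt * g:
        return "left"
    if ay < -tilt * g:
        return "right"
    return "upright"

class SideMonitor:
    """Alert when the recent lying-side mix deviates from the baseline."""

    def __init__(self, baseline_left_frac, window=50, threshold=0.4):
        self.baseline = baseline_left_frac
        self.recent = deque(maxlen=window)
        self.threshold = threshold

    def update(self, side):
        """Feed one classified posture; return True if a deviation is seen."""
        if side in ("left", "right"):
            self.recent.append(side)
        if len(self.recent) < self.recent.maxlen:
            return False  # not enough evidence to call an anomaly yet
        left_frac = self.recent.count("left") / len(self.recent)
        return abs(left_frac - self.baseline) > self.threshold
```

A dog that historically lies on its left 80 percent of the time but suddenly logs a window of all-right-side rests would trip the alert.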
Once the PetPace team identified what they wanted to monitor, they began identifying the technology that would go into the collar. In some cases, it was a process of elimination. Take, for example, pulse oximetry, a common method for measuring how much of the hemoglobin in blood is carrying oxygen. Pulse oximeters use laser and infrared light to determine oxygen saturation. However, laser sensors do not work on dogs with dark fur. Likewise, an infrared sensor was not viable because it would require owners to shave their dog’s neck to get a proper signal, and it was important that everything work non-invasively.
Pursuing an alternative, PetPace chose acoustic monitoring, which functions similarly to a stethoscope but uses a piezoelectric sensor, with acoustic concentrators on one side of the collar and acoustic balancers on the other to improve the signal-to-noise ratio. This allows the collar to measure a pet’s pulse and respiration while filtering out irrelevant sounds such as panting or head movements.
The collar continuously monitors the pulse to create an accurate picture of a pet’s HRV. It can capture a pulse not only when dogs or cats are resting but also when they are leaping into the air to catch their favorite toy.
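The article doesn’t say which HRV statistics PetPace computes, but two standard summaries of beat-to-beat variability, SDNN and RMSSD, are easy to sketch over a list of inter-beat intervals:

```python
import math
import statistics

def sdnn(ibi_ms):
    """Standard deviation of beat-to-beat (inter-beat) intervals in ms."""
    return statistics.stdev(ibi_ms)

def rmssd(ibi_ms):
    """Root mean square of successive interval differences in ms."""
    diffs = [b - a for a, b in zip(ibi_ms, ibi_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))
```

For intervals of 800, 810, 790, and 805 ms, SDNN is about 8.5 ms and RMSSD about 15.5 ms; a sustained collapse in either would be the kind of HRV change worth flagging.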
Menkes gives the example of a cat owner who received an alert that the pet’s HRV was atypical. The pet appeared fine, but the owner took it to the vet regardless and found out the cat had an extensive hyperthyroid situation. Had the owner waited three months for the apparent symptoms to show up in the cat, such as severe weight loss, the cat would already be losing kidney function.
Temperature was another hurdle to overcome. Measuring core temperature non-invasively is impossible, so PetPace’s patented algorithm instead models the thermoregulation of a dog to determine whether the dog has a fever, is overheating, is suffering from hypothermia, or is simply lying in the sun or in front of an air-conditioner. The collar and its algorithms learn each pet’s trends to understand its characteristics and then determine when there is a deviation.
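As a rough illustration of the learn-a-baseline-then-flag-deviations pattern described here (not PetPace’s patented method, whose details are proprietary), a rolling z-score check might look like:

```python
from collections import deque
import statistics

class TemperatureMonitor:
    """Learn a pet's own temperature baseline; flag sharp deviations."""

    def __init__(self, window=48, z_limit=3.0, min_samples=5):
        self.history = deque(maxlen=window)
        self.z_limit = z_limit
        self.min_samples = min_samples

    def check(self, temp_c):
        """Return True if this reading deviates sharply from the baseline."""
        alert = False
        if len(self.history) >= self.min_samples:
            mean = statistics.mean(self.history)
            sd = statistics.stdev(self.history)
            if sd > 0 and abs(temp_c - mean) / sd > self.z_limit:
                alert = True
        if not alert:
            self.history.append(temp_c)  # only learn from normal readings
        return alert
```

Feeding it a week of readings around 38.5 °C establishes the pet’s norm; a sudden 40.2 °C reading then stands out immediately, while 38.55 °C passes without comment.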
“When you start to put the collars on pets, suddenly you have objective data,” said Menkes.
Not only did the collar have to be non-invasive, but it had to be rugged, considering dogs like mine have a love for all things wet and dirty (and stinky, unfortunately). To address this matter, the PetPace collar has an IP67 rating, protecting it from dust and immersion in water to depths up to 1m (about 3ft.) for 30 minutes.
PetPace engineers are now working on adding Global Positioning System (GPS) capabilities to the collars. As anyone who has ever turned on a smartphone’s GPS to get walking directions around town knows, transmitting GPS data quickly drains the battery. PetPace is trying to solve this problem with algorithms that determine when the GPS should and should not be transmitting data.
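The exact duty-cycling heuristics are PetPace’s to design, but the general idea, sample GPS rarely while the pet is stationary and often while it is moving, especially near the geofence boundary, can be sketched like this (the interval values are illustrative assumptions):

```python
def next_gps_interval_s(moving, fence_margin_m, base=300, active=15):
    """Seconds until the next GPS fix, chosen to conserve battery.

    moving         -- accelerometer says the pet is in motion
    fence_margin_m -- distance remaining to the geofence boundary
    """
    if not moving:
        return base   # stationary pet: sample rarely
    if fence_margin_m < 50:
        return 5      # moving near the boundary: sample fast
    return active     # moving, but well inside the fence
```

A napping dog costs one fix every five minutes; a dog trotting toward the fence line is tracked every few seconds.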
The American Pet Products Association estimates that Americans spend roughly $70 billion a year on their pets, so it’s easy to see why the smart collar trend is taking off here in the United States and around the world. It’s less an indulgence than a tool that engineers have created to give our pets a voice.
In the summer of 2010, Shanghai was host to the 41st World Expo with the theme “Better City, Better Life.” This was an international high point for discussions around cultural exchange, social development, and especially, urban development. According to statistics from the United Nations and the World Bank, the proportion of the world’s population living in cities reached 51 percent in 2010, marking a historic shift of population from rural to urban centers. From that point forward, countries all over the world began to see the connection between improving the quality of life in cities and improving the lives of their citizens.
Ten years on, the global proportion of city dwellers has increased from 51 percent to 55 percent. Based on forecasts by the United Nations, this percentage will reach 68 percent by 2050. City life will therefore become the default state of human civilization. The concentration of populations in cities will, on the one hand, bring many conveniences, but on the other hand, will introduce new challenges in housing, traffic, environmental damage, and resource conservation, to name a few. Many hope that emerging technologies can be used to solve these challenges that are unique to cities, and this has given birth to the concept of the smart city. Within the smart city’s conceptual framework, it is hoped that new technologies such as the internet, modernized industry, and artificial intelligence (AI) will integrate city systems and services, boost the efficiency of resource utilization, and optimize city administration and services. This can help solve the problems faced by cities and improve the quality of life of their residents.
The smart city concept has been developing for more than a decade since IBM first proposed it in 2008. Some preliminary smart applications have already become a part of the everyday lives of city dwellers. Mapping software, such as Google Maps, combines geographical data and actual images of the city and uses algorithms to help users understand their city and plan specific routes, all from the comfort of their home. Uber in the US and DiDi in China build on such services, integrating vehicle and user data with their recommendation algorithms to help users quickly catch a ride. In the field of security, China established its Skynet surveillance camera system in 2017, with more than 200 million cameras in service by 2019. Similar surveillance networks are being quickly deployed elsewhere, such as the Domain Awareness System jointly built by the New York City Police Department and Microsoft. It consists of a vast number of cameras and sensors and back-end data processing systems that can be used to constantly monitor and swiftly respond to criminal activity.
These examples of smart city applications already use some AI algorithms such as recommendation algorithms, recognition algorithms, and prediction algorithms. But the vast majority of applications are concentrated around data collection, networking, and information sharing—such as e-government platforms, device remote control, and sensor arrays. As AI technology develops alongside the smart cities it is supporting, this data will be further leveraged through such AI functions as inference, prediction, and decision-making.
With the rise of AI technology in 2012, many new technologies based on deep learning were introduced to help meet the everyday housing and transportation needs of urban residents, to help maintain the sustainability of environmental resources, and to help city administrators more quickly get a handle on information and communicate with residents. This has meant greater convenience and efficiency for urbanites.
The biggest contributor to an intelligent transportation system has been the advent of autonomous driving systems. When self-driving vehicles become the main mode of urban transportation, they will provide better safety and greater efficiency, using big data and route-planning algorithms to automatically avoid congestion and find optimal driving routes.
With this future in mind, the research and industrialization of self-driving vehicles are now fully underway. Waymo, a subsidiary of Alphabet, issued a self-driving vehicle safety report in October 2020. The report found that Waymo’s self-driving vehicles had already driven 24.1 billion kilometers on virtual roads and 32 million kilometers on actual roads. In 106,000 kilometers of real-road testing over the past two years, only 18 actual collisions and 29 simulated collisions were recorded, and most were the result of other drivers not following traffic rules. This shows that autonomous driving technology is already quite mature and can handle routine road situations adeptly. However, it also shows that the vision of highly intelligent self-driving vehicles has not yet become a reality.
Although the day when self-driving vehicles can truly replace human drivers has not yet arrived, driver-assistance and road-control technologies have already become a part of people's daily lives. Examples include systems that use sensors, cameras, and control technologies to support automatic reversing and parking and to warn of pedestrians, front and rear obstacles, and lane departures. By comprehensively analyzing vehicle speed, distance, and sensor images, an on-board computer can adjust the vehicle's path a few seconds ahead of time, which is a great boon for traffic safety. On the road itself, AI algorithms have already been put to good use controlling traffic lights. The city of Hangzhou, China, tested its urban data brain on some roads in the Xiaoshan District in 2016. With AI algorithms analyzing vehicle data and road surveillance cameras intelligently controlling traffic lights, traffic speed increased by 3 to 5 percent overall and by as much as 11 percent on some road sections.
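Hangzhou's urban data brain is proprietary, but the core idea of demand-responsive signal timing, splitting a fixed signal cycle between approaches in proportion to the queues the cameras measure, can be sketched in a few lines (the function name and timing constants are illustrative assumptions):

```python
def green_split(queue_ns, queue_ew, cycle_s=90, min_green_s=15):
    """Split one signal cycle between two approaches by measured demand.

    queue_ns / queue_ew -- vehicles queued north-south and east-west
    Returns (green_ns_s, green_ew_s), always summing to cycle_s and
    never starving either approach below min_green_s.
    """
    total = queue_ns + queue_ew
    if total == 0:
        half = cycle_s // 2
        return half, cycle_s - half
    g_ns = round(cycle_s * queue_ns / total)
    g_ns = max(min_green_s, min(g_ns, cycle_s - min_green_s))
    return g_ns, cycle_s - g_ns
```

A 30-vehicle north-south queue against a 10-vehicle east-west queue earns the busier approach roughly three quarters of the cycle, which is exactly the kind of adjustment a fixed-timing signal cannot make.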
Another important role of smart cities is their protection of the urban environment and the optimization of urban resource allocation. AI can also assist in these areas.
This is true for urban electricity supply systems in particular. Urban electricity grids experience different power loads in different seasons, times of day, weather conditions, and regions. AI algorithms can combine this data with knowledge of electrical power to analyze the electrical grid's operating mode. This makes possible a data-driven health assessment of the grid that includes equipment status, network topology, and real-time operations. The health assessment allows operators to monitor the power supply and instantly discover problems. Power grid equipment, such as power transmission lines and transformers, can also be monitored more frequently. Field robots collect images of equipment, which are analyzed with classification and integration algorithms to quickly discover equipment failures—such as loose dampers and missing insulators—and risks from factors such as construction work, overgrown trees, and fireworks.
A network of sensors can monitor the urban environment. Barcelona, Spain, for example, installed more than 20,000 wireless sensors around the city to collect data about temperature, humidity, pollution, noise, and traffic flow. In the future, AI algorithms can run classification and regression analysis on this data to predict pollution, weather, and traffic situations. This will help city administrators take the appropriate measures as quickly as possible.
Garbage sorting is another area that can benefit from AI-based monitoring. Forecasts predict that the amount of trash produced by urban residents globally will increase from the current 2 billion tons a year to 3.4 billion tons by 2050. If this household waste were disposed of in landfills, it would consume billions of square meters of land every year, with a massive impact on the world's environment. Intelligent garbage sorting can replace manual work and achieve superior results. Bin-e smart trash bins first use cameras to capture images of the trash and then use trained image identification and object detection algorithms to analyze the contents of the bin. Finally, a mechanical system sorts and compresses the garbage, while the bin's internal sensors notify the user and the waste management company to dispose of the waste promptly.
In smart cities, information is exchanged more efficiently and with greater transparency between city administrators and city residents. Realizing such benefits requires building the necessary data platforms and deploying information technologies such as blockchains.
Blockchain technology features distributed storage, multi-party maintenance, and resistance to falsification. This ensures that information is valid and genuine and also increases the efficiency of point-to-point information transfers. Blockchain technology can thus facilitate the application of AI algorithms, and such applications depend on the efficient transmission of data that blockchains guarantee.
Blockchains can also be used to protect and share data. In a city's e-government system, city residents can instantly view new or modified government policies, give instant feedback, and see other people’s comments. This will greatly enhance the communication between city administrators and residents. Health data includes various types of private patient information, and medical records are generally only kept on file by hospitals, so they are not easily accessible. With blockchain technology, patients can establish confidential electronic health records that can be transferred securely and in complete form between patient and hospital. Blockchains can also help governments and the public respond quickly to sudden public health incidents. In response to the COVID-19 outbreak, the Chinese government introduced a health code system in which each person can display their individual health status and see the local exposure risk. Blockchain technology guarantees data security and authenticity for the program, and AI algorithms analyze risk levels. The health code information service system enabled the Chinese government to respond quickly and get control of the pandemic.
Intelligent healthcare has always been seen as a natural direction in the development of AI. Microsoft's Healthcare Bot is a chatbot that harnesses natural language processing and speech-recognition technology so that patients can get diagnosis and triage for simple conditions by talking with a chatbot online. In the field of imaging, Chinese companies such as YITUTech and Deepwise have developed intelligent diagnostic systems based on image classification and segmentation to help doctors quickly find tuberculosis and pinpoint cerebral hemorrhages in computed tomography (CT) scans and magnetic resonance imaging (MRI), which enhances diagnosis efficiency.
Smart-home technology will gradually replace traditional appliances in the home. As the Internet of Things (IoT) becomes more ubiquitous, everything in the house, from traditional home appliances to curtains, doors, and windows, will connect to the home data brain, and smart-voice controllers such as Alexa and Siri will recognize verbal commands and transmit them to the corresponding household devices. AI algorithms can also analyze daily living routines to control home appliances automatically. Smart devices have already begun to find their way into the daily lives of the average person. Take smart cameras as an example. Devices such as Nanit or Cubo AI integrate scene segmentation, behavior recognition, and facial recognition algorithms to help parents monitor their child's every move from infancy to childhood. They analyze the sleeping position of infants and warn about dangerous situations, such as a child climbing on furniture or an obstruction over an infant's mouth and nose.
In residential complexes, residents will enjoy conveniences such as intelligent logistics and unmanned supermarkets. Amazon’s warehouses, which are considered the world’s most efficient, use more than 15,000 robots working in 3D warehouses and logistics centers to convey and sort goods quickly. In terms of unmanned supermarkets, two years after Amazon started up operations with its Amazon Go markets, it opened even bigger unmanned supermarkets called Amazon Go Grocery in 2020, not only increasing the size of the store but also adding more types of products in greater quantities. Such well-known unmanned supermarkets combine computer vision, sensor technologies, and deep-learning algorithms to monitor the movement and interaction of multiple physical objects simultaneously. This lets the stores record detailed images and data about each shopper's activities. Shoppers can simply take products off the shelves and place them in their bags without dealing with item scanning and checkout. Customers receive an accurate bill after they exit the supermarket.
From this overview of smart city application scenarios, it's clear that AI technology has profoundly changed the relationship that people have with information. Data and information from cities train AI technology, and AI prediction, decision-making, judgment, and modeling can be widely applied across smart cities to serve the daily needs of residents better.
The changes brought to smart cities by the application of AI technology do not stop there. Even the city's basic functions are not immune from changes occurring in this area. AI technology, autonomous driving, and the IoT have changed the way connections are made between physical objects and people and between the physical objects themselves. The allocation of resources within a city and between cities no longer relies solely on manual input and labor. This lowers the costs of transporting goods to individual communities in the city. With the rise of 5G technology and shared office spaces, more and more people will be able to work and handle their affairs near where they live. Cities can naturally develop toward having multiple centers of activity, and each center can become a multi-purpose community that does not have to be either exclusively a residential or commercial area. This lowers the overall costs of getting around the city and also naturally reduces carbon emissions.
Changes will also occur to the types of occupations that the people in cities will be working in. AI technology will handle garbage sorting, traffic control, driving, and checkout, freeing up an abundance of human resources. Meanwhile, these AI technologies will also require large-scale data collection and continuous model training, triggering a need for more data engineers, sensor hardware engineers, and AI engineers. People who have a strong grasp of AI technology will be in great demand as AI is deployed in all sorts of fields such as healthcare, education, information management, construction, and real estate.
Of course, this kind of ideal smart city will not just spring up overnight, nor is it something that can easily be brought into being through top-down planning. AI technology develops cyclically, so city administrators should develop short- and long-term development plans. In the short-term, city administrators should support AI businesses that use deep-learning-based AI technologies to create applications in areas such as transportation, healthcare, and power, thus jointly forming intelligent infrastructure from the bottom up. In the long-term, AI technology will likely see revolutionary advances soon, yet information and data will always be inseparable from it. Therefore, administrators of future smart cities should digitize all city administrative functions and all city-related data. Such digitization will mean that cities will have a virtual replica of the physical city, allowing simulation for urban planning and forecasting for potential incidents. Digitization also builds the data foundation for further application of AI technologies, and it will provide advanced tools for urban planning and city construction.
In addition to AI technology, building smart cities will require developments in other basic technologies. One example is 5G technology, which is set to make a long-lasting impact. It transmits data at speeds that are 20 times faster than what 4G technology can handle, and it supports the simultaneous transmission of data from many different communication devices. The enormous input of data required by AI algorithms can be transferred to the cloud, processed, and instantly returned. This allows for the use of lightweight smart devices that do not need complicated processors. Meanwhile, connecting as much infrastructure equipment as possible to a smart network can finally achieve the Internet of Everything (IoE). Newly installed smart devices can also further promote the city's digitization so that its digitization and smartification can move forward in tandem.
Smart cities will still have some limitations. Massive differences in histories, cultures, planning, and management between cities mean that experience might not be readily replicable. For example, China will need to consider its very high population density and historic landmarks. In contrast, Australia would need to deal with the significant differences between coastal and interior cities. AI algorithms are always influenced by the data they rely on, and the process and results of their work reflect the prejudices of the data source to some degree or another. This requires city administrators and social workers to supervise the algorithms and the data collection to ensure that the results are fair for all segments of society. City residents will also have to relinquish some of their data privacy to enjoy the convenience provided by these algorithms. Therefore, the use of this private data will have to be protected by rigorous standards for data management. The actual environment of the city itself will also be a factor that limits the scope of its development. This means that, while developing big cities, governments should also place importance on building up remote regions and rural areas so that all population centers can enjoy the conveniences brought by AI technology.
Smart cities offer residents the dream of an urban life that is fast, convenient, smart, efficient, and full of hope. This future certainly requires the helping hand of AI technology. Building smart cities will not happen overnight. As AI technology is embedded in cities, residents will be gradually introduced to new concepts and new lifestyles that will not necessarily be easy to accept right away. However, the benefits to human civilization offered by this next great technological revolution are worthwhile.
Advances in speech synthesis have accelerated the adoption of smart assistants such as Amazon Alexa and Apple Siri, but sophisticated speech capabilities are edging closer to offering a more vital service. Speech technologies based on artificial intelligence (AI) are evolving toward the ultimate goal of giving voice to the millions of individuals living with speech loss or impairment.
Cutting-edge voice technology underlies a massive, tremendously competitive marketplace for smart products. According to the 2022 Smart Audio Report1 from NPR and Edison Research, 62 percent of Americans aged 18 and over use a voice assistant in some type of device. For companies, participation in the trend for sophisticated voice capabilities is critical—not just for securing their synthetic voice brand, but also for participating in the unprecedented opportunities for direct interaction with consumers through AI-based agents that listen and respond through the user’s device in a natural-sounding conversation.
Speech synthesis technology has evolved dramatically from voice encoder, or vocoder, systems first developed nearly a century ago to reduce bandwidth in telephone line transmissions. Today’s vocoders are sophisticated subsystems based on deep learning algorithms like convolutional neural networks (CNNs). In fact, these neural vocoders only serve as the backend stage of complex speech synthesis pipelines that incorporate an acoustic model capable of generating various aspects of voice that listeners use to identify gender, age, and other factors associated with individual human speakers. In this pipeline, the acoustic model generates acoustic features, typically mel-spectrograms, which map the linear frequency domain into a domain considered more representative of human perception. In turn, neural vocoders like Google DeepMind’s WaveNet use these acoustic features to generate high-quality audio output waveforms.
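The mel mapping mentioned above is a fixed formula rather than a learned model. The widely used HTK-style version and its inverse can be written directly:

```python
import math

def hz_to_mel(f_hz):
    """Map linear frequency (Hz) to the perceptual mel scale (HTK formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse mapping, from mels back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
```

The scale is roughly linear below about 1 kHz and logarithmic above it, which is why mel-spectrograms devote more resolution to the low frequencies where human hearing is most discriminating.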
Text-to-speech (TTS) offerings abound in the industry, ranging from downloadable mobile apps and open-source packages like OpenTTS to comprehensive cloud-based, multi-language services such as Amazon Polly, Google Text-to-Speech, and Microsoft Azure Text to Speech, among others. Many TTS packages and services support the industry-standard Speech Synthesis Markup Language (SSML), which gives speech synthesis applications a consistent way to produce more realistic speech patterns, including pauses, phrasing, emphasis, and intonation.
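As a small illustration of what SSML looks like, the snippet below assembles a marked-up utterance with a pause, emphasis, and a rate change. The element names follow the SSML standard; the helper function itself is hypothetical:

```python
def make_ssml(greeting, name):
    """Assemble a small SSML document with a pause, emphasis, and prosody."""
    return (
        "<speak>"
        f"{greeting}, <emphasis level=\"strong\">{name}</emphasis>."
        "<break time=\"500ms\"/>"
        "<prosody rate=\"slow\">How can I help you today?</prosody>"
        "</speak>"
    )
```

A string like this can be passed to any SSML-aware engine in place of plain text, and the engine renders the half-second pause and the slowed question instead of reading the tags aloud.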
Today’s TTS software can deliver voice quality that’s a far cry from the robotlike speech of the electrolarynx or the voice that the late Stephen Hawking kept as his signature even after improved voice rendering technology became available2. Even so, these packages and services are focused on providing a realistic voice interface for applications, websites, videos, automated voice response systems, and the like. Reproducing a specific individual’s voice—including their unique tone and speech patterns—is not their primary objective.
Although some services such as Google’s provide an option for creating a user-supplied voice by special arrangement, they aren’t geared to meeting the critical need of reproducing the voice lost by an individual. For these individuals, this need is indeed critical because our unique voice is so closely tied to our identity, where a simple voiced greeting conveys so much more than the individual words. Individuals who have lost their voice feel a disconnection that goes beyond the loss of vocalization. For them, the ability to interact with others in their own voice is the real promise of emerging speech synthesis technology.
Efforts continue to lower the barrier to providing synthetic voices that can match the unique persona of individuals. For example, last year actor Val Kilmer revealed that after he had lost his voice due to throat cancer surgery, UK company Sonantic provided him with a synthetic voice that was recognizably his own. In another high-profile voice cloning application, the voice of the late celebrity chef Anthony Bourdain was cloned in a film about his life, delivering words in Bourdain’s voice that the chef wrote but never had spoken in life.
Another voice pioneer, VocalID, provides individuals with custom voices based on recordings that each individual “banks” with the company in anticipation of their loss of voice or with custom voices based on banked recordings made by volunteers and matched to the individual who has lost their voice. The individual can then run the custom voice synthesis application on their iOS, Android, or Windows mobile device, carrying on conversations in their unique voice.
The technology for cloning voices is moving quickly. This summer, Amazon demonstrated the ability to clone a voice using audio clips less than 60 seconds in duration. Although billed as a way to resurrect the voice of dearly departed relatives, Amazon’s demonstration highlights AI’s potential for delivering speech output in a familiar voice.
Given the link between voice and identity, high-fidelity speech generation is both a promise and a threat. As with deepfake videos, deepfake voice cloning represents a significant security threat. A high-quality voice clone was cited as a contributing factor in the fraudulent transfer of $35 million in early 2020. In that case, a bank manager wired the funds in response to a telephone transfer request delivered in a voice he recognized but that proved to be a deepfake.
With an eye on the market potential for this technology, researchers in academic and commercial organizations are actively pursuing new methods to generate speech output capable of all the nuances of a human speaker to more fully engage the consumer. For all the market opportunity, however, advanced speech synthesis technology promises to deliver a more personal benefit to the millions of individuals who are born without a voice or have lost their voice due to accident or illness.
1. “The Smart Audio Report.” national public media, June 2022. https://www.nationalpublicmedia.com/insights/reports/smart-audio-report/.
2. Handley, Rachel. “Stephen Hawking’s voice, made by a man who lost his own.” BeyondWords, July 15, 2021. https://beyondwords.io/blog/stephen-hawkings-voice/.
‘It is not only about what you say. It is also about how you say it.’ This age-old adage aptly sums up the human need to communicate effectively. Our need to connect with one another through voice and sound points to a future in which voice communication with machines is inevitable.
The expansion of the Internet of Things (IoT) and artificial intelligence (AI) has accelerated the adoption of voice communication. Integrating AI at the endpoint, combined with advances in voice analytics, is changing how products are built and experienced, and it is giving rise to a new ecosystem of companies that participate in and enable these products. Intelligent endpoint solutions make it possible to implement both online and offline systems, reducing reliance on always-on internet and cloud connections. This, in turn, creates new opportunities to solve challenges in real-time voice analytics across many consumer and industrial applications. Advances in psycholinguistic data analytics and affective computing allow emotions, attitudes, and intent to be inferred through data-driven voice modeling. As the voice medium becomes a natural way for humans to interact, measurement of intent through voice recognition and voice analytics will continue to improve.
Voice user interfaces (VUIs) allow the user to interact with endpoint systems through voice or speech commands. Despite mass deployments across a wide range of applications, VUIs have some limitations.
In this blog, Renesas Electronics addresses these challenges with state-of-the-art microcontrollers and partner-enabled intelligent voice processing algorithms, making it easier for product manufacturers to integrate highly efficient voice commands. Renesas Electronics provides general-purpose MCUs that enable VUI integration without compromising performance or power consumption.
To make the experience compelling for the user, devices need to be equipped with several components to ensure robust voice recognition.
One of the most significant features of a voice-enabled device is its ability to identify speech commands from an audio input. The on-device speech command recognition system is activated by the wake word; it then takes the audio input, interprets it, and transcribes it to text. This text serves as the input or command for performing the specific task.
Voice activity detection (VAD) is the process of distinguishing human speech from background noise in the audio signal. VAD also helps optimize overall system power consumption; otherwise, the system must remain active all the time, wasting power. The VAD algorithm can be subdivided into four stages (Figure 1):
Figure 1: The block diagram specifies the four stages of the VAD algorithm: noise minimization, segregation, classification, and response. (Source: Renesas Electronics)
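To make the classification stage concrete, here is a minimal energy-threshold detector in Python. This is a toy sketch of the VAD idea, not the Renesas algorithm: real implementations add noise minimization and ML-based classification ahead of the speech/silence decision, and the threshold here is invented.

```python
import numpy as np

def frame_energy_vad(samples, rate=16000, frame_ms=20, threshold=0.01):
    """Flag each frame of an audio signal as speech (True) or silence (False).

    Minimal energy-threshold illustration of the classification stage:
    frames whose short-term energy exceeds the threshold count as speech.
    """
    frame_len = int(rate * frame_ms / 1000)  # samples per frame
    n_frames = len(samples) // frame_len
    flags = []
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = float(np.mean(frame ** 2))  # short-term energy
        flags.append(bool(energy > threshold))
    return flags

# Synthetic example: 100ms of near-silence followed by 100ms of a loud tone
rate = 16000
rng = np.random.default_rng(0)
t = np.arange(rate // 10) / rate
quiet = 0.001 * rng.standard_normal(rate // 10)
loud = 0.5 * np.sin(2 * np.pi * 440 * t)
flags = frame_energy_vad(np.concatenate([quiet, loud]), rate)
print(flags)  # five silence frames, then five speech frames
```

Gating the rest of the pipeline on these flags is what lets the system sleep through silence instead of running the recognizer continuously.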
The Renesas RA voice command solution, built on the RA MCU family and partner-enabled voice recognition middleware, boasts a robust noise reduction technique that helps ensure high VAD accuracy. In addition, Renesas can help address some of the key voice command features outlined below:
Keyword spotting systems (KWS) are one of the key features of any voice-enabled device. The KWS relies on speech recognition to identify keywords and phrases. These words trigger and initiate the recognition process at the endpoint, allowing the rest of the audio to be processed as the query (Figure 2).
Figure 2: The keyword spotting process relies on speech recognition to identify keywords and phrases, which trigger the recognition process at the endpoint and allow the rest of the audio to be processed as the query. (Source: Renesas Electronics)
To contribute to a better hands-free user experience, the KWS must provide highly accurate real-time responses, which places an immense constraint on the KWS power budget. Therefore, Renesas provides partner-enabled, high-performance, optimized machine learning (ML) models capable of running on advanced 32-bit RA microcontrollers. They come with pre-trained deep neural network (DNN) models, which help achieve high accuracy when performing keyword spotting.
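The spotting loop itself can be sketched in a few lines of Python. The `score_fn` below is a stand-in for a pre-trained DNN that maps a window of audio features to a keyword probability; the windows, threshold, and stub scorer are all invented for illustration.

```python
import numpy as np

def spot_keyword(frames, score_fn, threshold=0.8):
    """Return indices of feature windows where the keyword score fires."""
    return [i for i, window in enumerate(frames) if score_fn(window) >= threshold]

# Toy stand-in for a pre-trained DNN: "keyword present" when the window's
# mean feature value is high (a real model would output a probability)
toy_model = lambda w: float(np.mean(w))

frames = [np.full(10, 0.1), np.full(10, 0.9), np.full(10, 0.2)]
hits = spot_keyword(frames, toy_model)
print(hits)  # [1]
```

On an MCU, the same loop runs over a sliding window of acoustic features, and a hit is what wakes the rest of the recognition pipeline.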
Speaker identification, as the name suggests, is the process of determining which registered speaker produced a given voice input (Figure 3). Speaker recognition can be classified as text dependent, text independent, or text prompted. To train the DNN for speaker identification, individual idiosyncrasies such as dialect, pronunciation, prosody (the rhythmic patterns of speech), and phone usage are captured.
Figure 3: Speaker identification system block diagram illustrates the process of training the DNN for speaker identification and individual speech idiosyncrasies. (Source: Renesas Electronics)
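A common way to implement the matching step is to compare a DNN-produced voice embedding against each enrolled speaker's voiceprint by cosine similarity. The sketch below assumes such embeddings already exist; the vectors and speaker names are made up for illustration.

```python
import numpy as np

def identify_speaker(embedding, enrolled):
    """Return the enrolled speaker whose voiceprint is closest by cosine similarity."""
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(enrolled, key=lambda name: cosine(embedding, enrolled[name]))

# Invented voiceprints; a real system would produce these with the trained DNN
enrolled = {
    "alice": np.array([0.9, 0.1, 0.0]),
    "bob":   np.array([0.1, 0.9, 0.2]),
}
probe = np.array([0.8, 0.2, 0.1])  # a new utterance's embedding
best = identify_speaker(probe, enrolled)
print(best)  # alice
```

A production system would also apply a minimum-similarity threshold so that unknown speakers are rejected rather than forced onto the nearest enrolled match.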
Spoofing is a type of scam where the intruder attempts to gain unauthorized access to a system by pretending to be the target speaker. This can be countered by including anti-spoofing software to ensure the security of the system. The spoofing attacks are usually against Automatic Speaker Verification (ASV) systems (Figure 4). The spoofed speech samples can be generated using speech synthesis, voice conversion, or by just replaying recorded speech. These attacks can be classified as direct or indirect depending on how they interact with the ASV system.
Figure 4: Block representation of an automatic speaker verification. (Source: Renesas Electronics)
Accent recognition in English-speaking countries is a much smoother process because of the abundance of training data, which yields accurate predictions. The downside for organizations operating in countries where English is not the first language is lower speech recognition precision, because only a limited amount of data is available. An inadequate amount of training data makes building highly accurate conversational models challenging.
To overcome the accent recognition issue, Renesas offers VUI partner-enabled solutions that support more than 44 languages, making them highly adaptable speech recognition solutions that can be used by organizations worldwide.
The Give Voice to Smart Products blog was originally published on www.renesas.com and is republished here with permission.
The ubiquity and use of smart home gadgets have turned regular houses into places full of intelligent, interconnected devices that offer both ease and safety. This big change in how smart homes work is due to the Internet of Things (IoT), which has led to many advanced devices with different uses that make home life more convenient and secure. From home security to appliances and more, these emerging technologies are redefining what it means to be ‘home sweet home.’
The cornerstone of this technological renaissance is the smart speaker. Smart speakers exemplify the fusion of artificial intelligence (AI) and machine learning (ML) with everyday utility. These speakers, equipped with voice recognition (referred to as voice assistants), function as the central nervous system of the smart home, enabling users to command various connected gadgets through simple verbal cues. From adjusting the ambiance with smart lighting systems to regulating the indoor temperature with intelligent thermostats to tuning your smart television to your favorite channel, control rests at the tip of the tongue.
Furthermore, smart security systems have redefined household safety. Using voice commands, users can enable alarms or communicate with visitors at the front door, whether home or away. These systems, a combination of cameras, sensors, and alarms, offer real-time surveillance and alerts, ensuring peace of mind. Notably, with their analytics capabilities, they are adept at distinguishing between routine occurrences and potential threats, thereby reducing false alarms and enhancing security efficacy.
The kitchen, too, has witnessed a transformation with smart appliances (Figure 1). Refrigerators like the Samsung Family Hub™+ not only monitor food freshness and make plenty of ice but also feature a 32" touchscreen that lets you share pictures, stream music and videos, and access recipes, all from the fridge. When you’re hustling around the kitchen with oven mitts on and need to make sure all temperatures and cooking times are correct, there are ovens that can be remotely voice-controlled for precision cooking, epitomizing the convergence of culinary art and technology.
Figure 1: Today’s smart home kitchen is equipped with appliances enhanced by IoT connectivity that put you in complete control. (Source: Koldunova/ stock.adobe.com)
Additionally, the availability of high-speed fiber internet connections and the use of Wi-Fi mesh routers are future-proofing homes and making slow connections and dead zones a thing of the past. This high-speed connectivity is the lifeline of these devices and provides voice control capabilities never seen before. Relying on Wi-Fi®, Bluetooth®, Zigbee, Thread, Matter, and sometimes proprietary protocols, they form an interconnected web, allowing for seamless interaction and data exchange.
This connectivity not only facilitates voice control but also enables these devices to learn and adapt to user preferences over time, enhancing their utility and personalization. Many of the apps that come with these smart devices further enhance the user experience by enabling remote control and tracking of the devices’ activity via smartphones and watches.
The added value of these smart home devices transcends mere convenience. With the advantages of ML, some of these smart home devices can analyze energy consumption and determine usage schedules for air conditioning and more. By optimizing energy usage, these devices contribute to a more sustainable living environment. Their ability to learn and adapt leads to a more personalized and efficient home living experience, saving time and enhancing comfort.
This week’s New Tech Tuesday features a connectivity evaluation kit from u-blox and a Bluetooth® module from Renesas. Both products represent the latest cutting-edge solutions for wireless connectivity applications in the IoT realm.
The u-blox EVK-NORA-B12x evaluation kits are an indispensable tool for design engineers looking to develop IoT applications. This kit provides stand-alone use of the NORA-B1 series module, which features the powerful Nordic Semiconductor nRF5340 dual-core RF System-on-Chip (SoC). It serves as an excellent starting point for a wide range of projects, including Bluetooth 5.2 Low Energy, Thread, or Zigbee applications. One of its key advantages is accessibility, allowing easy access directly from the evaluation board to all features of the NORA-B1 series modules. With a simple USB connection, engineers can power the kit, perform programming tasks, and utilize virtual COM ports. The board includes convenient features, like four user buttons, a USB peripheral connector, user LEDs, and a reset button. Additionally, it offers 48 GPIO signals (46 for EVK-NORA-B12) on headers that are compatible with the Arduino® form factor, simplifying the use of existing Arduino shields. Whether used for Bluetooth® connectivity in smart homes or other IoT applications, this evaluation kit offers a valuable toolset for developers seeking to work with the NORA-B1 module.
The Renesas DA14695MOD Multi-Core Bluetooth® 5.2 Modules are versatile solutions for wireless connectivity applications. These modules are part of the SmartBond™ DA1469x family, known for its feature-rich and powerful multi-core microcontroller units. Key features of the DA14695MOD modules include global certification, integration of all necessary passives and antennas, a 32Mbit QSPI FLASH, and user-friendly software support, making them easy to work with. Additionally, these multi-core wireless microcontroller modules feature the latest Arm® Cortex®-M33 application processor with a floating-point unit, advanced power management functionality, and a cryptographic security engine. These modules are designed to enable Bluetooth® Low Energy 5.2 and proprietary 2.4GHz protocols, making them suitable for a wide range of wireless applications, including IoT, beacons, proximity tags, low power sensors, robotics, and many more.
Smart IoT home gadgets have revolutionized our homes, making them more connected, convenient, and secure. Key to this change are smart speakers, which use artificial intelligence to let us control our home environment with just our voice. They can adjust lights, set thermostats, and control our televisions. Smart security systems also play a significant role, using cameras and sensors to keep our homes safe and give us peace of mind. And let us not forget our kitchens. They, too, are being transformed by smart appliances that can suggest recipes and cook food precisely at our command. Overall, these smart home devices do more than just make life easier—they also help save energy and create a more personalized and comfortable living space.
1. “Bespoke 4-Door Flex™ Refrigerator with Family Hub™+ in Charcoal Glass Top and Stainless Steel Bottom Panels.” Samsung Electronics America, accessed November 29, 2023. https://www.samsung.com/us/home-appliances/refrigerators/bespoke/bespoke-counter-depth-4-door-flex-refrigerator-23-cu-ft-with-family-hub-in-charcoal-glass-top-and-stainless-steel-bottom-panels-rf23cb9900qkaa/.
2. Hirt, Meredith. “10 Smart Home Trends This Year.” Forbes Home, July 4, 2023. https://www.forbes.com/home-improvement/internet/smart-home-tech-trends/.
Here is a scenario: You come home from work or school, you tell the TV what show you want to watch, and it automatically turns on and switches to your preferred channel. Or perhaps you tell the stove to prepare for low and slow cooking so that dinner is ready at the right temperature at the right time. Today, home appliances are capable of performing these functions. Through voice control, you can simply relax on the sofa after a tiring day and give instructions to appliances that obediently follow your commands.
Complex architecture and wide-ranging connections are the hallmarks of the Internet of Things. More companies are choosing cloud-hosted IoT systems because cloud architecture is secure, fast, and convenient. A system becomes more secure through several layers of encryption and authentication, and AI-based model training and deployment, such as natural language processing, can be completed with just one click. An IoT cloud generally includes a sensor embedded inside a home appliance that connects to the internet via Wi-Fi; the sensor collects data and transfers it to the cloud database, where it is analyzed and processed. In this article, cloud architecture is used as the framework to explain how voice control technology enables home appliances to obey verbal commands and respond.
With constant AI and IoT developments, human-machine interaction (HMI) has delivered increasingly high-end experiences. Voice control technology is one of the most widely applied and popular research topics today. The application of voice control in home appliances, which eliminates the need for familiar remote controls and enables appliances to function through verbal commands alone, is new to most people. Voice-controlled home appliances are made possible by AI, machine learning, speech recognition, IoT, and cloud computing.
A voice control system includes:
Speech recognition refers to the transformation of speech into text. The Azure platform's speech-to-text (STT) service uses a universal language model trained on Microsoft's existing data and deployed in the cloud. This model can be used to create and train custom language models, and a specific lexicon can be selected and added to the training data as needed.
Natural language analysis/natural language processing is a part of machine learning that designs models and conducts training.
The tasks of dialog management comprise three main points:
The response text is generated based on the model's analysis of the user's command. The main role of speech synthesis technology is to transform text into a humanized voice. The basic Azure cloud voice synthesis uses the Speech SDK or REST Application Programming Interface (API) protocols (see details below) to achieve text-to-speech with a neural or custom voice.
In home appliances, the dialog models’ emotional requirements are somewhat lower because most user commands are only functional requests, such as turning on the device and requesting the temperature or humidity.
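The speech recognition component above is typically reached through the Speech SDK or a REST endpoint. As a rough sketch, the request for Azure's short-audio speech-to-text REST API can be assembled as below; the endpoint shape and header names follow Azure's documented format but should be verified against the current documentation, and the region and key are placeholders.

```python
def build_stt_request(region, subscription_key, language="en-US"):
    """Assemble the URL and headers for a short-audio speech-to-text REST call."""
    url = (f"https://{region}.stt.speech.microsoft.com/speech/recognition/"
           f"conversation/cognitiveservices/v1?language={language}")
    headers = {
        "Ocp-Apim-Subscription-Key": subscription_key,
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
    }
    return url, headers

url, headers = build_stt_request("eastus", "<your-key>")
print(url)
```

Posting WAV audio to this URL with these headers (for example, via `requests.post`) returns the transcription as JSON.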
A basic solution for cloud voice control technology includes:
With the Universal Windows Platform, the same API can be universally applied to computers, smartphones, or other Windows 10 devices. In other words, the same code can be run on different terminals without writing different versions of the code for different platforms.
Voice SDK software allows manufacturers to boost voice quality enhancement in hands-free applications by using voice-band audio processing for automotive hands-free applications, such as speech recognition in cockpit devices.
The official documentation states that: "As an alternative method for voice SDK, the voice service allows the use of REST APIs to transform speech to text. Every accessible endpoint is connected to a certain region. The application requires a subscription key for the endpoint used. REST APIs are very limited since they can only be used in situations where voice SDKs are not available."
Using speech recognition as an example: A key for the REST API must be acquired before sending the HTTP request to the server. After authentication, the server returns the transformed audio locally. This diagram is an example of creating and using a REST client in an application and then invoking it (Figure 1). When invoking a REST client, the input is transformed into an HTTP request and sent to the REST API. The response from the communication endpoint is an HTTP response. The REST client transforms it to a type that the application can recognize and returns it to the application.
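The pattern in Figure 1 can be sketched as a small client whose HTTP transport is injected, keeping the application free of HTTP details. The fake transport and the `DisplayText` field below mirror the shape of a speech-service response but are simplified for illustration.

```python
import json

class SpeechRestClient:
    """Sketch of the REST-client pattern in Figure 1: the application hands
    over known types, the client performs the HTTP round trip, and the
    response is converted back to a type the application understands."""

    def __init__(self, endpoint, transport):
        self.endpoint = endpoint
        self.transport = transport  # callable: (url, payload) -> JSON string

    def recognize(self, audio_bytes):
        raw = self.transport(self.endpoint, audio_bytes)  # HTTP request/response
        return json.loads(raw)["DisplayText"]             # back to a plain string

# Fake transport standing in for the real speech endpoint; a production
# transport would wrap something like requests.post
fake_transport = lambda url, payload: json.dumps({"DisplayText": "what is the humidity"})

client = SpeechRestClient("https://example.invalid/stt", fake_transport)
print(client.recognize(b"\x00\x01"))  # what is the humidity
```

Injecting the transport also makes the client trivially testable without touching the network.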
Figure 1: Creating and using a REST client in an application. (Source: gunnarpeipman.com)
We opt not to publicly disclose the details of our application's REST client, so an adapter for communication with external servers can be added. The adapter receives parameters of known types from the application and returns the same data to the external server.
Azure's LUIS is a cloud-based conversational AI service that allows machines to understand human language. The mode of operation can be summed up as follows: The client sends a voice request directly to LUIS through the application. The natural language processing function in LUIS transforms the command into JSON format; after it is analyzed, the answer is also returned in JSON format. The LUIS platform provides the user with a model-training service. The model sports a "continuous learning" function, continuously and automatically making corrections in response to client requests to improve accuracy.
Now, let’s take a look at how LUIS works using a residential humidity monitoring system as an example. What if you wanted a user to give the "check the humidity" command? LUIS incorporates the essential components of natural language processing:
The user can customize LUIS features based on their own needs, which means that when your model cannot easily recognize one or a few words, it can automatically add new data for retraining.
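To make the JSON round trip concrete, here is a sketch of parsing a LUIS-style prediction response for the humidity example. The field names follow the documented v3 prediction format, but the intent name, score, and entities are invented for this illustration.

```python
import json

# A LUIS-style prediction response, shaped after the documented JSON
# format (exact fields can vary by API version)
sample = json.dumps({
    "query": "what is the humidity in the room now",
    "prediction": {
        "topIntent": "CheckHumidity",
        "intents": {"CheckHumidity": {"score": 0.97}},
        "entities": {"location": ["room"]},
    },
})

def parse_prediction(raw, min_score=0.5):
    """Extract the top intent, falling back to None below the confidence threshold."""
    pred = json.loads(raw)["prediction"]
    intent = pred["topIntent"]
    score = pred["intents"][intent]["score"]
    return intent if score >= min_score else None

print(parse_prediction(sample))  # CheckHumidity
```

Thresholding on the confidence score is what lets the application ask the user to rephrase instead of acting on a weak prediction.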
Raspberry Pi is a development board that can connect sensors of different types. It can be used with a web server that receives the interpreted commands and sends electrical signals to control the home appliances installed in the smart home.
Voice control makes the home environment smarter and brings about home appliance automation (Figure 2). We can define it this way: Improving the homeowner's quality of life by using technologies that provide different services related to the areas of health, multimedia, entertainment, and energy.
Figure 2: Voice control technology recognizes audio commands to operate connected home appliances. (Source: Andrey Suslov/Shutterstock.com)
Let’s take a look at how voice control technology for home appliances works with a smart voice-controlled humidity monitor using cloud architecture as an example.
When running Universal Windows Platform (UWP) on Raspberry Pi 3, the speech recognition API and sensor interact with the user. Semantic analysis is performed in LUIS, and Raspberry Pi 3 inputs the user’s question. The answer finally comes from the speech recognition API of Cognitive Services.
Cloud computing has become the first choice in data architecture to ensure that data transmission is secure, data processing is fast, and model predictions are accurate. Cloud deployment can also significantly reduce the on-device workload and enhance device performance while improving the user experience, achieving a win-win outcome. The cloud architecture selected here is the Microsoft Azure cloud platform, which has recently given rise to major developments and innovations in the fields of AI and IoT.
Refer to the following GitHub link for an example of creating this type of solution.
Data transfer from the sensor to the cloud database can already be accomplished using today's data architecture. Clients can directly use different types of databases to meet their various needs.
Example: The user wishes to know the humidity level in their home, so they say, "Hey, cloud! What is the humidity in the room now?" The question is captured as text by the UWP application running on the Raspberry Pi 3. The application communicates with all sensors and actuators and then triggers the system to send the question to LUIS for semantic analysis.
LUIS is used to understand the command received from the Raspberry Pi 3. Through model training, the application can recognize that the intent of the command is to check the indoor humidity. The LUIS API is integrated into the UWP application; when the user says the trigger phrase "Hey, cloud!", everything that follows is sent to LUIS through the API and analyzed. LUIS receives the input, analyzes the intent, and, based on the predicted intent's confidence score, provides the correct answer to the user. A command is then sent to the IoT hub to get the humidity reading from the sensor.
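The final step, routing the predicted intent to a device action, can be sketched as a simple dispatch table. The intent names and canned responses here are invented for the example; a real handler would query the IoT hub for live sensor readings.

```python
def handle_intent(intent, actions):
    """Route a recognized intent to the matching device action, with a fallback."""
    action = actions.get(intent)
    if action is None:
        return "Sorry, I did not understand that."
    return action()

# Invented intent names mapped to stand-in sensor reads
actions = {
    "CheckHumidity": lambda: "The indoor humidity is 45 percent.",
    "CheckTemperature": lambda: "The indoor temperature is 22 degrees.",
}
print(handle_intent("CheckHumidity", actions))
```

The returned string is then handed to the text-to-speech API so the answer is spoken back to the user.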
A web application can be developed for device management. This application can display all sensor data received by the IoT center, making the management of devices easier and realizing the functions of restart and firmware update.
The UWP application and web application interact with each other to give the client a response, with the web application being responsible for sending the command to the designated sensor, detecting the specific sensor’s current indoor humidity, and answering the user's question. Finally, the user is provided with the current indoor humidity through the text-to-speech API.
In the era of the Internet of Things, man's dream of attaining a high-quality and convenient life is made possible by home appliances with voice control and response capability. The voice control function of home appliances is designed using a combination of technologies that include artificial intelligence, machine learning, natural language processing, the Internet of Things, cloud computing, data transmission, and sensors.
The use of voice-control technology in home appliances is a very forward-looking application. The future home will certainly be a place filled with smart devices that can talk to their users. It is hoped the technology will draw more scientists to this field of study and work toward constant innovation and development.
With Matter, smart-home devices will work well together and the true potential of connected tech will be realized. (Source: AndSus - stock.adobe.com)
The dream of the smart home—an automated dwelling that cossets its occupants in a warm blanket of technology—remains just that. But we might not have to wait too much longer for easily accessible, reliable, and crucially, interoperable connected-home products, which is a welcome relief because the smart home’s potential has been touted for much longer than you might think.
Science fiction aside—which nearly a century ago had robots helping with household chores and homes that continued to operate even though the occupants were long gone—American Jim Sutherland was among the first to attempt wide-scale automation. A Westinghouse power station engineer by day, Sutherland designed the Electronic Computing Home Operator (or ECHO IV) in his spare time during 1966. The machine managed the Sutherland family home accounts, calendar, air conditioning, and TV antennas, among other tasks. The phrase “smart home”—coined by the American Association of House Builders in 1984—is only a little younger than ECHO IV.
Yet, in 2022, mainstream smart-home adoption remains elusive. While shipments of connected home devices (think smart speakers, lights, and thermostats) number in the billions, they tend to be purchased by early tech-adopters. The analyst firm Statista1, for example, claims that just 14.2 percent of homes across the globe have embraced smart-home products.
The slow take-up is caused by complexity. Today, it’s almost impossible to walk into a store and walk out with a range of smart-home products that play nicely together. Even tech-savvy buyers struggle to get their smart-home products working. For example, early-adopters find a digital voice assistant from one manufacturer often falls over when trying to configure and control smart lights or an air-conditioning system built by another vendor. Without an informed choice of technology and smart-home ecosystems—such as Apple’s, Amazon’s, or Google’s—consumers seem to be forever toiling to keep finicky equipment connected. The average consumer has no chance. And neither does a realization of the fully-integrated smart home.
While around 14 different connectivity standards are vying for a share of the smart-home sector, BLUETOOTH® Low Energy (Bluetooth LE), Wi-Fi®, and Thread are forging ahead. But that’s little help to consumers because even these mainstream RF protocols are not interoperable.
Realizing that no single wireless connectivity standard is ever likely to emerge, the tech industry has come together to find an engineering solution that promises harmony. The 400-plus-member group these companies have formed is called the Connectivity Standards Alliance (CSA). In October 2022, the organization announced the release of Matter 1.0, a smart-home protocol that promises to straighten out the current tangle of wireless connectivity.
Rather than introducing a competing standard, Matter complements the existing smart-home technologies of Thread and Wi-Fi (plus the Ethernet-wired protocol). Thread is a popular low-power protocol suitable for devices like thermostats and smart lights, while Wi-Fi supports higher-bandwidth products such as entry cameras. Bluetooth LE support is included primarily because of its interoperability with smartphones—thus allowing consumers to use their mobiles to commission and configure their new smart-home gadgets. For the technically minded, Matter adds a unifying application layer to the Wi-Fi, Thread, and Bluetooth LE protocol stacks that manufacturers can leverage to bring compatibility and interoperability to their products.
But perhaps more importantly, Matter promises simplicity for consumers. Instead of having to work out if a thermostat is Apple compatible, or if Google smart speaker can control a Yale smart lock, buyers can check for the Matter certification that ensures interoperability. And for manufacturers, the product development process is made easier because they can use a single standard for all their products, safe in the knowledge they’ll work with all major smart-home ecosystems.
Who’d have thought that fierce competitors like Apple, Amazon, Google, and Samsung would even sit around the same table, let alone work closely together for years on a solution to the smart home’s stultifying complexity? Cynics said it would never happen, and for a long time, it seemed they were right—for example, the Matter 1.0 standard faced several lengthy delays before it was adopted.
With some fanfare, the project was originally announced in 2019 as Project CHIP, and the standard was planned for release in late 2020. That was delayed into early 2021. Then in August 2021, following the rebrand to Matter, the standards release was pushed to mid-2022. Finally, because of problems with the Matter software development kit (SDK), Matter was released in late 2022.
The good news is that collaboration continued behind the scenes during the delays, and chip makers and end-product manufacturers worked hard on their hardware and software solutions ahead of the official launch. Because of that background work, it is possible to purchase Matter chips from a selection of silicon vendors mere weeks after the standard was adopted. In addition, the certification labs are up and running, the SDK is available, and companies are lining up for Matter certification of their smart-home devices.
Now that Matter is here, manufacturers can put much less effort into patches and workarounds to ensure their products work with others and focus more on innovation, security, and quality. In a decade or less, the smart home will be commonplace in the developed world, and it will be much more than a place where our voice controls the lights or a smart thermostat looks after the heating. Instead, artificial intelligence and machine learning will fine-tune automation so that energy bills drop, the electric car is charged only for the short journey it knows you’re taking tomorrow, the media-room lights are set for movie night, and some paracetamol has been automatically ordered and delivered because your wearable has detected signs of an impending chill.
Copyright ©2024 Mouser Electronics, Inc.