Deployment of Sound Sensor in Smart Nation Projects

Thi Ngoc Tho Nguyen, Furi Andi Karnapi, and Woon-Seng Gan
May 15, 2018


According to a study on smart cities by Juniper Research and Intel, Singapore ranked first among the top 20 global smart cities in 2017, ahead of other big cities such as London, New York, and San Francisco. The Global Smart City Performance Index measures how the technology used in smart cities saves time for citizens and improves their quality of life. Singapore topped the charts in all four measured areas: mobility, healthcare, public safety, and productivity. This is largely thanks to its Smart Nation initiative, which aims to harness digital technologies to make Singapore not only an economically competitive city but also a liveable home for its people.

One of the five pillars of the Smart Nation initiative is the Smart Nation Sensor Platform, whose objective is to deploy sensors and other IoT (Internet of Things) devices across the island. A sound sensor network is one of the focuses of this platform. What is a sound sensor network? How would a sound sensor network benefit the people? Are the required technologies ready for such ideas to be implemented? What are the remaining challenges? We will discuss these matters in the context of smart cities, with Singapore as an example.

Sound Sensor Network

A sound sensor network is a wireless sensor network in which each node includes a single microphone or an array of microphones. Each node can provide one or a combination of the following types of information: sound pressure level, sound event/type classification, direction-of-arrival, and location of the main sound source. To reduce cost and increase performance, scalability, and reliability, it is often desirable to process the sound signals at the sensor nodes using a low-cost embedded processor. Instead of the raw sound data, the extracted sound information is sent via a gateway to the central server, where the metadata is aggregated for further analysis and visualization. These sensor nodes can be powered 24/7 by a rechargeable battery charged through solar panels or by tapping the power supply in a lamp post.
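As an illustration of the simplest quantity a node can extract, the sketch below computes the sound pressure level of one frame of samples. It is a minimal example, not the processing used in our nodes: the function name is hypothetical, and it assumes the samples have already been calibrated to pascals, which in practice depends on the microphone sensitivity and gain.

```python
import math

P_REF = 20e-6  # standard reference pressure in air: 20 micropascals


def sound_pressure_level_db(pressures_pa, p_ref=P_REF):
    """Return the SPL in dB (re 20 uPa) of one frame of pressure samples.

    Assumption: `pressures_pa` holds samples already calibrated to pascals.
    """
    if not pressures_pa:
        raise ValueError("frame must contain at least one sample")
    # Root-mean-square pressure of the frame.
    mean_square = sum(p * p for p in pressures_pa) / len(pressures_pa)
    rms = math.sqrt(mean_square)
    if rms == 0.0:
        return float("-inf")  # digital silence
    return 20.0 * math.log10(rms / p_ref)


# A sine wave with an RMS pressure of 1 Pa, spanning whole periods.
frame = [math.sqrt(2.0) * math.sin(2.0 * math.pi * k / 100.0) for k in range(1000)]
spl = sound_pressure_level_db(frame)
print(round(spl, 2))
```

A tone with an RMS pressure of 1 Pa corresponds to roughly 94 dB, which is the level commonly used by acoustic calibrators.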

Applications of a Sound Sensor Network

A lamp post provides a convenient structure on which to mount different environmental sensors. Different types of data can be collected, aggregated, and analyzed across different sensors and across different lamp posts. Sound sensors can be a useful addition to the smart lamp post sensor setup in Singapore. Sound sensors not only sense the noise level in their vicinity, but also provide soundscape information about the city and detect abnormal events such as car crashes, explosions, and aggression. Thanks to their omnidirectional characteristic, sound sensors can detect sound events faster than visual sensors, and they are not affected by lighting conditions. Furthermore, fusing sound and vision sensors can lead to a better surveillance system. The direction-of-arrival information obtained from a microphone array can help steer surveillance cameras towards sound events. This effectively increases the coverage of surveillance cameras and improves public safety without installing cameras at every corner.

Audio Signal Processing and IoT Technology

To detect sound source direction and location, a microphone array is required. Sound source localization is a mature field in audio signal processing. The main techniques for estimating the sound source direction-of-arrival and location can be divided into three categories: beamforming, high-resolution subspace methods, and time-difference-of-arrival. Generally, the estimation accuracy increases as the number of microphones increases, at the expense of more computational load. With suitable array geometries and estimation algorithms, localization can be achieved in real time with sufficient accuracy using as few as four microphones.
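To make the time-difference-of-arrival category concrete, the sketch below estimates the delay between two microphone signals by plain cross-correlation and converts it to a far-field angle. It is a minimal illustration for a single microphone pair, not the algorithm used in our nodes. The function names, the 16 kHz sampling rate, and the 0.1 m spacing are assumptions, and practical systems typically use more microphones and more robust estimators.

```python
import math


def estimate_delay_samples(x, y, max_lag):
    """Return the lag (in samples) of y relative to x that maximizes
    their cross-correlation. A positive lag means y arrives later."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(len(x)):
            j = i + lag
            if 0 <= j < len(y):
                score += x[i] * y[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def doa_from_delay(lag, fs_hz, mic_distance_m, speed_of_sound=343.0):
    """Convert a delay to a far-field angle in degrees from broadside.

    Assumes a plane wave and a two-microphone pair; the sine is clamped
    to [-1, 1] so that noisy delays cannot break asin().
    """
    sine = (lag / fs_hz) * speed_of_sound / mic_distance_m
    sine = max(-1.0, min(1.0, sine))
    return math.degrees(math.asin(sine))


# Synthetic test signal: the second microphone hears the same short
# pulse three samples later than the first one.
x = [0.0] * 50
x[10], x[11] = 1.0, 0.5
y = [0.0] * 3 + x[:-3]

lag = estimate_delay_samples(x, y, max_lag=5)
angle = doa_from_delay(lag, fs_hz=16000, mic_distance_m=0.1)
print(lag, round(angle, 1))
```

With these assumed values, a three-sample delay corresponds to an angle of about 40 degrees from broadside, towards the microphone that receives the sound first.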

The majority of current state-of-the-art models for sound event detection and classification are trained in a supervised manner using deep learning. In sound event classification, the system returns one or multiple classes of sound that are present in some analysis time window. In sound event detection, the system detects the start and end times of sound events as well as their classes. The main steps in building a sound detection and/or classification system are data collection, model training, and model evaluation. Since the deep learning model is trained in a supervised manner, which requires labeled data, the data collection process is often the most time-consuming step. To achieve good accuracy, the training data should closely match the conditions in which the sensors are deployed. Both convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have been successfully used in sound event detection and classification systems. Evaluations on some public datasets show that deep learning models can be trained to recognize 50 classes of sound, and it is feasible to train on more classes with a larger dataset. Generally, the larger the model, the larger the number of sound classes it can handle, at the cost of more memory and computation time. With a proper dataset and model architecture, the system can run in real time with adequate accuracy, even on small computers such as the Raspberry Pi. A picture of our sound sensing nodes with wireless connectivity to the central server for visualization is shown in the figure below.

Figure 1: Sound sensor nodes with wireless connectivity to a central server.
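The difference between classification and detection described above can be illustrated with a small post-processing step. A detection system often produces one probability per short frame for each class, and these have to be turned into events with start and end times. The sketch below shows one simple way to do this by thresholding. The function name, the threshold, the frame hop, and the probabilities are all hypothetical; a trained CNN or RNN would supply the real frame probabilities.

```python
def frames_to_events(probs, label, threshold=0.5, hop_s=0.02, min_frames=1):
    """Turn per-frame probabilities for one class into a list of
    (label, start_time_s, end_time_s) events.

    Consecutive frames at or above `threshold` are merged into one event;
    events shorter than `min_frames` frames are discarded.
    """
    events = []
    start = None
    for i, p in enumerate(probs):
        active = p >= threshold
        if active and start is None:
            start = i  # event onset
        elif not active and start is not None:
            if i - start >= min_frames:
                events.append((label, start * hop_s, i * hop_s))
            start = None  # event offset
    # Close an event that is still active at the end of the clip.
    if start is not None and len(probs) - start >= min_frames:
        events.append((label, start * hop_s, len(probs) * hop_s))
    return events


# Made-up frame probabilities for a hypothetical "car_horn" class.
probs = [0.1, 0.8, 0.9, 0.7, 0.2, 0.1, 0.6, 0.1]
events = frames_to_events(probs, "car_horn", hop_s=0.5)
print(events)
```

Raising `min_frames` removes very short detections, which is one simple way to trade missed events against false alarms.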


With the advent of highly integrated semiconductor technology and the rapid global adoption of wireless technologies such as Wi-Fi, IoT devices have become a common sight in everyday life. The economies of scale from this global adoption have brought the cost of a single integrated MCU with built-in Wi-Fi capability to below US$3 for low-volume purchases, which has spawned huge growth in IoT-capable devices in the consumer market. New ideas and application areas keep being developed into IoT products. Adding connectivity to many existing sensors and products has brought the convenience of knowing sensor data and environmental conditions remotely in real time, with almost no perceivable delay. This new breadth of data lets companies move from predictive to prescriptive actions, as decision making is based on a wider spectrum of data. By integrating IoT connectivity into the sound classifier, inference results can be reported back to the cloud, which serves as a centralized data access point, with ease and at relatively low cost. The choice of connectivity depends on the target deployment location. Indoors, the choice is Wi-Fi because a local Wi-Fi network is usually readily available; outdoors, the choice, for now, is a 3G GSM modem. For smart city implementations, deployment is expected to be in locations where GSM signal coverage is sufficient.
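To give a sense of how little data a node needs to report, the sketch below packs one inference result into a compact JSON message. It only builds the message; the transport (Wi-Fi or a GSM modem) and the server interface are left out. The function name and all field names are hypothetical and do not describe the format used in our deployment.

```python
import json


def build_report(node_id, timestamp_s, spl_db, events, doa_deg=None):
    """Serialize one node report as compact JSON.

    `events` is a list of (label, start_s, end_s) tuples. Only extracted
    metadata is included; no raw audio leaves the node.
    """
    report = {
        "node_id": node_id,
        "timestamp_s": timestamp_s,
        "spl_db": round(spl_db, 1),
        "events": [
            {"label": label, "start_s": start_s, "end_s": end_s}
            for label, start_s, end_s in events
        ],
    }
    if doa_deg is not None:
        report["doa_deg"] = round(doa_deg, 1)
    return json.dumps(report, separators=(",", ":"), sort_keys=True)


payload = build_report(
    "node-01", 1526371200, 62.46, [("car_horn", 0.5, 2.0)], doa_deg=40.0
)
print(payload)
print(len(payload.encode("utf-8")), "bytes")
```

For comparison, one second of raw 16-bit mono audio at an assumed 16 kHz sampling rate takes 32,000 bytes, so a message like this is far smaller than the audio it summarizes.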


The development and deployment of sound sensor networks face several challenges. In terms of technology, sound events are diverse and often overlap. Audio signal processing algorithms must account for constantly changing background noise, reverberation, and overlapping events to improve accuracy and lower the false alarm rate. These problems are still ongoing research topics. To the best of our knowledge, there is no algorithm at this moment that can estimate the direction or location of multiple sound sources and assign a sound class to each direction in real time using embedded processors. In terms of security and privacy, like many other IoT applications, a sound sensor network is vulnerable to data breaches and cyber attacks. To protect the privacy of citizens, it is best to process all the raw sound signals at the sensor nodes and send only the outputs to the central server. In addition, the sensor nodes need to be secured against hackers. According to Intel, IoT technologies play a key role in building smart cities. Therefore, if we can tackle these challenges, sound sensor networks, with their many useful applications, are promising for improving public safety and quality of life.



Thi Ngoc Tho Nguyen is a Ph.D. student at the Nanyang Technological University (NTU) in Singapore. Prior to joining NTU, she worked at University of Illinois Research Center in Singapore for five years as a research engineer. Her research interests are machine learning, audio signal processing, array signal processing, sound source localization, and real-time processing. She can be contacted at




Furi Andi Karnapi is currently a senior research engineer in the Smart Nation Translational Lab at the Nanyang Technological University (NTU) in Singapore. He received his M.Eng in Information Engineering from the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, in 2003. Prior to joining NTU, he worked in sales, marketing, and application engineering roles, mainly in the semiconductor and electronics manufacturing sector. His current research interests are embedded design, IoT applications, audio signal processing, machine learning, and real-time processing. He can be contacted at



Woon-Seng Gan is a Professor in the School of Electrical and Electronic Engineering at the Nanyang Technological University (NTU) in Singapore. He is currently leading the Singapore Smart Nation Translational Laboratory activities at NTU and is responsible for several IoT projects in deploying sensors in indoor and outdoor infrastructure. His research has been concerned with the connections between the physical world, signal processing, and sound control, which resulted in the practical demonstration and licensing of spatial audio algorithms, directional sound beam, and active noise control for headphones. He has published 3 books and more than 300 international refereed journal and conference papers and has translated his research into 6 granted patents. He can be contacted at