Generating Data Traces for Intrusion Detection in Low-power IoT Mesh Networks

Niclas Finne, JeongGil Ko, and Thiemo Voigt

January 14, 2022

Given their remote deployment and distributed operational characteristics, low-power IoT mesh networks are prone to various attacks. Furthermore, since many of these networks carry user-sensitive data, such attacks must be detected early, before they compromise user privacy. Machine learning algorithms hold the potential to effectively model and capture various attack scenarios.

Several previous works have aimed to design intelligent software components that learn and exploit various network attack patterns. Unfortunately, these studies are limited in their support for a wide range of network topologies, scenarios, and attacks. This shortcoming is mostly caused by a lack of the data traces required to train attack-detecting machine learning algorithms. To this end, we have devised Multi-Trace, an extension to the widely used Cooja simulator. Multi-Trace enables the efficient generation of emulated attack traces that serve as training data for machine learning models designed as intrusion detection mechanisms in IoT mesh networks.

Low-power IoT mesh networks form the foundation of many IoT systems that promise applications in domains of high industrial and societal relevance, such as healthcare, agriculture, aviation and aerospace, civil infrastructure monitoring, and process control in industrial settings. However, these networks also introduce new attack vectors, mainly due to their constraints in memory, computing power, and energy, and the lossy nature of wireless communication [1,2]. These new attack vectors include, for example, denial-of-service attacks through jamming, which aim to degrade network performance and deplete devices' batteries by causing additional packet loss and delays. Intrusion detection systems are designed to detect attackers that have managed to intrude into a network. However, the resource constraints of low-power embedded platforms make it challenging to design efficient intrusion detection systems [3]. Moreover, while intrusion detection systems typically exploit various machine learning techniques, their training requires large amounts of data to be effective across various attack scenarios.

Challenges

Low-power IoT mesh networks have distinct characteristics that set them apart from other wireless network architectures, such as WiFi networks. While research in other domains has produced a noticeable amount of network data traces (e.g., the NSL-KDD dataset [8]), such data is not yet available for IoT mesh networks. Instead, many studies employ self-created datasets, which are often too small to validate the proposed scheme effectively and insufficient for designing attack detection schemes for broader scenarios. Note that IoT mesh networks have a unique architecture in which a gateway node is often designed to overhear packets from only a subset of the individual nodes operating in its subnet (see Fig. 1, where the gateway is depicted as the green node and the green and gray regions represent the communication range and interference range of a sensor node, respectively). This sets them apart from conventional network architectures, where a single node can overhear all packet exchanges, and it introduces a wide variety of potential network topologies. This diversity complicates the generation of generally applicable attack traces for machine learning model training.

Figure 1: In IoT mesh networks, no entity can overhear all packets, challenging data collection.

Multi-level Trace Generation

We modify the Cooja simulator [4] to support efficient trace generation. The Cooja simulator was originally designed to simulate networks of resource-constrained embedded devices running the Contiki operating system, but it can also run binaries of other operating systems. With Multi-Trace, we extract four different types of data from Cooja simulations. First, application developers can print data specific to their application using standard Contiki log messages. At this level, users have full freedom to customize their log messages. Second, Cooja has a radio logger plugin that logs all data traffic in the network. This data is available in pcap format, a format originally used by the pcap API of tcpdump. Third, we enable radio transmission logging at the radio medium level. While the radio logger plugin logs all in-air messages, it does not include information about which nodes received a particular radio transmission. Specifically, in multihop networks it is difficult to derive which nodes received an omnidirectional broadcast transmission using only the radio logger. Fourth, there is an event log for events that occur during the simulation. For example, a simulation can log an event when the network has reached a steady state, which makes it easier to ignore the startup phase of a simulation, or when an attack is started or stopped, which indicates when an attack is active. Note that while we offer opportunities for logging at different levels, all logs share a global simulation time, which facilitates the fusion of information from heterogeneous logs.
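To illustrate why a shared global simulation time simplifies log fusion, the following Python sketch merges several time-sorted logs into one stream and tags each record with whether an attack was active. It is not part of Multi-Trace: the `Record` layout, the microsecond time unit, and the event names `attack-start` and `attack-stop` are assumptions made for illustration only.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple, Tuple


class Record(NamedTuple):
    time_us: int   # global simulation time (unit assumed to be microseconds)
    source: str    # hypothetical log name: "app", "radio", "medium", or "event"
    payload: str   # raw log content


def merge_logs(*logs: Iterable[Record]) -> Iterator[Record]:
    """Merge logs that are each sorted by time into one time-ordered stream."""
    return heapq.merge(*logs, key=lambda rec: rec.time_us)


def label_attack(records: Iterable[Record]) -> Iterator[Tuple[Record, bool]]:
    """Tag every record with whether an attack was active when it was logged.

    The event names below are hypothetical placeholders for event log entries.
    """
    active = False
    for rec in records:
        if rec.source == "event" and rec.payload == "attack-start":
            active = True
        elif rec.source == "event" and rec.payload == "attack-stop":
            active = False
        yield rec, active


if __name__ == "__main__":
    app_log = [Record(1000, "app", "sent 1"), Record(5000, "app", "sent 2")]
    radio_log = [Record(1200, "radio", "tx node 3"), Record(4000, "radio", "tx node 2")]
    event_log = [Record(3000, "event", "attack-start"), Record(6000, "event", "attack-stop")]
    for rec, active in label_attack(merge_logs(app_log, radio_log, event_log)):
        print(rec.time_us, rec.source, rec.payload, "attack" if active else "normal")
```

Because every log carries the same clock, a single ordered merge is enough; no per-log clock alignment is needed before the records are labeled.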

Scenario-Setup and Attack Generation

Many machine learning algorithms require a significant amount of data traces. We hence need to perform many simulation runs, simulating different scenarios even for a single attack. The scenarios should differ in many aspects, including the number of nodes, the physical or logical distribution of the nodes, the traffic pattern and load, the node that executes the attack, and the application running on the nodes. Therefore, our trace generation facility includes a Python script that, based on an existing simulation, generates additional simulations with other properties, such as new randomized topologies. We also provide a script that runs Cooja on a set of simulations to generate a data trace for each one. This script sets up each simulation in Cooja and executes it.
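As a rough illustration of randomized topology generation, the sketch below places nodes uniformly at random in a square area. It is not the Multi-Trace script: the function name `random_topology`, its parameters, the root node at the center, and the rule that every new node must be within transmission range of an already placed node (so that the network stays connected) are all assumptions made for this example.

```python
import math
import random
from typing import List, Tuple

Position = Tuple[float, float]


def random_topology(num_nodes: int, area_m: float, tx_range_m: float,
                    seed: int, max_tries: int = 1000) -> List[Position]:
    """Return node positions for one randomized, connected topology.

    Node 0 is placed at the center of the area (an assumed root position).
    Each further node is re-drawn until it lies within tx_range_m of some
    earlier node, which keeps the resulting network connected.
    """
    rng = random.Random(seed)  # a fixed seed makes the topology reproducible
    positions: List[Position] = [(area_m / 2, area_m / 2)]
    while len(positions) < num_nodes:
        for _ in range(max_tries):
            cand = (rng.uniform(0, area_m), rng.uniform(0, area_m))
            if any(math.dist(cand, p) <= tx_range_m for p in positions):
                positions.append(cand)
                break
        else:
            raise RuntimeError("could not place a node within range")
    return positions


if __name__ == "__main__":
    # One topology per seed: varying the seed yields a family of scenarios.
    for seed in range(3):
        topo = random_topology(num_nodes=10, area_m=100.0, tx_range_m=30.0, seed=seed)
        print(seed, [(round(x, 1), round(y, 1)) for x, y in topo])
```

Deriving each scenario from an explicit seed means any generated simulation can be recreated later, which helps when a trained model behaves unexpectedly on one particular trace.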

When generating data traces containing attacks, we typically do not start the attack before the network has stabilized. While there are multiple approaches to orchestrating attacks, we have decided to implement the attack as a module linked with all nodes. In addition, we have added a command-line interface (CLI) to the nodes. The CLI lets the simulation run commands on the nodes via each node's serial port. Hence, the simulation sends an "attack" command over the serial port to start or stop an attack. This way, the same node firmware can also be used on real hardware to perform the same experiments in a testbed. Finally, the attacks themselves are typically implemented in the simulated nodes. For example, to support routing attacks, the Contiki-NG operating system provides a mechanism to add a hook into the network stack that is called whenever an IP packet is sent or received. This hook can be used to inspect, modify, and discard IP packets, as some routing attacks do.
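The interplay between the CLI command and the packet hook can be modeled in a few lines of Python. This is only a behavioral sketch, not Contiki-NG code: the class `BlackholeNode`, the command syntax `attack start` / `attack stop`, and the hook name `on_ip_output` are hypothetical, and the blackhole behavior shown (dropping packets that the node should forward while still sending its own) is one simple variant of such an attack.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    ACCEPT = "accept"
    DROP = "drop"


@dataclass
class Packet:
    src: int  # ID of the node that originated the packet
    dst: int  # ID of the final destination


class BlackholeNode:
    """Model of a node whose attack module is toggled through a CLI command."""

    def __init__(self, node_id: int) -> None:
        self.node_id = node_id
        self.attack_active = False

    def handle_command(self, line: str) -> str:
        """Handle one command line as it might arrive over the serial port.

        The exact command syntax is an assumption for this sketch.
        """
        parts = line.strip().split()
        if parts == ["attack", "start"]:
            self.attack_active = True
        elif parts == ["attack", "stop"]:
            self.attack_active = False
        else:
            return "error: unknown command"
        return "ok"

    def on_ip_output(self, pkt: Packet) -> Verdict:
        """Hook called for every outgoing IP packet (hypothetical name)."""
        forwarding = pkt.src != self.node_id  # packet originated elsewhere
        if self.attack_active and forwarding:
            return Verdict.DROP  # silently discard traffic that should be relayed
        return Verdict.ACCEPT


if __name__ == "__main__":
    node = BlackholeNode(node_id=3)
    relayed = Packet(src=5, dst=1)
    print(node.on_ip_output(relayed))           # before the attack starts
    print(node.handle_command("attack start"))
    print(node.on_ip_output(relayed))           # while the attack is active
```

Keeping the attack logic behind a runtime switch mirrors the design described above: the same firmware image behaves normally until the command arrives, so identical binaries can be used in simulation and on a testbed.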

Detecting Blackhole Attacks

We have recently applied Multi-Trace to generate data traces as input for machine learning algorithms that detect blackhole attacks in IoT mesh networks [5]. Using the large quantity of data traces, we showed that with more, and more diverse, training data, the resulting intrusion detection model generalizes better than models trained with less training data. In addition, we showed that such models also generalize well to larger topologies with more IoT devices.

Acknowledgment

This work was supported by the ITEA 3 project STACK funded by the Swedish Innovation Agency VINNOVA and by the Korean Ministry of Trade, Industry and Energy/Korea Institute for Advancement of Technology through the International Cooperative R&D program (Project No. P0016150).

Multi-Trace is available at https://github.com/STACK-ITEA-Project/.

References

[1] A. Arıs, S. F. Oktug, and T. Voigt, "Security of internet of things for a reliable internet of services," Autonomous Control for a Reliable Internet of Services, 2018.
[2] E. Boo, S. Raza, J. Höglund, and J. Ko, "FDTLS: Supporting DTLS-based Combined Storage and Communication Security for IoT Devices," IEEE International Conference on Mobile Ad-hoc and Smart Systems (IEEE MASS), 2019.
[3] I. Butun, S. D. Morgera, and R. Sankar, "A survey of intrusion detection systems in wireless sensor networks," IEEE Communications Surveys & Tutorials, vol. 16, no. 1, pp. 266–282, 2013.
[4] F. Österlind, A. Dunkels, J. Eriksson, N. Finne, and T. Voigt, "Cross-level sensor network simulation with Cooja," in Proceedings of the First IEEE International Workshop on Practical Issues in Building Sensor Network Applications (SenseApp), 2006.
[5] H. Keipour, S. Hazra, N. Finne, and T. Voigt, "Generalizing Supervised Learning for Intrusion Detection in IoT Mesh Networks," First International Conference on Ubiquitous Security (UbiSec), 2021.

Niclas Finne is a senior researcher at RISE Computer Science. His current research interests are in the area of networked embedded systems and the Internet of Things. He is one of the main contributors to developing the Contiki-NG operating system and the network simulator Cooja and the main author of the Multi-Trace tool.

JeongGil Ko is an associate professor in the School of Integrated Technology, College of Engineering at Yonsei University. He received his B.Eng. degree in computer science and engineering from Korea University in 2007 and his Ph.D. degree in Computer Science from Johns Hopkins University in 2012. He is a recipient of the Abel Wolman Fellowship awarded by the Whiting School of Engineering at Johns Hopkins University in 2007 and a senior member of the IEEE since 2017. He has served on the program committee for many top conferences in the mobile and ubiquitous computing field (ACM MobiCom, MobiSys, SenSys, IEEE PerCom in particular). In addition, he is an associate editor for several academic journals, including PACM IMWUT. His research interests are in the general area of developing embedded and mobile computing systems with ambient intelligence.

Thiemo Voigt received the Ph.D. degree from Uppsala University, Sweden, in 2002. He is a Professor of computer science at the Department of Information Technology, Uppsala University. He is also a research leader at RISE Computer Science. His current research focuses on system software for embedded networked devices and the Internet of Things. He is the author or co-author of more than 230 reviewed publications, and his work has been cited more than 17500 times and received awards at multiple conferences.

IEEE Internet of Things