CityVerve: Smarter Transport through Data Analytics

John Davies, Sandra Stincic Clarke, and Irene McAleese
September 18, 2018

 

With the costs of congestion estimated at £1.5bn per annum, air quality issues and the health consequences of inactivity, the city of Greater Manchester has an objective to create a modal shift towards cycling, from 2% of journeys currently, to 10% of journeys taken by bike by 2025. With 30% of car journeys at under 1km distance, there is a significant potential for change if the right cycling environment can be provided.

Data Collection

The pilot involved 180 cyclists. The user group was selected from more than 400 volunteers to provide a representative demographic sample. Each See.Sense bike light used in the pilot has a set of sensors and can collect anonymised data about bike journeys, including location and bike motion in three dimensions (that is, not only forward velocity but also tilt motion and vertical motion). See.Sense’s lights shine both in daylight and at night; they react to moments when a cyclist may be at risk (such as at a junction or intersection) by automatically flashing more strongly and quickly. An associated mobile app also lets cyclists customise their lights and send and receive low-battery, crash and theft alerts.

This data was combined with a range of other city data aggregated on the BT CityVerve Data Hub.

Data Privacy and Participation

As part of the pilot design, the mapping between participants and the smart lights they used was not recorded. Additionally, the data privacy of the participants was maintained by sharing anonymised and aggregated data only. Participants were encouraged to set geo-fenced privacy zones around areas of home or work where they did not wish data to be collected. There has been a high participation rate, with over 75% of cyclists actively collecting data, over 4,000 journeys recorded, over 25,000 km logged and over 385 travel issues recorded. The trial results clearly showed that the data was representative of a broad range of cyclists and was not skewed to a particular segment. This is important for planners who want to encourage more women, children and the elderly to cycle.

Data Analytics

Data aggregated across all participants was analysed to provide useful insight for the city with the goal of informing future planning and investment. Presented here are some key examples of this data, specifically road surface quality information and cyclist directionality analysis.

Figure 1: Road surface quality data.

 

Figure 1 shows road surface quality data automatically generated within Manchester city centre, categorised into three distinct classes: green, a smooth road which is pleasant to cycle on; orange, a road that has a reasonably rough surface and is less pleasant to cycle on but sufficient for low-speed commutes; and red, a road considered to be very rough, with significant cracks and defects, or a cobblestone street. A machine learning classifier is used to calculate road surface quality based on a manually collected training set. This citywide view of the road surface is the first time a crowdsourced dataset has been used in this manner and allows city officials to understand where cyclists experience the roughest surfaces within the city. Action can then be taken to improve the road surface, not just for the comfort of cyclists, but for all road users. Highlighted is a particularly poor stretch of road which is extremely degraded, most likely due to recurring maintenance work (see Figure 2).
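The article does not say which features or which classifier were used, so the following is only a minimal sketch of the general idea: a nearest-centroid classifier trained on manually labelled samples. The single feature (RMS vertical acceleration), the training values and the function names are all assumptions made for illustration.

```python
from statistics import mean

# Hypothetical, manually labelled training set:
# (RMS vertical acceleration in m/s^2, surface class). Values are illustrative only.
TRAINING = [
    (0.4, "green"), (0.6, "green"), (0.5, "green"),
    (1.4, "orange"), (1.6, "orange"), (1.5, "orange"),
    (3.0, "red"), (3.4, "red"), (3.2, "red"),
]


def fit_centroids(samples):
    """Return the mean feature value for each labelled class."""
    by_label = {}
    for value, label in samples:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}


def classify(rms_vertical_accel, centroids):
    """Assign the class whose centroid is closest to the measured value."""
    return min(centroids, key=lambda label: abs(centroids[label] - rms_vertical_accel))


CENTROIDS = fit_centroids(TRAINING)
```

With these illustrative values, a road segment with an RMS vertical acceleration of 0.7 falls in the green class and one with 2.9 falls in the red class. A production classifier would likely use several motion features per segment.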

Figure 2: A detected stretch of road with poor quality.

 

Another element of the data collection trial is the GPS data, which is collected once per second during a cyclist’s journey. This high temporal resolution yields highly accurate information about the exact route of the cyclist as they navigate through the ever-changing urban environment. Figures 3 and 4 illustrate the directionality of the cyclists along two segments of road and can be used to assess the usage of the road by cyclists. Figure 3 is a segment along Oxford Road in Manchester which has undergone major redevelopment in order to create separated bicycle lanes. Yellow lines indicate journeys by cyclists heading north into the city centre and blue lines represent cyclists heading south away from the city centre. There is a clear delineation between the two directions, and it is clear that cyclists can use the lanes in the direction of travel without needing to manoeuvre out of the cycle lane.
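As a rough illustration of how a journey could be labelled as northbound or southbound from its GPS fixes, the sketch below computes the initial great-circle bearing between the first and last fix of a track. The function names, the 90°/270° split and the sample coordinates are assumptions for illustration, not the method used in the pilot.

```python
import math


def initial_bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2, in degrees (0 = north)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    x = math.sin(dlon) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return math.degrees(math.atan2(x, y)) % 360.0


def direction(track):
    """Label a track (list of (lat, lon) fixes) by its overall heading."""
    (lat1, lon1), (lat2, lon2) = track[0], track[-1]
    bearing = initial_bearing_deg(lat1, lon1, lat2, lon2)
    return "northbound" if bearing < 90.0 or bearing > 270.0 else "southbound"


# Hypothetical track endpoints, roughly in central Manchester.
into_city = [(53.4650, -2.2330), (53.4740, -2.2400)]
out_of_city = list(reversed(into_city))
```

Here `direction(into_city)` yields "northbound" and `direction(out_of_city)` yields "southbound". Using only the endpoints is a simplification; per-segment bearings would be needed to colour individual stretches of a route.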

Figure 3: Directionality of the cyclists in Oxford Road.

 

Figure 4 is another segment of the road, less than a mile away, where there are no separated cycle lanes and cyclists travelling in both directions routinely move from one side of the road to the other while traversing this area. This behaviour not only slows the cyclist down but can affect their comfort and safety. An average speed of 19.3 km/h for the journeys portrayed in Figure 3, compared with 12.7 km/h for Figure 4, illustrates this point and shows how the improved cycle infrastructure enhances a cyclist’s journey.
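An average speed of this kind can be derived from once-per-second GPS fixes by summing the distances between consecutive fixes and dividing by the elapsed time. The sketch below assumes a spherical Earth (haversine formula) and evenly spaced fixes; the function names and the synthetic track are illustrative assumptions.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation


def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(p[0]), math.radians(q[0])
    dphi = phi2 - phi1
    dlmb = math.radians(q[1] - p[1])
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def average_speed_kmh(fixes, interval_s=1.0):
    """Average speed over a track of fixes sampled every interval_s seconds."""
    if len(fixes) < 2:
        raise ValueError("need at least two GPS fixes")
    distance_m = sum(haversine_m(p, q) for p, q in zip(fixes, fixes[1:]))
    elapsed_s = interval_s * (len(fixes) - 1)
    return distance_m / elapsed_s * 3.6


# Synthetic track: a rider heading due north at 5 m/s for 10 seconds.
METRES_PER_DEGREE_LAT = math.pi / 180 * EARTH_RADIUS_M
step_deg = 5.0 / METRES_PER_DEGREE_LAT
track = [(53.47 + i * step_deg, -2.23) for i in range(11)]
```

For the synthetic track, `average_speed_kmh(track)` comes out at approximately 18 km/h, as expected for 5 m/s.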

Figure 4: Directionality of the cyclists near St Peters Square.

 

Near Real Time Insight

Data collected during the pilot has also been combined with other relevant data sets on the BT CityVerve Data Hub, in particular with information about cycling infrastructure, cycle use, and other traffic and environmental data. This data has been pulled into a near real time visualisation and analysis system to demonstrate the potential this kind of analysis can offer city planners (Figure 5).

Figure 5: Cycle Journeys with Infrastructure used and Speed vs Traffic Lights.

 

Conclusions

Insights gathered in this pilot derive from the analysis and visualisation of multiple data sources, and they have been crucial for the engagement of stakeholders, both in the city and among cyclists. Actions based on the insights gained will be implemented at a later stage, when the findings can be translated into policy and concrete change. It will be important for the city to provide a feedback loop that demonstrates a response to the data and insights contributed by the cycling community, in order to maintain that community's trust and ongoing commitment.

John Davies is Chief Researcher in BT’s Research & Innovation department, where he leads a team focussed on Internet of Things technologies. He has a strong track record of researching and innovating and his current research interests include the application of Internet of Things and semantic technologies to smart cities, smart transport, business intelligence and information integration. He currently leads BT’s contribution to the Manchester-based CityVerve IoT smart city programme and he co-wrote the Hypercat IoT standard. John has authored over 80 scientific publications and is the inventor of several patents. He is a Fellow of the British Computer Society and a Chartered Engineer. He is a visiting professor at the Open University and holds a PhD in Artificial Intelligence from University of Essex, UK.

 

Sandra Stincic-Clarke is Principal Researcher in British Telecommunications’ Future Business Technology, where her work is focussed mainly on the Internet of Things space. Her research interests include IoT, Distributed Systems and Web Services and their industrial application. Sandra provides thought leadership on IoT technologies to BT and has been involved in a number of large collaborative research programmes with national, EU and international funding, most recently managing BT’s contribution to the CityVerve project. She has won several industry awards for her work on IoT and smart cities and has a number of publications in the aforementioned areas.

 

Irene McAleese is Co-founder and Chief Strategy Officer of cycling technology and data company, See.Sense. Irene leads the company's developing Smart Cities practice and brings years of consulting experience to the role, having led change and workforce transformation initiatives in transport, telecommunications, and resource industries around the world. Irene is passionate about harnessing technology and data to create safer and smarter cities, and is the Winner of the NI Women in Business Award for Best Small Business.

 

 

Avoid Another IoT Tower of Babel!

Lindsay Frost, Duncan Bees, and Martin Bauer
September 18, 2018

 

Every reader of this newsletter knows that the world is facing ever-higher pressure to use technologies such as Internet of Things (IoT) and Artificial Intelligence (AI) to help solve problems due to global trends like increased urbanization, failing road and building infrastructures, greatly increased care for the elderly, etc. etc. Everyone also knows the story of an older infrastructure project that did not go well: the Tower of Babel [1]. There, an attempt to reach heaven by building a single tower, motivated by pride, was punished by the people being condemned to speak in mutually incomprehensible languages and being dispersed around the world.

IoT today is like that post-Babel world, where each disparate group started again to build its own vertical tower, each solution striving to be the first or the best, with data collected in a plethora of proprietary formats, and with huge variations in how the meaning of the data is defined and exchanged. Simply transporting the data using Internet Protocol channels over 3G, 5G, broadband WLAN or fibre is not the problem. The problem is giving or selling the data to the recipient, if it is unclear what the information means, how it was collected, when, where, with what quality, under what license, and so on and on.

Figure 1: Tower of Babel, Pieter Bruegel the Elder, 1563 [2].

 

The Linked Data approach for sharing data on the web, famous for the Berners-Lee “Five Stars” description [3] of best practice (i.e. open license, machine-readable, non-proprietary format, open standards for identification and definition, linked to other widely accepted definitions) does help, but has not yet been sufficiently widely adopted in IoT.

There are, however, many initiatives to re-use and link to appropriate definitions, wherever a community of users sees an urgent need. For example, www.schema.org is a collaborative approach for metadata in webpages, supported by internet search-engine companies. Another example is the Open Metadata Registry [4], which provides a means to register and discover machine-readable vocabularies: currently about 460. The Basel University Library helpfully lists about 2800 other sets of definitions [5], called ontologies. In the European Union, with so many borders and languages for information to cross, the EU parliament in 2007 stepped in to require national governments to at least use a common set of geo-spatial and environmental data definitions: that INSPIRE Directive [6] has triggered many activities.

But in a post-Babel world, insisting that all partners use a common vocabulary, as attractive and efficient as that would be, is slow to succeed. Another approach is to provide a way for each partner to point to (link to) the definitions, licensing and so on used in their data. This makes the metadata details and assumptions, implicit within the original “vertical silo” of the collecting organisation, accessible to third parties and therefore makes reliable sharing of the information possible. This greatly simplifies integration work in, for example, Smart City projects. It would also enormously reduce time wasted “cleaning up” raw data for the “data lakes” currently extolled as training grounds for AI recommendation systems.

This ‘link to definitions’ approach is taken by the ETSI Industry Specification Group for cross-cutting Context Information Management, called for short ETSI ISG CIM [7]. An ISG is a special working group of ETSI that is set up to allow participation by non-ETSI members, but with IPR and transparency of operations handled according to usual ETSI rules.

ETSI ISG CIM sees a need for an API (called by the group NGSI-LD) to exchange the definitions/metadata/context, not only for IoT information, but also for Linked Data from numerous municipal and national repositories, for diverse input from mobile devices, plus handling of requests from users (or their Apps) to access the data and for software to check the provenance (source, licensing, “freshness” etc.) of information. In real world systems, the tracking of usage of data is also important, not only for commercial billing or government usage-based budgeting, but also for exploring patterns of usage, planning of improvements and checking of consistency. Figure 2 shows the six kinds of information currently considered and indicates the role of NGSI-LD in enabling context information discovery and exchange.

Figure 2: ETSI ISG CIM view of integration of six kinds of information.

 

The use cases considered by ETSI ISG CIM (Smart City, Smart Agriculture and other systems), plus best practice in semantic web applications, require the NGSI-LD API to be agnostic to the kind of data architecture used (centralized, distributed, federated) and to enable migration from one architecture to another. This has in turn implied being extremely flexible about the kind of “object identifier” used in the system, so Uniform Resource Identifiers (URIs) [8] are used. These need to be unique text strings, but do not require special hierarchical naming (familiar from URLs) and also do not mandate that the resource must be reachable via the public internet, although of course that is always an option.

There was also a strong desire to leverage expertise in the semantic web and the developer communities by, firstly, basing the exchange protocol on JSON-LD [9] and, secondly, choosing a formatting which is also “human friendly”. A preliminary version of the specification was published for comment in March 2018 [10]. A simple example is shown in Figure 3, showing that a person ‘Bob’ has observed at a certain time that a certain vehicle, of brand ‘Mercedes’, is parked at a certain location ‘Downtown1’. The actual definitions of all those terms are to be found in re-usable form at the locations referenced by the JSON-LD term ‘@context’.

 

{
  "id": "urn:ngsi-ld:Vehicle:A4567",
  "type": "Vehicle",
  "brandName": {
    "type": "Property",
    "value": "Mercedes"
  },
  "isParked": {
    "type": "Relationship",
    "object": "urn:ngsi-ld:OffStreetParking:Downtown1",
    "observedAt": "2017-07-29T12:00:04",
    "providedBy": {
      "type": "Relationship",
      "object": "urn:ngsi-ld:Person:Bob"
    }
  },
  "@context": [
    "http://uri.etsi.org/ngsi-ld/coreContext.jsonld",
    "http://example.org/cim/commonTerms.jsonld",
    "http://example.org/cim/vehicle.jsonld",
    "http://example.org/cim/parking.jsonld"
  ]
}

Figure 3: Example of a NGSI-LD statement.
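To show how a consumer might read such a statement, the following minimal sketch parses the entity from Figure 3 with Python's standard json module and separates its Property and Relationship attributes. It treats the payload as plain JSON and does not resolve ‘@context’; the helper name split_attributes is an assumption for illustration and is not part of the NGSI-LD API.

```python
import json

# The NGSI-LD statement from Figure 3, reproduced as a JSON string.
ENTITY_JSON = """
{
  "id": "urn:ngsi-ld:Vehicle:A4567",
  "type": "Vehicle",
  "brandName": {"type": "Property", "value": "Mercedes"},
  "isParked": {
    "type": "Relationship",
    "object": "urn:ngsi-ld:OffStreetParking:Downtown1",
    "observedAt": "2017-07-29T12:00:04",
    "providedBy": {"type": "Relationship", "object": "urn:ngsi-ld:Person:Bob"}
  },
  "@context": [
    "http://uri.etsi.org/ngsi-ld/coreContext.jsonld",
    "http://example.org/cim/commonTerms.jsonld",
    "http://example.org/cim/vehicle.jsonld",
    "http://example.org/cim/parking.jsonld"
  ]
}
"""


def split_attributes(entity):
    """Separate top-level Property values from Relationship targets (hypothetical helper)."""
    properties, relationships = {}, {}
    for name, attribute in entity.items():
        if not isinstance(attribute, dict):
            continue  # skips "id", "type" and the "@context" list
        if attribute.get("type") == "Property":
            properties[name] = attribute["value"]
        elif attribute.get("type") == "Relationship":
            relationships[name] = attribute["object"]
    return properties, relationships


entity = json.loads(ENTITY_JSON)
properties, relationships = split_attributes(entity)
observer = entity["isParked"]["providedBy"]["object"]
```

After running this, `properties` holds the brand name, `relationships` maps isParked to the parking URN, and `observer` is the URN for Bob. A full JSON-LD processor would additionally expand each term against the referenced context documents.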


The next steps for the ETSI ISG CIM work are to enhance the specification’s query capabilities, security, and other aspects, with publication as a document GS CIM 009 Version 1.1.0 expected by December 2018. An ‘NGSI-LD Primer’, introducing usage of the specification, will also be published. Additional feedback from the IoT community is very welcome (e.g. using NGSI-LD@etsi.org). The next public presentations will occur at the ETSI IoT Week conference in October [11].

References

[1] See https://en.wikipedia.org/wiki/Tower_of_Babel
[2] Pieter Bruegel the Elder. “The Tower of Babel”, 1563, Oil on wood panel, 1.14 m (44.9 in) by 1.55 m (61 in). Public domain image, via Wikimedia Commons, Accessed 26th August 2017 at https://commons.wikimedia.org/wiki/File:Pieter_Bruegel_the_Elder_-_The_Tower_of_Babel_(Vienna)_-_Google_Art_Project_-_edited.jpg
[3] Berners-Lee, Tim, ‘Linked Data’, Published 27th July 2006. Accessed 26th August 2018 at https://www.w3.org/DesignIssues/LinkedData.html
[4] Open Metadata Registry, see http://metadataregistry.org/about.html . Accessed 26th August 2018.
[5] Basel Register of Thesauri, Ontologies & Classifications (BARTOC) is a database of Knowledge Organization Systems and KOS related Registries, developed by the Basel University Library, Switzerland. See http://bartoc.org/en
[6] Directive 2007/2/EC of the European Parliament and of the Council of 14 March 2007 establishing an Infrastructure for Spatial Information in the European Community (INSPIRE). Accessed 26th August 2018 at  https://eur-lex.europa.eu/legal-content/en/TXT/?uri=CELEX:32007L0002
[7] European Telecommunications Standards Institute (ETSI) Industry Specification Group (ISG) for cross-cutting Context Information Management (CIM). See https://portal.etsi.org/CIM
[8] Berners-Lee, Tim, R. Fielding, L. Masinter. ‘Uniform Resource Identifier (URI): Generic Syntax’. Published January 2005. Accessed 27th August 2018 at  https://tools.ietf.org/html/rfc3986. See also https://tools.ietf.org/html/rfc7320
[9] JSON-LD specification is currently in version 1.0, published in January 2014 at https://www.w3.org/TR/json-ld/  Note that improvements are being considered in the W3C JSON-LD Working Group https://www.w3.org/2018/json-ld-wg/ 
[10] ETSI ISG CIM ‘Context Information Management Application Programming Interface (API)’. Published March 2018 at https://docbox.etsi.org/ISG/CIM/Open/ISG_CIM_NGSI-LD_API_Draft_for_public_review.pdf
[11] ETSI IoT Week is an evolution of the annual M2M/IoT workshops, for anyone involved in standards-enabled IoT technologies and deployments. See https://www.etsi.org/etsi-iot-week-2018

 


 

Lindsay Frost is Chief Standardization Engineer at NEC Laboratories Europe. He was elected chairman of ETSI ISG CIM in February 2017, elected to the Board of ETSI in November 2017 and is ETSI delegate to the sub-committee of the EC Multi-Stakeholder Platform (Digitizing European Industry) and to the CEN-CENELEC-ETSI Sector Forum on Smart and Sustainable Cities and Communities. He began his career as a research manager in experimental physics facilities in Australia, Germany and Italy, before joining NEC in 1999 where he has managed R&D teams for 3GPP, WiMAX, fixed-mobile convergence and WLAN. Contact him at Lindsay.Frost@neclab.eu

 

Duncan Bees carries out technical and business projects in telecommunications, IoT, media streaming, and broadband infrastructure in Vancouver, Canada. He has led wireless baseband signal processing development teams, product planning for communications semiconductors, strategic planning for the Digital Living Network Association, and was Chief Technology and Business Officer of the Home Gateway Initiative (HGI). He holds the degrees of Master of Electrical Engineering (digital signal processing) from McGill University, and a Bachelor of Applied Science from the University of British Columbia. Contact him at duncan@duncanbeestechnologies.com

 

Martin Bauer is Senior Researcher at NEC Laboratories Europe. He has been working on activities related to Internet of Things, Context Management and Semantics for more than 15 years. Recently he has been working on IoT-related European projects in the area of smart city and autonomous driving, as well as IoT-related standardization activities, in particular oneM2M and ETSI ISG CIM. He is the AIOTI WG03 sub-group chair for the semantic interoperability topic. He holds a doctorate degree in Computer Science from Stuttgart University and Master of Science degrees in Computer Science from both the University of Oregon and Stuttgart University. Contact him at Martin.Bauer@neclab.eu

 

 

How the IoT Can Improve Customer Experience, Profits and More in Automotive Retail

John Shah, Petr Gotthard, and Tomáš Jankech
September 18, 2018

 

Businesses see massive opportunities in the IoT (Internet of Things) but are also aware of significant challenges. Quantifying the Return on Investment (ROI) and finding a clear use case have been identified [1] as the most immediate challenges for IoT professionals. In this article, we show how the IoT can be used in a concrete and viable business scenario to improve customer experience, in an example based on an automotive retailer.

The automotive industry faces many challenges to increase turnover, revenues and profitability. For retailers (i.e. the car dealerships), these challenges will be met by increasing the number of cars and the related services that they sell (routine maintenance, repair services).  However, sales are influenced by the customer experience (CX) and while a good one may help dealerships sell more cars, a bad experience will certainly not.  This is important as, until recently, typically customers visited a dealership four times before making a purchase.  Now, over half of the cars are sold on the first visit [2][3] and 67% within two visits [4]; the CX on the first visit clearly matters.

A frequent CX problem for dealerships with over 400 cars per site is locating the ones that the customers want to see or test-drive.  For various reasons, cars are often parked in ‘the wrong place’ (for example, after previous test-drives).  The longer before they are found, the worse the customer experience becomes.  This results in ‘lost sales’ and the more sales affected per site, the greater the financial loss.

Figure 1: Estimated Loss of Revenue and Profit, Per 50-Week Year, at Different Sale Conversion Rates.


 

Figure 1 quantifies the loss of revenue and profit, per 50-week year, for a used car supermarket.  The site has 30 salespeople.  On average, every salesperson will be unable to find a car, without a lengthy delay, four times a week, potentially resulting in a ‘lost sale’.  This is 120 potential ‘lost sale’ events per week (30 x 4 = 120), or 6,000 per 50-week year (120 x 50).  The average sale price is £10,000, incorporating a 5.6% profit margin. However, not all events are true ‘lost sales’, because not all customer visits result in a car being sold.  A more accurate estimate of the number of ‘lost sales’ comes from the customer visit-to-sale conversion rates cited by dealers themselves.  These are that over half of the cars are bought on the first visit [2][3] (indicated by the values of 50% and 60%, with the ‘true’ figure expected to lie somewhere in this range), and 67% within two visits [4]. The figure of 10% gives a more conservative estimated loss, reflecting that even if many ‘lost sales’ are recovered throughout the year, losses are still significant.
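The arithmetic behind these estimates can be reproduced with a few lines of code. The sketch below uses only the numbers quoted above and assumes that profit is 5.6% of the sale price and that the conversion rate is applied directly to the 6,000 annual events; the values plotted in Figure 1 are not given in the text, so the results are a reconstruction under those assumptions. Integer arithmetic is used to avoid rounding noise.

```python
SALESPEOPLE = 30
EVENTS_PER_SALESPERSON_PER_WEEK = 4
WEEKS_PER_YEAR = 50
AVERAGE_SALE_PRICE_GBP = 10_000
PROFIT_MARGIN_PER_MILLE = 56  # 5.6% expressed in tenths of a percent


def estimated_annual_loss(conversion_rate_percent):
    """Return (lost sales, lost revenue in GBP, lost profit in GBP) per 50-week year."""
    events = SALESPEOPLE * EVENTS_PER_SALESPERSON_PER_WEEK * WEEKS_PER_YEAR
    lost_sales = events * conversion_rate_percent // 100
    lost_revenue = lost_sales * AVERAGE_SALE_PRICE_GBP
    lost_profit = lost_revenue * PROFIT_MARGIN_PER_MILLE // 1000
    return lost_sales, lost_revenue, lost_profit


for rate in (10, 50, 60, 67):
    sales, revenue, profit = estimated_annual_loss(rate)
    print(f"{rate}%: {sales} lost sales, £{revenue:,} revenue, £{profit:,} profit")
```

Under these assumptions, the conservative 10% rate corresponds to 600 lost sales, £6,000,000 of revenue and £336,000 of profit per year, while the 50% rate corresponds to 3,000 lost sales, £30,000,000 of revenue and £1,680,000 of profit.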

Lengthy delays also result in lower customer review ratings, lower net promoter scores (NPS) and damage dealers’ reputations.  In turn, when reputational damage causes dealerships to sell fewer cars, manufacturers make lower returns as a result.

Konica Minolta Laboratory Europe is developing an IoT-based solution, named The Shepherd, to help overcome these challenges. It locates vehicles, both indoors and outdoors, accurately enough to bring the searcher within range of an electronic car key/fob. Just as shepherds need to know where all of their flock are, car retailers need to know where all of their vehicles are. This allows them to be found, test-driven and then sold, quickly.

The Shepherd locates the vehicles without extensive power or infrastructure requirements, meaning no expensive set-up, service, maintenance or running costs.

  • Each vehicle is fitted with a small, battery-powered device (see Figure 2), which hangs from the rear-view mirror and uses radio signals to measure its distance to similar devices in the surrounding vehicles.
  • The relative distance measurements from the digital ‘flock’ are wirelessly transmitted via LoRaWAN to a central server that calculates an absolute position of each vehicle.
Figure 2: Tracking device (approx. 2cm x 3cm x 4cm).


 

One LoRa Gateway can cover a parking area of 2 km radius. Provided each of the tracked vehicles is within 20 m of at least three neighbouring vehicles, the tracking devices form a mesh network, in which any car in the flock can be located to within 10 – 15 m. The use of sensor network localization provides the lowest deployment and operating costs while preserving sufficient accuracy:

  • Unlike Bluetooth / Bluetooth Low Energy (BLE) it does not require any beacons distributed across the site.
  • Unlike GPS it works under the cover of a roof and in multi-floor parking houses, and even if the cars are covered by snow.
  • Unlike Ultra Wideband (UWB) or Wi-Fi the tag can run on a small battery for as long as 3 years.
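The article does not describe how the central server turns relative distances into absolute positions. As a generic illustration of why at least three neighbours are needed, the sketch below applies textbook 2D trilateration: it assumes three neighbours whose positions are already known and noise-free distances to each. The function name, coordinates and distances are hypothetical; a real mesh would need to solve for many unknown positions at once and cope with measurement error.

```python
import math


def trilaterate(anchors, distances):
    """Solve for (x, y) given three known (x, y) anchors and the distance to each."""
    (x1, y1), (x2, y2), (x3, y3) = anchors
    r1, r2, r3 = distances
    # Subtracting pairs of circle equations gives two linear equations in x and y.
    a = 2 * (x2 - x1)
    b = 2 * (y2 - y1)
    c = r1**2 - r2**2 - x1**2 + x2**2 - y1**2 + y2**2
    d = 2 * (x3 - x2)
    e = 2 * (y3 - y2)
    f = r2**2 - r3**2 - x2**2 + x3**2 - y2**2 + y3**2
    det = a * e - b * d
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear; position is ambiguous")
    return (c * e - b * f) / det, (a * f - c * d) / det


# Hypothetical layout in metres: three located cars and one car to be found at (5, 8).
anchors = [(0.0, 0.0), (20.0, 0.0), (0.0, 20.0)]
distances = [math.sqrt(89.0), 17.0, 13.0]
x, y = trilaterate(anchors, distances)
```

With these inputs the solver recovers a position of approximately (5, 8). With only two neighbours the two circles would intersect in two points, which is one reason a third measurement is required.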

When a vehicle arrives, The Shepherd discovers its position in a flock within five minutes.  This provides near real-time location updates that can be shown on a user’s IT system, or tablet or smart phone, via an app.  The tracking devices are conveniently reused after a vehicle is sold.

By locating any car within the parking area, the dealers can:

  • Improve their CX, customer review ratings, NPS and brand reputation.
  • Reduce the number of ‘lost sales’ caused by lengthy delays, increase inventory turnover and boost sales of higher-margin services, including finance, warranties, insurance, service and maintenance, and valeting.
  • Reclaim employee time and increase the return on capital employed.

Once the location information has been integrated into the dealership’s own or another information management system, augmenting the existing inventory information, The Shepherd will also:

  • Accurately display the movement of cars within or between geographical areas.
  • Automatically register deliveries and transfers on arrival, to save time spent on manual recording and data entry.
  • Provide data analytics on test-drive frequencies, to record consumer trends or optimise stock management.

The Shepherd is currently under development and can only be offered on a trial basis at this time, as either a hosted or managed service. We are interested in getting feedback from potential users to confirm the value and usefulness of our development.

References

[1] Canonical, (2017). Defining IoT Business Models, at https://pages.ubuntu.com/IOT_IoTReport2017.html.
[2] McKinsey & Company, (2014). Innovating automotive retail.  Downloaded from the World Wide Web on April 24th, 2018 at: https://www.mckinsey.com/industries/automotive-and-assembly/our-insights/innovating-automotive-retail.
[3] Marshall Motors, (2018). Personal communication.
[4] Inchcape PLC, (2018). Annual Report and Accounts 2017, p. 6.  Downloaded from the World Wide Web on April 12th, 2018 at: http://www.annualreports.com/HostedData/AnnualReports/PDF/LSE_INCH_2017.pdf.

 


 

John Shah led commercialisation of The Shepherd as an Incubation Manager at Konica Minolta's Business Innovation Centre, Europe. He has 10 years' experience in leadership roles, developing new products and services for the automotive, energy, utility, medical imaging and biopharmaceutical sectors. He received his B.Sc. in Molecular Biology from the University of London and carried out research in experimental medicine at Oxford University's Department of Pharmacology. He can be contacted at johnshah3000@gmail.com.

 

Petr Gotthard is a research specialist for the internet of things in Konica Minolta Laboratory Europe. He is focused on design and development of affordable LoRaWAN solutions. Petr has fifteen years’ experience in software engineering, solutions architecture, systems integration and leadership roles in high-tech companies. He received his MS in electrical engineering and computer science from the Brno University of Technology. Contact him at petr.gotthard@konicaminolta.cz.

 

Tomáš Jankech is a research engineer for embedded devices in Konica Minolta Laboratory Europe. He is focused on design of low-power wireless devices. Tomáš received his MS in electronics and communication from the Brno University of Technology. Contact him at tomas.jankech@konicaminolta.cz.

 

 

 

 

Towards Decomposed Data Analytics in Fog Enabled IoT Deployments

Mohit Taneja, Nikita Jalodia, and Alan Davy
September 18, 2018

 

With the exponential growth rate of technology, the future of all activities involves an omnipresence of widely connected devices, or as we better know it, the ‘Internet of Things (IoT)’. In its report [1], McKinsey estimates a user base with 1 trillion interconnected IoT devices by 2025; while the recent publications [2] by Cisco in June 2017 indicate that we have already reached the Zettabyte Era, and the number of devices connected to the Internet is growing exponentially.

The increasing range of real-world IoT deployments essentially increases the sources of data generation, thereby globally intensifying the challenges already being faced in the Big Data space [3], particularly regarding moving data from one end (i.e. from data sources such as sensor/IoT devices at the edge level of infrastructure) to the other extreme end (i.e. centralized data centres at the cloud) in the network infrastructure. Sending the entire data set across the extreme ends in the infrastructure becomes an unrealistic solution, specifically in scenarios with constrained network bandwidth and low/no internet connectivity. Instead, approaches that collect data and perform computational processing near the source of data itself present a more practical alternative in such scenarios, and are beneficial for a number of reasons, such as in cases of video, whose transport across infrastructure can claim considerable network resources such as the requirement for storage at each node from source to destination. While IoT deployments vary across use cases, the most common underlying aim is to analyse the data generated from the devices to achieve a specific set objective.

Fog Computing, IoT and Decomposition of Data Analytics Computing Programs

In the existing approaches for data analytics in IoT, all data from an IoT deployment is collected at a centralized location such as server(s) in a data centre (i.e. cloud) and is then subjected to the desired data analytics model to generate value. Data in these IoT deployments moves from ‘things’ to cloud, and along this continuum passes through a number of network devices such as routers, gateways, etc. Each of these devices is a potential candidate to host partial analytics capability, analysing the data and sending the calculated partial results to the cloud instead of the raw data [4]. The edge of the network in such deployments can act as a potential site to host what we call ‘decomposed analytic computing units’ (Figure 1) to reduce the amount of data being transferred to the cloud, and also to maximize the quality of analytics results by having the localized contextual information at hand while performing analytics operations.
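As a minimal illustration of sending partial results instead of raw data, the sketch below decomposes a simple statistic (mean and maximum of sensor readings) into a per-fog-node summary step and a cloud-side merge step. The class and function names and the sample readings are assumptions for illustration; they are not taken from the authors' system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartialStats:
    """Compact summary a fog node sends upstream in place of raw readings."""
    count: int
    total: float
    maximum: float


def summarise(readings):
    """Runs on a fog node: reduce local raw readings to a partial result."""
    return PartialStats(count=len(readings), total=sum(readings), maximum=max(readings))


def merge(partials):
    """Runs in the cloud: combine partial results into global statistics."""
    count = sum(p.count for p in partials)
    total = sum(p.total for p in partials)
    return {"count": count, "mean": total / count, "max": max(p.maximum for p in partials)}


# Hypothetical temperature readings seen by two different fog nodes.
fog_a = summarise([20.0, 22.0, 21.0])
fog_b = summarise([30.0, 28.0])
global_stats = merge([fog_a, fog_b])
```

Each fog node forwards three numbers regardless of how many readings it processed, and the merged result matches what a centralized computation over all five readings would give (a mean of 24.2 and a maximum of 30.0). Statistics that do not decompose this cleanly, such as an exact median, illustrate why the choice of decomposition method matters.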

Fog computing has recently emerged as a potential architecture for scaling IoT network applications. It aims to provide computing resources and services closer to the end devices at the edge of the network along the things-to-cloud continuum, and thus appears to be a natural fit for the desired decomposition of data analytics programs in the IoT ecosystem. Depending on the IoT deployment, a fog node can range from a dedicated industrial router/gateway to a smartphone, a wearable smartwatch, and so on.

Beyond the decomposition of data analytics and machine learning computing programs to run on resource-constrained devices along the things-to-cloud continuum, a further futuristic vision is one where the decomposition itself is automated and happens dynamically at runtime.

Figure 1: Decomposition of a computing program into single computing units and placement of those units onto different fog nodes in the infrastructure. Note that the infrastructure architecture considered is the most common and widely used three-tier IoT-Fog-Cloud architecture (with multi-tier fog).

Challenges

There are a number of challenges associated with decomposing computing programs to run between the edge/fog and the cloud. The major ones include:

  1. Decomposition methods: Existing methods for distributing operations onto homogeneous nodes are insufficient for fog-assisted IoT settings because fog and cloud nodes are heterogeneous. Moreover, existing distributed processing frameworks such as MapReduce are not directly applicable: the cloud has mostly homogeneous nodes with well-structured network topologies and reliable connectivity, whereas fog-assisted IoT deployments are highly variable environments.
    In such settings, deciding which part of a computing program to decompose becomes crucial. For example, if the program contains a recursive function that is called repeatedly, decomposing it may be a poor choice because it would generate communication overhead. How to define an atomic computing unit for a program is equally important. The applicability of existing methods, and the modifications they require for such settings, need to be studied carefully.
  2. System performance: Another key metric to track is the effect of such decomposition on overall system performance: whether resource consumption across the infrastructure increases, decreases or becomes more balanced compared with the centralized cloud solution.
  3. Quality of analytics: Because the data is now processed into partial results that are then combined into the overall analytical result, it is important to assess how decomposition affects the quality of analytics.
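The quality-of-analytics concern can be made concrete with a small sketch. A mean can be recovered exactly from per-node sums and counts, but some statistics cannot be rebuilt from per-node results. The example below, with hypothetical function names and made-up readings, compares the true median over all data with the median of per-node medians, one naive way of combining partial results.

```python
from statistics import median
from typing import List, Sequence


def global_median(node_data: Sequence[Sequence[float]]) -> float:
    """Centralized baseline: median over all raw readings from every node."""
    return median([x for node in node_data for x in node])


def median_of_medians(node_data: Sequence[Sequence[float]]) -> float:
    """Naive decomposition: each node sends only its local median."""
    return median([median(node) for node in node_data])


def quality_gap(node_data: Sequence[Sequence[float]]) -> float:
    """Absolute error introduced by combining partial results."""
    return abs(global_median(node_data) - median_of_medians(node_data))


# Illustrative readings from two nodes holding different amounts of data.
nodes: List[List[float]] = [[1, 2, 3], [4, 5, 6, 7, 100]]
print(quality_gap(nodes))  # 0.5: the decomposed answer differs from the centralized one
```

When nodes hold identically distributed data the gap can vanish, but in general the decomposed result is an approximation, and its error needs to be measured against the centralized baseline.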

Initial exploratory work by the authors of [5] shows that such decompositions can reduce bandwidth consumption and significantly decrease the associated costs. For further progress, however, all of these points need to be carefully evaluated in order to design and develop efficient distributed algorithms for decomposition in fog-assisted IoT deployments across a wide variety of use cases.

But why do we need to decompose computing programs at all? Why not run the whole program on the edge/fog device instead of splitting it into decomposed computing units?

The answer lies in resource constraints. Unlike the cloud, which can be thought of as ‘resource rich’, fog devices are resource constrained, and their resources cannot be scaled dynamically (up or down, horizontally or vertically). Fog devices are also already performing their fundamental computing or network operation (for example, a router acting as a fog device is already forwarding packets to their destinations), and these operations already use the CPU, RAM and bandwidth available on the device. Deploying a complete data analytics program or algorithm on such a device in addition might exhaust its resources as the workload or data input increases, and might also affect its fundamental network operation. Hence, careful placement of computing operations is needed for efficient overall system performance, and the approach of decomposed computing units seems well suited to an IoT environment with fog assistance.
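One simple way to picture such careful placement is a greedy first-fit pass along the things-to-cloud path. The sketch below is an assumption-laden illustration, not the placement algorithm of [4]: the `Node` and `Unit` types, their fields and the `place_units` function are hypothetical, and each node's free capacity is assumed to be what remains after its native workload (for example, packet forwarding).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Node:
    """A fog node; free capacity is what is left after its native workload."""
    name: str
    cpu_free: float  # spare CPU cores
    ram_free: float  # spare RAM in MB


@dataclass(frozen=True)
class Unit:
    """A decomposed computing unit with its resource demand."""
    name: str
    cpu: float
    ram: float


def place_units(units: List[Unit], path: List[Node],
                fallback: str = "cloud") -> Dict[str, str]:
    """Greedy first-fit: put each unit on the first node, nearest the data
    source, with enough spare CPU and RAM; otherwise send it to the cloud."""
    placement: Dict[str, str] = {}
    for unit in units:
        target = fallback
        for node in path:  # path is ordered from the edge towards the cloud
            if node.cpu_free >= unit.cpu and node.ram_free >= unit.ram:
                node.cpu_free -= unit.cpu  # reserve the capacity on this node
                node.ram_free -= unit.ram
                target = node.name
                break
        placement[unit.name] = target
    return placement
```

For instance, with a gateway offering 0.5 spare cores and a router offering 0.25, a unit needing 0.5 cores lands on the gateway, a following unit needing 0.25 cores lands on the router, and a unit too large for either falls back to the cloud.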

Conclusion

It might be argued that it is more desirable to develop cloud-centric solutions, with ample resources on hand, than to design fully distributed computing programs and algorithms that bring additional complexity from the need to communicate over a network. Yet there are strong reasons for developing distributed data analytics solutions in fog-assisted IoT settings:

  1. In many industrial settings and IoT deployments, data is collected and stored in a decentralized manner. When data generation and storage are themselves distributed, it is more desirable to process and analyse the data in a distributed fashion too, avoiding the bottleneck of transferring it to the centralized cloud.
  2. The number of data centres is unlikely to grow at the same rate as the number of devices at the network edge, since traditional data centres consume a lot of power and global network bandwidth, and have begun to raise concerns about their growing carbon footprint.
  3. The computing capabilities of devices such as smartphones have increased significantly in the last decade. This can be seen simply in the RAM capacity of smartphones now compared with a few years ago, and the same holds for simple devices such as the Raspberry Pi. Although the resource capabilities of distributed devices are still growing more slowly than the rate of data production over the past decade, a collection of more capable devices at the network edge makes the network better equipped to explore distributed computing solutions for data analytics in the IoT domain at a fine-grained level.

Overall, bearing the challenges in mind, the decomposition of analytics programs in fog-assisted IoT environments looks promising as a route to efficient distributed data analytics solutions and a smarter network edge, in line with the vision of distributed computing for future networks.

Acknowledgement

This work has emanated from research conducted with the financial support of Science Foundation Ireland (SFI) and is co-funded under the European Regional Development Fund under Grant Number 13/RC/2077.

References

[1] J. Manyika et al., "Unlocking the potential of the Internet of Things," McKinsey & Company, June 2015.
[2] Cisco, "The Zettabyte Era: Trends and Analysis," Cisco, June 2017.
[3] B. Tang et al., "Incorporating Intelligence in Fog Computing for Big Data Analysis in Smart Cities," IEEE Transactions on Industrial Informatics, vol. 13, no. 5, pp. 2140-2150, October 2017.
[4] M. Taneja and A. Davy, "Resource aware placement of IoT application modules in Fog-Cloud Computing Paradigm," in 2017 IFIP/IEEE Symposium on Integrated Network and Service Management (IM), Lisbon, 2017, pp. 1222-1228.
[5] T.-C. Chang et al., "Decomposing Data Analytics in Fog Networks," in Proceedings of the 15th ACM Conference on Embedded Network Sensor Systems (SenSys '17), (New York, NY, USA), pp. 35:1–35:2, ACM, 2017.

 


 

Mohit Taneja is currently pursuing his Ph.D. in the Department of Computing and Mathematics at the Emerging Networks Lab Research Unit in Telecommunications Software and Systems Group, Waterford Institute of Technology, Ireland. He joined in 2015 as a Master's student, and has since been working as a part of the Science Foundation Ireland funded CONNECT Research Centre. His current research interests include Fog and Cloud Computing, Internet of Things (IoT), Distributed Systems, and Distributed Data Analytics. His research focuses on decomposing data analytics and machine learning programs for fog-enabled IoT systems towards effective resource and service management to support and meet the requirements for real-time IoT analytics. He received his Bachelor’s Degree in Computer Science and Engineering from The LNM Institute of Information Technology, Jaipur, India in 2015.

 

Nikita Jalodia is currently pursuing her Ph.D. in the Department of Computing and Mathematics at the Emerging Networks Lab Research Unit in Telecommunications Software and Systems Group, Waterford Institute of Technology, Ireland. She joined in July 2017, and has since been working as a part of the Science Foundation Ireland funded CONNECT Research Centre. Her current research interests include Internet of Things (IoT), Fog and Cloud Computing, Machine Learning, Virtualised Telecom Networks, and Network Function Virtualization (NFV). She received her Bachelor’s Degree in Computer Science and Engineering from The LNM Institute of Information Technology, Jaipur, India in 2017, with a specialization in Big Data and Analytics with IBM. She has also previously worked as a developer at Sapient Global Markets, India.

 

Alan Davy completed his Ph.D. studies at Waterford Institute of Technology in 2008. He is currently Research Unit Manager of the Emerging Networks Laboratory with the Telecommunications Software & Systems Group of Waterford Institute of Technology. His current research interests include Virtualised Telecom Networks, Fog and Cloud Computing, Molecular Communications and TeraHertz Communication.