Discovering hidden knowledge in aviation data

Author: Paula Lopez (INX)

Machine learning is producing outstanding results, although we know it is still far from emulating human intelligence. Applying machine learning techniques, including multi-layer artificial neural networks (deep learning), to tasks such as speech or image recognition has steadily improved results (e.g. digital assistants like Apple's Siri or Amazon's Echo). In spite of the significant progress achieved so far, some challenges still need to be resolved before these techniques can be applied in most industries. On the one hand, we face a fragmented ecosystem: there is a gap between data scientists and the domain experts working in each particular sector. Converting data into knowledge requires collaboration between both areas of expertise. On the other hand, challenges related to data management and data analysis need to be addressed before machine learning techniques can be implemented in most industries. These challenges include, to name just a few, heterogeneous and distributed data sources, data validation, distributed data architectures, data security, scalability, real-time analysis and decision support, and data visualization.

However, we must not fall into the error of assuming that a machine learning problem can be addressed through a generic, standard application of a set of algorithms and techniques. Machine learning problems are highly case-dependent and, therefore, the purpose of the analysis needs to be carefully defined in advance. This is what we at Innaxis call Purposeful Knowledge Discovery, which was also the title of the keynote speech given by Innaxis President Carlos Alvarez Pereira at the SESAR Innovation Days 2017 in Belgrade. And this is precisely the approach we follow at Innaxis in our data science research projects, such as an H2020 project aimed at enhancing aviation safety through the application of data science techniques.

The project brings together a team of 16 partners, including data scientists and engineers from several research entities (Innaxis, Tadorea, Fraunhofer, TU Munich, Linköping University, TU Delft and CRIDA) and a group of airlines, ANSPs and safety authorities (Iberia, Air Europa, Vueling, Norwegian, Pegasus, LFV, Eurocontrol, AESA and EASA). This group of airspace stakeholders is the user group of the project; in other words, they define the questions they need data to answer. These questions can be of three types: descriptive (what happened?), predictive (what will happen?) or prescriptive (what should we do to make what we want happen?). Once the questions are defined (the use cases), the team of data scientists and engineers works together and collaborates with users, covering the full cycle of data science techniques: data management, data processing architecture, deep analytics, data protection, pseudo-anonymization, advanced visualization and user experience. As previously mentioned, every step has its own challenges, as there are no standard data science tools that can be transferred automatically from one field to another. Below, we outline just two challenges: the fusion of proprietary confidential data and benchmarking among these competing stakeholders.

  • Smart Data Fusion: Many datasets require protection and cannot be shared (e.g. FDM data and radar tracks). Simply erasing the flight-identifier parameters would protect the data but would not allow datasets to be fused, so fusion requires sophisticated cryptographic techniques that encode sensitive data in a non-reversible way.
  • Secure Blind Benchmarking: Benchmarking among stakeholders based on data that cannot be shared also requires specific techniques. These include secure multiparty computation, which enables comparisons between confidential data without disclosing the data, not even to a trusted third party.
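
To make the data fusion idea more concrete, here is a minimal Python sketch. It assumes a keyed hash (HMAC-SHA256) as the non-reversible encoding; this is our illustrative choice, not necessarily the technique used in the project. The shared key, the field names and the sample records are all hypothetical.

```python
import hashlib
import hmac


def pseudonymize(flight_id, key):
    """Return a keyed, non-reversible token for a flight identifier."""
    return hmac.new(key, flight_id.encode("utf-8"), hashlib.sha256).hexdigest()


def protect(records, key):
    """Replace the raw flight identifier in each record with its token."""
    protected = []
    for rec in records:
        safe = {k: v for k, v in rec.items() if k != "flight_id"}
        safe["token"] = pseudonymize(rec["flight_id"], key)
        protected.append(safe)
    return protected


def fuse(left, right):
    """Join two protected datasets on the shared token."""
    right_by_token = {rec["token"]: rec for rec in right}
    return [
        {**rec, **right_by_token[rec["token"]]}
        for rec in left
        if rec["token"] in right_by_token
    ]


# Hypothetical key and records, for illustration only.
SHARED_KEY = b"hypothetical-shared-secret"
fdm = [
    {"flight_id": "XX123-2017-11-28", "max_pitch_deg": 7.5},
    {"flight_id": "XX456-2017-11-28", "max_pitch_deg": 6.1},
]
radar = [{"flight_id": "XX123-2017-11-28", "track_points": 842}]

# Each data owner protects its own records before anything is shared.
fused = fuse(protect(fdm, SHARED_KEY), protect(radar, SHARED_KEY))
```

Because both owners derive the token from the same key, matching flights end up with the same token and can be joined, while the raw identifier never leaves its owner. Note that anyone holding the key could recompute tokens for guessed identifiers, so in a sketch like this the key itself must be kept secret.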

These are just some examples of the challenges the team is facing in the field of aviation safety data analysis. These techniques are also well suited to other fields, such as fuel consumption analysis, but, again, the purpose of the analysis will determine the necessary next steps.
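
Returning to the secure blind benchmarking challenge, the following Python sketch shows one simple building block of secure multiparty computation: a sum computed with additive secret sharing. It is only an illustration under our own assumptions, not the project's implementation. All parties are simulated in a single process (in practice each would run on its own machine), and the function names and sample values are hypothetical.

```python
import secrets

# Modulus for the shares; it must be larger than any possible total.
PRIME = 2**61 - 1


def make_shares(value, n_parties):
    """Split a private value into n random shares that sum to it mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def secure_sum(private_values):
    """Compute the total of all private values using additive secret sharing."""
    n = len(private_values)
    # distributed[i][j] is the share that party i sends to party j.
    distributed = [make_shares(value, n) for value in private_values]
    # Each party j adds up the shares it received and publishes only that sum.
    partial_sums = [
        sum(distributed[i][j] for i in range(n)) % PRIME for j in range(n)
    ]
    # Combining the published partial sums yields the total, nothing more.
    return sum(partial_sums) % PRIME


# Hypothetical private indicator values, one per stakeholder.
private_values = [12, 7, 20]
total = secure_sum(private_values)
average = total / len(private_values)
```

Each stakeholder can then compare its own value with the group average, while every individual share it saw looks like a random number. No trusted third party ever holds the raw values.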


Mobility datasets exploration tool


Within the project, we have recently listed the sources of EU door-to-door mobility datasets, reports and papers. That information is crucial for us to build the subsequent data-driven tasks (including the model). On top of that, these sources could be extremely useful to anyone doing research on mobility or simply interested in the topic.

With this in mind, the consortium has developed a visual, interactive tool that presents all the information in a simple, attractive way. Built as a dynamic D3.js visualization, it shows the data sources together with their temporal data coverage, authors, description and availability.

How does it work? The datasets have been categorized into 9 families, all of them relevant in the mobility context.

  • Demographic
  • Passenger demand
  • Passenger type
  • Passenger behaviour
  • Door-to-kerb
  • Kerb-to-gate
  • Gate-to-kerb
  • Airside capacity
  • Competing services

Clicking on any of them (the text on the right side) displays all the data sources available within that family. Hovering over each source (right side) shows a tooltip with detailed information about its data coverage, sources, etc. Where there are too many sources to fit on the screen, scroll to see them all 🙂 Clicking on the [x] at the top brings you back to the main page.



