On 11 April, we had a successful mid-term review of our H2020 project, SafeClouds.eu. The meeting was hosted by Eurocontrol in Brussels, with participants from all entities involved in the project.
Last January, a team of European and American entities organised a workshop on transatlantic research with the support of the European Commission. The event was hosted by the FAA in their facilities at the William J. Hughes Technical Center in Atlantic City. Attendees were mostly US and European companies interested in how the different research threads could be boosted through international cooperation.
Among the subjects discussed during the three-day event, data analytics was mentioned several times as an interesting area with applicability to different fields of industrial research. In particular, safety data analytics was covered in three presentations. First, the FAA presented ASIAS, a programme that collects data from more than 40 carriers and has been leading the developments in this field for more than a decade. Second, EASA presented the Data4Safety programme, recently launched and in a proof-of-concept stage. Lastly, Innaxis presented the research programme SafeClouds.eu, including the latest technological developments and how they could complement the existing initiatives by providing and exploring new research avenues.
Author: Jens Krueger
Safety is key in aviation. To reach maximum safety, stakeholders are collecting a large amount of data for analytics. Ultimately, researchers want to not only evaluate the causal dependencies of safety critical events, but to also enhance operational efficiency.
Presently, such data is stored in isolated data silos. The goal of SafeClouds.eu is twofold: advance data-driven analytics for safety and efficiency and manipulate data outside of the silos to enable data sharing and merging between different stakeholders, including data owners. However, the infrastructure must ensure that personal or confidential data is not leaked to third parties; all while maintaining data sharing capabilities.
In order to address the requirements for data protection and analysis, the SafeClouds.eu infrastructure must enable the following data analysis paradigms:
The infrastructure architecture must reflect data protection requirements in order to guarantee the different data confidentiality levels. The physically-independent components are as follows:
Local system:
The local system sits at the premises of the participating companies (e.g. airlines and ANSPs) and stores raw datasets from different source systems. The raw data is enriched with other sources, which the global cloud system should provide, to form a 360-scenario dataset with enhanced informational context and processing. Finally, the dataset is de-identified and made accessible. Authorised third parties are allowed access only for data management and administrative tasks.
Dedicated private cloud:
Each participating party will be provided with a private segment of the cloud infrastructure that is logically and physically independent. It is used for de-identified data storage and analytics. Data scientists from SafeClouds.eu official partners will have access to the de-identified data under the data protection agreements.
Global cloud system:
The global cloud system is divided into two parts. The global storage will hold all open datasets (Meteo, ADS-B, SWIM, Radar). It will also ensure dataset quality and accessibility through pre-processing. In addition, it will grant access from the local systems and the dedicated private cloud. Note that the global processing infrastructure performs analytics on joint datasets from all dedicated private clouds.
Figure 1: Hierarchical architecture of the SafeClouds.eu infrastructure
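The text above states that datasets are de-identified in the local system before they are made accessible, but it does not say how. As a minimal sketch of one common approach (keyed hashing with HMAC-SHA-256, which is an assumption here and not necessarily what SafeClouds.eu uses), identifying fields could be replaced by pseudonyms before a record leaves the data owner's premises. The key, the field names and the sample record are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret that never leaves the data owner's local system.
LOCAL_SECRET_KEY = b"replace-with-a-locally-stored-secret"

# Hypothetical field names; a real dataset defines its own schema.
IDENTIFYING_FIELDS = ("tail_number", "crew_id")


def pseudonymise(value: str, key: bytes = LOCAL_SECRET_KEY) -> str:
    """Return a keyed SHA-256 digest: the same input always maps to the same token."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def de_identify(record: dict) -> dict:
    """Return a copy of the record with identifying fields replaced by pseudonyms."""
    cleaned = dict(record)
    for field in IDENTIFYING_FIELDS:
        if field in cleaned:
            cleaned[field] = pseudonymise(str(cleaned[field]))
    return cleaned


# Illustrative record, not real data.
raw_record = {"tail_number": "EC-XYZ", "crew_id": "12345", "altitude_ft": 35000}
shared_record = de_identify(raw_record)
```

Because the digest is deterministic for a given key, records from the same aircraft can still be joined after de-identification, while the original identifier cannot be read from the shared record without the locally held key.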
The SafeClouds.eu Cloud Infrastructure
The SafeClouds.eu cloud infrastructure is built on Amazon Web Services (AWS). One of the main advantages of AWS is that it consists of several datacenters located around the world. This enables SafeClouds.eu to reduce communication latencies by choosing the most appropriate datacenter locations. Each AWS datacenter is located within a region, and each region has several datacenters, or Availability Zones. Each Availability Zone is attached to a different part of the power grid to mitigate the damage of a potential power outage. Any distributed cloud application running in AWS must weigh the fault tolerance gained by placing nodes in different Availability Zones against the performance gained by keeping computational resources as close together as possible.
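The tradeoff between spreading nodes across Availability Zones and keeping them together can be illustrated with a small placement sketch. It does not call any AWS API; the zone labels and the `max_zones` parameter are hypothetical and only serve to make the two extremes visible.

```python
from collections import Counter


def place_nodes(node_count: int, zones: list[str], max_zones: int) -> list[str]:
    """Assign nodes round-robin to the first `max_zones` zones.

    A larger `max_zones` spreads nodes out (more fault tolerance);
    a smaller one keeps them together (less cross-zone traffic).
    """
    if not 1 <= max_zones <= len(zones):
        raise ValueError("max_zones must be between 1 and len(zones)")
    used = zones[:max_zones]
    return [used[i % max_zones] for i in range(node_count)]


# Hypothetical zone labels, for illustration only.
zones = ["zone-a", "zone-b", "zone-c"]

spread = place_nodes(6, zones, max_zones=3)  # losing one zone leaves 4 of 6 nodes
packed = place_nodes(6, zones, max_zones=1)  # all nodes co-located in one zone

print(Counter(spread))
print(Counter(packed))
```

With `max_zones=3` an outage in a single zone removes only a third of the nodes, whereas with `max_zones=1` all traffic stays inside one zone but a single outage removes every node.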
For SafeClouds.eu, AWS enables the infrastructure to horizontally scale with an increasing number of stakeholders or increased processing or storage requirements.
To ensure security, AWS Identity and Access Management (IAM), virtual private clouds (VPCs), and encryption for data in motion and at rest are used.
The SafeClouds.eu infrastructure enables data protection, data sharing and flexibility. Data safety and security are key to gaining trust from data providers; without that trust, the success of the overall project is at risk. This blog post stresses the importance of a distributed and secure infrastructure and gives a first look into how the overall infrastructure architecture is designed. However, although the base infrastructure technology supports scalability, security, and other factors, the most important challenge is to leverage and implement those technological capabilities. The main security threats include human failure, bugs, and wrong implementations. To account for user error, the infrastructure must be as automated as possible, with clearly defined and deterministic processes. In addition, each entry point must be defined and encapsulated while preserving accessibility and usability. SafeClouds.eu will be using this infrastructure for aviation data analytics, and will share its findings with the aviation and data science communities.
Author: Paula Lopez (INX)
Machine learning is producing outstanding results, although we know it is still far from emulating human intelligence. Applying machine learning techniques, including multi-level artificial neural networks (deep learning), to speech or image recognition, for example, has steadily improved results (e.g. digital assistants like Apple’s Siri or Amazon’s Echo). In spite of the significant progress achieved so far, some challenges still need to be resolved before these techniques can be applied in most industries. On one hand, we face a fragmented ecosystem, meaning that there is a gap between the data scientists and the domain experts working in each particular sector. In order to convert data into knowledge, collaboration between both areas of expertise is required. On the other hand, challenges related to data management and data analysis need to be addressed prior to implementing machine learning techniques in most industries. These challenges, to name just a few, include heterogeneous and distributed data sources, data validation, distributed data architectures, data security, scalability, real-time analysis and decision support, and data visualization.
However, we cannot fall into the error of assuming that a machine learning problem can be addressed through a generic standard application of a set of algorithms and techniques. Machine learning problems are highly case-dependent and, therefore, the purpose of the analysis needs to be carefully defined in advance. This is what we (at Innaxis) call Purposeful Knowledge Discovery which also was the title of the keynote speech made by Innaxis President Carlos Alvarez Pereira at the SESAR Innovation Days 2017 in Belgrade. And this is, precisely, the approach we follow at Innaxis in our data science research projects, like SafeClouds.eu: an H2020 project aimed at enhancing aviation safety through the application of data science techniques.
SafeClouds.eu includes a team of 16 partners including data scientists and engineers from several research entities (Innaxis, Tadorea, Fraunhofer, TU Munich, Linköping University, TU Delft and CRIDA) and a group of airlines, ANSPs and safety authorities (Iberia, Air Europa, Vueling, Norwegian, Pegasus, LFV, Eurocontrol, AESA and EASA). This group of airspace stakeholders is the user group of the project; in other words, they define the questions that the data is meant to answer. These questions can be of three types: descriptive (what happened?), predictive (what will happen?) or prescriptive (what should we do to make what we want happen?). Once the questions are defined (the SafeClouds.eu use cases), the team of data scientists and engineers works with the users across the full cycle of data science techniques: data management, data processing architecture, deep analytics, data protection, pseudo-anonymization, advanced visualization and user experience. As previously mentioned, every step has its own challenges, as there are no standard data science tools that can be transferred automatically from one field to another. Below, we outline just two challenges: fusion of proprietary confidential data and benchmarking among these competing stakeholders.
These are just some examples of the challenges the SafeClouds.eu team is facing in the field of aviation safety data analysis. The solutions offered by these techniques make them ideal to be applied to other fields such as fuel consumption but, again, the purpose of the analysis will determine the following necessary steps.
On November 15-16, 2017, IATA organised the first Aviation Data Symposium in Miami, FL, USA. This event covered different angles of the application of engineering and data analytics to airline safety, operations, passenger distribution, sales, and air freight. These areas were complemented by a technology track, which covered techniques and tools to support data activities in airlines. The safety and operations tracks discussed how big data is helping airlines to optimise operations while maintaining safety, and also presented the main upcoming challenges.
The event also covered a review of the benefits from the various global information sharing and exchange networks, including the Global Aviation Data Management programmes coordinated by IATA. During the Symposium, Mr. Quevedo presented IATA data connect, the database of aviation accidents, IATA FDX, the GDDB and STEADES. ASIAS, the US data exchange programme, was also presented by Mr. Madar, Managing Director of Operation Safety of American Airlines. Then, Mr. Hernández-Coronado, Director of Safety Analysis and QM of the Spanish Aviation and Security Agency (AESA), presented the European programme Data4Safety, which was recently launched by EASA.
Concerns regarding privacy remain very strong: as the programme representatives explained, privacy protocols are often strict, and de-identification can make data challenging to use. Mr. Madar stressed that new techniques and technologies enabling progress on data privacy, together with new tools such as machine learning that allow a move from descriptive to predictive analysis, will help the programmes evolve beyond the descriptive analysis carried out over the last decade, for example with ASIAS.
Mr. Hernández-Coronado presented SafeClouds in detail. AESA participates in the SafeClouds project and helps the team understand how the different technologies researched in the project can help aviation data exchange programmes overcome some of the challenges presented. These challenges include data fusion and integration, data protection and privacy, and computing infrastructures. SafeClouds also investigates predictive analytics concepts and techniques to help aviation stakeholders make decisions, even during operations.
Mr. Hernández-Coronado also covered the activities performed by the Spanish Aviation and Security Agency, particularly the Spanish State Safety Programme (SSP). This system receives and collects around 300-400 safety events per week. He also presented the RIMAS system, showing its capability to provide a complete risk assessment picture of the national safety status by combining a variety of data sources, ultimately providing analytical support so that AESA can focus its attention on those areas that require supervision.
For the 5th consecutive year, Innaxis organized the Data Science in Aviation Workshop with much positive feedback. This 2017 edition took place last September at EASA HQ in Cologne, Germany, sponsored by the SafeClouds.eu project.
This series of annual workshops was created in 2013 to promote data science techniques applied to the aviation field. Initially, this was a breakthrough idea as data analytic initiatives in the sector were very scarce. On the other hand, the potential benefit of applying these techniques to aviation, with relatively limited investment, greatly supported the effort of pushing this paradigm shift. Now, only 5 years later, the number of ongoing initiatives of data science applications in the aviation sector has continuously increased; demonstrating that the effort was really worth it.
Data has become the key driver of change all across aviation: from maintenance to training, from fuel efficiency to safety. There are ongoing examples, with different levels of maturity, in nearly every layer of the aviation sector, from manufacturing to operations, and from both industry and academia. The last DSIAW brought this wide variety together. Knowledge Discovery and Data Mining (KDD) will be, and already is, a key enabler of the digitalization of our industry.
The entire Horizon2020 transport research programme is driven by the overall objective of making ““. These challenges were precisely the 4 pillars of the 2017 DSIAW, which showed how data can play a key role in achieving them through the application of data science (DS) techniques. The presentations were distributed among 4 sessions (DS4Environment, DS4Safety, DS4Predictability, and innovative DS techniques and supporting tools) and introduced the audience to the following initiatives:
DS4Environment: While the development of greener technologies (engines, aerostructures, components, etc.) requires several coordinated initiatives, data science offers cost-effective solutions based on real figures of fuel burnt and noise pollution. Applying data analytics techniques to these datasets enhances our knowledge of fuel consumption and noise emission patterns, which supports efficient resource use and thus reduces emissions and environmental impact. For this theme, the workshop featured the Boeing Global Services – Fuel Dashboard solution and the Technical University of Madrid initiatives related to environmental and noise emissions studies.
DS4Safety: The aviation sector’s requirement for high safety levels has always been the main reason to avoid ‘radical’ changes in this industry or, at least, follow a very slow adoption path. Nevertheless, aviation safety has recently become a pioneering area in data science applications. We can’t neglect to mention the significant challenges in this line of research, such as data protection, data merging, pattern detection in rare events, secure data infrastructures, etc, but nonetheless there are very promising initiatives such as: the SafeClouds project coordinated by Innaxis, the EASA Data4Safety programme, or the activities from SafetyData in NLP applied to Occurrence Reports. All projects were presented at the workshop.
DS4Predictability: In air transportation, efficiency is closely linked to predictability, and predictability, in turn, is highly dependent on data. Improving predictability reduces uncertainty, which avoids losses and enables a more efficient aviation system, from reducing delays to predicting system failures. Ongoing studies, such as those presented by the University of Westminster or Atos, are good examples of how data can deeply transform common airline procedures, like disruption management or maintenance scheduling.
DS techniques and supporting tools: Different KDD application techniques require appropriate infrastructures as well as supporting techniques that ensure various requirements are met. These include data protection, security, computational efficiency, flexibility, scalability, etc. During this last workshop, we learned from Eurocontrol's experience in using cloud-based infrastructures. We also heard from the Innaxis spin-off TADOREA, which shared knowledge on crypto-economics as a potential solution for enabling secure data analytics while maintaining data privacy.
Still not convinced? Wanting to learn more? Visit the event page to watch the presentations and videos.
SafeClouds.eu gathers 16 partners for research collaboration with a wide and diverse group of users, including air navigation services providers, airlines and safety agencies. SafeClouds.eu encourages active involvement from users, as the project aims to apply data science techniques to improve aviation safety. SafeClouds.eu is unique as it involves data combination and collaboration from ANSPs, airlines and authorities in order to improve our knowledge on safety risks, all while maintaining the confidentiality of the data. This safety analysis requires comprehensive understanding of various data sources, and supports the use case analysis as selected by the users.
The basics of Flight Data Monitoring (FDM) data, one of the main data sources for the project, are outlined in this post.
A large amount of data is recorded during civil aircraft flights. Apart from the “Flight Data Recorder” that is mainly used for accident investigations (widely known as “Black Box”), there are also recorders for regular operations. These recorders are often called “Quick Access Recorders” (QAR). QAR data is analysed in terms of safety, efficiency and other aspects in Flight Data Monitoring activities for airlines and is furthermore an integral part of the research project SafeClouds.eu.
Figure 1: Example for a QAR (Source: https://www.safran-electronics-defense.com/aerospace/commercial-aircraft/information-system/aircraft-condition-monitoring-system-acms)
Aircraft are very complex systems with a large number of sensors constantly recording measurements. Important parameters regarding the aircraft state, including position, altitude, speed, engine characteristics and many others are recorded by the QAR. Depending on the aircraft type and airline, the number of recorded parameters can reach several thousand.
As a digital device, the QAR records in binary format. In other words, if we look at the QAR data we would only see a bit stream, i.e. a sequence of 0s and 1s. In order to use the data and investigate, for example, the aircraft position, two additional components are necessary. First, logic is needed to determine how the data is written into the bit stream. This is given by an ARINC standard, of which two versions are presently used: ARINC 717 for older aircraft types and ARINC 767 for newer aircraft types. Second, a detailed description of the location of any considered parameter in the bit stream is needed. This is given by a “dataframe”, which is a text document of up to several hundred pages.
Figure 2: Overview (Source: “Flight Data Decoding used for Generating En-Route Information based on Binary Quick Access Recorder Data”, Master thesis, Nils Mohr, Technical University of Munich)
One of the advantages of data stored in binary format is storage efficiency. The same flight data file might be ten times smaller in binary format than when stored as engineering values (e.g. in a CSV file). For the research project SafeClouds.eu, or for shared flight data frameworks such as ASIAS of the FAA, FDX of IATA or Data4Safety of EASA, which collect millions of flight data files, efficient storage is clearly needed.
However, storing flight data in binary format requires an efficient way to convert the binary data into engineering values. Two parts are necessary. First, the bit stream logic (provided by the ARINC standard) needs to be represented in a decoding algorithm. Second, the dataframe information, i.e. which parameter can be found in which part of the bit stream, needs to be accessible to the decoding algorithm.
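The two parts described above can be sketched in a few lines: one function stands in for the bit stream logic, and a lookup table stands in for the dataframe. This is a simplified illustration, not an implementation of ARINC 717 or 767. The 12-bit word width, the parameter names and their locations are assumptions made for this example, and real frames also carry synchronisation words and subframe structure that are omitted here.

```python
from dataclasses import dataclass

WORD_BITS = 12  # assumed word width for this sketch


@dataclass(frozen=True)
class ParameterLocation:
    word_index: int  # which word of the stream holds the parameter
    first_bit: int   # offset of the first bit inside the word (0 = leftmost)
    bit_count: int   # number of bits the parameter occupies


# Hypothetical dataframe excerpt: parameter name -> location in the bit stream.
DATAFRAME = {
    "PARAM_A": ParameterLocation(word_index=0, first_bit=4, bit_count=8),
    "PARAM_B": ParameterLocation(word_index=1, first_bit=0, bit_count=12),
}


def split_words(bit_stream: str) -> list[str]:
    """Cut the bit stream into fixed-width words, dropping any trailing partial word."""
    usable = len(bit_stream) - len(bit_stream) % WORD_BITS
    return [bit_stream[i:i + WORD_BITS] for i in range(0, usable, WORD_BITS)]


def raw_value(words: list[str], name: str) -> int:
    """Look up a parameter in the dataframe and return its raw integer value."""
    loc = DATAFRAME[name]
    bits = words[loc.word_index][loc.first_bit:loc.first_bit + loc.bit_count]
    return int(bits, 2)


# Two illustrative 12-bit words.
bit_stream = "000001001001" + "000000000101"
words = split_words(bit_stream)

print(raw_value(words, "PARAM_A"))  # 73
print(raw_value(words, "PARAM_B"))  # 5
```

Keeping the dataframe as data rather than code means the same decoding routine can serve different aircraft types by swapping the lookup table.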
Recorded parameters have different characteristics. For example, they can be numeric, alphanumeric or characters. Depending on these characteristics, different decoding rules have to be applied. As an example, a temperature recording of 36.5 °C with a linear conversion rule is considered in the following figure.
Figure 3: Simple Decoding Example (Source: “Flight Data Decoding used for Generating En-Route Information based on Binary Quick Access Recorder Data”, Master thesis, Nils Mohr, Technical University of Munich)
Starting from the bit stream, just specific binary values are relevant for the temperature recording. As mentioned above, this information can be found in the dataframe. The combination of all bits leads to a number in the binary system, which can then be transferred into the associated decimal value. Applying the conversion rule for linear parameters gives the result 36.5. Information about these rules as well as the unit, in this case degree Celsius, can be found in the dataframe.
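The steps in this example (select the relevant bits, read them as a binary number, apply the linear rule) can be written out as a short worked example. The bit pattern, scale and offset below are hypothetical values chosen so that the result is 36.5; the actual values for a given parameter come from the dataframe.

```python
def bits_to_int(bits: str) -> int:
    """Convert a string of 0s and 1s into its decimal value, most significant bit first."""
    value = 0
    for bit in bits:
        value = value * 2 + (1 if bit == "1" else 0)
    return value


def linear_conversion(raw: int, scale: float, offset: float) -> float:
    """Apply a linear rule: engineering value = scale * raw + offset."""
    return scale * raw + offset


# Hypothetical bits that the dataframe assigns to the temperature parameter.
temperature_bits = "10101101"

raw = bits_to_int(temperature_bits)  # 128 + 32 + 8 + 4 + 1 = 173

# Hypothetical conversion rule: 0.5 degrees per count, shifted by -50 degrees.
temperature_c = linear_conversion(raw, scale=0.5, offset=-50.0)

print(f"{temperature_c} °C")  # 36.5 °C
```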
The data that is recorded by civilian aircraft in their daily operation contains valuable information that can be used for airline safety analyses. Due to the nature of the recording, the data is generated in binary format. To make the data accessible and readable for the analysts, a decoding algorithm is applied. For the development of this algorithm, information about the recording logic and for all the considered parameters must be available.
SafeClouds.eu, an H2020 big data for safety project coordinated by Innaxis, kicked off earlier this month.