DATASET2050 presentation at Data Science in Aviation Workshop (EASA, Cologne, Germany)

The annual Data Science in Aviation event (funded by ComplexWorld and organised by Innaxis) held its fourth edition on 8 September 2016. The event was hosted on the EASA premises in Cologne, Germany. This year it featured a presentation of the DATASET2050 project, “Data Science for Mobility”, by project coordinator Samuel Cristobal (Innaxis).


On the Data Science In Aviation event:

Previous editions of the Data Science in Aviation event were hosted in Madrid (2013), Paris (2014) and Brussels (2015). The popular event usually draws more than 80 attendees from top European and worldwide aviation entities (including Airbus, Eurocontrol, Boeing, EASA, airlines, airports, ANSPs, SESAR, universities, etc.) along with ICT and data-related entities (including CERN researchers, Fraunhofer, infrastructure-related entities, and various universities). Notable presenters at the 2016 edition included EASA, Innaxis, NATS, Eurocontrol, Boeing, ENAC and Fraunhofer.

In terms of agenda and content, the presentations have traditionally outlined how data science can be understood as a set of fundamental principles that support and guide the extraction of information and knowledge from aviation data. The discipline builds on well-known data-mining techniques and goes beyond them, with successful data-science paradigms finding specific applications in various air transport areas (safety, performance, mobility, etc.).

 

On Samuel’s DATASET2050 presentation:

The event also highlighted a key presentation from Innaxis project coordinator, Samuel Cristobal. Samuel presented five different points on data science in aviation.

  • First, he explained how some of the data science tools, techniques and concepts have been used in the mobility context, specifically using the DATASET2050 project as a case example.
  • Second, Samuel explained the different door-to-door phases under analysis (door-kerb; kerb-gate; gate-gate; kerb-door), which helps to delve deeper into the data science components relevant to each phase.
  • Third, Samuel outlined the different links between project objectives and overall Flightpath2050 goals.
  • The fourth point explored mobility data in Europe, and the value of the DATASET approach in this context.
  • The presentation concluded with a fifth and final point announcing the next communication actions. The full presentation can be accessed here: https://www.dropbox.com/s/91julyl8gsij2k9/DATASET_SC_v1.pdf?dl=0

 

In sum, the fourth edition of the Data Science in Aviation event was an excellent opportunity to disseminate DATASET2050. It also allowed a fruitful exchange of ideas with other aviation data scientists, some of whom work with similar tools in sub-areas far from mobility. We hope to continue this momentum of knowledge exchange and look forward to a potential fifth edition of the popular event.

 

You can watch Samuel’s DATASET2050 presentation here, and the rest of the event videos on Innaxis’ Vimeo channel.

Information, time, knowledge

We live in a world that gathers exponentially increasing amounts of information and data from endless sources, yet has limited time to analyse it.

What is the current speed of “creating” information/data? What about knowledge/wisdom? What is the role of Data Science and Big Data in this context?

Food for thought for your (deserved) summer break! Enjoy, charge your batteries and get ready for a 2016/2017 year full of cutting-edge research and innovation (and Innaxis blog posts!).

 


Complex networks, data mining, causality, and beyond

Over the last few weeks Innaxis has published two papers that may be of interest to air transport researchers, among others.

The first paper is an extensive review on the combined use of complex network theory and data mining. Complex network analysis and data mining not only share the same general goal, that of extracting information from complex systems to ultimately create a new compact quantifiable representation, but also often address similar problems. Despite these commonalities, surprisingly few researchers take advantage of both methodologies, as many conclude that the two fields are either largely redundant or totally antithetic. In this review, we challenge this perception, show that this state of affairs stems from contingent rather than conceptual differences, and argue that the two fields can in fact be used advantageously in a synergistic manner. The review starts by presenting an overview of both fields and by illustrating some of their fundamental concepts. A variety of contexts in which complex network theory and data mining have been used in a synergistic manner are then presented. Finally, all discussed concepts are illustrated with worked examples through a series of hands-on sections, which we hope will help the reader to put these ideas into practice. If you have ever wondered how a real-world problem can be tackled with these two techniques, you should definitely read this review!

 

 

The second paper addresses the common confusion between correlation and causality. Many causality metrics have been proposed in the literature, all sharing the same drawback: they are defined for time series. In other words, the system (or systems) under analysis should display a time evolution. Associating causality with the temporal domain is intuitive, due to the way the human brain incorporates time into our perception of causality; nevertheless, such an association results in some rather important problems.

For instance, suppose one is trying to detect whether there is a causal relation between the workload of an air traffic controller and the appearance of loss of separation events. These events are only defined at one point in time. To illustrate, one can detect an instance of a loss of separation and check the corresponding workload; afterwards, perform the same actions for another event; and so forth. In the end, the researcher would get two vectors of features, which do not encode any temporal evolution; in other words, consecutive values are not correlated. So, in this situation, how can we detect whether a true causality (and not just a correlation) is present?

In this paper we propose a novel metric able to detect causality within static data sets, by analysing how extreme events in one element correspond to the appearance of extreme events in a second element (refer to the picture above for a graphical representation). The metric is able to detect non-linear causalities, to analyse both cross-sectional and longitudinal data sets, and to discriminate between real causalities and correlations caused by confounding factors.
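To give a feel for what "extreme events in one element corresponding to extreme events in a second element" can mean on static data, here is a minimal Python sketch. It is not the metric proposed in the paper: it only measures how much more often the second variable is extreme when the first one is, so it captures co-occurrence and cannot by itself separate causality from confounding. The function names, the 90% quantile threshold and the sample values are all assumptions made for illustration.

```python
from typing import List, Sequence


def extreme_mask(values: Sequence[float], quantile: float = 0.9) -> List[bool]:
    """Flag the values at or above an empirical quantile threshold."""
    ordered = sorted(values)
    # Position of the threshold inside the sorted sample (clamped to the last item).
    idx = min(int(quantile * len(ordered)), len(ordered) - 1)
    threshold = ordered[idx]
    return [v >= threshold for v in values]


def extreme_lift(x: Sequence[float], y: Sequence[float], quantile: float = 0.9) -> float:
    """Ratio P(y extreme | x extreme) / P(y extreme).

    Values well above 1 mean that extremes of y appear more often
    alongside extremes of x than they do overall.
    """
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    x_ext = extreme_mask(x, quantile)
    y_ext = extreme_mask(y, quantile)
    both = sum(1 for a, b in zip(x_ext, y_ext) if a and b)
    p_y = sum(y_ext) / len(y)
    p_y_given_x = both / sum(x_ext)
    return p_y_given_x / p_y


# Hypothetical, made-up observations: one (workload, severity) pair per event.
workload = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
severity = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
lift = extreme_lift(workload, severity)
```

Note that the observations are treated as an unordered set: shuffling both vectors with the same permutation leaves the result unchanged, which is the "static data" property discussed above.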

If you are interested in these ideas, feel free to have a look at these two papers:

M. Zanin et al., Combining complex networks and data mining: why and how. Physics Reports (2016), pp. 1-44. http://authors.elsevier.com/a/1T3yF_8QfbYE-k. Also available at: http://arxiv.org/abs/1604.08816
M. Zanin, On causality of extreme events. PeerJ. Also available at: http://arxiv.org/abs/1601.07054

If you have questions about them, please contact M. Zanin at mzanin@innaxis.org

Finally, Seddik Belkoura is going to present a paper at the forthcoming ICRAT 2016, Philadelphia, about the use of the static causality metric to study delay propagation. You can find the paper on the official website of the conference (http://www.icrat.org/), and also by contacting him at sb@innaxis.org.


Innaxis at EASA-OPTICS conference. Cologne 12-14 April

Developing the future of a safe and growing aviation business, whilst also reassuring the travelling public that it is safe to fly, is a major vision for both EU and national aviation policies, however:

What role do policy makers play?

What are the recent, implemented safety measures?

Who is guiding the safety topics within aviation research?

EASA, the European Commission, the Advisory Council for Aviation Research and Innovation in Europe (ACARE), and the EU’s OPTICS Project organised a three-day event in Cologne (12-14 April) to provide answers to these important questions and to define the way forward to ensure continued aviation safety in Europe. The event included a number of presentations and workshops covering several aviation safety areas.

Two Innaxis team members, David Perez (dp@innaxis.org) and Hector Ureta (hu@innaxis.org), attended the event and took part in several of the workshops, explaining how data science and big data can boost aviation safety. On the third day of the event, Hector also presented some of the latest data science techniques and tools in safety research, based on the SESAR COMPASS project.

 


Hector Ureta (Innaxis) presenting the Data Science research done in COMPASS (Cologne 14 April 2016)

 

The presentation, “Data science and data mining techniques to improve aviation safety: features, patterns and precursors”, is available online at this link.

If you’d like further information about data science in aviation, big data or aviation safety research completed by Innaxis, please feel free to contact the Innaxis team (innovation@innaxis.org).

 


More details of the event are available on the EASA and OPTICS websites.


Mobility and performance (DATASET2050 blog post)

PERFORMANCE FRAMEWORKS

Different performance frameworks look into different aspects of the European mobility framework, with varying goals that are not necessarily compatible or aligned in the same direction. To illustrate, ‘Flightpath 2050’ envisions an air transport system that improves safety levels but also guarantees time-related performance for the future passengers of Europe: a maximum of four hours door-to-door travel time for 90% of travellers using air as a mode. This number is not arbitrary, as it corresponds to the type of experience high-level experts envision for European passengers. However, punctuality and efficiency metrics are mostly flight-centric. Passengers are rarely considered in performance schemes, and therefore very little is known about actual door-to-door time performance from the passenger perspective. Decisions such as ‘when’ or ‘where’ to act in achieving this goal have proven to be more challenging than initially expected.
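As a small illustration of how the four-hour, 90% target can be read as a passenger-centric metric, the Python sketch below computes the share of journeys completed within the limit. The function names and the sample travel times are hypothetical and made up for the example; they are not DATASET2050 data or code.

```python
from typing import Sequence

FOUR_HOURS_MIN = 240  # Flightpath 2050 door-to-door limit, in minutes
TARGET_SHARE = 0.90   # share of travellers expected to meet the limit


def share_within_target(door_to_door_minutes: Sequence[float],
                        limit: float = FOUR_HOURS_MIN) -> float:
    """Fraction of journeys whose door-to-door time is within the limit."""
    if not door_to_door_minutes:
        raise ValueError("no journeys supplied")
    within = sum(1 for t in door_to_door_minutes if t <= limit)
    return within / len(door_to_door_minutes)


def meets_flightpath_goal(door_to_door_minutes: Sequence[float]) -> bool:
    """True when at least 90% of the journeys take four hours or less."""
    return share_within_target(door_to_door_minutes) >= TARGET_SHARE


# Hypothetical door-to-door times (minutes) for ten passengers.
sample_journeys = [185, 230, 250, 310, 205, 240, 400, 215, 199, 275]
sample_share = share_within_target(sample_journeys)
```

The point of the sketch is that the unit of measurement is the passenger journey, not the flight: a fully punctual flight can still belong to a journey that misses the limit.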


The European Commission Single European Sky Unit is working on ‘Reference Period 3’, which delves deeper into the performance scheme for air navigation services and network functions from 2020. This performance framework is very detailed, but unfortunately does not yet include provisions for passenger punctuality. Due to the complexity of different, non-interchangeable metrics, the KPAs and the different performance goals do not necessarily match.

SESAR and CleanSky have detailed, technical performance goals. Their specific technology developments and procedures should improve the performance of many concrete operational elements (e.g. runway performance or environmental impact in terminal areas, to mention two of them). However, it is not yet clear how much those programmes will contribute to passenger mobility.

PASSENGER PROFILING

In addition, passengers have traditionally been categorised as ‘business’ and ‘leisure’ travellers. However, these categories have become blurred over recent years and will continue to do so in the future. This is driven by various developments such as newly emerging markets and cultural backgrounds, an ageing society, and increasing digitalisation within private and business life. The resulting passenger needs and expectations during the journey can thus differ to a great extent. This is reflected, for example, in passengers’ willingness to pay for extra services and time savings during their stay at the airport. Therefore, the initial passenger group classification is no longer sufficient to properly address and integrate passenger requirements across the different transport modes.

(DATASET2050 D3.1 on passenger profiling 2.0 to be delivered soon!)

See you in the next blog post!

Data Scientist position at Innaxis

Innaxis is seeking a Data Scientist to join its research and development team in aviation projects. As a member of the team, you will join a highly interdisciplinary group of researchers, scientists, mathematicians and engineers who work for private companies and public institutions to solve the most challenging problems and get the most out of their data.

A mixture of creativity and technical skills is required to complement the skill set of a team that has spent the last 5 years achieving landmarks in network performance analysis across different areas of the aviation sector.

We are looking for a talented individual to help the team complement the existing research threads on machine learning and data mining, to provide new insights on the performance of complex systems and to enable the real-time analysis of complex phenomena. Being part of our team means cooperating with other skilled researchers currently focused on knowledge discovery, as well as with data engineers and visualisation experts.

Requirements are as follows:

  • Degree in Computer Science or similar (mathematics, physics) with outstanding background and experience in programming.
  • Experience in the collection and preparation of datasets for machine learning exercises.
  • Understanding of general architectures and tools for machine learning, from validation to (automatic) feature selection.
  • Fluency in English: it is the working language at Innaxis!

Technical skills that may be relevant in the evaluation:

  • Understanding of the theoretical and implementation approaches for standard data mining models and algorithms, from SVMs to deep learning techniques based on Deep Neural Networks, as well as their combination.
  • Basic knowledge of database technologies and use: MySQL, MongoDB, JQuery.
  • Any programming language is a plus: both general (Python, C, Matlab) and data analysis oriented (Weka, R) …

We offer:

  • Immediate start within a highly qualified and collaborative international team with innovative thinking and working methodology focused on the development of large scale research and innovation projects.
  • Interesting salary as a function of skills, experience and education.
  • Flexibility and good working conditions

Interested candidates should send their detailed CV and relevant information to innovation@innaxis.org

Big Data Engineer position at Innaxis

Innaxis is seeking a Big Data Engineer to join its research and development team. As a member of the team, you will join a highly interdisciplinary group of researchers, scientists and engineers who work for private companies and public institutions to solve the most challenging problems and get the most out of their data. A mixture of creativity and technical skills is required to complement the skill set of a team that has spent the last 5 years achieving landmarks in network performance across different areas of the aviation sector.

We are looking for a talented individual to help the team complement the existing research threads on engineering infrastructures to support data mining against large datasets. Your role within the team will be to design, test and implement state-of-the-art information acquisition systems for existing data sources within the aviation sector. This data will be further analysed in search of insightful patterns and, ultimately, knowledge discovery. The acquisition systems developed should also be cost-efficient, reliable and compliant with our data providers’ privacy directives.

Requirements are as follows:

  • Degree or MSc in Computer Science with outstanding background and experience in programming and systems management.
  • Strong interest in Amazon cloud-based solutions, specifically EC2, EBS, RDS and IAM.
  • Strong interest in database design and management, including SQL and NoSQL solutions and ecosystems.
  • Enthusiasm for software design and testing methodologies.
  • Fluency in English: it is the working language at Innaxis!

Technical skills that may be relevant in the evaluation:

  • Knowledge of database technologies and use: MySQL, MongoDB, JQuery
  • Proficiency in at least one programming language: Python, Perl, R, C++
  • Understanding of data mining algorithms: KDD, support vector machines, etc.

We offer:

  • Immediate start within a highly qualified and collaborative international team with innovative thinking and working methodology focused on the development of large scale research and innovation projects.
  • Interesting salary as a function of skills, experience and education.
  • Flexibility and excellent working conditions.

Interested candidates should send their detailed CV, a research interest letter and any relevant information to innovation@innaxis.org

DATASET2050 – H2020 CSA coordinated by Innaxis

How can we provide a seamless travel experience from door to door for future European passengers? This question is addressed by the DATASET2050 project, an aviation Coordination and Support Action (CSA) funded by the European Commission within the H2020 framework and coordinated by Innaxis.

DATASET2050 (DATA driven approach for a Seamless Efficient Travelling in 2050) deals with those Flightpath 2050 goals that call for future air transport to be more passenger-centric and to incorporate the door-to-door perspective. It complements the vision for European transport with guidelines and quantitative targets for how the goals can be accomplished, as well as with conceptual foundations. Project partners are Innaxis (Spain, coordinator), University of Westminster (United Kingdom), Eurocontrol (Belgium) and Bauhaus Luftfahrt (Germany).

The CSA is divided into four blocks, each focusing on a different aspect of the mobility assessment challenge: data architecture and modelling, passenger needs assessment, assessment of the supply of transport services, and the mobility assessment itself, which, inter alia, includes novel concept foundations.


 

In the course of the project, a data-driven toolset will be developed to assess the main characteristics of the current system and to analyse various future scenarios. State-of-the-art data analysis techniques will be integrated with the analysis of how customer demand and the supply of transport services evolve, in order to correctly identify the opportunities for Europe in this context. The following specific objectives will be addressed:

  • Identification and acquisition of relevant datasets to support the analysis
  • Depiction of current and future passenger expectations, needs and requirements
  • Building of a data-driven model
  • Development of mobility metrics
  • Understanding of the current and future air transport services supply
  • Identification and initial development of novel concept foundations, identification of (potential) bottlenecks within the current and future transport system
  • Sound dissemination and communication among the relevant stakeholders

The project kicked off in December 2014, and so far two deliverables concerning management and dissemination topics have been submitted. In addition to the kick-off meeting (KoM) in Brussels, the partners also recently met at the University of Westminster (London) to discuss detailed content and data requirements for the data-driven model to be developed in WP2.

www.dataset2050.com

Linkedin group: feel free to join!

Subscribe to our monthly blog post here!

Questions? Feedback? Interested? Feel free to contact the project coordinator Samuel Cristobal (sc@innaxis.org) or Hector Ureta (hu@innaxis.org).

 


Back from ComplexWorld 2015

 


Very successful ComplexWorld 2015 event, including the 3rd Data Science in Aviation Workshop in Brussels. Our team worked hard on this and it showed!

Thank you very much to everyone participating. We will post a report soon!


Roadmap for data-driven applications, from a concept to a fully integrated system

Data Science and Big Data applications have undoubtedly grown fast in recent years; the blooming of new data sources and the emergence of accessible and affordable cloud infrastructures have contributed widely to this movement. However, only a few applications reach the level of maturity needed to become fully functional systems. Most data-driven applications reach their maximum level of maturity as a “proof of concept”, and one of the main reasons is the lack of a solid programme for integration with current systems or of a successful validation plan.

The path from raw data to wisdom is long and complex. Moving from data to information requires comprehension of data relations and the overall context. Then, knowledge is reached only after fully understanding the patterns within information. Finally, it is necessary to get deep into the details (the underlying principles) to gain insight and be able to move from knowledge to real and “applicable” wisdom.

There are three pillars in any data-driven application, namely: Data Acquisition (DA), Information Processing (IP) and Knowledge Discovery (KD). Data Acquisition should cover not only the technical aspects of consuming services or data sources with different formats, coverage or scope, but also tackle the sociological, legal and limiting aspects of every data source (e.g. data provenance). Information Processing should be built over a solid mathematical framework, from data mining algorithms to simulation tools. Information Processing should also provide answers in terms of performance and precision to support the application concept. Lastly, Knowledge Discovery should serve as an interface from processed data to human perception, from metrics to their representation (e.g. dashboards). In some cases, integrating these into the current system may be critical.

The application must be developed on the three fronts simultaneously. It starts as a concept, a purely speculative idea, and moves to a proof of concept, a prototype application tested only over a simplified set of data. The proof of concept should then be tested further in a laboratory environment, in which data samples are produced artificially to feed the application and carry out performance and reliability tests, and then moved into a relevant environment. In a relevant environment the application should be capable of working with real-time data feeds, although not yet at an operational level; robustness and stability should be assessed at this stage. Finally, the application should be moved from the relevant environment to actual operations.
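The roadmap can be sketched as a small Python data structure: an ordered list of maturity stages, tracked separately for each of the three pillars. The stage and pillar identifiers below are hypothetical names chosen for this example, and the rule that the application as a whole is only as mature as its least mature pillar is an assumption, one possible reading of "developed on the three fronts simultaneously".

```python
from enum import IntEnum
from typing import Dict


class Stage(IntEnum):
    """Maturity stages, ordered from speculative idea to live system."""
    CONCEPT = 0
    PROOF_OF_CONCEPT = 1
    LABORATORY = 2
    RELEVANT_ENVIRONMENT = 3
    OPERATIONS = 4


# The three pillars of a data-driven application (DA, IP, KD).
PILLARS = ("data_acquisition", "information_processing", "knowledge_discovery")


def application_stage(pillar_stages: Dict[str, Stage]) -> Stage:
    """Overall maturity, taken here as the stage of the weakest pillar."""
    missing = [p for p in PILLARS if p not in pillar_stages]
    if missing:
        raise KeyError(f"missing pillars: {missing}")
    return min(pillar_stages[p] for p in PILLARS)


def next_stage(stage: Stage) -> Stage:
    """Stage that follows `stage`; OPERATIONS is the final one."""
    return stage if stage is Stage.OPERATIONS else Stage(stage + 1)


# Hypothetical project whose processing pillar lags behind the other two.
project = {
    "data_acquisition": Stage.LABORATORY,
    "information_processing": Stage.PROOF_OF_CONCEPT,
    "knowledge_discovery": Stage.RELEVANT_ENVIRONMENT,
}
overall = application_stage(project)
```

Under this reading, a polished dashboard or a robust data feed does not by itself move an application past the proof-of-concept stage if the processing behind it has only been tried on simplified data.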
