Roadmap for data-driven applications, from a concept to a fully integrated system

Data Science and Big Data applications have undoubtedly grown fast in recent years: the blooming of new data sources and the emergence of accessible, affordable cloud infrastructures have contributed widely to this movement. However, only a few applications reach the level of maturity necessary to become fully functional systems. Most data-driven applications never get beyond the “proof of concept” stage, and one of the main reasons is that they lack a solid plan for integration with current systems or a successful validation plan.

The path from raw data to wisdom is long and complex. Moving from data to information requires an understanding of the relations within the data and of the overall context. Knowledge is then reached only after fully understanding the patterns within that information. Finally, it is necessary to get deep into the details (the underlying principles) to gain insight and move from knowledge to real, “applicable” wisdom.

There are three pillars in any data-driven application: Data Acquisition (DA), Information Processing (IP) and Knowledge Discovery (KD). Data Acquisition should cover not only the technical aspects of consuming services or data sources with different formats, coverage or scope, but also the sociological, legal and other limiting aspects of every data source (e.g. data provenance). Information Processing should be built on a solid mathematical framework, from Data Mining algorithms to simulation tools, and should also provide answers in terms of performance and precision that support the application concept. Lastly, Knowledge Discovery should serve as an interface from processed data to human perception, from metrics to their representation (e.g. dashboards). In some cases, integrating these representations into the current system may be critical.
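To make the three pillars concrete, here is a minimal Python sketch of how they might be chained as separate stages. Everything in it is hypothetical: the `Record` fields, the licence check standing in for provenance and legal rules, the per-source mean standing in for a real mining or simulation step, and the text summary standing in for a dashboard.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Iterable


@dataclass
class Record:
    """One raw observation plus the provenance information DA must track."""
    source: str   # hypothetical source identifier, e.g. an airline or airport
    licence: str  # hypothetical usage terms attached to the source
    value: float


def acquire(raw: Iterable[dict], allowed_licences: set[str]) -> list[Record]:
    """Data Acquisition: normalise heterogeneous inputs and keep only records
    whose licence (a stand-in for provenance and legal checks) allows use."""
    records = []
    for item in raw:
        rec = Record(
            source=str(item["source"]),
            licence=str(item.get("licence", "unknown")),
            value=float(item["value"]),
        )
        if rec.licence in allowed_licences:
            records.append(rec)
    return records


def process(records: list[Record]) -> dict[str, float]:
    """Information Processing: placeholder computation (mean value per source)."""
    by_source: dict[str, list[float]] = {}
    for rec in records:
        by_source.setdefault(rec.source, []).append(rec.value)
    return {src: mean(values) for src, values in by_source.items()}


def discover(metrics: dict[str, float]) -> str:
    """Knowledge Discovery: turn metrics into a human-readable summary,
    a stand-in for a dashboard."""
    return "\n".join(f"{src}: {val:.2f}" for src, val in sorted(metrics.items()))


raw = [
    {"source": "airline_a", "licence": "open", "value": 12},
    {"source": "airline_a", "licence": "open", "value": 18},
    {"source": "airport_b", "licence": "restricted", "value": 40},
]
print(discover(process(acquire(raw, {"open"}))))
```

Keeping the stages as separate functions means each pillar can be validated and replaced independently, which matters once the application has to be integrated with existing systems.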

The application must be developed on all three fronts simultaneously, starting from a concept, a purely speculative idea, and progressing to a proof of concept, a prototype application tested only on a simplified data set. The proof of concept should then be tested further in a laboratory environment, in which data samples are produced artificially to feed the application and carry out performance and reliability tests, before being moved into a relevant environment. In a relevant environment the application should be capable of working with real-time data feeds, although not yet at an operational level, and robustness and stability should be assessed at this stage. Finally, applications should be moved from the relevant environment to actual operations.
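The maturity path above can be read as an ordered sequence of stages with a validation gate between each one. The following sketch encodes that reading; the stage names come from the text, while the `promote` function and its single pass/fail flag are illustrative assumptions rather than a prescribed process.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Maturity stages from the roadmap, in the order they must be reached."""
    CONCEPT = 0               # purely speculative idea
    PROOF_OF_CONCEPT = 1      # prototype tested on a simplified data set
    LABORATORY = 2            # artificial data, performance and reliability tests
    RELEVANT_ENVIRONMENT = 3  # real-time feeds, robustness and stability assessed
    OPERATIONS = 4            # integrated into actual operations


def promote(stage: Stage, validation_passed: bool) -> Stage:
    """Move to the next stage only if the current stage's validation passed.

    Stages cannot be skipped, and OPERATIONS is the final stage."""
    if stage is Stage.OPERATIONS or not validation_passed:
        return stage
    return Stage(stage + 1)


stage = Stage.CONCEPT
for passed in (True, True, False, True, True):
    stage = promote(stage, passed)
    print(stage.name)
```

A failed validation (the third step) keeps the application in the laboratory stage until it passes.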

ComplexWorld within ICRAT2014: Registration Open!

Following the success of its previous editions, the International Conference on Research in Air Transportation (ICRAT) has become established as a mainstream biennial event in air transport research, alternating with the USA/Europe Air Traffic Management (ATM) Research and Development (R&D) Seminar.

The next ICRAT will be held during the week of May 26th-30th, 2014 at the Istanbul Technical University in Istanbul, Turkey. In this 2014 edition, ICRAT will host ComplexWorld activities and will be organised in conjunction with the 4th International ATACCS Conference (HALA!), bringing together the two SESAR WP-E Research Networks.

In particular, ComplexWorld will organise two tutorials focusing on its research threads:

Massimiliano Zanin and Samuel Cristóbal (Innaxis) on how to apply Data Science in Aviation: stationarity and metrics

Thomas Hauf (University of Hannover) on Adverse Weather and ATM performance

To register for the event, please visit the ICRAT Conference Registration page.

Be sure to follow ComplexWorld (www.ComplexWorld.eu) and ICRAT (www.icrat.org) to get the latest updates on the event information and the other informative tutorials. ICRAT 2014 will be an excellent forum for young researchers within air transportation to share their work, expand their professional network and gain new knowledge and inspiration.

See you in Istanbul!

SecureDataCloud – applying secure computation to ATM data

Achieving efficient information sharing and coordination between the different stakeholders involved in air transport and ATM is nowadays considered one of the most important priorities in aviation, with potential benefits ranging from improved safety and reduced delays to more environmentally friendly operations. Despite this recognised priority, the management of the different types of information is at present split between organisations, divisions and stakeholders, and remains mostly isolated, with little cross-integration, due to organisational and institutional barriers that prevent the timely, free flow of relevant data.

A new research project called SecureDataCloud, coordinated by Innaxis, aims to solve this problem by means of secure computation techniques. Secure computation is the field of cryptology devoted to performing a computation while preserving the privacy of the inputs provided by the participating parties. Secure computation approaches are already used for several problems, with applications ranging from sealed-bid auctions, electronic voting and stock transactions to defense applications in military operations.
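As an illustration of the general idea (not of SecureDataCloud's own protocol, which is not described here), the sketch below uses additive secret sharing, one of the simplest secure computation techniques, to let several stakeholders learn the sum of their private values, for example total delay minutes, without revealing any individual value. It assumes honest-but-curious parties, private channels for exchanging shares and non-negative inputs whose total stays below the modulus, and it simulates all parties in a single process.

```python
import secrets

# Field modulus: the Mersenne prime 2**61 - 1. It must exceed any possible total.
PRIME = 2**61 - 1


def share(value: int, n_parties: int) -> list[int]:
    """Split `value` into n additive shares modulo PRIME.

    Any n-1 of the shares are uniformly random and reveal nothing about
    `value`; all n shares together sum to `value` mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def secure_sum(private_values: list[int]) -> int:
    """Simulate n parties computing the sum of their private inputs.

    Party i splits its input and sends share j to party j. Each party j adds
    up the shares it received and publishes only that partial sum; adding the
    published partial sums yields the total and nothing else."""
    n = len(private_values)
    dealt = [share(v, n) for v in private_values]  # dealt[i][j]: from party i to party j
    partial_sums = [sum(dealt[i][j] for i in range(n)) % PRIME for j in range(n)]
    return sum(partial_sums) % PRIME


# Hypothetical example: three stakeholders' private delay totals (minutes).
print(secure_sum([1200, 3400, 560]))
```

Each party only ever sees random-looking shares from the others; the single value that becomes public is the total.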

More information about the project, its partners, publications and results can be found on the project’s website, http://innaxis.org/securedatacloud/

Innaxis looks forward to sharing its work in ATM at SIDS 2013 in Stockholm

As every year, many of us are getting ready to participate in the SESAR Innovation Days, organised by Eurocontrol and the SJU, this year in Stockholm.

In 2013, Innaxis has been particularly busy in this research field and, just as in previous editions, we will be especially active during the SID. With the goal of stimulating discussions with you in different research areas, this email gives you a short briefing on how we will be participating in this important event in Stockholm:

The ComplexWorld network has come a long way in the last three years. Paula López (plc@innaxis.org) will give a presentation on the first day providing details of the different activities and how the network is preparing for 2014. On Day 3 the network will hold the satellite event “Complex Metrics in ATM” and a PhD session. Please do not hesitate to talk to Paula (plc@innaxis.org) if you are interested in more information on the network’s activities.

At Innaxis we have been working on passenger-oriented metrics for a few years now, crafting a detailed tool to compute those metrics with a focus on the 4-hour door-to-door challenge of FlightPath2050 and now Horizon2020. On the second day of SID (27th Nov), our colleagues from the University of Westminster will present this tool and its initial results. As part of our efforts to improve the way performance is assessed in air transport, this year we also worked on other case studies, including scenarios in which no tool set is available to correctly design ATM operational concepts. To tackle the challenges of changing ATM while remaining in control of performance assessment, it is critical to look into new ways of estimating KPIs. This is what the tool set developed by CASSIOPEIA has accomplished. A poster about this project and its tool set will be available at SID.

If you would like more information on the four hour door-to-door challenge in which Innaxis is currently engaged, the latest passenger metrics developed in POEM, or the most recent agent-based CASSIOPEIA modelling framework, please do not hesitate to contact the architect of these design tools for ATM, Samuel Cristóbal (scristobal@innaxis.org) who will be at the conference over the whole week.

Data Science has been an area of major interest at Innaxis over the last few years, and in October we organised the first Data Science Workshop for Air Transport, held in Madrid. We are working on different elements of an infrastructure to enable major data mining work for air transport on several fronts: from evaluating current delay propagation and the resilience of airports and airlines against disturbances, to evaluating new paradigms in safety monitoring, all based on powerful data analytics. We are very proud of our advances in this area. On Tuesday the 26th, our colleague Massimiliano Zanin will present a paper comparing traffic density as measured today (i.e., the number of aircraft crossing a sector) with other measures based on data analytics. If you would like any information about this research topic, please contact Mass directly (mz@innaxis.org).
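The conventional density measure mentioned above, the number of aircraft crossing a sector, can be sketched as a count of distinct flights observed inside the sector during a time window. This is not the paper’s data-analytics measure, and it rests on stated assumptions: the sector is a latitude/longitude box (real sectors are 3-D volumes with altitude bounds), `TrackPoint` is a hypothetical record type for sampled radar positions, the flight identifiers are made up, and a flight that crosses the sector only between two samples would be missed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrackPoint:
    """Hypothetical sampled position of one flight."""
    flight_id: str
    t: float    # seconds since the start of the analysis period
    lat: float  # degrees
    lon: float  # degrees


def aircraft_crossing_sector(points, lat_range, lon_range, t_start, t_end):
    """Count distinct flights with at least one sampled point inside the
    rectangular sector during the time window [t_start, t_end)."""
    lat_min, lat_max = lat_range
    lon_min, lon_max = lon_range
    flights_inside = {
        p.flight_id
        for p in points
        if t_start <= p.t < t_end
        and lat_min <= p.lat <= lat_max
        and lon_min <= p.lon <= lon_max
    }
    return len(flights_inside)


points = [
    TrackPoint("IBE123", 100, 40.5, -3.6),
    TrackPoint("IBE123", 160, 40.7, -3.4),  # same flight seen twice: counted once
    TrackPoint("RYR456", 200, 41.2, -2.9),
    TrackPoint("AFR789", 300, 45.0, 2.0),   # outside the sector
]
print(aircraft_crossing_sector(points, (40.0, 42.0), (-4.0, -2.0), 0, 3600))
```

Because the count ignores how long each aircraft stays and how the flights interact, it is a natural baseline against which richer, data-driven measures can be compared.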

Information Management has also been an area of interest for us. In particular, we think the Data Science paradigms will only be fully enabled if data is shared across stakeholders, and this can be achieved only if the right secure, encrypted mechanisms are put in place. We will present a poster on a number of SecureDataCloud ideas for ATM, which will be useful on different fronts, including safety and fuel consumption. You should also talk to Mass if Information Management is your area of interest.

Last, but not least, we will also serve as rapporteurs and we will help Eurocontrol to extract some conclusions as well as provide our own views on future research avenues. Carlos Álvarez will take care of this during the closing session. Please, contact Carlos (calvarez@innaxis.org) if you feel inspired by his words!

We hope we will have many opportunities to interact next week and that you find our activities interesting and motivating for future initiatives.

See you in Stockholm!

Innaxis brings together leading experts for Data Science in Aviation workshop

Nearly 60 industry experts, academics and professionals from the fields of data science and aviation gathered in Madrid on 15 October to attend the first ComplexWorld Network workshop, ‘Data Science in Aviation’. The workshop gave experts in the field the opportunity to discuss ways in which knowledge could be extracted from aviation data to enhance our understanding of the air transport system’s behaviour and the complex relations among its elements.

The workshop was motivated by the challenge of extracting groundbreaking insights from the large quantities of data collected in the air transport network. The aviation sector gathers and stores a large amount of unstructured, heterogeneous data (safety data and reports, flight plans, navigation data, airport data, radar tracks) from multiple sources (airlines, ANSPs and airports). While the collection of information through different data sensors is growing exponentially, the application of data science to that data has not kept pace. The workshop looked at how to capture the new opportunities offered by the data and close the large gap between its potential and the current outcomes of its analysis.

Innaxis, as coordinator of the ComplexWorld Network through which the workshop was supported, led the Data Science in Aviation workshop initiative. Innaxis brought extensive IT expertise and experience in data-science analysis techniques to the workshop, developed through the various research programmes in which it works and through its exposure to data science applications in a variety of fields.

The outcomes of the workshop will be made available shortly. It is our hope that these outcomes, which include new research ideas and discussions from this dynamic meeting of experts, will result in greater discussion and debate around the topic from the community as a whole. So please keep in touch and check back if you’d like to be involved in the ongoing development in this area.

Wh- Questions about Data Science

The five Wh’s of Data Science – What, Why, When, Who and Which.

While preparing the upcoming October workshop on Data Science, Innaxis has gathered wh- questions and simple answers about the “new reality” of data science. We also provide links to pages where more information about these important questions can be found.

What?

The basic answer to what Data Science is could be “a set of fundamental principles that support and guide the principled extraction of information and knowledge from data”. Definitions, especially of new terms, should remain simple despite the urge to make them complicated. Furthermore, the boundaries between Big Data, Data Science, Statistics and Data Mining are not easily discernible: they share common principles and tools and, importantly, the same aim, namely the extraction of valuable information.

Why?

What is the reason for extracting information from data? There is a brilliant quote by Jean Baudrillard: “Information can tell us everything. It has all the answers. But they are answers to questions we have not asked, and which doubtless don’t even arise.” In this context, proper data science is generally neither basic science nor long-term research; it is considered an extremely valuable resource for creating business. It mines large amounts of both structured and unstructured data to identify patterns that can directly help an organization reduce costs, create customer profiles, increase efficiency, recognize new market opportunities and enhance its competitive advantage.

When?

Throughout history, an extensive list of names has been given to a well-known duality, information = power, from medieval censuses to Royal Navy strategies based on statistical analysis. As for the current understanding of Data Science, the term has moved from being a synonym for Data Analysis in the early 20th century to being associated, from the 1990s, with Knowledge Discovery (KD). One of the best compilations of data science history and publications over the last 60 years can be found in this Forbes article.

The methods and tools used have changed over time, developing as mathematical and data extraction techniques and software and hardware capabilities have advanced in recent years. The consequent “sudden” explosion in Data Science jobs, which reflects the market’s real interest in the potential benefits of knowledge extraction, is illustrated by the following graph from LinkedIn analytics:

Courtesy LinkedIn Corp.

Who?

If you are a lawyer or a doctor, everybody knows roughly what your university education was and the nature of your daily tasks. What, then, is a “Data Scientist”? The paths that lead to a Data Science career are not clearly defined and are difficult to identify. The so-called “Sexiest Job of the 21st Century” (according to the Harvard Business Review) still needs a common definition and even specific university degrees. The data jockeys who have long been employed on Wall Street are no longer alone: the scope and variety of data now available is a non-stop, growing force, and people with operational, statistical and even hacking backgrounds are welcome to extract value from it. More information about data scientist careers and the main disciplines can be found in this excellent article from naturejobs.com.

To understand Data Science job titles, we also recommend this article by Vincent Granville from DataScienceCentral. It is a living tongue twister: data mining done by a data scientist on data scientist job titles. In short, the result resembles the following recipe: take a kitchen mixer; add the words “Data”, “Analytics” and “Scientist”; switch it on; include an institutional label such as “Director”, “Junior” or “Manager”. An optional topping could be your university degree, “Engineer” or “Mathematician”. There you have one of the possible titles of today’s data scientists.
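Taken literally, the recipe is a small combinatorial generator. This tongue-in-cheek Python sketch enumerates the titles it can produce; the word lists come from the recipe, while mixing only two- and three-word orderings of the core words is our own assumption.

```python
import itertools

CORE_WORDS = ["Data", "Analytics", "Scientist"]
LABELS = ["Director", "Junior", "Manager"]
DEGREES = [None, "Engineer", "Mathematician"]  # None: optional topping left out


def job_titles():
    """Yield every title the recipe can produce: an institutional label,
    an ordering of two or three core words, and an optional degree."""
    for k in (2, 3):
        for core in itertools.permutations(CORE_WORDS, k):  # "switch the mixer on"
            for label in LABELS:
                for degree in DEGREES:
                    words = [label, *core] + ([degree] if degree else [])
                    yield " ".join(words)


titles = list(job_titles())
print(len(titles), titles[:3])
```

Even this tiny vocabulary yields over a hundred distinct titles, which is roughly the point the article makes.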

Which?

Which data is “datascience-able”? As we described in our previous post about Data Science, there is huge potential in almost every imaginable field that can provide data of sufficient quality for analysis. However, even where the data is available, challenges remain, generally connected with data storage and management capabilities. These challenges are covered in detail in the Innaxis blog post “The benefits and challenges of Big Data”. One of the remarkable and exciting things about Data Science is that additional knowledge can be extracted from data sets that, at first sight, are not expected to provide anything beyond the obvious potential of so-called “direct” data sets. In reality, it is hard to know which data sets will add value before testing them with Data Science. Once discovered, hidden patterns and unseen correlations often add more valuable knowledge to organisations than direct cause-and-effect relationships. They represent being one step ahead, which is crucial in the highly competitive world we live in.

By Héctor Ureta – Collaborative R&D Aerospace Engineer at Innaxis

Innaxis presents paper on new approach to safety at the USA and Europe ATM R&D Seminar

Safety is a critical aspect of air traffic management and receives significant attention from the research community. This criticality slows innovation in different safety aspects, ensuring that only well-known, established procedures and technologies are applied. In this context, new ways of innovating in safety assessment techniques could not only bring about a significant change in safety levels but also enable new technologies and procedures through easier, more straightforward safety analysis.

Innaxis is a firm believer in the potential of complex network analysis and the power of Data Science techniques. We will present the paper “Synchronization Likelihood in Aircraft Trajectories” at the next USA/Europe Air Traffic Management R&D Seminar, held from June 10 to 13, 2013 in Chicago. We strongly believe these techniques will lay the foundation for new ways of analysing safety levels in different contexts, from providing new techniques to correctly evaluate the safety levels of large airspace blocks to developing predictive analytics that would assist in the implementation of new automation technologies.
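For readers unfamiliar with the measure, the sketch below implements a simplified two-signal variant of synchronization likelihood: each signal is time-delay embedded, and for every time step we check whether the moments when one signal “recurs” (returns close to its current state) coincide with the moments when the other recurs. It is an illustrative assumption, not the formulation used in the paper; treating each aircraft trajectory as a single 1-D series (for example, one coordinate over time) and the parameters `m`, `lag`, `w1` and `p_ref` are all hypothetical choices.

```python
import numpy as np


def embed(x: np.ndarray, m: int, lag: int) -> np.ndarray:
    """Time-delay embedding: row i is (x[i], x[i+lag], ..., x[i+(m-1)*lag])."""
    n = len(x) - (m - 1) * lag
    return np.stack([x[k * lag : k * lag + n] for k in range(m)], axis=1)


def synchronization_likelihood(x, y, m=3, lag=1, w1=5, p_ref=0.05) -> float:
    """Simplified two-signal synchronization likelihood.

    At each time i, the recurrences of x are the times j (with |i - j| > w1)
    whose embedded state lies within the p_ref quantile of distances from
    x's state at i; recurrences of y are defined the same way. The result is
    the mean fraction of x-recurrences that are also y-recurrences: close to
    p_ref for independent signals and 1 for identical ones."""
    X = embed(np.asarray(x, dtype=float), m, lag)
    Y = embed(np.asarray(y, dtype=float), m, lag)
    n = min(len(X), len(Y))
    X, Y = X[:n], Y[:n]
    times = np.arange(n)
    fractions = []
    for i in range(n):
        far = np.abs(times - i) > w1  # exclude temporally close points
        dx = np.linalg.norm(X[far] - X[i], axis=1)
        dy = np.linalg.norm(Y[far] - Y[i], axis=1)
        rec_x = dx <= np.quantile(dx, p_ref)
        rec_y = dy <= np.quantile(dy, p_ref)
        # rec_x is never empty: the minimum distance is always <= the quantile
        fractions.append(np.count_nonzero(rec_x & rec_y) / np.count_nonzero(rec_x))
    return float(np.mean(fractions))


t = np.linspace(0, 20, 300)
rng = np.random.default_rng(0)
print(synchronization_likelihood(np.sin(t), np.sin(t)))  # identical signals
print(synchronization_likelihood(rng.normal(size=300), rng.normal(size=300)))
```

Values near 1 indicate that the two trajectories revisit similar states at the same moments, while values near `p_ref` indicate no detectable coupling.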

If you are attending the Seminar, please come to Massimiliano Zanin’s presentation of the paper, and do not hesitate to contact him if this area is of interest to you. Massimiliano can be reached at mz@innaxis.org.

The Federal Aviation Administration and the EUROCONTROL Organization will host the Tenth USA/Europe Seminar on ATM R&D June 10-13, 2013 in Chicago, IL, USA.

Turning Big Data into Knowledge

In this two-part blog post we first look at the emergence of Big Data and the challenges it brings. In the next post we take a look at how these challenges are being addressed and the benefits this will unlock.

The emerging challenge of Big Data

Over the first years of the third millennium, worldwide digital data grew enormously, going from scarce to super-abundant. Whether produced by high-tech scientific experiments or simply compiled from the now ubiquitous sources of automatic data collection in ordinary, everyday transactions, this new reality of “Big Data” (or, to be visually precise, “BIG DATA”) has created the need for large-scale management and storage of data that cannot be handled with conventional tools.

Data management tools and hard drive capacity are not increasing fast enough to keep up with this worldwide explosion in digital data. While in economic production we are increasingly asked to “do more with less”, with data we are increasingly asked to “do more with more”.

What is the impact of this new reality, and what are its potential benefits for the world of scientific research? In a world where economies, political freedom, social welfare and cultural growth increasingly depend on our technological capabilities, Big Data management, and most importantly the knowledge that can be obtained from it, has enormous potential to benefit individual organizations.

This topic is covered in two parts: this first part includes an introduction and the main Big Data sources; the second part explains the challenges and benefits of Big Data.

Sources of Big Data

There are two common sources of Big Data. On the one hand, there are scientific experiments and instruments, which were the first origin of specific Big Data study, mostly in physics, at either macro or micro spatial scales. In the natural sciences, some biology studies, in particular DNA research, have more recently started to make use of Big Data too.

On the other hand, another significant source is simply “everyday” data: the vast quantity of information that is now collected every day at millions of points of citizen interaction, through billions of embedded sensors worldwide.

Prepare for some big numbers:

Physics: Large Hadron Collider (LHC):

The world’s largest and highest-energy particle accelerator, and one of the greatest engineering milestones ever achieved, the LHC produces around 25 petabytes of raw data per year, capturing information on more than 300 trillion (3×10^14) proton-proton collisions. Managing this information is not easy, even with the world’s largest computing grid (170 computing centres in 36 countries). The extraction of information and knowledge from these huge datasets enabled the recent discovery of the Higgs boson, or “God particle”, a discovery that will probably result in the team behind it being awarded the 2013 Nobel Prize in Physics.

Astronomy:

When the telescope of the Sloan Digital Sky Survey (SDSS) began operating in 2000, it collected more data in one week than had been amassed in the entire history of astronomy. The new Large Synoptic Survey Telescope (LSST), due to start in 2020, will store in 5 days the same amount of data that SDSS will have collected over the 13 years since its inception. Storing and processing these massive data sets from gigapixel telescopes on the Earth’s surface and in space requires very specific tools that have been beyond the current state of the art. Consequently, astronomy, in its effort to create the most accurate “universe map”, is one of the leading protagonists in the field of Big Data.

Everyday data

This is the kind of data gathered by countless automatic recording devices that log what, how and where we purchase, where we go and more. It is truly remarkable how our lives have changed in the last couple of decades. All of these changes, together with the inherent multiplication of consumption and goods, the resulting transactions, communications and more, are being captured through hundreds of receptors. In addition, user-generated content such as digital media files, videos, photos and blogs is being generated and stored on an unprecedented scale. Our locations (GPS, GLONASS, Galileo), money transactions (credit cards, NFC payments, etc.), several forms of communication and even what we think and do in our free time (via social networks) are being collected by different corporate and government bodies.

One of the most thorough studies to date, published in the journal Science in 2011, estimated that humanity was able to store around 295 exabytes (1 exabyte = 1,000,000 terabytes) of data in 2007. Global data in 2009 was calculated to have reached 800 exabytes, while by the end of 2013 it is forecast to exceed 3 zettabytes (3×10^21 bytes). Impressive. Many challenges obviously arise from a roughly 60% yearly increase in the data to be handled, and issues abound in relation to how to process and extract useful information from what is 95% raw data.
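The growth rate implied by the quoted estimates can be checked with a compound annual growth rate (CAGR) calculation. The sketch below uses only the figures given above, with 3 zettabytes expressed as 3,000 exabytes.

```python
def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate: the constant yearly rate r such that
    start * (1 + r) ** years == end."""
    return (end / start) ** (1 / years) - 1


# Estimates (and the 2013 forecast) quoted in the text, in exabytes.
estimates_eb = {2007: 295, 2009: 800, 2013: 3000}

years = sorted(estimates_eb)
for y0, y1 in zip(years, years[1:]):
    rate = cagr(estimates_eb[y0], estimates_eb[y1], y1 - y0)
    print(f"{y0}-{y1}: {rate:.0%} per year")
```

The 2007-2009 interval implies roughly 65% per year, close to the ~60% quoted, while the 2009-2013 forecast implies about 39% per year.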

In our next post we’ll look at how these challenges are being addressed.

Data Science and Complex Systems applied to Aviation – Innaxis Workshop

Businesses have entered a new era of decision-making and managing principles due to the pervasive availability of large amounts of data and the drastic growth, in the last decade, in the capacity to store and process data. Aviation is not an exception; Data Science principles have started to emerge through research programmes and practical applications in the field, albeit more slowly in some business functions than others.

Data Science, as a set of fundamental principles that support and guide the principled extraction of information and knowledge from data, leans on well-known data-mining techniques. However, it goes far beyond these techniques, with successful data-science paradigms that provide specific application guidelines. Data-driven decision making involves principles, processes and techniques for understanding phenomena via the automatic analysis of data.

A data-analytic thinking approach will help to envision opportunities for improving data-driven decision making in different contexts. There is strong evidence that aviation performance can be improved substantially via data-driven decision making and data-science techniques drawing on big data. Data science will support data-driven decision making in aviation, but for it to realize its potential, the underlying principles in this field have yet to be established.

Innaxis participates in various research programmes and works on different applications in this field. We will be organizing a workshop on Data Science applied to Aviation in Madrid, Spain, in October 2013. Please write to us at innovation@innaxis.org if you find this of interest and would like to receive information on the workshop (please state “Data Science workshop” in the subject).
