Entry-level/Junior Data Scientist or Data Engineer

Innaxis is currently seeking Data Scientists and Engineers to join its research and development team based in Madrid, Spain. We are looking for talented, highly motivated individuals who want to pursue and lead a career outside the more mainstream, conventional alternatives. Individuals with a great dose of imagination, problem-solving skills, flexibility and passion are encouraged to apply.

  • As a Data Engineer, you will help the team design and integrate complete solutions for Big Data architectures, from data acquisition and ETL processes through storage and delivery for analysis, using the latest technologies and solutions for optimal performance.
  • As a Data Scientist, you will mainly assist the team in understanding, analysing and mining data, but also in preparing it and assessing its quality. You will also develop methods for data fusion and anonymization. Ultimately, your goal will be to extract the best knowledge and insights from data, despite technical limitations and while complying with regulatory requirements.

About Innaxis

If not unique, Innaxis is at the very least unconventional: it is a private, independent, non-profit research institute focused on Data Science and its applications, most notably in aviation, air traffic management and mobility, among other areas.

As an independent entity, Innaxis determines its own research agenda and now has a decade of experience in European research programmes, with more than 30 projects successfully executed. New projects and initiatives are evaluated continuously, and the institute is open to new opportunities and ideas proposed within the team.

The Innaxis team consists of a highly interdisciplinary group of scientists, developers, engineers and programme managers, together with an extensive network of external partners and collaborators, from private companies to universities, public entities and other research institutes.

Skills wanted

Our team works very closely together on a daily basis, so broader knowledge means much better coordination. There is therefore a single list of skills ideally wanted for both positions. These skills are then weighted as requirements or “bonus points” according to the candidate’s position of interest, i.e. Data Scientist or Data Engineer.

  • University degree, MSc or PhD in Data Science, Computer Science or a related field, provided all other requirements are met.
  • No professional experience is required, although it will be positively evaluated.
  • Proficiency in a variety of programming languages, for instance Python, Scala, Java, R or C++, and up to date on the newest software libraries and APIs, e.g. TensorFlow or Theano.
  • Experience with the acquisition, preparation, storage and delivery of data, including concepts ranging from ETL to Data Lakes.
  • Knowledge of the most commonly used software stacks such as LAMP, LAPP, LEAP, OpenStack, SMACK or similar.
  • Familiar with some of the IaaS, PaaS and SaaS platforms currently available such as Amazon Web Services, Microsoft Azure, Google Cloud and similar.
  • Understanding of the most popular knowledge discovery and data mining problems and algorithms: predictive analytics, classification, MapReduce, deep learning, random forests, support vector machines and the like.
  • Continuous interest in the latest technologies and developments, e.g. blockchain or Terraform.
  • Excellent English communication skills; English is the working language at Innaxis.
  • And of course, great doses of imagination, problem solving skills, flexibility and passion.

Benefits

The successful candidate will be offered a position at Innaxis as a Data Scientist or Data Engineer, including a unique set of benefits:

  • Being part of a young, dynamic, highly qualified, collaborative and heterogeneous international team.
  • Great flexibility in many aspects, including working hours, compatibilities and location, and excellent working conditions.
  • A horizontal hierarchy, all researchers’ opinions matter.
  • A long-term, stable position. Innaxis has been growing steadily since its foundation ten years ago.
  • A fair salary according to the nature of the institute and adjusted to skills, experience and education with continuous revision.
  • Independence: given Innaxis’ non-profit, research-focused nature, the institute is driven by different forces than the private sector, free of commercial and profit interests.
  • The possibility of developing a unique career outside of the mainstream of academia, private companies and consulting.
  • No outsourcing whatsoever, all tasks will be performed at Innaxis offices.
  • An agile working methodology: Innaxis recently adopted JIRA/Scrum, and all research is documented on a collaborative Confluence wiki.

Apply

Interested candidates should send their CV, a research interest letter (around 400 words) and any other relevant information supporting their application to recruitment@innaxis.org. You will then be contacted and a personal selection process will begin.

 

SafeClouds kick off meeting


SafeClouds.eu, an H2020 big data for safety project coordinated by Innaxis, kicked off earlier this month.

The project brings together Innaxis, as coordinator, and 15 additional entities (including airlines, ANSPs, EASA, Eurocontrol and various research entities) from 8 different countries.
The aim of SafeClouds is to improve aviation safety by developing state-of-the-art big data and data analysis tools. The consortium will build a coordinated platform to combine and share data among different aviation actors.

Information, time, knowledge

We live in a world that gathers exponentially increasing amounts of information and data from endless sources, with only limited time to analyse it.

What is the current speed of “creating” information/data? What about knowledge/wisdom? What is the role of Data Science and Big Data in this context?

Food for thought for your well-deserved summer break! Enjoy, recharge your batteries and get ready for a 2016/2017 year full of cutting-edge research, innovation (and Innaxis blog posts!).

 


Innaxis at the EASA-OPTICS conference, Cologne, 12-14 April

Developing the future of a safe and growing aviation business, while also reassuring the travelling public that it is safe to fly, is a major vision for both EU and national aviation policies. However:

What role do policy makers play?

What are the recent, implemented safety measures?

Who is guiding the safety topics within aviation research?

EASA, the European Commission, the Advisory Council for Aviation Research and Innovation in Europe (ACARE) and the EU’s OPTICS project organised a three-day event in Cologne (12-14 April) to provide answers to these imperative questions and, furthermore, to define the way forward for ensuring continued aviation safety in Europe. The event featured a number of presentations and workshops across several aviation safety areas.

Two Innaxis team members, David Perez (dp@innaxis.org) and Hector Ureta (hu@innaxis.org), attended the event and took part in several of the workshops, explaining how Data Science and Big Data can boost aviation safety. Hector also presented some of the latest data science techniques and tools in safety research, based on the SESAR COMPASS project, on the third day of the event.

 


Hector Ureta (Innaxis) presenting the Data Science research done in COMPASS (Cologne 14 April 2016)

 

The presentation, “Data science and data mining techniques to improve aviation safety: features, patterns and precursors”, is available online in this link.

If you’d like further information about data science in aviation, big data, or the aviation safety research carried out by Innaxis, please feel free to contact the Innaxis team (innovation@innaxis.org).

 


More details of the event are available on the EASA and OPTICS websites.


Innaxis at SIDs 2015

Every year we are excited to participate in the SESAR Innovation Days (SIDs), organised by Eurocontrol and the SJU, which this year take place in Bologna, Italy. In 2015 Innaxis has been particularly busy with long-term aviation research and, as in previous SIDs, we will be especially engaged during the event. We look forward to discussing many innovative research topics and providing an update on Innaxis’ efforts. These include:

 

  • The ComplexWorld network has greatly evolved within the last five years. On the first day of SIDs, the network coordinator, Paula Lopez, will present an overview of the ComplexWorld evolution since it was launched with special emphasis on the key 2015 outcomes and 2016 initiatives. Please feel free to reach out to Paula (plc@innaxis.org) if you are interested in obtaining more information on the network activities.
  • At Innaxis we have been working on new air transport metrics and indicators for the last few years. We have been crafting a tool to compute those metrics against real traffic data along with advanced visual tools to help understand these complex metrics. On the day before SIDs officially commences, Monday Nov. 30, we will be hosting a workshop on air transport resilience metrics: The 2015 Resilience2050 Workshop. Additional information along with free registration can be found here. Please contact Hector Ureta (hu@innaxis.org) for further information on the workshop and/or resilience research.
  • The EC’s four-hour door-to-door challenge warrants more effort to bring everyone onto the same page. Building new modelling tools, metrics and data analysis capabilities will help us understand how we may achieve this goal. Innaxis has strong expertise in mobility, with coordination efforts in the Horizon 2020 Coordination and Support Action DataSET 2050, along with the recent SESAR CASSIOPEIA agent-based modelling framework. These research initiatives may be of interest to you if you work on mobility. Please do not hesitate to contact our architect of mobility tools for ATM, Samuel Cristóbal (sc@innaxis.org), or Jorge Martin (jm@innaxis.org), who will be at the conference.
  • Exploring trade-offs between different stakeholders has always been one of the main research priorities within Innaxis. For this particular SIDs Innaxis has liaised with the University of Westminster and Belgocontrol to present the paper “Controller time and delay costs -a trade-off analysis”. The paper will be presented within the technical sessions of the SIDs.
  • Data Science has also been an area of major interest at Innaxis over the last few years. We are working on different elements of a big data / data science infrastructure to enable major data mining efforts within air transport, including current delay propagation evaluations, airport and airline resilience against disturbances, and an evaluation of new paradigms for safety monitoring, all of which are contingent on powerful deep analytics. We have advanced very far in this area, of which we are very proud. Our colleague Massimiliano Zanin will be at the conference and can speak to these efforts; feel free to contact him at mz@innaxis.org.
  • In addition, complex network theory has also been prioritised within Innaxis’ research efforts and is increasingly used to study the air transport system by defining static or dynamic structures that characterise how airports are connected. Our ComplexWorld PhD student, Seddik Belkoura, will present a poster entitled “A young person’s guide to the reconstruction of air transport networks”, depicting how the sampling processes involved in the construction of such structures can affect the topological stability of the final system representation. Please contact him (sb@innaxis.org) if this is of interest to you.
  • Information Management has also been an area of interest for us. In particular, we think Data Science paradigms can only be fully enabled if data is transparently shared across stakeholders, which in turn requires that the right secure, encrypted mechanisms be put in place. Related to this, the Innaxis team will present a talk on the main results of the SecureDataCloud project. Again, please reach out to Massimiliano Zanin should this be your area of interest.
  • Last, but not least, we will also serve as SIDs rapporteurs and help Eurocontrol extract some conclusions as well as provide our own views on future research avenues. Carlos Alvarez will lead this during the closing session. Please, contact Carlos (calvarez@innaxis.org) if you’d like to continue the conversation!

 

We hope we have many opportunities to interact next week, and that you find our activities interesting and motivating for future initiatives.
See you soon in Bologna!

Roadmap for data-driven applications, from a concept to a fully integrated system

Doubtless, Data Science and Big Data applications have grown fast in recent years; the blooming of new data sources and the emergence of accessible, affordable cloud infrastructures have contributed widely to this movement. However, only a few applications reach the level of maturity necessary to become fully functional systems. Most data-driven applications peak at the “proof of concept” stage, and one of the main reasons is the lack of a solid integration programme with current systems or of a successful validation plan.

The path from raw data to wisdom is long and complex. Moving from data to information requires comprehension of data relations and the overall context. Knowledge is then reached only after fully understanding the patterns within information. Finally, one must get deep into the details (the underlying principles) to gain insight and be able to move from knowledge to real, “applicable” wisdom.

There are three pillars in any data-driven application: Data Acquisition (DA), Information Processing (IP) and Knowledge Discovery (KD). Data Acquisition should cover not only the technical aspects of consuming services or data sources with different formats, coverage or scope, but also tackle the sociological, legal and limiting aspects of every data source (e.g. data provenance). Information Processing should be built on a solid mathematical framework, from data mining algorithms to simulation tools; it should also provide answers in terms of performance and precision to support the application concept. Lastly, Knowledge Discovery should serve as an interface from processed data to human perception, from metrics to their representation (e.g. dashboards). In some cases, integration with the current system may be critical.
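The three pillars above can be sketched as three decoupled stages of a toy pipeline. Everything here is a hypothetical illustration: the `acquire`/`process`/`discover` function names and the sample records are ours, not part of any Innaxis system.

```python
# Minimal sketch of the three pillars of a data-driven application.
# All names and the sample data are illustrative assumptions.

def acquire(raw_rows):
    """Data Acquisition: normalise heterogeneous records and keep provenance."""
    records = []
    for source, line in raw_rows:
        fields = [f.strip() for f in line.split(",")]
        records.append({"source": source,          # provenance of each record
                        "flight": fields[0],
                        "delay_min": float(fields[1])})
    return records

def process(records):
    """Information Processing: a simple aggregate over the normalised records."""
    delays = [r["delay_min"] for r in records]
    return {"n": len(delays), "mean_delay": sum(delays) / len(delays)}

def discover(metrics):
    """Knowledge Discovery: render the metrics for human consumption."""
    return f"{metrics['n']} flights, mean delay {metrics['mean_delay']:.1f} min"

raw = [("airline_A", "IB3170, 12.0"), ("airline_B", "BA456, 4.0")]
print(discover(process(acquire(raw))))  # -> 2 flights, mean delay 8.0 min
```

Keeping the three stages as separate functions mirrors the point made above: each pillar can be validated, swapped out or scaled independently of the others.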

The application must be developed on all three fronts simultaneously, starting with a concept (a purely speculative idea) and moving to a proof of concept: a prototype application tested only on a simplified data set. The proof of concept should then be further tested in a laboratory environment, where data samples are produced artificially to feed the application and carry out performance and reliability tests, before being leveraged into a relevant environment. In a relevant environment the application should be capable of working with real-time data feeds, although not yet at an operational level; robustness and stability should be assessed at this stage. Finally, applications move from the relevant environment into actual operations.

Wh- Questions about Data Science

The five Wh’s of Data Science – What, Why, When, Who and Which.

While preparing the upcoming October workshop on Data Science, Innaxis has gathered wh- questions and simple answers about the “new reality” of data science. We also provide links to pages where more information about these important questions can be found.

What?

The basic answer to what Data Science is could be “a set of fundamental principles that support and guide the principled extraction of information and knowledge from data”. Definitions, especially of new terms, should remain simple despite the urge to make them complicated. Furthermore, the boundaries between Big Data, Data Science, Statistics and Data Mining are not so discernible; they include common principles and tools and, importantly, share the same aim: the extraction of valuable information.

Why?

What is the reason for extracting information from data? There is a brilliant quote by Jean Baudrillard: “Information can tell us everything. It has all the answers. But they are answers to questions we have not asked, and which doubtless don’t even arise.” In this context, proper data science is generally neither basic science nor long-term research; it is considered an extremely valuable resource for creating business value: mining large amounts of both structured and unstructured data to identify patterns that can directly help an organization cut costs, create customer profiles, increase efficiencies, recognize new market opportunities and enhance its competitive advantage.

When?

Throughout history, an extensive list of names has been given to a well-known duality, information = power: from the Middle Ages census to Royal Navy strategies based on statistical analysis. Concerning the current understanding of Data Science, the name has moved away from being a synonym for Data Analysis in the early 20th century to being associated, since the 1990s, with Knowledge Discovery (KD). One of the best compilations of data science history and publications over the last 60 years can be found in this Forbes article.

Over time, the methods and tools used have changed, developing as mathematical, extraction, software and hardware capabilities have increased in recent years. The consequent “sudden” eruption of Data Science jobs, which reflects the market’s real interest in the potential benefits that knowledge extraction offers, is visually described by the following graph taken from LinkedIn analytics:

Courtesy LinkedIn Corp.

Who?

If you are a lawyer or a doctor, everybody knows more or less your level of university education and the nature of your daily tasks. What, then, is a “Data Scientist”? The paths that lead to a Data Science career are not so well defined and are difficult to identify. The so-called “Sexiest Job of the 21st Century” (according to the Harvard Business Review) needs a common definition and even specific university degrees. The data jockeys who have always been employed on Wall Street are no longer alone. Meanwhile, the scope and variety of data now available is a non-stop, growing force, resulting in operational, statistical and even hacking backgrounds being welcome to extract value from it. More information about data scientist careers and the main disciplines can be found in this excellent article from naturejobs.com.

To understand Data Science job titles, we also recommend this article by Vincent Granville from DataScienceCentral. It’s a living tongue twister: data mining done by a data scientist on data scientist job titles. Summed up, it is pretty similar to the following recipe: take a mixer from the kitchen; add the words “Data”, “Analytics”, “Scientist”; switch it on; include some institutional label such as “Director”, “Junior” or “Manager”. An optional topping could be your university degree: “engineer”, “mathematician”. There you have one of the possible names of a current data scientist.

Which?

Which data is “datascience-able”? As we described in our previous post about Data Science, there is huge potential in almost every imaginable field that can provide sufficient-quality data for analysis. However, even where the data is available, there are challenges, generally connected with data storage and management capabilities. These challenges are covered in detail in the Innaxis blogpost “The benefits and challenges of Big Data”. One of the remarkable and exciting things about Data Science is that there is additional knowledge to be extracted from data sets that, at first sight, are not expected to provide anything beyond the obvious potential of so-called “direct” datasets. The reality is that it’s hard to know which data sets will add value before testing them with Data Science. When discovered, hidden patterns and unseen correlations add more valuable knowledge to organisations than direct cause-and-effect relationships: they represent being one step ahead, which is crucial in the highly competitive world in which we live.

By Héctor Ureta – Collaborative R&D Aerospace Engineer at Innaxis

 

 



Big Data challenges and benefits

In last week’s post, the first of a two-part discussion on Big Data, we introduced Big Data and its main current sources. Having covered the state of the art, this second post tackles the challenges and possible benefits of this new reality.

Data storing

Hard drive capacity is not increasing fast enough to keep up with the explosion of digital data worldwide. A 50-fold increase in global data is forecast by 2020, but hard drive capacity is likely to grow only by a factor of 15, even considering all the latest advances in data systems. No storage technology has yet been developed that can scale to the petabyte and beyond, so today, piling conventional hard drive upon hard drive is the common procedure for handling data sets of this size.

However, a couple of research lines are addressing this problem, both in their infancy:

Diamonds and quantum computers

Diamonds are not just for jewellery. Researchers at the Max Planck Institute of Quantum Optics, Caltech and Harvard were able to store a quantum state in a diamond crystal for more than a second, at room temperature. That doesn’t sound like much, but in quantum physics it’s a lifetime, and a big step toward building a quantum computer with magnificent storage capacity.

Bacteria

The Chinese University of Hong Kong has recently discovered how to store encrypted data in the DNA of E. coli bacteria. Such “biostorage” could be used for future Big Data storage, especially considering that a single gram of the bacteria could hold as much as 450 conventional 2-terabyte hard drives.

Big Data benefits

As explained in the first Big Data post, in a world where economies, political freedom, social welfare and cultural growth increasingly depend on our technological capabilities, big data management and, most importantly, the knowledge that can be obtained from it have enormous potential to benefit individual organizations. While production teams are increasingly asked to “do more with less”, in relation to data we are asked to “do more with more”.

There remains much unexplored terrain in Big Data, and traditional databases and analytical platforms are not able to meet the challenges it poses. Capturing, filtering, storing and analysing Big Data flows has huge potential outcomes: innovative new products, services and business models; better decision making; better productivity; and higher revenues.

Turning Big Data into Knowledge

In this two-part blog post we first look at the emergence of Big Data and the challenges it brings. In the next post we will look at how these challenges are being addressed and the benefits this will unlock.

The emerging challenge of Big Data

Over the first years of the third millennium, worldwide digital data experienced huge growth, from scarce to super-abundant. Produced either by high-tech scientific experiments or simply compiled from the now-ubiquitous sources of automatic data collection through ordinary, everyday transactions, this new reality of “Big Data” has resulted in the need for large-scale management and storage of data which cannot be handled with conventional tools.

Data management tools and hard drive capacity are not increasing fast enough to keep up with this explosion in digital data worldwide. While in economic production we are increasingly asked to “do more with less”, in relation to data we are increasingly asked to “do more with more”.

What is the impact of this new reality, and what are its potential benefits for the world of scientific research? In a world where economies, political freedom, social welfare and cultural growth increasingly depend on our technological capabilities, Big Data management and, most importantly, the knowledge that can be obtained from it have enormous potential to benefit individual organizations.

Two parts will cover this interesting reality: the first includes this introduction and the main Big Data sources; the second will explain Big Data’s challenges and benefits.

Sources of Big Data

There are two common provenances of Big Data. On the one hand, scientific experiments and tools, which were the first origin of specific Big Data study, mostly in physics, involving either macro or micro spatial scales. In the natural sciences, some biology studies, particularly in the DNA research field, have lately also started to make use of Big Data.

On the other hand, another significant source is simply “everyday” data: the vast quantity of information now collected every day at millions of points of citizen interaction, through billions of embedded sensors worldwide.

Prepare for some big numbers:

Physics: the Large Hadron Collider (LHC)

The world’s largest and highest-energy particle accelerator, and one of the greatest engineering milestones ever achieved, the LHC produces around 25 petabytes of raw data per year, capturing information for over 300 trillion (3×10^14) proton-proton collisions. Managing this information is not easy, even using the world’s largest computing grid (170 computing centres in 36 countries). The extraction of information and knowledge from these particularly huge datasets enabled the recent discovery of the Higgs boson or “god particle”, a discovery that will probably result in the team behind it being awarded the 2013 Nobel Prize in Physics.

Astronomy:

When the Sloan Digital Sky Survey (SDSS) telescope opened in 2000, it collected more data in one week than had been amassed in the entire history of astronomy. The new Large Synoptic Survey Telescope (LSST), commencing in 2020, will store in 5 days the same amount of data that SDSS will have collected over the 13 years since its inception. Storing and processing these massive data sets from gigapixel telescopes, on the earth’s surface and in space, requires very specific tools beyond the current state of the art. Consequently astronomy, while trying to extract knowledge to create the most accurate “universe map”, is one of the leading protagonists in the field of Big Data.

Everyday data

This is the kind of data collected by countless automatic recording devices that register what, how and where we purchase, where we go, and more. It’s really outstanding how our lives have changed in the last couple of decades. All of these improvements, and the inherent multiplication in consumption and goods, generate transactions, communications and more, captured through hundreds of receptors. In addition, user-generated content such as digital media files, video, photos and blogs is being produced and stored on an unprecedented scale. Our locations (GPS, GLONASS, Galileo), money transactions (credit card, NFC payments, etc.), several different forms of communication, and even what we think and do in our free time (via social networks) are being collected by different corporate and government bodies.

One of the most accurate studies ever, published in the journal Science, revealed that humanity stored around 295 exabytes (1 exabyte = 1,000,000 terabytes) of data in 2007. Global data in 2009 was calculated to have reached 800 exabytes, and by the end of 2013 it is forecast to exceed 3 zettabytes (3×10^21 bytes). Impressive. Many challenges obviously arise with a roughly 60% yearly increase in data to be handled, and issues abound concerning how to process and extract useful information from what is 95% raw data.
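As a back-of-the-envelope check (this calculation is ours, not from the Science study), the compound annual growth rate implied by the two endpoints quoted above, 295 exabytes in 2007 and roughly 3 zettabytes forecast for 2013, can be computed directly:

```python
# Implied compound annual growth rate between the two figures quoted above:
# 295 exabytes stored in 2007, ~3 zettabytes (3000 exabytes) forecast for 2013.
start_eb, end_eb, years = 295.0, 3000.0, 6

# CAGR = (end / start) ** (1 / years) - 1
cagr = (end_eb / start_eb) ** (1 / years) - 1  # about 0.47, i.e. ~47% per year
print(f"Implied compound annual growth: {cagr:.0%}")
```

The two endpoints imply average growth of roughly 47% per year compounded, somewhat below headline yearly figures such as the ~60% above, which shows how sensitive such growth estimates are to the years and data sets chosen as endpoints.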

In our next post we’ll look at how these challenges are being addressed.

Connect with us!