It is well known that building a schedule plan for an airline is difficult. The core difficulty is to take into account the multiple constraints of aircraft, crew, maintenance, passenger connections, etc., while trying to capture as much of the market as possible, all at minimum expense. It is similar to riding a bike... except you do not know who is riding, where the wheels are, where you are supposed to go, and whether you should buy a car instead.
One of the most important constraints is the aircraft, since:
- it is impossible to fly without it (rockets are quite unsafe to land at airports),
- it is quite expensive (I've been told).
Let's imagine that, as an airline, you roughly know which cities you want to connect and how many passengers should travel with you. Where should your existing aircraft fly? Should you buy one? Do you have different strategies if you are a low-cost carrier or a traditional one? These are roughly the questions that our agents try to answer in the second block of our Vista model, the "schedule mapper". Of course, since our model simulates all the airlines in Europe, we cannot dedicate as much time (real and computational) to their schedule plans as airlines do in reality. But, as for the other parts of Vista, we are trying to capture the main behaviours of the system.
As usual, we start from what we can observe in the data. For instance, it is commonly said that aircraft usually go back and forth, and that some of them sometimes fly triangular routes. Is that true? To investigate this, we take a three-day time window in which we track the itineraries of aircraft as sequences of airports, which we call "patterns", using DDR data. What kinds of patterns 'live' in this environment? How should we classify them?
First, just as taxonomists do not care about the specifics of a single individual when making a classification, we should not take into account the details of the patterns to classify them (in fact, that's the definition of a classification...). So, for instance, Rome - Paris - Rome has the same pattern as Frankfurt - London - Frankfurt, and both can be rewritten as 1 - 2 - 1. If a specific sequence is an individual in zoology, a pattern is thus akin to a taxon.
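The relabelling described above can be sketched in a few lines of Python. This is only an illustration, not the Vista code: the function name `canonical_pattern` is hypothetical, and we assume that an itinerary is simply a list of airport names in the order flown.

```python
def canonical_pattern(airports):
    """Relabel airports 1, 2, 3, ... in order of first appearance.

    `airports` is assumed to be the ordered list of airports visited
    by one aircraft (hypothetical input format).
    """
    labels = {}
    pattern = []
    for airport in airports:
        if airport not in labels:
            # First time we see this airport: give it the next label.
            labels[airport] = len(labels) + 1
        pattern.append(labels[airport])
    return tuple(pattern)


rome = canonical_pattern(["Rome", "Paris", "Rome"])
frankfurt = canonical_pattern(["Frankfurt", "London", "Frankfurt"])
print(rome, rome == frankfurt)  # (1, 2, 1) True
```

Two itineraries then belong to the same taxon exactly when their relabelled tuples are equal.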
We can roughly divide these taxons into two "kingdoms": the ones which are closed (more explicitly, which have at least one closed loop), and the rest. For instance, an aircraft doing Paris - Frankfurt - Rome - Paris - Rome - Paris in three days has a closed pattern, whereas an aircraft doing Rome - Madrid - Barcelona has an open one. Of course, in the long run, most aircraft do at least one full loop, but in three days some of them cannot make it. However, when counted by number of flights, most of them are already closed within three days, as shown in the figure below. In the following, we focus only on these closed taxons, pretty much as one could restrict a study to mammals, except that in this case the mammals represent most of the animal kingdom.
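As a rough sketch of this split, a pattern can be called closed as soon as the aircraft comes back to an airport it has already visited. The function name `is_closed` is hypothetical, and we assume that two consecutive entries of an itinerary are always different airports.

```python
def is_closed(airports):
    """Return True if the itinerary revisits at least one airport.

    Assumes consecutive entries are distinct airports, so any repeated
    airport means the aircraft has completed at least one loop.
    """
    return len(set(airports)) < len(airports)


print(is_closed(["Paris", "Frankfurt", "Rome", "Paris", "Rome", "Paris"]))  # True
print(is_closed(["Rome", "Madrid", "Barcelona"]))  # False
```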
Among them, some are more elemental than others, in the sense that they cannot be constructed from their peers. These are the ones which have exactly one closed loop. The ones present in the data are represented in the figure below, with their frequency of appearance (the number n corresponds to the number of airports in the loop). Most of them are single returns (1 - 2 - 1), triangular flights (1 - 2 - 3 - 1), and rectangular flights (1 - 2 - 3 - 4 - 1), and we focus on these three in the following. Note that rectangular flights seem more frequent than triangular ones, perhaps contrary to popular belief.
All the other patterns can be constructed from these elementary ones, and we name them 'combined' patterns. For instance, (1 - 2 - 1 - 2 - 1) is composed of two single returns. In terms of zoology, it is a bit like saying that an elephant can be obtained by gluing a snake to a hippopotamus. Or that a giraffe is really nothing more than a horse with a periscope in the throat, which personally I believe very much. In any case, it is easy to plot the frequency of appearance of these combined taxons, as shown in the figure below. Since all of them come from three elementary taxons, we use the notation (X, Y, Z), where X, Y and Z count the returns, triangular flights and rectangular flights respectively: e.g. (2, 0, 0) represents two returns, and (1, 1, 1) a return, a triangular flight and a rectangular one. Some very rare patterns have been omitted from the figure. As expected from the previous figure, most aircraft go back and forth during the three days. It is interesting to see that triangular flights are very under-represented, and that it is more frequent to have a rectangular flight every now and then, in combination with returns. Note that when a pattern features several returns, they are not necessarily between the same airports (e.g. Warsaw - Oslo - Warsaw - Vienna - Warsaw). In fact, we found that most of the combined patterns are 'impure', i.e. they are composed of elementary patterns with different airports (like gluing two birds of different colours, for instance).
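One possible way to turn an itinerary into its (X, Y, Z) label is to walk along it and cut out a loop each time the aircraft returns to an airport on its current path. This is only a sketch of the idea, not necessarily how Vista does it: the names `decompose` and `signature` are hypothetical, and the greedy cut is just one of several ways a pattern could be split into elementary loops.

```python
from collections import Counter


def decompose(airports):
    """Greedily split an itinerary into elementary loops.

    Returns (loops, remainder): each loop is the tuple of distinct
    airports it visits, and remainder is the open path left over.
    """
    path = []   # airports on the current, not yet closed, path
    loops = []
    for airport in airports:
        if airport in path:
            start = path.index(airport)
            loops.append(tuple(path[start:]))  # airports in this loop
            del path[start + 1:]               # keep the loop's base airport
        else:
            path.append(airport)
    return loops, path


def signature(airports):
    """Count returns, triangles and rectangles as (X, Y, Z).

    Loops with more than four airports are ignored in this sketch.
    """
    loops, _ = decompose(airports)
    sizes = Counter(len(loop) for loop in loops)
    return (sizes[2], sizes[3], sizes[4])


print(signature(["Paris", "Frankfurt", "Rome", "Paris", "Rome", "Paris"]))  # (1, 1, 0)
print(signature(["Warsaw", "Oslo", "Warsaw", "Vienna", "Warsaw"]))  # (2, 0, 0)
```

The second example is 'impure' in the sense above: its two returns share Warsaw but go to different airports, which can be read directly from the loops returned by `decompose`.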
What does Vista do with this freak zoo? Well, the way airlines implicitly choose among the different patterns is a complex procedure, driven by the constraints cited above. So the idea is that the best patterns should be selected for their efficiency, much like some taxons are selected by evolution based on their fitness in a given environment. Each taxon also has its own particularities. For instance, flights using the taxon (4, 0, 1) mainly depart (from their first airport) in the early morning, whereas the taxon (2, 0, 0) is used by flights departing more frequently in the late morning, and sometimes in the evening, as shown in the figure below. Other regularities can be found, for instance in terms of average turn-around times.
In the model, we use all these data to build reasonable schedules by resampling the different taxons for each airline. This will be described in a later blog post. And no more weird animal crossings, we swear!