Finding Hidden City Flights Using Neo4j & Python 🐍 Part #3
In this post, I’ll introduce you to hidden city ticketing using Python.
Hidden city ticketing lets you save a lot of money, especially when otherwise no other cheap fares are available. Of course we’ll do it the python and neo4j way!
What is hidden city ticketing?
Let’s assume you wanna fly from Greensboro, NC (GSO) to New York Newark, NJ (EWR). It would cost you 179 € for a one-way flight. Not bad, but we can get there cheaper…
If you search for a flight that connects in Newark, but continues somewhere else the price for the same flight drops to 133 €. Instead of booking a flight to Newark you would book one where the final destination is Providence and get off the plane in Newark.
Disclaimer: Hidden City is a great way to save money, but it also violates the terms of service of some airlines. You, as a passenger and customer are solely responsible for your actions. It’s not the authors intent to support you in doing anything fraudulent. Instead this post should be a proof of concept how neo4j and python can be combined to find low airline fares.
Note: Hidden City Ticketing only works for one-way flights, as the other segments become invalid if you didn’t fly them.
How to store the flight data in Neo4j?
Great if you made it past the disclaimer, let’s get started with the fun part. Before we can proceed, I’d like to talk about how I store the flights in Neo4j database.
|
|
I am calling this function for every segment of a flight that I search. This function creates the relationship between the date, the origin and destination airport and the flight which is also saved as a node. You may have also noticed that I am also passing a bunch of properties like distance and price.
Let us look at a simple snippet of code which I use for looping over the segments of the flights and insert them into the database.
|
|
The MERGE-Clause is really useful in this case, as it only creates the nodes if they are not already stored in the database. That way I can avoid having hundreds of nodes for December.
The image below shows what the origin-, destination airport, day, month and flight node look like once they’ve been inserted into Neo4j. It shows a number of possible connections from GSO to EWR on 06/06/2018. Prior to creating this image I’ve searched and inserted GSO->EWR and GSO->PVD.
The Cypher Query for finding these nonstop connections is fairly simple:
MATCH con= (:Month{month:6})-[:HAS_DAY]->(:Day{day:6})-[:HAS_AIRPORT]->(:Airport{code:'GSO'})-[r*..6]->(:Airport{code:'EWR'}) return con
You may be wondering what these connections with multiple flight numbers are. These are additional flights (“hidden flights”) I am inserting whenever a search consists out of multiple flights. I basically join the flight numbers and also save the final destination as property. That way I have much more combinations in my database even if I haven’t proactively searched them and I’ll be able to retrieve them using my standard query. Note, we can’t just insert a single relationship between the airports of the segments, as the flights have to be taken in the specific order as ticketed.
How to find them the pythonic way?
You may now be saying how did I know that PVD is a possible destination for our journey which drops the price of our flight and still connects in EWR. At first, I was searching the direct connections departing from EWR from openflights.org’ route list and then inserting those into the database.
|
|
And in the next step, I am basically searching this list of airports as my destination. We could swap the search_flights function for the search_flights_multi, but I deliberately kept it as I get more connections (even if they might not even connect in EWR) which later on might come in handy.
|
|
The last step would be to query the database for the cheapest connection, whether it’s hidden city ticket, a connection or just a straight direct flight. We can search the cheapest flight with this cypher query.
MATCH con= (:Month{month:6})-[:HAS_DAY]->(:Day{day:6})-[:HAS_AIRPORT]->(:Airport{code:'GSO'})-[:HAS_FLIGHT]->(:Flight{})-[f:TO]->(:Airport{code:'EWR'}) return con order by f.price limit 1
tl;dr
This fast and easy run down demonstrates that hidden city flights can efficiently be searched using Python and Neo4j. By using the information from openflights.org, we’ve been able to find some great airfares. There is still a lot of room for improvement, but I’ll cover that in another post.
Cheers Alex
PS: Thanks also Philippe Khin for his blog posts and code snippets about a different project, which had a lot of overlap with my own project. These came in handy!