• Analytics Wisdom
  • Posts
  • Airline Revenue Optimization: Ticket Pricing Analytics with Polynomial Regression

Airline Revenue Optimization: Ticket Pricing Analytics with Polynomial Regression

How airlines can optimize their revenue by leveraging dynamic pricing. SaaS revenue management solutions, and potential future developments in airline pricing.

Introduction to Non-Tech Industry Problems

Let this issue mark the start of “Beyond Tech Analytics,” a new section of my analytics work. This section refers to applying data analysis, data science techniques, and tools to industries outside the traditional technology sector, such as aviation, logistics, healthcare, commodities, and manufacturing. I will keep researching and writing to articulate how analytics can provide immense value to the above industries and their stakeholders.

The below reasons, out of many others, contribute to the slow innovation in these industries, so their decision-making has to be very data-driven.

  1. Regulatory and legal constraints - industries such as healthcare and aviation are heavily regulated, and innovation must comply with stringent safety, security, and privacy regulations, which can slow down the innovation process.

  2. High capital costs and long development cycles - innovation often requires significant investments and long development cycles, which can be challenging for companies in capital-intensive industries with limited resources.

  3. Lack of incentives - industries dominated by a few prominent players may have less incentive to innovate than smaller, more competitive industries. In addition, companies may focus more on short-term profits rather than long-term innovation.

  4. Complexity and interdependence - industries such as logistics and manufacturing are complex and highly interdependent, making it challenging to introduce new technologies without disrupting the entire supply chain or manufacturing process.

The Analysis

It’s not that complicated

Challenge

For the analysis, we refer to the airline as “Substack Airlines.” Substack airlines have found optimizing their revenue on flight routes challenging: CDG-FRA, LHR-CDG, and FRA-LHR hence want to change their ticket pricing strategy.

They’ve provided us with aggregate metrics for the ticket prices + seats available on each flight per route by weekday, month, and year(2020-2022).


Analysis Method

Let’s visualize this dataset for medians of the ticket prices and seats available by month. Perhaps we spot some seasonality, IMO I’d expect some seasonality if they want to turn a profit or even break even.

This definitely is not priced right

We’d expect high-priced tickets when fewer seats are available, but that’s not the case here. The reason for non-correlational tickets to seat availability is that the airline does not subscribe to Analytics Wisdom…….……and because they haven’t composed a demand score for their flights.

Hence they should do just the same. Here’s the subscribe button and continuation of the article.

See what I did there?

Demand Score

The demand score is a metric used to represent the demand for flight tickets in a particular month and year. It will be calculated based on flight prices and seat availability, indicating the route's popularity during that time. We will then use the demand score in a linear or polynomial regression model predictor variable to estimate the median ticket price for a given month and year.

This code calculates the demand score as a weighted average of the normalized Total Flights and Median Available Seats columns.

Demand Score = (Normalized Flights flight_weight) + (Normalized Seats seat_weight)

# Create a demand score that we can later utilize to change prices 

# Normalize the Total Flights and Median Available Seats columns
month_stats['Normalized Flights'] = (month_stats['Total Flights'] - month_stats['Total Flights'].min()) / (month_stats['Total Flights'].max() - month_stats['Total Flights'].min())
month_stats['Normalized Seats'] = (month_stats['Median Seats Available'].max() - month_stats['Median Seats Available']) / (month_stats['Median Seats Available'].max() - month_stats['Median Seats Available'].min())

# Set the weights for the columns
flight_weight = 0.6
seat_weight = 0.4

# Calculate the weighted average of the normalized columns
month_stats['Demand Score'] = (month_stats['Normalized Flights'] * flight_weight) + (month_stats['Normalized Seats'] * seat_weight)

This will result in a score that reflects the demand for flights based on the number of flights and seats available for each flight.

Polynomial Regression Model (Wtf is that? How does that help)

The Complex Definition:
A polynomial linear regression models the relationship between the independent and dependent variables as an nth-degree polynomial. It is used when the relationship between the variables is not linear and can help capture more complex relationships.

In simpler terms, imagine you have a toy car that moves forward when you push it. You can measure how far it goes based on how hard you push it. If you plot these measurements on a graph, the points follow a curve. A polynomial linear regression model is a way to find an equation for this curve, so you can predict how far the car will go if you push it harder or softer without trying it out every time. It's like magic!

We will train the model on the dataset for 2020 and 2021 and then test how it would have priced tickets for 2022. Here’s the code for the same.

from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Split the data into training and testing sets
train_data = month_stats.loc[month_stats.index.get_level_values('Year') < 2022]
test_data = month_stats.loc[month_stats.index.get_level_values('Year') == 2022]

# Create the polynomial regression model with degree 2
poly = PolynomialFeatures(degree=2, include_bias=False)
X_train_poly = poly.fit_transform(train_data[['Demand Score']])
X_test_poly = poly.transform(test_data[['Demand Score']])

# Scale the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train_poly)
X_test_scaled = scaler.transform(X_test_poly)

# Create the linear regression model
model = LinearRegression()

# Train the model on the training data
y_train = train_data['Median Ticket Price']
model.fit(X_train_scaled, y_train)

# Predict the median ticket price for the testing data
y_test = test_data['Median Ticket Price']
y_pred = model.predict(X_test_scaled)

# Multiply the predicted median ticket price by 1.5 if demand is high
demand_score = test_data['Demand Score']
high_demand_idx = demand_score > demand_score.quantile(0.75)
y_pred[high_demand_idx] *= 1.5

Here’s how the forecasted ticket pricing looks beside the actual ticket price for 2022 with X seat availability.

The forecasted pricing is bound to have outliers that must be manually modified. Still, for the most part, we see the ticket price increase where seat availability is lower and sometimes also a decrease than the actual price when availability is low.

To see if this model has been a net good, we sum up the actual vs. forecasted ticket prices for 2022 and compare the results.

As the above image shows, using our model to forecast ticket pricing would’ve led Substack Airlines to make 13% more than they made in the actuals. The model can constantly be tweaked to produce a higher/lower result for ticket pricing and better impact the airline’s margins.

Let me know if you liked the new edition of Analytics Wisdom, and keep yourself updated with the next ones!