# Embracing Quantile Loss: A Superior Alternative to MAE
## Chapter 1: Introduction to Loss Functions
In my work as a data scientist, I've noticed that discussions of loss functions beyond MAE (Mean Absolute Error) and RMSE (Root Mean Square Error) are surprisingly scarce. This article gives a concise overview of the quantile loss function, its applications, and its variations. The insights shared here should be useful both to readers unfamiliar with quantile loss and to those eager to put it to practical use.
### Regression and Loss Functions
Before we explore business applications, it's essential to understand the context. The quantile loss function is applicable in regression scenarios, which involve predicting continuous outcomes. For example, predicting a value that ranges from 0 to 100 is a classic regression task.
Common loss functions used in regression include:
- MAE: Penalizes every unit of error equally, regardless of direction; minimizing it drives predictions toward the conditional median.
- RMSE: Squares errors before averaging, so it emphasizes larger errors and is sensitive to outliers.
In practical terms, MAE is preferable when occasional outliers should not dominate the fit, whereas RMSE is the better choice when large errors are disproportionately costly and deserve extra penalty.
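The difference is easy to see numerically. The sketch below (toy data invented for illustration) shows how a single outlier moves RMSE far more than MAE:

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean Absolute Error: every unit of error costs the same."""
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    """Root Mean Square Error: squaring lets large errors dominate."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

y_true = np.array([10.0, 12.0, 11.0, 95.0])   # one outlier at 95
y_pred = np.array([11.0, 11.0, 12.0, 12.0])   # the model missed the outlier

print(mae(y_true, y_pred))    # 21.5 -- the outlier contributes linearly
print(round(rmse(y_true, y_pred), 1))  # 41.5 -- the outlier dominates
```

Three of the four errors are only 1 unit, yet RMSE ends up nearly twice MAE because the single 83-unit miss is squared before averaging.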
### When to Utilize Quantile Loss
The term "quantile" refers to the fractional counterpart of a percentile. With a quantile parameter (often called alpha) of 0.80, each unit of under-prediction incurs a penalty of 0.80, while each unit of over-prediction is penalized at only 0.20. Because over-predicting is cheaper than under-predicting, the model learns to target the 80th percentile, so its predictions will exceed the actual outcome roughly 80% of the time.
This approach is particularly advantageous when under-prediction is costlier than over-prediction, for instance when the actual values tend to sit above the median.
Now, let's investigate the practical implications of quantile loss for business and academic scenarios. Suppose we are predicting a value in a 0-100 range where the median is 50, but the majority of actual observations fall above 50 (say, between 60 and 80). Here a higher alpha is advisable: starting above 0.50 ensures we are genuinely exploiting quantile loss, since at alpha = 0.50 it reduces to a scaled version of MAE.
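A minimal NumPy sketch of the quantile (pinball) loss makes the asymmetric penalty concrete. The alpha = 0.80 setting and the toy values below follow the example above:

```python
import numpy as np

def quantile_loss(y_true, y_pred, alpha=0.80):
    """Pinball loss: under-predictions cost alpha per unit of error,
    over-predictions cost (1 - alpha) per unit."""
    error = y_true - y_pred
    return np.mean(np.maximum(alpha * error, (alpha - 1) * error))

y_true = np.array([70.0, 75.0, 80.0])

# With alpha = 0.80, missing low by 10 costs four times as much
# as missing high by 10.
print(round(quantile_loss(y_true, y_true - 10), 2))  # 8.0 (0.80 * 10)
print(round(quantile_loss(y_true, y_true + 10), 2))  # 2.0 (0.20 * 10)
```

Note that at alpha = 0.50 both branches collapse to 0.5 * |error|, which is exactly half of MAE, matching the point above about inadvertently employing MAE.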
### Use Case Analysis
#### Use Case #1: Estimating Long-Distance Flight Prices
Flight prices generally skew higher: historical data often shows prices closer to the high end of the range (e.g., $200 rather than $10), so underestimating them is the costlier mistake. To mitigate underestimation, select a quantile above 0.50—starting with 0.55 or 0.60 may yield beneficial results. It's also prudent to test 0.50 as a baseline for comparison.
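The article doesn't tie this to a particular library, but as a sketch of how it might look with scikit-learn's GradientBoostingRegressor (the synthetic, right-skewed "price" data here is invented for illustration):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Toy stand-in for flight-price data: prices skew toward higher values
# thanks to the right-skewed gamma noise term.
X = rng.uniform(0, 10, size=(500, 1))
y = 150 + 20 * X.ravel() + rng.gamma(shape=2.0, scale=15.0, size=500)

# alpha=0.60 targets the 60th percentile, discouraging underestimates;
# alpha=0.50 is the MAE-like baseline suggested above.
model_60 = GradientBoostingRegressor(
    loss="quantile", alpha=0.60, random_state=0).fit(X, y)
model_50 = GradientBoostingRegressor(
    loss="quantile", alpha=0.50, random_state=0).fit(X, y)

mean_60 = model_60.predict(X).mean()
mean_50 = model_50.predict(X).mean()
print(mean_60 > mean_50)  # the higher-alpha model predicts higher on average
```

Comparing both models on a held-out set with the pinball loss at your chosen alpha is a reasonable way to confirm the higher quantile actually helps for your data.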
#### Use Case #2: Forecasting Rainfall in Arid Regions During Summer
In a dry area during summer, actual rainfall usually falls far below the maximum of the range, which might only be reached during rare thunderstorms. In this case, an alpha of 0.45 or lower is advisable: since low-rainfall days dominate the data, a tendency to under-predict matches reality.
### Conclusion
As highlighted, there isn't a universal solution for selecting loss functions; the choice largely hinges on:
- The nature of your data
- Its distribution
- The specific business context
- The implications of over- or under-prediction
I hope this article has provided you with valuable insights. I welcome your thoughts in the comments—do you believe one loss function is more beneficial than another? Are there other loss functions that deserve more attention? I aim to clarify these concepts further and illuminate the significance of loss functions in data science applications.
I am not associated with any companies mentioned.
For further reading, check out my profile, Matt Przybyla, and feel free to subscribe for updates on my articles. Connect with me on LinkedIn if you have any questions or feedback.
## Chapter 2: Deep Dive into Loss Functions
This video explores why the Mean Squared Error (MSE) loss function is not typically used in classification tasks, offering insights into better alternatives.
This video discusses the differences between cost functions and loss functions in data science, specifically focusing on their roles in machine learning and regression scenarios.