To see what drives prices on the secondary market, I modeled the effect of the whiskey's age, the proof, whether it was an original bottling or not (as opposed to, say, Cadenhead bottling Heaven Hill), the year of auction, and the distiller on the final sale price. (I tried to model cask type as well, but there just wasn't enough clean information.) There are about a billion ways to do this, but I chose a RandomForest because trees are inherently easier for non-data whiskey lovers to understand than more mathematically complex models (non-data people, see http://www.r2d3.us/visual-intro-to-machine-learning-part-1/). With a basic RandomForest model, you can read off how much each variable contributes to predicting the final sale price. (NOTE: this does not take into account bidding wars, bidding strategies, or number of bids per whiskey - I'll save that for another time.)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor

# define our RandomForest
rf = RandomForestRegressor()
# distill (pun intended) our features from the whiskey data and make
# brand and auction year categorical
features = pd.get_dummies(data)
# fit our random forest to the dataset
# (assumption: the first column of `features` holds the final sale price)
X, y = features.iloc[:, 1:], features.iloc[:, 0]
rf.fit(X, y)
# grab the feature importances from the rf
importances = rf.feature_importances_
# sort ascending by importance and keep the top 30; barh draws index 0
# at the bottom, so the most important feature ends up at the top
indices = np.argsort(importances)[-30:]
# remove pandas string concatenations from data for plotting
feature_names = [x.replace('distillery_cleaned_', '') for x in X.columns]
# plot title
plt.title('Feature Importances')
# plot horizontal bar
plt.barh(range(len(indices)), importances[indices], color='b', align='center')
# label each bar with its feature name
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
# label x axis
plt.xlabel('Relative Importance')
# plot it
plt.show()
Cool, so what does the plot below mean? First, age is the most important variable (duh). The rest follow in no particular order. Alcohol by volume: people will pay more for cask strength and higher proof whiskies. Original bottlings matter: people will pay more for an original bottling than for the same spirit bottled independently (Cadenhead, Signatory, Willett, etc.), which presents one heck of an opportunity to buy the same distillate at a discounted rate. Next is the year in which the whiskey is sold - I find this interesting; it's a basic price bubble from a supply/demand perspective. Lastly, to no one's surprise, is brand power. All of this pretty much confirms what we already believed: in the context of marketing and demand, sales from 2014 onwards have benefited from increased interest in whiskey, and demand for certain brands tracks directly with price. People want higher proof distillate from specific brands and will pay more if it is bottled by the distillery itself.
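One wrinkle when reading a plot like this: after one-hot encoding, "brand" and "auction year" are each spread across many dummy columns, so their total weight is easy to underestimate. Here is a minimal sketch of adding the dummy importances back up per original variable. It runs on synthetic data, and every column name in it (`abv`, `original_bottling`, `auction_year`, and so on) is an assumption for illustration, not the real dataset's schema.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the auction data; all names and values are made up.
rng = np.random.default_rng(0)
n = 400
data = pd.DataFrame({
    "age": rng.integers(3, 40, n),
    "abv": rng.uniform(40, 65, n),
    "original_bottling": rng.integers(0, 2, n),
    "auction_year": rng.choice([2012, 2013, 2014, 2015], n),
    "distillery_cleaned": rng.choice(["A", "B", "C", "D"], n),
})
# Toy price driven mostly by age, purely so the forest has signal to find.
price = (20 * data["age"] + 5 * data["abv"]
         + 50 * data["original_bottling"] + rng.normal(0, 25, n))

# One-hot encode the categorical columns. A numeric year has to be listed
# explicitly, otherwise get_dummies leaves it as a plain number.
categorical = ["auction_year", "distillery_cleaned"]
X = pd.get_dummies(data, columns=categorical)

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, price)

def source_variable(column):
    """Map a dummy column such as 'distillery_cleaned_A' to its source variable."""
    for name in categorical:
        if column.startswith(name + "_"):
            return name
    return column

# Sum the per-column importances within each original variable.
per_column = pd.Series(rf.feature_importances_, index=X.columns)
per_variable = per_column.groupby(source_variable).sum().sort_values(ascending=False)
print(per_variable)
```

The grouped numbers still sum to 1, so they can be read as each variable's share of the forest's total importance.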
Somewhat related: as mentioned, I tried to use barrel type as a variable, but there just wasn't enough clean information to incorporate it into the overarching model. However, what information did exist basically says that scotches aged in Pedro Ximenez barrels are more evenly distributed with respect to price than those aged in bourbon or oloroso barrels. Considering my love for all things Pedro Ximenez, I found this pretty interesting.
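For anyone who wants to run the same kind of comparison, here is a rough sketch of one way to put numbers on "more evenly distributed": summarize the price spread and skew per cask type. The rows below are invented for illustration (they are not auction results), and the `cask_type` and `price` column names are assumptions.

```python
import pandas as pd

# Made-up example rows; not real auction results.
casks = pd.DataFrame({
    "cask_type": ["pedro ximenez"] * 4 + ["bourbon"] * 4 + ["oloroso"] * 4,
    "price": [100, 200, 300, 400, 50, 60, 70, 900, 80, 90, 100, 1200],
})

def iqr(prices):
    """Interquartile range: the width of the middle 50% of prices."""
    return prices.quantile(0.75) - prices.quantile(0.25)

# Skew near 0 means prices are spread symmetrically around the middle;
# a large positive skew means a few bottles sell far above the rest.
summary = casks.groupby("cask_type")["price"].agg(["count", "median", iqr, "skew"])
print(summary)
```

With only a handful of clean rows per cask type, numbers like these are a hint at best, which is why cask type stayed out of the main model.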
So in the end, what does this all mean? Well, basically, Age + ABV + OB + Brand + Timing = Price. Go figure. The alternate take is that outside the top brands, there exist many lower cost (and better tasting) options for cask-strength, well-aged, originally bottled spirit. Most people have never tried the wealth of distilleries in existence, and that includes the big players - the Karuizawas, 80s Macallans, or Stitzel-Wellers. So maybe next time, rather than wasting your time hunting down a bottle of this stuff (or any in-demand juice), try 20 different whiskies for the same price. I personally think that buying a lesser known brand is a lot less of a risk than it may seem, and certainly less of one than spending $1500 (or even whatever it goes for at MSRP) on 23 year old Pappy Van OakJuice.
All in all, when you have increased demand, whiskey hoarders, 'dwindling supply', and advertising (sometimes in the form of bought and paid for 'reviewers' all working at the behest of large brands), it's easy to get caught up in thinking that these bottles of liquid are worth the money others are paying for them. However, you're a smart person, you're informed, and you're not going to get caught up in the brand hype. In fact, you're going to set out and do more blind taste tests to truly see if all this crap is just hype or actually worth it to your wallet.