Alternative data includes any information containing potential investing insight that does not come from a traditional data source. Traditional data is what a company’s investor relations department publishes, including financial statements, SEC filings, press releases, and marketing presentations.
While traditional data will always have its place in investing, alternative data is growing in popularity, and it’s something every trader and investor needs to understand.
What is Alternative Data?
To understand alternative data, it helps first to understand what it is not. Alternative data is the “alternative” to information gathered from traditional data sources. Before we go any further, let’s take a moment to look at what we mean by traditional data sources.
Traditional data sources come from within a company. Specifically, a company’s investor relations department publishes this information, including financial statements, SEC filings, press releases, and marketing presentations. This information provides data on securities issued by the company.
In contrast, alternative data comes from external sources. As you can likely imagine, any data not coming directly from the company issuing the security is an extensive and diverse data set.
Some examples of alternative data include employment or weather data, imagery from satellites, app usage, and cell phone data. Alternative data also includes market sentiment data gathered from a wide variety of sources. For example, one source of alternative data may look at what words are being used on social media to understand a specific stock’s current market sentiment.
In recent years, alternative data has become increasingly popular. We can see the recent explosion in demand for alternative data in the dramatic increase of providers that collect and analyze alternative data. Between 2008 and 2018, the number of alternative data providers went from around 100 to almost 450.
The increased popularity of alternative data is due to a variety of reasons:
- Alternative data often provides real-time data, which is especially beneficial in chaotic or fast-moving markets.
- Sources of alternative data are growing exponentially, enabling researchers to find new sources of alpha.
- Improved computer capabilities allow for a more straightforward analysis of alternative data.
A decade ago, hedge funds ruled the alternative data world. Here’s why.
Hedge funds typically do not follow a buy-and-hold approach. They try to find an edge in the market and use it to profit. In this environment, the real-time information alternative data provides has massive potential to increase profits. A decade ago, alternative data was prohibitively expensive, but this is starting to change.
Today, alternative data has become easier to store and access, thanks to lower costs and more companies that can analyze the data. This has led to its acceptance in many areas of finance beyond hedge funds.
Specifically, alternative data has begun to be adopted in private equity, valuations, credit underwriting, and insurance, to name a few. While all of these fields continue to use traditional data, they’ve all begun incorporating alternative data to provide a more holistic picture.
Alternative data is also becoming more commonplace in decision-making at major corporations, such as Google and Facebook. These firms are trendsetters in many other respects, making it seem likely that the use of alternative data will trickle down to smaller tech firms and should eventually even become standard across industries.
Ninety percent of the current data in the world was generated in the previous two years.
While each data generation process is a bit different, alternative data is generally generated in one of three ways: by individuals, by business processes, or by sensors.
As individuals, we’re continually creating data. When you search Google, write a product review, buy a product, post on social media, like someone else’s post, etc., you’re creating alternative data.
The problem with alternative data generated by individuals is that it’s often challenging to gather and analyze.
Alternative data generated by businesses is also known as “exhaust data” because it is generally a derivative of other business processes. This type of data differs from traditional data because while traditional data is gathered from the company itself, exhaust data is generated by the business but gathered and disseminated by others.
For example, credit card transactions and data from various government agencies are both forms of exhaust data.
Compared to alternative data produced by individuals, exhaust data is usually much more structured, which makes it easier to gather and analyze.
Machines regularly send signals from one device to another. The capture of these signals is what we mean by sensor-generated data. Examples of alternative data captured by sensors include point-of-sale systems, satellite images, shipping data, and your house’s thermostat, to name a few.
Sensor data is easy to gather but typically involves a conversion process to analyze.
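To make that conversion step concrete, here is a minimal sketch in Python. The raw line format, the device name, and the `parse_reading` helper are all hypothetical, made up purely for illustration. Real sensor feeds will look different, but the idea is the same: turn raw strings into typed, consistent records and drop what can't be parsed.

```python
from datetime import datetime

# Hypothetical raw thermostat readings: timestamp, device id, temperature with unit suffix.
RAW_READINGS = [
    "2021-01-04T08:00:00,thermostat-1,68.5F",
    "2021-01-04T08:05:00,thermostat-1,20.5C",
    "2021-01-04T08:10:00,thermostat-1,ERR",  # malformed reading
]


def parse_reading(line):
    """Convert one raw line into a structured record, or None if it is unusable."""
    timestamp, device_id, raw_value = line.split(",")
    unit = raw_value[-1]
    if unit not in ("F", "C"):
        return None
    try:
        value = float(raw_value[:-1])
    except ValueError:
        return None
    # Normalize everything to Celsius so readings are comparable.
    temp_c = value if unit == "C" else (value - 32) * 5 / 9
    return {
        "timestamp": datetime.fromisoformat(timestamp),
        "device_id": device_id,
        "temp_c": round(temp_c, 2),
    }


readings = [r for r in map(parse_reading, RAW_READINGS) if r is not None]
for reading in readings:
    print(reading)
```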
Like all information, alternative data gets its value from how and when it’s used, and that value is often much more subjective than the value of traditional data.
For example, when a company releases its sales numbers from the previous quarter (a form of traditional data), traders almost always have clear insights. The company either met expectations or didn’t, and how much it missed or exceeded expectations is quantifiable.
On the other hand, the number of people who comment on a specific brand on social media is harder to translate into usable trading data. Think about the negativity around the infamous Peloton commercial. Every social media platform was in an uproar, but what happened to the stock price?
Look at a chart of Peloton from the December timeframe through the COVID pandemic. It was one of the best-performing stocks. Did the negative backlash actually boost Peloton’s visibility?
One could argue that the sentiment data was valuable for a shorter-term timeframe, but that’s the point — the value of alternative data is more subjective.
To help mitigate the lack of context, combine alternative data with other data sets, either traditional or other alternative data, and use price as a confirmation signal. By looking at multiple sources of data, the signals provided often become more robust.
For example, cell phone data can tell us where people are. If they’re going into certain stores, this is likely to impact the sales of that store positively. Therefore, we could potentially use cell phone data to help inform sales expectations and see if the alternative data’s expectations align with the anticipated sales in the traditional data.
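As a rough sketch of that cross-check, the pandas snippet below joins foot-traffic counts with reported sales and measures how closely they move together. Every number here is hypothetical and exists only to show the mechanics, and five weeks of data is far too little to draw any real conclusion.

```python
import pandas as pd

weeks = pd.to_datetime(
    ["2021-01-04", "2021-01-11", "2021-01-18", "2021-01-25", "2021-02-01"]
)

# Hypothetical weekly store visits derived from cell phone location data.
foot_traffic = pd.DataFrame({"week": weeks, "visits": [1200, 1350, 1100, 1500, 1650]})

# Hypothetical weekly sales from the traditional data source.
sales = pd.DataFrame({"week": weeks, "sales_usd": [48000, 53500, 45000, 61000, 66000]})

# Line the two data sets up on the same dates before comparing them.
combined = foot_traffic.merge(sales, on="week", how="inner")

# Pearson correlation between visits and sales.
correlation = combined["visits"].corr(combined["sales_usd"])

print(combined)
print(f"Correlation between visits and sales: {correlation:.2f}")
```

A high correlation on historical data only suggests that foot traffic may be worth tracking. It still needs to be confirmed against price and tested out of sample.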
But while combining some data sets may provide a more comprehensive perspective and stronger signals, more data is not necessarily better.
The amount of alternative data that exists is massive, and you can’t use all of it. As with any data, you must search through the existing options and find which data provides the most value for you.
To use alternative data effectively, you’ll need to ask yourself a few questions. The following five questions can also act as steps to help you through your use of alternative data.
- What is the question or problem you’re looking for the data to help you solve?
- How can you translate that question or problem into parameters that data can help you solve?
- Where can you find this data, using either traditional or alternative data sources?
- How can you gather this data from your source?
- What other data sources can you use to back up your findings?
This process can help you gain insights from various types of data relating to a specific industry, region, key companies, etc.
We’ve gone over the basic steps of gaining useful insights from alternative data, but putting these steps into practice requires understanding the various types of alternative data. We’ll now look at some of the most common ones.
You’ll likely notice that many of the data types have some overlap as you read through the list. For example, both product reviews and social media posts contain sentiment data. As we’ve already discussed, this unstructured, overlapping quality of alternative data often makes it challenging to analyze.
Website usage data includes how long people spend on a website, what they do while they’re there, and the volume of search queries, as reported by tools such as Google Trends.
Face masks were a pretty clear information arbitrage play for those who were paying attention.
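One simple way to "pay attention" is to flag weeks where search interest jumps far above its recent average. The sketch below does this with a trailing z-score. The `find_spikes` helper, the window, the threshold, and the weekly interest numbers are all assumptions for illustration, not real Google Trends output.

```python
from statistics import mean, stdev


def find_spikes(series, window=4, threshold=3.0):
    """Return indices whose value sits more than `threshold` trailing
    standard deviations above the mean of the previous `window` values."""
    spikes = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        sigma = stdev(history)
        if sigma == 0:
            continue  # flat history: a z-score is undefined, so skip
        z_score = (series[i] - mean(history)) / sigma
        if z_score > threshold:
            spikes.append(i)
    return spikes


# Hypothetical weekly search-interest index for a single query.
weekly_interest = [10, 11, 9, 10, 10, 12, 45, 80, 100]

spike_weeks = find_spikes(weekly_interest)
print("Weeks flagged as spikes:", spike_weeks)
```

Note that once a spike enters the trailing window, it raises the baseline, so later high readings may not be flagged. Whether that behavior is desirable depends on what you are trying to catch.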
Social sentiment gauges how a person or a group of people feel about a topic. Data may come from various sources, including social media posts, news, videos, or online interactions on social media, such as retweets on Twitter.
Social sentiment is often a leading indicator of future sales performance. Chris Camillo, Dave Hanson, and Jordan McClain, collectively known as #dumbmoney, turned thirty thousand into thirty million using nothing more than Twitter.
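To give a feel for how sentiment gets turned into a number, here is a deliberately tiny word-list scorer. The word lists and sample posts are made up for illustration. Real sentiment models are far more sophisticated and have to handle sarcasm, negation, and context, none of which this sketch attempts.

```python
import re

# Tiny illustrative word lists (assumptions, not a real sentiment lexicon).
POSITIVE = {"love", "great", "amazing", "awesome", "buy"}
NEGATIVE = {"hate", "terrible", "awful", "broken", "sell"}


def sentiment_score(post):
    """Score a post from -1.0 (all negative words) to 1.0 (all positive words).
    Posts with no matching words score 0.0."""
    words = re.findall(r"[a-z']+", post.lower())
    positive_hits = sum(word in POSITIVE for word in words)
    negative_hits = sum(word in NEGATIVE for word in words)
    total = positive_hits + negative_hits
    return 0.0 if total == 0 else (positive_hits - negative_hits) / total


# Hypothetical social media posts about a brand.
posts = [
    "I love my new bike, the classes are amazing",
    "Terrible ad, I hate it",
    "Just ordered one",
]

scores = [sentiment_score(post) for post in posts]
average_sentiment = sum(scores) / len(scores)
print(scores, average_sentiment)
```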
Geolocation data is gathered from GPS, Wi-Fi, or Bluetooth signals on mobile phones or other electronic devices. This data can provide insight into consumers’ movements: which stores they go into, how much time they spend in each, and where foot traffic is rising or falling.
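A common first step with location pings is counting how many distinct devices showed up near a store. The sketch below uses the haversine formula for distance and a simple radius around the store. The coordinates, device IDs, and the 50-meter radius are hypothetical, and commercial foot-traffic providers use considerably more careful methods.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def count_visitors(pings, store_lat, store_lon, radius_m=50):
    """Count distinct devices with at least one ping inside the radius."""
    visitors = {
        device_id
        for device_id, lat, lon in pings
        if haversine_m(lat, lon, store_lat, store_lon) <= radius_m
    }
    return len(visitors)


# Hypothetical store location and anonymized device pings.
store = (40.7580, -73.9855)
pings = [
    ("device_a", 40.7581, -73.9856),  # a few meters from the store
    ("device_a", 40.7580, -73.9855),  # same device again, counted once
    ("device_b", 40.7600, -73.9855),  # roughly 200 m north, outside the radius
    ("device_c", 40.7579, -73.9854),  # a few meters from the store
]

visitors = count_visitors(pings, *store)
print("Distinct visitors:", visitors)
```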
Credit card transaction data is one of the most structured and valuable types of alternative data, but it’s also one of the most expensive to gather. This data tracks retail and business spending and is highly predictive; however, it often lags behind valid social sentiment signals.
Point of Sale (POS) systems track transactions. This data can provide information on sales and the popularity of products, price trends, sales volume, etc.
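As a small illustration, the sketch below rolls up raw transactions into units sold, revenue, and average selling price per product. The SKUs, prices, and the `summarize` helper are hypothetical. Real POS feeds carry many more fields, such as store, timestamp, and discounts.

```python
from collections import defaultdict

# Hypothetical point-of-sale transactions.
transactions = [
    {"sku": "MASK-01", "quantity": 3, "unit_price": 4.00},
    {"sku": "SOAP-02", "quantity": 1, "unit_price": 2.50},
    {"sku": "MASK-01", "quantity": 2, "unit_price": 4.50},
    {"sku": "SOAP-02", "quantity": 4, "unit_price": 2.50},
]


def summarize(transactions):
    """Aggregate units, revenue, and average selling price per SKU."""
    totals = defaultdict(lambda: {"units": 0, "revenue": 0.0})
    for transaction in transactions:
        entry = totals[transaction["sku"]]
        entry["units"] += transaction["quantity"]
        entry["revenue"] += transaction["quantity"] * transaction["unit_price"]
    return {
        sku: {**values, "avg_price": values["revenue"] / values["units"]}
        for sku, values in totals.items()
    }


summary = summarize(transactions)
for sku, stats in summary.items():
    print(sku, stats)
```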
One common form of alternative data with many potential uses is satellite imagery. How satellite imagery is used depends on the type of satellite imagery, which includes optical and infrared. Optical satellite imagery allows you to see objects on the ground, such as cars in a parking lot or ships at a port. Infrared satellite imagery will enable you to see lights, pollution, and/or particulate matter in the air.
Weather data is collected by a variety of different types of sensors. This data has many uses but is especially useful when applied to agricultural production or other commodities reliant on the weather.
Mobile apps collect a wide range of data. This data can tell us a lot about consumer engagement, sentiment, sales, etc.
Lemonade is a unique insurance company that is trying to flip the traditional insurance business model upside down. They are almost entirely automated using AI and analyze data such as login time, claim text, and others to better identify their insurance risk.
Product reviews, which some would call another type of social sentiment data, can provide valuable insights into what people say about a product, service, or company in real time. This type of data is often considered alongside, or as a component of, market sentiment data.
Aterian is an exciting company utilizing this information. They are an acquisitive company that uses social data and other analytics to anticipate future customer demands and develop or acquire products to fit those needs.
Shipping Container Receipts
Container and other shipping data can come in months ahead of traditional sales data. This is one reason why Charles Dow used the transportation index as a leading indicator in Dow Theory.
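One way to check whether a series really leads another is to correlate it against the other series shifted by different lags. The sketch below does this with a hand-written Pearson correlation. Both series are hypothetical, and the sales series was constructed to follow container volume by two periods, so the result only demonstrates the mechanics.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


def lagged_correlation(leading, lagging, lag):
    """Correlate leading[t] with lagging[t + lag]."""
    if lag > 0:
        leading, lagging = leading[:-lag], lagging[lag:]
    return pearson(leading, lagging)


# Hypothetical data: sales built to track container volume two periods later.
container_volume = [100, 110, 130, 120, 140, 160, 150, 170]
reported_sales = [50, 52, 51, 56, 66, 61, 71, 81]

correlations = {
    lag: lagged_correlation(container_volume, reported_sales, lag)
    for lag in range(4)
}
best_lag = max(correlations, key=correlations.get)
print(correlations)
print("Lag with the strongest correlation:", best_lag)
```

With only a handful of points, trying several lags and keeping the best one invites overfitting, so treat a result like this as a hypothesis to test on more data.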
A quick note before moving on – while these are some of the most common types of alternative data, it is far from an exhaustive list. The whole point of alternative data is that it uses information not typically considered to gain insight, which means almost anything could be valuable alternative data.
The Growth of Alternative Data
Earlier in this post, we touched on how quickly alternative data is growing, but it’s worth discussing further. In 2016, buy-side firms (those who buy assets for themselves or clients, such as hedge funds, mutual funds, and private-equity firms) spent $232 million on acquiring alternative data.
By 2019, the alternative data market was valued at $1.06 billion and was expected to grow at a compound annual growth rate of 40% between 2020 and 2027.
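To see what a 40% compound annual growth rate implies, the sketch below compounds the 2019 figure forward. It assumes the growth applies to each of the eight years from 2019 through 2027, which is one reading of the forecast, so treat the output as back-of-the-envelope arithmetic rather than a prediction.

```python
def project_value(start_value, annual_growth_rate, years):
    """Compound a starting value forward at a constant annual growth rate."""
    return start_value * (1 + annual_growth_rate) ** years


def implied_cagr(start_value, end_value, years):
    """Constant annual growth rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1 / years) - 1


market_2019_bn = 1.06  # market value in billions of dollars, from the text
growth_rate = 0.40     # 40% compound annual growth rate, from the text

# Assumption: 2019 is the base year, so 2027 is eight years out.
for year in range(2020, 2028):
    value = project_value(market_2019_bn, growth_rate, year - 2019)
    print(f"{year}: ${value:.2f} billion")
```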
This growth may seem extreme, but it makes sense. The amount of data in existence has increased exponentially over the last few years. And while alternative data will undoubtedly evolve as the types of data and gathering options grow in response to advancing technology, alternative data’s value seems likely only to increase.
What does this growth mean for retail investors?
First of all, alternative data is not a fad. While traditional data will always have its place, it seems likely that alternative data will only become more and more mainstream. It’s, therefore, worth taking the time to understand how this data impacts trading.
Additionally, the rapid growth of the alternative data market has opened it up to retail investors. While certain types of alternative data may remain prohibitively expensive for retail investors, many varieties have become more accessible. And this accessibility comes at an excellent time.
The benefits of alternative data have become more evident and more extreme with increased uncertainty due to COVID-19.
The pandemic and rolling lockdowns highlighted the value of alternative data. Analysts using traffic data and other alternative sources saw that oil was heading lower. And while no one could have predicted that oil futures would turn negative for the first time in history, first movers with this data profited mightily.
And just because COVID-19 has acted as a catalyst for increased dependence on alternative data does not mean that alternative data is likely to go anywhere after the pandemic ends. That’s because the benefits of alternative data, though amplified during chaotic times such as 2020, apply during more stable environments as well.
Regardless of the state of the economy, real-time information can be beneficial. Many traditional data sources, as previously mentioned, consist of historical data. While there is value in this data, if you’re looking to gain an edge, the more recent the data, the better.
Alternative data can also save time and money.
Previously, individuals may have searched through data looking for patterns, but alternative data vendors specialize in sifting through this information and providing insights far more quickly and efficiently to subscribers.
Additionally, alternative data can help inspire new investment ideas, make a company’s performance more transparent, and help both companies and traders gain a competitive edge.
Alternative data has many benefits, but it’s not without its challenges, as well. Here are three of the biggest challenges of working with alternative data according to TransUnion.
There are legal and regulatory issues that may arise when working with alternative data. The degree to which this is an issue will depend, in part, on the type of data used. Thankfully, there’s a reasonably simple solution, and that’s working with a third-party partner who has experience in alternative data.
Integrating alternative data into your trading requires developing and testing risk models. If this is not handled thoughtfully, the value of the data is diminished.
The challenge of gaining insights from alternative data is threefold.
First, you’ll need to figure out if your systems can store and process the information. You’ll want to decide if the data is too niche or if others are using it so widely there is little alpha left.
Secondly, you must format the data appropriately and separate the data that genuinely provides a signal from the data that does not. Python and pandas are excellent, freely available tools to do this.
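As a minimal sketch of the formatting step, the pandas snippet below takes a messy feed of daily brand mentions and coerces it into clean, typed, de-duplicated rows. The column names and values are hypothetical. Dropping bad rows is just one policy, and in practice you might prefer to investigate or repair them.

```python
import pandas as pd

# Hypothetical raw feed: strings everywhere, a duplicate row, and two bad values.
raw = pd.DataFrame({
    "date": ["2021-01-04", "2021-01-05", "2021-01-05", "not a date", "2021-01-07"],
    "mentions": ["120", "135", "135", "90", "n/a"],
})

clean = raw.copy()
# Unparseable dates become NaT and unparseable numbers become NaN.
clean["date"] = pd.to_datetime(clean["date"], errors="coerce")
clean["mentions"] = pd.to_numeric(clean["mentions"], errors="coerce")

# Drop rows with missing values, remove exact duplicates, and index by date.
clean = clean.dropna().drop_duplicates().set_index("date").sort_index()

print(clean)
```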
Finally, you must decide the relevance of the data to your investment process. For example, you may have data that implies a particular conclusion, but using this data may be incredibly risky without sufficient statistical analysis and backtesting.
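To show the shape of a basic backtest, the sketch below goes long only on days after the signal was positive and compares the result with buy-and-hold. The signal values and returns are invented for illustration, and eight observations are nowhere near enough for statistical significance. The key detail is `shift(1)`, which makes the strategy trade on yesterday’s signal and so avoids look-ahead bias.

```python
import pandas as pd

# Hypothetical daily alternative-data signal and the asset's daily returns.
data = pd.DataFrame(
    {
        "signal": [0.2, -0.1, 0.4, 0.3, -0.5, 0.1, 0.6, -0.2],
        "daily_return": [0.01, 0.004, -0.002, 0.012, 0.008, -0.015, 0.003, 0.02],
    },
    index=pd.date_range("2021-01-04", periods=8, freq="B"),
)

# Hold the asset (1) only when the previous day's signal was positive, else stay flat (0).
position = (data["signal"].shift(1) > 0).astype(int)

strategy_return = position * data["daily_return"]
cumulative_strategy = (1 + strategy_return).prod() - 1
cumulative_buy_and_hold = (1 + data["daily_return"]).prod() - 1

print(f"Strategy:     {cumulative_strategy:.2%}")
print(f"Buy and hold: {cumulative_buy_and_hold:.2%}")
```

This ignores transaction costs, slippage, and position sizing, all of which matter before risking real money.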
This leads us to the best practices for using alternative data in trading. As we’ve discussed, the value in alternative data lies in how you use it. The data is almost always messy, unstructured, and hard to gather, and it requires statistical analysis, backtesting, and more before it becomes useful.
TV shows based in the world of finance love to show a genius trader coming up with brilliant applications of alternative data (such as Taylor on Showtime’s Billions). Still, in the real world, these insights are much harder to come by.
The very best way to learn the best practices for analyzing alternative data is to watch other experts do it.
Kaggle.com, a subsidiary of Google, has over 400,000 public notebooks analyzing various datasets. If I had to recommend one tool for learning how to analyze alternative data, Kaggle would take the #1 spot.
There’s a great getting started post showing the best Kaggle competitions for beginners. If you venture down this road, it’s one of the best places to start.
How to Obtain Alternative Data
We’ve covered many of the critical elements of alternative data and how to use it, but the one area we have not yet touched on is how to get alternative data.
There are three typical ways to obtain alternative data, and we’ll quickly look at each of them.
- Web Scraping
- Raw Data Acquisition
- Third-Party Licensing
Web scraping, also known as web harvesting, is done by computer programmers. These programmers create algorithms that then search the web for specific types of data. If you’re a computer programmer, this is a great way to obtain alternative data, and if you are not a computer programmer, you’ll need to get alternative data through one of the other two options.
For those interested in learning how to scrape the web, I demonstrate how to build an S&P 500 component list with Python. There are also a lot of great resources for programmers looking to do the same.
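To give a flavor of what scraping involves, here is a small sketch that pulls ticker symbols out of an HTML table using only Python’s standard library. This is not the method from the demonstration mentioned above. The `SAMPLE_HTML` string and the `TableParser` class are stand-ins I made up so the example runs without a network connection. On a live page, you would first download the HTML, and you should check the site’s terms of use before scraping it.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for a downloaded page containing an index component table.
SAMPLE_HTML = """
<table id="constituents">
  <tr><th>Symbol</th><th>Security</th></tr>
  <tr><td>AAPL</td><td>Apple Inc.</td></tr>
  <tr><td>MSFT</td><td>Microsoft</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every table row as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


parser = TableParser()
parser.feed(SAMPLE_HTML)

header, *rows = parser.rows
tickers = [row[0] for row in rows]
print(header)
print(tickers)
```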
Acquiring raw data has its benefits, but alternative data is typically messy and unstructured, as we’ve discussed. This makes it difficult for the average person or company to turn this raw data into usable insights.
Third-party licensing is typically the most expensive of these three options, but it also provides the most usable insights. If you’re not a computer programmer or experienced in data analysis, this is likely your best option for obtaining alternative data.
Below is a non-exhaustive list of alternative data sources and marketplaces used frequently to gain an alternative edge.
Alternative Data Websites
|Alternative Data Source|Description|
|---|---|
|Ahrefs|Search and backlink data|
|App Annie|Mobile app data|
|BuiltWith|Determine the technology stack of a website|
|CarSalesBase|Global automotive data|
|City-data|City profile information|
|Clinicaltrials.gov|Clinical trial data|
|Comparably|Compare brands and salaries|
|Craft|Company data and analytics|
|Glassdoor|Company review and salary data|
|Google Keyword Tool|Query pay-per-click data|
|Google Trends|Search data|
|Jungle Scout|Amazon seller shipment information|
|Kaggle|Multiple datasets, and how to analyze them|
|Linkup|Job market data|
|OpenTable|Restaurant booking data|
|Parrot Analytics|Subscription video on demand (SVOD) data|
|Placer AI|Foot traffic data|
|TipRanks|Analyst and financial blogger recommendations|
Alternative Data Marketplaces
|Alternative Data Marketplace|Description|
|---|---|
|InfoTrie|Low-latency alternative data API feed|
|Quandl|General data marketplace|
|QueXopa|South American alternative data provider|
|S&P Marketplace|General marketplace|
|Yewno|Knowledge graph data inference|
The Bottom Line
Alternative data is such a broad and varied topic that concluding with a single takeaway is almost impossible. Alternative data is new, proliferating, and more accessible, but this doesn’t mean that it eliminates the need for traditional data or that it doesn’t have its challenges.
What I can say is that, just like any other tool in your trading toolbox, alternative data is all about how you use it. I will also tell you that some of the alpha in my algorithmic trading strategies is due to alternative data.