Why Data Matters When Building Quantitative Trading Models

The automation of global financial markets has opened vast opportunities to traders and investors. In the past, stock exchanges were noisy places, with traders receiving orders and reacting to them on the floor. Today, on major trading floors, the noise has been replaced by traders staring at computer screens, and the number of people on those floors has fallen significantly. Many of the world’s largest trading firms have cut their trading staff. A good example is Citi, which is winding down its trading business. People are being replaced by machines, and algorithmic trading, also known as quantitative trading, is taking over.

Quantitative trading is not new. In the past, however, there were two major barriers. First, a quant trader needed a strong background in statistics, and most firms that offered quant trading employed PhD holders. A good example is Long-Term Capital Management (LTCM), a once flourishing hedge fund whose principals included two recipients of the Nobel Prize in economics. Second, the technology was not affordable to small traders.

Both barriers have since come down. Today, you don’t need a formal background in statistics or physics; some of the best quant traders don’t have one. The technology for developing quant models is also readily available. For instance, the brokerage FxPro offers a free platform for building quantitative models by simply dragging and dropping.

Why data matters

In quant trading, data is one of the most important inputs to get right. It has been argued that data is the backbone of any quantitative trading system; it is the engine that powers it. If a single digit or decimal point is wrong or missing in the data used to develop the system, the chances of losing your investment are very high.

There are two main types of data used when developing algorithms: price data and fundamental data. Price data includes the price of the asset, trading volumes, the size of each trade, and other information derived from transactions. In simple terms, price data can extend to the entire order book, which shows a continuous series of all bids and offers for an asset. Fundamental data, on the other hand, is more complicated and covers a range of data types that are difficult to categorize. It refers to any other input that is not price data. Examples include the price-to-book ratio, financial performance, and sentiment. Macroeconomic data such as inflation and interest rates can also be treated as fundamental data.

To use the data, one first needs to know where to get it. In quant trading, and in high-frequency trading especially, the accuracy of the data must be accompanied by timely delivery: for high-frequency strategies, a delay of even a microsecond can mean large losses. Sources include regulators (filings relating to large owners), government agencies (mostly for fundamental data), news agencies (such as Bloomberg), proprietary data vendors (such as Markit), and corporations themselves.

Once the data has been obtained, a common problem faced by quantitative traders is cleaning it. Poorly cleaned data has led to the downfall of many quant traders.
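As a rough illustration of what cleaning can involve, the sketch below flags prices that look like a dropped digit or a misplaced decimal point. It is a minimal example under stated assumptions, not a production filter: the function name `flag_suspect_prices`, the `window` and `max_ratio` parameters, and the sample prices are all hypothetical, and the rule used (compare each price with the median of the most recent accepted prices) is only one of many possible checks.

```python
from statistics import median


def flag_suspect_prices(prices, window=5, max_ratio=3.0):
    """Return the indices of prices that look wrong.

    A price is flagged when it is missing or non-positive, or when it differs
    from the median of the last `window` accepted prices by more than a
    factor of `max_ratio` (hypothetical threshold, chosen for illustration).
    """
    suspects = []
    accepted = []  # prices that passed the check so far; used as reference
    for i, price in enumerate(prices):
        if price is None or price <= 0:
            suspects.append(i)
            continue
        recent = accepted[-window:]
        if recent:
            ratio = price / median(recent)
            if ratio > max_ratio or ratio < 1.0 / max_ratio:
                suspects.append(i)
                continue
        accepted.append(price)
    return suspects


# Made-up sample: index 2 has a shifted decimal point, index 5 an extra digit.
prices = [101.2, 101.5, 10.16, 101.9, 102.3, 1021.0, 102.4]
print(flag_suspect_prices(prices))  # [2, 5]
```

Flagged values are reported rather than silently corrected, so a person or a downstream rule can decide what to do with them. Note that the check trusts the first price it sees, so a bad first value would need a separate safeguard.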
A common problem for quants is missing data, especially when the supplier does not deliver the data at the expected time. This can be addressed by building a system that recognizes when data is missing, so that it does not make irrational decisions that lead to significant losses. Another problem is look-ahead bias: assuming, when building or testing a model, that you could have known something before it was actually possible to know it.

As stated before, data is the engine that drives quant systems. Hedge funds such as Renaissance Technologies and Citadel have reportedly made annual returns of more than 20% for years using quantitative systems. LTCM, mentioned above, is a good example of what not to do when using quant systems. The fund lost almost all of its capital, a collapse attributed in part to poorly combined data sets. Therefore, take your time when developing your system, and back test and forward test it to make sure that everything works as intended.
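The two data problems described above, missing data and look-ahead bias, can be sketched in a few lines. This is an illustrative example only: the function names `find_gaps` and `moving_average_signal`, the 60-second bar spacing, the toy moving-average rule, and the sample values are all assumptions made for the sketch, not a recommended strategy.

```python
def find_gaps(timestamps, expected_step, tolerance=0):
    """Return (previous, current) timestamp pairs where data appears missing,
    i.e. where the spacing exceeds expected_step + tolerance."""
    gaps = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        if cur - prev > expected_step + tolerance:
            gaps.append((prev, cur))
    return gaps


def moving_average_signal(closes, window=3):
    """Toy signal that avoids look-ahead bias.

    The signal for bar i is computed only from closes[i - window:i], that is,
    from bars strictly before i. When the history is too short or contains a
    missing value (None), the signal is None, meaning "stand aside".
    """
    signals = []
    for i in range(len(closes)):
        history = closes[max(0, i - window):i]
        if len(history) < window or any(c is None for c in history):
            signals.append(None)
            continue
        average = sum(history) / window
        # +1 if the latest known close is above its recent average, else -1.
        signals.append(1 if history[-1] > average else -1)
    return signals


# Made-up bar timestamps in seconds: the bars at 180 and 240 never arrived.
timestamps = [0, 60, 120, 300, 360]
print(find_gaps(timestamps, expected_step=60))  # [(120, 300)]

# Made-up closing prices with one missing value.
closes = [10, 11, 12, None, 13, 14, 15, 14]
print(moving_average_signal(closes))
# [None, None, None, 1, None, None, None, 1]
```

Because the signal for a bar never reads that bar's own close, changing the latest close cannot change the signal that would have been issued for it. The same property is worth checking when back testing any real system.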