Wednesday, December 27, 2023
This thesis investigates whether an individual can outperform well-known stock index funds by designing a machine learning model built on historical stock market data and financial models such as the Fama-French model. The model should construct a diversified portfolio and forecast future values of indicators such as the Relative Strength Index (RSI) accurately enough to trim losing stocks from the portfolio. The data covers the US stock market for the period between 2014 and 2024. It will be sourced from Yahoo Finance and accessed through the yfinance library for Python.
To answer the first question, whether a well-performing machine learning model can be constructed, the plan proposes a systematic approach. A simple machine learning model, or a set of them, will be built and rigorously tested against an actual index fund such as the Fidelity 500 Index Fund (FXAIX), the Vanguard 500 Index Fund Admiral Shares (VFIAX), the Shelton Sustainable Equity Fund (NEXIX), or the State Street S&P 500 Index Fund Class N (SVSPX).
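As a rough illustration of what testing against an index fund could look like, the sketch below compares the total return of a portfolio with that of a benchmark over the same period. The function names and the example numbers are hypothetical, and the thesis may well use richer metrics (for example risk-adjusted ones); this only shows the basic arithmetic.

```python
def cumulative_return(values):
    """Total return over the whole period, e.g. 0.25 means +25%."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    return values[-1] / values[0] - 1.0


def excess_return(portfolio_values, benchmark_values):
    """Portfolio return minus benchmark return over the same period.

    A positive result means the portfolio beat the benchmark.
    """
    return cumulative_return(portfolio_values) - cumulative_return(benchmark_values)


# Hypothetical month-end values for a model portfolio and an index fund.
portfolio = [100.0, 110.0, 121.0]
benchmark = [100.0, 105.0, 110.0]
print(f"excess return: {excess_return(portfolio, benchmark):.2%}")
```

Both series need to cover the same dates for the comparison to be meaningful.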
One possible approach is a Long Short-Term Memory (LSTM) neural network. The network will analyze historical stock prices directly, using a simple time series analysis to predict the price of a stock for the next 5 days.
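Before an LSTM can be trained on prices, the series usually has to be cut into input and target windows. The sketch below shows one way to do that for a 5-day horizon. The 20-day lookback and the function name are assumptions made for illustration, not values taken from the plan.

```python
def make_windows(prices, lookback=20, horizon=5):
    """Split a price series into (input window, target window) pairs.

    Each input holds `lookback` consecutive prices and each target holds
    the `horizon` prices that immediately follow it.
    """
    samples = []
    last_start = len(prices) - lookback - horizon
    for start in range(last_start + 1):
        split = start + lookback
        samples.append((prices[start:split], prices[split:split + horizon]))
    return samples


# Toy series: 30 hypothetical daily closing prices.
closes = [100.0 + day for day in range(30)]
pairs = make_windows(closes, lookback=20, horizon=5)
print(len(pairs), "training samples")
```

A series shorter than `lookback + horizon` yields no samples, so the caller can detect tickers with too little history.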
Next, the study will use K-Means clustering to create a diversified portfolio that aims to maximize returns without incurring unwarranted risk. The K-Means clusters will draw on a range of financial instruments, described in a later section of the thesis, to produce monthly ‘buy’ signals, that is, the stocks that will ultimately make up the portfolio.
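To make the clustering step concrete, here is a minimal plain-Python sketch of the standard K-Means procedure (Lloyd's algorithm). The two features per stock are hypothetical placeholders, since the actual inputs are described later in the thesis, and in practice a library implementation such as scikit-learn's `KMeans` would likely be used instead.

```python
import random


def kmeans(points, k, iterations=50, seed=0):
    """Lloyd's algorithm on a list of equal-length feature tuples.

    Returns (centroids, labels), where labels[i] is the cluster of points[i].
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # empty cluster: keep as is
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels


# Hypothetical (feature_1, feature_2) rows, one per stock.
features = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.2), (5.0, 5.0), (5.1, 5.1), (5.2, 5.2)]
centroids, labels = kmeans(features, k=2)
print(labels)
```

How a cluster is turned into a monthly buy signal is a separate decision that this sketch does not attempt to make.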
Lastly, an LSTM will be applied to the stocks selected for the portfolio, performing a time series analysis on their RSI and Average True Range (ATR) values in an effort to filter out stocks that may negatively impact the portfolio's returns.
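For reference, the two indicators that this last step would forecast can be computed from raw prices as sketched below. This version uses simple averages over the most recent bars rather than Wilder's smoothing, and the 14-bar default is a common convention, not a value stated in the plan; both choices are assumptions.

```python
def rsi(closes, period=14):
    """RSI over the last `period` price changes (simple averages)."""
    if len(closes) < period + 1:
        raise ValueError("need at least period + 1 closes")
    recent = closes[-period - 1:]
    changes = [later - earlier for earlier, later in zip(recent, recent[1:])]
    avg_gain = sum(c for c in changes if c > 0) / period
    avg_loss = sum(-c for c in changes if c < 0) / period
    if avg_loss == 0:
        return 100.0  # no losses in the window
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)


def atr(highs, lows, closes, period=14):
    """Average true range over the last `period` bars (simple mean)."""
    true_ranges = []
    for i in range(1, len(closes)):
        prev_close = closes[i - 1]
        true_ranges.append(max(highs[i] - lows[i],
                               abs(highs[i] - prev_close),
                               abs(lows[i] - prev_close)))
    if len(true_ranges) < period:
        raise ValueError("need at least period + 1 bars")
    return sum(true_ranges[-period:]) / period


# Hypothetical daily bars for one stock.
highs = [10.0, 12.0, 13.0]
lows = [9.0, 10.0, 11.0]
closes = [9.5, 11.0, 12.0]
print(rsi(closes, period=2), atr(highs, lows, closes, period=2))
```

Computing these values for every trading day gives the RSI and ATR series that a forecasting model could then be trained on.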