Forecasting stock market behavior is a challenging task due to its dynamic and volatile nature. Traditional models like ARIMA are useful for identifying trends and making forecasts, but they often fail to predict sudden market movements triggered by external factors like news and investor sentiment. To overcome this, I developed a hybrid forecasting model that integrates sentiment analysis with time series forecasting to improve the predictive accuracy for major financial indices.
This article provides an in-depth walkthrough of the problem, the methods I used, and the technical implementation of the project. It discusses the use of classical ARIMA models, neural networks, and sentiment scores derived from financial news to achieve better forecasting accuracy. If you're interested in predictive analytics or financial data science, this project offers practical insights on how to integrate sentiment into your models.
Goal: Forecast key European stock indices such as DE40, UK100, and EU50 using time series models.
Challenge: Standard models like ARIMA struggle to account for external shocks, such as major financial news or geopolitical events, which lead to sudden market shifts.
Solution: Integrate sentiment analysis from financial news with time series forecasting to improve prediction accuracy during volatile periods.
To address this problem, I built a hybrid model that combines ARIMA, neural networks, and sentiment analysis from news articles. The goal was to see if sentiment scores could provide better insights into market trends and enhance predictive performance.
Historical daily prices were collected for DE40, UK100, IT40, and other major indices, and log returns were computed from the closing prices.
Tools: Pandas for data manipulation and TradingView API for data collection.
Here’s an example of how I prepared the data in the stocks_data.ipynb file:
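import pandas as pd
import numpy as np

# Load daily price data (expects a 'Close' column)
df = pd.read_csv("daily_data.csv")

# Log return: ln(P_t / P_{t-1}), equivalent to log(1 + pct_change); first row is NaN
df['log_returns'] = np.log(df['Close'] / df['Close'].shift(1))
print(df.head())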
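One convenient property of the log-return column is that daily log returns add up to the log return over the whole period, which does not hold for simple percentage changes. A quick check with a few made-up prices (illustrative values, not project data):

```python
import numpy as np
import pandas as pd

# Illustrative closing prices (hypothetical values, not from the dataset)
close = pd.Series([100.0, 102.0, 99.0, 105.0])

# Daily log returns: ln(P_t / P_{t-1}); the first value is NaN
log_returns = np.log(close / close.shift(1))

# Sum of daily log returns (pandas skips the leading NaN by default)
total_from_daily = log_returns.sum()

# Log return computed directly from the first and last price
total_direct = np.log(close.iloc[-1] / close.iloc[0])
print(total_from_daily, total_direct)
```

Both numbers agree up to floating-point rounding, which is why multi-day returns can be built by summing daily log returns.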
Financial news articles were scraped from Reuters using the Data Collection Script - REUTERS.py file.
Sentiment scores were calculated for each news article using the Sentiment Score Script.py.
Relevance scores were also calculated using the Relevance Score Script.py to ensure that only impactful news influenced the forecasts.
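The article does not describe how Relevance Score Script.py computes its scores. As a purely hypothetical illustration of the general idea, the sketch below scores each article by keyword overlap and keeps only those above a threshold. The keyword list, the `relevance_score` function, and the 0.2 threshold are all assumptions, not the project's method.

```python
import pandas as pd

# Hypothetical keyword list for one index (e.g. DE40); not taken from the project
KEYWORDS = {"dax", "germany", "bundesbank", "ecb", "siemens", "volkswagen"}

def relevance_score(text: str, keywords=KEYWORDS) -> float:
    """Hypothetical score: share of the keyword list found in the article (0.0 to 1.0)."""
    words = set(text.lower().split())
    return len(words & keywords) / len(keywords)

articles = pd.DataFrame({
    "news_article": [
        "ECB holds rates as Germany factory orders fall and DAX slips",
        "Local bakery wins regional award",
    ]
})
articles["relevance_score"] = articles["news_article"].apply(relevance_score)

# Keep only articles at or above a hypothetical relevance threshold
THRESHOLD = 0.2
relevant = articles[articles["relevance_score"] >= THRESHOLD]
print(relevant)
```

A real filter would need tokenization that handles punctuation and company names; this version only splits on whitespace.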
Example of sentiment scoring from the script:
from textblob import TextBlob

def get_sentiment_score(text):
    # TextBlob polarity ranges from -1.0 (negative) to 1.0 (positive)
    return TextBlob(text).sentiment.polarity

df['sentiment_score'] = df['news_article'].apply(get_sentiment_score)
Both stock and sentiment data were merged to create a comprehensive dataset where each row contained stock returns and the associated sentiment score for that day.
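Because several articles can be published on the same day, per-article scores have to be reduced to one value per day before the merge. A minimal pandas sketch, assuming hypothetical column names ('Date' for prices, 'published' for news), a per-day mean, and a neutral 0.0 score for days without news; the sample values are made up:

```python
import pandas as pd

# Hypothetical sample data; column names are assumptions
prices = pd.DataFrame({
    "Date": pd.to_datetime(["2024-03-01", "2024-03-04", "2024-03-05"]),
    "log_returns": [0.004, -0.012, 0.007],
})
news = pd.DataFrame({
    "published": pd.to_datetime(["2024-03-01 08:15", "2024-03-01 14:40", "2024-03-04 09:05"]),
    "sentiment_score": [0.2, -0.1, -0.5],
})

# Collapse timestamps to calendar days and average the scores per day
news["Date"] = news["published"].dt.normalize()
daily_sentiment = news.groupby("Date", as_index=False)["sentiment_score"].mean()

# Left join keeps every trading day; days without news get a neutral 0.0
merged = prices.merge(daily_sentiment, on="Date", how="left")
merged["sentiment_score"] = merged["sentiment_score"].fillna(0.0)
print(merged)
```

This sketch assigns an article to its calendar day; articles published after the market close arguably belong to the next trading day, which it does not handle.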
ARIMA (AutoRegressive Integrated Moving Average) is a popular method for forecasting time series. It is specified by three parameters: p, the number of autoregressive lags; d, the degree of differencing needed to make the series stationary; and q, the number of lagged forecast errors in the moving-average term. I used the auto_arima function from the pmdarima library to select the best parameters automatically.
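For reference, the standard textbook form of an ARIMA(p, d, q) model is shown below, written with the lag operator L (where L y_t = y_{t-1}). The symbols are conventional notation rather than names from the project code: y_t is the modeled series, φ_i are the autoregressive coefficients, θ_j the moving-average coefficients, c a constant, and ε_t white-noise errors.

```latex
\left(1 - \sum_{i=1}^{p} \phi_i L^{i}\right)(1 - L)^{d}\, y_t
  = c + \left(1 + \sum_{j=1}^{q} \theta_j L^{j}\right)\varepsilon_t
```

Since log returns are already the first difference of log prices, a model with d = 0 on log returns corresponds to d = 1 on log prices.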
Example implementation of AutoARIMA from ARIMA.ipynb:
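import numpy as np
import matplotlib.pyplot as plt
from pmdarima import auto_arima

# Drop the leading NaN and reset to a 0..n-1 index for plotting
series = df['log_returns'].dropna().reset_index(drop=True)
# Search over (p, d, q) automatically; stepwise search is faster than a full grid
model = auto_arima(series, seasonal=False, stepwise=True)
forecast = np.asarray(model.predict(n_periods=30))
# Place the 30-step forecast after the last observed point, not over the start
future_x = np.arange(len(series), len(series) + len(forecast))
plt.plot(series.index, series, label='Actual')
plt.plot(future_x, forecast, label='Predicted')
plt.legend()
plt.show()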
The ARIMA model performed well under normal market conditions but struggled to handle sudden changes in stock prices caused by financial news.
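Statements like "performed well under normal conditions" become measurable by scoring forecasts on a hold-out window, for example separately for calm and volatile stretches. A minimal sketch of two common metrics for return forecasts: root mean squared error and directional accuracy (how often the forecast gets the sign of the return right). The sample numbers are made up for illustration; the article does not report these values.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def directional_accuracy(actual, predicted):
    """Share of periods where the forecast has the same sign as the actual return."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.sign(actual) == np.sign(predicted)))

# Hypothetical hold-out returns and forecasts, for illustration only
actual = [0.010, -0.020, 0.005, -0.001]
predicted = [0.008, -0.005, -0.002, -0.003]
print(rmse(actual, predicted), directional_accuracy(actual, predicted))
```

Directional accuracy is often more informative than RMSE for returns, because daily returns are small and a model that always predicts zero can still achieve a low RMSE.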
To overcome ARIMA’s limitations, I created a neural network that combines both log returns and sentiment scores as input features. The neural network learns from past price trends as well as the mood of the market derived from sentiment analysis.
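from sklearn.preprocessing import MinMaxScaler

# Scale both features to [0, 1]; fitting on the full series lets
# information from later dates leak into the scaling
scaler = MinMaxScaler()
features = scaler.fit_transform(df[['log_returns', 'sentiment_score']].dropna())

X = features[:-1]     # Inputs: day-t scaled log return and sentiment score
y = features[1:, 0]   # Target: day t+1 scaled log return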
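Example of the neural network from NN with Sentiment.py:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input

# Feedforward network: 2 inputs (log return, sentiment) -> next-day log return
model = Sequential()
model.add(Input(shape=(2,)))
model.add(Dense(64, activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(1))  # Linear output for regression
model.compile(optimizer='adam', loss='mse')
model.fit(X, y, epochs=20, batch_size=32)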
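In the scaling snippet earlier the scaler is fit on the whole series and there is no hold-out period, so any error measured on that data would be optimistic. A minimal sketch of a leakage-free setup, assuming an 80/20 chronological split (an arbitrary choice) and synthetic data standing in for the project's dataset; `make_xy` and `to_log_returns` are hypothetical helpers:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for df[['log_returns', 'sentiment_score']] (hypothetical data)
rng = np.random.default_rng(0)
data = np.column_stack([rng.normal(0, 0.01, 500), rng.uniform(-1, 1, 500)])

# Chronological split: the test period is strictly later in time
split = int(len(data) * 0.8)
train, test = data[:split], data[split:]

# Fit the scaler on the training period only, then apply it to both periods
scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(train)
test_scaled = scaler.transform(test)

def make_xy(scaled):
    """Pair day-t features with the day t+1 scaled log return (column 0)."""
    return scaled[:-1], scaled[1:, 0]

X_train, y_train = make_xy(train_scaled)
X_test, y_test = make_xy(test_scaled)

def to_log_returns(scaled_col, scaler):
    """Undo default [0, 1] MinMax scaling for column 0 (log returns)."""
    span = scaler.data_max_[0] - scaler.data_min_[0]
    return scaled_col * span + scaler.data_min_[0]

print(X_train.shape, X_test.shape)
```

The network would then be trained on `X_train, y_train`, and `model.predict(X_test)` passed through `to_log_returns` before computing errors in log-return units. Test-period values can fall outside [0, 1] after scaling; that is expected when the scaler has only seen training data.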
The full source code, datasets, and scripts for this project are available on my GitHub. You can find the following files in the repository:
ADF_test.ipynb
ARIMA.ipynb
NN_with_Sentiment.py
NN_without_Sentiment.py
Data_Collection.py (for Reuters news scraping)
Integrating sentiment analysis into financial forecasting models enhances accuracy and enables models to adapt to market changes caused by news and events. By combining classical ARIMA with neural networks and sentiment analysis, this project demonstrates a robust approach to stock index prediction. Future work could explore transformer models like BERT for sentiment extraction and add macroeconomic indicators as input features.
If you found this project interesting, feel free to check out the code and leave comments on GitHub. Also, for a more comprehensive analysis, you can access the full PDF report here. I'd love to hear your feedback!