Cryptocurrency Sentiment Analysis and Market Correlations

Description

This project was developed as part of the Text Mining and Information Retrieval coursework at RIT Croatia. The goal was to analyze how public sentiment, expressed through news articles, fluctuates during bull and bear markets in the cryptocurrency space, and whether these changes correlate with the market valuation of assets like Bitcoin and Ethereum.
Working with three large document collections (15k–46k articles), the analysis used a combination of sentiment scoring (TextBlob), Net Promoter Score (NPS) tracking, and price comparison to identify potential patterns across time.
Using LDA (Latent Dirichlet Allocation), the project extracted dominant discussion themes — from scams and ETFs to NFTs, El Salvador’s adoption of Bitcoin, and institutional crypto investments.
Sentiment shifts were tracked in three key time periods, revealing that:
In 2021, sentiment closely followed Bitcoin price spikes.
In late 2022, positive sentiment continued despite price stagnation, likely driven by excitement around NFTs and Web3.
In early 2024, positive sentiment spiked again, driven by optimism around Ethereum ETFs and institutional adoption.
The project also tested different IR models (BM25, TF-IDF, BM25F) to retrieve the most relevant “best” and “worst” investment-related news and compared their outputs using Spearman correlation and precision/recall metrics.
Special attention was paid to Ethereum’s rise, the FTX collapse, and speculative tokens — painting a broader picture of how narrative and price intertwine.
Ultimately, the findings suggest that sentiment analysis combined with IR and NLP techniques can capture hidden market signals and anticipate shifts in crypto perception, especially when zooming out across macro cycles.

Tools

The project was developed entirely in Python. The datasets were stored as .json files and indexed with Whoosh.
Sentiment analysis was performed with TextBlob, and Net Promoter Scores (NPS) were calculated bi-weekly to track sentiment shifts.
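As a rough illustration of the bi-weekly NPS step, the sketch below groups per-article polarity scores into 14-day buckets and computes percent promoters minus percent detractors for each bucket. The cut-offs (PROMOTER_MIN, DETRACTOR_MAX), the function names, and the sample values are assumptions made for this example, not the thresholds or data used in the project. The polarity values stand in for what TextBlob(text).sentiment.polarity would return per article.

```python
from collections import defaultdict
from datetime import date

# Hypothetical cut-offs: how polarity was mapped to promoters and
# detractors in the project is not stated, so these are placeholders.
PROMOTER_MIN = 0.1
DETRACTOR_MAX = -0.1


def nps(polarities):
    """Return percent promoters minus percent detractors for a list of scores."""
    if not polarities:
        return 0.0
    promoters = sum(1 for p in polarities if p >= PROMOTER_MIN)
    detractors = sum(1 for p in polarities if p <= DETRACTOR_MAX)
    return 100.0 * (promoters - detractors) / len(polarities)


def biweekly_nps(scored_articles, start):
    """Group (date, polarity) pairs into 14-day buckets counted from `start`."""
    buckets = defaultdict(list)
    for day, polarity in scored_articles:
        buckets[(day - start).days // 14].append(polarity)
    return {idx: nps(vals) for idx, vals in sorted(buckets.items())}


# Made-up polarity scores in [-1, 1], one per article.
sample = [
    (date(2021, 1, 1), 0.40),
    (date(2021, 1, 5), -0.30),
    (date(2021, 1, 9), 0.25),
    (date(2021, 1, 13), 0.00),
    (date(2021, 1, 16), -0.50),
    (date(2021, 1, 20), -0.20),
]
result = biweekly_nps(sample, start=date(2021, 1, 1))
print(result)
```

Each key in the result is a bucket index (0 for the first two weeks, 1 for the next two, and so on), which can be plotted against the average asset price over the same window.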
For topic modeling, I applied Latent Dirichlet Allocation (LDA) to extract key themes.
To retrieve relevant news, I tested and compared TF-IDF, BM25, and BM25F models, evaluating performance with correlation and precision/recall.
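The comparison step can be sketched in plain Python as follows. The example assumes each model returns a ranked list of document IDs, that the two lists contain the same documents with no ties, and that a set of relevant documents is available. The IDs, the relevance judgments, and the function names are invented for illustration and are not the project's actual data or code.

```python
def spearman_rho(ranking_a, ranking_b):
    """Spearman rank correlation for two tie-free rankings of the same items."""
    if set(ranking_a) != set(ranking_b) or len(ranking_a) < 2:
        raise ValueError("rankings must cover the same items (at least two)")
    n = len(ranking_a)
    position_b = {doc: i for i, doc in enumerate(ranking_b)}
    # Sum of squared rank differences between the two orderings.
    d_squared = sum((i - position_b[doc]) ** 2 for i, doc in enumerate(ranking_a))
    return 1 - 6 * d_squared / (n * (n * n - 1))


def precision_recall_at_k(ranking, relevant, k):
    """Precision and recall over the top-k documents of a ranking."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc in ranking[:k] if doc in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Invented rankings standing in for the output of two retrieval models.
bm25_ranking = ["d3", "d1", "d4", "d2", "d5"]
tfidf_ranking = ["d1", "d3", "d4", "d5", "d2"]
relevant_docs = {"d1", "d3", "d5"}  # hypothetical relevance judgments

rho = spearman_rho(bm25_ranking, tfidf_ranking)
bm25_p, bm25_r = precision_recall_at_k(bm25_ranking, relevant_docs, k=3)
print(f"Spearman rho: {rho:.2f}")
print(f"BM25 precision@3: {bm25_p:.2f}, recall@3: {bm25_r:.2f}")
```

A rho close to 1 means the two models order the documents almost identically, while precision and recall show how many of the top results are actually relevant.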
Visualizations were built with Matplotlib, and the final results were presented in PowerPoint.