Inspiration
Language changes meaning depending on the domain in which it is used. Words that appear positive in everyday language may actually signal negative outcomes in fields like finance or politics. For example, terms like “rising inflation” or “rate hikes” can carry very different implications depending on the context.
This inspired me to build a system that performs domain-aware sentiment analysis instead of relying on generic sentiment models. The goal was to correctly interpret text within its specific domain and then convert that understanding into actionable insights for financial markets. By connecting sentiment from multiple domains to market predictions, the system could help identify whether news might lead to a positive, negative, or neutral market reaction.
What it does
LexiSnap is a domain-aware sentiment analysis platform that converts text into actionable market insights. The application allows users to input text from one of four domains—Finance & Business, Law & Government, Politics, or Technology—and then performs domain-specific sentiment analysis using transformer-based models trained to understand contextual language within each field. After generating a sentiment score tailored to the selected domain, the system feeds this signal into an LSTM model that predicts the likely market reaction, classifying it as positive, negative, or neutral. The results are then presented through an interactive interface, allowing users to quickly understand how news or statements from different domains could potentially influence market behavior.
How we built it
LexiSnap was built using a combination of natural language processing, deep learning, and an interactive web interface. The system was developed primarily in Python, with PyTorch used to build and train the deep learning models. It leverages a transformer-based architecture to perform domain-aware sentiment analysis, supported by custom word embeddings that help the model better understand contextual language differences across domains. Once sentiment signals are generated, they are fed into an LSTM neural network to predict potential market movement categories: positive, negative, or neutral. The user interface was built using Streamlit, allowing users to easily interact with the model and visualize results. Rather than relying solely on pretrained sentiment models, the project focuses on experimenting with custom transformer-based sentiment analysis tailored to specific domains, enabling more accurate interpretation of domain-specific language.
Domain-Specific Sentiment Analyzer: I implemented a shared RoBERTa encoder with per-domain adapters and heads. The text is first processed by the shared encoder, producing a CLS token embedding. This embedding is passed through a domain adapter (Pfeiffer bottleneck: LayerNorm → down-projection → GELU → up-projection + residual connection) specific to the chosen domain (finance, tech, law, politics, business). The output is then fed into a sentiment head that predicts a sentiment score (0–1) and classifies it as negative, neutral, or positive. At inference time, only the adapter and head corresponding to the selected domain are activated, enabling domain-specific sentiment analysis.
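The adapter routing described above can be sketched in PyTorch roughly as follows. This is a minimal illustration, not the actual LexiSnap code: the hidden size (768, matching RoBERTa-base), the bottleneck size of 64, and the combined 4-unit head (one score output plus three class logits) are assumed values, and the shared encoder is stood in for by a precomputed CLS embedding.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PfeifferAdapter(nn.Module):
    """Bottleneck adapter: LayerNorm -> down-projection -> GELU -> up-projection + residual."""
    def __init__(self, hidden_dim=768, bottleneck_dim=64):
        super().__init__()
        self.norm = nn.LayerNorm(hidden_dim)
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)

    def forward(self, x):
        # Residual connection around the bottleneck keeps the shared
        # encoder's representation intact while adding a domain shift.
        return x + self.up(F.gelu(self.down(self.norm(x))))

class DomainSentimentModel(nn.Module):
    """Routes a shared-encoder CLS embedding through one adapter + head per domain."""
    DOMAINS = ("finance", "tech", "law", "politics", "business")

    def __init__(self, hidden_dim=768):
        super().__init__()
        self.adapters = nn.ModuleDict({d: PfeifferAdapter(hidden_dim) for d in self.DOMAINS})
        # Hypothetical head layout: 1 scalar score output + 3 class logits.
        self.heads = nn.ModuleDict({d: nn.Linear(hidden_dim, 4) for d in self.DOMAINS})

    def forward(self, cls_embedding, domain):
        # Only the selected domain's adapter and head are exercised at inference.
        h = self.adapters[domain](cls_embedding)
        out = self.heads[domain](h)
        score = torch.sigmoid(out[:, 0])   # sentiment score squashed into [0, 1]
        logits = out[:, 1:]                # negative / neutral / positive
        return score, logits
```

Because only one adapter/head pair is active per forward pass, adding a new domain is a matter of registering another small adapter rather than fine-tuning the whole encoder.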
Multi-Head LSTM Market Predictor: This component predicts ETF price direction by combining sentiment scores with market technical indicators. The inputs consist of a sentiment score ranging from 0 to 1, a category embedding (0–4) representing the domain, and a 31-dimensional feature vector that includes price, RSI, MACD, Bollinger Bands, and volume ratios for the ETFs SPY, QQQ, TLT, GLD, USO, and UVXY. The category embedding is concatenated with the scaled feature vector and passed through a 2-layer LSTM, followed by a shared trunk that feeds four separate heads, one for each ETF (SPY, QQQ, TLT, GLD). Each head outputs three logits (Down, Neutral, and Up), which are converted to probabilities using softmax. This architecture enables simultaneous multi-ETF directional predictions while effectively integrating both sentiment and technical indicators.
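A rough sketch of that predictor's wiring, assuming illustrative sizes (an 8-dimensional domain embedding, 64 hidden units, and the sentiment scalar broadcast across timesteps); the real model's hyperparameters may differ:

```python
import torch
import torch.nn as nn

class MarketPredictor(nn.Module):
    """2-layer LSTM over sentiment + technical features, with one 3-way head per ETF."""
    ETFS = ("SPY", "QQQ", "TLT", "GLD")

    def __init__(self, n_features=31, n_domains=5, embed_dim=8, hidden_dim=64):
        super().__init__()
        self.domain_embed = nn.Embedding(n_domains, embed_dim)
        # Input per timestep: 31 technical features + domain embedding + sentiment scalar.
        self.lstm = nn.LSTM(n_features + embed_dim + 1, hidden_dim,
                            num_layers=2, batch_first=True)
        self.trunk = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.ReLU())
        self.heads = nn.ModuleDict({etf: nn.Linear(hidden_dim, 3) for etf in self.ETFS})

    def forward(self, features, sentiment, domain_id):
        # features: (batch, seq_len, 31); sentiment: (batch,); domain_id: (batch,)
        batch, seq_len, _ = features.shape
        emb = self.domain_embed(domain_id).unsqueeze(1).expand(-1, seq_len, -1)
        sent = sentiment.view(batch, 1, 1).expand(-1, seq_len, -1)
        x = torch.cat([features, emb, sent], dim=-1)
        out, _ = self.lstm(x)
        h = self.trunk(out[:, -1])  # use the final timestep's hidden state
        # Each head returns Down / Neutral / Up probabilities for its ETF.
        return {etf: torch.softmax(head(h), dim=-1) for etf, head in self.heads.items()}
```

The shared trunk lets all four ETF heads learn from a common market representation, while separate output layers keep their directional calibrations independent.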
Challenges we ran into
One of the biggest challenges was working with computational resources. Training transformer-based models locally was difficult, so I had to transition to Google Colab GPUs, which was my first time working with GPU-based model training.
Another challenge was data collection. Gathering high-quality, domain-specific text data required scraping and filtering large amounts of information from multiple sources to create useful datasets for sentiment analysis.
A further challenge involved the market prediction model. News data is not generated at consistent intervals, which makes it difficult to train time-series models like LSTMs that typically expect continuous data streams.
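One common workaround for that irregular-interval problem (not necessarily the exact approach used in LexiSnap) is to resample the event stream onto a fixed grid before feeding it to the LSTM, averaging same-period scores and forward-filling gaps. A minimal pandas sketch with hypothetical timestamps:

```python
import pandas as pd

# Hypothetical sentiment events arriving at irregular timestamps.
events = pd.DataFrame(
    {"sentiment": [0.8, 0.2, 0.6]},
    index=pd.to_datetime(["2024-01-02 09:30", "2024-01-02 14:00", "2024-01-05 10:15"]),
)

# Resample to one bar per day: average same-day scores, then forward-fill
# days with no news so the LSTM sees a continuous series.
daily = events["sentiment"].resample("D").mean().ffill()
```

The trade-off is that forward-filling treats quiet days as "sentiment unchanged," which is a modeling assumption in itself; decaying the stale signal toward neutral is an alternative.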
Accomplishments that we're proud of
One of the accomplishments I’m most proud of is successfully building a working end-to-end prototype. The system is capable of performing domain-specific sentiment analysis, processing user inputs across multiple domains, and converting the resulting sentiment scores into market insight predictions. It also presents the results through a functional and interactive application interface, making the insights easy to interpret. Additionally, developing a custom domain-aware sentiment pipeline, rather than relying entirely on pretrained models, was a significant milestone and demonstrated the potential for more accurate sentiment interpretation across different domains.
What we learned
Through this project, I learned several important technical skills, including training and experimenting with deep learning models in PyTorch, working with GPU-based training environments, designing domain-specific word embeddings and transformer architectures, and understanding the relationship between news sentiment and financial markets. Beyond these technical skills, I also gained valuable experience in building a complete machine learning pipeline, encompassing everything from data collection and preprocessing to model training, evaluation, and deployment.
What's next for LexiSnap
Looking ahead, the next steps for LexiSnap involve expanding its capabilities beyond the current MVP. Future development includes predicting actual market price movements, integrating real-time news streams and financial data, and incorporating additional domains and richer datasets to improve accuracy. Enhancing the prediction models with more advanced time-series techniques and refining the domain-specific sentiment analysis will allow the system to provide even more actionable and precise market insights, moving closer to a fully robust tool for anticipating market trends.

