Break Through Tech AI Studio | Host Company: Salesforce
Smart CRM Helper is an intelligent conversational AI assistant built for Salesforce sales teams to reduce manual CRM work, surface high-value opportunities, and enable data-driven decision making through predictive analytics and natural language interaction.
| Name | GitHub Handle | Contribution |
|---|---|---|
| Afifah Hadi | @hadiafifah | Data cleaning and preprocessing, feature engineering, lead scoring model, Gradio interface |
| Sai Wong | @cywlol | Sentence transformers, ChromaDB integration, Ollama model integration |
| Caitlyn Widjaja | @caitlynw5 | Opportunity win prediction modeling |
| Mariam Jammal | @mjamm-inc | Lead scoring model development |
| Anusri Nagarajan | @anusrinagarajan | Account health scoring model |
| Mya Barragan | @myabarragan | Opportunity win prediction modeling |
- Built three supervised machine learning models for lead scoring, opportunity win prediction, and account health scoring using Random Forest, Gradient Boosting, and ElasticNet.
- Achieved strong performance across tasks, including approximately 94 percent accuracy for lead scoring, approximately 96 percent accuracy and a 97.3 percent F1 score for opportunity win prediction, and an R² of 0.9466 for account health scoring.
- Designed a conversational AI workflow using sentence transformers, ChromaDB, Ollama, and Gradio to enable natural language CRM queries.
- Delivered measurable business value by reducing manual data search time and improving sales prioritization and retention outcomes.
Since this is a private repository, you will clone it directly to your computer.
- Open your terminal or command prompt.
- Navigate to the folder where you want to store your project.
- Clone the repository using the following command. Make sure to use the correct URL for your project.
  `git clone https://github.com/ishween/Team2A.git`
- Change into the project directory you just cloned.
  `cd Team2A`
Python dependencies:
`pip install pandas numpy matplotlib seaborn scikit-learn xgboost sentence-transformers chromadb jupyter notebook ipykernel gradio`
Ollama installation:
- Download Ollama from: https://ollama.com/download
- Pull the model to be used:
  `ollama pull llama3`
Datasets are included in the repo:
`data/processed`
Start Jupyter Notebook:
`jupyter notebook`
Execute the notebooks in order: account health, lead scoring, opportunity win, sentence transformer, and then `agents.ipynb` inside the agents folder.
This project was completed as part of the Break Through Tech AI Studio, a workforce development program that partners students with industry companies to solve real business problems using machine learning and AI.
Host Company: Salesforce
Industry: Customer Relationship Management (CRM)
Sales teams spend a significant portion of their time manually searching CRM systems, prioritizing leads, and identifying at-risk accounts. This leads to lost opportunities, inefficient workflows, and reactive decision making.
Smart CRM Helper transforms Salesforce from a passive data repository into an active decision-support system by:
- Automatically scoring and ranking leads
- Predicting opportunity win probabilities
- Identifying at-risk and high-value accounts
- Allowing users to ask natural language questions such as:
Which healthcare accounts are likely to close this quarter?
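To give a feel for how a natural language question can be matched against CRM records, here is a minimal, dependency-free sketch of similarity-based retrieval. It is an illustration only: the project itself uses sentence transformers and ChromaDB, whereas this toy version swaps in a bag-of-words `embed` function and a plain Python list. The function names and the account summaries are hypothetical and do not come from the notebooks.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence-transformer: lowercase bag-of-words counts."""
    return Counter(text.lower().replace("?", "").split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(question, documents, k=2):
    """Return the k documents most similar to the question."""
    query_vector = embed(question)
    ranked = sorted(
        documents,
        key=lambda doc: cosine(query_vector, embed(doc)),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical account summaries, invented for illustration only.
documents = [
    "healthcare account with high win probability closing this quarter",
    "retail account with low engagement and declining revenue",
    "software account with stalled opportunity",
]

results = top_k(
    "Which healthcare accounts are likely to close this quarter?",
    documents,
    k=1,
)
print(results)
```

In a real pipeline the retrieved records would typically be passed to the language model as context for the answer; learned embeddings also match paraphrases that word overlap alone would miss.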
https://www.kaggle.com/datasets/innocentmfa/crm-sales-opportunities
- Source: Salesforce CRM data
- Size: Approximately 8,800 sales opportunities
- Structure: Four merged tables including accounts, products, sales pipeline, and sales teams
Key preprocessing steps included:
- Handling missing values and duplicate records
- Feature engineering for engagement, recency, and performance metrics
- Standardization and encoding of categorical variables
- Correlation analysis to prevent multicollinearity and data leakage
- Revenue, recency, and engagement metrics showed the strongest correlation with account outcomes
- Severe class imbalance required careful metric selection and validation strategies
- Time-based splits were necessary to ensure temporal generalization
- Distribution of closed vs. non-closed deals highlights significant class imbalance, motivating the use of F1 score and recall-focused evaluation.
- Revenue scales non-linearly with company size, motivating log transformations for revenue and employee count.
- Strong correlations between revenue and employee-related features informed feature selection and regularization strategies.
- Pre-model feature correlations guided feature selection for the opportunity win prediction model.
Annotated visualizations, including feature importance plots and confusion matrices, are included within the notebooks.
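The preprocessing steps and the log transformation described above can be sketched in a few lines. This is a simplified, standard-library illustration under stated assumptions, not the code from the notebooks (which presumably rely on pandas and scikit-learn). The column names `revenue` and `sector`, the choice of median imputation, and the sample rows are all hypothetical.

```python
import math
from statistics import mean, median, pstdev


def preprocess(rows, numeric_key="revenue", category_key="sector"):
    """Deduplicate, impute, log-transform, standardize, and one-hot encode."""
    # 1. Drop exact duplicate records, keeping the first occurrence.
    seen, unique = set(), []
    for row in rows:
        fingerprint = tuple(sorted(row.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(dict(row))

    # 2. Fill missing numeric values with the median of the observed ones
    #    (median imputation is an assumption made for this sketch).
    observed = [r[numeric_key] for r in unique if r[numeric_key] is not None]
    fill_value = median(observed)
    for r in unique:
        if r[numeric_key] is None:
            r[numeric_key] = fill_value

    # 3. Compress the heavy right tail with log(1 + x).
    logged = [math.log1p(r[numeric_key]) for r in unique]

    # 4. Standardize to zero mean and unit variance.
    mu, sigma = mean(logged), pstdev(logged)
    scaled = [(v - mu) / sigma if sigma else 0.0 for v in logged]

    # 5. One-hot encode the categorical column.
    categories = sorted({r[category_key] for r in unique})
    features = []
    for r, z in zip(unique, scaled):
        encoded = {f"log_{numeric_key}_z": z}
        for c in categories:
            encoded[f"{category_key}={c}"] = int(r[category_key] == c)
        features.append(encoded)
    return features


# Hypothetical rows, invented for illustration only.
rows = [
    {"sector": "medical", "revenue": 100.0},
    {"sector": "retail", "revenue": None},      # missing value
    {"sector": "medical", "revenue": 100.0},    # exact duplicate
    {"sector": "software", "revenue": 300.0},
]
features = preprocess(rows)
```

To avoid the data leakage mentioned above, the imputation value and the scaling statistics would normally be computed on the training split only and then reused on the test split.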
- Lead Scoring: Random Forest was selected for its ability to capture non-linear interactions while maintaining strong precision.
- Opportunity Win Prediction: Gradient Boosting provided the best balance of accuracy, recall, and robustness.
- Account Health Scoring: ElasticNet handled correlated features and enabled interpretable weighting of business drivers.
- Stratified cross-validation for classification tasks
- 80/20 train-test splits
- Time-based validation for regression modeling
- Metrics included Accuracy, Precision, Recall, F1 Score, ROC AUC, and R²
These results exceeded baseline expectations and demonstrated readiness for real-world deployment.
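As a rough illustration of the validation strategy above, the sketch below shows a chronological 80/20 split and the precision, recall, and F1 calculations that matter under class imbalance. It is a standard-library approximation written for this README, not the notebook code. The field names `close_date` and `won` and the sample values are hypothetical.

```python
from datetime import date


def time_based_split(rows, date_key, train_fraction=0.8):
    """Train on the earliest records and test on the most recent ones."""
    ordered = sorted(rows, key=lambda r: r[date_key])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical opportunities, deliberately listed out of date order.
rows = [
    {"close_date": date(2017, month, 1), "won": month % 2}
    for month in range(10, 0, -1)
]
train, test = time_based_split(rows, "close_date")

# Hypothetical labels and predictions for a small evaluation set.
precision, recall, f1 = precision_recall_f1(
    y_true=[1, 1, 1, 0, 1],
    y_pred=[1, 1, 0, 0, 1],
)
```

Splitting by date rather than at random keeps every test record later than every training record, which is closer to how a deployed model would be used on future deals.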
- lead_scoring.ipynb: Feature engineering and Random Forest training pipeline
- opportunity_win.ipynb: Gradient Boosting model with leakage prevention
- account_health.ipynb: Synthetic target construction and ElasticNet regression
- sentence_transformer.ipynb: Embedding generation and ChromaDB vector storage
- agents.ipynb: Agentic routing and Gradio conversational interface
What worked well:
- Clear separation of predictive tasks
- Strong alignment between business objectives and model outputs
- Effective integration of NLP with traditional machine learning
Challenges:
- Lack of ground truth labels for account health required synthetic target design
- Class imbalance demanded careful evaluation strategy
- Deployment constraints limited real-time integration
- Add richer explanatory visualizations and outputs
- Develop an interactive dashboard or web application
This project was completed for educational purposes as part of Break Through Tech AI Studio. No open-source license is currently applied.
We thank Ishween Kaur, Challenge Advisor, and Leah Dsouza, AI Studio Coach, for their guidance and support throughout this project.

