Data Science & AI Undergraduate | Generative AI & Data Engineering Intern | Brasília, Brazil | LinkedIn
I build production-ready AI and data engineering solutions, from scalable LLM applications and retrieval-augmented generation (RAG) pipelines to robust ETL workflows with SAP Data Services and modern data stacks. I turn complex data problems into efficient systems with measurable business impact.
LLM-powered PDF parsing • Streamlit dashboard • SQLite persistence • Batch processing
Python • LLM • Streamlit • Polars • SQLite • PDF Processing
- Engineered a modular system for large-scale legal PDF parsing and structured data extraction
- Optimized batch processing to handle documents up to 200MB with minimal memory footprint
- Implemented schema validation and auto-repair mechanisms to ensure data integrity
- Delivered a production-ready UI enabling real-time analytics and manual review workflows
RAG pipeline • Domain-specific intent detection • Streamlit UI
RAG • Sentence-Transformers • Streamlit • FAISS • SAP BODS
- Developed a custom chatbot tailored for SAP BODS users, overcoming generic LLM limitations
- Integrated semantic search and intent classification for precise, domain-aware support
- Enabled fluent, natural Portuguese interactions aligned with SAP ecosystem documentation
XGBoost • Model Explainability • Streamlit UI
XGBoost • SHAP • Streamlit • Scikit-learn • Pandas
- Built an XGBoost model that predicts delivery times from structured business features
- Applied SHAP to explain individual predictions, giving stakeholders actionable insights
- Built a user-friendly interface for real-time predictions and scenario analysis
Sentiment Analysis • Text Generation • Multi-modal Applications
Gemini AI • Polars • Sentiment Analysis • Text Generation
- Built a sentiment-aware Food Review Reply Bot generating empathetic responses to customer feedback
- Created a persuasive Dish Description Generator to boost engagement in food delivery apps
- Designed efficient data pipelines with Polars for scalable multi-modal applications
Hybrid Neural Network • Collaborative Filtering • Deep Learning
PyTorch • Neural Networks • Collaborative Filtering • Hugging Face
- Designed a hybrid recommendation engine combining deep learning with collaborative filtering
- Developed scalable workflows in PyTorch for personalized book suggestions
- Documented the entire process through a comprehensive Medium tutorial
Interactive Dashboards • Healthcare Analytics • Automated Reporting
PyGWalker • Polars • Streamlit • Plotly • Gradio
- Delivered instant exploratory data analysis with an interactive Fast EDA dashboard
- Developed a visual analytics platform for Brazilian live births (SINASC 2023 data)
- Authored an interactive tutorial for PyGWalker enabling fast, code-light EDA
Programming & Query Languages:
Python • SQL • C • Bash
AI & Machine Learning:
PyTorch • Transformers • Hugging Face • XGBoost • Scikit-learn • LLMs • RAG • Embeddings
Data Engineering & Processing:
SAP Data Services (BODS) • Pentaho • Polars • Pandas • SQL Databases • ETL/ELT
Data Visualization & Apps:
Streamlit • Gradio • Plotly • PyGWalker • Power BI
Tools & Platforms:
Git • Docker • Linux • Hugging Face Spaces • Google Colab
Certifications:
Oracle Generative AI Professional
Oracle Data Science Professional
Data Migration Intern – First Decision (Apr 2025 – Present)
- Delivered complex ETL migrations for enterprise clients, handling end-to-end data workflows
- Developed and optimized pipelines leveraging SAP BODS, Pentaho, and SQL databases
- Gained hands-on experience managing data migrations for large enterprises
B.Sc. Data Science & Artificial Intelligence – IESB (2023–2026)
Core Courses: Machine Learning, Data Mining, Big Data, Statistical Modeling, Deep Learning