Hi, I'm Vineeth Guptha
Self-driven, quick learner, passionate about solving real-world problems using Data Science.
About
I am a Data Science & Machine Learning Engineer with 5 years of experience and 3 published research papers, solving business problems by building scalable end-to-end Machine Learning pipelines.
I earned my Master's in Data Science at the University of San Francisco and my Bachelor's from the Indian Institute of Technology, Madras (IITM).
- Languages: Python, C, HTML, Bash
- Databases: PostgreSQL, MongoDB
- Libraries: NumPy, Pandas, OpenCV, spaCy, Hugging Face
- Frameworks: Flask, Keras, TensorFlow, PyTorch, LangChain
- Tools & Technologies: Git, Docker, AWS, GCP, JIRA
Experience
- Architected an agentic multi-hop QA framework (IP bound) using on-device local LLMs with no additional memory footprint, enabling users to ask complex questions and receive accurate answers by leveraging multiple data sources; improved response quality by 25%.
- Built a game-configuration recommendation system for Omen AI using CatBoost and SHAP dependency plots, increasing users' FPS by 20% over previous models.
- Tools: Python, LangChain, Airflow, AWS, On-device AI
- Engineered an interactive LLM-based dashboard in production, integrating sentiment evaluation of articles and media agencies and mapping them to balance activity for a holistic performance overview, using a RAG framework and a function-calling mechanism.
- Researched, developed, and deployed a climate-risk modeling framework in a production environment to track and estimate collateral risk on assets worth $250 billion. Streamlined an ETL pipeline and developed Power BI dashboards to support strategic decisions.
- Utilized Apache Spark to streamline processing of extensive datasets while leading development of a Power BI dashboard for monitoring bank members' credit balances, enhancing the identification of fluctuations and bolstering financial oversight. Executed complex SQL queries across more than 40 tables, ensuring smooth data integration and delivering comprehensive insights.
- Performed hypothesis testing with Chi-square tests to evaluate loan-eligibility bias against minority groups in the VantageScore credit-checking framework versus the existing system, ensuring transparent and fair credit assessments; identified potential to expand the customer base by 33 million.
- Tools: Python, Power BI, AWS, Apache Spark, Airflow, DAGs
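A Chi-square test of the kind described above can be sketched with `scipy.stats.chi2_contingency`. The contingency table below is purely illustrative (hypothetical approval/denial counts per applicant group), not the actual study data.

```python
from scipy.stats import chi2_contingency

# Hypothetical contingency table: rows = applicant group,
# columns = (approved, denied). Counts are made up for the sketch.
observed = [
    [480, 120],   # majority-group applicants
    [310, 190],   # minority-group applicants
]

chi2, p_value, dof, expected = chi2_contingency(observed)

# A small p-value suggests approval rates differ between groups,
# i.e. potential eligibility bias worth investigating.
print(f"chi2={chi2:.2f}, p={p_value:.4f}, dof={dof}")
```

With a 2x2 table the test has one degree of freedom; `chi2_contingency` applies Yates continuity correction by default.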
Location Alias:
- Reduced support demand by 75% by implementing and deploying an automated pipeline that leverages the Fellegi-Sunter probabilistic model (98% accuracy) to map business listings from various digital directories and link businesses to parent entities.
- Improved data quality, consistency, and reliability by establishing a unified business hierarchy through the automated pipeline and efficiently monitoring online presence across platforms.
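The core of Fellegi-Sunter record linkage is summing per-field log-odds weights. A minimal sketch, with assumed m/u probabilities and field names (not the production pipeline's values):

```python
import math

# Illustrative Fellegi-Sunter scorer.
# m = P(field agrees | records are a true match)
# u = P(field agrees | records are a non-match)
FIELD_PARAMS = {
    "name":  {"m": 0.95, "u": 0.05},
    "phone": {"m": 0.90, "u": 0.01},
    "city":  {"m": 0.98, "u": 0.30},
}

def match_weight(field, agrees):
    """Log-odds contribution of one field comparison."""
    m, u = FIELD_PARAMS[field]["m"], FIELD_PARAMS[field]["u"]
    if agrees:
        return math.log2(m / u)            # agreement weight (positive)
    return math.log2((1 - m) / (1 - u))    # disagreement weight (negative)

def score_pair(comparisons):
    """Sum field weights; higher scores mean more likely the same business."""
    return sum(match_weight(f, a) for f, a in comparisons.items())

# Example: two listings agree on name and phone but not city.
score = score_pair({"name": True, "phone": True, "city": False})
```

Pairs scoring above an upper threshold are auto-linked, below a lower threshold auto-rejected, with the band in between sent for clerical review.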
AI for customer review response:
- Developed two variants of a review-response system: a semi-automated and a fully automated approach. The automated system leverages few-shot learning and prompt engineering, optimizing templates for industry-specific contexts, with an output parser that structures and refines responses from the GPT-3.5 Turbo model.
- Designed the semi-automated system around a BERT model for sentiment analysis, categorizing customer feedback, and employed machine learning algorithms to extract relevant feedback and propose responses.
- Designed and ran A/B tests to evaluate both systems, which collectively accelerated customer-support responses by 65% and reduced the cost per review response by 110%.
Navigation on the product platform using Large Language Models (LLMs):
- Developed a conversational website-navigation tool using the OpenAI GPT-3.5 Turbo model, letting users navigate the platform in natural language and improving accessibility, engagement, and overall usability.
- Addressed LLM hallucination by implementing a RAG (Retrieval-Augmented Generation) framework, supplemented with a purpose-built website-navigation dataset, to achieve accurate results.
Google Feature Recommendation system:
- Developed a customized recommendation system suggesting the top features businesses should adopt to improve discoverability and rankings in Google Maps over competitors, using an XGBoost model for feature selection and rank prediction.
- Improved accuracy from 75% to 89% by introducing a data-selection strategy that statistically samples the dataset across varying rank differences.
- Tools: Python, Flask, OpenCV, Keras, TensorFlow, PyTorch, spaCy, Hugging Face, LLMs
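The few-shot prompting and output-parsing pattern described above can be sketched as follows. The examples, field names, and template are hypothetical, and the model call is stubbed rather than hitting GPT-3.5 Turbo.

```python
import json

# Illustrative few-shot examples (made up, not the production templates).
FEW_SHOT_EXAMPLES = [
    {"review": "Food was cold and service slow.",
     "response": '{"sentiment": "negative", "reply": "We are sorry ..."}'},
    {"review": "Loved the quick checkout!",
     "response": '{"sentiment": "positive", "reply": "Thank you ..."}'},
]

def build_prompt(review, industry="restaurant"):
    """Assemble an industry-specific few-shot prompt for the LLM."""
    lines = [f"You write {industry} review replies. Answer in JSON "
             'with keys "sentiment" and "reply".']
    for ex in FEW_SHOT_EXAMPLES:
        lines.append(f'Review: {ex["review"]}\nAnswer: {ex["response"]}')
    lines.append(f"Review: {review}\nAnswer:")
    return "\n\n".join(lines)

def parse_response(raw):
    """Output parser: validate the model reply into a structured dict."""
    parsed = json.loads(raw)
    if parsed.get("sentiment") not in {"positive", "negative", "neutral"}:
        raise ValueError("unexpected sentiment label")
    return parsed

prompt = build_prompt("Great staff, but the app kept crashing.")
# In production the prompt would go to the model; here we parse a stubbed reply.
result = parse_response('{"sentiment": "neutral", "reply": "Thanks for the feedback!"}')
```

The parser both enforces a machine-readable schema and rejects off-template generations before they reach customers.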
HELIOS (Hate speech detection on Social Media):
- Developed a real-time hate-tweet identification system with geographic insights, fine-tuning NLP models such as GPT and BERT to detect hateful tweets.
- Employed active-learning methods to expand the hate-speech dataset from 50k to 5M tweets, integrating human annotators and ML algorithms. Generated $250,000 in savings by optimizing human-annotation resources and reducing reliance on Amazon Mechanical Turk.
FACTDEMIC (Fake Claim Detection and Meta-Fact Checking Through Textual Entailment-based Validation):
- Spearheaded the design and development of a 4-stage web application leveraging machine learning models and BERT textual entailment to automate the identification and validation of fake news across social media platforms.
- Incorporated open-source data-mining methodologies to furnish corroborative evidence and adhered to conceptual rules to ensure accurate detection. The system assists fact-checkers by surfacing candidate tweets and increased annotation speed by 400%.
Antharyami (Code-mixed Language Model):
- Researched the poor performance of code-mixed language models (from statistical to neural models) and proposed and implemented a data-ingestion strategy to improve the language model's overall efficiency.
- Implemented a beam-search algorithm to significantly enhance word prediction in statistical language models, contributing to more efficient and accurate language processing.
Tools: Python, Flask, OpenCV, Keras, TensorFlow, PyTorch
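Beam-search decoding over a statistical language model, as mentioned above, can be sketched with a toy bigram table. The vocabulary and probabilities are invented for illustration.

```python
import math

# Toy bigram model: P(next_word | word). Values are illustrative only.
BIGRAM = {
    "<s>":  {"the": 0.6, "a": 0.4},
    "the":  {"cat": 0.5, "dog": 0.5},
    "a":    {"cat": 0.7, "dog": 0.3},
    "cat":  {"sat": 0.9, "ran": 0.1},
    "dog":  {"sat": 0.2, "ran": 0.8},
}

def beam_search(start="<s>", steps=3, beam_width=2):
    """Keep the beam_width highest log-probability prefixes at each step."""
    beams = [([start], 0.0)]  # (token sequence, cumulative log-prob)
    for _ in range(steps):
        candidates = []
        for seq, logp in beams:
            for word, p in BIGRAM.get(seq[-1], {}).items():
                candidates.append((seq + [word], logp + math.log(p)))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]  # prune to the best beam_width prefixes
    return beams

best_seq, best_logp = beam_search()[0]
```

Unlike greedy decoding, the beam keeps several competing prefixes alive, so a locally weaker word can still win if its continuations are strong.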
Achievements and Research Publications
Minority Positive Sampling for Switching Points - 11 citations
Code-Mixing (CM), or language mixing, is a social norm in multilingual societies and is quite prevalent in social media conversations in multilingual regions around the world. In this paper, we explore the problem of Language Modeling (LM) for code-mixed Hinglish text. To better understand the problem, we initially experimented with several statistical language-modeling techniques and then with contemporary neural language models, including self-attention models such as GPT and BERT. Our analysis suggests switching points are the main cause of the performance drop in LM for CM; we therefore introduce the idea of minority positive sampling, which selectively induces more switching-point samples to achieve better performance.
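The switching-point idea above can be sketched in a few lines: detect where the language tag flips within a sentence, then oversample those sentences for training. The sentences, tags, and oversampling factor below are toy values, not the paper's dataset or exact procedure.

```python
# Toy sketch of switching-point detection and positive sampling
# for code-mixed (Hinglish) text.
def switching_points(tags):
    """Indices where the language tag flips (e.g. Hindi -> English)."""
    return [i for i in range(1, len(tags)) if tags[i] != tags[i - 1]]

def oversample_minority(corpus, factor=3):
    """Duplicate sentences containing switching points to rebalance training."""
    augmented = []
    for tokens, tags in corpus:
        augmented.append((tokens, tags))
        if switching_points(tags):
            augmented.extend([(tokens, tags)] * (factor - 1))
    return augmented

corpus = [
    (["yeh", "movie", "was", "awesome"], ["hi", "en", "en", "en"]),  # one switch
    (["kal", "milte", "hain"], ["hi", "hi", "hi"]),                  # no switch
]
augmented = oversample_minority(corpus)
```

Because switching points are rare relative to monolingual runs, boosting their frequency gives the language model more evidence exactly where perplexity spikes.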
Overview of CONSTRAINT 2021 shared tasks: Detecting English COVID-19 fake news and Hindi hostile posts - 73 citations
The shared tasks are 'COVID19 Fake News Detection in English' and 'Hostile Post Detection in Hindi'. The tasks attracted 166 and 44 team submissions respectively. The most successful models were BERT or its variations.
Fighting an Infodemic: COVID-19 Fake News Dataset - 441 citations
Along with the COVID-19 pandemic, we are also fighting an `infodemic'. Fake news and rumors are rampant on social media, and believing them can cause significant harm, which is further exacerbated during a pandemic. To tackle this, we curate and release a manually annotated dataset of 10,700 social media posts and articles of real and fake news on COVID-19. We benchmark the dataset with four machine learning baselines - Decision Tree, Logistic Regression, Gradient Boost, and Support Vector Machine (SVM) - and obtain the best performance of 93.46% F1-score with SVM.
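An SVM baseline of the style benchmarked in the paper can be sketched with scikit-learn: TF-IDF features feeding a linear SVM. The four training posts and labels below are invented for illustration, not drawn from the released dataset.

```python
# Minimal TF-IDF + linear SVM fake-news baseline sketch.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

posts = [
    "New study confirms vaccine efficacy, officials say",      # real
    "Garlic water cures COVID-19 overnight, doctors hide it",  # fake
    "Health ministry reports daily case counts",               # real
    "5G towers spread the coronavirus, insider reveals",       # fake
]
labels = ["real", "fake", "real", "fake"]

# Unigram + bigram TF-IDF features into a linear-kernel SVM.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(posts, labels)

pred = model.predict(["Officials say case counts are falling"])[0]
```

On the real 10,700-post dataset this family of models reached the 93.46% F1 reported above; the toy corpus here only demonstrates the pipeline shape.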
Developed and implemented a large language model (LLM)-based solution enabling businesses to navigate the product website using natural-language queries, improving option discoverability. Recognized as the global challenge winner among 80+ participants for introducing an innovative LLM framework.
Education
University of San Francisco, San Francisco, California, USA
Degree: Master of Science in Data Science
Relevant Coursework:
- Distributed Database Systems
- Large Language Models and MLOps
- Data Structures and Algorithms
- Advanced Machine Learning
- Experiments in Data Science
Indian Institute of Technology, Madras (IITM), Chennai, India
Degree: Bachelor of Technology
Relevant and Online Coursework:
- Introduction to Programming
- Machine Learning
- Deep Learning
- Data Science for Engineers