Projects
JAVIER LOPEZ CASTILLO | Data Engineer
Welcome! I’m a data engineering leader who thrives on building robust systems to solve real-world business problems. My experience spans designing scalable infrastructure, defining data standards, and transforming fragmented datasets into trusted sources of truth. From building performance pipelines to supporting cross-functional decision-making, I’m passionate about making data usable and impactful.
Outside of work, I enjoy strategy games, puzzles, and warm pool days with my family. Feel free to explore my projects below—each one reflects my commitment to clear logic, strong architecture, and business-driven outcomes.
Citi Bike Ridership Forecasting with Linear Regression

I analyzed five years of NYC Citi Bike data to identify key drivers of bike usage, factoring in weather patterns, seasonal shifts, and pandemic disruptions. Using Python and scikit-learn, I developed a multiple linear regression model to forecast demand, applying statistical rigor to validate each predictor’s influence.
To streamline the pipeline, I engineered features for event-adjusted days, normalized temporal variables, and layered in external datasets. The final model achieved strong predictive accuracy and offered actionable insights for city planning and shared mobility operations.
This project reflects my ability to structure messy public data, apply machine learning responsibly, and surface results through clear data storytelling.
Time Series Forecasting: A Case Study on Telecom Revenue

This project applies time series analysis with an ARIMA model to forecast telecom revenue. After cleaning and preparing the dataset, we identified the significant patterns and trends in the series, then fit an ARIMA model to forecast future revenue in support of business decision-making. We also validated the model’s performance and visualized its forecasts to check their reliability. The results show the insights time series forecasting can provide for strategic business planning.
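As a rough illustration of the ideas behind ARIMA (not the project’s actual code), the sketch below differences a series once, which is the "I" step, and fits a single autoregressive coefficient by least squares, which is the "AR" step. That corresponds to a simplified ARIMA(1,1,0) with no constant term. The revenue figures and function names are hypothetical, the project does not state its model order, and a real analysis would normally rely on a dedicated library rather than hand-rolled code.

```python
def difference(series):
    """First-order differencing: y[t] - y[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(diffs):
    """Least-squares estimate of phi in d[t] = phi * d[t-1] + noise (no intercept)."""
    num = sum(prev * cur for prev, cur in zip(diffs, diffs[1:]))
    den = sum(prev * prev for prev in diffs[:-1])
    return num / den


def forecast(series, steps):
    """Predict future differences with the AR(1) fit, then cumulate them back into levels."""
    diffs = difference(series)
    phi = fit_ar1(diffs)
    last_level, last_diff = series[-1], diffs[-1]
    out = []
    for _ in range(steps):
        last_diff = phi * last_diff
        last_level += last_diff
        out.append(last_level)
    return out


# Hypothetical revenue figures, made up for illustration only.
revenue = [10.0, 12.0, 13.0, 13.5, 13.75]
print(forecast(revenue, 3))
```

Differencing removes the trend so that the autoregressive step only has to model how one period’s change relates to the next.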
Telecom Churn Analysis: Using Principal Component Analysis for Dimensionality Reduction

This project uses Principal Component Analysis (PCA) to reduce the dimensionality of customer data in the telecom industry. Identifying the key factors that contribute to customer churn helps improve targeted marketing campaigns and strategies. The project involves preprocessing the data, applying PCA, and analyzing the principal components to gain insights into customer behavior and preferences. The findings can help telecom companies optimize their marketing efforts and reduce customer churn.
Leveraging Data Analysis for Better Healthcare Management: Installation and Analysis of a Compliance Dashboard

This project focuses on implementing a compliance dashboard using pgAdmin 4 and Tableau Public to analyze healthcare data. By preprocessing the data and installing the dashboard, stakeholders gain access to critical metrics and patient demographics. The dashboard allows for in-depth analysis, aiding decision-making and improving patient outcomes.
Predicting Customer Bandwidth Usage Using Machine Learning

Utilizing random forest regression, this project predicts future bandwidth usage in the telecommunications industry. By analyzing customer data, we identify key features such as tenure, internet service type, and monthly charges that impact bandwidth usage. The model achieves high accuracy and offers insights for attracting younger customers, retaining long-term customers, and optimizing network resources. Future work includes hyperparameter tuning and exploring other machine learning models.
Tackling Customer Churn: A Deep Dive into a Telecom Case Study Using Logistic Regression and Python

This project uses logistic regression and Python to understand and predict customer churn in the telecom industry. Through data cleaning, feature selection, and model development, key factors impacting churn are identified. The reduced model highlights the relationship between customer tenure, satisfaction, technical support, and churn probability. Practical implications include targeted marketing and retention strategies, and the analysis shows that outliers need careful handling. The project provides valuable insights for improving customer retention in the telecom sector.
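A minimal sketch of how logistic regression turns customer attributes into a churn probability, assuming batch gradient descent on the log-loss and using only the standard library. The six customers, the two features (tenure in years and number of support tickets), and all function names are hypothetical and are not taken from the project’s dataset or code.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights (bias first) by batch gradient descent on the log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def predict_proba(w, x):
    """Estimated probability that a customer with features x churns."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


# Hypothetical rows: (tenure in years, support tickets); label 1 = churned.
X = [(0.5, 3), (1.0, 4), (1.5, 2), (4.0, 1), (5.0, 0), (6.0, 1)]
y = [1, 1, 1, 0, 0, 0]
w = fit_logistic(X, y)
print(predict_proba(w, (0.5, 4)), predict_proba(w, (6.0, 0)))
```

On this toy data the tenure coefficient comes out negative, so longer tenure lowers the estimated churn probability.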
Predicting Customer Churn: Data Analysis, Cleaning and Principal Component Analysis

This project analyzes customer churn in the telecommunications industry using data cleaning, PCA, and predictive modeling. It highlights the importance of addressing data quality issues and reducing dimensionality through PCA. The findings guide businesses in reducing churn and fostering customer loyalty for sustainable growth.
Hotel Review Sentiment Analysis with LSTM

This project explores the use of Long Short-Term Memory (LSTM) networks, a type of recurrent neural network, to predict sentiment in hotel reviews. The workflow begins with extensive data preprocessing, including text cleaning, tokenization, and converting text into integer sequences. I then designed and trained an LSTM model using the Keras framework, incorporating embedding, dropout, and dense layers.
To mitigate overfitting, I implemented techniques such as early stopping and L1/L2 regularization. The final model achieved approximately 87% accuracy on unseen test data. This project highlights how deep learning and natural language processing techniques can be applied to complex tasks like sentiment classification.
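The preprocessing steps described above (text cleaning, tokenization, and conversion to integer sequences) can be sketched with the standard library alone. This is an illustrative stand-in, not the project’s code: the function names, the two sample reviews, and the choice to reserve 0 for padding and 1 for unknown words are all assumptions.

```python
import re
from collections import Counter


def clean(text):
    """Lowercase the text and drop everything except letters and whitespace."""
    return re.sub(r"[^a-z\s]", "", text.lower())


def build_vocab(texts, max_words=10000):
    """Map the most frequent words to integers; 0 is reserved for padding, 1 for unknown words."""
    counts = Counter(word for t in texts for word in clean(t).split())
    return {word: i + 2 for i, (word, _) in enumerate(counts.most_common(max_words))}


def encode(text, vocab, max_len):
    """Convert a review to a fixed-length integer sequence, left-padded with 0."""
    ids = [vocab.get(word, 1) for word in clean(text).split()][-max_len:]
    return [0] * (max_len - len(ids)) + ids


# Hypothetical reviews, made up for illustration only.
reviews = ["Great room, great staff!", "Terrible room."]
vocab = build_vocab(reviews)
print(encode("Great staff, noisy room", vocab, max_len=6))
```

Fixed-length sequences like these are what an embedding layer expects as input; words missing from the vocabulary ("noisy" here) fall back to the unknown-word index.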
Telecom Association Rules and Lift Analysis: A Deep Dive Into Transaction Data

This project analyzes transaction data with the Apriori algorithm to uncover frequent itemsets and item associations. The goal is to enhance marketing strategies and increase revenue by understanding customer behavior and identifying items that are frequently purchased together. The findings support product placement optimization, inventory management, and personalized recommendations. Incorporating additional AI and ML techniques could further enhance the market basket analysis.
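To make support, confidence, and lift concrete, here is a small sketch that counts item pairs directly. It is not a full Apriori implementation (Apriori grows and prunes candidate itemsets level by level), and the baskets, the function name, and the 0.3 support threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.3):
    """Return (antecedent, consequent, support, confidence, lift) for frequent item pairs."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in set(t))
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(set(t)), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # skip pairs that appear too rarely
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]        # estimate of P(y | x)
            lift = confidence / (item_counts[y] / n)   # P(y | x) / P(y)
            rules.append((x, y, support, confidence, lift))
    return sorted(rules, key=lambda r: r[4], reverse=True)


# Hypothetical baskets, made up for illustration only.
baskets = [
    ["phone", "case", "charger"],
    ["phone", "case"],
    ["charger", "cable"],
    ["phone", "case", "cable"],
    ["cable"],
]
for rule in pair_rules(baskets):
    print(rule)
```

A lift above 1 means the two items appear together more often than they would if purchases were independent, which is the signal used for placement and recommendation decisions.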
Telecom Churn Analysis: Using Clustering Techniques for Customer Segmentation

This project applies k-means clustering to identify distinct customer segments, optimizing marketing efforts and reducing churn. Findings inform resource allocation, retention strategies, and personalized campaigns. Future work includes exploring advanced clustering techniques and incorporating more variables for comprehensive analysis.
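A minimal sketch of the k-means procedure (Lloyd’s algorithm), assuming two features per customer and fixed starting centroids so that the result is reproducible. The customer values and feature choices are hypothetical, and a real analysis would typically standardize the features first and use a library implementation.

```python
def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and recomputing centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Squared Euclidean distance from this point to every centroid.
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # New centroid = per-dimension mean of its cluster; empty clusters keep the old centroid.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical customers: (monthly charge, tenure in months).
customers = [(20, 2), (25, 4), (30, 3), (80, 40), (85, 45), (90, 50)]
centroids, clusters = kmeans(customers, centroids=[customers[0], customers[3]])
print(centroids)
```

Each resulting centroid summarizes one segment, for example low-charge newer customers versus high-charge long-tenure customers in this toy data.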
Utilizing Tableau for In-Depth Hospital Readmission Data Analysis

This project utilizes Tableau to analyze and visualize hospital readmission data, focusing on Medica General Hospital. By examining factors such as readmission rates, performance metrics, demographics, and county-level analysis, the project aims to gain insights and improve patient care. The culmination is an interactive dashboard that presents clear and actionable findings, empowering data-driven decision-making in healthcare.
Leveraging Machine Learning to Predict Customer Churn for a Telecom Company

This project predicts customer churn in the telecom industry using machine learning models. Trained on a dataset of 10,000 customers, K-Nearest Neighbors and Gradient Boosting models identify high-risk customers. Key findings highlight the most important features and inform retention strategies. Recommendations include a pricing strategy review and targeted incentives. This project showcases machine learning’s effectiveness in predicting customer behavior and improving retention.
Improving Telecom Services: An In-depth Analysis with Multiple Linear Regression

This project applies multiple linear regression to examine factors affecting customer churn and bandwidth usage in telecommunications. With data preparation, feature engineering, and model building, significant variables are identified, yielding an accurate reduced regression model. Implications involve refining marketing strategies, segmenting customers, and prioritizing retention efforts. Recommendations center on tailored services for specific customer characteristics. The project showcases how multiple linear regression enhances telecom services and profitability.
COVID-19’s Impact on Rideshare in NYC: A Time Series Analysis of Lyft and Uber

In this project, I examine the pandemic’s impact on the rideshare industry in NYC, focusing on Uber and Lyft. Using Python and Prophet, a forecasting tool developed by Facebook, I forecast future trends and compare them with historical data. The analysis reveals a significant rebound in rideshare demand post-vaccination, offering valuable insights for investors and stakeholders.