I'm Kai Shiun

I build explainable machine learning systems that drive real business decisions.


Testimonials


If Kai got a job offer elsewhere, I would give an excellent recommendation for a Data Science internship position, commend his understanding of statistics and data science techniques, commend his work ethic, commitment, and work culture.

Camilo Lagos

Head of Data Science

CONXAI Technologies GmbH


Since joining our team, Kai Shiun has been a vocal and outspoken individual, constantly operating with the team's best interest at heart. A critical thinker, he is unafraid to clarify doubts and probe for further details during team discussions. Regarding his assigned work, he is capable of taking the initiative to direct his own projects and return promptly with his assigned tasks.

Veronica Low

Founder & President

ASEAN Business Youth Association


From the outset, he displayed a high level of technical proficiency, independently developing robust data pipelines and feature engineering workflows to consolidate complex, multi-year SKU-level financial data... He approached challenges methodically, showing maturity in identifying data quality issues, proposing structured solutions, and ensuring reproducibility in his work.

Peter Condron

Group Technology Strategy, Architecture, & Governance Director

iNova Pharmaceuticals


Experiences

Jul 2026 (Expected)

Graduation

National University of Singapore, Singapore
• NUS Merit Scholar, NUS College Programme
• Expected graduation with a Bachelor's Degree in Business Analytics (GPA: 4.11)
Business Analytics, Data Science, Machine Learning
Dec 2025 - Present

Group Technology Data Scientist

iNova Pharmaceuticals, Singapore
• Diagnosing structural data gaps, and mapping and aligning end-to-end data workflows across the Finance, Supply Chain and Commercial departments to establish clear data ownership
• Investigating the drivers of successful product launches to guide launch planning and support downstream predictive modelling
Python, SQL, Data Engineering, Predictive Modeling
Jun 2025 - Dec 2025

Machine Learning Intern

iNova Pharmaceuticals, Singapore
• Developed an interpretable deviation-detection model comparing predicted vs. realised launch sales to guide budget allocation and portfolio decisions
• Prototyped end-to-end ML pipelines and deployment architecture for integration into internal finance systems
Python, scikit-learn, MLflow, Azure
Jul 2024 - Dec 2024

Data Science Intern

CONXAI Technologies GmbH, Munich
• Built robust CSV processing workflows for camera-based site monitoring data (Python)
• Identified reliable prediction regions in 27D space using clustering and outlier detection (DBSCAN, LOF/KDE)
Python, DBSCAN, LOF, KDE, Clustering
Jul 2022 - Jun 2024

Chief Operations Officer

ASEAN Business Youth Association, Singapore
• Led strategic planning of community-building initiatives in Vietnam
• Set up recruitment and manpower management functions, and refreshed internal training and operational systems
Strategic Planning, Operations Management
Mar 2022 - Jun 2022

Business Analyst Intern

Motorist.SG, Singapore
• Designed dashboards tracking sales data across 3 countries for trend analysis (Looker Studio)
• Automated business processes with Python scripting
Python, Looker Studio, Data Visualization, Business Process Automation

Featured Projects

HDB Resale Price Prediction


An end-to-end data science pipeline to model resale flat prices using public and external datasets.


Problem

HDB resale prices are influenced by numerous factors including location, property attributes, market conditions, and external economic indicators. Traditional valuation methods struggle to capture the complex, non-linear relationships between these variables, making accurate price prediction challenging for both buyers and sellers.

Technical Approach

  • Built a scalable ETL pipeline with Apache Spark to ingest and preprocess large-scale structured and semi-structured data into a cloud-based data lake (AWS S3)
  • Enriched the dataset with external sources including SORA interest rates, BTO launch timelines, and geospatial data on top primary schools to capture market dynamics
  • Engineered market-relevant features including distance to amenities, school proximity scores, and temporal market indicators
  • Trained and compared multiple predictive models (Random Forest, XGBoost, CatBoost) using cross-validation to identify the best-performing ensemble approach
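
As an illustration of the distance-to-amenities features mentioned above, here is a minimal sketch assuming each flat and amenity is available as a (latitude, longitude) pair in degrees. The function names and coordinates are hypothetical and are not taken from the project code; the haversine formula is one common way to compute such distances, not necessarily the one the project used.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_to_nearest_km(flat, amenities):
    """Distance from one flat to the closest amenity in a list of (lat, lon) pairs."""
    return min(haversine_km(flat[0], flat[1], lat, lon) for lat, lon in amenities)


# Hypothetical coordinates, for illustration only.
flat = (1.3500, 103.8500)
schools = [(1.3600, 103.8500), (1.3000, 103.9000)]
nearest = distance_to_nearest_km(flat, schools)
print(f"Nearest school: {nearest:.2f} km")
```

The resulting value can be attached to each transaction row as a numeric feature before model training.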

Results

Achieved improved prediction accuracy by incorporating external data sources and advanced feature engineering, enabling more informed decision-making for property transactions.

Technologies

Python, Apache Spark, AWS S3, Snowflake, XGBoost, CatBoost

Customer Churn & Marketing Analytics


A project to improve customer retention and optimize marketing spend through predictive modeling and analytics.


Problem

The business was experiencing high customer churn rates without a clear understanding of which customers were at risk. Additionally, marketing campaigns were not optimized, leading to inefficient spend and missed opportunities for cross-selling. There was a need to proactively identify at-risk customers and optimize marketing strategies.

Technical Approach

  • Built a churn prediction ensemble model combining Random Forest and Logistic Regression to leverage both tree-based and linear approaches
  • Analyzed historical discount impact on customer spend patterns to understand price sensitivity and optimize year-end promotion strategies
  • Developed a recommendation engine using Word2Vec embeddings and cosine similarity to identify product associations and boost cross-sell conversions
  • Created comprehensive analytics dashboards to visualize churn risk segments and marketing campaign effectiveness
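
The cosine-similarity step of the recommendation engine can be sketched as follows. This is a simplified illustration, not the project's implementation: the three-dimensional vectors are hypothetical stand-ins for learned Word2Vec product embeddings, and the product names and `recommend` helper are invented for the example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(product, embeddings, top_n=2):
    """Return the top_n other products whose embeddings are most similar to `product`."""
    query = embeddings[product]
    scores = [
        (other, cosine_similarity(query, vec))
        for other, vec in embeddings.items()
        if other != product
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in scores[:top_n]]


# Hypothetical embeddings; real ones would come from a trained Word2Vec model.
embeddings = {
    "coffee_beans": [0.9, 0.1, 0.0],
    "coffee_filter": [0.8, 0.2, 0.1],
    "green_tea": [0.3, 0.9, 0.1],
    "phone_case": [0.0, 0.1, 0.9],
}
recommendations = recommend("coffee_beans", embeddings)
print(recommendations)
```

Products that tend to appear in similar purchase contexts end up with nearby embeddings, so the highest-scoring neighbours are natural cross-sell candidates.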

Results

Achieved 79% precision and 95% recall on churn prediction, enabling targeted retention campaigns. Marketing optimization led to improved ROI on promotional spend.
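
For readers less familiar with these metrics: precision is the share of customers flagged as at-risk who actually churned, and recall is the share of actual churners that the model flagged. A minimal sketch of the calculation, using hypothetical confusion-matrix counts that are not the project's data:

```python
def precision_recall(tp, fp, fn):
    """Compute precision and recall from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts, for illustration only.
precision, recall = precision_recall(tp=80, fp=20, fn=10)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

A model with high recall and somewhat lower precision misses few churners at the cost of some false alarms, which is often an acceptable trade-off when retention offers are cheap relative to losing a customer.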

Technologies

Python, Pandas, Scikit-learn, Matplotlib, Word2Vec

LLM Powered Marketing Dashboard


An AI-driven analytics tool that turns raw marketing data into actionable business insights.


Problem

Marketing teams were spending excessive time manually analyzing CSV files and creating reports. Decision-makers needed quick access to insights but lacked technical skills to query data. There was a gap between raw marketing data and actionable business intelligence, slowing down strategic decision-making.

Technical Approach

  • Developed a full-stack dashboard with CSV upload functionality and natural language chat interface using LangChain and OpenAI API for on-demand chart generation
  • Integrated Prophet time series forecasting to project revenue, ad spend, and new account creation up to 4 months ahead with confidence intervals
  • Built an automated PDF report generation system with descriptive analytics, trend analysis, and forecast visualizations
  • Created a business-focused React frontend with intuitive navigation to surface key marketing insights for non-technical decision-makers
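
The descriptive-analytics step behind the reports might look roughly like the sketch below, which aggregates an uploaded CSV into monthly totals and month-over-month changes using only the standard library. The column names (`date`, `revenue`), the ISO date format, and the sample rows are assumptions made for illustration; the actual dashboard is built on Pandas and its schema is not described here.

```python
import csv
import io
from collections import defaultdict


def monthly_totals(csv_text, date_col="date", value_col="revenue"):
    """Sum value_col per calendar month, assuming ISO dates (YYYY-MM-DD) in date_col."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        month = row[date_col][:7]  # "YYYY-MM"
        totals[month] += float(row[value_col])
    return dict(sorted(totals.items()))


def month_over_month_change(totals):
    """Relative change between consecutive months, skipping months that follow a zero total."""
    months = list(totals)
    return {
        cur: (totals[cur] - totals[prev]) / totals[prev]
        for prev, cur in zip(months, months[1:])
        if totals[prev] != 0
    }


# Hypothetical upload, for illustration only.
sample_csv = """date,revenue
2024-01-05,100
2024-01-20,50
2024-02-10,180
2024-03-01,90
"""
totals = monthly_totals(sample_csv)
changes = month_over_month_change(totals)
print(totals)
print(changes)
```

Summaries of this kind can feed both the report text and the input series for a forecasting model.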

Results

Reduced report generation time from hours to minutes. Enabled real-time data exploration through natural language queries, improving decision-making speed and accessibility.

Technologies

Python, Pandas, LangChain, Prophet, OpenAI API, React

Get in Touch

If you’re working on something interesting and think my skills could help, I’d be happy to chat :)