Resume
Education
- M.S. in Biostatistics, Yale University, 2019-2021 (expected)
- B.S. in Statistics, Sun Yat-sen University, 2014-2018
Work experience
- Acumen, LLC, Data and Policy Intern - Statistical Programming Jun. 2020 - Aug. 2020
- COVID-19 Risk Surveillance
- Extracted and validated 4 million+ Medicare claims records from multiple sources using SAS. Conducted geographical spatial analysis to measure COVID-19 risk across tracts, improving efficiency by 20%.
- Introduced coordinate mapping methods to optimize the zip+4 code to tract mapping algorithm in R.
- Scripted and automated the workflow for updating outdated tract codes, achieving a matching rate of 92%.
- Impact Analysis of Medicare Part D Claims Delay
- Investigated the delay distribution and potential factors affecting the claims delay in Medicare Part D data, helping internal stakeholders detect problems and providing actionable insights.
- Developed a simulation framework in SAS using time series clustering to adjust for claims delay in Medicare Part D data, achieving an error rate of 0.6%.
- Yale University, Graduate Research Assistant Feb. 2020 - Present
- Text Summarization: Fine-tuned pre-trained deep neural networks (BERT/RoBERTa/BART) using the WikiSum dataset and PyTorch to perform complex generation tasks, such as generating structured summaries of scientific topics.
- Clinical Natural Language Processing:
- Designed and maintained an ETL process on the AACT database for clinical trial eligibility analysis with PostgreSQL.
- Redesigned the Facebook clinical trial NLP parser and criteria2query for use on the MIMIC-III clinical database.
- Built an end-to-end named entity recognition pipeline in Python for electronic health records to support pre-screening and recruitment of clinical trial subjects.
- Echocardiographic Clinical Data Analysis
- Conducted large-scale clinical data visualization and statistical analysis to assess reader reporting of low-gradient aortic stenosis based on echocardiographic parameters and sex; paper pending submission to the 2021 ASE scientific session.
- Misinformation about vaping study (with Human Nature Lab)
- Hangzhou Dtdream Technology Co., Ltd, Data Scientist Sep. 2018 - Apr. 2019
- Improved the allocation of water and electricity in the largest migration area in Beijing by better predicting future population growth trends with multiple machine learning algorithms (CART, Random Forest, XGBoost, LightGBM).
- Integrated daily population growth data and resource usage data from different domains and automated feature selection (LASSO) and dimension reduction (PCA/t-SNE), improving feature engineering efficiency by 10%.
- Developed an end-to-end population growth predictive modeling system. Built an interactive dashboard in Shiny to deliver findings to policymakers.
- Haolan Information Technology Co., Ltd, Artificial Intelligence Research Intern Dec. 2017 - Mar. 2018
- Algorithm Engineering: Collaborated in a six-person team of algorithm engineers to develop an Android application for instant classification of photos of traditional Chinese herbal medicine (including herbs, pills, and powders).
- Model Implementation: Applied deep learning models (Inception V3, VGG19) with TensorFlow to classify over 120 different kinds of herbal medicine with 90% accuracy.
- Algorithm Optimization: Conducted hyper-parameter tuning, optimization, and regularization, and implemented affine transformations, the Gaussian pyramid algorithm, and other techniques to augment the limited dataset and improve its quality.
Skills
- Programming Tools: Python, R, SAS, MATLAB, C++, C
- Big Data and Databases: SQL (MySQL, PostgreSQL), MongoDB, Spark, AWS
- Data Science Tools:
- R: Shiny, tidyverse, ggplot2, dplyr, stringr
- Python: NumPy, scikit-learn, pandas, Plotly
- Deep Learning Frameworks: TensorFlow (Python & R), PyTorch
- Data Visualization: R (ggplot2, Shiny), Python (Plotly, Matplotlib, Seaborn), Tableau
- Web Development
- Flask, JavaScript, D3.js
- Git, Linux
Courses
- Statistics
- Linear Models
- Statistical Methods in Causal Inference
- Bayesian Statistics
- Survival Analysis
- Time Series Analysis with R/Python
- Theory of Statistics
- Nonparametric Statistics
- Mathematical Statistics
- Applied Regression Analysis
- Probability Theory
- Computer Science and Informatics
- Natural Language Processing
- Applied Data Mining and Machine Learning
- Computational Methods for Informatics
- Advanced Statistics Programming with SAS and R
- Data Structures and Algorithms
- Other Core
- Statistical Practice I
- Frontiers of Public Health