Experience
Lead Applied AI/ML Engineer
CATCH BIO, REMOTE
JAN 2025 - PRESENT
Nearly 1 in 2 Americans will get cancer in their lifetime, and 1 in 6 will die from it. We're on a mission to change that. Catch maps your risk factors for every major cancer and builds an optimal screening protocol and action plan personalized to you.
Principal Clinical Data Scientist/Engineer
FORMATION BIO (Formerly TrialSpark), CLINICAL DATA ANALYTICS AND PROGRAMMING, NEW YORK, NY
MAY 2023 - NOV 2024
Formation Bio is a technology company that helps bring treatments to patients faster. Today, clinical trials are the bottleneck to bringing life-saving treatments to patients. Trials are slow, inefficient, and expensive. Formation Bio is using technology to accelerate the pace of clinical trials and bridge the gap between medical research and patients who need treatment.
Deployed various supervised and unsupervised models, which utilize internal clinical trial and real-world data to predict patient response to investigational products (IP), detect anomalies in clinical trial data, and inform other clinical trial processes, such as protocol design, drug formulation, and adverse event monitoring.
Built custom and interactive data analysis and visualization tools, such as a Patient Profiles Application (provisional patent pending) to integrate data from multiple disparate sources, automate data cleaning, and streamline the medical and safety monitoring workflow during clinical trials.
Worked with the Data Platform team to validate pipelines to ingest clinical trial and real-world data into Redshift. Trained data engineers to use DBT to create globally useful, properly redacted, and documented tables for R&D stakeholders to utilize.
Led cross-functional projects with various R&D and engineering teams, such as Biometrics and Site Reliability, helping to bridge the gap between non-technical and technical stakeholders and ensure robust outcomes.
Developed and ran a summer internship program to teach interns and other internal employees about clinical trial processes, data analysis, and how to code in R and Python.
Ensured GxP-relevant systems were validated in accordance with internal SOPs, set coding standards, provided code review, and managed CI/CD pipelines.
Created and deployed a Slack chat assistant, which utilizes OpenAI’s API to enable teams like Finance and TechOps to ask questions, import documents, and analyze data.
Senior Data Scientist/Data Engineer
LEVELS HEALTH, REMOTE
AUGUST 2021 - FEBRUARY 2023
Levels makes it easy for people to see how their diet affects both their health and their lifestyle in a quantifiable way by measuring biomarkers in real-time. Levels is expanding access to continuous glucose monitoring and making it mainstream, focused on people looking to find their optimal diet and improve their metabolic fitness.
Developed algorithms to personalize the in-app user experience using machine learning models such as SVM and Random Forest, and applied Natural Language Processing (NLP) to build text classification infrastructure.
Built and managed data pipelines using AWS, Python, SQL, and DBT that integrate and reconstruct data from various wearables, such as continuous glucose monitors, smart watches, smart scales, etc., into Snowflake.
Worked cross-functionally with various department teams to roadmap data/infrastructure needs, define best analytical practices, prototype features, inform metabolic disease and diabetes research strategy, contribute to company publications and media, and translate business problems into actionable data science initiatives.
Data Scientist (Computational Genomics)
ICAHN SCHOOL OF MEDICINE AT MOUNT SINAI, MULTISCALE NETWORK MODELING LAB, NEW YORK, NY
AUGUST 2019 - AUGUST 2021
Used machine learning techniques, such as Random Forest and Support Vector Machine, to assess various polygenic risks generated from Genome-Wide Association Studies (GWAS) and their association with the development of late-stage Alzheimer's disease.
Used maximal information-based nonparametric exploration (MINE) statistics to analyze relationships between RNA-seq, protein expression, and clinical covariate data in Alzheimer's disease subjects.
Analyzed the correlation between RNA-seq and protein expression data using differential expression and gene set enrichment analyses to gain more insight into Alzheimer's disease pathogenesis, diagnosis, and ultimately, therapeutics.
Data Scientist
CELGENE CORPORATION (Acq. Bristol Myers Squibb), GLOBAL DRUG SAFETY AND RISK MANAGEMENT, SUMMIT, NJ
JUNE 2017 - AUGUST 2019
Analyzed 470K+ Individual Case Safety Reports (ICSRs), comprising nearly a billion individual data points, using R, Python, and data visualization tools.
Used various statistical techniques, such as Principal Component Analysis (PCA), to perform feature reduction and classification of data.
Used machine learning techniques, such as Random Forest and Logistic Regression, to optimize and automate case processing workflows.
Data Analyst
GOVERNOR ANDREW M. CUOMO, NEW YORK, NY
JULY 2013 - JUNE 2015
Analyzed constituent and fundraising data using statistical analysis techniques to inform campaign resource allocation.
Led weekly briefs to senior officials about campaign financials, events, and fundraisers.
Supported coordination of Governor Cuomo’s campaign events by identifying and solving logistical problems to optimize donations.
Academic Publications/Patents
Belloff, Helena, et al. “A ‘Shiny’ New Perspective: Unveiling Next-Generation Patient Profiles for Medical and Safety Monitoring.” PharmaSUG US Conference, TrialSpark, Inc. d/b/a Formation Bio, Apr. 2024, https://www.lexjansen.com/pharmasug/2024/DV/PharmaSUG-2024-DV-382.pdf.
Belloff, H., & Klincewicz, S. Next-Generation Patient Profiles for Medical and Safety Monitoring. U.S. Provisional Patent Application, pending. An interactive web application that automates and streamlines patient safety, medical monitoring, and reporting during clinical trials.
Belloff, H., Neff, R., Wang, M., & Zhang, B. (2021, March 8). Integrative Analysis of Large-scale Transcriptomic and Proteomic Data in Alzheimer’s Disease [Unpublished thesis, presented at thesis defense]. The Multiscale Network Modeling Lab, The Icahn School of Medicine at Mount Sinai.
Technical and Business Skills
Technical Skills
Programming & Scripting: Python (Pandas, PyTorch, TensorFlow, Scikit-Learn, FastAPI, Flask), R (Shiny), SQL, Java, MATLAB, TypeScript, UNIX/Linux
Machine Learning & AI: Supervised & Unsupervised Learning, Deep Learning, Natural Language Processing (NLP), Time Series Analysis
Data Engineering, MLOps, & CI/CD: AWS (S3, Lambda, API Gateway, ECS, EC2, RDS, CloudWatch, DynamoDB), Snowflake, Redshift, PostgreSQL, Data Build Tool (DBT), Fivetran, High-Performance Computing (HPC), Docker, Terraform, GitHub, GitHub Actions, CircleCI
Business & Leadership Skills
Leadership & Collaboration: Expertise in leading cross-functional teams and fostering collaboration across data science, engineering, biostatistics, and product teams.
Regulatory & Compliance: Experience working with Sponsors and Contract Research Organizations (CROs); trained in HIPAA, GDPR, and GxP compliance.
Strategic Decision-Making: Proficient in translating complex data insights into actionable business strategies through strong analytical skills.
Awards and Organizations
On Deck Data Science (ODDS) Fellow (Feb 2022 – Present)
Mount Sinai Alumni Network, mentor for Biomedical Data Science students (2021 – Present)
Barnard College Alumni Network, mentor for women in STEM (2018 – Present)
NYS Department of Education Scholarship of Academic Excellence (2014)
New York State School Music Association (NYSSMA) (Levels 4, 5, 6 Piano) (2011 – 2014)
Japanese National Honor Society (2011 – 2014)
Volunteer Work
January 2016 - Columbia University Global Brigades, Public Health Brigade, El Retiro, Honduras
June 2015 - Putnam Hospital Center, Clinical Volunteer, Carmel, NY
June 2013 - Global Leadership Adventures, Public Health and Sustainable Development, Kilimanjaro, Tanzania
June 2012 - Habitat For Humanity, Yonkers, NY
Languages
Russian (Conversational)
Japanese (Intermediate)
German (Introductory)
Swahili (Introductory)