Matt Ebrahim

Senior Data Scientist specializing in AI for Drug Discovery

About

I'm a Senior Data Scientist at Formation Bio, where I lead drug repurposing initiatives using machine learning. I build scalable models (transformers, graph neural networks, and diffusion models) for molecule generation, ADMET prediction, and clinical portfolio prioritization.

I also teach graduate-level AI courses at Northeastern University, focusing on deep learning applications in healthcare and drug discovery.

8+ first-author papers · 30+ publications · 1 U.S. patent

Experience
Jun 2025 — Present
Senior Data Scientist
Formation Bio

Lead drug repurposing with GNN pipelines on biomedical knowledge graphs. Build MRI surrogate endpoint models and fine-tune domain-specific LLMs for ontology mapping.

2024 — May 2025
AI Scientist II
1910 Genetics

Led generative AI for de novo molecular design. Developed multimodal GNN architecture for BBB permeability (NeurIPS 2025 submission). Delivered molecules progressing through in vitro/in vivo validation.

2023 — 2024
AI Scientist I
1910 Genetics

Built ADMET prediction models and molecular property optimization pipelines. Contributed to multi-objective drug design workflows integrating ML with medicinal chemistry.

2022 — 2023
Clinical Research Associate
Northwestern University

Deep learning for medical imaging. CycleGAN for MRI synthesis from CT. CNN-MLP for aortic flow estimation from wearable sensors (first-author publication + U.S. patent).

2017 — 2022
Ph.D. in Biomedical Engineering
Stony Brook University

ML models for terahertz imaging and burn injury diagnostics. 8+ first-author and 30+ co-authored peer-reviewed publications.

Technical Skills

Generative AI

Diffusion (DDPM), Transformers (GPT, T5), VAE, GAN, RNN/LSTM, Reinforcement Learning

Graph ML

Graphormer, GCN, GAT, MPNN, PyG, DGL, Knowledge Graphs

Drug Discovery

RDKit, DeepChem, AutoDock Vina, ESMFold, AlphaFold, ADMET modeling

Foundation Models

BioMistral, BioMegatron, SapBERT, MedGemma, BioMedCLIP, DINOv3

ML Frameworks

PyTorch, TensorFlow, Hugging Face, Scikit-learn, Optuna

Cloud & MLOps

Azure ML, AWS (Bedrock, SageMaker), GCP, Snowflake, Docker

Open Source Projects
NLP

BioMap: Biomedical Entity Linking

Python · Transformers · LLMs · 2025

Benchmarking framework for biomedical entity linking using SapBERT, BioMegatron, and LLMs (GPT-4, Gemini) on the MedMentions dataset.

Python · Transformers · FAISS

Drug Discovery

DiffBind: Diffusion Models for Binding Affinity

Python · PyTorch · RDKit · 2024

DDPM implementation for predicting protein-ligand binding affinity using equivariant diffusion and the BindingMOAD dataset.

PyTorch · RDKit · Diffusion

Creative AI

Catflix: AI-Generated Cat Videos

Python · MoviePy · Animation · 2025

Automated pipeline that generates animated videos for cats, featuring procedurally generated bug animations, and uploads them to YouTube.

MoviePy · Animation · YouTube API

Audio AI

pdf2audiobook: PDF to Audiobook Converter

Python · FastAPI · LiteLLM · TTS · 2026

Converts any PDF into a chapter-aware audiobook through a streaming pipeline. Includes a web interface for real-time playback and read-along.

LiteLLM · FastAPI · TTS

Selected Publications
Patent

Personalized Chest Acceleration Using Deep Learning

U.S. Patent · Issued 2025

NeurIPS 2025

Multimodal Graph-Attention Networks with QM-Guided Cross-Attention for ADMET Prediction

Submitted · First Author

Deep Learning for Aortic Flow Estimation from SCG

Annals of Biomedical Engineering, 2023 · First Author

Deep Learning for Triage of In Vivo Burn Injuries

Biomedical Optics Express, 2022 · First Author

Let's Connect

Open to collaborations in AI-driven drug discovery and biomedical research.

Get in touch