AI x Bio x NY x LifeX: highlights

What we learned. Oh, and you can just listen to the highlights on a run if you want, in podcast form (15 min): https://amol.sarva.co/podcast/ai-x-bio-x-ny-highlights-from-our-conference/

Key Learnings and Insights from the LifeX AI x Bio Conference

Welcome and Overview

  • The event was one of several AI and Bio conferences co-hosted by Fenwick & West and LifeX Ventures.
  • Fenwick & West is a tech and life sciences focused law firm supporting venture-backed companies and investors.
  • LifeX Ventures is an early-stage venture fund (Seed and Series A) focused on “AI Meets Bio,” specifically pure tech companies where the customer is a scientist or doctor, aiming to make things better so people live longer.
  • LifeX Ventures has made 55–56 deals in about two and a half years.

Foundation Models in Biology

  • There is some debate on the exact definition of a foundation model.
  • A key characteristic is formulating tasks in an open-ended way, throwing large amounts of unstructured data at a model to see what it can do.
  • Foundation models are good at many tasks, even those they weren’t specifically designed to do.
  • The term “foundation” suggests serving as a base for multiple applications; it may also signal importance for fundraising.
  • Thomas at VantAI works on all-atom foundation models for protein-small molecule complexes. These models are expensive to train, but not necessarily $500M expensive.
  • Compared to LLMs trained on exabytes of text, biological data models (e.g., AlphaFold) use far less—terabytes at most.
  • Biodata could be nearly infinite if gathered and organized; generating and curating data is essential.
  • Multimodal models are emerging, but their advantages remain debated.
  • Surprising Insight: Sean suggests AlphaFold may be the only “true” foundation model in biomedicine, showing emergent properties.
  • Multimodal LLMs can correlate vision and text to describe images they have never seen; AlphaFold shows an analogous kind of generalization.
  • Multimodal models exist for tasks like disease diagnosis, but true discipline-spanning foundation models remain rare.
  • Agent-based systems may one day abstract away model selection entirely.
  • Medical specialties may be social/reimbursement boundaries, not scientific ones—unlike the medicine–physics divide.

Data in Biology and AI

  • Data is more important than model architecture in biomedicine today.
  • Data hoarding is common due to its value.
  • Compared to finance, bio shows more anxiety around data sharing.
  • Surprising Insight: HIPAA concerns push biopharma to run models on-prem, even with third-party data; it is less about what the regulation actually requires and more about fear over data ownership.
  • Data quality issues persist, especially in legacy phenotypic datasets.
  • Labeled data is a frontier—time-consuming to create but crucial.
  • Outlier biological traits (e.g., insensitivity to pain) are valuable for discovery.
  • Synthetic data can help; models trained on it have sometimes outperformed radiologists, but its quality is bounded by the generation process.
  • Data-scarce strategies include simulation (e.g., virtual cell modeling).
  • Mechanistic models are deterministic but don’t reduce compute burden.
  • Depth > breadth: germline mutation data can offer stronger predictive power than broader but shallower datasets.

Infrastructure

  • Options: pay for APIs, self-host in cloud, or buy GPUs.
  • AWS hosts many models; Bedrock aims to simplify access for bio use cases (see the minimal call sketch after this list).
  • Goal: Make complex platforms usable for less technical scientists.
  • Bio infrastructure mixes cloud (scalable) and on-prem (privacy).
  • Training models is costly; some companies cloud-hop for credits. Inference is increasingly expensive too.
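
To make the "pay for APIs" path concrete, here is a minimal sketch of calling a hosted model through the Bedrock runtime with boto3. It assumes AWS credentials are already configured and that a model is enabled in your account; the model ID and request body are placeholders, since each Bedrock model family defines its own input schema.

```python
import json

import boto3

# Minimal Bedrock runtime call; credentials and region come from your AWS config.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.invoke_model(
    modelId="example.text-model-v1",  # placeholder: use a model enabled in your account
    contentType="application/json",
    accept="application/json",
    # Placeholder request body: the real schema depends on the model family.
    body=json.dumps({"prompt": "Summarize this assay protocol: ..."}),
)

result = json.loads(response["body"].read())  # the response body is a streaming payload
print(result)
```

Self-hosting in the cloud or buying GPUs trades this convenience for more control over cost and data locality, which is the cloud/on-prem tension noted above.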

AI for Accelerating Drug Discovery

  • The panel discussed reversing Eroom's Law (the long-term rise in drug discovery costs).
  • Challenges include combinatorial therapy and multi-target inhibition.
  • AI is used for hypothesis generation and predicting experimental outcomes.
  • AI may replace some experimental work, boosting iteration speed.
  • Surprising Insight: Only ~10% of drugs entering the clinic succeed—improving this is AI’s big opportunity.
  • The future lies in causal biology—differentiated hypotheses via mechanistic insights.
  • AI also helps in clinical ops: patient selection, endpoints, data management.
  • New architectures include flow matching, agentic systems, RL, and chain-of-thought reasoning (a toy flow-matching sketch follows this list).
  • Fragmented data across stages limits end-to-end modeling; bridging these gaps could drive the next wave.
  • Pharma’s in vivo/in vitro data is valuable for improving translation.
  • Surprising Insight: Biotech “sells hope” to pharma, pharma sells it to patients, and investors buy hope; this dynamic drives business models.
  • Some startups sell predictions (anti-hope) vs. molecules (hope).
  • Hype is seen as necessary to attract capital and talent.
  • Should vertical AI tools build their own pipelines (“platform envy”) or stay focused and share in the upside?
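
Flow matching is only name-checked above, so as a rough illustration here is a toy PyTorch sketch of the conditional flow-matching objective; the tiny network, the synthetic 2-D data, and the hyperparameters are illustrative assumptions, not anything presented at the conference. The model learns a velocity field that transports noise samples to data samples along straight-line paths.

```python
import torch
import torch.nn as nn

# Toy velocity-field network: input is (x_t, t) for 2-D points, output is a 2-D velocity.
net = nn.Sequential(nn.Linear(3, 64), nn.SiLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(2000):
    x1 = torch.randn(256, 2) * 0.5 + 2.0      # stand-in "data" samples (a toy Gaussian blob)
    x0 = torch.randn(256, 2)                  # noise samples
    t = torch.rand(256, 1)                    # random time in [0, 1]
    xt = (1 - t) * x0 + t * x1                # point on the straight-line path from noise to data
    target_v = x1 - x0                        # conditional target velocity along that path
    pred_v = net(torch.cat([xt, t], dim=1))
    loss = ((pred_v - target_v) ** 2).mean()  # regress the predicted velocity onto the target
    opt.zero_grad()
    loss.backward()
    opt.step()

# Sampling: integrate dx/dt = v(x, t) from t=0 (noise) to t=1 (data), e.g. with Euler steps.
```

In a drug-discovery setting the 2-D points would be replaced by molecular or structural representations, which is where the real modeling work lives.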

Longevity – Hype vs. Reality

  • Longevity focuses on aging drivers, not just disease prevention.
  • Hallmarks include mitochondrial dysfunction and chronic inflammation.
  • Factions focus on specific aging mechanisms (e.g., mitochondria, immune).
  • Measuring dysfunction is tough outside research settings.
  • Defining “health span” shapes longevity goals.
  • The space has evolved from cosmetic/optimization to biochemistry-based interventions.
  • Surprising Insight/Debunking: Bryan Johnson’s N=1 experiments are viewed skeptically; epigenetic clocks have high variability.
  • Interventions like NAD, metformin, rapamycin, fasting are unproven for lifespan extension.
  • Uncontrolled self-experimentation may have unknown harms.
  • Antioxidants could worsen some cancers in mice.
  • Surprising Insight: Plasma exchange therapy shows promise in mice but isn’t practical/safe for humans.
  • Validated biomarkers remain the standard; most physicals miss deeper aging indicators.
  • Biomarkers help both stratify patients and track intervention impact.
  • Best validated interventions: eat less, sleep more, be happy, exercise.
  • Fasting is a consistent positive signal.
  • GLP-1s could be a “miracle drug” for longevity, pending long-term studies.
  • Mother–daughter asymmetry (an aged mother gives rise to biologically young offspring) shows rejuvenation is biologically possible and worth studying.

Starting a Company as a PhD

  • Surprising Insight: Great academics often make weak founders; bold simplification is punished in academia but necessary in startups.
  • Startups require prioritization, speed, and clarity.
  • Choose the right seed investor, not the first.
  • Bootstrapping can prove pull before raising.
  • Serendipitous conversations lead to funding opportunities.
  • Startup law firms can defer fees and provide crucial early help.
  • Trust and team empowerment are critical.
  • University IP can be a landmine—build off-site, license when needed.
  • Cofounder questionnaires help surface conflicts early.
  • Biotech companies tend to delay hiring a GC (general counsel) until the IPO stage or until regulatory complexity arises.
  • Academic rigor (p-values, caveats) must give way to rapid, imperfect action.
  • Simplifying your pitch is essential.
  • Conviction comes from personal connection to the problem and end user.
  • Surprising Insight: Standard startup structures (stock, boards, investor terms) exist for a reason—over-customization creates drag.
  • Legal simplicity allows focus on harder problems.
