Unsolved problems for AI, especially in health and bio

AI is moving so fast right now! Well, that was a few months back. It felt like folks were discovering whole new amazing things daily. Remember that whole “tell AI to start a business for me and run it” thing?

Suddenly it’s moving super slowly as people crash into the “oh right, this ‘obvious’ next step is super hard” wall.

To figure out what’s hard and what’s easy, we brought a bunch of really serious health and bio people together in late September in New York at LifeX’s AI x Bio event. Who was there:

  • founder of a billion-dollar-plus digital health insurance company, Oscar Health
  • founder of a billion-dollar-plus AI drug discovery company, Immunai
  • founder of yet another unicorn that has one of the largest weight loss apps, Noom
  • head of drug discovery for one of the largest Google-spinout quantum companies

plus CxOs of large drug companies, CMOs from large health systems and therapeutics startups, VCs, and several more unicorns…

We heard from these folks, who are building hardcore technology and reshaping big product platforms. To me, the big takeaways were about what’s still very difficult.

  1. Short questions. Few follow-ups. While the AI models are trained on complex information (“the whole internet”), they don’t handle lots of input from you very well. They get overloaded trying to keep too much input “in mind” and start hallucinating. But so much of the hard work involves long questions: handling an insurance claim means medical history, multiple doctor visit notes, and complex reimbursement rules. Or a psychotherapy session, or multiple sessions over months and years. (There’s a rough sketch of this context-budget problem after the list.)
  2. Failing inelegantly. Hallucination is when the models say false things with authority. When it happens, the model is failing, but the failure isn’t triggered in ways that are obvious to the user. One of the ways to make a model hallucinate is to overload it, and it doesn’t know it’s overloaded.
  3. No plans. No nuance. The models don’t think of you as someone to persuade, and in fact don’t remember you at all. Every conversation is the first time. That has its virtues (patience, if you are tutoring someone) but also means “bedside manner” is impossible. It also means the robot will foolishly repeat the same suggestions (perhaps like some annoying humans do).
  4. Not enough data. It’s awesome that reading the entire Internet can create credible short conversations, but the entire Internet is a lot of data. Figuring out whale songs would require roughly 40,000x the whale-song data that exists on the planet. The same is likely true for conversations between dietitians and weight-loss clients, or with psychotherapists, or for biologists querying proteins.
  5. Inaccessible data. Some huge systems capture zillions of patient conversations, e.g. customer care centers or claims adjusters, but all those vendors and tools and platforms weren’t designed to store every patient conversation, interaction, and health outcome in a single place where the robots can train. Big health systems have projects just to digitize millions of hand-written records, never mind training AIs.
  6. Irrelevant data. When two humans talk, they react to each other in certain ways. If you train an AI on those conversations, will you get the same result when one side is a robot? Unknown. Much of our data can’t be used directly unless the interlocutors believe the speaker is human too.
  7. Disguised data. Hidden factors in conversations, like whether a speaker is a robot, a child, or a foreign-language speaker, influence whether the data is usable. Should your AI talk the way a teacher talks to a kindergartner? How do you know who the speakers are?
  8. Private data. Health is full of private data that nobody is allowed to access or aggregate.
  9. Impossible data. The quantum phenomena at the particle level in biology aren’t explained or adequately observed. So some folks are simulating quantum phenomena from bottom-up quantum physics models…and using that data to train their AIs.
  10. “Better is best” models. The models got good “suddenly” as far as the world is concerned, and perhaps that’s because there’s a huge visible difference at certain thresholds. But is that true at all thresholds? If GPT-4 is far better than GPT-3.5, or if the HuggingFace rankings are crowded and closely packed…does the winner take all? Most in the room had not even tried Google’s Bard, or any model other than OpenAI’s, in any depth.
  11. Memory, computation, speed. There’s more to life than just quality. Routing between models, occasionally or in real time, to optimize these dimensions is going to be another decision: some models can take longer input and longer conversation chains, use less compute, or react faster… (A toy router sketch also follows the list.)
  12. Skeptical users. As amazingly convincing as the models are to some of us, several big players mentioned obstacles in getting patients or doctors or other experts to even trust the AIs at all. “Much easier in the back office where you have fewer people to convince.”
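
A concrete way to see point 1: everything you send the model has to fit in a fixed context window, and real health workflows blow past it fast. Below is a minimal sketch of budgeting that window before a call, under assumptions I’m making up for illustration: the token estimate is crude (real systems use the model’s own tokenizer), the limits are invented, and call_model is a hypothetical stand-in for whichever LLM API you use.

```python
# Sketch: budgeting a fixed context window across everything the model must
# "keep in mind". All numbers and names here are illustrative assumptions.

CONTEXT_LIMIT = 8_000          # tokens the model can hold at once (assumed)
RESERVED_FOR_ANSWER = 1_000    # leave room for the model's reply

def estimate_tokens(text: str) -> int:
    # Crude approximation: ~1.3 tokens per word. Real systems use a tokenizer.
    return int(len(text.split()) * 1.3)

def build_prompt(question: str, documents: list[str]) -> str:
    """Pack as many supporting documents as fit under the context budget."""
    budget = CONTEXT_LIMIT - RESERVED_FOR_ANSWER - estimate_tokens(question)
    included, dropped = [], 0
    for doc in documents:  # e.g. visit notes, history, reimbursement rules
        cost = estimate_tokens(doc)
        if cost <= budget:
            included.append(doc)
            budget -= cost
        else:
            dropped += 1  # silently losing context is exactly how errors creep in
    if dropped:
        print(f"warning: {dropped} document(s) did not fit in the context window")
    return question + "\n\n" + "\n\n".join(included)

# An insurance claim is many documents, not one short question.
prompt = build_prompt(
    "Should this claim be reimbursed under the member's plan?",
    ["<medical history>", "<visit note 1>", "<visit note 2>", "<plan rules>"],
)
# answer = call_model(prompt)   # hypothetical LLM call
```
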
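And point 11 in miniature: a toy router that picks a model per request based on how much context it needs and how fast an answer has to come back. The model names, context sizes, latencies, and costs below are all made-up placeholders, not any vendor’s real numbers.

```python
# Sketch: route each request to the cheapest model that fits the input
# and meets the latency budget. All figures are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Model:
    name: str
    max_context: int      # tokens it can accept (assumed)
    latency_s: float      # typical seconds per response (assumed)
    cost_per_call: float  # relative cost (assumed)

MODELS = [
    Model("small-fast", max_context=4_000,  latency_s=1.0, cost_per_call=1.0),
    Model("large-slow", max_context=32_000, latency_s=8.0, cost_per_call=10.0),
]

def route(input_tokens: int, latency_budget_s: float) -> Model | None:
    """Cheapest model that fits the input and meets the latency budget."""
    candidates = [
        m for m in MODELS
        if m.max_context >= input_tokens and m.latency_s <= latency_budget_s
    ]
    return min(candidates, key=lambda m: m.cost_per_call) if candidates else None

print(route(2_000, latency_budget_s=2.0))    # -> small-fast
print(route(20_000, latency_budget_s=10.0))  # -> large-slow
print(route(20_000, latency_budget_s=2.0))   # -> None: nothing fits both constraints
```
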
(Photo: me talking about quantum with Andrea from SandboxAQ)

The overwhelming conclusion for me from all this: choosing precise applications that actually work/impress/save time, plus generating your own private data from those applications, is going to be the path to greatness in the near term.

Here’s a speaker-by-speaker recap.

Also: our next session like this, AI x Longevity at Betaworks on October 12.

Get on my newsletter here for future posts.