- What is GPT-4?
- But does GPT-4 actually know anything about medicine?
- An AI for medical experts and non-experts alike
- A new partnership with AI raises new questions
- Back to Zak and his mother
But does GPT-4 actually know anything about medicine?
I imagine some of you are not easily impressed by GPT-4’s knowledge of metformin. And you shouldn’t be. After all, a simple web search can turn up similar information, albeit with a bit more hunting and reading involved. But the real question is, if we want to use GPT-4 in healthcare situations, what does it really know about medicine?
This turns out to be a hard question to answer precisely. One thing that we know for sure is that GPT-4 has not had any specialized training in medicine. The idea of a medically trained GPT-4 is of tremendous interest to its OpenAI creators, as well as to people at Microsoft and many other computer scientists, medical researchers, and healthcare professionals. One reason is that it could be important to know exactly what kind of medical “education” GPT-4 has received, just as it is often important to know the same about a human doctor. For now, though, what we have is today’s general-purpose system, so it is important to understand its current state of knowledge.
That state is surprisingly good. We have found that GPT-4 has extensive knowledge of medicine and can reason, explain, and empathize in common and rare clinical scenarios. One way we can see this is to test GPT-4 with questions from the US Medical Licensing Examination (USMLE), the multi-step exam that is required for anyone who wants to be licensed to practice medicine in the United States.
For example, here is a typical USMLE problem, presented to GPT-4, and its response:
In our testing, when given a full battery of USMLE problems, GPT-4 answers them correctly more than 90 percent of the time. (This shows dramatic progress since ChatGPT, which scored only “at or near” passing scores.5) Furthermore, it can provide detailed reasoning behind its answers:
GPT-4’s explanation shows off its understanding of medicine, and as we shall see in this book, it seems to show flashes of reasoning through causes and effects.
We will delve more deeply into reasoning, including about causes and effects, in Chapter 3. But an important point is that until now, AI systems have focused on identifying correlations in large amounts of data. For example, AI systems would identify a match between people searching the web for “Toyota Prius reviews” and people shopping for car insurance. But as the old saying goes, “correlation does not imply causation.”
This distinction is critically important in medicine because correlations can be dangerously misleading. For example, it matters whether eating a lot of pasta actually causes high blood sugar, or whether the two are merely correlated because some other root cause drives both. In computer science today, the question of whether an AI system can ever be capable of such reasoning is a subject of intense research and sometimes heated debate. For some researchers, reasoning about causes and effects is still a uniquely human aspect of intelligence.
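The pasta example can be made concrete with a small simulation. In the sketch below (an illustration of confounding, not anything from this book's testing of GPT-4), a hypothetical third factor — an overall carbohydrate-heavy diet — drives both pasta consumption and blood sugar. The two measurements come out strongly correlated even though, by construction, neither causes the other; accounting for the confounder makes the association vanish.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical confounder: an overall carbohydrate-heavy diet.
# (Illustrative assumption; variable names are invented for this sketch.)
diet = rng.normal(size=n)

# Pasta consumption and blood sugar are each driven by the confounder
# plus independent noise; pasta has NO direct effect on blood sugar here.
pasta = diet + rng.normal(scale=0.5, size=n)
blood_sugar = diet + rng.normal(scale=0.5, size=n)

# The two variables are strongly correlated...
r = np.corrcoef(pasta, blood_sugar)[0, 1]
print(f"correlation of pasta and blood sugar: {r:.2f}")

# ...but once the confounder is removed (here by subtracting it out),
# the association disappears, because there was no direct causal link.
r_partial = np.corrcoef(pasta - diet, blood_sugar - diet)[0, 1]
print(f"correlation after removing the diet factor: {r_partial:.2f}")
```

An AI system that only detects correlations would happily report the first number; distinguishing it from the second is the kind of causal reasoning the debate is about.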
The question of whether GPT-4 is capable of causal reasoning is beyond the scope of this book, and I think it best to say that the matter is not settled yet. But if we ask GPT-4 itself, it gives a nuanced answer:
Of course, GPT-4’s own testimony that it can simulate causal reasoning doesn’t make even that claim true. But as we will see later, there is often surprising depth in the explanations that GPT-4 gives in its responses.