First Contact
Peter Lee, corporate vice president for research at Microsoft, recounts the early days of working with OpenAI's GPT-4 before its public release. He shares observations and experiences that illuminate the personal side of GPT-4 and how AI systems can potentially impact medicine for the greater good.
I was being scolded. And while I’ve been scolded plenty in my life, for the first time it wasn’t a person scolding me; it was an artificial intelligence system.
It was the fall of 2022, and that AI system was still in secret development at OpenAI, with the plan to eventually release it publicly as GPT-4. But because I’m the corporate vice president for research at Microsoft, which works in partnership with OpenAI, I’d been in a uniquely privileged position to interact with it every day for more than six months before its public release. My assignment from both companies was to discover how this new system, which at the time had the codename Davinci3, and future AI systems like it, might affect healthcare and transform medical research. That is the focus of this book, and the short answer is: in almost any way you can name, from diagnosis to medical records to clinical trials, its impact will be so broad and deep that we believe we need to start wrestling now with what we can do to optimize it.
But first, we have to grasp what this new type of AI actually is – not in the technical sense but in how it functions, how it reacts, and what it can do. Through thousands of chat sessions with Davinci3, I learned a lot. And I am still learning now that it has been publicly released as GPT-4. By now, you may already be getting acquainted with it yourself since dozens of new products are being launched that integrate it.
I was lucky to get introduced to GPT-4 when it was still “Davinci3.” And honestly, I lost a lot of sleep because of it. Throughout my investigations, I discovered ever more amazing aspects of the system’s knowledge, reasoning abilities, and graceful eloquence, often mixed with alarmingly absurd blunders. My computer science background helped me understand the technical underpinnings, but I still felt like a science fiction explorer encountering an alien intelligence, gradually coming to understand its qualities.
This isn’t just about feats of amazing technology. I think you will find, as I did, that the experience of GPT-4 is life-changing. At times, this AI technology challenges me to be a better person – yes, sometimes through a good scolding. GPT-4 can make me laugh with its (often dry) wit. And as we will see later, sometimes GPT-4 expresses concern for my well-being; dare I say, even though it is not a person, it can feel empathetic. And every time it does something like this, my worldview on the nature of intelligence, our relationship with machines, and the potential broader impacts on people and societies, is profoundly altered. Time and time again.
Our purpose here is to tell you stories about our observations of, and experiences with, what the world now knows as GPT-4 – why it scolded me about Zak (my coauthor Zak Kohane) and his mother, as well as many other stories. Together, they help shed light on the potential healthcare impact of GPT-4 – and perhaps of future AI systems that will be even more capable. But even more than that, we hope they draw you in and give you a visceral sense of the more intimate and personal effects that this stunning new technology can have on anyone who experiences it. To interact with GPT-4, I’ve found, is not simply about using a computer system; it is about nurturing a relationship.
Those who know me will tell you I’m no hype-monger. At heart, I’ll always be the sober, cautious academic I was for years as head of the Computer Science Department at Carnegie Mellon University and as a director at DARPA, the Defense Advanced Research Projects Agency. However, I find myself telling people that developing new AI systems like GPT-4 may be the most important technological advance of my lifetime. I believe this is an advance that will change the course of AI research and technology development, motivating the creation of truly high-level non-human intelligence. As such, it will change a great deal about human existence. Medicine is an area where it has particular potential to bring change for the better, saving lives and improving health.
What is GPT-4?
First, let’s discuss some background. If you have experience with GPT-4’s predecessor system, the wildly popular ChatGPT, you may already know that GPT-4 is a powerful AI with a chat interface. Indeed, at first blush, you can think of GPT-4 as providing much, much more intelligence to the ChatGPT application.
Uninitiated users frequently start off thinking of AI systems as a kind of smart search engine. And indeed, it is possible to use the system by giving it search queries. For example, we can give GPT-4 the query:
to which the system gives this response:
(GPT-4’s responses are shown in italic text.)
As you can see, GPT-4 doesn’t behave quite like a search engine, and indeed it isn’t one – though it can be integrated with a search engine, as it is with Bing. Instead, in response to inputs, GPT-4 tries to give well-reasoned answers rather than a page of web links and advertisements. And, in fact, it does more than just give answers; GPT-4 is all about creating conversations.
So, for example, we can continue the above by asking another question, such as:
Perhaps you have tried to converse with a smartphone system like Apple’s Siri, or a smart speaker system like Amazon’s Alexa or Google’s Assistant. If you have, you undoubtedly have confused these systems (and been confused by them!) when you try to have a conversation, even a very simple one like this. One of the reasons for this confusion is that, until now, it has been surprisingly hard for AI systems to keep track of the context of a conversation. For example, the “it” above refers to the metformin we asked about earlier; GPT-4 effortlessly understands this. In fact, we can take the conversation much further:
Like any attentive person we might be conversing with, GPT-4 understands that we are still talking about metformin, even though there is no reference to it in the prompt. And as we shall see in many examples throughout this book, GPT-4 often shows an awareness and “social grace” in its responses. To wit:
GPT-4’s ability to carry on a conversation is incredibly compelling. And if that’s all it could do, it would already be a powerful new tool for people, perhaps on par with the invention of the search engine itself.
But this doesn’t even scratch the surface of what it can do. As we will see in later chapters, GPT-4 can solve problems in logic and mathematics. It can write computer programs. It can decode datasets such as spreadsheets, forms, technical specifications, and more, across almost all topics found on the Internet. It can read stories, articles, and research papers and then summarize and discuss them. It can translate between languages. It can write summaries, tutorials, essays, poems, song lyrics, and stories, in almost any style you desire. These capabilities were all present in ChatGPT, but the big difference now is that GPT-4 does all these things, and much more, at a level of competence that matches, and sometimes exceeds, what most humans can do.
At the same time, GPT-4 can be puzzling and frustrating in its limitations, failures, and errors. The system can be deeply impressive in solving a complex math problem while at the same time falling flat on its face with the simplest arithmetic. Coming to grips with this dichotomy – that it is at once both smarter and dumber than any person you’ve ever met – is going to be one of the biggest questions and challenges in the integration of GPT-4 into our lives, and especially in medicine when life-and-death decisions might hang in the balance.
And this challenge matters because all these capabilities make GPT-4 more than just useful. It will feel like a part of you. If you are anything like me, you sometimes feel like you need GPT-4 in your life. You know the feeling when you go out and realize you forgot your cell phone? Sometimes being without GPT-4 can be like that. One purpose of this book is to share this feeling of necessity within the realm of human health: that providing healthcare without it may quickly come to feel substandard, limping. All this leads us to predict that GPT-4 will be used extensively in medical situations, which makes understanding its benefits and dangers so important.
As with any powerful technology, GPT-4 offers not only new capabilities but also new risks. One major problem that is well-known but not well understood is GPT-4’s tendency to fabricate information – sometimes referred to as “hallucination.” For example, early on in GPT-4’s development, when it was still called Davinci3, we continued the above conversation by asking the following, and obtained an odd response:
(You will notice that we use a different typeface when showing outputs from the older Davinci3 system.)
We might be tempted to chuckle over Davinci3’s apparent frivolity here, but when it comes to applications in medicine, making stuff up like this is not at all funny – it’s downright alarming. Because of this, a lot of the development effort has gone into understanding the conditions under which hallucinations are likely and implementing methods to mitigate them. Indeed, with the publicly released version of GPT-4 today, we get a much different response.
Still, there is a real possibility of such fabrications, and because of this, there is little doubt that the use of GPT-4 in medical situations will require care, and for many it will be controversial.
Later in this book, we will see that it is important in most situations to check or verify the output of GPT-4 for correctness. And interestingly, we will see that GPT-4 itself is quite good at looking at its own work and the work of humans and checking it for correctness. For example, we can fire up a second GPT-4 and feed it a copy of that hallucinatory conversation:
Throughout this book, we will delve more deeply into errors made by GPT-4 and humans. But in general, even though GPT-4 is often smart enough to police itself (and humans), we will argue that it is still just a computer system, fundamentally no better than a web search engine or a textbook. Medicine is an area that demands a partnership between humans and AI. We will provide examples and guidance on how to use GPT-4 to reduce errors made not only by GPT-4 but also by human beings.
Beyond errors, other questions perhaps loom even larger, such as whether GPT-4 requires any form of licensing or certification, whether government agencies should regulate it, and perhaps the biggest question of all, how to ensure fair and equitable access to what may turn out to be the most consequential new technology in medicine in decades. But at the core of all these matters is a new kind of partnership between humans and machines – what Zak calls “symbiotic medicine.”