Working with Prompts Across Models
Prompts are highly dependent on the architecture and training of the language model, meaning that what works for one model may not work for another. For example, ChatGPT, GPT-3 (which is different from ChatGPT), T5, and models in the Cohere command series all have different underlying architectures, pre-training data sources, and training approaches, which in turn impact the effectiveness of prompts when working with them. While some prompts may transfer between models, others may need to be adapted or reengineered to work with a specific model.
In this section, we will explore how to work with prompts across models, taking into account the unique features and limitations of each model as we seek to develop effective prompts that can guide the language models to generate the desired output.
ChatGPT
Some LLMs can take in more than just a single “prompt.” Models that are aligned to conversational dialogue (e.g., ChatGPT) can take in a system prompt and multiple “user” and “assistant” prompts (Figure 3.9). The system prompt is meant to be a general directive for the conversation and will generally include overarching rules and personas to follow. The user and assistant prompts are messages from the human user and from the LLM, respectively. For any LLM you choose to look at, be sure to check its documentation for specifics on how to structure input prompts.
FIGURE 3.9 ChatGPT takes in an overall system prompt as well as any number of user and assistant prompts that simulate an ongoing conversation.
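To make the role structure concrete, here is a minimal sketch of a ChatGPT call using OpenAI’s Python SDK. The model name, system directive, and messages are illustrative placeholders, and the exact client interface depends on your SDK version, so treat this as a sketch rather than canonical usage:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # illustrative model name
    messages=[
        # The system prompt sets the overarching rules and persona.
        {"role": "system", "content": "You are a concise, helpful assistant."},
        # User and assistant messages simulate the ongoing conversation.
        {"role": "user", "content": "What does a system prompt do?"},
        {"role": "assistant", "content": "It sets rules the model follows for the whole conversation."},
        {"role": "user", "content": "Give me an example of one."},
    ],
)

print(response.choices[0].message.content)
```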
Cohere
We’ve already seen Cohere’s command series of models in action in this chapter. As an alternative to OpenAI’s models, they demonstrate that prompts cannot always simply be ported from one model to another; usually, we need to alter the prompt slightly before another LLM can do its work.
Let’s return to our simple translation example. Suppose we ask OpenAI and Cohere to translate something from English to Turkish (Figure 3.10).
FIGURE 3.10 OpenAI’s GPT-3 can take a translation instruction without much hand-holding, whereas the Cohere model seems to require a bit more structure.
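The following hypothetical prompts sketch the kind of restructuring Figure 3.10 suggests. The example sentence, model names, and SDK parameters are assumptions rather than the figure’s actual content, so consult each provider’s documentation before running anything like this:

```python
import cohere
from openai import OpenAI

sentence = "Where is the nearest restaurant?"  # assumed example sentence

# GPT-3 often handles a bare instruction with no extra scaffolding.
openai_client = OpenAI()  # assumes OPENAI_API_KEY is set
gpt3_response = openai_client.completions.create(
    model="gpt-3.5-turbo-instruct",  # illustrative completion-style model
    prompt=f"Translate to Turkish.\n\n{sentence}",
    max_tokens=64,
)
print(gpt3_response.choices[0].text)

# The Cohere command model tends to do better with explicit structure,
# such as labeled input/output fields.
co = cohere.Client("YOUR_COHERE_API_KEY")
cohere_response = co.generate(
    model="command",
    prompt=f"Translate to Turkish.\n\nEnglish: {sentence}\nTurkish:",
    max_tokens=64,
)
print(cohere_response.generations[0].text)
```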
It seems that the Cohere model in Figure 3.10 required a bit more structure than the OpenAI version. That doesn’t mean the Cohere model is worse than GPT-3; it just means that we need to think about how to structure our prompt for a given LLM.
Open-Source Prompt Engineering
It wouldn’t be fair to discuss prompt engineering and not mention open-source models like GPT-J and FLAN-T5. When working with these models, prompt engineering is a critical step in getting the most out of their pre-training and fine-tuning (a topic that we will begin to cover in Chapter 4). These models can generate high-quality text output just like their closed-source counterparts. Unlike closed-source models, however, open-source models offer greater flexibility and control over prompt engineering, enabling developers to customize prompts and tailor output to specific use cases during fine-tuning.
For example, a developer working on a medical chatbot may want to create prompts that focus on medical terminology and concepts, whereas a developer working on a language translation model may want to create prompts that emphasize grammar and syntax. With open-source models, developers have the flexibility to fine-tune prompts for their specific use cases, resulting in more accurate and relevant text output.
Another advantage of prompt engineering in open-source models is the ability to collaborate with other developers and researchers. Open-source models have a large and active community of users and contributors, which allows developers to share their prompt engineering strategies, receive feedback, and collaborate on improving the overall performance of the model. This collaborative approach to prompt engineering can lead to faster progress and more significant breakthroughs in natural language processing research.
It pays to remember how open-source models were pre-trained and fine-tuned (if they were at all). For example, GPT-J is an autoregressive language model, so we’d expect techniques like few-shot prompting to work better than a direct instructional prompt. In contrast, FLAN-T5 was specifically fine-tuned with instructional prompting in mind, so while few-shot learning is still on the table, we can also rely on the simplicity of just asking (Figure 3.11).
FIGURE 3.11 Open-source models can vary dramatically in how they were trained and how they expect prompts. GPT-J, which is not instruction aligned, has a hard time answering a direct instruction (bottom left). In contrast, FLAN-T5, which was aligned to instructions, does know how to accept instructions (bottom right). Both models are able to intuit the task from few-shot examples, but FLAN-T5 seems to be having trouble with our subjective task. Perhaps it’s a great candidate for some fine-tuning—coming soon to a chapter near you.
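A rough sketch of that comparison with Hugging Face’s transformers library might look like the following. The checkpoints, prompts, and task are assumptions (and GPT-J’s 6 billion parameters demand substantial memory), so treat this as an outline of the experiment rather than a recipe:

```python
from transformers import pipeline

# FLAN-T5 is instruction aligned, so a direct instruction is reasonable.
flan = pipeline("text2text-generation", model="google/flan-t5-base")
instruction = "Classify the sentiment of this review: 'The food was amazing!'"
print(flan(instruction)[0]["generated_text"])

# GPT-J is a plain autoregressive LM: few-shot examples set up a pattern
# for it to continue, which tends to work better than a bare instruction.
gptj = pipeline("text-generation", model="EleutherAI/gpt-j-6b")
few_shot = (
    "Review: The service was painfully slow.\nSentiment: negative\n\n"
    "Review: I loved every bite.\nSentiment: positive\n\n"
    "Review: The food was amazing!\nSentiment:"
)
print(gptj(few_shot, max_new_tokens=3)[0]["generated_text"])
```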