
Vector DB sample scenario – Music recommendation system using a vector database

Let’s consider a music streaming platform that aims to provide song recommendations based on what a user is currently listening to. Imagine a user who is listening to “Song X” on the platform.

Behind the scenes, every song in the platform’s library is represented as a high-dimensional vector based on its musical features and content, using embeddings. “Song X” also has its vector representation. When the system aims to recommend songs similar to “Song X,” it doesn’t look for exact matches (as traditional databases might). Instead, it leverages a vector DB to search for songs with vectors closely resembling that of “Song X.” Using an ANN search strategy, the system quickly sifts through millions of song vectors to find those that are approximately nearest to the vector of “Song X.” Once potential song vectors are identified, the system employs similarity measures, such as cosine similarity, to rank these songs based on how close their vectors are to “Song X’s” vector. The top-ranked songs are then recommended to the user.

Within milliseconds, the user gets a list of songs that musically resemble “Song X,” providing a seamless and personalized listening experience. All this rapid, similarity-based recommendation magic is powered by the vector database’s specialized capabilities.
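To make the ranking step concrete, here is a minimal sketch of cosine-similarity ranking over a handful of hypothetical song vectors. A real system would use learned embeddings and an ANN index rather than this brute-force comparison:

import numpy as np

# Hypothetical 3-dimensional song embeddings (real embeddings have hundreds of dimensions)
song_vectors = {
    "Song A": np.array([0.90, 0.10, 0.30]),
    "Song B": np.array([0.20, 0.80, 0.50]),
    "Song C": np.array([0.88, 0.15, 0.28]),
}
song_x = np.array([0.91, 0.12, 0.31])  # embedding of "Song X"

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rank candidate songs by how close their vectors are to "Song X"
ranked = sorted(song_vectors.items(),
                key=lambda item: cosine_similarity(song_x, item[1]),
                reverse=True)
print([name for name, _ in ranked])  # most similar songs first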

Common vector DB applications

  • Image and video similarity search: In the context of image and video similarity search, a vector DB specializes in efficiently storing and querying high-dimensional embeddings derived from multimedia content. By processing images through deep learning models, they are converted into feature vectors, a.k.a. embeddings, that capture their essential characteristics. For videos, an additional step may be needed to extract frames before converting them into vector embeddings. Contrastive Language-Image Pre-training (CLIP) from OpenAI is a very popular choice for embedding videos and images. These vector embeddings are indexed in the vector DB, allowing for rapid and precise retrieval when a user submits a query. This mechanism powers applications such as reverse image and video search, content recommendations, and duplicate detection by comparing and ranking content based on the proximity of their embeddings.
  • Voice recognition: Voice recognition with vectors is akin to video vectorization. Analog audio is digitized into short frames, each representing an audio segment. These frames are processed and stored as feature vectors, with the entire sequence representing things such as spoken sentences or songs. For user authentication, a vectorized spoken key phrase might be compared to stored recordings. In conversational agents, these vector sequences can be fed into neural networks to recognize and classify spoken words and generate responses, similar to ChatGPT.
  • Long-term memory for chatbots: Vector database management systems (VDBMs) can be employed to enhance the long-term memory capabilities of chatbots or generative models. Many generative models can only process a limited amount of preceding text in prompt responses, which results in their inability to recall details from prolonged conversations. Because these models have no inherent memory of past interactions and can’t differentiate between factual data and user-specific details, VDBMs provide a solution for storing, indexing, and referencing previous interactions to improve consistency and context-awareness in responses (a minimal sketch follows this list).
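As a rough illustration of this idea, the following sketch stores past interactions as vectors and retrieves the most relevant ones for the next prompt. The embed() function here is a toy hashing-based stand-in; a production chatbot would use a real embedding model and a vector DB:

import numpy as np

memory = []  # list of (text, vector) pairs representing past interactions

def embed(text, dim=64):
    # Toy hashing-based embedding, for illustration only
    vec = np.zeros(dim)
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def remember(text):
    memory.append((text, embed(text)))

def recall(query, k=2):
    # Return the k stored interactions most similar to the query
    q = embed(query)
    scored = sorted(memory, key=lambda item: float(np.dot(q, item[1])), reverse=True)
    return [text for text, _ in scored[:k]]

remember("The user's favorite cricketer is Virat Kohli.")
remember("The user prefers vegetarian restaurants.")
print(recall("Which player does the user like?"))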

This is a very important use case and plays a key role in implementing RAG, which we will discuss in the next section.


The role of vector DBs in retrieval-augmented generation (RAG)

To fully understand RAG and the pivotal role of vector DBs within it, we must first acknowledge the inherent constraints of LLMs, which paved the way for the advent of RAG techniques powered by vector DBs. This section sheds light on the specific LLM challenges that RAG aims to overcome and the importance of vector DBs.

First, the big question – Why?

In Chapter 1, we delved into the limitations of LLMs, which include the following:

  • LLMs possess a fixed knowledge base determined by their training data; as of February 2024, ChatGPT’s knowledge is limited to information up until April 2023.
  • LLMs can occasionally produce false narratives, spinning tales or facts that aren’t real.
  • They lack personal memory, relying solely on the input context length. For example, GPT-4-32K can only process up to 32K tokens across prompts and completions (we’ll dive deeper into prompts, completions, and tokens in Chapter 5).

To counter these challenges, a promising avenue is enhancing LLM generation with retrieval components. These components can extract pertinent data from external knowledge bases—a process termed RAG, which we’ll explore further in this section.

So, what is RAG, and how does it help LLMs?

Retrieval-augmented generation (RAG) was first introduced in a paper titled Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (https://arxiv.org/pdf/2005.11401.pdf) in November 2020 by Facebook AI Research (now Meta). RAG is an approach that combines the generative capabilities of LLMs with retrieval mechanisms to extract relevant information from vast datasets. LLMs, such as the GPT variants, have the ability to generate human-like text based on patterns in their training data but lack the means to perform real-time external lookups or reference specific external knowledge bases post-training. RAG addresses this limitation by using a retrieval model to query a dataset and fetch relevant information, which then serves as the context for the generative model to produce a detailed and informed response. This also grounds the LLM’s queries in relevant information, which reduces the chances of hallucinations.

The critical role of vector DBs

A vector DB plays a crucial role in facilitating the efficient retrieval aspect of RAG. In this setup, each piece of information in the dataset, such as text, video, or audio, is represented as a high-dimensional vector and indexed in a vector DB. When a query from a user comes in, it is also converted into a similar vector representation. The vector DB then rapidly searches for the vectors (documents) in the dataset that are closest to the query vector, leveraging techniques such as ANN search. The retrieved content is then attached to the query and sent to the LLM to generate a response. This ensures that the most relevant information is retrieved quickly and efficiently, providing a foundation for the generative model to build upon.
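The following is a minimal sketch of that flow. The embedding model, the vector DB client, and the LLM call are placeholders (shown as comments); only the prompt-assembly step is spelled out:

def build_rag_prompt(user_query, retrieved_chunks):
    # Assemble the retrieved context and the user query into a grounded prompt
    context = "\n\n".join(retrieved_chunks)
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {user_query}\nAnswer:"
    )

# Typical flow (placeholders, not a specific library's API):
# query_vector = embed_query(user_query)            # embedding model
# chunks = vector_db.search(query_vector, top_k=3)  # ANN search in the vector DB
# prompt = build_rag_prompt(user_query, chunks)
# answer = call_llm(prompt)                         # grounded response from the LLM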


Example of an RAG workflow

Let’s walk through an example step by step, as shown in the image. Imagine a platform where users can ask about ongoing cricket matches, including recent performances, statistics, and trivia:

  1. Suppose the user asks, “How did Virat Kohli perform in the last match, and what’s an interesting fact from that game?” Since the LLM was only trained on data up to April 2023, it may not have this answer.
  2. The retrieval model will embed the query and send it to a vector DB.
  3. All the latest cricket news is stored in a vector DB in a properly indexed format using ANN strategies such as HNSW. The vector DB performs a cosine similarity search against the indexed information and returns a few relevant results, or contexts.
  4. The retrieved context is then sent to the LLM along with the query to synthesize the information and provide a relevant answer.
  5. The LLM provides the relevant answer: “Virat Kohli scored 85 runs off 70 balls in the last match. An intriguing detail from that game is that it was the first time in three years that he hit more than seven boundaries in an ODI inning.”

The following image illustrates the preceding points:

Figure 4.11 – Representation of RAG workflow with vector database

Business applications of RAG

In the following list, we have mentioned a few popular business applications of RAG based on what we’ve seen in the industry:

  • Enterprise search engines: One of the most prominent applications of RAG is in the realm of enterprise learning and development, serving as a search engine for employee upskilling. Employees can pose questions about the company, its culture, or specific tools, and RAG swiftly delivers accurate and relevant answers.
  • Legal and compliance: RAG fetches relevant case laws or checks business practices against regulations.
  • Ecommerce: RAG suggests products or summarizes reviews based on user behavior and queries.
  • Customer support: RAG provides precise answers to customer queries by pulling information from the company’s knowledge base and providing solutions in real time.
  • Medical and healthcare: RAG retrieves pertinent medical research or provides preliminary symptom-based suggestions.


Chunking strategies

In our last discussion, we delved into vector DBs and RAG. Before our embedded data can power retrieval, it needs to be housed efficiently. While we touched upon indexing methods to speed up data fetching, there’s another crucial step that comes even before that: chunking.

What is chunking?

In the context of building LLM applications with embedding models, chunking involves dividing a long piece of text into smaller, manageable pieces, or “chunks,” that fit within the model’s token limit. The process involves breaking text into smaller segments before sending them to the embedding models. As shown in the following image, chunking happens before the embedding process. Different documents have different structures, such as free-flowing text, code, or HTML, so different chunking strategies can be applied to attain optimal results. Tools such as LangChain provide functionality to chunk your data efficiently based on the nature of the text.

The diagram below depicts a data processing workflow, highlighting the chunking step, starting with raw “Data sources” that are converted into “Documents.” Central to this workflow is the “Chunk” stage, where a “TextSplitter” breaks the data into smaller segments. These chunks are then transformed into numerical representations using an “Embedding model” and are subsequently indexed into a “Vector DB” for efficient search and retrieval. The text associated with the retrieved chunks is then sent as context to the LLMs, which then generate a final response:

Fig 4.12 – Chunking Process

But why is it needed?

Chunking is vital for two main reasons:

  • Chunking strategically divides document text to enhance its comprehension by embedding models, and it boosts the relevance of the content retrieved from a vector DB. Essentially, it refines the accuracy and context of the results sourced from the database.
  • It tackles the token constraints of embedding models. For instance, Azure OpenAI embedding models such as text-embedding-ada-002 can handle up to 8,191 tokens, which is about 6,000 words, given that each token averages four characters, or roughly three-quarters of a word. So, for optimal embeddings, it’s crucial that our text stays within this limit.

Popular chunking strategies

  • Fixed-size chunking: This is a very common approach: it defines a fixed chunk size (for example, 200 words), which is usually enough to capture the semantic meaning of a paragraph, and incorporates an overlap of about 10–15% as input to the vector embedding generation model. Chunking data with a slight overlap between chunks ensures context preservation; it’s advisable to begin with roughly 10% overlap. Below is a snippet of code that demonstrates fixed-size chunking with LangChain:

from langchain.text_splitter import TokenTextSplitter

text = (
    "Ladies and Gentlemen, esteemed colleagues, and honored guests. "
    "Esteemed leaders and distinguished members of the community. "
    "Esteemed judges and advisors. My fellow citizens. Last year, "
    "unprecedented challenges divided us. This year, we stand united, "
    "ready to move forward together"
)

text_splitter = TokenTextSplitter(chunk_size=20, chunk_overlap=5)
texts = text_splitter.split_text(text)
print(texts)

The output is the following:

['Ladies and Gentlemen, esteemed colleagues, and honored guests. Esteemed leaders and distinguished members', 'emed leaders and distinguished members of the community. Esteemed judges and advisors. My fellow citizens.', '. My fellow citizens. Last year, unprecedented challenges divided us. This year, we stand united,', ', we stand united, ready to move forward together']

  • Variable-size chunking: Variable-size chunking refers to the dynamic segmentation of data or text into varying-sized components, as opposed to fixed-size divisions. This approach accommodates the diverse structures and characteristics present in different types of data.
  • Sentence splitting: Sentence transformer models are neural architectures optimized for embedding at the sentence level; for example, BERT-based models work best when text is chunked at the sentence level. Tools such as NLTK and spaCy provide functions to split the sentences within a text (a brief sketch follows the code chunking example below).
  • Specialized chunking: Documents such as research papers have a structured organization of sections, and the Markdown language, with its unique syntax, necessitates specialized chunking that properly separates sections/pages to yield contextually relevant chunks.
  • Code chunking: When embedding code into your vector DB, this technique can be invaluable. LangChain supports code chunking for numerous languages. Below is a code snippet that chunks Python code:

from langchain.text_splitter import (
    RecursiveCharacterTextSplitter,
    Language,
)

PYTHON_CODE = """
class SimpleCalculator:
    def add(self, a, b):
        return a + b

    def subtract(self, a, b):
        return a - b

# Using the SimpleCalculator
calculator = SimpleCalculator()
sum_result = calculator.add(5, 3)
diff_result = calculator.subtract(5, 3)
"""

python_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON, chunk_size=50, chunk_overlap=0
)
python_docs = python_splitter.create_documents([PYTHON_CODE])
python_docs

The output is the following:

[Document(page_content='class SimpleCalculator:\n    def add(self, a, b):'),
 Document(page_content='return a + b'),
 Document(page_content='def subtract(self, a, b):'),
 Document(page_content='return a - b'),
 Document(page_content='# Using the SimpleCalculator'),
 Document(page_content='calculator = SimpleCalculator()'),
 Document(page_content='sum_result = calculator.add(5, 3)'),
 Document(page_content='diff_result = calculator.subtract(5, 3)')]
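As mentioned under sentence splitting, libraries such as NLTK can split text into sentences before embedding. Here is a brief sketch, assuming NLTK and its punkt tokenizer data are installed:

import nltk

nltk.download("punkt", quiet=True)  # sentence tokenizer data

text = ("Sentence splitting keeps each chunk semantically self-contained. "
        "It pairs well with sentence-level embedding models.")
sentences = nltk.sent_tokenize(text)
print(sentences)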

Chunking considerations

Chunking strategies vary based on the data type and format and the chosen embedding model. For instance, code requires a distinct chunking approach compared to unstructured text. While models such as text-embedding-ada-002 excel with 256- and 512-token-sized chunks, our understanding of chunking is ever-evolving. Moreover, preprocessing plays a crucial role before chunking: you can optimize your content by removing unnecessary text, such as stop words and special symbols, that adds noise. For the latest techniques, we suggest regularly checking the text splitters section in the LangChain documentation, ensuring you employ the best strategy for your needs (Split by tokens, from LangChain: https://python.langchain.com/docs/modules/data_connection/document_transformers/split_by_token).
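As a simple illustration of the preprocessing step described above, the following sketch strips special symbols and stop words before chunking, assuming NLTK’s English stop word list is available:

import re
import nltk

nltk.download("stopwords", quiet=True)
from nltk.corpus import stopwords

STOP_WORDS = set(stopwords.words("english"))

def preprocess(text):
    text = re.sub(r"[^a-zA-Z0-9\s]", " ", text)  # drop special symbols
    words = [w for w in text.split() if w.lower() not in STOP_WORDS]
    return " ".join(words)

print(preprocess("The model's answer, surprisingly, was 100% correct!"))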


Evaluation of RAG using Azure Prompt Flow

Up to this point, we have discussed the development of resilient RAG applications. However, the question arises: How can we determine whether these applications are functioning as anticipated and if the context they retrieve is pertinent? While manual validation—comparing the responses generated by LLMs against ground truth—is possible, this method proves to be labor-intensive, costly, and challenging to execute on a large scale. Consequently, it’s essential to explore methodologies that facilitate automated evaluation on a vast scale. Recent research has delved into the concept of utilizing “LLM as a judge” to assess output, a strategy that Azure Prompt Flow incorporates within its offerings.

Azure Prompt Flow has built-in, structured metaprompt templates with comprehensive guardrails to evaluate your output against ground truth. The following four metrics can help you evaluate your RAG solution in Prompt Flow:

  • Groundedness: Measures the alignment of the model’s answers with the input source, making sure the model’s generated response is not fabricated. The model must always extract information from the provided “context” while responding to the user’s query.
  • Relevance: Measures the degree to which the model’s generated response is closely connected to the context and user query.
  • Retrieval score: Measures the extent to which the model’s retrieved documents are pertinent and directly related to the given questions.
  • Custom metrics: While the above three are the most important for evaluating RAG applications, Prompt Flow allows you to define custom metrics, too. Bring your own LLM as a judge and define your own metrics by modifying the existing metaprompts. You can also use open source models such as Llama and build your own metrics from code with Python functions. The above evaluations are more no-code or low-code friendly; for a more pro-code-friendly approach, metrics from the azureml-metrics SDK, such as ROUGE, BLEU, F1-score, precision, and accuracy, can be utilized as well (a simple illustrative metric is sketched after this list).
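To illustrate what a simple code-based metric might look like, here is a minimal token-overlap F1 sketch for comparing a generated answer with ground truth. This is not the Prompt Flow or azureml-metrics implementation, only an illustration of the idea:

def token_f1(generated, ground_truth):
    # F1 over the sets of unique lowercase tokens in each string
    gen = set(generated.lower().split())
    ref = set(ground_truth.lower().split())
    common = gen & ref
    if not gen or not ref or not common:
        return 0.0
    precision = len(common) / len(gen)
    recall = len(common) / len(ref)
    return 2 * precision * recall / (precision + recall)

print(token_f1("Virat Kohli scored 85 runs off 70 balls",
               "Kohli made 85 runs from 70 deliveries"))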

The field is advancing quickly, so we recommend regularly checking Azure ML Prompt Flow’s latest updates on evaluation metrics. Start with the “Manual Evaluation” feature in Prompt Flow to gain a basic understanding of LLM performance. It’s important to use a mix of metrics for a thorough evaluation that captures both semantic and syntactic essence rather than relying on just one metric to compare the responses with the actual ground truth.


Case study – Global chat application deployment by a multinational organization

A global firm recently launched an advanced internal chat application featuring a Q&A support chatbot. This innovative tool, deployed across various Azure regions, integrates several large language models, including the specialized finance model BloombergGPT. To meet specific organizational requirements, bespoke plugins were developed, including an integration with ServiceNow that empowers the chatbot to streamline ticket generation and oversee incident actions.

In terms of data refinement, the company meticulously preprocessed its knowledge base (KB) information, eliminating duplicates, special symbols, and stop words. The KB consisted of answers to frequently asked questions and general information for various support-related questions. They employed fixed-size chunking approaches, exploring varied chunk sizes, before embedding the data into Azure AI Search. Their methodology utilized Azure OpenAI’s text-embedding-ada-002 models in tandem with the cosine similarity metric and Azure AI Search’s vector search capabilities.

From their extensive testing, they discerned optimal results with a chunk size of 512 tokens and a 10% overlap. Moreover, they adopted an ANN vector search methodology using cosine similarity. They also incorporated hybrid search, combining keyword and semantic search with the Semantic Reranker. Their RAG workflow, drawing context from Azure AI Search’s vector search and the GPT-3.5 Turbo-16K model, proficiently generated responses to customer support inquiries. They implemented caching techniques using Azure Cache for Redis and rate-limiting strategies using Azure API Management to optimize costs.

The integration of the support Q&A chatbot significantly streamlined the multinational firm’s operations, offering around-the-clock, consistent, and immediate responses to queries, thereby enhancing user satisfaction. This not only brought about substantial cost savings by reducing human intervention but also ensured scalability to handle global demands. By automating tasks such as ticket generation, the firm gained deeper insights into user interactions, allowing for continuous improvement and refinement of their services.

Summary

In this chapter, we explored the RAG approach, a powerful method for leveraging your data to craft personalized experiences, reduce hallucinations, and address the training limitations inherent in LLMs. Our journey began with an examination of foundational concepts such as vectors and databases, with a special focus on vector databases. We saw the critical role that vector DBs play in the development of RAG-based applications, and highlighted how they can enhance LLM responses through effective chunking strategies. The discussion also covered practical insights on building engaging RAG experiences, evaluating them through Prompt Flow, and included a hands-on lab available on GitHub to apply what we’ve learned.

In the next chapter, we will introduce another popular technique designed to minimize hallucinations and more easily steer the responses of LLMs. We will cover prompt engineering strategies, empowering you to fully harness the capabilities of your LLMs and engage more effectively with AI. This exploration will provide you with the tools and knowledge to enhance your interactions with AI, ensuring more reliable and contextually relevant outputs.


The essentials of prompt engineering

Before discussing prompt engineering, it is important to first understand the foundational components of a prompt. In this section, we’ll delve into the key components of a prompt, such as ChatGPT prompts, completions, and tokens. Additionally, grasping what tokens are is pivotal to understanding the model’s constraints and managing costs.

ChatGPT prompts and completions

A prompt is an input provided to LLMs, whereas completions refer to the output of LLMs. The structure and content of a prompt can vary based on the type of LLM (e.g., the text or image generation model), specific use cases, and the desired output of the language model.

Completions refer to the responses generated by the model; essentially, a completion is the answer to your question. Check out the following example to understand the difference between prompts and completions when we prompt ChatGPT with, “What is the capital of India?”

Figure 5.2 – An image showing a sample LLM prompt and completion

Based on the use case, we can leverage one of the two ChatGPT API calls, named Completions or ChatCompletions, to interact with the model. However, OpenAI recommends using the ChatCompletions API in the majority of scenarios.

Completions API

The Completions API is designed to generate creative, free-form text. You provide a prompt, and the API generates text that continues from it. This is often used for tasks where you want the model to answer a question or generate creative text, such as for writing an article or a poem.

ChatCompletions API

The ChatCompletions API is designed for multi-turn conversations. You send a series of messages instead of a single prompt, and the model generates a message as a response. The messages sent to the model include a role (which can be a system, user, or assistant) and the content of the message. The system role is used to set the behavior of the assistant, the user role is used to instruct the assistant, and the model’s responses are under the assistant role.

The following is an example of a sample ChatCompletions API call:

import openai

openai.api_key = 'your-api-key'

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful sports assistant."},
        {"role": "user", "content": "Who won the cricket world cup in 2011?"},
        {"role": "assistant", "content": "India won the cricket world cup in 2011"},
        {"role": "user", "content": "Where was it played?"}
    ]
)

print(response['choices'][0]['message']['content'])

The main difference between the Completions API and ChatCompletions API is that the Completions API is designed for single-turn tasks, while the ChatCompletions API is designed to handle multiple turns in a conversation, making it more suitable for building conversational agents. However, the ChatCompletions API format can be modified to behave as a Completions API by using a single user message.
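For example, a single-turn request in the ChatCompletions format might look like the following sketch (same library style as the earlier example):

import openai

openai.api_key = 'your-api-key'

# One user message, no conversation history: behaves like a single-turn completion
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Write a two-line poem about monsoon rain."}]
)
print(response['choices'][0]['message']['content'])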

Important note

The Completions API, launched in June 2020, initially offered a freeform text interface for OpenAI’s language models. However, experience has shown that structured prompts often yield better outcomes. The chat-based approach, especially through the ChatCompletions API, excels in addressing a wide array of needs, offering enhanced flexibility and specificity and reducing prompt injection risks. Its design supports multi-turn conversations and a variety of tasks, enabling developers to create advanced conversational experiences. Hence, OpenAI announced that it would be deprecating some of the older models that use the Completions API and, moving forward, would be investing in the ChatCompletions API to optimize its use of compute capacity. While the Completions API will remain accessible, it will be labeled as “legacy” in the OpenAI developer documentation.

Tokens

Understanding the concept of tokens is essential, as it helps us better comprehend restrictions, such as model limitations, and the aspect of cost management when utilizing ChatGPT.

A ChatGPT token is a unit of text that ChatGPT’s language model uses to understand and generate language. In ChatGPT, a token is a sequence of characters that the model uses to generate new sequences of tokens and form a coherent response to a given prompt. The models use tokens to represent words, phrases, and other language elements. Tokens are not necessarily cut where a word starts or ends; they can include trailing spaces, subwords, and punctuation, too.

As stated on the OpenAI website, tokens can be thought of as pieces of words. Before the API processes the prompts, the input is broken down into tokens.

To understand tokens in terms of lengths, the following is used as a rule of thumb:

  • 1 token ~= 4 chars in English
  • 1 token ~= ¾ words
  • 100 tokens ~= 75 words
  • 1–2 sentences ~= 30 tokens
  • 1 paragraph ~= 100 tokens
  • 1,500 words ~= 2048 tokens
  • 1 US page (8 ½” x 11”) ~= 450 tokens (assuming ~1800 characters per page)

For example, this famous quote from Thomas Edison (“Genius is one percent inspiration and ninety-nine percent perspiration.”) has 14 tokens:

Figure 5.3 – Tokenization of sentence

We used the OpenAI Tokenizer tool to calculate the tokens; the tool can be found at https://platform.openai.com/tokenizer. An alternative way to tokenize text programmatically is to use the Tiktoken library on GitHub, which can be found at https://github.com/openai/tiktoken.
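Here is a short sketch of counting tokens programmatically with Tiktoken:

import tiktoken

encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
quote = "Genius is one percent inspiration and ninety-nine percent perspiration."
tokens = encoding.encode(quote)
print(len(tokens))  # number of tokens the model would see for this text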


Token limits in ChatGPT models

Depending on the model, the token limit will vary. As of February 2024, the token limits for the GPT-4 family of models range from 8,192 to 128,000 tokens. The sum of prompt and completion tokens for an API call cannot exceed the model’s limit; for the GPT-4-32K model, that limit is 32,768 tokens, so if the prompt is 30,000 tokens, the response cannot be more than 2,768 tokens. GPT-4-Turbo 128K is the most recent model as of February 2024, with a 128,000-token limit, which is close to 300 pages of text in a single prompt and completion. This is a massive context window compared to its predecessor models.

Though this can be a technical limitation, there are creative ways to address it, such as chunking and condensing your prompts. We discussed chunking strategies in Chapter 4, which can help you address token limitations.
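As a small illustration of the arithmetic above, the following sketch measures a prompt with Tiktoken and derives the remaining completion budget for a model with a 32,768-token limit:

import tiktoken

MODEL_TOKEN_LIMIT = 32768  # e.g., GPT-4-32K
encoding = tiktoken.encoding_for_model("gpt-4")

prompt = "..."  # your (potentially very long) prompt text
prompt_tokens = len(encoding.encode(prompt))
max_completion_tokens = MODEL_TOKEN_LIMIT - prompt_tokens
print(f"Prompt uses {prompt_tokens} tokens; up to {max_completion_tokens} tokens remain for the completion.")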

The following figure shows various models and token limits:

Model                      Token Limit
GPT-3.5-turbo              4,096
GPT-3.5-turbo-16k          16,384
GPT-3.5-turbo-0613         4,096
GPT-3.5-turbo-16k-0613     16,384
GPT-4                      8,192
GPT-4-0613                 8,192
GPT-4-32K                  32,768
GPT-4-32K-0613             32,768
GPT-4-Turbo 128K           128,000

Figure 5.4 – Models and associated Token Limits

For the latest updates on model limits for newer versions of models, please check the OpenAI website.

Tokens and cost considerations

The cost of using ChatGPT or similar models via an API is often tied to the number of tokens processed, encompassing both the input prompts and the model’s generated responses.

In terms of pricing, providers typically have a per-token charge, leading to a direct correlation between conversation length and cost; the more tokens processed, the higher the cost. The latest cost updates can be found on the OpenAI website.

From an optimization perspective, understanding this cost-token relationship can guide more efficient API usage. For instance, creating more succinct prompts and configuring the model for brief yet effective responses can help control token count and, consequently, manage expenses.
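As a rough illustration, the following sketch estimates the cost of a single request. The per-1,000-token prices are placeholders only; always check OpenAI’s pricing page for current rates:

# Hypothetical prices in USD per 1,000 tokens (placeholders, not current rates)
PRICE_PER_1K_INPUT = 0.0005
PRICE_PER_1K_OUTPUT = 0.0015

prompt_tokens, completion_tokens = 1200, 300
cost = (prompt_tokens / 1000) * PRICE_PER_1K_INPUT + \
       (completion_tokens / 1000) * PRICE_PER_1K_OUTPUT
print(f"Estimated request cost: ${cost:.6f}")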

We hope you now have a good understanding of the key components of a prompt. Now, you are ready to learn about prompt engineering. In the next section, we will explore the details of prompt engineering and effective strategies, enabling you to maximize the potential of your prompt contents through the one-shot and few-shot learning approaches.


What is prompt engineering?

Prompt engineering is the art of crafting or designing prompts to unlock desired outcomes from large language models or AI systems. The concept of prompt engineering revolves around the fundamental idea that the quality of your response is intricately tied to the quality of the question you pose. By strategically engineering prompts, one can influence the generated outputs and improve the overall performance and usefulness of the system. In this section, we will learn about the necessary elements of effective prompt design, prompt engineering techniques, best practices, bonus tips, and tricks.

Elements of a good prompt design

Designing a good prompt is important because it significantly influences the output of a language model such as GPT. The prompt provides the initial context, sets the task, guides the style and structure of the response, reduces ambiguities and hallucinations, and supports the optimization of resources, thereby reducing costs and energy use. In this section, let’s understand the elements of good prompt design.

The foundational elements of a good prompt include instructions, questions, input data, and examples:

  • Instructions: The instructions in a prompt refer to the specific guidelines or directions given to a language model within the input text to guide the kind of response it should produce.
  • Questions: Questions in a prompt refer to queries or interrogative statements that are included in the input text. The purpose of these questions is to instruct the language model to provide a response or an answer to the query. To obtain results, either a question or an instruction is mandatory.
  • Input data: The purpose of input data is to provide any additional supporting context when prompting the LLM. It could be used to provide new information the model has not previously been trained on for more personalized experiences.
  • Examples: The purpose of examples in a prompt is to provide specific instances or scenarios that illustrate the desired behavior or response from ChatGPT. You can input a prompt that includes one or more examples, typically in the form of input-output pairs.

The following table shows how to build effective prompts using the aforementioned prompt elements; a minimal assembled example follows the table:

Figure 5.5 – Sample Prompt formula consisting of prompt elements with examples
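As a minimal sketch, the following assembles a prompt from the aforementioned elements; the content itself is purely illustrative:

# Illustrative prompt built from an instruction, input data, an example, and a question
instruction = "You are a travel assistant. Answer concisely in one sentence."
input_data = "The user is visiting Delhi in July and dislikes crowded places."
example = "Q: Best time to see the Taj Mahal? A: Early morning, right at opening time."
question = "Q: When should the user visit Humayun's Tomb? A:"

prompt = f"{instruction}\n\nContext: {input_data}\n\nExample:\n{example}\n\n{question}"
print(prompt)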


Prompt parameters

ChatGPT prompt parameters are variables that you can set in the API calls. They allow users to influence the model’s output, customizing the behavior of the model to better fit specific applications or contexts. The following table shows some of the most important parameters of a ChatGPT API call:

Figure 5.6 – Essential Prompt Parameters

In this section, only the top parameters for building an effective prompt are highlighted. For a full list of parameters, refer to the OpenAI API reference (https://platform.openai.com/docs/api-reference).
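As an illustration, the following sketch sets a few of these parameters on a ChatCompletions call; the values are arbitrary examples:

import openai

openai.api_key = 'your-api-key'

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Suggest a name for a coffee shop."}],
    temperature=0.7,        # randomness of the output
    max_tokens=50,          # cap on completion length
    top_p=0.95,             # nucleus sampling
    frequency_penalty=0.0,  # discourage verbatim repetition
    presence_penalty=0.0    # encourage introducing new topics
)
print(response['choices'][0]['message']['content'])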

ChatGPT roles

System message

This is the part where you design your metaprompts. Metaprompts help to set the initial context, theme, and behavior of the ChatGPT API to guide the model’s interactions with the user, thus setting roles or response styles for the assistant.

Metaprompts are structured instructions or guidelines that dictate how the system should interpret and respond to user requests. These metaprompts are designed to ensure that the system’s outputs adhere to specific policies, ethical guidelines, or operational rules. They’re essentially “prompts about how to handle prompts,” guiding the system in generating responses, handling data, or interacting with users in a way that aligns with predefined standards.

The following table is a metaprompt framework that you can follow to design the ChatGPT system message:

Figure 5.7 – Elements of a Metaprompt
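As an illustration of such a framework, a system message might look like the following sketch; the scenario and company name are made up:

system_message = (
    "You are a support assistant for the Contoso HR portal. "
    "Answer only questions about HR policies, using the provided context. "
    "If a question is out of scope or the answer is not in the context, politely say you cannot help. "
    "Keep responses under 100 words and cite the policy name."
)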

User

The messages from the user serve as prompts or remarks that the assistant is expected to react to or engage with. They establish the anticipated scope of queries that may come from the user.