System Prompt Tips for Azure RAG Apps
Combining retrieval-augmented generation (RAG) with Azure AI Search creates a hybrid system: the model doesn't just generate responses, it pulls in relevant data from a specified knowledge source to improve answer quality. Here are some prompting tips to help your app reliably and accurately retrieve and apply the relevant RAG content:
1. Clarify the User Intent Early
When designing your app, make sure the model knows exactly what type of data the user is asking about. A clear statement of intent guides the retrieval system toward the most relevant details.
Example Prompt: "The user is asking how to make a cup of coffee. Retrieve the steps for making coffee using a drip coffee maker, and use that information to give a simple, easy-to-follow answer."
Why it works: The model understands the exact process to focus on (drip coffee) and can provide an answer that’s targeted and not too technical for a lay user.
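One way to make intent explicit before retrieval is to map the question onto a known topic and bake that topic into the system prompt. A minimal sketch in plain Python, where the keyword map and prompt template are illustrative assumptions (not part of any Azure SDK):

```python
# Map user questions to a known topic so retrieval and the system prompt
# stay focused. The keyword lists and templates here are illustrative only.
INTENT_KEYWORDS = {
    "drip coffee maker": ["drip", "coffee maker", "filter coffee"],
    "espresso": ["espresso", "shot"],
}

def detect_intent(question: str) -> str:
    """Return the first topic whose keywords appear in the question."""
    q = question.lower()
    for topic, keywords in INTENT_KEYWORDS.items():
        if any(k in q for k in keywords):
            return topic
    return "general coffee brewing"

def build_system_prompt(question: str) -> str:
    """Build a system prompt that names the detected topic up front."""
    topic = detect_intent(question)
    return (
        f"The user is asking about {topic}. Retrieve the steps for this "
        f"method only, and use them to give a simple, easy-to-follow answer."
    )
```

In a real app you would likely replace the keyword lookup with a classifier or a semantic ranking step, but the principle is the same: name the topic before the model sees the search results.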
2. Contextualize Retrieved Content
When the retrieved content doesn’t directly answer the question or is a little fragmented, prompt the model to apply the retrieved information in a contextually relevant way.
Example Prompt: "Here’s a snippet about making coffee using a drip coffee maker: [insert snippet]. Based on this, explain the process in simple terms, breaking it down into easy-to-understand steps."
Why it works: Providing the snippet directly helps the model focus on specific content and structure the answer without deviating into unrelated topics.
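In code, this usually means wrapping the retrieved snippet in clear delimiters and attaching the task to it, so the model can tell source material from instructions. A hedged sketch (the delimiter style and wording are assumptions):

```python
def contextualize_snippet(snippet: str, task: str) -> str:
    """Embed a retrieved snippet in a prompt with explicit delimiters so the
    model can distinguish source material from instructions."""
    return (
        "Here is a snippet retrieved from the knowledge base:\n"
        "---\n"
        f"{snippet}\n"
        "---\n"
        f"Based on this snippet, {task}"
    )

prompt = contextualize_snippet(
    "Add one filter, two scoops of grounds, and fill the reservoir.",
    "explain the process in simple, easy-to-understand steps.",
)
```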
3. Be Explicit About Using Search Results
If you want to ensure that the model uses the exact content from the search, make that clear in the prompt. It prevents the model from relying on outdated or general knowledge and helps produce a more accurate, context-specific answer.
Example Prompt: "Use the following retrieved instructions to explain how to make coffee with a drip coffee maker. Focus only on the steps listed here and avoid adding anything else not covered in the document."
Why it works: By emphasizing reliance on the search results, you're minimizing the chance that the model will go off-script and introduce inaccurate details.
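A common pattern is to number each search result and instruct the model to answer only from those numbered sources. A sketch under the assumption that the Azure AI Search results have already been flattened into strings:

```python
def build_grounded_prompt(question: str, docs: list[str]) -> str:
    """Build a prompt that restricts the model to the retrieved documents.

    `docs` is assumed to be a list of retrieved passages as plain strings.
    """
    sources = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(docs, 1))
    return (
        "Use ONLY the sources below to answer. Do not add steps that are "
        "not covered in them.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}"
    )
```

Numbering the sources also sets you up for citation later: the model can refer back to `[1]`, `[2]`, and so on in its answer.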
4. Limit Answer Scope Using Retrieved Data
Sometimes the search results might be broad or overwhelming. You can guide the model to focus on the most relevant part of the document(s) to make the answer more concise and to the point.
Example Prompt: "If the search results contain details about various brewing methods, focus only on the section that explains how to use a drip coffee maker and explain that process to the user."
Why it works: Limiting the scope ensures the model doesn’t start explaining unrelated methods (e.g., French press, espresso), which could confuse a lay user.
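Scope-limiting can also happen before the prompt is built, by dropping retrieved chunks that never mention the target method. A naive keyword filter as a sketch (a real app might filter on the search relevance score or a semantic match instead):

```python
def filter_chunks(chunks: list[str], topic_terms: list[str]) -> list[str]:
    """Keep only chunks that mention at least one target term."""
    return [
        c for c in chunks
        if any(t.lower() in c.lower() for t in topic_terms)
    ]

chunks = [
    "A French press steeps grounds directly in hot water.",
    "A drip coffee maker pours hot water over grounds in a filter.",
]
relevant = filter_chunks(chunks, ["drip"])
```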
5. Control for Confidence and Sources
Instruct the model to indicate when its answer is based on the retrieved content and how confident it is in that answer. This helps manage expectations and builds trust with the user.
Example Prompt: "Use the information retrieved to explain how to make coffee with a drip coffee maker. If the answer is based on the provided data, mention that, and include the source link for reference."
Why it works: Citing sources makes the response transparent, so users can trust it and follow the link to verify anything that's unclear.
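If the retrieval results carry metadata such as a source URL, the prompt can pass it through and ask the model to cite it. A sketch where the `content` and `url` field names are assumptions about the result shape:

```python
def prompt_with_citations(question: str, results: list[dict]) -> str:
    """Attach source URLs to each result and ask the model to cite them.

    Each result is assumed to be a dict with "content" and "url" keys.
    """
    blocks = "\n\n".join(
        f"Source: {r['url']}\n{r['content']}" for r in results
    )
    return (
        f"{blocks}\n\n"
        "Answer the question using the sources above. When your answer is "
        "based on a source, say so and include its link for reference.\n"
        f"Question: {question}"
    )
```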
6. Guide the LLM to Structured Outputs
People often prefer a structured, easy-to-follow format. Guide the model to break the answer down into digestible steps.
Example Prompt: "Please summarize the retrieved content about using a drip coffee maker in these steps: 1) Prepare the coffee maker, 2) Add water, 3) Add coffee grounds, 4) Start the brewing process, 5) Enjoy your coffee."
Why it works: Structuring the response into steps makes it easier for users to follow, especially if they’re new to the process.
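The numbered-step instruction can be generated from a list so it stays consistent across topics. A small sketch using the step names from the example prompt above:

```python
def make_step_prompt(topic: str, steps: list[str]) -> str:
    """Ask the model to summarize retrieved content into fixed numbered steps."""
    numbered = ", ".join(f"{i}) {s}" for i, s in enumerate(steps, 1))
    return (
        f"Please summarize the retrieved content about {topic} "
        f"in these steps: {numbered}."
    )

prompt = make_step_prompt(
    "using a drip coffee maker",
    ["Prepare the coffee maker", "Add water", "Add coffee grounds",
     "Start the brewing process", "Enjoy your coffee"],
)
```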
7. Handle Ambiguity in Retrieval Results
Sometimes the search results might have too much irrelevant information or leave key steps unclear. Guide the model to either ask for clarification or piece together several fragments to create a comprehensive response.
Example Prompt: "If the retrieved results are missing some details, combine the fragments to explain how to use a drip coffee maker, and make sure to explain everything clearly in an easy-to-follow manner."
Why it works: Combining multiple snippets lets the model fill in the gaps and produce a more complete, coherent response.
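A simple coverage check can decide between "combine the fragments" and "ask for clarification." The required-term heuristic below is an assumption; production code might check search scores or use a second model call instead:

```python
def fragment_instruction(fragments: list[str], required_terms: list[str]) -> str:
    """Tell the model to merge fragments, or flag what is missing."""
    text = " ".join(fragments).lower()
    missing = [t for t in required_terms if t.lower() not in text]
    if missing:
        return (
            "The retrieved results are missing details about: "
            + ", ".join(missing)
            + ". Ask the user for clarification before answering."
        )
    return "Combine the fragments above into one clear, easy-to-follow answer."
```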
8. Use Query Refinement Techniques
If your app finds that the search results often don’t directly answer the user query, you can set up a feedback loop to refine the query and narrow it down.
Example Prompt: "If the information retrieved is incomplete, rephrase the query to get more specific results about the brewing process for drip coffee makers. Then, combine the new information into a single clear answer."
Why it works: The ability to refine the query iteratively allows you to zero in on more specific information and avoid wasting time with irrelevant content.
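The feedback loop can be sketched as: search, score the top result, and rephrase until the score clears a threshold. The `search` callable, the scoring threshold, and the naive rephrasing strategy are all assumptions; with Azure AI Search you would typically use the relevance score from the result set:

```python
def refine_and_search(query, search, max_rounds=3, min_score=0.5):
    """Re-run the search with a more specific query until results look good.

    `search(query)` is assumed to return a (results, top_score) pair.
    """
    results, score = search(query)
    for _ in range(max_rounds - 1):
        if score >= min_score:
            break
        # Naive refinement strategy, purely illustrative.
        query += " step-by-step instructions"
        results, score = search(query)
    return query, results

# Fake search for demonstration: scores improve once the query is specific.
def fake_search(query):
    score = 0.9 if "step-by-step" in query else 0.2
    return (["brewing-guide.md"], score)
```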
Bonus Tip: Be Prepared for Edge Cases
Sometimes, the content retrieval might be slightly off, or a user may ask a question that’s too broad or unclear. Handle those cases by guiding the model to either ask for more info or deliver a fallback response.
Example Prompt: "The retrieved content doesn't completely cover all types of coffee. If unsure, explain how to make a simple drip coffee and offer a basic guide to other brewing methods."
Why it works: This allows the model to handle cases where the search results don’t fully address the user’s needs, giving a fallback response without sounding overly generic.
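In code, the fallback is just a branch on how well retrieval covered the question. A sketch where the coverage test (empty result set) and fallback wording are assumptions:

```python
def answer_instruction(results: list[str], topic: str) -> str:
    """Fall back to a basic guide when retrieval comes back empty."""
    if not results:
        return (
            f"No documents were retrieved for {topic}. Explain how to make "
            "a simple drip coffee and offer a basic guide to other methods."
        )
    return f"Answer the question about {topic} using the retrieved documents."
```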
Summary
- Be clear about intent and expectations in prompts.
- Integrate retrieved content smoothly with the model’s output.
- Control for specificity, structure, and relevance.
- Use query refinement and source citation to keep answers accurate.