During a recent demo for a prospect, I had the opportunity to explore and implement a solution using RAG (Retrieval-Augmented Generation) technology. My goal was to demonstrate how this approach could not only enhance a customer support tool but also facilitate the creation and continuous improvement of a FAQ. Here’s how I set up this solution and the results I achieved.
The Context: PDFs, Emails, a Wiki, and a Mountain of Information
The prospect had several critical sources of information:
• A large number of emails (in .eml format), containing detailed exchanges with clients.
• Technical documents and internal resources, often in PDF format.
• An internal Wiki, accessible but underutilized.
These data sources were vast but challenging to query efficiently. The challenge was clear: enable natural language search that not only returns a relevant answer but also points to where the information came from.
Step 1: Data Extraction and Preparation
The first step involved extracting and transforming the available data:
• Extracting Wiki pages: I used Python to extract all pages from the internal Wiki. These pages were then converted into PDFs. (While not the ideal method, this allowed me to quickly transform the information into a uniform format.)
• Processing emails: Emails in .eml format were converted into plain text to be integrated into the analysis pipeline.
The result? A thousand PDF files and emails ready for indexing.
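The .eml-to-plain-text conversion mentioned above can be sketched with Python's standard email library. This is a minimal illustration, not the actual project code; the folder layout and file naming are assumptions.

```python
# Sketch: convert .eml files to plain text for the indexing pipeline.
# Paths and naming conventions are illustrative.
from email import policy
from email.parser import BytesParser
from pathlib import Path

def eml_to_text(eml_path: Path) -> str:
    """Extract the subject and plain-text body from a .eml file."""
    with open(eml_path, "rb") as f:
        msg = BytesParser(policy=policy.default).parse(f)
    body = msg.get_body(preferencelist=("plain", "html"))
    text = body.get_content() if body else ""
    return f"Subject: {msg['subject']}\n\n{text}"

def convert_folder(src: Path, dst: Path) -> int:
    """Convert every .eml file in src to a .txt file in dst; return the count."""
    dst.mkdir(parents=True, exist_ok=True)
    count = 0
    for eml in src.glob("*.eml"):
        (dst / f"{eml.stem}.txt").write_text(eml_to_text(eml), encoding="utf-8")
        count += 1
    return count
```

Using `policy.default` gives modern `EmailMessage` objects, so multipart messages and character encodings are handled without manual MIME walking.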
Step 2: Indexing with RAG Technology
For the RAG implementation, I used a combination of text embeddings and advanced language models:
• Embeddings with text-embedding-ada-002: I used this OpenAI model to generate semantic similarity vectors from each document (emails, PDFs, Wiki). This transformed the textual content into vector representations that could be leveraged for search.
• GPT-3.5-turbo-0125 for responses: This model was used to interpret natural language queries and generate relevant answers. To add flexibility, I also integrated the ability to switch to GPT-4 Turbo or Claude 3.5 Sonnet, depending on the need for precision or context.
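The retrieval half of this setup boils down to ranking documents by vector similarity. The sketch below assumes each document has already been embedded (e.g. with text-embedding-ada-002) into a fixed-size vector; the toy two-dimensional vectors stand in for real 1536-dimensional embeddings, and the document names are illustrative.

```python
# Sketch: rank pre-embedded documents against an embedded query by cosine
# similarity. In the real pipeline, vectors come from text-embedding-ada-002.
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec: list[float], index: list[tuple[str, list[float]]], k: int = 3) -> list[str]:
    """index: list of (doc_id, vector). Return the k most similar doc_ids."""
    ranked = sorted(index, key=lambda item: cosine_similarity(query_vec, item[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]
```

The top-k documents are then inserted into the prompt sent to the chat model, which grounds the generated answer in the retrieved sources.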
Testing with Llama 3.2: A Disappointing Performance
To compare performance, I also tested the solution with Llama 3.2, running it locally on a Mac with an M4 chip. Unfortunately, the results were far less convincing. Not only was the model significantly slower to respond, but the answers were also much less relevant, particularly for queries requiring fine-grained searches within indexed data. While Llama 3.2 shows promise for specific applications, it was clearly not up to the task for this type of use case involving contextual retrieval and generation.
Step 3: Contextual Search and User Feedback
One of the major advantages of this solution is its ability to provide contextual answers while showing their origin:
• Natural language search: Users can ask questions directly (e.g., “How do I resolve an error with Product X?”) and receive a clear answer, along with the PDFs or emails where the answer was found.
• User feedback: I added a system that allows users to indicate whether an answer was relevant or not. This feedback is stored in a local database.
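The feedback store can be as simple as a SQLite table. This is a minimal sketch under assumed table and column names, not the actual schema used in the demo.

```python
# Sketch: store user feedback on answers in a local SQLite database.
# Table and column names are illustrative.
import sqlite3

def init_db(conn: sqlite3.Connection) -> None:
    conn.execute(
        """CREATE TABLE IF NOT EXISTS feedback (
               question   TEXT,
               answer     TEXT,
               sources    TEXT,     -- comma-separated doc_ids shown to the user
               relevant   INTEGER,  -- 1 = helpful, 0 = not helpful
               created_at TEXT DEFAULT CURRENT_TIMESTAMP
           )"""
    )

def record_feedback(conn: sqlite3.Connection, question: str, answer: str,
                    sources: list[str], relevant: bool) -> None:
    """Persist one thumbs-up / thumbs-down rating alongside its sources."""
    conn.execute(
        "INSERT INTO feedback (question, answer, sources, relevant) VALUES (?, ?, ?, ?)",
        (question, answer, ",".join(sources), int(relevant)),
    )
    conn.commit()
```

Keeping the sources column alongside each rating is what later makes it possible to see which documents consistently produce answers users validate.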
Step 4: Cost Optimization with Local Storage
To avoid unnecessary queries to OpenAI or other costly models, I designed a local solution:
• If a similar question has already been asked and a relevant answer has been validated, it is retrieved directly from the local database. This avoids redundant queries and reduces costs.
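The cache lookup above can be sketched as follows. Here matching is exact after whitespace/case normalization for simplicity; a real system could instead match on embedding similarity above a threshold. The `qa_cache` table name and the `generate` callback are illustrative assumptions.

```python
# Sketch: answer cache that avoids paying for a model call when a validated
# answer already exists. Matching is exact after normalization (simplified).
import sqlite3
from typing import Callable

def init_cache(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS qa_cache (question TEXT PRIMARY KEY, answer TEXT)"
    )

def normalize(question: str) -> str:
    """Collapse whitespace and case so trivially different phrasings match."""
    return " ".join(question.lower().split())

def get_or_generate(conn: sqlite3.Connection, question: str,
                    generate: Callable[[str], str]) -> str:
    """Return a cached answer, or call the (paid) model once and cache the result."""
    q = normalize(question)
    row = conn.execute("SELECT answer FROM qa_cache WHERE question = ?", (q,)).fetchone()
    if row:
        return row[0]            # cache hit: no API call, no cost
    answer = generate(question)  # cache miss: call the model once
    conn.execute("INSERT OR REPLACE INTO qa_cache VALUES (?, ?)", (q, answer))
    conn.commit()
    return answer
```

In practice, only answers that users validated through the feedback system would be written into this cache, so a poor answer is never served twice.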
Results and Advantages
This RAG-based solution provided several immediate benefits:
1. Increased productivity: Support teams can now quickly find answers without manually browsing through hundreds of documents.
2. Enhanced trust: Showing the source of the information reinforces the credibility of the answers.
3. Cost control: The local storage system significantly reduces expenses related to language model APIs.
What I Learned (and the Limitations)
1. PDFs are not always ideal: While practical, PDFs are not the most flexible format for indexing. A JSON format or a structured text database would have been more efficient.
2. Precise but resource-intensive embeddings: The embeddings generated by text-embedding-ada-002 are of high quality, but generating them for a large volume of data can be costly and time-consuming.
3. Importance of user feedback: The feedback system proved crucial for improving the relevance of answers and refining the user experience.
4. Choosing the right model: The test with Llama 3.2 showed that selecting the right model for the specific use case is essential. GPT-3.5-turbo and GPT-4 Turbo offered far superior results in this context.
Conclusion: The Potential of RAG Technology
The combination of RAG technology, advanced embeddings, and language models like GPT-3.5/4 Turbo or Claude Sonnet opens up new possibilities for businesses. Whether for customer support, FAQ creation, or internal assistance, these tools transform the way information is managed and utilized.
For prospects or companies looking to leverage their unstructured data, this approach represents a powerful, flexible, and scalable solution. With a bit more work, I am confident this system can go even further, for example, by integrating automatic summarization or translation features.

Need an expert for your project?
I can support you through your digital transformation, from design to delivery, including training for your teams.