
AI in production

Sure, I can build you an AI RAG solution so that you can unlock your corporate knowledge and give all of your employees the uplift that will take your business to the next level …

Oh, you also want me to make sure it's production-hardened and adheres to all your corporate policies and procedures. OK, let's start again; we need to plan this out.

Said no one ever …

This is the next stop on my AI adventure! Hot on the heels of "Navigating the AI Frontier: My Adventures with the RAG Framework," we're now delving into what I've experienced uplifting my little homebrew solution and how to scale it up for an enterprise client. Think of it like upgrading from your homemade tricycle to your first-ever Elon Musk electric car: doable, but you're going to need a better garage (and staff, if we're going to make more than one car).

This article aims to give you enough contextual knowledge to engage in architectural discussions confidently, and perhaps with a hint of flair. Typically, the first step is outlining the essential architectural components you should take into account, irrespective of the AI path you choose to pursue.

But first, what will the interaction from a human or system look like in this space? I like to have this context before I start (very businessy, I know), so let's imagine that the RAG framework I experimented with now needs to move to an enterprise. Here is how I see it:

RAG-Flow

As the diagram shows, we still need to consider the perimeter controls that govern safe access to any solution we build. The interfaces (either a human UI or System API) will need to be established, and then we move into the AI space with prompting and AI Governance. Finally, we get to the real value where we establish the RAG and the supporting data.
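To make that path a little more concrete, here is a minimal sketch of how those layers might chain together in code. Everything here is a hypothetical placeholder (the function names, the token check, the banned-word list), standing in for a real gateway, interface and governance layer rather than depicting any particular product:

```python
# Minimal sketch of the request path from the diagram:
# perimeter -> interface -> prompting/AI governance -> RAG.
# All names and checks are hypothetical placeholders.

def check_perimeter(token: str) -> bool:
    """Perimeter control: validate the caller before anything else runs."""
    return token == "valid-demo-token"  # stand-in for SSO / API-gateway checks

def apply_governance(question: str) -> str:
    """AI governance: reject disallowed content before it reaches a prompt."""
    banned = ["password", "credit card"]
    for term in banned:
        if term in question.lower():
            raise ValueError(f"Question blocked by policy (contains '{term}')")
    return question

def rag_answer(question: str) -> str:
    """Placeholder for the RAG pipeline covered later in this post."""
    return f"[RAG answer for: {question}]"

def handle_request(token: str, question: str) -> str:
    if not check_perimeter(token):
        raise PermissionError("Caller failed perimeter checks")
    return rag_answer(apply_governance(question))

print(handle_request("valid-demo-token", "What is our leave policy?"))
```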

So now that we have the interaction of a single use case, let’s kick things off with a diagram that lays out the major components from a bird’s-eye view, but expanded to consider any AI implementation.

AI Reference Architectural Components

RAG-Components

In the above diagram, I’ve divided the components into AI-specific and traditional solution components to highlight that enterprise-level AI doesn’t float in isolation but instead integrates with standard building blocks.

When I built my first RAG system, it was a humble setup on my local machine, using online services like OpenAI and an S3 bucket, with the context vector store running locally. This setup was fantastic for learning AI's quirks, and the ability to tweak and iterate swiftly was just what I needed. But scaling it to a production solution with larger data requirements and payloads, while sticking to enterprise policies, was another beast entirely. Let's just say it was a technological throwaway, as most PoCs are. The lessons learned were golden, but the tech stack? Not so much.

The next logical leap is to harness our enterprise expertise to elevate the solution for a business environment. This is always an opinionated process—I’ve yet to meet an architect who nods in agreement with an initial proposal without wanting to join the expedition. Think of it like assembling a merry band of adventurers who all want a say in charting the direction we are to take to find the treasure.

To spice things up a bit more, let's introduce the various dimensions of an AI solution to appreciate the complexity of achieving AI maturity. Generally, I approach this through phased delivery of corporate initiatives, depending on the client I'm working with. Are they already AI-savvy? Midway through their AI journey? Have they begun upskilling their staff and pondering the policies and processes needed to govern this thrilling new technology?

AI Solution Maturity Considerations

RAG-Maturity

The path to maturity is a continuous journey, revealing fresh perspectives and possibilities at every turn. Many elements play a part along the way, among them architectural factors such as data management patterns. These patterns matter for upholding consistency across diverse interaction formats and for adjusting the structure of the RAG to enable seamless coordination of different interaction techniques, like uplifting questions. It is essential, therefore, to treat maturity uplift as an integral part of the journey, with architectural enhancements addressing emerging needs as we progress through the maturity phases.

Look, if it were easy, we’d all have AI everywhere by now, right? Although AI’s ideas and potential have been around for ages, it’s only in the last 2-3 years that most enterprises have seriously started considering how to introduce this transformative technology. So don’t feel bad—we’re all navigating this new realm together, hashing out similar solutions and having lots of in-depth discussions.

Wrapping up this brief introduction to AI Architecture Components, I hope we've painted a broad-stroke picture of the many parts involved. Next, we'll dissect each component and detail how they all come together to drive outcomes that can uplift and revolutionise our businesses, positioning us to compete better tomorrow. I'll pick a use case and see how it can be realised with the above considerations.

Post Article Note

All of the above sits in the pre-delivery dream state, where we design and think about what we need to ensure success. To achieve the interactions (human or system) that will innovate and elevate the business, each client journey still needs to consider the old chestnut of planning, designing, validating, implementing and ongoing management to realise an enterprise outcome. This is crucial to ensure we add value and don't contribute to the corporate "not delivering" pile.

The big conundrum with LLMs is that they’re trained on the musings of billions of people, many of whom aren’t exactly experts. Essentially, LLMs are like a giant blender mixing together millions of internet ramblings. These machines can mimic human writing, but without the depth of understanding, they often sound right while missing the mark.

— Me, Gary Febbrarino

My adventures with the RAG Framework

Embarking on the exhilarating journey into the AI domain, I recently dove deep into the fascinating world of Retrieval-Augmented Generation (RAG). This sophisticated approach to natural language processing combines the best of both worlds, retrieval-based search and pre-trained generative models, aiming to enhance the quality and relevance of text generation. It's like the ultimate team-up in an AI superhero movie, where different strengths come together to save the day.

So, what exactly is RAG? Imagine you have a vast library of knowledge stored in chunks. When a question is asked, RAG swoops in to find the best matching chunk of context, provides this context with the question to the LLM (Large Language Model), and then refines the answer with a specific business context. It’s a bit like having a super-intelligent librarian who not only knows where every book is but also how to interpret and explain them in the best possible way.
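To make the librarian analogy concrete, here is a toy sketch of that flow. The keyword-overlap "retrieval" and the stubbed llm() function are deliberate simplifications I've made up for illustration; a real system would use vector search and an actual model call:

```python
# Toy illustration of the librarian analogy: find the best-matching chunk,
# then hand chunk + question to the LLM together.

CHUNKS = [
    "Employees accrue 20 days of annual leave per year.",
    "Expense claims must be lodged within 30 days of purchase.",
]

def retrieve(question: str) -> str:
    """Pick the chunk sharing the most words with the question."""
    q_words = set(question.lower().split())
    return max(CHUNKS, key=lambda c: len(q_words & set(c.lower().split())))

def llm(prompt: str) -> str:
    """Stand-in for a real LLM call (e.g. a hosted chat-completion API)."""
    return f"[model response to prompt of {len(prompt)} chars]"

question = "How many days of annual leave do I get?"
context = retrieve(question)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(llm(prompt))
```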

AI generated image, credit – Create Art or Modify Images with AI | OpenArt

Let me break it down for you:

First, we load raw data from various sources, turning a chaotic jumble of information into a goldmine of potential answers. Next, we transform this raw data into a common state, ensuring consistency and compatibility. It’s like taking the scattered pieces of a puzzle and making sure they all fit together.
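As a rough sketch of that load-and-transform step (the sources and the record shape are invented purely for illustration), the goal is simply to get every input into one common state:

```python
# Sketch of load-and-normalise: pull raw records from different hypothetical
# sources into one common shape so downstream steps can treat them uniformly.

from dataclasses import dataclass

@dataclass
class Document:
    source: str   # where the text came from
    text: str     # cleaned body text

def normalise(source: str, raw: str) -> Document:
    """Collapse whitespace and strip obvious noise into a common state."""
    return Document(source=source, text=" ".join(raw.split()))

raw_inputs = [
    ("wiki", "Leave  policy:\n\nEmployees accrue   20 days per year."),
    ("pdf",  "Expense claims\tmust be lodged within 30 days."),
]
docs = [normalise(src, raw) for src, raw in raw_inputs]
for d in docs:
    print(d)
```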

The magic continues by vectorising the data, converting it into numerical representations that the AI can efficiently process (I refer to this as creating semantic similes). The retriever then steps in, locating relevant information from this vast dataset based on a given query. Think of it as a digital treasure hunt where the prize is accurate and relevant information.
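Here is a toy version of that vectorise-and-retrieve step. A bag-of-words counter with cosine similarity stands in for a real embedding model, purely to keep the mechanics visible; production systems would use learned dense vectors and a proper vector store:

```python
# Toy vectorisation and retrieval: word-count vectors plus cosine similarity
# as a stand-in for real embeddings and a vector database.

import math
from collections import Counter

def embed(text: str) -> Counter:
    """Stand-in embedding: word counts instead of a learned dense vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

chunks = ["Employees accrue 20 days of annual leave.",
          "Expense claims must be lodged within 30 days."]
index = [(c, embed(c)) for c in chunks]          # our tiny 'vector store'

query = embed("how much annual leave do I get")
best = max(index, key=lambda item: cosine(query, item[1]))
print(best[0])
```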

The query encoder ensures that user questions are understood in context, while the user interface provides an intuitive way to interact with the system. And let’s not forget the feedback loop, which continuously improves the system based on user input. It’s a learning process that never stops, getting smarter and more accurate over time.
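A tiny sketch of the query-encoding idea: fold recent conversation history into a standalone question before retrieval. The follow-up heuristic here is a crude template I've made up; real systems often use the LLM itself to do this rewrite:

```python
# Sketch of query encoding with conversational context: a short follow-up
# question is rewritten against recent history before retrieval.

history = ["How many days of annual leave do employees get?"]

def contextualise(question: str) -> str:
    """Fold the last question into short follow-ups so retrieval has context."""
    if history and len(question.split()) <= 4:   # crude 'follow-up' heuristic
        return f"{history[-1]} {question}"
    return question

print(contextualise("What about part-timers?"))
# -> "How many days of annual leave do employees get? What about part-timers?"
```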

Of course, every adventure comes with its challenges. One of the toughest hurdles I faced was preparing contextual data for the RAG. Semantic data, which requires understanding the meaning behind words, doesn’t always play nicely with generic splitting methods. It felt like trying to slice a pie with a chainsaw — messy and imprecise. Custom code became my best friend in creating larger, contextually aware chunks that made sense to both humans and machines.
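Here is a simplified sketch of that custom chunking idea: split on section headings so each chunk keeps its semantics intact, with a hard character split only as a fallback for oversized sections. The "## " heading convention is an assumption about the source documents, not a universal rule:

```python
# Sketch of semantically aware chunking: whole sections become chunks,
# instead of slicing blindly at a fixed character count.

def chunk_by_heading(text: str, max_chars: int = 2000) -> list[str]:
    chunks, current = [], []
    for line in text.splitlines():
        if line.startswith("## ") and current:   # a new section starts here
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    # Hard-split only sections that exceed the size budget.
    return [c[i:i + max_chars] for c in chunks for i in range(0, len(c), max_chars)]

doc = "## Leave\nEmployees accrue 20 days.\n## Expenses\nClaims within 30 days."
for c in chunk_by_heading(doc):
    print("---\n" + c)
```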

Another tricky area was Role-Based Access Control (RBAC). When dealing with sensitive data, like HR information, it’s crucial to ensure that only authorized users have access. Metadata became the hero here, tagging chunks with role information to keep everything secure.
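A minimal sketch of that metadata approach, with made-up roles and chunks; the important part is that the role filter runs before retrieval ever scores a chunk, so restricted content can never leak into a prompt:

```python
# Sketch of RBAC via chunk metadata: each chunk carries the roles allowed
# to see it, and retrieval filters on role before anything else.

chunks = [
    {"text": "Salary bands for 2024 ...", "roles": {"hr"}},
    {"text": "Annual leave accrues at 20 days ...", "roles": {"hr", "employee"}},
]

def retrieve_for_role(query: str, role: str) -> list[str]:
    """Only chunks tagged with the caller's role are even considered."""
    visible = [c for c in chunks if role in c["roles"]]
    return [c["text"] for c in visible]   # scoring/ranking elided for brevity

print(retrieve_for_role("leave", "employee"))  # the salary chunk never surfaces
```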

Continual improvement is the name of the game in the AI world. Getting RAG to be 95% correct on the first try is a pipe dream. It takes continuous user feedback to refine and perfect the system. A feedback function where users can report errors or ambiguities proved invaluable, allowing the system to learn and adapt. This is where I feel RAG shines, as updates to the vector store (the DB) are quick and you can see the result of the update in minutes.
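Here is a bare-bones sketch of such a feedback function, with a plain dict standing in for the vector store. The shape of the loop (capture the complaint, correct the chunk, upsert it) is the point, not the storage:

```python
# Sketch of the feedback loop: users flag a bad answer, a reviewer writes a
# corrected chunk, and the 'vector store' is updated in place, so the fix
# is live on the very next question.

vector_store: dict[str, str] = {"leave-001": "Employees accrue 15 days of leave."}
feedback_log: list[dict] = []

def report_issue(chunk_id: str, question: str, note: str) -> None:
    """Capture what the user saw and why they think it was wrong."""
    feedback_log.append({"chunk_id": chunk_id, "question": question, "note": note})

def apply_fix(chunk_id: str, corrected_text: str) -> None:
    """Re-embed and upsert; in most vector DBs this takes minutes, not a retrain."""
    vector_store[chunk_id] = corrected_text

report_issue("leave-001", "How much leave do I get?", "Policy changed to 20 days")
apply_fix("leave-001", "Employees accrue 20 days of leave.")
print(vector_store["leave-001"])
```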

In summary, my top findings in the RAG domain are:

  • Larger context chunks that retain semantics lead to better answers.
  • Generic document splitters have their limitations; augmenting and refining these yielded better results.
  • The formatted context provided to the LLM improves answer quality by removing structural noise (see the sketch after this list).
  • Continuous feedback and improvement are vital for refining the RAG system.
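
To illustrate the third point above, here is a small sketch of stripping structural noise from a chunk before it goes into the prompt. The particular noise characters chosen are just examples of the markdown and table debris that typically creeps in:

```python
# Sketch of context formatting: reduce a raw chunk to clean prose so the
# LLM isn't distracted by markdown, table pipes and whitespace runs.

import re

def format_context(chunk: str) -> str:
    """Strip structural characters and collapse whitespace."""
    text = re.sub(r"[#*|>`]+", " ", chunk)    # drop markdown/table characters
    return " ".join(text.split())             # collapse whitespace runs

raw = "## Leave\n| Days | 20 |\n> accrued **annually**"
print(format_context(raw))   # -> "Leave Days 20 accrued annually"
```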

This journey into the AI wilderness has been both challenging and rewarding. As I continue to explore and refine my understanding of RAG, I look forward to sharing more insights and learning from this ever-evolving field. Stay tuned for more adventures!