The big conundrum with LLMs is that they’re trained on the musings of billions of people, many of whom aren’t exactly experts. Essentially, LLMs are like a giant blender mixing together millions of internet ramblings. These machines can mimic human writing, but without the depth of understanding, they often sound right while missing the mark.

— Me, Gary Febbrarino

My adventures with the RAG Framework

Embarking on the exhilarating journey into the AI domain, I recently dove deep into the fascinating world of Retrieval-Augmented Generation (RAG). This approach to natural language processing combines the strengths of retrieval-based search and pre-trained generative models, aiming to enhance the quality and relevance of generated text. It’s like the ultimate team-up in an AI superhero movie, where different strengths come together to save the day.

So, what exactly is RAG? Imagine you have a vast library of knowledge stored in chunks. When a question is asked, RAG swoops in to find the best matching chunk of context, provides this context with the question to the LLM (Large Language Model), and then refines the answer with a specific business context. It’s a bit like having a super-intelligent librarian who not only knows where every book is but also how to interpret and explain them in the best possible way.
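
To make that flow concrete, here is a minimal Python sketch of the request path just described. The `retrieve` and `llm_complete` helpers are placeholders of my own standing in for the retrieval and LLM calls covered later, not the API of any particular library:

```python
def answer_question(question: str, retrieve, llm_complete) -> str:
    # 1. Find the best-matching chunks of stored knowledge for the question.
    context_chunks = retrieve(question, top_k=3)

    # 2. Combine the retrieved context and the question into a single prompt.
    prompt = (
        "Answer the question using only the context below.\n\n"
        "Context:\n" + "\n---\n".join(context_chunks) + "\n\n"
        "Question: " + question + "\nAnswer:"
    )

    # 3. Ask the LLM to generate an answer grounded in that context.
    return llm_complete(prompt)
```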


Let me break it down for you:

First, we load raw data from various sources, turning a chaotic jumble of information into a goldmine of potential answers. Next, we transform this raw data into a common state, ensuring consistency and compatibility. It’s like taking the scattered pieces of a puzzle and making sure they all fit together.
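
As a rough illustration of that "common state", here is a small Python sketch that reads a couple of file types and normalises everything into the same record shape; the field names are illustrative, not from a specific framework:

```python
import json
from pathlib import Path

def load_documents(folder: str) -> list[dict]:
    """Read raw files and normalise them into one common record shape."""
    records = []
    for path in Path(folder).glob("*"):
        if path.suffix == ".txt":
            text = path.read_text(encoding="utf-8")
        elif path.suffix == ".json":
            # Flatten structured data into readable text for this sketch.
            text = json.dumps(json.loads(path.read_text(encoding="utf-8")), indent=2)
        else:
            continue  # other formats (PDF, HTML, ...) would need their own loaders
        # Every source ends up in the same shape: text plus metadata.
        records.append({"source": path.name, "text": text.strip(), "metadata": {}})
    return records
```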

The magic continues by vectorising the data, converting it into numerical representations that the AI can efficiently process (I refer to this as creating semantic similes). The retriever then steps in, locating relevant information from this vast dataset based on a given query. Think of it as a digital treasure hunt where the prize is accurate and relevant information.
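
Here is a simplified sketch of that vectorise-and-retrieve step. I have used the sentence-transformers library purely as an example embedding model; any embedding provider would slot in the same way, and the chunk list is assumed to come from the loading step above:

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # example embedding model, swap for your own

model = SentenceTransformer("all-MiniLM-L6-v2")

def build_index(chunks: list[str]) -> np.ndarray:
    # Vectorise every chunk once, up front.
    return model.encode(chunks, normalize_embeddings=True)

def retrieve(question: str, chunks: list[str], index: np.ndarray, top_k: int = 3) -> list[str]:
    # Encode the query with the same model, then rank chunks by similarity.
    query_vec = model.encode([question], normalize_embeddings=True)[0]
    scores = index @ query_vec  # normalised vectors, so dot product == cosine similarity
    best = np.argsort(scores)[::-1][:top_k]
    return [chunks[i] for i in best]
```

Encoding the user's question with the same model is the "query encoder" role described in the next paragraph.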

The query encoder ensures that user questions are understood in context, while the user interface provides an intuitive way to interact with the system. And let’s not forget the feedback loop, which continuously improves the system based on user input. It’s a learning process that never stops, getting smarter and more accurate over time.

Of course, every adventure comes with its challenges. One of the toughest hurdles I faced was preparing contextual data for the RAG. Semantic data, which requires understanding the meaning behind words, doesn’t always play nicely with generic splitting methods. It felt like trying to slice a pie with a chainsaw — messy and imprecise. Custom code became my best friend in creating larger, contextually aware chunks that made sense to both humans and machines.
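
A much-simplified sketch of that idea: split on natural paragraph boundaries, then merge the small pieces back together up to a size budget so each chunk keeps its surrounding context. The character limit below is illustrative only:

```python
def chunk_by_paragraph(text: str, max_chars: int = 1500) -> list[str]:
    """Split on paragraph boundaries, then merge pieces up to a size budget
    so each chunk keeps enough surrounding context to stand on its own."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        # Start a new chunk only when adding this paragraph would overflow the budget.
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```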

Another tricky area was Role-Based Access Control (RBAC). When dealing with sensitive data, like HR information, it’s crucial to ensure that only authorized users have access. Metadata became the hero here, tagging chunks with role information to keep everything secure.
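
A simplified sketch of how that metadata tagging can gate retrieval; the role names and record shape here are illustrative, not taken from the real system:

```python
def filter_by_role(records: list[dict], user_roles: set[str]) -> list[dict]:
    """Keep only the chunks the current user is allowed to see, before retrieval runs."""
    allowed = []
    for record in records:
        required = set(record["metadata"].get("roles", []))  # e.g. {"hr", "manager"}
        # Untagged chunks are treated as public; tagged chunks need a matching role.
        if not required or required & user_roles:
            allowed.append(record)
    return allowed
```

Filtering before the vector search runs keeps restricted chunks out of the prompt altogether, rather than relying on the LLM to withhold them.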

Continual improvement is the name of the game in the AI world. Getting RAG to be 95% correct on the first try is a pipe dream. It takes continuous user feedback to refine and perfect the system. A feedback function where users can report errors or ambiguities proved invaluable, allowing the system to learn and adapt. This is where I feel RAG shines: updates to the vector store (the database) are quick, and you can see the results within minutes.
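
To round this out, here is a minimal sketch of that feedback loop: capture the report, then swap the offending chunk and re-embed. The `build_index` helper is the one from the retrieval sketch above, and the file-based log is only an illustration of the idea:

```python
import json
import time

def record_feedback(question: str, answer: str, issue: str,
                    path: str = "feedback.jsonl") -> None:
    """Append a user-reported error or ambiguity so it can be reviewed later."""
    entry = {"ts": time.time(), "question": question, "answer": answer, "issue": issue}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")

def apply_correction(chunks: list[str], bad_chunk: str, corrected_chunk: str, build_index):
    """Replace the offending chunk with corrected text and re-embed the corpus."""
    updated = [corrected_chunk if chunk == bad_chunk else chunk for chunk in chunks]
    return updated, build_index(updated)  # the fresh vectors are live as soon as this returns
```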

In summary, my top findings in the RAG domain are:

  • Larger context chunks that retain semantics lead to better answers.
  • Generic document splitters have their limitations; augmenting and refining these yielded better results.
  • The formatted context provided to the LLM improves answer quality by removing structural noise.
  • Continuous feedback and improvement are vital for refining the RAG system.

This journey into the AI wilderness has been both challenging and rewarding. As I continue to explore and refine my understanding of RAG, I look forward to sharing more insights and learning from this ever-evolving field. Stay tuned for more adventures!

Kong HQ

For our November Tech Forum, Vikas Vijendra from Kong visited our Melbourne office to bring us up to speed on what’s happening at KongHQ.

At Marlo we are already familiar with the Open Source Kong API Gateway and we like how it fits into our own digital enablement platform. Kong, however, are making a bold shift in product direction with the announcement of their Service Control Platform. They understand that while we might be focused on RESTful APIs today, the future will also include protocols such as gRPC, GraphQL and Kafka. Moreover, the advent of Kubernetes as the container platform of choice means Kong needs to extend into the cluster itself to provide full lifecycle service management.

The main features of the Service Control Platform are:

  • A centralized control plane to design, test, monitor and manage services
  • Multiple Runtimes – not just the nginx engine of Kong but also Istio, Kuma, Apollo and serverless
  • Multiple Protocols – REST, gRPC, GraphQL and Kafka
  • Multiple Platforms – All major cloud providers plus any Kubernetes

The open source API Gateway offering will remain, with most of the new features available in the Kong Enterprise offering. These include:

  • Kong for Kubernetes (K4K8S): a supported version of the Kong Ingress Gateway for Kubernetes along with all enterprise plugins
  • Kong Studio: for designing, mocking and testing APIs
  • Kong Manager: for the runtime monitoring and management of deployed services
  • Kong Developer Portal: a self-service portal providing access to the service catalog

All of the above features are available as a SaaS offering (Kong Cloud) or on-premise, or any combination of the two.

Perhaps most interesting is the announcement of the Kuma service mesh. An Ingress Controller alone is limited to managing traffic entering a cluster (north-south traffic). In a microservices architecture, most of the traffic is between services on the same cluster (east-west traffic). A service mesh allows control of traffic between these services.

Of course, Istio is the dominant product in the service mesh space, but Kong (and others) believe Istio has become too complex and that Kuma provides a more appropriate level of functionality. The functionality of the Ingress Gateway and the service mesh will eventually morph into a single product controlling both north-south and east-west traffic.

Google Anthos

At our latest tech forum, James Liu, Hybrid Application Modernization Specialist from Google, visited Marlo’s Melbourne office and presented on Google Anthos and, more broadly, on some of the exciting tech coming out of the Google Cloud Platform.

Anthos lets you build and manage modern hybrid applications in your data centre or in the public cloud. Built on open source technologies pioneered by Google—including Kubernetes, Istio, and Knative—Anthos enables consistency between on-premise and cloud environments. Anthos is a vital part of strategically enabling your business with transformational technologies like service mesh, containers, and microservices.

The main takeaways from the session include:

  • GKE (Google Kubernetes Engine) on-premise lets you create a fully managed Kubernetes cluster in your own data centre, controlled and managed from the Google console control plane – all over an HTTPS connection.
  • You will soon be able to run a Google-managed GKE cluster on any IaaS cloud provider (currently AWS only). This is a great approach for businesses needing a multi-cloud strategy.
  • Anthos Config Management provides a git-based common configuration tool for all policies across Kubernetes clusters both on-prem and in the cloud.
  • Google Cloud Service Mesh provides a fully managed Istio implementation. This represents the next stage of abstraction of the underlying infrastructure.

Marlo is a certified Google Partner working with large business and government clients across Australia and South East Asia. We are Australia’s leading specialists in the delivery of microservices and legacy integration solutions in cloud and on-premise environments.

Get in touch to find out how we can help enable your organisation’s digital journey.