The big conundrum with LLMs is that they’re trained on the musings of billions of people, many of whom aren’t exactly experts. Essentially, LLMs are like a giant blender mixing together millions of internet ramblings. These machines can mimic human writing, but without the depth of understanding, they often sound right while missing the mark.

— Me, Gary Febbrarino

My adventures with the RAG Framework

Embarking on an exhilarating journey into the AI domain, I recently dove deep into the fascinating world of Retrieval-Augmented Generation (RAG). This approach to natural language processing combines the best of retrieval-based search and pre-trained generative models, aiming to enhance the quality and relevance of generated text. It’s like the ultimate team-up in an AI superhero movie, where different strengths come together to save the day.

So, what exactly is RAG? Imagine you have a vast library of knowledge stored in chunks. When a question is asked, RAG swoops in to find the best matching chunk of context, provides this context with the question to the LLM (Large Language Model), and then refines the answer with a specific business context. It’s a bit like having a super-intelligent librarian who not only knows where every book is but also how to interpret and explain them in the best possible way.

AI-generated image, credit: OpenArt

Let me break it down for you:

First, we load raw data from various sources, turning a chaotic jumble of information into a goldmine of potential answers. Next, we transform this raw data into a common state, ensuring consistency and compatibility. It’s like taking the scattered pieces of a puzzle and making sure they all fit together.

The magic continues by vectorising the data, converting it into numerical representations that the AI can efficiently process (I refer to this as creating semantic similes). The retriever then steps in, locating relevant information from this vast dataset based on a given query. Think of it as a digital treasure hunt where the prize is accurate and relevant information.

The query encoder ensures that user questions are understood in context, while the user interface provides an intuitive way to interact with the system. And let’s not forget the feedback loop, which continuously improves the system based on user input. It’s a learning process that never stops, getting smarter and more accurate over time.
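To make the retrieval step concrete, here is a minimal sketch in Java. The EmbeddingClient interface and Chunk record are hypothetical stand-ins for whatever embedding model and vector store you use, not a specific library’s API. The retriever embeds the query and returns the chunk whose vector has the highest cosine similarity:

import java.util.Comparator;
import java.util.List;

/** Hypothetical embedding API: turns text into a numeric vector. */
interface EmbeddingClient {
  float[] embed(String text);
}

/** A chunk of source text plus its pre-computed vector. */
record Chunk(String text, float[] vector) {}

final class Retriever {
  private final EmbeddingClient embedder;
  private final List<Chunk> chunks;

  Retriever(final EmbeddingClient embedder, final List<Chunk> chunks) {
    this.embedder = embedder;
    this.chunks = chunks;
  }

  /** Return the chunk whose vector is most similar to the query. */
  Chunk bestMatch(final String query) {
    final float[] q = embedder.embed(query);
    return chunks.stream()
        .max(Comparator.comparingDouble((Chunk c) -> cosine(q, c.vector())))
        .orElseThrow();
  }

  /** Cosine similarity between two vectors of equal length. */
  private static double cosine(final float[] a, final float[] b) {
    double dot = 0;
    double normA = 0;
    double normB = 0;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
  }
}

The winning chunk is then supplied, together with the question, in the prompt sent to the LLM.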

Of course, every adventure comes with its challenges. One of the toughest hurdles I faced was preparing contextual data for the RAG. Semantic data, which requires understanding the meaning behind words, doesn’t always play nicely with generic splitting methods. It felt like trying to slice a pie with a chainsaw — messy and imprecise. Custom code became my best friend in creating larger, contextually aware chunks that made sense to both humans and machines.
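To illustrate the idea (a simplified sketch, not the exact code I wrote), a paragraph-aware chunker can pack whole paragraphs into chunks up to a size budget, so no chunk is ever cut mid-thought:

import java.util.ArrayList;
import java.util.List;

/** Build contextually aware chunks by packing whole paragraphs together. */
final class ParagraphChunker {

  static List<String> chunk(final String document, final int maxChars) {
    final List<String> chunks = new ArrayList<>();
    final StringBuilder current = new StringBuilder();
    for (final String paragraph : document.split("\\n\\s*\\n")) { // split on blank lines
      // Close the current chunk if adding this paragraph would exceed the budget.
      if (current.length() > 0 && current.length() + paragraph.length() > maxChars) {
        chunks.add(current.toString().trim());
        current.setLength(0);
      }
      current.append(paragraph).append("\n\n");
    }
    if (current.length() > 0) {
      chunks.add(current.toString().trim());
    }
    return chunks;
  }
}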

Another tricky area was Role-Based Access Control (RBAC). When dealing with sensitive data, like HR information, it’s crucial to ensure that only authorized users have access. Metadata became the hero here, tagging chunks with role information to keep everything secure.
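A minimal sketch of that approach, with illustrative field names rather than any particular vector store’s schema: each chunk carries the set of roles allowed to see it, and retrieval filters on this metadata before ranking.

import java.util.List;
import java.util.Set;

/** A chunk tagged with the roles permitted to read it. */
record TaggedChunk(String text, Set<String> allowedRoles) {}

final class SecureRetrieval {

  /** Drop any chunk the caller's role is not permitted to see. */
  static List<TaggedChunk> visibleTo(final List<TaggedChunk> chunks, final String role) {
    return chunks.stream()
        .filter(c -> c.allowedRoles().contains(role))
        .toList();
  }
}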

Continual improvement is the name of the game in the AI world. Getting RAG to be 95% correct on the first try is a pipe dream. It takes continuous user feedback to refine and perfect the system. A feedback function where users can report errors or ambiguities proved invaluable, allowing the system to learn and adapt. This is where I feel RAG shines, as updates to the vector store (the DB) are quick and you can see the result of the update in minutes.

In summary, my top findings in the RAG domain are:

  • Larger context chunks that retain semantics lead to better answers.
  • Generic document splitters have their limitations; augmenting and refining these yielded better results.
  • Formatting the context provided to the LLM improves answer quality by removing structural noise.
  • Continuous feedback and improvement are vital for refining the RAG system.

This journey into the AI wilderness has been both challenging and rewarding. As I continue to explore and refine my understanding of RAG, I look forward to sharing more insights and learning from this ever-evolving field. Stay tuned for more adventures!

Robots created using Midjourney

Here is how Marlo prepared this article. We fed the talking points into ChatGPT. We then reviewed the results for grammar and clarity with Grammarly and Hemingway. After some corrections, we subjected it to internal review. Midjourney created the accompanying banner image.


We are in the early days of using AI assistants like OpenAI’s ChatGPT. How will these technologies impact our industry moving forward? The IT industry has undergone many significant changes over the years. In the past, software engineers relied on manuals for guidance and support; tools like Stack Overflow and Google have since replaced them, giving us instant access to information and solutions. There are spelling and grammar checkers for documentation. We use sophisticated static code checkers to improve code quality. We have access to many open-source frameworks and tools. All this makes it easier for developers to design, build, test and deploy software. AI assistants like GitHub Copilot and Tabnine have supercharged the IDE.

Enterprise environments are complex. Our software engineers need skills beyond coding. They engage in design, testing, cloud development, and DevOps. Multi-skilled engineers need to keep up with our ever-changing industry. The increasing pace of change will demand even more from our software engineers. Development tools have helped increase productivity, as we need to do more in less time. In the future, AI assistants will play an important role. Domain specialists could become conductors or coordinators with the help of AI tools. But these tools are only part of the picture. Quality, security, and data integrity are critical concerns for consultancies like Marlo. How will auto-generated products be validated and secured?

As these tools become more capable, we may see a shift from the information age to the knowledge age. AI tools will make work easier and allow engineers to focus on challenging tasks. These tools will also create new opportunities and roles within the industry. Those companies and individuals who adapt to these changes will stay competitive.

In conclusion, the use of AI assistants has the potential to change the way we work. We are still in the early days. AI tools can increase productivity and offer new opportunities and roles. Are you ready to meet the challenges of the knowledge age?

Photo by Artem Sapegin on Unsplash

Using Randomness to Test Code

In part 1 of this series, we explored pseudo-random values. These are values that are statistically random but are derived from a known starting point, so the same sequence can be reproduced over and over. In this article we explore how random values can be used in testing. You may already be familiar with randomness in test invocation; for instance, JUnit 5 provides an annotation to randomise the order of test execution. Here, however, we are looking at a style of testing that uses randomly generated input values to test properties of your code. This is known as "Property-Based Testing".

You may be asking: "Why would you use random values in testing? Doesn’t that defeat the whole purpose of unit testing, where known values are tested against known responses?" Well, no. Often it is hard to think of suitable positive and negative test cases that exercise your code. In addition: what if many randomly selected tests can be automatically run? And what if these tests cover many different input values? And what if instances where tests fail are automatically recorded so they can be reported and replayed later? These are just some of the benefits of this approach to testing.

Property-based testing verifies your program code using a large range of relevant inputs. It does this by generating a random sample of valid input values. Perhaps an example may help. Given a utility method to convert a string field into uppercase text, a unit test would use some expected values. In pseudo-code this may look like:

Given "abc123" then expect "ABC123"
Given "ABC" then expect "ABC"
Given "123" then expect "123"

In comparison, property-based tests describe the expected behaviour for any value that matches the input type. In pseudo-code this looks like:

For any lowercase alphanumeric string then expect the same string but in uppercase.
For any uppercase alphanumeric string then expect the same string, unchanged.
For any non-alphabetic string then expect the same string, unchanged.

The "for any" is where the randomness comes in.

This article will review where these ideas came from, before outlining the core principles of property-based testing. It will introduce the concepts of Generators and Shrinkage and discuss approaches to reproducing test results.

History

These concepts aren’t new. Placeholder text such as Lorem Ipsum has been used since the 1960s to model text.

Kent Beck developed a unit testing framework for Smalltalk in 1989. Those tests had to be hand-crafted, but the framework introduced a number of key concepts that we now take for granted. It organised and provided recipes for unit tests. For each test case, the test data was created then thrown away at the end. Test cases were aggregated into a test suite. The test suite formed part of a framework that also produced a report, an example of what is known as literate programming.

In 1994, Richard Hamlet wrote about Random Testing. Hamlet posited that computers could efficiently test a "vast number" of random test points. Another benefit he identified was that random testing provided a "statistical prediction of significance in the observed results". This last point is somewhat technical. In essence it describes the ability to quantify the significance of a test that does not fail. In other words: is this just testing trivial cases?

A few years later, in 2000, the influential paper by Claessen and Hughes, QuickCheck: A Lightweight Tool for Random Testing of Haskell Programs, provided a whole new way to run tests using randomised values. Their paper was written for the functional programming language Haskell, and it inspired a suite of property-based testing tools for many other languages. A list of current implementations appears on the QuickCheck Wikipedia page.

Introducing Property Based Testing

The core principle of property-based testing is that for a function or method, any valid input should yield a valid response, and any input outside this range should return an appropriate failure. Compare this to how systematic tests are normally written: given a specific input, check the program’s return value. Therein lies the problem: you need to be sure you have chosen correct and sufficient input values to exercise your code. The tools we use to check test coverage do not check the adequacy of your tests, just that you have a test for a control flow path. So the quality of a test is dependent upon the quality of its inputs.

Property-based testing provides tools to test using randomly generated values selected over the range of inputs. This changes the focus of our tests: we concentrate on the properties of the functions under test, i.e. what the inputs are and what the outputs are expected to be. Testing the properties of a function or method over a large range of values can help find bugs otherwise missed by specific unit tests. We have experienced this first hand. It is a true "aha" moment when these tests uncover a use-case with input we hadn’t thought of.

In summary:

  • With unit testing we provide fixed inputs (e.g. 0,1,2,…) and get a fixed result (e.g. 1,2,4,…).

  • With property-based testing we provide a declaration of inputs (e.g. all non-negative ints) and a declaration of conditions that must hold (e.g. result is an int).

At its core, property-based testing requires the production of randomised input test values. These test values are produced using generators.

Generators

Random values are produced using generators. These are specific functions that produce a random value. Some common generators are used to manufacture booleans, numeric types (e.g. floats, ranges of integers), characters and strings. Both QuickCheck and JUnit-QuickCheck provide many generators. Once you have a primitive generator, you can then compose these into more elaborate generators and structures like lists and maps, or other bespoke structures.

Apart from custom values, you may also want a custom distribution. Random testing is most effective when the values being tested closely match the distribution of the actual data. As the provided generators know nothing of your data, they will typically produce a uniform distribution. To control the data distribution you will need to write your own generator. Luckily, this is not a difficult task. The test tools we have used (Haskell:QuickCheck, Java:JUnit-QuickCheck and Python:Hypothesis) have rich libraries of generators that can be easily extended.
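For example, here is a sketch of a custom JUnit-QuickCheck generator with a deliberately skewed distribution: mostly small values, with the occasional large outlier, loosely mimicking real transaction amounts (the 90/10 split and the ranges are arbitrary choices for illustration):

import com.pholser.junit.quickcheck.generator.GenerationStatus;
import com.pholser.junit.quickcheck.generator.Generator;
import com.pholser.junit.quickcheck.random.SourceOfRandomness;

/** Generate amounts that are mostly small, occasionally very large. */
public final class AmountGenerator extends Generator<Integer> {

  public AmountGenerator() {
    super(Integer.class);
  }

  @Override
  public Integer generate(final SourceOfRandomness randomness, final GenerationStatus status) {
    // Roughly 90% of values in 1..100, the remaining 10% in 101..1_000_000.
    return randomness.nextInt(10) == 0
        ? randomness.nextInt(101, 1_000_000)
        : randomness.nextInt(1, 100);
  }
}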

Shrinkage

Generators can produce large test values. When a failure has been detected it would be nice to find a smaller example. This is known as shrinkage.

On failure, QuickCheck reduces the selection to the minimum set. So, from a large set of test values, QuickCheck finds the minimal case that fails the test. In practice, what this does is concentrate tests to the extremes of an input value. However, this behaviour can be modified by the generator.

Shrinkage is an important feature of property based testing. Having an example of failure is good. Having a minimal example of failure is better. With a minimal example you are more likely to understand the reasons for the failure.
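As a sketch of how this hooks into JUnit-QuickCheck, a custom generator can propose its own smaller candidates by overriding doShrink; here a failing string is offered back as its two halves (a simplistic strategy, for illustration only):

import com.pholser.junit.quickcheck.generator.GenerationStatus;
import com.pholser.junit.quickcheck.generator.Generator;
import com.pholser.junit.quickcheck.random.SourceOfRandomness;
import java.util.List;

/** A lowercase word generator that also tells QuickCheck how to shrink. */
public final class ShrinkableWordGenerator extends Generator<String> {

  public ShrinkableWordGenerator() {
    super(String.class);
  }

  @Override
  public String generate(final SourceOfRandomness randomness, final GenerationStatus status) {
    final int size = randomness.nextInt(1, 12);
    final StringBuilder word = new StringBuilder(size);
    for (int i = 0; i < size; i++) {
      word.append((char) randomness.nextInt('a', 'z')); // lowercase letters only
    }
    return word.toString();
  }

  /** On failure, offer the two halves of the string as smaller candidates. */
  @Override
  public List<String> doShrink(final SourceOfRandomness randomness, final String larger) {
    if (larger.length() <= 1) {
      return List.of(); // nothing smaller to try
    }
    final int mid = larger.length() / 2;
    return List.of(larger.substring(0, mid), larger.substring(mid));
  }
}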

Test Reproduction

Random testing is useful, but once a failure has been identified and fixed, we would like to repeat the failed tests to ensure they have been resolved. Tools such as Python’s Hypothesis record all failed tests. Future runs automatically include prior failed tests.

Other tools such as Java’s JUnit-QuickCheck allow the tests to be repeated by specifying a random seed. When a test fails, the random seed used to generate that test is reported, and can then be used to reproduce the same test inputs.
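For instance, with JUnit-QuickCheck the seed can be pinned per parameter via the @When annotation. Using the AlphaNumericGenerator shown below, a failed case could be replayed like this (the seed value here is an illustrative placeholder, not a real failure):

@Property
public void replayFailedCase(
    final @When(seed = -4185566332694224207L) @From(AlphaNumericGenerator.class) String word) {
  assertEquals(1, WordCountUtils.count(new Scanner(word)));
}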

Code Examples

So, what does this look like in real code? Marlo uses Java for development of digital solutions, so the first examples are based on the Java JUnit-QuickCheck package.

The following generator will create an alphanumeric "word" with a length of between 1 and 12 characters.

import com.pholser.junit.quickcheck.generator.GenerationStatus;
import com.pholser.junit.quickcheck.generator.Generator;
import com.pholser.junit.quickcheck.random.SourceOfRandomness;
import java.util.stream.IntStream;

/** Generate alpha-numeric characters. */
public final class AlphaNumericGenerator extends Generator<String> {

  /** Alphanumeric characters: "0-9A-Za-z". */
  private static final String ALPHANUMERICS =
      "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

  /** Maximum word length. */
  private static final int MAX_WORD_LENGTH = 12;

  /** Inherit from super class. */
  public AlphaNumericGenerator() {
    super(String.class);
  }

  /** Generate an alphanumeric word of length 1 to 12 characters. Do not create null words. */
  @Override
  public String generate(final SourceOfRandomness randomness, final GenerationStatus status) {
    final int stringSize = randomness.nextInt(MAX_WORD_LENGTH) + 1; // non-empty words
    final StringBuilder randomString = new StringBuilder(stringSize);
    IntStream.range(0, stringSize)
        .forEach(
            ignored -> {
              final int randomIndex = randomness.nextInt(ALPHANUMERICS.length());
              randomString.append(ALPHANUMERICS.charAt(randomIndex));
            });
    return randomString.toString();
  }
}

(source)

To use this generator in a unit test:

/**
  * Test alphanumeric word is same for stream as scanner using Alphanumeric generator. Trials
  * increased from the default of 100 to 1000.
  *
  * @param word a random alphanumeric word
  */
@Property(trials = 1000)
public void testAlphanumericWord(final @From(AlphaNumericGenerator.class) String word) {
  assertEquals(1, WordCountUtils.count(new Scanner(word)));
  assertEquals(1, WordCountUtils.count(Stream.of(word)));
}

(source)

Here we are using our custom generator, and have increased the trials to 1000 from the default of 100. The expected property of our word count utility is that given this input string, its output would indicate that it counted one word.

The following code uses this generator to build a list of strings that are delimited by a space. The code to be tested contains two word count methods accepting different input types. Using our custom generator we can compose test data for both input types, then test to see if the word count methods agree:

/**
  * Test a "sentence" of alphanumeric words. A sentence is a list of words separated by a space.
  *
  * @param words build a sentence from a word stream
  */
@Property
public void testAlphanumericSentence(
    final List<@From(AlphaNumericGenerator.class) String> words) {
  final String sentence = String.join(" ", words);
  assertEquals(
      WordCountUtils.count(new Scanner(sentence)), WordCountUtils.count(Stream.of(sentence)));
}

(source)

At Marlo we also use Ansible for automation, with some custom modules written in Python. An excellent QuickCheck library for Python is Hypothesis. An equivalent generator to the Java example above is the text strategy. Used in a test, it looks like:

from string import ascii_letters, digits

from hypothesis import given
from hypothesis.strategies import text


@given(text(min_size=1, max_size=12, alphabet=ascii_letters + digits))
def test_alphanumeric(a_string):
    """
    Generate alphanumeric sized strings like:
        'LbkNCS4xl2Xl'
        'z3M4jc1J'
        'x'
    """
    assert a_string.isalnum()
    a_length = len(a_string)
    assert 1 <= a_length <= 12

(source)

While the above are trivial examples, they do demonstrate how this style of testing is a valuable complement to systematic tests. It enables a larger number of test cases to run against your code. The style of test is different, in that it focuses on the generalised behaviour of code rather than specific use cases. This makes property-based tests a powerful addition to a test suite.

Summary

In this article we took a brief look at the features of property-based testing, which uses random inputs to improve the quality and coverage of tests.

However, it is important to note that property-based tests don’t replace unit tests. Instead they should be used to augment your existing tests with values that you may not have thought about. Generating a large number of tests may, however, give a false sense of security if most of these test cases are trivial. So choosing the correct inputs, whether randomly generated or systematically selected, is important.

Property-based tests are easy to write and can help identify bugs that traditional testing approaches might miss. So, why not use randomness to your advantage?

Get in touch with Marlo today if you’d like to learn more about how we can help modernise your testing regime.

Links

There are many online resources available if you want to learn more:

This is part 2 of a series on reactive microservices; the first part is available at https://marlo.com.au/reactive-microservices

In the last post we talked at length about why reactive applications are useful and how they can help squeeze more performance out of your Java applications.

TL;DR: backpressure and non-blocking IO are amazing.

But the last post was a bit wordy, wasn’t it? All exposition and no programming or anything.

"Get to the point Mills!" I can hear the readers shout into the void, as they echo various teachers, coaches, managers, relatives, my wife and so on throughout my life.

I hear you. However, before I confront the reader with a wall of code, it’s worth looking at some crucial differences between normal imperative coding and the reactive APIs.

In this series we are focusing on Spring Boot and its use of the Project Reactor reactive engine, however it’s not the only game in town. Other production-ready reactive Java libraries include Akka and RxJava, and the JDK even gets in on the game with the Java 9 Flow API.

Mono and Flux

The first big difference in the way you write reactive applications is that whatever happens, it happens reactively! That is to say, you have to kick off a reaction before anything happens, which is quite a different approach when compared to something like just declaring and using a string.

So how do we declare and print some text?

In "normal" Java I declare a String and then use the System.out.println() method:

String theString = "Hey there, World!";
System.out.println(theString);

This entails no waiting or reacting to anything. The computer put some bytes that form the string in memory and returned a reference, which I can use to conquer the world (or print to the console, either/or).

Reactive does not do that, at least not right away. Instead you ask it to acquire some data from somewhere and then instruct it on what to do when the data becomes available.

Project Reactor provides two high level APIs to do things reactively, the Mono and the Flux, which are the entry points to the reactive engine.

These concepts are explained in great detail on the Project Reactor website for Monos and Fluxes.

It’s pretty simple to start:

Mono<String> stringMono = Mono.just("Hey there, World");

Ok, so you just created another String? Great.

But did I? Reactive is all about reacting to events that are outside of the direct control of the program. Central to this is setting up the event (or reaction) and then submitting it to the reactive core to wait for the thing to happen.

So what are the differences? The computer still put some bytes that form a string in memory and returned a reference (because I’m statically declaring a string), but Mono#just does not return a String; it returns a Mono.

The reactor core is sitting on the String until something else happens.

That something is the concept of subscription. This is where it starts to get a little hairy.

Back to our String. We had a simple Mono that is of type String:

Mono<String> stringMono = Mono.just("Hey there, World");

To actually do something to the String, I need to subscribe and supply something to consume the result:

Mono<String> stringMono = Mono.just("Hey there, World");
stringMono.subscribe(System.out::println);

The Mono#just method set up the reactor to provide the string, but nothing actually happens until:

  • we subscribe to the sequence with a Consumer, AND
  • the String becomes available.

In this case the String was available to the System.out.println() method right away but, in real life, we may have to wait. The reactor core is responsible for managing that waiting in the most efficient way possible, which it does by putting the pending job to the side and using the threads for more useful work.

Something that was not overly obvious initially (at least to me, leading to furrowed brows and confusion) is this concept of the reaction happening inside the reactive container.

When reactive-ing, trying to do this is probably not what you want:

Mono<String> stringMono = Mono.just("Hey there, World");
String theString = stringMono.subscribe(); // <-- compilation error

This is the first thing I tried when learning the reactive API. It does not compile because the subscribe method returns a Disposable, which is of no use to someone trying to get a String. Of course, the next thing I tried was this:

String theString = stringMono.subscribe().toString(); // lol

Even though I knew it was not right, I did it anyway to prove a point. It was not right, and instead of a String I was greeted with an object reference:

reactor.core.publisher.LambdaMonoSubscriber@7ed7259e

The point that I did not know what I was doing was proven.

There was a way to force it to give me a string. This:

String theString = stringMono.block();

will compile, but this only works because it blocks the thread until the String is available, and that is not why we are here.

"How do I get my String?!?!?!?!", the developer will cry in frustration. This question comes up all the time.

How do I get the Mono to cough up my long-lost data?

Well, you don’t.

The paradigm shift is that you need to start thinking inside the (reactive) box. Remember subscription is submitting your process to the reactive core, which is then going to work out the nuts and bolts of when things happen. Whatever you want to do with the String happens inside (or via) the Mono.

For me, the lesson was: do not think of the reactive container as something that is going to provide your program with an answer. You need to give it whatever you want to do with the answer as well, like this:

Mono<String> stringMono = Mono.just("Hey there, World");
stringMono.subscribe(s -> doTheThingWithTheString(s));

Much like an actual nuclear reactor, if the stuff you put in comes out, you are doing it wrong. Really, you do not want it to come out without knowing exactly what you are doing.

Got it. Do the things inside the Mono. So then what is the Flux all about?

The same trivial example with a Flux looks like this:

Flux<String> stringFlux = Flux.just("Hey", "there,", "World");
stringFlux.subscribe(System.out::println);

The difference being that it is working on a collection of Strings rather than a single String.

This will apply the subscribe method action to each item on the stream:

Hey
there,
World

A Mono is a single value (a bounded sequence of 0 .. 1), whereas a Flux is an unbounded sequence (0 .. n). Under the covers they implement the same Publisher interface, but there are some aspects that make it useful to provide separate APIs.
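The same "do it inside the container" rule applies to a Flux: chain operators on the sequence itself rather than trying to pull values out. A quick sketch:

Flux<String> stringFlux = Flux.just("Hey", "there,", "World");
stringFlux
    .map(String::toUpperCase)          // transformation happens inside the pipeline
    .filter(word -> word.length() > 3) // so does the filtering
    .subscribe(System.out::println);   // prints "THERE," then "WORLD"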

Great! Is that it then?

Unfortunately for you, dear reader, no. In fact, for a post entitled "Reactive Microservices" there isn’t much in the way of microservices, unless you consider printing to the console a microservice.

Therefore, we must carry on! I reckon the best way to appreciate "how to reactive" is to put it side by side with a non-reactive app, so in part 3 (I can hear the groans already) I will get busy developing the new "Customer Microservice" for our new super-hot client that will be soon taking over the world – CompuHyperGlobalMegaNet.

Until next time, thanks for reading!

Kong HQ

For our November Tech Forum, Vikas Vijendra from Kong visited our Melbourne office to bring us up to speed on what’s happening at KongHQ.

At Marlo we are already familiar with the Open Source Kong API Gateway and we like how it fits into our own digital enablement platform. Kong, however, are making a bold shift in product direction with the announcement of their Service Control Platform. They understand that while we might be focused on RESTful APIs today, the future will also include protocols such as gRPC, GraphQL and Kafka. Moreover, the advent of Kubernetes as the container platform of choice means Kong needs to extend into the cluster itself to provide full lifecycle service management.

The main features of the Service Control Platform are:

  • A centralized control plane to design, test, monitor and manage services
  • Multiple Runtimes – not just the nginx engine of Kong but also Istio, Kuma, Apollo and serverless
  • Multiple Protocols – REST, gRPC, GraphQL and Kafka
  • Multiple Platforms – All major cloud providers plus any Kubernetes

The open source API Gateway offering will remain, with most of the new features available in the Kong Enterprise offering. These include:

  • Kong for Kubernetes (K4K8S): a supported version of the Kong Ingress Gateway for Kubernetes along with all enterprise plugins
  • Kong Studio: for designing, mocking and testing APIs
  • Kong Manager: for the runtime monitoring and management of deployed services.
  • Kong Developer Portal: a self-service portal providing access to the service catalog.

All of the above features are available as a SaaS offering (Kong Cloud) or on-premise, or any combination of the two.

Perhaps most interesting is the announcement of the Kuma service mesh. An Ingress Controller alone is limited to managing traffic entering a cluster (north-south traffic). In a microservices architecture most of the traffic is between services on the same cluster (east-west traffic). A service mesh allows control of traffic between these services.

Of course Istio is the dominant product in the service mesh space but Kong (and others) believe Istio has become too complex and Kuma provides a more appropriate level of functionality. The functionality of the Ingress Gateway and the service mesh will eventually morph into a single product controlling both north-south and east-west traffic.

At our latest tech forum, James Liu, Hybrid Application Modernization Specialist from Google, visited Marlo’s Melbourne office and presented on Google Anthos and more broadly on some of the exciting tech coming out of the Google Cloud Platform.

Anthos lets you build and manage modern hybrid applications in your data centre or in the public cloud. Built on open source technologies pioneered by Google—including Kubernetes, Istio, and Knative—Anthos enables consistency between on-premise and cloud environments. Anthos is a vital part of strategically enabling your business with transformational technologies like service mesh, containers, and microservices.

The main takeaways from the session include:

  • GKE (Google Kubernetes Engine) on-premise lets you create a fully managed Kubernetes cluster in your own data centre, controlled and managed from the Google console control plane – all over an HTTPS connection.
  • You will soon be able to run a Google-managed GKE cluster on any IaaS cloud provider (currently AWS only). This is a great approach for businesses needing a multi-cloud strategy.
  • Anthos Config Management provides a git-based common configuration tool for all policies across Kubernetes clusters both on-prem and in the cloud.
  • Google Cloud Service Mesh provides a fully managed Istio implementation. This represents the next stage of abstraction of the underlying infrastructure.

Marlo is a certified Google Partner working with large business and government clients across Australia and South East Asia. We are Australia’s leading specialists in the delivery of microservices and legacy integration solutions in cloud and on-premise environments.

Get in touch to find out how we can help enable your organisation’s digital journey.


Tech Lead Vishal Raizada recently conducted a very informative Tech Forum at the Marlo Office. He presented on Istio: Architecture, Application and Ease of Implementation.

Our tech forum presentation is downloadable here and showcases an example of Istio’s implementation, application and benefits.

Istio is now a key part of the Marlo Digital Enablement Platform – our open source, cloud-native platform which provides a complete on-demand environment for digital delivery.

The enterprise application landscape has changed a lot in the last decade: from managing on premise servers to using infrastructure as a service; from monolithic applications to building microservices.

The new world offers many benefits but it also introduces new challenges. With the distributed nature of the application landscape, service discovery and general application composition become extremely complex. Controls such as traffic management, security and observability, which could previously be managed in one place, now become a scattered problem.

Enter Istio, a service mesh framework which wraps around a cloud-native architecture and adds a layer of abstraction to manage these complexities. It enables a truly automated delivery process, where a development team can focus purely on code while Istio handles the rest, including service discovery, security, circuit breaking and much more. In addition, it is programmable, so it can be incorporated into DevOps & DevSecOps processes with ease. A service mesh gives control back to the enterprise application world without taking away any of the benefits.

Read Vish’s full presentation here.