
In the world of Agile, the role of architects is often misunderstood, a point of contention spotlighted by a well-known principle from the Agile Manifesto:

The best architectures, requirements, and designs emerge from self-organizing teams.

SAFe does its best to describe the theory and intent behind Agile architecture; however, the architects at Marlo understand that Agile skills and experience in architectural design thinking are needed to successfully navigate the challenges of Agile solution delivery. Our wealth of experience across diverse industries has helped us evolve and improve the way we architect in an Agile environment.

In this article, we discuss the role of solution architects in an Agile delivery setting, looking at some of the skills required and the benefits to be gained from adopting an Agile approach. The article uses SAFe terminology; however, the concepts apply equally to other Agile delivery methods.

A Complex Problem

There was a time before Agile when being a solution architect meant working on one large monolithic project for months or years on end, with the hope of one day being part of a successful go-live event. Unfortunately, many projects failed due to scope, time, budget or other unforeseen circumstances, meaning go-live was never a certainty and more often than not the architect had long since moved on to other things.

With Agile came a new way of working, focused on rapid delivery of incremental value to the business in a collaborative environment. This immediately disrupted the role of architects, who, under a waterfall method, were used to working at their own pace, often in isolation for long periods with limited collaboration.

Marlo was recently engaged with a client on a project using SAFe as the delivery framework. The client was about 12 months into the process of adopting SAFe and transitioning away from a waterfall delivery method.

The project had documented a reasonably simple business brief to deliver a new customer benefits platform with the goal of expanding their customer base. There was little to no documentation of anything related to technology.

It was apparent from the outset that the project’s timelines were very aggressive and the following challenges only added to the complexity:

  • The project had little understanding of technology impacts before committing to cost and timing estimations

  • The iteration manager didn’t start until half-way through the project

  • The project also lost a very knowledgeable product owner half-way through the project

  • There were nine specialised delivery streams spread across six different timezones

  • Delivery streams were ready to begin prior to much of the architecture being defined

  • There was no Enterprise / Domain Architect with knowledge of the technology strategy

  • Multiple product owners were impacted by the project and had conflicting needs

  • There were several major scope changes throughout, adding pressure, disruption and rework

  • The project had committed to highly optimistic deadlines from the outset, with the major deadline brought forward in response to a business request

Tools and Techniques

It was clear from the outset that there was insufficient time for a traditional Big Design Up Front (BDUF) approach. However, an intentional design (just enough upfront architecture) was needed rather than relying purely on emergent design (architecture is discovered and extended as part of each increment) to minimise the technical debt and rework that would otherwise eventuate.

Marlo’s first order of business was to agree a way forward by defining an Architecture Runway. The Architecture Runway identified the near-term architecture enablers and their relationship (dependencies) to planned features in the project backlog (this is distinct from a Technology Roadmap which provides the long-term strategic architecture view). Our Architecture Runway included: a new customer identity platform, a new customer preferences platform, data migration approach, integration approach for consumers, support solution for contact centre teams, non-prod environments, and non-functional requirements.

Effective engagement and communication were critical from the outset. Marlo led the way in establishing the meeting and communication cadence for the Squads, including Chapters for functional areas and Guilds for non-functional aspects. We established a stakeholder communication matrix to identify important stakeholders and ensured they were engaged at the appropriate times. This included reporting up to the business and project steering committees as well as the architecture review board.

Establishing clear roles and responsibilities is always important, and never more so than in an Agile delivery setting, where emergent decision-making behaviour can occur without full appreciation of the consequences. We drew the lines of demarcation between solution architecture, application architecture and technical design responsibilities so all team members understood their role. We also made it clear that collaboration was key and no decisions would be made in isolation or without consultation.

We established a design authority to formalise the collaboration process, supported by a central decisions register to transparently communicate all decisions and their rationale. As the architects, it was important that we demonstrated leadership and competence, influencing key decisions rather than dictating outcomes. We made ourselves available and responsive at all times through corporate messaging, email, phone and face-to-face meetings to ensure we didn't become an obstacle to be avoided. All of the architecture documentation was openly accessible on a wiki.

One of the challenges we had was ensuring the architecture stayed just ahead of the upcoming features. Even though the Architecture Runway was unlikely to change, we did not have the luxury to plan too far ahead. This was due to the number of near-term architecture decisions required, particularly given we had large development teams already starting to code and new features being added by the day. We used architecture spikes to assess the impact to the Runway when a new feature was added, which allowed us to update the Runway and stay just ahead.

One of the most important roles we played was in our day-to-day discussions within the Agile squads, influencing the emergent design. From these discussions we were able to catch and course-correct a number of decisions that might otherwise have been missed:

  • Passwords being sent in clear text

  • Data extracts being duplicated

  • Applications that were planning to pull data directly from production (risking performance)

  • An opportunity to create a reusable interface adapter that was immediately leveraged by another project

Our technical thought leadership identified risks early, minimised technical debt, exploited opportunities for reuse, fostered alignment with the broader technology direction and improved understanding of cost implications.

Benefits

The engagement presented many challenges which would have added significant time and cost to the project in a traditional setting (using a traditional mindset). However, Marlo was able to bring its Agile skills, experience and mindset to overcome these challenges and deliver the following additional benefits:

  • Avoiding Big Design Up Front (BDUF) allowed faster start time for Agile delivery teams and quicker incremental benefits to the business, providing more rapid return on investment

  • A focus on just-in-time architecture meant minimal wastage of architecture effort (no longer spending months working in isolation) and minimal rework for the Agile teams

  • Decisions were made faster (days not weeks or months), documented in a consistent format and available for all to see

  • Architects provided technical thought leadership to the Agile teams, ensuring clear technology ownership, early risk mitigation, minimisation of technical debt, architecture reuse, alignment with the broader technology direction and an informed understanding of cost implications

  • Although the project went slightly over the original budget, it delivered on time and created reusable technology assets that benefited other projects. A traditional waterfall project would have taken many months longer and would have struggled to manage the changing scope without cost blowouts

Beware

Our experience has shown that going Agile is not all sunshine and rainbows. There are many pitfalls to be aware of:

  • Architecture falling behind delivery teams, leading to a high degree of technical debt and rework

  • Decisions not made in a timely manner, leading to re-planning, workarounds and delays

  • Architecture not appropriately addressing or anticipating the needs of important technology stakeholders such as the Architecture Review Board and Change Advisory Board

  • Architecture concerns deprioritised in favour of delivering business features, leading to increased technology operating risks (e.g. security vulnerability, no backup solution, lack of customer support)

  • Inability for architects to identify red flags during dynamic discussions

  • Inability for architects to balance and trade-off compromises appropriately

The Agile Architect

Agile Architects require a broad range of skills. In addition to traditional architecture skills, an Agile architect must:

  • Be highly organised in terms of planning their own work

  • Be always thinking a few steps ahead

  • Be flexible to change

  • Be available when needed

  • Collaborate and be inclusive from the start

  • Be transparent in terms of knowledge sharing, decision making and documentation

  • Be able to shape and influence dynamic discussions

Agile Architects need to move beyond the old ways of working and adopt an Agile mindset in order to add value to the project.

Marlo has the skills and experience to establish an Agile Architecture practice and help clients navigate the challenges of Agile solution delivery. Get in touch today if you would like to learn more about how Marlo can help with your next project.

This is part 3 of a series on reactive microservices; the first two parts are available at https://marlo.com.au/reactive-microservices and https://marlo.com.au/reactive-microservices-part-2


At the end of the last post, I introduced the "Customer Microservice" from CompuHyperGlobalMegaNet1. The business at CompuHyperGlobalMegaNet have determined that a key plank in their digital strategy is to know who their customers are, so we have been asked to come up with something that will store the customer details. "Doesn’t Salesforce do that really well?" asked a junior dev. The CTO had a quick look at the web page, but to be honest it was pretty confusing and seemed complicated, so instead we decided to develop our own Customer microservice using Spring Boot.

We will build it old school first using Spring Web, then using Reactive, to spell out the differences between the two approaches.
I've created the application scaffolding using Spring Initializr, which is a really fast and simple way to get a project set up.

The Architect provided us with a spec ("all the whitespace is room for scaling", apparently).

And we got building: Spring Boot, REST API, JPA with Spring Data. Go2!

We defined a Customer class that can be saved into a database using JPA, and sprinkled some Lombok on so we can replace code with annotations:

@Entity
@NoArgsConstructor
@AllArgsConstructor
@Data
public class Customer {

    @Id @GeneratedValue
    private Integer id;
    private String firstName;
    private String lastName;
}

More annotations than code, but Awesome.

Let’s create a DAO so we can database it (I know we say repositories these days, but it’s hard to change):

// no annotation needed here – Spring Boot auto-configures Spring Data repositories
public interface CustomerDAO extends JpaRepository<Customer, Integer> {
}

Spring Data is magic.
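
A quick taste of that magic (a hypothetical example – this service doesn't actually need it): declare a derived query method on the interface and Spring Data generates the implementation from the method name at runtime.

import java.util.List;
import org.springframework.data.jpa.repository.JpaRepository;

public interface CustomerDAO extends JpaRepository<Customer, Integer> {

    // Hypothetical: Spring Data parses the method name and derives the query,
    // roughly "select c from Customer c where lower(c.lastName) = lower(?1)".
    List<Customer> findByLastNameIgnoreCase(String lastName);
}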

And finally a CustomerController to do the web things:

@RestController
@RequiredArgsConstructor
@Slf4j
public class CustomerController {

    private final CustomerDAO customerDAO;

    @GetMapping
    public List<Customer> customers() {
        return this.customerDAO.findAll();
    }
}

This starts, but is hard to test as there's no data in the DB and no way to create any! We are going to need an "add customer" feature. Until we get that, let's add an event listener to populate some customer data:

I am shamelessly stealing this trick (and quite a few reactive idioms further along as well) from Spring Legend Josh Long. Check out his talks and articles, I certainly have learned a LOT from them.

@RequiredArgsConstructor
@Component
class Initialise implements ApplicationListener<ApplicationReadyEvent> {

    private final CustomerDAO customerDAO;

    @Override
    public void onApplicationEvent(ApplicationReadyEvent applicationReadyEvent) {
        List.of("Brock:Mills", "Brian:Fitzgerald").stream().map(s -> {
            var c = s.split(":");
            // only three fields at this point – id (generated), first and last name
            return new Customer(null, c[0], c[1]);
        }).forEach(customerDAO::save);

        customerDAO.findAll().forEach(System.out::println);
    }
}

and we should probably do that addCustomer endpoint as well, so back to the CustomerController:

@PostMapping
public ResponseEntity<Void> addCustomer(@RequestBody Customer customer) throws Exception {
    this.customerDAO.save(customer);
    return ResponseEntity.created(new URI("/" + customer.getId().toString())).build();
}

Start it and test:


  .   ____          _            __ _ _
 /\\ / ___'_ __ _ _(_)_ __  __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
 \\/  ___)| |_)| | | | | || (_| |  ) ) ) )
  '  |____| .__|_| |_|_| |_\__, | / / / /
 =========|_|==============|___/=/_/_/_/
 :: Spring Boot ::        (v2.3.1.RELEASE)

2020-07-06 14:52:57.929  INFO 73930 --- [           main] c.m.n.NotreactiveDbApplication           : Starting NotreactiveDbApplication on emmet.localdomain with PID 73930 (/Users/brockmills/Development/Marlo/microservices/notreactive-db/target/classes started by brockmills in /Users/brockmills/Development/Marlo/microservices/notreactive-db)
2020-07-06 14:52:57.933  INFO 73930 --- [           main] c.m.n.NotreactiveDbApplication           : No active profile set, falling back to default profiles: default
2020-07-06 14:52:59.450  INFO 73930 --- [           main] .s.d.r.c.RepositoryConfigurationDelegate : Bootstrapping Spring Data JPA repositories in DEFERRED mode.
2020-07-06 14:52:59.547  INFO 73930 --- [           main] .s.d.r.c.RepositoryConfigurationDelegate : Finished Spring Data repository scanning in 80ms. Found 1 JPA repository interfaces.
2020-07-06 14:53:00.259  INFO 73930 --- [           main] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat initialized with port(s): 8080 (http)
2020-07-06 14:53:00.284  INFO 73930 --- [           main] o.apache.catalina.core.StandardService   : Starting service [Tomcat]
... snip ...
2020-07-06 14:53:01.616  INFO 73930 --- [           main] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat started on port(s): 8080 (http) with context path ''
... snip ...
2020-07-06 14:53:02.496  INFO 73930 --- [           main] DeferredRepositoryInitializationListener : Spring Data repositories initialized!
2020-07-06 14:53:02.509  INFO 73930 --- [           main] c.m.n.NotreactiveDbApplication           : Started NotreactiveDbApplication in 5.405 seconds (JVM running for 6.239)
Customer(id=1, firstName=Brock, lastName=Mills)
Customer(id=2, firstName=Brian, lastName=Fitzgerald)

Great, we are running with some data. Let’s test the API:

brockmills@emmet microservices % curl http://localhost:8080/
[{"id":1,"firstName":"Brock","lastName":"Mills"},{"id":2,"firstName":"Brian","lastName":"Fitzgerald"}]%

Win! Let’s add a customer with a POST and query again:

brockmills@emmet microservices % curl --header 'Content-Type: application/json' --data '{"firstName":"Tester", "lastName":"Testing"}' http://localhost:8080/
brockmills@emmet microservices % curl http://localhost:8080/
[{"id":1,"firstName":"Brock","lastName":"Mills"},{"id":2,"firstName":"Brian","lastName":"Fitzgerald"},{"id":3,"firstName":"Tester","lastName":"Testing"}]%

Ship it! I did hear the architect say something after a long lunch about unfunctionals? defunctionals? IDK. The new dev mentioned they thought security, error handling and data integrity were important, so we created some JIRAs for our tech debt sprint. We've got features to deliver (or, as we used to say, chuck over the fence at ops)!

While we were deploying to Prod, the business peeps dropped some new requirements on us. Apparently they have some amazing big data science lake thing going and MUST know the weather in Perth when the customer is created.

We had some questions about this – Why Perth? Which Perth? What actual thing about the weather? Wouldn’t it be better to correlate this later in some data tool, rather than adding weather data into the customer entity? – but these concerns were swept aside ("Don’t bring that negative vibe, we can do this together!") and who are we to question the business? We better get coding:

We add the temperature to our Customer class:

@JsonInclude(JsonInclude.Include.NON_NULL)
public class Customer {
... snip ...
    private Double airTempActual;
    private Double airTempFeelsLike;
}

Someone duck-duck-go'ed "Weather API" and we found a suitable service with the right price ($0) and the right features for our needs.

I’m using the Openweathermap API, which has a very nice developer experience and is simple and intuitive. 5 stars, would recommend.

We knocked up a class to hold the response from the API:

@NoArgsConstructor
@Getter
public class Weather {
    private WeatherMain main;
}

@NoArgsConstructor
@Getter
class WeatherMain {
    private double temp;

    @JsonProperty("feels_like")
    private double feelsLike;
}

And wired it all up straight into the createCustomer method (née addCustomer) on the CustomerController:

public class CustomerController {

    private final CustomerDAO customerDAO;
    private final RestTemplate restTemplate;

    @Value("${WEATHER_API_KEY}")
    private String weatherApiKey;
    ... snip ...
    @PostMapping
    public ResponseEntity<Void> createCustomer(@RequestBody Customer customer) throws Exception {
        // what's the temp in Perth?
        try {
            var perthWeather = perthWeather();
            customer.setAirTempActual(perthWeather.getMain().getTemp());
            customer.setAirTempFeelsLike(perthWeather.getMain().getFeelsLike());
        } catch (Exception e) {
            log.warn("cant get the weather in perth", e);
        }
        this.customerDAO.save(customer);
        return ResponseEntity.created(new URI("/" + customer.getId().toString())).build();
    }

    /**
     * call the weather api to get the weather in Perth
     * @return
     */
    private Weather perthWeather() {
        try {
            var query = Map.of("appid", weatherApiKey, "q", "perth", "units", "metric");
            return this.restTemplate.exchange("https://api.openweathermap.org/data/2.5/weather?q={q}&appid={appid}&units={units}", HttpMethod.GET, null, Weather.class, query).getBody();
        } catch (RestClientException e) {
            log.error("failed to get the weather in perth: "+ e.getMessage(), e);
            throw new RuntimeException(e);
        }
    }
}
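
One bit of wiring glossed over above: Spring Boot auto-configures a RestTemplateBuilder but not a RestTemplate bean, so constructor-injecting one assumes a small config class along these lines (a minimal sketch – the real project may wire it differently):

import org.springframework.boot.web.client.RestTemplateBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.client.RestTemplate;

@Configuration
public class HttpConfig {

    // Boot supplies the pre-configured builder; exposing the built template
    // as a bean lets @RequiredArgsConstructor inject it into the controller.
    @Bean
    public RestTemplate restTemplate(RestTemplateBuilder builder) {
        return builder.build();
    }
}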

Let’s push to prod! Oh, after following the CAB process and presumably some sort of testing. In fact, I’ll even test it on my machine:

It starts and we can see the temps are null on my bootstrapped data.

2020-07-06 16:35:17.243  INFO 55154 --- [           main] c.m.n.NotreactiveDbApplication           : Started NotreactiveDbApplication in 5.739 seconds (JVM running for 6.643)
Customer(id=1, firstName=Brock, lastName=Mills, airTempActual=null, airTempFeelsLike=null)
Customer(id=2, firstName=Brian, lastName=Fitzgerald, airTempActual=null, airTempFeelsLike=null)

That’s fine, we don’t want to fail a Customer create just because we didn’t know the weather in some random city.

But does it work if we add a new customer?

brockmills@emmet microservices % curl --header 'Content-Type: application/json' --data '{"firstName":"Tester", "lastName":"Testing"}' http://localhost:8080/
brockmills@emmet microservices % curl http://localhost:8080/
[{"id":1,"firstName":"Brock","lastName":"Mills"},{"id":2,"firstName":"Brian","lastName":"Fitzgerald"},{"id":3,"firstName":"Tester","lastName":"Testing","airTempActual":17.6,"airTempFeelsLike":10.3}]

17 degrees in Perth. A lovely winter's day, and a lovely microservice to boot (to Spring Boot, if you will, hardi har har).

But this is not a blog about the weather, it’s about reactive! The architect re-appeared from wherever they go when they are not bothering engineers, and they had done some detailed usage projections. Turns out we are going to need to support 4x our expected load and – get this – we can’t deploy any new servers. We have to work with what we have got.

We took some stats from prod and created some flame graphs. We got confused about the output3, so we tried some other profilers, which produced some actually useful output, and eventually managed to produce a flame graph of the same thing. Boom – there's our ticket to doing an APM presentation at DevHatMineCon next year!

We now had some pretty strong evidence that most of the thread time on our app was spent waiting for the weather API (and a little at the database as well).

We also found a neat natural experiment to reinforce our hypothesis: we accidentally ran production against our mock Weather API server for 3 hours4. That thing is fast, and the app was still spending 90% of its time talking to the API.

We need to go nuclear – with the reactor!

This section builds on the reactive framework concepts we discussed in part 2.

Porting the application to reactive is relatively straightforward.

There are three main concerns:

  • We need to replace the JPA and JDBC database components with reactive enabled ones;
  • We need to wrap our Controller method return types in either Mono or Flux containers; and
  • We need to replace the RestTemplate client with a reactive WebClient.

Let’s get into it:

I’ll start with the project dependencies. spring-boot-starter-web and spring-boot-starter-data-jpa are out, to be replaced with spring-boot-starter-webflux and spring-boot-starter-data-r2dbc.

The database-bound beans require small changes to remove the JPA annotations; in the case of my Customer, this means removing @Entity and @GeneratedValue:

@Data
@AllArgsConstructor
@NoArgsConstructor
@JsonInclude(JsonInclude.Include.NON_NULL)
public class Customer {

    @Id
    private Integer id;
    private String firstName;
    private String lastName;

    private Double airTempActual;
    private Double airTempFeelsLike;
}

Q: If we remove the @GeneratedValue annotation, how does the primary key get updated?
A: It depends. The reactive Spring Data library really only supports the database sequence / identity approach from JPA, which is driven by the @Id annotation. As such, it is aware of database sequences on fields annotated with @Id and manages them.

The DAO changes from implementing a JpaRepository to a ReactiveCrudRepository.

public interface CustomerDAO extends ReactiveCrudRepository<Customer, Integer> {
}

Pretty painless so far.

The Initialiser class is the first taste of the real differences. Firstly, since we are not using JPA / Hibernate, there is no auto-generation of database schemas, so we are going to do this manually with the r2dbc-provided DatabaseClient.

Next, to bootstrap the test data we need to create it in the context of a reactive type, being a Flux for the list of customers.

To put it all together:

@RequiredArgsConstructor
@Component
class Initialiser implements ApplicationListener<ApplicationReadyEvent> {

    private final CustomerDAO customerDAO;
    private final DatabaseClient databaseClient;

    @Override
    public void onApplicationEvent(ApplicationReadyEvent applicationReadyEvent) {
        Flux<Customer> customers = Flux.just("Brock:Mills", "Brian:Fitzgerald")
                .map(s -> {
                    var c = s.split(":");
                    return new Customer(null, c[0], c[1], null, null);
                })
                .flatMap(customerDAO::save);

        databaseClient.execute("create table CUSTOMER(ID identity auto_increment, FIRST_NAME varchar(50), LAST_NAME varchar(50), AIR_TEMP_ACTUAL double, AIR_TEMP_FEELS_LIKE double)")
                .fetch()
                .rowsUpdated()
                .thenMany(customers)
                .thenMany(this.customerDAO.findAll())
                .subscribe(System.out::println);
    }
}

This is interesting: customers is a Flux that is set up, but (because we are reacting) nothing happens with it until we hit the subscribe() down the track.

This is demonstrated nicely by adding in a few log.info() prints:

    @Override
    public void onApplicationEvent(ApplicationReadyEvent applicationReadyEvent) {
        Flux<Customer> customers = Flux.just("Brock:Mills", "Brian:Fitzgerald")
                .map(s -> {
                    log.info("1: creating customer: " + s);
                    var c = s.split(":");
                    return new Customer(null, c[0], c[1], null, null);
                })
                .flatMap(customerDAO::save);

        log.info("2: this is after the customer");
        databaseClient.execute("create table CUSTOMER(ID identity auto_increment, FIRST_NAME varchar(50), LAST_NAME varchar(50), AIR_TEMP_ACTUAL double, AIR_TEMP_FEELS_LIKE double)")
                .fetch()
                .rowsUpdated()
                .thenMany(customers)
                .thenMany(this.customerDAO.findAll())
                .subscribe(c -> {
                    log.info("3: in the subscribe: " + c.toString());
                });

        log.info("4: the end, we are initialised");
    }

Results look like this:

[  main] c.m.reactivedb.ReactiveDbApplication     : Started ReactiveDbApplication in 5.608 seconds (JVM running for 6.9)
[  main] com.marlo.reactivedb.Initialiser         : 2: this is after the customer
[  main] com.marlo.reactivedb.Initialiser         : 1: creating customer: Brock:Mills
[  main] com.marlo.reactivedb.Initialiser         : 1: creating customer: Brian:Fitzgerald
[  main] com.marlo.reactivedb.Initialiser         : 3: in the subscribe: Customer(id=1, firstName=Brock, lastName=Mills, airTempActual=null, airTempFeelsLike=null)
[  main] com.marlo.reactivedb.Initialiser         : 3: in the subscribe: Customer(id=2, firstName=Brian, lastName=Fitzgerald, airTempActual=null, airTempFeelsLike=null)
[  main] com.marlo.reactivedb.Initialiser         : 4: the end, we are initialised

The customers flux is set up, but there's no subscription, so execution of the stream is deferred. The second statement, at the end of the databaseClient.execute() chain, does subscribe to the stream, which causes the publisher to execute: create the DB table, insert the customer records, and finally log the resulting records5.

The CustomerController is where the rubber really hits the road. Firstly, we need to replace RestTemplate with the WebClient HTTP client and modify the return types to use the reactive containers. We do retain the @RestController annotation though, as it is reactive-aware:

@RestController
@RequiredArgsConstructor
@Slf4j
public class CustomerController {
    private final CustomerDAO customerDAO;
    private final WebClient webClient;
    ...snip...
    @GetMapping("/")
    public Flux<Customer> getCustomers() {
        return this.customerDAO.findAll();
    }

    @PostMapping("/")
    public Mono<ResponseEntity<Void>> createCustomer(@RequestBody Customer customer) throws Exception {
        // implemented below
    }
}

The List on the getCustomers method becomes a Flux, but the ResponseEntity on the createCustomer method must be wrapped in a Mono, like this: Mono<ResponseEntity<Void>>.
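
A similar wiring note to the RestTemplate earlier: Spring Boot auto-configures a WebClient.Builder, not a WebClient bean, so the injected webClient assumes a config class along these lines (again, just a sketch):

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.reactive.function.client.WebClient;

@Configuration
public class WebClientConfig {

    // Boot auto-configures the builder (codecs and all); we build the client
    // once and let it be injected into CustomerController.
    @Bean
    public WebClient webClient(WebClient.Builder builder) {
        return builder.build();
    }
}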

Now we get to the createCustomer method. A quick recap: this API:

  • calls the weather API,
  • merges the weather data into the Customer from the client,
  • saves the Customer in the database,
  • returns a 201 Created HTTP code that includes a Location header with a link to the newly created Customer, and
  • treats a failure of the weather API call as non-fatal.

Before I go into the controller method, let's focus on the call to the Weather API in perthWeather(). This will now join the reactive party with a Mono wrapper; however, I need to restructure the way this method fails. Whereas it previously simply threw an Exception that was handled by the createCustomer method, there's a better way to work now that we need to operate within the reactive container.

perthWeather becomes this:

    private Mono<Optional<Weather>> perthWeather() {
        var q = Map.of("appid", weatherApiKey,
                "q", "perth",
                "units", "metric");

        return webClient.get().uri("https://api.openweathermap.org/data/2.5/weather?q={q}&appid={appid}&units={units}", q)
                .retrieve()
                .bodyToMono(Weather.class)
                .map(Optional::of)
                .onErrorReturn(Optional.empty());
    }

WebClient has a friendly API, where we define:

  • the method – get()
  • the target URI with template params, along with a Map of the query parameters
  • what to do with the body – bodyToMono(Weather.class)

We could stop there and simply return the Mono. However, given the try/catch approach that was used before isn't going to fly, I'm now going to wrap the returned Weather in an Optional and add an error handler that will return an empty Optional if something goes wrong.

The createCustomer method now needs to be refactored: we must work within the reactive stream, which means using the methods provided by the Mono to do our business's bidding.

Here’s the code:

    @PostMapping("/")
    public Mono<ResponseEntity<Void>> createCustomer(@RequestBody Customer customer) throws Exception {

        return perthWeather().map(o -> {
            o.ifPresent(w -> {
                customer.setAirTempActual(w.getMain().getTemp());
                customer.setAirTempFeelsLike(w.getMain().getFeelsLike());
            });
            return customer;
        })
            .flatMap(customerDAO::save)
            .map(saved -> ResponseEntity.created(UriComponentsBuilder.fromPath("/" + saved.getId().toString()).build().toUri()).build());

    }

First up, we call the perthWeather() method, then use the resulting Weather to fill out our Customer object's weather fields. Here I'm calling map(), using the Optional container to determine whether we can actually add the weather, then returning the Customer, ready for the database. Just like that, I've removed a try/catch from my method, which definitely looks cleaner. Then we flatMap() to save the record to the database: save() itself returns a Mono, so a plain map() would hand us a Mono nested inside a Mono whose inner save never gets subscribed. Finally, we map() to compose the response object, with the location path built from the saved record's generated id.
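
To spell out why that middle step must be flatMap() rather than map(), here's a two-line sketch (types in the comments; assumes the customer and customerDAO from above are in scope):

// flatMap unwraps the inner publisher: subscribing to this Mono<Customer>
// actually performs the save.
Mono<Customer> saved = Mono.just(customer).flatMap(customerDAO::save);

// map would nest it instead: a Mono<Mono<Customer>> whose inner Mono is
// never subscribed, so the save would silently never happen.
Mono<Mono<Customer>> nested = Mono.just(customer).map(customerDAO::save);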

Does it work?

brockmills@emmet ~ % curl http://localhost:8080
[{"id":1,"firstName":"Brock","lastName":"Mills"},{"id":2,"firstName":"Brian","lastName":"Fitzgerald"}]%

brockmills@emmet ~ % curl --header 'Content-Type: application/json' --data "{\"firstName\":\"Tester\", \"lastName\":\"$(cat /dev/urandom | env LC_CTYPE=C tr -dc 'a-zA-Z0-9' | fold -w 32 | head -1)\"}" http://localhost:8080/

brockmills@emmet ~ % curl http://localhost:8080
[{"id":1,"firstName":"Brock","lastName":"Mills"},{"id":2,"firstName":"Brian","lastName":"Fitzgerald"},{"id":3,"firstName":"Tester","lastName":"MEz8ibpIWSL34vWQM285aGmnrdidH7qL"}]%

brockmills@emmet ~ %

Sort of! The new record is now in the db, but there’s no temperature…

Hmm, so it looks like the API call fails for some reason, but the exception is swallowed and overridden by the onErrorReturn(), leaving us without the all-important weather data. Worse, we might have let this slip into prod if we weren't being so careful6.

The reactive API also has an onErrorResume() method, which allows for more flexible error handling. Let's try that, replacing the onErrorReturn():

                .onErrorResume(e -> {
                   log.error("error calling weather API: " + e.getMessage(), e);
                   return Mono.just(Optional.empty());
                });

Now we still default to the empty Optional; however, we log the exception from the web client, so we have a fighting chance of working out what's gone wrong.

Testing again results in:

2020-07-07 12:57:12.755 ERROR 2894 --- [ctor-http-nio-4] com.marlo.reactivedb.CustomerController  : error calling weather API: 401 Unauthorized from GET https://api.openweathermap.org/data/2.5/weather?q=perth&appid=weatherkey&units=metric

org.springframework.web.reactive.function.client.WebClientResponseException$Unauthorized: 401 Unauthorized from GET https://api.openweathermap.org/data/2.5/weather?q=perth&appid=weatherkey&units=metric
    at org.springframework.web.reactive.function.client.WebClientResponseException.create(WebClientResponseException.java:181) ~[spring-webflux-5.2.7.RELEASE.jar:5.2.7.RELEASE]
    Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException: 
Error has been observed at the following site(s):
    |_ checkpoint ⇢ 401 from GET https://api.openweathermap.org/data/2.5/weather?q=perth&appid=weatherkey&units=metric [DefaultWebClient]
Stack trace:
        at org.springframework.web.reactive.function.client.WebClientResponseException.create(WebClientResponseException.java:181) ~[spring-webflux-5.2.7.RELEASE.jar:5.2.7.RELEASE]
        at org.springframework.web.reactive.function.client.DefaultClientResponse.lambda$createException$1(DefaultClientResponse.java:206) ~[spring-webflux-5.2.7.RELEASE.jar:5.2.7.RELEASE]
        at reactor.core.publisher.FluxMap$MapSubscriber.onNext(FluxMap.java:100) ~[reactor-core-3.3.6.RELEASE.jar:3.3.6.RELEASE]
        at reactor.core.publisher.FluxDefaultIfEmpty$DefaultIfEmptySubscriber.onNext(FluxDefaultIfEmpty.java:92) ~[reactor-core-3.3.6.RELEASE.jar:3.3.6.RELEASE]
... cut for brevity, however all classes below are reactor.netty.channel, reactor.core.publisher, io.netty or java.lang namespaces..

Ah, the API key isn't being set and is defaulting to the placeholder value weatherkey. Whoops. Oh well, that's easily fixed: I need to add the WEATHER_API_KEY env variable.

An interesting effect of reactive's assemble-then-subscribe execution model is that the stack trace in the exception is not particularly useful in locating the source of the error. It does identify the class where things have gone wrong, but since my code really just defines the execution rather than performing it, the trace does nothing to help me find the line in my code that's blown up. This is annoying; however, with careful logging, it shouldn't be a blocker to running Reactor in production.
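
Reactor does offer some help here, though we didn't lean on it for this project, so treat this as a hedged aside: the global debug hook captures assembly-time stack traces, and checkpoint() adds cheap, targeted markers (the checkpoint ⇢ line in the trace above comes from the same mechanism, courtesy of WebClient):

import reactor.core.publisher.Hooks;
import reactor.core.publisher.Mono;

public class ReactorDebugSketch {

    public static void main(String[] args) {
        // Global: record assembly-time stack traces for every operator.
        // Great in dev/test, too expensive to leave on in production.
        Hooks.onOperatorDebug();

        // Targeted: a named checkpoint that is included in the error's traceback.
        Mono.error(new IllegalStateException("boom"))
                .checkpoint("fetch Perth weather")
                .subscribe(ok -> {}, err -> err.printStackTrace());
    }
}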

Right, add the API key:

brockmills@emmet ~ % WEATHER_API_KEY=the_api_key java -jar target/reactive-db-0.0.1-SNAPSHOT.jar
... snip ...
2020-07-07 13:02:35.163  INFO 91619 --- [           main] c.m.reactivedb.ReactiveDbApplication     : Started ReactiveDbApplication in 4.838 seconds (JVM running for 11.345)

and retest:

brockmills@emmet ~ % curl --header 'Content-Type: application/json' --data "{\"firstName\":\"Tester\", \"lastName\":\"$(cat /dev/urandom | env LC_CTYPE=C tr -dc 'a-zA-Z0-9' | fold -w 32 | head -1)\"}" http://localhost:8080/
brockmills@emmet ~ % curl http://localhost:8080
[{"id":1,"firstName":"Brock","lastName":"Mills"},{"id":2,"firstName":"Brian","lastName":"Fitzgerald"},{"id":3,"firstName":"Tester","lastName":"31VqMkuXXVRa06lZsfXiNTdgZYKkwjIP","airTempActual":16.72,"airTempFeelsLike":16.18}]%

It works! Deploy and we are done. Get it into prod and bask in the glow of simple, performant, modern microservices.

Hopefully this has provided a reasonable introduction to how reactive does its thing and what microservices using Spring Boot reactive look like.

CompuHyperGlobalMegaNet will surely dominate with this, but we aren't finished. Oh no, not even close. We recently hired an intern and thought we could get them started on unit tests and error handling, and one of the BAs asked what the new performance stats were like on reactive (did we re-run the test? Might need to follow that up), so keep an eye out for more posts in the near future.

The code is available on gitlab at (https://gitlab.com/themarlogrouppublic/reactive-blogs).

Endnotes

  1. Richer than astronauts!
  2. I mean start, not golang. We are enterprise, take this seriously, please.
  3. Lots of flame graph tools measure CPU usage and computers are smart enough to not worry the CPU with IO tasks. I was actually surprised at the amount of time the JVM spends in the compiler here and wonder how long I would need to run it for to get the majority of samples to be business logic rather than JVM. I sense a follow up post!
  4. Yes, it did (but with something far more important than this. In my defense the client was using my test environment 🙃).
  5. Calls to subscribe() are meant to be non-blocking, however in the log here we can see these are all executing on the main thread, i.e. the thread is blocked. I think this is a Spring context specific thing, but I don't actually know why this happens. ¯\_(ツ)_/¯
  6. We won't have to be nearly as careful once we get those automated tests the QA team have been banging on about for ages.



Omni-channel, Cloud, Open Source, Microservices, Security, Scalability, Agility – these are just some of the concerns facing technology teams as they work to quickly deliver customer focused digital solutions.

At Marlo, we have seen organisations spin their wheels while designing and building the infrastructure and delivery capability to operate in a digital environment. In response, we have tapped into our combined experience to produce the Marlo Digital Enablement Platform [MDEP]. MDEP is an opinionated and extensible platform that has been designed around the following principles:

  • Combine the best open-source, SaaS and cloud-native services
  • Containerised workloads are the unit of deployment
  • Managed Kubernetes is the runtime environment
  • APIs/messaging are the standard model of external interaction
  • The platform is cloud agnostic
  • Security is designed in from the ground up
  • Delivery pipelines are fully automated
  • Platform provisioning and upgrades are zero-outage

That’s nice, but what do I do with it?

Much as it’s fun to kick off a CI/CD pipeline and see a new production-ready cloud platform spring into life in less that an hour, we knew that we had to show how this platform can reduce the workload on teams including developers, testers, and DevOps.

To do this, we have set about building two technology demonstrators that cover business domains that we are heavily involved in. Even if you don’t work in banking or government, they will still show how the platform accelerates delivery.

Our demonstration applications

The Open Banking demonstration provides both web and mobile interfaces allowing users to logon and interact with typical banking features including account and transaction lookups, changing personal details and making payments. Core system functionality comes from a mix of a mock banking system and live calls to public Open Banking APIs.

The Victorian Government Planning demonstration simulates providing access to VicPlan information for a citizen wishing to find details of a property including the local government area and planning scheme overlays. This demonstration retrieves details from public APIs on the Internet.

Each application showcases technology features that are critical to providing modern real-world applications:

Microservices managed as a mesh. A microservice is a small, business-oriented software component that takes exclusive responsibility for an individual domain. This architecture helps teams manage scale and the need for rapid change. The platform automatically deploys microservices into the open source Istio service mesh which abstracts API traffic management concerns such as discovery and security away from developers, as well as providing common resilience patterns including retries, and circuit breakers.

APIs and Integration. Microservice logic as well as core systems and external interfaces are abstracted behind well structured REST and RPC APIs. This provides quick adoption by multiple user channels such as the web and mobile interfaces implemented in the demonstrations.

Containerised deployment onto the Cloud. By packaging into containers and deploying onto public cloud infrastructure, MDEP leverages the enormous scalability and resilience that can be provided by the major cloud providers. Deployable units are Docker images which allows them to be distributed across Kubernetes clusters.

On demand provisioning of supporting components. The build pipelines have been designed to readily provision extension components such as databases and caching in support of the business logic.

Security. MDEP has been designed to be secure from its inception. Many security features, including secured inter-service communication, network zoning, and policy enforcement via an API gateway and service mesh, are provisioned by default by the CI/CD pipelines that build the platform instances and deploy applications. The Open Banking application demonstrates the integration of an external identity provider to provide OAuth 2.0 and multi-factor authentication.

DevOps pipeline automation. The MDEP platform and agile development practices are aligned with modern DevOps practices. Changes to platforms are only permitted via the CI/CD pipelines, ensuring that all infrastructure and code is managed under Source Control Management and CI/CD processes.

What’s a Digital Enablement Platform?

Digital delivery requires speed and a focus on customer experience rather than technology. To enable this, a digital platform needs to remove as many technology concerns as possible. Marlo’s platform provides an opinionated and automated default configuration for the entire end-to-end lifecycle of digital development. To achieve this it leverages what we believe to be current best-practice tools and services including:

  • Deployment onto any of the major cloud providers
  • Use of cloud-native and open source components to encourage scaling cost to zero for unused components
  • Full automation via CI/CD pipelines using a combination of GitLab, Red Hat Ansible, and Hashicorp Terraform
  • Docker, Kubernetes and Istio for workload management


What do build teams get from the platform?

Product Owners avoid a lengthy planning, architecture and procurement ramp-up period by using an opinionated platform based on our experience and best practice.

Architects avoid license driven architectures and product lock-in by using cloud-native, SaaS, and open source components.

Designers and Developers focus on business logic while using development standards including SCM, naming standards, monitoring & logging, automated code defect scanning, and API documentation.

Testers benefit from the Karate test automation framework embedded into the CI/CD pipelines; tests are written using the Behaviour Driven Development (BDD) syntax. The Selenium framework provides UI testing. Together they cover the different testing types, including functional, UI and performance.

DevOps teams are provided with automated and zero-outage deployments, the ability to quickly provision new platform instances, source and artefact management, and a simple mechanism to provide supporting components such as databases.

Support teams can readily visualise the state of both the platform instances and the microservices running on them. The open source Kiali service mesh management console, and cloud platform services such as AWS CloudWatch are utilised to ensure each platform is easy to operate.

Can I see this for myself?

If you are starting your digital journey, or if your current technology practices are delivering too slowly, then Marlo would be happy to demonstrate and discuss how MDEP can address your specific needs. Using automation, we can show a new secure and scalable platform instance being created in real time during our discussions.


At Marlo we specialise in systems integration: Message Brokers, Enterprise Service Buses, Business Process Management, EAI – you name it, we’ve seen (and implemented) them all – tomorrow’s legacy today!

One of our current favourite application architectures for complex systems is microservices. Developing microservices is great – we have seen great benefits derive from using this approach – and we’ve built them using various technologies.

But your typical integration developer is never happy!

They complain about their frameworks, tools or platforms and will constantly hassle architects about why we should be using X for this or Y for that. And our architects feel their pain, because they develop too.

So when we started a big API and microservices project a couple of years ago, of course we jumped at the chance to write all the services in fully modern async, reactive monad actor model microservices.

No we didn’t: we wrote them in good old fashioned imperative if-then-else using Spring Boot: we had projects to deliver and the reactive frameworks weren’t quite mature enough to convince our stakeholders to go for it.

But then, early in 2018, Spring Boot version 2.0 went GA and delivered a very capable reactive library to our chosen framework. It was time to get down into the weeds with reactive microservices.

What are Reactive Microservices?

Reactive programming is a paradigm where the application is anchored around data streams and propagation of change. Some definitions:

  • Microservices are small, self contained services that scale well and are independently deployable.
  • Reactive microservices are microservices that conform to the Reactive Manifesto:

We believe that a coherent approach to systems architecture is needed, and we believe that all necessary aspects are already recognised individually: we want systems that are Responsive, Resilient, Elastic and Message Driven. We call these Reactive Systems
Source: The Reactive Manifesto

The Reactive Manifesto takes the paradigm of reactive programming and lays out explicit application behaviour. It’s worth reading and understanding in full, but we’ll summarise it for you now. Let’s break down the four key concepts:

  • Responsive: services respond consistently in a timely manner and establish a reliable upper bound for response times.
  • Resilient: services stay responsive in the event of failure. By making use of replication, containment, isolation and delegation patterns, they ensure that failures in one component do not affect another.
  • Elastic: services react to changes in demand by increasing or decreasing resources allocated to them as required.
  • Message Driven: services use asynchronous messaging at the boundaries between components, driving loose coupling, isolation and location transparency. Non-blocking, asynchronous communication protocols allow systems to use resources only when there is something to be done.

But what does all that really mean?

Reactive Microservices are the computer program equivalent of that hardworking, conscientious co-worker who is always doing something useful, as opposed to the layabout who will say they are working when they send an email and then sit around waiting for the reply.

Less of this: xkcd 303, "Compiling".

In plain English, reactive means the service works well:

  • It doesn’t tie up vital resources, such as CPU threads, when it’s not doing anything – like waiting around for bytes to appear over a network connection from some remote web server
  • It can scale easily and talk to other services regardless of where that service runs
  • When part of a system fails, it only affects the bit that’s broken – other unrelated parts of the application can continue working as though nothing is wrong

Why Haven’t Reactive Microservices Taken Over The World (Yet)?

This is a reasonable question, and it has some interesting answers. Reactive has already taken over some aspects of computing, especially the user-space file system and networking APIs of virtually any modern language.

The main reason it hasn't really taken hold in enterprise application development is that, across the range of things a typical application needs to do – make HTTP requests over the network, query a database – reactive tools simply weren't available. Until now!

Reactive in Java

So now we know about reactive microservices, let’s talk about them in our usual tech stack: Java.

The underlying design principles of reactive services are nothing new, and (hopefully) you’ve been doing this stuff for years:

  • Writing small, independently deployable components
  • Designing clean APIs with true separation of concerns
  • Deploying your applications in containers

This is a good start, but there is still more to achieve. The bad guy here is Blocking I/O.

A Brief Foray Into Java History

The old way of developing microservices used the classic servlet style. This was a simpler time, when applications were not too fussed about dispatching a request and then waiting around for the response to come back, tying up an entire OS thread while that happened. The code was simpler too. For our classic Java application, this means we have API endpoints in Controller classes that utilise a set of Service classes implementing various business logic, all wrapped up in a servlet engine.

Servlet engines such as Tomcat, Jetty, Websphere and WebLogic are all very well engineered, but the Servlet specification that they implement has not aged quite so well.

They all suffer from a variant of the 10k connection problem: they struggle to scale past about 10,000 concurrent connections.

The primary reason for this is that v2.1 servlet engines typically block on I/O. Not only that, but they allocate a thread per incoming request which then executes the necessary code, waiting patiently for network clients to return results, until it finally sends its response to the caller.

Version 3 and 3.1 of the Servlet specification resolved this somewhat, by providing asynchronous readers and writers via the ReadListener and WriteListener interfaces. These are pretty clever in how they mediate between clients and each servlet invocation, but all code inside the service call is still synchronous and will block on outbound I/O.
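
For a feel of that style, here's a rough sketch of a Servlet 3.1 asynchronous read (illustrative only, not production code): the container calls us back when bytes are available, rather than us blocking on read().

import java.io.IOException;
import javax.servlet.AsyncContext;
import javax.servlet.ReadListener;
import javax.servlet.ServletInputStream;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class AsyncReadServlet extends HttpServlet {

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        AsyncContext ctx = req.startAsync();
        ServletInputStream in = req.getInputStream();
        in.setReadListener(new ReadListener() {
            @Override
            public void onDataAvailable() throws IOException {
                // drain whatever has arrived without blocking the thread
                byte[] buf = new byte[1024];
                while (in.isReady() && in.read(buf) != -1) { /* consume */ }
            }

            @Override
            public void onAllDataRead() {
                ctx.complete();   // request body fully read – finish the exchange
            }

            @Override
            public void onError(Throwable t) {
                ctx.complete();
            }
        });
    }
}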

Back To Today

All the good modern Java HTTP clients – Apache HttpComponents v4, OkHttp, RestTemplate, Jersey Client – are well written and efficient, utilising connection pooling and sensible thread management. But they all use blocking I/O.

Like we said before, the classic Java servlet application will scale to around 10,000 concurrent requests. That's a lot of requests, but once you go past that, things go bad fast. A major reason they go bad is that they use Java threads, each backed by an Operating System thread, which require:

  • Time: a CPU context switch and 2 system calls
  • Resources: the JVM will typically allocate between 256KB and 512KB for the stack per thread. NOTE: this is off-heap memory!

Java threads (and OS threads) are simply too expensive to be sitting around doing nothing while waiting for a mobile phone on a GPRS connection to dribble its bytes over the internet! With the current Java threading and memory model, it is simply not possible to scale to millions of concurrent operations per JVM.
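
To make the contrast concrete (and as a small peek at where the next posts are heading), here's a hedged sketch – hypothetical methods, not from a real service: the blocking call parks its thread for the entire round trip, while the reactive one returns immediately and lets a handful of event-loop threads complete the work as bytes arrive.

import org.springframework.web.client.RestTemplate;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Mono;

public class BlockingVsReactive {

    // Blocking: this thread does nothing useful for the whole network round trip.
    static String blockingFetch(RestTemplate restTemplate, String url) {
        return restTemplate.getForObject(url, String.class);
    }

    // Non-blocking: returns immediately with a Mono; an event-loop thread
    // finishes the work when the response actually arrives.
    static Mono<String> reactiveFetch(WebClient webClient, String url) {
        return webClient.get().uri(url).retrieve().bodyToMono(String.class);
    }
}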

So… how do we break through this barrier?

MOAR THREADS???
No.

Use golang?
Uhh… no. Too confusing with its GOPATHS, back to front type declarations and whatnot. I realise this is not a great argument against Go, work with me here.

Node!
Please, let’s be serious! We are enterprise!

Highly scalable Java applications are impossible!
They were. Until now. Enter the reactor…

The Reactor Engine

The reactor engine is the thing that turns the layabout app into an eager, always-busy app, by enabling two key capabilities:

  1. It doesn’t block a thread on I/O – data transfer doesn’t occupy an OS thread while waiting for data to be received; and
  2. It handles backpressure – a mechanism that attempts to ensure producers don't overwhelm consumers, which works by having the producer slow down when the consumer of the I/O is too busy. If you've seen those "one vehicle per green" lights on a freeway entry ramp, you've seen a real-life application of backpressure – and there's a small sketch of it in code just after this list.
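
Here's that small sketch – a minimal reactor-core demonstration of our own (assumed names, not from any framework docs): a fast producer, a deliberately slow consumer, and limitRate() keeping upstream demand to small batches. Run it and the log() output shows the request(n) signals flowing upstream.

import java.time.Duration;
import reactor.core.publisher.Flux;

public class BackpressureDemo {

    public static void main(String[] args) {
        Flux.range(1, 1_000)                         // a fast producer of 1,000 numbers
                .log()                               // logs the request(n) demand signals
                .limitRate(10)                       // cap upstream demand to batches of 10
                .delayElements(Duration.ofMillis(5)) // simulate a slow consumer
                .blockLast();                        // block the demo's main thread until done
    }
}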

Non-blocking I/O and backpressure mean the application doesn't need to keep adding threads to service more requests. It scales with very few threads and matches the rate at which data is produced to the rate at which it can be consumed!

Many platforms and applications that utilise reactive principles have only 1 thread! Node.js, which has been the butt of many an enterprise developer joke, can handle 1,000,000 concurrent requests with a single thread, as can NGINX, Erlang and others.

Spring reactive gets its non-blocking abilities from Netty, which it uses for handling both server and client network I/O. Java applications built on this will use more than one thread, but in the order of 10 threads, as opposed to the hundreds or thousands that may have been configured in a classic servlet engine. In turn, the memory and CPU cost of context switching is avoided: time spent managing the application goes down, and time spent doing useful work goes up.

We have let the performance genie out of the bottle!

Conclusion

Now that we know what a reactive microservice is and why we should want to write one, it's time to have a look at what it means for the code.

Next time we’ll dive into a reactive Spring Boot application and see how it works under the hood.

Kong HQ

For our November Tech Forum, Vikas Vijendra from Kong visited our Melbourne office to bring us up to speed on what’s happening at KongHQ.

At Marlo we are already familiar with the Open Source Kong API Gateway and we like how it fits into our own digital enablement platform. Kong, however, are making a bold shift in product direction with the announcement of their Service Control Platform. They understand that while we might be focused on RESTful APIs today, the future will also include protocols such as gRPC, GraphQL and Kafka. Moreover, the advent of Kubernetes as the container platform of choice means Kong needs to extend into the cluster itself to provide full lifecycle service management.

The main features of the Service Control Platform are:

  • A centralized control plane to design, test, monitor and manage services
  • Multiple Runtimes – not just the nginx engine of Kong but also Istio, Kuma, Apollo and serverless
  • Multiple Protocols – REST, gRPC, GraphQL and Kafka
  • Multiple Platforms – All major cloud providers plus any Kubernetes

The open source API Gateway offering will remain, with most of the new features available in the Kong Enterprise offering. These include:

  • Kong for Kubernetes (K4K8S): a supported version of the Kong Ingress Gateway for Kubernetes along with all enterprise plugins
  • Kong Studio: for designing, mocking and testing APIs
  • Kong Manager: for the runtime monitoring and management of deployed services.
  • Kong Developer Portal: a self-service portal providing access to the service catalog.

All of the above features are available as a SaaS offering (Kong Cloud) or on-premise, or any combination of the two.

Perhaps most interesting is the announcement of the Kuma service mesh. An Ingress Controller alone is limited to managing traffic entering a cluster (north-south traffic). In a microservices architecture most of the traffic is between services on the same cluster (east-west traffic). A service mesh allows control of traffic between these services.

Of course, Istio is the dominant product in the service mesh space, but Kong (and others) believe Istio has become too complex and that Kuma provides a more appropriate level of functionality. The functionality of the Ingress Gateway and the service mesh will eventually morph into a single product controlling both north-south and east-west traffic.

At our latest tech forum, James Liu, Hybrid Application Modernization Specialist from Google, visited Marlo’s Melbourne office and presented on Google Anthos and more broadly on some of the exciting tech coming out of the Google Cloud Platform.

Anthos lets you build and manage modern hybrid applications in your data centre or in the public cloud. Built on open source technologies pioneered by Google—including Kubernetes, Istio, and Knative—Anthos enables consistency between on-premise and cloud environments. Anthos is a vital part of strategically enabling your business with transformational technologies like service mesh, containers, and microservices.

The main takeaways from the session include:

  • GKE (Google Kubernetes Engine) on-premise lets you create a fully managed Kubernetes cluster in your own data centre, controlled and managed from the Google console control plane – all over an HTTPS connection.
  • You will soon be able to run a Google-managed GKE cluster on any IaaS cloud provider (currently AWS only). This is a great approach for businesses needing a multi-cloud strategy.
  • Anthos Config Management provides a git-based common configuration tool for all policies across Kubernetes clusters both on-prem and in the cloud.
  • Google Cloud Service Mesh provides a fully managed Istio implementation. This represents the next stage of abstraction of the underlying infrastructure.

Marlo is a certified Google Partner working with large business and government clients across Australia and South East Asia. We are Australia’s leading specialists in the delivery of microservices and legacy integration solutions in cloud and on-premise environments.

Get in touch to find out how we can help enable your organisation’s digital journey.


Istio: Architecture, Application and Ease of Implementation

Tech Lead Vishal Raizada recently conducted a very informative Tech Forum at the Marlo office, presenting on Istio's architecture, application and ease of implementation.

Our tech forum presentation is downloadable here and showcases an example of Istio’s implementation, application and benefits.

Istio is now a key part of the Marlo Digital Enablement Platform – our open source, cloud-native platform which provides a complete on-demand environment for digital delivery.

The enterprise application landscape has changed a lot in the last decade: from managing on-premise servers to using infrastructure as a service; from monolithic applications to building microservices.

The new world offers many benefits, but it also introduces new challenges. With the distributed nature of the application landscape, service discovery and general application composition become extremely complex. Controls such as traffic management, security and observability, which could previously be managed in one place, now become a scattered problem.

Enter Istio, a service mesh framework that wraps around a cloud native architecture and adds a layer of abstraction to manage these complexities. It enables a truly automated delivery process, where a development team can focus purely on code and Istio handles the rest, including service discovery, security, circuit breaking and much more. In addition, it is programmable, so it can easily be incorporated into DevOps and DevSecOps processes. A service mesh gives control back to the enterprise application world without taking away any of the benefits.
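
To illustrate the "focus on code" point, here is a minimal sketch of a service-to-service call inside an Istio mesh. The service name and endpoint are hypothetical; the point is that the application carries no resilience logic of its own:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class BenefitsClient {

    private final HttpClient http = HttpClient.newHttpClient();

    // A plain HTTP call addressed by the Kubernetes service name.
    // Retries, timeouts, mTLS and circuit breaking are applied by the
    // Envoy sidecar that Istio injects, configured declaratively in
    // VirtualService and DestinationRule resources rather than in code.
    public String fetchBenefits(String customerId) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://benefits-service/api/benefits/" + customerId))
                .build();
        return http.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```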

Read Vish’s full presentation here.

Cutting Environment Costs In The Digital Age

If you’re a CIO, or an infrastructure manager, then you’ve probably got a mandate from the CFO or the CEO to cut costs. And you’re running a complex set of applications, across multiple environments – at least 3 (production, test and dev). Depending on how mature your infrastructure team is, you might already be running 5 or 6 environments, or even more.

But how many environments do you really need?

Multiple dev and test environments are needed because different projects and agile teams deliver at different cadences, and each wants its own separate environments. You're probably operating in multiple data centres and have to worry about multiple cloud providers and SaaS vendors.

If money was no object, you’d be scaling to 20 or 30 environments because that’s what your delivery teams are telling you that they need. Costs aren’t going down in line with your cost-cutting mandate, they’re going up.

So, here’s a radical thought: the number of environments that you actually need to look after is… 1. (And if you’re good, it might be none).

What Do You Actually Want, Anyway?

You want to do the things you need to be able to do and do them well. So, if you’re working for a brewing company, that means you need to ensure your company is good at making, selling and delivering beer.

But as the CIO, you’re in charge of the apps that enable all that good stuff. You want software that works, running in production, on kit that doesn’t fall over, at a reasonable cost. That’s about it.

If you didn’t have to worry about managing multiple non-production environments across the data centre and the cloud, and all the cost and complexity that comes with them, then we bet that, frankly, you’d give it all up tomorrow.

Getting to One

To see why you only need that one environment, and why you can get rid of all the rest, let's look at three key technologies that have grown up over the last 10 years: Cloud, DevOps, and APIs and microservices.

Cloud

The grand promise is simple: Cloud says infrastructure is available on demand. You can have any number of servers, at any scale, whenever you want them. As much as you like. Somewhere in Sydney, Tokyo, Stockholm, London, São Paulo or Mumbai is a data centre the size of a football field, and it's yours for the taking. If you want a dozen 128-CPU boxes with over 3TB of RAM, several petabytes of storage and 25-gigabit networking, they're all yours (as long as your credit card is working!). You can have this, literally in minutes, any time of day or night.

DevOps

We can go one step further than that: DevOps says that infrastructure is not only available on demand, but that it is code. You can automate the provisioning of infrastructure and, on top of that, automate the deployment of all your applications.

You can have software on demand, not just infrastructure. By extension you can construct an entire environment whenever you need it, wherever you need it – and again by extension, you can throw it away whenever you don’t need it.
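
As a sketch of what that looks like in practice, here is an illustrative infrastructure-as-code definition using the AWS CDK for Java (any IaC tool would make the same point; stack and resource names are made up):

```java
import software.amazon.awscdk.App;
import software.amazon.awscdk.Stack;
import software.amazon.awscdk.services.s3.Bucket;

public class DisposableEnvApp {

    public static void main(String[] args) {
        App app = new App();

        // An entire environment, defined in code and versioned in git
        Stack devStack = new Stack(app, "TeamADevStack");

        // Example resource the environment needs
        Bucket.Builder.create(devStack, "BuildArtifacts").build();

        app.synth(); // emit the deployable CloudFormation template
    }
}
```

`cdk deploy` stands the environment up in minutes; `cdk destroy` throws it away the moment it stops earning its keep.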

API’s and Microservices

But that’s not going quite far enough. The API Gateway means you can securely compartmentalise your environments – by insisting that every interaction between systems is mediated through an API gateway, you build a standard interface mechanism that is network-agnostic – so it matters less which network your API’s (and the (micro)services they provide façades for) live on. Coupled with the ability – in non-production environments at least – to mock and stub API services, this vastly reduces the need to be managing and running monolithic environments that contain all your services at once.

If your infrastructure is available on demand, and infrastructure is code, and environments are compartmentalised by API Gateways, then anyone can bring a dev or test environment – you don’t need to care where it is. It doesn’t need to be in your data centre, and it doesn’t really need to be in your VPC either.

Which Environments Do You Actually Need?

Production, maybe, and then only because you've still got legacy applications that you haven't yet hidden behind APIs. But give as much of that away as you can, as soon as you can, using the SaaS model as your template.

Wherever possible, you should outsource the problem of running dev environments to your vendors who do build and test. They should be doing it on their kit at their cost.

They’ll be super-efficient: there will be no dev environment running if they’re not actually doing dev right this minute, unless they enjoy the smell of burning (their own) money. There’s no point in you running dev environments any more. Platforms like Marlo’s Digital Enablement Platform [MDEP] provide rapid-start environments where dev teams can be up and running, building business code, in a few hours rather than days or weeks.

Furthermore, you should be making vendors run your testing environments for the applications that they’re delivering, and for the same reasons as dev. You still have to manage test data (and most organisations still have to solve for privacy, but they seem to manage that just fine when they implement Salesforce). And you’ll need to ensure that they make their environments available whenever end-to-end testing is going on.

What You’re Still Going To Have To Solve

  • Security provisioning and network access to any environments that you’re still running
  • Making sure that legacy applications have their own APIs (and API Gateways) in front of them, so they can be accessed safely by external developers
  • Vendor contracts that encourage the right behaviour when vendors run dev and test environments
  • Access to code (escrow arrangements)
  • Standards and guidelines for vendors delivering applications and services to you
  • Providing platforms like the Marlo Digital Enablement Platform [MDEP] to standardise and govern the way that your applications are built and deployed – mostly for non-functionals like security, monitoring, logging and auditing
  • Dependency management on a grand scale (but you already have this problem, and well-designed APIs help)

Conclusion

  • Make your vendors bring their own environments for digital delivery; embed requirements for how they should behave in contracts
  • Implement standards and guidelines for delivery – solve problems like containerisation, security, reliability, scalability, monitoring and logging in standard, cloud-native ways
  • Provide standardised platforms for hosting in production like MDEP, so that delivery can concentrate on business value
  • Engage with organisations like Marlo who truly understand the challenges of – and how to succeed in – today’s complex digital environments