- Start with a small candidate that is easy to extract, in order to gain early experiences with microservices
- Focus on build and deployment automation and monitoring up front
- Handle cross-cutting concerns early on to avoid counter-productive consequences, such as feeding the monolith or re-implementing cross-cutting concerns with every microservice
- Design your system event-driven to be easy to evolve, and consider event streams to reduce the overhead of data duplication and to lower the barrier of entry for new microservices
- Be aware that the transformation process to microservices does not run in isolation. Instead, it's affected by a lot of circumstances. Watch out for circumstances that hold you back and slow you down, and adjust them accordingly or at least create awareness throughout your organization
When starting a journey to microservices, knowing what to consider might be overwhelming, especially with a small team. Unfortunately, there is no easily applicable golden rule. Every journey is different, since every organization is facing different circumstances. In this article I am sharing some lessons learned and challenges from a startup perspective, and what I would do differently the next time introducing microservices.
How our journey from a monolith to microservices started
At the very beginning we started with a monolith in every aspect: we had one team working on one collaboration product, implemented as one code base, based on one technology stack. That worked fine for a while.
After a while everything evolved: the team was growing, we added more and more features to our product, the code base got bigger and bigger, the number of users increased. Which sounds great, right? But …
It took quite a long time to get things done - meetings, discussions, and decisions took longer than before. Responsibilities were not clearly assigned. It took some time before someone felt responsible, e.g. when a bug occurred. Our processes were slowing down and our productivity suffered.
The more features we added, the more complex the usability of our product became. Its usability and user experience suffered through continuous feature amendments. Instead of solving our users’ problem well, we were increasingly confusing them.
Due to the monolithic software architecture, it was difficult to add new features without affecting the entire system and it was quite complex to release new changes, since we had to rebuild and redeploy the entire product, even though we changed only a few lines of code. That resulted in high risk deployments which happened less frequently - new features got released slowly.
The need to split and shift things emerged.
Over three years ago we changed our product strategy. We were focusing on usability and user experience improvements, and split our one product JUST SOCIAL into separate apps - each of them taking care of a specific use case. We evolved the idea to provide different apps for sharing documents, communicating in real-time, managing tasks and sharing editorial content and corporate news, as well as managing profiles.
In the meantime, we were splitting our one team into multiple smaller teams and assigned to each of them a specific set of collaboration apps to achieve well-defined responsibilities. We wanted to establish autonomous teams, enabling them to work on different parts of the system independently at their own pace, with minimal impact across teams.
After having divided our one product into separate collaboration apps and having split our one team into multiple smaller teams, the next logical, reasonable step was to reflect autonomy and flexibility in our software architecture as well - by introducing microservices.
Our motivation for introducing microservices was to enable teams to work autonomously on different parts of the system at their own pace, with minimal impact across teams. By developing, deploying and scaling our collaboration apps independently, we wanted to release changes quickly.
We started our microservices journey by identifying good candidates for microservices first. To identify good candidates, we had to consider the key concepts of modelling good services. The key concepts follow the principles of loose coupling between services and high cohesion within a service. High cohesion within a service is typically reflected by related behaviour that should stay consistent. In Domain Driven Design, related behaviour is reflected by Bounded Contexts. Bounded Contexts are semantic boundaries within which your domain model lives; they describe services that are responsible for a well-defined business function.
In our case, we used our collaboration apps as high-level Bounded Contexts, reflecting coarse-grained service boundaries. They represented a good starting point to divide them into finer-grained services later on.
We started with the Bounded Context of JUST DRIVE - the collaboration app taking care of document management. Each document is created by an author. The author data stems from the profile data, which is managed by the Bounded Context of profile management still residing in the monolith.
We built it as a co-existing service and built it from scratch. It was actually not an exact equivalent of the current one; instead, we introduced a new UI, added more features and made significant changes to the data structure. The Bounded Context of the new service is composed of its domain model taking care of the business logic, the Application Service orchestrating use cases and managing transactions and its input- and output-adapters, such as REST endpoints and adapters for persistence management. The new service owns the document state exclusively - that’s the only service that can both read and write documents.
As mentioned before, each document is created by an author, and the author data stems from the profile data managed by the monolith.
The question arose of how the new service and the monolith would interact with each other.
To avoid requesting the author data from the profile service each time we were displaying a document, we kept a local copy of the relevant author data in our new service. The data redundancy is ok as long as data ownership is not undermined - as long as the profile related Bounded Context still owns the profile state exclusively.
Since the local copy and the original data could diverge over time, the monolith needs to notify our new service whenever a profile has been updated. As soon as a profile has changed, the monolith publishes a ProfileUpdatedEvent, to which the new service has subscribed. The new service consumes this event and updates its local copy accordingly.
This event-driven service interaction increased decoupling between services, since we did not have to send a remote cross-context query directly to the monolith. It increased autonomy, since the new service could do whatever it liked with its local copy, and it made joins more efficient, since the service could join the author data locally from its own copy instead of over the network.
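The interaction described above can be sketched in a few lines of plain Python. This is only an illustration under assumptions, not our actual implementation: the event payload fields (`profile_id`, `display_name`), the class names `AuthorReadModel` and `DocumentService`, and the in-memory dictionaries are all hypothetical, and the message broker is replaced by direct method calls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProfileUpdatedEvent:
    # Hypothetical payload; the article only names the event type.
    profile_id: str
    display_name: str


class AuthorReadModel:
    """Local copy of author data kept inside the document service."""

    def __init__(self):
        self._authors = {}

    def on_profile_updated(self, event):
        # The monolith still owns the profile; we only mirror what we display.
        self._authors[event.profile_id] = event.display_name

    def author_name(self, profile_id):
        return self._authors.get(profile_id, "unknown author")


class DocumentService:
    """Owns the document state and joins author data from the local copy."""

    def __init__(self, authors):
        self._authors = authors
        self._documents = {}

    def create_document(self, doc_id, title, author_id):
        self._documents[doc_id] = {"title": title, "author_id": author_id}

    def view_document(self, doc_id):
        doc = self._documents[doc_id]
        # Local join: no remote call to the monolith at read time.
        return {
            "title": doc["title"],
            "author": self._authors.author_name(doc["author_id"]),
        }


authors = AuthorReadModel()
documents = DocumentService(authors)

# In a real system these events would arrive through a message broker.
authors.on_profile_updated(ProfileUpdatedEvent("p1", "Ada"))
documents.create_document("d1", "Roadmap", "p1")
before = documents.view_document("d1")

authors.on_profile_updated(ProfileUpdatedEvent("p1", "Ada L."))
after = documents.view_document("d1")
```

The document service never writes profile data on its own initiative; it only reacts to events, so ownership of the profile state stays with the monolith.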
We started with a co-existing service from scratch, and introduced event-driven service interaction for data duplication purposes.
What challenges we faced and how we dealt with them
A co-existing service from scratch is in general a good decomposition strategy, especially if you want to move away from something, e.g. if you would like to move away from obsolete business logic or from your technology stack. But in our case we wove too many steps into the decomposition of our first service at once. As described earlier, we not only built a co-existing service from scratch, but also introduced a new UI, added more features and made significant changes to the data structure. We took so much load on our shoulders in the beginning that we got results very late. But especially in the beginning, it’s very important to get fast results to gain early experience and confidence with microservices.
With the next candidate, we followed a different approach. We focused on the high-level Bounded Context of our chat app next and followed an incremental top-down decomposition strategy by extracting existing code, step by step. We extracted the UI first as a separate web app and introduced a REST-API on the monolith side that the extracted web app could access. At this step we could develop and deploy the web app independently, which allowed us to iterate on the UI rapidly.
After having extracted the UI we could now go further down and decompose the business logic. Untangling business logic creates significant code changes. Depending on the dependencies, we might need to provide a temporary REST API that the monolith uses to address the extracted business logic. At this stage we are still sharing the same data storage.
To become an uncoupled standalone service, we finally need to split the data storage to ensure the new service owns the chat state exclusively.
Each chat discussion involves participants. The chat participant data stems from the profile data residing in the monolith. As described in the previous JUST DRIVE example, we kept a local copy of the chat participant data and subscribed to the ProfileUpdatedEvent to keep this local copy in sync with the original data from the monolith.
From that point on, we could go on and carve out the next Bounded Context from the monolith or divide our coarse-grained services into finer-grained ones later on.
Another challenge was authorization handling.
With almost every service, we were confronted with the question of how to handle authorization. To give you some context: the authorization handling is very fine-grained, down to domain object level. Each collaboration app is controlling the authorization of its domain objects, e.g. the authorization of a document is controlled by the authorization settings of the parent folder it resides in.
On the other hand, the authorization is not only fine-grained, but also inter-service dependent; in some cases the authorization of a domain object also depends on the authorization information of parent domain objects residing in a different service, e.g. reading or adding a document attached to a content page depends on the authorization settings of this page, which resides in a different service than the document itself.
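To make the inter-service dependency concrete, here is a small hypothetical sketch. The dictionaries stand in for authorization data owned by two different services, and all names (`FOLDER_READERS`, `PAGE_READERS`, `DOCUMENT_PARENTS`, `can_read_document`) are invented for illustration; in a distributed system, the page lookup would require data from another service.

```python
# Hypothetical in-memory stand-ins for two services' authorization data.
FOLDER_READERS = {"folder-1": {"alice", "bob"}}  # owned by the document service
PAGE_READERS = {"page-1": {"alice"}}             # owned by the content service

# Each document records where it lives: in a folder or attached to a page.
DOCUMENT_PARENTS = {
    "doc-1": ("folder", "folder-1"),
    "doc-2": ("page", "page-1"),
}


def can_read_document(user, document_id):
    """Resolve read access through the document's parent object."""
    parent_type, parent_id = DOCUMENT_PARENTS[document_id]
    if parent_type == "folder":
        # Local decision: the folder settings live in the same service.
        return user in FOLDER_READERS.get(parent_id, set())
    # Cross-service decision: the page settings belong to another service.
    return user in PAGE_READERS.get(parent_id, set())
```

The second branch is the part that a per-service authorization implementation cannot answer from its own data alone.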
Due to these complex requirements, solving distributed authorization caused us a lot of headaches, and we did not provide a solution early on. What happened as a consequence was quite counter-productive. One consequence was that we added a new service to the part of the system where authorization had already been solved: the monolith. We fed the monolith instead of shrinking it. Another consequence was that we started to implement authorization per service. That looked reasonable to us in the very beginning, since our early assumption was that authorization belonged to the same bounded context the domain model lives in, but we missed the inter-service dependencies. As a result, we were copying data back and forth and introduced the risk of collision.
To make a long story short: we merged authorization handling to a centralized microservice in the end.
Along with centralized services comes the risk of introducing a distributed monolith. When you change one part of your system and you have to change another part at the same time, it’s a strong indicator of having introduced a distributed monolith. For instance in our case, when we are introducing a new collaboration app that requires authorization and we need to adjust the centralized authorization service at the same time, we are combining the disadvantages of both worlds: The services are tightly coupled, and in addition, have to communicate over a slow, unreliable network.
Instead, we provided a common contract that the centralized authorization service owns and that all downstream services conform to. In our case, services translate authorization-related actions into a common contract that the authorization service understands without extra translation. The translation happens in each downstream service, not in the centralized authorization service. This common contract makes sure that we can now introduce new services without touching and redeploying the centralized authorization service at the same time. One prerequisite is that this common contract is stable, or at least backward-compatible; otherwise you are shifting the problem to the downstream services, which would need to be updated constantly.
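A minimal sketch of such a contract might look as follows. The shape of the contract (`AuthorizationRequest` with a subject, a resource and a `Permission`), the action names, and the in-memory grant store are assumptions made for illustration; the article does not specify the real contract.

```python
from dataclasses import dataclass
from enum import Enum


class Permission(Enum):
    READ = "read"
    WRITE = "write"


@dataclass(frozen=True)
class AuthorizationRequest:
    """Hypothetical common contract owned by the authorization service."""
    subject_id: str
    resource_id: str
    permission: Permission


class AuthorizationService:
    """Knows only the common contract, nothing about documents or chats."""

    def __init__(self):
        self._grants = set()

    def grant(self, subject_id, resource_id, permission):
        self._grants.add((subject_id, resource_id, permission))

    def is_allowed(self, request):
        key = (request.subject_id, request.resource_id, request.permission)
        return key in self._grants


# Translation lives in each downstream service (action names are invented).
DOCUMENT_ACTIONS = {"download": Permission.READ, "rename": Permission.WRITE}
CHAT_ACTIONS = {"read_history": Permission.READ, "post_message": Permission.WRITE}


def document_request(user_id, document_id, action):
    return AuthorizationRequest(user_id, f"document:{document_id}", DOCUMENT_ACTIONS[action])


def chat_request(user_id, chat_id, action):
    return AuthorizationRequest(user_id, f"chat:{chat_id}", CHAT_ACTIONS[action])


authz = AuthorizationService()
authz.grant("u1", "document:d1", Permission.READ)
authz.grant("u1", "chat:c1", Permission.WRITE)
```

Adding a new app in this sketch means adding a new translation function on the downstream side; `AuthorizationService` itself stays unchanged and does not need to be redeployed.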
What we have learned
Especially in the beginning, it’s better to start with small services that are easy to extract in order to get fast results and gain early experiences with microservices. If dealing with coarse-grained large services, for us it was more manageable to break the decomposition into incremental steps, e.g. an incremental top down decomposition - doing one manageable step at a time.
Handling cross-cutting concerns early on is critical in order to avoid counter-productive consequences, such as feeding the monolith instead of shrinking it or reimplementing cross-cutting concerns with every service.
When introducing a centralized cross-cutting service, it’s necessary to be careful not to introduce a distributed monolith. In this case, a common stable contract helps to avoid a distributed monolith.
To design a system to be easy to evolve, event-driven service interaction is key to achieving high decoupling between services. Events can be used for notification and for data duplication purposes (event driven state transfer; see example “Co-Existing Service from Scratch” above) and as a primary datasource through an event store by retaining events long term.
When using events purely for notification purposes, additional data from another context is typically requested by a remote cross-context query directly to its source, e.g. through a REST request. We might prefer the simplicity of a remote query rather than dealing with the overhead of maintaining datasets locally, especially when datasets grow. But remote queries are adding a lot of coupling between services and tying services together at runtime.
We can avoid remote queries to another context by internalising them, by introducing a local copy of the relevant cross-context data. As described in the previous JUST DRIVE example, to avoid requesting the relevant author data from the profile service each time we were displaying a document, we duplicated the author data and kept a local copy within our document microservice. We need to keep the duplicated data in sync with the original data - meaning updating our local copy as soon as the original data has changed. To be notified of modified data, the service subscribes to that event containing the changed data and updates its local copy accordingly. In this case, events are used for data duplication purposes, which eliminates remote queries and increases decoupling between services. It also achieves better autonomy, since the service can potentially do whatever it likes with that local copy.
For event-driven service interaction, we have introduced Apache Kafka early on - a distributed, fault-tolerant, scalable commit log service. First, we used Apache Kafka mainly for notification and data duplication purposes. Recently, we introduced Apache Kafka Streams as a shared source of truth to eliminate data duplication overhead, and to achieve high pluggability of services and a lower barrier to entry for new services.
A stream is an unbounded, ordered, continuously updated sequence of structured data records. A data record consists of a key-value pair.
When starting your service in the Apache Kafka streaming context, a Kafka topic will be loaded into your stream that you can process in the scope of your service. A topic is a logical category to which services can publish and subscribe. Each stream is buffered in a state store - a lightweight embedded disk-backed database. The loaded stream is used in your own codebase and is not running on the Kafka broker; it's running in the process of your microservice. Streams make data available wherever it's needed, which increases performance and autonomy.
Apache Kafka comes with a Streams API. Streams can be joined, filtered, grouped or aggregated using a Domain Specific Language (DSL), and the messages in a stream can be processed one at a time using function-like operations such as map, transform, peek, etc.
When implementing stream processing, you typically need both a stream and a database for enrichment. Kafka’s Streams API provides such functionality through its core abstractions for streams and tables. There is actually a close relationship between streams and tables; the so-called stream-table duality. A stream can be considered as a changelog of a table, where each data record in the stream captures a state change of the table. A table can be considered as a snapshot, at a point in time, of the latest value for each key in a stream.
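The stream-table duality can be illustrated without Kafka at all. The following plain-Python sketch is not the Kafka Streams API: the helper `table_from_changelog` and the sample records are hypothetical, and a `None` value is treated as a delete marker, similar to how a null value acts as a tombstone for a key in a Kafka table.

```python
def table_from_changelog(changelog):
    """Fold a stream of (key, value) records into the latest value per key."""
    table = {}
    for key, value in changelog:
        if value is None:
            # A None value is treated as a delete marker for that key.
            table.pop(key, None)
        else:
            table[key] = value
    return table


# The stream: every record captures one state change of the table.
profile_changelog = [
    ("p1", "Ada"),
    ("p2", "Grace"),
    ("p1", "Ada L."),
    ("p2", None),
]

# The table: a snapshot of the latest value per key at a point in time.
snapshot_after_two = table_from_changelog(profile_changelog[:2])
snapshot_latest = table_from_changelog(profile_changelog)
```

Replaying a longer prefix of the same stream yields a later snapshot of the same table, which is why retaining the stream is enough to rebuild the table at any time.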
When we want to display a document with its author data, with Kafka Streams we can now do the following: the document service is creating a KStream from the document topic and would like to enrich the document data with author related profile data coming from the profile topic. For this enrichment, the document service is creating a KTable from the profile topic. We can now join the stream and table, and store its result as a new state store which can be accessed from outside – to work as an inbuilt Materialized View. Whenever a profile or document gets updated, its related Materialized View gets updated, too.
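The behaviour of such a view can be simulated in plain Python. This sketch deliberately does not use the Kafka Streams API; the class `DocumentWithAuthorView`, its method names and the record shapes are hypothetical. It only models the outcome described above: a view that is refreshed whenever a document or a profile record arrives. Which join type produces exactly this behaviour in Kafka Streams is outside the scope of the sketch.

```python
class DocumentWithAuthorView:
    """In-memory simulation of a document view enriched with author data."""

    def __init__(self):
        self._documents = {}  # doc_id -> {"title": ..., "author_id": ...}
        self._profiles = {}   # profile_id -> display name (the table side)
        self.view = {}        # doc_id -> enriched document

    def on_document(self, doc_id, document):
        # A new record on the (hypothetical) document topic.
        self._documents[doc_id] = document
        self._refresh(doc_id)

    def on_profile(self, profile_id, display_name):
        # A new record on the (hypothetical) profile topic.
        self._profiles[profile_id] = display_name
        for doc_id, document in self._documents.items():
            if document["author_id"] == profile_id:
                self._refresh(doc_id)

    def _refresh(self, doc_id):
        document = self._documents[doc_id]
        self.view[doc_id] = {
            "title": document["title"],
            "author": self._profiles.get(document["author_id"]),
        }


view = DocumentWithAuthorView()
view.on_profile("p1", "Ada")
view.on_document("d1", {"title": "Roadmap", "author_id": "p1"})
first = dict(view.view["d1"])

view.on_profile("p1", "Ada L.")
second = dict(view.view["d1"])
```

A newly added service could build the same view by replaying both topics from the beginning, provided the events are retained.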
Compared to the other event-driven approaches, Apache Kafka Streams does not require us to maintain a local copy ourselves and keep it in sync, which reduces the overhead of data duplication. It pushes data to where it’s needed, and runs in the same process as your service. It increases pluggability; you can plug in a new service and use the stream right away without setting up extra data stores. It reduces overhead, increases performance and autonomy, and lowers the barrier to entry for new services.
The transformation process does not run in isolation; rather, it’s affected by a variety of circumstances. Your team size, structure and skillset have an impact on what is manageable for you, especially in the beginning, e.g. a small team with few DevOps practices in place will have an impact on the transformation velocity.
Your transformation process is also affected by the fact that you still have to take care of your legacy system. The time for its maintenance reduces the time available for the transformation process. Your runtime environment is also affecting your journey. Are you running on-premises or in the cloud? Can you rely on managed services, e.g. a managed API gateway, or do you have to set it up and maintain it yourself?
And if your strategy is to introduce new features in a short period of time, you might struggle with the decision of where to implement the new requirements: as a new standalone service which takes time, or taking a shortcut and adding it to the monolith - and risking feeding the monolith instead of shrinking it.
Watch out for circumstances that hold you back and slow you down, and adjust them accordingly, or at least create awareness throughout your organization. And keep in mind: every journey is different – your journey might look totally different than ours.
What I would do differently next time introducing microservices
First of all, I would check whether the organization’s strategy is aligned with the microservices' goals of maximal product velocity and releasing changes independently and quickly. If, for example, your organization focuses on long release cycles and deploying everything together, then microservices might not be an optimal choice, since you cannot take advantage of microservices to their full extent.
If you decide to go on a microservices journey, it's necessary that everyone be committed - including management. And everyone needs to be aware that this journey is complex and time consuming - especially in the beginning when you don’t have much experience yet.
Product-aligned, cross-functional, autonomous teams work very well with microservices, but the shift towards a DevOps culture should be considered very early on. Each team should be ready for constant iterations and be able to develop, release, operate and monitor the services they are responsible for.
Decomposing the monolith into multiple, independent services is just one part of the journey - operating them is another. The more services you have, the more critical it becomes to automate their build and deployment processes.
If I had to do the journey again, I would start with a small candidate that is easy to extract and would focus not only on its decomposition, but also on build and deployment automation, and monitoring up front with the very first services - that can be used as a foundation for future services. To create that foundation, it might be helpful to build an ephemeral task force composed of individual people from each team.
Each microservice should have its own CI/CD pipeline from the very beginning. Another consideration is to containerize each microservice to get a lightweight, encapsulated runtime environment consistent across stages - especially if you focus on running your services in a cloud environment later on.
Also, monitoring, including log aggregation, should be considered early on. Monitoring not only server metrics but also service metrics, such as request latency, throughput and error rate, is necessary to keep track of the services' health and availability. Structuring and standardizing log output, such as time format (e.g. ISO 8601) and timezone (e.g. UTC), and introducing request contexts with correlation IDs and log aggregation, facilitate diagnostic and forensic processes.
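As one possible illustration of standardized log output, the following sketch uses Python's standard `logging` module to emit JSON lines with an ISO 8601 timestamp in UTC and a correlation ID. The field names, the logger name `document-service`, the `handle_request` function and the use of a context variable for the request context are assumptions, not a prescribed format.

```python
import json
import logging
from contextvars import ContextVar
from datetime import datetime, timezone

# Request context: set once per incoming request, read by every log line.
correlation_id = ContextVar("correlation_id", default="-")


class JsonFormatter(logging.Formatter):
    """Formats each record as one JSON line with a UTC ISO 8601 timestamp."""

    def format(self, record):
        timestamp = datetime.fromtimestamp(record.created, tz=timezone.utc)
        return json.dumps({
            "timestamp": timestamp.isoformat(),
            "level": record.levelname,
            "service": record.name,
            "correlation_id": correlation_id.get(),
            "message": record.getMessage(),
        })


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("document-service")  # hypothetical service name
logger.addHandler(handler)
logger.setLevel(logging.INFO)


def handle_request(request_id, doc_id):
    # Hypothetical request handler: tag all log lines of this request.
    token = correlation_id.set(request_id)
    try:
        logger.info("document %s viewed", doc_id)
    finally:
        correlation_id.reset(token)


handle_request("req-42", "d1")
```

If every service logs the same fields and passes the correlation ID along with outgoing calls, a log aggregation tool can reconstruct the path of a single request across services.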
A lot of things need to be covered up front, which is time consuming and requires awareness throughout the entire organization. Microservices are an investment for achieving maximal product velocity, and not about cutting costs.
To remain competitive in the market, product velocity and continuous improvements are some of the key factors in differentiating yourself from your competitors. Microservices can promote your product velocity and continuous improvements, but only if everyone is committed, including management.
About the Author
Susanne Kaiser is an independent tech consultant from Hamburg, Germany, and previously worked as a startup CTO, transforming the company's SaaS solution from monolith to microservices. She has a background in computer science and over 15 years of experience in software development and software architecture, and regularly presents at international tech conferences.