Septimal Mind Blog

October 2019 (1381 Words, 8 Minutes)

SDLC infrastructure

Monorepo or Multirepo? Role-Based Repositories

Summary

When you have a lot of code, it’s always hard to find the right way to organize it.

Typically engineers choose between monorepo and multirepo layouts. Both approaches have well-known advantages and disadvantages which significantly affect team productivity.

It’s possible to establish a combined workflow that keeps all the advantages of a multirepo layout without giving up the positive traits of a monorepo layout.

We created a draft tool that implements our approach for Scala projects using SBT.

The problem

Monorepo: good and bad

The strengths of a monorepo layout are:

Coherence: you always have a coherent codebase that represents all your work. You may build all product components together and have good guarantees of their compatibility. When you make a change and your build finishes, you are done: you don’t have to modify and test any other repository or component that may be affected by the change. You may perform any global operation with a single command and be sure there will be no discrepancies between the expected and actual state of the codebase.
Cheap workflows: you may use just one CI job, you can easily release and deploy your components together, and you may easily refactor the code.

But there are significant shortcomings:

Isolation: a monorepo does not prevent engineers from using code they should not use. So, big projects in a monorepo have a tendency to degrade and become unmaintainable over time. It’s possible to enforce strict code review and artifact layout rules to prevent such degradation, but it’s not easy and it’s time-consuming.
Build time: if you have a monolithic project in a monorepo, you have to build (and often test) all the components together. It may be addressed by an incremental compiler, but it does not solve all issues. It may also be addressed by keeping independent projects within one repository but in that case most of the multirepo shortcomings (see below) apply.
Merge conflicts: teams working in a monorepo environment have to maintain a good VCS flow to avoid interference. While it’s a very good idea to teach engineers how to use Git properly, the discipline doesn’t come for free.
VCS actions take time: when you host a huge project (like Chromium) in Git, it may take a lot of time even to perform a checkout. This affects only huge projects and huge teams so it’s outside of the scope of this post.

Multirepo: good and bad

A multirepo layout is often considered the first answer to monorepo issues because:

It enforces strict isolation between independent software components,
It allows people to quickly build independent components,
It allows people not to interfere while working on independent projects.

However, multirepo also has serious drawbacks:

Global refactorings that affect a shared component are painful: even simple renames cannot be done in one click.
It may be hard to perform any kind of integration. When you have multiple components, you have to build a comprehensive orchestration solution for your integration testing and deployments, set up sophisticated CI flows, and so on.
If your release flow involves several components, it’s always painful.

These things are especially bad when you have explicit or implicit dependencies between your components, which is the typical case. Usually we have at least one shared library (aka SDK), and many or all of our components (aka microservices) depend on it.

The solution

The idea

Let’s assume that we have a product (an online auction platform, for example) consisting of several software components:

iam: Identity and Account Management Service
billing: Billing Service
analytics: Analytics Service
bidding: Bidding Service
catalog: Item Catalog Service

All these projects use one shared SDK named sdk.

We may also assume that several teams work on these projects. For example, we may assign the sdk, iam, and catalog projects to the “infrastructure” team, billing and analytics to the “finance” team, and bidding to the “store” team.

Imagine that we have a magic tool named project that allows us to choose which projects we want to work on and sets up the corresponding environment:

# Prepares workspace for all our components
project prepare *

# Prepares workspace to work on `billing` and `analytics`
# Pulls in `sdk` as well
project prepare +billing +analytics

# Prepares workspace to work on `sdk`, `iam` and `catalog`
project prepare :infrastructure

# Prepares cross-build project
project prepare --platforms js,jvm,native :infrastructure

This tool would need some kind of declarative description of our product stored in a repository. The rest can be as flexible as we want. For example, if we don’t want to keep all source code in one repository, the tool may pull the components from different repositories, take care of commits, and so on.

We may say that our repository has roles and at any time we may choose which roles we wish to activate. So, we may call this approach “Role-Based Repository”, or RBR.

Such a tool would solve most of the problems listed above. When we need to perform a global refactoring, we may generate an all-in-one project. When we wish to implement a quick patch, we may generate a project with just one component. When we need to integrate several components, we may choose exactly what we need.
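To make the idea more concrete, here is a minimal sketch of how such a tool might resolve selectors like the ones above into a set of components. Everything in it is hypothetical: the Component and Product names, the dependsOn field, and the exact selector semantics ("*" for everything, ":team" for a team's role, "+name" for a single component) are assumptions made for illustration, with the component and team names borrowed from the auction example.

```scala
// Hypothetical product model; names come from the auction example above.
final case class Component(name: String, team: String, dependsOn: Set[String] = Set.empty)

object Product {
  val components: List[Component] = List(
    Component("sdk", "infrastructure"),
    Component("iam", "infrastructure", Set("sdk")),
    Component("catalog", "infrastructure", Set("sdk")),
    Component("billing", "finance", Set("sdk")),
    Component("analytics", "finance", Set("sdk")),
    Component("bidding", "store", Set("sdk"))
  )

  private val byName: Map[String, Component] =
    components.map(c => c.name -> c).toMap

  /** Resolves selectors into component names, including transitive dependencies.
    * Assumed syntax: "*" = everything, ":team" = a team's components, "+name" = one component.
    * Unknown selectors and unknown component names are silently ignored.
    */
  def select(selectors: Seq[String]): Set[String] = {
    val roots: Set[String] = selectors.flatMap {
      case "*"                    => components.map(_.name)
      case s if s.startsWith(":") => components.filter(_.team == s.drop(1)).map(_.name)
      case s if s.startsWith("+") => List(s.drop(1)).filter(byName.contains)
      case _                      => Nil
    }.toSet
    closure(roots)
  }

  // Repeatedly adds direct dependencies until the set stops growing.
  @annotation.tailrec
  private def closure(selected: Set[String]): Set[String] = {
    val next = selected ++ selected.flatMap(n => byName(n).dependsOn)
    if (next == selected) selected else closure(next)
  }
}
```

With this model, Product.select(Seq("+billing", "+analytics")) yields billing, analytics and sdk, and Product.select(Seq(":infrastructure")) yields sdk, iam and catalog. A real tool would then generate a workspace from the selected set.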

The reality

Unfortunately, there is no such tool that is polyglot, convenient, and easy to use. Something can be done with Bazel, but as far as I know there are no good solutions at this moment (October 2019).

And things get worse when we need this for Scala. They get even worse when we need to work with cross-platform Scala environments (Scala.js and Scala Native).

SBT and IntelliJ IDEA

There is no sane way to exclude some projects from an SBT build based on selected criteria. You may write something like this:

lazy val conditionalProject = if (condition) {
  project.in(...)
} else {
  null
}

But it’s ugly, inconvenient and hard to compose.

Cross-platform projects have always been painful. It takes at least twice as long to build a cross-project. And there is no way to, for example, omit all the Scala.js projects from a build.

IDE support suffers as well: IDEA frequently fails to compile any project when the sbt-crossproject plugin is enabled, and it cannot run tests in cross-projects. And so on.

SBT builds become very verbose and hard to maintain when you use cross-projects. Usually you have to write at least 3 redundant expressions per artifact.
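For illustration, a single cross-built artifact declared with sbt-crossproject typically looks roughly like the fragment below. The artifact name core is made up, and the fragment assumes the Scala.js and sbt-crossproject plugins are already enabled in project/plugins.sbt (depending on plugin versions, an explicit import of crossProject and CrossType may also be needed). Treat it as a sketch of the boilerplate, not as a build taken from a real project.

```scala
// build.sbt fragment; `core` is a hypothetical artifact name.

// The cross-project itself, built for both JS and JVM.
lazy val core = crossProject(JSPlatform, JVMPlatform)
  .crossType(CrossType.Pure)
  .in(file("core"))

// One extra alias per platform, so that other projects and
// sbt commands can refer to the platform-specific builds.
lazy val coreJVM = core.jvm
lazy val coreJS  = core.js
```

Every additional artifact repeats the same pattern, which is where the verbosity comes from.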

sbtgen: a prototype of a tool for RBR flow

We’ve created our own dirty tool which prototypes the approach we wish to have. Essentially, it’s a library intended to be used in an Ammonite script; it takes declarative project definitions and emits SBT build files.
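The general shape of such a generator can be sketched in a few lines. This is not sbtgen's actual API: the Artifact and BuildGen names are hypothetical, and the sketch assumes that artifact names are valid Scala identifiers and that the selected set already contains every dependency of the selected artifacts.

```scala
// Hypothetical descriptor type; NOT sbtgen's real model.
final case class Artifact(name: String, dependsOn: List[String] = Nil)

object BuildGen {
  /** Renders one project definition as sbt DSL source text. */
  def renderProject(a: Artifact): String = {
    val base = s"""lazy val ${a.name} = project.in(file("${a.name}"))"""
    if (a.dependsOn.isEmpty) base
    else base + a.dependsOn.mkString("\n  .dependsOn(", ", ", ")")
  }

  /** Renders build file text containing only the selected artifacts.
    * Assumes `selected` is closed under dependencies.
    */
  def renderBuild(all: List[Artifact], selected: Set[String]): String =
    all.filter(a => selected.contains(a.name))
      .map(renderProject)
      .mkString("\n\n")
}
```

Calling BuildGen.renderBuild with sdk, billing and bidding declared but only sdk and billing selected produces definitions for those two projects and nothing for bidding. The point of the design is that exclusion happens in ordinary code before SBT ever loads the build, so the build file itself contains no conditionals.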

You may find a real project using it here. If you want to play with it, you need Coursier installed. After you clone the project, you may try the following commands:

# generates pure JVM project
./sbtgen.sc

# generates JVM/JS cross-project
./sbtgen.sc --js

# generates pure JVM project for just one of our components
./sbtgen.sc -u distage

Currently sbtgen is a very simple and dirty prototype, but it has made our team happy. Releasing is now easy, and whenever we need to, we may choose what to work on, what to build and what to test. Also, surprisingly, SBT startup time is much shorter when we generate our projects instead of using sophisticated plugins to avoid settings duplication.

I don’t encourage you to use sbtgen, but next time you think about organizing your code, consider the RBR workflow even if you would have to write your own code generator.

I can say for sure that you will not be disappointed.

Things to do

sbtgen needs to support multi-repository layouts. At this point all the source code needs to be kept together with the build descriptor.
I think such functionality should be incorporated into SBT. There are some plugins (sbt-projectmatrix and sriracha) that make SBT projects somewhat configurable and less rigid, but they are still far from what we actually need.

P.S.

The idea of roles is very useful in many different domains. For example, we may fuse microservices into “flexible monoliths”; check our slides. You may also read about our project, distage, a module system with an automatic solver for Scala. It allows you to build multi-role applications.

You may follow me on Twitter.

Updates

Nov/2019

The author of a similar tool for .NET/C# and JS projects approached me recently. Here you may find his tool. I think it’s a good proof that the idea is viable. I also think we need new flexible polyglot build tools that support a role-based approach (Bazel?). I have a computational model suitable for such tools, and one day I’ll make another post about it.