Intro

If you are a developer or software engineer who builds complex applications in a distributed system, chances are you are working with data-intensive applications.

In 2022, applications are more concerned with problems in the domain of data than with raw compute. Consistency, complexity, evolvability, and state management of data are among the key concerns around which we build and design our solutions. Compute is rarely the limiting factor, and when it does become one, many technologies and design patterns exist nowadays to add raw CPU power to our systems. The Y2K problem arose because, a long time ago, compute and memory were so limited that every optimisation had to be undertaken simply to make things work, and some were made without looking 10, 20, or 30 years into the future!

Data Systems

If you have built an application in a distributed system, you may have used any (and potentially even all!) of the following standard building blocks:

Databases: Store data so that one or many applications are able to find it again
Caches: Store the result of an expensive operation, speeding up reads
Search Indexes: Search data by a keyword or filter in various ways
Stream Processing: Send a message to another application to be handled asynchronously
Batch Processing: Process very large amounts of accumulated data

All of the above are typically thought of as different categories of tools, but they can be grouped under the umbrella term Data Systems because they face the same underlying design challenges that come with distributed systems. For example, a message broker and a NoSQL database are both concerned with the following:

Replication
Partitioning
Transactions
Consistency
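
To make the overlap concrete, here is a minimal sketch of one of these shared concerns, partitioning. The helper name partition_for and the key "user:42" are hypothetical, and hash-modulo routing is only one simple scheme chosen for illustration; it is not a description of how any particular database or broker does it.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash (hypothetical helper)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then wrap it
    # into the range [0, num_partitions).
    return int.from_bytes(digest[:8], "big") % num_partitions


# The same routing decision applies whether the key identifies a record
# in a datastore or a message on a broker topic.
row_partition = partition_for("user:42", 4)
message_partition = partition_for("user:42", 4)
```

Both calls land on the same partition because the hash is deterministic. One known weakness of plain modulo routing is that changing the number of partitions moves most keys, which is part of why partitioning is a design problem rather than a one-liner.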

Furthermore, the boundaries are becoming more and more blurred. Datastores can now be used as message queues (Redis), message queues can provide database-like guarantees (Kafka), and some message queues can also be used as key-value datastores (NATS). Even traditional RDBMSs such as MySQL and PostgreSQL are embracing the document data models commonly associated with distributed NoSQL databases.

Designing Data Systems

When building applications in a distributed system, we need to figure out which tools and patterns are the most appropriate for the task at hand. When we combine the tools mentioned above, we create a new special-purpose Data System from general-purpose components. No longer are we just application developers, but data system designers.

But designing data systems is no easy task, and it comes with many challenges:

How do we handle errors with the network or errors with our application nodes themselves?
How can we provide data integrity and consistency to clients in the face of service degradation?
What happens when our load increases 10x in one minute?

Additionally, there may be many factors specific to the situation of your company and/or project:

Legacy system requirements
Organisational risk tolerance
Regulatory constraints
Size and skill of your team

The list is seemingly endless, but rest assured that a robust, secure system that meets your requirements can always be designed, albeit with some trade-offs. The more challenges we try to solve, the more complex our system becomes, so we must also learn to understand and manage complexity. Too much complexity can encroach on maintainability, but an overly simplistic design may not account for all the ways your system may fail!

These ideas will be explored in detail in later articles, but first we need to really understand what the “task at hand” entails. The best way to do this is to model our assumptions about the system, both functional and non-functional.

Functional Requirements

Your functional requirements are the business-domain concerns of your application: What is it supposed to do? What are its features? Which features are more or less important to your customers?

Domain-driven design is a great framework for modelling and breaking down a complex domain. The details will differ from company to company and project to project, but there are first-principles approaches we can use to break down and understand our functional requirements. Read my blog series on Domain-Driven Design to get amongst it (TBD!).

Non-functional Requirements

We can categorise our non-functional requirements into three concerns:

Reliability: The system continues to work correctly in the face of hardware and software faults and human error.
Scalability: As the system becomes busier, good performance must be maintained.
Maintainability: People who work on the system, maintaining current behaviour and evolving functionality for new use cases, should be able to work on it productively.

Engineers will often throw around these terms, and sprinkle in a bit of CAP theorem, without a clear understanding of what they really mean.

For example, scalability is not a one-dimensional property; we cannot say that a system is scalable or not scalable without considering what it is scalable with respect to. In general, scalability is the idea that when we increase some metric of load that measures how busy our system is, the system as a whole should continue to provide reasonable performance. Additionally, performance is not a single property: it could be latency, throughput, response time, or anything else that you and your system care about!

When designing a system, we must make sure that our key measurables, such as load and performance, are defined in a way that makes sense for our application. When we break things down to this level of granularity, we gain clarity over what our problems are and can design solutions against them. If our assumptions about the problems we are solving are wrong, we may find ourselves building entirely the wrong thing. These steps, and the first-principles approach behind them, are repeatable across different problems and domains.
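
As a rough illustration of what defining those measurables can look like, the sketch below summarises a window of response times as throughput plus a few percentiles. The function names, the sample numbers, and the choice of the nearest-rank percentile method are all assumptions made for this example; your application may well care about different metrics.

```python
import math


def percentile(sorted_values, p):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = math.ceil(p / 100 * len(sorted_values))
    return sorted_values[max(rank, 1) - 1]


def summarise(response_times_ms, window_seconds):
    """Describe load (throughput) and performance (percentiles) for one window."""
    ordered = sorted(response_times_ms)
    return {
        "throughput_rps": len(ordered) / window_seconds,
        "p50_ms": percentile(ordered, 50),
        "p95_ms": percentile(ordered, 95),
        "p99_ms": percentile(ordered, 99),
    }


# Hypothetical response times (in milliseconds) observed over two seconds.
sample = [12, 15, 11, 14, 13, 250, 16, 12, 13, 900]
report = summarise(sample, window_seconds=2)
```

In this made-up sample the median looks healthy while the high percentiles are dominated by two slow requests, which is exactly why a single number rarely tells us whether performance is "reasonable".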

The book Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems explores ways of thinking about reliability, scalability, and maintainability, and takes a deep look at various techniques, architectures, and algorithms for achieving those goals. This article is the first of many that serve to educate myself and others about the concepts discussed in the book through the lens of a software developer!

Conclusion

In a nutshell:

When designing a complex distributed system, we should be conscious of the operations that are happening: what they are, how often they happen, and how they can best be enabled.

Sources

Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems.