A Modern Stack for Building Scalable Systems
Dive into modern stack recommendations for building robust and scalable systems. This stack is language-agnostic and can be integrated with most modern programming languages.
In software engineering, we have a lot of tools: tens or even hundreds of different tools, products, and platforms. We have SQL databases, NoSQL databases with multiple subtypes, queues, data streaming platforms, caches, orchestrators, cloud providers, and cloud versions of all of the above. There is more than enough to choose from.
In this article, I want to describe a “basic” modern stack that will allow you to build robust and scalable systems. The tools in it are language-agnostic and can be easily integrated with most modern programming languages.
Disclaimer
The content of this text and the recommendations presented are loosely based on the annual JetBrains developer survey and my own experience.
Problem | Tool Recommendation |
---|---|
Orchestration | Kubernetes |
Cloud Providers | AWS/GCP |
Relational DB | Postgres |
Non-Relational DB | MongoDB/BigQuery |
Messaging/Streaming | Kafka |
Caching | Redis |
Monitoring | Prometheus/Grafana |
All of these tools have a few things in common:
- a very mature ecosystem built around them
- a strong community of everyday users
- wide adoption across the industry
Due to these traits, I strongly believe that most, if not all, of these tools will stay with us for years to come. That makes them good starting points when you cannot decide what to learn next. More importantly, they are worth the time you invest in learning them.
However, I do not suggest becoming an expert in a specific tool, unless it is a skill you want to build your career on. Quite the opposite: I would recommend getting somewhat familiar with the basics of each tool, including how it works, its trade-offs, and its best practices. You can gain more in-depth knowledge and experience later on, when you start using them.
Let’s go into all the tools in more detail.
Tools
Kubernetes
Most applications nowadays use Kubernetes in one form or another: either indirectly, through GKE, EKS, or another managed K8s service, or directly, via a self-managed Kubernetes cluster.
Tools like Helm or Terraform add even more features to the already extensive possibilities of Kubernetes.
Additionally, many tools used in the cloud-native approach depend directly on Kubernetes operating correctly.
There is an enormous community built around Kubernetes, with tutorials, training, and conferences. You will be able to find tons of useful learning material and people willing to help. If you decide that you want more formal confirmation of your skills, certifications are available as well.
Cloud
Although the cloud is not as big a deal as it was a couple of years ago, and there are ongoing discussions about whether migrating to the cloud is cost-effective, cloud services are still very important. They remain a viable option for any startup or company that does not want to build its own data centers or provision its own machines.
Companies are constantly migrating both into and out of the cloud. However, the number of companies moving in, combined with the number of existing cloud users, seems considerably larger than the number moving out.
I strongly believe that, given the number of services and tools offered by the various cloud providers, it is very unlikely that this trend will be short-lived, at least without a substantive change in the world around us. The cloud will stay with us for a long time.
You do not have to worry too much about the choice of a specific cloud provider: just pick AWS, GCP, or Alibaba Cloud (if you are based in Asia). If you later switch providers, a reasonably large part of your knowledge will remain valid. In most cases the differences are slight, and the biggest one is often the name of the service. Conceptually, things remain largely the same.
Postgres
Postgres is one of the most widely used databases in the world. Its roots go back to the POSTGRES project started at UC Berkeley in the mid-1980s. Currently, it is the number one pick for developers who need a relational database for their projects. Together with MySQL, it forms the core of the modern-day internet.
While in terms of pure numbers, it is not as popular as MySQL, Postgres is not far behind. From my own experience, there was only one project I worked on that was not based on Postgres, at least among those reliant on relational databases.
Postgres is stable, battle-tested, and has a strong ecosystem. Furthermore, it provides an extensive set of features, for example geospatial queries (typically via the PostGIS extension) and support for JSON as a data format. On top of that, it is free and open source.
NoSQL
NoSQL databases are becoming more important by the day. They serve many purposes: from read-optimized stores like MongoDB, through very large datastores like BigQuery or Bigtable that can handle petabyte-scale data, to write-optimized databases like Cassandra. Some systems, and the guarantees they need to provide, would not be possible to build without them.
Here, I would recommend not focusing on a particular tool, but rather getting familiar with the concept as a whole. What are the different types of NoSQL databases, and what are their use cases? Maybe read about the leading database in each category. Learn the basics as to how they work and what they offer, and some of their pros and cons.
However, if you want specific recommendations, then I suggest looking at MongoDB and GCP BigQuery:
- MongoDB, while having its quirks, is still the most commonly used NoSQL database. It is also quite simple from a user perspective and provides a nice entry point into NoSQL.
- BigQuery is a powerful data store capable of handling petabytes of data, which makes it ideal as a sink or source in data analysis flows and for working with massive datasets.
Kafka
I think that I do not need to introduce Kafka to anyone, but I will do so anyway. It is probably the most performant streaming platform currently available on the market. Kafka can achieve enormous throughput, processing hundreds of thousands or even millions of messages per second. It is a go-to choice whenever someone needs to process a large quantity of data.
Its users include some of the biggest technology companies in the world, such as Uber, Netflix, and LinkedIn, just to name a few. There is a large community built around Kafka, with its own events, talks, experts, courses, and certifications.
Moreover, Kafka client libraries exist for most, if not all, modern programming languages. There are also connectors for many of the tools used in building data processing flows, which makes it easier to integrate Kafka into your system.
I bet that you will encounter Kafka sooner or later during your journey as a software engineer. Even if it turns out to be some other platform rather than Kafka, your knowledge of Kafka will surely count. In my opinion, this is a good starting point for your journey in learning how Kafka operates.
Redis
Redis is an in-memory data store, most commonly used as a simple key-value store. However, it can also be used successfully as a vector database or a queue. Redis is primarily known for its high scalability and simplicity.
It is easy to set up and integrate, and it can handle tens of thousands of requests per second with ease. The combination of these traits makes Redis an excellent pick for caching. When applied correctly, it can greatly reduce the load put on our application.
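One common way to apply such a cache is the cache-aside pattern: look in the cache first, and only on a miss call the slower backing store and save the result with an expiry time. The sketch below is a minimal illustration, not a production setup. `FakeRedis`, `get_user`, and `slow_lookup` are hypothetical names introduced here; the fake only mimics the `get` and `setex` methods of the redis-py client, so the example runs without a Redis server.

```python
import json
import time


class FakeRedis:
    """Hypothetical in-memory stand-in for a Redis client.

    It exposes only get/setex, named after the redis-py methods, so a real
    client could be passed to get_user instead (an assumption of this sketch).
    """

    def __init__(self):
        self._data = {}

    def get(self, key):
        value, expires_at = self._data.get(key, (None, 0.0))
        if value is not None and time.monotonic() >= expires_at:
            # The entry has outlived its TTL: drop it and report a miss.
            del self._data[key]
            return None
        return value

    def setex(self, key, ttl_seconds, value):
        self._data[key] = (value, time.monotonic() + ttl_seconds)


def get_user(cache, user_id, load_from_db, ttl_seconds=60):
    """Cache-aside read: try the cache, fall back to the loader on a miss."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    user = load_from_db(user_id)
    # Store the serialized value with a TTL so stale entries expire on their own.
    cache.setex(key, ttl_seconds, json.dumps(user))
    return user


# Usage: the second call is served from the cache, so the loader runs once.
db_calls = []


def slow_lookup(user_id):
    db_calls.append(user_id)  # stands in for a real database query
    return {"id": user_id, "name": "Ada"}


cache = FakeRedis()
first = get_user(cache, 42, slow_lookup)
second = get_user(cache, 42, slow_lookup)
```

The TTL is the main design choice here: a short value keeps data fresher, while a longer one takes more load off the backing store.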
What is interesting about Redis is its relative lack of direct competitors. The only tool that comes to mind is Memcached, and it has a much smaller user base in comparison to Redis.
Monitoring
Here, I would mostly like to stress the importance of correct monitoring in present-day systems. Sooner or later, you will need it, or regret not having set it up properly from the beginning.
I have mostly worked with the Prometheus/Grafana stack, and I can honestly recommend this duo to anyone. In my opinion, it is a simple yet powerful combination that is easy to set up, configure, and manage later. It is one of a few industry-tested approaches to monitoring application state. Other tools of this kind include, for example, the ELK stack (Elasticsearch, Logstash, Kibana), Coralogix, and Datadog.
However, remember that, due to my work experience with Prometheus/Grafana, I can be slightly biased towards this duo, so treat my opinion with a grain of salt.
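To give a feel for the Prometheus side of this duo: an application exposes its metrics over HTTP in a simple text format, Prometheus scrapes them periodically, and Grafana charts the stored values. The sketch below renders a single counter in that text format using only the standard library. `RequestCounter` and the metric name are hypothetical, chosen for illustration; a real service would normally use an official Prometheus client library, and label-value escaping is deliberately left out.

```python
from collections import Counter


class RequestCounter:
    """Hypothetical hand-rolled counter, rendered in the Prometheus text format."""

    def __init__(self, name, help_text):
        self.name = name
        self.help_text = help_text
        self._counts = Counter()

    def inc(self, path):
        # Counters only ever go up; one series is kept per label value.
        self._counts[path] += 1

    def render(self):
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
        ]
        for path, value in sorted(self._counts.items()):
            # One sample line per label combination: name{label="value"} number
            lines.append(f'{self.name}{{path="{path}"}} {value}')
        return "\n".join(lines) + "\n"


requests_total = RequestCounter("http_requests_total", "Total HTTP requests handled.")
requests_total.inc("/orders")
requests_total.inc("/orders")
requests_total.inc("/health")
print(requests_total.render())
```

Serving this string from a metrics endpoint is what would let Prometheus collect it; the counter then becomes available for dashboards and alerts.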
Summary
Here we are, at the end. You have read my tech stack proposition - does it sound good to you? Below, you can take another look at this proposition.
I have prepared a set of links to sites where you can start your journey with each of them.
Problem | Tool Recommendation |
---|---|
Orchestration | Kubernetes |
Cloud Providers | AWS/GCP |
Relational DB | Postgres |
Non-Relational DB | MongoDB/BigQuery |
Messaging/Streaming | Kafka |
Caching | Redis |
Monitoring | Prometheus/Grafana |
Remember not to get overwhelmed by the number of different tools. You do not have to be an expert in all of them. Just grasp the general idea of each tool, and know where and how you might use it. The expertise will come with experience.
Thank you for your time.
Published at DZone with permission of Bartłomiej Żyliński. See the original article here.