Microservices architecture has gained significant popularity due to its scalability, flexibility, and modular nature. However, with multiple independent services communicating over a network, failures are inevitable. A robust failure-handling strategy is crucial to ensure reliability, resilience, and a seamless user experience. In this article, we will explore different failure-handling mechanisms in microservices and understand their importance in building resilient applications.

Why Failure Handling Matters in Microservices

Without proper failure-handling mechanisms, failures can lead to system-wide disruptions, degraded performance, or even complete downtime. Failure scenarios commonly occur due to:

- Network failures (e.g., DNS issues, latency spikes)
- Service unavailability (e.g., dependent services down)
- Database outages (e.g., connection pool exhaustion)
- Traffic spikes (e.g., unexpected high load)

At Netflix, if the recommendation service is down, it shouldn't prevent users from streaming videos. Instead, Netflix degrades gracefully by displaying generic recommendations.

Key Failure Handling Mechanisms in Microservices

1. Retry Mechanism

Some failures are temporary (e.g., network fluctuations, brief server downtime). Instead of failing immediately, a retry mechanism allows the system to automatically reattempt the request after a short delay.

Use cases:

- Database connection timeouts
- Transient network failures
- API rate limits (e.g., retrying failed API calls after a cooldown period)

For example, Amazon's order service retries fetching inventory from a database before marking an item as out of stock.

Best practice: Use exponential backoff and jitter to prevent thundering herds.

Using Resilience4j Retry:

```java
@Retry(name = "backendService", fallbackMethod = "fallbackResponse")
public String callBackendService() {
    return restTemplate.getForObject("http://backend-service/api/data", String.class);
}

public String fallbackResponse(Exception e) {
    return "Service is currently unavailable. Please try again later.";
}
```

2. Circuit Breaker Pattern

If a microservice is consistently failing, retrying too many times can worsen the issue by overloading the system. A circuit breaker prevents this by blocking further requests to the failing service for a cooldown period.

Use cases:

- Preventing cascading failures in third-party services (e.g., payment gateways)
- Handling database connection failures
- Avoiding overload during traffic spikes

For example, Netflix uses circuit breakers to prevent overloading failing microservices and reroutes requests to backup services.

States used:

- Closed → Calls are allowed as normal.
- Open → Requests are blocked after multiple failures.
- Half-Open → A limited number of test requests check for recovery.

Below is an example using a circuit breaker in Spring Boot (Resilience4j).

```java
@CircuitBreaker(name = "paymentService", fallbackMethod = "fallbackPayment")
public String processPayment() {
    return restTemplate.getForObject("http://payment-service/pay", String.class);
}

public String fallbackPayment(Exception e) {
    return "Payment service is currently unavailable. Please try again later.";
}
```
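The annotations above assume that named instances such as "backendService" and "paymentService" are configured somewhere, typically in Spring Boot properties. As a rough programmatic sketch of the exponential backoff and jitter recommendation, here is how such instances might be built with Resilience4j's Java API; the timings, thresholds, and instance names are illustrative, not taken from the article.

```java
import java.time.Duration;
import java.util.function.Supplier;

import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import io.github.resilience4j.core.IntervalFunction;
import io.github.resilience4j.retry.Retry;
import io.github.resilience4j.retry.RetryConfig;

public class ResilienceConfigSketch {

    public static Supplier<String> decorate(Supplier<String> backendCall) {
        // Retry with exponential backoff plus randomized jitter to avoid thundering herds.
        RetryConfig retryConfig = RetryConfig.custom()
                .maxAttempts(3)
                .intervalFunction(IntervalFunction.ofExponentialRandomBackoff(500, 2.0))
                .build();
        Retry retry = Retry.of("backendService", retryConfig);

        // Circuit breaker that opens after a 50% failure rate and stays open for 30 seconds.
        CircuitBreakerConfig cbConfig = CircuitBreakerConfig.custom()
                .failureRateThreshold(50)
                .waitDurationInOpenState(Duration.ofSeconds(30))
                .build();
        CircuitBreaker circuitBreaker = CircuitBreaker.of("paymentService", cbConfig);

        // Order matters: the retry wraps the circuit-breaker-protected call.
        return Retry.decorateSupplier(retry,
                CircuitBreaker.decorateSupplier(circuitBreaker, backendCall));
    }
}
```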
3. Timeout Handling

A slow service can block resources and cause cascading failures. Setting timeouts ensures a failing service doesn't hold up other processes.

Use cases:

- Preventing slow services from blocking threads in high-traffic applications
- Handling third-party API delays
- Avoiding deadlocks in distributed systems

For example, Uber's trip service times out requests if a response isn't received within 2 seconds, ensuring riders don't wait indefinitely.

Below is an example of setting timeouts in Spring Boot using RestTemplate.

```java
@Bean
public RestTemplate restTemplate() {
    var factory = new SimpleClientHttpRequestFactory();
    factory.setConnectTimeout(3000); // 3 seconds
    factory.setReadTimeout(3000);
    return new RestTemplate(factory);
}
```

4. Fallback Strategies

When a service is down, fallback mechanisms provide alternative responses instead of failing completely.

Use cases:

- Showing cached data when a service is down
- Returning default recommendations in an e-commerce app
- Providing a static response when an API is slow

For example, YouTube provides trending videos when personalized recommendations fail.

Below is an example of implementing a fallback in Resilience4j.

```java
@Retry(name = "recommendationService")
@CircuitBreaker(name = "recommendationService", fallbackMethod = "defaultRecommendations")
public List<String> getRecommendations() {
    return restTemplate.getForObject("http://recommendation-service/api", List.class);
}

public List<String> defaultRecommendations(Exception e) {
    return List.of("Popular Movie 1", "Popular Movie 2"); // Generic fallback
}
```

5. Bulkhead Pattern

The bulkhead pattern isolates failures by restricting resource consumption per service, which prevents failures from spreading across the system.

Use cases:

- Preventing one failing service from consuming all resources
- Isolating failures in multi-tenant systems
- Avoiding memory exhaustion under excessive load

For example, Airbnb's booking system ensures that reservation services don't consume all resources, keeping user authentication operational.

```java
@Bulkhead(name = "inventoryService", type = Bulkhead.Type.THREADPOOL)
public String checkInventory() {
    return restTemplate.getForObject("http://inventory-service/stock", String.class);
}
```

6. Message Queue for Asynchronous Processing

Instead of direct service calls, use message queues (Kafka, RabbitMQ) to decouple microservices so that failures don't impact real-time operations.

Use cases:

- Decoupling microservices (Order Service → Payment Service)
- Ensuring reliable event-driven processing
- Handling traffic spikes gracefully

For example, Amazon queues order-processing requests in Kafka so that failures don't affect checkout.

Below is an example of using Kafka for order processing.

```java
@Autowired
private KafkaTemplate<String, String> kafkaTemplate;

public void placeOrder(Order order) {
    kafkaTemplate.send("orders", order.toString()); // Send order details to Kafka
}
```

7. Event Sourcing and Saga Pattern

When a distributed transaction fails, event sourcing ensures that each step can be rolled back. Banking applications use sagas to prevent money from being deducted if a transfer fails.

Below is a simplified illustration of a saga orchestrator for a distributed transaction.

```java
@SagaOrchestrator
public void processOrder(Order order) {
    sagaStep1(); // Reserve inventory
    sagaStep2(); // Deduct balance
    sagaStep3(); // Confirm order
}
```
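The @SagaOrchestrator snippet above is schematic; real saga implementations pair every step with a compensating action that undoes it when a later step fails. Here is a framework-free sketch of that idea. The inventory, payment, and order collaborators and their methods are hypothetical, standing in for remote service calls or published events.

```java
// Hypothetical collaborators; a real system would call remote services or publish events.
interface InventoryService { void reserve(Order o); void release(Order o); }
interface PaymentService   { void deduct(Order o);  void refund(Order o); }
interface OrderService     { void confirm(Order o); }
record Order(String id) {}

public class OrderSaga {

    private final InventoryService inventory;
    private final PaymentService payments;
    private final OrderService orders;

    public OrderSaga(InventoryService inventory, PaymentService payments, OrderService orders) {
        this.inventory = inventory;
        this.payments = payments;
        this.orders = orders;
    }

    public void processOrder(Order order) {
        inventory.reserve(order);             // Step 1: reserve inventory
        try {
            payments.deduct(order);           // Step 2: deduct balance
            try {
                orders.confirm(order);        // Step 3: confirm order
            } catch (RuntimeException e) {
                payments.refund(order);       // Compensate step 2
                throw e;
            }
        } catch (RuntimeException e) {
            inventory.release(order);         // Compensate step 1
            throw e;
        }
    }
}
```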
8. Centralized Logging and Monitoring

Microservices are highly distributed; without proper logging and monitoring, failures remain undetected until they become critical. In a microservices environment, logs are spread across multiple services, containers, and hosts. A log aggregation tool collects logs from all microservices into a single dashboard, enabling faster failure detection and resolution. Instead of storing logs separately for each service, a log aggregator centralizes them, helping teams analyze failures in one place.

Below is an example of the logging configuration for a Spring Boot microservice whose output is shipped to the ELK stack (Elasticsearch, Logstash, Kibana).

```yaml
logging:
  level:
    root: INFO
    org.springframework.web: DEBUG
```

Best Practices for Failure Handling in Microservices

Design for Failure

Failures in microservices are inevitable. Instead of trying to eliminate them completely, anticipate them and build resilience into the system. This means designing microservices to recover automatically and to minimize user impact when failures occur.

Test Failure Scenarios

Most systems are only tested for success cases, but real-world failures happen in unexpected ways. Chaos engineering helps simulate failures to test how microservices handle them.

Graceful Degradation

In high-traffic scenarios or during service failures, the system should prioritize critical features and gracefully degrade less essential functionality. Prioritize essential services over non-critical ones.

Idempotency

Ensure retries don't duplicate transactions. If a microservice retries a request due to a network failure or timeout, it can accidentally create duplicate transactions (e.g., charging a customer twice). Idempotency ensures that repeated requests have the same effect as a single request.

Conclusion

Failure handling in microservices is not optional; it's a necessity. By implementing retries, circuit breakers, timeouts, bulkheads, and fallback strategies, you can build resilient and fault-tolerant microservices.
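As a final illustration of the idempotency practice recommended above, here is a minimal sketch of request deduplication with an idempotency key. The in-memory map is a stand-in for whatever persistent store a real payment service would use, and the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PaymentProcessor {

    // Maps an idempotency key to the result of the first successful attempt.
    private final Map<String, String> processed = new ConcurrentHashMap<>();

    /**
     * Charges a customer at most once per idempotency key, even if the
     * caller retries after a timeout or network failure.
     */
    public String charge(String idempotencyKey, String customerId, long amountCents) {
        return processed.computeIfAbsent(idempotencyKey,
                key -> doCharge(customerId, amountCents));
    }

    private String doCharge(String customerId, long amountCents) {
        // A real implementation would call the payment gateway here.
        return "charged:" + customerId + ":" + amountCents;
    }
}
```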
In the world of software engineering, we're constantly racing against the clock: deadlines, deployments, and decisions. In this rush, testing often gets sidelined. Some developers see it as optional, or something they'll "get to later." But that's a costly mistake. Just like documentation, testing is a long-term investment, one that pays off in quality, safety, and peace of mind.

Testing is crucial. It's about ensuring quality, guaranteeing expected behavior, and enabling safe refactoring. Without tests, every change becomes a risk. With tests, change becomes an opportunity to improve. Testing doesn't just prevent bugs; it shapes the way we build software. It enables confident change, unlocks collaboration, and acts as a form of executable documentation.

Tests Are a Guarantee of Behavior

At its core, a test is a contract. It tells the system, and anyone reading the code, what should happen when given specific inputs. This contract helps ensure that as the software evolves, its expected behavior remains intact. A system without tests is like a building without smoke detectors. It might stand fine for now, but the moment something catches fire, there's no safety mechanism to contain the damage.

Testing Supports Safe Refactoring

Over time, all code becomes legacy. Business requirements shift, architectures evolve, and what once worked becomes outdated. That's why refactoring is not a luxury; it's a necessity. But refactoring without tests is walking blindfolded through a minefield. With a reliable test suite, engineers can reshape and improve their code with confidence. Tests confirm that behavior hasn't changed even as the internal structure is optimized. This is why tests are essential not just for correctness, but for sustainable growth.

Tests Help Teams Move Faster

There's a common myth that tests slow you down. Seasoned engineers know the opposite is true. Tests speed up development by reducing time spent debugging, catching regressions early, and removing the need for manual verification after every change. They also allow teams to work independently, since tests define and validate interfaces between components. The ROI of testing becomes especially clear over time; it's a long-term bet that pays exponential dividends.

When to Use Mocks (and When Not To)

Not every test has to touch a database or external service. That's where mocks come in. A mock is a lightweight substitute for a real dependency, useful when you want to isolate logic, simulate failures, or verify interactions without relying on full integration.

Use mocks when:

- You want to test business logic in isolation
- You need to simulate rare or hard-to-reproduce scenarios
- You want fast, deterministic tests that don't rely on external state

But be cautious: mocking too much can lead to fragile tests that don't reflect reality. Always complement unit tests with integration tests that use real components to validate your system holistically.
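The article doesn't prescribe a mocking library, but a common choice in Java is Mockito. The sketch below, with hypothetical PaymentGateway and OrderService types, shows the pattern: stub the dependency, exercise the logic in isolation, and verify the interaction.

```java
import static org.assertj.core.api.Assertions.assertThat;
import static org.mockito.Mockito.*;

import org.junit.jupiter.api.Test;

class OrderServiceTest {

    // Hypothetical production types, inlined here to keep the sketch self-contained.
    interface PaymentGateway { boolean charge(String customerId, double amount); }

    static class OrderService {
        private final PaymentGateway gateway;
        OrderService(PaymentGateway gateway) { this.gateway = gateway; }
        String placeOrder(String customerId, double amount) {
            return gateway.charge(customerId, amount) ? "CONFIRMED" : "REJECTED";
        }
    }

    @Test
    void rejectsOrderWhenPaymentFails() {
        PaymentGateway gateway = mock(PaymentGateway.class);
        when(gateway.charge("cust-1", 10.0)).thenReturn(false); // simulate a payment failure

        OrderService service = new OrderService(gateway);

        assertThat(service.placeOrder("cust-1", 10.0)).isEqualTo("REJECTED");
        verify(gateway).charge("cust-1", 10.0); // interaction check
    }
}
```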
A Practical Stack for Java Testing

If you're working in Java, here's a battle-tested stack that combines readability, power, and simplicity.

JUnit Jupiter

JUnit is the foundation for writing structured unit and integration tests. It supports lifecycle hooks, parameterized tests, and extensions with ease.

AssertJ

A fluent assertion library that makes your tests expressive and readable. Instead of writing assertEquals(expected, actual), you write assertThat(actual).isEqualTo(expected), which is much more human-friendly.

Testcontainers

Perfect for integration tests. With Testcontainers, you can spin up real databases, message brokers, or services in Docker containers as part of your test lifecycle. No mocks, no fakes; just the real thing, isolated and reproducible.

Here's a simple example combining all three:

```java
@Test
void shouldPersistGuestInDatabase() {
    Guest guest = new Guest("Ada Lovelace");
    guestRepository.save(guest);

    List<Guest> guests = guestRepository.findAll();
    assertThat(guests).hasSize(1).extracting(Guest::getName).contains("Ada Lovelace");
}
```

This kind of test, when paired with Testcontainers and a real database, gives you confidence that your system works, not just in theory, but in practice.

Learn More: Testing Java Microservices

For a deeper dive into testing strategies, including contract testing, service virtualization, and containerized tests, check out Testing Java Microservices. It's an excellent resource that aligns with modern practices and real-world challenges.

Understanding the Value of Metrics in Testing

Once tests are written and passing, a natural follow-up question arises: how do we know they're doing their job? In other words, how can we be certain that our tests are identifying genuine problems rather than merely giving us a false sense of security? This is where testing metrics come into play, not as final verdicts, but as tools for better judgment. Two of the most common and impactful metrics in this space are code coverage and mutation testing.

Code coverage measures how much of your source code is executed when your tests run. It's often visualized as a percentage and can be broken down by lines, branches, methods, or even conditions. The appeal is obvious: it gives a quick sense of how thoroughly the system is being exercised. But while coverage is easy to track, it's just as easy to misunderstand. The key limitation of code coverage is that it shows where the code executes, not how effectively it is being tested. A line of code can be executed without a single meaningful assertion. This means a project with high coverage might still be fragile underneath; false confidence is a real risk.

That's where mutation testing comes in. This approach works by introducing small changes, known as mutants, into the code, such as flipping a conditional or changing an arithmetic operator. The test suite is then rerun to see whether it detects the change. If the tests fail, the mutant is considered "killed," indicating that the test is effective. If they pass, the mutant "survives," exposing a weakness in the test suite. Mutation testing digs into test quality in a way coverage cannot. It challenges the resilience of your tests and asks: would this test catch a bug if the logic broke slightly?

Of course, this comes with a cost. Mutation testing is slower and more computationally intensive. On large codebases it can take considerable time to run, and depending on the granularity and mutation strategy, the results can be noisy or overwhelming. That's why it's best applied selectively, on complex business logic or critical paths where the risk of undetected bugs is high.

Now here's where things get powerful: coverage and mutation testing aren't competing metrics; they're complementary. Coverage helps you identify which parts of your code aren't being tested at all. Mutation testing indicates how well the tested parts are protected. Used together, they offer a fuller picture: breadth from coverage, and depth from mutation. But even combined, they should not become the ultimate goal.
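To make the mutation idea concrete, here is a tiny, hypothetical Java example. A mutation tool such as PIT might flip the >= below to >; only a test that pins the boundary value will kill that mutant.

```java
import static org.assertj.core.api.Assertions.assertThat;

import org.junit.jupiter.api.Test;

class DiscountPolicyTest {

    // Hypothetical production code under test.
    static class DiscountPolicy {
        boolean qualifiesForSeniorDiscount(int age) {
            return age >= 65; // a mutant could change this to: age > 65
        }
    }

    @Test
    void boundaryValueKillsTheOffByOneMutant() {
        DiscountPolicy policy = new DiscountPolicy();
        // Without the first assertion, the ">" mutant would survive unnoticed.
        assertThat(policy.qualifiesForSeniorDiscount(65)).isTrue();
        assertThat(policy.qualifiesForSeniorDiscount(64)).isFalse();
    }
}
```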
Metrics exist to serve understanding, not to replace it. Chasing a 100% mutation score or full coverage can lead to unrealistic expectations or, worse, wasted effort on tests that don't matter. What truly matters is having enough coverage and confidence in the parts of the system that are hard to change or essential to your business.

In the end, the most valuable metric is trust: trust that your system behaves as expected, trust that changes won't break things silently, and trust that your test suite is more than a checkbox; it's a safety net that allows you to move fast without fear. Coverage and mutation testing, when used wisely, help you build and maintain that trust.

Final Thoughts: Test Like a Professional

Testing is more than a safety net; it's a form of engineering craftsmanship. It's how we communicate, refactor, scale, and collaborate without fear. So treat tests like you treat production code, because they are. They're your guarantee that what works today still works tomorrow. And in the ever-changing world of software, that's one of the most valuable guarantees you can have.
The software industry evolves at an incredibly fast pace, and many frameworks and component libraries have appeared to enhance the performance, functionality, and features of applications. Node.js has become one of the top names among them. To clarify, Node.js is not a framework; it is an open-source JavaScript runtime environment that runs on Linux, Unix, macOS, and Windows. Since its initial release in 2009, Node.js has become a dominant force in the software industry.

There are many benefits to using Node.js:

- It helps you create very fast applications.
- It lets developers write both server-side and frontend code in JavaScript.
- It is a flexible platform: developers can add and remove modules as their requirements change.

You can get even more out of Node.js by pairing it with React. Together they enable server-side rendering, which boosts performance, reduces page load times, and supports better user interfaces. On the other hand, building an app with Node.js is not trivial; there can be many complications along the way. It is therefore necessary to address those complications with the right development strategies in order to build scalable and powerful Node.js applications.

1. Using the Node.js Module System

Node.js has an effective module system. Modules are chunks of encapsulated code, and the module system offers reusability and the ability to split complex blocks of code into smaller, more manageable parts. You can organize your code into modules with little effort, and splitting an app into smaller modules also helps developers test and maintain it. There are three types of modules: core modules, third-party modules, and custom modules.

Core Modules

Core modules are built into Node.js and are an essential part of the platform, providing important functionality out of the box. You don't need to install any external packages for them since they are readily available. The HTTP server (http), file system operations (fs), utilities (util), path handling (path), the event emitter (events), and URL parsing (url) are some of the best-known core modules. Node.js offers plenty more: for example, crypto provides cryptography features, and the stream module is the right tool for working with streaming data. These core modules give developers a strong set of capabilities for building flexible and scalable apps.

Third-Party Modules

Third-party modules are developed by the Node.js community and are available on package registries such as npm (Node Package Manager). Express, Mongoose, Async, and Helmet are some of the most popular third-party modules. You install them from the registry and load them into your application with require().

Custom Modules

The built-in core modules offer many benefits, but when it comes to flexibility for a project's particular requirements, custom modules are the best choice. Custom modules work by encapsulating a specific piece of functionality in an application.
These modules make it easier for developers to maintain code and enable smooth reuse, which in turn strengthens the maintainability of the application. Developers build custom modules to keep code modular, streamlining the process of understanding, testing, and refactoring it.

What Are the Advantages of Modules?

- Modules enable the encapsulation of code, so developers can hide implementation details and expose only the essential features, interfaces, and functionality.
- Modules make it straightforward to organize code into smaller, workable units, which helps when scaling apps that are more complex in nature.
- Modular code is easier to maintain and refactor, because a change, update, or other modification to a module's implementation does not impact the app as a whole.
- Modules enable code reuse, since developers can use the same module in various sections of an app.

In short, modules are very useful in the development process. They make it easy for Node.js developers to maintain and organize code, allow for maximum reuse of code, and thereby enhance the performance of applications.

2. Error Handling in Node.js

In Node.js, errors fall into two main categories: programmer errors and operational errors. Let's start with programmer errors.

Programmer Errors

Programmers inevitably make mistakes while coding, and most of these errors cannot be handled at runtime; they can only be corrected by fixing the codebase. The most common programmer errors are:

- Syntax errors: for example, failing to close the curly braces when defining a JavaScript function.
- Array index out of bounds: asking for the seventh element of an array that only has six.
- Reference errors: accessing a function or variable that is not defined.

Operational Errors

Even a correctly written program will run into operational errors. These issues happen at runtime, and external factors can interrupt the regular flow of your program. Unlike programmer errors, operational errors can be anticipated and handled. The most common operational errors are:

- Socket hang-up
- Request timeout
- File not found
- Unable to connect to a server

Knowing the best practices for handling these errors lets developers prevent their programs from ending abruptly. Let's discuss the practices that help most.

Try-Catch Block

Try-catch blocks are a simple method for handling errors in Node.js app development. You put the code you want to check in the try block; if it throws, the catch block catches and handles the error. Try-catch is a very practical technique for synchronous functions: the try block wraps the code where errors are possible.
In simple words, you surround the piece of code you want to check for errors.

Error-First Callbacks

To build strong and reliable applications, Node.js developers must handle errors thoroughly. Asynchronous programming is a vital feature of Node.js development because it enables non-blocking I/O operations, which boosts application performance. But error handling in asynchronous code is where the real challenge lies: because you don't know exactly where in the sequence an error might happen, it is harder to find and fix.

This is where error-first callbacks come in. The error-first callback is the standard convention for handling errors in asynchronous, callback-based code: the callback's first argument is an error object (or null), followed by any successful data returned by the function. Using this pattern, developers can detect errors as they happen and deal with them in an organized way. The error object includes details such as a description and a stack trace, which streamlines debugging and fixing code issues. On top of this, error-first callbacks encourage defensive coding: developers can identify potential errors and handle them accordingly, which helps minimize crashes and improves the performance and stability of their apps.

Promises

Promises are a further evolution of callbacks in Node.js and one of the best practices for error handling, since they offer a more organized way to deal with asynchronous code than traditional callback functions. The Promise constructor lets developers create a promise; it takes a function as its argument, known as the executor function. The executor function is in turn passed two functions, resolve and reject, which are used to signal that the promise has been fulfilled or rejected.

3. Using Asynchronous Programming

Asynchronous programming allows developers to run multiple tasks at the same time: there is no need to wait for one task to complete before starting another. Most I/O operations in Node.js, such as reading from a database or making HTTP requests, are asynchronous. Asynchronous programming enables better performance, scalability, and more responsive user interfaces. You can take full advantage of it through callbacks: a callback is a function passed as an argument to another function and executed when an operation completes, so code keeps executing without blocking.
4. Use the NPM Package Manager

As the default package manager for Node.js, NPM integrates seamlessly with the platform. It streamlines installing, managing, and sharing code dependencies, and developers can take advantage of its extensive registry of more than 2 million packages.

Benefits of Using the NPM Package Manager

Wide Range of Packages Available

Developers prefer NPM for its unmatched registry of over 2 million packages covering a wide variety of functions and use cases. Easy access to this range of open-source libraries and modules speeds up development.

NPM CLI

NPM stands for Node Package Manager, and its CLI (command line interface) is how you work with Node.js packages and dependencies from the terminal. The CLI has commands for managing scripts, configuration, and packages, letting developers handle installation, updates, and scripting quickly and easily.

Custom Scripts

With the npm run command, developers can define custom scripts in the package.json file. This allows the automation of development procedures ranging from building and testing to deploying, simplifying the development workflow.

Well-Established Ecosystem

NPM has a strong infrastructure and great community support. The NPM ecosystem has been in use for a long time and has seen significant improvements over the years, so it is considered a stable, secure, and trustworthy way to manage project dependencies.

NPM is a powerful package manager that helps developers solve problems while building Node.js apps; from its extensive registry to its well-established ecosystem, there are good reasons it is the default choice. Note that some developers try to use both Yarn and NPM in the same project, which is not a good idea: NPM and Yarn handle dependencies in their own ways, so mixing them leads to volatility and inconsistencies. It is best to stick to a single package manager for a Node.js application.

5. Securing the Node.js Application

As a developer, you want to secure your application, so you must practice secure coding. Even then, you can't be 100% sure of the security of your code when you are using open-source packages, and attackers are always looking for badly handled data as a way into your codebase.

Use a Robust Authentication System

If your Node.js app has a weak or incomplete authentication system, it is more vulnerable to attack, so it is critical to implement a robust authentication mechanism. For example, when implementing native Node.js authentication, use Bcrypt or Scrypt for password hashing instead of the built-in Node.js crypto library. Also limit unsuccessful login attempts, and don't reveal whether the username or the password was wrong; show a generic message such as "login failed" instead.
To significantly improve the security of your Node.js application, two-factor authentication (2FA) is a strong addition, and it is easy to implement with modules such as node-2fa or speakeasy.

Server-Side Logging

A quality logging library gives developers effective tools for debugging and tracking other tasks. Be deliberate about what you log: excessive logging hurts application performance, and vague messages create confusion. Structure and format log messages so that both humans and machines can easily read and parse them. A sensible logging setup records the IP address, the username, and the actions carried out by a user. Keep in mind that storing sensitive details in logs can put you out of compliance with PCI and GDPR; if sensitive information must be captured, mask or conceal it before writing it to the application logs.

These practices and authentication mechanisms help secure a Node.js application. You should also keep dependencies upgraded and well managed to avoid exposing the app to known security threats.

5 Key Takeaways

- The Node.js module system packages code into encapsulated chunks, letting developers divide complicated code into small, manageable parts that are easy to work with.
- Programmers make mistakes, and programs also hit operational errors such as socket hang-ups, request timeouts, and missing files. Handle them with try-catch blocks, error-first callbacks, and promises.
- Asynchronous programming lets developers run multiple tasks at the same time, and callbacks are a key tool for working with it.
- The NPM package manager offers an extensive registry of packages to speed up development, and the NPM CLI streamlines installation, updates, and scripting.
- To improve the security of your Node.js app, use a strong authentication system such as 2FA (with modules like node-2fa or speakeasy), and implement sensible logging that records the IP address, username, and other user activity.

By applying these well-tested practices (using the Node.js module system, prioritizing error handling, embracing asynchronous programming, keeping dependencies up to date, and securing the application), you will be well on your way to building faster, more responsive, and more robust Node.js applications.
Welcome back to the third — and final — installment in our series on how to work with the curses library in Python to draw with text. If you missed the first two parts of this programming tutorial series — or if you wish to reference the code contained within them — you can read them here: "Python curses, Part 1: Drawing With Text" "Python curses, Part 2: How to Create a Python curses-Enabled Application" Once reviewed, let’s move on to the next portion: how to decorate windows with borders and boxes using Python’s curses module. Decorating Windows With Borders and Boxes Windows can be decorated using custom values as well as a default “box” adornment in Python. This can be accomplished using the window.box() and window.border(…) functions. The Python code example below creates a red 5×5 window and then alternates displaying and clearing the border on each key press: Python # demo-window-border.py import curses import math import sys def main(argv): # BEGIN ncurses startup/initialization... # Initialize the curses object. stdscr = curses.initscr() # Do not echo keys back to the client. curses.noecho() # Non-blocking or cbreak mode... do not wait for Enter key to be pressed. curses.cbreak() # Turn off blinking cursor curses.curs_set(False) # Enable color if we can... if curses.has_colors(): curses.start_color() # Optional - Enable the keypad. This also decodes multi-byte key sequences # stdscr.keypad(True) # END ncurses startup/initialization... caughtExceptions = "" try: # Create a 5x5 window in the center of the terminal window, and then # alternate displaying a border and not on each key press. # We don't need to know where the approximate center of the terminal # is, but we do need to use the curses terminal size constants to # calculate the X, Y coordinates of where we can place the window in # order for it to be roughly centered. topMostY = math.floor((curses.LINES - 5)/2) leftMostX = math.floor((curses.COLS - 5)/2) # Place a caption at the bottom left of the terminal indicating # action keys. stdscr.addstr (curses.LINES-1, 0, "Press Q to quit, any other key to alternate.") stdscr.refresh() # We're just using white on red for the window here: curses.init_pair(1, curses.COLOR_WHITE, curses.COLOR_RED) index = 0 done = False while False == done: # If we're on the first iteration, let's skip straight to creating the window. if 0 != index: # Grabs a value from the keyboard without Enter having to be pressed. ch = stdscr.getch() # Need to match on both upper-case or lower-case Q: if ch == ord('Q') or ch == ord('q'): done = True mainWindow = curses.newwin(5, 5, topMostY, leftMostX) mainWindow.bkgd(' ', curses.color_pair(1)) if 0 == index % 2: mainWindow.box() else: # There's no way to "unbox," so blank out the border instead. mainWindow.border(' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ') mainWindow.refresh() stdscr.addstr(0, 0, "Iteration [" + str(index) + "]") stdscr.refresh() index = 1 + index except Exception as err: # Just printing from here will not work, as the program is still set to # use ncurses. # print ("Some error [" + str(err) + "] occurred.") caughtExceptions = str(err) # BEGIN ncurses shutdown/deinitialization... # Turn off cbreak mode... curses.nocbreak() # Turn echo back on. curses.echo() # Restore cursor blinking. curses.curs_set(True) # Turn off the keypad... # stdscr.keypad(False) # Restore Terminal to original state. curses.endwin() # END ncurses shutdown/deinitialization... 
# Display Errors if any happened: if "" != caughtExceptions: print ("Got error(s) [" + caughtExceptions + "]") if __name__ == "__main__": main(sys.argv[1:]) This code was run over an SSH connection, so there is an automatic clearing of the screen upon its completion. The border “crops” the inside of the window, and any text that is placed within the window must be adjusted accordingly. And as the call to the window.border(…) function suggests, any character can be used for the border. The code works by waiting for a key to be pressed. If either Q or Shift+Q is pressed, the termination condition of the loop will be activated and the program will quit. Note that, pressing the arrow keys may return key presses and skip iterations. How to Update Content in “Windows” With Python curses Just as is the case with traditional graphical windowed programs, the text content of a curses window can be changed. And just as is the case with graphical windowed programs, the old content of the window must be “blanked out” before any new content can be placed in the window. The Python code example below demonstrates a digital clock that is centered on the screen. It makes use of Python lists to store sets of characters which when displayed, look like large versions of digits. A brief note: The code below is not intended to be the most efficient means of displaying a clock; rather, it is intended to be a more portable demonstration of how curses windows are updated. Python # demo-clock.py # These list assignments can be done on single lines, but it's much easier to see what # these values represent by doing it this way. space = [ " ", " ", " ", " ", " ", " ", " ", " ", " ", " "] colon = [ " ", " ", " ::: ", " ::: ", " ", " ", " ::: ", " ::: ", " ", " "] forwardSlash = [ " ", " //", " // ", " // ", " // ", " // ", " // ", " // ", "// ", " "] number0 = [ " 000000 ", " 00 00 ", " 00 00 ", " 00 00 ", " 00 00 ", " 00 00 ", " 00 00 ", " 00 00 ", " 00 00 ", " 000000 "] number1 = [ " 11 ", " 111 ", " 1111 ", " 11 ", " 11 ", " 11 ", " 11 ", " 11 ", " 11 ", " 111111 "] number2 = [ " 222222 ", " 22 22 ", " 22 22 ", " 22 ", " 22 ", " 22 ", " 22 ", " 22 ", " 22 ", " 22222222 "] number3 = [ " 333333 ", " 33 33 ", " 33 33 ", " 33 ", " 3333 ", " 33 ", " 33 ", " 33 33 ", " 33 33 ", " 333333 "] number4 = [ " 44 ", " 444 ", " 4444 ", " 44 44 ", " 44 44 ", "444444444 ", " 44 ", " 44 ", " 44 ", " 44 "] number5 = [ " 55555555 ", " 55 ", " 55 ", " 55 ", " 55555555 ", " 55 ", " 55 ", " 55 ", " 55 ", " 55555555 "] number6 = [ " 666666 ", " 66 66 ", " 66 ", " 66 ", " 6666666 ", " 66 66 ", " 66 66 ", " 66 66 ", " 66 66 ", " 666666 "] number7 = [ " 77777777 ", " 77 ", " 77 ", " 77 ", " 77 ", " 77 ", " 77 ", " 77 ", " 77 ", " 77 "] number8 = [ " 888888 ", " 88 88 ", " 88 88 ", " 88 88 ", " 888888 ", " 88 88 ", " 88 88 ", " 88 88 ", " 88 88 ", " 888888 "] number9 = [ " 999999 ", " 99 99 ", " 99 99 ", " 99 99 ", " 999999 ", " 99 ", " 99 ", " 99 ", " 99 99 ", " 999999 "] import curses import math import sys import datetime def putChar(windowObj, inChar, inAttr = 0): #windowObj.box() #windowObj.addstr(inChar) # The logic below maps the normal character input to a list which contains a "big" # representation of that character. 
charToPut = "" if '0' == inChar: charToPut = number0 elif '1' == inChar: charToPut = number1 elif '2' == inChar: charToPut = number2 elif '3' == inChar: charToPut = number3 elif '4' == inChar: charToPut = number4 elif '5' == inChar: charToPut = number5 elif '6' == inChar: charToPut = number6 elif '7' == inChar: charToPut = number7 elif '8' == inChar: charToPut = number8 elif '9' == inChar: charToPut = number9 elif ':' == inChar: charToPut = colon elif '/' == inChar: charToPut = forwardSlash elif ' ' == inChar: charToPut = space lineCount = 0 # This loop will iterate each line in the window to display a "line" of the digit # to be displayed. for line in charToPut: # Attributes, or the bitwise combinations of multiple attributes, are passed as-is # into addstr. Note that not all attributes, or combinations of attributes, will # work with every terminal. windowObj.addstr(lineCount, 0, charToPut[lineCount], inAttr) lineCount = 1 + lineCount windowObj.refresh() def main(argv): # Initialize the curses object. stdscr = curses.initscr() # Do not echo keys back to the client. curses.noecho() # Non-blocking or cbreak mode... do not wait for Enter key to be pressed. curses.cbreak() # Turn off blinking cursor curses.curs_set(False) # Enable color if we can... if curses.has_colors(): curses.start_color() # Optional - Enable the keypad. This also decodes multi-byte key sequences # stdscr.keypad(True) caughtExceptions = "" try: # First things first, make sure we have enough room! if curses.COLS <= 88 or curses.LINES <= 11: raise Exception ("This terminal window is too small.rn") currentDT = datetime.datetime.now() hour = currentDT.strftime("%H") min = currentDT.strftime("%M") sec = currentDT.strftime("%S") # Depending on how the floor values are calculated, an extra character for each # window may be needed. This code crashed when the windows were set to exactly # 10x10 topMostY = math.floor((curses.LINES - 11)/2) leftMostX = math.floor((curses.COLS - 88)/2) # Note that print statements do not work when using ncurses. If you want to write # to the terminal outside of a window, use the stdscr.addstr method and specify # where the text will go. Then use the stdscr.refresh method to refresh the # display. stdscr.addstr(curses.LINES-1, 0, "Press a key to quit.") stdscr.refresh() # Boxes - Each box must be 1 char bigger than stuff put into it. hoursLeftWindow = curses.newwin(11, 11, topMostY,leftMostX) putChar(hoursLeftWindow, hour[0:1]) hoursRightWindow = curses.newwin(11, 11, topMostY,leftMostX+11) putChar(hoursRightWindow, hour[-1]) leftColonWindow = curses.newwin(11, 11, topMostY,leftMostX+22) putChar(leftColonWindow, ':', curses.A_BLINK | curses.A_BOLD) minutesLeftWindow = curses.newwin(11, 11, topMostY, leftMostX+33) putChar(minutesLeftWindow, min[0:1]) minutesRightWindow = curses.newwin(11, 11, topMostY, leftMostX+44) putChar(minutesRightWindow, min[-1]) rightColonWindow = curses.newwin(11, 11, topMostY, leftMostX+55) putChar(rightColonWindow, ':', curses.A_BLINK | curses.A_BOLD) leftSecondWindow = curses.newwin(11, 11, topMostY, leftMostX+66) putChar(leftSecondWindow, sec[0:1]) rightSecondWindow = curses.newwin(11, 11, topMostY, leftMostX+77) putChar(rightSecondWindow, sec[-1]) # One of the boxes must be non-blocking or we can never quit. hoursLeftWindow.nodelay(True) while True: c = hoursLeftWindow.getch() # In non-blocking mode, the getch method returns -1 except when any key is pressed. 
if -1 != c: break currentDT = datetime.datetime.now() currentDTUsec = currentDT.microsecond # Refreshing the clock "4ish" times a second may be overkill, but doing # on every single loop iteration shoots active CPU usage up significantly. # Unfortunately, if we only refresh once a second it is possible to # skip a second. # However, this type of restriction breaks functionality in Windows, so # for that environment, this has to run on Every. Single. Iteration. if 0 == currentDTUsec % 250000 or sys.platform.startswith("win"): hour = currentDT.strftime("%H") min = currentDT.strftime("%M") sec = currentDT.strftime("%S") putChar(hoursLeftWindow, hour[0:1], curses.A_BOLD) putChar(hoursRightWindow, hour[-1], curses.A_BOLD) putChar(minutesLeftWindow, min[0:1], curses.A_BOLD) putChar(minutesRightWindow, min[-1], curses.A_BOLD) putChar(leftSecondWindow, sec[0:1], curses.A_BOLD) putChar(rightSecondWindow, sec[-1], curses.A_BOLD) # After breaking out of the loop, we need to clean up the display before quitting. # The code below blanks out the subwindows. putChar(hoursLeftWindow, ' ') putChar(hoursRightWindow, ' ') putChar(leftColonWindow, ' ') putChar(minutesLeftWindow, ' ') putChar(minutesRightWindow, ' ') putChar(rightColonWindow, ' ') putChar(leftSecondWindow, ' ') putChar(rightSecondWindow, ' ') # De-initialize the window objects. hoursLeftWindow = None hoursRightWindow = None leftColonWindow = None minutesLeftWindow = None minutesRightWindow = None rightColonWindow = None leftSecondWindow = None rightSecondWindow = None except Exception as err: # Just printing from here will not work, as the program is still set to # use ncurses. # print ("Some error [" + str(err) + "] occurred.") caughtExceptions = str(err) # End of Program... # Turn off cbreak mode... curses.nocbreak() # Turn echo back on. curses.echo() # Restore cursor blinking. curses.curs_set(True) # Turn off the keypad... # stdscr.keypad(False) # Restore Terminal to original state. curses.endwin() # Display Errors if any happened: if "" != caughtExceptions: print ("Got error(s) [" + caughtExceptions + "]") if __name__ == "__main__": main(sys.argv[1:]) Checking Window Size Note how the first line within the try block in the main function checks the size of the terminal window and raises an exception should it not be sufficiently large enough to display the clock. This is a demonstration of “preemptive” error handling, as if the individual window objects are written to a screen which is too small, a very uninformative exception will be raised. Cleaning Up Windows With curses The example above forces a cleanup of the screen for all 3 operating environments. This is done using the putChar(…) function to print a blank space character to each window object upon breaking out of the while loop. The objects are then set to None. Cleaning up window objects in this manner can be a good practice when it is not possible to know all the different terminal configurations that the code could be running on, and having a blank screen on exit gives these kinds of applications a cleaner look overall. CPU Usage Like the previous code example, this too works as an “infinite” loop in the sense that it is broken by a condition that is generated by pressing any key. Showing two different ways to break the loop is intentional, as some developers may lean towards one method or another. Note that this code results in extremely high CPU usage because, when run within a loop, Python will consume as much CPU time as it possibly can. 
Normally, the sleep(…) function is used to pause execution, but for implementing a clock this may not be the best way to reduce overall CPU usage. Interestingly, the CPU usage reported by the Windows Task Manager for this process is only about 25%, compared to 100% in Linux. Another interesting observation about CPU usage in Linux: even when simulating significant load with the stress utility, as in the command below, the demo-clock.py script was still able to run without losing the proper time.

```shell
$ stress -t 30 -c 16
```

Going Further With Python curses

This three-part introduction only barely scratches the surface of the Python curses module, but with this foundation, the task of creating robust user interfaces for text-based Python applications becomes quite doable, even for a novice developer. The main downsides are having to worry about how individual terminal emulation implementations can affect the code (usually not a significant impediment) and having to deal with the math involved in keeping window objects properly sized and positioned. The Python curses module provides mechanisms for "moving" windows (not very gracefully out of the box, but this can be mitigated), for resizing windows, and even for compensating for changes in the terminal window size. Complex text-based games can be (and have been) implemented using the Python curses module or its underlying ncurses C/C++ libraries.

The complete documentation for the curses module can be found in the "curses — Terminal handling for character-cell displays" section of the Python documentation. Because the Python curses module uses syntax that is "close enough" to the underlying ncurses C/C++ libraries, the manual pages and reference resources for those libraries can also be consulted for more information.

Happy "faux" windowed programming!
Monitoring containerized applications in Kubernetes environments is essential for ensuring reliability and performance. Azure Monitor Application Insights provides powerful application performance monitoring capabilities that can be integrated seamlessly with Azure Kubernetes Service (AKS). This article focuses on auto-instrumentation, which allows you to collect telemetry from your applications running in AKS without modifying your code. We'll explore a practical implementation using the monitoring-demo-azure repository as our guide.

What Is Auto-Instrumentation?

Auto-instrumentation is a feature that enables Application Insights to automatically collect telemetry, such as metrics, requests, and dependencies, from your applications. As described in the Microsoft documentation, "Auto-instrumentation automatically injects the Azure Monitor OpenTelemetry Distro into your application pods to generate application monitoring telemetry" [1].

The key benefits include:

- No code changes required
- Consistent telemetry collection across services
- Enhanced visibility with Kubernetes-specific context
- Simplified monitoring setup

Currently, AKS auto-instrumentation (in preview as of April 2025) supports:

- Java
- Node.js

How Auto-Instrumentation Works

The auto-instrumentation process in AKS involves:

1. Creating a custom resource of type Instrumentation in your Kubernetes cluster
2. The resource defines which language platforms to instrument and where to send telemetry
3. AKS automatically injects the necessary components into application pods
4. Telemetry is collected and sent to your Application Insights resource

Demo Implementation Using monitoring-demo-azure

The monitoring-demo-azure repository provides a straightforward example of setting up auto-instrumentation in AKS. The repository contains a k8s directory with the essential files needed to demonstrate this capability.

Setting Up Your Environment

Before applying the example files, ensure you have:

- An AKS cluster running in Azure
- A workspace-based Application Insights resource
- Azure CLI version 2.60.0 or greater

Run the following commands to prepare your environment:

```shell
# Install the aks-preview extension
az extension add --name aks-preview

# Register the auto instrumentation feature
az feature register --namespace "Microsoft.ContainerService" --name "AzureMonitorAppMonitoringPreview"

# Check registration status
az feature show --namespace "Microsoft.ContainerService" --name "AzureMonitorAppMonitoringPreview"

# Refresh the registration
az provider register --namespace Microsoft.ContainerService

# Enable Application Monitoring on your cluster
az aks update --resource-group <resource_group> --name <cluster_name> --enable-azure-monitor-app-monitoring
```

Key Files in the Demo Repository

The demo repository contains three main Kubernetes manifest files in the k8s directory.

1. namespace.yaml

Creates a dedicated namespace for the demonstration:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: demo-namespace
```
2. auto.yaml

This is the core file that configures auto-instrumentation:

```yaml
apiVersion: monitor.azure.com/v1
kind: Instrumentation
metadata:
  name: default
  namespace: demo-namespace
spec:
  settings:
    autoInstrumentationPlatforms:
      - Java
      - NodeJs
  destination:
    applicationInsightsConnectionString: "InstrumentationKey=your-key;IngestionEndpoint=https://your-location.in.applicationinsights.azure.com/"
```

The key components of this configuration are:

- autoInstrumentationPlatforms: Specifies which languages to instrument (Java and Node.js in this case)
- destination: Defines where to send the telemetry (your Application Insights resource)

3. The Deployment Manifests

The three services can be deployed using the three YAML files in the k8s folder. In this case, Automated Deployments were used to create the images and deploy them into the AKS cluster. Notice that the deployment files don't contain any explicit instrumentation configuration; the auto-instrumentation is handled entirely by the Instrumentation custom resource.

Deploying the Demo

Deploy the demo resources in the following order:

```shell
# Apply the namespace first
kubectl apply -f namespace.yaml

# Apply the instrumentation configuration
kubectl apply -f auto.yaml

# Deploy the application
# Optional: Restart any existing deployments to apply instrumentation
kubectl rollout restart deployment/<deployment-name> -n demo-namespace
```

Verifying Auto-Instrumentation

After deployment, you can verify that auto-instrumentation is working by:

- Generating some traffic to your application
- Navigating to your Application Insights resource in the Azure portal
- Looking for telemetry with Kubernetes-specific metadata

Key Visualizations in Application Insights

Once your application is sending telemetry, Application Insights provides several powerful visualizations.

Application Map

The Application Map shows the relationships between your services and their dependencies. For Kubernetes applications, this visualization displays how your microservices interact within the cluster and with external dependencies. The map shows:

- Service relationships with connection lines
- Health status for each component
- Performance metrics like latency and call volumes
- Kubernetes-specific context (like pod names and namespaces)

Performance View

The Performance view breaks down response times and identifies bottlenecks in your application. For containerized applications, this helps pinpoint which services might be causing performance issues. You can:

- See operation durations across services
- Identify slow dependencies
- Analyze performance by Kubernetes workload
- Correlate performance with deployment events

Failures View

The Failures view aggregates exceptions and failed requests across your application. For Kubernetes deployments, this helps diagnose issues that might be related to the container environment. The view shows:

- Failed operations grouped by type
- Exception patterns and trends
- Dependency failures
- Container-related issues (like resource constraints)

Live Metrics Stream

Live Metrics Stream provides real-time monitoring with near-zero latency. This is particularly useful for:

- Monitoring deployments as they happen
- Troubleshooting production issues in real time
- Observing the impact of scaling operations
- Validating configuration changes

Conclusion

Auto-instrumentation in AKS with Application Insights provides a streamlined way to monitor containerized applications without modifying your code.
The monitoring-demo-azure repository offers a minimal, practical example that demonstrates:

How to configure auto-instrumentation in AKS
The pattern for separating instrumentation configuration from application deployment
The simplicity of adding monitoring to existing applications

By leveraging this approach, you can quickly add comprehensive monitoring to your Kubernetes applications and gain deeper insights into their performance and behavior.

References

[1] Azure Monitor Application Insights Documentation
[2] Auto-Instrumentation Overview
[3] GitHub: monitoring-demo-azure
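As a quick way to exercise the verification step described above (generating traffic before checking Application Insights), a small load-generation script is enough. The sketch below is illustrative only: the port-forward command, service name, namespace, and endpoint path are assumptions, not something defined by the demo repository.

Python
import time
import requests

# Hypothetical endpoint, e.g. exposed locally first with:
#   kubectl port-forward svc/<your-service> 8080:80 -n demo-namespace
SERVICE_URL = "http://localhost:8080/api/data"

for i in range(100):
    try:
        response = requests.get(SERVICE_URL, timeout=5)
        print(f"Request {i}: HTTP {response.status_code}")
    except requests.RequestException as exc:
        # Failed calls still generate dependency/exception telemetry
        print(f"Request {i} failed: {exc}")
    time.sleep(0.5)

After a few minutes, the requests should start showing up in the Application Map and Performance views with Kubernetes-specific metadata attached.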
In the field of big data analytics, Apache Doris and Elasticsearch (ES) are frequently utilized for real-time analytics and retrieval tasks. However, their design philosophies and technical focuses differ significantly. This article offers a detailed comparison across six dimensions: core architecture, query language, real-time capabilities, application scenarios, performance, and enterprise practices. 1. Core Design Philosophy: MPP Architecture vs. Search Engine Architecture Apache Doris employs a typical MPP (Massively Parallel Processing) distributed architecture, tailored for high-concurrency, low-latency real-time online analytical processing (OLAP) scenarios. It comprises front-end and back-end components, leveraging multi-node parallel computing and columnar storage to efficiently manage massive datasets. This design enables Doris to deliver query results in sub-seconds, making it ideal for complex aggregations and analytical queries on large datasets. In contrast, Elasticsearch is based on a full-text search engine architecture, utilizing a sharding and inverted index design that prioritizes rapid text retrieval and filtering. ES stores data as documents, with each field indexed via an inverted index, excelling in keyword searches and log queries. However, it struggles with complex analytics and large-scale aggregation computations. The core architectural differences are summarized below: Architectural Philosophy Apache Doris (MPP Analytical Database) Elasticsearch (Distributed Search Engine) Design Intent Geared toward real-time data warehousing/BI, supporting high-throughput parallel computing OLAP engine; emphasizes high-concurrency aggregation queries and low latency Focused on full-text search/log retrieval, built on Lucene’s inverted index; excels at keyword search and filtering, primarily a search engine despite structured query support Data Storage Columnar storage with column-encoded compression, achieving high compression ratios (5-10×) to save space; supports multiple table models (Duplicate, Aggregate, Unique) with pre-aggregation during writes Document storage , with inverted indexes per field (low compression ratio, ~1.5×); schema changes are challenging post-index creation, requiring reindexing for field additions or modifications Scalability and Elasticity Shared-nothing node design for easy linear scaling; supports strict read-write separation and multi-tenant isolation; version 3.0 introduces storage-compute separation for elastic scaling Scales via shard replicas but is constrained by single-node memory and JVM GC limits, risking memory shortages during large queries; thread pool model offers limited isolation Typical Features Fully open-source (Apache 2.0), MySQL protocol compatible; no external dependencies, offers materialized views and rich SQL functions for enhanced analytics Core developed by Elastic (license changes over time), natively supports full-text search and near-real-time indexing; rich ecosystem (Kibana, Logstash), with some advanced features requiring paid plugins Analysis: Doris’s MPP architecture provides a natural edge in big data aggregation analytics, leveraging columnar storage and vectorized execution to optimize IO and CPU usage. Features like pre-aggregation, materialized views, and a scalable design make it outperform ES in large-scale data analytics. Conversely, Elasticsearch’s search engine roots make it superior for instant searches and basic metrics, but it falters in complex SQL analytics and joins. 
Doris also offers greater schema flexibility, allowing real-time column/index modifications, while ES’s fixed schemas often necessitate costly reindexing. Overall, Doris emphasizes analytical power and usability, while ES prioritizes retrieval, giving Doris an advantage in complex enterprise analytics. 2. Query Language: SQL vs. DSL Ease of Use and Expressiveness Doris and ES diverge sharply in query interfaces: Doris natively supports standard SQL, while Elasticsearch uses JSON DSL (Domain Specific Language). Doris aligns with the MySQL protocol, offering robust SQL 92 features such as SELECT, WHERE, GROUP BY, ORDER BY, multi-table JOINs, subqueries, window functions, UDFs/UDAFs, and materialized views. This comprehensive SQL support allows analysts and engineers to perform complex queries using familiar syntax without learning a new language. Elasticsearch, however, employs a proprietary JSON-based DSL, distinct from SQL, requiring nested structures for filtering and aggregation. This presents a steep learning curve for new users and complicates integration with traditional BI tools. The comparison is detailed below: Query Language Apache Doris (SQL Interface) Elasticsearch (JSON DSL) Syntax Style Standard SQL (MySQL-like), intuitive and readable Proprietary DSL (JSON), nested and less intuitive Expressiveness Supports multi-table JOINs, subqueries, views, UDFs for complex logic; enables direct associative analytics Limited to single-index queries, no native JOINs or subqueries; complex analytics require pre-processed data models Learning Cost SQL is widely known, low entry barrier; mature debugging tools available DSL is custom, high learning threshold; error troubleshooting is challenging Ecosystem Integration MySQL protocol compatible, integrates seamlessly with BI tools (e.g., Tableau, Grafana) Closed ecosystem, difficult to integrate with BI tools without plugins; Kibana offers basic visualization Analysis: Doris’s SQL interface excels in usability and efficiency, lowering the entry threshold by leveraging familiar syntax. For instance, aggregating log data by multiple dimensions in Doris requires a simple SQL GROUP BY, while ES demands complex, nested DSL aggregations, reducing development efficiency. Doris’s support for JOINs and subqueries also suits data warehouse modeling (e.g., star schemas), whereas ES’s lack of JOINs necessitates pre-denormalized data or application-layer processing. Thus, Doris outperforms in query ease and power, enhancing integration with analytics ecosystems. 3. Real-Time Data Processing Mechanisms: Write Architecture and Data Updates Doris and ES adopt distinct approaches to real-time data ingestion and querying. Elasticsearch prioritizes near-real-time search with document-by-document writes and frequent index refreshes. Data is ingested via REST APIs (e.g., Bulk), tokenized, and indexed, becoming searchable after periodic refreshes (default: 1 second). This ensures rapid log retrieval but incurs high write overhead, with CPU-intensive indexing limiting single-core throughput to ~2 MB/s, often causing bottlenecks during peaks. Apache Doris, conversely, uses a high-throughput batch write architecture. Data is imported in small batches (via Stream Load or Routine Load from queues like Kafka), written efficiently in columnar format across multiple replicas. Avoiding per-field indexing, Doris achieves write speeds 5 times higher than ES per ES Rally benchmarks, and supports direct queue integration, simplifying pipelines. 
Key differences in updates and real-time capabilities include: Storage mechanism: Doris’s columnar storage achieves 5:1 to 10:1 compression, using ~20% of ES’s space for the same data, enhancing IO efficiency. ES’s inverted indexes yield a ~1.5:1 compression ratio, inflating storage. Data updates: Doris’s Unique Key model supports primary key updates with minimal performance loss (<10%), while ES’s document updates require costly reindexing (up to 3x performance hit). Doris’s Aggregate Key model ensures consistent aggregations during imports, unlike ES’s less flexible, eventually consistent rollups. Query visibility: ES offers second-level visibility post-refresh, ideal for instant log retrieval. Doris achieves sub-minute visibility via batch imports, sufficient for most real-time analytics, with memory-buffered data ensuring timely query access. Analysis: Doris excels in high-throughput, consistent analysis, while ES focuses on millisecond writes and near-real-time retrieval. Doris’s batch writes and compression outperform ES in write performance (5x), query speed (2.3x), and storage efficiency (1/5th), making it ideal for high-frequency writes and fast analytics, with flexible schema evolution further enhancing its real-time capabilities. 4. Typical Application Scenario Comparison: Log Analysis, BI Reporting, etc. Doris and ES shine in different scenarios due to their architectural strengths: Scenario Apache Doris Elasticsearch Log Analysis Excels in storage and multi-dimensional analysis of large logs; supports long-term retention and fast aggregations/JOINs. Enterprises report 10x faster analytics and 60% cost savings, integrating search and analysis with inverted index support Ideal for real-time log search and simple stats; fast keyword retrieval suits monitoring and troubleshooting (e.g., ELK). Struggles with complex aggregations and long-term analysis due to cost and performance limits BI Reporting Perfect for interactive reporting and ad-hoc analysis; full SQL and JOINs support data warehousing and dashboards. A logistics firm saw 5-10x faster queries and 2x concurrency Rarely used for BI; lacks JOINs and robust SQL, limiting complex reporting. Best for simple metrics in monitoring, not rich BI logic Analysis: In log analysis, Doris and ES complement each other: ES handles real-time searches, while Doris manages long-term, complex analytics. For BI, Doris’s SQL and performance make it far superior, directly supporting enterprise data warehouses and reporting. 5. Performance Benchmark Comparison ES Rally benchmarks highlight Doris’s edge: Log analysis: Elasticsearch vs Apache Doris - Apache Doris Performance comparison: write throughput, storage, query response time Doris achieves 550 MB/s write speed (5x ES), uses 1/5th the storage, and offers 2.3x faster queries (e.g., 1s vs. 6-7s for 40M log aggregations). Its MPP architecture ensures stability under high concurrency, unlike ES, which struggles with memory limits. 6. Enterprise Practice Cases 360 security browser: Replaced ES with Doris, improving analytics speed by 10x and cutting storage costs by 60%. Tencent music: Reduced storage by 80% (697GB to 195GB) and boosted writes 4x with Doris. Large bank: Enhanced log analysis efficiency, eliminating redundancy. Payment firm: Achieved 4x write speed, 3x query performance, and 50% storage savings. These cases underscore Doris’s superiority in large-scale writes and complex queries, often supplementing ES’s search strengths. 
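Before the summary, a small illustration of the query-interface gap discussed in section 2 may help. The sketch below is a hedged example, not taken from the benchmarks above: it assumes a hypothetical app_logs table/index with a level field, a Doris frontend reachable over the MySQL protocol (port 9030 is the usual default), and an Elasticsearch node on port 9200.

Python
import pymysql    # Doris speaks the MySQL protocol, so a standard MySQL client works
import requests

# Apache Doris: a multi-dimensional rollup is an ordinary SQL GROUP BY
conn = pymysql.connect(host="doris-fe", port=9030, user="root", password="", database="logs")
with conn.cursor() as cur:
    cur.execute(
        "SELECT level, COUNT(*) AS cnt "
        "FROM app_logs "
        "GROUP BY level "
        "ORDER BY cnt DESC"
    )
    print(cur.fetchall())

# Elasticsearch: the same rollup requires a nested JSON DSL aggregation
dsl = {
    "size": 0,
    "aggs": {"by_level": {"terms": {"field": "level"}}},
}
resp = requests.post("http://es-node:9200/app_logs/_search", json=dsl)
for bucket in resp.json()["aggregations"]["by_level"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])

The SQL version also extends naturally to JOINs and subqueries, which is exactly the gap highlighted in the query language comparison above.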
Summary Doris excels in complex analytics, SQL usability, and efficiency, ideal for unified real-time platforms, while ES dominates in full-text search and real-time queries. Enterprises can combine them — Doris for analysis, ES for retrieval — to maximize value, with Doris poised to expand in analytics and ES in intelligent search.
The field of AI has advanced at a breathtaking pace, and reinforcement learning (RL) is fast emerging as a leading paradigm for developing intelligent AI agents. RL becomes far more powerful when combined with multi-agent systems, which enable agents to compete, coordinate, and train in dynamic environments. This article introduces reinforcement learning for building AI agents and, more specifically, how to develop multi-agent systems.

But first, what is reinforcement learning?

Reinforcement learning is a subset of machine learning in which an agent learns how to act in an environment by trial and error. The agent seeks to maximize its long-term expected reward, so it must balance exploring new actions against exploiting what it has already learned about the environment. RL works well in situations where the optimal solution is not known in advance and has to be discovered through repeated trials. Here are the key elements of reinforcement learning:

Agent: The decision-maker or learner.
Environment: The world in which the agent operates.
State (S): A representation of the environment at a given time.
Action (A): The options available to the agent.
Reward (R): The feedback the agent receives after taking an action.
Policy (π): A mapping from states to actions.
Value Function (V): The predicted long-term payoff of a state.

Now that we know what reinforcement learning is, let's look at multi-agent systems.

What are multi-agent systems?

A multi-agent system (MAS) is a system composed of multiple interacting intelligent agents. Multi-agent systems help address problems where agents need to work together or against each other, such as controlling fleets of self-driving cars, optimizing resources, and simulating marketplaces. The features of multi-agent systems can be summarized as follows:

Decentralized control: Every agent makes decisions independently.
Coordination: Agents collaborate to achieve a shared outcome.
Adaptability: Agents adapt their behavior based on experience.
Scalability: The system is easily extended by adding more agents.

Adding RL to a MAS means training several agents to learn the best strategies while taking into account what the other agents are doing. This complicates matters because each agent has to learn from the environment and also anticipate and respond to other agents' actions. Now that you have some background knowledge, let's dive into the code.

Step 1: Prepare the Environment

The environment must be set up so that multiple agents can interact with it (and with each other). Popular simulation frameworks such as OpenAI Gym, PyMARL, and Unity ML-Agents provide robust platforms for creating multi-agent systems. Using the Gym Python package for multi-agent reinforcement learning:

Python
import gym
from gym import spaces
import numpy as np

Create a custom environment with multiple agents:
Python
class MultiAgentEnv(gym.Env):
    def __init__(self, num_agents=2):
        super().__init__()
        self.num_agents = num_agents
        # One observation value per agent, each in [0, 1]
        self.observation_space = spaces.Box(low=0, high=1, shape=(num_agents,))
        self.action_space = spaces.Discrete(3)  # Actions are: 0, 1, and 2

    def reset(self):
        self.state = np.random.rand(self.num_agents)
        return self.state

    def step(self, actions):
        # Toy dynamics: random rewards and a fresh random state
        rewards = np.random.rand(self.num_agents)
        self.state = np.random.rand(self.num_agents)
        done = False
        return self.state, rewards, done, {}

Step 2: Selecting a Learning Algorithm

Several RL algorithms are suitable for multi-agent systems:

Q-Learning: Useful for discrete action spaces.
Deep Q-Networks (DQN): Combine Q-learning with neural networks.
Proximal Policy Optimization (PPO): Optimizes policies in continuous-action settings.
Multi-Agent Deep Deterministic Policy Gradient (MADDPG): Handles continuous, competitive, and cooperative scenarios.

Example: Multi-Agent Q-Learning

Python
import numpy as np

class MultiAgentQLearning:
    def __init__(self, num_agents, state_size, action_size,
                 learning_rate=0.1, discount_factor=0.9, exploration_rate=1.0):
        self.num_agents = num_agents
        self.state_size = state_size
        self.action_size = action_size
        # One Q-table per agent
        self.q_tables = [np.zeros((state_size, action_size)) for _ in range(num_agents)]
        self.learning_rate = learning_rate
        self.gamma = discount_factor
        self.epsilon = exploration_rate

    def choose_action(self, state, agent_id):
        # Epsilon-greedy action selection
        if np.random.rand() < self.epsilon:
            return np.random.choice(self.action_size)
        return np.argmax(self.q_tables[agent_id][state])

    def update(self, state, action, reward, next_state, agent_id):
        best_next_action = np.argmax(self.q_tables[agent_id][next_state])
        td_target = reward + self.gamma * self.q_tables[agent_id][next_state][best_next_action]
        td_error = td_target - self.q_tables[agent_id][state][action]
        self.q_tables[agent_id][state][action] += self.learning_rate * td_error

Step 3: Training the Agents

Training involves many episodes in which the agents interact with the environment, learn from the rewards, and adjust their strategies.

Example:

Python
num_agents = 2
state_size = 10

env = MultiAgentEnv(num_agents=num_agents)
agents = MultiAgentQLearning(num_agents=num_agents, state_size=state_size, action_size=3)

num_episodes = 1000
for episode in range(num_episodes):
    state = env.reset()
    # Discretize the continuous observations into state_size buckets for the Q-tables
    discrete_state = [min(int(s * state_size), state_size - 1) for s in state]
    actions = [agents.choose_action(discrete_state[agent], agent) for agent in range(num_agents)]
    next_state, rewards, done, _ = env.step(actions)
    discrete_next = [min(int(s * state_size), state_size - 1) for s in next_state]
    for agent in range(num_agents):
        agents.update(discrete_state[agent], actions[agent], rewards[agent], discrete_next[agent], agent)
    state = next_state

Step 4: Evaluating the System

Observe how the agents are performing and track metrics such as:

Cumulative rewards: Measures long-term performance
Cooperation levels: Assesses how well agents collaborate
Conflict resolution: Evaluates performance in competitive settings

(A minimal evaluation sketch follows after the conclusion below.)

Conclusion

Reinforcement learning and multi-agent systems enable the development of intelligent agents capable of solving complex problems. There are still challenges, such as non-stationary environments and scalability, but with improved algorithms and increasing compute capacity, it is becoming easier to apply these systems in real-world scenarios. With the proper tools and frameworks, developers can use reinforcement learning in multi-agent environments to build intelligent, autonomous AI solutions.
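As a companion to Step 4, here is a minimal evaluation sketch that tracks cumulative reward per agent under a greedy (exploration-free) policy. It assumes the env and agents objects from the training example above are still in scope and reuses the same state discretization; the episode count is arbitrary.

Python
import numpy as np

eval_episodes = 100
agents.epsilon = 0.0                      # exploit only, no exploration
totals = np.zeros(env.num_agents)

for _ in range(eval_episodes):
    state = env.reset()
    # Same bucketing as in training: map continuous observations to Q-table rows
    discrete = [min(int(s * agents.state_size), agents.state_size - 1) for s in state]
    actions = [agents.choose_action(discrete[i], i) for i in range(env.num_agents)]
    _, rewards, _, _ = env.step(actions)
    totals += rewards

print("Cumulative reward per agent:", totals)
print("Average reward per episode:", totals / eval_episodes)

In a cooperative setting you would also compare these totals across agents to gauge how evenly reward is shared; in a competitive setting, the spread between agents is itself the signal.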
In a previous article, we explored how to use Ollama and DeepSeek-R1 with SingleStore for a simple example. In this article, we'll build on that example by working with a PDF document from the internet. We'll store the document and its vector embeddings in SingleStore, then use DeepSeek-R1 to identify blockchain investment opportunities. The notebook file used in this article is available on GitHub. Introduction We'll follow the setup instructions from a previous article. Fill Out the Notebook We'll configure the code to use the smallest DeepSeek-R1 model, as follows: Python llm = "deepseek-r1:1.5b" ollama.pull(llm) We'll download a PDF file that contains information about FinTech investment opportunities in Northern Ireland: Python loader = OnlinePDFLoader("https://www.investni.com/sites/default/files/2021-02/NI-fintech-document.pdf") data = loader.load() We'll split the document as follows: Python text_splitter = RecursiveCharacterTextSplitter( chunk_size = 2000, chunk_overlap = 20 ) texts = text_splitter.split_documents(data) print (f"You have {len(texts)} pages") This gives us 23 pages. We'll use LangChain to store the vector embeddings and document, as follows: Python docsearch = SingleStoreDB.from_documents( texts, embeddings, table_name = "fintech_docs", distance_strategy = DistanceStrategy.DOT_PRODUCT, use_vector_index = True, vector_size = dimensions ) Next, we'll use the following prompt: Python prompt = "What are the best investment opportunities in Blockchain?" docs = docsearch.similarity_search(prompt) data = docs[0].page_content print(data) Example output: Plain Text Within our well respected financial and related professional services cluster, global leaders including Deloitte and PwC are currently working on the application of blockchain solutions within insurance, digital banking and cross-border payments. PwC Vox Financial Partners The PwC global blockchain impact centre in Belfast comprises a team of fintech professionals with deep expertise and a proven record of delivery of insurance, banking, e-commerce and bitcoin products and services. The Belfast team is exploring the application of this disruptive technology to digital currencies, digital assets, identity and smart contracts. The specialist team has already delivered a significant proof of concept project for the Bank of England, to investigate the capability of distributed ledger technology. www.pwc.co.uk Founded in 2016, the Belfast based Fintech consultancy Vox Financial Partners works with top-tier banks and broker- dealer clients in the US and Europe. Vox offers high quality regulatory expertise to enable its clients to plan, resource and deliver major regulatory change projects. Its Opal software, is a suite of tools that provide structured contract drafting and management on a distributed ledger (permissioned blockchain). Opal reduces operational risk and legal cost by managing the single ‘golden copy' of a legal doc, and by storing documents with metadata to enable easy searching, querying and reporting. Rakuten Blockchain Lab www.voxfp.com We'll then use the prompt and response as input to DeepSeek-R1, as follows: Python output = ollama.generate( model = llm, prompt = f"Using this data: {data}. Respond to this prompt: {prompt}." ) content = output["response"] remove_think_tags = True if remove_think_tags: content = re.sub(r"<think>.*?</think>", "", content, flags = re.DOTALL) print(content) We'll disable <think> and </think> using a flag so that we can control the output of its reasoning process. 
Example output: Plain Text **Best Investment Opportunities in Blockchain Technology** 1. **PwC and Deloitte: Insurance and Banking with Blockchain** - **Focus:** Utilizes blockchain for secure transactions, cross-border payments, and insurance solutions. - **Opportunities:** Explores innovative applications beyond traditional methods, such as digital currencies and smart contracts. 2. ** Vox Financial Partners: Identity and Smart Contracts** - **Focus:** Delivers structured contract drafting tools (Opal) on permissioned blockchain, aiming to enhance identity verification and secure payments. - **Opportunities:** Offers potential for innovative projects in identity management, leveraging blockchain's scalability benefits. 3. **Rakuten Blockchain Lab: Opal Software Application** - **Focus:** Implements DLT solutions for efficient contract management, which could be expanded or acquired for further development. - **Opportunities:** Provides scalable and secure project opportunities due to DLT's potential for high returns through economies of scale. **Strategic Investment Considerations:** - **Investment Strategy:** Look into joint ventures or partnerships with Deloitte, Vox Financial Partners, and Rakuten. Consider acquisitions of existing projects or expanding current initiatives. - **Competition:** Monitor competition in the market for blockchain software and services, comparing against established players to identify potential unique opportunities. - **Risks:** Note the rapid evolution of blockchain technology requiring continuous investment and the possibility of regulatory changes impacting identity-related applications. - **Scalability:** Consider the potential for high returns from large-scale blockchain projects due to economies of scale but also requiring significant initial investment. **Conclusion:** The best investment opportunities lie in companies like Deloitte with PwC involvement and Vox Financial Partners, particularly their focus on identity and smart contracts. Rakuten's Opal software offers another key area with potential for further development. The output contains some inaccuracies, such as incorrectly attributing blockchain work to Deloitte, misrepresenting Vox Financial Partners' focus on identity verification instead of regulatory contract management, and mistakenly associating Rakuten with Opal software. Additionally, Rakuten Blockchain Lab's role is unclear in the source data. Summary In this article, we used DeepSeek-R1 in a local RAG setup using Ollama. We walked through loading a document, generating embeddings, and storing them in SingleStore for retrieval. We used LangChain to perform a similarity search and feed relevant context to the model. With DeepSeek-R1 running well in this setup, developers now have more flexibility to experiment, iterate, and build robust, fully local AI applications without relying on cloud-based APIs. Overall, a more accurate summary from DeepSeek-R1 would focus on PwC's financial blockchain initiatives, Vox's regulatory technology, and the need to verify Rakuten's involvement. However, strategic investment considerations, such as assessing competition, scalability, and regulatory risks, agree with key factors investors should consider in the blockchain space.
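To make the retrieval-plus-generation flow easier to reuse, the helper below wraps the steps from this article into a single function. This is a sketch that assumes the docsearch, ollama, and llm objects defined earlier are still in scope; the function name, the k parameter, and the sample question are illustrative only.

Python
import re

def ask(question, k=3, show_thinking=False):
    # Retrieve the top-k most similar chunks from SingleStore
    docs = docsearch.similarity_search(question, k=k)
    context = "\n\n".join(doc.page_content for doc in docs)

    # Feed the retrieved context and the question to DeepSeek-R1 via Ollama
    output = ollama.generate(
        model = llm,
        prompt = f"Using this data: {context}. Respond to this prompt: {question}."
    )
    answer = output["response"]
    if not show_thinking:
        # Strip the <think>...</think> reasoning block, as in the code above
        answer = re.sub(r"<think>.*?</think>", "", answer, flags = re.DOTALL)
    return answer.strip()

print(ask("Which organisations in the document are working on distributed ledger technology?"))

Keeping the retrieval step explicit like this also makes it easy to print the retrieved chunks alongside the answer, which helps when checking the model's output against the source text, something the inaccuracies noted above show is worth doing.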
In software engineering, we have a lot of tools: tens or hundreds of different tools, products, and platforms. We have SQL DBs, we have NoSQL DBs with multiple subtypes, we have queues, data streaming platforms, caches, orchestrators, cloud, and cloud versions of all of the above. We have plenty to choose from. In this article, I want to describe a "basic" modern stack that will allow you to build robust and scalable systems. The tools are language agnostic and can be easily integrated with most modern programming languages.

Disclaimer

The content of this text and the recommendations presented are loosely based on the annual JetBrains developer survey and my own experience.

Problem: Tool Recommendation
Orchestration: Kubernetes
Cloud Providers: AWS/GCP
Relational DB: Postgres
Non-Relational DB: Mongo/BigQuery
Messaging/Streaming: Kafka
Caching: Redis
Monitoring: Prometheus/Grafana

All of these tools have a few things in common:

a very mature ecosystem built around them
a strong community of everyday users
wide adoption across the industry

Due to these traits, I strongly believe that most, if not all, of these tools will stay with us for years to come. Thus, I think each of them is a good pick when you cannot decide what to learn next. More importantly, they are worth investing your time in learning. However, I do not suggest becoming an expert in a specific tool, unless it is a skill you want to build your career on. Quite the opposite: I would recommend getting somewhat familiar with the basics of each, how they work, their trade-offs, and their best practices. You can gain more in-depth knowledge and experience later on, when you start using them. Let's go through all the tools in more detail.

Tools

Kubernetes

Most applications nowadays use Kubernetes in one form or another, either indirectly through GKE, EKS, or another managed K8s service, or directly via a self-managed Kubernetes cluster. Tools like Helm or Terraform add even more features to the already extensive possibilities of Kubernetes. Additionally, many tools used in the cloud-native approach depend directly on Kubernetes operating correctly. There is an enormous community built around Kubernetes, with tutorials, training, and conferences. You will be able to find tons of useful learning material and people willing to help. If you want a more formal confirmation of your skills, certifications are available as well.

Cloud

Despite the fact that the cloud is not as big a deal as it was a couple of years ago, and that there are ongoing discussions about whether migration to the cloud is a cost-effective undertaking, cloud services are still very important. They remain a viable option for any startup or company that does not want to build its own data centers or provision its own machines. Companies are constantly migrating both into and out of the cloud; however, the number moving in, combined with the number already running on cloud services, appears considerably larger than the number moving out. I strongly believe that with the current range of services and tools offered by the different cloud providers, it is very unlikely, if not impossible, that this trend will be short-lived, at least without a substantive change in the world around us. Cloud will stay with us for a long time.
You do not have to worry too much about the exact choice of cloud provider; just pick AWS, GCP, or Alibaba Cloud (if you are based in Asia). Then, if you ever need to switch, a reasonably large part of your knowledge will remain valid. In most cases, the differences are slight, and the biggest one is often the name of the service; conceptually, things remain largely the same.

Postgres

Postgres is one of the most widely used databases in the world. Its origins go back to the POSTGRES project at Berkeley in the 1980s. Currently, it is the number one pick for developers who need a relational database for their projects. Together with MySQL, it forms the core of the modern internet. While in terms of pure numbers it is not as popular as MySQL, Postgres is not far behind. From my own experience, only one project I have worked on was not based on Postgres, at least among those relying on relational databases. Postgres is stable, battle-tested, and has a strong ecosystem. Furthermore, it provides an extensive set of features, for example geospatial queries and native JSON support. On top of that, it is completely free.

NoSQL

NoSQL databases are getting more important by the day. They serve multiple purposes, from read-optimized stores like MongoDB, through very large datastores like BigQuery or Bigtable that can easily handle petabyte-scale data, to write-optimized stores like Cassandra. Building certain systems, and achieving stronger guarantees, would not be possible without them. Here, I would recommend not focusing on a particular tool, but rather getting familiar with the concept as a whole. What are the different types of NoSQL databases, and what are their use cases? Maybe read about the leading database in each category. Learn the basics of how they work, what they offer, and some of their pros and cons. However, if you want specific recommendations, I suggest looking at MongoDB and GCP BigQuery:

Mongo, while having its quirks, is still the most commonly used NoSQL database. It is also quite simple from a user perspective and provides a nice entry point into NoSQL.
BigQuery is quite a powerful data store capable of handling petabytes of data, which makes it ideal as a sink or source in data analysis flows and for working with massive datasets.

Kafka

I think that I do not need to introduce Kafka to anyone, but I will do so anyway. It is probably the most performant streaming platform currently available on the market. Kafka is able to achieve enormous throughput and process hundreds of thousands or even millions of messages per second. It is a go-to choice whenever someone needs to process a large quantity of data. Its user base includes some of the biggest technology companies in the world, including Uber, Netflix, and LinkedIn, just to name a few. There is a large community built around Kafka, with its own events, talks, experts, courses, and certifications. Moreover, most if not all modern programming languages have Kafka client support, and there are connectors for many of the tools used in building data processing flows, making it easier to integrate Kafka into your system. I bet you will encounter Kafka sooner or later in your journey as a software engineer. Even if it is not Kafka but some other platform, your knowledge of and experience with it will surely count. In my opinion, this is a good starting point for learning how Kafka operates.
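To get a feel for the basics, here is a minimal produce-and-consume sketch using the kafka-python client. The broker address, topic name, and payload are assumptions for illustration, and kafka-python is only one of several viable Python clients.

Python
import json
from kafka import KafkaProducer, KafkaConsumer

# Produce a few JSON messages to an assumed local broker and topic
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
for i in range(5):
    producer.send("orders", {"order_id": i, "status": "created"})
producer.flush()

# Consume them back, stopping after 5 seconds of inactivity
consumer = KafkaConsumer(
    "orders",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
    consumer_timeout_ms=5000,
)
for message in consumer:
    print(message.topic, message.value)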
Redis

Redis is an in-memory data store, most commonly used as a simple key-value store. However, it can also be used successfully as a vector database or a queue. Redis is primarily known for its high scalability and simplicity. It is easy to set up and integrate, and it is capable of handling tens of thousands of requests per second with ease. The combination of all the traits above makes Redis an excellent pick for a caching layer. When applied correctly, it can greatly reduce the load put on your application (a minimal cache-aside sketch follows at the end of this article). What is interesting about Redis is its relative lack of direct competitors. The only tool that comes to mind is Memcached, and it has a much smaller user base in comparison to Redis.

Monitoring

Here, I would mostly like to stress the importance of correct monitoring in present-day systems. Sooner or later you will need it, or you will regret not having it properly set up from the beginning. I have mostly worked with the Prometheus/Grafana stack, and I can honestly recommend this duo to anyone. In my opinion, it is a simple yet powerful combination, easy to set up, configure, and manage later. It is one of a few industry-tested approaches to monitoring application state. Other options include, for example, the ELK stack (Elasticsearch, Logstash, Kibana), Coralogix, or Datadog. However, remember that, due to my work experience with Prometheus/Grafana, I may be slightly biased towards this duo, so treat my opinion with a grain of salt.

Summary

Here we are, at the end. You have read my tech stack proposition; does it sound good to you? Below, you can take another look at it. I have prepared a set of links to sites where you can start your journey with each of them.

Problem: Tool Recommendation
Orchestration: Kubernetes
Cloud Providers: AWS/GCP
Relational DB: Postgres
Non-Relational DB: MongoDB/BigQuery
Messaging/Streaming: Kafka
Caching: Redis
Monitoring: Prometheus/Grafana

Remember not to get overwhelmed by the number of different tools. You do not have to be an expert in all of them. Just grasp the general idea, and know where and how you might use each one. The expertise will come with experience. Thank you for your time.
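As mentioned in the Redis section above, here is a minimal cache-aside sketch using the redis-py client. The host, port, key naming, TTL, and the stand-in "database" lookup are all assumptions for illustration.

Python
import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def get_user(user_id):
    """Cache-aside: check Redis first, fall back to the (stand-in) database."""
    cache_key = f"user:{user_id}"
    cached = r.get(cache_key)
    if cached is not None:
        return json.loads(cached)                        # cache hit
    user = {"id": user_id, "name": f"user-{user_id}"}    # pretend DB query
    r.set(cache_key, json.dumps(user), ex=300)           # expire after 5 minutes
    return user

print(get_user(42))   # first call populates the cache
print(get_user(42))   # second call is served from Redis

The pattern is deliberately simple: read through on a miss, use a short TTL to bound staleness, and keep the application as the source of truth.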
Abstract This article explores the concept of a Software Bill of Materials (SBOM) as an essential tool in modern software development and cybersecurity frameworks. The SBOM acts as a detailed inventory of all software components, dependencies, and associated metadata within an application. By providing transparency, facilitating risk mitigation, and supporting regulatory compliance—particularly for software products intended for U.S. federal agencies—the SBOM strengthens software security. Through a detailed examination of SBOM implementation, benefits, and associated technologies—such as composition analysis and binary detonation—this article highlights the SBOM's role in fostering a secure development environment. Introduction The Software Bill of Materials (SBOM) is a comprehensive list detailing every software component, dependency, and metadata associated with an application. By cataloging software parts, it enables organizations to manage software more effectively and enhances visibility into potential security risks. The significance of SBOMs lies in their ability to offer transparency, build trustworthy software, and address cybersecurity challenges—especially relevant in compliance-heavy environments. This article aims to provide an in-depth analysis of SBOMs, their roles in cybersecurity and compliance, and how they integrate into modern software frameworks. Understanding SBOMs and Their Importance Definition and Structure of an SBOM An SBOM comprises a detailed record of software libraries, dependencies, and metadata for each component. It serves as an inventory for developers, enabling them to understand what’s inside their software products and make informed security decisions. Key Benefits of an SBOM Transparency: SBOMs enhance the visibility of a software product's components, providing a clearer view of what’s running under the hood.Risk mitigation: They support proactive vulnerability identification and allow for timely remediation.Compliance requirements: SBOMs are critical for products sold to U.S. federal agencies, aligning with Executive Order 14028, which emphasizes improved cybersecurity in critical software infrastructure (The White House, 2021).Software security: By identifying components susceptible to supply chain attacks or vulnerabilities in third-party libraries, SBOMs contribute significantly to a secure software environment (CrowdStrike, n.d.). Technologies and Methods Used in SBOMs Composition Analysis Composition analysis identifies and assesses each software component within an SBOM. This method enables organizations to evaluate dependencies and potential risks tied to each component, helping them manage risk effectively and improve transparency (NTIA, 2021). Variability Management In the SBOM context, variability management maintains consistency across software versions and configurations, helping organizations ensure compatibility and minimize security risks in software variations. This process is essential for managing software that may be customized or modified for different environments (Synopsys, 2022). Binary Detonation Binary detonation is an advanced cybersecurity technique used to analyze software binaries for potential threats. By isolating and detonating these binaries in a controlled environment, organizations can detect malicious behaviors, enhancing SBOM's ability to manage security risks in executable files (Jin & Austin, 2020). 
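To make the composition-analysis idea more concrete, the sketch below walks the component inventory of a CycloneDX JSON SBOM and flags entries against a toy advisory list. The file name and the advisory set are assumptions for illustration; in practice, the advisories would come from a vulnerability feed such as OSV or the NVD rather than a hard-coded set.

Python
import json

# Hypothetical (name, version) pairs representing known-vulnerable components
KNOWN_VULNERABLE = {
    ("log4j-core", "2.14.1"),
    ("openssl", "1.1.1k"),
}

# Assumed input file: an SBOM exported in CycloneDX JSON format
with open("sbom.cdx.json") as f:
    sbom = json.load(f)

for component in sbom.get("components", []):
    name = component.get("name")
    version = component.get("version")
    status = "VULNERABLE" if (name, version) in KNOWN_VULNERABLE else "ok"
    print(f"{name} {version}: {status}")

This is the essence of composition analysis: an accurate component inventory plus an up-to-date advisory source, checked continuously rather than once.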
Open-Source Approaches With the extensive use of open-source components in modern software, SBOMs play a pivotal role in addressing associated security risks. By integrating open-source SBOM tools, organizations can automate the tracking of open-source components and manage vulnerabilities more effectively (OpenSSF, 2022). Approach Architecture SBOMs integrate with CI/CD pipelines, facilitating automation in creating, updating, and managing SBOMs throughout the software lifecycle. This approach allows for continuous monitoring and ensures that SBOMs remain up-to-date as software evolves (CycloneDX, 2023). Implementation of SBOMs in Organizations Steps for SBOM Implementation Implementing SBOMs involves identifying components, performing composition analysis, and ensuring continuous monitoring. Automated SBOM generation tools can streamline this process, reducing the complexity of manual updates and integration into workflows (Garfinkel & Cox, 2022). Challenges and Solutions Common challenges include managing SBOM complexity, resource allocation, and maintaining current data. Solutions involve automating SBOM generation and leveraging SBOM-integrated tools in development workflows, simplifying compliance and security efforts (Synopsys, 2023). Case Studies Successful SBOM implementations are seen in high-security industries, such as finance and defense, where SBOMs help protect software from supply chain vulnerabilities and regulatory risks. Benefits and Impact of SBOMs on Cybersecurity Enhanced Vulnerability Detection SBOMs improve vulnerability management by providing a clear view of all components in use, allowing organizations to pinpoint potential threats and address them proactively (Black Duck by Synopsys, 2023). Facilitating Incident Response In the event of a security incident, SBOMs enable rapid response by identifying specific affected components, making it easier to resolve issues efficiently. Compliance and Legal Benefits SBOMs offer substantial compliance benefits, especially regarding regulatory requirements. By clearly documenting components, they aid organizations in meeting legal standards and cybersecurity mandates (CISA, 2022). Improving Software Supply Chain Security SBOMs enhance supply chain security by identifying and mitigating risks associated with dependencies. By providing transparency into third-party components, they help secure software ecosystems against potential attacks (NIST, 2022). Future Directions and Trends in SBOMs Advances in SBOM Standards Emerging SBOM standards, such as SPDX and CycloneDX, aim to unify SBOM formats and facilitate widespread adoption. These standards will enable greater interoperability between tools and systems, making SBOMs more versatile (SPDX, 2020; CycloneDX, 2023). SBOM Automation and AI Integration AI-driven tools offer promising potential for enhancing SBOM generation, vulnerability scanning, and threat detection. As AI tools mature, they may automate many SBOM functions, simplifying software transparency and security processes (Gartner, 2022). Regulatory Landscape With regulatory mandates evolving, particularly in the U.S., organizations across various industries may be required to adopt SBOM practices. This trend is anticipated to expand as cybersecurity becomes a national and international priority (The White House, 2021). Conclusion SBOMs are crucial for building secure, transparent, and compliant software systems. 
As software environments become increasingly complex, SBOMs stand to play a foundational role in modern cybersecurity practices and secure the software supply chain. Future research should focus on AI-enhanced SBOM management and advancements in SBOM standards, which can promote more robust security measures across industries. References The White House. (2021). Executive Order 14028: Improving the Nation's Cybersecurity. Retrieved from https://www.whitehouse.gov/briefing-room/presidential-actions/2021/05/12/executive-order-on-improving-the-nations-cybersecurity/National Telecommunications and Information Administration (NTIA). (2021). The Minimum Elements For a Software Bill of Materials (SBOM). Retrieved from https://www.ntia.gov/report/2021/minimum-elements-software-bill-materials-sbomCybersecurity and Infrastructure Security Agency (CISA). (2022). Software Bill of Materials (SBOM). Retrieved from https://www.cisa.gov/sbomNational Institute of Standards and Technology (NIST). (2022). Secure Software Development Framework (SSDF) Version 1.1. NIST Special Publication 800-218. Available at: https://doi.org/10.6028/NIST.SP.800-218SPDX. (2020). SPDX Specification Version 2.3. The Linux Foundation. Retrieved from https://spdx.github.io/spdx-spec/CycloneDX. (2023). CycloneDX SBOM Standard Specification Version 1.4. OWASP Foundation. Retrieved from https://cyclonedx.org/specification/CrowdStrike. (n.d.). What is a Software Bill of Materials (SBOM)? Retrieved from https://www.crowdstrike.com/platform/cloud-security/Synopsys. (2022). Understanding the Software Bill of Materials (SBOM). Synopsys Blog. Retrieved from https://www.synopsys.com/blogs/software-security/understanding-sbom/Open Source Security Foundation (OpenSSF). (2022). Supply Chain Security Best Practices. Retrieved from https://openssf.org/blog/2022/09/01/npm-best-practices-for-the-supply-chain/