Introduction to Apache Pekko | Baeldung on Scala

1. Introduction

In September 2022, Lightbend announced a change to Akka’s license model. Starting with Akka 2.7.x, the license changed from the open-source Apache 2.0 to the source-available Business Source License (BSL) 1.1. As a result, companies wishing to use Akka 2.7.x or later in production environments must obtain a license from Lightbend.

Consequently, a fork of Akka 2.6.x was born: Apache Pekko. In this tutorial, we’ll learn about the Apache Pekko libraries. First, we’ll look at a few general concepts. Then, we’ll briefly review the modules currently available in Apache Pekko, without diving deep into any of them.

All core Apache Pekko functionality is open source. For the remainder of this article, we’ll refer to it as Pekko for short.

2. Core Concepts in Pekko

Similar to Akka, Pekko is a collection of open-source libraries that enable the design of resilient, scalable systems across processor cores and networks. This section outlines some of the most important concepts when working with Pekko.

2.1. Actors and Their Interactions

The most basic building block in Pekko is the actor.

Generally, actors are objects that encapsulate state and behavior and come with a mailbox. Actors can only communicate with one another through their mailboxes. More precisely, they send each other messages, which are queued in the recipient’s mailbox. When an actor A wants another actor B to perform a task, it sends B a message. If B has to reply to A, it sends another message back. All interactions are asynchronous and work the same way whether the actors run within a single JVM or on a cluster of distributed machines.

When implemented correctly, this communication paradigm makes actors safe and lightweight in a multi-threaded environment. Actors are only allowed to process one message at a time. Therefore, race conditions in their internal state won’t happen.

Furthermore, an actor occupies a thread only while it has a message to process, so an idle actor doesn’t keep other actors from running.
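The ideas above can be sketched with Pekko’s typed actor API. This is a minimal illustration rather than a definitive implementation: the Counter actor, its Increment and GetValue messages, and the CounterDemo object are hypothetical names, and the code assumes the pekko-actor-typed library is on the classpath.

```scala
import org.apache.pekko.actor.typed.{ActorRef, ActorSystem, Behavior, Scheduler}
import org.apache.pekko.actor.typed.scaladsl.AskPattern._
import org.apache.pekko.actor.typed.scaladsl.Behaviors
import org.apache.pekko.util.Timeout

import scala.concurrent.Await
import scala.concurrent.duration._

// Hypothetical counter actor: its state is only reachable through messages
object Counter {
  sealed trait Command
  case object Increment extends Command
  // The sender includes a reference so the counter can reply with a message
  final case class GetValue(replyTo: ActorRef[Int]) extends Command

  // Messages are taken from the mailbox one at a time, so `count` needs no locking
  def apply(count: Int = 0): Behavior[Command] =
    Behaviors.receiveMessage[Command] {
      case Increment =>
        Counter(count + 1) // the next message sees the updated state
      case GetValue(replyTo) =>
        replyTo ! count
        Behaviors.same
    }
}

object CounterDemo {
  def main(args: Array[String]): Unit = {
    val system: ActorSystem[Counter.Command] = ActorSystem(Counter(), "counter-demo")
    implicit val timeout: Timeout = 3.seconds
    implicit val scheduler: Scheduler = system.scheduler

    // `!` only enqueues the message in the mailbox and returns immediately
    system ! Counter.Increment
    system ! Counter.Increment

    // `ask` sends GetValue and exposes the reply as a Future
    val value = Await.result(system.ask[Int](replyTo => Counter.GetValue(replyTo)), 3.seconds)
    println(s"Counter value: $value") // prints "Counter value: 2"

    system.terminate()
  }
}
```

Blocking with Await keeps the demo short. Inside an actor, we’d normally handle the reply as just another message instead of blocking a thread.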

2.2. Actor Systems

In Pekko, actor systems are objects that manage several actors as well as other facilities, such as scheduling, logging, and configuration. They’re heavyweight structures, as they allocate several threads to be loaned to the actors they manage.

Depending on our application, we can decide to run a single actor system, several actor systems within the same JVM, or several actor systems on different machines that form a distributed cluster.

Having a single object managing groups of actors makes it possible to implement supervision and monitoring. As a matter of fact, actors can form hierarchies, where parent actors supervise the children.

Supervision deals with unexpected failures and should be kept separate from the business logic of an actor. Pekko allows for three different supervision strategies: resuming the failed actor, restarting it, or stopping it permanently. The first option keeps the internal state of the actor, whereas the second one doesn’t. Which strategy to choose depends on the nature of the operation being performed.
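As a rough sketch of the three strategies, we can wrap the same business logic with Behaviors.supervise from the typed API. The Divider actor and its messages are hypothetical names chosen for illustration, and the code assumes pekko-actor-typed is on the classpath.

```scala
import org.apache.pekko.actor.typed.{ActorRef, Behavior, SupervisorStrategy}
import org.apache.pekko.actor.typed.scaladsl.Behaviors

// Hypothetical actor that divides integers and counts the requests it has handled
object Divider {
  sealed trait Command
  final case class Divide(dividend: Int, divisor: Int, replyTo: ActorRef[Quotient]) extends Command
  final case class Quotient(value: Int, handled: Int)

  // Business logic only: no try/catch, failures are left to the supervision strategy
  private def dividing(handled: Int): Behavior[Command] =
    Behaviors.receiveMessage[Command] {
      case Divide(dividend, divisor, replyTo) =>
        val value = dividend / divisor // throws ArithmeticException when divisor is 0
        replyTo ! Quotient(value, handled + 1)
        dividing(handled + 1)
    }

  // Resume: drop the failing message and keep the current state
  def resuming: Behavior[Command] =
    Behaviors.supervise(dividing(0)).onFailure[ArithmeticException](SupervisorStrategy.resume)

  // Restart: drop the failing message and start again from the initial behavior
  def restarting: Behavior[Command] =
    Behaviors.supervise(dividing(0)).onFailure[ArithmeticException](SupervisorStrategy.restart)

  // Stop: terminate the actor permanently
  def stopping: Behavior[Command] =
    Behaviors.supervise(dividing(0)).onFailure[ArithmeticException](SupervisorStrategy.stop)
}
```

In this sketch, a Divide message with a zero divisor makes the actor fail. Under the resuming behavior the handled counter carries on from its previous value, while under the restarting behavior it starts again from zero.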

Monitoring, on the other hand, isn’t reserved for parents. Any actor can monitor zero or more other actors, reacting to their termination. This differs from supervision, where parents react to their children’s failures.
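Monitoring can be sketched with context.watch and the Terminated signal. In this illustrative example, the Job, Monitor, and Guardian actors are hypothetical names: Monitor isn’t the parent of Job, yet it’s still notified when Job stops. The code assumes pekko-actor-typed is on the classpath.

```scala
import org.apache.pekko.NotUsed
import org.apache.pekko.actor.typed.{ActorRef, Behavior, Terminated}
import org.apache.pekko.actor.typed.scaladsl.Behaviors

// Hypothetical actor that stops itself once told to finish
object Job {
  sealed trait Command
  case object Finish extends Command

  def apply(): Behavior[Command] =
    Behaviors.receiveMessage[Command] {
      case Finish => Behaviors.stopped
    }
}

// Watches `target` without being its parent and reports its termination to `listener`
object Monitor {
  def apply(target: ActorRef[Job.Command], listener: ActorRef[String]): Behavior[NotUsed] =
    Behaviors.setup[NotUsed] { context =>
      context.watch(target)
      Behaviors.receiveSignal[NotUsed] {
        case (_, Terminated(ref)) =>
          listener ! s"${ref.path.name} terminated"
          Behaviors.stopped
      }
    }
}

// Parent of both actors: it wires them together but watches neither of them
object Guardian {
  final case class FinishJob(replyTo: ActorRef[String])

  def apply(): Behavior[FinishJob] =
    Behaviors.setup[FinishJob] { context =>
      val job = context.spawn(Job(), "job")
      Behaviors.receiveMessage[FinishJob] {
        case FinishJob(replyTo) =>
          context.spawnAnonymous(Monitor(job, replyTo))
          job ! Job.Finish
          Behaviors.same
      }
    }
}
```

Even if the job stops before the monitor registers its watch, the monitor still receives the Terminated signal, so the sketch doesn’t depend on the order in which the two actors run.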

2.3. Transparency

As we saw before, the interaction patterns among actors are the same whether they’re running within the same JVM or on different machines in a distributed cluster.

Since all operations are asynchronous, there are no hard guarantees for the delivery of a message. Messages can get lost (for example, due to a network failure or partitioning) or might take a long time to get delivered (maybe because the network is slow).

This scenario is quite unlikely if the two actors are running on the same JVM, but it becomes a serious concern in distributed clusters. If an actor A expects a message from B but never receives it, it’s, in general, impossible to determine if the message was lost, if B never sent it in the first place (perhaps because it crashed), or if the network is just slow. This issue is generally resolved by re-sending the message multiple times until either A acknowledges it or a timeout triggers.
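The resend-until-acknowledged approach can be sketched with an actor that uses a timer to retry. This is only one possible shape, under stated assumptions: the Resender actor, its Envelope, Acknowledged, Delivered, and GaveUp messages, and the maxAttempts and interval parameters are all hypothetical, and the timeout is modelled as a maximum number of attempts. The code assumes pekko-actor-typed is on the classpath.

```scala
import org.apache.pekko.actor.typed.{ActorRef, Behavior}
import org.apache.pekko.actor.typed.scaladsl.Behaviors

import scala.concurrent.duration.FiniteDuration

// Hypothetical actor that keeps re-sending one payload until it is acknowledged
object Resender {
  sealed trait Command
  case object Acknowledged extends Command
  private case object Resend extends Command // timer tick, internal to this actor

  // What the target receives: the payload plus the reference to acknowledge to
  final case class Envelope(payload: String, ackTo: ActorRef[Acknowledged.type])

  sealed trait Outcome
  case object Delivered extends Outcome
  case object GaveUp extends Outcome

  def apply(
      target: ActorRef[Envelope],
      payload: String,
      maxAttempts: Int,
      interval: FiniteDuration,
      reportTo: ActorRef[Outcome]): Behavior[Command] =
    Behaviors.setup[Command] { context =>
      Behaviors.withTimers[Command] { timers =>
        def send(): Unit = target ! Envelope(payload, context.self)

        // `sent` is the number of attempts made so far
        def awaiting(sent: Int): Behavior[Command] =
          Behaviors.receiveMessage[Command] {
            case Acknowledged =>
              reportTo ! Delivered
              Behaviors.stopped // stopping the actor also cancels its timers
            case Resend if sent < maxAttempts =>
              send()
              awaiting(sent + 1)
            case Resend =>
              reportTo ! GaveUp
              Behaviors.stopped
          }

        send()
        timers.startTimerWithFixedDelay(Resend, Resend, interval)
        awaiting(1)
      }
    }
}
```

With this scheme, the target may receive the same payload more than once, for instance when an acknowledgment is lost, so the receiving side should be able to cope with duplicates.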

To enable location transparency, Pekko makes use of actor references. We can think of an actor reference as an address used to send a message to a given actor. In Pekko, the communication among different actor systems is transparent.

In particular, the Remoting module takes care of connecting different actor systems, whether they’re on the same JVM or not. This, combined with the use of actor references, makes the location of an actor transparent to the actors that communicate with it. In other words, actor A doesn’t need to know where actor B is running because actor systems and Pekko Remoting take care of resolving a given actor reference.

3. Apache Pekko Modules

At the time of writing, the following modules are available for use in Apache Pekko:

Actor library, based on the Akka Typed Actor API. We can use this module to access the actor model provided by Pekko. As in Akka, a classic (“untyped”) actor API is available as well.
Remoting, to let actors on different JVMs communicate transparently
Cluster, to manage multiple actor systems all together
Cluster Sharding, to distribute different actors on multiple actor systems across different JVMs
Cluster Singleton, to create actors with a single instance within a Pekko cluster
Persistence, to leverage event sourcing to persist the state of the actors in a Pekko actor system
Projections, to build alternate or aggregate views over an event stream
Distributed Data, to leverage Conflict-Free Replicated Data Types (CRDTs) to accept writes on different nodes of a Pekko cluster, even in the presence of network partitions
Streams, to process streams of data
Apache Pekko Connectors, a separate module from Pekko, built on top of Pekko Streams to implement streaming-aware connectors for third-party services
HTTP, a separate module from Pekko to construct and consume HTTP-based services
gRPC, a separate module from Pekko, to integrate gRPC into Pekko-based applications

All the aforementioned libraries are available on Maven Central under the org.apache.pekko group ID.
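For instance, an sbt build could pull in a few of these modules as follows. This is a sketch only: the version number is an assumption, so we should check Maven Central for the release we actually want, and the selection of modules is arbitrary.

```scala
// build.sbt
// Assumed version, for illustration; check Maven Central for the current release
val pekkoVersion = "1.0.2"

libraryDependencies ++= Seq(
  "org.apache.pekko" %% "pekko-actor-typed"   % pekkoVersion,
  "org.apache.pekko" %% "pekko-cluster-typed" % pekkoVersion,
  "org.apache.pekko" %% "pekko-stream"        % pekkoVersion
)
```

Modules that are released separately from the core, such as HTTP, follow their own version numbers.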

4. Migrating From Akka 2.6 to Pekko

Broadly speaking, if our project depends on Akka 2.6.x, there are a few changes we’ll have to make to migrate to Apache Pekko. Besides the changes in the dependencies (from com.typesafe.akka to org.apache.pekko) and in the packages (from akka to org.apache.pekko), there are a few differences in modules and configuration to take into account. The main ones are as follows:

Pekko Connectors corresponds to Alpakka in the Akka ecosystem
The configuration settings start with the prefix pekko, instead of akka
Pekko uses different default ports for Classic Remoting and Artery Remoting: 7355 and 17355, respectively, instead of Akka’s 2552 and 25520
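The package and configuration changes listed above might look like this in practice. It’s a minimal sketch: the MigratedSettings object, the system name, and the chosen settings are illustrative assumptions, and the code assumes pekko-actor-typed (which brings in the Typesafe Config library) is on the classpath.

```scala
// Akka 2.6.x imports, shown for comparison only:
//   import akka.actor.typed.ActorSystem
//   import akka.actor.typed.scaladsl.Behaviors
// Pekko equivalents: same class names under the org.apache.pekko package
import org.apache.pekko.actor.typed.ActorSystem
import org.apache.pekko.actor.typed.scaladsl.Behaviors

import com.typesafe.config.{Config, ConfigFactory}

// Hypothetical holder for the settings of a migrated application
object MigratedSettings {
  // Akka equivalent, for comparison: akka.remote.artery.canonical.port = 25520
  val config: Config = ConfigFactory.parseString(
    """
      |pekko.remote.artery.canonical.hostname = "127.0.0.1"
      |pekko.remote.artery.canonical.port = 17355
      |""".stripMargin)

  def main(args: Array[String]): Unit = {
    // Explicit settings take precedence over those loaded from the classpath
    val system = ActorSystem(
      Behaviors.empty[String],
      "migrated",
      config.withFallback(ConfigFactory.load()))
    println(system.settings.config.getInt("pekko.remote.artery.canonical.port"))
    system.terminate()
  }
}
```

The remoting settings only take effect when the Remoting module is on the classpath and enabled. Here they simply show the renamed prefix.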

Depending on the Pekko version we’re migrating to, there might be other differences to tackle.

5. Conclusion

In this article, we learned about Apache Pekko. First, we discussed some general concepts. In particular, we saw what actors and actor systems are.

We then scratched the surface of how communication between actors works, introducing the concept of location transparency and clusters. After that, we looked at the modules available at the time of writing.

Lastly, we briefly analyzed the main things to watch out for when migrating from Akka to Pekko.
