1. Introduction

In this article, we’re going to explore Lightrun – a Developer Observability platform – by introducing it into an application and showing what we can achieve with it.

2. What Is Lightrun?

Lightrun is an observability platform that allows us to instrument our Java applications (other languages are also supported) and then view the instrumentation directly from within IntelliJ or Visual Studio Code, as well as in many logging platforms and APMs. It’s designed to add instrumentation seamlessly to applications running in any environment and to make the results accessible from anywhere, allowing us to quickly diagnose issues anywhere from our local workstation all the way to production instances.

Lightrun works with two different components that integrate together:

  • The Lightrun Agent runs as part of the application and instruments telemetry as requested. In Java applications, this works as a Java Agent. We’ll run this agent as part of every application that we want to use Lightrun with.
  • The Lightrun Plugin runs as part of our development environment and allows us to communicate with the agents. This is our means to see what is running, add new instrumentation to an application and receive the results of this instrumentation.

Once all of this is set up, we can then manage three different types of instrumentation:

  • Logs – These let us add arbitrary log statements to the running application at any point, logging out any available values (including complex expressions). These logs can be sent to the standard output, back to the Lightrun plugin in our development environment, or both at the same time. In addition, they can be invoked conditionally – for example, only for a specific user or session ID.
  • Snapshots – These allow us to capture a live snapshot of the application at any point. This will record the details of exactly when and where the snapshot was triggered, the value of all variables, and the complete call stack to this point. These can also be invoked conditionally, much like Logs.
  • Metrics – These allow us to record metrics similar to what can be generated by Micrometer, allowing us to count the number of times a line of code is executed, record timings for a block of code, or any other numerical calculation we might want.

All of these things can be done easily in our code already. What Lightrun gives us here is the ability to do these things in an already running application without needing to change or re-deploy the application. This means we can get targeted instrumentation in production with zero downtime.
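To make the comparison concrete, here is a minimal sketch of what hand-written instrumentation of this kind might look like in plain Java. The class name, the process() method, and the "user-42" ID are all hypothetical and are not taken from the example application; the point is only that each of these lines would normally require a code change and a redeploy.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Logger;

// Hypothetical class: hand-written versions of a conditional log and two simple metrics.
class ManualInstrumentation {
    private static final Logger LOG = Logger.getLogger(ManualInstrumentation.class.getName());

    // Metric: how many times process() has been called.
    static final AtomicLong calls = new AtomicLong();
    // Metric: total time spent inside process(), in nanoseconds.
    static final AtomicLong totalNanos = new AtomicLong();

    static int process(String userId, int value) {
        long start = System.nanoTime();
        calls.incrementAndGet();

        // Conditional log: only emitted for one specific (made-up) user ID.
        if ("user-42".equals(userId)) {
            LOG.info("process() called with value=" + value);
        }

        int result = value * 2; // stand-in for the real business logic
        totalNanos.addAndGet(System.nanoTime() - start);
        return result;
    }

    public static void main(String[] args) {
        process("user-1", 10);
        process("user-42", 21);
        System.out.println("calls=" + calls.get() + ", totalNanos=" + totalNanos.get());
    }
}
```

Every counter, timer, and log line here lives in the source code permanently, which is exactly the cost that dynamic instrumentation avoids.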

Furthermore, all these logs are ephemeral. They do not persist in the source code or running application and can be added and removed as needed.

3. Example Application

For this article, we have an application that is already built and ready to work with. This application is designed for tracking tasks that are assigned to people and allows users to query this data. This code can be found on GitHub and will require Java 17+ and Maven 3.6 to build it correctly.

This application is architected as three different services – one for managing users, another for managing tasks, and a third that orchestrates over the two of them. The tasks-service and users-service each have their own database, and there is a JMS queue between the two, allowing the users-service to indicate that a user was deleted so that the tasks-service can tidy things up.

These databases and the JMS queue are all embedded within the applications for convenience. However, in reality, this would naturally use real infrastructure.

3.1. Tasks Service

In this article, we’re only interested in the tasks-service. However, in future articles, we’re going to explore all three of them and how they interact with each other.

This service is a Spring Boot application built with Maven on Java 17. When running, this has HTTP endpoints for:

  • GET / – Allows the client to search tasks, filtering by the user that created it and by the status of it.
  • POST / – Allows the client to create a new task.
  • GET /{id} – Allows the client to get a single task by ID.
  • PATCH /{id} – Allows the client to update a task, changing the status and the user it’s assigned to.
  • DELETE /{id} – Allows the client to delete a task.

We also have a JMS listener, which can indicate when a user was deleted from our users-service. In this case, we automatically delete all tasks created by that user and unassign all tasks assigned to that user.
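The clean-up rule described above can be sketched in plain Java, without any JMS or Spring wiring. The Task record and the onUserDeleted() method below are hypothetical simplifications for illustration and are not the actual code of the tasks-service.

```java
import java.util.ArrayList;
import java.util.List;

class UserDeletedHandler {
    // Hypothetical, simplified task model; the real entity is not shown in this article.
    record Task(String id, String createdBy, String assignedTo) {}

    // Drops tasks created by the deleted user and unassigns tasks assigned to them.
    static List<Task> onUserDeleted(List<Task> tasks, String deletedUserId) {
        List<Task> result = new ArrayList<>();
        for (Task task : tasks) {
            if (deletedUserId.equals(task.createdBy())) {
                continue; // created by the deleted user: remove it
            }
            if (deletedUserId.equals(task.assignedTo())) {
                // assigned to the deleted user: keep the task but clear the assignee
                result.add(new Task(task.id(), task.createdBy(), null));
            } else {
                result.add(task);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Task> tasks = List.of(
            new Task("1", "alice", "bob"),
            new Task("2", "bob", "alice"),
            new Task("3", "carol", "carol"));
        System.out.println(onUserDeleted(tasks, "alice"));
    }
}
```

In the real service this logic would presumably run against the database from inside the message listener; the in-memory list here only stands in for that.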

We also have a couple of bugs in our application that we’ll be able to diagnose with the help of Lightrun.

4. Setting Up Lightrun

Before we start, we’ll need an account with Lightrun and to set it up locally. This can be done by visiting https://app.lightrun.com/ and following the instructions.

Once we have registered, we’ll need to select the development environment and programming language. For this article, we’ll be using IntelliJ and Java, so we’ll select those and move on:

[Screenshot: Lightrun setup]

We then get instructions for how to install the Lightrun plugin into our environment, so we can just follow these.

We also need to ensure that we sign in to our new account from our development environment, after which we’ll have access to our Lightrun agents – none yet – from within the editor:

[Screenshot: Lightrun connect]

Finally, we get instructions on how to download the Java agent that we’ll use to instrument our applications. These instructions are platform-specific, so we need to make sure we follow the ones that work for our exact setup.

Once we’ve done this, we can start our application with the agent installed. Make sure that the tasks-service is built, and then we can run it:

$ java -agentpath:../agent/lightrun_agent.so -jar target/tasks-service-0.0.1-SNAPSHOT.jar
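If we want a quick sanity check that the JVM really was started with an agent option, one possibility is to inspect the JVM's input arguments through the standard RuntimeMXBean. The AgentCheck class below is a hypothetical helper, not part of Lightrun or of the example application, and it only confirms that an agent flag was passed, not that the agent connected successfully.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

class AgentCheck {
    // Returns true if any JVM argument loads a native (-agentpath) or Java (-javaagent) agent.
    static boolean hasAgentOption(List<String> jvmArgs) {
        return jvmArgs.stream()
            .anyMatch(arg -> arg.startsWith("-agentpath:") || arg.startsWith("-javaagent:"));
    }

    public static void main(String[] args) {
        // The arguments passed to the JVM itself, excluding the arguments to main().
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Agent option present: " + hasAgentOption(jvmArgs));
    }
}
```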

At this point, the Onboarding screen in our web browser will allow us to progress, and the UI in our development environment will update automatically to show our application running:

[Screenshot: Lightrun connected]

Note that these are all connected to our Lightrun account, so we can see them regardless of where the applications are running. This means we can use the exact same tooling on our applications running on our local machine, inside Docker containers, or any other environment that supports our runtime, regardless of where it is in the world.

5. Capturing Snapshots

One of the most powerful features of Lightrun is the ability to add snapshots to currently running applications. These allow us to capture the exact state of execution at a given point in our application, which can give invaluable insight into exactly what is happening within our code. They can be thought of as “virtual breakpoints”, except that they don’t interrupt the flow of the program. Instead, they capture all of the information that we would be able to see at a breakpoint so that we can look at it later.
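To get a feel for what such a “virtual breakpoint” records, the following is a rough hand-written approximation in plain Java. It is purely illustrative and says nothing about how Lightrun implements snapshots internally; ManualSnapshot, SnapshotDemo, and the search() method are hypothetical names. It captures a timestamp, the call stack, and a set of named values without pausing the thread.

```java
import java.time.Instant;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SnapshotDemo {
    // Stand-in for an instrumented method: records its own arguments and carries on.
    static ManualSnapshot search(String createdBy, String status) {
        Map<String, Object> vars = new HashMap<>(); // HashMap so that null values are allowed
        vars.put("createdBy", createdBy);
        vars.put("status", status);
        return ManualSnapshot.capture(vars);
    }

    public static void main(String[] args) {
        ManualSnapshot snapshot = search(null, "PENDING");
        System.out.println(snapshot.capturedAt());
        System.out.println(snapshot.callStack());
        System.out.println(snapshot.variables());
    }
}

// Hypothetical record holding what a hand-written "snapshot" might capture.
record ManualSnapshot(Instant capturedAt, List<String> callStack, Map<String, Object> variables) {

    // Captures the current time, the caller's stack frames, and a copy of the given variables.
    static ManualSnapshot capture(Map<String, Object> variables) {
        List<String> frames = StackWalker.getInstance()
            .walk(stream -> stream
                .skip(1) // drop the frame for capture() itself
                .map(frame -> frame.getClassName() + "." + frame.getMethodName()
                    + ":" + frame.getLineNumber())
                .toList());
        return new ManualSnapshot(
            Instant.now(), frames, Collections.unmodifiableMap(new HashMap<>(variables)));
    }
}
```

Unlike this sketch, a real snapshot needs no capture call in the source and picks up every visible variable automatically.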

Snapshots – as well as Logs and Metrics – are added from within our development environment. We’ll typically do this by right-clicking on the line that we want to add the instrumentation and then selecting the “Lightrun” option.

Then we can add our instrumentation by selecting it from the subsequent menu:

[Screenshot: Lightrun snapshots menu]

This will then open a panel allowing us to add the snapshot:

[Screenshot: Lightrun create snapshot]

Here we need to select the agent that we want to instrument, and possibly specify other details about exactly how it will work.

When we’re happy with everything, we then hit the Create button. This will then add a new Snapshot entry into our sidebar, and we’ll get a blue camera icon against the line of code.

This then indicates that this line will capture a snapshot when executed:

[Screenshot: Lightrun snapshot entry]

Note that if something goes wrong, the camera will be red instead. Typically, this would mean that the running code doesn’t correspond to the source code, though other reasons might exist and need to be explored here as well.

6. Diagnosing A Bug – Searching Tasks

Our tasks-service, unfortunately, has a bug where performing a filtered search of tasks never returns anything. If we perform an unfiltered search, then this will correctly return all tasks, but as soon as a filter is added – whether it’s createdBy, status, or both – then we suddenly get no results.

For example, if we make a call to http://localhost:8082?status=PENDING then we should get some results, but instead, we always get an empty array.
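One way to reproduce this call from code is with the JDK's built-in HTTP client. This is a minimal sketch: the SearchTasksClient class is hypothetical, and it assumes the tasks-service is running locally on port 8082 as in the URL above.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class SearchTasksClient {
    // Builds a GET request for the search endpoint, filtered by status.
    static HttpRequest searchByStatus(String status) {
        String query = "status=" + URLEncoder.encode(status, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create("http://localhost:8082/?" + query))
            .GET()
            .build();
    }

    public static void main(String[] args) throws Exception {
        // Requires the tasks-service to be running locally.
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(searchByStatus("PENDING"), HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```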

Our application is architected such that we have a TasksController to handle the incoming HTTP request. This then calls the TasksService to do the real work, and this works in terms of a TasksRepository.

This repository is a Spring Data interface, meaning that we have no code in there directly that we can instrument. Instead, we’ll add a snapshot in the TasksService. In particular, we’ll add it on the very first line of the search() method. This will let us see the initial conditions that exist when the method is called, regardless of which code path we end up going through inside the method:

[Screenshot: Lightrun add snapshot]

Having done this, we’ll then call our endpoint. Again, we’ll get the same result of an empty array.

However, this time we’ll capture a snapshot in our development environment – which we can see on the Snapshots tab:

[Screenshot: Lightrun Snapshots tab]

This shows us the stack trace to where our snapshot was captured and the state of all visible variables at the time it was captured. Let’s focus on the variables here. Two of these are the parameters that were passed to the method, and the third is this. The parameters are the ones that are potentially most interesting, so we’ll look at those.

Immediately, we can see the problem. We’ve been given the value “PENDING” – which is the status that we’re searching for – in the createdBy parameter!

Looking closer at the code, we see that we’ve unfortunately transposed the parameters between TasksController and TasksService. This is an easy fix, and if we were to make it – either by swapping the parameters in TasksService or the values passed in from TasksController – then suddenly, our search will start working properly.
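The shape of this bug is easy to reproduce in isolation. The sketch below is a hypothetical, in-memory reconstruction and not the actual TasksController or TasksService code: because both parameters are Strings, the compiler cannot notice that the caller passes them in the wrong order.

```java
import java.util.List;

class TransposedSearchDemo {
    // Hypothetical, simplified task model used only for this illustration.
    record Task(String title, String createdBy, String status) {}

    static final List<Task> TASKS = List.of(
        new Task("Write report", "alice", "PENDING"),
        new Task("Review report", "bob", "DONE"));

    // Service-style method: createdBy first, then status. A null filter matches everything.
    static List<Task> search(String createdBy, String status) {
        return TASKS.stream()
            .filter(task -> createdBy == null || createdBy.equals(task.createdBy()))
            .filter(task -> status == null || status.equals(task.status()))
            .toList();
    }

    // Buggy controller-style caller: the two arguments are transposed.
    static List<Task> handleRequestBuggy(String createdBy, String status) {
        return search(status, createdBy);
    }

    // Fixed caller: the arguments are passed in the declared order.
    static List<Task> handleRequestFixed(String createdBy, String status) {
        return search(createdBy, status);
    }

    public static void main(String[] args) {
        System.out.println("buggy:  " + handleRequestBuggy(null, "PENDING"));
        System.out.println("fixed:  " + handleRequestFixed(null, "PENDING"));
        System.out.println("no filter (buggy): " + handleRequestBuggy(null, null));
    }
}
```

This also shows why the unfiltered search still works: with both filters null, swapping them makes no difference.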

7. Summary

Here we’ve seen a quick introduction to the Lightrun observability platform, how to get started with it, and some of the benefits it can give us. We’ll be exploring these in more depth in upcoming articles.

Why not use it in your next application to gain more confidence and insight into how it operates?

The examples can be found over on GitHub.
