Choosing the Subarray That Adds Up to a Target Number

1. Overview

The problem of choosing a subarray that fulfills some conditions can have many versions. In this tutorial, we’ll discuss finding the subarray with the closest sum to a target number without exceeding it.

We’ll discuss multiple solutions and provide a comparison between them.

2. Defining the Problem

The problem gives us an array of size $n$ and a target number $k$ . We’re asked to find the subarray whose sum is as close to $k$ as possible, without exceeding it.

Let’s take the array $A$ as an example:

Suppose the target number is $k=12$ . In this case, we can see that some subarrays, such as $[0..3]$ and $[5..7]$ , have a sum equal to 10. However, the best choice is to choose the subarray $[2..5]$ because it has a sum of 11. Hence, its sum is just smaller by one from $k$ . Also, note that this is the closest we can get to $k$ without exceeding it.

We’ll discuss three approaches to solve this problem.

3. Naive Approach

The naive approach checks every possible subarray of the array $A$ . Take a look at the implementation of the naive approach:

algorithm NaiveApproach(A, n, k):
    // INPUT
    //    A = The array of elements indexed from 0 to n - 1
    //    n = The size of the array A
    //    k = The target sum
    // OUTPUT
    //    The maximum sum of a subarray of A that does not exceed k

    answer <- 0
    for L <- 0 to n - 1:
        sum <- 0
        R <- L
        while R < n and sum + A[R] <= k:
            sum <- sum + A[R]
            R <- R + 1
        answer <- max(answer, sum)
    return answer

First, we initialize the best answer with zero. Second, we iterate over all possible beginning $L$ of the subarray to take.

From each beginning, we check to see how far this subarray can be extended. Therefore, we keep moving $R$ forward as long as the sum doesn’t exceed $k$ . When the current subarray can’t be extended any farther, we compare its sum with the stored answer and keep the maximum between them.

Finally, we return the best answer we managed to find.

It’s important to note that, for each beginning $L$ , we’re making the right side of the range $R$ start from $L$ . Then, we keep moving $R$ as far as possible.

Hence, the complexity of the naive approach is $O(n^2)$ , where $n$ is the number of elements inside the array $A$ .

4. Two-Pointers Approach

The two-pointers approach is an update over the naive approach. Consider the pseudocode implementation of the two-pointers approach:

algorithm TwoPointersApproach(A, n, k):
    // INPUT
    //    A = The array of elements
    //    n = The size of the array A
    //    k = The target sum
    // OUTPUT
    //    The maximum sum of a subarray of A that does not exceed k

    answer <- 0
    sum <- 0
    R <- 0
    for L <- 0 to n - 1: 
        if L > 0:
            sum <- sum - A[L - 1]
        while R < n and sum + A[R] <= k:
            sum <- sum + A[R]
            R <- R + 1
        answer <- max(answer, sum)
    return answer

As we can see, the implementation is similar to the naive approach. However, the main difference is that we don’t set the value of $R$ to $L$ each time.

Instead, we let $R$ keep its old value, as well as the sum.

Let’s explain this idea.

Suppose we’re checking a subarray that begins at $L$ . Also, let’s assume we managed to find the best ending at $R$ . When we move to the next subarray, the one that starts at $L+1$ , the difference is that the number at index $L$ is out of the range. Therefore, we subtract it from the sum.

However, now that the number at index $L$ is out of the subarray, we may be able to get new elements to the subarray. Hence, we check to see how far can we push $R$ forward, without having the sum exceed $k$ .

Although the complexity may sound similar to the naive approach, the complexity of the two-pointers approach is $O(n)$ , where $n$ is the number of elements inside the array $A$ . The reason is that $R$ never goes back. As a result, the inner $while$ loop gets executed at most $n$ times in total.

5. Prefixes Approach

The problem with the first two approaches is that we assumed there’d be no negative values. Hence, whenever the sum of a subarray exceeds $k$ , we always stop and check the answer.

On the other hand, the prefixes approach handles the problem even if the array contains negative values.

Let’s have a quick background of prefix sum; then we can jump into the third approach to solve this problem.

5.1. Prefix Sum Background

Suppose the problem was to answer some queries that ask for the sum of a certain subarray. In this case, we can calculate the prefix array in advance. Each index inside the prefix array $prefix[i]$ holds the sum of the prefix that ends at the index $i$ .

In other words, $prefix[i]$ holds the sum of the range $[0..i]$ . For simplicity, we’ll shift the $prefix$ array by one. Hence, $prefix[i+1]$ will hold the sum for the subarray $[0..i]$ . Also, $prefix[0]$ will equal 0 indicating the empty prefix.

To calculate the sum of a certain range $[L, R]$ inside the array, we can split this range into two ranges.

The first range is $[0, R]$ , which contains the needed rage with some extra elements at the beginning.

The second range is $[0..L-1]$ , which contains all the extra elements that the first range had.

As a result, the sum of the subarray $[L..R]$ can be obtained by subtracting the range $[0..L-1]$ from the range $[0..R]$ . Therefore, we get the following equation:

$sum[L..R] \leftarrow prefix[R] - prefix[L-1]$

Now that we know the concept of prefix sum, we can use it to enhance our approach.

5.2. General Idea

In this approach, we’ll iterate over all possible endings $R$ of the subarray. For each ending, we need to choose the best beginning that makes the sum add up as close to $k$ as possible.

Since the array may contain negative values, we can’t use the concept of the two-pointers approach. The reason is that the sum can exceed $k$ and then, by adding some negative values, can drop below $k$ again.

Therefore, we’ll use the prefix equation we presented above:

$sum[L..R] \leftarrow prefix[R] - prefix[L-1]$

We need the value $sum[L..R]$ to be as close to $k$ as possible. Also, the value of $prefix[R]$ is known because we’re iterating over all possible endings. Hence, we need to find the value of $prefix[L-1]$ that fits the equation:

$prefix[L-1] \leftarrow prefix[R] - k$

This value of $prefix[L-1]$ is the best that makes the sum equal to $k$ . In case we didn’t find such a value, we need to find the value that is as close to this one as possible. Therefore, we need to find the largest value of $prefix[L-1]$ that doesn’t exceed $prefix[R] - k$ .

5.3. Implementation

Let’s take a look at the implementation of the described approach:

algorithm PrefixesApproach(A, n, k):
    // INPUT
    //    A = The array of elements indexed from 0 to n - 1
    //    n = The size of the array A
    //    k = The target sum
    // OUTPUT
    //    The maximum sum of a subarray of A that doesn't exceed k
    
    prefix[0] <- 0
    for i <- 0 to n - 1:
        prefix[i + 1] <- prefix[i] + A[i]
    
    answer <- 0
    prefixes <- an empty set
    for i <- 0 to n:
        prefixL <- binarySearch for prefix[i] - k in prefixes
        answer <- max(answer, prefix[i] - prefixL)
        insert prefix[i] into prefixes
    return answer

To perform the query of finding the largest value that doesn’t exceed a specific number, we’ll use a tree-set. In the beginning, we’ll initialize the set to be empty.

Next, we’ll iterate over all possible endings of the range. For each ending $R$ , we perform a binary search operation to find the closest value to $prefix[i] - k$ . We’ll assume that the function returns a zero in case such value doesn’t exist.

After getting this value, we store the maximum answer between the old answer and the newly calculated one. After that, we insert the value of the current prefix to the set to search for it later.

Finally, we return the best answer we found.

The complexity of the prefixes approach is $O(n \cdot log(n))$ , where $n$ is the number of elements inside the array.

6. Comparison

Take a look at a comparison between the described approaches:

	Naive	Two-Pointers	Prefixes
Main Idea	Check all subarrays.	Don’t start each subarray from the beginning. Continue from the last subarray.	Get the closest prefix that differs by k from the current
Requirements	Only works with non-negative numbers	Only works with non-negative numbers.	May contain negative numbers.
Complexity	$O(n^2)$	$O(n)$	$O(n \cdot log(n))$

As we can see, the naive approach is not the best choice to go with. However, based on the types of numbers inside the array, we have two options. If all the numbers are non-negative, it’s always better to use the two-pointers approach since it has the lowest complexity.

On the other hand, if the array contains negative numbers, we should go with the prefixes approach since it’s the only approach that can handle negative values inside the array.

7. Conclusion

In this tutorial, we presented the problem of finding the subarray that adds up to a target number. We explained multiple approaches and showed the advantages of each approach.

Finally, we presented a comparison between the described approaches and illustrated when to use each one.

Learn Java Collections

Learn Spring

Learn Maven

View All Courses

Core Concepts

Operating Systems

Neural Networks

Graph Theory

Latex

Full Archive

About Baeldung