Calculate the Cosine Similarity of Two Vectors in Java
Last updated: February 14, 2026
1. Overview
In this tutorial, we’re going to learn how to calculate the cosine similarity of two vectors in Java. We’ll begin by implementing the core math natively in Java using a traditional loop approach and then a more modern Stream approach. Finally, we’ll see how easy the task becomes using the ND4J library.
Cosine similarity is a key metric in data science and information retrieval. It measures the cosine of the angle between two non-zero vectors, which effectively determines how similar they are. When the angle between two vectors is 0 degrees, the similarity is 1 (identical direction); when the angle is 90 degrees, the similarity is 0 (no relation).
2. Native Java Implementation
The formula for cosine similarity relies on the dot product of the vectors (the numerator) and the product of their magnitudes (the denominator):
C = (A⋅B) / (∥A∥⋅∥B∥)
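To make the formula concrete, let's plug in the two vectors we'll use throughout this tutorial, [3, 4] and [5, 12]:
A⋅B = 3*5 + 4*12 = 63
∥A∥ = √(3² + 4²) = 5, ∥B∥ = √(5² + 12²) = 13
C = 63 / (5 * 13) ≈ 0.969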
To keep our code focused, we’ll use a single utility method that computes all three parts. We’ll handle the vector lengths and zero-magnitude checks inside this method:
static double calculateCosineSimilarity(double[] vectorA, double[] vectorB) {
    if (vectorA == null || vectorB == null || vectorA.length != vectorB.length || vectorA.length == 0) {
        throw new IllegalArgumentException("Vectors must be non-null, non-empty, and of the same length.");
    }
    double dotProduct = 0.0;
    double magnitudeA = 0.0;
    double magnitudeB = 0.0;
    for (int i = 0; i < vectorA.length; i++) {
        dotProduct += vectorA[i] * vectorB[i];
        magnitudeA += vectorA[i] * vectorA[i];
        magnitudeB += vectorB[i] * vectorB[i];
    }
    double finalMagnitudeA = Math.sqrt(magnitudeA);
    double finalMagnitudeB = Math.sqrt(magnitudeB);
    if (finalMagnitudeA == 0.0 || finalMagnitudeB == 0.0) {
        return 0.0;
    }
    return dotProduct / (finalMagnitudeA * finalMagnitudeB);
}
Our test cases will use the same two vectors, [3, 4] and [5, 12], which, as we computed above, should yield a similarity of approximately 0.969:
static final double[] VECTOR_A = {3, 4};
static final double[] VECTOR_B = {5, 12};
static final double EXPECTED_SIMILARITY = 0.9692307692307692;
Let’s verify the native loop implementation works with the expected high similarity score:
@Test
void givenTwoHighlySimilarVectors_whenCalculatedNatively_thenReturnsHighSimilarityScore() {
    double actualSimilarity = calculateCosineSimilarity(VECTOR_A, VECTOR_B);
    assertEquals(EXPECTED_SIMILARITY, actualSimilarity, 1e-15);
}
We’re using a tolerance of 1e-15 in our assertion because floating-point math can introduce small precision errors.
3. Native Implementation With Java Streams
For a more functional approach, we can rewrite the calculation using Java 8 Stream operations. We’ll use IntStream to iterate over the indices and perform the same mathematical logic, just in a more declarative style:
public static double calculateCosineSimilarityWithStreams(double[] vectorA, double[] vectorB) {
    if (vectorA == null || vectorB == null || vectorA.length != vectorB.length || vectorA.length == 0) {
        throw new IllegalArgumentException("Vectors must be non-null, non-empty, and of the same length.");
    }
    double dotProduct = IntStream.range(0, vectorA.length)
      .mapToDouble(i -> vectorA[i] * vectorB[i])
      .sum();
    double magnitudeA = Arrays.stream(vectorA).map(v -> v * v).sum();
    double magnitudeB = Arrays.stream(vectorB).map(v -> v * v).sum();
    double finalMagnitudeA = Math.sqrt(magnitudeA);
    double finalMagnitudeB = Math.sqrt(magnitudeB);
    if (finalMagnitudeA == 0.0 || finalMagnitudeB == 0.0) {
        return 0.0;
    }
    return dotProduct / (finalMagnitudeA * finalMagnitudeB);
}
This approach is slightly less performant than the traditional loop, but it's often preferred for its conciseness. The sum() calls compute the dot product and the squared magnitudes as built-in reductions over the streams.
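If we wanted to make that reduction explicit, we could write the dot product with reduce() instead of sum(). The following is just an illustrative sketch, where dotProductWithReduce is a hypothetical helper rather than part of our utility class:

static double dotProductWithReduce(double[] vectorA, double[] vectorB) {
    // start from an identity of 0.0 and accumulate the pairwise products
    return IntStream.range(0, vectorA.length)
      .mapToDouble(i -> vectorA[i] * vectorB[i])
      .reduce(0.0, Double::sum);
}

Both forms produce the same result; sum() is simply the more readable shorthand.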
Let’s verify that the Stream-based calculation delivers the same expected result:
@Test
void givenTwoHighlySimilarVectors_whenCalculatedNativelyWithStreams_thenReturnsHighSimilarityScore() {
    double actualSimilarity = calculateCosineSimilarityWithStreams(VECTOR_A, VECTOR_B);
    assertEquals(EXPECTED_SIMILARITY, actualSimilarity, 1e-15);
}
Using Streams for these aggregate calculations keeps our code clean and declarative, making it easier to read and maintain.
4. Using ND4J for High-Performance Calculation
While native implementations are fine for small, single-threaded operations, if we're working with large datasets, doing deep learning, or need GPU acceleration, we should use a dedicated numerical library like ND4J (N-Dimensional Arrays for Java). ND4J offers superior performance and is the backbone of the Deeplearning4j ecosystem.
We'll need to include the nd4j-native-platform dependency in our pom.xml; it bundles the ND4J API together with a CPU backend so the operations below can actually execute:
<properties>
    <nd4j.version>1.0.0-M2.1</nd4j.version>
</properties>
<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-native-platform</artifactId>
    <version>${nd4j.version}</version>
</dependency>
ND4J uses the INDArray class to represent vectors and matrices. We’ll convert our double arrays into INDArray objects and then use the dedicated CosineSimilarity operation provided by the framework:
@Test
void givenTwoHighlySimilarVectors_whenCalculatedWithNd4j_thenReturnsHighSimilarityScore() {
    INDArray vec1 = Nd4j.create(VECTOR_A);
    INDArray vec2 = Nd4j.create(VECTOR_B);
    CosineSimilarity cosSim = new CosineSimilarity(vec1, vec2);
    double actualSimilarity = Nd4j.getExecutioner().exec(cosSim).getDouble(0);
    assertEquals(EXPECTED_SIMILARITY, actualSimilarity, 1e-15);
}
The use of Nd4j.getExecutioner().exec() is necessary because ND4J offloads the mathematical operation to the underlying execution device, which can be the CPU or a GPU.
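As a side note, when all we need is the scalar similarity, ND4J's Transforms utility class (in org.nd4j.linalg.ops.transforms) exposes a cosineSim() convenience method that wraps the same operation. Here's a minimal sketch, assuming the same VECTOR_A and VECTOR_B constants as above:

// Convenience helper that returns the cosine similarity directly as a double
INDArray vec1 = Nd4j.create(VECTOR_A);
INDArray vec2 = Nd4j.create(VECTOR_B);
double similarity = Transforms.cosineSim(vec1, vec2);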
5. Conclusion
In this article, we’ve covered the practical ways to calculate cosine similarity in Java. We saw that we can implement the core logic ourselves using either a traditional loop or the more modern Java Stream API.
Ultimately, for production code dealing with large data, the best choice is a highly optimized library like ND4J, which provides superior performance and GPU capabilities.
The complete code for this article is available over on GitHub.