1. Introduction

File searching by both metadata and data (payload) is critical in most environments. For example, administrators may need to locate a configuration file, while developers may search for specific pieces of code.

In this tutorial, we’ll look at two ways to search for a class file within multiple JAR files. First, we explore both the class and JAR formats in general. Next, we compile a simple dataset. After that, we show a crude approach to searching for class files in all JAR files under a given directory. Finally, we augment and refine our method to reach a better solution.

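We tested the code in this tutorial on Debian 11 (Bullseye) with GNU Bash 5.1.4. It should work in most POSIX-compliant environments, provided the GNU versions of find, xargs, and grep are available, since we rely on their long options.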

2. Class and JAR Files

The Java programming language employs many file formats. Let’s briefly look at the class and JAR files of its Java Virtual Machine (JVM).

2.1. JVM Class File

A JVM class file is a stream of 8-bit bytes that contains the bytecode definition of a single class or interface.

As such, it's a binary file rather than pure ASCII, although it can contain ASCII strings such as class and method names. In fact, there are dedicated tools, such as javap, to see the bytecode inside a class. The extension of a class file is .class.
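As a quick illustration of the binary nature of the format, every class file begins with the four magic bytes 0xCAFEBABE. The following is only a minimal sketch under a few assumptions: is_class_file is a hypothetical helper name, and we assume the od and tr tools are available:

```shell
# is_class_file (hypothetical helper): succeed only if the given file
# starts with the class file magic number 0xCAFEBABE
is_class_file() {
  [ -f "$1" ] && [ -r "$1" ] || return 1
  local magic
  # Dump the first 4 bytes as hex and strip the whitespace that od adds
  magic=$(od -An -tx1 -N4 < "$1" | tr -d ' \n')
  [ "$magic" = "cafebabe" ]
}
```

For instance, is_class_file Example.class && echo "looks like a class file" would print the message only for a file that carries the magic number.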

2.2. Java JAR File

Java ARchive (JAR) files are compressed archives based on the ubiquitous ZIP format. Usually, they organize and store Java bytecode, data, and metadata.

As such, these files are binary in nature with perhaps only partial ASCII content. The extension of JAR files is .jar.
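Because a JAR is a ZIP archive, a non-empty one normally starts with the ZIP local file header signature, i.e., the letters PK followed by the bytes 0x03 and 0x04. The sketch below uses that as a cheap pre-check. It's an assumption-laden shortcut rather than real validation: has_zip_signature is a hypothetical helper name, and empty archives or archives with prepended data begin differently:

```shell
# has_zip_signature (hypothetical helper): succeed only if the file
# starts with the ZIP local file header signature "PK\003\004"
has_zip_signature() {
  [ -f "$1" ] && [ -r "$1" ] || return 1
  local sig
  # First 4 bytes as hex, without the whitespace that od adds
  sig=$(od -An -tx1 -N4 < "$1" | tr -d ' \n')
  [ "$sig" = "504b0304" ]
}
```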

Commonly, to work with JAR files, we use the jar tool, although zip can manipulate them as well.

Basically, JAR files contain class files, sources, data, and metadata that groups them for easier package deployment.

3. JAR File Dataset

After looking at the format and extension of class and JAR files, let’s explore a way to search for and within them.

First, we have to identify the path that contains all our JAR files. Naturally, they don't need to be directly in that directory, just somewhere beneath it.

Let’s consider a tree structure:

$ tree /jars
├── apache-log4j-2.19.0-bin
│   ├── LICENSE.txt
│   ├── log4j-1.2-api-2.19.0.jar
│   ├── log4j-api-2.19.0.jar
│   ├── log4j-core-2.19.0.jar
│   ├── log4j-docker-2.19.0.jar
│   ├── log4j-kubernetes-2.19.0.jar
│   ├── log4j-to-slf4j-2.19.0.jar
│   ├── NOTICE.txt
│   └── RELEASE-NOTES.txt
├── commons-io-2.11.0.jar
├── google-guava
│   ├── dep
│   │   └── failureaccess-1.0.1.jar
│   └── guava-31.1-jre.jar
├── invalid.jar
└── release.nfo

3 directories, 14 files

Notably, we use the above structure as our sample dataset.

4. Simple Crude Approach

Initially, we can simply combine the standard find, xargs, and grep tools, here with their GNU options:

$ find /jars -type f -iname '*.jar' | xargs grep --text --files-with-matches '.class'

First, the find command searches in the /jars directory for objects of the [f]ile -type that match the case-insensitive pattern after -iname, which in turn specifies their extension as .jar with any (*) filename before it.

After getting the list of files, we pipe it to xargs. The latter runs a --text grep search for .class in the files, printing the full path of each file that contains a match. Since an unescaped dot matches any character in a regular expression, we can write \.class to be strict.

Of course, including a name before .class searches for a particular class file instead of any class file.
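When looking for one particular class, it helps to know that a JAR stores each class under a path that mirrors its package. As a minimal sketch, the hypothetical class_to_entry helper below converts a fully qualified class name into the entry path we'd search for:

```shell
# class_to_entry (hypothetical helper): turn a fully qualified class
# name into the path of its entry inside a JAR
class_to_entry() {
  # Packages map to directories, so every dot becomes a slash
  printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

class_to_entry 'org.apache.commons.io.FileUtils'
# prints: org/apache/commons/io/FileUtils.class
```

The result can then serve as the search string, ideally together with the --fixed-strings option of grep, so that the dots aren't treated as wildcards.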

4.1. Enhance grep

To be more flexible, we can leverage grep regular expressions along with a case-insensitive match:

$ find /jars -type f -iname '*.jar' | xargs grep --text --ignore-case --files-with-matches 'Xml.*\.class'

Now, we only find the JAR files that contain a class matching the given name pattern.
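To see why escaping the dot matters, we can try a pattern against a few entry names without involving any archive. The names below are made up purely for illustration:

```shell
# Sample entry names, invented for this illustration
entries='org/demo/XmlParser.class
org/demo/xml/Reader.class
docs/xml-superclass.txt
org/demo/Plain.class'

# Unescaped dot: "superclass" also matches, since . stands for any character
printf '%s\n' "$entries" | grep --ignore-case --count 'Xml.*.class'
# prints: 3

# Escaped dot, anchored at the end: only the real .class entries remain
printf '%s\n' "$entries" | grep --ignore-case --count 'Xml.*\.class$'
# prints: 2
```

Notably, the $ anchor is only useful on a clean list of entry names like this one. Within the raw bytes of an archive, a name is rarely followed by a line end.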

4.2. Enhance find

Of course, we can use other operands or even find with regular expressions to refine our search:

$ find /jars -size +2M -type f -iname '*.jar' | xargs grep --text --ignore-case --files-with-matches 'Xml.*\.class'

Here, we're only left with one of the previous matches since this JAR is the only one over 2MiB (+2M) in size.
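As for regular expressions in find itself, here is a minimal sketch that assumes GNU find, whose -iregex test matches case-insensitively against the whole path. The find_log4j_jars name and the log4j- prefix are only examples:

```shell
# find_log4j_jars (hypothetical helper): list JAR files whose name
# starts with "log4j-", regardless of letter case
find_log4j_jars() {
  # -iregex has to match the entire path, hence the leading .*/
  # [^/]* keeps the rest of the match within the file name itself
  find "$1" -type f -iregex '.*/log4j-[^/]*\.jar'
}
```

Calling find_log4j_jars /jars would then limit the candidates before any grep runs.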

4.3. Problems

While it works, there are many pitfalls to the approach above:

  • piping find output to xargs without precautions can cause issues when file paths contain special characters such as spaces, quotes, or newlines
  • grep isn't optimal for binary files, as it ignores structure and formats and only finds the names here because ZIP stores entry names uncompressed
  • we ignore the possibility of invalid JAR files
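To illustrate the first pitfall, the following sketch counts how many arguments xargs produces for a single file with a space in its name. The helper names and the scratch directory are assumptions made for the demonstration only:

```shell
# Hypothetical helpers: count the arguments that xargs derives from
# the find output, one printf call per argument
count_args_newline() {
  find "$1" -type f -iname '*.jar' |
    xargs --max-args=1 printf '%s\n' | wc -l
}

count_args_nul() {
  find "$1" -type f -iname '*.jar' -print0 |
    xargs --null --max-args=1 printf '%s\n' | wc -l
}

demo_dir=$(mktemp -d)
: > "$demo_dir/my lib.jar"   # one file, with a space in its name

echo "newline-separated arguments: $(count_args_newline "$demo_dir")"
echo "NUL-separated arguments: $(count_args_nul "$demo_dir")"
rm -rf "$demo_dir"
```

With the default separator, the single path is split at the space into two arguments, so grep would look for two files that don't exist. With NUL separators, it stays in one piece.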

After understanding some of its drawbacks, let’s build on top of this simple but crude solution.

5. Comprehensive Approach

Since file formats exist to introduce consistency, following their rules is important. Still, not all valid Java JAR files have the .jar extension, and not all .jar files are valid Java JAR files.

5.1. Validate JAR File

In addition to checking its extension, we can also check the validity of the JAR file. Let’s verify two of our files via the jar tool:

$ jar --list --file /jars/commons-io-2.11.0.jar
$ echo $?
$ jar --list --file /jars/invalid.jar
java.util.zip.ZipException: zip END header not found
$ echo $?

Most jar tool operations return an exit status code, which we can check via the special $? variable. The --list flag shows the table of contents, including .class files, returning 0 for success only for valid JAR archives.
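Since the exit status is all we need, the check can be wrapped in a small function. This is just a sketch: is_valid_jar is a hypothetical name, and we assume a jar tool that understands the same --list and --file options used above:

```shell
# is_valid_jar (hypothetical helper): succeed only if jar can list the
# archive; all regular and error output is discarded
is_valid_jar() {
  jar --list --file "$1" >/dev/null 2>&1
}

# Usage: branch on the exit status instead of inspecting $? by hand
if is_valid_jar /jars/invalid.jar; then
  echo "valid"
else
  echo "invalid"
fi
```

Notably, the function also reports failure when the jar tool itself is missing, so it's worth confirming that jar is installed first.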

Let’s leverage this fact.

5.2. Enumerate JAR Files Safely and With Validity

To validate our archives, we use the -c switch of bash, allowing a single-quoted script to run for each path supplied through xargs.

Also, we employ the -print0 action of find in combination with its counterpart, the --null option of xargs. By combining them, we use the NUL character instead of a newline as the record separator for the whole chain. In this way, xargs splits the paths correctly even when they contain whitespace or newlines.

Finally, --replace ensures we run our script once per argument and that each occurrence of {} after the xargs command gets replaced with that argument, i.e., a JAR file path.

As a result, we see all JAR file paths, with [X] prefixing the invalid ones and [V] prefixing the valid ones:

$ find /jars -type f -iname '*.jar' -print0 | xargs --replace --null bash -c 'jar --list --file {} >/dev/null 2>&1 && echo "[V] {}" || echo "[X] {}"'
[V] /jars/apache-log4j-2.19.0-bin/log4j-api-2.19.0.jar
[V] /jars/apache-log4j-2.19.0-bin/log4j-docker-2.19.0.jar
[V] /jars/apache-log4j-2.19.0-bin/log4j-kubernetes-2.19.0.jar
[V] /jars/apache-log4j-2.19.0-bin/log4j-core-2.19.0.jar
[V] /jars/apache-log4j-2.19.0-bin/log4j-to-slf4j-2.19.0.jar
[V] /jars/apache-log4j-2.19.0-bin/log4j-1.2-api-2.19.0.jar
[X] /jars/invalid.jar
[V] /jars/google-guava/dep/failureaccess-1.0.1.jar
[V] /jars/google-guava/guava-31.1-jre.jar
[V] /jars/commons-io-2.11.0.jar

To see only the output of interest, we use a conditional expression shorthand with && and || while suppressing all output from the jar command.
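One caveat is that {} gets pasted into the script text itself, so a path that contains quotes or other shell syntax could still be misinterpreted by the inner shell. A possible variation, sketched here rather than taken from the commands above, passes each path as a positional parameter instead. The check_jars name is hypothetical, and the options assume GNU xargs:

```shell
# check_jars (hypothetical helper): mark each JAR under a directory as
# valid [V] or invalid [X], handing paths to the script as "$1"
check_jars() {
  find "$1" -type f -iname '*.jar' -print0 |
    xargs --null --no-run-if-empty --max-args=1 bash -c '
      # $1 holds the JAR path; it is plain data, never parsed as code
      if jar --list --file "$1" >/dev/null 2>&1; then
        printf "[V] %s\n" "$1"
      else
        printf "[X] %s\n" "$1"
      fi' _
}
```

Here, the _ after the script only fills $0, so each path that xargs appends becomes $1. Running check_jars /jars should then produce the same kind of listing as before.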

5.3. Search for Class Files in Each JAR File

Now, we convert the shorthand condition to a regular conditional expression and enhance the output by filtering the file list returned by the jar tool with grep:

$ find /jars -type f -iname '*.jar' -print0 | \
xargs --replace --null bash -c \
'if jar --list --file {} >/dev/null 2>&1; then
echo "[V] Searching in {}:"; jar --list --file {} | grep --ignore-case ".class";
else echo "[X] Bad file {}."; fi; echo "";'
[V] Searching in /jars/apache-log4j-2.19.0-bin/log4j-api-2.19.0.jar:
[X] Bad file /jars/invalid.jar.
[V] Searching in /jars/commons-io-2.11.0.jar:

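As before, we can change “.class” and the whole grep command according to our needs. Since the unescaped dot matches any character, a stricter pattern such as “\.class$” only matches entries that actually end with the extension.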
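If we'd rather see which archive each match comes from on the same line, one option is to prefix every matching entry with its JAR path. The following is a sketch under assumptions, not a definitive implementation: label_matches and find_class are hypothetical names, and the loop relies on Bash for read -d:

```shell
# label_matches (hypothetical helper): read entry names on stdin and
# print those matching the pattern in $2, prefixed with the path in $1
label_matches() {
  grep --ignore-case -e "$2" | while IFS= read -r entry; do
    printf '%s: %s\n' "$1" "$entry"
  done
}

# find_class (hypothetical helper): search all valid JAR files under
# the directory $1 for entries matching the pattern $2
find_class() {
  find "$1" -type f -iname '*.jar' -print0 |
    while IFS= read -r -d '' jar_path; do
      # List each archive once; skip it when jar reports a failure
      if entries=$(jar --list --file "$jar_path" 2>/dev/null); then
        printf '%s\n' "$entries" | label_matches "$jar_path" "$2"
      fi
    done
}
```

For example, find_class /jars 'Xml.*\.class$' would print one line per matching entry, each starting with the path of its JAR. Listing every archive only once also halves the number of jar invocations compared to validating and listing separately.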

6. Summary

In this article, we looked at one simple and one complex but precise way to search for class files within JAR files under a given directory.

In conclusion, while we can always use online resources to do so, local searches for a .class file can be beneficial in many cases.
