1. Overview

Extracting the base filename from a file path or a URL is a common task in Linux shell programs.

In this quick tutorial, we’ll explore two methods to find the base filename of a URL in Linux.

2. Parameter Expansion

In Linux shell programming, a parameter is an entity that stores values. It may be referenced by name, number, or one of several special characters.

Meanwhile, a variable is a parameter that's referenced by name. For example, we can use a variable to store the name of a file:

$ FILENAME="filename.txt"

Parameter expansion is the substitution of a reference with its value. To expand a variable, for instance, we use the $ prefix:

$ echo ${FILENAME}
filename.txt

The braces around the variable name are optional in this simple case. However, they allow us to take advantage of other operators to solve more complex problems.
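Braces also matter when text follows the variable name directly. The sketch below illustrates this with a second, hypothetical variable named FILENAME_backup (not part of the original example): without braces, the shell reads the longer name.

```shell
FILENAME="report"
FILENAME_backup="other"   # hypothetical second variable, for illustration only

# Without braces, the shell expands the variable named FILENAME_backup.
without_braces="$FILENAME_backup"

# With braces, the name ends at the closing brace, and "_backup" is literal text.
with_braces="${FILENAME}_backup"

echo "$without_braces"   # other
echo "$with_braces"      # report_backup
```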

For instance, the # operator removes a substring matching a given pattern from the front of a variable value:

$ VAR1="apple pie"
$ echo ${VAR1#*p}
ple pie

A single # operator removes the shortest prefix matching the given pattern, which is *p in our case. Here, the asterisk is a wildcard that matches zero or more characters before the p. So, the # operator removes the shortest prefix ending in the letter p, which is ap.

By contrast, a double ## operator removes the longest prefix matching the given pattern, or the longest substring ending in the letter p in this case:

$ VAR1="apple pie"
$ echo ${VAR1##*p}
ie

Likewise, the % and %% operators remove suffixes matching a given pattern. For instance, we can use the %% operator to strip the longest suffix that begins with a p from our example:

$ VAR1="apple pie"
$ echo ${VAR1%%p*}
a
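The single % operator isn't shown above, so here is a minimal sketch contrasting it with %%. The ARCHIVE variable and its value are assumptions chosen for illustration:

```shell
ARCHIVE="backup.tar.gz"   # hypothetical filename with two dot-separated extensions

# % removes the shortest suffix matching ".*", which is ".gz".
shortest=${ARCHIVE%.*}

# %% removes the longest suffix matching ".*", which is ".tar.gz".
longest=${ARCHIVE%%.*}

echo "$shortest"   # backup.tar
echo "$longest"    # backup
```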

These operators can be useful when it comes to URL data extraction.

3. Parameter Expansion With URLs

We can now use parameter expansion operators to find the base filename from a given URL.

First, it’s helpful to understand that a URL is a type of URI. URLs are composed of several parts:

scheme:[//authority][/path][?query][#fragment]

For URLs, the scheme is the name of the access protocol. Examples include http and https, among many others.

The authority element often consists of a hostname or IP address (and optional port). The path specifies a resource in the scope of its scheme and authority.

The query and fragment suffixes are optional. If they are present, though, they must be ordered as we see above for a URL to be well-formed.
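To make the structure concrete, the following sketch splits a URL into some of these parts using only the operators from the previous section. It assumes the URL contains a //authority part; the variable names and the sample URL are illustrative, not a general-purpose parser.

```shell
URL="https://example.com:8080/dir/file.html?par1=value#frag"

# Everything before the first "://" is the scheme.
scheme=${URL%%://*}

# Drop the scheme and "://" to get the remainder.
rest=${URL#*://}

# The authority ends at the first "/", "?", or "#".
authority=${rest%%[/?#]*}

# Remove the authority, then cut off the query and fragment to get the path.
after_authority=${rest#"$authority"}
path=${after_authority%%[?#]*}

echo "$scheme"      # https
echo "$authority"   # example.com:8080
echo "$path"        # /dir/file.html
```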

Now, we can use the ## operator with a forward slash (/) pattern to find the base filename from a URL:

$ URL="http://example.com/dir/file.html"
$ echo ${URL##*/}
file.html

Similarly, we can use the %% operator to remove the suffix from a more complex URL that contains a query, a fragment, or both:

$ URL="http://example.com/dir/file.html?par1=value#frag"
$ echo ${URL%%[?#]*}
http://example.com/dir/file.html

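In this example, the bracket expression [?#] matches a single ? or # character, and the trailing asterisk matches everything after it. The %% operator then removes the longest suffix matching that pattern, which starts at the first ? or # in the URL.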
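The choice of %% over % matters here. As a quick sketch with the same URL, the single % operator removes only the shortest matching suffix, so the query survives:

```shell
URL="http://example.com/dir/file.html?par1=value#frag"

# % removes the shortest suffix matching [?#]*, which is just "#frag".
short=${URL%[?#]*}

# %% removes the longest suffix, starting at the first "?" or "#".
long=${URL%%[?#]*}

echo "$short"   # http://example.com/dir/file.html?par1=value
echo "$long"    # http://example.com/dir/file.html
```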

Now, we can combine ## and %% to find the base filename in a URL that also carries a query or a fragment, as long as neither of them contains a slash:

$ URL="http://example.com/dir/file.html?par1=value#frag"
$ fileAndSuffix=${URL##*/}
$ echo ${fileAndSuffix%%[?#]*}
file.html

The fileAndSuffix variable holds everything after the last slash in the URL, that is, the filename followed by the query and fragment. The parameter expansion in the echo command then removes the query and fragment suffixes.

The #, ##, % and %% operators are specified by POSIX, so they work in Bash, Dash, Zsh, and other common shells. The GNU Bash Reference Manual documents the full list of parameter expansion forms.
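One caveat: a query or fragment may itself contain a slash, in which case stripping the prefix first cuts into the query. A possible variation, sketched below, removes the query and fragment first and the prefix second. The function name url_basename and the sample URL are hypothetical.

```shell
# Hypothetical helper: print the base filename of the URL passed as $1.
url_basename() {
    # Step 1: drop the query and fragment, reusing $1 to avoid extra variables.
    set -- "${1%%[?#]*}"
    # Step 2: drop everything up to and including the last slash.
    printf '%s\n' "${1##*/}"
}

URL="http://example.com/dir/file.html?next=/home#frag"

# Prefix-first order ends up inside the query and yields "home".
fileAndSuffix=${URL##*/}
prefix_first=${fileAndSuffix%%[?#]*}

# Suffix-first order yields "file.html".
suffix_first=$(url_basename "$URL")

echo "$prefix_first"
echo "$suffix_first"
```

Note that a URL whose path ends in a slash, such as http://example.com/dir/, produces an empty string with either order.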

4. The basename Command

Another option for finding the base filename in a simple URL is the basename command, which is part of the GNU Coreutils package:

$ URL="http://example.com/dir/file.html"
$ basename "$URL"
file.html

While basename strips the prefix from a URL, it doesn’t remove suffixes.

In other words, it won’t work for our more complex URL:

$ URL="http://example.com/path/to/page.html?par1=value&par2=value#frag1"
$ basename "$URL"
page.html?par1=value&par2=value#frag1

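Most Linux distributions ship GNU Coreutils and therefore offer basename. Because basename is also specified by POSIX, systems that don't use Coreutils, such as BusyBox-based ones, typically provide their own implementation.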
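The two approaches can also be combined. As a minimal sketch, we can strip the query and fragment with %% and pass the result to basename:

```shell
URL="http://example.com/path/to/page.html?par1=value&par2=value#frag1"

# Remove the query and fragment first, then let basename strip the prefix.
name=$(basename "${URL%%[?#]*}")

echo "$name"   # page.html
```

The quotes around the expansion keep characters such as ? from being treated as filename wildcards.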

5. Conclusion

In this article, we explored two methods to extract the base filename from a URL in a Linux shell.

First, we learned ways to use parameter expansion to trim prefixes and suffixes from a URL. Then, we saw how the basename command can achieve the same goal for simple cases. Both are common Linux tools.

While basename works well for simple URLs, it fails for more complex cases. Parameter expansion, on the other hand, is a powerful tool for solving problems beyond just filename extraction.
