How to Operate on Files Listed in a File

1. Overview

In Linux, there may be situations where we have a list of files stored in a separate file, and we need to perform operations on each file listed. This can be achieved by utilizing various command-line tools and techniques.

In this tutorial, we’ll explore different approaches to operating on files listed in a file in Linux.

2. Introduction to the Problem

To better address each approach, let’s first prepare a simple directory structure and an input file:

$ tree /tmp/test
/tmp/test
├── dir1
│   └── file1.txt
├── dir2
│   └── file2.txt
└── dir3
    └── file 3.txt

4 directories, 3 files

As the tree output shows, we have three subdirectories under /tmp/test, and each subdirectory contains a text file. Note that the filename “dir3/file 3.txt” contains a space.

Next, let’s create an input file containing these three files:

$ cat input.txt 
/tmp/test/dir1/file1.txt
/tmp/test/dir2/file2.txt
/tmp/test/dir3/file 3.txt
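
To reproduce this setup, a minimal sketch could look like the following. The base variable is our own addition so that the location is easy to change, and the file contents are arbitrary placeholders:

```shell
#!/usr/bin/env bash
# Recreate the sample layout used in this tutorial.
# "base" is a hypothetical variable; it defaults to the /tmp/test path used above.
base="${base:-/tmp/test}"

mkdir -p "$base/dir1" "$base/dir2" "$base/dir3"

# Placeholder contents; nothing in the tutorial depends on what the files contain.
echo "content 1" > "$base/dir1/file1.txt"
echo "content 2" > "$base/dir2/file2.txt"
echo "content 3" > "$base/dir3/file 3.txt"

# One path per line; the last path contains a space on purpose.
printf '%s\n' \
    "$base/dir1/file1.txt" \
    "$base/dir2/file2.txt" \
    "$base/dir3/file 3.txt" > "$base/input.txt"
```

Writing the list with printf '%s\n' keeps each path on its own line without any quoting inside the file itself.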

Then, we can apply various operations on the files listed in the input file, such as move, copy, rename, delete, and so on. For simplicity, let’s say our goal in this tutorial is to make a backup copy for each file so that we have this file structure tree:

$ tree /tmp/test
/tmp/test
├── dir1
│   ├── file1.txt
│   └── file1.txt.bak
├── dir2
│   ├── file2.txt
│   └── file2.txt.bak
├── dir3
│   ├── file 3.txt
│   └── file 3.txt.bak
└── input.txt

4 directories, 7 files

Once we understand how to copy those files, replacing the copy operation with another one won’t be a challenge.

So next, let’s see how it’s done.

3. Using a for Loop

Let’s look at a straightforward approach: Looping through the files listed in the input file and performing the operation after we read each file.

First, let’s write a for loop to do the job:

for file in $(cat /tmp/test/input.txt); do cp "$file" "${file}.bak" && echo "backup $file done" ; done

The one-liner looks pretty simple. Next, let’s execute it:

$ for file in $(cat /tmp/test/input.txt); do cp "$file" "${file}.bak" && echo "backup $file done" ; done        
backup /tmp/test/dir1/file1.txt done
backup /tmp/test/dir2/file2.txt done
cp: cannot stat '/tmp/test/dir3/file': No such file or directory
cp: cannot stat '3.txt': No such file or directory

Oops, the command failed. This is because the shell splits the unquoted output of $(cat …) into fields using the default Internal Field Separator (IFS), which contains a space, a tab, and a newline:

$ printf "The default IFS: %q\n" "$IFS"
The default IFS: \ $'\t'$'\n'$'\0'

In other words, the for loop receives “foo bar” as two fields, “foo” and “bar”.

So, we need to set the IFS variable to contain only the newline character, so that each entire line becomes a single field.

IFS is a pretty important variable, so we want to restore its original value after executing our for loop. Usually, we can take the “backup and restore” approach: OIFS="$IFS"; IFS=$'\n'; for … done; IFS="$OIFS"
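
Written out as a script, the backup-and-restore approach might look like the sketch below. It builds a throwaway fixture with mktemp so that it can run anywhere; the workdir and list names are our own. It also adds set -f as an extra precaution, since an unquoted $(cat …) is subject to filename expansion as well as word splitting:

```shell
#!/usr/bin/env bash
# Throwaway fixture (hypothetical names) so the sketch is self-contained.
workdir=$(mktemp -d)
echo "demo" > "$workdir/file 3.txt"
list="$workdir/input.txt"
printf '%s\n' "$workdir/file 3.txt" > "$list"

OIFS="$IFS"        # remember the current separators
IFS=$'\n'          # split the command substitution on newlines only
set -f             # also disable globbing while the list is expanded

for file in $(cat "$list"); do
    cp "$file" "${file}.bak" && echo "backup $file done"
done

set +f             # re-enable globbing (assumes it was enabled before)
IFS="$OIFS"        # restore the original separators
```

The restore steps only run if the script reaches them, which is one reason the subshell variant is attractive.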

Here, we’ll see another trick:

$ (IFS=$'\n'; for file in $(cat /tmp/test/input.txt); do cp "$file" "${file}.bak" && echo "backup $file done" ; done)
backup /tmp/test/dir1/file1.txt done
backup /tmp/test/dir2/file2.txt done
backup /tmp/test/dir3/file 3.txt done

As we can see, the entire command sits in parentheses. This means the command is executed in a subshell, so the IFS variable in the current shell isn’t affected. Let’s see it with an example:

$ printf "The default IFS: %q\n" "$IFS"
The default IFS: \ $'\t'$'\n'$'\0'

$ (IFS=$'\n'; for file in $(cat /tmp/test/input.txt); do printf "IFS in the for loop:%q\n" "$IFS" ; done) 
IFS in the for loop:$'\n'
IFS in the for loop:$'\n'
IFS in the for loop:$'\n'

$ printf "The default IFS: %q\n" "$IFS"
The default IFS: \ $'\t'$'\n'$'\0'

The test above shows that the value of IFS in the current shell is the same before and after the (IFS=… for file in …) command, although within the for loop its value is $'\n'.

Next, let’s check the directory tree:

$ tree /tmp/test
/tmp/test
├── dir1
│   ├── file1.txt
│   └── file1.txt.bak
├── dir2
│   ├── file2.txt
│   └── file2.txt.bak
├── dir3
│   ├── file 3.txt
│   └── file 3.txt.bak
└── input.txt

4 directories, 7 files

As we can see, the for loop does the job.

Now, let’s remove the backup files and move to other approaches.

4. Using a while Loop

Alternatively, we can use a while loop to solve the problem. We still need to handle the space in the filename:

$ while IFS= read -r file; do cp "$file" "${file}.bak" && echo "backup $file done"; done < /tmp/test/input.txt
backup /tmp/test/dir1/file1.txt done
backup /tmp/test/dir2/file2.txt done
backup /tmp/test/dir3/file 3.txt done

As we can see, we’ve used the read command to read the input line by line. Also, we set IFS to an empty value so that read doesn’t trim leading or trailing whitespace from a line. Finally, the -r option makes read treat backslashes literally instead of as escape characters.
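
To see what these two settings protect against, the following sketch feeds read a made-up line that contains leading spaces, a backslash, and a trailing space. The sample string is purely illustrative:

```shell
#!/usr/bin/env bash
# A made-up line: two leading spaces, a backslash, and one trailing space.
line='  dir\name.txt '

# Plain read: the default IFS trims surrounding blanks, and without -r
# the backslash is consumed as an escape character.
read plain <<< "$line"

# IFS= read -r: the line is stored exactly as written.
IFS= read -r exact <<< "$line"

printf 'plain: [%s]\n' "$plain"   # prints: plain: [dirname.txt]
printf 'exact: [%s]\n' "$exact"   # prints: exact: [  dir\name.txt ]
```

Paths rarely start with spaces or contain backslashes, but using IFS= read -r costs nothing and removes the special cases.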

Now, let’s verify whether the files are backed up as expected:

$ tree /tmp/test
/tmp/test
├── dir1
│   ├── file1.txt
│   └── file1.txt.bak
├── dir2
│   ├── file2.txt
│   └── file2.txt.bak
├── dir3
│   ├── file 3.txt
│   └── file 3.txt.bak
└── input.txt

4 directories, 7 files

Some of us may have noticed that we didn’t start the while loop in a subshell, as we did with the for approach. This is because the IFS= prefix in while IFS= read -r file sets the IFS variable only for the duration of the read command, without changing the shell’s own value. Once the read command completes, the original value of IFS is back in effect:

$ printf "The default IFS: %q\n" "$IFS"
The default IFS: \ $'\t'$'\n'$'\0'

$ while IFS= read -r file; do echo "$file"; done < /tmp/test/input.txt
/tmp/test/dir1/file1.txt
/tmp/test/dir2/file2.txt
/tmp/test/dir3/file 3.txt

$ printf "The default IFS: %q\n" "$IFS"
The default IFS: \ $'\t'$'\n'$'\0'
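
In a script, the same loop is often hardened a little. The sketch below uses a throwaway fixture with hypothetical names. It also processes a final line that lacks a trailing newline and skips blank lines; both are our additions rather than part of the one-liner above:

```shell
#!/usr/bin/env bash
# Throwaway fixture (hypothetical names) so the sketch is self-contained.
workdir=$(mktemp -d)
echo "one" > "$workdir/file1.txt"
echo "two" > "$workdir/file 3.txt"
list="$workdir/input.txt"

# The list has a blank line in the middle and no newline after the last entry.
printf '%s\n\n%s' "$workdir/file1.txt" "$workdir/file 3.txt" > "$list"

# The "|| [ -n ... ]" part keeps the last line even when read hits
# end-of-file before seeing a newline.
while IFS= read -r file || [ -n "$file" ]; do
    [ -z "$file" ] && continue          # skip blank lines
    cp -- "$file" "${file}.bak" && echo "backup $file done"
done < "$list"
```

The -- before the filenames tells cp to stop parsing options, which guards against list entries that start with a dash.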

So far, we’ve seen two loop-based solutions to the problem. Apart from the file operation itself, we still need to handle the spaces in the filename. Also, we must prevent the IFS variable in the current shell from being altered.

Next, let’s remove the backup files and look at a simpler approach.

5. Using the xargs Command

The xargs command is a powerful command-line utility that can efficiently operate on files listed in a file. Let’s look at how xargs solves our backup problem:

$ xargs -a /tmp/test/input.txt -I {} cp {} {}.bak

$ tree /tmp/test
/tmp/test
├── dir1
│   ├── file1.txt
│   └── file1.txt.bak
├── dir2
│   ├── file2.txt
│   └── file2.txt.bak
├── dir3
│   ├── file 3.txt
│   └── file 3.txt.bak
└── input.txt

4 directories, 7 files

As the example above shows, the xargs command does the job efficiently. Let’s walk through the command’s components quickly to understand how it works:

xargs – Call the xargs command
-a inputFile – The -a option tells xargs to read input from a file instead of stdin. It’s a GNU extension, not part of POSIX.
-I {} – The -I option specifies a placeholder {} to represent each file listed. It allows us to insert the filename into the file operation command later.
cp {} {}.bak – The file operation command. The placeholder {} will be replaced with a filename in the list.

We also observe that we didn’t require specific handling for spaces in the filenames, such as setting the IFS variable. The xargs command handled them correctly without any issues. This is because -I takes the newline character as the separator. That is to say, spaces won’t terminate input items.
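
One caveat is worth noting: with -I, xargs still interprets quotes and backslashes in its input, so a path containing an apostrophe would be rejected as an unmatched quote. Assuming GNU xargs, which also provides the -a option used above, adding -d '\n' makes every line a literal item. The fixture below is hypothetical:

```shell
#!/usr/bin/env bash
# Throwaway fixture (hypothetical names) with an apostrophe and spaces in the filename.
workdir=$(mktemp -d)
echo "demo" > "$workdir/it's a file.txt"
list="$workdir/input.txt"
printf '%s\n' "$workdir/it's a file.txt" > "$list"

# -d '\n' (GNU xargs): split on newlines only and treat quotes
# and backslashes in the input literally.
xargs -a "$list" -d '\n' -I {} cp {} {}.bak

ls "$workdir"
```

For lists we generate ourselves, this option is a cheap way to avoid surprises with unusual filenames.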

6. Conclusion

In this article, we’ve learned to use various techniques to operate on files listed in a file.

Looping through the file list and applying the operation to each file is a straightforward and fundamental approach. However, we must consider cases where file paths contain spaces. Also, if we modify the IFS variable, we should restore its original value once the files are processed.

The xargs command is also a good choice for this problem. Its -I option with a placeholder correctly handles spaces in the filenames.
