1. Overview

Bash is well suited to automating system-level tasks in Linux, whereas Python offers extensive libraries for more complex problems, such as data analysis. By calling Python from within Bash, we can combine the two in a single automated workflow.

In this tutorial, we’ll explore how to call the Python interpreter from a Bash script and how to embed Python code within Bash.

2. Sample Task

Let’s suppose we have a comma-delimited (CSV) data file named db.csv. First, we check the contents of db.csv with cat:

$ cat ./db.csv
ID,NAME,OCCUPATION
1,Ron,Engineer
2,Lin,Engineer
3,Tom,Architect
4,Mat,Engineer
5,Ray,Botanist
6,Val,Architect

Our goal is to add a column of Boolean values, True or False, that depends on the OCCUPATION field of each row. Specifically, we append True to a row if its occupation occurs in the file with neither the minimum nor the maximum frequency. Otherwise, we append False.

In this case, the occupation with the minimum frequency is Botanist (one row), whereas the occupation with the maximum frequency is Engineer (three rows). Therefore, we should append True only to rows where the occupation is Architect, and False to rows where the occupation is Botanist or Engineer.
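To make the rule concrete before involving Bash, here is a minimal sketch of it in plain Python, using only the standard library. The function name select_occupations and the hard-coded list are illustrative assumptions, not part of the scripts used later:

```python
from collections import Counter


def select_occupations(occupations):
    """Return the occupations whose count is neither the minimum nor the maximum."""
    counts = Counter(occupations)
    lowest, highest = min(counts.values()), max(counts.values())
    return {name for name, n in counts.items() if n not in (lowest, highest)}


# The OCCUPATION column of db.csv, typed in by hand for illustration
occupations = ["Engineer", "Engineer", "Architect", "Engineer", "Botanist", "Architect"]

print(Counter(occupations))             # Counter({'Engineer': 3, 'Architect': 2, 'Botanist': 1})
print(select_occupations(occupations))  # {'Architect'}
```

Note that if every occupation occurred equally often, the minimum and maximum would coincide and no occupation would be selected.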

Let’s explore how we can call Python from a Bash script to handle the data processing of this task.

3. Using python3 -c

For smaller tasks, we can call Python via python3 -c as a one-liner. The -c option takes the Python code to run as a string argument.

For example, we can call Python to read and display our CSV as a data frame:

$ python3 -c 'import pandas as pd; df = pd.read_csv("./db.csv"); print(df.to_string(index=False))'
 ID NAME OCCUPATION
  1  Ron   Engineer
  2  Lin   Engineer
  3  Tom  Architect
  4  Mat   Engineer
  5  Ray   Botanist
  6  Val  Architect

Here, we import the pandas module and read the CSV file with the pd.read_csv() function. Then, to display the data frame without the default index column, we convert it to a string with the index option set to False.

We can also pass arguments to the Python command:

$ path_to_file='./db.csv'
$ python3 -c 'import pandas as pd; import sys; df = pd.read_csv(sys.argv[1]); print(df.to_string(index=False))' "$path_to_file"
 ID NAME OCCUPATION
  1  Ron   Engineer
  2  Lin   Engineer
  3  Tom  Architect
  4  Mat   Engineer
  5  Ray   Botanist
  6  Val  Architect

In this case, we pass the file path as an argument instead of hard-coding it within the command string. To do so, we import the sys module and read the path from sys.argv[1], the first argument after the command string. With the -c option, sys.argv[0] holds the string '-c' rather than a script name.
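To see exactly what a one-liner receives in sys.argv, we can launch one from Python and inspect the result. This is only a sketch for checking the indexing; the helper name argv_seen_by_one_liner is a hypothetical one, and the file path is just a string here, so db.csv doesn't need to exist:

```python
import json
import subprocess
import sys


def argv_seen_by_one_liner(*args):
    """Run a `python -c` one-liner with the given arguments and return its sys.argv."""
    result = subprocess.run(
        [sys.executable, "-c", "import sys, json; print(json.dumps(sys.argv))", *args],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


print(argv_seen_by_one_liner("./db.csv"))  # ['-c', './db.csv']
```

The first element is the literal '-c', so the arguments we supply start at index 1.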

4. Using a Standalone Python Script

For larger tasks, such as our sample task, we can use a standalone Python script and call it from within Bash:

$ cat ./count_filter.py
import pandas as pd
import numpy as np
import sys

path_to_file = sys.argv[1]
df = pd.read_csv(path_to_file)
values, counts = np.unique(df['OCCUPATION'], return_counts=True)
x = [values[i] for i in range(len(values)) if counts[i] not in [min(counts), max(counts)]]

df['Selected'] = df['OCCUPATION'].isin(x)
df.to_csv('./result.csv', index=False, header=False)

The Python script, named count_filter.py, performs a number of steps:

  1. import the pandas, numpy, and sys modules
  2. define the file path as the first argument passed to the script
  3. load the CSV file into a data frame variable named df
  4. call np.unique() over the OCCUPATION column to return the occupation values and their counts
  5. use a list comprehension to select occupations that occur with neither the minimum nor maximum count
  6. append a column, named Selected, with Boolean values depending on the status of an occupation with respect to the previous step
  7. save the new data frame to a CSV file named result.csv, without the default index column or header
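For comparison, the same steps can be sketched without pandas or NumPy, which may be handy on systems where those packages aren't installed. This is an alternative under stated assumptions, not the script used in this tutorial: the names SAMPLE and flag_rows are hypothetical, and the data is read from a string instead of a file to keep the example self-contained:

```python
import csv
import io
from collections import Counter

# Inline copy of db.csv so the sketch runs without any file on disk
SAMPLE = """ID,NAME,OCCUPATION
1,Ron,Engineer
2,Lin,Engineer
3,Tom,Architect
4,Mat,Engineer
5,Ray,Botanist
6,Val,Architect
"""


def flag_rows(csv_text):
    """Append True/False to each data row and return header-less CSV text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    counts = Counter(row["OCCUPATION"] for row in rows)
    extremes = {min(counts.values()), max(counts.values())}

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        # True only when the occupation's count is neither the minimum nor the maximum
        selected = counts[row["OCCUPATION"]] not in extremes
        writer.writerow([*row.values(), selected])
    return out.getvalue()


print(flag_rows(SAMPLE), end="")
```

The sketch assumes at least one data row; with an empty file, min() and max() would raise a ValueError.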

Next, we call the Python interpreter to execute the count_filter.py script while providing the file path as an argument:

$ python3 ./count_filter.py ./db.csv

This saves the result in a new file named result.csv:

$ cat ./result.csv
1,Ron,Engineer,False
2,Lin,Engineer,False
3,Tom,Architect,True
4,Mat,Engineer,False
5,Ray,Botanist,False
6,Val,Architect,True

We see that only the third and sixth rows get True appended since the Architect occupation occurs with neither the minimum nor maximum frequency.

5. Using a Here-Document

Another option is to embed the Python code explicitly within the Bash script using a here-document.

In this case, we have two options:

  • save the Python code to an intermediate script file and then call the Python interpreter
  • directly execute the code, skipping the intermediate file

Let’s explore both approaches.

5.1. Saving to a Script File

Using a here-document, we can simply direct the content to a script file, named count_filter.py:

$ cat sample_task.sh
#!/usr/bin/env bash
cat << EOF > ./count_filter.py
import pandas as pd
import numpy as np
import sys

path_to_file = sys.argv[1]
df = pd.read_csv(path_to_file)
values, counts = np.unique(df['OCCUPATION'], return_counts=True)
x = [values[i] for i in range(len(values)) if counts[i] not in [min(counts), max(counts)]]

df['Selected'] = df['OCCUPATION'].isin(x)
df.to_csv('./result.csv', index=False, header=False)
EOF

python3 ./count_filter.py ./db.csv

We’ve used the EOF marker within the sample_task.sh Bash script to mark the start and end of the here-document. The Python code is the same as that of the standalone script discussed earlier.
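One detail worth knowing about here-documents is that quoting the delimiter changes how the shell treats the body. With a bare EOF, the shell expands $variables and command substitutions inside the body, whereas with 'EOF' the body is passed through literally. Our Python code contains neither, so both forms behave the same for it. The following sketch shows the difference, assuming a POSIX sh is available on the PATH; the helper name expand_heredoc and the NAME variable are hypothetical:

```python
import subprocess


def expand_heredoc(delimiter):
    """Print a one-line here-document through sh, using the given opening delimiter."""
    script = f'NAME=world\ncat << {delimiter}\nprint("hello $NAME")\nEOF\n'
    result = subprocess.run(
        ["sh", "-c", script], capture_output=True, text=True, check=True
    )
    return result.stdout


print(expand_heredoc("EOF"), end="")    # print("hello world")
print(expand_heredoc("'EOF'"), end="")  # print("hello $NAME")
```

Quoting the delimiter is the safer choice whenever the embedded Python code contains a dollar sign or backticks that the shell shouldn't touch.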

Then, in the last line, we run the count_filter.py script with the Python interpreter, passing the path to the CSV file as an argument.

Let’s grant execute permissions to the Bash script via chmod:

$ chmod +x ./sample_task.sh

Finally, let’s run the script:

$ ./sample_task.sh

The result is saved in the result.csv file, overwriting the previous version:

$ cat ./result.csv
1,Ron,Engineer,False
2,Lin,Engineer,False
3,Tom,Architect,True
4,Mat,Engineer,False
5,Ray,Botanist,False
6,Val,Architect,True

We see the same outcome as before.

5.2. Running Python Code Directly

Alternatively, we can skip saving the contents of the here-document to an intermediate file:

$ cat sample_task.sh
#!/usr/bin/env bash
python3 - ./db.csv << EOF
import pandas as pd
import numpy as np
import sys

path_to_file = sys.argv[1]
df = pd.read_csv(path_to_file)
values, counts = np.unique(df['OCCUPATION'], return_counts=True)
x = [values[i] for i in range(len(values)) if counts[i] not in [min(counts), max(counts)]]

df['Selected'] = df['OCCUPATION'].isin(x)
df.to_csv('./result.csv', index=False, header=False)
EOF

The difference here is that python3 reads the code directly from the here-document. The lone hyphen tells python3 to read the program from stdin, while the path to the CSV file is passed as an argument and is available in the code as sys.argv[1].
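We can check how arguments are indexed when the program arrives on stdin by feeding a tiny program to the interpreter ourselves. This is a minimal sketch; the helper name run_from_stdin is an assumption made for illustration:

```python
import subprocess
import sys


def run_from_stdin(code, *args):
    """Feed code to `python -` on stdin, pass args after the hyphen, return stdout."""
    result = subprocess.run(
        [sys.executable, "-", *args],
        input=code,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


print(run_from_stdin("import sys\nprint(sys.argv)\n", "./db.csv"), end="")  # ['-', './db.csv']
```

Here, sys.argv[0] is the hyphen itself, so the file path again sits at index 1, just as with a standalone script.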

Next, we run the Bash script and view the result.csv file that was generated:

$ ./sample_task.sh
$ cat ./result.csv
1,Ron,Engineer,False
2,Lin,Engineer,False
3,Tom,Architect,True
4,Mat,Engineer,False
5,Ray,Botanist,False
6,Val,Architect,True

Of course, we obtain the same result as before.

6. Conclusion

In this article, we explored different methods for calling Python from within a Bash script. In particular, we saw how to use the python3 -c command, as well as how to call a standalone Python script, and how to use a here-document for embedding Python code within a Bash script.
