1. Overview

In this article, we’ll cover how to implement arrays (lists) in a POSIX shell, along with their quirks.

2. Bash vs. POSIX Shell Arrays

Compared to a basic POSIX-compliant shell, bash arrays are much more powerful and convenient to use. Let’s illustrate this by trying to create an indexed array and access its 3rd member.

In bash, we declare arrays with the (…) syntax:

$ array=(item1 item2 item3)
$ echo "${array[2]}"
item3

In a POSIX shell, the closest equivalent is setting the positional parameters with set:

$ set -- item1 item2 item3
$ echo "$3" # Array indices start from 1
item3

We can see that bash has a much cleaner syntax for arrays, making it easier to use for more complex operations.

3. Creating Arrays

POSIX shell has no first-class support for arrays. However, we can use the list of positional parameters as an array.

Positional parameters are all the parameters passed to a shell script/function. For example, in my_function 1 2 3, the numbers following my_function are the positional parameters.

We fill the array using the set built-in and read it back via the special parameter $@, which expands to all the positional parameters, i.e., our array:

$ set -- 1 2 3
$ echo "$@"
1 2 3

The double dash (--) marks the end of options, so set treats everything after it as an item, even a value that starts with a dash, such as -x.
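Two details are easy to miss here: $# holds the number of positional parameters (our array length), and only the quoted form "$@" keeps items that contain spaces intact. Here's a small sketch illustrating both, with arbitrary sample items:

```shell
#!/bin/sh
# Items containing spaces must be quoted when we pass them to set.
set -- "first item" "second item" third

# $# is the number of positional parameters, i.e., the array length.
echo "length: $#"

# Quoted "$@" expands each item to exactly one word, spaces included,
# so this prints [first item], [second item], and [third] on separate lines.
for item in "$@"; do
    printf '[%s]\n' "$item"
done
```

Had we written $@ without quotes, the shell would split "first item" into two separate words.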

4. Basic Array Operations

Now, let’s look at the various array operations we can perform.

4.1. Adding Items

For adding items, we simply pass the original array to set, along with the new items:

$ set -- 1 2 3
$ echo "$@" # Original array
1 2 3
$ set -- "$@" 4
$ echo "$@" # New array
1 2 3 4
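The same idea works at the other end of the array: placing the new item before “$@” prepends it. Here's a minimal sketch:

```shell
#!/bin/sh
set -- 1 2 3

# New items go before "$@" to prepend, after it to append.
set -- 0 "$@"
set -- "$@" 4

echo "$@" # prints: 0 1 2 3 4
```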

Here, the quoted “$@” expands to all of our original positional parameters, each as a separate word, so the existing items stay intact and 4 is added after them.

4.2. Removing Items

We can easily remove multiple items from the start of an array, but arbitrary removal is tricky.

To remove items from the start, we use the shift built-in, passing it the number of items to remove (1 by default):

$ set -- 1 2 3
$ shift 2 # Remove first 2 items
$ echo "$@"
3
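There's one caveat: asking shift to remove more items than the array holds is an error, and in some shells, such as dash, that error aborts a non-interactive script. So, it may be worth checking $# first. Here's a minimal sketch, where n is an arbitrary example count:

```shell
#!/bin/sh
set -- 1 2 3
n=5 # Number of items to remove (example value, larger than the array)

# shift fails when n is greater than $#, so check the length first.
if [ "$#" -ge "$n" ]; then
    shift "$n"
else
    echo "can't remove $n items from an array of $# items" >&2
fi

echo "$@" # the array is unchanged: 1 2 3
```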

There is no direct way of removing items at a given index, so let’s write a function for it:

# Argument 1: The index to remove (starting from 1)
# Arguments 2 onward: The array items
# Usage: set -- $(array_remove N "$@")
# The output is word-split on its way back into set, so this only works
# for items without whitespace or glob characters.
array_remove() {
    index="$1"
    shift # Remove the index from the argument list

    counter=1 # Array indexing starts from 1

    # Print the items before the index; "-lt" means "less than".
    while [ "$counter" -lt "$index" ]; do
        : $((counter+=1)) # Increment counter
        printf '%s\n' "$1" # First item of the current array
        shift # Move to the next item
    done

    # Skip the item at the removal index; we've printed everything before it.
    shift

    # Print the rest of the array.
    printf '%s\n' "$@"
}

Let’s test it:

$ set -- 1 2 3 4 5
$ set -- $(array_remove 4 "$@") # Remove the 4th item
$ echo "$@"
1 2 3 5

Note that this method is inefficient since it traverses the array up to the given index inside a command substitution, which runs in a subshell.
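Because array_remove hands its result back as text through command substitution, items containing whitespace can't survive the round trip intact. Here's a sketch of an alternative, assuming we work on the positional parameters directly at the top level of the script (inside a function, we'd only modify the function's own copy). We loop over the items once, moving each one from the front of the list to the back and skipping the one at the target index:

```shell
#!/bin/sh
set -- "a b" c "d e" f
index=3 # 1-based index of the item to remove ("d e")

i=1
# The loop's word list is expanded once, before the loop starts,
# so rebuilding the positional parameters inside it is safe.
for item in "$@"; do
    shift # Drop the current front item...
    if [ "$i" -ne "$index" ]; then
        set -- "$@" "$item" # ...and re-append it unless it's the one to remove
    fi
    i=$((i + 1))
done

printf '[%s]\n' "$@" # [a b], [c], [f]
```

After one full pass, the remaining items are back in their original order. It's still a linear walk over the array, but it needs no subshell and keeps every item intact.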

4.3. Indexing

We can access an item by its index with the positional parameter ${N}, where N is the required index:

$ set -- 3 2 1
$ echo "${3}"
1

We must wrap the number in curly braces for indexes with more than one digit. For example, the shell expands “$98” as the value of “$9” followed by the literal character “8”, while “${98}” refers to the 98th item.
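We can see this quickly with an arbitrary ten-item array:

```shell
#!/bin/sh
set -- a b c d e f g h i j

echo "$10"   # $1 followed by a literal 0: prints a0
echo "${10}" # the 10th item: prints j
```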

However, if we store the index in a variable, we need to resort to eval:

$ set -- one two three
$ index=3
$ eval "echo \${${index}}"
three
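Since eval runs its argument as shell code, a variable holding something other than a number could execute arbitrary commands. Here's a defensive sketch that only accepts a positive integer without leading zeros, and only within bounds, before calling eval (index and item are just example variable names):

```shell
#!/bin/sh
set -- one two three
index=3

# Reject anything that isn't a positive integer without leading zeros,
# so nothing but digits ever reaches eval.
case $index in
    '' | *[!0-9]* | 0*)
        echo "invalid index: $index" >&2
        exit 1
        ;;
esac

# Check the bounds, then expand ${N} through eval.
if [ "$index" -le "$#" ]; then
    eval "item=\${$index}"
    printf '%s\n' "$item" # prints: three
fi
```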

To avoid eval, we can create a function to take in the array, skip N elements, and then print the first element:

# Argument 1: The index (starting from 1)
# Arguments 2 onward: The array items
array_index() {
    # Shift off the index argument plus the N-1 items before the target,
    # so the requested item becomes $1.
    shift "$1"

    # If the index is past the end, $1 is unset and ${1?...} prints an error
    # (a non-interactive shell exits at this point instead of returning).
    printf '%s\n' "${1?Index out of bounds}"
}

Let’s run it:

$ set -- 0 1 2 3 4 5 6 7 8 9 10
$ array_index 12 "$@"
/bin/sh: 1: Index out of bounds
$ array_index 11 "$@"
10
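There's no negative indexing either, so reading the last item takes a small trick. One common idiom is to run an empty loop over the array, which leaves the loop variable set to the final item:

```shell
#!/bin/sh
set -- 0 1 2 3 4 5 6 7 8 9 10

last= # Reset first, so an empty array leaves $last empty
for last in "$@"; do :; done # ":" is a no-op; the loop only assigns $last

printf '%s\n' "$last" # prints: 10
```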

4.4. Iteration

We use for to iterate over an array:

$ set -- 1 2 3
$ for item in "$@"; do echo "$item"; done
1
2
3

The in “$@” part is optional here: for item; do …; done iterates over the positional parameters by default.
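The for loop gives us each item but not its position. If we also need the index, a minimal sketch is to keep a counter alongside the loop, starting from 1 to match the positional parameters:

```shell
#!/bin/sh
set -- a b c

i=1 # Positional parameters are numbered from 1
for item in "$@"; do
    printf '%s: %s\n' "$i" "$item" # 1: a, then 2: b, then 3: c
    i=$((i + 1))
done
```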

4.5. Generating Arrays From Commands

We can also generate arrays by passing the output of a command substitution, $(…), to set. Say we want an array of 100 integers:

$ set -- $(seq 100)
$ echo "$@"
1 2 3 4 5 6 7 8 9 10 11 12 ...

The seq command generates numbers in a given range. Note that seq isn’t specified by POSIX, although it’s available on most Linux systems.
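Since the unquoted $(…) goes through field splitting and pathname expansion, a line containing spaces turns into several items, and a word like * can expand to file names. Here's a sketch of one common workaround, assuming each item sits on its own line. We disable globbing with set -f and split on newlines only by temporarily setting IFS:

```shell
#!/bin/sh
set -f # Disable pathname expansion, so "*" stays a literal item

old_ifs=$IFS
IFS='
' # IFS is now a single newline, so only line breaks separate items

set -- $(printf '%s\n' "first line" "second line" "*")

IFS=$old_ifs # Restore the previous splitting behavior
set +f

printf '[%s]\n' "$@" # [first line], [second line], [*]
```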

5. Associative Arrays / Hash Maps

If we’re at the point of needing hash maps in the shell, we should consider using more powerful languages such as Python.

While it is still possible to implement them using files and interact with them via functions, more complex operations like nested keys can’t be implemented cleanly.

Additionally, every lookup or insertion goes through the filesystem and spawns external processes such as md5sum and cat, so each operation is much slower than an in-memory lookup.

5.1. Implementation

For the implementation, we use the hashed keys as file names and the file contents as values.

We take the checksum of the key instead of using the key string as the filename directly. This not only bypasses the filename length limit but also keeps slashes out of the name. For example, we can’t create a file named “filewith/slash”, since the slash separates directories.

The hash table directory itself is created with mktemp:

hm_create() {
    # Create a temporary directory and print its name
    mktemp -d
}

# Lazy hash function that just generates a checksum of the key.
# md5sum prints "<checksum>  -", and that whole string becomes the filename.
# Feel free to replace this with a more secure checksum like sha256sum.
hm_hash() {
    printf '%s\n' "$1" | md5sum -
}

# Argument 1: Hash Table
# Argument 2: Key
# Argument 3: Value
hm_put() {
    printf '%s\n' "$3" > "$1/$(hm_hash "$2")"
}

# Argument 1: Hash Table
# Argument 2: Key
hm_delete() {
    rm -f "$1/$(hm_hash "$2")"
}

# Argument 1: Hash Table
# Argument 2: Key
hm_get() {
    cat "$1/$(hm_hash "$2")"
}
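Since hm_get fails with an error from cat when a key is missing, it can help to test for a key first. Here's a minimal sketch of a hypothetical hm_has helper built on the same hashing scheme. Copies of hm_create, hm_hash, and hm_put are included so that the example runs on its own, and it assumes md5sum is installed:

```shell
#!/bin/sh
hm_create() {
    mktemp -d
}

hm_hash() {
    printf '%s\n' "$1" | md5sum -
}

hm_put() {
    printf '%s\n' "$3" > "$1/$(hm_hash "$2")"
}

# Hypothetical helper, not part of the functions above.
# Argument 1: Hash Table
# Argument 2: Key
# Returns 0 if the key exists, non-zero otherwise.
hm_has() {
    [ -f "$1/$(hm_hash "$2")" ]
}

hm="$(hm_create)"
hm_put "$hm" mykey myvalue

if hm_has "$hm" mykey; then echo "mykey is present"; fi
if ! hm_has "$hm" other; then echo "other is missing"; fi

rm -rf "$hm" # Clean up the hash table directory
```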

5.2. Usage

Let’s create a hash table with a few keys and print them:

$ hm="$(hm_create)"
$ echo "Created hashmap $hm"
Created hashmap /tmp/tmp.K6Kuuv
$ hm_put "$hm" mykey myvalue
$ hm_put "$hm" hash table
$ hm_get "$hm" hash
table
$ hm_get "$hm" mykey
myvalue
$ hm_delete "$hm" hash
$ hm_get "$hm" hash # The key "hash" was deleted, so cat reports an error.
cat: can't open '/tmp/tmp.K6Kuuv/4e76434eea3c9d9cf9cb10bbf3f4a74b  -': No such file or directory

Finally, we can delete the whole hash table by running rm -rf "$hm".
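Because the table lives in a temporary directory, a script that stops early would leave it behind. Here's a sketch of one way to guard against that, assuming the script uses a single table. We register the cleanup with trap as soon as the directory exists, so it runs when the shell exits:

```shell
#!/bin/sh
hm="$(mktemp -d)"

# Remove the hash table directory when the script exits,
# whether it reaches the end or calls exit early.
trap 'rm -rf "$hm"' EXIT

printf '%s\n' myvalue > "$hm/example" # Stand-in for hm_put calls

echo "working in $hm"
```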

6. Conclusion

In this article, we learned about the differences between bash and POSIX shell arrays, along with their usage. The hash map implementation also shows that for more complex data structures, it’s better to use a powerful, higher-level scripting language such as Python for ease of use and robustness.
