I feel like Bash is one of those tools that people only ever learn just enough to be able to get some small task done and then pretend that it doesn’t exist. Unfortunately this means people often overlook really useful Bashisms that would make their lives easier and reduce some of the problems that they encounter when a script doesn’t work quite the way they think it should.

A quick call-out on Bash and the use of it. Some people would argue that we should just avoid the use of Bash and use something else like Python or whatever other language instead. Sure, that’s a possible solution but let’s be real for a second.

Bash is available on just about every machine (except Windows unless you use WSL). You don’t need to add a step to install another interpreter/runtime, which saves time, resources, and money.

It’s also incredibly stable. There haven’t been any big changes to Bash in ages, which means you don’t need to worry about some piece of syntax missing or having to update Bash before you can proceed. It just works.

Bash is the standard glue of Unix; you’re just going to have to deal with that.

I ran into a recent example of this at work when someone was adding a new step to our CI/CD pipeline. They made a change, the build was green, and the output was where it was supposed to be after their new step ran so it was mission accomplished!

Except that when I went to check the new step’s output to grab a URL, I found that the log line was missing, and in its place was an error about an invalid flag being given to a command. Even though the command returned an error, the whole step was still green because Bash was swallowing the error. Why was it doing that?

Let’s look at a simple example of what was happening.

version="$(fetch-manifest --envronment=staging target | jq '.version')"

fetch-manifest is some internal command that can be used to fetch deployment information such as the currently released version. You would give it some target resource name and the deployment environment. The data that is returned is in JSON format.

That information is then being piped into the jq utility which will extract out just the version from the manifest.

All of that information is then written to the version variable.

On the surface this seems simple enough, but there are a couple of problems.

  1. The environment flag is misspelled, and I would be willing to bet that gets overlooked most of the time
  2. We pipe that manifest data directly into jq using the pipe operator without concern for the status code

In combination this would mean that version is set to be an empty string and execution continues as if nothing went wrong. However you would probably expect that to have failed and an error code to have been returned? What gives?
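The swallowed failure is easy to reproduce. The sketch below is a minimal stand-in, not the real pipeline: `fetch_manifest` is a hypothetical shell function that imitates the internal command rejecting the misspelled flag, and `cat` takes the place of jq so that the example runs without any extra tools installed.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for the internal fetch-manifest command: it rejects
# the misspelled flag, prints an error to stderr, and returns non-zero.
fetch_manifest() {
  echo "fetch-manifest: unknown flag: $1" >&2
  return 2
}

# `cat` stands in for jq here (an assumption, to keep the sketch
# dependency-free). It receives no input, prints nothing, and succeeds.
version="$(fetch_manifest --envronment=staging target | cat)"
status=$?

# The pipeline reports the status of its last command, so the failure is lost.
echo "status=${status} version='${version}'"
```

The error message still reaches stderr, but the script itself sees a zero status and an empty `version`, which is exactly why the build stayed green.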

Unfortunately Bash is a very forgiving language in the worst of ways and fixing that would be a breaking change that would likely have massive and very negative repercussions that no one wants. Thankfully there’s a way to opt into better behavior that deals with the majority of these issues.

Bash options

Bash has a set of options that you can toggle to change how its interpreter runs. The full list of them can be found in the Bash manual’s documentation for the set builtin. In my experience though, there are only a few that are actually important and should absolutely be used as often as possible.

A quick note on specifying these options in your Bash scripts: please, in the name of all that is good, use the long form of their names!

Instead of set -e, use set -o errexit. That’s the name its mother gave it, and it’s significantly easier for other people to read and understand what the script is doing.

I’ve too often seen people specify the wrong thing and get surprised when Bash didn’t save them. Do yourself and everyone who reads your script a favor and spell things out the long way. I promise it won’t hurt.

These options are set with some pretty simple syntax.

set -o errexit

# Or multiple options at once

set -o errexit -o nounset

If you want to disable an option you can use a similar syntax.

set +o errexit
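If you are ever unsure whether an option is currently on, Bash can tell you. The sketch below uses the `[[ -o name ]]` conditional to check an option by its long name; the `report` helper is a made-up name for this example.

```shell
#!/usr/bin/env bash

# Print whether the named `set -o` option is currently enabled.
# (`report` is a hypothetical helper, not something Bash provides.)
report() {
  if [[ -o "$1" ]]; then
    echo "$1: on"
  else
    echo "$1: off"
  fi
}

set -o errexit -o nounset
report errexit   # errexit: on
report nounset   # nounset: on

# Turning one option off leaves the others untouched.
set +o errexit
report errexit   # errexit: off
report nounset   # nounset: on
```

Running `set -o` with no option name also prints the full list of options along with their current state, which is handy when poking around interactively.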

errexit

errexit tells Bash to stop evaluation as soon as some command returns a non-zero exit code. Commands whose status is already being tested, such as the condition of an if or the left-hand side of && or ||, are exempt.

Normally you want to know as soon as something goes wrong and not continue execution in an unknown state. For example, if a variable hasn’t been set properly because the command that was supposed to set it failed, the rest of the script might do weird things.
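To see the difference, here is a small sketch that runs the same two commands in a child Bash process with and without errexit. The child processes are only there so that the demonstration itself can keep running after the failure.

```shell
#!/usr/bin/env bash

# Without errexit: the failing `false` is ignored and the echo still runs.
without="$(bash -c 'false; echo "kept going"')"

# With errexit: the child shell exits at `false`, so the echo never runs
# and the child's non-zero exit code is what the caller sees.
status=0
with="$(bash -c 'set -o errexit; false; echo "kept going"')" || status=$?

echo "without errexit: output='${without}'"
echo "with errexit:    output='${with}' status=${status}"
```

The first line of output shows that the script carried on as if nothing happened, while the second shows an empty output and a non-zero status.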

The only case that I can recall having to disable this option was when I was calling some other command that might fail and I wanted to capture the stdout and stderr output as well as the exit code. I then re-enabled the option as soon as I had the information I needed. It’s a very niche situation.
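That niche case might look something like the sketch below. `might_fail` is a hypothetical function standing in for whatever external command you need to inspect.

```shell
#!/usr/bin/env bash

set -o errexit

# Hypothetical command that writes to both streams and then fails.
might_fail() {
  echo "partial result"
  echo "something broke" >&2
  return 3
}

# Turn errexit off only for as long as it takes to collect the details.
set +o errexit
output="$(might_fail 2>&1)"
status=$?
set -o errexit

echo "exit code: ${status}"
echo "captured output:"
echo "${output}"
```

The exit code has to be read from `$?` on the very next line, before any other command overwrites it.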

nounset

nounset makes Bash return an error when you attempt to use a variable that hasn’t been initialized.

By default Bash gives any variable you access that hasn’t been initialized an empty string as its value. In some ways this can be helpful, such as when you’re allowing users to modify the behavior of your script with environment variables. However, it also means that you might misspell a variable or use one that hasn’t actually been set up properly yet, and you won’t know until things fail in some unexpected way.
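A misspelled variable name makes a good demonstration. In this sketch the child script sets `target` and then mistypes it as `tagret`; both names are invented for the example, and the child processes are only there so that both behaviors can be shown side by side.

```shell
#!/usr/bin/env bash

# The child script sets `target` but then misspells it as `tagret`.
# Single quotes keep the parent shell from expanding anything here.
script='target="staging"; echo "deploying to ${tagret}"'

# Default behavior: the typo silently expands to an empty string.
lenient_output="$(bash -c "${script}")"

# With nounset: Bash reports the unbound variable and the child exits
# with a non-zero status instead of printing a half-finished message.
strict_status=0
strict_output="$(bash -c "set -o nounset; ${script}" 2>&1)" || strict_status=$?

echo "default: '${lenient_output}'"
echo "nounset: status=${strict_status} message='${strict_output}'"
```

The exact wording of the error varies a little between Bash versions, but it names the offending variable and says that it is unbound.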

In my experience it’s better to set this and then work around it in the relatively few cases where you need to access a potentially unset external environment variable.

For example, you could do something like:

some_external="${SOMETHING_EXTERNAL:-}"

# Or if you need a default

some_external="${SOMETHING_EXTERNAL:-default}"

It’s not a pretty syntax but it’s relatively short and allows you to safely check if something exists and give it some known value if not.
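As a quick check that these expansions really do coexist with nounset, here is a small sketch that exercises both forms, first with the variable unset and then with a value in place.

```shell
#!/usr/bin/env bash

set -o nounset

# Make sure the variable really is unset for the first two lookups.
unset SOMETHING_EXTERNAL

empty_fallback="${SOMETHING_EXTERNAL:-}"
with_default="${SOMETHING_EXTERNAL:-default}"

# Once the variable has a value, the fallback is ignored.
SOMETHING_EXTERNAL="from-env"
when_set="${SOMETHING_EXTERNAL:-default}"

echo "empty_fallback='${empty_fallback}'"   # empty_fallback=''
echo "with_default='${with_default}'"       # with_default='default'
echo "when_set='${when_set}'"               # when_set='from-env'
```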

You could also just hold off setting nounset until after you try to check for something that might not be set, but keep that unsafe region as small as possible.

set +o nounset

some_external="${SOMETHING_EXTERNAL}"

set -o nounset

pipefail

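pipefail is the solution to the original example I gave in this post. By default a pipeline’s exit status is that of its last command, so a failure earlier in the pipeline goes unnoticed. With pipefail set, the pipeline returns the status of the last (rightmost) command that failed, and returns zero only if every command succeeded. The commands in the pipeline still run to completion, but combined with errexit the script stops once the pipeline finishes. In my experience this is the more expected behavior and likely what you want since it gives you a chance to properly handle any errors and set default values if needed.

I have yet to encounter a situation where I wanted a failure in the middle of a pipeline to be ignored. I guess if I knew the probable output from whatever command I’m running when it errors out I could rely on that but it seems pretty risky to me.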
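Here is a sketch of the original example with and without pipefail. As before, `fetch_manifest` is a hypothetical stand-in for the internal command and `cat` takes the place of jq; the `unknown` fallback value is also just an assumption for illustration.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for the internal fetch-manifest command: it always
# rejects its first argument, as the real tool did with the misspelled flag.
fetch_manifest() {
  echo "fetch-manifest: unknown flag: $1" >&2
  return 2
}

# Default: the pipeline's status is that of `cat` (standing in for jq), so 0.
set +o pipefail
fetch_manifest --envronment=staging target 2>/dev/null | cat
status_default=$?

# pipefail: the pipeline reports the failing command's status instead.
set -o pipefail
fetch_manifest --envronment=staging target 2>/dev/null | cat
status_pipefail=$?

# With the failure visible, the script can react, here with a fallback value.
if ! version="$(fetch_manifest --envronment=staging target 2>/dev/null | cat)"; then
  version="unknown"
fi

echo "default=${status_default} pipefail=${status_pipefail} version=${version}"
```

Testing the assignment in an `if` also keeps errexit from ending the script at that line, so the fallback gets a chance to run.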

xtrace

Finally we are at xtrace, which makes execution of your Bash script significantly more verbose. With this option Bash prints every command that it evaluates before executing it. This includes each individual step in a pipeline as well as the conditional checks in branches and loops.
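To get a feel for what the trace looks like, the sketch below runs a two-command script in a child shell with xtrace on and captures everything it writes. The child script and its variable are made up for the example.

```shell
#!/usr/bin/env bash

# Run a tiny script in a child shell with xtrace on. Trace lines go to
# stderr, so 2>&1 folds them into the captured output next to stdout.
trace="$(bash -c 'set -o xtrace; name="world"; echo "hello ${name}"' 2>&1)"

echo "${trace}"
```

Each traced command shows up on its own line, prefixed with `+` by default and with its variables already expanded, followed by whatever the command itself printed.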

This can be pretty helpful when you want a debug mode on your script or you’re trying to debug some issue. Most often though I use this in Dockerfiles to make the build steps more obvious.

RUN set -o errexit -o xtrace -o nounset -o pipefail \
    && git clone [email protected]:source.git \
    && cd source \
    && make \
    && make install

Now if something goes wrong (maybe some dependency is missing in your image that makes compilation fail) the user knows exactly which lines were run and what the output was for each of them. Otherwise the user just sees the output from the command but has no indication of what was actually happening.
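For the debug-mode use of xtrace in an ordinary script, one possible approach is to switch it on only when asked. This is a sketch under assumptions: `DEBUG` is a made-up environment variable name, not a convention that Bash knows about.

```shell
#!/usr/bin/env bash

set -o errexit -o nounset -o pipefail

# DEBUG is a made-up variable name for this sketch; any name would do.
# The `:-0` fallback keeps nounset happy when the variable is not provided.
if [[ "${DEBUG:-0}" == "1" ]]; then
  set -o xtrace
fi

echo "doing the real work"
```

Assuming the script is saved as `build.sh` (a hypothetical name), running `DEBUG=1 ./build.sh` would print the trace, while a plain `./build.sh` stays quiet.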

TL;DR

Always start your Bash scripts with something like this to make your life easier:

#!/usr/bin/env bash

set -o errexit -o nounset -o pipefail

Or if you’re feeling a bit extra (or just need extra verbosity):

#!/usr/bin/env bash

set -o errexit -o nounset -o pipefail -o xtrace

It’ll make your Bash scripts more reliable and reduce the weird behavior that Bash has had to maintain over the years.