Common shell script mistakes (2008) (pixelbeat.org)
81 points by gautamsomani on Dec 7, 2021 | 53 comments


Advice at a higher level: don’t try too hard to handle error cases in your scripts.

The best shell scripts are a list of imperative instructions to get something done. If an error occurs, bail as early as possible and with a useful message. Don’t try any harder than that.

For example:

  if ! complex_function
  then
    handle_error
  fi 
…doesn't work as you'd expect, because the "if" context changes the rules of set -e inside complex_function. It's much better to have complex_function crash your script with a helpful error message that the operator can act on.
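A minimal sketch of the gotcha (the function body and messages are illustrative):

  set -e
  complex_function() {
    false              # under set -e alone this would abort the script,
    echo "carried on"  # but in an "if" context errexit is suppressed, so this runs
  }                    # and the function returns 0 (the status of echo)
  if ! complex_function; then
    echo "never reached: the inner failure was masked"
  fi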

You may have a clever way of solving this problem, but the cleverer your solution gets, the further you diverge from the core language of shell scripting. You will suffer the Lisp/Ruby problem of every script being in its own unique language that's based on, but not identical to, the language your colleagues are familiar with.

One shouldn't dismiss shell scripts completely – the command line is the primary interface to Unix and with more and more idempotent commands showing up every year ("ip route replace") it only gets easier to write simple, imperative lists of commands.
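A couple of examples of the kind of idempotent command meant here (the address and path are illustrative); each is safe to run repeatedly and converges to the same end state:

  ip route replace 10.0.0.0/24 via 192.168.1.1  # adds the route, or updates it if present
  mkdir -p /srv/app                             # no error if the directory already exists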


This was one of the annoyances that I aimed to solve with my own shell, murex. It has predictable if blocks but also try blocks.

I used to be an advocate for `set -e` etc., but these days I've come to the conclusion that the POSIX standard just needs to be retired for simple shell scripts, even if just for personal use. It doesn't have to be my shell language that replaces it, but it shouldn't be something aiming for Bash/sh compatibility. This is probably an unpopular opinion, but I base it on years of Bash/sh use and then getting fed up with it enough to write my own shell + scripting language.

As for anything that needs to be used and maintained by a team, that's probably around the time you have to question whether shell script is even the right solution and whether you need to break out into Python (though Python has its own issues too, so one has to make an informed decision based on one's business requirements).


I agree. You must fail fast.

My best shell advice is to put these commands at the beginning of every script:

    set -e # stop scripts on errors
    set -u # stop script if undefined var
    set -o pipefail # stop script if there's a pipe failure 
Use them to have a saner life.
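Two quick sketches of what these catch (variable name illustrative; run each separately):

  set -u
  echo "$TYPO_VAR"   # aborts with "unbound variable" instead of expanding to ""

  set -o pipefail
  false | true
  echo $?            # prints 1; without pipefail the pipeline's status would be 0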


And you can compact them to

    set -euo pipefail


For most uses, I consider all of these unwanted at best and potential disasters at worst. If you use them, make sure you understand exactly what each of them does. If you just blindly follow a dictum you found on HN and copy-paste them, you're in for a world of hurt.

Example for set -e: https://mywiki.wooledge.org/BashFAQ/105
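One classic footgun of the kind that page documents, sketched here with bash arithmetic:

  set -e
  i=0
  ((i++))            # the expression evaluates to 0 (the old value), so (( )) returns status 1
  echo "never runs"  # set -e has already killed the script on the line above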


My favourite set of pragmas:

  set -o errexit -o noclobber -o nounset -o pipefail
  shopt -s failglob inherit_errexit
These are all easy to find in the Bash manual.
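For readers who haven't met the two shopt options, rough sketches of their effect (run separately; inherit_errexit needs bash 4.4+):

  shopt -s failglob
  ls *.nomatch           # errors out instead of passing the literal glob to ls

  set -e
  shopt -s inherit_errexit
  x=$(false; echo hi)    # the command substitution now inherits errexit,
                         # so the script stops at "false"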


Also, the fact that the shell if condition is backwards from other programming languages always bugs me.

In C we always write `if (func()) handle_error;` for the standard 0=success convention, and reading shell ifs is mentally straining.
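Side by side, with the C convention shown as a comment for contrast:

  # C:  if (func()) handle_error();   /* non-zero return means failure */
  # sh: exit status 0 is "true", so the then-branch is the success path
  if grep -q root /etc/passwd; then
    echo "grep succeeded (exit status 0)"
  fi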


What a nasty shell. You can't even error check like a man.


OK, not nasty. Terrible. Outdated. And I even like old things, but bash never wakes any nostalgia in me; it just wakes a monster.


This may be controversial, but sometimes I feel the biggest mistake is using a shell script in the first place. For example, on servers where you have PHP installed anyway, a PHP CLI script is often an alternative. Say what you will about PHP, but PHP code is much more readable than shell scripts, especially if everyone working on the server is fluent in PHP anyway. Shell scripts do have their niche, but often you find that the scope of a script keeps growing; soon it becomes cumbersome and makes you wish you had chosen another alternative from the start...


Once I get a script to the point of defining functions or anything but the simplest if or for loop, I rewrite it in a proper language (Perl, Python, even PHP) and bundle it as a deb in our internal repo.

Too many times I didn’t move from bash until too late.

I still write bash, I’ve got one which runs up a 20 line ffmpeg command with a couple of variables, that’s not a problem. On the other hand I changed the complexity of a file analysis tool from grep/sort/uniq/sed to Perl a couple of weeks ago before it became too large.


Not controversial at all!

Most Linux distributions come with Python installed. Anything more than invoking a binary and redirecting the output? Just write it in (pure) Python, I say!

Edit: by pure Python, I mean don’t require any `pip install`s


Python is terrible at juggling files, setting up pipelines and working with processes. The resulting code is more complicated than the equivalent bash, not to mention longer, and hidden gotchas abound (subprocess deadlocks anyone?). I'll take 50 lines of bash instead of 200+ lines of Python.
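For a sense of what's being compared, the kind of pipeline that stays a one-liner in bash (paths are illustrative) but needs explicit process and pipe wiring in a general-purpose language:

  grep -h ERROR /var/log/app/*.log | sort | uniq -c | sort -rn | head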


Welcome to (shameless plug) Next Generation Shell. It's a modern programming language for DevOps. It is both concise for running external programs (it uses a small subset of bash syntax) and a "real" programming language, with data structures and sane error handling among other things.

Compared to general purpose programming languages, it was built with "DevOps"y use cases in mind. See https://github.com/ngs-lang/ngs/wiki/Use-Cases

The ability to run external programs is high on the priority list. In NGS that means having its own short syntax and handling exit codes, among other things. NGS throws exceptions on "bad" exit codes. Hint: a non-zero exit code is not "bad" for every tool. I have not seen equivalent exit code handling anywhere else.

Compared to bash... it's not even a fair comparison. Another era, another reality, other expectations. Small example: when APIs return structured data, well... you had better be able to handle it. See https://ilya-sher.org/2018/09/10/jq-is-a-symptom/

Compared to other modern shells, I would say, NGS is programming language first as opposed to shell first.


This has only been true for the smallest shell scripts in my experience: bash will be less code at first, but once you need to be portable, handle errors, locking, Unicode, buffer or process output, perform non-trivial redirections, or use any data structure more complicated than a simple string, it's been pretty common to replace hundreds of lines of spaghetti bash with less Python, even after you add comments and a proper CLI interface.

I've never had subprocess deadlock — was this on Windows?


> I've never had subprocess deadlock — was this on Windows?

I've seen this happen seemingly randomly on Linux, trying to pipe the stdout of one subprocess into the stdin of another.


Interesting, I've never had that happen in heavy usage. I've had cases where the called processes blocked for some reason but that was solved by adding a timeout.


How does a subprocess deadlock happen in Python? Why would this be specific to Python, and not caused by deadlock in general?

I agree that 50 lines of bash is generally more maintainable than 200 lines of Python.


I once worked in an R shop where a lot of “system scripts” were written in R instead of shell.

It made sense: everyone could read R; not many could read bash/shell, as most had learned R on a Windows PC in grad school for statistics.

So while sometimes a bit clumsy and not very portable, these scripts could be read by anyone.


This is really huge. If a script is going to be used by anyone other than yourself, it needs to be readily accessible (i.e. understood) by the group that will be using it.


Agreed! If it's more than a few lines, I switch to something else, usually Python these days.

A couple of decades ago I even wrote some init scripts in Perl because they needed to be complicated.


Why not use Ansible?


Ansible didn't exist 20 years ago.


The question is "why not switch to Ansible instead of Python" rather than "why didn't OP use Ansible 20 years ago instead of Perl".

Seeing that Ansible is basically a set of Python libraries that enable users to write declarative (and hopefully idempotent) scripts, it isn't a stupid question.


Hmm, I can take a stab at that. I have extensively used both. Ansible is pretty good for "get a machine or set of machines into a particular state". Python is pretty good for "run these bits and give a nice error if something goes weird". The "error path" is where Ansible falls down for me; it can generate some very strange ones. Its YAML syntax can also create odd issues for people who have not used it before, and Ansible is terrible for any sort of data processing. It is very good at "iterate over a list and do this action to each item", though.

I recently did an audit script using Ansible, because I thought, hey, I have this cool tool to do it. It was a mistake. It probably would have taken a third of the time to do in Python, and if someone (probably me) has to extend that script, getting the YAML right will be tricky. On the other hand, I also recently used Ansible to write a bunch of upgrade scripts; that was a breeze and would have taken much longer in Python.

In my case, many times you have to write to the lowest common denominator. So bash it is, with some sort of central Ansible setup controller.


I think the YAML syntax is definitely one of the weakest points of the Ansible user experience, especially if you're creating complex playbooks/roles. It's extremely verbose, and when you try to do anything slightly more complex in the YAML syntax itself, e.g. handle an error from a module, it gets very hairy.

As a side note, Bash + Ansible tends not to be an amazing combination unless you can write idempotent Bash scripts, IMHO. Otherwise it sounds like a recipe for false safety, where you think you wrote strong automation, and someone who is used to pure Ansible will quickly realize that it does not work in specific conditions, e.g. on the second run, or when specific steps are run multiple times in a row.
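A sketch of the guard-style idioms that make a Bash step idempotent (the user name and setting are illustrative):

  # only act when the state isn't already as desired
  id appuser >/dev/null 2>&1 || useradd appuser
  grep -qx 'vm.swappiness=10' /etc/sysctl.conf \
    || echo 'vm.swappiness=10' >> /etc/sysctl.conf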


I agree about the bash+ansible bit. For me it is usually using ansible to push bash scripts to some remote machine. Those remote machines may or may not have python or ansible. So ... bash for those bits :(


Shell script is a kind of DSL for OS operations, and PHP is a kind of DSL for web programming. Similarly, it does not make sense to advocate the usage of R for web programming, but people do it anyway.

Shell scripts have their deficiencies, but they're the best fit for their intended environment. Alternatively, there's now the Oil shell, which looks like a promising modern approach to shell programming [1]. Expect to see exponential growth in shell programming usage now that we have popular new OS-related technology, including containers, isolates, and functional package management like Nix and Guix.

[1] https://www.oilshell.org/


Use whatever tool works best for you. I use node.js for "shell scripting" on my own systems because that's what I use anyway for other stuff.


The advantage I find with Python is that it's commonly part of the Linux systems I come across (e.g. Debian-based and RHEL-based Linux systems). I'd have to install the Node.js runtime. So despite preferring Node, I find myself writing more and more Python.


This advice is largely automated by https://github.com/koalaman/shellcheck


It felt nice when I discovered shellcheck. I made the mistake of trying to get my scripts 100% compliant... and was then astounded to find I had broken several of them.

If you don't know what you are doing, and shellcheck knows more, it might be useful. But if you know what you are doing, shellcheck becomes annoying quickly.

Example: echo "$(command)" is marked as SC2005, useless echo. But command does not always print a newline, so "fixing" this would garble your output under some conditions.


> Example: echo "$(command)" is marked as SC2005, useless echo. But command does not always print a newline, so "fixing" this would garble your output under some conditions.

It should still be `printf '%s\n' "$(command)"` for portability, unless the intention is for the output of your command to possibly be interpreted as a flag by echo.
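A sketch of the flag hazard being described:

  out=$(printf -- '-n')   # a command whose output happens to be "-n"
  echo "$out"             # bash's echo parses it as a flag and prints nothing
  printf '%s\n' "$out"    # always prints the literal "-n" plus a newline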


Amazing example of why robust shell scripts are hard.


One of my pet peeves with command line tools is when they take a list of values but also support flags. You then have no end of fun working with hyphen-prefixed file names, and no end of hard-to-find bugs too (at least most shell script writers are aware that spaces in file names are a pain, but these kinds of bugs are a lot more dangerous and easier to forget).

I can forgive a lot of design flaws because the tech was new and people didn't know any better, but I always struggle to apply that same reasoning to the design of commands that support both free text and flags that change the command's operation. I'm surprised nobody at the time paused and said "this might be a problem" when they were writing their tools and using only a hyphen to distinguish between a flag and data. Maybe I'm being a little unfair though -- it is hard to put yourself in their shoes and unlearn 50 years of tech.

My biggest peeve about it, though, is that I can write a shell that doesn't adhere to POSIX and which fixes a lot of the shortcomings of shell scripting, but I cannot fix this specific class of bugs without rewriting the entirety of coreutils.


Do you know about "--" ?
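For anyone unfamiliar: `--` marks the end of options, so everything after it is treated as data. For example:

  touch -- -rf   # create a file literally named "-rf"
  rm -- -rf      # remove it; without "--", rm would parse "-rf" as flags
  rm ./-rf       # alternative: prefix the path so it can't look like a flag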


I do but it's not a good enough solution:

- it is a GNUism so not available on all UNIX-like systems

- it is something you pro-actively have to remember to put in

- you cannot simply add `--` to every command, because those that do not support it would fail (or worse, not fail but misbehave in unexpected ways)

In an ideal world, operators would be out of band from data (murex does this with `config`, https://murex.rocks/docs/commands/config.html, whereas other commands might solve this with environment variables; neither is as convenient as flags, though).

In a less ideal world data should be encapsulated inside operators (see below), but this breaks globbing.

  command --flag1 --flag2 --data=file1 --data=file2 --data=file3
In my perfect world, arguments would be typed, so you could denote whether a parameter was an operator or data. However, this doesn't translate well to the terse input of strings (as one uses on the command line).

Maybe if `--` had been a standard from the beginning, acting as a separator between operators and data (and enforced too, i.e. commands would fail if `--` wasn't present), then I'd be more forgiving of `--` as a solution.


> It should still be `printf '%s\n' "$(command)"` for portability

Damn, that's ugly though. Not much better than Perl in terms of character soup. And Perl has a massive advantage with its deep integration of regexes.


Perl's character soup is because it's an evolution of sed and awk so Perl adopted many of the UNIX shell idioms.


You got me there! There are indeed values that start with a hyphen, somehow it didn't break previously.


Oh, yeah. I've broken multiple historic scripts in obscure ways "fixing" them under shellcheck's direction.

It's a general lesson for linters: if it points out code where the results may be surprising, then be aware that the "surprising" result may be necessary for the code's function.


Shellcheck is still incredibly handy to re-check your work for simple mistakes, but man do I ever end up using a lot of shellcheck ignore statements.


Very true, and I experienced something similar just yesterday.

I've begun piping such untrustworthy commands to xargs -n1 to guarantee good behavior.


Take note that this splits words into separate lines.


Bare “xargs” should work here, I would think?
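A quick sketch of the difference (xargs defaults to echo when no command is given):

  printf 'a b\nc\n' | xargs -n1   # three lines: a, b, c (one word per invocation)
  printf 'a b\nc\n' | xargs       # one line: a b c (all words, one invocation)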


As well, having a bash language server enabled (https://github.com/bash-lsp/bash-language-server) helps immensely.


I'm a big fan of shellcheck, but use it for your new scripts. Don't try to change working code that depends on shell idiosyncrasies.


The minute I need anything other than 4-5 commands executed sequentially, I switch to python. It's not worth dealing with the hassle of string substitution, error handling, list handling, filtering, etc in bash.
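The sort of bash string handling being referred to; terse, but cryptic to readers (the file name is illustrative):

  path="/tmp/some file.txt"
  base=${path##*/}        # basename via parameter expansion: "some file.txt"
  noext=${base%.txt}      # strip the extension: "some file"
  spaced=${noext// /_}    # replace spaces: "some_file"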


Nice tips, although the list doesn't look comprehensive. It's certainly hard to learn all the gotchas of shell scripting.

In Deployment from Scratch I teach the minimum amount of Bash needed to get servers up and running. I avoid teaching anything I don't have to. For example, most people are fine with just "set -euo pipefail" and an understanding of simple functions and pipes.

If your script is small enough, you will be fine. If your script is getting big, it might be time to switch to something else.


Shamelessly comparing to my own Next Generation Shell, and putting aside the completely insane syntax decisions that nobody today would make or accept (such as $x expanding to zero or more arguments):

> cleaning up temp files

Use TmpFile. It is automatically removed when the script exits.

> stopping automatically on error

As in many other modern languages, NGS has exceptions and prints stack trace when they are not handled. Nothing special is required from the programmer.

> echoing errors

Use the built-in error("my message") function. Alternatively, use exit("my message") to print the error and exit; the exit code defaults to 1 but can be provided. I'm tired of seeing bash scripts start with definitions of warn(), debug(), error(), etc. functions. These are part of the standard library in NGS.
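The bash boilerplate being referred to typically looks something like this sketch:

  warn()  { printf 'WARN: %s\n'  "$*" >&2; }
  error() { printf 'ERROR: %s\n' "$*" >&2; }
  die()   { error "$*"; exit 1; }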

You are welcome to check out the Next Generation Shell project.

https://github.com/ngs-lang/ngs


A very common one (which I've hit once or twice myself) is people assuming bash is the system shell (common on Linux, but far from ubiquitous; Debian, for example, uses dash) and using bashisms while leaving the hash-bang as #!/bin/sh.

I always go with #!/bin/bash, as I often use bashisms, and only #!/bin/sh when I know I've been careful (either because from the start I knew I was writing something intended to be as portable as practical, or because I've gone through it with a fine-tooth comb to test compatibility later).


Or its superior(?) friend "#!/usr/bin/env bash", which will select the bash the user has on their PATH. This is especially considerate to Homebrew/Linuxbrew folks, who have a modern version in $HOMEBREW_PREFIX/bin.

I believe there are some esoteric systems that relocate "env" to "/bin/env" or such, but the good ole "90/10" rule applies here.


I used to use `#!/usr/bin/env bash` for this reason (Linux / macOS + Homebrew user) but have been bitten a few times by bash version differences when I write a script that is later called by a system utility like launchd, which doesn't inherit my PATH and so goes with /bin/bash instead.

Because the backwards compatibility tends to be good, if I'm writing a script that will be run non-interactively, I will usually write a fixed `#!/bin/bash` shebang just so I can be sure it runs with the expected bash version on Linux or macOS.



