I Don't Like Getopt

Command line flags are a user interface


tl;dr: don’t use getopt, use a less ambiguous library for your users instead.

This is a short rant, inspired by my exposure to many programs over time and the ways in which they handle flags. The whole thing rests on the premise that the command line invocation of a program is a user interface, and should have the same affordances that GUI programs do: a lack of ambiguity, a clear purpose, and adherence to the principle of least surprise.

So why don’t you like getopt?

It’s ambiguous for users.

Not for the users of the library, but for the users of the program. They have to read shell history or scripts, or refactor a system(3) call into an exec*(3) call, and they can’t easily determine what the strings being passed to it should be, or what they do.

Well, shouldn’t they go read the docs and find out?

I won’t argue against reading documentation, but is that really the best experience for your users? How much do they have to read to determine what -iEoe does?

What if, collectively, we agreed that -iEoe could only be interpreted by a program in one way, instead of leaving it implementation dependent? Wouldn’t that lead to a better user experience for consumers of command line programs?

Let’s really think for a moment about what -iEoe could mean.

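It could be equivalent to any of the following, depending on which of those flags the program has declared as taking a parameter:

  1. -i -E -o -e
  2. -i Eoe
  3. -i -E oe
  4. -i -E -o e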

That’s a lot of possible interpretations!

This is because getopt has no opinion about whether short flags take parameters or not. I disagree with this choice very strongly! Short flags should be reserved for setting boolean flags only!
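The ambiguity is easy to reproduce with the getopt module from Python’s standard library, which follows the same conventions as the C function. This is only a small sketch: the helper name interpretations is made up for illustration, and the optstrings are hypothetical programs that all accept the same four letters but disagree about which one takes a parameter (a trailing colon marks that flag).

```python
import getopt


def interpretations(argv, optstrings):
    """Parse the same argv under several optstrings and collect the results."""
    results = {}
    for optstring in optstrings:
        try:
            opts, _positional = getopt.getopt(argv, optstring)
            results[optstring] = opts
        except getopt.GetoptError as err:
            # e.g. the last letter wants a parameter and none is left.
            results[optstring] = f"error: {err}"
    return results


# Five hypothetical programs, one command line.
for optstring, result in interpretations(
    ["-iEoe"], ["iEoe", "i:Eoe", "iE:oe", "iEo:e", "iEoe:"]
).items():
    print(f"{optstring:6} {result}")
```

Every optstring gives a different answer for the same string, and a reader of the command line has no way of telling which one applies without the program’s documentation or source.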

This would mean that -iEoe is always a set of flags, and the flags can be moved around without changing the meaning. For each of those flags, you can also unambiguously look them up in the manual page or --help output, and be sure that you’ve found the correct things.

Doesn’t that break -vvv?

That seems like an allowable exception to this policy.

Hey wait, you didn’t talk about ambiguous long flag names and their parameters! What does --first --second third mean?

In my experience we don’t have a lot of flags with opaque long names, and they’re often named such that we can infer if they might take a parameter or not. And even when they’re a problem it’s not like you can concatenate the names together and have the same sort of confusion you get with single character names.

Wait but what about -e=thing, how do you feel about that?

That’s a great question. Instead of answering directly, I’m going to lay out the rules I’d expect from a flag parsing library (and have implemented twice).

  1. Disambiguate between short and long flags:
    1. Short flags always start with a single dash -, while long flags always start with a double dash --.
  2. Disambiguate between Boolean vs other value flags:
    1. Boolean flags never take parameters.
    2. To set a boolean flag it is specified on the command line. eg: --thing or using the short form -t. These always set the value to true.
    3. To invert a boolean flag, use the long name prefixed by no-. eg: --thing becomes --no-thing. This always sets the value to false.
    4. No flag can register a long name that starts with no-; the prefix is reserved.
    5. A group of short flags (-ABC) is always interpreted as a set of boolean flags, equivalent to -A -B -C. This is true for one or more flags.
    6. A short flag that specifies a non-boolean type must always have an = to assign a value, otherwise the command line fails to parse. eg: -A=....
    7. Long flags can either have an attached value, or consume the next parameter. eg: --thing=abc or --thing abc.
  3. Reducing ambiguity with Positional values:
    1. Users are not allowed to intermingle positional arguments and flags. This keeps non-flag items between flags unambiguously parameters. eg: --thing abc -ABC is always equivalent to --thing=abc -ABC.
    2. Programs must opt in to positional arguments to reduce the ambiguity of dangling parameters. eg: --thing abc.
    3. After a single -- parameter, all remaining values are positional.
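To make the rules concrete, here is a minimal sketch of a parser that enforces them. It is not the code from either of my libraries: the names parse, FlagError and SPEC, and the shape of the spec (long name mapped to a short letter and an is-boolean marker) are all assumptions made up for this example. It reads "no intermingling" as "flags first, then positionals", and it does not implement the counting exception for -vvv.

```python
class FlagError(ValueError):
    """Raised when a command line breaks one of the rules."""


def parse(argv, spec, positional=False):
    """Parse argv. spec maps long name -> (short letter or None, is_boolean)."""
    for name in spec:
        if name.startswith("no-"):
            raise FlagError(f"--{name}: the no- prefix is reserved")
    short = {letter: name for name, (letter, _) in spec.items() if letter}
    values, rest = {}, []
    args = list(argv)
    while args:
        arg = args.pop(0)
        if arg == "--":
            # After a bare --, every remaining value is positional.
            rest += args
            break
        if arg == "-" or not arg.startswith("-"):
            rest.append(arg)
            continue
        if rest:
            raise FlagError(f"{arg}: flags may not follow positional arguments")
        is_long = arg.startswith("--")
        body, eq, value = arg[2 if is_long else 1:].partition("=")
        if is_long:
            negated = body.startswith("no-")
            name = body[3:] if negated else body
            if name not in spec:
                raise FlagError(f"{arg}: unknown flag")
            if spec[name][1]:
                if eq:
                    raise FlagError(f"{arg}: boolean flags never take a value")
                values[name] = not negated
            elif negated:
                raise FlagError(f"{arg}: only boolean flags can be inverted")
            else:
                if not eq:
                    # --thing abc: consume the next parameter as the value.
                    if not args:
                        raise FlagError(f"{arg}: missing value")
                    value = args.pop(0)
                values[name] = value
        elif eq:
            # -A=value: exactly one short flag, and it must be non-boolean.
            name = short.get(body)
            if name is None or spec[name][1]:
                raise FlagError(f"{arg}: not a single value-taking short flag")
            values[name] = value
        else:
            # -ABC: always a set of booleans, equivalent to -A -B -C.
            for letter in body:
                name = short.get(letter)
                if name is None:
                    raise FlagError(f"-{letter}: unknown flag")
                if not spec[name][1]:
                    raise FlagError(f"-{letter}: needs a value, write -{letter}=...")
                values[name] = True
    if rest and not positional:
        raise FlagError(f"unexpected positional arguments: {rest}")
    return values, rest


# A hypothetical program with three boolean flags and one value flag.
SPEC = {
    "all": ("A", True),
    "bare": ("B", True),
    "color": ("C", True),
    "thing": ("t", False),
}
print(parse(["--thing", "abc", "-ABC"], SPEC))
print(parse(["--thing=abc", "-ABC"], SPEC))
```

Both calls give the same result, which is the point of rule 3.1. Something like -t abc is rejected outright instead of being guessed at, because a short non-boolean flag has to be written as -t=abc.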

I have some more notes, but these are suggestions around the handling of lists and maps in arguments to ensure that you can nest them and still parse easily. Otherwise you can’t have a list as a map value.

  1. Lists must use the , to separate values, eg: 1,2,3
  2. Maps must use ; to separate pairs of values, eg: abc=1,2,3;def=4
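Because the two separators differ, a map value can hold a list and still be split apart in two simple passes. A small sketch, assuming every map value is itself parsed as a list and that no escaping of , ; or = is needed (the function names parse_list and parse_map are made up for this example):

```python
def parse_list(text):
    """Split a list value on commas: '1,2,3' -> ['1', '2', '3']."""
    return text.split(",") if text else []


def parse_map(text):
    """Split a map value on ';' into pairs, then each pair on the first '='."""
    result = {}
    if not text:
        return result
    for pair in text.split(";"):
        key, eq, value = pair.partition("=")
        if not eq or not key:
            raise ValueError(f"{pair!r}: expected key=value")
        # Assumption: every value is treated as a list, even a single item.
        result[key] = parse_list(value)
    return result


print(parse_map("abc=1,2,3;def=4"))
```

If lists and maps shared one separator, abc=1,2,3 could not be told apart from a map with three entries.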

Does any library do this?

I’ve written two! One in Python, the other in Rust. I’ve made the source code of neither available. I’m not aware of other libraries that follow these rules otherwise, but have seen libraries that come close.

I think the builtin flag package from Go comes reasonably close: it solves many of the above problems by not allowing short flags. I’ve not looked at enough other libraries to be certain whether they will by default be this strict about their flag parsing behaviour.

