2.3 Option Parsing: getopt() and getopt_long()
Circa 1980, for System III, the Unix Support Group within AT&T noted that each Unix program used ad hoc techniques for parsing arguments. To make things easier for users and developers, they developed most of the conventions we listed earlier. (The statement in the System III intro(1) manpage is considerably less formal than what's in the POSIX standard, though.)
The Unix Support Group also developed the getopt() function, along with several external variables, to make it easy to write code that follows the standard conventions. The GNU getopt_long() function supplies a compatible version of getopt(), as well as making it easy to parse long options of the form described earlier.
2.3.1 Single-Letter Options
The getopt() function is declared as follows:
    #include <unistd.h>                                   /* POSIX */

    int getopt(int argc, char *const argv[], const char *optstring);

    extern char *optarg;
    extern int optind, opterr, optopt;
The arguments argc and argv are normally passed straight from those of main(). optstring is a string of option letters. If any letter in the string is followed by a colon, then that option is expected to have an argument.
To use getopt(), call it repeatedly from a while loop until it returns -1. Each time it finds a valid option letter, it returns that letter. If the option takes an argument, optarg is set to point to it. Consider a program that accepts a -a option that doesn't take an argument and a -b option that does:
    int oc;             /* option character */
    char *b_opt_arg;

    while ((oc = getopt(argc, argv, "ab:")) != -1) {
        switch (oc) {
        case 'a':
            /* handle -a, set a flag, whatever */
            break;
        case 'b':
            /* handle -b, get arg value from optarg */
            b_opt_arg = optarg;
            break;
        case ':':
            ...         /* error handling, see text */
        case '?':
        default:
            ...         /* error handling, see text */
        }
    }
As it works, getopt() uses several external variables, both to report what it found and to control error handling.
char *optarg
-
The argument for an option, if the option accepts one.
int optind
-
The current index in argv. When the while loop has finished, remaining operands are found in argv[optind] through argv[argc-1]. (Remember that 'argv[argc] == NULL'.)
int opterr
-
When this variable is nonzero (which it is by default), getopt() prints its own error messages for invalid options and for missing option arguments.
int optopt
-
When an invalid option character is found, or an option's argument is missing, getopt() returns either a '?' or a ':' (see below), and optopt contains the option character that caused the problem.
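To make the role of optind concrete, here is a minimal sketch that parses the same -a and -b options and then walks the remaining operands. The struct basic_opts type and the functions parse_basic() and print_operands() are hypothetical names invented for this illustration; the feature-test macro is only there so that strict compiler modes still see the POSIX declarations.

```c
#define _POSIX_C_SOURCE 200809L   /* ask for the POSIX declarations */
#include <stdio.h>
#include <unistd.h>

/* Hypothetical container for the results; the names are illustrative. */
struct basic_opts {
    int do_all;           /* set by -a */
    const char *b_arg;    /* argument given to -b, or NULL */
    int first_operand;    /* value of optind after the loop */
    int errors;           /* number of '?' returns */
};

/* Parse -a and -b ARG, then record where the operands begin. */
void parse_basic(int argc, char *argv[], struct basic_opts *o)
{
    int oc;

    o->do_all = 0;
    o->b_arg = NULL;
    o->errors = 0;
    optind = 1;           /* start at argv[1]; matters only on repeat calls */

    while ((oc = getopt(argc, argv, "ab:")) != -1) {
        switch (oc) {
        case 'a':
            o->do_all = 1;
            break;
        case 'b':
            o->b_arg = optarg;
            break;
        default:          /* '?': getopt() has already printed a message */
            o->errors++;
            break;
        }
    }
    o->first_operand = optind;
}

/* Walk argv[first_operand] through argv[argc-1]; return how many there were. */
int print_operands(int argc, char *argv[], int first_operand)
{
    int i, count = 0;

    for (i = first_operand; i < argc; i++)
        printf("operand %d: %s\n", ++count, argv[i]);
    return count;
}
```

From main(), one would call parse_basic(argc, argv, &opts) and then print_operands(argc, argv, opts.first_operand). Saving optind right after the loop is a design choice: it keeps the operand position from being disturbed if getopt() is later called on another argument vector.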
People being human, it is inevitable that programs will be invoked incorrectly, either with an invalid option or with a missing option argument. In the normal case, getopt() prints its own messages for these cases and returns the '?' character. However, you can change its behavior in two ways.
First, by setting opterr to 0 before invoking getopt(), you can force getopt() to remain silent when it finds a problem.
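As a rough sketch of this first approach, the following function clears opterr and prints its own diagnostic whenever getopt() returns '?'. The function name scan_quietly() and the wording of the message are assumptions made for this example. Because optstring has no leading colon here, an invalid option and a missing argument both arrive as '?'.

```c
#define _POSIX_C_SOURCE 200809L   /* ask for the POSIX declarations */
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: scan for -a and -b ARG with getopt()'s own
 * messages turned off, print our own, and return the problem count.
 */
int scan_quietly(int argc, char *argv[])
{
    int oc, problems = 0;

    opterr = 0;           /* silence getopt()'s built-in messages */
    optind = 1;           /* start at argv[1]; matters only on repeat calls */

    while ((oc = getopt(argc, argv, "ab:")) != -1) {
        if (oc == '?') {
            /* optopt holds the letter that caused the trouble */
            fprintf(stderr, "%s: cannot use option '-%c'\n", argv[0], optopt);
            problems++;
        }
    }
    return problems;
}
```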
Second, if the first character in the optstring argument is a colon, then getopt() is silent and it returns a different character depending upon the error, as follows:
Invalid option
-
getopt() returns a '?' and optopt contains the invalid option character. (This is the normal behavior.)
Missing option argument
-
getopt() returns a ':'. If the first character of optstring is not a colon, then getopt() returns a '?', making this case indistinguishable from the invalid option case.
Thus, making the first character of optstring a colon is a good idea since it allows you to distinguish between "invalid option" and "missing option argument." The cost is that using the colon also silences getopt(), forcing you to supply your own error messages. Here is the previous example, this time with error message handling:
    int oc;             /* option character */
    char *b_opt_arg;

    while ((oc = getopt(argc, argv, ":ab:")) != -1) {
        switch (oc) {
        case 'a':
            /* handle -a, set a flag, whatever */
            break;
        case 'b':
            /* handle -b, get arg value from optarg */
            b_opt_arg = optarg;
            break;
        case ':':
            /* missing option argument */
            fprintf(stderr, "%s: option '-%c' requires an argument\n",
                    argv[0], optopt);
            break;
        case '?':
        default:
            /* invalid option */
            fprintf(stderr, "%s: option '-%c' is invalid: ignored\n",
                    argv[0], optopt);
            break;
        }
    }
A word about flag or option variable-naming conventions: Much Unix code uses names of the form xflg for any given option letter x (for example, nflg in the V7 echo; xflag is also common). This may be great for the program's author, who happens to know what the x option does without having to check the documentation. But it's unkind to someone else trying to read the code who doesn't know the meaning of all the option letters by heart. It is much better to use names that convey the option's meaning, such as no_newline for echo's -n option.
2.3.2 GNU getopt() and Option Ordering
The standard getopt() function stops looking for options as soon as it encounters a command-line argument that doesn't start with a '-'. GNU getopt() is different: It scans the entire command line looking for options. As it goes along, it permutes (rearranges) the elements of argv, so that when it's done, all the options have been moved to the front and code that proceeds to examine argv[optind] through argv[argc-1] works correctly. In all cases, the special argument '--' terminates option scanning.
You can change the default behavior by using a special first character in optstring, as follows:
optstring[0] == '+'
-
GNU getopt() behaves like standard getopt(); it returns options in the order in which they are found, stopping at the first nonoption argument. This will also be true if POSIXLY_CORRECT exists in the environment.
optstring[0] == '-'
-
GNU getopt() returns every command-line argument in order, whether or not it represents an option. For each nonoption argument, the function returns the integer 1 and sets optarg to point to the string.
As for standard getopt(), if the first character of optstring is a ':', then GNU getopt() distinguishes between "invalid option" and "missing option argument" by returning '?' or ':', respectively. The ':' in optstring can be the second character if the first character is '+' or '-'.
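The return-in-order mode selected by a leading '-' can be sketched as follows. This assumes a GNU-compatible getopt(), such as the one in GLIBC; the function trace_in_order() and its one-character-per-result trace buffer are inventions for this illustration. Resetting optind to 0 rather than 1 is the GLIBC convention for forcing a full reinitialization, which is needed for the leading '-' to be noticed when rescanning.

```c
#include <getopt.h>   /* GNU getopt(); return-in-order is a GNU extension */
#include <stdio.h>

/*
 * Hypothetical helper: scan with a leading '-' in optstring and record
 * one character per getopt() return in trace[]: the option letter,
 * 'o' for a plain operand (return value 1), or '?' for an error.
 * Returns the number of entries recorded.
 */
int trace_in_order(int argc, char *argv[], char *trace, int max)
{
    int oc, n = 0;

    optind = 0;           /* GLIBC convention: force reinitialization */

    while (n < max && (oc = getopt(argc, argv, "-ab:")) != -1) {
        switch (oc) {
        case 1:           /* plain argument; optarg points at it */
            printf("operand: %s\n", optarg);
            trace[n++] = 'o';
            break;
        case 'a':
        case 'b':
            trace[n++] = (char) oc;
            break;
        default:          /* '?': invalid option or missing argument */
            trace[n++] = '?';
            break;
        }
    }
    return n;
}
```

In this mode argv is left in its original order, since every element is handed back as it is encountered.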
Finally, if an option letter in optstring is followed by two colon characters, then that option is allowed to have an optional option argument. (Say that three times fast!) Such an argument is deemed to be present if it's in the same argv element as the option, and absent otherwise. In the case that it's absent, GNU getopt() returns the option letter and sets optarg to NULL. For example, given
    while ((c = getopt(argc, argv, "ab::")) != -1) ...
for -bYANKEES, the return value is 'b', and optarg points to "YANKEES", while for -b or '-b YANKEES', the return value is still 'b' but optarg is set to NULL. In the latter case, "YANKEES" is a separate command-line argument.
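The difference between the attached and the separate forms can be checked with a small sketch like the one below. It assumes a getopt() that supports the GNU "::" extension; struct opt_arg_result and parse_optional() are hypothetical names used only here.

```c
#include <getopt.h>   /* GNU getopt(); "b::" is a GNU extension */
#include <stddef.h>

/* Hypothetical result record for this illustration. */
struct opt_arg_result {
    int b_seen;           /* nonzero if -b appeared */
    const char *b_arg;    /* attached argument, or NULL if none */
    int first_operand;    /* value of optind after the loop */
};

/* Scan for -a and -b[ARG], where the argument to -b is optional. */
void parse_optional(int argc, char *argv[], struct opt_arg_result *r)
{
    int c;

    r->b_seen = 0;
    r->b_arg = NULL;
    optind = 1;           /* start at argv[1]; matters only on repeat calls */

    while ((c = getopt(argc, argv, "ab::")) != -1) {
        if (c == 'b') {
            r->b_seen = 1;
            r->b_arg = optarg;    /* NULL unless written as -bARG */
        }
    }
    r->first_operand = optind;
}
```

With '-b YANKEES', b_arg stays NULL and "YANKEES" is left at argv[first_operand] as an ordinary operand, so a program that wants a default value has to supply it itself.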
2.3.3 Long Options
The getopt_long() function handles the parsing of long options of the form described earlier. An additional routine, getopt_long_only(), works identically, but it is used for programs where all options are long and options begin with a single '-' character. Otherwise, both work just like the simpler GNU getopt() function. (For brevity, whenever we say "getopt_long()," it's as if we'd said "getopt_long() and getopt_long_only().") Here are the declarations, from the GNU/Linux getopt(3) manpage:
    #include <getopt.h>                                   /* GLIBC */

    int getopt_long(int argc, char *const argv[],
                    const char *optstring,
                    const struct option *longopts, int *longindex);

    int getopt_long_only(int argc, char *const argv[],
                         const char *optstring,
                         const struct option *longopts, int *longindex);
The first three arguments are the same as for getopt(). The next argument is a pointer to an array of struct option, which we refer to as the long options table and which is described shortly. The longindex parameter, if not set to NULL, points to a variable which is filled in with the index in longopts of the long option that was found. This is useful for error diagnostics, for example.
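As a sketch of how longindex might be used in diagnostics, the function below reports whether each option was given in its long or its short form. The table demo_longopts and the function last_long_index() are made up for this example. One detail worth noting: the GLIBC implementation stores into the variable only when a long option matches, so the sketch resets it before every call.

```c
#include <getopt.h>
#include <stdio.h>

/* Illustrative table; the layout of struct option is described in the text. */
static const struct option demo_longopts[] = {
    { "file",    required_argument, NULL, 'f' },
    { "verbose", no_argument,       NULL, 'v' },
    { 0, 0, 0, 0 }
};

/*
 * Hypothetical helper: report how each option was spelled and return
 * the table index of the last long option seen, or -1 if there was none.
 */
int last_long_index(int argc, char *argv[])
{
    int c, idx, last = -1;

    optind = 1;           /* start at argv[1]; matters only on repeat calls */
    for (;;) {
        idx = -1;         /* left untouched when a short option is found */
        c = getopt_long(argc, argv, "f:v", demo_longopts, &idx);
        if (c == -1)
            break;
        if (c == '?')
            continue;     /* getopt_long() has already complained */
        if (idx >= 0) {
            printf("--%s (short form -%c)\n", demo_longopts[idx].name, c);
            last = idx;
        } else {
            printf("-%c\n", c);
        }
    }
    return last;
}
```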
2.3.3.1 Long Options Table
Long options are described with an array of struct option structures. The struct option is declared in <getopt.h>; it looks like this:
    struct option {
        const char *name;
        int has_arg;
        int *flag;
        int val;
    };
The elements in the structure are as follows:
const char *name
-
This is the name of the option, without any leading dashes, for example, "help" or "verbose".
int has_arg
-
This describes whether the long option has an argument, and if so, what kind of argument. The value must be one of those presented in Table 2.1.
Table 2.1. Values for has_arg
Symbolic constant | Numeric value | Meaning |
---|---|---|
no_argument | 0 | The option does not take an argument. |
required_argument | 1 | The option requires an argument. |
optional_argument | 2 | The option's argument is optional. |
The symbolic constants are macros for the numeric values given in the table. While the numeric values work, the symbolic constants are considerably easier to read, and you should use them instead of the corresponding numbers in any code that you write.
int *flag
-
If this pointer is NULL, then getopt_long() returns the value in the val field of the structure. If it's not NULL, the variable it points to is filled in with the value in val and getopt_long() returns 0. If the flag isn't NULL but the long option is never seen, then the pointed-to variable is not changed.
int val
-
This is the value to return if the long option is seen or to load into *flag if flag is not NULL. Typically, if flag is not NULL, then val is a true/false value, such as 1 or 0. On the other hand, if flag is NULL, then val is usually a character constant. If the long option corresponds to a short one, the character constant should be the same one that appears in the optstring argument for this option. (All of this will become clearer shortly when we see some examples.)
Each long option has a single entry with the values appropriately filled in. The last element in the array should have zeros for all the values. The array need not be sorted; getopt_long() does a linear search. However, sorting it by long name may make it easier for a programmer to read.
The use of flag and val seems confusing at first encounter. Let's step back for a moment and examine why it works the way it does. Most of the time, option processing consists of setting different flag variables when different option letters are seen, like so:
    while ((c = getopt(argc, argv, ":af:hv")) != -1) {
        switch (c) {
        case 'a':
            do_all = 1;
            break;
        case 'f':
            myfile = optarg;
            break;
        case 'h':
            do_help = 1;
            break;
        case 'v':
            do_verbose = 1;
            break;
        ...             /* error handling code here */
        }
    }
When flag is not NULL, getopt_long() sets the variable for you. This reduces the three cases in the previous switch to one case. Here is an example long options table and the code to go with it:
    int do_all, do_help, do_verbose;    /* flag variables */
    char *myfile;

    struct option longopts[] = {
        { "all",     no_argument,       & do_all,     1   },
        { "file",    required_argument, NULL,         'f' },
        { "help",    no_argument,       & do_help,    1   },
        { "verbose", no_argument,       & do_verbose, 1   },
        { 0, 0, 0, 0 }
    };
    ...
    while ((c = getopt_long(argc, argv, ":f:", longopts, NULL)) != -1) {
        switch (c) {
        case 'f':
            myfile = optarg;
            break;
        case 0:
            /* getopt_long() set a variable, just keep going */
            break;
        ...             /* error handling code here */
        }
    }
Notice that the value passed for the optstring argument no longer contains 'a', 'h', or 'v'. This means that the corresponding short options are not accepted. To allow both long and short options, you would have to put those letters back into optstring and restore the corresponding cases from the first example to the switch.
Practically speaking, you should write your programs such that each short option also has a corresponding long option. In this case, it's easiest to have flag be NULL and val be the corresponding single letter.
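One possible shape for this arrangement is sketched below: every table entry has a NULL flag and the short option's letter as val, so a single case in the switch serves both spellings. The struct prog_opts type and the function parse_both() are hypothetical names for this illustration, and the error messages are deliberately generic.

```c
#include <getopt.h>
#include <stdio.h>

/* Hypothetical settings record; field names follow the text's examples. */
struct prog_opts {
    int do_all, do_help, do_verbose;
    const char *myfile;
    int errors;
};

/* Parse short and long spellings alike; return the index of the first operand. */
int parse_both(int argc, char *argv[], struct prog_opts *o)
{
    static const struct option longopts[] = {
        { "all",     no_argument,       NULL, 'a' },
        { "file",    required_argument, NULL, 'f' },
        { "help",    no_argument,       NULL, 'h' },
        { "verbose", no_argument,       NULL, 'v' },
        { 0, 0, 0, 0 }
    };
    int c;

    o->do_all = o->do_help = o->do_verbose = 0;
    o->myfile = NULL;
    o->errors = 0;
    optind = 1;           /* start at argv[1]; matters only on repeat calls */

    while ((c = getopt_long(argc, argv, ":af:hv", longopts, NULL)) != -1) {
        switch (c) {
        case 'a':         /* -a or --all */
            o->do_all = 1;
            break;
        case 'f':         /* -f FILE or --file=FILE */
            o->myfile = optarg;
            break;
        case 'h':         /* -h or --help */
            o->do_help = 1;
            break;
        case 'v':         /* -v or --verbose */
            o->do_verbose = 1;
            break;
        case ':':         /* missing option argument */
            fprintf(stderr, "%s: option requires an argument\n", argv[0]);
            o->errors++;
            break;
        case '?':
        default:          /* invalid option */
            fprintf(stderr, "%s: invalid option: ignored\n", argv[0]);
            o->errors++;
            break;
        }
    }
    return optind;
}
```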
2.3.3.2 Long Options, POSIX Style
The POSIX standard reserves the -W option for vendor-specific features. Thus, by definition, -W isn't portable across different systems.
If W appears in the optstring argument followed by a semicolon (note: not a colon), then getopt_long() treats -Wlongopt the same as --longopt. Thus, in the previous example, change the call to be:
while ((c = getopt_long(argc, argv, ":f:W;", longopts, NULL)) != -1) {
With this change, -Wall is the same as --all and -Wfile=myfile is the same as --file=myfile. The use of a semicolon makes it possible for a program to use -W as a regular option, if desired. (For example, GCC uses it as a regular option, whereas gawk uses it for POSIX conformance.)
2.3.3.3 getopt_long() Return Value Summary
As should be clear by now, getopt_long() provides a flexible mechanism for option parsing. Table 2.2 summarizes the possible return values and their meaning.
Table 2.2. getopt_long() return values
Return code | Meaning |
---|---|
0 | getopt_long() set a flag as found in the long option table. |
1 | optarg points at a plain command-line argument. |
'?' | Invalid option. |
':' | Missing option argument. |
'x' | Option character 'x'. |
-1 | End of options. |
Finally, we enhance the previous example code, showing the full switch statement:
    int do_all, do_help, do_verbose;    /* flag variables */
    char *myfile, *user;                /* input file, user name */

    struct option longopts[] = {
        { "all",     no_argument,       & do_all,     1   },
        { "file",    required_argument, NULL,         'f' },
        { "help",    no_argument,       & do_help,    1   },
        { "verbose", no_argument,       & do_verbose, 1   },
        { "user",    optional_argument, NULL,         'u' },
        { 0, 0, 0, 0 }
    };
    ...
    while ((c = getopt_long(argc, argv, ":ahvf:u::W;", longopts, NULL)) != -1) {
        switch (c) {
        case 'a':
            do_all = 1;
            break;
        case 'f':
            myfile = optarg;
            break;
        case 'h':
            do_help = 1;
            break;
        case 'u':
            if (optarg != NULL)
                user = optarg;
            else
                user = "root";
            break;
        case 'v':
            do_verbose = 1;
            break;
        case 0:
            /* getopt_long() set a variable, just keep going */
            break;
    #if 0
        case 1:
            /*
             * Use this case if getopt_long() should go through all
             * arguments. If so, add a leading '-' character to optstring.
             * Actual code, if any, goes here.
             */
            break;
    #endif
        case ':':
            /* missing option argument */
            fprintf(stderr, "%s: option `-%c' requires an argument\n",
                    argv[0], optopt);
            break;
        case '?':
        default:
            /* invalid option */
            fprintf(stderr, "%s: option `-%c' is invalid: ignored\n",
                    argv[0], optopt);
            break;
        }
    }
In your programs, you may wish to have comments for each option letter explaining what each one does. However, if you've used descriptive variable names for each option letter, comments are not as necessary. (Compare do_verbose to vflg.)
2.3.3.4 GNU getopt() or getopt_long() in User Programs
You may wish to use GNU getopt() or getopt_long() in your own programs and have them run on non-Linux systems. That's OK; just copy the source files from a GNU program or from the GNU C Library (GLIBC) CVS archive.3 The source files are getopt.h, getopt.c, and getopt1.c. They are licensed under the GNU Lesser General Public License, which allows library functions to be included even in proprietary programs. You should include a copy of the file COPYING.LIB with your program, along with the files getopt.h, getopt.c, and getopt1.c.
Include the source files in your distribution, and compile them along with any other source files. In your source code that calls getopt_long(), use '#include <getopt.h>', not '#include "getopt.h"'. Then, when compiling, add -I. to the C compiler's command line. That way, the local copy of the header file will be found first.
You may be wondering, "Gee, I already use GNU/Linux. Why should I include getopt_long() in my executable, making it bigger, if the routine is already in the C library?" That's a good question. However, there's nothing to worry about. The source code is set up so that if it's compiled on a system that uses GLIBC, the compiled files will not contain any code! Here's the proof, on our system:
    $ uname -a                                   Show system name and type
    Linux example 2.4.18-14 #1 Wed Sep 4 13:35:50 EDT 2002 i686 i686 i386 GNU/Linux
    $ ls -l getopt.o getopt1.o                   Show file sizes
    -rw-r--r--    1 arnold   devel        9836 Mar 24 13:55 getopt.o
    -rw-r--r--    1 arnold   devel       10324 Mar 24 13:55 getopt1.o
    $ size getopt.o getopt1.o                    Show sizes included in executable
       text    data     bss     dec     hex filename
          0       0       0       0       0 getopt.o
          0       0       0       0       0 getopt1.o
The size command prints the sizes of the various parts of a binary object or executable file. We explain the output in Section 3.1, "Linux/Unix Address Space," page 52. What's important to understand right now is that, despite the nonzero sizes of the files themselves, they don't contribute anything to the final executable. (We think this is pretty neat.)