1.2 Command-Line Arguments
Most programs process some input to produce some output; that’s pretty much the definition of computing. But how does a program get input data on which to operate? Some programs generate their own data, but more often, input comes from an external source: a file, a network connection, the output of another program, a user at a keyboard, command-line arguments, or the like. The next few examples will discuss some of these alternatives, starting with command-line arguments.
The os package provides functions and other values for dealing with the operating system in a platform-independent fashion. Command-line arguments are available to a program in a variable named Args that is part of the os package; thus its name anywhere outside the os package is os.Args.
The variable os.Args is a slice of strings. Slices are a fundamental notion in Go, and we’ll talk a lot more about them soon. For now, think of a slice as a dynamically sized sequence s of array elements where individual elements can be accessed as s[i] and a contiguous subsequence as s[m:n]. The number of elements is given by len(s). As in most other programming languages, all indexing in Go uses half-open intervals that include the first index but exclude the last, because it simplifies logic. For example, the slice s[m:n], where 0 ≤ m ≤ n ≤ len(s), contains n-m elements.
The first element of os.Args, os.Args[0], is the name of the command itself; the other elements are the arguments that were presented to the program when it started execution. A slice expression of the form s[m:n] yields a slice that refers to elements m through n-1, so the elements we need for our next example are those in the slice os.Args[1:len(os.Args)]. If m or n is omitted, it defaults to 0 or len(s) respectively, so we can abbreviate the desired slice as os.Args[1:].
Here’s an implementation of the Unix echo command, which prints its command-line arguments on a single line. It imports two packages, which are given as a parenthesized list rather than as individual import declarations. Either form is legal, but conventionally the list form is used. The order of imports doesn’t matter; the gofmt tool sorts the package names into alphabetical order. (When there are several versions of an example, we will often number them so you can be sure of which one we’re talking about.)
gopl.io/ch1/echo1 // Echo1 prints its command-line arguments. package main import ( "fmt" "os" ) func main() { var s, sep string for i := 1; i < len(os.Args); i++ { s += sep + os.Args[i] sep = " " } fmt.Println(s) }
Comments begin with //. All text from a // to the end of the line is commentary for programmers and is ignored by the compiler. By convention, we describe each package in a comment immediately preceding its package declaration; for a main package, this comment is one or more complete sentences that describe the program as a whole.
The var declaration declares two variables s and sep, of type string. A variable can be initialized as part of its declaration. If it is not explicitly initialized, it is implicitly initialized to the zero value for its type, which is 0 for numeric types and the empty string "" for strings. Thus in this example, the declaration implicitly initializes s and sep to empty strings. We’ll have more to say about variables and declarations in Chapter 2.
For numbers, Go provides the usual arithmetic and logical operators. When applied to strings, however, the + operator concatenates the values, so the expression
sep + os.Args[i]
represents the concatenation of the strings sep and os.Args[i]. The statement we used in the program,
s += sep + os.Args[i]
is an assignment statement that concatenates the old value of s with sep and os.Args[i] and assigns it back to s; it is equivalent to
s = s + sep + os.Args[i]
The operator += is an assignment operator. Each arithmetic and logical operator like + or * has a corresponding assignment operator.
The echo program could have printed its output in a loop one piece at a time, but this version instead builds up a string by repeatedly appending new text to the end. The string s starts life empty, that is, with value "", and each trip through the loop adds some text to it; after the first iteration, a space is also inserted so that when the loop is finished, there is one space between each argument. This is a quadratic process that could be costly if the number of arguments is large, but for echo, that’s unlikely. We’ll show a number of improved versions of echo in this chapter and the next that will deal with any real inefficiency.
The loop index variable i is declared in the first part of the for loop. The := symbol is part of a short variable declaration, a statement that declares one or more variables and gives them appropriate types based on the initializer values; there’s more about this in the next chapter.
The increment statement i++ adds 1 to i; it’s equivalent to i += 1 which is in turn equivalent to i = i + 1. There’s a corresponding decrement statement i-- that subtracts 1. These are statements, not expressions as they are in most languages in the C family, so j = i++ is illegal, and they are postfix only, so --i is not legal either.
The for loop is the only loop statement in Go. It has a number of forms, one of which is illustrated here:
for initialization; condition; post { // zero or more statements }
Parentheses are never used around the three components of a for loop. The braces are mandatory, however, and the opening brace must be on the same line as the post statement.
The optional initialization statement is executed before the loop starts. If it is present, it must be a simple statement, that is, a short variable declaration, an increment or assignment statement, or a function call. The condition is a boolean expression that is evaluated at the beginning of each iteration of the loop; if it evaluates to true, the statements controlled by the loop are executed. The post statement is executed after the body of the loop, then the condition is evaluated again. The loop ends when the condition becomes false.
Any of these parts may be omitted. If there is no initialization and no post, the semicolons may also be omitted:
// a traditional "while" loop for condition { // ... }
If the condition is omitted entirely in any of these forms, for example in
// a traditional infinite loop for { // ... }
the loop is infinite, though loops of this form may be terminated in some other way, like a break or return statement.
Another form of the for loop iterates over a range of values from a data type like a string or a slice. To illustrate, here’s a second version of echo:
gopl.io/ch1/echo2 // Echo2 prints its command-line arguments. package main import ( "fmt" "os" ) func main() { s, sep := "", "" for _, arg := range os.Args[1:] { s += sep + arg sep = " " } fmt.Println(s) }
In each iteration of the loop, range produces a pair of values: the index and the value of the element at that index. In this example, we don’t need the index, but the syntax of a range loop requires that if we deal with the element, we must deal with the index too. One idea would be to assign the index to an obviously temporary variable like temp and ignore its value, but Go does not permit unused local variables, so this would result in a compilation error.
The solution is to use the blank identifier, whose name is _ (that is, an underscore). The blank identifier may be used whenever syntax requires a variable name but program logic does not, for instance to discard an unwanted loop index when we require only the element value. Most Go programmers would likely use range and _ to write the echo program as above, since the indexing over os.Args is implicit, not explicit, and thus easier to get right.
This version of the program uses a short variable declaration to declare and initialize s and sep, but we could equally well have declared the variables separately. There are several ways to declare a string variable; these are all equivalent:
s := "" var s string var s = "" var s string = ""
Why should you prefer one form to another? The first form, a short variable declaration, is the most compact, but it may be used only within a function, not for package-level variables. The second form relies on default initialization to the zero value for strings, which is "". The third form is rarely used except when declaring multiple variables. The fourth form is explicit about the variable’s type, which is redundant when it is the same as that of the initial value but necessary in other cases where they are not of the same type. In practice, you should generally use one of the first two forms, with explicit initialization to say that the initial value is important and implicit initialization to say that the initial value doesn’t matter.
As noted above, each time around the loop, the string s gets completely new contents. The += statement makes a new string by concatenating the old string, a space character, and the next argument, then assigns the new string to s. The old contents of s are no longer in use, so they will be garbage-collected in due course.
If the amount of data involved is large, this could be costly. A simpler and more efficient solution would be to use the Join function from the strings package:
gopl.io/ch1/echo3 func main() { fmt.Println(strings.Join(os.Args[1:], " ")) }
Finally, if we don’t care about format but just want to see the values, perhaps for debugging, we can let Println format the results for us:
fmt.Println(os.Args[1:])
The output of this statement is like what we would get from strings.Join, but with surrounding brackets. Any slice may be printed this way.
Exercise 1.1: Modify the echo program to also print os.Args[0], the name of the command that invoked it.
Exercise 1.2: Modify the echo program to print the index and value of each of its arguments, one per line.
Exercise 1.3: Experiment to measure the difference in running time between our potentially inefficient versions and the one that uses strings.Join. (Section 1.6 illustrates part of the time package, and Section 11.4 shows how to write benchmark tests for systematic performance evaluation.)