Operations on Sequences
The following operators can be applied to sequence types, including strings, lists, and tuples:
Operation | Description
s + r | Concatenation
s * n, n * s | Makes n copies of s, where n is an integer
s % d | String formatting (strings only)
s[i] | Indexing
s[i:j] | Slicing
s[i:j:stride] | Extended slicing
x in s, x not in s | Membership
for x in s: | Iteration
len(s) | Length
min(s) | Minimum item
max(s) | Maximum item
The + operator concatenates two sequences of the same type. The s * n operator makes n copies of a sequence. However, these are shallow copies that replicate elements by reference only. For example, consider the following code:
    a = [3,4,5]       # A list
    b = [a]           # A list containing a
    c = 4*b           # Make four copies of b

    # Now modify a
    a[0] = -7

    # Look at c
    print c
The output of this program is the following:
[[-7, 4, 5], [-7, 4, 5], [-7, 4, 5], [-7, 4, 5]]
In this case, a reference to the list a was placed in the list b. When b was replicated, four additional references to a were created. Finally, when a was modified, this change was propagated to all the other "copies" of a. This behavior of sequence multiplication is often unexpected and not the intent of the programmer. One way to work around the problem is to manually construct the replicated sequence by duplicating the contents of a. For example:
    a = [ 3, 4, 5 ]
    c = [a[:] for j in range(4)]    # [:] makes a copy of a list
The copy module in the standard library can also be used to make copies of objects.
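As a minimal sketch of the difference, the following compares replication by reference with copies made through the copy module. The names shared, independent, and nested are illustrative only, and print is written with parentheses so that the example also runs under Python 3.

```python
import copy

a = [3, 4, 5]
b = [a]

shared = 4 * b                                   # four references to the same list a
independent = [copy.copy(a) for _ in range(4)]   # four separate shallow copies of a
nested = copy.deepcopy([a, [a]])                 # deepcopy also copies nested containers

a[0] = -7                                        # modify the original list

print(shared)        # [[-7, 4, 5], [-7, 4, 5], [-7, 4, 5], [-7, 4, 5]]
print(independent)   # [[3, 4, 5], [3, 4, 5], [3, 4, 5], [3, 4, 5]]
print(nested)        # [[3, 4, 5], [[3, 4, 5]]]
```

copy.copy() is enough when the elements themselves are not containers; copy.deepcopy() is the safer choice when the replicated object holds other mutable objects.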
The indexing operator s[n] returns the nth object from a sequence in which s[0] is the first object. Negative indices can be used to fetch items from the end of a sequence. For example, s[-1] returns the last item. Attempts to access elements that are out of range, in either direction, result in an IndexError exception.
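The indexing rules can be illustrated with a short sketch. The list s and the result names below are made up for the example.

```python
s = ['a', 'b', 'c', 'd']

first = s[0]          # 'a', the first item
last = s[-1]          # 'd', the last item
second_last = s[-2]   # 'c', counting back from the end

# An index past either end raises IndexError
try:
    s[4]
    too_large = False
except IndexError:
    too_large = True

try:
    s[-5]
    too_small = False
except IndexError:
    too_small = True

print(first)      # a
print(last)       # d
print(too_large)  # True
```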
The slicing operator s[i:j] extracts a subsequence from s consisting of the elements with index k, where i <= k < j. Both i and j must be integers or long integers. If the starting or ending index is omitted, the beginning or end of the sequence is assumed, respectively. Negative indices are allowed and assumed to be relative to the end of the sequence. If i or j is out of range, it is adjusted to refer to the beginning or end of the sequence, depending on whether its value falls before the first item or after the last item, respectively. Unlike indexing, slicing never raises IndexError for out-of-range values.
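A brief sketch of these rules, using a string chosen for illustration (the variable names are not from the text):

```python
s = "abcdef"

middle = s[1:4]            # 'bcd': elements with index 1, 2, 3
head = s[:3]               # 'abc': omitted start means the beginning
tail = s[3:]               # 'def': omitted end means the end
last_two = s[-2:]          # 'ef': negative index is relative to the end
clamped_end = s[2:100]     # 'cdef': j past the end is treated as the end
clamped_start = s[-100:2]  # 'ab': i before the start is treated as the start
empty = s[4:2]             # '': no index k satisfies 4 <= k < 2

print(middle)       # bcd
print(clamped_end)  # cdef
```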
The slicing operator may be given an optional stride, s[i:j:stride], that causes the slice to skip elements. However, the behavior is somewhat more subtle. If a stride is supplied, i is the starting index, j is the ending index, and the produced subsequence is the elements s[i], s[i+stride], s[i+2*stride], and so forth until index j is reached (which is not included). The stride may also be negative. If the starting index i is omitted, it is set to the beginning of the sequence if stride is positive or the end of the sequence if stride is negative. If the ending index j is omitted, it is set to the end of the sequence if stride is positive or the beginning of the sequence if stride is negative. Here are some examples:
    a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

    b = a[::2]       # b = [0, 2, 4, 6, 8]
    c = a[::-2]      # c = [9, 7, 5, 3, 1]
    d = a[0:5:2]     # d = [0, 2, 4]
    e = a[5:0:-2]    # e = [5, 3, 1]
    f = a[:5:1]      # f = [0, 1, 2, 3, 4]
    g = a[:5:-1]     # g = [9, 8, 7, 6]
    h = a[5::1]      # h = [5, 6, 7, 8, 9]
    i = a[5::-1]     # i = [5, 4, 3, 2, 1, 0]
    j = a[5:0:-1]    # j = [5, 4, 3, 2, 1]
The x in s operator tests to see whether the object x is in the sequence s and returns True or False. Similarly, the x not in s operator tests whether x is not in the sequence s. For strings, the in and not in operators accept substrings. For example, 'hello' in 'hello world' produces True.
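The following sketch contrasts element membership with substring matching. The result names are illustrative; note that substring-style matching applies only to strings, so a list is never found "inside" another list unless it is literally one of its elements.

```python
in_list = 3 in [1, 2, 3]                 # True: 3 is an element
not_in_tuple = 'x' not in ('a', 'b')     # True: 'x' is not an element
substring = 'hello' in 'hello world'     # True: substring match
no_match = 'low' in 'hello world'        # False: the characters are not adjacent
sublist = [1, 2] in [1, 2, 3]            # False: lists test elements, not sublists
nested = [1, 2] in [[1, 2], 3]           # True: [1, 2] is itself an element

print(substring)  # True
print(sublist)    # False
```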
The for x in s operator iterates over all the elements of a sequence and is described further in Chapter 5, "Control Flow." len(s) returns the number of elements in a sequence. min(s) and max(s) return the minimum and maximum values of a sequence, respectively, although the result may only make sense if the elements can be ordered with respect to the < operator (for example, it would make little sense to find the maximum value of a list of file objects).
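A small sketch of iteration together with len(), min(), and max(), using made-up data:

```python
nums = [4, 1, 9, 7]

total = 0
for x in nums:        # visits 4, 1, 9, 7 in order
    total += x

count = len(nums)             # 4
smallest = min(nums)          # 1
largest = max(nums)           # 9
first_char = min('hello')     # 'e': characters compare by character code
last_char = max('hello')      # 'o'

print(total)     # 21
print(largest)   # 9
```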
Strings and tuples are immutable and cannot be modified after creation. Lists can be modified with the following operators:
Operation | Description
s[i] = x | Index assignment
s[i:j] = r | Slice assignment
s[i:j:stride] = r | Extended slice assignment
del s[i] | Deletes an element
del s[i:j] | Deletes a slice
del s[i:j:stride] | Deletes an extended slice
The s[i] = x operator changes element i of a list to refer to object x, increasing the reference count of x. Negative indices are relative to the end of the list and attempts to assign a value to an out-of-range index result in an IndexError exception. The slicing assignment operator s[i:j] = r replaces elements k, where i <= k < j, with elements from sequence r. Indices may have the same values as for slicing and are adjusted to the beginning or end of the list if they're out of range. If necessary, the sequence s is expanded or reduced to accommodate all the elements in r. Here's an example:
    a = [1,2,3,4,5]
    a[1] = 6                # a = [1,6,3,4,5]
    a[2:4] = [10,11]        # a = [1,6,10,11,5]
    a[3:4] = [-1,-2,-3]     # a = [1,6,10,-1,-2,-3,5]
    a[2:] = [0]             # a = [1,6,0]
Slicing assignment may be supplied with an optional stride argument. However, the behavior is somewhat more restricted in that the argument on the right side must have exactly the same number of elements as the slice that's being replaced. Here's an example:
    a = [1,2,3,4,5]
    a[1::2] = [10,11]       # a = [1,10,3,11,5]
    a[1::2] = [30,40,50]    # ValueError. Only two elements in slice on left
The del s[i] operator removes element i from a list and decrements its reference count. del s[i:j] removes all the elements in a slice. A stride may also be supplied, as in del s[i:j:stride].
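The three forms of del can be sketched as follows. The list contents are chosen only so that each step is easy to trace.

```python
a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

del a[0]        # remove one element
step1 = a[:]    # [1, 2, 3, 4, 5, 6, 7, 8, 9]

del a[1:3]      # remove the slice holding 2 and 3
step2 = a[:]    # [1, 4, 5, 6, 7, 8, 9]

del a[::2]      # remove every other element, starting with the first
step3 = a[:]    # [4, 6, 8]

print(step3)    # [4, 6, 8]
```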
Sequences are compared using the operators <, >, <=, >=, ==, and !=. When comparing two sequences, the first elements of each sequence are compared. If they differ, this determines the result. If they're the same, the comparison moves to the second element of each sequence. This process continues until two different elements are found or one of the sequences runs out of elements. If the end of both sequences is reached at the same time, the sequences are considered equal. If a is a proper prefix of b (that is, b starts with all the elements of a and then continues), then a < b. Strings are compared using lexicographical ordering. Each character is assigned a unique index determined by the machine's character set (such as ASCII or Unicode). A character is less than another character if its index is less.
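The comparison rules can be checked with a few small cases. The result names are illustrative only.

```python
differ_at_third = [1, 2, 3] < [1, 2, 4]    # True: decided by 3 < 4
prefix_is_less = [1, 2] < [1, 2, 3]        # True: [1, 2] is a proper prefix
equal_tuples = (1, 2, 3) == (1, 2, 3)      # True: same elements, same length
first_decides = [2] > [1, 100]             # True: 2 > 1, later elements are ignored
strings = 'apple' < 'banana'               # True: 'a' comes before 'b'
case_matters = 'Zebra' < 'apple'           # True: 'Z' has a lower code than 'a' in ASCII

print(prefix_is_less)  # True
print(case_matters)    # True
```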
The modulo operator (s % d) produces a formatted string, given a format string, s, and a collection of objects in a tuple or mapping object (dictionary). The string s may be a standard or Unicode string. The behavior of this operator is similar to the C sprintf() function. The format string contains two types of objects: ordinary characters (which are left unmodified) and conversion specifiers, each of which is replaced with a formatted string representing an element of the associated tuple or mapping. If d is a tuple, the number of conversion specifiers must exactly match the number of objects in d. If d is a mapping, each conversion specifier must be associated with a valid key name in the mapping (using parentheses, as described shortly). Each conversion specifier starts with the % character and ends with one of the conversion characters shown in Table 4.1.
Table 4.1 String Formatting Conversions
Character | Output Format
d,i | Decimal integer or long integer.
u | Unsigned integer or long integer.
o | Octal integer or long integer.
x | Hexadecimal integer or long integer.
X | Hexadecimal integer (uppercase letters).
f | Floating point as [-]m.dddddd.
e | Floating point as [-]m.dddddde±xx.
E | Floating point as [-]m.ddddddE±xx.
g,G | Use %e or %E for exponents less than -4 or greater than or equal to the precision; otherwise use %f.
s | String or any object. The formatting code uses str() to generate strings.
r | Produces the same string as produced by repr().
c | Single character.
% | Literal %.
Between the % character and the conversion character, the following modifiers may appear, in this order:
- A key name in parentheses, which selects a specific item out of the mapping object. If no such element exists, a KeyError exception is raised.
- One or more of the following:
    - A - sign, indicating left alignment. By default, values are right-aligned.
    - A + sign, indicating that the numeric sign should be included (even if positive).
    - A 0, indicating a zero fill.
- A number specifying the minimum field width. The converted value will be printed in a field at least this wide and padded on the left (or on the right if the - flag is given) to make up the field width.
- A period separating the field width from a precision.
- A number specifying the maximum number of characters to be printed from a string, the number of digits following the decimal point in a floating-point number, or the minimum number of digits for an integer.
In addition, the asterisk (*) character may be used in place of a number in any width field. If present, the width will be read from the next item in the tuple.
The following code illustrates a few examples:
    a = 42
    b = 13.142783
    c = "hello"
    d = {'x':13, 'y':1.54321, 'z':'world'}
    e = 5628398123741234L

    print 'a is %d' % a                 # "a is 42"
    print '%10d %f' % (a,b)             # "        42 13.142783"
    print '%+010d %E' % (a,b)           # "+000000042 1.314278E+01"
    print '%(x)-10d %(y)0.3g' % d       # "13         1.54"
    print '%0.4s %s' % (c, d['z'])      # "hell world"
    print '%*.*f' % (5,3,b)             # "13.143"
    print 'e = %d' % e                  # "e = 5628398123741234"
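The error cases described earlier can be sketched in the same way. The dictionary and result names below are made up for the illustration, and print is written with parentheses so that the code also runs under Python 3.

```python
d = {'x': 13}

ok = '%(x)d' % d               # '13': the key x exists in the mapping
percent = '%5.1f%%' % 99.44    # ' 99.4%': %% produces a literal percent sign

# A key name that is missing from the mapping raises KeyError
try:
    '%(missing)d' % d
    key_error = False
except KeyError:
    key_error = True

# Too few items in the tuple raises TypeError
try:
    '%d %d' % (1,)
    too_few = False
except TypeError:
    too_few = True

# Too many items in the tuple also raises TypeError
try:
    '%d' % (1, 2)
    too_many = False
except TypeError:
    too_many = True

print(ok)         # 13
print(percent)    #  99.4%
print(key_error)  # True
```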