- 2.1 collections: Container Data Types
- 2.2 array: Sequence of Fixed-Type Data
- 2.3 heapq: Heap Sort Algorithm
- 2.4 bisect: Maintain Lists in Sorted Order
- 2.5 Queue: Thread-Safe FIFO Implementation
- 2.6 struct: Binary Data Structures
- 2.7 weakref: Impermanent References to Objects
- 2.8 copy: Duplicate Objects
- 2.9 pprint: Pretty-Print Data Structures
2.6 struct—Binary Data Structures
- Purpose: Convert between strings and binary data.
- Python Version: 1.4 and later
The struct module includes functions for converting between strings of bytes and native Python data types, such as numbers and strings.
2.6.1 Functions vs. Struct Class
There is a set of module-level functions for working with structured values, and there is also the Struct class. Format specifiers are converted from their string format to a compiled representation, similar to the way regular expressions are handled. The conversion takes some resources, so it is typically more efficient to do it once when creating a Struct instance and call methods on the instance, instead of using the module-level functions. The following examples all use the Struct class.
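The two styles can be compared side by side. The following is a minimal sketch rather than one of the book's listings: it assumes Python 3 (where string fields are bytes objects), and the format string and values are chosen only for illustration.

```python
import struct

values = (1, b'ab', 2.7)   # Python 3 assumption: string fields are bytes
fmt = '<I 2s f'            # illustrative format, not taken from the text

# Module-level function: the format string is passed on every call.
packed_by_function = struct.pack(fmt, *values)

# Struct class: the format is compiled once and the instance is reused.
record = struct.Struct(fmt)
packed_by_method = record.pack(*values)

# Both approaches produce identical bytes and report the same size.
print(packed_by_function == packed_by_method)
print(record.size == struct.calcsize(fmt))
```

Both calls describe the same layout, so the choice is mainly about how often the format is reused.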
2.6.2 Packing and Unpacking
Structs support packing data into strings and unpacking data from strings using format specifiers made up of characters representing the data type and optional count and endianness indicators. Refer to the standard library documentation for a complete list of the supported format specifiers.
In this example, the specifier calls for an integer or long value, a two-character string, and a floating-point number. The spaces in the format specifier are included to separate the type indicators and are ignored when the format is compiled.
    import struct
    import binascii

    values = (1, 'ab', 2.7)
    s = struct.Struct('I 2s f')
    packed_data = s.pack(*values)

    print 'Original values:', values
    print 'Format string :', s.format
    print 'Uses :', s.size, 'bytes'
    print 'Packed Value :', binascii.hexlify(packed_data)
The example converts the packed value to a sequence of hex bytes for printing with binascii.hexlify(), since some characters are nulls.
    $ python struct_pack.py
    Original values: (1, 'ab', 2.7)
    Format string : I 2s f
    Uses : 12 bytes
    Packed Value : 0100000061620000cdcc2c40
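The optional count in a format specifier behaves differently for strings than for numbers, which a short sketch can make concrete. It assumes Python 3 (bytes for string fields), and the format used here is illustrative rather than taken from the example above.

```python
import struct

# '3H' means three separate unsigned shorts; '4s' means ONE four-byte string.
s = struct.Struct('<3H 4s')
packed = s.pack(1, 2, 3, b'abcd')
fields = s.unpack(packed)
print(s.size, fields)

# Whitespace between type codes does not change the compiled layout.
same_size = struct.calcsize('<3H 4s') == struct.calcsize('<3H4s')
print(same_size)

# A string shorter than its field is padded with null bytes.
padded = struct.pack('4s', b'ab')
print(padded)
```

The leading `<` requests standard sizes with no alignment padding, so the size reported by this sketch does not depend on the platform.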
Use unpack() to extract data from its packed representation.
    import struct
    import binascii

    packed_data = binascii.unhexlify('0100000061620000cdcc2c40')

    s = struct.Struct('I 2s f')
    unpacked_data = s.unpack(packed_data)
    print 'Unpacked Values:', unpacked_data
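Passing the packed value to unpack() returns essentially the same values. The floating-point value differs slightly because the f format stores it as a 4-byte single-precision float, so some precision is lost when 2.7 is packed.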
    $ python struct_unpack.py
    Unpacked Values: (1, 'ab', 2.700000047683716)
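The floating-point discrepancy comes from the field width, not from unpack() itself. As a small sketch (assuming Python 3 syntax, with an explicit little-endian prefix so the sizes are fixed), the same value can be round-tripped through the 4-byte f code and the 8-byte d code:

```python
import struct

original = 2.7

# 'f' stores a 4-byte single-precision float, so some precision is lost.
single = struct.unpack('<f', struct.pack('<f', original))[0]

# 'd' stores an 8-byte double, the same width as Python's own float type.
double = struct.unpack('<d', struct.pack('<d', original))[0]

print(single == original)             # the single-precision copy differs
print(double == original)             # the double-precision copy matches
print(abs(single - original) < 1e-6)  # the difference is still very small
```

When exact round trips matter more than size, the d code is the safer choice at the cost of four extra bytes per value.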
2.6.3 Endianness
By default, values are encoded using the native C library notion of endianness. It is easy to override that choice by providing an explicit endianness directive in the format string.
    import struct
    import binascii

    values = (1, 'ab', 2.7)
    print 'Original values:', values

    endianness = [
        ('@', 'native, native'),
        ('=', 'native, standard'),
        ('<', 'little-endian'),
        ('>', 'big-endian'),
        ('!', 'network'),
        ]

    for code, name in endianness:
        s = struct.Struct(code + ' I 2s f')
        packed_data = s.pack(*values)
        print
        print 'Format string :', s.format, 'for', name
        print 'Uses :', s.size, 'bytes'
        print 'Packed Value :', binascii.hexlify(packed_data)
        print 'Unpacked Value :', s.unpack(packed_data)
Table 2.1 lists the byte order specifiers used by Struct.
Table 2.1. Byte Order Specifiers for struct
Code | Meaning
--- | ---
@ | Native order
= | Native standard
< | Little-endian
> | Big-endian
! | Network order
    $ python struct_endianness.py
    Original values: (1, 'ab', 2.7)

    Format string : @ I 2s f for native, native
    Uses : 12 bytes
    Packed Value : 0100000061620000cdcc2c40
    Unpacked Value : (1, 'ab', 2.700000047683716)

    Format string : = I 2s f for native, standard
    Uses : 10 bytes
    Packed Value : 010000006162cdcc2c40
    Unpacked Value : (1, 'ab', 2.700000047683716)

    Format string : < I 2s f for little-endian
    Uses : 10 bytes
    Packed Value : 010000006162cdcc2c40
    Unpacked Value : (1, 'ab', 2.700000047683716)

    Format string : > I 2s f for big-endian
    Uses : 10 bytes
    Packed Value : 000000016162402ccccd
    Unpacked Value : (1, 'ab', 2.700000047683716)

    Format string : ! I 2s f for network
    Uses : 10 bytes
    Packed Value : 000000016162402ccccd
    Unpacked Value : (1, 'ab', 2.700000047683716)
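The output shows two separate effects: the byte order of each value, and the two padding bytes that appear only in native mode. A minimal sketch, assuming Python 3 and using a single integer for clarity, isolates each effect:

```python
import struct
import sys

# The same integer, laid out least- and most-significant byte first.
little = struct.pack('<I', 1)
big = struct.pack('>I', 1)
network = struct.pack('!I', 1)   # network order is big-endian

# '=' uses the machine's byte order but standard sizes and no padding.
native_standard = struct.pack('=I', 1)
expected = little if sys.byteorder == 'little' else big
print(native_standard == expected)

# Native mode ('@') may add alignment padding; the explicit modes never do.
native_size = struct.calcsize('@I 2s f')     # platform dependent
standard_size = struct.calcsize('<I 2s f')   # fixed by the format alone
print(native_size, standard_size)
```

For data that leaves the machine, such as files or network messages, an explicit byte order code keeps the layout the same on every platform.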
2.6.4 Buffers
Working with binary packed data is typically reserved for performance-sensitive situations or when passing data into and out of extension modules. These cases can be optimized by avoiding the overhead of allocating a new buffer for each packed structure. The pack_into() and unpack_from() methods support writing to preallocated buffers directly.
    import struct
    import binascii

    s = struct.Struct('I 2s f')
    values = (1, 'ab', 2.7)
    print 'Original:', values

    print
    print 'ctypes string buffer'

    import ctypes
    b = ctypes.create_string_buffer(s.size)
    print 'Before :', binascii.hexlify(b.raw)
    s.pack_into(b, 0, *values)
    print 'After :', binascii.hexlify(b.raw)
    print 'Unpacked:', s.unpack_from(b, 0)

    print
    print 'array'

    import array
    a = array.array('c', '\0' * s.size)
    print 'Before :', binascii.hexlify(a)
    s.pack_into(a, 0, *values)
    print 'After :', binascii.hexlify(a)
    print 'Unpacked:', s.unpack_from(a, 0)
The size attribute of the Struct tells us how big the buffer needs to be.
    $ python struct_buffers.py
    Original: (1, 'ab', 2.7)

    ctypes string buffer
    Before : 000000000000000000000000
    After : 0100000061620000cdcc2c40
    Unpacked: (1, 'ab', 2.700000047683716)

    array
    Before : 000000000000000000000000
    After : 0100000061620000cdcc2c40
    Unpacked: (1, 'ab', 2.700000047683716)
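The offset argument becomes more useful when several records share one buffer. The sketch below is an illustration under stated assumptions, not one of the book's listings: it assumes Python 3, uses a bytearray as the preallocated buffer, and the record values are hypothetical.

```python
import struct

record = struct.Struct('<I 2s f')

# Hypothetical records; these floats are exactly representable in 4 bytes.
rows = [(1, b'ab', 2.5), (2, b'cd', 3.5), (3, b'ef', 4.5)]

# One preallocated, writable buffer large enough for every record.
buf = bytearray(record.size * len(rows))

# Write each record at its own offset instead of allocating a new string.
for index, row in enumerate(rows):
    record.pack_into(buf, index * record.size, *row)

# Read the records back from the same buffer, again by offset.
unpacked = [
    record.unpack_from(buf, index * record.size)
    for index in range(len(rows))
]
print(unpacked)
```

Multiplying the record index by the size attribute gives each record's starting offset, so the buffer is allocated only once no matter how many records are written.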
See Also:
- struct (http://docs.python.org/library/struct.html) The standard library documentation for this module.
- array (Section 2.2) The array module, for working with sequences of fixed-type values.
- binascii (http://docs.python.org/library/binascii.html) The binascii module, for producing ASCII representations of binary data.
- Endianness (http://en.wikipedia.org/wiki/Endianness) Wikipedia article that provides an explanation of byte order and endianness in encoding.