Storage Networks
As explained in the preceding section, two networks may be used: the IP Ethernet network (in NAS or iSCSI modes) and the FC network (FC or FCoE).
IP Storage Network
This type of network was not originally designed for high-performance storage but rather to carry information between the network’s various active elements. It is therefore poorly suited to applications that require high storage performance, such as database applications. IP operates at Layer 3 of the OSI model, so it is routable, which favors network interconnectivity over long distances. FC operates at Layer 2 and is therefore not routable. Today, Ethernet throughput reaches 10 Gbps (10 GbE), and the future promises 40 GbE and 100 GbE.
The main problem with IP networks is packet loss, which is caused by the following factors:
- Signal degradation on the physical line
- Routing errors
- Buffer overflow (when the receiver cannot absorb the incoming flow)
The TCP/IP protocol retransmits lost packets (when sent data is not acknowledged by the receiver), but this has a dramatic impact on performance.
Another issue is that only a limited quantity of data can be sent in a single IP packet. This limit, called the maximum transmission unit (MTU), is set at 1500 bytes for a standard Ethernet frame. Data beyond 1500 bytes must be split across several packets before it is sent. Each time a packet is received by the network card, the card sends an interrupt to the host to signal reception. This adds load on the host in the form of CPU cycles (called overhead). As the number of sent packets increases, routing becomes more complicated and time-consuming.
To reduce this fragmentation, jumbo frames were created. These allow the transmission of packets larger than 1500 bytes (up to an MTU of 9000 bytes). Jumbo frames play a significant role in improving efficiency, and some studies have shown reductions of 50% in CPU overhead. The larger MTU must be activated and supported from one end of the chain to the other, including physical switches, cards, cables, and so on.
Exercise caution. If a problem occurs, the higher the MTU between the source and the target, the larger the packets to retransmit will be, which decreases performance and increases latency. To make the most of jumbo frames, the network must be robust and well implemented.
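The effect of the MTU on packet count can be estimated with simple arithmetic. The following is a minimal sketch, not a measurement: it assumes 20-byte IPv4 and 20-byte TCP headers without options (values not given in the text) and a hypothetical 1 MiB transfer.

```python
import math

IP_HEADER = 20   # bytes, IPv4 header without options (assumption)
TCP_HEADER = 20  # bytes, TCP header without options (assumption)


def packets_needed(data_bytes: int, mtu: int) -> int:
    """Return how many IP packets are needed to carry data_bytes of data."""
    payload_per_packet = mtu - IP_HEADER - TCP_HEADER
    return math.ceil(data_bytes / payload_per_packet)


transfer = 1024 * 1024  # hypothetical 1 MiB transfer
for mtu in (1500, 9000):
    print(f"MTU {mtu}: {packets_needed(transfer, mtu)} packets")
```

Under these assumptions, the same transfer needs 719 packets at an MTU of 1500 bytes and 118 packets at an MTU of 9000 bytes. Fewer packets means fewer per-packet costs on the host, but each lost packet is also larger to retransmit.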
IP storage networks have the advantage of being less expensive than SAN FC equipment. Ethernet networks are already in place, so in some cases, less implementation is required, making them easier to use. Furthermore, IT teams have used the technology for several years.
iSCSI in VMware
In the VMware environment, the iSCSI protocol has been supported only since 2006. If deployed in an optimal fashion, this protocol offers very good performance. The IP network is administered by a team other than the storage team.
Advantages: iSCSI has been adopted in many industries because it uses the company’s TCP/IP network for block-mode access, without the need to invest in FC equipment. This makes it much easier to set up and an ideal solution in certain environments. Using the traditional Ethernet network also means greater distances can be covered before special conversion equipment (such as FC-to-IP converters) is required, for example, for replication. The skills required are network skills rather than advanced storage skills.
Disadvantages: Tests have proven iSCSI is the protocol that uses the most CPU resources. Therefore, monitoring CPU use is important and should be taken into account when provisioning networks.
The following best practices are recommended:
- Using iSCSI is worthwhile only if the architecture can take full advantage of this protocol by activating jumbo frames (MTU 9000), which provides excellent performance. This activation must exist from one end of the chain to the other.
- Using iSCSI HBA cards becomes essential with 10-Gb connections, and links should be aggregated wherever possible for high performance and redundancy in case of failure.
- It is advisable to physically separate the iSCSI storage network from the standard IP network. If this is not possible, streams should be isolated using virtual local-area networks (VLANs).
- Use cards with TCP/IP offload engine (TOE) functionality to offload from the host some of the processing related to the iSCSI layer and to reduce the overhead.
- Implement quality of service (QoS) by putting the priority on streams. Using vSphere, this can be done using the Storage I/O Control (SIOC) functionality.
- Network packet loss is one of the main obstacles to good iSCSI network performance. Packet loss can be caused by faulty network configuration or by inadequate cabling (for example, using Category 5 cables rather than Category 6 for gigabit links).
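The requirement that jumbo frames be active from one end of the chain to the other can be sketched as a simple check: the usable MTU of a path is that of its most restrictive element. The component names and MTU values below are hypothetical and serve only as an illustration.

```python
from typing import Dict, List


def effective_mtu(hop_mtus: Dict[str, int]) -> int:
    """The usable MTU of a path is the smallest MTU along it."""
    return min(hop_mtus.values())


def undersized_hops(hop_mtus: Dict[str, int], required: int = 9000) -> List[str]:
    """List the elements whose MTU is below the required value."""
    return [name for name, mtu in hop_mtus.items() if mtu < required]


# Hypothetical chain between an ESXi host and an iSCSI array.
chain = {
    "vmkernel_port": 9000,
    "virtual_switch": 9000,
    "physical_switch": 1500,  # jumbo frames were not activated here
    "array_port": 9000,
}

print(effective_mtu(chain))    # 1500
print(undersized_hops(chain))  # ['physical_switch']
```

A single element left at 1500 bytes is enough to cancel the benefit of jumbo frames for the whole path.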
NFS in VMware
Network File System (NFS) is a protocol used by NAS and supported by ESX since 2006. It provides storage sharing through the network at the file-system level. VMware supports NFS version 3 over TCP. Contrary to what is sometimes believed, tests show good performance if this protocol is implemented properly. Therefore, it is possible to use this protocol under certain conditions for virtual environments. Activation of jumbo frames (MTU 9000) allows the transmission of 8192-byte (8 KB) NFS data blocks, which are well suited to the protocol. By default, 8 NFS mounts per ESXi host are possible. This can be extended to 256 NFS mounts. If you increase the maximum number of NFS mounts above the default of 8, make sure to also increase Net.TcpipHeapSize and Net.TcpipHeapMax. These values are found in the advanced configuration and govern the amount of heap memory, measured in megabytes, that is allocated for managing VMkernel TCP/IP network connectivity.
- ESXi 5.0: Set Net.TcpipHeapSize to 32
- ESXi 5.0: Set Net.TcpipHeapMax to 128
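As an illustration, the settings above can be expressed as a small script that prints the corresponding command lines without running them. This is a sketch under assumptions: the `esxcli system settings advanced set` syntax, the `/NFS/MaxVolumes` option name, and the target of 64 mounts are not taken from the text and should be checked against the documentation for your ESXi release.

```python
from typing import Dict, List

# Heap values are those quoted for ESXi 5.0 in the text.
# The option paths and the mount count are assumptions.
ADVANCED_SETTINGS: Dict[str, int] = {
    "/NFS/MaxVolumes": 64,      # hypothetical target above the default of 8
    "/Net/TcpipHeapSize": 32,   # MB
    "/Net/TcpipHeapMax": 128,   # MB
}


def esxcli_commands(settings: Dict[str, int]) -> List[str]:
    """Build one command line per advanced setting (nothing is executed)."""
    return [
        f"esxcli system settings advanced set -o {option} -i {value}"
        for option, value in settings.items()
    ]


for line in esxcli_commands(ADVANCED_SETTINGS):
    print(line)
```

Generating the commands rather than executing them keeps the sketch safe to run anywhere and leaves the review of each value to the administrator.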
Advantages: Like iSCSI, NFS uses the standard TCP/IP network and is very easy to implement without the need for a dedicated storage infrastructure. It is the least expensive solution, and it does not require particular storage skills. Very often, NAS offers de-duplication, which can reduce the amount of storage space required.
Disadvantages: NFS offers the lowest performance of the solutions described, although it is close to that of iSCSI. It makes more use of the host server’s CPU than the FC protocol, but less than software iSCSI. Therefore, it could conceivably be used in a production environment with VMs requiring average performance for Tier 2 and Tier 3 applications.
The following best practices are recommended:
- Use 100 to 400 vmdk files per NFS volume. The maximum possible logical unit number (LUN) is 256 for a maximum size of 64 TB per NFS volume. (Manufacturers can provide information about the limit supported by file systems, usually 16 TB.)
- Separate the network dedicated to storage from the Ethernet network by using dedicated switches or VLANs.
- Activate flow control.
- Activate jumbo frames by using dedicated switches with large per-port buffers.
- Activate the Spanning Tree Protocol.
- Use a 10-Gb network (strongly recommended).
- Use full TOE cards to unload ESXi host servers.
- To isolate storage traffic from other networking traffic, use either dedicated switches or VLANs for your NFS and iSCSI traffic.
Fibre Channel Network
Essentially, the Fibre Channel network is a network dedicated to storage that offers direct, lossless access to block-mode data. It is designed for high-performance storage with very low latency, through advanced mechanisms such as buffer credits (a kind of buffer memory used to regulate streams in a SAN). The FC protocol encapsulates SCSI commands and carries them over a dedicated Fibre Channel network. Speeds are 1, 2, 4, 8, or 16 Gbps. FC frames carry a payload of up to 2112 bytes. This storage network carries data between servers and storage devices through Fibre Channel switches. The SAN, illustrated in Figure 3.5, enables storage consolidation and provides high scalability.
Figure 3.5. Schematic architecture of a Fibre Channel SAN.
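The buffer-credit mechanism mentioned above can be illustrated with a toy model. It is a deliberately simplified sketch, not an implementation of the FC standard: the class and method names are hypothetical, and a single sender and receiver are assumed. The sender may transmit only while it holds credits, and the receiver returns a credit each time it frees a buffer, so frames wait instead of being dropped.

```python
from collections import deque


class CreditedLink:
    """Toy model of credit-based flow control between two ports."""

    def __init__(self, credits: int) -> None:
        self.capacity = credits          # receive buffers advertised by the receiver
        self.credits = credits           # credits currently held by the sender
        self.receiver_buffers = deque()  # frames waiting at the receiver

    def try_send(self, frame: str) -> bool:
        """Send a frame if a credit is available; otherwise the sender waits."""
        if self.credits == 0:
            return False                 # no credit: the frame is held, not dropped
        self.credits -= 1
        self.receiver_buffers.append(frame)
        return True

    def receiver_process_one(self):
        """Free one receive buffer and return a credit to the sender."""
        if not self.receiver_buffers:
            return None
        frame = self.receiver_buffers.popleft()
        self.credits += 1                # credit returned (R_RDY in FC terminology)
        return frame


link = CreditedLink(credits=2)
print(link.try_send("frame-1"))     # True
print(link.try_send("frame-2"))     # True
print(link.try_send("frame-3"))     # False: sender must wait
print(link.receiver_process_one())  # frame-1
print(link.try_send("frame-3"))     # True
```

Because the sender never transmits more frames than the receiver has buffers for, the receiver cannot overflow, which is the property that makes the network lossless.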
SAN FC in VMware
Fibre Channel (FC) is the most advanced of the protocols supported by VMware. This is why it is most often the one used by clients in their production environment.
Advantages: It seems to be a given today that FC is the highest-performing protocol, as well as the one using the least host-server CPU resources when compared to NFS and iSCSI. High performance levels can be reached, and because the technology is lossless, the network is predictable. This protocol works for all popular applications and is ideal for those that are I/O intensive, such as databases or enterprise resource planning (ERP) applications.
Disadvantages: FC is the most expensive solution because it involves building a specialized storage architecture and requires an investment in HBA cards, FC switches, small form-factor pluggable (SFP) ports, and cables. Moreover, implementing this solution is more complex and requires specialized storage skills. Training is required, and so is learning new terminology to manage the SAN, such as LUN masking, zoning, WWN, and fabrics.
The following best practices are recommended:
- To limit the impact of broken links, install several HBA cards in each server and use multiple access paths to the storage array.
- Use load-balancing software, such as native ESXi Round Robin, or EMC PowerPath/Virtual Edition, to optimize path management between servers and storage.
- Use ALUA-compliant storage arrays that are compatible with VMware’s VAAI APIs.
- Use the same number of paths between all members of the cluster, and make sure all host servers within a cluster see the same volumes.
- Comply with the connection compatibility matrix between the members of the ESXi cluster and storage.
- Use switches of the same speed for all connections to avoid creating contention points in the SAN.
- Check firmware levels on FC switches and HBAs, and follow instructions from the storage array manufacturer.
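The first two practices, multiple paths and load balancing, can be illustrated with a simplified round-robin selector that skips failed paths. This is a sketch of the general idea only, not the behavior of ESXi Round Robin or EMC PowerPath/VE; the class name and path labels are hypothetical.

```python
from typing import Iterable, List, Set


class RoundRobinPaths:
    """Rotate I/O across storage paths, skipping paths marked as failed."""

    def __init__(self, paths: Iterable[str]) -> None:
        self.paths: List[str] = list(paths)
        self.alive: Set[str] = set(self.paths)
        self._next = 0

    def mark_failed(self, path: str) -> None:
        self.alive.discard(path)

    def mark_restored(self, path: str) -> None:
        if path in self.paths:
            self.alive.add(path)

    def next_path(self) -> str:
        """Return the next healthy path, or raise if none is left."""
        for _ in range(len(self.paths)):
            path = self.paths[self._next]
            self._next = (self._next + 1) % len(self.paths)
            if path in self.alive:
                return path
        raise RuntimeError("no path available to the storage array")


# Two hypothetical HBAs, each connected to two array controllers.
selector = RoundRobinPaths(["hba0:spA", "hba0:spB", "hba1:spA", "hba1:spB"])
print([selector.next_path() for _ in range(4)])
selector.mark_failed("hba0:spB")
print([selector.next_path() for _ in range(3)])
```

With four paths, the loss of one link only removes that path from the rotation, and I/O continues over the remaining three.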
SAN FCoE in VMware
Fibre Channel over Ethernet (FCoE) represents the convergence of various fields: Ethernet for the network (TCP/IP), SAN for storage (SAN FC), and InfiniBand for clustering (IPC). This means a single type of card, switch, cable, and management interface can now be used for these various protocols. FC frames are encapsulated in Ethernet frames, which provide transport in a more efficient manner than TCP/IP.
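FCoE frames are larger than standard Ethernet frames: an encapsulated full-size FC frame reaches about 2180 bytes, which exceeds the standard 1500-byte MTU. Dedicated equipment is therefore required, as is the activation of jumbo frames (2180 bytes at a minimum, with an MTU of around 2500 bytes typically configured). The goal is to render Ethernet lossless like FC. This is achieved by making the physical network more reliable and by making a number of improvements, especially with regard to QoS. Congestion is eliminated through stream-management mechanisms.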
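The 2180-byte figure can be reproduced by adding up the encapsulation layers around a full-size FC frame. The field sizes below are not given in the text; they are the commonly cited breakdown and should be read as an assumption to verify against the FCoE specification.

```python
# Commonly cited field sizes, in bytes, for a maximum-size FCoE frame (assumption).
FCOE_FRAME_FIELDS = {
    "ethernet_header": 14,   # destination MAC, source MAC, EtherType
    "vlan_tag_802_1q": 4,
    "fcoe_header": 14,       # version, reserved bits, start-of-frame
    "fc_header": 24,
    "fc_payload_max": 2112,  # maximum FC payload
    "fc_crc": 4,
    "eof_and_reserved": 4,   # end-of-frame plus padding
    "ethernet_fcs": 4,
}


def fcoe_max_frame_size(fields=FCOE_FRAME_FIELDS) -> int:
    """Sum the field sizes to obtain the largest possible FCoE frame."""
    return sum(fields.values())


STANDARD_ETHERNET_MTU = 1500
size = fcoe_max_frame_size()
print(size)                          # 2180
print(size > STANDARD_ETHERNET_MTU)  # True: a larger frame size is required
```

The total shows why FCoE cannot run over a default Ethernet configuration: a full FC frame must be carried whole, because FC frames are not fragmented across Ethernet frames.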
Because FCoE remains relatively uncommon in 2012, we lack practical experience regarding the advantages and disadvantages of this type of protocol in a VMware environment.
Which Protocol Is Best for You?
In our experience, SAN FC is the protocol administrators prefer for virtual production environments. An estimated 70% of customers use SAN FC for production in VMware environments. The arrival of 10 GbE with jumbo frames, however, allows the easy implementation of a SAN IP infrastructure while maintaining a level of performance that can suffice in some cases. Aside from technical criteria, the optimal choice is based on existing architectures and allocated budgets.
To summarize:
- SAN FC should be favored for applications that require high performance (Tier 1 and Tier 2), such as database applications.
- iSCSI can be used for Tier 2 applications. Some businesses use iSCSI over IP for remote data replication, which works well and limits costs.
- NAS can be used for network services such as infrastructure VMs—domain controller, DNS, file, or noncritical application servers (Tier 3 applications)—as well as for ISO image, template, and VM backup storage.