VMFS
Virtual Machine File System (VMFS) is a file system developed by VMware that is dedicated and optimized for clustered virtual environments and the storage of large files. The structure of VMFS makes it possible to store VM files in a single folder, simplifying VM administration.
Advantages: Traditional file systems allow only a single server to obtain read/write access to a storage resource. VMFS is a so-called clustered file system—it allows several ESXi host servers to read from and write to the same storage resource simultaneously. To ensure that several servers do not access the same VM at the same time, VMFS provides a mechanism called on-disk locking, which guarantees that a VM runs on only a single ESXi server at a time. To manage these locks, ESXi uses a SCSI reservation technique whenever it modifies metadata files. During this very short locking period, I/O to the entire LUN is blocked for the other ESXi servers and their VMs. This is why frequent SCSI reservations should be avoided: they can hinder performance.
The SCSI reservation is used by ESXi when:
- Creating a VMFS datastore
- Expanding a VMFS datastore onto additional extents
- Powering on a VM
- Acquiring a lock on a file
- Creating or deleting a file
- Creating a template
- Deploying a VM from a template
- Creating a new VM
- Migrating a VM with vMotion
- Growing a file (for example, a VMFS snapshot file or a thin-provisioned virtual disk)
- Using HA functionality (if a server fails, its disk locks are released, which allows another ESXi server to restart the VMs and acquire the locks for itself)
VMFS-5 Specifications
vSphere 5 introduces VMFS-5, which raises the maximum volume size to 64 TB. Table 3.1 outlines the evolution of VMFS from version 3 to version 5.
Table 3.1. VMFS-3 Versus VMFS-5
| Functionality | VMFS-3 | VMFS-5 |
| --- | --- | --- |
| Maximum volume | 2 TB | 64 TB |
| Block size | 1, 2, 4, or 8 MB | 1 MB |
| Sub-blocks | 64 KB | 8 KB |
| Small files | No | 1 KB |
VMFS-5 offers higher limits than VMFS-3 because the addressing table was redeveloped in 64 bits. (VMFS-3 used a 32-bit table and was limited to about 256,000 blocks of 8 MB, or 2 TB.) With VMFS-5, blocks have a fixed size of 1 MB, and the maximum volume size is 64 TB. With VMFS-3, blocks vary in size between 1 MB and 8 MB, which limits the maximum virtual disk size when the block size is too small. (For example, 1 MB blocks limit vmdk files to 256 GB; for a larger file, the volume must be reformatted with a larger block size.) Sub-blocks shrink from 64 KB to 8 KB, with the possibility of managing files as small as 1 KB.
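To make the arithmetic concrete, here is a small Python sketch that simply recomputes the figures cited above, approximating the 256,000-block limit as 256 × 1024 blocks. It is an illustration of the addressing math, not a description of the on-disk format.

```python
# Recompute the size limits cited above. The "256,000 blocks" of the 32-bit
# VMFS-3 addressing table is treated as 256 * 1024 = 262,144 blocks, which
# reproduces the 256 GB and 2 TB figures exactly.
MB = 1024 ** 2
GB = 1024 ** 3
TB = 1024 ** 4

vmfs3_max_blocks = 256 * 1024          # ~256,000 addressable file blocks

for block_size_mb in (1, 2, 4, 8):     # the block sizes offered by VMFS-3
    max_file = vmfs3_max_blocks * block_size_mb * MB
    print(f"VMFS-3, {block_size_mb} MB blocks -> max vmdk {max_file / GB:.0f} GB")

# VMFS-5: 64-bit addressing, fixed 1 MB blocks, 64 TB maximum volume.
print(f"VMFS-5, 64 TB volume = {64 * TB // MB:,} blocks of 1 MB")
```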
You should also note the following:
- A single VMFS datastore must be created for each LUN.
- VMFS keeps a journal (a log of changes), which preserves data integrity and allows quick recovery should problems arise.
Upgrading VMFS-3 to VMFS-5
VMFS-3 is compatible with vSphere 5. The upgrade from VMFS-3 to VMFS-5 is supported and occurs without service interruption while VMs are running. Creating a new VMFS volume is preferable, however, because the VMFS-3 to VMFS-5 upgrade carries the following limitations:
- Blocks keep their initial size (which can be larger than 1 MB). Copy operations between datastores with different block sizes do not benefit from the VAAI full copy primitive. (The sketch after this list shows one way to spot such volumes.)
- Sub-blocks remain at 64 KB.
- The maximum number of files remains unchanged at 30,720, instead of the maximum of 100,000 files available on a newly created VMFS-5 volume.
- The master boot record (MBR) partition type is kept; it is automatically converted to a GUID partition table (GPT) only when the volume grows beyond 2 TB.
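For readers who script against vCenter, the following is a minimal pyVmomi sketch (the connection details are hypothetical) that lists each VMFS datastore's version and block size; a volume upgraded from VMFS-3 typically betrays itself by a block size other than 1 MB.

```python
# A minimal pyVmomi sketch (hypothetical connection details) listing each VMFS
# datastore's version and block size. A VMFS-5 volume upgraded from VMFS-3
# typically still shows a block size other than 1 MB.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()            # lab use only
si = SmartConnect(host="vcenter.example.com",     # hypothetical vCenter
                  user="administrator@vsphere.local",
                  pwd="secret",
                  sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.Datastore], True)
    for ds in view.view:
        if isinstance(ds.info, vim.host.VmfsDatastoreInfo):
            vmfs = ds.info.vmfs
            print(f"{ds.name}: VMFS {vmfs.version}, block size {vmfs.blockSizeMb} MB")
    view.Destroy()
finally:
    Disconnect(si)
```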
Signature of VMFS Datastores
Each VMFS datastore has a universally unique identifier (UUID) that identifies on which LUN the VMFS datastore is located. This UUID must be unique. If two VMFS volumes with the same UUID are mounted simultaneously, ESXi does not know on which volume to perform read/write operations (it would send I/O to either volume at random), which can lead to data corruption. vSphere detects this situation and prevents it.
When a VMFS LUN is replicated, snapshotted, or cloned, the resulting VMFS LUN is 100% identical to the original, including its UUID. To use this new VMFS LUN, you can either assign a new signature or, under certain conditions, keep the original signature, using the following options (shown in Figure 3.6):
- Keep the Existing Signature: This option preserves the same signature and mounts the replicated datastore. To avoid UUID conflicts, this is possible only when the source VMFS LUN is unmounted (or removed).
- Assign a New Signature: When re-signaturing a VMFS, ESXi assigns a new UUID and a new name to the LUN copy. This allows both VMFS datastores (the original volume and its copy) to be mounted simultaneously under two distinct identifiers. Note that re-signaturing is irreversible. Remember to perform a datastore rescan to update the list of LUNs presented to the ESXi host. (A scripted example follows Figure 3.6.)
- Format the Disk: This option entirely reformats the volume.
Figure 3.6. Options offered when remounting a replicated or snapshot LUN.
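The re-signature operation can also be scripted. The following pyVmomi sketch is one possible approach, assuming a vim.HostSystem object obtained through the connection shown earlier: it queries the unresolved (snapshot or replica) VMFS copies seen by the host and re-signatures them.

```python
# One possible way to script the "Assign a New Signature" option with pyVmomi,
# assuming a vim.HostSystem object obtained as in the earlier example.
from pyVmomi import vim

def resignature_vmfs_copies(host: vim.HostSystem):
    storage = host.configManager.storageSystem
    for copy in storage.QueryUnresolvedVmfsVolume():       # snapshot/replica LUNs
        paths = [extent.devicePath for extent in copy.extent]
        print(f"Re-signaturing VMFS copy '{copy.vmfsLabel}' on {paths}")
        spec = vim.host.UnresolvedVmfsResignatureSpec(extentDevicePath=paths)
        storage.ResignatureUnresolvedVmfsVolume_Task(spec)
        # The copy is remounted under a new UUID with a snap-prefixed label.
    storage.RescanVmfs()                                    # refresh the datastore list
```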
Re-Signature of a VMFS Volume as Part of a DRP
A new UUID is generated when a disaster recovery plan (DRP) is implemented and the replicated volume is re-signatured. The vmx and vmdk configuration files of the VMs registered on that volume still point to the former UUID rather than to the new volume. Therefore, all VMs that are part of the DRP must be manually removed from the vCenter inventory and re-registered so that they pick up the new UUID. This can be a cumbersome process and can lead to handling errors when these operations are performed manually.
One of the valuable features of Site Recovery Manager 5 (SRM5) is the automation of this workflow, which simplifies the process and avoids errors. With SRM5, the replicated volume is re-signatured on the backup site, and the configuration files are automatically updated with the proper UUID, pointing the VMs to the new replicated volume. Each protected VM is associated with the virtual disks assigned to it.
Technical Details
Within the environment, a VMFS volume is represented in the following ways:
- By its UUID (for example, 487788ae-34666454-2ae3-00004ea244e1).
- By a network address authority (NAA) ID (for example, naa.5000.xxx). vSphere uses the NAA ID to determine which LUN the VMFS UUID is associated with.
- By a label name seen by ESXi and a datastore name seen by vCenter Server (for example, myvmfsprod). This name is provided by the user and is only an alias pointing to the VMFS UUID, but it makes volumes easier to identify. (A sketch below prints these identifiers side by side.)
- By a VMkernel device name, called runtime name in vCenter (for example, vmhba1:0:4).
When re-signaturing a VMFS, ESXi assigns a new UUID and a new label name to the copy and mounts the copied LUN like an original. The new label adopts a snap-type format—for example, snapID-oldLabel, where snapID is an integer and oldLabel is the datastore's former name.
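As an illustration, a short pyVmomi sketch can print these identifiers for each VMFS datastore, reusing the hypothetical vCenter connection from the earlier example; the property names come from the vSphere API's VmfsDatastoreInfo and HostVmfsVolume objects.

```python
# Print the identifiers of each VMFS datastore side by side, reusing the
# hypothetical vCenter connection from the earlier example.
from pyVmomi import vim

def print_vmfs_identifiers(content):
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.Datastore], True)
    for ds in view.view:
        if not isinstance(ds.info, vim.host.VmfsDatastoreInfo):
            continue
        vmfs = ds.info.vmfs
        naa_ids = ", ".join(extent.diskName for extent in vmfs.extent)
        print(f"datastore name : {ds.name}")
        print(f"VMFS UUID      : {vmfs.uuid}")
        print(f"extent NAA IDs : {naa_ids}")
    view.Destroy()
```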
Besides snapshots and replication, other operations performed on a datastore are seen by ESXi as a copy of the original and, therefore, require action from the administrator:
- LUN ID change: When changing a LUN ID, vSphere detects that the UUID is now associated with a new device.
- Change of SCSI type: For example, going from SCSI-2 to SCSI-3.
- Activation of SPC-2 compliance for some systems: For example, EMC Symmetrix requires this activation.
Rescanning the Datastore
After each storage-related change at the ESXi or storage-array level, the storage adapters must be rescanned so that the new configuration is taken into account. This updates the list of visible datastores and related information.
Rescanning is required each time the following tasks are performed:
- Changing zoning at the SAN level, which has an impact on ESXi servers
- Creating a new LUN within the SAN or performing a re-signature
- Changing the LUN masking within the storage array
- Reconnecting a cable or fiber
- Changing a host at the cluster level
By default, the VMkernel scans LUNs from 0 to 255. (Remember, the maximum number of LUNs that can be presented to a host is 256.) To accelerate the scanning process, it is possible to specify a lower value for the advanced parameter Disk.MaxLUN (for example, 64 in Figure 3.7).
Figure 3.7. Performing a datastore scan.
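A rescan can also be triggered through the vSphere API. The following pyVmomi sketch is a minimal example, assuming a vim.HostSystem object is already at hand (retrieved with the same container-view pattern as earlier): it rescans the storage adapters and the VMFS datastores, then reads the Disk.MaxLUN advanced parameter mentioned above.

```python
# Rescan a host's storage after a change, then read the Disk.MaxLUN advanced
# parameter. The vim.HostSystem object is assumed to be retrieved with the
# same container-view pattern as in the earlier examples.
from pyVmomi import vim

def rescan_host_storage(host: vim.HostSystem):
    storage = host.configManager.storageSystem
    storage.RescanAllHba()     # rescan every storage adapter for new LUNs
    storage.RescanVmfs()       # refresh the list of VMFS datastores

    option_manager = host.configManager.advancedOption
    for option in option_manager.QueryOptions("Disk.MaxLUN"):
        print(f"{option.key} = {option.value}")   # 256 by default
```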
Alignment
Alignment is an important issue to take into account. Stacking up various layers can create nonaligned partitions, as shown in Figure 3.8. Contrast this with an example of aligned partitions, shown in Figure 3.9.
Figure 3.8. Nonaligned partitions.
Figure 3.9. Aligned partitions.
The smallest unit in the RAID stack is called a chunk. Above it sits VMFS, which uses 1-MB blocks, and above that the NTFS file system, formatted with clusters of 1 KB to 64 KB (the disk cluster). If these layers are not aligned, reading a single cluster can mean reading two blocks overlapping three chunks on three different hard drives, which multiplies the physical I/O and, thus, decreases performance.
When the partition is aligned, reading a cluster requires reading only a single block, itself aligned with a chunk. This alignment is crucial: in a VMware environment, nonalignment can cause a 40% drop in performance.
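The following Python sketch models this effect with illustrative numbers (a 64 KB chunk, a 64 KB cluster, and the historical 63-sector partition offset); it is a simplified model, not a measurement of any particular array.

```python
# A simplified model of the alignment effect (illustrative sizes, not a
# measurement): count how many 64 KB RAID chunks a single 64 KB cluster read
# touches for an aligned partition and for the historical 63-sector offset.
KB = 1024
CHUNK = 64 * KB                  # smallest unit of the RAID stack
CLUSTER = 64 * KB                # NTFS cluster size
LEGACY_OFFSET = 63 * 512         # 31.5 KB MBR partition start

def chunks_touched(offset, length, chunk=CHUNK):
    first = offset // chunk
    last = (offset + length - 1) // chunk
    return last - first + 1

# Aligned: VMFS partition and guest partition both start on a chunk boundary.
print("aligned   :", chunks_touched(0, CLUSTER), "chunk(s) per cluster read")

# Nonaligned: both layers keep the legacy offset, so every cluster read
# straddles a chunk boundary and touches two disks instead of one.
print("nonaligned:", chunks_touched(2 * LEGACY_OFFSET, CLUSTER), "chunk(s) per cluster read")
```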
In a Microsoft environment, Windows Server 2008 is automatically aligned, whereas older operating systems must be aligned using the Diskpart utility. See the software publisher’s instructions.
Increasing Volume
Volume Grow allows the dynamic extension of an existing VMFS without shutting down the VMs (up to 32 extents). When physical storage space is added to a LUN, the existing datastore can be extended without shutting down the server or the associated storage. This complements storage array features that allow the dynamic extension of LUNs. Extending the storage space of a virtual disk (vmdk) is also possible in persistent mode without snapshots, using Hot VMDK Extend. It is recommended that extents be placed on disks with the same performance characteristics.
The vmdk extension and the visibility of the disk’s free space depend on the OS mechanism and its file system. Depending on the OS version, third-party tools might be required to extend a system partition, as is the case with Windows 2003. To find out more, refer to VMware’s Knowledge Base: KB 1004071.
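Hot VMDK Extend can also be scripted. The following pyVmomi sketch is one hedged example, assuming a vim.VirtualMachine object and a target size are supplied by the caller: it edits the capacityInKB of the VM's first virtual disk through ReconfigVM_Task.

```python
# A hedged pyVmomi sketch of Hot VMDK Extend: grow the first virtual disk of a
# powered-on VM. The vim.VirtualMachine object and the target size are
# assumptions; the disk must not carry snapshots, as noted above.
from pyVmomi import vim

def hot_extend_first_disk(vm: vim.VirtualMachine, new_size_gb: int):
    for device in vm.config.hardware.device:
        if isinstance(device, vim.vm.device.VirtualDisk):
            device.capacityInKB = new_size_gb * 1024 * 1024
            change = vim.vm.device.VirtualDeviceSpec(
                operation=vim.vm.device.VirtualDeviceSpec.Operation.edit,
                device=device)
            # The guest OS must still extend its own partition afterward.
            return vm.ReconfigVM_Task(spec=vim.vm.ConfigSpec(deviceChange=[change]))
    raise ValueError("no virtual disk found on this VM")
```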
Can a Single Large 64-TB Volume Be Created to Host All VMs?
With vSphere 5, the maximum size of a VMFS-5 LUN is 64 TB. Theoretically, a single, very large 64-TB volume could be created. Because storage arrays integrate VMware's storage APIs (VAAI), they offer excellent volume-access performance. However, we do not recommend adopting this approach, for the following reasons:
- Separating environments is absolutely essential; production, test, acceptance, and backup should each have a dedicated environment and LUN. It is important not to mix I/O profiles when these are known (random versus sequential access, for example) and not to rely on balancing the load according to the VMs' activity (even though Storage DRS allows load balancing).
- Migrations become harder: migrating one large volume is more complex than migrating several small volumes, which can be performed in stages.
- If a large volume becomes corrupted, the impact is more significant than with a smaller volume containing fewer VMs.
Because of the preceding issues, creating separate LUNs is the preferred approach. It also makes replication easier (for example, by allowing protection to apply only to the critical environment).
Best Practices for VMFS Configuration
The following best practices are recommended:
- Generally, you should create VMFS volumes between 600 GB and 1 TB and use 15 to 20 active vmdks per volume (no more than 32). (A VM can have several active vmdks.) A sketch that checks this count follows the list.
- For environments that require high levels of performance, such as Oracle, Microsoft SQL, and SAP, the RDM mode is preferable.
- VMware recommends the use of VMFS over NFS because VMFS offers the complete set of capabilities and allows the use of RDM volumes for I/O-intensive applications.
- To avoid significant contention, avoid connecting more than eight ESXi servers to the same LUN.
- Avoid placing several VMs with snapshots on the same VMFS.
- Avoid setting DRS to aggressive because this triggers frequent VM migrations from one host server to another and, therefore, frequent SCSI reservations.
- Separate production LUNs from test LUNs, and store ISO files, templates, and backups on dedicated LUNs.
- Align vmdk partitions after the OS is configured for new disks.
- Avoid grouping several LUNs to form a single VMFS because the different environments (production, test, templates) can no longer be separated, which increases the risk of contention through more frequent reservations.
- Avoid creating one VMFS per VM because it increases the number of LUNs and makes management more complex while limiting growth to 256 LUNs and, therefore, 256 VMs.
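To check the guideline on the number of vmdks per volume, the following pyVmomi sketch (reusing the hypothetical vCenter connection from the earlier examples) counts the virtual disks stored on each datastore.

```python
# Count the virtual disks stored on each datastore to check the guideline of
# 15 to 20 active vmdks per volume. Reuses the hypothetical vCenter connection
# from the earlier examples.
from collections import Counter
from pyVmomi import vim

def vmdks_per_datastore(content):
    counts = Counter()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    for vm in view.view:
        if vm.config is None:          # skip VMs without a readable config
            continue
        for device in vm.config.hardware.device:
            if isinstance(device, vim.vm.device.VirtualDisk):
                # Datastore name is embedded in the path: "[datastore1] vm/vm.vmdk"
                ds_name = device.backing.fileName.split("]")[0].lstrip("[")
                counts[ds_name] += 1
    view.Destroy()
    for ds_name, total in counts.most_common():
        note = "  <-- above the recommended maximum of 32" if total > 32 else ""
        print(f"{ds_name}: {total} vmdk(s){note}")
```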