Adjusting Round Robin IOPS limit from default 1000 to 1 (2069356)

 Symptoms
ESXi hosts using iSCSI/FC/FCoE storage experience latency issues with no signs of latency on the SAN side.
 Resolution
ESXi Round Robin PSP (Path Selection Plug-in) uses a round-robin algorithm to balance the load across all active storage paths. A path is selected and used until a specific quantity of data has been transferred. After that quantity is reached, the PSP selects the next path in the list.
The quantity at which a path change is triggered is known as the limit.
ESXi Round Robin PSP supports two types of limits:
  • IOPS limit: The Round Robin PSP defaults to an IOPS limit with a value of 1000. In this default case, a new path is used after 1000 I/O operations are issued.
  • Bytes limit: The bytes limit is an alternative to the IOPS limit. The bytes limit allows a specified number of bytes to be transferred before the path is switched.
Adjusting the limit can improve performance, and some storage vendors recommend changing the IOPS limit to 1.
Example:
With the default IOPS limit of 1000, 1000 I/Os are sent down a path before the PSP switches to the next path. If the load is such that a portion of those 1000 I/Os can saturate the bandwidth of the path, the remaining I/Os must wait even if the storage array could service the requests. The IOPS or bytes limit can be adjusted downward so that the path is switched more frequently. The adjustment allows the bandwidth of additional paths to be used while another path is saturated.
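The effect of the limit on a short burst can be illustrated with a toy model. The following shell sketch is not ESXi code: it assumes a simplified scheduler that issues exactly the configured number of I/Os on a path before moving to the next one, and the function name rr_distribution is hypothetical. It shows how a burst of 600 I/Os is spread across two paths with a limit of 1000 and with a limit of 1.

```shell
#!/bin/sh
# Toy model of the Round Robin IOPS limit (illustration only, not ESXi code).
# rr_distribution LIMIT PATHS TOTAL
#   LIMIT: I/Os issued on a path before switching to the next path
#   PATHS: number of active paths
#   TOTAL: number of I/Os in the burst
# Prints one line per path: "path<N> <I/O count>"
rr_distribution() {
    limit=$1; paths=$2; total=$3
    cycle=$((limit * paths))   # I/Os in one full pass over all paths
    full=$((total / cycle))    # number of complete passes
    rem=$((total % cycle))     # I/Os left for the final, partial pass
    p=0
    while [ "$p" -lt "$paths" ]; do
        # In the partial pass, path p receives what is left after the
        # earlier paths took LIMIT each, clamped to the range 0..LIMIT.
        extra=$((rem - p * limit))
        if [ "$extra" -lt 0 ]; then extra=0; fi
        if [ "$extra" -gt "$limit" ]; then extra=$limit; fi
        echo "path$p $((full * limit + extra))"
        p=$((p + 1))
    done
}

rr_distribution 1000 2 600   # prints: path0 600, path1 0
rr_distribution 1 2 600      # prints: path0 300, path1 300
```

In this model the whole burst lands on one path with a limit of 1000, while a limit of 1 splits it evenly. Real behavior also depends on queue depths and array-side handling, which the sketch ignores.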
Adjusting the IOPS parameter
To adjust the IOPS parameter from the default 1000 to 1, run this command:

In ESXi 6.x:

for i in `esxcfg-scsidevs -c | awk '{print $1}' | grep naa.xxxx`; do esxcli storage nmp psp roundrobin deviceconfig set --type=iops --iops=1 --device=$i; done

Where naa.xxxx matches the first few characters of your naa IDs.
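Before changing many devices at once, it can help to preview the commands that such a loop would run. The sketch below is a hypothetical helper (gen_rr_cmds is not a VMware tool) that reads device IDs on standard input and only prints the matching esxcli commands without running them. The device IDs in the demo are made up for illustration. The helper also accepts bytes as the limit type, using the --type=bytes and --bytes options of the same esxcli command.

```shell
#!/bin/sh
# gen_rr_cmds PREFIX TYPE VALUE
# Hypothetical dry-run helper: reads device IDs (first field of each line)
# on stdin and prints, without running, one esxcli command for each device
# whose ID starts with PREFIX. TYPE is "iops" or "bytes"; VALUE is the limit.
gen_rr_cmds() {
    prefix=$1; type=$2; value=$3
    while read -r dev rest; do
        case "$dev" in
            "$prefix"*)
                echo "esxcli storage nmp psp roundrobin deviceconfig set --type=$type --$type=$value --device=$dev"
                ;;
        esac
    done
}

# Demo with made-up device IDs; the mpx device does not match the prefix.
printf '%s\n' \
    'naa.6000aaaa0001 Direct-Access' \
    'naa.6000aaaa0002 Direct-Access' \
    'mpx.vmhba0:C0:T0:L0 Direct-Access' |
    gen_rr_cmds naa.6000 iops 1
```

On an ESXi host, the output of esxcfg-scsidevs -c could be piped into gen_rr_cmds in place of the demo input, and the printed commands reviewed before they are run.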

To verify that the changes are applied, run this command:

esxcli storage nmp device list

You see output similar to:
Path Selection Policy: VMW_PSP_RR
Path Selection Policy Device Config: {policy=iops,iops=1,bytes=10485760,useANO=0;lastPathIndex=1: NumIOsPending=0,numBytesPending=0}
Path Selection Policy Device Custom Config:
Working Paths: vmhba33:C1:T4:L0, vmhba33:C0:T4:L0
Note: You do not need to restart the host for the changes to take effect.
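With many devices, reading the full device list by eye is error-prone. The following sketch is a hypothetical filter (rr_limits is not part of esxcli) that extracts the policy type, IOPS limit, and bytes limit from each Round Robin configuration line. It assumes the line format shown in the sample output above; devices that use other path selection policies have a different configuration string and are simply not printed.

```shell
#!/bin/sh
# rr_limits: hypothetical filter for "esxcli storage nmp device list" output.
# For each Round Robin device config line on stdin, prints
# "<policy> <iops> <bytes>". Lines in any other format are ignored.
rr_limits() {
    sed -n 's/.*Path Selection Policy Device Config: {policy=\([a-z]*\),iops=\([0-9]*\),bytes=\([0-9]*\),.*/\1 \2 \3/p'
}

# Demo on the sample configuration line from the output above.
echo 'Path Selection Policy Device Config: {policy=iops,iops=1,bytes=10485760,useANO=0;lastPathIndex=1: NumIOsPending=0,numBytesPending=0}' | rr_limits
# prints: iops 1 10485760
```

On a host, piping esxcli storage nmp device list into rr_limits would give one line per Round Robin device; any line other than one starting with "iops 1" would indicate a device that still uses a different limit.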
 Related Information
When an ESXi host cluster contains Raw Device Mappings (RDMs) that are used for Microsoft Failover Clustering, SAN port congestion matters. Adjusting the Round Robin limit value to 200 IOPS may provide better failover results.
Changing the IOPS limit from the default to a lower value with the Round Robin Path Selection Policy is a performance tuning setting that can help balance load across the storage processors and HBA paths in a storage array. With IOPS=1000, 1000 I/Os are sent down a path before alternating to the next path. You could potentially run into a scenario where all LUNs send 1000 I/Os down the same path at the same time, leaving one storage processor in the array processing the I/O while the others are idle. By alternating the path after one I/O, you get a better balance of I/O across the paths and thus across the storage processors. If saturation occurs, it is very short in duration and gives the storage processor more time to process the I/O.
You can automate this with PowerCLI to change the IOPS limit per cluster for every LUN that uses the Round Robin multipathing policy. For more information, see the attached 2069356_PowerCLI_-_change_RR_IOPS_per_cluster.txt file.
Note: Consult your storage vendor to determine whether this value should be changed in your environment.