What is File Placement Optimizer (FPO)?
GPFS File Placement Optimizer (FPO) is a set of features that allow GPFS to operate efficiently in a system based on a shared-nothing architecture. It is particularly useful for "big data" applications that process massive amounts of data.
Why this post?
The Spectrum Scale 4.2 installer toolkit does not support enabling an FPO configuration at installation time. This post provides a step-by-step guide to configuring an FPO setup with the Spectrum Scale installer toolkit.
Where to start?
Let's start by extracting and configuring the Spectrum Scale installer toolkit, just as for a regular setup. Here are the details of the setup I gave to the installer toolkit. If you need more help with the toolkit, you will find it here - Overview of the spectrumscale installation toolkit
[root@viknode1 installer]# ./spectrumscale node list
[ INFO ] List of nodes in current configuration:
[ INFO ] [Installer Node]
[ INFO ] 10.0.100.71
[ INFO ]
[ INFO ] [Cluster Name]
[ INFO ] vwnode.gpfscluster
[ INFO ]
[ INFO ] [Protocols]
[ INFO ] Object : Enabled
[ INFO ] SMB : Enabled
[ INFO ] NFS : Enabled
[ INFO ]
[ INFO ] GPFS Node Admin Quorum Manager NSD Server Protocol GUI Server
[ INFO ] viknode1 X X X
[ INFO ] viknode2 X
[ INFO ] viknode3 X
[ INFO ] viknode4 X X
[ INFO ] viknode5 X X
[ INFO ]
[ INFO ] [Export IP address]
[ INFO ] 10.0.100.76 (pool)
[ INFO ] 10.0.100.77 (pool)

[root@viknode1 installer]# ./spectrumscale nsd list
[ INFO ] Name FS Size(GB) Usage FG Pool Device Servers
[ INFO ] nsd1 cesSharedRoot unknown dataAndMetadata 1 Default /dev/dm-2 [viknode1]
Here I have added one NSD, which is required by the cesSharedRoot file system.
The CES shared root (cesSharedRoot) is needed for storing CES shared configuration data, for protocol recovery, and for some other protocol-specific purposes.
Here is a high-level diagram of this setup -
Let's run the install command to install the basic GPFS packages and GPFS commands.
[root@viknode1 installer]# ./spectrumscale install
Configuring NSDs for FPO
Configuring NSDs is more or less all there is to FPO. IBM's official documentation recommends that a GPFS FPO configuration have two storage pools: a system pool for metadata only and a data pool. On my setup I will create three storage pools: a fast pool, a slow pool, and a system pool. The fast pool holds, say, all the SSDs and other fast disks; the slow pool holds all the HDDs and other slow disks; and the pool named 'system' stores metadata.
- Storage pool:
- Storage pool stanzas are used to specify the type of layout map and write affinity depth, and to enable write affinity, for each storage pool.
- Storage pool stanzas have the following format:
%pool:
  pool=StoragePoolName                               # name of the storage pool
  blockSize=BlockSize                                # the block size of the disks in the storage pool
  usage={dataOnly | metadataOnly | dataAndMetadata}  # the type of data to be stored in the storage pool
  layoutMap={scatter | cluster}                      # the block allocation map type; cannot be changed after the storage pool has been created
  allowWriteAffinity={yes | no}                      # indicates whether the IBM Spectrum Scale File Placement Optimizer (FPO) feature is to be enabled for the storage pool
  writeAffinityDepth={0 | 1 | 2}                     # specifies the allocation policy to be used by the node writing the data; it is also used for FPO-enabled pools
  blockGroupFactor=BlockGroupFactor                  # specifies how many file system blocks are laid out sequentially on disk to behave like a single large block; this option only works on FPO-enabled pools, where --allow-write-affinity is set for the data pool
For more details check Planning for IBM Spectrum Scale FPO
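As a rough illustration of how blockSize and blockGroupFactor combine, here is a minimal sketch that prints a %pool stanza for an FPO data pool and computes the resulting chunk size. The function names fpo_pool_stanza and fpo_chunk_mib are hypothetical helpers written for this post, not GPFS commands, and the fixed values for layoutMap, allowWriteAffinity and writeAffinityDepth are simply the ones used on my setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of GPFS): print a %pool stanza for an FPO data pool.
# Arguments: pool name, block size in KiB, blockGroupFactor.
fpo_pool_stanza() {
    local name="$1" block_kib="$2" factor="$3"
    printf '%%pool:\n'
    printf '  pool=%s\n' "$name"
    printf '  blockSize=%sK\n' "$block_kib"
    printf '  layoutMap=cluster\n'
    printf '  allowWriteAffinity=yes\n'
    printf '  writeAffinityDepth=1\n'
    printf '  blockGroupFactor=%s\n' "$factor"
}

# Hypothetical helper: chunk size in MiB = blockSize (KiB) * blockGroupFactor / 1024.
fpo_chunk_mib() {
    local block_kib="$1" factor="$2"
    echo $(( block_kib * factor / 1024 ))
}

fpo_pool_stanza fast 1024 128
echo "chunk size: $(fpo_chunk_mib 1024 128) MiB"
```

With a 1024K block size and a blockGroupFactor of 128, 128 blocks are laid out sequentially, which gives a 128 MiB chunk.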
- NSD:
- Every local disk to be used by GPFS must have a matching entry in the disk file.
- NSD stanzas have this format:
%nsd:
  device=DiskName                                                            # device name that appears in /dev
  nsd=NsdName                                                                # name of the NSD to be created
  servers=ServerList                                                         # comma-separated list of NSD server nodes
  usage={dataOnly | metadataOnly | dataAndMetadata | descOnly | localCache}  # disk usage
  failureGroup=FailureGroup                                                  # the failure group to which this disk belongs
  pool=StoragePool                                                           # the name of the storage pool to which the NSD is assigned
[root@viknode1 ~]# ls /dev/dm-3
/dev/dm-3
[root@viknode2 ~]# ls /dev/dm-4
/dev/dm-4
[root@viknode3 ~]# ls /dev/dm-5
/dev/dm-5
[root@viknode1 ~]# cat /tmp/newStanzaFile
%pool:
  pool=fast
  layoutMap=cluster
  blocksize=1024K
  allowWriteAffinity=yes  # this option enables FPO feature
  writeAffinityDepth=1    # place 1st copy on disks local to the node writing data
  blockGroupFactor=128    # Defines chunk size of 128MB

%pool:
  pool=slow
  layoutMap=cluster
  blocksize=1024K
  allowWriteAffinity=yes  # this option enables FPO feature
  writeAffinityDepth=1    # place 1st copy on disks local to the node writing data
  blockGroupFactor=128    # Defines chunk size of 128MB

# Disks in system pool are defined for metadata
%nsd:
  nsd=nsd2
  device=/dev/dm-3
  servers=viknode1
  usage=metadataOnly
  failureGroup=101
  pool=system

# Disks in fast pool
%nsd:
  nsd=nsd3
  device=/dev/dm-4
  servers=viknode2
  usage=dataOnly
  failureGroup=102
  pool=fast

# Disk(s) in slow pool
%nsd:
  nsd=nsd4
  device=/dev/dm-5
  servers=viknode3
  usage=dataOnly
  failureGroup=103
  pool=slow
Here, I have three pools -
1) System pool - Created by default by the installer toolkit. I will use it to store metadata.
2) Fast pool - For fast disks. Used to store data.
3) Slow pool - For slow disks. Used to store data.
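Before handing the stanza file to GPFS, it can help to confirm that every %nsd stanza carries all the clauses described earlier. The following is only a sketch of such a pre-flight check: check_nsd_stanzas is a hypothetical helper, not a GPFS command, and it assumes that the six clauses device, nsd, servers, usage, failureGroup and pool should all be present, as they are in my file.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not a GPFS command): read a stanza file on
# stdin and report every %nsd stanza that lacks one of the expected clauses.
check_nsd_stanzas() {
    awk '
        # Report the current %nsd stanza if any expected clause was not seen.
        function finish(    i, n, keys, missing) {
            if (!in_nsd) return
            n = split("device nsd servers usage failureGroup pool", keys, " ")
            missing = ""
            for (i = 1; i <= n; i++)
                if (!(keys[i] in seen)) missing = missing " " keys[i]
            if (missing != "") {
                printf "stanza %d missing:%s\n", count, missing
                bad = 1
            }
        }
        BEGIN { bad = 0; count = 0; in_nsd = 0 }
        { sub(/#.*/, "") }                 # strip comments
        /^[ \t]*%/ {                       # any stanza header closes the previous one
            finish()
            split("", seen)                # portable way to clear an array
            in_nsd = ($0 ~ /^[ \t]*%nsd:/)
            if (in_nsd) count++
        }
        in_nsd {                           # record every key=value clause
            for (f = 1; f <= NF; f++)
                if (split($f, kv, "=") == 2) seen[kv[1]] = 1
        }
        END { finish(); exit bad }
    '
}

# Sample input modelled on the stanza file above.
check_nsd_stanzas <<'EOF' && echo "all %nsd stanzas look complete"
%pool:
  pool=fast
  allowWriteAffinity=yes  # this option enables FPO feature
%nsd:
  nsd=nsd3
  device=/dev/dm-4
  servers=viknode2
  usage=dataOnly
  failureGroup=102
  pool=fast
EOF
```

On a real node the same check would read the file directly, for example check_nsd_stanzas < /tmp/newStanzaFile. It only looks at clause names; it does not verify that the devices or servers exist.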
Let's create these NSDs.
[root@viknode1 ~]# mmcrnsd -F /tmp/newStanzaFile
Creating NSDs is an asynchronous process.
After the NSDs are created, you can check them using the mmlsnsd command.
[root@viknode1 ~]# mmlsnsd

 File system   Disk name    NSD servers
---------------------------------------------------------------------------
 (free disk)   nsd1         viknode1
 (free disk)   nsd2         viknode1
 (free disk)   nsd3         viknode2
 (free disk)   nsd4         viknode3
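If you want to script this check, a small filter over the mmlsnsd output is enough. The sketch below uses a hypothetical helper, free_nsds, which assumes the column layout shown above, with "(free disk)" in the first column for NSDs that do not yet belong to a file system.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read mmlsnsd output on stdin and print the names of
# NSDs that are not yet assigned to a file system ("(free disk)" rows).
free_nsds() {
    awk '$1 == "(free" && $2 == "disk)" { print $3 }'
}

# Sample input copied from the output above; on a live cluster this would be:
#   mmlsnsd | free_nsds
free_nsds <<'EOF'
 File system   Disk name    NSD servers
---------------------------------------------------------------------------
 (free disk)   nsd1         viknode1
 (free disk)   nsd2         viknode1
 (free disk)   nsd3         viknode2
 (free disk)   nsd4         viknode3
EOF
```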
Now we are going to create a GPFS file system on these NSDs. I am going with all default parameters, but you can tune them as per your requirements. Here is the guide to the mmcrfs command.
[root@viknode1 ~]# mmcrfs gpfs0 -F /tmp/newStanzaFile -T /ibm/gpfs0
Once the file system is created, you can check it with the mmlsfs command.
[root@viknode1 installer]# mmlsfs all

File system attributes for /dev/gpfs0:
======================================
flag                value                    description
------------------- ------------------------ -----------------------------------
 -f                 8192                     Minimum fragment size in bytes (system pool)
                    32768                    Minimum fragment size in bytes (other pools)
 -i                 4096                     Inode size in bytes
 -I                 16384                    Indirect block size in bytes
 -m                 1                        Default number of metadata replicas
 -M                 2                        Maximum number of metadata replicas
 -r                 1                        Default number of data replicas
 -R                 2                        Maximum number of data replicas
 -j                 cluster                  Block allocation type
 -D                 nfs4                     File locking semantics in effect
 -k                 nfs4                     ACL semantics in effect
 -n                 32                       Estimated number of nodes that will mount file system
 -B                 262144                   Block size (system pool)
                    1048576                  Block size (other pools)
 -Q                 none                     Quotas accounting enabled
                    none                     Quotas enforced
                    none                     Default quotas enabled
 --perfileset-quota No                       Per-fileset quota enforcement
 --filesetdf        No                       Fileset df enabled?
 -V                 15.01 (4.2.0.0)          File system version
 --create-time      Thu Apr 7 08:06:30 2016  File system creation time
 -z                 No                       Is DMAPI enabled?
 -L                 4194304                  Logfile size
 -E                 Yes                      Exact mtime mount option
 -S                 No                       Suppress atime mount option
 -K                 whenpossible             Strict replica allocation option
 --fastea           Yes                      Fast external attributes enabled?
 --encryption       No                       Encryption enabled?
 --inode-limit      65792                    Maximum number of inodes
 --log-replicas     0                        Number of log replicas
 --is4KAligned      Yes                      is4KAligned?
 --rapid-repair     Yes                      rapidRepair enabled?
 --write-cache-threshold 0                   HAWC Threshold (max 65536)
 -P                 system;fast;slow         Disk storage pools in file system
 -d                 nsd2;nsd3;nsd4           Disks in file system
 -A                 yes                      Automatic mount option
 -o                 none                     Additional mount options
 -T                 /ibm/gpfs0               Default mount point
 --mount-priority   0                        Mount priority
You can check storage pools with mmlspool command.
[root@viknode1 installer]# mmlspool gpfs0 all -L
Pool:
  name               = system
  poolID             = 0
  blockSize          = 256 KB
  usage              = metadataOnly
  maxDiskSize        = 98 GB
  layoutMap          = cluster
  allowWriteAffinity = no
  writeAffinityDepth = 0
  blockGroupFactor   = 1

Pool:
  name               = fast
  poolID             = 65537
  blockSize          = 1024 KB
  usage              = dataOnly
  maxDiskSize        = 64 GB
  layoutMap          = cluster
  allowWriteAffinity = yes
  writeAffinityDepth = 1
  blockGroupFactor   = 128

Pool:
  name               = slow
  poolID             = 65538
  blockSize          = 1024 KB
  usage              = dataOnly
  maxDiskSize        = 64 GB
  layoutMap          = cluster
  allowWriteAffinity = yes
  writeAffinityDepth = 1
  blockGroupFactor   = 128
'allowWriteAffinity = yes' in the above output shows that the disks in that pool are enabled for FPO.
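To pick out the FPO-enabled pools without reading the whole listing, the output can be filtered on that attribute. This is a minimal sketch; fpo_pools is a hypothetical helper, and it assumes each attribute is printed on its own line as "key = value", with the name line appearing before allowWriteAffinity within each pool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "mmlspool <fs> all -L" output on stdin and print
# the names of pools that report allowWriteAffinity = yes.
fpo_pools() {
    awk '
        $1 == "name" && $2 == "="                 { pool = $3 }
        $1 == "allowWriteAffinity" && $3 == "yes" { print pool }
    '
}

# Sample input trimmed from the output above; on a live cluster this would be:
#   mmlspool gpfs0 all -L | fpo_pools
fpo_pools <<'EOF'
Pool:
  name               = system
  allowWriteAffinity = no
Pool:
  name               = fast
  allowWriteAffinity = yes
Pool:
  name               = slow
  allowWriteAffinity = yes
EOF
```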
Let's mount this file system on all nodes.
[root@viknode1 ~]# mmmount gpfs0 -a
Wed Mar 16 10:40:42 EDT 2016: mmmount: Mounting file systems ...
[root@viknode1 ~]# mmlsmount gpfs0
File system gpfs0 is mounted on 5 nodes.
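In a script, the node count can be pulled out of the mmlsmount summary line and compared with the number of nodes in the cluster. The sketch below assumes the one-line summary format shown above; mounted_node_count is a hypothetical helper, and the expected value of 5 is simply the number of nodes in my setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read the mmlsmount summary on stdin and print the
# number of nodes on which the file system is mounted.
mounted_node_count() {
    awk '/ is mounted on [0-9]+ node/ { print $(NF - 1) }'
}

expected=5   # number of nodes in this setup
# On a live cluster the input would come from: mmlsmount gpfs0
actual="$(echo 'File system gpfs0 is mounted on 5 nodes.' | mounted_node_count)"
if [ "$actual" -eq "$expected" ]; then
    echo "gpfs0 is mounted on all $expected nodes"
else
    echo "gpfs0 is mounted on $actual of $expected nodes"
fi
```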
Enable protocols as per your requirements.
Don't forget to specify the correct file system and mount point for deploying protocols.
[root@viknode1 installer]# ./spectrumscale node list
[ INFO ] List of nodes in current configuration:
[ INFO ] [Installer Node]
[ INFO ] 10.0.100.71
[ INFO ]
[ INFO ] [Cluster Name]
[ INFO ] vwnode.gpfscluster
[ INFO ]
[ INFO ] [Protocols]
[ INFO ] Object : Enabled
[ INFO ] SMB : Enabled
[ INFO ] NFS : Enabled
[ INFO ]
[ INFO ] GPFS Node Admin Quorum Manager NSD Server Protocol GUI Server
[ INFO ] viknode1 X X X
[ INFO ] viknode2 X X
[ INFO ] viknode3 X X
[ INFO ] viknode4 X X
[ INFO ] viknode5 X X
[ INFO ]
[ INFO ] [Export IP address]
[ INFO ] 10.0.100.76 (pool)
[ INFO ] 10.0.100.77 (pool)
[root@viknode1 installer]# ./spectrumscale config protocols -f cesSharedRoot -m /ibm/cesSharedRoot
[root@viknode1 installer]# ./spectrumscale config object -f gpfs0 -m /ibm/gpfs0
Now you can deploy protocols and your setup will be ready with FPO.
[root@viknode1 installer]# ./spectrumscale deploy
For more details here are recommended videos -
Spectrum Scale (GPFS) for Hadoop Technical Introduction (Part 1 of 2)
Spectrum Scale (GPFS) for Hadoop Technical Introduction (Part 2 of 2)
Hi, are those nodes VMware virtual machines? And are the disks RDM disks? Can I use VMDK to set up FPO?
Yes. Those nodes were virtual machines in our development environment.
Spectrum Scale does not support VMDK disks. One has to use RDM; please refer to FAQ 7.3 on IBM Knowledge Center for more details.
https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#virtual