How to Create a Highly Available NFS Service with Gluster and Oracle Linux 8
Introduction
In this tutorial, we will create an NFS service hosted by three instances: ol-node01, ol-node02, and ol-node03. These instances will replicate a Gluster volume for data redundancy and use clustering tools for service redundancy.
A fourth instance named ol-client will mount this NFS service for demonstration and testing.
This tutorial is targeted at Oracle Linux 8 users.
Components
- Corosync provides clustering infrastructure to manage which nodes are involved, their communication, and quorum.
- Pacemaker manages cluster resources and rules of their behavior.
- Gluster is a scalable and distributed filesystem.
- Ganesha is an NFS server that can use many different backing filesystem types, including Gluster.
Objectives
In this tutorial, you’ll learn to:
- Create a Gluster volume
- Configure Ganesha
- Create a Cluster
- Create Cluster services
Prerequisites
- Four Oracle Linux 8 instances installed with the following configuration:
  - a non-root user with sudo permissions
  - an ssh keypair for the non-root user
  - the ability to ssh from one host (ol-node01) to the others (ol-node02, ol-node03) using passwordless ssh login
  - an additional block volume for use with Gluster
Setup Lab Environment
If not already connected, open a terminal and connect via ssh to each instance mentioned above.
ssh oracle@<ip_address_of_instance>
Install Software
Enable the required Oracle Linux repositories before installing the Corosync, Ganesha, Gluster, and Pacemaker software.
(On all nodes) Install the Gluster yum repository configuration.
sudo dnf install -y oracle-gluster-release-el8
(On all nodes) Enable the repositories.
sudo dnf config-manager --enable ol8_addons ol8_UEKR6 ol8_appstream
(On all nodes) Install the software.
sudo dnf install -y corosync glusterfs-server nfs-ganesha-gluster pacemaker pcs pcp-zeroconf fence-agents-all
Create the Gluster volume
Prepare each attached block volume to create and activate a replicated Gluster volume.
(On all nodes) Create an XFS filesystem on /dev/sdb with a label of gluster-000.

sudo mkfs.xfs -f -i size=512 -L gluster-000 /dev/sdb

- -f: Forces overwriting the device when an existing filesystem is detected.
- -i size: Sets the filesystem’s inode size, which defaults to 256 bytes.
- -L: Sets the filesystem label, which cannot exceed 12 characters in length.
(On all nodes) Create a mountpoint, add an fstab(5) entry for a disk with the label gluster-000, and mount the filesystem.

sudo mkdir -p /data/glusterfs/sharedvol/mybrick
echo 'LABEL=gluster-000 /data/glusterfs/sharedvol/mybrick xfs defaults 0 0' | sudo tee -a /etc/fstab > /dev/null
sudo mount /data/glusterfs/sharedvol/mybrick
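Optionally, confirm that the filesystem was created with the expected label and is mounted. The check below is one possible way to do this with lsblk; the exact columns shown vary by version.

lsblk -f /dev/sdb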
(On all nodes) Enable and start the Gluster service.
sudo systemctl enable --now glusterd
(On all nodes) Configure the firewall to allow traffic on the ports that are specifically used by Gluster.
sudo firewall-cmd --permanent --zone=trusted --add-source=10.0.0.0/24
sudo firewall-cmd --permanent --zone=trusted --add-service=glusterfs
sudo firewall-cmd --reload
(Optional) Ensure that each node has a resolvable name across all the nodes in the pool.
Configure DNS resolution for each hostname or use the /etc/hosts file instead. When using the hosts file, edit the file on each node and add entries for all Gluster nodes, as in the sketch below. The free lab environment already has name resolution configured.
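For example, entries along these lines could be appended to /etc/hosts on each node. The node IP addresses shown are placeholders for your own subnet; the nfs entry maps the floating IP address (10.0.0.100) used later in this tutorial.

sudo tee -a /etc/hosts > /dev/null <<'EOF'
10.0.0.11  ol-node01
10.0.0.12  ol-node02
10.0.0.13  ol-node03
10.0.0.100 nfs
EOF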
(On ol-node01) Create the Gluster environment by adding peers.
sudo gluster peer probe ol-node02
sudo gluster peer probe ol-node03
(On all nodes) Show that the peers have joined the environment.
sudo gluster peer status
Example Output:
Number of Peers: 2

Hostname: ol-node02
Uuid: 2607976e-7004-47e8-821c-7c6985961cda
State: Peer in Cluster (Connected)

Hostname: ol-node03
Uuid: c51cb4aa-fccd-47f7-9fb2-edb5766991d2
State: Peer in Cluster (Connected)
(On ol-node01) Create a Gluster volume named sharedvol, which replicates across the three hosts: ol-node01, ol-node02, and ol-node03.

sudo gluster volume create sharedvol replica 3 ol-node0{1,2,3}:/data/glusterfs/sharedvol/mybrick/brick
For more details on volume types, see the Creating and Managing Volumes section of the Oracle Linux Gluster Storage documentation.
(On ol-node01) Start the sharedvol Gluster volume.

sudo gluster volume start sharedvol
(On ol-node01) Verify that the replicated Gluster volume is now available from any node.
sudo gluster volume info
Example Output:
Volume Name: sharedvol
Type: Replicate
Volume ID: 1608bc61-cd4e-4b64-a5f3-f5800b717f76
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: ol-node01:/data/glusterfs/sharedvol/mybrick/brick
Brick2: ol-node02:/data/glusterfs/sharedvol/mybrick/brick
Brick3: ol-node03:/data/glusterfs/sharedvol/mybrick/brick
Options Reconfigured:
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
(On ol-node01) Get the status of the Gluster volume.
sudo gluster volume status
Example Output:
Status of volume: sharedvol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick ol-node01:/data/glusterfs/sharedvol/m
ybrick/brick                                49152     0          Y       78082
Brick ol-node02:/data/glusterfs/sharedvol/m
ybrick/brick                                49152     0          Y       77832
Brick ol-node03:/data/glusterfs/sharedvol/m
ybrick/brick                                49152     0          Y       77851
Self-heal Daemon on localhost               N/A       N/A        Y       78099
Self-heal Daemon on ol-node02               N/A       N/A        Y       77849
Self-heal Daemon on ol-node03               N/A       N/A        Y       77868

Task Status of Volume sharedvol
------------------------------------------------------------------------------
There are no active volume tasks
Configure Ganesha
Ganesha is the NFS server that shares out the Gluster volume. In this example, we allow any NFS client to connect to our NFS share with read/write permissions.
(On all nodes) Populate the file /etc/ganesha/ganesha.conf with the given configuration.

sudo tee /etc/ganesha/ganesha.conf > /dev/null <<'EOF'
EXPORT{
    Export_Id = 1 ;       # Unique identifier for each EXPORT (share)
    Path = "/sharedvol";  # Export path of our NFS share

    FSAL {
        name = GLUSTER;          # Backing type is Gluster
        hostname = "localhost";  # Hostname of Gluster server
        volume = "sharedvol";    # The name of our Gluster volume
    }

    Access_type = RW;           # Export access permissions
    Squash = No_root_squash;    # Control NFS root squashing
    Disable_ACL = FALSE;        # Enable NFSv4 ACLs
    Pseudo = "/sharedvol";      # NFSv4 pseudo path for our NFS share
    Protocols = "3","4" ;       # NFS protocols supported
    Transports = "UDP","TCP" ;  # Transport protocols supported
    SecType = "sys";            # NFS Security flavors supported
}
EOF
For more options to control permissions, see the EXPORT {CLIENT{}} section of config_samples-export in the Additional Information section.
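For example, adding a CLIENT block such as the following inside the EXPORT block would restrict which hosts match the export. This is only a sketch; the 10.0.0.0/24 subnet is an assumption based on the lab network used elsewhere in this tutorial.

    CLIENT {
        Clients = 10.0.0.0/24;  # Only clients in this subnet match this block
        Access_Type = RW;       # Permissions applied to the matching clients
    }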
Create a Cluster
Create and start a Pacemaker/Corosync cluster using the three ol-nodes.
(On all nodes) Set a shared password for the user hacluster.

echo "hacluster:oracle" | sudo chpasswd
(On all nodes) Enable the Corosync and Pacemaker services.
sudo systemctl enable corosync
sudo systemctl enable pacemaker
(On all nodes) Enable and start the configuration system service.
sudo systemctl enable --now pcsd
(On all nodes) Configure the firewall to allow traffic on the ports that are specifically used by High Availability.
sudo firewall-cmd --permanent --zone=trusted --add-service=high-availability
sudo firewall-cmd --reload
(On ol-node01) Authenticate with all cluster nodes using the hacluster user and password defined above.

sudo pcs host auth ol-node01 ol-node02 ol-node03 -u hacluster -p oracle
(On ol-node01) Create a cluster named HA-NFS.

sudo pcs cluster setup HA-NFS ol-node01 ol-node02 ol-node03
(On ol-node01) Start the cluster on all nodes.
sudo pcs cluster start --all
(On ol-node01) Enable the cluster to run on all nodes at boot time.
sudo pcs cluster enable --all
(On ol-node01) Disable STONITH.
STONITH is a feature of Linux for maintaining the integrity of nodes in a high-availability (HA) cluster. STONITH automatically powers down, or fences, a node that is not working correctly. An administrator may rely on STONITH when one of the nodes in a cluster becomes unreachable by the other node(s) in the cluster.
STONITH is disabled for simplicity in this lab, but disabling it is not recommended for production.
sudo pcs property set stonith-enabled=false
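In a production cluster, you would instead configure a fence device for each node. The command below is only a hedged sketch using the fence_ipmilan agent from the fence-agents-all package installed earlier; the management controller address and credentials are placeholders, and cloud deployments typically need a platform-specific fencing approach instead.

sudo pcs stonith create fence-ol-node01 fence_ipmilan pcmk_host_list="ol-node01" ip="203.0.113.10" username="fenceadmin" password="fencepass" lanplus=1 op monitor interval=60s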
(On any node) Check the cluster status.
The cluster is now running.
sudo pcs cluster status
Example Output:
Cluster Status:
 Cluster Summary:
   * Stack: corosync
   * Current DC: ol-node03 (version 2.1.0-8.0.1.el8-7c3f660707) - partition with quorum
   * Last updated: Wed May 4 16:47:55 2022
   * Last change: Wed May 4 16:47:47 2022 by hacluster via crmd on ol-node03
   * 3 nodes configured
   * 0 resource instances configured
 Node List:
   * Online: [ ol-node01 ol-node02 ol-node03 ]

PCSD Status:
  ol-node01: Online
  ol-node03: Online
  ol-node02: Online
(On any node) Check the cluster’s details, including resources, pacemaker status, and node details.
sudo pcs status
Example Output:
Cluster name: HA-NFS
Cluster Summary:
  * Stack: corosync
  * Current DC: ol-node03 (version 2.1.0-8.0.1.el8-7c3f660707) - partition with quorum
  * Last updated: Wed May 4 16:50:21 2022
  * Last change: Wed May 4 16:47:47 2022 by hacluster via crmd on ol-node03
  * 3 nodes configured
  * 0 resource instances configured

Node List:
  * Online: [ ol-node01 ol-node02 ol-node03 ]

Full List of Resources:
  * No resources

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
Create Cluster Services
Create a Pacemaker resource group containing the resources necessary to host NFS services from the hostname nfs (10.0.0.100), defined as a floating secondary IP address on ol-node01.
(On all nodes) Configure the firewall to allow traffic on the ports that are specifically used by NFS.
sudo firewall-cmd --permanent --zone=trusted --add-service=nfs
sudo firewall-cmd --reload
(On ol-node01) Create a systemd-based cluster resource to ensure nfs-ganesha is running.

sudo pcs resource create nfs_server systemd:nfs-ganesha op monitor interval=10s
(On ol-node01) Create an IP cluster resource used to present the NFS server.
sudo pcs resource create nfs_ip ocf:heartbeat:IPaddr2 ip=10.0.0.100 cidr_netmask=24 op monitor interval=10s
(On ol-node01) Join the Ganesha service and IP resource in a group to ensure they remain together on the same host.
sudo pcs resource group add nfs_group nfs_server nfs_ip
(On ol-node01) Verify the service is now running.
sudo pcs status
Example Output:
Cluster name: HA-NFS
Cluster Summary:
  * Stack: corosync
  * Current DC: ol-node03 (version 2.1.0-8.0.1.el8-7c3f660707) - partition with quorum
  * Last updated: Wed May 4 16:52:56 2022
  * Last change: Wed May 4 16:52:39 2022 by root via cibadmin on ol-node01
  * 3 nodes configured
  * 2 resource instances configured

Node List:
  * Online: [ ol-node01 ol-node02 ol-node03 ]

Full List of Resources:
  * Resource Group: nfs_group:
    * nfs_server  (systemd:nfs-ganesha):    Started ol-node01
    * nfs_ip      (ocf::heartbeat:IPaddr2): Started ol-node01

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
Note: The DC (Designated Controller) node is where all the decisions get made, and if the current DC fails, corosync elects a new one from the remaining cluster nodes. The choice of DC is of no significance to an administrator beyond the fact that its logs will generally be more interesting.
Update the IPaddr2 library configuration
When a node in the cluster stops responding, Pacemaker and Corosync call the IPaddr2 library.
We will customize this library to include details of our deployment, such as the VNIC OCIDs and IP addresses, and it will use those details when it calls the Oracle Cloud Infrastructure (OCI) Command Line Interface (CLI). The CLI does the heavy lifting by asking OCI to migrate the secondary IP address from one node to the other.
(On all nodes) Install the Oracle Linux Developer repository.
sudo dnf install -y oraclelinux-developer-release-el8
The repository is already installed and available in the free lab environment.
(On all nodes) Install the OCI CLI.
sudo dnf install -y python36-oci-cli
(On ol-node01) Verify the OCI CLI install.
The free lab environment uses Instance Principal authorization for the OCI CLI. For self-managed deployments, configure the same authorization or set up the OCI CLI configuration file.
export LC_ALL=C.UTF-8
oci os ns get --auth instance_principal
(On all nodes) Make a backup of the IPaddr2 file.
sudo cp /usr/lib/ocf/resource.d/heartbeat/IPaddr2 /usr/lib/ocf/resource.d/heartbeat/IPaddr2.bak
(On all nodes) Run the script to update the IPaddr2 file.
The script makes its changes within the add_interface() function because, when a node fails, Corosync/Pacemaker runs IPaddr2 to move the resources to another node in the cluster, and IPaddr2 calls this function during that process.
sudo ./update-ipaddr2.sh
Here is a sample version of the script for reference.
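As a rough, hypothetical sketch, the kind of logic such a script adds to add_interface() might look like the following. The VNIC OCIDs are placeholders for your own instances, and the oci network vnic assign-private-ip call moves the secondary IP address to the VNIC of the node taking over the resource.

##### Hypothetical additions near the top of add_interface() in IPaddr2
##### Substitute the placeholder OCIDs with the VNIC OCID of each node.
server="$(hostname -s)"
floating_ip="10.0.0.100"
case "${server}" in
  ol-node01) vnic="ocid1.vnic.oc1..<placeholder-for-ol-node01>" ;;
  ol-node02) vnic="ocid1.vnic.oc1..<placeholder-for-ol-node02>" ;;
  ol-node03) vnic="ocid1.vnic.oc1..<placeholder-for-ol-node03>" ;;
esac
export LC_ALL=C.UTF-8
# Ask OCI to move the secondary private IP to this node's VNIC
/usr/bin/oci network vnic assign-private-ip \
    --vnic-id "${vnic}" \
    --ip-address "${floating_ip}" \
    --unassign-if-already-assigned \
    --auth instance_principal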
Test NFS availability using a client
If not already open and connected, we recommend opening two terminal windows for these steps as we test failover with ol-node01 and ol-client.
If not already connected, open a terminal and connect via ssh to the ol-node01 and ol-client systems.
ssh oracle@<ip_address_of_instance>
(On ol-client) Mount the NFS service provided by our cluster and create a file.
sudo dnf install -y nfs-utils
sudo mkdir /sharedvol
sudo mount -t nfs nfs:/sharedvol /sharedvol
df -h /sharedvol/
echo "Hello from Oracle CloudWorld" | sudo tee /sharedvol/hello > /dev/null
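(Optional, on ol-client) To keep the mount across reboots, an fstab entry along these lines could be added; the _netdev mount option, which delays mounting until networking is up, is an assumption rather than part of the lab configuration.

echo 'nfs:/sharedvol /sharedvol nfs defaults,_netdev 0 0' | sudo tee -a /etc/fstab > /dev/null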
(On ol-node01) Identify the host running the nfs_group resources and put it in standby mode to stop running services.
sudo pcs status
Example Output:
Cluster name: HA-NFS
Cluster Summary:
  * Stack: corosync
  * Current DC: ol-node03 (version 2.1.0-8.0.1.el8-7c3f660707) - partition with quorum
  * Last updated: Thu May 5 00:48:07 2022
  * Last change: Thu May 5 00:47:50 2022 by root via crm_resource on ol-node01
  * 3 nodes configured
  * 2 resource instances configured

Node List:
  * Online: [ ol-node01 ol-node02 ol-node03 ]

Full List of Resources:
  * Resource Group: nfs_group:
    * nfs_server  (systemd:nfs-ganesha):    Started ol-node01
    * nfs_ip      (ocf::heartbeat:IPaddr2): Started ol-node01

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
sudo pcs node standby ol-node01
(On ol-node01) Verify that the nfs_group resources have moved to another node.
sudo pcs status
Example Output:
Cluster name: HA-NFS
Cluster Summary:
  * Stack: corosync
  * Current DC: ol-node03 (version 2.1.0-8.0.1.el8-7c3f660707) - partition with quorum
  * Last updated: Thu May 5 00:53:19 2022
  * Last change: Thu May 5 00:53:08 2022 by root via cibadmin on ol-node01
  * 3 nodes configured
  * 2 resource instances configured

Node List:
  * Node ol-node01: standby
  * Online: [ ol-node02 ol-node03 ]

Full List of Resources:
  * Resource Group: nfs_group:
    * nfs_server  (systemd:nfs-ganesha):    Started ol-node02
    * nfs_ip      (ocf::heartbeat:IPaddr2): Started ol-node02

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
(On ol-node02) Verify the floating IP address moved from ol-node01 to ol-node02.
ip a
Example Output:
...
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 02:00:17:06:6a:dd brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.151/24 brd 10.0.0.255 scope global dynamic ens3
       valid_lft 83957sec preferred_lft 83957sec
    inet 10.0.0.100/24 brd 10.0.0.255 scope global secondary ens3
       valid_lft forever preferred_lft forever
    inet6 fe80::17ff:fe06:6add/64 scope link
       valid_lft forever preferred_lft forever
(On ol-client) Verify the file is still accessible.
There may be a short delay as the service moves from one node to another.
sudo ls -la /sharedvol/
sudo cat /sharedvol/hello
(On ol-node01) Bring the standby node back into the cluster.
sudo pcs node unstandby ol-node01
(On ol-node01) Verify that the node is back in the cluster.
sudo pcs status
(On ol-node01) Move resources back to ol-node01.
sudo pcs resource move nfs_ip ol-node01
(On ol-node01) Verify that the resources moved back to ol-node01.
sudo pcs status
(On ol-node01) Verify the floating IP address moved from ol-node02 to ol-node01.
ip a
We now understand how to use Pacemaker/Corosync to create highly available services backed by Gluster.
Enable Gluster encryption
Create a self-signed certificate for each node and have its peers trust it.
For more options, see Setting up Transport Layer Security in the Gluster Storage for Oracle Linux User’s Guide.
(On all nodes) Create a private key and a certificate for this host signed with that key.
sudo openssl genrsa -out /etc/ssl/glusterfs.key 2048
sudo openssl req -new -x509 -days 365 -key /etc/ssl/glusterfs.key \
    -out /etc/ssl/glusterfs.pem \
    -subj "/CN=${HOSTNAME}/"
(On ol-node01) Combine the certificate from each node into one file all nodes can trust.
cat /etc/ssl/glusterfs.pem > ~/combined.ca.pem
ssh ol-node02 cat /etc/ssl/glusterfs.pem >> ~/combined.ca.pem
ssh ol-node03 cat /etc/ssl/glusterfs.pem >> ~/combined.ca.pem
(On ol-node01) Copy the combined list of trusted certificates to the local system of each node for Gluster use.
sudo cp ~/combined.ca.pem /etc/ssl/glusterfs.ca
scp ~/combined.ca.pem ol-node02:~
scp ~/combined.ca.pem ol-node03:~
ssh -t ol-node02 sudo cp ~/combined.ca.pem /etc/ssl/glusterfs.ca > /dev/null 2>&1
ssh -t ol-node03 sudo cp ~/combined.ca.pem /etc/ssl/glusterfs.ca > /dev/null 2>&1
- The -t option allows running remote ssh commands with sudo.
(On all nodes) Enable encryption for Gluster management traffic.
sudo touch /var/lib/glusterd/secure-access
(On ol-node01) Enable encryption on the Gluster volume sharedvol.

sudo gluster volume set sharedvol client.ssl on
sudo gluster volume set sharedvol server.ssl on
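Optionally, access to the encrypted volume can be limited to certificates with specific common names using the auth.ssl-allow volume option. The hostnames below assume the CN of each certificate matches the node's short hostname, as set when the certificates were created above.

sudo gluster volume set sharedvol auth.ssl-allow 'ol-node01,ol-node02,ol-node03'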
(On all nodes) Restart the Gluster service.
sudo systemctl restart glusterd
(On any node) Verify that the Gluster volume has transport encryption enabled.
sudo gluster volume info
Example Output:
Volume Name: sharedvol
Type: Replicate
Volume ID: 674b73a8-8c09-457e-8996-4417db16651e
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: ol-node01:/data/glusterfs/sharedvol/mybrick/brick
Brick2: ol-node02:/data/glusterfs/sharedvol/mybrick/brick
Brick3: ol-node03:/data/glusterfs/sharedvol/mybrick/brick
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
storage.fips-mode-rchecksum: on
client.ssl: on
server.ssl: on