Skip to the content.

glusterfs集群故障恢复

glusterfs集群故障恢复 以2台主机最简单的副本方式举例,安装步骤如下,先配置hosts,主机名,安装gluster服务

OS: CentOS 7.4
cat > /etc/hosts <<EOF
192.168.0.198 master
192.168.0.199 slave
EOF
hostnamectl --static set-hostname master/slave
exec $SHELL
yum install centos-release-gluster -y
yum install -y glusterfs glusterfs-server glusterfs-fuse glusterfs-rdma glusterfs-geo-replication glusterfs-devel
systemctl start glusterd && systemctl enable glusterd && systemctl status glusterd

添加其他节点,创建volume并启动

gluster peer probe slave
mkdir /data
gluster peer status
gluster volume create gfs-volume replica 2 transport tcp master:/data slave:/data force
ls -a /data/
gluster volume info
gluster volume start gfs-volume
gluster volume info gfs-volume

其他主机挂载gfs

yum install centos-release-gluster -y
yum install -y glusterfs-fuse

cat /etc/hosts
192.168.1.1 master

cat > /etc/fstab <<EOF
master:gfs-volume /opt glusterfs defaults,_netdev 0 0
EOF
mount -a

假如slave主机挂掉,并且不能启动,有2种方式可以恢复, 1,再创建一台主机,保持主机名和ip同故障主机一致 2,新创建一台主机,替换故障的主机

方案1,如果还是用192.168.0.199 这个ip作为slave节点,先安装glusterfs服务,不用启动 既然slave挂了,那么先要找到slave主机gfs的uuid,在正常的节点查看,这里是在master节点

# cat /var/lib/glusterd/peers/*
uuid=090f4559-e1a4-43ed-8a3d-2edd4042ce50
state=3
hostname1=slave

在挂掉的slave主机编辑gfs配置后,启动gfs服务

# cat /var/lib/glusterd/glusterd.info 
UUID=090f4559-e1a4-43ed-8a3d-2edd4042ce50
operating-version=1
# systemctl start glusterd

加入集群,检查状态,如果不ok,那么多重启几次glusterfs

gluster peer status
gluster peer probe master
gluster peer status
systemctl restart glusterd

同步数据

gluster volume info
gluster volume sync master all
cat /var/lib/glusterd/glusterd.info 
查看 operating-version 这个数值已经变动

查看状态

gluster volume heal gfs-volume info

方案2,替代故障主机

创建主机,主机名比如为 three,ip为 192.168.0.200,在正常运行的gfs主机添加新主机,替换故障主机

gluster peer probe three
gluster volume replace-brick gfs-volume slave:data three:/data commit force

查看状态

gluster volume heal gfs-volume info

因为gfs模式为replica ,到这里也就结束了,如果是Distributed ,需要rebalance

gluster volume rebalance gfs-volume fix-layout start

附: 移除一个gfs节点Remove a brick

gluster volume remove-brick gfs-volume replica 1 slave:/data start

移除slave主机的数据,只保留1份,即master 查看移除的状态

gluster volume remove-brick gfs-volume replica 1 slave:/data status

添加一个副本

gluster volume add-brick gfs-volume replica 2 slave:/data

参考文档

Managing GlusterFS Volumes GlusterFS Architecture

2018年08月06日 于 linux工匠 发表