GlusterFS

In a Kubernetes cluster you will inevitably run into the problem of data persistence: you need PVs and PVCs to store data on fixed storage nodes.

GlusterFS is a scalable distributed file storage solution, and among the provisioners supported by Kubernetes StorageClasses it is a recommended choice.

https://kubernetes.io/docs/concepts/storage/storage-classes/

Getting started

Official GlusterFS documentation: https://docs.gluster.org/en/latest/Quick-Start-Guide/Quickstart/

Official GlusterFS + Kubernetes documentation: https://github.com/gluster/gluster-kubernetes#quickstart

Because the file system's functionality is simple and relatively stable in operation, and once deployed it is rarely changed, there is no need to deploy GlusterFS with Kubernetes itself. Deploying it by hand is actually simpler and has fewer dependencies.

Preparation before installing

At least two servers, running in replication mode (one primary, one replica) for high availability and to prevent data loss. Each server: 2 CPUs, 2 GB RAM, a 100 GB SSD, at least 1 Gbps of network bandwidth, running Ubuntu 16.04.

Installation

sudo apt-get install software-properties-common
sudo apt-get update
sudo apt-get install glusterfs-server -y

Configuration

We have two servers, 10.103.1.11 and 10.103.1.119. On 10.103.1.11, run the following commands to probe 10.103.1.119 and form a cluster:

sudo gluster peer probe 10.103.1.119
sudo gluster peer status

Number of Peers: 1

Hostname: 10.103.1.119
Uuid: 60fedfe2-cb44-41f7-9b3d-cf2901793a85
State: Peer in Cluster (Connected)

Format the data disk on both servers and set it up to mount automatically:

mkfs.xfs -i size=512 -f /dev/vdb
# Mount the disk automatically at boot
echo "/dev/vdb /export/vdb xfs defaults 0 0"  >> /etc/fstab
# Create the mount point, mount it, and create the brick directory
mkdir -p /export/vdb && mount -a && mkdir -p /export/vdb/brick1
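
A quick sanity check (not part of the original notes) confirms the brick directory sits on the XFS mount:

df -h /export/vdb
mount | grep /export/vdb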

Create the first volume

Run on 10.103.1.11:

gluster volume create gv0 replica 2 10.103.1.119:/export/vdb/brick1 10.103.1.11:/export/vdb/brick1

Check the volume status with gluster volume info:

root@10-103-1-11:~# gluster volume info

Volume Name: gv0
Type: Replicate
Volume ID: 5faab39a-c2d4-47b8-a612-29f6d53616cf
Status: Created
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.103.1.119:/export/vdb/brick1
Brick2: 10.103.1.11:/export/vdb/brick1
Options Reconfigured:
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on

Start the volume:

root@10-103-1-11:~# gluster volume start gv0
volume start: gv0: success
root@10-103-1-11:~# gluster volume info

Volume Name: gv0
Type: Replicate
Volume ID: 5faab39a-c2d4-47b8-a612-29f6d53616cf
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.103.1.119:/export/vdb/brick1
Brick2: 10.103.1.11:/export/vdb/brick1
Options Reconfigured:
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on

Stop a volume

root@10-103-1-11:~# gluster
gluster> volume stop kube-vol
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
volume stop: kube-vol: success

Delete a volume

gluster> volume delete kube-vol
Deleting volume will erase all information about the volume. Do you want to continue? (y/n) y
volume delete: kube-vol: success

Mount a volume on a client

A volume can be mounted with mount just like an ordinary NFS share; the only requirement is that GlusterFS client support is installed.

This requires a separate client machine; install the GlusterFS client on it:

sudo apt-get install software-properties-common
sudo add-apt-repository ppa:gluster/glusterfs-3.8
sudo apt-get update
sudo apt-get install glusterfs-client

Mount the volume

sudo mkdir /mnt/my-vol
sudo mount -t glusterfs 10.103.1.11:/gv0 /mnt/my-vol
df -h /mnt/my-vol

root@10-10-61-207:~# df -h /mnt/my-vol
Filesystem        Size  Used Avail Use% Mounted on
10.103.1.11:/gv0  100G   33M  100G   1% /mnt/my-vol
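
To make the client mount survive reboots, an fstab entry can be added on the client. This is a sketch rather than a step from the original notes; _netdev is a common option for network filesystems so the mount waits for networking:

echo "10.103.1.11:/gv0 /mnt/my-vol glusterfs defaults,_netdev 0 0" | sudo tee -a /etc/fstab
sudo mount -a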

Enable disk quotas

Enable quotas on the server

gluster volume quota gv0 enable

Set a disk quota on a directory on the server

gluster volume quota gv0 limit-usage / 10GB
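
The configured limits can also be listed on the server (a standard GlusterFS command, not shown in the original):

gluster volume quota gv0 list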

Check the quota from the client

root@10-10-61-207:~# df -h /mnt/my-vol
Filesystem        Size  Used Avail Use% Mounted on
10.103.1.11:/gv0   20G     0   20G   0% /mnt/my-vol

Using GlusterFS from Kubernetes

Official Kubernetes example: https://github.com/kubernetes/examples/tree/master/staging/volumes/glusterfs

Add the Endpoints

Check the service port on the GlusterFS server

root@10-103-1-11:~# netstat -nptl
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      978/sshd
tcp        0      0 0.0.0.0:49157           0.0.0.0:*               LISTEN      1324/glusterfsd
tcp        0      0 0.0.0.0:24007           0.0.0.0:*               LISTEN      1144/glusterd
tcp6       0      0 :::22                   :::*                    LISTEN      978/sshd

Port 24007, which belongs to the glusterd process, is the service port.

Now create the Endpoints object:

gluster-endpoints.yaml
---
kind: Endpoints
apiVersion: v1
metadata:
  labels:
    app: external-glusterfs
  name: glusterfs
  namespace: common
subsets:
- addresses:
  - ip: 10.103.1.11
  - ip: 10.103.1.119
  ports:
  - port: 24007

Configure the Service

gluster-endpoints.yaml
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: external-glusterfs
  name: glusterfs
  namespace: common
spec:
  ports:
  - port: 24007
    protocol: TCP
    targetPort: 24007
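
Assuming both objects are saved in gluster-endpoints.yaml as above, apply the manifest and verify that the Endpoints and Service exist:

kubectl apply -f gluster-endpoints.yaml
kubectl -n common get endpoints glusterfs
kubectl -n common get service glusterfs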

Create a test Pod

The value of volumes[0].glusterfs.endpoints is the name of the Endpoints/Service created above.

volumes[0].glusterfs.path is the name of the GlusterFS volume created above.

---
apiVersion: v1
kind: Pod
metadata:
  name: glusterfs-debug
  namespace: common
spec:
  containers:
  - name: glusterfs
    image: nginx
    volumeMounts:
    - mountPath: "/mnt/glusterfs"
      name: glusterfsvol
  volumes:
  - name: glusterfsvol
    glusterfs:
      endpoints: glusterfs
      path: gv0
      readOnly: true

Check the Pod status and confirm it started successfully:

kubectl -n common describe pods/glusterfs-debug
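
Once the Pod is Running, a quick way to verify the volume is actually mounted (not part of the original notes) is to run df inside the container:

kubectl -n common exec glusterfs-debug -- df -h /mnt/glusterfs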

Installing and configuring Heketi

The approach above works when you only need a handful of PVs. If many Pods each need their own PV pointing at a different GlusterFS volume (that is, a different directory), manually creating a GlusterFS volume and then the matching PV every time quickly becomes tedious. We therefore need a way to generate the required resources automatically. Kubernetes supports custom StorageClasses for PVCs; combined with Heketi, the required storage resources can be created automatically, and every Deployment can get its own independent directory.
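
As a preview of where this is heading, here is a minimal sketch of a StorageClass backed by Heketi plus a PVC that uses it. The StorageClass/PVC names and the resturl value (wherever the Heketi API ends up being reachable) are assumptions for illustration; restauthenabled is "false" to match the use_auth: false setting in the heketi.json shown below.

---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: glusterfs-heketi               # hypothetical name
provisioner: kubernetes.io/glusterfs
parameters:
  resturl: "http://10.103.1.10:8080"   # assumed address of the Heketi API
  restauthenabled: "false"             # matches use_auth: false in heketi.json
  volumetype: "replicate:2"            # two-way replication, as in the manual setup
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-claim                       # hypothetical name
  namespace: common
spec:
  storageClassName: glusterfs-heketi
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 5Gi

A PVC created against such a StorageClass asks Heketi to carve out a new GlusterFS volume and binds a PV to it automatically.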

Reference documentation:

Add the Heketi configuration file:

/etc/heketi/heketi.json
{
  "_port_comment": "Heketi Server Port Number",
  "port": "8080",

  "_use_auth": "Enable JWT authorization. Please enable for deployment",
  "use_auth": false,

  "_jwt": "Private keys for access",
  "jwt": {
    "_admin": "Admin has access to all APIs",
    "admin": {
      "key": "My Secret"
    },
    "_user": "User only has access to /volumes endpoint",
    "user": {
      "key": "My Secret"
    }
  },

  "_glusterfs_comment": "GlusterFS Configuration",
  "glusterfs": {

    "_executor_comment": "Execute plugin. Possible choices: mock, ssh",
    "executor": "ssh",
    "sshexec": {
      "keyfile": "/etc/heketi/heketi_key",
      "user": "ubuntu",
      "port": "22",
      "fstab": "/etc/fstab",
      "sudo": true
    },

    "_db_comment": "Database file name",
    "db": "/var/lib/heketi/heketi.db"
  }
}

Note that glusterfs.sshexec.sudo must be set to true here; otherwise the Heketi service will complain about missing permissions.
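
The sshexec executor also relies on the private key at /etc/heketi/heketi_key to give Heketi passwordless SSH access (with passwordless sudo) as the ubuntu user on every GlusterFS node. The original notes do not show this step; a sketch of how the key might be prepared:

# Generate a dedicated key pair for Heketi (no passphrase)
ssh-keygen -t rsa -N '' -f heketi_key
# Install the public key for the ubuntu user on each GlusterFS node
ssh-copy-id -i heketi_key.pub ubuntu@10.103.1.11
ssh-copy-id -i heketi_key.pub ubuntu@10.103.1.119
# Put the private key into the directory that will be mapped to /etc/heketi in the container
cp heketi_key /Users/xiaohui/coding/heketi/heketi_key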

Start the Heketi service. For easier debugging we run it locally with Docker; in production we would probably run it on a dedicated machine, or as a Pod inside Kubernetes.

docker run --name heketi -v /Users/xiaohui/coding/heketi:/etc/heketi heketi/heketi

Here we map the local directory /Users/xiaohui/coding/heketi to /etc/heketi inside the container.

Put the heketi.json file above into the local directory /Users/xiaohui/coding/heketi, and add a topology file (the layout of the GlusterFS cluster):

/etc/heketi/topology.json
{
  "clusters": [
    {
      "nodes": [
        {
          "node": {
            "hostnames": {
              "manage": [
                "10.103.1.11"
              ],
              "storage": [
                "10.103.1.11"
              ]
            },
            "zone": 1
          },
          "devices": [
            "/dev/sdb"
          ]
        },
        {
          "node": {
            "hostnames": {
              "manage": [
                "10.103.1.119"
              ],
              "storage": [
                "10.103.1.119"
              ]
            },
            "zone": 1
          },
          "devices": [
            "/dev/sdb"
          ]
        }
      ]
    }
  ]
}

The purpose of this file is to define the GlusterFS server addresses and their disks.

Then open a new terminal and enter the Heketi container:

docker exec -it heketi bash

After installing GlusterFS by hand on Ubuntu, loading the cluster topology with heketi-cli may fail with the following error:

[root@b5ff52589fc7 ~]# heketi-cli topology load --json /etc/heketi/topology.json
Creating cluster ... ID: ea58edbcd5880a426771e73d0bd5c0df
	Allowing file volumes on cluster.
	Allowing block volumes on cluster.
	Creating node 10.103.1.11 ... Unable to create node: New Node doesn't have glusterd running
	Creating node 10.103.1.119 ... Unable to create node: New Node doesn't have glusterd running

Looking at the logs on the Heketi HTTP server side, you will find this error:

[negroni] Started GET /queue/a569809c6f54685276f2331aa4891e5f
[negroni] Completed 500 Internal Server Error in 103.1µs
[negroni] Started POST /nodes
[cmdexec] INFO 2018/08/04 10:29:40 Check Glusterd service status in node 10.103.1.11
[cmdexec] DEBUG 2018/08/04 10:29:41 /src/github.com/heketi/heketi/pkg/utils/ssh/ssh.go:174: Host: 10.103.1.11:22 Command: /bin/bash -c 'systemctl status glusterd'

The cause can be found here: systemctl status glusterd cannot find the service, because on Ubuntu the installed GlusterFS service is named glusterfs-server.

root@10-103-1-11:~# systemctl status glusterfs-server.service
● glusterfs-server.service - LSB: GlusterFS server
   Loaded: loaded (/etc/init.d/glusterfs-server; bad; vendor preset: enabled)
  Drop-In: /etc/systemd/system/glusterfs-server.service.d
           └─override.conf
   Active: active (running) since Sat 2018-08-04 18:26:14 CST; 6min ago
     Docs: man:systemd-sysv-generator(8)
  Process: 9703 ExecStart=/etc/init.d/glusterfs-server start (code=exited, status=0/SUCCESS)
    Tasks: 40
   Memory: 126.6M
      CPU: 1.775s

At the moment Heketi hard-codes this service name and it cannot be configured: https://github.com/heketi/heketi/blob/6d80b73a799084cd28776e25857225c645756aaf/executors/cmdexec/peer.go#L72

Therefore our only option is to create a systemd alias: glusterd.service => glusterfs-server.service.

The procedure is as follows:

root@10-103-1-11:~# systemctl edit glusterfs-server.service

Add the following content:

[Install]
Alias=glusterd.service

Find the path of the generated unit file, /run/systemd/generator.late/glusterfs-server.service:

root@10-103-1-11:~# systemctl cat glusterfs-server.service
# /run/systemd/generator.late/glusterfs-server.service
# Automatically generated by systemd-sysv-generator

[Unit]
Documentation=man:systemd-sysv-generator(8)
SourcePath=/etc/init.d/glusterfs-server
Description=LSB: GlusterFS server
Before=multi-user.target
Before=multi-user.target
Before=multi-user.target

Create a symlink:

ln -sf /run/systemd/generator.late/glusterfs-server.service /etc/systemd/system/glusterd.service

Enable and start the service:

systemctl daemon-reload
systemctl enable glusterd
systemctl start glusterd
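
The service name Heketi checks for should now resolve; a quick check (not in the original output):

systemctl status glusterd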

Now load the GlusterFS cluster topology file again:

[root@c63599638219 /]# heketi-cli topology load --json /etc/heketi/topology.json
	Found node 10.103.1.11 on cluster 2748c838c9e5433c140dd580ce8d92ba
		Adding device /dev/vdb ... OK
	Found node 10.103.1.119 on cluster 2748c838c9e5433c140dd580ce8d92ba
		Adding device /dev/vdb ... OK

Note that the disk /dev/vdb must be empty and must not be mounted anywhere, otherwise adding it will fail. Heketi does this to protect the data on your disks: if you point it at the wrong disk, you will not lose its data.

If the disk has indeed already been mounted or formatted, do the following:

# Unmount the disk
umount /dev/vdb
# Remove the mount entry
vi /etc/fstab
# Wipe the filesystem/partition signatures
wipefs /dev/vdb -a
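
Before re-running the topology load you can confirm the disk no longer carries any filesystem signature, for example with lsblk (not shown in the original):

lsblk -f /dev/vdb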

Testing heketi-cli commands

List clusters

[root@c63599638219 /]# heketi-cli cluster list
Clusters:
Id:2748c838c9e5433c140dd580ce8d92ba [file][block]

List nodes

[root@c63599638219 /]# heketi-cli node list
Id:783947cc37876e44c1fdee6f1690c1d7	Cluster:2748c838c9e5433c140dd580ce8d92ba
Id:e935a6a52e562477da79d6667401a505	Cluster:2748c838c9e5433c140dd580ce8d92ba

Create a volume

[root@c63599638219 /]# heketi-cli volume create --size=10 --replica=2  --name my-vol
Name: my-vol
Size: 10
Volume Id: 66f29b294df96b378548f10ee898eaf5
Cluster Id: 2748c838c9e5433c140dd580ce8d92ba
Mount: 10.103.1.119:my-vol
Mount Options: backup-volfile-servers=10.103.1.11
Block: false
Free Size: 0
Block Volumes: []
Durability Type: replicate
Distributed+Replica: 2

Creating a volume may fail with the error "/usr/sbin/thin_check: No such file or directory"; in that case install thin-provisioning-tools on every GlusterFS server: apt install thin-provisioning-tools
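
The Mount and Mount Options fields in the volume-create output above are what a client would use to mount the Heketi-managed volume. As a sketch (the mount point is arbitrary):

sudo mkdir -p /mnt/heketi-vol
sudo mount -t glusterfs -o backup-volfile-servers=10.103.1.11 10.103.1.119:my-vol /mnt/heketi-vol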

List volumes

[root@c63599638219 /]# heketi-cli volume list
Id:66f29b294df96b378548f10ee898eaf5    Cluster:2748c838c9e5433c140dd580ce8d92ba    Name:my-vol

Show the topology

[root@c63599638219 /]# heketi-cli topology info

Cluster Id: 2748c838c9e5433c140dd580ce8d92ba

    File:  true
    Block: true

    Volumes:

	Name: my-vol
	Size: 10
	Id: 66f29b294df96b378548f10ee898eaf5
	Cluster Id: 2748c838c9e5433c140dd580ce8d92ba
	Mount: 10.103.1.119:my-vol
	Mount Options: backup-volfile-servers=10.103.1.11
	Durability Type: replicate
	Replica: 2
	Snapshot: Disabled

		Bricks:
			Id: 44b8c2bdbd04fbeadd197e10a25701b0
			Path: /var/lib/heketi/mounts/vg_26cea42d1a71151288db4cab8477bc86/brick_44b8c2bdbd04fbeadd197e10a25701b0/brick
			Size (GiB): 10
			Node: e935a6a52e562477da79d6667401a505
			Device: 26cea42d1a71151288db4cab8477bc86

			Id: de363568400f13f28e0bb83845c549bd
			Path: /var/lib/heketi/mounts/vg_75a66dbe5b2c64db3678f052837fc9c2/brick_de363568400f13f28e0bb83845c549bd/brick
			Size (GiB): 10
			Node: 783947cc37876e44c1fdee6f1690c1d7
			Device: 75a66dbe5b2c64db3678f052837fc9c2


    Nodes:

	Node Id: 783947cc37876e44c1fdee6f1690c1d7
	State: online
	Cluster Id: 2748c838c9e5433c140dd580ce8d92ba
	Zone: 1
	Management Hostnames: 10.103.1.119
	Storage Hostnames: 10.103.1.119
	Devices:
		Id:75a66dbe5b2c64db3678f052837fc9c2   Name:/dev/vdb            State:online    Size (GiB):99      Used (GiB):10      Free (GiB):89
			Bricks:
				Id:de363568400f13f28e0bb83845c549bd   Size (GiB):10      Path: /var/lib/heketi/mounts/vg_75a66dbe5b2c64db3678f052837fc9c2/brick_de363568400f13f28e0bb83845c549bd/brick

	Node Id: e935a6a52e562477da79d6667401a505
	State: online
	Cluster Id: 2748c838c9e5433c140dd580ce8d92ba
	Zone: 1
	Management Hostnames: 10.103.1.11
	Storage Hostnames: 10.103.1.11
	Devices:
		Id:26cea42d1a71151288db4cab8477bc86   Name:/dev/vdb            State:online    Size (GiB):99      Used (GiB):10      Free (GiB):89
			Bricks:
				Id:44b8c2bdbd04fbeadd197e10a25701b0   Size (GiB):10      Path: /var/lib/heketi/mounts/vg_26cea42d1a71151288db4cab8477bc86/brick_44b8c2bdbd04fbeadd197e10a25701b0/brick
[root@c63599638219 /]#

Show device info

[root@c63599638219 /]# heketi-cli device info 26cea42d1a71151288db4cab8477bc86
Device Id: 26cea42d1a71151288db4cab8477bc86
Name: /dev/vdb
State: online
Size (GiB): 99
Used (GiB): 10
Free (GiB): 89
Bricks:
Id:44b8c2bdbd04fbeadd197e10a25701b0   Size (GiB):10      Path: /var/lib/heketi/mounts/vg_26cea42d1a71151288db4cab8477bc86/brick_44b8c2bdbd04fbeadd197e10a25701b0/brick

Delete a volume

[root@c63599638219 /]# heketi-cli volume delete 66f29b294df96b378548f10ee898eaf5
Volume 66f29b294df96b378548f10ee898eaf5 deleted
