- You need a shared folder visible to all ES nodes. One typical setup is NFS.
1. On one ES node, set up an NFS server with a large volume.
2. On all other ES nodes, mount it as an NFS client.
3. Create an ES snapshot repository pointing at the shared folder.
4. Create an ES snapshot of selected indices, or of all of them.
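The steps above can be sketched as follows. This is a sketch under assumptions, not the original procedure: the mount point `/mnt/es_backup`, the repository name `nfs_repo`, the index patterns, and the cluster address `localhost:9200` are all placeholders to adjust for your environment.

```shell
#!/bin/sh
# Assumed layout: an NFS share mounted at the SAME path on every ES node,
# and an ES node answering on localhost:9200.
MOUNT_POINT=/mnt/es_backup
ES_HOST=localhost:9200

# Steps 1+2: mount the share on each client node (server export not shown;
# commented out because it needs root and a real NFS server).
# mount -t nfs nfs-server:/export/es_backup "$MOUNT_POINT"

# Step 3: register the repository. The location must also be whitelisted
# via "path.repo" in elasticsearch.yml on every node.
curl -X PUT "$ES_HOST/_snapshot/nfs_repo" -H 'Content-Type: application/json' -d '{
  "type": "fs",
  "settings": { "location": "'"$MOUNT_POINT"'" }
}'

# Step 4: snapshot selected indices (drop "indices" to snapshot everything).
curl -X PUT "$ES_HOST/_snapshot/nfs_repo/snapshot_1?wait_for_completion=true" \
  -H 'Content-Type: application/json' -d '{ "indices": "logs-*,users" }'
```

With `wait_for_completion=true` the call blocks until the snapshot finishes, which makes return-code checking in scripts straightforward.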
- Unmount the NFS share immediately if you don't need it. The NFS service is troublesome: it can introduce unreasonably high CPU load on all nodes. Let me repeat: unreasonably high!
- Always check the return code before running the next step. Without this principle, any automation is dangerous.
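A minimal sketch of that principle in shell; `run_step` is a hypothetical helper, not from the original, that aborts the whole procedure on the first non-zero return code:

```shell
#!/bin/sh
# Abort the procedure as soon as any step returns non-zero.
run_step() {
  "$@"
  rc=$?
  if [ "$rc" -ne 0 ]; then
    echo "step failed (rc=$rc): $*" >&2
    exit "$rc"
  fi
}

# Hypothetical pipeline: each step runs only if the previous one succeeded.
run_step echo "snapshot created"
run_step echo "snapshot verified"
echo "all steps succeeded"
```

The same idea can be had wholesale with `set -e`, but an explicit wrapper lets you log which step failed and with which code.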
- Full backup vs. incremental backup; rsync vs. scp.
Previously I needed to migrate a big system from one data center to another. One major challenge: how do you migrate a 3TB ES cluster (10TB of data) with minimum downtime?
Here is what I did:
1. Perform a first round of backup and restore. No downtime for this.
2. Run a second round of backup. As an incremental backup, it's fast.
3. Use rsync, not scp, to copy terabytes of data across the WAN. rsync can resume interrupted transfers and skips files that are already up to date.
4. Rsyncing from N nodes to M nodes is faster than 1-to-1.
5. Run the second round of restore in cluster2. It's relatively fast.
- Record timing data for future reference: how large the original data and the backup set are, and how long each critical step (backup/copy/restore) takes.
```shell
# wrap curl with time command
time curl ...
# wrap rsync with time command
time rsync ...
```
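Beyond wrapping individual commands in `time`, a small helper (hypothetical, not from the original) can append each step's duration to a log, so successive migration runs are easy to compare:

```shell
#!/bin/sh
# Record how long each named step takes, in seconds.
TIMING_LOG=$(mktemp)

timed_step() {
  name=$1; shift
  start=$(date +%s)
  "$@"
  end=$(date +%s)
  echo "$name took $((end - start))s" >> "$TIMING_LOG"
}

timed_step backup sleep 1   # stand-in for: curl ... (snapshot call)
timed_step copy   sleep 1   # stand-in for: rsync ... (WAN copy)

cat "$TIMING_LOG"
```

Keeping the log alongside the backup-set sizes makes it easy to estimate the downtime window for the next migration.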