[[chapter_pmxcfs]]
ifdef::manvolnum[]
pmxcfs(8)
=========
:pve-toplevel:

NAME
----

pmxcfs - Proxmox Cluster File System

SYNOPSIS
--------

include::pmxcfs.8-synopsis.adoc[]

DESCRIPTION
-----------
endif::manvolnum[]

ifndef::manvolnum[]
Proxmox Cluster File System (pmxcfs)
====================================
:pve-toplevel:
endif::manvolnum[]

The Proxmox Cluster file system (``pmxcfs'') is a database-driven file
system for storing configuration files, replicated in real time to all
cluster nodes using `corosync`. We use this to store all {pve} related
configuration files.

Although the file system stores all data inside a persistent database
on disk, a copy of the data resides in RAM. This imposes a restriction
on the maximum size, which is currently 30MB. This is still enough to
store the configuration of several thousand virtual machines.

This system provides the following advantages:

* seamless replication of all configuration to all nodes in real time
* provides strong consistency checks to avoid duplicate VM IDs
* read-only when a node loses quorum
* automatic updates of the corosync cluster configuration to all nodes
* includes a distributed locking mechanism

POSIX Compatibility
-------------------

The file system is based on FUSE, so the behavior is POSIX like. But
some features are simply not implemented, because we do not need them:

* you can just generate normal files and directories, but no symbolic
  links, ...

* you can't rename non-empty directories (because this makes it easier
  to guarantee that VMIDs are unique).

* you can't change file permissions (permissions are based on path)

* `O_EXCL` creates are not atomic (like old NFS)

* `O_TRUNC` creates are not atomic (FUSE restriction)

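As a quick illustration, the following operations do not work on `/etc/pve`,
although they would succeed on a regular file system (host and file names are
placeholders; exact error messages may vary):

----
# fails: symbolic links are not supported by pmxcfs
ln -s /etc/pve/storage.cfg /etc/pve/storage-link.cfg

# fails: non-empty directories cannot be renamed
mv /etc/pve/nodes/node1 /etc/pve/nodes/node1-old

# has no effect or fails: permissions are derived from the path
chmod 640 /etc/pve/user.cfg
----
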
File Access Rights
------------------

All files and directories are owned by user `root` and have group
`www-data`. Only root has write permissions, but group `www-data` can
read most files. Files below the following paths:

 /etc/pve/priv/
 /etc/pve/nodes/${NAME}/priv/

are only accessible by root.

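For example, reading a key file below `priv/` (the file name is taken from the
file system layout listed below) works for `root`, but is denied for any other
user, including members of group `www-data`:

----
# works when run as root
cat /etc/pve/priv/authkey.key

# fails with a permission error for the www-data group
sudo -u www-data cat /etc/pve/priv/authkey.key
----
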
Technology
----------

We use the http://www.corosync.org[Corosync Cluster Engine] for
cluster communication, and http://www.sqlite.org[SQLite] for the
database file. The file system is implemented in user space using
http://fuse.sourceforge.net[FUSE].

File System Layout
------------------

The file system is mounted at:

 /etc/pve

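The mount is provided by the `pve-cluster` service (see the Recovery section
below). A quick way to check that the service is running and that the FUSE
mount is present is, for example:

----
systemctl status pve-cluster   # the service providing pmxcfs
findmnt /etc/pve               # should show a FUSE mount
----
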
Files
~~~~~

[width="100%",cols="m,d"]
|=======
|`corosync.conf`                        | Corosync cluster configuration file (prior to {pve} 4.x, this file was called `cluster.conf`)
|`storage.cfg`                          | {pve} storage configuration
|`datacenter.cfg`                       | {pve} datacenter-wide configuration (keyboard layout, proxy, ...)
|`user.cfg`                             | {pve} access control configuration (users/groups/...)
|`domains.cfg`                          | {pve} authentication domains
|`status.cfg`                           | {pve} external metrics server configuration
|`authkey.pub`                          | Public key used by the ticket system
|`pve-root-ca.pem`                      | Public certificate of the cluster CA
|`priv/shadow.cfg`                      | Shadow password file
|`priv/authkey.key`                     | Private key used by the ticket system
|`priv/pve-root-ca.key`                 | Private key of the cluster CA
|`nodes/<NAME>/pve-ssl.pem`             | Public SSL certificate for the web server (signed by the cluster CA)
|`nodes/<NAME>/pve-ssl.key`             | Private SSL key for `pve-ssl.pem`
|`nodes/<NAME>/pveproxy-ssl.pem`        | Public SSL certificate (chain) for the web server (optional override for `pve-ssl.pem`)
|`nodes/<NAME>/pveproxy-ssl.key`        | Private SSL key for `pveproxy-ssl.pem` (optional)
|`nodes/<NAME>/qemu-server/<VMID>.conf` | VM configuration data for KVM VMs
|`nodes/<NAME>/lxc/<VMID>.conf`         | VM configuration data for LXC containers
|`firewall/cluster.fw`                  | Firewall configuration applied to all nodes
|`firewall/<NAME>.fw`                   | Firewall configuration for individual nodes
|`firewall/<VMID>.fw`                   | Firewall configuration for VMs and containers
|=======

Symbolic links
~~~~~~~~~~~~~~

[width="100%",cols="m,m"]
|=======
|`local`       | `nodes/<LOCAL_HOST_NAME>`
|`qemu-server` | `nodes/<LOCAL_HOST_NAME>/qemu-server/`
|`lxc`         | `nodes/<LOCAL_HOST_NAME>/lxc/`
|=======

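For example, on a node named `node1` (a placeholder host name), the `local`
link resolves as follows:

----
# readlink /etc/pve/local
nodes/node1
----
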
Special status files for debugging (JSON)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

[width="100%",cols="m,d"]
|=======
|`.version`    | File versions (to detect file modifications)
|`.members`    | Info about cluster members
|`.vmlist`     | List of all VMs
|`.clusterlog` | Cluster log (last 50 entries)
|`.rrd`        | RRD data (most recent entries)
|=======

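These status files can be read like any other file. For example, `.members`
gives a quick overview of the cluster membership as seen by the local node
(the output below is purely illustrative; node names, addresses and the exact
set of fields depend on your cluster and version):

----
# cat /etc/pve/.members
{
"nodename": "node1",
"version": 4,
"cluster": { "name": "mycluster", "version": 2, "nodes": 2, "quorate": 1 },
"nodelist": {
  "node1": { "id": 1, "online": 1, "ip": "192.168.0.1"},
  "node2": { "id": 2, "online": 1, "ip": "192.168.0.2"}
  }
}
----
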
Enable/Disable debugging
~~~~~~~~~~~~~~~~~~~~~~~~

You can enable verbose syslog messages with:

 echo "1" >/etc/pve/.debug

And disable verbose syslog messages with:

 echo "0" >/etc/pve/.debug

Recovery
--------

If you have major problems with your Proxmox VE host, for example hardware
issues, it can be helpful to copy the pmxcfs database file
`/var/lib/pve-cluster/config.db` and move it to a new Proxmox VE
host. On the new host (with nothing running), you need to stop the
`pve-cluster` service and replace the `config.db` file (required permissions
`0600`). Afterwards, adapt `/etc/hostname` and `/etc/hosts` according to the
lost Proxmox VE host, then reboot and check (and don't forget your
VM/CT data).

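A minimal sketch of this procedure, assuming the old database was saved as
`/root/config.db` on the new host (the save location is just an example):

----
systemctl stop pve-cluster                  # nothing must access /etc/pve
cp /root/config.db /var/lib/pve-cluster/config.db
chown root:root /var/lib/pve-cluster/config.db
chmod 0600 /var/lib/pve-cluster/config.db   # required permissions
# now adapt /etc/hostname and /etc/hosts to match the lost host, then reboot
----
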
Remove Cluster configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The recommended way is to reinstall the node after you have removed it from
your cluster. This makes sure that all secret cluster/ssh keys and any
shared configuration data are destroyed.

In some cases, you might prefer to put a node back into local mode without
reinstalling, which is described in
<<pvecm_separate_node_without_reinstall,Separate A Node Without Reinstalling>>.

Recovering/Moving Guests from Failed Nodes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For the guest configuration files in `nodes/<NAME>/qemu-server/` (VMs) and
`nodes/<NAME>/lxc/` (containers), {pve} sees the containing node `<NAME>` as the
owner of the respective guest. This concept enables the usage of local locks
instead of expensive cluster-wide locks for preventing concurrent guest
configuration changes.

As a consequence, if the owning node of a guest fails (e.g., because of a power
outage, fencing event, etc.), a regular migration is not possible (even if all
the disks are located on shared storage), because such a local lock on the
(dead) owning node is unobtainable. This is not a problem for HA-managed
guests, as {pve}'s High Availability stack includes the necessary
(cluster-wide) locking and watchdog functionality to ensure correct and
automatic recovery of guests from fenced nodes.

If a non-HA-managed guest only has shared disks (and no other local resources
which are only available on the failed node), a manual recovery
is possible by simply moving the guest configuration file from the failed
node's directory in `/etc/pve/` to an alive node's directory (which changes the
logical owner or location of the guest).

For example, recovering the VM with ID `100` from a dead `node1` to another
node `node2` works by running the following command as root on any member node
of the cluster:

 mv /etc/pve/nodes/node1/qemu-server/100.conf /etc/pve/nodes/node2/qemu-server/

WARNING: Before manually recovering a guest like this, make absolutely sure
that the failed source node is really powered off/fenced. Otherwise {pve}'s
locking principles are violated by the `mv` command, which can have unexpected
consequences.

WARNING: Guests with local disks (or other local resources which are only
available on the dead node) are not recoverable like this. Either wait for the
failed node to rejoin the cluster or restore such guests from backups.

ifdef::manvolnum[]
include::pve-copyright.adoc[]
endif::manvolnum[]