Disaster recovery

Metadata damage and repair

If a file system has inconsistent or missing metadata, it is considered damaged. You may find out about damage from a health message, or in some unfortunate cases from an assertion in a running MDS daemon.

Metadata damage can result either from data loss in the underlying RADOS layer (e.g. multiple disk failures that lose all copies of a PG), or from software bugs.

CephFS includes some tools that may be able to recover a damaged file system, but to use them safely requires a solid understanding of CephFS internals. The documentation for these potentially dangerous operations is on a separate page: Advanced: Metadata repair tools.
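As a first diagnostic step (a sketch, assuming a file system named <fs_name> with a single active rank 0), cluster health and the MDS damage table can be inspected with:

ceph health detail
ceph tell mds.<fs_name>:0 damage ls

Entries in the damage table indicate which metadata (for example dentries, directory fragments, or backtraces) the MDS has flagged as damaged.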

Data pool damage (files affected by lost data PGs)

If a PG is lost in a data pool, then the file system will continue to operate normally, but some parts of some files will simply be missing (reads will return zeros).

Losing a data PG may affect many files. Files are split into many objects, so identifying which files are affected by loss of particular PGs requires a full scan over all object IDs that may exist within the size of a file. This type of scan may be useful for identifying which files require restoring from a backup.
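As a concrete illustration (with a hypothetical inode number), a 40 MB file written with the default 4 MB object size is stored as ten RADOS objects named 10000000abc.00000000 through 10000000abc.00000009, each of which may map to a different PG; this is why every possible object index up to the file's size has to be checked.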

Danger

The pg_files command described below does not repair any metadata, so when restoring files you must remove the damaged file and replace it with a new copy so that it is written to a fresh inode. Do not overwrite damaged files in place.

If you know that objects have been lost from PGs, use the pg_files subcommand to scan for files that may have been damaged as a result:

cephfs-data-scan pg_files <path> <pg id> [<pg id>...]

For example, if you have lost data from PGs 1.4 and 4.5, and you would like to know which files under /home/bob might have been damaged:

cephfs-data-scan pg_files /home/bob 1.4 4.5

The output will be a list of paths to potentially damaged files, one per line.
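When restoring a file from that list, a minimal sketch of the remove-then-replace pattern required by the danger note above (hypothetical paths, assuming a backup copy of the file exists) is:

rm /home/bob/damaged-file
cp /backups/bob/damaged-file /home/bob/damaged-file

Removing the file first ensures that the restored copy is written to a fresh inode rather than reusing the damaged one.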

Note that this command acts as a normal CephFS client to find all the files in the file system and read their layouts, so the MDS must be up and running.
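Because the scan runs through a normal client mount, it can help to confirm that an MDS is active before starting, for example (assuming the file system is named <fs_name>):

ceph fs status <fs_name>

The output should show at least one MDS rank in the active state.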

Using first-damage.py

  1. Unmount all clients.

  2. Flush the journal if possible:

    ceph tell mds.<fs_name>:0 flush journal
    
  3. Fail the file system:

    ceph fs fail <fs_name>
    
  4. Recover dentries from the journal. If the MDS flushed the journal successfully, this will be a no-op:

    cephfs-journal-tool --rank=<fs_name>:0 event recover_dentries summary
    
  5. Reset the journal:

    cephfs-journal-tool --rank=<fs_name>:0 journal reset --yes-i-really-mean-it
    
  6. Run first-damage.py to list damaged dentries (the <pool> argument is the file system's metadata pool):

    python3 first-damage.py --memo run.1 <pool>
    
  7. Optionally, remove the damaged dentries:

    python3 first-damage.py --memo run.2 --remove <pool>
    

    Note

    Use --memo to specify a different file in which to record the objects that have already been traversed. This keeps the results of separate, independent runs apart.

    This command removes a dentry from the snapshot or head (in the current hierarchy). The inode’s linkage will be lost. The inode may, however, be recoverable in lost+found during a future cephfs-data-scan recovery.
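
Once the repair is complete, the file system that was failed in step 3 will normally need to be brought back online. Assuming the usual workflow after ceph fs fail, this is done by marking the file system joinable again so that standby MDS daemons can take over its ranks:

ceph fs set <fs_name> joinable true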
