4.7. Troubleshooting CouchDB 3 with WeatherReport

4.7.1. Overview

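WeatherReport is an OTP application and set of tools that diagnoses common problems that can affect a CouchDB version 3 node or cluster (version 4 or later is not supported). It is accessed through the weatherreport command-line escript.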

Here is a basic example of using weatherreport, followed immediately by the command's output:

$ weatherreport --etc /path/to/etc
[warning] Cluster member [email protected] is not connected to this node. Please check whether it is down.

4.7.2. Usage

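In most cases, you can simply run the weatherreport command as shown above. Sometimes, however, you may want extra detail or want to run only specific checks. Command-line options let you do this. Run weatherreport --help to learn more about these options: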

$ weatherreport --help
Usage: weatherreport [-c <path>] [-d <level>] [-e] [-h] [-l] [check_name ...]

  -c, --etc                 Path to the CouchDB configuration directory
  -d, --level               Minimum message severity level (default: notice)
  -l, --list                Describe available diagnostic tasks
  -e, --expert              Perform more detailed diagnostics
  -h, --help                Display help/usage
  check_name                A specific check to run

To see which checks will be run, use the --list option:

$ weatherreport --list
Available diagnostic checks:

  custodian            Shard safety/liveness checks
  disk                 Data directory permissions and atime
  internal_replication Check the number of pending internal replication jobs
  ioq                  Check the total number of active IOQ requests
  mem3_sync            Check there is a registered mem3_sync process
  membership           Cluster membership validity
  memory_use           Measure memory usage
  message_queues       Check for processes with large mailboxes
  node_stats           Check useful erlang statistics for diagnostics
  nodes_connected      Cluster node liveness
  process_calls        Check for large numbers of processes with the same current/initial call
  process_memory       Check for processes with high memory usage
  safe_to_rebuild      Check whether the node can safely be taken out of service
  search               Check the local search node is responsive
  tcp_queues           Measure the length of tcp queues in the kernel
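
If a script needs the check names rather than the human-readable listing, they can be pulled out of the --list output. The following is a minimal POSIX shell sketch, not part of WeatherReport: list_check_names is a hypothetical helper, and it assumes the layout shown above (each check on its own line, indented by two spaces, name first) stays the same.

```shell
#!/bin/sh
# list_check_names: read `weatherreport --list` output on stdin and
# print one check name per line.
# Assumption: check lines are indented by exactly two spaces and start
# with the check name, as in the sample listing; the header line and
# blank lines do not match and are skipped.
list_check_names() {
    awk '/^  [a-z0-9_]+ / { print $1 }'
}

# Example use (requires a working weatherreport installation):
#   weatherreport --list | list_check_names
```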

To see all the details of what WeatherReport is doing, run the checks at a more verbose logging level with the --level option:

$ weatherreport --etc /path/to/etc --level debug
[debug] Not connected to the local cluster node, trying to connect. alive:false connect_failed:undefined
[debug] Starting distributed Erlang.
[debug] Connected to local cluster node '[email protected]'.
[debug] Local RPC: mem3:nodes([]) [5000]
[debug] Local RPC: os:getpid([]) [5000]
[debug] Running shell command: ps -o pmem,rss -p 73905
[debug] Shell command output:
%MEM    RSS
0.3  25116

[debug] Local RPC: erlang:nodes([]) [5000]
[debug] Local RPC: mem3:nodes([]) [5000]
[warning] Cluster member [email protected] is not connected to this node. Please check whether it is down.
[info] Process is using 0.3% of available RAM, totalling 25116 KB of real memory.

Most of the time you will want the default, but any syslog severity name works. From most to least verbose: debug, info, notice, warning, error, critical, alert, emergency.
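
To make the severity ordering concrete, here is a minimal POSIX shell sketch, not part of WeatherReport, that filters previously saved weatherreport output by level. The names severity_rank and filter_by_level are hypothetical helpers. The sketch assumes every message line begins with a [level] tag, as in the sample output above; untagged lines, such as the ps table in the debug output, are dropped.

```shell
#!/bin/sh
# severity_rank NAME: print the position of a syslog severity name,
# from most verbose (0) to least verbose (7), or -1 if unknown.
severity_rank() {
    case "$1" in
        debug)     echo 0 ;;
        info)      echo 1 ;;
        notice)    echo 2 ;;
        warning)   echo 3 ;;
        error)     echo 4 ;;
        critical)  echo 5 ;;
        alert)     echo 6 ;;
        emergency) echo 7 ;;
        *)         echo -1 ;;
    esac
}

# filter_by_level MIN: read tagged lines on stdin and print only those
# whose [level] tag is at or above MIN in the ordering above.
filter_by_level() {
    min=$(severity_rank "$1")
    if [ "$min" -lt 0 ]; then
        echo "unknown severity level: $1" >&2
        return 2
    fi
    while IFS= read -r line; do
        case "$line" in
            "["*"]"*)
                level=${line#\[}      # drop the leading "["
                level=${level%%\]*}   # keep the text before the first "]"
                ;;
            *)
                continue              # untagged line, skip it
                ;;
        esac
        if [ "$(severity_rank "$level")" -ge "$min" ]; then
            printf '%s\n' "$line"
        fi
    done
}

# Example use on a saved report:
#   weatherreport --etc /path/to/etc --level debug > report.log
#   filter_by_level warning < report.log
```

Capturing once at debug level and filtering afterwards avoids rerunning the checks when a different level of detail is needed.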

Finally, to run only a single diagnostic or a specific list of diagnostics, pass their names on the command line:

$ weatherreport --etc /path/to/etc nodes_connected
[warning] Cluster member [email protected] is not connected to this node. Please check whether it is down.
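
Selected checks can also be wrapped for use in a monitoring job. The following is a minimal POSIX shell sketch under stated assumptions, not an official integration: check_cluster is a hypothetical helper, and because this section does not describe weatherreport's exit status, the sketch decides success by scanning the output for lines tagged warning or more severe.

```shell
#!/bin/sh
# check_cluster ETC_DIR [check_name ...]
# Runs weatherreport with the given configuration directory and optional
# check names, echoes its output, and returns:
#   0 if no line is tagged warning, error, critical, alert or emergency
#   1 if at least one such line is present
#   2 on a usage error or if weatherreport cannot be found
check_cluster() {
    if [ "$#" -lt 1 ]; then
        echo "usage: check_cluster ETC_DIR [check_name ...]" >&2
        return 2
    fi
    etc_dir=$1
    shift
    if ! command -v weatherreport >/dev/null 2>&1; then
        echo "weatherreport not found on PATH" >&2
        return 2
    fi
    # Capture stdout and stderr together so nothing is lost.
    output=$(weatherreport --etc "$etc_dir" "$@" 2>&1)
    printf '%s\n' "$output"
    # grep -c prints the number of matching lines; "|| true" keeps a
    # zero count from being treated as a failure of this step.
    problems=$(printf '%s\n' "$output" |
        grep -c -E '^\[(warning|error|critical|alert|emergency)\]' || true)
    [ "$problems" -eq 0 ]
}

# Example use:
#   check_cluster /path/to/etc nodes_connected memory_use || echo "needs attention"
```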