Introduction to a Self Managed Life: a 13 hour & 28 minute presentation by FUTO software
== Step 2: Creating a Complete ZFS Monitoring Script with Logging ==

<span id="create-log-directory"></span>
==== 2.1 Create Log Directory ====

<pre>sudo mkdir -p /var/log/zfs-monitor
sudo chown root:root /var/log/zfs-monitor
sudo chmod 755 /var/log/zfs-monitor</pre>

<span id="make-the-monitoring-script"></span>
==== 2.2 Make the Monitoring Script ====

<pre>sudo -u root nano /root/zfs_health_check.sh</pre>

Copy and paste this complete script:

<pre>#!/bin/bash

# Configuration
EMAIL="l.a.rossmann@gmail.com"
HOSTNAME=$(hostname)
LOG_FILE="/var/log/zfs-monitor/health_check.log"
LOG_MAX_SIZE=$((10 * 1024 * 1024))  # 10MB in bytes

# Email configuration
FROM_EMAIL="yourdriveisdead@stevesavers.com"
FROM_NAME="Steve"
REPLY_TO="Steve <steve@stevesavers.com>"  # Use a more consistent Reply-To address
RETURN_PATH="bounce@stevesavers.com"  # A safe Return-Path address to handle bounces properly

# Create required directories
mkdir -p "$(dirname "$LOG_FILE")"

# Initialize error log
errors=""

# Logging functions
rotate_log() {
    # Try the BSD form of stat first, then fall back to the GNU/Linux form
    if [ -f "$LOG_FILE" ] && [ "$(stat -f%z "$LOG_FILE" 2>/dev/null || stat -c%s "$LOG_FILE")" -gt "$LOG_MAX_SIZE" ]; then
        mv "$LOG_FILE" "$LOG_FILE.old"
    fi
}

log_message() {
    echo -e "[$(date '+%Y-%m-%d %H:%M:%S')] $1" | tee -a "$LOG_FILE"
}

log_error() {
    local message="$1"
    errors="${errors}\n$message"
    log_message "ERROR: $message"
}

# Check overall pool status
check_pool_status() {
    while IFS= read -r pool; do
        status=$(zpool status "$pool")

        # Check for common failure keywords
        if echo "$status" | grep -E "DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED|FAIL|DESTROYED|SUSPENDED" > /dev/null; then
            log_error "ALERT: Pool $pool is not healthy:\n$status"
        fi

        # Check for errors
        if echo "$status" | grep -v "No known data errors" | grep -i "errors:" > /dev/null; then
            log_error "ALERT: Pool $pool has errors:\n$status"
        fi

        # Check scrub status
        if echo "$status" | grep "scan" | grep -E "scrub canceled|scrub failed" > /dev/null; then
            log_error "ALERT: Pool $pool has unusual scrub status:\n$(echo "$status" | grep "scan")"
        fi
    done < <(zpool list -H -o name)
}

# Check individual device status
check_devices() {
    while IFS= read -r pool; do
        devices=$(zpool status "$pool" | awk '/ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED/ {print $1,$2}')
        # Read from a here-string instead of a pipe: a piped while loop runs in a
        # subshell, so errors recorded inside it would never reach the email report
        while read -r device state; do
            # Skip the "state:" summary line, the pool's own row, and vdev group rows (mirror-0, raidz1-0, ...)
            case "$device" in
                ""|state:|"$pool"|mirror*|raidz*) continue ;;
            esac
            if [ "$state" != "ONLINE" ]; then
                log_error "ALERT: Device $device in pool $pool is $state"
            fi
        done <<< "$devices"
    done < <(zpool list -H -o name)
}

# Check capacity threshold (80% by default)
check_capacity() {
    while IFS= read -r pool; do
        capacity=$(zpool list -H -p -o capacity "$pool")
        if [ "$capacity" -ge 80 ]; then
            log_error "WARNING: Pool $pool is ${capacity}% full"
        fi
    done < <(zpool list -H -o name)
}

# Check dataset properties
check_dataset_properties() {
    while IFS= read -r dataset; do
        # Skip base pools
        if ! echo "$dataset" | grep "/" > /dev/null; then
            continue
        fi

        # Check if compression is enabled
        compression=$(zfs get -H compression "$dataset" | awk '{print $3}')
        if [ "$compression" = "off" ]; then
            log_error "WARNING: Compression is disabled on dataset $dataset"
        fi

        # Check if dataset is mounted
        mounted=$(zfs get -H mounted "$dataset" | awk '{print $3}')
        if [ "$mounted" = "no" ]; then
            log_error "WARNING: Dataset $dataset is not mounted"
        fi

        # Check available space
        available=$(zfs get -H available "$dataset" | awk '{print $3}')
        if [ "$available" = "0" ] || [ "$available" = "0B" ]; then
            log_error "CRITICAL: Dataset $dataset has no available space"
        fi
    done < <(zfs list -H -o name)
}

# Function to send email
send_email() {
    local subject="$1"
    local content="$2"

    {
        echo "Subject: $subject"
        echo "To: ${EMAIL}"
        echo "From: ${FROM_NAME} <${FROM_EMAIL}>"
        echo "Reply-To: ${REPLY_TO}"
        echo "Return-Path: ${RETURN_PATH}"
        echo "Content-Type: text/plain; charset=UTF-8"
        echo
        echo "$content"
    } | sendmail -t
}

# Main execution
rotate_log
log_message "Starting ZFS health check"

# Run all checks
check_pool_status
check_devices
check_capacity
check_dataset_properties

# Send notification if there are errors
if [ -n "$errors" ]; then
    log_message "Issues detected - sending email alert"
    subject="Storage Alert: Issues Detected on ${HOSTNAME}"  # Simplified subject line
    content=$(echo -e "ZFS Health Monitor Report from ${HOSTNAME}\n\nThe following issues were detected:${errors}")
    send_email "$subject" "$content"
else
    log_message "All ZFS checks passed successfully"
fi</pre>

<span id="set-proper-permissions-2"></span>
==== 2.3 Set Proper Permissions ====

<pre>sudo -u root chmod +x /root/zfs_health_check.sh</pre>

<span id="test-the-script"></span>
==== 2.4 Test the Script ====

<pre>sudo /root/zfs_health_check.sh</pre>

<span id="make-sure-logging-works"></span>
==== 2.5 Make Sure Logging Works ====

<pre>tail -f /var/log/zfs-monitor/health_check.log</pre>

<span id="features-of-this-script"></span>
==== 2.6 Features of this Script ====

* '''Monitoring''':
** Pool health checks that tell you when your pool has issues BEFORE all your drives die
** Status checks for each individual device
** Capacity warnings once a pool is 80% full or more
** Dataset checks for disabled compression, unmounted datasets, and zero available space
* '''Email Alerts''':
** Sent only when issues are detected
** Include the details of each error

The script is now ready for cron job configuration and regular use. Cron jobs are tasks we tell the machine to perform at regular intervals, similar to setting a utility bill to autopay.

<span id="step-3-create-cron-job"></span>
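
On a healthy pool, the test run in 2.4 only ever exercises the "all checks passed" path. To see how the keyword check in <code>check_pool_status</code> reacts to a failing pool without actually degrading one, the same <code>grep</code> pattern can be run against canned text. This is only a rough sketch: the <code>sample_status</code> text is made up for illustration and is not real output from any machine, and <code>is_unhealthy</code> is a hypothetical helper name that does not exist in the script above.

```shell
#!/bin/bash

# Made-up example of what `zpool status` might print for a degraded mirror.
# This text is an assumption for illustration, not captured from a real pool.
sample_status='  pool: tank
 state: DEGRADED
config:

        NAME        STATE     READ WRITE CKSUM
        tank        DEGRADED     0     0     0
          mirror-0  DEGRADED     0     0     0
            sda     ONLINE       0     0     0
            sdb     FAULTED      0     0     0

errors: No known data errors'

# Same failure keywords the monitoring script greps for in check_pool_status
pattern="DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED|FAIL|DESTROYED|SUSPENDED"

# Hypothetical helper: succeeds (exit code 0) when any failure keyword appears in the text
is_unhealthy() {
    echo "$1" | grep -qE "$pattern"
}

if is_unhealthy "$sample_status"; then
    result="ALERT"
else
    result="OK"
fi

echo "$result"
```

With the sample text as written, the pattern matches and the sketch prints <code>ALERT</code>. Changing <code>DEGRADED</code> and <code>FAULTED</code> to <code>ONLINE</code> in the sample text should make it print <code>OK</code> instead, which is the path a healthy pool takes.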