Introduction to a Self Managed Life: a 13 hour & 28 minute presentation by FUTO software
==== 5.1 Creating the alert script ====

I’m not a programmer, so bear with me. This script is for my personal use, but I’m sharing it because it works. Here’s what you need to do:

# '''Edit Email Addresses''': You’ll need to change the email addresses in the script. This includes:
#* The recipient email
#* The sender email
#* The reply-to address
#* The return path for bounced emails
# '''Script Location''': Save the script at <code>/root/mdadm_alert.sh</code>

<pre>sudo -u root nano -w /root/mdadm_alert.sh</pre>

Enter the following:

<pre>#!/bin/bash
# thank you to stack overflow for giving me the courage to wade through 100s of posts and hack together something that looks like it works.

# stricter error handling
set -euo pipefail  # 'set -e' exits on errors, 'u' throws errors on unset variables, & 'pipefail' exits if any part of a pipeline fails
IFS=$'\n\t'        # Set IFS (Internal Field Separator) to newline & tab to avoid issues with spaces and other weird characters in filenames

# Configuration variables (where settings are stored)
EMAIL="l.a.rossmann@gmail.com"               # Email to send alerts to - EDIT THIS
HOSTNAME=$(hostname)                         # Pull the system's hostname dynamically and save it here
LOG_DIR="/var/log/mdadm-monitor"             # Directory path for where logs go
LOG_FILE="${LOG_DIR}/raid_health_check.log"  # Full path to the specific log file for RAID checks
LOG_MAX_SIZE=$((10 * 1024 * 1024))           # Maximum log file size in bytes (10 MB here)

# Email configuration for the alert message
FROM_EMAIL="yourdriveisdead@stevesavers.com"  # The email address that will appear as the sender - EDIT THIS
FROM_NAME="Steve"                             # Name of the sender - EDIT THIS
REPLY_TO="Steve <steve@stevesavers.com>"      # Reply-to email address - EDIT THIS
RETURN_PATH="bounce@stevesavers.com"          # Return path for bounced emails when email fails - EDIT THIS

# make empty variables & associative arrays
errors=""               # Empty variable to collect error messages
drive_health_report=""  # Another empty variable to store drive health details
declare -A RAID_ARRAYS  # array to keep track of RAID arrays we find, indexed by name like "boot"
declare -A SMART_SCORES # array to store SMART scores for drives, indexed by drive path

# Set up log directory and ensure permissions are correct
setup_logging() {
    # Make the log directory if it doesn't already exist; exit with error if I can't make the directory
    mkdir -p "$LOG_DIR" || { echo "ERROR: Cannot create log directory $LOG_DIR"; exit 1; }
    chmod 750 "$LOG_DIR"  # Set directory permissions to allow owner & group access but not others

    # Check if the log file exists and exceeds the max size limit ('stat -c%s' gives the size in bytes)
    if [ -f "$LOG_FILE" ] && [ "$(stat -c%s "$LOG_FILE")" -gt "$LOG_MAX_SIZE" ]; then
        mv "$LOG_FILE" "$LOG_FILE.old"  # Archive the old log file by renaming it
    fi

    touch "$LOG_FILE"      # Create an empty log file if it doesn't exist
    chmod 640 "$LOG_FILE"  # Set permissions on the log file (read/write for owner, read for group)
}

# Function for logging messages w/ timestamps
log_message() {
    local timestamp                              # Make local variable for this
    timestamp=$(date '+%Y-%m-%d %H:%M:%S')       # Generate a timestamp in this specific format
    echo "[$timestamp] $1" | tee -a "$LOG_FILE"  # Output the message with the timestamp to both console & log file
}

# Function for logging errors (adds them to the error string and logs them as "ERROR")
log_error() {
    local message="$1"             # Message passed to this function
    errors="${errors}\n$message"   # Append this message to the errors variable
    log_message "ERROR: $message"  # Log the error with a timestamp
}

# Check that required commands are installed on the system
check_dependencies() {
    log_message "Checking required dependencies..."  # Announce the check in the log
    local missing_deps=()                            # Initialize an empty array for any missing commands

    # Loop through each command we need, checking if it's available
    for dep in mdadm smartctl lsblk findmnt awk grep dmsetup; do
        if ! command -v "$dep" &>/dev/null; then  # If the command is missing, add it to the array
            missing_deps+=("$dep")
        fi
    done

    # If the array of missing dependencies isn't empty, log an error and exit
    if [ ${#missing_deps[@]} -ne 0 ]; then
        log_error "Missing required dependencies: ${missing_deps[*]}"  # Log missing commands
        log_error "Install them with: sudo apt-get install mdadm smartmontools util-linux findutils gawk grep dmsetup"
        exit 1  # Exit with error because we're missing something we need (install whatever is listed if you're getting this)
    fi
}

# Find & detect RAID arrays on this system
detect_raid_arrays() {
    log_message "Detecting RAID arrays..."  # Log that we're looking for RAID arrays

    # Find all block devices with names like /dev/md0, /dev/md1 (these are RAID arrays like the one you made for the OS & boot)
    local md_devices
    md_devices=$(find /dev -name 'md[0-9]*' -type b)  # Save this list to the md_devices variable

    # Loop through each RAID array found and log its details
    for md_dev in $md_devices; do
        local array_detail  # Temporary variable for array details
        array_detail=$(mdadm --detail "$md_dev" 2>/dev/null) || continue  # Get RAID details; skip if it fails

        # Extract the RAID array name from the details
        local array_name
        array_name=$(echo "$array_detail" | grep "Name" | awk '{print $NF}')  # Last word on the "Name" line is the array name

        # Use the name to decide if this array is for boot or root, then add it to RAID_ARRAYS
        if [[ "$array_name" == *"bootraid"* ]]; then       # Array name contains "bootraid"
            RAID_ARRAYS["boot"]="$md_dev"                   # Save the device path with the key "boot"
            log_message "Found boot array: $md_dev ($array_name)"
        elif [[ "$array_name" == *"osdriveraid"* ]]; then  # Array name contains "osdriveraid"
            RAID_ARRAYS["root"]="$md_dev"                   # Save the device path with the key "root"
            log_message "Found root array: $md_dev ($array_name)"
        fi
    done

    # Check if we actually found both root and boot arrays, and log an error if any are missing
    if [ -z "${RAID_ARRAYS["boot"]:-}" ] || [ -z "${RAID_ARRAYS["root"]:-}" ]; then
        log_error "Failed to detect both boot and root RAID arrays"            # Log a general error
        [ -z "${RAID_ARRAYS["boot"]:-}" ] && log_error "Boot array not found"  # Specific message if boot is missing
        [ -z "${RAID_ARRAYS["root"]:-}" ] && log_error "Root array not found"  # Specific message if root is missing
        return 1  # Return an error code
    fi

    # Print out a summary of all arrays found
    log_message "Detected arrays:"
    for purpose in "${!RAID_ARRAYS[@]}"; do
        log_message "  $purpose: ${RAID_ARRAYS[$purpose]}"
    done
}

# Check the health of a specific RAID array
check_array_status() {
    local array="$1"    # The path of the array device
    local purpose="$2"  # Either "boot" or "root" to clarify which array this is

    # Verify that the array actually exists as a block device
    if [ ! -b "$array" ]; then
        log_error "$purpose array device $array does not exist"  # Log the missing device
        return 1  # Return error because we can't check a nonexistent device
    fi

    # Get details about the RAID array and store it in the detail variable
    local detail
    detail=$(mdadm --detail "$array" 2>&1) || {  # '2>&1' captures error output in case of issues
        log_error "Failed to get details for $purpose array ($array)"
        return 1  # Exit with an error code if it failed
    }

    # Extract the state of the array (like "clean" or "active") and log it
    local state
    state=$(echo "$detail" | grep "State :" | awk '{print $3,$4}')  # Get the words after "State :" from the details
    log_message "$purpose array status: $state"

    # If the array is in an undesirable state, log an error
    if [[ "$state" =~ degraded|DEGRADED|failed|FAILED|inactive|INACTIVE ]]; then
        log_error "$purpose array ($array) is in concerning state: $state"
    fi

    # Detect failed devices within the array
    local failed_devices
    failed_devices=$(echo "$detail" | grep "Failed Devices" | awk '{print $4}')  # Pull the failed devices count
    if [ "${failed_devices:-0}" -gt 0 ]; then  # If there are failed devices, go through each one (treat a missing count as 0)
        while read -r line; do
            if [[ "$line" =~ "faulty" ]]; then  # If the line mentions "faulty"
                local failed_dev
                failed_dev=$(echo "$line" | awk '{print $7}')  # Get the 7th word (the device name)
                log_error "$purpose array ($array) has failed device: $failed_dev"
            fi
        done < <(echo "$detail" | grep -A20 "Number" | grep "faulty")  # Look up to 20 lines after "Number" to find "faulty"
    fi

    # Check if any devices are rebuilding, and log it if they are
    if echo "$detail" | grep -q "rebuilding"; then
        while read -r line; do
            if [[ "$line" =~ "rebuilding" ]]; then  # Check for "rebuilding" in the line
                local rebuilding_dev
                rebuilding_dev=$(echo "$line" | awk '{print $7}')  # Get the device name being rebuilt
                log_error "$purpose array ($array) is rebuilding device: $rebuilding_dev"
            fi
        done < <(echo "$detail" | grep -A20 "Number" | grep "rebuilding")  # Again, look ahead 20 lines for any "rebuilding" mention
    fi
}

# Function to check the health of each drive within a RAID array
check_drive_health() {
    local drive="$1"        # The drive device to check (e.g., /dev/sda)
    local health_score=100  # Initialize health score to 100 (a perfect score)
    local issues=""

    # Skip the check if it's not a valid block device
    if [ ! -b "$drive" ]; then
        log_error "Device $drive is not a block device"  # Log the invalid device
        return 1  # Exit with an error code
    fi

    log_message "Checking health of drive $drive..."  # Announce which drive we're checking

    # Run SMART health check and reduce health score if it fails
    if ! smartctl -H "$drive" | grep -q "PASSED"; then  # If it does NOT say "PASSED"
        health_score=$((health_score - 50))              # Drop score by 50 points if it fails
        issues+="\n- Overall health check failed"        # Record this specific issue
    fi

    # Collect SMART attributes for further checks
    local smart_attrs
    smart_attrs=$(smartctl -A "$drive" 2>/dev/null) || true  # Discard error output; keep going if smartctl returns non-zero

    # Check for reallocated sectors (sign of drive wear and tear)
    # Match on the ID column itself so the leading padding in the table doesn't matter
    local reallocated
    reallocated=$(echo "$smart_attrs" | awk '$1 == 5 {print $10}')  # Attribute ID 5, raw value in column 10
    if [ -n "$reallocated" ] && [ "$reallocated" -gt 0 ]; then
        health_score=$((health_score - 10))              # Drop health score by 10 if we have reallocated sectors
        issues+="\n- Reallocated sectors: $reallocated"  # Add to issues list
    fi

    # Check for pending sectors (could cause read/write errors)
    local pending
    pending=$(echo "$smart_attrs" | awk '$1 == 197 {print $10}')  # Attribute ID 197, raw value in column 10
    if [ -n "$pending" ] && [ "$pending" -gt 0 ]; then
        health_score=$((health_score - 10))      # Drop health score by 10 if pending sectors are present
        issues+="\n- Pending sectors: $pending"  # Add to issues list
    fi

    SMART_SCORES["$drive"]=$health_score  # Save the final score in SMART_SCORES array
    if [ "$health_score" -lt 100 ]; then
        drive_health_report+="\nDrive: $drive\nHealth Score: $health_score/100\nIssues:$issues"  # Append issues to report if any were found
    fi
}

# Send email if any errors or health issues were found
send_email() {
    local subject="RAID Alert: Issues Detected on ${HOSTNAME}"  # Set email subject line
    local content="RAID Health Monitor Report from ${HOSTNAME}\nTime: $(date '+%Y-%m-%d %H:%M:%S')\n"
    [ -n "$errors" ] && content+="\nRAID Issues:${errors}"  # Append RAID issues to the email content if any
    [ -n "$drive_health_report" ] && content+="\nDrive Health Report:${drive_health_report}"  # Append drive health report if any issues were found

    # Build the email using sendmail syntax
    {
        echo "Subject: $subject"
        echo "To: ${EMAIL}"
        echo "From: ${FROM_NAME} <${FROM_EMAIL}>"
        echo "Reply-To: ${REPLY_TO}"
        echo "Return-Path: ${RETURN_PATH}"
        echo "Content-Type: text/plain; charset=UTF-8"  # Text format for readability
        echo
        echo -e "$content"  # Use '-e' to allow newline characters
    } | sendmail -t  # Pipe the entire email message to sendmail for delivery
}

# Main function to execute checks and send email if needed
main() {
    # Make sure script is run as root for necessary permissions
    [ "$(id -u)" -ne 0 ] && { echo "ERROR: This script must be run as root"; exit 1; }

    setup_logging                             # Call function to initialize logging setup
    log_message "Starting RAID health check"  # Announce the start of the health check
    check_dependencies                        # Verify dependencies are available

    # '|| true' stops 'set -e' from killing the script on a failed check before the alert email goes out
    detect_raid_arrays || true                # Detect RAID arrays

    # Loop through each RAID array and check its status, then check each drive in the array
    for purpose in "${!RAID_ARRAYS[@]}"; do
        array="${RAID_ARRAYS[$purpose]}"
        check_array_status "$array" "$purpose" || true

        # For each device in the RAID array, check health
        while read -r device; do
            if [[ "$device" =~ ^/dev/ ]]; then
                check_drive_health "$device" || true
            fi
        done < <(mdadm --detail "$array" | grep "active sync" | awk '{print $NF}')
    done

    # Send an email if errors or health issues were found; otherwise, log a success message
    if [ -n "$errors" ] || [ -n "$drive_health_report" ]; then
        send_email
    else
        log_message "All checks passed successfully"
    fi
}

# Execute the main function to start everything
main  # Calls the main function, running all the checks</pre>

Set permissions properly so it can run:

<pre>sudo -u root chmod +x /root/mdadm_alert.sh</pre>

<span id="setting-up-the-cron-job"></span>
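
Before trusting the script to email you, it can help to look at exactly what text gets handed to <code>sendmail</code>. Here is a minimal sketch, assuming a hypothetical helper called <code>build_message</code> (it is not part of the script above) that prints a subset of the same headers to the terminal instead of sending anything. The <code>example.com</code> addresses are placeholders.

```shell
#!/bin/bash
# Hypothetical dry-run helper, NOT part of mdadm_alert.sh.
# It prints a sendmail-style message to stdout so you can read it first.
build_message() {
    local to="$1"          # recipient address
    local from_name="$2"   # display name of the sender
    local from_email="$3"  # sender address
    local subject="$4"     # subject line
    local body="$5"        # body text; may contain \n escapes

    printf 'Subject: %s\n' "$subject"
    printf 'To: %s\n' "$to"
    printf 'From: %s <%s>\n' "$from_name" "$from_email"
    printf 'Content-Type: text/plain; charset=UTF-8\n'
    printf '\n'            # blank line separates headers from the body
    printf '%b\n' "$body"  # %b expands \n the same way 'echo -e' does
}

# Placeholder values for illustration only
build_message "you@example.com" "Steve" "alerts@example.com" \
    "RAID Alert: dry run" \
    "RAID Issues:\n- this is a test message"
```

If the output looks right, piping it into <code>sendmail -t</code> (the same command the script uses) should deliver a test message, assuming mail delivery is already working on the machine.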