
I rather randomly checked the status of my RAID arrays with cat /proc/mdstat and realized that one of my arrays seems to be resyncing:

md1 : active raid1 sdb7[1] sdc7[0]
      238340224 blocks [2/2] [UU]
      [==========>..........]  resync = 52.2% (124602368/238340224) finish=75.0min speed=25258K/sec

Why is this happening, and what does it mean? I can seemingly access the mount point just fine with read/write permissions.
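To spot this from a script rather than by eye, here is a minimal sketch that prints each array with a sync operation in progress. It assumes the /proc/mdstat layout shown above (an `mdN :` header line followed by a progress line such as `resync = 52.2%`); the function name md_sync_progress is hypothetical and not part of mdadm.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of mdadm): list md arrays with a sync operation
# in progress. Reads mdstat-formatted text from a file, /proc/mdstat by default.
md_sync_progress() {
    local mdstat="${1:-/proc/mdstat}"
    awk '
        # Array header lines look like: "md1 : active raid1 sdb7[1] sdc7[0]"
        /^md[0-9]+ :/ { dev = $1 }
        # Progress lines contain e.g. "resync = 52.2%" (the kernel may also
        # report check, recovery or reshape here)
        match($0, /(resync|recovery|check|reshape) *= *[0-9.]+%/) {
            op = substr($0, RSTART, RLENGTH)
            gsub(/ /, "", op)    # "resync = 52.2%" -> "resync=52.2%"
            print dev, op
        }
    ' "$mdstat"
}
```

Called with no argument it reads /proc/mdstat; for the array above it prints `md1 resync=52.2%`, and it prints nothing when no array is syncing.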

EDIT 1 (in response to slm's answer)

I can't really see anything when I grep through dmesg, and the --detail switch doesn't tell me much either: it shows that the resync is in progress, but gives no hint about the reason or why the array might have gotten out of sync. I guess I might just need to keep an eye on it before I start swapping out my hardware.

slm
stdcerr

4 Answers


This would seem to indicate that the 2 members of the RAID are not staying in sync with each other.

1. Investigate logs

I'd investigate your dmesg logs and see whether there are any messages stating that either of the physical HDDs that make up this array is having hardware failures.
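A minimal sketch of that log check, assuming the kernel log is piped in (for example from `sudo dmesg`). The keyword list in the hypothetical disk_errors function is an assumption; drivers word their errors differently, so an empty result means "nothing obvious", not that the drives are healthy.

```shell
#!/usr/bin/env bash
# Hypothetical filter (not a standard tool): keep kernel log lines that mention
# md/RAID activity or common disk-error wording. The patterns are illustrative.
disk_errors() {
    grep -Ei 'md[0-9]+:|md/raid|i/o error|medium error|ata[0-9.]+: .*(error|failed|exception)'
}
```

Typical use: `sudo dmesg | disk_errors`.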

2. Check mdadm

You can also consult mdadm using the --detail switch to find out more information about the resync like so:

$ sudo mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.03
  Creation Time : Sat Jan 26 09:14:11 2008
     Raid Level : raid1
     Array Size : 976759936 (931.51 GiB 1000.20 GB)
  Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Fri Jan  1 01:29:16 2010
          State : clean, resyncing
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

 Rebuild Status : 50% complete

           UUID : 37a3bfcb:41393031:23c133e6:3b879f08
         Events : 0.2178969

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
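If you only care about the sync state, a small filter can pull the relevant fields out of that output. This is a sketch that assumes the `Key : value` layout shown above; mdadm versions may label the progress line differently (for example `Resync Status`), so it matches any key ending in `Status`. The function name md_detail_summary is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the State line and any "... Status" line from
# `mdadm --detail` output read on stdin.
md_detail_summary() {
    awk -F' : ' '
        # Keys are right-aligned with leading spaces; strip them before matching
        { key = $1; sub(/^ +/, "", key) }
        key ~ /^(State|[A-Za-z]+ Status)$/ { print key ": " $2 }
    '
}
```

For the example above, `sudo mdadm --detail /dev/md0 | md_detail_summary` prints `State: clean, resyncing` and `Rebuild Status: 50% complete`.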

If both devices seem fine and you cannot pinpoint which device is having an issue, you may want to temporarily run a diagnostic tool such as HDAT2 or SpinRite against each HDD to confirm their health.

3. Cabling

If the HDDs check out, I would start scrutinizing the cabling; I typically swap the cables out.

4. Controller

I'd next scrutinize the controller itself, either by taking the drives out of the affected system and diagnosing them in a secondary system, or by adding a third-party controller card to the affected system to diagnose the issue further.

5. Power supply

Believe it or not, I've had issues in the past with HDDs and RAIDs where swapping out a failing (or about-to-fail) power supply resolved my RAID health issues.

Anthon
slm
  • @cerr - yeah, if the resyncs just keep happening at what appear to be random intervals, then it's likely that one of the HDDs is on the way out, or that the cause is one of items 3, 4, or 5. I've had the symptoms you're describing a few times myself, and in the past it was those fixes that resolved the failures for me. – slm Sep 02 '14 at 03:17

Check your cron files: many distros run a scheduled resync/re-check once a week.

On CentOS 7.1 it's in /etc/cron.d/raid-check:

# Run system wide raid-check once a week on Sunday at 1am by default
0 1 * * Sun root /usr/sbin/raid-check

To configure the behaviour, edit /etc/sysconfig/raid-check.
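To see whether such a job exists on your system, here is a minimal sketch that lists non-comment cron entries mentioning the job names from this thread (raid-check, checkarray). The function name find_raid_cron_jobs is hypothetical, and other distros may name their jobs differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print non-comment lines in a cron directory
# (default /etc/cron.d) that mention raid-check or checkarray.
find_raid_cron_jobs() {
    local dir="${1:-/etc/cron.d}" f
    for f in "$dir"/*; do
        [ -f "$f" ] || continue    # also skips the literal glob when the dir is empty
        awk -v f="$f" '!/^[ \t]*#/ && /raid-check|checkarray/ { print f ": " $0 }' "$f"
    done
}
```

On the CentOS system above, `find_raid_cron_jobs` should print `/etc/cron.d/raid-check: 0 1 * * Sun root /usr/sbin/raid-check`.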

bebbo
Sergio

On Debian it is done from:

/etc/cron.d/mdadm

To disable:

chmod -x /usr/share/mdadm/checkarray

The cron job checks if checkarray is executable before running it.
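That guard amounts to "run it only if the executable bit is set". A minimal sketch of the pattern follows; run_if_executable is a hypothetical helper, not the actual contents of /etc/cron.d/mdadm.

```shell
#!/usr/bin/env bash
# Hypothetical helper illustrating the guard: run a command only when the file
# is executable, otherwise report that it was skipped.
run_if_executable() {
    local cmd="$1"
    shift
    if [ -x "$cmd" ]; then
        "$cmd" "$@"
    else
        echo "skipping: $cmd is not executable" >&2
    fi
}
```

After `chmod -x /usr/share/mdadm/checkarray`, `run_if_executable /usr/share/mdadm/checkarray` only prints the skip message, which is the effect the chmod relies on.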

See also.

sanmai
  • A better approach would be to set AUTOCHECK=false in /etc/default/mdadm. – kelnos Jul 06 '20 at 10:44
  • It seems a very bad idea to disable `checkarray`. If you don't want the cron job to run, just disable the cron job. – mivk Aug 07 '22 at 09:56
  • @mivk It is not. The cron job checks if `checkarray` is executable before running it. I would assume this is intended behaviour. – sanmai Aug 08 '22 at 09:30
  • Well, if you are absolutely certain that you will never want to run checkarray for any reason, you can certainly disable it. But if it is the cron job that bothers you, then it seems more reasonable to just disable that without breaking a core part of the mdadm installation. – mivk Aug 08 '22 at 12:03
  • I'm not concerned with the cron job, but I'm concerned if there's something else that might run `checkarray` and leave the database server in ruins for days. At least I was in 2017. – sanmai Aug 09 '22 at 13:46
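For kelnos' suggestion, here is a minimal sketch that sets AUTOCHECK=false in mdadm's defaults file, replacing an existing AUTOCHECK line or appending one. The function name disable_autocheck is hypothetical, it relies on GNU sed's `-i`, and it takes the path as an argument so you can try it on a copy first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set AUTOCHECK=false in /etc/default/mdadm (or the given file).
# Uses GNU sed -i; run as root against the real file.
disable_autocheck() {
    local conf="${1:-/etc/default/mdadm}"
    if grep -q '^AUTOCHECK=' "$conf" 2>/dev/null; then
        # Replace the existing setting in place
        sed -i 's/^AUTOCHECK=.*/AUTOCHECK=false/' "$conf"
    else
        # No setting yet: append one
        echo 'AUTOCHECK=false' >> "$conf"
    fi
}
```

To try it safely: `cp /etc/default/mdadm /tmp/mdadm.test && disable_autocheck /tmp/mdadm.test`.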

On newer Ubuntu releases (at least 22.04+), the RAID is checked/resynced by jobs started from systemd timers:

$ systemctl list-timers
NEXT                        LEFT                LAST                        PASSED        UNIT                           ACTIVATES                       
___________________________________________________________________________________________________________________________________________________
Tue 2023-06-06 12:52:04 PDT 5h 17min left       Mon 2023-06-05 02:36:42 PDT 1 day 4h ago  mdmonitor-oneshot.timer        mdmonitor-oneshot.service
Sun 2023-07-02 22:17:28 PDT 3 weeks 5 days left Sun 2023-06-04 21:31:43 PDT 1 day 10h ago mdcheck_start.timer            mdcheck_start.service
n/a                         n/a                 Tue 2023-06-06 03:17:46 PDT 4h 16min ago  mdcheck_continue.timer         mdcheck_continue.service

(other timers not shown)

As mentioned by others, mdcheck also runs a check/resync at least once a month to make sure your data is safe. If you have a lot of data (terabytes), it may take a long time.

You can get details about each entry using the show command:

systemctl show mdcheck_start

mdcheck_start is the service that starts a check, and mdcheck_continue makes sure that the check finishes (in case it was interrupted, possibly by a reboot).
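To pick the md-related rows out of the timer list, a small filter can key on the last two columns (UNIT and ACTIVATES), since the date columns contain spaces. This is a sketch assuming the column layout shown above; md_timers is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical filter: from `systemctl list-timers` output on stdin, print
# "UNIT -> ACTIVATES" for timer units whose name starts with "md".
md_timers() {
    # UNIT and ACTIVATES are the last two columns; earlier columns contain spaces
    awk 'NF >= 2 && $(NF-1) ~ /^md.*\.timer$/ { print $(NF-1), "->", $NF }'
}
```

On the system above, `systemctl list-timers --all | md_timers` would list mdmonitor-oneshot.timer, mdcheck_start.timer and mdcheck_continue.timer with the services they activate.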

How does the check work?

If you look at the mdadm man page, it says:

--action=

Set the "sync_action" for all md devices given to one of idle, frozen, check, repair. Setting to idle will abort any currently running action though some actions will automatically restart. Setting to frozen will abort any current action and ensure no other action starts automatically.

Details of check and repair can be found in md(4) under SCRUBBING AND MISMATCHES.

So we do:

man md

and search for SCRUBBING:

SCRUBBING AND MISMATCHES

As storage devices can develop bad blocks at any time it is valuable to regularly read all blocks on all devices in an array so as to catch such bad blocks early. This process is called scrubbing.

md arrays can be scrubbed by writing either check or repair to the file md/sync_action in the sysfs directory for the device.

[...]
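Following md(4), a check can be started by hand by writing to that sysfs file (as root). Here is a minimal sketch; md_scrub is a hypothetical helper, and the SYSFS_ROOT override exists only so the function can be tried against a fake directory tree.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a scrub action to /sys/block/<array>/md/sync_action.
# SYSFS_ROOT is an assumed testing override; it defaults to /sys. Needs root.
md_scrub() {
    local array="$1" action="${2:-check}"
    local f="${SYSFS_ROOT:-/sys}/block/$array/md/sync_action"
    case "$action" in
        check|repair|idle) ;;    # actions this sketch allows
        *) echo "unsupported action: $action" >&2; return 1 ;;
    esac
    [ -w "$f" ] || { echo "cannot write $f" >&2; return 1; }
    echo "$action" > "$f"
}
```

As root, `md_scrub md1` starts a check on md1, and its progress then shows up in /proc/mdstat.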

The mdcheck_start service asks md to run a check (by writing to md/sync_action), and mdcheck_continue makes sure the check actually gets resumed. This lets a reboot interrupt the process and have it restart cleanly.

Ryan Chen
Alexis Wilke