Here is an example of migrating a running Ubuntu system to a software RAID1.
The process requires two reboots.
The first step is to switch to the root user if you have not already done so:
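The command itself is not shown above; a typical way to do this (assuming sudo is configured for your user) is:

```shell
# Start an interactive root shell; alternatively: su -
sudo -i
```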
Let’s see a list of disks and partitions:
fdisk -l
fdisk -l | grep '/dev/sd'
lsblk -o NAME,UUID
Suppose the system uses one disk, for example /dev/sda, with one main partition, /dev/sda1.
For the test, I installed a clean Ubuntu Server 18.04 with the default partitioning; swap was a file on the same partition.
To create the RAID, we connect a second disk of the same size; it will be named /dev/sdb.
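The first step of such a migration can be sketched as follows (a sketch under assumptions, not the full procedure: it copies the MBR partition table to the new disk, sets the partition type to Linux raid autodetect, and creates a degraded RAID1 with the old disk still outside the array):

```shell
# Copy the partition table from /dev/sda to /dev/sdb (MBR)
sfdisk -d /dev/sda | sfdisk --force /dev/sdb

# Set partition 1 on /dev/sdb to type fd (Linux raid autodetect)
sfdisk --part-type /dev/sdb 1 fd

# Create a degraded RAID1 with the second member marked as missing;
# the running system on /dev/sda1 is copied over and added later
mdadm --create /dev/md0 --level=1 --raid-devices=2 missing /dev/sdb1
```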
I once replaced a failed drive in a software RAID1: I added the new disk to the array, it synchronized successfully, and I installed GRUB on it.
After a while I received an email message:
Subject: Cron <root@server> /usr/sbin/raid-check WARNING: mismatch_cnt is not 0 on /dev/md2
In my case, raid-check found that the mismatch_cnt counter is not 0 for /dev/md2, which means the disk may have bad sectors, or the array simply needs to be resynchronized. Since I installed GRUB after adding the disk to the raid, that is the likely cause.
Example of viewing the counters of all arrays:
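One way to do this in a single command (assuming the standard sysfs layout for md devices) is a shell glob:

```shell
# Print mismatch_cnt for every md array, prefixed with its file name
grep -H . /sys/block/md*/md/mismatch_cnt
```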
Or each in turn:
cat /sys/block/md0/md/mismatch_cnt
cat /sys/block/md1/md/mismatch_cnt
cat /sys/block/md2/md/mismatch_cnt
View the status of raids:
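For example, with the standard mdadm tools:

```shell
# Summary of all arrays
cat /proc/mdstat

# Detailed state of one array
mdadm --detail /dev/md2
```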
If mismatch_cnt is not 0 for any array, then you can try to resynchronize it:
echo 'repair' >/sys/block/md2/md/sync_action
After the repair finishes, run a check to verify that the counter has returned to 0:
echo 'check' >/sys/block/md2/md/sync_action
If you want to cancel the action:
echo 'idle' >/sys/block/md2/md/sync_action
Let’s see the synchronization status and other data of the array:
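For example, via /proc and sysfs (assuming /dev/md2 as above):

```shell
cat /proc/mdstat
cat /sys/block/md2/md/sync_action
cat /sys/block/md2/md/sync_completed
mdadm --detail /dev/md2
```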
How to fix the problem with mdadm disks
RAID arrays improve the reliability of data storage and, by combining several disks into one large volume, can increase disk performance. RAID can be implemented in hardware, in firmware, or in software.
I will describe several types:
RAID 0 (stripe) – improves read/write performance only and does not increase reliability. The full capacity of all disks is available to the user; if one disk fails, the array is usually destroyed and data recovery is almost impossible. The minimum number of disks is 2.
RAID 1 (mirror) – writes go to all disks synchronously, so they fully duplicate each other. Half of the disk space is available to the user. Performance increases only for reads, but this is a very reliable way to protect data. The minimum number of disks is 2.
RAID 10 (RAID 1+0) – a RAID 0 stripe over RAID 1 mirrors. Fast like RAID 0 and reliable like RAID 1. The minimum number of disks is 4, and their number must be even. Half of the disk space is available to the user.
RAID 0+1 – a RAID 1 mirror of RAID 0 stripes. Not popular, since its fault tolerance is worse than RAID 10.
RAID 1E – similar to RAID 10 but allows an odd number of disks; the minimum is 3.
RAID 5 – user-accessible space is reduced by one disk; reliability is lower than RAID 1, while read and write performance increases as in RAID 0. If one disk fails, the data can be rebuilt. The minimum number of disks is 3.
RAID 6 – similar to RAID 5, including in speed, but somewhat more reliable: the space available to the user is reduced by two disks, and data survives the failure of any two disks. The minimum number of disks is 4.
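The capacity rules in the list above can be expressed as a small shell function (an illustrative sketch; `raid_usable` is a made-up name, and it assumes n equal-size disks of s units each):

```shell
# Usable capacity for n equal-size disks of s (e.g. GB) each
raid_usable() {
  level=$1; n=$2; s=$3
  case $level in
    0)  echo $((n * s)) ;;         # stripe: full capacity
    1)  echo "$s" ;;               # mirror: one disk's worth
    10) echo $((n * s / 2)) ;;     # striped mirrors: half
    5)  echo $(( (n - 1) * s )) ;; # one disk's worth of parity
    6)  echo $(( (n - 2) * s )) ;; # two disks' worth of parity
  esac
}

raid_usable 6 4 120   # four 120 GB disks in RAID 6 -> prints 240
```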
I personally prefer to use RAID 1 and RAID 6.
I recommend reading my article Description of RAID types.
You can install mdadm in Ubuntu using the command:
sudo aptitude install mdadm
Or in CentOS/RHEL:
yum install mdadm
For the test I will build the RAID on Ubuntu 14.04; I switch to the root user right away (the commands below are similar for other operating systems):
First, let's see the list of disks (I have two unmounted disks of identical size, /dev/sdb and /dev/sdc):
fdisk -l
df -h
lsblk -o NAME,SIZE,FSTYPE,TYPE,MOUNTPOINT
Let’s create RAID 1:
mdadm --create --verbose /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc
Check the status of the array and its components by:
cat /proc/mdstat
mdadm --detail /dev/md0
mdadm -E /dev/sdb
mdadm -E /dev/sdc
Create a file system:
mkfs.ext4 -F /dev/md0
To mount the created RAID on the current system, create a directory and mount the array there:
mkdir -p /mnt/md0
mount /dev/md0 /mnt/md0
Let’s see the details of RAID:
mdadm --verbose --detail --scan
Save the changes:
mdadm --detail --scan | sudo tee -a /etc/mdadm/mdadm.conf
update-initramfs -u
echo '/dev/md0 /mnt/md0 ext4 defaults,nofail,discard 0 0' | tee -a /etc/fstab
Done, after rebooting the system, RAID will be automatically mounted.
To receive e-mail notifications about the RAID status, specify in the mdadm.conf configuration file which address to send to and which to send from (for the mail to actually be sent from the system, an MTA such as postfix must be installed):
MAILADDR firstname.lastname@example.org
MAILFROM email@example.com
Restart the monitoring service:
service mdadm restart
You can configure some parameters by answering the questions with the command:
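On Debian/Ubuntu the interactive configuration dialog is invoked with:

```shell
# Re-run the package's interactive configuration questions
dpkg-reconfigure mdadm
```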
I received three email messages from one of the servers on Hetzner with information about raids md0, md1, md2:
DegradedArray event on /dev/md/0:example.com
This is an automatically generated mail message from mdadm
running on example.com
A DegradedArray event had been detected on md device /dev/md/0.
Faithfully yours, etc.
P.S. The /proc/mdstat file currently contains the following:
Personalities : [raid6] [raid5] [raid4] [raid1]
md2 : active raid6 sdb3 sdd3
208218112 blocks super 1.0 level 6, 512k chunk, algorithm 2 [4/2] [_U_U]
md1 : active raid1 sdb2 sdd2
524224 blocks super 1.0 [4/2] [_U_U]
md0 : active raid1 sdb1 sdd1
12582784 blocks super 1.0 [4/2] [_U_U]
I looked at the information about RAID and disks:
cat /proc/mdstat
cat /proc/partitions
mdadm --detail /dev/md0
mdadm --detail /dev/md1
mdadm --detail /dev/md2
fdisk -l | grep '/dev/sd'
fdisk -l | less
I was going to open a ticket with technical support to have the failed SSD disks replaced.
I saved the SMART information about the failed disks to files; it also contained their serial numbers:
smartctl -x /dev/sda > sda.log smartctl -x /dev/sdc > sdc.log
Remove the disks from the raid if possible:
mdadm /dev/md0 -r /dev/sda1
mdadm /dev/md1 -r /dev/sda2
mdadm /dev/md2 -r /dev/sda3
mdadm /dev/md0 -r /dev/sdc1
mdadm /dev/md1 -r /dev/sdc2
mdadm /dev/md2 -r /dev/sdc3
If some partition of a disk is shown as working but the disk needs to be removed, first mark that partition as failed and then remove it. For example, if /dev/sda1 and /dev/sda2 have dropped out but /dev/sda3 is still working:
mdadm /dev/md2 -f /dev/sda3
mdadm /dev/md2 -r /dev/sda3
In my case, after examining the information about the dropped disks, I found that they were intact and healthy, in even better condition than the active ones.
I looked at the disk partitions (in fdisk, p prints the partition table and q quits):
fdisk /dev/sda
p
q
fdisk /dev/sdc
p
q
They were marked the same way as before:
Disk /dev/sda: 120.0 GB, 120034123776 bytes
255 heads, 63 sectors/track, 14593 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00015e3f
Device Boot Start End Blocks Id System
/dev/sda1 1 1567 12582912+ fd Linux raid autodetect
/dev/sda2 1567 1633 524288+ fd Linux raid autodetect
/dev/sda3 1633 14594 104109528+ fd Linux raid autodetect
Therefore I returned these disks to the raid, waiting for each one to finish synchronizing:
mdadm /dev/md0 -a /dev/sda1
mdadm /dev/md1 -a /dev/sda2
mdadm /dev/md2 -a /dev/sda3
mdadm /dev/md0 -a /dev/sdc1
mdadm /dev/md1 -a /dev/sdc2
mdadm /dev/md2 -a /dev/sdc3
In the end, cat /proc/mdstat showed [UUUU] for all arrays again.
If the disks are replaced with new ones, they need to be partitioned in the same way as the existing ones.
An example of partitioning the disk /dev/sdb identically to /dev/sda with MBR:
sfdisk -d /dev/sda | sfdisk --force /dev/sdb
An example of partitioning /dev/sdb with GPT and assigning new random GUIDs to the disk:
sgdisk -R /dev/sdb /dev/sda
sgdisk -G /dev/sdb
You also need to install the bootloader on the newly installed disk:
grub-install --version
grub-install /dev/sdb
update-grub
Or via the grub shell (hd0 is /dev/sda, (hd0,1) is /dev/sda2):
cat /boot/grub/device.map
grub
device (hd0) /dev/sda
root (hd0,1)
setup (hd0)
quit
If the GRUB installation is performed from a rescue disk, you need to look at the partition list and mount the root partition, for example, if RAID is not used:
ls /dev/[hsv]d[a-z]*[0-9]*
mount /dev/sda3 /mnt
If you are using software RAID:
ls /dev/md*
mount /dev/md2 /mnt
If LVM is used:
ls /dev/mapper/*
mount /dev/mapper/vg0-root /mnt
And execute chroot:
chroot-prepare /mnt
chroot /mnt
After mounting, you can restore GRUB as I wrote above.
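Inside the chroot, restoring GRUB can then look like this (a sketch assuming /dev/sda is the boot disk):

```shell
# From within the chroot: reinstall GRUB to the MBR and regenerate its config
grub-install /dev/sda
update-grub

# Leave the chroot and unmount before rebooting
exit
umount /mnt
```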
See also my other articles:
How did I make a request to Hetzner to replace the disk in the raid
The solution to the error “md: kicking non-fresh sda1 from array”
The solution to the warning “mismatch_cnt is not 0 on /dev/md*”
mdadm – utility for managing software RAID arrays
Description of RAID types
Diagnostics HDD using smartmontools
Recovering GRUB Linux