debian-jessie – www.gonscak.sk

disk cloning with dd

How to create a disk or usb image, and compress it on the fly? And how to restore it?
I have own operating system on USB key. To create a full-backup and then possible restore to another device, I use linux command dd (dd – convert and copy a file).
Now, we must determine, on which patch we have s source disk. I my case, it is

sudo fdisk -l /dev/sdb
Disk /dev/sdb: 29,5 GiB

First, I install additional software for monitoring and best compressing on more cores

sudo apt-get install pigz pv

Then, I create a full copy of the usb key. Without compression it takes 30GB, with compression, it take only 3GB. With command “pv” we can watch progress. Pigz compress the source image with multiple threads and cores. With parameter -c it writes all processed output to stdout. So with operand “>” we write this pigz output to a file:

sudo dd if=/dev/sdb | pv | pigz -c > /home/vasil/Documents/corsair-work.dd.gz

If we had som bad blocks on source disk, and we want to clone it anyway, we can use another conv options. Like:

conv=sync,noerror

This means:

noerror – This makes use dd continue even after a read error is encountered;
sync – This option has sense especially when used together with noerror.

In such a case the noerror option will make dd continue running even if it a sector cannot be read successfully, and the sync option will make so that the amount of data failed to be read its replaced by NULs, so that the length of the data is preserved even if the actual data is lost (since it’s not possible to read it).

Then, I remove the source usb key and insert new one. It also has a path /dev/sdb. Now, I restore it with this command:

pigz -cdk Documents/corsair-work.dd.gz |pv| sudo dd of=/dev/sdb bs=4M

Parameter -c also write output to stdout and program dd writes it to disk. Parameter -k menas, that keep original file after decompress. And parameter -d means decompress.
Now, we can boot system with new usb key. And this image is identical as the source.
I hope, that this help someone. Have a nice day.

Total Page Visits: 221053 - Today Page Visits: 97

how to set up drbd primary-primary mode on proxmox 4.x

Today, I met with an interesting problem. I tried to create a primary-primary (dual primary) DRBD cluster on proxmox.
The first we must have fully configured proxmox Two-node cluster. Like this:
https://pve.proxmox.com/wiki/Proxmox_VE_4.x_Cluster
We must have a good configuration of /etc/hosts to resolve names into IP:

root@cl3-amd-node1:/etc/drbd.d# cat /etc/hosts
cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
192.168.1.104 cl3-amd-node1 pvelocalhost
192.168.1.108 cl3-amd-node2

root@cl3-amd-node2:/etc/drbd.d# cat /etc/hosts
cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
192.168.1.104 cl3-amd-node1
192.168.1.108 cl3-amd-node2 pvelocalhost

One server was created on hardware raid PCI-E LSI 9240-4i (/dev/sdb) and second server was build on software raid via mdadm (/dev/md1) on debian jessie with installation with proxmox packages. So the backend for drbd devices was on one side – hardware raid and software raid on the other side. We must create a two disks with the same size (in sectors):

root@cl3-amd-node1:
fdisk -l /dev/sdb
Disk /dev/sdb: 1.8 TiB, 1998998994944 bytes, 3904294912 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
Device     Boot Start        End    Sectors   Size Id Type
/dev/sdb1        2048 1953260927 1953258880 931.4G 83 Linux

root@cl3-amd-node2:
fdisk -l /dev/md1
Disk /dev/md1: 931.4 GiB, 1000069595136 bytes, 1953260928 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
Device     Boot Start        End    Sectors   Size Id Type
/dev/md1p1       2048 1953260927 1953258880 931.4G 83 Linux

Now, we must have a direct network to each other of servers for drbd traffic, which will be very high. I use a bond of two gigabit network cards:

#cl3-amd-node1:
cat /etc/network/interfaces
auto bond0
iface bond0 inet static
        address  192.168.5.104
        netmask  255.255.255.0
        slaves eth2 eth1
        bond_miimon 100
        bond_mode balance-rr

#cl3-amd-node2:
cat /etc/network/interfaces
auto bond0
iface bond0 inet static
        address  192.168.5.108
        netmask  255.255.255.0
        slaves eth1 eth2
        bond_miimon 100
        bond_mode balance-rr

And we can test the speed of this network with package iperf:

apt-get install iperf

We start an iperf instance on one server by this command:

#cl3-amd-node2
iperf  -s -p 888

And from the other, we connect to this instance for 20 seconds:

#cl3-amd-node1
iperf -c 192.168.5.108 -p 888 -t 20
#and the conclusion
------------------------------------------------------------
Client connecting to 192.168.5.108, TCP port 888
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.5.104 port 49536 connected with 192.168.5.108 port 888
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-20.0 sec  4.39 GBytes  1.88 Gbits/sec

So we can see, that I have a bonded network from two network cards and the resulting speed is almost 2Gbps.
Now, we can continue with installing and setting up the drbd resource.

apt-get install drbd-utils drbdmanage

All aspects of DRBD are controlled in its configuration file, /etc/drbd.conf. Normally, this configuration file is just a skeleton with the following contents:
include “/etc/drbd.d/global_common.conf”;
include “/etc/drbd.d/*.res”;
The simplest configuration is:

cat /etc/drbd.d/global_common.conf
global {
        usage-count yes;
}
common {
        net {
        protocol C;
        }
}

And the configuration of resource itself. It must be the same on both nodes:

root@cl3-amd-node1:/etc/drbd.d# cat /etc/drbd.d/r0.res
resource r0 {
disk {
        c-plan-ahead 15;
        c-fill-target 24M;
        c-min-rate 90M;
        c-max-rate 150M;
}
net {
        protocol C;
        allow-two-primaries yes;
        data-integrity-alg md5;
        verify-alg md5;
}
on cl3-amd-node1 {
        device /dev/drbd0;
        disk /dev/sdb1;
        address 192.168.5.104:7789;
        meta-disk internal;
}
on cl3-amd-node2 {
        device /dev/drbd0;
        disk /dev/md1p1;
        address 192.168.5.108:7789;
        meta-disk internal;
}
}

root@cl3-amd-node2:/etc/drbd.d# cat /etc/drbd.d/r0.res
resource r0 {
disk {
        c-plan-ahead 15;
        c-fill-target 24M;
        c-min-rate 90M;
        c-max-rate 150M;
}
net {
        protocol C;
        allow-two-primaries yes;
        data-integrity-alg md5;
        verify-alg md5;
}
on cl3-amd-node1 {
        device /dev/drbd0;
        disk /dev/sdb1;
        address 192.168.5.104:7789;
        meta-disk internal;
}
on cl3-amd-node2 {
        device /dev/drbd0;
        disk /dev/md1p1;
        address 192.168.5.108:7789;
        meta-disk internal;
}
}

Now, we must create and initialize backend devices for drbd, on both nodes:

drbdadm create-md r0
#answer yes to destroy possible data on devices

Now, we can start the drbd service, on both nodes:

root@cl3-amd-node2:/etc/drbd.d# /etc/init.d/drbd start
[ ok ] Starting drbd (via systemctl): drbd.service.
root@cl3-amd-node1:/etc/drbd.d# /etc/init.d/drbd start
[ ok ] Starting drbd (via systemctl): drbd.service.

Or we can start it on both nodes:

drbdadm up r0

And we can see it as inconsistent and both of them are secondary:

root@cl3-amd-node1:~# drbdadm status
r0 role:Secondary
  disk:Inconsistent
  cl3-amd-node2 role:Secondary
    peer-disk:Inconsistent

Start the initial full synchronization. This step must be performed on only one node, only on initial resource configuration, and only on the node you selected as the synchronization source. To perform this step, issue this command:

root@cl3-amd-node1:# drbdadm primary --force r0

And we can see the status of our drbd storage:

root@cl3-amd-node2:~# drbdadm status
r0 role:Secondary
  disk:Inconsistent
  cl3-amd-node1 role:Primary
    replication:SyncTarget peer-disk:UpToDate done:3.10

After synchronization successfully finish, we set up our secondary server to be primary:

root@cl3-amd-node2:~# drbdadm status
r0 role:Secondary
  disk:UpToDate
  cl3-amd-node1 role:Primary
    peer-disk:UpToDate

root@cl3-amd-node2:~# drbdadm primary r0

And we can see status of this dual-primary (primary-primary) drbd storage resource:

root@cl3-amd-node2:~# drbdadm status
r0 role:Primary
  disk:UpToDate
  cl3-amd-node1 role:Primary
    peer-disk:UpToDate

Now we have a new block device on both servers:

root@cl3-amd-node2:~# fdisk -l /dev/drbd0
Disk /dev/drbd0: 931.4 GiB, 1000037986304 bytes, 1953199192 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

We can configure this drbd block device as physical volume for lvm. This lvm is on top of this drbd. So, we can continue as it is a physical disk. Do it only on one server. The change will reflect on second server, due to primary-primary disk of drbd:

pvcreate /dev/drbd0
  Physical volume "/dev/drbd0" successfully created

As we can see, we must adapt /etc/lvm/lvm.conf to our needs, because it scans all block devices and we can found duplicate entries:

root@cl3-amd-node2:~# pvs
  Found duplicate PV WXwDGteoexfmLxN6GQvt6Nd3jJxgvT2z: using /dev/drbd0 not /dev/md1p1
  Found duplicate PV WXwDGteoexfmLxN6GQvt6Nd3jJxgvT2z: using /dev/md1p1 not /dev/drbd0
  Found duplicate PV WXwDGteoexfmLxN6GQvt6Nd3jJxgvT2z: using /dev/drbd0 not /dev/md1p1
  PV         VG   Fmt  Attr PSize   PFree
  /dev/drbd0      lvm2 ---  931.36g 931.36g
  /dev/md0   pve  lvm2 a--  931.38g      0

So, we must edit filter option in this configuration. Look at our resouce configuration r0.res. We must exlude our backend devices (/dev/sdb1 on one server and /dev/md1p1 on second server), or we can reject all devices and allow only specific. I prefer reject all and allow only what we want. So edit the filter variable.

root@cl3-amd-node1:~# cat /etc/lvm/lvm.conf | grep drbd
     filter =[ "a|/dev/drbd0|", "a|/dev/sda3|", "r|.*|" ]

root@cl3-amd-node2:~# cat /etc/lvm/lvm.conf | grep drbd
    filter =[ "a|/dev/drbd0|", "a|/dev/md0|", "r|.*|" ]

Now, we don’t see duplicates and we can create a volume group. Only on one server:

root@cl3-amd-node2:~# vgcreate drbd0-vg /dev/drbd0
  Volume group "drbd0-vg" successfully created
...
root@cl3-amd-node2:~# pvs
  PV         VG       Fmt  Attr PSize   PFree
  /dev/drbd0 drbd0-vg lvm2 a--  931.36g 931.36g
  /dev/md0   pve      lvm2 a--  931.38g      0

And finally we add the LVM group to the proxmox. It can be done via web interface. So, go to proxmox web interface to Datacenter, click on storage and add (LVM).
Then create your ID (this is the name of your storage. It can not be changed later. Maybe: drbd0-vg), next you will see the previously created volume group drbd0-vg. So select it and enable the sharing by click the ‘shared’ box.
Now, we can create virtual machine on this LVM and when we can migrate it without downtime from one server to another because of drbd. There is one shared storage. So when the migration starts, machine is started on another server and through ssh tunnel is migrate content of ram. And after few seconds, it is started.
Sometimes, after some circumstances with network disconnect and connect, there is split-brain detected. So if this happened, don’t panic. When this happened, both servers are marked as “standalone” and drbd storage started to diverge. From this time there happened different writes to both sides. We must one of this servers mark as victim, because one of these servers has the “right” data and the other has “wrong” data. So the only way is backup the running virtuals on the “victim” and then we must destroy/discard this data on drbd storage and synchronize it from other server, which has “right” data. So if this is happening, this is in logs:

root@cl3-amd-node1:~# dmesg | grep -i brain
[499210.096185] drbd r0/0 drbd0 cl3-amd-node1: helper command: /sbin/drbdadm initial-split-brain
[499210.097306] drbd r0/0 drbd0 cl3-amd-node1: helper command: /sbin/drbdadm initial-split-brain exit code 0 (0x0)
[499210.097313] drbd r0/0 drbd0: Split-Brain detected but unresolved, dropping connection!

We must manually solve this problem. So I choose as victim: cl3-amd-node1. We must set this node as secondary:

drbdadm secondary r0

And now, we must disconnect it and connect it back with marking data to be discarded.

root@cl3-amd-node1:~# drbdadm connect --discard-my-data r0

And after synchronization, mark it back to primary node:

root@cl3-amd-node1:~# drbdadm primary r0

And in log, we can see:

cl3-amd-node1 kernel: [246882.068518] drbd r0/0 drbd0: Split-Brain detected, manually solved. Sync from peer node

Have fun.

Total Page Visits: 221053 - Today Page Visits: 97

How to create software raid 1 with mdadm with spare

At first, we must create partitions on disks with the SAME size in blocks:

fdisk /dev/sdc
> n (new partition)
> p (primary type of partition)
> 1  (partition number)
> 2048 (first sector: default)
> 1953525167 (last sector: default)
> t (change partition type) - selected partition nb. 1
> fd (set it to Linux raid autodetect)
> w (write end exit)

fdisk /dev/sdd
> n (new partition)
> p (primary type of partition)
> 1  (partition number)
> 2048 (first sector: default)
> 1953525167 (last sector: default)
> t (change partition type) - selected partition nb. 1
> fd (set it to Linux raid autodetect)
> w (write end exit)

fdisk -l /dev/sdc
Disk /dev/sdc: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
Units: sectors of 1 * 512 = 512 bytes
Device     Boot Start        End    Sectors   Size Id Type
/dev/sdc1        2048 1953525167 1953523120 931.5G fd Linux raid autodetect
root@cl3-amd-node2:~# fdisk -l /dev/sdd
Disk /dev/sdd: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
Units: sectors of 1 * 512 = 512 bytes
Device     Boot Start        End    Sectors   Size Id Type
/dev/sdd1        2048 1953525167 1953523120 931.5G fd Linux raid autodetect

Now, we can create raid using a mdadm. Parameter –level=1 defines raid1.

 mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdc1 /dev/sdd1

We can watch the progress of building the raid:

cat /proc/mdstat
md1 : active raid1 sdd1[1] sdc1[0]
      976630464 blocks super 1.2 [2/2] [UU]
      [>....................]  resync =  1.8% (17759616/976630464) finish=110.0min speed=145255K/sec
      bitmap: 8/8 pages [32KB], 65536KB chunk

Now we can add a spare disk:

fdisk /dev/sde
> n (new partition)
> p (primary type of partition)
> 1  (partition number)
> 2048 (first sector: default)
> 1953525167 (last sector: default)
> t (change partition type) - selected partition nb. 1
> fd (set it to Linux raid autodetect)
> w (write end exit)

mdadm --add-spare /dev/md1 /dev/sde1

And now we can see detail of the raid:

mdadm --detail /dev/md1
/dev/md1:
        Version : 1.2
  Creation Time : Tue Mar 14 11:56:28 2017
     Raid Level : raid1
     Array Size : 976630464 (931.39 GiB 1000.07 GB)
  Used Dev Size : 976630464 (931.39 GiB 1000.07 GB)
   Raid Devices : 2
  Total Devices : 3
    Persistence : Superblock is persistent
  Intent Bitmap : Internal
    Update Time : Tue Mar 14 12:00:49 2017
          State : clean, resyncing
 Active Devices : 2
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 1
  Resync Status : 3% complete
           Name : cl3-amd-node2:1  (local to host cl3-amd-node2)
           UUID : 919632d4:74908819:4f43bba3:33b89328
         Events : 52
    Number   Major   Minor   RaidDevice State
       0       8       33        0      active sync   /dev/sdc1
       1       8       49        1      active sync   /dev/sdd1
       2       8       65        -      spare   /dev/sde1

And we can it see here too:

cat /proc/mdstat
md1 : active raid1 sde1[2](S) sdd1[1] sdc1[0]
      976630464 blocks super 1.2 [2/2] [UU]
      [=>...................]  resync =  7.5% (73929920/976630464) finish=103.3min speed=145533K/sec
      bitmap: 8/8 pages [32KB], 65536KB chunk
unused devices: <none>

After reboot, if we can not see our md1 device like this:

root@cl3-amd-node2:/etc/drbd.d# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda1[0] sde1[2](S) sdb1[1]
      976629760 blocks super 1.2 [2/2] [UU]
      bitmap: 1/8 pages [4KB], 65536KB chunk
unused devices: <none>

We can recreate (assemble) it with this command without resync:

mdadm --assemble /dev/md1 /dev/sdc1 /dev/sdd1
mdadm: /dev/md1 has been started with 2 drives.
root@cl3-amd-node2:/etc/drbd.d# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdc1[0] sdd1[1]
      976630464 blocks super 1.2 [2/2] [UU]
      bitmap: 0/8 pages [0KB], 65536KB chunk
md0 : active raid1 sda1[0] sde1[2](S) sdb1[1]
      976629760 blocks super 1.2 [2/2] [UU]
      bitmap: 1/8 pages [4KB], 65536KB chunk
unused devices: <none>

If we want to automatically start this raid with the boot, we must add this array to mdadm.conf. At first, we scan for our arrays and add it to /etc/mdadm/mdadm.conf.

root@cl3-amd-node2:/etc/drbd.d# mdadm --examine --scan
...
ARRAY /dev/md/1  metadata=1.2 UUID=94e2df50:43dbed78:b3075927:401a9b65 name=cl3-amd-node2:1
ARRAY /dev/md/0  metadata=1.2 UUID=2c29b20a:0f2d8abf:c2c9e150:070adaba name=cl3-amd-node2:0
   spares=1

cat /etc/mdadm/mdadm.conf
...
# definitions of existing MD arrays
ARRAY /dev/md/0  metadata=1.2 UUID=2c29b20a:0f2d8abf:c2c9e150:070adaba name=cl3-amd-node2:0
   spares=1

echo "ARRAY /dev/md/1  metadata=1.2 UUID=94e2df50:43dbed78:b3075927:401a9b65 name=cl3-amd-node2:1" >> /etc/mdadm/mdadm.conf

And the last step is update the initramfs to update mdadm.conf in it:

update-initramfs -u

If there is a need to replace bad missing disk, we must create a partition on new disk with the same space.

fdisk -l /dev/sdb
Disk /dev/sdb: 233.8 GiB, 251000193024 bytes, 490234752 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
Device     Boot Start       End   Sectors   Size Id Type
/dev/sdb1        2048 488397167 488395120 232.9G fd Linux raid autodetect

Degraded array:

mdadm --detail /dev/md1
/dev/md1:
        Version : 1.2
  Creation Time : Fri May 27 09:08:25 2016
     Raid Level : raid5
     Array Size : 488132608 (465.52 GiB 499.85 GB)
  Used Dev Size : 244066304 (232.76 GiB 249.92 GB)
   Raid Devices : 3
  Total Devices : 2
    Persistence : Superblock is persistent
    Update Time : Thu Apr 20 11:33:11 2017
          State : clean, degraded
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
         Layout : left-symmetric
     Chunk Size : 512K
           Name : cl2-sm-node3:1  (local to host cl2-sm-node3)
           UUID : 827b1c8a:5a1a1e7c:1bb5624f:9aa491b1
         Events : 692
    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       65        1      active sync   /dev/sde1
       3       8       49        2      active sync   /dev/sdd1

Now we can add new disk to this array:

mdadm --manage /dev/md1 --add /dev/sdb1
   mdadm: added /dev/sdb1

And its done:

cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md1 : active raid5 sdb1[4] sde1[1] sdd1[3]
      488132608 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [_UU]
      [>....................]  recovery =  0.3% (869184/244066304) finish=197.5min speed=20515K/sec
      bitmap: 0/2 pages [0KB], 65536KB chunk

If we have a problem with some disk, we may remove it during work. At first, we must mark it as failed. So look at good and working raid-1:

mdadm --detail /dev/md0
/dev/md0:
 Raid Level : raid1
 Array Size : 976629760 (931.39 GiB 1000.07 GB)
 Used Dev Size : 976629760 (931.39 GiB 1000.07 GB)
 Raid Devices : 2
 Total Devices : 3
 State : clean
 Active Devices : 2
 Working Devices : 3
 Failed Devices : 0
 Spare Devices : 1
active sync /dev/sda1
active sync /dev/sdb1
spare /dev/sde1

Now mark disk sda1 as faulty:

mdadm /dev/md0 -f /dev/sda1

mdadm --detail /dev/md0
/dev/md0:
 Array Size : 976629760 (931.39 GiB 1000.07 GB)
 Used Dev Size : 976629760 (931.39 GiB 1000.07 GB)
 Raid Devices : 2
 Total Devices : 3
 Persistence : Superblock is persistent
 State : clean, degraded, recovering
 Active Devices : 1
Working Devices : 2
 Failed Devices : 1
 Spare Devices : 1
Rebuild Status : 0% complete
spare rebuilding /dev/sde1
active sync /dev/sdb1
faulty /dev/sda1

cat /proc/mdstat
md0 : active raid1 sda1[0](F) sde1[2] sdb1[1]
 976629760 blocks super 1.2 [2/1] [_U]
 [>....................] recovery = 0.2% (2292928/976629760) finish=169.9min speed=95538K/sec

I waited until finish this operation. Then I halted this server, remove the exact drive and insert a new one. After power-on, we create a new partition table on /dev/sda exactly as old one, or as active disks now. The we re-add it as spare to the raid:

 mdadm /dev/md0 -a /dev/sda1

mdadm --detail /dev/md0
/dev/md0:
 Raid Level : raid1
 Array Size : 976629760 (931.39 GiB 1000.07 GB)
 Used Dev Size : 976629760 (931.39 GiB 1000.07 GB)
 Raid Devices : 2
 Total Devices : 3
 Active Devices : 2
Working Devices : 3
 Failed Devices : 0
 Spare Devices : 1
active sync /dev/sde1
active sync /dev/sdb1
spare /dev/sda1

cat /proc/mdstat
md0 : active raid1 sda1[3](S) sde1[2] sdb1[1]
 976629760 blocks super 1.2 [2/2] [UU]
 bitmap: 1/8 pages [4KB], 65536KB chunk

Total Page Visits: 221053 - Today Page Visits: 97