
I/O error on worker06.internal
Closed, Migrated (edits locked)

Description

worker06:/var/log/auth.log.1 is partially unreadable.

worker06 is a VM, and its virtual disk reports read errors:

[Mon Jan 21 11:32:42 2019] sd 2:0:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[Mon Jan 21 11:32:42 2019] sd 2:0:0:0: [sda] tag#0 Sense Key : Aborted Command [current]
[Mon Jan 21 11:32:42 2019] sd 2:0:0:0: [sda] tag#0 Add. Sense: I/O process terminated
[Mon Jan 21 11:32:42 2019] sd 2:0:0:0: [sda] tag#0 CDB: Read(10) 28 00 02 8f 6b 80 00 00 08 00
[Mon Jan 21 11:32:42 2019] blk_update_request: I/O error, dev sda, sector 42953600
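The CDB bytes in the log can be decoded by hand to confirm which sectors the failed read covered. A minimal POSIX shell sketch, assuming the standard SCSI Read(10) layout (opcode, flags, 4-byte big-endian LBA, group number, 2-byte transfer length, control byte); the variable names are illustrative:

```shell
#!/bin/sh
# Read(10) CDB copied verbatim from the worker06 kernel log.
cdb="28 00 02 8f 6b 80 00 00 08 00"

# Deliberately unquoted: split the ten bytes into positional parameters $1..$10.
set -- $cdb

opcode=$1                   # byte 0: 0x28 = READ(10)
lba=$(( 0x$3$4$5$6 ))       # bytes 2-5: logical block address, big-endian
count=$(( 0x$8$9 ))         # bytes 7-8: transfer length in logical blocks

# blk_update_request reports positions in 512-byte sectors; the LBA matching
# that sector number is consistent with 512-byte logical blocks.
bytes=$(( count * 512 ))
page=$(( lba * 512 / 4096 ))  # index of the 4 KiB block holding the failed read

echo "opcode=0x$opcode lba=$lba count=$count bytes=$bytes page=$page"
```

The decoded LBA equals the sector in the `blk_update_request` line, and since it is a multiple of 8 the failed read spans exactly one 4 KiB-aligned block.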

Event Timeline

ftigeot changed the task status from Open to Work in Progress. Jan 21 2019, 2:38 PM
ftigeot triaged this task as High priority.
ftigeot created this task.

worker06.internal.softwareheritage.org is a VM running on louvre. Its virtual disk is backed by /dev/dm-36 on the host.

louvre:/dev/dm-36 also reports I/O errors:

[Mon Jan 21 12:17:53 2019] buffer_io_error: 25 callbacks suppressed
[Mon Jan 21 12:17:53 2019] Buffer I/O error on dev dm-36, logical block 930275, async page read
[Mon Jan 21 12:17:53 2019] Buffer I/O error on dev dm-36, logical block 930275, async page read

louvre:/dev/dm-36 is backed by /dev/md3, which is itself layered on top of /dev/sda and /dev/sdb.
None of these lower-level devices report any errors.
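One way to trace such a stack from the top device down is to follow the `slaves` directories the kernel exposes for each block device in sysfs. This is an illustrative sketch, not the method used in this task; the `SYSFS` variable is a hypothetical override added only so the function can be exercised against a fake directory tree. On a real host, `lsblk --inverse /dev/dm-36` from util-linux gives a similar view.

```shell
#!/bin/sh
# Root of sysfs; hypothetical override so the walk can be tested on a fake tree.
SYSFS=${SYSFS:-/sys}

# Print a block device and, indented below it, every device it is built on.
# $1: device name (e.g. dm-36), $2: current indentation prefix.
walk_stack() {
    local dev="$1" indent="$2" s
    echo "${indent}${dev}"
    for s in "$SYSFS/class/block/$dev/slaves"/*; do
        [ -e "$s" ] || continue          # no slaves: the glob did not match
        walk_stack "$(basename "$s")" "$indent  "
    done
}

# Example: on louvre this would be expected to list dm-36, then md3, then its members.
walk_stack dm-36 ""
```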

The /dev/md3 check completed successfully and reported no errors.
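For reference, an md consistency check is started and inspected through the array's sysfs attributes `sync_action` and `mismatch_cnt`; `/proc/mdstat` shows progress while it runs. A minimal sketch follows; the function names and the `SYSFS` override are hypothetical (the override exists only for testing), and starting a check requires root.

```shell
#!/bin/sh
SYSFS=${SYSFS:-/sys}  # hypothetical override for testing on a fake tree

# Ask md to read and compare every block of the array (requires root).
md_start_check() {
    echo check > "$SYSFS/block/$1/md/sync_action"
}

# Summarise the array state: a running action, or the mismatch count once idle.
md_check_status() {
    local dir="$SYSFS/block/$1/md" action mismatches
    action=$(cat "$dir/sync_action")
    mismatches=$(cat "$dir/mismatch_cnt")
    if [ "$action" != idle ]; then
        echo "$1: $action in progress"
    elif [ "$mismatches" -eq 0 ]; then
        echo "$1: idle, no mismatches"
    else
        echo "$1: idle, $mismatches mismatched sectors"
    fi
}
```

Usage on the host would be `md_start_check md3`, then `md_check_status md3` once `/proc/mdstat` no longer shows the check.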

ftigeot changed the status of subtask T1518: I/O error on louvre:/dev/md3 from Open to Work in Progress. Feb 5 2019, 4:18 PM

A brand-new virtual disk was created and the data copied onto it, skipping unreadable blocks:

  • Shut down the VM
  • Create a new drive of identical size in the Proxmox web interface. There are now two virtual drives:
    /dev/ssd/vm-206-disk-1 => dm-34 (old)
    /dev/ssd/vm-206-disk-0 => dm-35 (new)
  • Activate the new LVM device:
    lvchange -a y ssd/vm-206-disk-0
  • Copy the data, skipping unreadable 4 KiB blocks (they are written as zeros on the new drive):
    dd if=/dev/dm-34 of=/dev/dm-35 bs=4k conv=sync,noerror
  • In the Proxmox web UI, detach the old drive, set the VM to boot from the new one, and restart it.
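The activation and copy steps could be wrapped in a small script that defaults to a dry run, making it harder to swap `if=` and `of=` by accident. This is a hypothetical sketch, not what was run on louvre: the device paths and LV name come from this task, while `DRY_RUN`, `OLD` and `NEW` are illustrative variables, and the size check assumes util-linux's `blockdev --getsize64`.

```shell
#!/bin/sh
set -eu

OLD=${OLD:-/dev/dm-34}          # old drive, ssd/vm-206-disk-1
NEW=${NEW:-/dev/dm-35}          # new drive, ssd/vm-206-disk-0
DRY_RUN=${DRY_RUN:-1}           # 1 = only print the commands

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

run lvchange -a y ssd/vm-206-disk-0

# Refuse to copy onto a drive of a different size (skipped in dry-run mode).
if [ "$DRY_RUN" != 1 ]; then
    old_size=$(blockdev --getsize64 "$OLD")
    new_size=$(blockdev --getsize64 "$NEW")
    if [ "$old_size" != "$new_size" ]; then
        echo "size mismatch: $OLD=$old_size $NEW=$new_size" >&2
        exit 1
    fi
fi

# noerror: keep going after a read error; sync: pad the failed 4 KiB input
# block with zeros so every later block stays at the same offset.
run dd if="$OLD" of="$NEW" bs=4k conv=sync,noerror
```

Setting `DRY_RUN=0` performs the copy; the VM still has to be shut down first, and detaching the old drive remains a web UI step.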

Resolved on 2019-02-07.