Got "input output error" when executing any command

Question

Last Monday morning, I found that my server couldn't run any command; everything failed with "input output error". After trying for half an hour, I found that the only command that could be executed was sudo poweroff -f (the -f flag is required, otherwise I get "input output error" again).
I then booted the server manually and checked the system log, but found nothing special. I also ran a smartctl test to see whether there was any problem with the hard disk, and it passed without error.
This Monday the problem showed up again. I shut down the server and booted it manually, and it looked fine, as if nothing had happened. I then used MemTest86 8.2 to test whether the memory stick is OK, and made sure the SATA cable and hard disk are in good condition and firmly connected.
I think maybe the problem is with the OS or the file system? My OS is Debian 8.11. Can you give me some advice? Thank you all!

Hello @ajgringo619, I checked my hard disk usage, and it still has 600+ GB available. — fajin yu, Sep 19 '19 at 05:36
You can try the command `badblocks -nv <device>` (e.g. `badblocks -v /dev/sda2`). The device name (the block device that is mounted on `/`, e.g. `/dev/sda2`) can be found with the command `lsblk`. — ss_iwe, Sep 19 '19 at 06:32
@ajgringo619, if you are running low on space, your programs will get an `ENOSPC` ("No space left on device") error instead. I personally run into this condition from time to time on my small desktop configuration. — xwindows -on strike-, Sep 19 '19 at 07:37

xwindows -on strike- · Accepted Answer · 2019-09-21T07:22:59.983

I found my server can't run any command, and it shows "input output error"

The error code EIO ("Input/output error") on command launch happens when your filesystem is damaged or, worse, when you are running on faulty storage.

Cross your fingers; either way, be aware that at this point you should NOT try to power on the server unless really necessary.¹

The Test

There is one sure-fire way to distinguish between the two root causes: run a block-level read scan of the system disk, and watch for kernel messages.

1. Boot your system with a GNU/Linux recovery boot disk.
2. Switch to the plain old text console (press Ctrl+Alt+F1); don't use a graphical terminal for this.
3. Log in as root.
4. Run `dmesg -E` to enable live kernel message display on the console.
5. Run `dmesg -n debug` to let low-level kernel messages through.
6. Run `blkid` to see which disk contains the system partition. (Note that `blkid` lists partitions; strip the number off the end of the partition path and you get the disk.)
7. Run `time -p dd if=/dev/sda of=/dev/null bs=4M` to conduct an entire-disk read test (please type this carefully). If your system disk is not /dev/sda, substitute accordingly.
8. Watch the screen (it will take a long while)...
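
The "strip the number off" rule from the blkid step can be sketched as a small shell helper. This is only an illustration: `disk_of_partition` is a made-up name, and the naming rules it assumes (plain trailing digits for /dev/sdXN, a `pN` suffix for NVMe and eMMC/SD devices) cover common cases, not every device type.

```sh
# Hypothetical helper: print the whole-disk path for a partition path.
# Assumes common Linux naming: /dev/sda2 -> /dev/sda,
# /dev/nvme0n1p2 -> /dev/nvme0n1, /dev/mmcblk0p1 -> /dev/mmcblk0.
disk_of_partition() {
    case "$1" in
        /dev/nvme*|/dev/mmcblk*)
            # These disk names end in a digit themselves, so partitions use "p<N>".
            printf '%s\n' "$1" | sed 's/p[0-9][0-9]*$//'
            ;;
        *)
            # Classic names: drop the trailing partition number.
            printf '%s\n' "$1" | sed 's/[0-9][0-9]*$//'
            ;;
    esac
}

# Example (read-only, but still double-check the device before running):
#   disk=$(disk_of_partition /dev/sda2)
#   time -p dd if="$disk" of=/dev/null bs=4M
```

If `lsblk` is available on the boot disk, `lsblk -no PKNAME /dev/sda2` should report the parent disk name as well, which avoids guessing from the path.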

Results

In the best case, where dd completes successfully and uneventfully, it is likely a filesystem problem.
- If you are comfortable doing a filesystem check from the boot disk, you can do it now (recommended).
- If you would rather let the system sort it out by itself, reboot (and remove the boot disk), then boot your usual system with fsck.mode=force appended to the end of the kernel command line. (See this question for details)
- Discussing the result of the filesystem check would warrant a different question, though.
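
For the boot-disk route, the partition should not be mounted while it is being checked. A minimal sketch of that guard follows; `is_mounted` is a hypothetical helper, `/dev/sda2` is only an example partition, and the `fsck -f` call assumes an ext2/3/4 filesystem, where `-f` forces a check even if the filesystem looks clean.

```sh
# Hypothetical helper: succeed if device $1 appears in the mount table.
# $2 is the mount table to read; it defaults to /proc/mounts.
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}

# Example, from the recovery boot disk as root (assumed partition /dev/sda2):
#   if is_mounted /dev/sda2; then
#       echo "/dev/sda2 is mounted; unmount it first" >&2
#   else
#       fsck -f /dev/sda2
#   fi
```

The match is on the exact device name, so a partition mounted under another name (for example a by-uuid or /dev/mapper path) would not be caught by this sketch.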

However, in the worst case, you would see kernel messages like this spewing on the screen:

ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata2.00: irq_stat 0x40000001
ata2.00: failed command: READ DMA EXT
ata2.00: cmd 25/00:08:78:15:c5/00:00:6c:00:00/e0 tag 0 dma 4096 in
         res 51/40:00:78:15:c5/00:00:6c:00:00/e0 Emask 0x9 (media error)
ata2.00: status: { DRDY ERR }
ata2.00: error: { UNC }
ata2.00: configured for UDMA/100
sd 1:0:0:0: [sda] Unhandled sense code
sd 1:0:0:0: [sda]  
Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 1:0:0:0: [sda]  
Sense Key : Medium Error [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
        72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 
        6c c5 15 78 
sd 1:0:0:0: [sda]  
Add. Sense: Unrecovered read error - auto reallocate failed
sd 1:0:0:0: [sda] CDB: 
Read(10): 28 00 6c c5 15 78 00 00 08 00
end_request: I/O error, dev sda, sector 1824855416
Buffer I/O error on device sda, logical block 228106927
ata2: EH complete

Look for the key parts:

- DRDY, ERR and UNC in braces
- Medium Error status
- Unrecovered read error sense message

If you spot any of these in the messages (even once), you are facing a physical disk error.

When this is the case, don't let dd finish: press Ctrl+C to stop NOW, shut down your system, and bring your disk to a data recovery shop you trust.
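
As an aside on reading the sample log: the failing location shows up three ways, as the hex bytes `6c c5 15 78` in the sense data and the Read(10) CDB, as `sector 1824855416`, and as `logical block 228106927`. The sketch below reproduces that arithmetic, assuming 512-byte sectors and 4096-byte buffer blocks (assumptions, though they match the numbers in this log); the function names are illustrative only.

```sh
# Convert the four big-endian LBA bytes (hex) into a decimal sector number.
lba_bytes_to_sector() {
    echo $(( 0x$1$2$3$4 ))
}

# Convert a 512-byte sector number into a block number.
# $2 is the block size in bytes (assumed default: 4096).
sector_to_block() {
    echo $(( $1 / ( ${2:-4096} / 512 ) ))
}

lba_bytes_to_sector 6c c5 15 78   # prints 1824855416
sector_to_block 1824855416        # prints 228106927
```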

If you did not find the worst-case telltales above, but instead found kernel messages like these repeating:
```
ata2: exception Emask 0x10 SAct 0x0 SErr 0x4040000 action 0xe frozen
ata2: irq_stat 0x00000040, connection status changed
ata2: SError: { CommWake DevExch }
ata2: hard resetting link
ata2: link is slow to respond, please be patient (ready=0)
```
Key parts:
- hard resetting link
- link is slow to respond

In that case, you are instead facing a SATA link problem (e.g. bad cabling): press Ctrl+C to stop, shut down your system, fix your disk cable and connection, and try again.
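
The two sets of telltales can also be checked mechanically against saved kernel messages. A rough sketch, not a diagnostic tool: `classify_kmsg` is a made-up name, it looks only for the exact strings listed in this answer, and the three labels it prints are invented for this example.

```sh
# Hypothetical helper: read kernel messages on stdin and print one label.
# Media-error markers take priority over link-reset markers.
classify_kmsg() {
    awk '
        index($0, "{ UNC }") || index($0, "Medium Error") ||
        index($0, "Unrecovered read error") { media = 1 }
        index($0, "hard resetting link") ||
        index($0, "link is slow to respond") { link = 1 }
        END {
            if (media)     print "media-error"
            else if (link) print "link-problem"
            else           print "no-known-signature"
        }'
}

# Example: dmesg | classify_kmsg
```

A "no-known-signature" result only means none of these strings appeared; it does not prove the disk is healthy.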

Side Notes

And I made a smartctl test to confirm if there is any problem with hard disk. And it passed without error.

Beware that some hard disks tell straight lies in their S.M.A.R.T. status (I'm looking at you, Toshiba); my previous laptop hard disk just ground to a halt when reading, spewing read errors, and it still said "nothing's wrong" in its status registers.

If your server is mission-critical, then you should consider a RAID-based setup.

¹ Cautionary tale: My housemate once ignored this warning and kept the filesystem checker grinding on his desktop system anyway. He didn't have me check it until it eventually failed to boot. By the time I got a chance to check it, the disk damage was already beyond recovery (the 500 GB disk could barely be read, at a snail-pace KB/s, and no significant contiguous readable area was found even after several days).

On the other hand, in another case with the same symptom, the machine owner heeded my warning and left the thing off until I could check it. Of course, it was a hard disk failure. After half a day of a GNU ddrescue session and one new hard disk, I brought him the good news that his system and data were 100% recovered at block level, i.e. all files intact and ready to boot again without any modification.

Why "Change the system to the plain old text console (press Ctrl+Alt+F1); don't use graphical terminal for this"? — HUA Di, Aug 17 '21 at 12:40
@HUADi because the X system, WM & DE will write files in the background, for their own purposes and for starting up other applications. If you're worried about a nearly-failed disk, you don't want any disk activity happening unless it's directly helping you to back up data. — mcint, Feb 07 '23 at 01:15

score 3 · Answer 2 · answered Oct 19 '20 at 21:10

I ran into this error on my Linux server (running Debian 10) when navigating folders and accessing files, despite the drive passing all SMART tests. I was not able to solve the problem using any of the answers posted on Stack Exchange.

I was using a 2.5" HDD in a 3.5" drive bay, and it turns out the drive had vibrated loose from the SATA connector. I shut the server down and plugged the drive back in firmly, and the errors disappeared.

Wurstbrot24

Thank you! In my case it was probably dust on the connectors of my NVMe drive. I cleaned it using compressed air and the problem went away. – He Shiming Mar 12 '22 at 01:24
