[linux-l] Disk or controller going up in smoke?
Dani Oderbolz
oderbolz-lists at ecologic.de
Thu Mar 4 11:37:11 CET 2004
Dear list,
I'm somewhat desperate at the moment, because one of our servers
panicked last night:
Mar 3 19:46:20 ecoserv01 kernel: hde: irq timeout: status=0xd0 { Busy }
Mar 3 19:46:20 ecoserv01 kernel: PDC202XX: Primary channel reset.
Mar 3 19:46:55 ecoserv01 kernel: ide2: reset timed-out, status=0x80
Mar 3 19:46:55 ecoserv01 kernel: hde: status timeout: status=0xd0 { Busy }
Mar 3 19:46:55 ecoserv01 kernel: PDC202XX: Primary channel reset.
Mar 3 19:46:55 ecoserv01 kernel: hde: drive not ready for command
Mar 3 19:47:01 ecoserv01 kernel: ide2: reset: success
Mar 3 18:55:53 ecoserv01 smbd[23052]: [2004/03/03 18:55:53, 0] lib/util_sock.c:read_data(436)
Mar 3 18:55:53 ecoserv01 smbd[23052]: read_data: read failure for 4. Error = Connection reset by peer
[snip]
Mar 3 20:02:43 ecoserv01 smbd[22247]: [2004/03/03 20:02:43, 0] lib/util_sock.c:read_data(436)
Mar 3 20:02:43 ecoserv01 smbd[22247]: read_data: read failure for 4. Error = No route to host
Mar 3 20:32:12 ecoserv01 smbd[23105]: [2004/03/03 20:32:12, 0] lib/util_sock.c:read_data(436)
Mar 3 20:32:12 ecoserv01 smbd[23105]: read_data: read failure for 4. Error = No route to host
Mar 3 20:49:38 ecoserv01 smbd[22276]: [2004/03/03 20:49:38, 0] lib/util_sock.c:read_data(436)
Mar 3 20:49:38 ecoserv01 smbd[22276]: read_data: read failure for 4. Error = No route to host
Mar 3 20:50:13 ecoserv01 kernel: hde: irq timeout: status=0xd0 { Busy }
Mar 3 20:52:45 ecoserv01 kernel: hde: irq timeout: status=0xd0 { Busy }
Mar 3 20:52:45 ecoserv01 kernel: hde: status timeout: status=0xd0 { Busy }
Mar 3 20:52:45 ecoserv01 kernel: PDC202XX: Primary channel reset.
Mar 3 20:52:45 ecoserv01 kernel: hde: drive not ready for command
Mar 3 20:52:45 ecoserv01 kernel: ide2: reset: success
Mar 3 20:52:55 ecoserv01 kernel: hde: irq timeout: status=0xd0 { Busy }
Mar 3 20:53:09 ecoserv01 kernel: hde: status timeout: status=0xd0 { Busy }
Mar 3 20:53:09 ecoserv01 kernel: PDC202XX: Primary channel reset.
Mar 3 20:53:09 ecoserv01 kernel: hde: drive not ready for command
Mar 3 20:53:12 ecoserv01 kernel: ide2: reset: success
[snip]
Mar 4 07:41:26 ecoserv01 kernel: hde: irq timeout: status=0xd0 { Busy }
Mar 4 07:41:26 ecoserv01 kernel: hde: status timeout: status=0xd0 { Busy }
Mar 4 07:41:26 ecoserv01 kernel: PDC202XX: Primary channel reset.
Mar 4 07:41:26 ecoserv01 kernel: hde: drive not ready for command
Mar 4 07:42:01 ecoserv01 kernel: ide2: reset timed-out, status=0x80
Mar 4 07:42:01 ecoserv01 kernel: hde: status timeout: status=0x80 { Busy }
Mar 4 07:42:01 ecoserv01 kernel: PDC202XX: Primary channel reset.
Mar 4 07:42:01 ecoserv01 kernel: hde: drive not ready for command
Mar 4 07:42:31 ecoserv01 kernel: ide2: reset timed-out, status=0x80
Mar 4 07:42:31 ecoserv01 kernel: end_request: I/O error, dev 21:00 (hde), sector 2009535
Mar 4 07:42:31 ecoserv01 kernel: end_request: I/O error, dev 21:00 (hde), sector 3645159
Mar 4 07:42:31 ecoserv01 kernel: end_request: I/O error, dev 21:00 (hde), sector 8668095
Mar 4 07:42:31 ecoserv01 kernel: end_request: I/O error, dev 21:00 (hde), sector 9514431
Mar 4 07:42:31 ecoserv01 kernel: end_request: I/O error, dev 21:00 (hde), sector 18087999
Mar 4 07:42:31 ecoserv01 kernel: end_request: I/O error, dev 21:00 (hde), sector 103083135
Mar 4 07:42:31 ecoserv01 kernel: end_request: I/O error, dev 21:00 (hde), sector 103101607
Mar 4 07:42:31 ecoserv01 kernel: end_request: I/O error, dev 21:00 (hde), sector 103284799
Mar 4 07:42:31 ecoserv01 kernel: end_request: I/O error, dev 21:00 (hde), sector 103381711
Mar 4 07:42:31 ecoserv01 kernel: end_request: I/O error, dev 21:00 (hde), sector 103546943
Mar 4 07:42:31 ecoserv01 kernel: end_request: I/O error, dev 21:00 (hde), sector 104333375
Mar 4 07:42:31 ecoserv01 kernel: end_request: I/O error, dev 21:00 (hde), sector 124387079
Mar 4 07:42:31 ecoserv01 kernel: end_request: I/O error, dev 21:00 (hde), sector 124423599
Mar 4 07:42:31 ecoserv01 kernel: end_request: I/O error, dev 21:00 (hde), sector 128974911
Mar 4 07:42:31 ecoserv01 kernel: end_request: I/O error, dev 21:00 (hde), sector 129060815
Mar 4 07:42:31 ecoserv01 kernel: end_request: I/O error, dev 21:00 (hde), sector 137363519
Mar 4 07:42:31 ecoserv01 kernel: end_request: I/O error, dev 21:00 (hde), sector 140023199
Mar 4 07:42:31 ecoserv01 kernel: end_request: I/O error, dev 21:00 (hde), sector 682567
Mar 4 07:42:31 ecoserv01 kernel: end_request: I/O error, dev 21:00 (hde), sector 683111
Mar 4 07:42:31 ecoserv01 kernel: end_request: I/O error, dev 21:00 (hde), sector 685359
Mar 4 07:42:31 ecoserv01 kernel: journal-949: buffer 251184 write failed
Mar 4 07:42:31 ecoserv01 kernel: kernel BUG at prints.c:334!
Mar 4 07:42:31 ecoserv01 kernel: invalid operand: 0000
2.4.19-4GB #1 Fri Sep 13 13:14:56 UTC 2002
Mar 4 07:42:31 ecoserv01 kernel: CPU: 0
Mar 4 07:42:31 ecoserv01 kernel: EIP:
0010:[nls_iso8859-1:__insmod_nls_iso8859-1_O/lib/modules/2.4.19-4GB/kernel/fs/n+-304467624/96]
Not tainted
Mar 4 07:42:31 ecoserv01 kernel: EIP: 0010:[<c1dd2158>]
Not tainted
Mar 4 07:42:31 ecoserv01 kernel: EFLAGS: 00010286
Mar 4 07:42:31 ecoserv01 kernel: eax: 0000002b ebx: c204cc00
ecx: dc5ade20 edx: c1de99e2
Mar 4 07:42:31 ecoserv01 kernel: esi: d9403b40 edi: d9403b40
ebp: c2189ea0 esp: dc5ade1c
Mar 4 07:42:31 ecoserv01 kernel: ds: 0018 es: 0018 ss: 0018
Mar 4 07:42:31 ecoserv01 kernel: Process postmaster (pid: 26498,
stackpage=dc5ad000)
Mar 4 07:42:31 ecoserv01 kernel: Stack: c1de99e2 c1dea3a0
c1de7e80 dc5ade3c e2be4ae4 c1dddd45 c204cc00 c1de7e80
Mar 4 07:42:31 ecoserv01 kernel: 0003d530 00000001
00000001 00000015 c2189ea0 00000018 c2189ea0 c204cc00
Mar 4 07:42:32 ecoserv01 kernel: c1dde25c c204cc00
c2189ea0 00000001 c2189ba0 4046cf0f c204cc00 4046cf0f
Mar 4 07:42:32 ecoserv01 kernel: Call Trace:
[nls_iso8859-1:__insmod_nls_iso8859-1_O/lib/modules/2.4.19-4GB/kernel/fs/n+-304371230/96]
[nls_iso8859-1:__insmod_nls_iso8859-1_O/lib/modules/2.4.19-4GB/kernel/fs/n+-304368736/96]
[nls_iso8859-1:__insmod_nls_iso8859-1_O/lib/modules/2.4.19-4GB/kernel/fs/n+-304378240/96]
[nls_iso8859-1:__insmod_nls_iso8859-1_O/lib/modules/2.4.19-4GB/kernel/fs/n+-304419515/96]
[nls_iso8859-1:__insmod_nls_iso8859-1_O/lib/modules/2.4.19-4GB/kernel/fs/n+-304378240/96]
Mar 4 07:42:32 ecoserv01 kernel: Call Trace: [<c1de99e2>]
[<c1dea3a0>] [<c1de7e80>] [<c1dddd45>] [<c1de7e80>]
Mar 4 07:42:32 ecoserv01 kernel:
[nls_iso8859-1:__insmod_nls_iso8859-1_O/lib/modules/2.4.19-4GB/kernel/fs/n+-304418212/96]
[nls_iso8859-1:__insmod_nls_iso8859-1_O/lib/modules/2.4.19-4GB/kernel/fs/n+-304405548/96]
[nls_iso8859-1:__insmod_nls_iso8859-1_O/lib/modules/2.4.19-4GB/kernel/fs/n+-304403731/96]
[nls_iso8859-1:__insmod_nls_iso8859-1_O/lib/modules/2.4.19-4GB/kernel/fs/n+-304408414/96]
[nls_iso8859-1:__insmod_nls_iso8859-1_O/lib/modules/2.4.19-4GB/kernel/fs/n+-304477151/96]
[__mark_inode_dirty+119/128]
Mar 4 07:42:32 ecoserv01 kernel: [<c1dde25c>] [<c1de13d4>]
[<c1de1aed>] [<c1de08a2>] [<c1dcfc21>] [<c0156f17>]
Mar 4 07:42:32 ecoserv01 kernel: [update_atime+85/96]
[generic_file_read+130/288] [file_read_actor+0/240]
[sys_read+133/256] [system_call+51/64]
Mar 4 07:42:32 ecoserv01 kernel: [<c0158285>] [<c0133ed2>]
[<c0133d60>] [<c0143725>] [<c0108e63>]
Mar 4 07:42:32 ecoserv01 kernel: Modules:
[(reiserfs:<c1dc0060>:<c1debd50>)]
The kernel is 2.4.19-4GB (standard SuSE 8.1).
On this machine we run a Promise FastTrak 100 (RAID 1 across 2 disks).
The filesystem is ReiserFS with a 3.6-format journal.
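
For readers who want to unpack the numbers in the log: the status values (0xd0, 0x80) are ATA status register bytes, and "dev 21:00" is a major:minor pair that 2.4 kernels print in hexadecimal. The following is a minimal sketch, not taken from any kernel source; the helper names decode_status and decode_dev are made up for illustration, and the bit names follow the usual ATA definitions.

```python
# Bit names of the ATA status register (standard ATA definitions).
# Dict order is highest bit first, so decoded lists read left to right.
ATA_STATUS_BITS = {
    0x80: "BSY",   # device busy
    0x40: "DRDY",  # device ready
    0x20: "DF",    # device fault
    0x10: "DSC",   # seek complete
    0x08: "DRQ",   # data request
    0x04: "CORR",  # corrected data
    0x02: "IDX",   # index
    0x01: "ERR",   # error
}


def decode_status(value):
    """Return the names of all bits set in an ATA status byte."""
    return [name for mask, name in ATA_STATUS_BITS.items() if value & mask]


def decode_dev(text):
    """Split a 'major:minor' pair written in hexadecimal into integers."""
    major, minor = text.split(":")
    return int(major, 16), int(minor, 16)


print(decode_status(0xD0))  # ['BSY', 'DRDY', 'DSC']
print(decode_status(0x80))  # ['BSY']
print(decode_dev("21:00"))  # (33, 0)
```

While BSY is set, the remaining status bits are not considered valid, which would explain why the kernel prints only "{ Busy }" for both values. Major 33 is the third IDE interface (ide2) and minor 0 its first drive, which matches the "(hde)" shown next to it.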
Now my big question: is the controller to blame, or the disks?
(I suspect the controller is the more likely culprit, since I have
2 disks in there.)
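
One way to put some numbers behind that guess is to tally which drive names the kernel messages mention and how far apart the failing sectors lie. The sketch below is only an illustration: the SAMPLE text holds a few lines copied from the log above (unwrapped, syslog prefix shortened), the function name summarize is made up, and 512-byte sectors are assumed.

```python
import re
from collections import Counter

# A few representative lines from the log excerpt, including the
# smallest and the largest failing sector number it reports.
SAMPLE = """\
kernel: hde: irq timeout: status=0xd0 { Busy }
kernel: PDC202XX: Primary channel reset.
kernel: ide2: reset timed-out, status=0x80
kernel: hde: drive not ready for command
kernel: end_request: I/O error, dev 21:00 (hde), sector 682567
kernel: end_request: I/O error, dev 21:00 (hde), sector 140023199
"""

DRIVE = re.compile(r"\b(hd[a-z])\b")   # IDE drive names such as hde
SECTOR = re.compile(r"sector (\d+)")   # sector number of an I/O error


def summarize(log_text):
    """Count log lines per drive and collect failing sector numbers."""
    drives = Counter()
    sectors = []
    for line in log_text.splitlines():
        drive = DRIVE.search(line)
        if drive:
            drives[drive.group(1)] += 1
        sector = SECTOR.search(line)
        if sector:
            sectors.append(int(sector.group(1)))
    return drives, sectors


drives, sectors = summarize(SAMPLE)
print(dict(drives))  # {'hde': 4}

# Assumption: sectors are 512 bytes each.
span_gib = (max(sectors) - min(sectors)) * 512 / 2**30
print(f"failing sectors span about {span_gib:.1f} GiB")
```

In the excerpt only hde shows up, and the failing sectors are spread over most of the disk rather than clustered in one region. That by itself does not settle whether the drive, the cable, or the controller channel is at fault.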
After a colleague's bold reboot the machine is running without a hitch
for now, but I have a very bad feeling about it...
What would you do besides running reiserfsck?
Thanks for your support, regards,
Dani
--
Daniel Oderbolz
Jagowstrasse 13
D-10555 Berlin
http://www.oderbolz.org