Re: Linux uptime goes from 1 minute to 1 day - then freeze and hard reboot
Here's another one. Today a Gutsy server locked up.
I had this happen a while ago on another server where the disk was
bad, and there were bad sectors in the swap file. The data in bad
sectors confused the kernel (no surprise) and it started invoking the
oom-killer. Replacing the hard drive cured the problem.
So I want to look at the HP ProLiant Smart Array disks, but unfortunately
they are behind a hardware SCSI controller, so smartmontools does not
support them, per Bruce Allen.
So what do I do? How can I diagnose these disks?
Output from /var/log/messages:
Jul 20 00:22:33 smilefdlx01 kernel: [373891.820009] perl invoked oom-killer: gfp_mask=0xd0, order=0, oomkilladj=0
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820022] [out_of_memory+389/448] out_of_memory+0x185/0x1c0
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820033] [__alloc_pages+700/784] __alloc_pages+0x2bc/0x310
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820036] [vma_link+96/256] vma_link+0x60/0x100
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820044] [__get_free_pages+56/64] __get_free_pages+0x38/0x40
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820046] [proc_info_read+69/192] proc_info_read+0x45/0xc0
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820053] [vfs_read+188/352] vfs_read+0xbc/0x160
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820057] [proc_info_read+0/192] proc_info_read+0x0/0xc0
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820060] [sys_read+65/112] sys_read+0x41/0x70
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820065] [sysenter_past_esp+107/161] sysenter_past_esp+0x6b/0xa1
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820071] =======================
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820074] Mem-info:
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820076] DMA per-cpu:
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820077] CPU 0: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820081] CPU 1: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820084] CPU 2: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820086] CPU 3: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820088] CPU 4: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820092] CPU 5: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820095] CPU 6: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820098] CPU 7: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820101] Normal per-cpu:
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820103] CPU 0: Hot: hi: 186, btch: 31 usd: 135 Cold: hi: 62, btch: 15 usd: 57
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820106] CPU 1: Hot: hi: 186, btch: 31 usd: 115 Cold: hi: 62, btch: 15 usd: 57
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820110] CPU 2: Hot: hi: 186, btch: 31 usd: 130 Cold: hi: 62, btch: 15 usd: 60
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820113] CPU 3: Hot: hi: 186, btch: 31 usd: 129 Cold: hi: 62, btch: 15 usd: 47
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820117] CPU 4: Hot: hi: 186, btch: 31 usd: 29 Cold: hi: 62, btch: 15 usd: 53
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820119] CPU 5: Hot: hi: 186, btch: 31 usd: 92 Cold: hi: 62, btch: 15 usd: 60
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820123] CPU 6: Hot: hi: 186, btch: 31 usd: 170 Cold: hi: 62, btch: 15 usd: 52
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820126] CPU 7: Hot: hi: 186, btch: 31 usd: 121 Cold: hi: 62, btch: 15 usd: 53
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820128] HighMem per-cpu:
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820131] CPU 0: Hot: hi: 186, btch: 31 usd: 61 Cold: hi: 62, btch: 15 usd: 0
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820135] CPU 1: Hot: hi: 186, btch: 31 usd: 155 Cold: hi: 62, btch: 15 usd: 8
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820138] CPU 2: Hot: hi: 186, btch: 31 usd: 8 Cold: hi: 62, btch: 15 usd: 8
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820142] CPU 3: Hot: hi: 186, btch: 31 usd: 135 Cold: hi: 62, btch: 15 usd: 0
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820145] CPU 4: Hot: hi: 186, btch: 31 usd: 81 Cold: hi: 62, btch: 15 usd: 8
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820148] CPU 5: Hot: hi: 186, btch: 31 usd: 71 Cold: hi: 62, btch: 15 usd: 10
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820151] CPU 6: Hot: hi: 186, btch: 31 usd: 51 Cold: hi: 62, btch: 15 usd: 12
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820154] CPU 7: Hot: hi: 186, btch: 31 usd: 166 Cold: hi: 62, btch: 15 usd: 12
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820159] Active:98677 inactive:6131 dirty:1 writeback:0 unstable:0
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820160] free:1761992 slab:8252 mapped:4651 pagetables:305 bounce:0
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820164] DMA free:3504kB min:68kB low:84kB high:100kB active:36kB inactive:0kB present:16256kB pages_scanned:51 all_unreclaimable? yes
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820166] lowmem_reserve[]: 0 873 8874
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820175] Normal free:3700kB min:3744kB low:4680kB high:5616kB active:256kB inactive:76kB present:894080kB pages_scanned:506 all_unreclaimable? yes
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820178] lowmem_reserve[]: 0 0 64008
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820183] HighMem free:7040764kB min:512kB low:9096kB high:17684kB active:394416kB inactive:24448kB present:8193024kB pages_scanned:0 all_unreclaimable? no
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820186] lowmem_reserve[]: 0 0 0
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820192] DMA: 2*4kB 6*8kB 8*16kB 1*32kB 0*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 3544kB
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820204] Normal: 0*4kB 61*8kB 14*16kB 5*32kB 3*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 3624kB
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820212] HighMem: 50749*4kB 40737*8kB 23922*16kB 9009*32kB 2165*64kB 765*128kB 548*256kB 788*512kB 520*1024kB 325*2048kB 943*4096kB = 7040764kB
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820226] Swap cache: add 0, delete 0, find 0/0, race 0+0
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820229] Free swap = 5855652kB
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820231] Total swap = 5855652kB
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820232] Free swap: 5855652kB
Jul 20 00:22:35 smilefdlx01 kernel: [373891.845074] 2293759 pages of RAM
Jul 20 00:22:35 smilefdlx01 kernel: [373891.845079] 2064383 pages of HIGHMEM
Jul 20 00:22:35 smilefdlx01 kernel: [373891.845080] 216220 reserved pages
Jul 20 00:22:35 smilefdlx01 kernel: [373891.845081] 93055 pages shared
Jul 20 00:22:35 smilefdlx01 kernel: [373891.845085] 0 pages swap cached
Jul 20 00:22:35 smilefdlx01 kernel: [373891.845087] 1 pages dirty
Jul 20 00:22:35 smilefdlx01 kernel: [373891.845089] 0 pages writeback
Jul 20 00:22:35 smilefdlx01 kernel: [373891.845092] 4651 pages mapped
Jul 20 00:22:35 smilefdlx01 kernel: [373891.845094] 8252 pages slab
Jul 20 00:22:35 smilefdlx01 kernel: [373891.845096] 305 pages pagetables
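One thing that stands out when reading the dump above: the Normal zone reports free:3700kB against min:3744kB, while HighMem still has about 7 GB free and swap is untouched. That might point at low-memory pressure rather than the box being out of RAM overall, but that is only a reading of this one log, not a diagnosis. A minimal Python sketch for pulling the per-zone numbers out of such a log and flagging zones below their "min" watermark follows. The function name and the regex are my own illustration, and it assumes the zone lines look exactly like the ones pasted here.

```python
import re

# Matches per-zone summary lines such as:
#   "... kernel: [373891.820175] Normal free:3700kB min:3744kB low:4680kB ..."
# The format is assumed from the log pasted above; other kernels may differ.
ZONE_RE = re.compile(r"\]\s+(\w+) free:(\d+)kB min:(\d+)kB")


def zones_below_min(log_text):
    """Return {zone: (free_kb, min_kb)} for zones whose free memory is under 'min'."""
    starved = {}
    for line in log_text.splitlines():
        match = ZONE_RE.search(line)
        if match is None:
            continue  # not a per-zone summary line
        zone = match.group(1)
        free_kb = int(match.group(2))
        min_kb = int(match.group(3))
        if free_kb < min_kb:
            starved[zone] = (free_kb, min_kb)
    return starved


# Three zone lines copied from the log above (trailing fields trimmed).
SAMPLE = """\
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820164] DMA free:3504kB min:68kB low:84kB high:100kB
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820175] Normal free:3700kB min:3744kB low:4680kB high:5616kB
Jul 20 00:22:35 smilefdlx01 kernel: [373891.820183] HighMem free:7040764kB min:512kB low:9096kB high:17684kB
"""

if __name__ == "__main__":
    for zone, (free_kb, min_kb) in zones_below_min(SAMPLE).items():
        print(f"{zone}: free {free_kb} kB is below min {min_kb} kB")
```

Feeding it the whole of /var/log/messages instead of SAMPLE would show whether every oom-killer event has the same zone under its watermark.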
On 2008-07-20, Linonut <linonut@bollsouth.nut> wrote:
> * Ignoramus2031 peremptorily fired off this memo:
>
>> I am administering linux installed on approximately 20 machines (home
>> and work). Out of them about 16 are Ubuntu. One experiences periodic
>> freezes (once a week), and that one runs Ubuntu Hardy. I have
>> approximately 7 or so more Ubuntu Hardy machines (desktops and
>> laptops) and they do not freeze, in the last 2 months or so at least.
>>
>> Everything running Gutsy is totally rock solid. However, I feel that
>> installing Hardy on anything where big money is at stake, is premature
>> at this point.
>
> Could be. Just make sure you actually diagnose the problem.
>
>> I try to make sure that these Gutsy machines are upgradeable. I have a
>> feeling that Hardy is improving and in several months I will start
>> deploying it to some production servers.
>>
>> Somewhat contrary to this, I installed Hardy on my replacement
>> personal webserver a while ago and it seems to be in good shape, I
>> will take it to the data center soon.
>
> All too many problem-threads found on the internet consist of various
> blind men making stabs at guessing the problem, and offering
> superstitious advice such as "switch distros" or "reinstall Windows".
>
> Both operating systems have many diagnostic tools. Everyone who cares
> about system maintenance should try to learn to use them.
>
--
Due to extreme spam originating from Google Groups, and their inattention
to spammers, I and many others block all articles originating
from Google Groups. If you want your postings to be seen by
more readers you will need to find a different means of
posting on Usenet.
http://improve-usenet.org/