Tuesday, July 21, 2009

Screen still going blank! (/proc/mtrr)

I wrote before about how my screen keeps going blank, and how it seemed to get worse after upgrading to software RAID. It is still doing it, and for the past week or so I've been systematically disabling things to see if I can fix it. It still happens, but less frequently, so some combination of the things I've disabled may have helped?

But, it does still happen, so I went googling again. I think I may have found something:
https://bugzilla.redhat.com/show_bug.cgi?id=446620
https://bugs.freedesktop.org/show_bug.cgi?id=15360

It seems the Intel video driver and/or the Linux kernel cope badly with unusual memory configurations. I moved to software RAID at the same time as I moved from 2G to 6G main memory! Here is my /proc/mtrr:

reg00: base=0xd0000000 (3328MB), size= 256MB: uncachable, count=1
reg01: base=0xe0000000 (3584MB), size= 512MB: uncachable, count=1
reg02: base=0x00000000 ( 0MB), size=4096MB: write-back, count=1
reg03: base=0x100000000 (4096MB), size=2048MB: write-back, count=1
reg04: base=0x180000000 (6144MB), size= 512MB: write-back, count=1
reg05: base=0x1a0000000 (6656MB), size= 256MB: write-back, count=1
reg06: base=0xcf700000 (3319MB), size= 1MB: uncachable, count=1

This looks like one of the problem configurations from the bug reports above. Unfortunately I'm at a bit of a loss as to what to do with it now. My understanding of the problem is something like main memory and video memory overlap, so when some program uses that memory it kills my video and X dies. My ctrl-alt-F1, breathe in, breathe out, ctrl-alt-F7 "fix" must reset X?? (Hmm, how come I never see problems with any programs having their memory overwritten by the graphics card though?)
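To make the "overlap" idea concrete, here is a minimal Python sketch that parses the listing above and reports every pair of regions that share addresses but have different cache types. The function names (parse_mtrr, find_mixed_overlaps) are made up for illustration, and the regex assumes the MB-based line format shown in this post; other kernels may print the listing differently. An overlap on its own is not proof of a fault, since an uncachable range carved out of a larger write-back range is a common way for a BIOS to describe a hole, so treat the output as a map rather than a diagnosis.

```python
import re

# /proc/mtrr listing copied from the post above.
MTRR_TEXT = """\
reg00: base=0xd0000000 (3328MB), size= 256MB: uncachable, count=1
reg01: base=0xe0000000 (3584MB), size= 512MB: uncachable, count=1
reg02: base=0x00000000 ( 0MB), size=4096MB: write-back, count=1
reg03: base=0x100000000 (4096MB), size=2048MB: write-back, count=1
reg04: base=0x180000000 (6144MB), size= 512MB: write-back, count=1
reg05: base=0x1a0000000 (6656MB), size= 256MB: write-back, count=1
reg06: base=0xcf700000 (3319MB), size= 1MB: uncachable, count=1
"""

# Assumes the MB-based format shown above; adjust if your kernel prints KB.
LINE_RE = re.compile(
    r"(reg\d+): base=0x([0-9a-fA-F]+) \(\s*\d+MB\), "
    r"size=\s*(\d+)MB: ([\w-]+), count=\d+"
)

MB = 2**20


def parse_mtrr(text):
    """Return a list of (name, start, end, cache_type); end is exclusive."""
    regions = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            name, base_hex, size_mb, kind = m.groups()
            start = int(base_hex, 16)
            regions.append((name, start, start + int(size_mb) * MB, kind))
    return regions


def find_mixed_overlaps(regions):
    """Return (name1, type1, name2, type2) for overlapping regions whose types differ."""
    found = []
    for i, (n1, s1, e1, t1) in enumerate(regions):
        for n2, s2, e2, t2 in regions[i + 1:]:
            if s1 < e2 and s2 < e1 and t1 != t2:
                found.append((n1, t1, n2, t2))
    return found


if __name__ == "__main__":
    for n1, t1, n2, t2 in find_mixed_overlaps(parse_mtrr(MTRR_TEXT)):
        print(f"{n1} ({t1}) overlaps {n2} ({t2})")
```

For this listing the three uncachable regions (reg00, reg01, reg06) all fall inside reg02, the 0 to 4096MB write-back region; the write-back regions above 4096MB do not overlap anything.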

Here are my dmesg entries either side of the only mtrr complaint:
[ 142.691737] [drm] Initialized drm 1.1.0 20060810
[ 142.700818] ACPI: PCI Interrupt 0000:00:02.0[A] -> GSI 16 (level, low) -> IRQ 17
[ 142.700827] PCI: Setting latency timer of device 0000:00:02.0 to 64
[ 142.700880] [drm] Initialized i915 1.6.0 20060119 on minor 0
[ 142.715171] mtrr: type mismatch for d0000000,10000000 old: write-back new: write-combining
[ 143.311587] set status page addr 0x00033000
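
The two hex numbers in the mtrr complaint can be decoded with a few lines of Python. This is only a sketch: it assumes the pair is "base,size" in bytes, which is a reading of the message format rather than something confirmed here, and decode_mismatch is a made-up helper name. Under that assumption the range is the same 256MB at 3328MB that reg00 describes in /proc/mtrr.

```python
import re

# The complaint exactly as it appears in the dmesg output above.
DMESG_LINE = (
    "mtrr: type mismatch for d0000000,10000000 "
    "old: write-back new: write-combining"
)

MISMATCH_RE = re.compile(
    r"type mismatch for ([0-9a-fA-F]+),([0-9a-fA-F]+) "
    r"old: ([\w-]+) new: ([\w-]+)"
)


def decode_mismatch(line):
    """Decode the message, assuming the hex pair is base,size in bytes."""
    m = MISMATCH_RE.search(line)
    if m is None:
        return None
    base = int(m.group(1), 16)
    size = int(m.group(2), 16)
    return {
        "base_mb": base // 2**20,
        "size_mb": size // 2**20,
        "old": m.group(3),
        "new": m.group(4),
    }


if __name__ == "__main__":
    print(decode_mismatch(DMESG_LINE))
```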


These are the best explanations I've found so far:
http://www.rage3d.com/board/showthread.php?t=33821469
http://www.rage3d.com/board/showthread.php?threadid=33736241

Both of the solutions in the 2nd link talk about modifying the video=... parameter given to the kernel, but I don't have a video= parameter at all. I just tried throwing the option on the end of the kernel command line instead, but the /proc/mtrr output is unchanged, so I don't think it had any effect:
kernel /vmlinuz-2.6.24-24-server root=/dev/md3 ro quiet splash nomttr

I've just tried changing Advanced|Chipset|Northbridge|Memory Remap from Enabled to Disabled in my BIOS. Main memory has dropped from 6G to about 5.3G, and /proc/mtrr has changed to:
reg00: base=0xd0000000 (3328MB), size= 256MB: uncachable, count=1
reg01: base=0xe0000000 (3584MB), size= 512MB: uncachable, count=1
reg02: base=0x00000000 ( 0MB), size=4096MB: write-back, count=1
reg03: base=0x100000000 (4096MB), size=2048MB: write-back, count=1
reg04: base=0xcf700000 (3319MB), size= 1MB: uncachable, count=1
reg05: base=0xcf800000 (3320MB), size= 8MB: uncachable, count=1

Well, it looks just as dubious to my untrained eye (for starters it still seems to think there is 6G of memory), but at least it is different. Let's see what happens... no, the screen still goes blank.
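
A rough bit of arithmetic on the two listings: add up the write-back sizes and subtract the uncachable ones. The sketch below does that in Python. It assumes every uncachable range sits entirely inside a write-back range (true for both listings in this post) and cacheable_mb is a made-up helper name. Whether the kernel arrives at its memory figure this way is not something this sketch establishes.

```python
import re

# Listing with Memory Remap enabled (first listing in the post).
REMAP_ENABLED = """\
reg00: base=0xd0000000 (3328MB), size= 256MB: uncachable, count=1
reg01: base=0xe0000000 (3584MB), size= 512MB: uncachable, count=1
reg02: base=0x00000000 ( 0MB), size=4096MB: write-back, count=1
reg03: base=0x100000000 (4096MB), size=2048MB: write-back, count=1
reg04: base=0x180000000 (6144MB), size= 512MB: write-back, count=1
reg05: base=0x1a0000000 (6656MB), size= 256MB: write-back, count=1
reg06: base=0xcf700000 (3319MB), size= 1MB: uncachable, count=1
"""

# Listing with Memory Remap disabled (second listing in the post).
REMAP_DISABLED = """\
reg00: base=0xd0000000 (3328MB), size= 256MB: uncachable, count=1
reg01: base=0xe0000000 (3584MB), size= 512MB: uncachable, count=1
reg02: base=0x00000000 ( 0MB), size=4096MB: write-back, count=1
reg03: base=0x100000000 (4096MB), size=2048MB: write-back, count=1
reg04: base=0xcf700000 (3319MB), size= 1MB: uncachable, count=1
reg05: base=0xcf800000 (3320MB), size= 8MB: uncachable, count=1
"""

SIZE_RE = re.compile(r"size=\s*(\d+)MB: ([\w-]+)")


def cacheable_mb(text):
    """Write-back MB minus uncachable MB.

    Only valid if each uncachable range lies inside a write-back range.
    """
    total = 0
    for size_mb, kind in SIZE_RE.findall(text):
        if kind == "write-back":
            total += int(size_mb)
        elif kind == "uncachable":
            total -= int(size_mb)
    return total


if __name__ == "__main__":
    print("remap enabled: ", cacheable_mb(REMAP_ENABLED), "MB")
    print("remap disabled:", cacheable_mb(REMAP_DISABLED), "MB")
```

This gives 6143MB with the remap enabled and 5367MB with it disabled, which is in the same region as the 6G and "about 5.3G" figures above. So the second listing may be less wrong than it looks: the write-back ranges still span 6G of address space, but more of that span is carved out as uncachable and nothing above the 6G mark is covered any more.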

What about DVMT mode? http://www.techarp.com/showfreebog.aspx?lang=0&bogno=322
I currently am in "DVMT" mode, with 256M. I've plenty of memory, so let's try fixed mode, with 256M. ...it made no difference to the /proc/mtrr output, and the problem still happens.

(There is also an "ASMT resolution" option in the BIOS, which is "enabled". Google isn't helping me much here, but there is some explanation here, and it doesn't seem to have anything to do with video memory: http://www.avsforum.com/avs-vb/showthread.php?t=938473&page=12)

(I've been holding off on posting this for the past week, in the hope I'd resolve the problem. But unfortunately I haven't yet. I guess fiddling with /proc/mtrr directly is needed, but I don't have time to investigate that currently.)
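
For reference, the kernel's MTRR documentation describes changing entries by writing short text commands to /proc/mtrr as root. The Python sketch below only builds those command strings and never writes them anywhere. The helper names are made up, the example values are simply the range from the dmesg complaint, and nothing here says which ranges (if any) ought to be changed on this machine.

```python
def mtrr_add_command(base, size, kind):
    """Build the text that would add a region, e.g. kind='write-combining'."""
    return f"base={base:#x} size={size:#x} type={kind}"


def mtrr_disable_command(reg):
    """Build the text that would disable register number `reg`."""
    return f"disable={reg}"


if __name__ == "__main__":
    # Illustrative values only: the range named in the dmesg complaint.
    print(mtrr_add_command(0xd0000000, 0x10000000, "write-combining"))
    print(mtrr_disable_command(0))
    # Applying either string would mean writing it to /proc/mtrr as root,
    # which this sketch deliberately does not do.
```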

3 comments:

keith.s.wilkinson said...

One possible simple workaround might be to use a graphics card (the fanless, low-profile HD4550 is cheap, and really fast compared with Intel chipset video) and disable the chipset video in the BIOS.

keith.s.wilkinson said...

Maker URL

keith.s.wilkinson said...

The HD 4670 is hotter (not fanless) and supports video transcoding (see here and here)