From josip@icase.edu Tue Jan 16 13:33:58 2001
Date: Tue, 16 Jan 2001 10:26:27 -0500
From: Josip Loncaric <josip@icase.edu>
To: Jon Tegner <tegner@nada.kth.se>
Cc: beowulf@beowulf.org
Subject: Re: D-Link switch and ecc-memory.

Jon Tegner wrote:
> 
> I also have a completely unrelated question: would there be any problem
> using ECC memory with Athlons? And how big is the risk of getting
> memory-related errors (for example, we haven't used ECC memory on our
> system; is it "stupid" not to use ECC memory)?

My best estimate is that our system corrects one single-bit error (SBE)
per week in 37.5 GB of ECC memory.  This translates into an average of
about 9 months between SBE events for each GB of RAM.  Your mileage may
vary...
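
The conversion from "one SBE per week in 37.5 GB" to a per-GB interval
can be sketched as follows.  This is only a back-of-the-envelope check,
assuming errors arrive at a constant rate proportional to the amount of
memory; the function name and the weeks-per-month constant are
illustrative choices, not anything taken from the monitoring setup
described above.

```python
# Assumption: the SBE rate scales linearly with the amount of RAM.
WEEKS_PER_MONTH = 52.0 / 12.0  # average month length in weeks


def sbe_interval_months_per_gb(errors_per_week: float, total_gb: float) -> float:
    """Average number of months between single-bit errors in one GB of RAM."""
    errors_per_gb_per_week = errors_per_week / total_gb
    weeks_between_errors = 1.0 / errors_per_gb_per_week
    return weeks_between_errors / WEEKS_PER_MONTH


# One corrected SBE per week across 37.5 GB of ECC memory.
print(round(sbe_interval_months_per_gb(1.0, 37.5), 1))  # 8.7
```

The result, roughly 8.7 months, is consistent with the "about 9 months"
figure quoted above.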

Sincerely,
Josip

-- 
Dr. Josip Loncaric, Senior Staff Scientist        mailto:josip@icase.edu
ICASE, Mail Stop 132C           PGP key at http://www.icase.edu./~josip/
NASA Langley Research Center             mailto:j.loncaric@larc.nasa.gov
Hampton, VA 23681-2199, USA    Tel. +1 757 864-2192  Fax +1 757 864-6134

_______________________________________________
Beowulf mailing list
Beowulf@beowulf.org
http://www.beowulf.org/mailman/listinfo/beowulf

From glindahl@hpti.com Tue Jan 16 13:34:52 2001
Date: Tue, 16 Jan 2001 12:57:23 -0500
From: Greg Lindahl <glindahl@hpti.com>
To: josip@icase.edu
Cc: beowulf@beowulf.org
Subject: RE: D-Link switch and ecc-memory.

> My best estimate is that our system corrects one single-bit error (SBE)
> per week in 37.5 GB of ECC memory.  This translates into an average of
> about 9 months between SBE events for each GB of RAM.  Your mileage may
> vary...

Josip neglected to mention that he is at sea level. If you are at a higher
altitude, you will see more errors, because the cosmic-ray flux that
causes many soft errors increases with altitude.

CPlant's 2000 CPUs have a total of something like 500 gigabytes of RAM. I
haven't computed the errors per GB per month (although we do monitor them,
because doing so detects bad motherboards), but at Josip's rate that would
be roughly one ECC interrupt every 12 hours.
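
For anyone who wants to redo the scaling for their own cluster, a rough
sketch is below.  It assumes the error rate scales linearly with total
RAM and takes Josip's sea-level figure (one SBE per week in 37.5 GB) as
the reference; the function and parameter names are made up for
illustration, and a real system at a different altitude or with
different memory parts may behave quite differently.

```python
HOURS_PER_WEEK = 7 * 24


def hours_between_errors(cluster_gb: float,
                         ref_errors_per_week: float = 1.0,
                         ref_gb: float = 37.5) -> float:
    """Expected hours between corrected SBEs, scaling a reference rate by RAM size.

    Assumption: errors are proportional to the amount of memory installed.
    """
    cluster_errors_per_week = ref_errors_per_week * cluster_gb / ref_gb
    return HOURS_PER_WEEK / cluster_errors_per_week


# Roughly 500 GB of RAM, as in the estimate above.
print(round(hours_between_errors(500.0), 1))  # 12.6
```

That gives about 12.6 hours, in line with the "every 12 hours" estimate.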

-- g


