PCI EIDE CONTROLLER FLAWS (ABRIDGED)


Revision 18: 1995 October 4


INTRODUCTION

There are serious flaws affecting about 1/3 of all PCI
motherboards. The flaws affect any motherboard or EIDE
controller paddleboard containing the PC-Tech RZ-1000 PCI
EIDE controller chip or the CMD PCIO 640 PCI EIDE controller
chip.

The flaws affect motherboards from ASUSTeK, AT&T, DEC, Dell,
Gateway, Intel, Micron, NEC, Zeos and others. Since Intel
makes so many of the motherboards sold under other brand
names, the flaws affect many machines, both 486 and Pentium
PCI.

The flaws show up most frequently when you run a true
multitasking operating system such as OS/2 Warp or NT. They
also show up under Windows For WorkGroups in 32-bit mode
during tape or floppy backup and restore. In theory the
flaws could do damage under DOS, DESQview, Windows and
Windows For WorkGroups in 16-bit mode, but so far there have
been no damage reports. Windows-95 contains code to bypass
the flaws.

The RZ-1000 has two flaws. The CMD-640 has those same two
flaws plus three others. To make matters worse, most
motherboard manufacturers using these two flawed chips
connected them up incorrectly. There are software bypasses
for these flaws. However, the Warp fix for the CMD-640
reduces disk performance by 15 to 50%. The RZ-1000 fix has
negligible impact on disk I/O though it can slow down
background processes.

I would advise buying new hardware to bypass the CMD-640
flaws, and living with software fixes to bypass the RZ-1000
flaws.


WHAT ARE THE SYMPTOMS?

When you are using an IDE or EIDE hard disk attached to the
EIDE motherboard port, the flaws subtly corrupt your files
by randomly changing bytes every once in a while. The flaws
introduce bugs into EXE files, subtle errors into your
spreadsheets, stray characters into your word processing
documents, changes to the deductions in last year's tax
return files, and random changes to engineering design
files.

This corruption happens when you are simultaneously using
your EIDE or IDE hard disk and some other device, most
commonly the floppy drive or mag tape backup.

The same sort of problem may occur when reading from a
CD-ROM drive attached to an EIDE port.


TESTING FOR THE FLAWS

I wrote two test programs that run under DESQview, Windows,
Windows For WorkGroups, Windows 95, NT and OS/2. EIDEtest
verifies that your hard disk is working properly, and CDtest
verifies your CD-ROM. If these tests fail, it proves you
have a serious problem, but not necessarily that you have
the RZ-1000 or CMD-640 chip.

If the tests pass, you still may have a problem since,
especially under DOS, DESQview and Windows, the flaws may
only show up very rarely. If you run the tests under Windows-
95 they will always pass, even if you have the defective
chip, because the operating system already bypasses the
flaws.
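
The general idea behind this kind of test can be sketched as
follows. This is not the EIDEtest source, only a minimal
illustration in Python: it writes a known pattern to the hard
disk, then reads it back repeatedly while a second thread
generates other I/O. The names make_pattern, background_io,
verify_reads and run_test are hypothetical. A real test would
aim the background activity at the floppy or tape drive and
would have to defeat the operating system's disk cache, which
this sketch does not attempt.

```python
import hashlib
import os
import tempfile
import threading


def make_pattern(size, seed=0):
    """Build a deterministic byte pattern so every read can be checked."""
    out = bytearray()
    counter = seed
    while len(out) < size:
        out += hashlib.sha256(str(counter).encode()).digest()
        counter += 1
    return bytes(out[:size])


def background_io(path, stop_event):
    """Hypothetical stand-in for floppy or tape activity."""
    while not stop_event.is_set():
        with open(path, "wb") as f:
            f.write(os.urandom(4096))


def verify_reads(path, expected, passes=5):
    """Re-read the test file and count the passes that do not match."""
    failures = 0
    for _ in range(passes):
        with open(path, "rb") as f:
            if f.read() != expected:
                failures += 1
    return failures


def run_test(directory=None, size=1 << 20, passes=5):
    """Write a known pattern, then read it back while other I/O runs."""
    if directory is None:
        directory = tempfile.mkdtemp()
    expected = make_pattern(size)
    test_path = os.path.join(directory, "pattern.bin")
    noise_path = os.path.join(directory, "noise.bin")
    with open(test_path, "wb") as f:
        f.write(expected)
    stop = threading.Event()
    worker = threading.Thread(target=background_io, args=(noise_path, stop))
    worker.start()
    try:
        return verify_reads(test_path, expected, passes)
    finally:
        stop.set()
        worker.join()
```

A return value of zero means every pass read back the pattern
intact. As with the real tests, a clean run does not prove the
hardware is sound.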


WHAT CAN YOU DO IF YOU HAVE A FLAW?

 1)  Pester the manufacturer. Unfortunately, the EIDE
   controller chips are soldered in. The only way to repair a
   flaw is to replace the whole motherboard, recycling the
   socketed chips -- the CPU, DRAM and SRAM cache. It would be
   very expensive for computer and motherboard manufacturers to
   fix a flaw.


 2)  Buy a new unpopulated Triton PCI motherboard and
   recycle the CPU, DRAM and SRAM cache chips from the old
   motherboard. Unfortunately, the Triton chipset has design
   shortcuts that hamper performance in simultaneous I/O
   situations. At least they don't corrupt data.

 3)  Run the controller in degraded mode. Some BIOSes have a
   feature to disable the EIDE prefetch buffer. Vendors may
   offer a BIOS upgrade to allow you to manually disable
   prefetch. The BIOS may also turn it off automatically if
   either of the defective chips is present. This will bypass
   both RZ-1000 flaws and two of the five CMD-640 flaws.

 4)  Buy a PCI EIDE paddleboard controller such as the DTC
   2130S, the Tekram 290N/290S, the Promise 2300+ or the
   BusLogic BT-910 to replace the one on the motherboard. You
   must disable the EIDE controller on the motherboard. This
   fix will waste one of your precious slots. Be careful. You
   could be leaping out of the RZ-1000 frying pan into the CMD-
   640 fire since paddleboards often use the CMD-640.

 5)  Buy a SCSI hard disk and CD-ROM, and avoid using the
   EIDE ports entirely. Under OS/2 and Linux, SCSI gives better
   performance, but costs more. DOS, Windows, Windows For
   WorkGroups and Windows-95 are unable to exploit the advanced
   features of SCSI, but at least avoid the EIDE flaws when you
   go pure SCSI.

 6)  Find a software work-around. There are fixes for Warp
   to bypass all the flaws in the RZ-1000 and CMD-640. Fixpack
   10 is the first fixpack to bypass the flaws. Now that Intel
   and IBM have finally revealed the technical details, all the
   operating system writers can patch their EIDE drivers to
   bypass the flaws. There are also fixes for NT 3.1 and 3.5.

 7)  Get a BIOS upgrade. For DOS, DESQview, and Windows 3.1,
   to bypass the flaws you may need a new BIOS -- an EPROM
   chip. If you have a flash BIOS, you can update it simply by
   downloading a file. Most BIOSes already have code to bypass
   the flaws for DOS, DESQview and Windows. However, more
   advanced operating systems bypass the BIOS, so even a smart
   BIOS will not protect you. However, the BIOS CMOS settings
   may allow you to disable prefetch, which also protects you
   even in true multitasking operating systems.

 8)  Cut the trace. Cut the trace on the motherboard from
   the floppy changeline to the EIDE controller. However, this
   bypasses only one of the CMD-640's five flaws and one of
   the RZ-1000's two flaws.

 9)  Use the secondary EIDE controller. Some motherboards
   such as the Micron P5-90 M54Pi-N 11P use different kinds of
   controllers on the primary and secondary EIDE ports. The
   primary may be flawed, but the secondary may be OK.

Whatever method you use to bypass the flaws, retest with
EIDEtest and CDtest afterwards to be sure your fix worked
and you caught all the problems.


CLEANING UP THE MESS

Once you have bypassed the flaws, you can start working on
the problem of cleaning up your files.

The first thing to do is to re-install your operating system
and all your application programs. This will replace any
damaged EXE and DLL files.

Catching errors in your data files is more difficult. Keep
your eyes peeled for any improbable spreadsheet results. You
may have to hire a programmer to write you some comb
programs to sniff through your databases, looking for
suspicious values.
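
A comb program can be quite small. The following is a minimal
sketch in Python, assuming the data can be exported as
comma-separated text. The function name comb, the column name
"deduction" and the range limits are all hypothetical. It
flags any value that is not a number or that falls outside a
plausible range. It cannot catch a corrupted value that still
looks reasonable.

```python
import csv
import io


def comb(rows, column, low, high):
    """Return (row_number, raw_value) pairs that are non-numeric or out of range."""
    suspects = []
    for number, row in enumerate(rows, start=1):
        raw = row.get(column, "")
        try:
            value = float(raw)
        except (TypeError, ValueError):
            # A stray character in a numeric field is a strong hint of damage.
            suspects.append((number, raw))
            continue
        if not low <= value <= high:
            suspects.append((number, raw))
    return suspects


# Made-up sample data: row 2 has a stray letter, row 3 is implausibly large.
sample = io.StringIO("id,deduction\n1,1200.00\n2,12q0.00\n3,9912000.00\n")
print(comb(csv.DictReader(sample), "deduction", 0, 50000))
# prints [(2, '12q0.00'), (3, '9912000.00')]
```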

If you routinely use the verify feature of Lotus Magellan,
it can detect changes to files that should not have changed.
This may help you uncover some of the damage. The flaws are
not polite enough to redate the files they corrupt. :-)
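
The same kind of check can be sketched in a few lines of
Python. This is not how Magellan works internally, only an
illustration of the idea: record a date stamp and a checksum
for every file, then later report the files whose contents
changed while the date stamp did not. The names file_digest,
snapshot and silently_changed are hypothetical, and the
approach only helps if the first snapshot was taken before the
damage occurred.

```python
import hashlib
import os


def file_digest(path):
    """Checksum a file's contents in blocks so large files fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest()


def snapshot(root):
    """Map every file under root to its (modification time, checksum)."""
    manifest = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            manifest[path] = (os.path.getmtime(path), file_digest(path))
    return manifest


def silently_changed(old, new):
    """List files whose contents differ although the date stamp is the same."""
    return sorted(
        path
        for path, (mtime, digest) in old.items()
        if path in new and new[path][0] == mtime and new[path][1] != digest
    )
```

Take one snapshot, keep it somewhere safe, and compare it with
a later snapshot of the same directory tree. Any file that
silently_changed reports deserves a close look.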

If you have backups from before the time you bought the
faulty machine, you can restore them and re-key everything.

Most people will not be so fortunate. All their backups will
also be corrupt.

Most people with flaws will just have to put up with random
errors dotting their data files ever after.


WHAT ARE THE FLAWS?

IBM confirmed the RZ-1000 has two different flaws:

1.   In prefetch mode, multi-sector reads often fail.

2.   The chip erroneously responds to floppy status commands
    and corrupts hard disk or CD-ROM I/O in the process.

IBM confirmed the CMD-640 has five different flaws:

1.   It has the same prefetch problem as the RZ-1000.

2.   It has the same floppy status problem as the RZ-1000.

3.   It does not support simultaneous I/O on the primary and
    secondary EIDE ports.

4.   Confusion over legacy and PCI mode.

5.   It does not support 32-bit writes.


TEST PROGRAMS

When accessing files on the Internet, you must generally use
lower case.

On the Internet, I have posted my EIDEtest and CDtest
programs for DOS, DESQview, Windows, Windows For WorkGroups,
Windows 95, NT, OS/2 and Warp. They ensure your hard disk
and CDROM will function without interference from background
I/O activity. These indirectly detect the flawed RZ-1000 and
CMD-640 chips. The archive also includes an unabridged 28-page
version of this article, complete with references to essays,
tests, and fixes for the various operating systems.  By the
time you read this, I may have posted a newer version.

     ftp://garbo.uwasa.fi/pc/diskutil/eidete18.zip
alternatively
     ftp://ftp.cdrom.com/.4/os2/incoming/eidete18.zip
or
     ftp://ftp.cdrom.com/.4/os2/sysutil/eidete18.zip


CONTACTING THE AUTHOR

The author, Roedy Green, is a computer consultant who prefers
to work on Forth, C++, Delphi, DOS, OS/2 and Internet Web
projects.

If you send me $5 (US or Canadian) to cover duplication,
postage to anywhere in the world, and handling I will send
you a diskette containing the relevant test programs, fixes,
Internet postings and essays.

Please report any machines with flaws. Send email to:

     Roedy@bix.com

or discuss this problem in the Internet newsgroup:

     comp.os.os2.bugs.

You can also write via snail mail:

Roedy Green
Canadian Mind Products
#601 - 1330 Burrard Street
Vancouver, BC  CANADA
V6Z 2B8
(604) 685-8412

-30-
