
Prelude for those who don't read the documentation:
  Do not mail me bug reports. I can't fix them... Other opinions on the
  program are welcome.
  I do not know if this program works on a CPU without a math coprocessor
  (like the 486-SX).

System Benchmark "SysBench" 0.9.0
---------------------------------

(C) 1994 Henrik Harmsen
The disk IO code: (C) 1994 Kai Uwe Rommel

Contents:

  1 Introduction
  2 Tests
  3 Copyright notice
  4 Thanks

  Appendix A : Todo
  Appendix B : Building
  Appendix C : Example results

---



1 Introduction


I thought OS/2 needed a benchmark program, so I wrote one. This
program is not quite finished, and probably never will be, not by me
anyway, since I'm saying goodbye to OS/2 and turning my attention to
Linux. The reasons for this have not so much to do with OS/2, which is
still a great OS, as it has to do with Linux. Linux is slick,
super-fast, finally has drivers for my Viper card, has free TCP/IP and
last but not least, Linux is Unix.

This means I am probably not going to make updates to this program,
since I won't have OS/2 on my disk anymore. I'm saying probably, since
I can't read the future. Maybe one day my whimsical mind will think
OS/2 is more fun than Linux, who knows? :-)

It also means that I am donating this program to anyone who is willing
to continue working on it. If you think you want to continue working
on this program, make sure you clearly note that this is released by
you, not me. To do this, change the version number to 0.9.0xxx, where
xxx are your initials. For example 0.9.0hch, which would indicate that
I (Henrik C Harmsen) have made this release. The version numbering
scheme should follow that of GCC. The first number is the major
release number, to be increased when major enhancements have been made
to the program or it is considered out of beta. The second number is
the minor release number, increase it when you have made small changes
to the program. The last number should be increased when making
bug-fixes only.

Take a look at the appendices for more information on what needs to be
done, what's not quite finished yet, and how to re-build the
program. Among other things, this document needs rewriting.

Do not send me complaints about bugs and errors, since I will have no
way of fixing them...

Now, that said, let's take a look at what this program tests.




2 Tests

HANDLE WITH CARE! DO NOT BLINDLY TRUST BENCHMARK VALUES. THEY ARE ONLY
GOOD IF YOU KNOW WHAT THEY ARE TESTING AND KNOW WHAT THEY ARE NOT
TESTING...

The values obtained here are not useful for comparing against values
obtained from other benchmark programs. Even though one of the tests,
for example, measures Linpack performance and yields a value in MFLOPS,
this value is not useful for comparison with values from a different
benchmark program. The only exception is the Dhrystone 2.1 value,
which might possibly be compared to values from other Dhrystone 2.1
benchmarks. As a rule: Only compare values with people running this
same benchmark program.

Almost all tests are adaptive in that they will first measure the
approximate speed of your computer so the test will take about 10-15
seconds in total, no matter how slow or fast your computer is.  The
ones that are not adaptive are the floating point tests and the
CPU integer tests with the exception of the dhrystone test.
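
The adaptive idea can be sketched in C as follows. This is only an
illustration, not the code used in SysBench: the function name
scale_iterations and its parameters are made up for the example. A
short trial run is timed, and the iteration count is then scaled so
that the real run lasts roughly the wanted number of seconds.

```c
#include <assert.h>

/* Hypothetical helper, not taken from the SysBench sources.
 * trial_iters iterations took trial_seconds; return how many
 * iterations should take about target_seconds. */
long scale_iterations(long trial_iters, double trial_seconds,
                      double target_seconds)
{
    assert(trial_iters > 0 && trial_seconds > 0.0 && target_seconds > 0.0);
    double scaled = (double)trial_iters * (target_seconds / trial_seconds);
    /* Always run at least one iteration, even on very slow machines. */
    return scaled < 1.0 ? 1L : (long)scaled;
}
```

For example, if 1000 iterations took half a second and the test
should run for 10 seconds, the helper asks for 20000 iterations.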


2.1 Graphic tests

These tests measure how fast the video hardware/display driver
combination can pump pixels to the screen. OS/2 has long had abysmal
display drivers for many cards; these tests are meant to sort out
whether the drivers really are good, bad or just stink.

Most window operations use only a few key operations of the
video card accelerator. Take a look at your windows, they're mostly
built from filled rectangles, with some text and vertical and
horizontal lines. Maybe a few bitmaps here and there (icons and such).

The PM-marks are calculated from the other values as a weighted
arithmetic mean-value.
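
A weighted arithmetic mean can be computed as in the sketch below.
The weights that SysBench actually uses are not listed in this
document, so the weights here are placeholders supplied by the
caller, and the function name weighted_mean is an assumption made
for the example. The other "-marks" totals in this document are
described the same way.

```c
#include <assert.h>
#include <stddef.h>

/* Weighted arithmetic mean: sum(v[i] * w[i]) / sum(w[i]).
 * The weights are supplied by the caller; the real SysBench
 * weights are not documented here. */
double weighted_mean(const double *values, const double *weights, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    double weight_sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += values[i] * weights[i];
        weight_sum += weights[i];
    }
    assert(weight_sum > 0.0);
    return sum / weight_sum;
}
```

With values 1, 2, 3 and weights 1, 1, 2 this gives (1 + 2 + 6) / 4,
that is 2.25.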


2.1.1 BitBlit S->S Copy

Tests the speed of the bitblit screen->screen copy operation. One of
the most important values, since it affects how fast you can scroll
text, and move large windows.


2.1.2 BitBlit M->S Copy

Tests the speed of the bitblit memory->screen copy operation. This
affects how fast updates of large bitmaps are and all operations that
copy data from RAM to Video RAM.


2.1.3 Filled rectangle, patterned filled rectangle.

Tests how fast the blitter can blank areas with a color or stipple
pattern. When updating a window, the background is usually blanked
with a single color or pattern before text or other things are drawn
on it.


2.1.4 Lines

Tests the speed of line-drawing in different directions. The
horizontal and vertical line drawing speed is important when drawing
frames around windows and such.


2.1.5 Text render

Extremely important function for speedy updates in text editors, shell
windows, word processors etc.



2.2 CPU Integer tests

The CPU tests are divided into two sections. This section tests
'integer' performance, meaning not only integer arithmetic but every
'normal' program that does some kind of data processing; the
floating-point tests follow in section 2.3. The vast majority of
applications do not use floating-point arithmetic. Those that do are
usually ray-tracers, scientific and engineering programs etc.

The CPU-int marks are calculated as a weighted mean of the other
tests.

2.2.1 Dhrystone VAX MIPS

When you read about how many MIPS a computer performs, the figure has
usually been obtained by running this Dhrystone test and scaling the
result relative to a VAX 11/780, which counts as 1 MIPS. That means
this test does not measure millions of machine instructions per
second (MIPS) as such, but rather a value relative to the base
reference of one VAX 11/780 MIPS.
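
As a small illustration of the scaling: the VAX 11/780 is
conventionally rated at 1757 Dhrystones per second, so the usual
conversion is a single division. The function name below is made up
for the example, and I have not checked that SysBench uses exactly
this constant.

```c
#include <assert.h>

/* Conventional Dhrystone rating of a VAX 11/780, which defines
 * one "VAX MIPS". */
#define VAX_11_780_DHRYSTONES_PER_SEC 1757.0

/* Convert a raw Dhrystones-per-second figure to VAX MIPS. */
double dhrystone_vax_mips(double dhrystones_per_sec)
{
    assert(dhrystones_per_sec >= 0.0);
    return dhrystones_per_sec / VAX_11_780_DHRYSTONES_PER_SEC;
}
```

Under this convention, the 39.8 VAX MIPS in Appendix C would
correspond to roughly 69,900 Dhrystones per second.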

This test uses very little memory, meaning it will measure CPU
performance only, not taking into account other vital parts such as
memory speed.

Here is an excerpt from the sources I got this program from:

 "Dhrystone is a short synthetic benchmark program intended to be
representative for system (integer) programming. Based on published
statistics on use of programming language features: see original
publication in CACM 27,10 (Oct 1984). Originally published in Ada, now
mostly used in C. Version 2 (in C) published in SIGPLAN Notices 23,8
(Aug 1988), together with measurement rules. Version 1 is no longer
recommended since state-of-the-art compilers can eliminate too much
'dead code' from the benchmark (However, quoted MIPS numbers are often
based on version 1).  Problems: Due to its small size (100 HLL
statements, 1-1.5 KB code), the memory system outside the cache is not
tested; compilers can too easily optimize for Dhrystone; string
operations are somewhat over-represented.  Recommendation: Use it for
controlled experiments only; don't blindly trust single Dhrystone MIPS
numbers quoted somewhere (don't do this for any benchmark)."

This test is based on the C-version of Dhrystone 2.1.

2.2.2 Hanoi

An integer program which solves the Towers of Hanoi puzzle using
recursive function calls.  It uses very little memory, and thus does
not test memory speed.
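
For readers who have not seen it, the recursion looks like the sketch
below. This is a generic textbook version that only counts the moves;
it is not the benchmark source, and the function name and peg
numbering are arbitrary.

```c
#include <assert.h>

/* Move n discs from peg `from` to peg `to`, using peg `via` as
 * scratch space. Returns the number of single-disc moves made. */
unsigned long hanoi(int n, int from, int to, int via)
{
    assert(n >= 0);
    if (n == 0)
        return 0;
    /* Park the n-1 smaller discs on the scratch peg... */
    unsigned long moves = hanoi(n - 1, from, via, to);
    /* ...move the largest disc to its destination... */
    moves += 1;
    /* ...and bring the smaller discs back on top of it. */
    moves += hanoi(n - 1, via, to, from);
    return moves;
}
```

Solving for n discs always takes 2^n - 1 moves, so 10 discs need 1023
moves. Almost all the work is function call overhead, which is why
the test says little about memory speed.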

2.2.3 Heapsort

Tests how fast your computer can sort a large array of random values
using the heapsort algorithm. Tests both CPU and memory speed.  The
MIPS are just a measurement against some arbitrary base MIPS
reference.  This test uses about 1 MB memory.
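
A minimal heapsort sketch is shown below to make clear what kind of
work the test does. It is a generic version, not the benchmark
source; the element type (long) and the function names are
assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Restore the max-heap property for the subtree rooted at `root`.
 * `end` is exclusive: valid indices are 0 .. end-1. */
static void sift_down(long *a, size_t root, size_t end)
{
    for (;;) {
        size_t child = 2 * root + 1;
        if (child >= end)
            return;
        if (child + 1 < end && a[child] < a[child + 1])
            child++;                 /* pick the larger child */
        if (a[root] >= a[child])
            return;
        long tmp = a[root];
        a[root] = a[child];
        a[child] = tmp;
        root = child;
    }
}

/* Sort a[0..n-1] in ascending order, in place. */
void heapsort_longs(long *a, size_t n)
{
    assert(a != NULL || n == 0);
    if (n < 2)
        return;
    /* Build a max-heap, starting from the last parent node. */
    for (size_t i = n / 2; i-- > 0; )
        sift_down(a, i, n);
    /* Repeatedly move the maximum to the end and shrink the heap. */
    for (size_t end = n - 1; end > 0; end--) {
        long tmp = a[0];
        a[0] = a[end];
        a[end] = tmp;
        sift_down(a, 0, end);
    }
}
```

The sift-down step jumps between parent and child positions that lie
further and further apart, so on a large array the accesses spread
over the whole array. That is why this test exercises memory as well
as the CPU.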

2.2.4 Sieve

Tests how fast your computer can find lots of prime numbers using the
sieve of Eratosthenes using arrays from 8 kB to 1.2 MB. The result is
a weighted mean value of the different speeds. Tests both CPU and
memory speed.
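
The sieve itself can be sketched as below. This is a plain textbook
version that counts primes up to a limit, using one byte per number;
the benchmark source may well be organised differently, and the
function name is made up for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Count the primes <= limit with the sieve of Eratosthenes.
 * Returns -1 if the flag array cannot be allocated. */
long count_primes(size_t limit)
{
    assert(limit < (size_t)-1);
    if (limit < 2)
        return 0;
    unsigned char *composite = calloc(limit + 1, 1);
    if (composite == NULL)
        return -1;
    long count = 0;
    for (size_t i = 2; i <= limit; i++) {
        if (composite[i])
            continue;
        count++;
        /* Cross out multiples of i starting at i*i; smaller
         * multiples were crossed out by smaller primes. The
         * division guards against overflow in i*i. */
        if (i <= limit / i)
            for (size_t j = i * i; j <= limit; j += i)
                composite[j] = 1;
    }
    free(composite);
    return count;
}
```

With this layout a limit of about 8000 needs an 8 kB array, which
fits in a small cache, while a limit of 1.2 million needs about
1.2 MB, which does not.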



2.3 CPU floating point tests

These tests measure how fast your computer is at floating point
arithmetic. (Floating point means non-integer numbers like 2.3,
0.24 etc.)

The CPUfloat-marks are calculated as a weighted mean of the
other values.

2.3.1 Linpack

This is the Linpack program (floating-point) converted to C.  Results
here are sensitive to cache effects and memory speed. This version
tests only the rolled double precision version.
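
For reference, a Linpack MFLOPS figure is normally derived from the
standard operation count for solving an n x n system, which is
2/3 n^3 + 2 n^2 floating-point operations. The sketch below shows
that conversion only. The matrix size used by SysBench is not stated
in this document, so n is a parameter here, and the function name is
made up for the example.

```c
#include <assert.h>

/* MFLOPS for solving an n x n linear system in `seconds`, using
 * the standard Linpack operation count 2/3 n^3 + 2 n^2. */
double linpack_mflops(int n, double seconds)
{
    assert(n > 0 && seconds > 0.0);
    double dn = (double)n;
    double ops = (2.0 * dn * dn * dn) / 3.0 + 2.0 * dn * dn;
    return ops / (seconds * 1.0e6);
}
```

For n = 100 the count is about 686,667 operations, so a solve that
takes 0.25 seconds would rate about 2.75 MFLOPS.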


2.3.2 Flops

Estimates MFLOPS rating for specific FADD, FSUB, FMUL, and FDIV
instruction mixes. Four distinct MFLOPS ratings are provided based on
the FDIV weightings from 25% to 0% and using register-register
operations. Works with both scalar and vector machines. Since the
program tries to maximize register usage, the results are NOT sensitive
to main memory speed. In this sense flops yields a peak rating. The
four different values are used to get a weighted mean.

2.3.3 The Fast Fourier Transform

This program performs FFTs using the Duhamel-Hollmann method, for
transform sizes from 32 to 262,144 points.



2.4 DIVE tests

DIVE means Direct Interface to video extensions. It is a library in
OS/2 that gives fast access to video routines used for programming
games or other very demanding graphic applications. It gives the games
programmer access to the Holy Grail - a pointer to the frame buffer.
The tests here are not incorporated into the benchmark since the DIVE
functionality will not actually appear until OS/2 3.0. I will describe
them, nonetheless.

The DIVE-marks are calculated as a weighted mean of the other values.

2.4.1 Video bus bandwidth

This test makes a copy of the frame buffer and copies it back to the
screen a lot of times in order to measure how many bytes per second
you can pump data to the video RAM. On my 486-66 machine with a
Diamond Viper card this amounts to about 13 MB/s! That means about 42
frames per second in 640x480x256...
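
The frame rate follows directly from the bandwidth and the size of
one frame. The sketch below redoes the arithmetic, assuming 1 MB
means 10^6 bytes and that 256 colours means one byte per pixel; the
function name is made up for the example.

```c
#include <assert.h>

/* Full-screen frames per second that a given video bus bandwidth
 * (in bytes per second) can sustain. */
double frames_per_second(double bytes_per_sec, int width, int height,
                         int bytes_per_pixel)
{
    assert(width > 0 && height > 0 && bytes_per_pixel > 0);
    double frame_bytes = (double)width * height * bytes_per_pixel;
    return bytes_per_sec / frame_bytes;
}
```

One 640x480 frame at one byte per pixel is 307,200 bytes, so 13 MB/s
gives 13,000,000 / 307,200, or a little over 42 frames per second.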

2.4.2 DIVE fun

This was an entry I added since I had a few ideas on fun screen hacks
you can do with DIVE. One of them is smoothly turning the screen
upside down and back again. The value obtained here will be highly
correlated with the Video Bus Bandwidth test.

2.4.3 Memory to screen copy with DIVE

DIVE has built-in routines for copying a large amount of data from RAM
or Video RAM to the display with the help of a hardware blitter (if
one is available), or software. There are three such tests. The first
test just blits an image to the screen, the second performs
pixel-doubling, effectively doubling the size of the display. The third
test tests arbitrary stretching of the bitmap when displaying it on
screen. If you have Warp II or OS/2 3.0 you will have seen the ability
to stretch a running video clip to any size you want. These tests are
not finished yet.

2.5 Disk IO tests

These tests were programmed by Kai Uwe Rommel, although I have made a
lot of changes to his source code. Thanks, Kai Uwe! The tests are
available as a free-standing package called diskio14.zip at
ftp.cdrom.com. If there are any errors or strange behaviour in these
tests then blame me, not Kai Uwe.

This test can be run on any of the fixed disks in your system. There
is a menu choice to select which disk to test.

The DiskIO-marks are calculated as a weighted mean of the
other values.

2.5.1 Average seek time

Tests the average seek time of the currently selected disk. I have
seen that this is often a bit higher than what the disk manufacturers
promise... This is most likely due to different ways of testing
things.

2.5.2 Disk transfer speed.

Measures how fast the disk can be read NOT using the cache. When I
first came across the diskio program by Kai Uwe, my disk performed at
about 1.0 MB/s. I thought that was not very good, but perhaps
acceptable. Then I started to muck around with the CMOS parameters, and
by changing the IO block read delay (I think that is what it was
called) the speed of the disk jumped from 1.0 to 1.5 MB/s! Not bad, I
thought. But when I upgraded to Warp II the disk performance suddenly
jumped to 2.2 MB/s. This is probably due to OS/2 using multiple-block
transfer mode. Then finally, I changed the AT bus speed from 8.3 MHz
to 11 MHz and the disk transfer speed jumped again, from 2.2 to
2.6 MB/s!

The lesson is that there seems to be a lot that can be done
about slow IO. Just be careful when you muck around with the CMOS
parameters, though, since there is a very high likelihood of making
mistakes that can make the machine unusable or prone to strange
errors. Usually this is not dangerous: just reset the value to the
old one and your machine should perform as before. Sometimes, though,
you _can_ destroy your computer by changing values incorrectly. Be
warned...



2.6 Memory speed tests

Memory speed seems to be a forgotten area when talking about the speed
of a computer. You hear a lot about CPU speed and disk speed and video
speed and such, but rarely about memory speed. This is wrong IMHO, since
a lot of the performance of a computer has to do with memory IO. When
PC Magazine measured memory speed in one of their big tests they
discovered a large difference between the good and bad performers. I
would like to bring this fact into focus: Memory IO speed is a vital
part of the performance of your computer, even more so with faster and
faster processors. A really fast RISC processor can execute as many as
40 instructions in the time of one memory read...

Of course, memory speed timing is a complex issue. How fast a memory
access is depends on:
  The pattern of the access     : Random, sequential, local, global?
  Cache                         : Primary and secondary cache size and type.
  Virtual memory                : Paging algorithm, disk IO performance.
  Motherboard memory controller : This is the key component to fast mem IO.
  Speed of SIMMs                : 60, 70 or 100 ns?

etc. etc.

These tests are also limited. They cannot tell the whole truth about
the speed of your memory IO.

The Mem-marks are calculated as a weighted mean of the other values.

2.6.1 Memory copy

This test first allocates a chunk of memory and then reads and writes
it back and forth a few times to "activate" the memory: initialize the
physical pages and read them into the caches. This is done to obtain
as stable a value as possible between measurements. It also has the
effect of maximizing the access speed.

Then it proceeds to copy the first half of the memory to the second
and then the second half to the first. This is to diminish the strange
effects you get from write-through and copy-back caches.  When it says
5 kB copy, that means copying 2.5 kB back and forth.
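
The back-and-forth copy can be sketched as follows. This is an
illustration using memcpy, not the benchmark source; the function
name and the way the copied bytes are counted are assumptions made
for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the first half of buf onto the second half, then the second
 * half back onto the first, `passes` times. `size` must be even.
 * Returns the total number of bytes copied. */
size_t copy_halves(unsigned char *buf, size_t size, unsigned passes)
{
    assert(buf != NULL && size % 2 == 0);
    size_t half = size / 2;
    size_t copied = 0;
    for (unsigned p = 0; p < passes; p++) {
        memcpy(buf + half, buf, half);   /* first half  -> second half */
        memcpy(buf, buf + half, half);   /* second half -> first half  */
        copied += 2 * half;
    }
    return copied;
}
```

Dividing the returned byte count by the elapsed time gives a figure
in bytes per second. The two halves never overlap, so memcpy is safe
here.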

You can clearly see the effects of your caches. As long as the access
is within the cache, it is a lot faster. There is also another factor
that will make the larger (80-160 kB) values jump up and down, and that
is the effect of virtual memory. The second level cache performs well
on a sequential memory range, but the virtual memory system chops
memory into 4 kB pages and may scatter them around in physical
memory. If you are lucky, the physical pages are sequential, but they
don't have to be. When they are not, the second level cache (which is
almost always a direct-mapped cache) has a larger probability of
mapping several physical pages to the same area, so they evict each
other. Set-associative caches (2-way, 4-way) should help here, but
that is not certain.
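
The collision can be shown with a little arithmetic. In a
direct-mapped cache, the cache line used for an address depends only
on the address modulo the cache size. The sketch below assumes a
256 kB cache (as in Appendix C) and a 16-byte line; the line size and
the function name are assumptions made for the example.

```c
#include <assert.h>

/* Index of the cache line that a physical address maps to in a
 * direct-mapped cache. Sizes are in bytes. */
unsigned long cache_line_index(unsigned long phys_addr,
                               unsigned long cache_size,
                               unsigned long line_size)
{
    assert(line_size > 0 && cache_size >= line_size);
    return (phys_addr % cache_size) / line_size;
}
```

Two physical pages whose addresses differ by a multiple of 256 kB,
for example 0x01000 and 0x41000, map to the same cache lines and
keep throwing each other out, even though the data copied is far
smaller than the cache.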

Again, CMOS settings can very much affect the speed of your memory
access. Be sure to use as low a value as possible for the various wait
state entries and make sure the whole memory is cached, not just the
first 16 MB if you have more.

2.6.2 Memory read

Tested by calculating the checksum over the specified number of bytes
over and over again.
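
A read loop of this kind might look like the sketch below. The
checksum here is a plain 32-bit sum, which is an assumption; the
benchmark may compute something else. The point is only that every
word has to be fetched from memory for the result to be correct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum all 32-bit words in mem[0..nwords-1], wrapping on overflow. */
uint32_t checksum_words(const uint32_t *mem, size_t nwords)
{
    assert(mem != NULL || nwords == 0);
    uint32_t sum = 0;
    for (size_t i = 0; i < nwords; i++)
        sum += mem[i];   /* forces a read of every word */
    return sum;
}
```

Using the returned sum (for instance, printing it) keeps the compiler
from optimizing the reads away.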

2.6.3 Memory write

Tested by writing a value into all longwords of the specified amount
of memory.
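
The write loop is the mirror image of the read loop. The sketch
below assumes that a longword is 32 bits; the function name and the
fill value are arbitrary choices for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store `value` into every 32-bit word of mem[0..nwords-1]. */
void fill_words(uint32_t *mem, size_t nwords, uint32_t value)
{
    assert(mem != NULL || nwords == 0);
    for (size_t i = 0; i < nwords; i++)
        mem[i] = value;
}
```

Note that an optimizing compiler may replace such a loop with a
library fill routine, which can change what is being measured.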



3 Copyright notice

There is no warranty. Use this software at your own risk. Due to the
complexity and variety of today's hardware and software which may be
used to run this program, I am not responsible for any damage or loss
of data caused by use of this software. It was tested and is expected
to work correctly, but nobody can actually guarantee this under all
circumstances. And because this software is free, you get what you pay
for...

This program can be used freely for non-commercial purposes.


4 Thanks

Thanks to Kai Uwe Rommel (rommel@ars.muc.de) for supplying the disk IO
benchmark code and to Al Aburto (aburto@marlin.nosc.mil) for supplying
the CPU integer and CPU float benchmark code.





                          -- Henrik Harmsen 


Email: harmsen@eritel.se





Appendix A - TODO

  1  Make the CPU integer and CPU float tests adaptive to the speed of the 
     computer.

  2  DIVE: Support for bank-switched cards. Better error handling. Finish the
     Memory->Screen bitblit tests.

  3  Graphics test: The Memory to screen bitblit copy is probably not
     correct for 16 and 24 bit displays.





Appendix B - Building

  You need IBM C Set++ 2.1. Change to the src directory and run nmake.
  It is probably quite easy to port to emx-gcc.

  Why are all the source code files named pmb_* ? Well, I first wanted
  to call it PMBench, as a play on WinBench, but it turned out that
  PC Magazine already had a PMBench program... So I changed the name
  to SysBench, but I did not have time to change all the 'pmb' to 'sysb'...





Appendix C - Example results


Example of a result file, when benchmarking my own system, which is:

Software:
--------------
  OS/2 2.11
  Diamond Viper display drivers 1.02beta running 1024x768x8

Hardware:
--------------
  CPU     : 486DX2-66
  Chipset : UMC
  Cache   : 8 kB level 1, 256 kB copy-back level 2.
  Memory  : 20 MB 70ns.
  Harddisk: disk 1: Seagate 340 MB. disk 2: Conner CFA540A 540 MB.
  Video   : Diamond Viper VLB, 2MB VRAM, 2.02 BIOS.

-------

Sysbench 0.9.0 result file created Sat Oct 22 14:31:27 1994


 Graphics
   BitBlt S->S cpy       :       52.640    Mpixels/s
   BitBlt M->S cpy       :       15.581    Mpixels/s
   Filled Rectangle      :      356.366    Mpixels/s
   Pattern Fill          :       90.477    Mpixels/s
   Vertical Lines        :        6.233    Mpixels/s
   Horizontal Lines      :        9.656    Mpixels/s
   Diagonal Lines        :        7.545    Mpixels/s
   Text Render           :       18.553    Mpixels/s
   ------------------------------------------------------------
   Total                 :       73.835    PM-marks

 CPU integer
   Dhrystone             :       39.800    VAX 11/780 MIPS
   Hanoi                 :       27.083    moves/25 usec
   Heapsort              :       19.290    MIPS
   Sieve                 :       37.741    MIPS
   ------------------------------------------------------------
   Total                 :       32.938    CPUint-marks

 CPU float
   Linpack               :        2.535    MFLOPS
   Flops                 :        3.572    MFLOPS
   Fast Fourier Tr.      :        4.291    VAX FFT's
   ------------------------------------------------------------
   Total                 :        3.472    CPUfloat-marks

 Direct Interface to video extensions - DIVE
   Video bus bandw.      :       --.---    MB/s (on Warp II, this was ca. 13 MB/s)
   DIVE fun              :       --.---    fps
   M->S, DD,   1.00:1    :       --.---    fps
   M->S, DD,   2.00:1    :       --.---    fps
   M->S, DD,   2.43:1    :       --.---    fps
   ------------------------------------------------------------
   Total                 :       --.---    DIVE-marks

 Disk I/O - disk 2: 528 MB
   Average seek time     :       16.852    ms
   Transfer speed        :        1.990    MB/s
   ------------------------------------------------------------
   Total                 :        1.465    DiskIO-marks

 Memory
   5    kB copy          :       61.561    MB/s
   10   kB copy          :       49.211    MB/s
   20   kB copy          :       33.167    MB/s
   40   kB copy          :       25.707    MB/s
   80   kB copy          :       25.571    MB/s
   160  kB copy          :       17.578    MB/s
   320  kB copy          :       15.526    MB/s
   640  kB copy          :       13.385    MB/s
   1280 kB copy          :       11.941    MB/s
   5    kB read          :       70.885    MB/s
   10   kB read          :       42.156    MB/s
   20   kB read          :       42.970    MB/s
   40   kB read          :       32.170    MB/s
   80   kB read          :       31.747    MB/s
   160  kB read          :       21.777    MB/s
   320  kB read          :       19.533    MB/s
   640  kB read          :       17.150    MB/s
   1280 kB read          :       15.710    MB/s
   5    kB write         :       50.263    MB/s
   10   kB write         :       47.512    MB/s
   20   kB write         :       49.802    MB/s
   40   kB write         :       50.763    MB/s
   80   kB write         :       48.561    MB/s
   160  kB write         :       47.028    MB/s
   320  kB write         :       44.140    MB/s
   640  kB write         :       44.034    MB/s
   1280 kB write         :       42.258    MB/s
   ------------------------------------------------------------
   Total                 :       28.007    Mem-marks





