                        
                      ==================================
                                   6x86opt
                      Cyrix/IBM 6x86 processor optimizer
                      ==================================
                            v0.74 Mikael Johansson
    
    *WHAT THIS PROGRAM DOES*
    ------------------------
    This program optimizes the Cyrix/IBM 6x86 (M1) processor.
    
    *HOW IT DOES IT*
    ----------------
    By setting/unsetting the appropriate bits on the CPU. 
    The following bits are set:

    bit     | reg:bit | why
    ------------------------------------------------------------------------
    NO_LOCK | CCR1:4  | "With NO_LOCK set, previously noncacheable locked
            |         |  cycles are executed as unlocked cycles and,
            |         |  therefore, may be cached. This results in higher 
            |         |  CPU performance."
            |         | The bit is not set by default because software that
            |         | requires locked cycles might exist. I have never had
            |         | any problems with this. 
    ------------------------------------------------------------------------
    WL      | RCR7:2  | This one is related to the one above. It enables 
            |         | weak locking for the memory region specified by  
            |         | ARR7, i.e. all memory. I am not sure if this is
            |         | necessary, as my benchmark results vary. It does no
            |         | harm however.
    ------------------------------------------------------------------------
    DTE_EN  | CCR4:4  | "DTE_EN allows Directory Table Entries (DTE) to be
            |         |  cached on the 6x86 microprocessor. This provides a
            |         |  performance improvement for some applications that
            |         |  access and modify the page table frequently."
    ------------------------------------------------------------------------
    WT_ALLOC| CCR5:0  | "Write Allocate (WT_ALLOC) allows L1 cache write 
            |         |  misses to cause a cache line allocation. This
            |         |  feature improves the L1 cache hit rate resulting
            |         |  in higher performance especially for Windows
            |         |  applications."
    ------------------------------------------------------------------------
    SUSP_HLT| CCR2:3  | If this bit is set, the HLT instruction causes the 
            |         | CPU to enter low power suspend mode. This works OK
            |         | at least under Linux; I don't know about DOS. But 
            |         | anything that might result in the 6x86 running 
            |         | cooler is worth trying. It does not affect
            |         | performance.
    ------------------------------------------------------------------------
        All quotations above are taken from the 'IBM 6x86 Microprocessor 
        BIOS Writers Guide', Document #40205
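
    As a rough illustration of the table above, the bit positions can be
    written down as data and applied to a register image. This is only a
    sketch in Python, not the code of 6x86opt: the dict `regs` merely models
    register values in memory, and the names SET_BITS and apply_set_bits are
    made up for this example. How the registers are actually read and
    written on the CPU is not shown.

```python
# Bits that 6x86opt sets, taken from the table above: name -> (register, bit).
SET_BITS = {
    "NO_LOCK":  ("CCR1", 4),
    "WL":       ("RCR7", 2),
    "DTE_EN":   ("CCR4", 4),
    "WT_ALLOC": ("CCR5", 0),
    "SUSP_HLT": ("CCR2", 3),
}

def apply_set_bits(regs):
    """Return a copy of regs (register name -> value) with the bits set."""
    result = dict(regs)
    for register, bit in SET_BITS.values():
        result[register] = result.get(register, 0) | (1 << bit)
    return result

# Example with all registers starting at zero (hypothetical values).
before = {"CCR1": 0, "CCR2": 0, "CCR4": 0, "CCR5": 0, "RCR7": 0}
after = apply_set_bits(before)
```

    OR-ing a mask in leaves every other bit of the register untouched.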
    
    The following bits are unset:

    bit | reg:bit | why 
    ------------------------------------------------------------------------
    CD  | CR0:30  | When 'Cache Disable' is set, the cache is disabled(!) 
        |         | Naturally the cache should be enabled. I do not believe 
        |         | that this bit is ever set, but you never know.
    ------------------------------------------------------------------------
    NW  | CR0:29  | When 'No Write Back' is set, the L1 cache operates in 
        |         | Write Through mode. By unsetting the bit the cache 
        |         | strategy will be set to Write Back.
    ------------------------------------------------------------------------
        The above bits are not affected if the -nowb parameter is defined
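
    The effect on CR0 can be sketched as plain bit arithmetic. The function
    name enable_cache_writeback and the `nowb` argument are hypothetical and
    only model the -nowb parameter; the sketch works on a number and does
    not touch the real CR0.

```python
CR0_CD = 1 << 30  # Cache Disable
CR0_NW = 1 << 29  # No Write Back

def enable_cache_writeback(cr0, nowb=False):
    """Return the CR0 value with CD and NW cleared, unless nowb is given."""
    if nowb:
        return cr0  # -nowb: leave both bits exactly as they are
    return cr0 & ~(CR0_CD | CR0_NW) & 0xFFFFFFFF

# Hypothetical CR0 value with both CD and NW set.
example = enable_cache_writeback(0x60000011)
```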

    The following bits are also modified:    

    bit(s) | reg:bit | what and why
    ------------------------------------------------------------------------
    MAPEN  | CCR3:7-4| Set to 0001 during execution to get access to all
           |         | register indexes. Set to 0000 after optimization.
    ------------------------------------------------------------------------
    LOCK_NW| CCR2:2  | Unset so that the NW bit in CR0 can be modified. 
           |         | After optimization its value is restored.
    ------------------------------------------------------------------------
    ARREN  | CCR5:5  | Unset during execution so that the RCR7 can be 
           |         | programmed. Restored after optimization.
    ------------------------------------------------------------------------
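
    The "change during execution, put back afterwards" pattern of this table
    could look roughly as follows. This is a sketch on a dict that stands in
    for the registers; with_mapen, bit_cleared and the start values are
    invented for the example and are not taken from 6x86opt.

```python
from contextlib import contextmanager

MAPEN_MASK = 0xF0  # CCR3 bits 7-4

def with_mapen(ccr3, value):
    """Return ccr3 with the MAPEN field (bits 7-4) replaced by value."""
    return (ccr3 & ~MAPEN_MASK & 0xFF) | ((value & 0x0F) << 4)

@contextmanager
def bit_cleared(regs, register, bit):
    """Clear one bit inside the block, then put the old bit value back."""
    saved = regs[register] & (1 << bit)
    regs[register] &= ~(1 << bit)
    try:
        yield regs
    finally:
        regs[register] = (regs[register] & ~(1 << bit)) | saved

regs = {"CCR2": 0x04, "CCR3": 0x00, "CCR5": 0x20}  # hypothetical values
regs["CCR3"] = with_mapen(regs["CCR3"], 0b0001)    # open all indexes
with bit_cleared(regs, "CCR2", 2), bit_cleared(regs, "CCR5", 5):
    during = dict(regs)  # LOCK_NW and ARREN are both 0 at this point
regs["CCR3"] = with_mapen(regs["CCR3"], 0b0000)    # close them again
```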
        
        6x86opt enables the Branch Target Buffer (BTB), configures it to 
    store target addresses for both near and far Change-Of-Flow instructions
    (COFs), and enables its stack. After this the BTB is flushed. See below.
    
    The following undocumented bits are modified:
    
    bit(s)  | reg:bit | what and comment
    ------------------------------------------------------------------------
    undocum.| DBR0:1  | Will be set if the -x parameter is defined.
    ------------------------------------------------------------------------
    undocum.| DBR0:5  | Will be set.
    ------------------------------------------------------------------------
    undocum.| DBR0:6  | Set to 1 to get access to Test Registers below TR3.
            |         | Unset after optimization. 
    ------------------------------------------------------------------------
    undocum.| TR2i5:  | The 4 lowest bits of Test Register 2 index 5 are
    (BTB-   |  3-0    | unset. This enables the BTB fully. Unsetting bit 1
    control)|         | allows both near and far COF storage.
    ------------------------------------------------------------------------
    undocum.| TR2i4:  | These bits in index 4 of TR2 will be set. 
    (BTB-ct)|  10-8   |
    ------------------------------------------------------------------------
    undocum.| TR1:1-0 | These two bits are set just to be unset again.
    (BTB-ct)|         |
    ------------------------------------------------------------------------
        If the -defbtb parameter is defined, no BTB related registers are 
        affected. Only bits that I know exactly what they do are explained.
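
    In terms of bit masks, the permanent changes in this table amount to
    something like the sketch below. The function names are hypothetical,
    the values are plain numbers and not real register accesses, and the
    temporary changes (DBR0:6, TR1:1-0) are left out.

```python
def tr2_index5_value(value):
    """Clear the 4 lowest bits of TR2 index 5 (enables the BTB fully)."""
    return value & ~0x0F

def tr2_index4_value(value):
    """Set bits 10-8 of TR2 index 4."""
    return value | (0b111 << 8)

def dbr0_value(value, x=False):
    """Set bit 5 of DBR0, and bit 1 as well if -x was given."""
    value |= 1 << 5
    if x:
        value |= 1 << 1
    return value
```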

    When the -linbuf or -manbuf parameters are defined, 6x86opt tries to set
    up an ARR and RCR for the Linear Frame Buffer, allowing Write Gathering
    for this memory area. See the topic Linear Frame Buffer.
    
    
    *USING THE PROGRAM*
    -------------------
    6x86opt can be invoked without any parameters for default optimization.
    You can also define eight different command line parameters:
        -nowb   (-n) ; Do not enable caching and WriteBack. Note that this 
                       parameter doesn't disable these, just leaves them be.
        -defbtb (-d) ; Do not change the BTB configuration.
        -x           ; Set bit 1 in DBR0. See the topic Reduced Performance.
        -linbuf (-l) ; Searches for a Linear Frame Buffer and tries to define
                       an ARR/RCR for it allowing Write Gathering.
        -manbuf (-m) ; Same as above, but the LFB address and size must be 
                       given like -manbuf:ADDR,SIZE. ADDR and SIZE in MB.
        -verbose(-v) ; "Missä mennään" (Finnish: "what is going on"). Shows
                       what the program does during execution.
        -peek   (-p) ; Does not change anything, just shows the bit states,
                       and the LFB and Video Memory size.
        -force  (-f) ; Forces execution of the optimization even if 6x86opt
                       does not detect a 6x86 CPU. If for example the VIPERM
                       bit (CCR5:6) is set the identification will fail.
                       Can cause undefined behaviour.
        The parameters in parentheses are abbreviations for the parameters.
    If an unknown parameter is defined on the command line, 6x86opt will show
    some info about itself.
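
    The relation between the long parameters and their abbreviations can be
    sketched as a lookup table. This only illustrates the list above; the
    names ABBREVIATIONS, KNOWN and normalize are invented, and how 6x86opt
    itself parses its command line is not known. The sketch does not check
    the values that follow a colon.

```python
# Short forms listed above, mapped to their long names.
ABBREVIATIONS = {
    "-n": "-nowb", "-d": "-defbtb", "-l": "-linbuf", "-m": "-manbuf",
    "-v": "-verbose", "-p": "-peek", "-f": "-force",
}
KNOWN = set(ABBREVIATIONS.values()) | {"-x"}

def normalize(arg):
    """Return the long form of a parameter, or None if it is unknown."""
    name, sep, rest = arg.partition(":")  # -manbuf carries ':ADDR,SIZE'
    name = ABBREVIATIONS.get(name, name)
    if name not in KNOWN:
        return None
    return name + sep + rest
```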
        The processor is restored to default state on system reset. So it is
    probably a good idea to invoke 6x86opt from your AUTOEXEC.BAT file, or
    your Startup Folder.


    *EXIT CODES*
    ------------
    The following exit codes can be generated:
        0 : 6x86opt (at least thinks it) encountered no problems 
        1 : 6x86opt was invoked with some unknown parameter, or issued a 
                    warning of some kind
        2 : 6x86opt did not detect a 6x86 CPU (and -force is not defined)
    Any other codes _should_ never be generated.
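
    A script that starts 6x86opt could turn the exit code into a message
    along these lines. The sketch only restates the list above; EXIT_CODES
    and describe_exit_code are hypothetical names.

```python
EXIT_CODES = {
    0: "no problems encountered (as far as 6x86opt can tell)",
    1: "unknown parameter given, or a warning was issued",
    2: "no 6x86 CPU detected and -force was not given",
}

def describe_exit_code(code):
    """Return a readable description of a 6x86opt exit code."""
    return EXIT_CODES.get(code, "unexpected exit code %d" % code)
```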
    
    
    *THE LINEAR FRAME BUFFER AND THE -linbuf AND -manbuf PARAMETERS*
    ----------------------------------------------------------------
    As the Linear Frame Buffer defined by the VESA 2.0 standard is located
    outside the area of physical memory, the 6x86 by default handles all
    memory accesses to this area with the poorest performance. To speed up
    the LFB, an ARR and a corresponding RCR can be set up for it, allowing
    Write Gathering. When WG is enabled, multiple writes to sequential 
    addresses are gathered and issued in one write cycle. The buffer is 64
    bits wide, so for example 4 word writes are written in one cycle instead 
    of four. Of course only applications that use the LFB, like Quake, get 
    a performance increase from this.
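
    The arithmetic behind the "one cycle instead of four" example can be
    shown with a toy model. It assumes that writes are gathered only while
    each one starts where the previous one ended and the total still fits in
    the 64-bit buffer; alignment and all other rules of the real CPU are
    ignored, and the addresses are made up.

```python
BUFFER_BYTES = 8  # 64-bit gather buffer, as described above

def count_write_cycles(writes, gathering):
    """Count write cycles for a list of (address, size_in_bytes) writes."""
    if not gathering:
        return len(writes)
    cycles = 0
    next_addr = None  # address that would continue the current run
    used = 0          # bytes already collected in the buffer
    for addr, size in writes:
        if addr == next_addr and used + size <= BUFFER_BYTES:
            used += size  # sequential and it still fits: gather it
        else:
            cycles += 1   # start a new buffer, i.e. a new write cycle
            used = size
        next_addr = addr + size
    return cycles

# Four 16-bit (word) writes to consecutive addresses.
four_words = [(0xE0000000 + 2 * i, 2) for i in range(4)]
without_wg = count_write_cycles(four_words, gathering=False)
with_wg = count_write_cycles(four_words, gathering=True)
```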
        When the -linbuf parameter is defined, 6x86opt will automatically 
    search for the LFB and the size of your Video Card memory, which also
    is the size of the LFB. It will then search the ARRs to see if an ARR 
    is already set at this address. If not, it will try to find an empty
    (zero-size) ARR. Once it has decided which ARR to use, it programs it.
    The RCR will be set with the bits WG and RCD (Region Cache Disable).
        If 6x86opt has problems setting up an ARR, a warning message will be 
    shown. The optimization process will continue, but the exit code will be
    1.
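
    The search order described above can be sketched as follows. The ARRs
    are modelled as a list of (base, size) pairs; choose_arr and the example
    values are hypothetical, and which ARRs 6x86opt really considers is not
    stated here. A result of None stands for the warning case.

```python
def choose_arr(arrs, lfb_base):
    """Return the index of the ARR to program, or None if there is none."""
    # First choice: an ARR that is already set at the LFB address.
    for index, (base, size) in enumerate(arrs):
        if size != 0 and base == lfb_base:
            return index
    # Second choice: an empty (zero-size) ARR.
    for index, (base, size) in enumerate(arrs):
        if size == 0:
            return index
    return None  # nothing usable: warn, continue, exit code 1

# Hypothetical ARR contents as (base address, size in bytes).
arrs = [(0x000A0000, 0x20000), (0, 0), (0xE0000000, 0x200000)]
```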
        If 6x86opt fails to detect your LFB or Video Memory correctly, or if
    you for example want to define a region before loading a VESA 2.0 driver,
    you can manually define it with the parameter -manbuf. The format is:
    -manbuf:ADDR,SIZE where ADDR and SIZE must be given in megabytes. If your
    Video Card has less than 1 MB of memory, you still must give this as a 
    minimum size. This should not cause any problems. If your LFB is located
    at 3584 MB and you have a video card with 2 MB of RAM, the -manbuf
    declaration would be: -manbuf:3584,2 (or -m:3584,2).
        If you get a warning message complaining that the Address is not a
    multiple of the Block Size, you should reconfigure the LFB Address so 
    that it is. This is a requirement of the 6x86 CPU.
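
    As a sketch of the -manbuf format and of the "multiple of the Block
    Size" rule: the functions below convert the megabyte values to bytes and
    test the address. The names parse_manbuf and address_is_aligned are
    invented, and the sketch assumes decimal numbers as in the example
    above; the checks 6x86opt itself performs may differ.

```python
MB = 1024 * 1024

def parse_manbuf(arg):
    """Parse '-manbuf:ADDR,SIZE' (or '-m:...') into (addr, size) in bytes."""
    name, sep, values = arg.partition(":")
    if name not in ("-manbuf", "-m") or not sep:
        raise ValueError("not a -manbuf parameter: " + arg)
    addr_text, comma, size_text = values.partition(",")
    if not comma:
        raise ValueError("expected ADDR,SIZE in " + arg)
    addr_mb, size_mb = int(addr_text), int(size_text)
    if addr_mb < 0 or size_mb < 1:
        raise ValueError("SIZE must be at least 1 MB: " + arg)
    return addr_mb * MB, size_mb * MB

def address_is_aligned(addr, size):
    """True if the address is a multiple of the block size."""
    return addr % size == 0

addr, size = parse_manbuf("-manbuf:3584,2")
```

    For the example from the text, 3584 MB corresponds to the address
    0xE0000000, which is a multiple of the 2 MB block size.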
        If -linbuf fails in detecting the environment correctly, tell me 
    about it!
        
        NOTE 1: If you have several -manbuf parameters on the command line,
    only the _last_ will be processed. 
        NOTE 2: -linbuf is executed before -manbuf, so if -manbuf defines the
    same address, but different size, the -manbuf definition will prevail.
        NOTE 3: The ADDR and SIZE definitions of -manbuf are checked only
    as far as needed to keep 6x86opt from crashing (it should not:). Be 
    careful to define the right values for these.
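
    NOTE 1 can be pictured with a few lines of code. The sketch simply keeps
    the last -manbuf parameter it sees; effective_manbuf is a made-up name
    and the argument list is an example.

```python
def effective_manbuf(argv):
    """Return the -manbuf parameter that counts, i.e. the last one given."""
    chosen = None
    for arg in argv:
        if arg.startswith(("-manbuf:", "-m:")):
            chosen = arg  # a later definition replaces an earlier one
    return chosen

picked = effective_manbuf(["-l", "-m:3584,2", "-v", "-manbuf:3072,4"])
```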
    
    
    *REDUCED PERFORMANCE AND THE -x PARAMETER*
    ------------------------------------------
    If running 6x86opt gives a decrease in performance you should give the
    -x command line parameter to 6x86opt when executing it. When defined,
    bit 1 of DBR0 will be set, and a performance increase should result.
        Some systems need to have this parameter defined. Also other systems
    can benefit from setting this bit. But as the documentation for the bit 
    (along with other whole registers) is only available to specific OEM 
    partners of Cyrix and IBM, I do not (yet) know exactly what it does. 
    Therefore it is not included in the default optimization process.
        Feedback of experiences with this is very welcome!
        
    
    *WINBENCH PROCESSOR SCORES*
    ---------------------------
    The processor suites in WinBench give different results every test run, 
    and the second run is usually poorer than the first. To get reliable 
    results, run these (and other) testsuites right after booting the system
    with either unoptimized or optimized configuration. You will find that
    the processor tests are not affected noticeably, as these tests 
    apparently have no use for the optimized settings. Graphics scores should
    increase. So, the performance increase is dependent on the application. 
        The performance increase is also dependent on how the BIOS sets the
    bits on boot. If the BIOS "sets'em all" itself there can of course be no
    improvement. I have not heard of any such systems. BIOSes that do not
    support the 6x86 very well will show the highest improvement. In no case
    should 6x86opt decrease overall performance (see previous topic).
    
    
    *WHY THIS PROGRAM EXISTS*
    -------------------------
    Because I could find no good optimizer for the processor anywhere. The
    only one I found was IBM's M1OPT. That program, however, changes every
    bit in the CCRs and everywhere else according to some model machine
    of their own. For example, all Power Management features are disabled
    even if they were set before. M1OPT also does not set NO_LOCK.
    
       
    *PUBLIC DOMAIN* 
    ---------------
    Everything in this package is Public Domain. The only restriction is that
    all files must be included when distributing the package, and that they 
    are not modified. The following files are part of the package:

        6X86OPT.EXE 19568 bytes  6x86opt, the optimizer
        6X86OPT.TXT 17191 bytes  this textfile
       
    
    (I will of course not object if you for some reason feel urged to send me 
     a postcard, money or other nice things:)
    
    *CONTACTING THE AUTHOR*
    -----------------------
    If you encounter any bugs in the program, have some suggestions, or 
    especially if you know of any other optimization tricks for the 6x86, I 
    would like to receive mail from you. All info about the undocumented
    registers is greatly appreciated (performance affecting or not)!
        
        e-mail:
            mpjohans@kumpu.helsinki.fi
            or:
            Mikael.Johansson@helsinki.fi
        
        For everyone who would like to contact me using the traditional
    postal services: 
            Mikael Johansson
            Kitarakuja 3C 220
            00420 Helsinki
            FINLAND
    
    
    *VERSION HISTORY*
    -----------------
        v0.64 : First released version. (5.11.1996)

        v0.64b: Some comments added to the document. (22.11.96)

        v0.72 : Command line parameters added.
                The program now unsets the CD and NW bits in CR0.
                The document was clarified. (3.12.1996)

        v0.73 : Added the -x and -verbose parameters.
                Some undocumented bits are now modified.
                Better code. (18.12.1996)

        v0.74 : Added the -linbuf, -manbuf and -peek parameters.
                
    
    *TO BE DONE*
    ------------
    Windows NT support.
    
    Anything else that sounds good.
    
    
    *SPECIAL THANKS*
    ----------------
    Rich "Doc" Colley. For testing that helped deliver the -x parameter.
    
    Everyone who has sent me mail that helped improve 6x86opt.

    
    *THE ULTIMATE HEAVY METAL BAND*
    -------------------------------
    Mercyful Fate

    
    *THINGS*
    --------
    Cyrix and IBM have copyright on what they have.
