True BASIC Program to Calculate

Best Multiple Linear Regression for Two Variables

by Namir Shammas

The following program calculates the the best model and statistical coefficients for the following model:

H(Y) = A + B F(X) + C G(X)

Where X is the independent variable and Y is the dependent variable. In addition, H(), F(), and G() are transformation functions for the regression variables. The program also calculates the coefficient of determination R-Square.

The program performs different transformations on all the variables. These transformations include:

The program attempts to fit all the possible different curves. For data that have only positive values, the program succeeds in calculating the full set of possible models. The presence of negative values and zeros will reduce the number of models.

The program displays the following simple menu:

 

            BEST MULTIPLE LINEAR REGRESSION (2 VARS)
            ========================================
0) QUIT
1) KEYBOARD INPUT
2) FILE INPUT
3) FIND BEST FIT
SELECT CHOICE BY NUMBER: 

In option 1 the program prompts you to enter the number of observations and then type in the data for X, and Y.

In option 2, the program prompts you for the name of the input text file. This file (which has each value on a separate line) specifies the number of observations and then lists the observations for the variables X and Y.

Option 3 causes the program to calculate the best fit and performs the following tasks:

Here is a sample session that fits the data in the following table:

X Y
1 7
2 8
3 13
4 22
5 35
6 52

The above data can be read from a text file that looks like this:

 
6
1
7
2
8
3
13
4
22
5
35
6
52
 

The top ten models that fit the above data are:


R^2 =  1 
Y = ( 10. ) + (-5 ) * X1 + ( 2 ) * X2^2

R^2 =  .99996571 
SQR(Y) = ( .93514887 ) + ( 1.7141375 ) * X1 + (-2.2358857 ) * LOG(X2)

R^2 =  .9999189 
Y = ( 16.387089 ) + (-10.995891 ) * SQR(X1) + ( 1.7338791 ) * X2^2

R^2 =  .99982379 
LOG(Y) = (-8.1673155 ) + ( 4.4660914 ) * LOG(X1) + ( 10.112393 ) * 1/SQR(X2)

R^2 =  .99979627 
SQR(Y) = ( 5.6943149 ) + ( 2.5478097 ) * X1 + (-5.6105579 ) * SQR(X2)

R^2 =  .99973459 
LOG(Y) = (-1.0388005 ) + ( 2.0308004 ) * SQR(X1) + ( .95445884 ) * 1/X2^2

R^2 =  .99969114 
SQR(Y) = (-3.3140124 ) + ( 1.4420207 ) * X1 + ( 4.5327844 ) * 1/SQR(X2)

R^2 =  .99968287 
Y = ( 5.6070019 ) + (-6.4580511 ) * LOG(X1) + ( 1.6027157 ) * X2^2

R^2 =  .99959842 
1/SQR(Y) = (-.25855689 ) + ( .99222751 ) * 1/SQR(X1) + (-.35571662 ) * 1/X2^2

R^2 =  .99959659 
LOG(Y) = (-1.8024133 ) + ( 2.8587806 ) * LOG(X1) + ( 3.7525629 ) * 1/X2

Here is the BASIC listing:

! PROGRAM TO FIND BEST MULTIPLE LINEARIZED REGRESSION FOR 2 VARIABLES
!
! F(Y) = A + B G(X) + C H(X)

OPTION TYPO
OPTION NOLET

DECLARE NUMERIC MAX_CURVES
DEClARE NUMERIC ITX1, ITX2, ITY, NDATA, CH, I, K, A, B, T1, T2, T3
DECLARE NUMERIC SumX1, SumX1Sqr, SumY, SumYSqr, SumX1Y, Yt, X1t, X2t
DECLARE NUMERIC SumX2, SumX2Sqr, SumX1X2, SumX2Y, MeanX1, MeanX2, MeanY
DECLARE STRING A$, R$, D$

DIM R2(512), SlopeX1(512), SlopeX2(512), Intercept(512), TX1(512), TY(512), TX2(512)
DIM X(1), Y(1)


SUB InitStatArrays 
  LOCAL I
  
  FOR I = 1 to 512
    R2(I) = 0
    SlopeX1(I) = 0
    SlopeX1(I) = 0
    Intercept(I) = 0
    TX1(I) = 0
    TX2(I) = 0
    TY(I) = 0
  NEXT I
  
END SUB

SUB SortResults
  LOCAL I, J, BUFF
  
  FOR I = 1 TO MAX_CURVES - 1
    FOR J = I+1 TO MAX_CURVES
      IF R2(I) < R2(J) THEN
        BUFF = R2(I)
        R2(I) = R2(J)
        R2(J) = BUFF

        BUFF = SlopeX1(I)
        SlopeX1(I) = SlopeX1(J)
        SlopeX1(J) = BUFF

        BUFF = SlopeX2(I)
        SlopeX2(I) = SlopeX2(J)
        SlopeX2(J) = BUFF

        BUFF = Intercept(I)
        Intercept(I) = Intercept(J)
        Intercept(J) = BUFF

        BUFF = TX1(I)
        TX1(I) = TX1(J)
        TX1(J) = BUFF

        BUFF = TY(I)
        TY(I) = TY(J)
        TY(J) = BUFF

        BUFF = TX2(I)
        TX2(I) = TX2(J)
        TX2(J) = BUFF
      END IF
    NEXT J
  NEXT I  
  
END SUB

DEF SayTransf$(TI, V$)
  LOCAL B$

  SELECT CASE TI
    CASE 1
      B$ =  V$
    CASE 2
      B$ = "LOG(" & V$ &")"
    CASE 3
      B$ = "SQR(" & V$ & ")"
    CASE 4
      B$ = "1/SQR(" & V$ & ")"
    CASE 5
      B$ = "1/" & V$
    CASE 6
      B$ = V$ & "^2"
    CASE 7
      B$ = "1/" & V$ & "^2"
    CASE 8
      B$ =  V$ & "^3"
    CASE ELSE
      B$ = V$
  END SELECT  
  SayTransf$ = B$
END DEF

DO
  PRINT
  PRINT TAB(20);"BEST MULTIPLE LINEAR REGRESSION (2 VARS)"
  PRINT TAB(20);"========================================"
  PRINT "0) QUIT"
  PRINT "1) KEYBOARD INPUT"
  PRINT "2) FILE INPUT"
  PRINT "3) FIND BEST FIT"
  INPUT PROMPT "SELECT CHOICE BY NUMBER: ":CH
  
  IF CH=0 THEN
    PRINT "BYE!"

  ELSEIF CH=1 THEN
    A$ = "KEYBOARD"
    INPUT PROMPT "ENTER NUMBER OF OBSERVATIONS: ": NDATA
    MAT REDIM X(NDATA), Y(NDATA)
    FOR I = 1 TO NDATA
      PRINT "X(";I;")";
      INPUT X(I)
      PRINT "Y(";I;")";
      INPUT Y(I)
    NEXT I
    
  ELSEIF CH=2 THEN
    INPUT PROMPT "ENTER FILENAME? ":A$
    WHEN ERROR IN
      OPEN #1: NAME A$, ORG TEXT, CREATE OLD, ACCESS INPUT
      INPUT #1: NDATA
      MAT REDIM X(NDATA), Y(NDATA)
      FOR I = 1 TO NDATA
        INPUT #1: X(I)
        INPUT #1: Y(I)
      NEXT I
      CLOSE #1
    USE 
      PRINT "COULD NOT OPEN OR READ FROM FILE ";A$
    END WHEN
  
  ELSEIF CH=3 THEN
  
    CALL InitStatArrays
    K = 0
    
    FOR ITX1 = 1 TO 7     
		   
      FOR ITX2 = ITX1+1 TO 8

        FOR ITY = 1 to 8

          SumX1 = 0
          SumX2 = 0
          SumY = 0
          SumX1Sqr = 0
          SumX2Sqr = 0
          SumYSqr = 0
          SumX1Y = 0
          SumX2Y = 0
          SumX1X2 = 0  
 
          K = K + 1

          TX1(K) = ITX1
          TX2(K) = ITX2
          TY(K) = ITY
        
          WHEN ERROR IN

            FOR I = 1 TO NDATA
				  
              SELECT CASE ITX1
                CASE 1
                  X1t = X(I)
                CASE 2
                  X1t = LOG(X(I))
                CASE 3
                  X1t = SQR(X(I))
                CASE 4
                  X1t = 1/SQR(X(I))
                CASE 5
                  X1t = 1/X(I)
                CASE 6
                  X1t = X(I)^2
                CASE 7
                  X1t = 1/X(I)^2
                CASE 8
                  X1t = X(I)^3
                CASE ELSE
                  X1t = X(i)
              END SELECT  

              SELECT CASE ITX2
                CASE 1
                  X2t = X(I)
                CASE 2
                  X2t = LOG(X(I))
                CASE 3
                  X2t = SQR(X(I))
                CASE 4
                  X2t = 1/SQR(X(I))
                CASE 5
                  X2t = 1/X(I)
                CASE 6
                  X2t = X(I)^2
                CASE 7
                  X2t = 1/X(I)^2
                CASE 8
                  X2t = X(I)^3
                CASE ELSE
                  X2t = X(i)
               END SELECT  
						
               SELECT CASE ITY
                CASE 1
                  Yt = Y(I)
                CASE 2
                  Yt = LOG(Y(I))
                CASE 3
                  Yt = SQR(Y(I))
                CASE 4
                  Yt = 1/SQR(Y(I))
                CASE 5
                  Yt = 1/Y(I)
                CASE 6
                  Yt = Y(I)^2
                CASE 7
                  Yt = 1/Y(I)^2
                CASE 8
                  Yt = Y(I)^3
                CASE ELSE
                  Yt = Y(I)
               END SELECT  
						
               SumX1 = SumX1 + X1t 
               SumX2 = SumX2 + X2t 
               SumY = SumY + Yt
               SumX1Sqr = SumX1Sqr + X1t^2 
               SumX2Sqr = SumX2Sqr + X2t^2 
               SumYSqr = SumYSqr + Yt^2 
               SumX1Y = SumX1Y + X1t * Yt
               SumX2Y = SumX2Y + X2t * Yt
               SumX1X2 = SumX1X2 + X1t * X2t
         
          NEXT I
          
 
	      MeanX1 = SumX1 / NDATA
	      MeanX2 = SumX2 / NDATA
	      MeanY = SumY / NDATA
	      A = (NDATA * SumX1Sqr - SumX1 ^ 2) * (NDATA * SumX2Y - SumX2 * SumY)
	      B = (NDATA * SumX1X2 - SumX1 * SumX2) * (NDATA * SumX1Y - SumX1 * SumY)
	  
          T1 = NDATA * SumX1Sqr - SumX1 ^ 2
          T2 = NDATA * SumX2Sqr - SumX2 ^ 2
          T3 = NDATA * SumX1X2 - SumX1 * SumX2
	      SlopeX2(K) = (A - B) / (T1 * T2 - T3 ^ 2)

          T1 = NDATA * SumX1Y - SumX1 * SumY
          T2 = NDATA * SumX1X2 - SumX1 * SumX2
          T3 = NDATA * SumX1Sqr - SumX1 ^ 2
	      SlopeX1(K) = (T1 - SlopeX2(K) * T2) / T3
          Intercept(K) = MeanY - SlopeX1(K) * MeanX1 - SlopeX2(K) * MeanX2
          T1 = SumY ^ 2 / NDATA
         R2(K) = (Intercept(K) * SumY + SlopeX1(K) * SumX1Y + SlopeX2(K) * SumX2Y - T1) / (SumYSqr - T1)
	  
        USE
        
	      SlopeX1(K) = 0
	      SlopeX2(K) = 0
	      Intercept(K) = 0
	      R2(K) = 0
        
        END WHEN

        NEXT ITY
      NEXT ITX2	
    NEXT ITX1

    MAX_CURVES = K
    CALL SortResults

    PRINT
    PRINT "TOP 10 CURVES"    
    ! Show top 10 best cyrve fits
    FOR I = 1 TO 10
      PRINT "R^2 = ";R2(I)
      PRINT SayTransf$(TY(I), "Y");" = (";Intercept(I);") + (";SlopeX1(I);") * "; SayTransf$(TX1(I), "X");" + (";SlopeX2(I);") * "; SayTransf$(TX2(I), "X")
      PRINT
    NEXT I

    I = POS(A$, ".")
    IF I > 0 THEN
      R$ = A$[1:I-1] & "_REPORT.TXT"
    ELSE
      R$ = A$ & "_REPORT.TXT"
    END IF
    OPEN #1: NAME R$, ORG TEXT, CREATE NEWOLD, ACCESS OUTIN
    ERASE #1
    PRINT #1: "DATA SOURCE ";A$
    D$ = DATE$
    PRINT #1: D$[5:6] & "/" & D$[7:8] & "/" & D$[1:4] & " " & TIME$
    PRINT #1: ""
    FOR I = 1 TO MAX_CURVES
      IF R2(I) <= 0 THEN EXIT FOR
      PRINT #1: "R^2 = ";R2(I)
      PRINT #1: SayTransf$(TY(I), "Y");" = (";Intercept(I);") + (";SlopeX1(I);") * "; SayTransf$(TX1(I), "X1");" + (";SlopeX2(I);") * "; SayTransf$(TX2(I), "X2")
      PRINT #1: ""
    NEXT I
    CLOSE #1  
    PRINT "EXPANDED LIST OR CURVE FITS WAS WRITTEN TO FILE ";R$
  ELSE
    PRINT "INVALID CHOICE"
  END IF
LOOP UNTIL CH = 0

END

BACK

Copyright (c) Namir Shammas. All rights reserved.