True BASIC Program to Calculate

Best Linear Regression for Two Variables

by Namir Shammas

The following program finds the best fit, along with the associated regression statistics, for linearized models of the form:

H(Y) = A + B F(X)

where X is the independent variable, Y is the dependent variable, A is the intercept, and B is the slope. H() and F() are transformation functions applied to the regression variables. For each model, the program also calculates the coefficient of determination R-Square.
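As an illustration of the model above, here is a minimal Python sketch. Python is used only for illustration (the program itself is written in True BASIC), and the name `fit_line` is hypothetical. The sketch assumes the transformations H() and F() have already been applied, then runs an ordinary least-squares fit on the transformed values using the same running sums as the BASIC listing. The R-Square expression is algebraically equivalent to the one in the listing.

```python
import math

def fit_line(xt, yt):
    """Least-squares fit of yt = a + b * xt on already-transformed values.

    Returns (a, b, r2): intercept, slope, coefficient of determination.
    """
    n = len(xt)
    sum_x = sum(xt)
    sum_y = sum(yt)
    sum_x2 = sum(v * v for v in xt)
    sum_y2 = sum(v * v for v in yt)
    sum_xy = sum(u * v for u, v in zip(xt, yt))

    sxx = n * sum_x2 - sum_x ** 2      # n times the sum of squared X deviations
    syy = n * sum_y2 - sum_y ** 2      # n times the sum of squared Y deviations
    sxy = n * sum_xy - sum_x * sum_y   # n times the sum of cross deviations

    b = sxy / sxx
    a = sum_y / n - b * sum_x / n
    r2 = sxy ** 2 / (sxx * syy)
    return a, b, r2

# Example with made-up data that follows Y = 2 * EXP(0.5 * X),
# fitted with H(Y) = LOG(Y) and F(X) = X.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]
a, b, r2 = fit_line(xs, [math.log(y) for y in ys])
print(a, b, r2)  # a is close to LOG(2), b is close to 0.5, r2 is close to 1
```

Because the fit is linear in the transformed variables, the same function serves every combination of H() and F(); only the lists passed to it change.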

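The program performs different transformations on all the variables. These transformations are: the variable itself (no transformation), the natural logarithm LOG(X), the square root SQR(X), 1/SQR(X), the reciprocal 1/X, the square X^2, the reciprocal square 1/X^2, and the cube X^3. The same eight transformations are applied to Y.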

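The program attempts to fit a total of 64 different curves (8 transformations of X combined with 8 transformations of Y). For data that have only positive values, the program succeeds in calculating all 64 models. Negative values and zeros reduce the number of models, because the logarithm, square root, and reciprocal transformations are undefined for some of those values.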
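To see why zeros and negative values reduce the count, the following Python sketch tests which transformations are defined for every value of a variable. It is an illustration only: `TRANSFORMS` and `usable_transforms` are hypothetical names, and the eight transformations are the ones that appear in the BASIC listing further down.

```python
import math

# The eight transformations used by the program, keyed by the labels it prints.
TRANSFORMS = {
    "X": lambda v: v,
    "LOG(X)": lambda v: math.log(v),
    "SQR(X)": lambda v: math.sqrt(v),
    "1/SQR(X)": lambda v: 1 / math.sqrt(v),
    "1/X": lambda v: 1 / v,
    "X^2": lambda v: v ** 2,
    "1/X^2": lambda v: 1 / v ** 2,
    "X^3": lambda v: v ** 3,
}

def usable_transforms(values):
    """Return the labels of the transformations defined for every value."""
    usable = []
    for label, func in TRANSFORMS.items():
        try:
            for v in values:
                func(v)
        except (ValueError, ZeroDivisionError):
            continue  # undefined for at least one value: skip this transformation
        usable.append(label)
    return usable

positive = [100, 10, 25, 30, 35, 40]
with_zero = [0, 10, 25]
with_negative = [-5, 10, 25]

print(len(usable_transforms(positive)))       # 8
print(len(usable_transforms(with_zero)))      # 4
print(len(usable_transforms(with_negative)))  # 5
```

The number of models that can be fitted is at most the product of the two counts. For example, if X contains a zero and Y is strictly positive, at most 4 * 8 = 32 of the 64 models remain.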

The program displays the following simple menu:

 

            BEST LINEAR REGRESSION 
            =======================
0) QUIT
1) KEYBOARD INPUT
2) FILE INPUT
3) FIND BEST FIT
SELECT CHOICE BY NUMBER: 

In option 1, the program prompts you to enter the number of observations and then to type in the X and Y values of each observation.

In option 2, the program prompts you for the name of the input text file. This file has each value on a separate line: the first line gives the number of observations, and the remaining lines list the X value and then the Y value of each observation in turn.
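The file layout can be illustrated with a short Python sketch. This is an illustration only, not part of the program, and `parse_observations` is a hypothetical name. It assumes the count comes first and that X and Y values then alternate, which is the order in which the BASIC listing reads them.

```python
def parse_observations(text):
    """Parse 'count, then alternating X and Y values' into two lists."""
    tokens = text.split()  # one value per line; any whitespace is accepted
    n = int(tokens[0])
    values = [float(t) for t in tokens[1:1 + 2 * n]]
    if len(values) != 2 * n:
        raise ValueError(f"expected {2 * n} values, found {len(values)}")
    xs = values[0::2]  # 1st, 3rd, 5th, ... value after the count
    ys = values[1::2]  # 2nd, 4th, 6th, ... value after the count
    return xs, ys

sample = "3\n10\n50\n25\n77\n30\n86\n"
xs, ys = parse_observations(sample)
print(xs)  # [10.0, 25.0, 30.0]
print(ys)  # [50.0, 77.0, 86.0]
```

To parse a file on disk, pass its contents, for example `parse_observations(open(path).read())`, where `path` is whatever file name you use.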

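Option 3 causes the program to calculate the best fit. The program fits every combination of transformations, sorts the models in descending order of R-Square, displays the top five models on the screen, and writes the full list of successful models to a report file. The report file takes its name from the input file, with the extension replaced by _REPORT.TXT (for keyboard input the report is named KEYBOARD_REPORT.TXT).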

Here is a sample session that fits the data in the following table:

X Y
100 212
10 50
25 77
30 86
35 95
40 104

The above data can be read from a text file that looks like this:

 
6
100
212
10
50
25
77
30
86
35
95
40
104
 

The top ten models that fit the above data are:


R^2 =  1 
Y = ( 32 ) + ( 1.8 ) * X
MeanX =  40  MeanY =  104 
SdevX =  31.144823  SdevY =  56.060681 

R^2 =  .99935625 
1/SQR(Y) = ( .21511956 ) + (-3.1685186e-2 ) * LOG(X)
MeanX =  3.4620093  MeanY =  .10542515 
SdevX =  .74485911  SdevY =  .0236086 

R^2 =  .99841951 
Y^3 = ( 360453.48 ) + ( 9.1831683 ) * X^3
MeanX =  191750  MeanY =  2121326 
SdevX =  396560.22  SdevY =  3644560.5 

R^2 =  .99813172 
Y^3 = (-214475.78 ) + ( 969.88309 ) * X^2
MeanX =  2408.3333  MeanY =  2121326 
SdevX =  3754.2198  SdevY =  3644560.5 

R^2 =  .99785498 
Y^2 = ( 3378.0973 ) + ( 4.1758765 ) * X^2
MeanX =  2408.3333  MeanY =  13435 
SdevX =  3754.2198  SdevY =  15694. 

R^2 =  .99622634 
1/Y^2 = (-8.3885371e-6 ) + ( 4.1354627e-3 ) * 1/X
MeanX =  3.9484127e-2  MeanY =  1.548966e-4 
SdevX =  3.1300032e-2  SdevY =  1.2968504e-4 

R^2 =  .99622573 
LOG(Y) = ( 3.2928317 ) + ( .20925336 ) * SQR(X)
MeanX =  5.9800231  MeanY =  4.5441716 
SdevX =  2.2554798  SdevY =  .47285993 

R^2 =  .99555317 
SQR(Y) = ( 3.2993261 ) + ( 1.11005 ) * SQR(X)
MeanX =  5.9800231  MeanY =  9.9374506 
SdevX =  2.2554798  SdevY =  2.5092807 

R^2 =  .98888781 
1/Y = (-1.4552693e-3 ) + ( 6.9457301e-2 ) * 1/SQR(X)
MeanX =  .18765778  MeanY =  1.1578934e-2 
SdevX =  7.1571091e-2  SdevY =  4.9989872e-3 

R^2 =  .98791158 
SQR(Y) = ( 6.7342628 ) + ( 8.0079697e-2 ) * X
MeanX =  40  MeanY =  9.9374506 
SdevX =  31.144823  SdevY =  2.5092807 

Here is the BASIC listing:

! PROGRAM TO FIND BEST LINEARIZED REGRESSION

OPTION TYPO
OPTION NOLET

DECLARE NUMERIC MAX_CURVES
DECLARE NUMERIC ITX, ITY, NDATA, CH, I, K
DECLARE NUMERIC SumX, SumX2, SumY, SumY2, SumXY, Yt, Xt
DECLARE STRING A$, R$, D$

DIM R2(64), Slope(64), Intercept(64), MeanX(64), MeanY(64), SdevX(64), SdevY(64), TX(64), TY(64)
DIM X(1), Y(1)

MAX_CURVES = 64

SUB InitStatArrays 
  LOCAL I
  
  FOR I = 1 to MAX_CURVES
    R2(I) = 0
    Slope(I) = 0
    Intercept(I) = 0
    MeanX(I) = 0
    MeanY(I) = 0
    SdevX(I) = 0
    SdevY(I) = 0
    TX(I) = 0
    TY(I) = 0
  NEXT I
  
END SUB

SUB SortResults
  LOCAL I, J, BUFF
    
  FOR I = 1 TO MAX_CURVES - 1
    FOR J = I+1 TO MAX_CURVES
      IF R2(I) < R2(J) THEN
        BUFF = R2(I)
        R2(I) = R2(J)
        R2(J) = BUFF

        BUFF = Slope(I)
        Slope(I) = Slope(J)
        Slope(J) = BUFF

        BUFF = Intercept(I)
        Intercept(I) = Intercept(J)
        Intercept(J) = BUFF

        BUFF = MeanX(I)
        MeanX(I) = MeanX(J)
        MeanX(J) = BUFF

        BUFF = MeanY(I)
        MeanY(I) = MeanY(J)
        MeanY(J) = BUFF

        BUFF = SdevX(I)
        SdevX(I) = SdevX(J)
        SdevX(J) = BUFF

        BUFF = SdevY(I)
        SdevY(I) = SdevY(J)
        SdevY(J) = BUFF

        BUFF = TX(I)
        TX(I) = TX(J)
        TX(J) = BUFF

        BUFF = TY(I)
        TY(I) = TY(J)
        TY(J) = BUFF
      END IF
    NEXT J
  NEXT I  
  
END SUB

DEF SayTransf$(TI, V$)
  LOCAL B$

  SELECT CASE TI
    CASE 1
      B$ = V$
    CASE 2
      B$ = "LOG(" & V$ & ")"
    CASE 3
      B$ = "SQR(" & V$ & ")"
    CASE 4
      B$ = "1/SQR(" & V$ & ")"
    CASE 5
      B$ = "1/" & V$
    CASE 6
      B$ = V$ & "^2"
    CASE 7
      B$ = "1/" & V$ & "^2"
    CASE 8
      B$ = V$ & "^3"
    CASE ELSE
      B$ = V$
  END SELECT
  SayTransf$ = B$
END DEF

DO
  PRINT
  PRINT TAB(20);"BEST LINEAR REGRESSION"
  PRINT TAB(20);"======================"
  PRINT "0) QUIT"
  PRINT "1) KEYBOARD INPUT"
  PRINT "2) FILE INPUT"
  PRINT "3) FIND BEST FIT"
  INPUT PROMPT "SELECT CHOICE BY NUMBER:":CH
  
  IF CH=0 THEN
    PRINT "BYE!"

  ELSEIF CH=1 THEN
    A$ = "KEYBOARD"
    INPUT PROMPT "ENTER NUMBER OF OBSERVATIONS: ": NDATA
    MAT REDIM X(NDATA), Y(NDATA)
    FOR I = 1 TO NDATA
      PRINT "X(";I;")";
      INPUT X(I)
      PRINT "Y(";I;")";
      INPUT Y(I)
    NEXT I
    
  ELSEIF CH=2 THEN
    INPUT PROMPT "ENTER FILENAME? ":A$
    WHEN ERROR IN
      OPEN #1: NAME A$, ORG TEXT, CREATE OLD, ACCESS INPUT
      INPUT #1: NDATA
      MAT REDIM X(NDATA), Y(NDATA)
      FOR I = 1 TO NDATA
        INPUT #1: X(I)
        INPUT #1: Y(I)
      NEXT I
      CLOSE #1
    USE 
      PRINT "COULD NOT OPEN OR READ FROM FILE ";A$
    END WHEN
  
  ELSEIF CH=3 THEN
  
    CALL InitStatArrays
    K = 0
    
    FOR ITX = 1 TO 8     

      FOR ITY = 1 to 8

        SumX = 0
        SumY = 0
        SumX2 = 0
        SumY2 = 0
        SumXY = 0
 
        K = K + 1

        TX(K) = ITX
        TY(K) = ITY
        
        WHEN ERROR IN

          FOR I = 1 TO NDATA

            SELECT CASE ITX
              CASE 1
                Xt = X(I)
              CASE 2
                Xt = LOG(X(I))
              CASE 3
                Xt = SQR(X(I))
              CASE 4
                Xt = 1/SQR(X(I))
              CASE 5
                Xt = 1/X(I)
              CASE 6
                Xt = X(I)^2
              CASE 7
                Xt = 1/X(I)^2
              CASE 8
                Xt = X(I)^3
              CASE ELSE
                Xt = X(I)
            END SELECT  
						
            SELECT CASE ITY
              CASE 1
                Yt = Y(I)
              CASE 2
                Yt = LOG(Y(I))
              CASE 3
                Yt = SQR(Y(I))
              CASE 4
                Yt = 1/SQR(Y(I))
              CASE 5
                Yt = 1/Y(I)
              CASE 6
                Yt = Y(I)^2
              CASE 7
                Yt = 1/Y(I)^2
              CASE 8
                Yt = Y(I)^3
              CASE ELSE
                Yt = Y(I)
            END SELECT  
						
            SumX = SumX + Xt 
            SumX2 = SumX2 + Xt^2 
            SumY = SumY + Yt
            SumY2 = SumY2 + Yt^2 
            SumXY = SumXY + Xt * Yt						
         
          NEXT I
          
 
          MeanX(K) = SumX / NDATA
          MeanY(K) = SumY / NDATA
          SdevX(K) = SQR((SumX2 - SumX^2/NDATA)/(NDATA-1))
          SdevY(K) = SQR((SumY2 - SumY^2/NDATA)/(NDATA-1))
          Slope(K) = (NDATA * SumXY - SumX * SumY) / (NDATA * SumX2 - SumX ^ 2)
          Intercept(K) = MeanY(K) - Slope(K) * MeanX(K)
          R2(K) = ((NDATA * SumXY - SumX * SumY) / (NDATA * (NDATA - 1) * SdevX(K) * SdevY(K))) ^ 2

        USE
        
          MeanX(K) = 0
          MeanY(K) = 0
          SdevX(K) = 0
          SdevY(K) = 0
          Slope(K) = 0
          Intercept(K) = 0
          R2(K) = 0
        
        END WHEN

      NEXT ITY
    NEXT ITX

    CALL SortResults

    PRINT
    PRINT "TOP 5 CURVES"    
    ! Show top 5 best curve fits
    FOR I = 1 TO 5
      PRINT "R^2 = ";R2(I)
      PRINT SayTransf$(TY(I), "Y");" = (";Intercept(I);") + (";Slope(I);") * "; SayTransf$(TX(I), "X")
      PRINT "MeanX = "; MeanX(I);" MeanY = ";MeanY(I)
      PRINT "SdevX = "; SdevX(I);" SdevY = ";SdevY(I)
      PRINT
    NEXT I

    I = POS(A$, ".")
    IF I > 0 THEN
      R$ = A$[1:I-1] & "_REPORT.TXT"
    ELSE
      R$ = A$ & "_REPORT.TXT"
    END IF
    OPEN #1: NAME R$, ORG TEXT, CREATE NEWOLD, ACCESS OUTIN
    ERASE #1
    PRINT #1: "DATA SOURCE ";A$
    D$ = DATE$
    PRINT #1: D$[5:6] & "/" & D$[7:8] & "/" & D$[1:4] & " " & TIME$
    PRINT #1: ""
    FOR I = 1 TO MAX_CURVES
      IF R2(I) <= 0 THEN EXIT FOR
      PRINT #1: "R^2 = ";R2(I)
      PRINT #1: SayTransf$(TY(I), "Y");" = (";Intercept(I);") + (";Slope(I);") * "; SayTransf$(TX(I), "X")
      PRINT #1: "MeanX = "; MeanX(I);" MeanY = ";MeanY(I)
      PRINT #1: "SdevX = "; SdevX(I);" SdevY = ";SdevY(I)
      PRINT #1: ""
    NEXT I
    CLOSE #1  
    PRINT "FULL LIST OF CURVE FITS WAS WRITTEN TO FILE ";R$
  ELSE
    PRINT "INVALID CHOICE"
  END IF
LOOP UNTIL CH = 0

END


Copyright (c) Namir Shammas. All rights reserved.