The following program calculates the the best model and statistical coefficients for the following model:
H(Y) = A + B F(X) + C G(X)
Where X is the independent variable and Y is the dependent variable. In addition, H(), F(), and G() are transformation functions for the regression variables. The program also calculates the coefficient of determination R-Square.
The program performs different transformations on all the variables. These transformations include:
The program attempts to fit all the possible different curves. For data that have only positive values, the program succeeds in calculating the full set of possible models. The presence of negative values and zeros will reduce the number of models.
The program displays the following simple menu:
BEST MULTIPLE LINEAR REGRESSION (2 VARS) ======================================== 0) QUIT 1) KEYBOARD INPUT 2) FILE INPUT 3) FIND BEST FIT SELECT CHOICE BY NUMBER:
In option 1 the program prompts you to enter the number of observations and then type in the data for X, and Y.
In option 2, the program prompts you for the name of the input text file. This file (which has each value on a separate line) specifies the number of observations and then lists the observations for the variables X and Y.
Option 3 causes the program to calculate the best fit and performs the following tasks:
Here is a sample session that fits the data in the following table:
X | Y |
1 | 7 |
2 | 8 |
3 | 13 |
4 | 22 |
5 | 35 |
6 | 52 |
The above data can be read from a text file that looks like this:
6 1 7 2 8 3 13 4 22 5 35 6 52
The top ten models that fit the above data are:
R^2 = 1 Y = ( 10. ) + (-5 ) * X1 + ( 2 ) * X2^2 R^2 = .99996571 SQR(Y) = ( .93514887 ) + ( 1.7141375 ) * X1 + (-2.2358857 ) * LOG(X2) R^2 = .9999189 Y = ( 16.387089 ) + (-10.995891 ) * SQR(X1) + ( 1.7338791 ) * X2^2 R^2 = .99982379 LOG(Y) = (-8.1673155 ) + ( 4.4660914 ) * LOG(X1) + ( 10.112393 ) * 1/SQR(X2) R^2 = .99979627 SQR(Y) = ( 5.6943149 ) + ( 2.5478097 ) * X1 + (-5.6105579 ) * SQR(X2) R^2 = .99973459 LOG(Y) = (-1.0388005 ) + ( 2.0308004 ) * SQR(X1) + ( .95445884 ) * 1/X2^2 R^2 = .99969114 SQR(Y) = (-3.3140124 ) + ( 1.4420207 ) * X1 + ( 4.5327844 ) * 1/SQR(X2) R^2 = .99968287 Y = ( 5.6070019 ) + (-6.4580511 ) * LOG(X1) + ( 1.6027157 ) * X2^2 R^2 = .99959842 1/SQR(Y) = (-.25855689 ) + ( .99222751 ) * 1/SQR(X1) + (-.35571662 ) * 1/X2^2 R^2 = .99959659 LOG(Y) = (-1.8024133 ) + ( 2.8587806 ) * LOG(X1) + ( 3.7525629 ) * 1/X2
Here is the BASIC listing:
! PROGRAM TO FIND BEST MULTIPLE LINEARIZED REGRESSION FOR 2 VARIABLES ! ! F(Y) = A + B G(X) + C H(X) OPTION TYPO OPTION NOLET DECLARE NUMERIC MAX_CURVES DEClARE NUMERIC ITX1, ITX2, ITY, NDATA, CH, I, K, A, B, T1, T2, T3 DECLARE NUMERIC SumX1, SumX1Sqr, SumY, SumYSqr, SumX1Y, Yt, X1t, X2t DECLARE NUMERIC SumX2, SumX2Sqr, SumX1X2, SumX2Y, MeanX1, MeanX2, MeanY DECLARE STRING A$, R$, D$ DIM R2(512), SlopeX1(512), SlopeX2(512), Intercept(512), TX1(512), TY(512), TX2(512) DIM X(1), Y(1) SUB InitStatArrays LOCAL I FOR I = 1 to 512 R2(I) = 0 SlopeX1(I) = 0 SlopeX1(I) = 0 Intercept(I) = 0 TX1(I) = 0 TX2(I) = 0 TY(I) = 0 NEXT I END SUB SUB SortResults LOCAL I, J, BUFF FOR I = 1 TO MAX_CURVES - 1 FOR J = I+1 TO MAX_CURVES IF R2(I) < R2(J) THEN BUFF = R2(I) R2(I) = R2(J) R2(J) = BUFF BUFF = SlopeX1(I) SlopeX1(I) = SlopeX1(J) SlopeX1(J) = BUFF BUFF = SlopeX2(I) SlopeX2(I) = SlopeX2(J) SlopeX2(J) = BUFF BUFF = Intercept(I) Intercept(I) = Intercept(J) Intercept(J) = BUFF BUFF = TX1(I) TX1(I) = TX1(J) TX1(J) = BUFF BUFF = TY(I) TY(I) = TY(J) TY(J) = BUFF BUFF = TX2(I) TX2(I) = TX2(J) TX2(J) = BUFF END IF NEXT J NEXT I END SUB DEF SayTransf$(TI, V$) LOCAL B$ SELECT CASE TI CASE 1 B$ = V$ CASE 2 B$ = "LOG(" & V$ &")" CASE 3 B$ = "SQR(" & V$ & ")" CASE 4 B$ = "1/SQR(" & V$ & ")" CASE 5 B$ = "1/" & V$ CASE 6 B$ = V$ & "^2" CASE 7 B$ = "1/" & V$ & "^2" CASE 8 B$ = V$ & "^3" CASE ELSE B$ = V$ END SELECT SayTransf$ = B$ END DEF DO PRINT PRINT TAB(20);"BEST MULTIPLE LINEAR REGRESSION (2 VARS)" PRINT TAB(20);"========================================" PRINT "0) QUIT" PRINT "1) KEYBOARD INPUT" PRINT "2) FILE INPUT" PRINT "3) FIND BEST FIT" INPUT PROMPT "SELECT CHOICE BY NUMBER: ":CH IF CH=0 THEN PRINT "BYE!" ELSEIF CH=1 THEN A$ = "KEYBOARD" INPUT PROMPT "ENTER NUMBER OF OBSERVATIONS: ": NDATA MAT REDIM X(NDATA), Y(NDATA) FOR I = 1 TO NDATA PRINT "X(";I;")"; INPUT X(I) PRINT "Y(";I;")"; INPUT Y(I) NEXT I ELSEIF CH=2 THEN INPUT PROMPT "ENTER FILENAME? ":A$ WHEN ERROR IN OPEN #1: NAME A$, ORG TEXT, CREATE OLD, ACCESS INPUT INPUT #1: NDATA MAT REDIM X(NDATA), Y(NDATA) FOR I = 1 TO NDATA INPUT #1: X(I) INPUT #1: Y(I) NEXT I CLOSE #1 USE PRINT "COULD NOT OPEN OR READ FROM FILE ";A$ END WHEN ELSEIF CH=3 THEN CALL InitStatArrays K = 0 FOR ITX1 = 1 TO 7 FOR ITX2 = ITX1+1 TO 8 FOR ITY = 1 to 8 SumX1 = 0 SumX2 = 0 SumY = 0 SumX1Sqr = 0 SumX2Sqr = 0 SumYSqr = 0 SumX1Y = 0 SumX2Y = 0 SumX1X2 = 0 K = K + 1 TX1(K) = ITX1 TX2(K) = ITX2 TY(K) = ITY WHEN ERROR IN FOR I = 1 TO NDATA SELECT CASE ITX1 CASE 1 X1t = X(I) CASE 2 X1t = LOG(X(I)) CASE 3 X1t = SQR(X(I)) CASE 4 X1t = 1/SQR(X(I)) CASE 5 X1t = 1/X(I) CASE 6 X1t = X(I)^2 CASE 7 X1t = 1/X(I)^2 CASE 8 X1t = X(I)^3 CASE ELSE X1t = X(i) END SELECT SELECT CASE ITX2 CASE 1 X2t = X(I) CASE 2 X2t = LOG(X(I)) CASE 3 X2t = SQR(X(I)) CASE 4 X2t = 1/SQR(X(I)) CASE 5 X2t = 1/X(I) CASE 6 X2t = X(I)^2 CASE 7 X2t = 1/X(I)^2 CASE 8 X2t = X(I)^3 CASE ELSE X2t = X(i) END SELECT SELECT CASE ITY CASE 1 Yt = Y(I) CASE 2 Yt = LOG(Y(I)) CASE 3 Yt = SQR(Y(I)) CASE 4 Yt = 1/SQR(Y(I)) CASE 5 Yt = 1/Y(I) CASE 6 Yt = Y(I)^2 CASE 7 Yt = 1/Y(I)^2 CASE 8 Yt = Y(I)^3 CASE ELSE Yt = Y(I) END SELECT SumX1 = SumX1 + X1t SumX2 = SumX2 + X2t SumY = SumY + Yt SumX1Sqr = SumX1Sqr + X1t^2 SumX2Sqr = SumX2Sqr + X2t^2 SumYSqr = SumYSqr + Yt^2 SumX1Y = SumX1Y + X1t * Yt SumX2Y = SumX2Y + X2t * Yt SumX1X2 = SumX1X2 + X1t * X2t NEXT I MeanX1 = SumX1 / NDATA MeanX2 = SumX2 / NDATA MeanY = SumY / NDATA A = (NDATA * SumX1Sqr - SumX1 ^ 2) * (NDATA * SumX2Y - SumX2 * SumY) B = (NDATA * SumX1X2 - SumX1 * SumX2) * (NDATA * SumX1Y - SumX1 * SumY) T1 = NDATA * SumX1Sqr - SumX1 ^ 2 T2 = NDATA * SumX2Sqr - SumX2 ^ 2 T3 = NDATA * SumX1X2 - SumX1 * SumX2 SlopeX2(K) = (A - B) / (T1 * T2 - T3 ^ 2) T1 = NDATA * SumX1Y - SumX1 * SumY T2 = NDATA * SumX1X2 - SumX1 * SumX2 T3 = NDATA * SumX1Sqr - SumX1 ^ 2 SlopeX1(K) = (T1 - SlopeX2(K) * T2) / T3 Intercept(K) = MeanY - SlopeX1(K) * MeanX1 - SlopeX2(K) * MeanX2 T1 = SumY ^ 2 / NDATA R2(K) = (Intercept(K) * SumY + SlopeX1(K) * SumX1Y + SlopeX2(K) * SumX2Y - T1) / (SumYSqr - T1) USE SlopeX1(K) = 0 SlopeX2(K) = 0 Intercept(K) = 0 R2(K) = 0 END WHEN NEXT ITY NEXT ITX2 NEXT ITX1 MAX_CURVES = K CALL SortResults PRINT PRINT "TOP 10 CURVES" ! Show top 10 best cyrve fits FOR I = 1 TO 10 PRINT "R^2 = ";R2(I) PRINT SayTransf$(TY(I), "Y");" = (";Intercept(I);") + (";SlopeX1(I);") * "; SayTransf$(TX1(I), "X");" + (";SlopeX2(I);") * "; SayTransf$(TX2(I), "X") PRINT NEXT I I = POS(A$, ".") IF I > 0 THEN R$ = A$[1:I-1] & "_REPORT.TXT" ELSE R$ = A$ & "_REPORT.TXT" END IF OPEN #1: NAME R$, ORG TEXT, CREATE NEWOLD, ACCESS OUTIN ERASE #1 PRINT #1: "DATA SOURCE ";A$ D$ = DATE$ PRINT #1: D$[5:6] & "/" & D$[7:8] & "/" & D$[1:4] & " " & TIME$ PRINT #1: "" FOR I = 1 TO MAX_CURVES IF R2(I) <= 0 THEN EXIT FOR PRINT #1: "R^2 = ";R2(I) PRINT #1: SayTransf$(TY(I), "Y");" = (";Intercept(I);") + (";SlopeX1(I);") * "; SayTransf$(TX1(I), "X1");" + (";SlopeX2(I);") * "; SayTransf$(TX2(I), "X2") PRINT #1: "" NEXT I CLOSE #1 PRINT "EXPANDED LIST OR CURVE FITS WAS WRITTEN TO FILE ";R$ ELSE PRINT "INVALID CHOICE" END IF LOOP UNTIL CH = 0 END
Copyright (c) Namir Shammas. All rights reserved.