The following program calculates the the best model and statistical coefficients for the following model:
H(Y) = A + B F(X1) + C G(X2)
Where X1, and X2 are independent variables and Y is the dependent variable. In addition, H(), F(), and G() are transformation functions for the regression variables. The program also calculates the coefficient of determination R-Square.
The program performs different transformations on all the variables. These transformations include:
The program attempts to fit a total of 512 different curves. For data that have only positive values, the program succeeds in calculating 512 different models. The presence of negative values and zeros will reduce the number of models.
The program displays the following simple menu:
BEST MULTIPLE LINEAR REGRESSION (3 VARS) ======================================== 0) QUIT 1) KEYBOARD INPUT 2) FILE INPUT 3) FIND BEST FIT SELECT CHOICE BY NUMBER:
In option 1 the program prompts you to enter the number of observations and then type in the data for X1, X2, and Y.
In option 2, the program prompts you for the name of the input text file. This file (which has each value on a separate line) specifies the number of observations and then lists the observations for the variables X1, X2, and Y.
Option 3 causes the program to calculate the best fit and performs the following tasks:
Here is a sample session that fits the data in the following table:
X | Y | Z |
1.5 | 0.7 | 2.1 |
0.45 | 2.3 | 4.0 |
1.8 | 1.6 | 4.1 |
2.8 | 4.5 | 9.4 |
2 | 3 | 6.37 |
1.5 | 0.7 | 2.23 |
The above data can be read from a text file that looks like this:
6 1.5 0.7 2.1 0.45 2.3 4.0 1.8 1.6 4.1 2.8 4.5 9.4 2 3 6.37 1.5 0.7 2.23
The top ten models that fit the above data are:
R^2 = .99986932 Y^3 = (-35.641994 ) + ( 20.911468 ) * X1^2 + ( 7.718001 ) * X2^3 R^2 = .99975937 Y^3 = (-60.252326 ) + ( 47.992676 ) * X1 + ( 8.3029893 ) * X2^3 R^2 = .99966833 Y^3 = (-105.61953 ) + ( 96.299552 ) * SQR(X1) + ( 8.5067121 ) * X2^3 R^2 = .99961153 Y^3 = (-20.961612 ) + ( 10.010019 ) * X1^3 + ( 6.9696742 ) * X2^3 R^2 = .9995716 Y^3 = (-5.7525922 ) + ( 46.239598 ) * LOG(X1) + ( 8.6577173 ) * X2^3 R^2 = .99947766 Y^3 = ( 83.136325 ) + (-85.114487 ) * 1/SQR(X1) + ( 8.7646497 ) * X2^3 R^2 = .99940971 Y = (-2.0382938 ) + ( .39551164 ) * X1^2 + ( 3.9245134 ) * SQR(X2) R^2 = .99939375 Y^3 = ( 39.287771 ) + (-37.648892 ) * 1/X1 + ( 8.8370305 ) * X2^3 R^2 = .99930196 Y^2 = (-7.1660368 ) + ( 7.1572968 ) * X1 + ( 3.7341173 ) * X2^2 R^2 = .99927175 Y^3 = ( 20.841072 ) + (-13.308537 ) * 1/X1^2 + ( 8.9133897 ) * X2^3
Here is the BASIC listing:
! PROGRAM TO FIND BEST MULTIPLE LINEARIZED REGRESSION FOR 3 VARIABLES ! ! F(Y) = A + B G(X1) + C H(X2) OPTION TYPO OPTION NOLET DECLARE NUMERIC MAX_CURVES DEClARE NUMERIC ITX1, ITX2, ITY, NDATA, CH, I, K, A, B, T1, T2, T3 DECLARE NUMERIC SumX1, SumX1Sqr, SumY, SumYSqr, SumX1Y, Yt, X1t, X2t DECLARE NUMERIC SumX2, SumX2Sqr, SumX1X2, SumX2Y, MeanX1, MeanX2, MeanY DECLARE STRING A$, R$, D$ DIM R2(512), SlopeX1(512), SlopeX2(512), Intercept(512), TX1(512), TY(512), TX2(512) DIM X1(1), X2(1), Y(1) MAX_CURVES = 512 SUB InitStatArrays LOCAL I FOR I = 1 to 512 R2(I) = 0 SlopeX1(I) = 0 SlopeX1(I) = 0 Intercept(I) = 0 TX1(I) = 0 TX2(I) = 0 TY(I) = 0 NEXT I END SUB SUB SortResults LOCAL I, J, BUFF FOR I = 1 TO MAX_CURVES - 1 FOR J = I+1 TO MAX_CURVES IF R2(I) < R2(J) THEN BUFF = R2(I) R2(I) = R2(J) R2(J) = BUFF BUFF = SlopeX1(I) SlopeX1(I) = SlopeX1(J) SlopeX1(J) = BUFF BUFF = SlopeX2(I) SlopeX2(I) = SlopeX2(J) SlopeX2(J) = BUFF BUFF = Intercept(I) Intercept(I) = Intercept(J) Intercept(J) = BUFF BUFF = TX1(I) TX1(I) = TX1(J) TX1(J) = BUFF BUFF = TY(I) TY(I) = TY(J) TY(J) = BUFF BUFF = TX2(I) TX2(I) = TX2(J) TX2(J) = BUFF END IF NEXT J NEXT I END SUB DEF SayTransf$(TI, V$) LOCAL B$ SELECT CASE TI CASE 1 B$ = V$ CASE 2 B$ = "LOG(" & V$ &")" CASE 3 B$ = "SQR(" & V$ & ")" CASE 4 B$ = "1/SQR(" & V$ & ")" CASE 5 B$ = "1/" & V$ CASE 6 B$ = V$ & "^2" CASE 7 B$ = "1/" & V$ & "^2" CASE 8 B$ = V$ & "^3" CASE ELSE B$ = V$ END SELECT SayTransf$ = B$ END DEF DO PRINT PRINT TAB(20);"BEST MULTIPLE LINEAR REGRESSION (3 VARS)" PRINT TAB(20);"========================================" PRINT "0) QUIT" PRINT "1) KEYBOARD INPUT" PRINT "2) FILE INPUT" PRINT "3) FIND BEST FIT" INPUT PROMPT "SELECT CHOICE BY NUMBER: ":CH IF CH=0 THEN PRINT "BYE!" ELSEIF CH=1 THEN A$ = "KEYBOARD" INPUT PROMPT "ENTER NUMBER OF OBSERVATIONS: ": NDATA MAT REDIM X1(NDATA), Y(NDATA), X2(NDATA) FOR I = 1 TO NDATA PRINT "X1(";I;")"; INPUT X1(I) PRINT "X2(";I;")"; INPUT X2(I) PRINT "Y(";I;")"; INPUT Y(I) NEXT I ELSEIF CH=2 THEN INPUT PROMPT "ENTER FILENAME? ":A$ WHEN ERROR IN OPEN #1: NAME A$, ORG TEXT, CREATE OLD, ACCESS INPUT INPUT #1: NDATA MAT REDIM X1(NDATA), X2(NDATA), Y(NDATA) FOR I = 1 TO NDATA INPUT #1: X1(I) INPUT #1: X2(I) INPUT #1: Y(I) NEXT I CLOSE #1 USE PRINT "COULD NOT OPEN OR READ FROM FILE ";A$ END WHEN ELSEIF CH=3 THEN CALL InitStatArrays K = 0 FOR ITX1 = 1 TO 8 FOR ITX2 = 1 TO 8 FOR ITY = 1 to 8 SumX1 = 0 SumX2 = 0 SumY = 0 SumX1Sqr = 0 SumX2Sqr = 0 SumYSqr = 0 SumX1Y = 0 SumX2Y = 0 SumX1X2 = 0 K = K + 1 TX1(K) = ITX1 TX2(K) = ITX2 TY(K) = ITY WHEN ERROR IN FOR I = 1 TO NDATA SELECT CASE ITX1 CASE 1 X1t = X1(I) CASE 2 X1t = LOG(X1(I)) CASE 3 X1t = SQR(X1(I)) CASE 4 X1t = 1/SQR(X1(I)) CASE 5 X1t = 1/X1(I) CASE 6 X1t = X1(I)^2 CASE 7 X1t = 1/X1(I)^2 CASE 8 X1t = X1(I)^3 CASE ELSE X1t = X1(i) END SELECT SELECT CASE ITX2 CASE 1 X2t = X2(I) CASE 2 X2t = LOG(X2(I)) CASE 3 X2t = SQR(X2(I)) CASE 4 X2t = 1/SQR(X2(I)) CASE 5 X2t = 1/X2(I) CASE 6 X2t = X2(I)^2 CASE 7 X2t = 1/X2(I)^2 CASE 8 X2t = X2(I)^3 CASE ELSE X2t = X2(i) END SELECT SELECT CASE ITY CASE 1 Yt = Y(I) CASE 2 Yt = LOG(Y(I)) CASE 3 Yt = SQR(Y(I)) CASE 4 Yt = 1/SQR(Y(I)) CASE 5 Yt = 1/Y(I) CASE 6 Yt = Y(I)^2 CASE 7 Yt = 1/Y(I)^2 CASE 8 Yt = Y(I)^3 CASE ELSE Yt = Y(I) END SELECT SumX1 = SumX1 + X1t SumX2 = SumX2 + X2t SumY = SumY + Yt SumX1Sqr = SumX1Sqr + X1t^2 SumX2Sqr = SumX2Sqr + X2t^2 SumYSqr = SumYSqr + Yt^2 SumX1Y = SumX1Y + X1t * Yt SumX2Y = SumX2Y + X2t * Yt SumX1X2 = SumX1X2 + X1t * X2t NEXT I MeanX1 = SumX1 / NDATA MeanX2 = SumX2 / NDATA MeanY = SumY / NDATA A = (NDATA * SumX1Sqr - SumX1 ^ 2) * (NDATA * SumX2Y - SumX2 * SumY) B = (NDATA * SumX1X2 - SumX1 * SumX2) * (NDATA * SumX1Y - SumX1 * SumY) T1 = NDATA * SumX1Sqr - SumX1 ^ 2 T2 = NDATA * SumX2Sqr - SumX2 ^ 2 T3 = NDATA * SumX1X2 - SumX1 * SumX2 SlopeX2(K) = (A - B) / (T1 * T2 - T3 ^ 2) T1 = NDATA * SumX1Y - SumX1 * SumY T2 = NDATA * SumX1X2 - SumX1 * SumX2 T3 = NDATA * SumX1Sqr - SumX1 ^ 2 SlopeX1(K) = (T1 - SlopeX2(K) * T2) / T3 Intercept(K) = MeanY - SlopeX1(K) * MeanX1 - SlopeX2(K) * MeanX2 T1 = SumY ^ 2 / NDATA R2(K) = (Intercept(K) * SumY + SlopeX1(K) * SumX1Y + SlopeX2(K) * SumX2Y - T1) / (SumYSqr - T1) USE SlopeX1(K) = 0 SlopeX2(K) = 0 Intercept(K) = 0 R2(K) = 0 END WHEN NEXT ITY NEXT ITX2 NEXT ITX1 CALL SortResults PRINT PRINT "TOP 10 CURVES" ! Show top 10 best cyrve fits FOR I = 1 TO 10 PRINT "R^2 = ";R2(I) PRINT SayTransf$(TY(I), "Y");" = (";Intercept(I);") + (";SlopeX1(I);") * "; SayTransf$(TX1(I), "X1");" + (";SlopeX2(I);") * "; SayTransf$(TX2(I), "X2") PRINT NEXT I I = POS(A$, ".") IF I > 0 THEN R$ = A$[1:I-1] & "_REPORT.TXT" ELSE R$ = A$ & "_REPORT.TXT" END IF OPEN #1: NAME R$, ORG TEXT, CREATE NEWOLD, ACCESS OUTIN ERASE #1 PRINT #1: "DATA SOURCE ";A$ D$ = DATE$ PRINT #1: D$[5:6] & "/" & D$[7:8] & "/" & D$[1:4] & " " & TIME$ PRINT #1: "" FOR I = 1 TO MAX_CURVES IF R2(I) <= 0 THEN EXIT FOR PRINT #1: "R^2 = ";R2(I) PRINT #1: SayTransf$(TY(I), "Y");" = (";Intercept(I);") + (";SlopeX1(I);") * "; SayTransf$(TX1(I), "X1");" + (";SlopeX2(I);") * "; SayTransf$(TX2(I), "X2") PRINT #1: "" NEXT I CLOSE #1 PRINT "EXPANDED LIST OR CURVE FITS WAS WRITTEN TO FILE ";R$ ELSE PRINT "INVALID CHOICE" END IF LOOP UNTIL CH = 0 END
Copyright (c) Namir Shammas. All rights reserved.