The following program finds the best model, along with its statistical coefficients, of the form:
H(Y) = A + B F(X)
where X is the independent variable, Y is the dependent variable, A is the intercept, B is the slope, and H() and F() are transformation functions applied to the regression variables. The program also calculates the coefficient of determination, R-square.
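For a single pair of transformations, A and B are ordinary least-squares estimates computed from running sums of the transformed values. As an illustration only, here is a Python translation of the formulas used in the BASIC listing. The function name `fit_transformed` and its keyword parameters are hypothetical and are not part of the original program.

```python
import math

def fit_transformed(xs, ys, f=lambda x: x, h=lambda y: y):
    """Fit H(Y) = A + B*F(X) by least squares using the listing's running sums.

    Returns (A, B, r2). Raises ValueError or ZeroDivisionError when a
    transformation is undefined for some data value (e.g. LOG of zero).
    """
    n = len(xs)
    xt = [f(x) for x in xs]          # transformed X values, F(X)
    yt = [h(y) for y in ys]          # transformed Y values, H(Y)
    sx, sy = sum(xt), sum(yt)
    sx2 = sum(v * v for v in xt)
    sy2 = sum(v * v for v in yt)
    sxy = sum(a * b for a, b in zip(xt, yt))
    sdev_x = math.sqrt((sx2 - sx * sx / n) / (n - 1))
    sdev_y = math.sqrt((sy2 - sy * sy / n) / (n - 1))
    slope = (n * sxy - sx * sy) / (n * sx2 - sx * sx)
    intercept = sy / n - slope * sx / n   # MeanY - Slope * MeanX
    r2 = ((n * sxy - sx * sy) / (n * (n - 1) * sdev_x * sdev_y)) ** 2
    return intercept, slope, r2

xs = [100, 10, 25, 30, 35, 40]
ys = [212, 50, 77, 86, 95, 104]
print(fit_transformed(xs, ys))  # untransformed linear model
print(fit_transformed(xs, ys, f=math.log, h=lambda y: 1 / math.sqrt(y)))
```

The default arguments make both transformations the identity, so the first call is plain linear regression.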
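The program applies the same eight transformations to both variables: none (the variable itself), LOG (the natural logarithm), SQR (the square root), 1/SQR (the reciprocal square root), the reciprocal (1/V), the square (V^2), the reciprocal square (1/V^2), and the cube (V^3).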
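Because each of the eight transformations of X is paired with each of the eight transformations of Y, the program attempts to fit 8 x 8 = 64 different curves. When all the data values are positive, the program calculates all 64 models. Zeros and negative values make some transformations undefined (for example, LOG of zero or SQR of a negative number), and the program skips the models that use them.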
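The 8 x 8 enumeration and the domain failures can be sketched in Python as follows. This is a hypothetical helper, not part of the original program: `TRANSFORMS`, `defined_for`, and `usable_models` are illustrative names, and the sketch checks only whether each transformation is defined for every data value. The BASIC listing also discards a model when its fit fails for another reason, such as a transformed variable with zero variance.

```python
import math

# The eight transformations from the listing, written for a generic variable V.
# LOG is the natural logarithm, as in True BASIC.
TRANSFORMS = {
    "V": lambda v: v,
    "LOG(V)": math.log,
    "SQR(V)": math.sqrt,
    "1/SQR(V)": lambda v: 1 / math.sqrt(v),
    "1/V": lambda v: 1 / v,
    "V^2": lambda v: v ** 2,
    "1/V^2": lambda v: 1 / v ** 2,
    "V^3": lambda v: v ** 3,
}

def defined_for(func, values):
    """True if func can be applied to every value without a math error."""
    try:
        for v in values:
            func(v)
    except (ValueError, ZeroDivisionError):  # e.g. LOG(0), SQR(-1), 1/0
        return False
    return True

def usable_models(xs, ys):
    """List the (F, H) transformation pairs that are defined for the data."""
    fx = [name for name, fn in TRANSFORMS.items() if defined_for(fn, xs)]
    hy = [name for name, fn in TRANSFORMS.items() if defined_for(fn, ys)]
    return [(f, h) for f in fx for h in hy]

print(len(usable_models([100, 10, 25, 30, 35, 40], [212, 50, 77, 86, 95, 104])))
print(len(usable_models([0, 1, 2, 3, 4], [1, 3, 5, 7, 10])))
```

A single zero in X rules out LOG, 1/SQR, 1/V, and 1/V^2 for X, so only four X transformations remain and the second call finds 4 x 8 = 32 usable models.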
The program displays the following simple menu:
BEST LINEAR REGRESSION
======================
0) QUIT
1) KEYBOARD INPUT
2) FILE INPUT
3) FIND BEST FIT
SELECT CHOICE BY NUMBER:
In option 1, the program prompts you for the number of observations and then for each X and Y value in turn.
In option 2, the program prompts you for the name of an input text file that holds one value per line. The first value is the number of observations; the remaining values are the X and Y values of each observation in turn (X1, Y1, X2, Y2, and so on).
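A reader for this file format might look like the following Python sketch. The name `read_observations` is hypothetical. Unlike the BASIC listing, which reads one value per `INPUT #1` statement, this version splits on any whitespace, so it also accepts several values on one line.

```python
def read_observations(text):
    """Parse a count followed by X1, Y1, X2, Y2, ... and return (xs, ys)."""
    values = [float(tok) for tok in text.split()]
    n = int(values[0])
    if len(values) < 1 + 2 * n:
        raise ValueError("file holds fewer values than the stated count")
    xs = values[1:1 + 2 * n:2]   # positions 1, 3, 5, ...
    ys = values[2:2 + 2 * n:2]   # positions 2, 4, 6, ...
    return xs, ys

with_count = "3\n1\n2\n3\n4\n5\n6\n"
print(read_observations(with_count))
```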
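Option 3 calculates the best fit. The program fits all 64 transformed models, sorts them in descending order of R-square, displays the five best models on the screen, and writes every model with a positive R-square to a report file. The report file name is the data source name, up to its first period, followed by _REPORT.TXT (KEYBOARD_REPORT.TXT for keyboard input). The report begins with the data source name and the current date and time.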
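The ranking and report-naming steps of option 3 are sketched below in Python. The helper names are hypothetical. `report_name` mirrors the listing's use of POS to find the first period, and `rank_models` sorts by R-square in descending order. Python's sort is stable, whereas the listing's exchange sort may order models with equal R-square differently.

```python
def report_name(source):
    """Text before the first '.', followed by _REPORT.TXT (as in the listing)."""
    dot = source.find(".")
    stem = source[:dot] if dot >= 0 else source
    return stem + "_REPORT.TXT"

def rank_models(models, top=5):
    """models: list of (r2, label) pairs.

    Returns (screen, report): the `top` best models for display, and every
    model with a positive R-square for the report file.
    """
    ranked = sorted(models, key=lambda m: m[0], reverse=True)
    report = [m for m in ranked if m[0] > 0]
    return ranked[:top], report

print(report_name("DATA.TXT"), report_name("KEYBOARD"))
print(rank_models([(0.5, "a"), (0.0, "b"), (0.9, "c")], top=2))
```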
Here is a sample session that fits the data in the following table:
X | Y |
100 | 212 |
10 | 50 |
25 | 77 |
30 | 86 |
35 | 95 |
40 | 104 |
The above data can be read from a text file that looks like this:
6
100
212
10
50
25
77
30
86
35
95
40
104
The ten best-fitting models for the above data, in decreasing order of R^2, are:
R^2 = 1
Y = ( 32 ) + ( 1.8 ) * X
MeanX = 40 MeanY = 104
SdevX = 31.144823 SdevY = 56.060681

R^2 = .99935625
1/SQR(Y) = ( .21511956 ) + (-3.1685186e-2 ) * LOG(X)
MeanX = 3.4620093 MeanY = .10542515
SdevX = .74485911 SdevY = .0236086

R^2 = .99841951
Y^3 = ( 360453.48 ) + ( 9.1831683 ) * X^3
MeanX = 191750 MeanY = 2121326
SdevX = 396560.22 SdevY = 3644560.5

R^2 = .99813172
Y^3 = (-214475.78 ) + ( 969.88309 ) * X^2
MeanX = 2408.3333 MeanY = 2121326
SdevX = 3754.2198 SdevY = 3644560.5

R^2 = .99785498
Y^2 = ( 3378.0973 ) + ( 4.1758765 ) * X^2
MeanX = 2408.3333 MeanY = 13435
SdevX = 3754.2198 SdevY = 15694.

R^2 = .99622634
1/Y^2 = (-8.3885371e-6 ) + ( 4.1354627e-3 ) * 1/X
MeanX = 3.9484127e-2 MeanY = 1.548966e-4
SdevX = 3.1300032e-2 SdevY = 1.2968504e-4

R^2 = .99622573
LOG(Y) = ( 3.2928317 ) + ( .20925336 ) * SQR(X)
MeanX = 5.9800231 MeanY = 4.5441716
SdevX = 2.2554798 SdevY = .47285993

R^2 = .99555317
SQR(Y) = ( 3.2993261 ) + ( 1.11005 ) * SQR(X)
MeanX = 5.9800231 MeanY = 9.9374506
SdevX = 2.2554798 SdevY = 2.5092807

R^2 = .98888781
1/Y = (-1.4552693e-3 ) + ( 6.9457301e-2 ) * 1/SQR(X)
MeanX = .18765778 MeanY = 1.1578934e-2
SdevX = 7.1571091e-2 SdevY = 4.9989872e-3

R^2 = .98791158
SQR(Y) = ( 6.7342628 ) + ( 8.0079697e-2 ) * X
MeanX = 40 MeanY = 9.9374506
SdevX = 31.144823 SdevY = 2.5092807
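Because each model is fitted to transformed variables, using it to predict Y requires inverting H(). As a hedged example, the Python sketch below back-transforms the second model in the output, 1/SQR(Y) = A + B * LOG(X), into Y = 1/(A + B * LOG(X))^2 using the printed coefficients. The name `predict_y` is hypothetical.

```python
import math

# Printed coefficients of the model 1/SQR(Y) = A + B * LOG(X)
A = 0.21511956
B = -3.1685186e-2

def predict_y(x):
    """Solve 1/SQR(Y) = A + B*LOG(X) for Y; only valid while A + B*LOG(X) > 0."""
    g = A + B * math.log(x)
    if g <= 0:
        raise ValueError("model undefined: A + B*LOG(X) must be positive")
    return 1.0 / g ** 2

for x in (10, 40, 100):
    print(x, round(predict_y(x), 1))
```

With these coefficients, A + B * LOG(X) reaches zero at roughly X = 888, so the back-transformed model breaks down well beyond the range of the data. Note also that R^2 measures the fit between the transformed variables, not between Y and its back-transformed predictions.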
Here is the BASIC listing:
! PROGRAM TO FIND BEST LINEARIZED REGRESSION
OPTION TYPO
OPTION NOLET
DECLARE NUMERIC MAX_CURVES
DECLARE NUMERIC ITX, ITY, NDATA, CH, I, K
DECLARE NUMERIC SumX, SumX2, SumY, SumY2, SumXY, Yt, Xt
DECLARE STRING A$, R$, D$
DIM R2(64), Slope(64), Intercept(64), MeanX(64), MeanY(64), SdevX(64), SdevY(64), TX(64), TY(64)
DIM X(1), Y(1)

MAX_CURVES = 64

! Reset the statistics of all candidate curves
SUB InitStatArrays
    LOCAL I
    FOR I = 1 to MAX_CURVES
        R2(I) = 0
        Slope(I) = 0
        Intercept(I) = 0
        MeanX(I) = 0
        MeanY(I) = 0
        SdevX(I) = 0
        SdevY(I) = 0
        TX(I) = 0
        TY(I) = 0
    NEXT I
END SUB

! Sort the curves in descending order of R^2, swapping every parallel array
SUB SortResults
    LOCAL I, J, BUFF
    FOR I = 1 TO MAX_CURVES - 1
        FOR J = I+1 TO MAX_CURVES
            IF R2(I) < R2(J) THEN
                BUFF = R2(I)
                R2(I) = R2(J)
                R2(J) = BUFF
                BUFF = Slope(I)
                Slope(I) = Slope(J)
                Slope(J) = BUFF
                BUFF = Intercept(I)
                Intercept(I) = Intercept(J)
                Intercept(J) = BUFF
                BUFF = MeanX(I)
                MeanX(I) = MeanX(J)
                MeanX(J) = BUFF
                BUFF = MeanY(I)
                MeanY(I) = MeanY(J)
                MeanY(J) = BUFF
                BUFF = SdevX(I)
                SdevX(I) = SdevX(J)
                SdevX(J) = BUFF
                BUFF = SdevY(I)
                SdevY(I) = SdevY(J)
                SdevY(J) = BUFF
                BUFF = TX(I)
                TX(I) = TX(J)
                TX(J) = BUFF
                BUFF = TY(I)
                TY(I) = TY(J)
                TY(J) = BUFF
            END IF
        NEXT J
    NEXT I
END SUB

! Return the text of transformation number TI applied to the variable named V$
DEF SayTransf$(TI, V$)
    LOCAL B$
    SELECT CASE TI
    CASE 1
        B$ = V$
    CASE 2
        B$ = "LOG(" & V$ & ")"
    CASE 3
        B$ = "SQR(" & V$ & ")"
    CASE 4
        B$ = "1/SQR(" & V$ & ")"
    CASE 5
        B$ = "1/" & V$
    CASE 6
        B$ = V$ & "^2"
    CASE 7
        B$ = "1/" & V$ & "^2"
    CASE 8
        B$ = V$ & "^3"
    CASE ELSE
        B$ = V$
    END SELECT
    SayTransf$ = B$
END DEF

DO
    PRINT
    PRINT TAB(20);"BEST LINEAR REGRESSION"
    PRINT TAB(20);"======================"
    PRINT "0) QUIT"
    PRINT "1) KEYBOARD INPUT"
    PRINT "2) FILE INPUT"
    PRINT "3) FIND BEST FIT"
    INPUT PROMPT "SELECT CHOICE BY NUMBER:":CH
    IF CH=0 THEN
        PRINT "BYE!"
    ELSEIF CH=1 THEN
        A$ = "KEYBOARD"
        INPUT PROMPT "ENTER NUMBER OF OBSERVATIONS: ": NDATA
        MAT REDIM X(NDATA), Y(NDATA)
        FOR I = 1 TO NDATA
            PRINT "X(";I;")";
            INPUT X(I)
            PRINT "Y(";I;")";
            INPUT Y(I)
        NEXT I
    ELSEIF CH=2 THEN
        INPUT PROMPT "ENTER FILENAME? ":A$
        WHEN ERROR IN
            OPEN #1: NAME A$, ORG TEXT, CREATE OLD, ACCESS INPUT
            INPUT #1: NDATA
            MAT REDIM X(NDATA), Y(NDATA)
            FOR I = 1 TO NDATA
                INPUT #1: X(I)
                INPUT #1: Y(I)
            NEXT I
            CLOSE #1
        USE
            PRINT "COULD NOT OPEN OR READ FROM FILE ";A$
        END WHEN
    ELSEIF CH=3 THEN
        CALL InitStatArrays
        K = 0
        ! Fit every combination of the 8 X and 8 Y transformations
        FOR ITX = 1 TO 8
            FOR ITY = 1 to 8
                SumX = 0
                SumY = 0
                SumX2 = 0
                SumY2 = 0
                SumXY = 0
                K = K + 1
                TX(K) = ITX
                TY(K) = ITY
                ! Any math error (e.g. LOG of zero) leaves this curve's statistics at zero
                WHEN ERROR IN
                    FOR I = 1 TO NDATA
                        SELECT CASE ITX
                        CASE 1
                            Xt = X(I)
                        CASE 2
                            Xt = LOG(X(I))
                        CASE 3
                            Xt = SQR(X(I))
                        CASE 4
                            Xt = 1/SQR(X(I))
                        CASE 5
                            Xt = 1/X(I)
                        CASE 6
                            Xt = X(I)^2
                        CASE 7
                            Xt = 1/X(I)^2
                        CASE 8
                            Xt = X(I)^3
                        CASE ELSE
                            Xt = X(I)
                        END SELECT
                        SELECT CASE ITY
                        CASE 1
                            Yt = Y(I)
                        CASE 2
                            Yt = LOG(Y(I))
                        CASE 3
                            Yt = SQR(Y(I))
                        CASE 4
                            Yt = 1/SQR(Y(I))
                        CASE 5
                            Yt = 1/Y(I)
                        CASE 6
                            Yt = Y(I)^2
                        CASE 7
                            Yt = 1/Y(I)^2
                        CASE 8
                            Yt = Y(I)^3
                        CASE ELSE
                            Yt = Y(I)
                        END SELECT
                        SumX = SumX + Xt
                        SumX2 = SumX2 + Xt^2
                        SumY = SumY + Yt
                        SumY2 = SumY2 + Yt^2
                        SumXY = SumXY + Xt * Yt
                    NEXT I
                    MeanX(K) = SumX / NDATA
                    MeanY(K) = SumY / NDATA
                    SdevX(K) = Sqr((SumX2 - SumX^2/NDATA)/(NDATA-1))
                    SdevY(K) = Sqr((SumY2 - SumY^2/NDATA)/(NDATA-1))
                    Slope(K) = (NDATA * SumXY - SumX * SumY) / (NDATA * SumX2 - SumX ^ 2)
                    Intercept(K) = MeanY(K) - Slope(K) * MeanX(K)
                    R2(K) = ((NDATA * SumXY - SumX * SumY) / (NDATA * (NDATA - 1) * SdevX(K) * SdevY(K))) ^ 2
                USE
                    MeanX(K) = 0
                    MeanY(K) = 0
                    SdevX(K) = 0
                    SdevY(K) = 0
                    Slope(K) = 0
                    Intercept(K) = 0
                    R2(K) = 0
                END WHEN
            NEXT ITY
        NEXT ITX
        CALL SortResults
        PRINT
        PRINT "TOP 5 CURVES"
        ! Show top 5 best curve fits
        FOR I = 1 TO 5
            PRINT "R^2 = ";R2(I)
            PRINT SayTransf$(TY(I), "Y");" = (";Intercept(I);") + (";Slope(I);") * "; SayTransf$(TX(I), "X")
            PRINT "MeanX = "; MeanX(I);" MeanY = ";MeanY(I)
            PRINT "SdevX = "; SdevX(I);" SdevY = ";SdevY(I)
            PRINT
        NEXT I
        ! Build the report file name from the data source name
        I = POS(A$, ".")
        IF I > 0 THEN
            R$ = A$[1:I-1] & "_REPORT.TXT"
        ELSE
            R$ = A$ & "_REPORT.TXT"
        END IF
        OPEN #1: NAME R$, ORG TEXT, CREATE NEWOLD, ACCESS OUTIN
        ERASE #1
        PRINT #1: "DATA SOURCE ";A$
        D$ = DATE$
        PRINT #1: D$[5:6] & "/" & D$[7:8] & "/" & D$[1:4] & " " & TIME$
        PRINT #1: ""
        ! Write every curve with a positive R^2
        FOR I = 1 TO MAX_CURVES
            IF R2(I) <= 0 THEN EXIT FOR
            PRINT #1: "R^2 = ";R2(I)
            PRINT #1: SayTransf$(TY(I), "Y");" = (";Intercept(I);") + (";Slope(I);") * "; SayTransf$(TX(I), "X")
            PRINT #1: "MeanX = "; MeanX(I);" MeanY = ";MeanY(I)
            PRINT #1: "SdevX = "; SdevX(I);" SdevY = ";SdevY(I)
            PRINT #1: ""
        NEXT I
        CLOSE #1
        PRINT "FULL LIST OF CURVE FITS WAS WRITTEN TO FILE ";R$
    ELSE
        PRINT "INVALID CHOICE"
    END IF
LOOP UNTIL CH = 0
END
Copyright (c) Namir Shammas. All rights reserved.