The following program calculates the the best model and statistical coefficients for the following model:
H(Y) = A + B F(X)
Where X is the independent variable and Y is the dependent variable. In addition, H() and F() are transformation functions for the regression variables. The program also calculates the coefficient of determination R-Square.
The program performs different transformations on all the variables. These transformations include:
The program attempts to fit a total of 121 different curves. For data that have only positive values, the program succeeds in calculating 121 different models. The presence of negative values and zeros will reduce the number of models tested. The application skips certain transformations for an entire data set if ANY value is zero and/or negative. The bypass prevents run-time errors. Skipping a transformation for an entire data set makes the models easier to compare since they all are based on the same number of observations.
Click here to download a ZIP file containing the project files for this program.
The program is a Windows application that has the following interface:
The above interface has the following controls:
The application reads data from text files. Each line in the source text file may be one of the following:
Here is an example of a data file:
Sample Data file Created 1/31/2006 General format for a data line is (the Weight value is optional): X,Y[,Weight] Next we have X = 100 and Y = 212 100,212 Notice leading spaces on next line 10,50 The next line has a commented observation ! 33,45 25,77 Next data line has a weight value of 2 (X = 30, Y = 86, and weight = 2) 30,86,2
The application allows for flexible commenting throughout the text file and is able to extract the data. You can add leading characters like !, #, or % as the first character of a comment line. This option may make it easier for the human eye to spot comment lines. It may also make it easier for a separate utility program to strip the comment lines.
One reason clicking the Read Data button displays the data is to allow you to double check the integrity of the data. If a data line has only one value, then the application generates flags an error. If a data line has more than 3 values, the program ignores the extra values and does not raise an error.
The application shifts and scales data using the following formulas:
X' = ScaleX * (X - ShiftX)
Y' = ScaleY * (Y - ShiftY)
Keep the above equations in mind when you assign values for the shift and/or scale factors.
Some of the mathematical transformations used take arguments that are only positive or only non-negative. In case you source data contains zeros and/or negative values, the application will avoid applying certain mathematical transformation to avoid causing run-time errors. Keep in mind that the program applies such avoidance to the entire data set and not just to those specific values that can cause error. You will notice the difference in the number of models display depending on your source data range. When using all-positive observations, the applications applies the entire set of transformations. When you have zeros or negative values, the application applies fewer transformations.
Here is a sample output:
The above output shows the first two best regression models. Here is the simple help message box:
The project file contains the following modules and classes of interest:
Here is the listing for class Form1:
Imports System.IO Public Class Form1 Private sDataFilename As String Private bEditMode As Boolean Private bTextHasChanged As Boolean Private Sub cmdCalc_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles cmdCalc.Click Dim objLR As CStatSum Dim objRes As CResults Dim I, nDataCOunt As Integer Dim fShiftX, fShiftY, fScaleX, fScaleY As Double Dim sBuffer As String If sDataFilename = "" Then MessageBox.Show("Please select a data file first", "Error", MessageBoxButtons.OK, MessageBoxIcon.Hand) Exit Sub End If objLR = New CStatSum objRes = New CResults If bTextHasChanged Then If MessageBox.Show("Save changed data?", "Confirmation", MessageBoxButtons.YesNo, MessageBoxIcon.Question) = Windows.Forms.DialogResult.Yes Then File.WriteAllText(sDataFilename, txtRes.Text) End If bTextHasChanged = False End If bEditMode = False ' check the Shift X text box If txtShiftX.Text.Length > 0 Then fShiftX = Double.Parse(txtShiftX.Text) Else fShiftX = 0 End If ' check the Shift Y text box If txtShiftY.Text.Length > 0 Then fShiftY = Double.Parse(txtShiftY.Text) Else fShiftY = 0 End If ' check the Scale X text box If txtScaleX.Text.Length > 0 Then fScaleX = Double.Parse(txtScaleX.Text) If fScaleX = 0 Then fScaleX = 1 Else fScaleX = 1 End If ' check the Scale Y text box If txtScaleY.Text.Length > 0 Then fScaleY = Double.Parse(txtScaleY.Text) If fScaleY = 0 Then fScaleY = 1 Else fScaleY = 1 End If If objLR.GetData(sDataFilename, nDataCOunt, fShiftX, fShiftY, fScaleX, fScaleY) Then Cursor = Cursors.WaitCursor objLR.FindBestFit(objRes) objRes.SortResults() sBuffer = "" sBuffer = "Source Data File: " & sDataFilename & vbCrLf & vbCrLf sBuffer = sBuffer & "Date/Time: " & Now() & vbCrLf & vbCrLf sBuffer = sBuffer & "Number of observations = " & nDataCOunt & vbCrLf & vbCrLf If fScaleX <> 1 Then sBuffer = sBuffer & "Scale X = " & fScaleX & vbCrLf If fShiftX <> 0 Then sBuffer = sBuffer & "Shift X = " & fShiftX & vbCrLf If fScaleY <> 1 Then sBuffer = sBuffer & "Scale Y = " & fScaleY & vbCrLf If fShiftY <> 0 Then sBuffer = sBuffer & "Shift Y = " & fShiftY & vbCrLf & vbCrLf For I = 0 To objRes.Count - 1 sBuffer = sBuffer & "R-Sqr = " & objRes.GetR2(I).ToString & vbCrLf sBuffer = sBuffer & "Model: " & objRes.GetModel(I) & vbCrLf sBuffer = sBuffer & "A = " & objRes.GetIntercept(I).ToString & _ ", B = " & objRes.GetSlope(I).ToString & vbCrLf Next txtRes.Text = sBuffer sBuffer = "" Cursor = Cursors.Default cmdSaveRes.Enabled = True Else MessageBox.Show("Error in processing data", "Error", MessageBoxButtons.OK, MessageBoxIcon.Error) End If End Sub Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load sDataFilename = "" cmdCalc.Enabled = False cmdSaveRes.Enabled = False bEditMode = False End Sub Private Sub cmdReadData_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles cmdReadData.Click dlgReadData.Filter = "All files (*.*)|*.*|Text files|*.txt|Data files (*.dat)|*.dat" If dlgReadData.ShowDialog = Windows.Forms.DialogResult.OK Then sDataFilename = dlgReadData.FileName txtRes.Text = File.ReadAllText(sDataFilename) cmdCalc.Enabled = True cmdSaveRes.Enabled = True bTextHasChanged = False bEditMode = True End If End Sub Private Sub cmdSaveRes_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles cmdSaveRes.Click dlgSaveRes.Filter = "All files (*.*)|*.*|Text files|*.txt|Data files (*.dat)|*.dat" If dlgSaveRes.ShowDialog = Windows.Forms.DialogResult.OK Then File.WriteAllText(dlgSaveRes.FileName, txtRes.Text) bTextHasChanged = False End If End Sub Private Sub cmdClose_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles cmdClose.Click If MessageBox.Show("Close application?", "Confirmation", MessageBoxButtons.YesNo, MessageBoxIcon.Question) = Windows.Forms.DialogResult.Yes Then Close() End If End Sub Private Sub txtRes_TextChanged(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles txtRes.TextChanged If bEditMode Then bTextHasChanged = True End Sub Private Sub cmdHelp_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles cmdHelp.Click Dim sText As String Dim I As Integer sText = "Each line can be 1) empty, 2) a comment line or 3) a data line" & vbCrLf & _ "A data line has a pair of y and x values separated by a comma" & vbCrLf & _ "A data line can have a weight value that is appended after x and is separated by a comma" & vbCrLf & _ "Weights are optional and need to appear when their values are not 1" & vbCrLf & _ "A comment line must NOT start with any of the chatacters +-.0123456789" & vbCrLf & _ "Show an example?" If MessageBox.Show(sText, "Help", MessageBoxButtons.YesNo, MessageBoxIcon.Information) = Windows.Forms.DialogResult.Yes Then bEditMode = False sText = "Sample data (example of free form comment line)" & vbCrLf & _ "45,32" & vbCrLf & _ "67,34" & vbCrLf & _ "Next line is an observation that is temporaryly commente dout" & vbCrLf & _ "! 56,23" & vbCrLf For I = 1 To 10 sText = sText & Int(200 * Rnd(1)) & "," & Int(200 * Rnd(1)) & vbCrLf Next txtRes.Text = sText End If End Sub End Class
Here is the listing for class CErrors:
Public Class CErrors Private m_sErrors() As String Private m_nNumErrs As Integer Public Sub New() Clear() End Sub Public Sub Add(ByVal sErr As String) ReDim Preserve m_sErrors(m_nNumErrs + 1) m_sErrors(m_nNumErrs) = sErr m_nNumErrs = m_nNumErrs + 1 End Sub Public Function GetCount() As Integer Return m_nNumErrs End Function Public Function GetErrText(ByVal nIndex As Integer) As String Dim sErrText As String = "Out of Bound Index" Try sErrText = m_sErrors(nIndex) Catch End Try Return sErrText End Function Public Sub Clear() m_nNumErrs = 0 ReDim m_sErrors(1) End Sub End Class
Here is the listing for class CResults::
Public Class CResults Private m_nResCount As Integer Private m_uResRec() As TypeModule.ResType Public Sub New() m_nResCount = 0 End Sub Public Sub Add(ByVal sModel As String, ByVal fR2 As Double, ByVal fSlope As Double, ByVal fIntercept As Double, ByVal sErr As String) ReDim Preserve m_uResRec(m_nResCount + 1) m_uResRec(m_nResCount).m_sModel = sModel m_uResRec(m_nResCount).m_fR2 = fR2 m_uResRec(m_nResCount).m_fSlope = fSlope m_uResRec(m_nResCount).m_fIntercept = fIntercept m_uResRec(m_nResCount).m_sErr = sErr m_nResCount = m_nResCount + 1 End Sub Public Sub Clear() m_nResCount = 0 End Sub Public Function Count() As Integer Return m_nResCount End Function Public Function GetModel(ByVal nIndex As Integer) As String Return IIf(nIndex > -1 And nIndex < m_nResCount, m_uResRec(nIndex).m_sModel, "") End Function Public Function GetR2(ByVal nIndex As Integer) As Double Return IIf(nIndex > -1 And nIndex < m_nResCount, m_uResRec(nIndex).m_fR2, -1) End Function Public Function GetSlope(ByVal nIndex As Integer) As Double Return IIf(nIndex > -1 And nIndex < m_nResCount, m_uResRec(nIndex).m_fSlope, -1.0E+30) End Function Public Function GetIntercept(ByVal nIndex As Integer) As Double Return IIf(nIndex > -1 And nIndex < m_nResCount, m_uResRec(nIndex).m_fIntercept, -1.0E+30) End Function Public Function GetErr(ByVal nIndex As Integer) As String Return IIf(nIndex > -1 And nIndex < m_nResCount, m_uResRec(nIndex).m_sErr, "") End Function Public Sub SortResults() Dim bInorder As Boolean Dim I, J, N, nOffset, nResetCounter As Integer Dim uBuffer As TypeModule.ResType N = m_nResCount nOffset = N nResetCounter = 0 Do nOffset = (5 * nOffset) / 11 If nOffset = 0 Then nOffset = 1 bInorder = True For I = 0 To N - nOffset - 1 J = I + nOffset If m_uResRec(I).m_fR2 < m_uResRec(J).m_fR2 Then uBuffer = m_uResRec(I) m_uResRec(I) = m_uResRec(J) m_uResRec(J) = uBuffer bInorder = False End If Next If bInorder Then nResetCounter += 1 If (Not bInorder) And (nOffset = 1) Then nOffset = N For I = 1 To nResetCounter nOffset = (5 * nOffset) / 11 Next If nOffset = 0 Then nOffset = 1 End If Loop Until nOffset = 1 And bInorder End Sub End Class
Here is the listing for class CStatSum:
Imports System.IO Public Class CStatSum Private Const EPSILON = 1.0E-20 Private Const DIGIT_MARKERS = "-+0123456789." Private m_bZeroX As Boolean Private m_bZeroY As Boolean Private m_bNegX As Boolean Private m_bNegY As Boolean Private m_fSum As Double Private m_fSumX As Double Private m_fSumX2 As Double Private m_fSumY As Double Private m_fSumY2 As Double Private m_fSumXY As Double Private m_fMeanX As Double Private m_fMeanY As Double Private m_fSdevX As Double Private m_fSdevY As Double Private m_fSlope As Double Private m_fIntercept As Double Private m_fR2 As Double Private m_sTX As String Private m_sTY As String Private m_sWt As String Private m_fX() As Double Private m_fY() As Double Private m_fWt() As Double Private m_fShiftX As Double Private m_fShiftY As Double Private m_fScaleX As Double Private m_fScaleY As Double Private m_nDataCount As Integer Public Sub InitSums() m_fSum = 0 m_fSumX = 0 m_fSumX2 = 0 m_fSumY = 0 m_fSumY2 = 0 m_fSumXY = 0 End Sub Public Function GetData(ByVal sDataFilename As String, ByRef nDataCount As Integer, _ Optional ByVal ShiftX As Double = 0, Optional ByVal ShiftY As Double = 0, _ Optional ByVal ScaleX As Double = 0, Optional ByVal ScaleY As Double = 1) As Boolean Dim sLine, sLines() As String Dim sData() As String Dim I, J, K, N As Integer Dim bRes As Boolean = True Try sLines = File.ReadAllLines(sDataFilename) nDataCount = sLines.GetUpperBound(0) ' Dimension arrays for maximum capacity ReDim m_fX(nDataCount) ReDim m_fY(nDataCount) ReDim m_fWt(nDataCount) J = 0 m_nDataCount = 0 Do Until J = sLines.Length sLine = sLines(J).Trim() ' is line not empty? If sLine.Length > 0 Then ' is it NOT a comment? If DIGIT_MARKERS.IndexOf(sLine.Substring(0, 1)) >= 0 Then sData = sLine.Split(",") N = sData.GetUpperBound(0) m_fX(m_nDataCount) = Double.Parse(sData(0)) m_fY(m_nDataCount) = Double.Parse(sData(1)) If N < 2 Then m_fWt(m_nDataCount) = 1 Else m_fWt(m_nDataCount) = Double.Parse(sData(2)) End If m_nDataCount += 1 End If End If J += 1 Loop ' adjust arrays to actual number of data ReDim Preserve m_fX(m_nDataCount) ReDim Preserve m_fY(m_nDataCount) ReDim Preserve m_fWt(m_nDataCount) nDataCount = m_nDataCount For I = 0 To m_nDataCount - 1 m_fX(I) = ScaleX * (m_fX(I) - ShiftX) m_fY(I) = ScaleY * (m_fY(I) - ShiftY) Next I Catch ex As Exception bRes = False End Try Return bRes End Function Public Sub New() InitSums() End Sub Private Sub Add(ByVal X As Double, ByVal Y As Double, Optional ByVal Wt As Double = 1) m_fSum = m_fSum + Wt m_fSumX = m_fSumX + X * Wt m_fSumX2 = m_fSumX2 + X * X * Wt m_fSumY = m_fSumY + Y * Wt m_fSumY2 = m_fSumY2 + Y * Y * Wt m_fSumXY = m_fSumXY + X * Y * Wt End Sub Public Sub FindBestFit(ByRef objRes As CResults) Dim I As Integer Dim ITX As FitType Dim ITY As FitType Dim bOK As Boolean Dim fXt, fYt As Double Dim objErrs As New CErrors If m_nDataCount < 3 Then Exit Sub objRes.Clear() Try m_bZeroX = False m_bZeroY = False m_bNegX = False m_bNegY = False 'objRes.Initialize For I = 0 To m_nDataCount - 1 If m_fX(I) < 0 Then m_bNegX = True If m_fY(I) < 0 Then m_bNegY = True If Math.Abs(m_fX(I)) < EPSILON Then m_bZeroX = True If Math.Abs(m_fY(I)) < EPSILON Then m_bZeroY = True Next I m_sWt = "" For ITY = TypeModule.FitType.eLinear To TypeModule.FitType.eLn ' validate transformations If m_bZeroY And m_bNegY Then bOK = CanHandleZeroAndNegative(ITY) ElseIf m_bZeroY Then bOK = CanHandleZero(ITY) ElseIf m_bNegY Then bOK = CanHandleNegative(ITY) Else bOK = True End If ' Can proceed? If bOK Then For ITX = TypeModule.FitType.eLinear To TypeModule.FitType.eLn ' validate transformations If m_bZeroX And m_bNegX Then bOK = CanHandleZeroAndNegative(ITX) ElseIf m_bZeroX Then bOK = CanHandleZero(ITX) ElseIf m_bNegX Then bOK = CanHandleNegative(ITX) Else bOK = True End If ' Can proceed? If bOK Then InitSums() ' store transformation data m_sTX = SayTransform(ITX) m_sTY = SayTransform(ITY, "Y") For I = 0 To m_nDataCount - 1 Select Case ITX Case TypeModule.FitType.eLinear fXt = m_fX(I) Case TypeModule.FitType.eSquare fXt = m_fX(I) ^ 2 Case TypeModule.FitType.eCube fXt = m_fX(I) ^ 3 Case TypeModule.FitType.eCubeRoot fXt = m_fX(I) ^ (1 / 3) Case TypeModule.FitType.eRecip fXt = 1 / m_fX(I) Case TypeModule.FitType.eRecipCubeRoot fXt = 1 / m_fX(I) ^ (1 / 3) Case TypeModule.FitType.eRecipSquare fXt = 1 / m_fX(I) ^ 2 Case TypeModule.FitType.eRecipCube fXt = 1 / m_fX(I) ^ 3 Case TypeModule.FitType.eSqrt fXt = Math.Sqrt(m_fX(I)) Case TypeModule.FitType.eRecipSqrt fXt = 1 / Math.Sqrt(m_fX(I)) Case TypeModule.FitType.eLn fXt = Math.Log(m_fX(I)) End Select Select Case ITY Case TypeModule.FitType.eLinear fYt = m_fY(I) Case TypeModule.FitType.eSquare fYt = m_fY(I) ^ 2 Case TypeModule.FitType.eCube fYt = m_fY(I) ^ 3 Case TypeModule.FitType.eCubeRoot fYt = m_fY(I) ^ (1 / 3) Case TypeModule.FitType.eRecip fYt = 1 / m_fY(I) Case TypeModule.FitType.eRecipCubeRoot fYt = 1 / m_fY(I) ^ (1 / 3) Case TypeModule.FitType.eRecipSquare fYt = 1 / m_fY(I) ^ 2 Case TypeModule.FitType.eRecipCube fYt = 1 / m_fY(I) ^ 3 Case TypeModule.FitType.eSqrt fYt = Math.Sqrt(m_fY(I)) Case TypeModule.FitType.eRecipSqrt fYt = 1 / Math.Sqrt(m_fY(I)) Case TypeModule.FitType.eLn fYt = Math.Log(m_fY(I)) End Select ' add transformed data to statistical summations Add(fXt, fYt, m_fWt(I)) Next I objErrs.Clear() ' calculate regression statistics and store in ' object accessed by m_objRes CalcLR(objErrs) If objErrs.GetCount > 1 Then objRes.Add(m_sTY & " = A + B * " & m_sTX, m_fR2, m_fSlope, m_fIntercept, objErrs.GetErrText(0)) Else objRes.Add(m_sTY & " = A + B * " & m_sTX, m_fR2, m_fSlope, m_fIntercept, "") End If End If Next ITX End If Next ITY Catch ex As Exception objErrs.Add(ex.Message) End Try End Sub Private Function SayTransform(ByVal eVal As FitType, Optional ByVal sVar As String = "X") As String Select Case eVal Case TypeModule.FitType.eLinear Return sVar Case TypeModule.FitType.eSquare Return sVar & "^2" Case TypeModule.FitType.eCube Return sVar & "^3" Case TypeModule.FitType.eCubeRoot Return sVar & "^1/3" Case TypeModule.FitType.eRecip Return "1/" & sVar Case TypeModule.FitType.eRecipCubeRoot Return "1/" & sVar & "^1/3" Case TypeModule.FitType.eRecipSquare Return "1/" & sVar & "^2" Case TypeModule.FitType.eRecipCube Return "1/" & sVar & "^3" Case TypeModule.FitType.eSqrt Return sVar & "^1/2" Case TypeModule.FitType.eRecipSqrt Return "1/" & sVar & "^1/2" Case TypeModule.FitType.eLn Return "Ln(" & sVar & ")" End Select End Function Private Function CanHandleZero(ByVal eVal As FitType) As Boolean Select Case eVal Case TypeModule.FitType.eLinear, TypeModule.FitType.eSquare, TypeModule.FitType.eCube, _ TypeModule.FitType.eCubeRoot, TypeModule.FitType.eSqrt CanHandleZero = True Case Else CanHandleZero = False End Select End Function Private Function CanHandleZeroAndNegative(ByVal eVal As FitType) As Boolean Select Case eVal Case TypeModule.FitType.eLinear, TypeModule.FitType.eSquare, TypeModule.FitType.eCube CanHandleZeroAndNegative = True Case Else CanHandleZeroAndNegative = False End Select End Function Private Function CanHandleNegative(ByVal eVal As FitType) As Boolean Select Case eVal Case TypeModule.FitType.eLinear, TypeModule.FitType.eSquare, TypeModule.FitType.eCube, _ TypeModule.FitType.eRecip, TypeModule.FitType.eRecipSquare, TypeModule.FitType.eRecipCube CanHandleNegative = True Case Else CanHandleNegative = False End Select End Function Private Sub CalcLR(ByRef objErrs As CErrors) If m_fSum < 2 Then Exit Sub ' caluclate regression Try m_fMeanX = m_fSumX / m_fSum m_fMeanY = m_fSumY / m_fSum m_fSdevX = Math.Sqrt((m_fSumX2 - m_fSumX ^ 2 / m_fSum) / (m_fSum - 1)) m_fSdevY = Math.Sqrt((m_fSumY2 - m_fSumY ^ 2 / m_fSum) / (m_fSum - 1)) m_fSlope = (m_fSum * m_fSumXY - m_fSumX * m_fSumY) / _ (m_fSum * m_fSumX2 - m_fSumX ^ 2) m_fIntercept = m_fMeanY - m_fSlope * m_fMeanX m_fR2 = ((m_fSum * m_fSumXY - m_fSumX * m_fSumY) / _ (m_fSum * (m_fSum - 1) * m_fSdevX * m_fSdevY)) ^ 2 Catch ex As Exception objErrs.Add("For model" & m_sTY & " = A + B * " & m_sTX & vbCrLf & ex.Message) End Try End Sub End Class
Copyright (c) Namir Shammas. All rights reserved.