Curve Fitting using MATLAB
Curve fitting is the mathematical process of constructing a curve that fits a given data set as closely as possible. In doing so, we find a specific relationship between the dependent and the independent variables of the provided data. The fit should be as accurate as possible for the input data. The advantage of curve fitting is that it lets us tune a model's parameters so that the model describes the data better.
Using MATLAB, you can efficiently perform curve fitting and interpolation. It has a built-in toolbox that helps you visualize the curve fitting. Apart from using the built-in toolbox, you can manually program your own model to fit your data. Curve fitting is widely applicable across engineering and science. In this article, we will see how to fit data in Matlab using the curve fitting toolbox.
Prerequisites
To follow along with this tutorial, you need to have Matlab installed, together with the curve fitting toolbox, and a basic understanding of Matlab.
Overview of the curve fitting toolbox
Here, we will use the curve fitting toolbox available in Matlab to fit a set of points. We will also generate Matlab code for the fit we build and see how that code can be used to fit data.
To locate the curve fitting toolbox, click on the apps tab at the top of the Matlab window. Once it is open, you will see the curve fitting toolbox among the listed apps.
Upon clicking the curve fitting toolbox, the curve fitting window appears.
Now, let us look at the various parts of this toolbox and describe their multiple functions.
The curve fitting toolbox window has eight areas, and each area has a different function. We will look at each of them in turn.
- Area1 - This is the data selection area. It has an autogenerated fit name, which is untitled fit 1. Note that you can change the fit name to whatever you want. In this section, you choose the x, y, and z data that you want to fit.
- Area2 - This area is for the choice of fit. You choose the type of fit you want for your data. By default, it is set to interpolant.
- Area3 - In this section, you choose how the model should find the best parameters. You can select any method you wish to use by clicking on the dropdown arrow in the method section. For this article, we use the default method, the linear approach, also known as the linear least squares estimate.
- Area4 - You can center and scale your data by enabling center and scale. To do so, click on the checkbox; a tick on the checkbox indicates that it is enabled.
- Area5 - Once you have entered your data and you want Matlab to fit it automatically, select the autofit checkbox here. In autofit, Matlab determines the appropriate type of fit for your data and fits it automatically. If you want to fit manually, click the fit button. You can also stop a fit in progress by clicking on the stop button below the fit button.
- Area6 - Here, you will see the type of fit, the constants of the model, the coefficients of the variables, and the goodness of the fit.
- Area7 - Here, you get the figure for your fitted data.
- Area8 - This is the table of fits. It shows the fit name, the fit type, and the errors.
Curve fitting
We are now going to input our data and see how these sections work when fitting a curve. To start curve fitting, you have to choose your x and y variables. We cannot type the x and y values directly into the curve fitting app. Instead, we first define x and y in the command window and then select them in the curve fitting app.
So, give a simple input for x and y as shown below:
x = [1, 2, 3, 4, 5];
y = [6, 7, 8, 9, 10];
Once these variables are saved in the workspace, go back to the curve fitting toolbox and select them as your data. In this case, we defined two variables, x and y, so the dropdown lists for choosing data show only x and y. Once you choose your x data, it is plotted automatically.
When we choose our y data, we get our fit.
Our data is fitted automatically since we enabled auto fit. As we can see, Matlab has chosen the type of fit, which is a polynomial of degree one. A degree-one polynomial is an exact fit through any two points with distinct x coordinates. Our five points all lie on the same straight line, so the fit passes through every one of them. If we look at the result section, we see the model of our fit.
In this case, our fit model is f(x) = p1*x + p2. We also have the coefficients of our fit, the goodness of the fit, and the errors. Each coefficient is reported together with its 95% confidence bounds.
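The same degree-one fit can be reproduced at the command line. The following is a minimal sketch, not the code the app runs internally. It assumes the base Matlab function polyfit, which returns the coefficients in the same order as p1 and p2 in the model above.

```matlab
% Data from the example above
x = [1, 2, 3, 4, 5];
y = [6, 7, 8, 9, 10];

% Least-squares fit of a degree-one polynomial:
% p(1) is the slope (p1), p(2) is the intercept (p2)
p = polyfit(x, y, 1);

% Evaluate the fitted line at the original x values
yFit = polyval(p, x);

fprintf('p1 = %.4f, p2 = %.4f\n', p(1), p(2));
```

Because the five points lie exactly on y = x + 5, p1 should come out close to 1 and p2 close to 5, up to floating-point rounding.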
The goodness of the fit shows how well the model can fit your data. We will look at it in detail when looking at the errors of the data. As we can see in our data table, we have a list of four error measures: SSE, R square, adjusted R square, and RMSE.
As we said before, it is also possible to fit your data manually using a fit type of your choice. For example, if you need a custom equation, you can choose it from the list of available models by clicking on the dropdown arrow in the model section. Further settings are available when you click on the fit options button.
When you click on fit options, a new window opens up.
The fit options window lists the coefficients. As you can see, we have two coefficients in our model, p1 and p2. For each coefficient, we can specify a lower and an upper bound. This is essential when we fit models that admit many solutions, because we often know the range within which a coefficient can vary.
For example, let's say we have a record of air temperatures as our input data. In that case, we know that the temperature can vary from -40 degrees to a maximum of 50 or 60 degrees. Accordingly, we can bound the corresponding parameter so that Matlab does not consider solutions outside this range.
Similarly, if you are estimating a modulus or stiffness parameter, you know it cannot be negative. This shows the importance of understanding the bounds implied by your data, so that you can constrain the coefficients accordingly.
By default, the bounds are -inf, which is negative infinity, for the lower bound and inf, which is positive infinity, for the upper bound. Suppose, for example, that the law governing our data states that the coefficients should lie between 0 and 1. We then specify that range in the fit options window.
Specifying the range matters because the search is then restricted to solutions whose coefficients lie within that range. In many cases, we have to specify it to obtain a meaningful solution to our problem. So let's set our range to 0 to 1, with 0 as the lower limit and 1 as the upper limit.
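The bounds set in the fit options window can also be passed at the command line. The sketch below assumes the curve fitting toolbox function fit and its Lower and Upper options, applied to the degree-one polynomial from earlier. The bound values come from the example above, and the variable names are only illustrative.

```matlab
% Requires the Curve Fitting Toolbox. fit expects column vectors.
x = [1; 2; 3; 4; 5];
y = [6; 7; 8; 9; 10];

% Restrict both coefficients of f(x) = p1*x + p2 to the range [0, 1].
% Bounds are listed in coefficient order: [p1, p2].
[fitresult, gof] = fit(x, y, 'poly1', 'Lower', [0, 0], 'Upper', [1, 1]);

% Read the bounded coefficients back from the fit object
c = coeffvalues(fitresult);
fprintf('p1 = %.4f, p2 = %.4f, SSE = %.4f\n', c(1), c(2), gof.sse);
```

With this data, the unbounded solution has p2 = 5, which lies outside the allowed range, so the bounded fit has to accept a larger SSE. This is the trade-off to keep in mind when choosing bounds.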
If you want to enter your own equation, choose custom equation as the fit type. When you select custom equation, a new window opens up that allows you to enter your equation.
As we can see, our dependent variable is y, and the independent variable is x. The dependent variable can be named anything, depending on the data given. This is why we specify y as a function of its independent variables.
The equation shown by default is a*exp(-b*x)+c, whereas our earlier fit has the form y = mx + c. Suppose that we want our curve to be fitted in the form y = mx instead and see the results. The equation then becomes a*x.
Since auto fit is enabled, the data is fitted automatically. However, as we can see, the points are not fitted well since the constant term is missing. The result section also gives more information about the curve.
We see that it reports the coefficient as 2.364, with bounds of 1.471 for the lower limit and 3.256 for the upper limit. The goodness of the fit is also given by the error measures. Additionally, we have a warning suggesting a better fitting method for our data. It is a suggestion to improve your model.
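The coefficient reported above can be checked by hand. The model a*x is linear in a, so a sketch using the backslash operator gives the least-squares value directly. This is an alternative route to the same number, not the solver the app uses for custom equations.

```matlab
% Data from the example above, as column vectors
x = [1; 2; 3; 4; 5];
y = [6; 7; 8; 9; 10];

% Least-squares solution of x*a = y for the single coefficient a
a = x \ y;

% Residuals and sum of squared errors of the no-intercept model
residuals = y - a*x;
sse = sum(residuals.^2);

fprintf('a = %.3f, SSE = %.3f\n', a, sse);
```

The value of a is 130/55, which rounds to the 2.364 shown in the result section. The non-zero SSE reflects the missing constant term.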
How to compare fits
In this section, we understand how to compare two different fits. It involves comparing the coefficients, bounds, and even the goodness of the fits. We do this by keeping the fits in a single space and viewing them through the tools provided. Then, to add a new fit, click on the fit
option in the menu bar and click on the new fit
option. Once we finish this, we see a new window opens up with an autogenerated name untitled2
. Also, when we look at the table of fit
section, we see information about the curve added there.
As you can see, we can compare the error measures, the fit types, and even the coefficients. In this way, you can keep multiple fits in the same window and compare them.
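A similar comparison can be made in code by collecting the goodness-of-fit structures of several fits into one table. The sketch below assumes the curve fitting toolbox functions fit and fittype. The start point of 1 for the custom equation is an arbitrary choice made here, not a value from the app.

```matlab
% Requires the Curve Fitting Toolbox. fit expects column vectors.
x = [1; 2; 3; 4; 5];
y = [6; 7; 8; 9; 10];

% Fit 1: degree-one polynomial, f(x) = p1*x + p2
[fit1, gof1] = fit(x, y, 'poly1');

% Fit 2: custom equation without a constant term, f(x) = a*x
[fit2, gof2] = fit(x, y, fittype('a*x'), 'StartPoint', 1);

% One row per fit, one column per goodness-of-fit measure
T = struct2table([gof1; gof2]);
T.fit = {'poly1'; 'a*x'};
disp(T);
```

Each row plays the role of one entry in the table of fits, so the fit with the smaller SSE and RMSE can be read off directly.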
Error measures
These error measures describe the goodness of a fit. They are not specific to Matlab; they apply equally in any other language used for modeling and data prediction. We use the error measures to assess how closely the model can predict the data. There are four error measures:
- Sum of squares error (SSE)
- R-square
- Adjusted R-square
- Root mean squared error (RMSE)
Sum of squares error (SSE)
It measures the total deviation of the response values from the fitted values. Let's say you want to model the temperature of a pavement. The first thing you know is that the temperature of the pavement depends on the air temperature.
This means that if you know how the air temperature varies, you can tell how the pavement temperature varies. However, the pavement temperature depends not only on the air temperature but also on some external factors. The variability of the pavement temperature is the total variability.
It consists of the part that depends on the air temperature, the explained variability, and the part due to external factors, known as the unexplained variability. The total variability is SST (total sum of squares), the explained variability is SSR (sum of squares due to regression), and the unexplained variability is SSE (sum of squares due to error).
pavement temperature variability = air temperature + other factors
SST = SSR + SSE
SST is the total variability about the mean. SSR is the variability explained by the model. Finally, SSE is the variability due to random error, which the model does not account for. Let's look at the sketch below for illustration and easy understanding.
In this sketch, $y_i$ is a data point, $\hat{y}_i$ is the value given by the model, and $\bar{y}$ is the mean of the data.
$SST = \sum_i (y_i - \bar{y})^2$
$SSR = \sum_i (\hat{y}_i - \bar{y})^2$
$SSE = \sum_i (y_i - \hat{y}_i)^2$
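The three sums can be computed directly to check the relation SST = SSR + SSE. The sketch below uses hypothetical, slightly noisy data that is not from the article, together with a straight-line fit from polyfit. The relation holds for a least-squares fit that includes a constant term, which is the case here.

```matlab
% Hypothetical noisy measurements (illustrative only)
x = [1, 2, 3, 4, 5];
y = [6.1, 6.9, 8.2, 8.8, 10.0];

% Straight-line least-squares fit and its predictions
p = polyfit(x, y, 1);
yHat = polyval(p, x);
yBar = mean(y);

sst = sum((y - yBar).^2);     % total variability about the mean
ssr = sum((yHat - yBar).^2);  % variability explained by the model
sse = sum((y - yHat).^2);     % variability left unexplained

fprintf('SST = %.4f, SSR + SSE = %.4f\n', sst, ssr + sse);
```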
R Squared
It expresses the proportion of the total variability (SST) that is explained by the model. In other words, it is the ratio of SSR to SST, given by:
R square = SSR/SST
Adjusted R-squared
Let's assume that we have a model with two terms, and we want to add one more term to the model. When we add one more term, the fit to the data generally improves, reducing the error measures.
The improvement may arise because the new term genuinely explains variability in the data, or merely because it fits some random phenomenon. The adjusted R-square measures the goodness of the fit while adjusting for the number of parameters. It is therefore used to evaluate the number of parameters that gives the best fit.
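The article does not give a formula for this adjustment. One common definition, stated here as an assumption, uses $n$ for the number of data points and $m$ for the number of fitted coefficients; both symbols are introduced only for this sketch:

$$R^2_{adj} = 1 - \frac{SSE\,(n - 1)}{SST\,(n - m)}$$

Adding a term increases $m$, so the adjusted R-square rises only if SSE falls by enough to compensate.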
Root mean squared error(RMSE)
It is the standard deviation of the residuals. A residual is the difference between a data point and the value given by the model. The RMSE shows how far your data points are spread around your fit, using the standard deviation as the measure.
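The remaining measures follow from SSE and SST. The sketch below uses hypothetical data and a straight-line fit. It assumes that the adjusted R-square and the RMSE both use the residual degrees of freedom, that is, the number of data points minus the number of fitted coefficients. It also computes R-square as 1 - SSE/SST, which equals SSR/SST whenever SST = SSR + SSE.

```matlab
% Hypothetical noisy measurements (illustrative only)
x = [1, 2, 3, 4, 5];
y = [6.1, 6.9, 8.2, 8.8, 10.0];

p = polyfit(x, y, 1);
yHat = polyval(p, x);

n = numel(y);    % number of data points
m = numel(p);    % number of fitted coefficients
dfe = n - m;     % residual degrees of freedom

sse = sum((y - yHat).^2);
sst = sum((y - mean(y)).^2);

rsquare = 1 - sse/sst;                        % share of variability explained
adjrsquare = 1 - (sse/dfe) / (sst/(n - 1));   % penalized for extra coefficients
rmse = sqrt(sse/dfe);                         % spread of residuals around the fit

fprintf('R2 = %.4f, adjusted R2 = %.4f, RMSE = %.4f\n', rsquare, adjrsquare, rmse);
```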
How to generate code for your model
After designing your model using the curve fitting toolbox, you can generate the code behind the model. This is done as follows:
- Click on file in the menu bar
- Choose the generate code option
It generates a file containing a function, createFit, that implements our model.
function [fitresult, gof] = createFit(x, y)
%CREATEFIT(X,Y)
% Create a fit.
%
% Data for 'untitled fit 1' fit:
% X Input : x
% Y Output: y
% Output:
% fitresult : a fit object representing the fit.
% gof : structure with goodness-of-fit info.
%
% See also FIT, CFIT, SFIT.
% Auto-generated by MATLAB on 18-Aug-2021 16:41:14
%% Fit: 'untitled fit 1'.
[xData, yData] = prepareCurveData( x, y );
% Set up fittype and options.
ft = fittype( 'a*x', 'independent', 'x', 'dependent', 'y' );
opts = fitoptions( 'Method', 'NonlinearLeastSquares' );
opts.Display = 'Off';
opts.StartPoint = 0.880107716657968;
% Fit model to data.
[fitresult, gof] = fit( xData, yData, ft, opts );
% Plot fit with data.
figure( 'Name', 'untitled fit 1' );
h = plot( fitresult, xData, yData );
legend( h, 'y vs. x', 'untitled fit 1', 'Location', 'NorthEast' );
% Label axes
xlabel x
ylabel y
grid on
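Once a fit object exists, it can be used to predict values for new inputs. The sketch below repeats the fitting steps of the generated function inline so that it runs on its own; calling createFit(x, y) should return the same kind of outputs. The start point of 1 and the new inputs in xNew are assumptions made for illustration.

```matlab
% Requires the Curve Fitting Toolbox. fit expects column vectors.
x = [1; 2; 3; 4; 5];
y = [6; 7; 8; 9; 10];

% Same model as the generated function: y = a*x
ft = fittype('a*x', 'independent', 'x', 'dependent', 'y');
[fitresult, gof] = fit(x, y, ft, 'StartPoint', 1);

% A fit object can be called like a function to predict new values
xNew = [6; 7];
yNew = fitresult(xNew);

fprintf('a = %.3f, RMSE = %.3f\n', fitresult.a, gof.rmse);
disp([xNew, yNew]);
```

The gof structure holds the same error measures discussed earlier, in the fields sse, rsquare, adjrsquare, and rmse.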
Conclusion
Matlab is a convenient tool for curve fitting because it has dedicated curve fitting tools that make it easier to fit your data. The curve fitting tool is easy to use, and most of the curve fitting work is done for you.
It can choose a suitable model for your data but also allows you to use your preferred model. Also, with the toolbox you do not need to write the code yourself, since Matlab generates it for you. We can use the generated code to fit and predict other data.
Peer Review Contributions by: Lalithnarayan C