
4.7: Fitting Exponential Models to Data - Mathematics


In the previous section, we saw number lines using logarithmic scales. It is also common to see two-dimensional graphs with one or both axes using a logarithmic scale.

One common use of a logarithmic scale on the vertical axis is to graph quantities that are changing exponentially, since it helps reveal relative differences. Both stock charts below show the Dow Jones Industrial Average, from 1928 to 2010.

Both charts have a linear horizontal scale, but the first graph has a linear vertical scale, while the second has a logarithmic vertical scale. The first scale is the one we are more familiar with, and shows what appears to be a strong exponential trend, at least up until the year 2000.

Example \(\PageIndex{1}\)

There were stock market drops in 1929 and 2008. Which was larger?

Solution

In the first graph, the stock market drop around 2008 looks very large, and in terms of dollar values, it was indeed a large drop. However, the second graph shows relative changes, and on this graph the 2008 drop seems less major; in fact, the drop starting in 1929 was, percentage-wise, much more significant.

Specifically, in 2008, the Dow value dropped from about 14,000 to 8,000, a drop of 6,000. This is obviously a large value drop, and amounts to about a 43% drop. In 1929, the Dow value dropped from a high of around 380 to a low of 42 by July of 1932. While value-wise this drop of 338 is much smaller than the 2008 drop, it corresponds to an 89% drop, a much larger relative drop than in 2008. The logarithmic scale shows these relative changes.

The second graph above, in which one axis uses a linear scale and the other axis uses a logarithmic scale, is an example of a semi-log graph.

Definition: Semi-log and Log-log Graphs

A semi-log graph is a graph with one axis using a linear scale and one axis using a logarithmic scale.

A log-log graph is a graph with both axes using logarithmic scales.

Example \(\PageIndex{2}\)

Plot 5 points on the graph of \(f(x)=3(2)^{x}\) on a semi-log graph with a logarithmic scale on the vertical axis.

Solution

To do this, we need to find 5 points on the graph, then calculate the logarithm of each output value. Arbitrarily choosing 5 input values,

\(x\) | \(f(x)\) | \(\log(f(x))\)
-3 | \(3(2)^{-3} = \dfrac{3}{8}\) | -0.426
-1 | \(3(2)^{-1} = \dfrac{3}{2}\) | 0.176
0 | \(3(2)^{0} = 3\) | 0.477
2 | \(3(2)^{2} = 12\) | 1.079
5 | \(3(2)^{5} = 96\) | 1.982

Plotting these values on a semi-log graph,

Notice that on this semi-log scale, values from the exponential function appear linear. We can show this behavior is expected by utilizing logarithmic properties. For the function \(f(x)=ab^{x}\), finding \(\log(f(x))\) gives

\[\log\left(f(x)\right)=\log\left(ab^{x}\right)\] Utilizing the sum property of logs,
\[\log\left(f(x)\right)=\log\left(a\right)+\log\left(b^{x}\right)\] Now utilizing the exponent property,
\[\log\left(f(x)\right)=\log\left(a\right)+x\log\left(b\right)\]

This relationship is linear, with \(\log(a)\) as the vertical intercept, and \(\log(b)\) as the slope. This relationship can also be utilized in reverse.
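This linearity is easy to verify numerically. Below is a minimal sketch in base R using the function from the previous example; the recovered intercept and slope are \(\log(3)\) and \(\log(2)\), exactly as the derivation predicts.

```r
# log10 of an exponential function is linear in x: f(x) = 3 * 2^x.
x <- c(-3, -1, 0, 2, 5)
f <- 3 * 2^x

fit <- lm(log10(f) ~ x)
coef(fit)
# (Intercept) = 0.477 = log10(3);  x = 0.301 = log10(2)
```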

Example \(\PageIndex{3}\)

An exponential graph is plotted on semi-log axes. Find a formula for the exponential function (g(x)) that generated this graph.

Solution

The graph is linear, with vertical intercept at (0, 1). Looking at the change between the points (0, 1) and (4, 4), we can determine the slope of the line is \(\dfrac{3}{4}\). Since the output is \(\log(g(x))\), this leads to the equation \(\log\left(g(x)\right)=1+\dfrac{3}{4} x\).

We can solve this formula for (g(x)) by rewriting in exponential form and simplifying:

\[\log\left(g(x)\right)=1+\dfrac{3}{4} x\] Rewriting as an exponential,
\[g(x)=10^{1+\frac{3}{4} x}\] Breaking this apart using exponent rules,
\[g(x)=10^{1} \cdot 10^{\frac{3}{4} x}\] Using exponent rules to group the second factor,
\[g(x)=10^{1} \cdot \left(10^{\frac{3}{4}}\right)^{x}\] Evaluating the powers of 10,
\[g(x)=10\left(5.623\right)^{x}\]

Exercise \(\PageIndex{1}\)

An exponential graph is plotted on a semi-log graph below. Find a formula for the exponential function (g(x)) that generated this graph.

Answer

\[g(x) = 10^{2 - 0.5x} = 10^2 (10^{-0.5})^{x} = 100 (0.3162)^x\]

Fitting Exponential Functions to Data

Some technology options provide dedicated functions for finding exponential functions that fit data, but many only provide functions for fitting linear functions to data. The semi-log scale provides us with a method to fit an exponential function to data by building upon the techniques we have for fitting linear functions to data.

To fit an exponential function to a set of data using linearization:

  1. Find the log of the data output values.
  2. Find the linear equation that fits the (input, log(output)) pairs. This equation will be of the form \(\log(f(x)) = b + mx\).
  3. Solve this equation for the exponential function \(f(x)\). A code sketch of this procedure follows the list.
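As an illustration, here is a minimal sketch of the three steps in R, using the hard-drive cost values that appear in the comparison table of Example \(\PageIndex{4}\) below; the variable names are ours.

```r
# Linearization procedure: hard-drive cost data, t = years after 1980.
t    <- c(0, 4, 8, 12, 16, 20, 24)
cost <- c(192.31, 87.86, 15.98, 4, 0.173, 0.006849, 0.001149)

logC <- log10(cost)       # Step 1: log of the output values

fit <- lm(logC ~ t)       # Step 2: linear fit to (input, log(output)) pairs
coef(fit)                 #   intercept approx. 2.794, slope approx. -0.231

a <- 10^coef(fit)[[1]]    # Step 3: solve for the exponential function,
b <- 10^coef(fit)[[2]]    #   C(t) = a * b^t, approx. 622 * (0.5877)^t
c(a = a, b = b)
```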

Example \(\PageIndex{4}\)

The table below shows the cost in dollars per megabyte of storage space on computer hard drives from 1980 to 2004, and the data is shown on a standard graph to the right, with the input changed to years after 1980.

This data appears to be decreasing exponentially. To find a function that models this decay, we would start by finding the log of the costs.

Solution

As hoped, the graph of the log of costs appears fairly linear, suggesting that an exponential function will fit the original data reasonably well. Using technology, we can find a linear equation to fit the log(Cost) values. Using \(t\) as years after 1980, linear regression gives the equation:

\[\log (C(t))=2.794-0.231t\]

Solving for \(C(t)\),

\[C(t)=10^{2.794-0.231t}\]
\[C(t)=10^{2.794} \cdot 10^{-0.231t}\]
\[C(t)=10^{2.794} \cdot \left(10^{-0.231}\right)^{t}\]
\[C(t)=622\left(0.5877\right)^{t}\]

This equation suggests that the cost per megabyte for storage on computer hard drives is decreasing by about 41% each year.

Using this function, we could predict the cost of storage in the future. Predicting the cost in the year 2020 (\(t = 40\)):

\(C(40) = 622\left(0.5877\right)^{40} \approx 0.000000364\) dollars per megabyte, a really small number. That is equivalent to about $0.36 per terabyte of hard drive storage.

Comparing the values predicted by this model to the actual data, we see the model matches the original data in order of magnitude, but the specific values appear quite different. This is, unfortunately, the best exponential model that can fit the data. It is possible that a non-exponential model would fit the data better, or there could just be wide enough variability in the data that no relatively simple model would fit the data any better.

Year | Actual Cost per MB | Cost predicted by model
1980 | 192.31 | 622.3
1984 | 87.86 | 74.3
1988 | 15.98 | 8.9
1992 | 4 | 1.1
1996 | 0.173 | 0.13
2000 | 0.006849 | 0.015
2004 | 0.001149 | 0.0018

Exercise \(\PageIndex{2}\)

The table below shows the value \(V\), in billions of dollars, of US imports from China \(t\) years after 2000.

year | 2000 | 2001 | 2002 | 2003 | 2004 | 2005
\(t\) | 0 | 1 | 2 | 3 | 4 | 5
\(V\) | 100 | 102.3 | 125.2 | 152.4 | 196 |

This data appears to be growing exponentially. Linearize this data and build a model to predict how many billions of dollars of imports were expected in 2011.

Answer

\(V(t) = 90.545 (1.2078)^t\). Predicting in 2011, \(V(11) = 722.45\) billion dollars.

Important Topics of this Section

  • Semi-log graph
  • Log-log graph
  • Linearizing exponential functions
  • Fitting an exponential equation to data


As a first step in moving beyond mean models, random walk models, and linear trend models, nonseasonal patterns and trends can be extrapolated using a moving-average or smoothing model. The basic assumption behind averaging and smoothing models is that the time series is locally stationary with a slowly varying mean. Hence, we take a moving (local) average to estimate the current value of the mean and then use that as the forecast for the near future. This can be considered as a compromise between the mean model and the random-walk-without-drift model. The same strategy can be used to estimate and extrapolate a local trend.

A moving average is often called a "smoothed" version of the original series, because short-term averaging has the effect of smoothing out the bumps in the original series. By adjusting the degree of smoothing (the width of the moving average), we can hope to strike some kind of optimal balance between the performance of the mean and random walk models. The simplest kind of averaging model is the simple (equally weighted) moving average.

Simple (equally-weighted) Moving Average:

The forecast for the value of Y at time t+1 that is made at time t equals the simple average of the most recent m observations:

\[\hat{Y}_{t+1} = \frac{Y_t + Y_{t-1} + \cdots + Y_{t-m+1}}{m}\]

(Here and elsewhere I will use the symbol "Y-hat" to stand for a forecast of the time series Y made at the earliest possible prior date by a given model.) This average is centered at period t-(m+1)/2, which implies that the estimate of the local mean will tend to lag behind the true value of the local mean by about (m+1)/2 periods. Thus, we say the average age of the data in the simple moving average is (m+1)/2 relative to the period for which the forecast is computed: this is the amount of time by which forecasts will tend to lag behind turning points in the data. For example, if you are averaging the last 5 values, the forecasts will be about 3 periods late in responding to turning points.

Note that if m=1, the simple moving average (SMA) model is equivalent to the random walk model (without growth). If m is very large (comparable to the length of the estimation period), the SMA model is equivalent to the mean model. As with any parameter of a forecasting model, it is customary to adjust the value of m in order to obtain the best "fit" to the data, i.e., the smallest forecast errors on average.
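As a concrete illustration, here is a minimal sketch of the SMA forecast in R; the simulated series is a hypothetical stand-in for the kind of data discussed below.

```r
# One-step-ahead simple moving average forecasts.
sma_forecast <- function(y, m) {
  n <- length(y)
  fcast <- rep(NA_real_, n)
  for (t in m:(n - 1)) {
    fcast[t + 1] <- mean(y[(t - m + 1):t])   # average of the last m observations
  }
  fcast
}

set.seed(1)
y <- cumsum(rnorm(100, sd = 0.2)) + rnorm(100)  # slowly varying mean plus noise
f5 <- sma_forecast(y, m = 5)
sqrt(mean((y - f5)^2, na.rm = TRUE))            # RMSE of the 5-term SMA
```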

Here is an example of a series which appears to exhibit random fluctuations around a slowly-varying mean. First, let's try to fit it with a random walk model, which is equivalent to a simple moving average of 1 term:

The random walk model responds very quickly to changes in the series, but in so doing it picks up much of the "noise" in the data (the random fluctuations) as well as the "signal" (the local mean). If we instead try a simple moving average of 5 terms, we get a smoother-looking set of forecasts:

The 5-term simple moving average yields significantly smaller errors than the random walk model in this case. The average age of the data in this forecast is 3 (=(5+1)/2), so that it tends to lag behind turning points by about three periods. (For example, a downturn seems to have occurred at period 21, but the forecasts do not turn around until several periods later.)

Notice that the long-term forecasts from the SMA model are a horizontal straight line, just as in the random walk model. Thus, the SMA model assumes that there is no trend in the data. However, whereas the forecasts from the random walk model are simply equal to the last observed value, the forecasts from the SMA model are equal to a weighted average of recent values.

The confidence limits computed by Statgraphics for the long-term forecasts of the simple moving average do not get wider as the forecasting horizon increases. This is obviously not correct! Unfortunately, there is no underlying statistical theory that tells us how the confidence intervals ought to widen for this model. However, it is not too hard to calculate empirical estimates of the confidence limits for the longer-horizon forecasts. For example, you could set up a spreadsheet in which the SMA model would be used to forecast 2 steps ahead, 3 steps ahead, etc., within the historical data sample. You could then compute the sample standard deviations of the errors at each forecast horizon, and then construct confidence intervals for longer-term forecasts by adding and subtracting multiples of the appropriate standard deviation.
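Here is a sketch of that empirical approach in R rather than a spreadsheet, reusing the series y and the sma_forecast() helper from the sketch above:

```r
# Standard deviation of k-step-ahead SMA errors, by forecast horizon.
m <- 5
sd_by_horizon <- sapply(1:6, function(k) {
  n <- length(y)
  errors <- numeric(0)
  for (t in m:(n - k)) {
    fcast <- mean(y[(t - m + 1):t])        # SMA forecast made at time t
    errors <- c(errors, y[t + k] - fcast)  # k-step-ahead error
  }
  sd(errors)
})
# Approximate 95% limits at horizon k: forecast +/- 1.96 * sd_by_horizon[k]
sd_by_horizon
```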

If we try a 9-term simple moving average, we get even smoother forecasts and more of a lagging effect:

The average age is now 5 periods (=(9+1)/2). If we take a 19-term moving average, the average age increases to 10:

Notice that, indeed, the forecasts are now lagging behind turning points by about 10 periods.

Which amount of smoothing is best for this series? Here is a table that compares their error statistics, also including a 3-term average:

Model C, the 5-term moving average, yields the lowest value of RMSE by a small margin over the 3-term and 9-term averages, and their other stats are nearly identical. So, among models with very similar error statistics, we can choose whether we would prefer a little more responsiveness or a little more smoothness in the forecasts.

Brown's Simple Exponential Smoothing (exponentially weighted moving average)

The simple moving average model described above has the undesirable property that it treats the last m observations equally and completely ignores all preceding observations. Intuitively, past data should be discounted in a more gradual fashion--for example, the most recent observation should get a little more weight than the 2nd most recent, and the 2nd most recent should get a little more weight than the 3rd most recent, and so on. The simple exponential smoothing (SES) model accomplishes this.

Let α denote a "smoothing constant" (a number between 0 and 1). One way to write the model is to define a series L that represents the current level (i.e., local mean value) of the series as estimated from data up to the present. The value of L at time t is computed recursively from its own previous value like this:

\[L_t = \alpha Y_t + (1-\alpha) L_{t-1}\]

Thus, the current smoothed value is an interpolation between the previous smoothed value and the current observation, where α controls the closeness of the interpolated value to the most recent observation. The forecast for the next period is simply the current smoothed value:

\[\hat{Y}_{t+1} = L_t\]

Equivalently, we can express the next forecast directly in terms of previous forecasts and previous observations, in any of the following equivalent versions. In the first version, the forecast is an interpolation between the previous forecast and the previous observation:

\[\hat{Y}_{t+1} = \alpha Y_t + (1-\alpha)\hat{Y}_t\]

In the second version, the next forecast is obtained by adjusting the previous forecast in the direction of the previous error by a fractional amount α:

\[\hat{Y}_{t+1} = \hat{Y}_t + \alpha e_t\]

where \(e_t = Y_t - \hat{Y}_t\) is the error made at time t. In the third version, the forecast is an exponentially weighted (i.e., discounted) moving average with discount factor 1-α:

\[\hat{Y}_{t+1} = \alpha\left(Y_t + (1-\alpha) Y_{t-1} + (1-\alpha)^2 Y_{t-2} + \cdots\right)\]

The interpolation version of the forecasting formula is the simplest to use if you are implementing the model on a spreadsheet: it fits in a single cell and contains cell references pointing to the previous forecast, the previous observation, and the cell where the value of α is stored.
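In code rather than a spreadsheet, the same interpolation recursion might look like this minimal R sketch (the initialization of the first forecast is an assumption; implementations differ):

```r
# Simple exponential smoothing, interpolation version.
ses_forecast <- function(y, alpha) {
  n <- length(y)
  fcast <- rep(NA_real_, n)
  fcast[2] <- y[1]                  # initialize: first forecast = first observation
  for (t in 2:(n - 1)) {
    # next forecast interpolates previous observation and previous forecast
    fcast[t + 1] <- alpha * y[t] + (1 - alpha) * fcast[t]
  }
  fcast
}
```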

Note that if α = 1, the SES model is equivalent to a random walk model (without growth). If α = 0, the SES model is equivalent to the mean model, assuming that the first smoothed value is set equal to the mean.

The average age of the data in the simple-exponential-smoothing forecast is 1/α relative to the period for which the forecast is computed. (This is not supposed to be obvious, but it can easily be shown by evaluating an infinite series.) Hence, the simple exponential smoothing forecast tends to lag behind turning points by about 1/α periods. For example, when α = 0.5 the lag is 2 periods; when α = 0.2 the lag is 5 periods; when α = 0.1 the lag is 10 periods; and so on.

For a given average age (i.e., amount of lag), the simple exponential smoothing (SES) forecast is somewhat superior to the simple moving average (SMA) forecast because it places relatively more weight on the most recent observation--i.e., it is slightly more "responsive" to changes occurring in the recent past. For example, an SMA model with 9 terms and an SES model with α = 0.2 both have an average age of 5 for the data in their forecasts, but the SES model puts more weight on the last 3 values than does the SMA model and at the same time it doesn't entirely "forget" about values more than 9 periods old, as shown in this chart:

Another important advantage of the SES model over the SMA model is that the SES model uses a smoothing parameter which is continuously variable, so it can easily be optimized by using a "solver" algorithm to minimize the mean squared error. The optimal value of α in the SES model for this series turns out to be 0.2961, as shown here:

The average age of the data in this forecast is 1/0.2961 = 3.4 periods, which is similar to that of a 6-term simple moving average.
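A sketch of that optimization in R, reusing the ses_forecast() helper and the simulated series y from the earlier sketches; optimize() plays the role of the "solver":

```r
# Choose alpha to minimize the in-sample mean squared one-step-ahead error.
mse_ses <- function(alpha, y) {
  f <- ses_forecast(y, alpha)
  mean((y - f)^2, na.rm = TRUE)
}

best <- optimize(mse_ses, interval = c(0.01, 0.99), y = y)
best$minimum       # the MSE-minimizing smoothing constant
1 / best$minimum   # the corresponding average age of the data
```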

The long-term forecasts from the SES model are a horizontal straight line, as in the SMA model and the random walk model without growth. However, note that the confidence intervals computed by Statgraphics now diverge in a reasonable-looking fashion, and that they are substantially narrower than the confidence intervals for the random walk model. The SES model assumes that the series is somewhat "more predictable" than does the random walk model.

An SES model is actually a special case of an ARIMA model, so the statistical theory of ARIMA models provides a sound basis for calculating confidence intervals for the SES model. In particular, an SES model is an ARIMA model with one nonseasonal difference, an MA(1) term, and no constant term, otherwise known as an "ARIMA(0,1,1) model without constant". The MA(1) coefficient in the ARIMA model corresponds to the quantity 1- α in the SES model. For example, if you fit an ARIMA(0,1,1) model without constant to the series analyzed here, the estimated MA(1) coefficient turns out to be 0.7029, which is almost exactly one minus 0.2961.

It is possible to add the assumption of a non-zero constant linear trend to an SES model. To do this, just specify an ARIMA model with one nonseasonal difference and an MA(1) term with a constant, i.e., an ARIMA(0,1,1) model with constant. The long-term forecasts will then have a trend which is equal to the average trend observed over the entire estimation period. You cannot do this in conjunction with seasonal adjustment, because the seasonal adjustment options are disabled when the model type is set to ARIMA. However, you can add a constant long-term exponential trend to a simple exponential smoothing model (with or without seasonal adjustment) by using the inflation adjustment option in the Forecasting procedure. The appropriate "inflation" (percentage growth) rate per period can be estimated as the slope coefficient in a linear trend model fitted to the data in conjunction with a natural logarithm transformation, or it can be based on other, independent information concerning long-term growth prospects.

Brown's Linear (i.e., double) Exponential Smoothing


The SMA models and SES models assume that there is no trend of any kind in the data (which is usually OK or at least not-too-bad for 1-step-ahead forecasts when the data is relatively noisy), and they can be modified to incorporate a constant linear trend as shown above. What about short-term trends? If a series displays a varying rate of growth or a cyclical pattern that stands out clearly against the noise, and if there is a need to forecast more than 1 period ahead, then estimation of a local trend might also be an issue. The simple exponential smoothing model can be generalized to obtain a linear exponential smoothing (LES) model that computes local estimates of both level and trend.

The simplest time-varying trend model is Brown's linear exponential smoothing model, which uses two different smoothed series that are centered at different points in time. The forecasting formula is based on an extrapolation of a line through the two centers. (A more sophisticated version of this model, Holt’s, is discussed below.)

The algebraic form of Brown's linear exponential smoothing model, like that of the simple exponential smoothing model, can be expressed in a number of different but equivalent forms. The "standard" form of this model is usually expressed as follows: Let S' denote the singly-smoothed series obtained by applying simple exponential smoothing to series Y. That is, the value of S' at period t is given by:

\[S'_t = \alpha Y_t + (1-\alpha) S'_{t-1}\]

(Recall that, under simple exponential smoothing, this would be the forecast for Y at period t+1.) Then let S'' denote the doubly-smoothed series obtained by applying simple exponential smoothing (using the same α) to series S':

\[S''_t = \alpha S'_t + (1-\alpha) S''_{t-1}\]
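To make the two smoothed series concrete, here is a minimal R sketch. The level and trend formulas in the last lines are the standard completions of Brown's method, supplied here for illustration because the text above breaks off before stating the forecasting formula.

```r
# Brown's linear (double) exponential smoothing.
brown_les <- function(y, alpha) {
  n <- length(y)
  s1 <- s2 <- numeric(n)
  s1[1] <- s2[1] <- y[1]
  for (t in 2:n) {
    s1[t] <- alpha * y[t]  + (1 - alpha) * s1[t - 1]  # singly smoothed S'
    s2[t] <- alpha * s1[t] + (1 - alpha) * s2[t - 1]  # doubly smoothed S''
  }
  level <- 2 * s1 - s2                       # standard Brown's level estimate
  trend <- alpha / (1 - alpha) * (s1 - s2)   # standard Brown's trend estimate
  level + trend                              # one-step-ahead forecast
}
```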



As you saw in the world record times data example, linear (line) fits are not always the best way to fit data sets. This lesson will explore how to find exponential fits to various data sets.

We will save the world record times analysis as an exercise for later. Instead, the first data set you will analyze is the population of the U.S.

Population growth is a major concern for many countries around the world. These countries fear that one day their population will become so large that they will not be able to provide the basic necessities, such as food and shelter, to their people. Some countries, such as China, have implemented policies that limit the number of children a family can have. Other countries in the world, however, have no such policies and their populations continue to grow rapidly.

These are a few of the reasons that it is important to study population growth. In this lesson, we will explore a method of modeling a population by an exponential equation.

This is a data set of the population of the United States since 1805, listed every 10 years. If we could find a model to represent this data then we could predict the population of the U.S. at any time we like. Here is a plot of the U.S. population from 1815-1975. We will use every tenth year starting from 1815 (so 1815, 1825, 1835, etc.) as our data. Also, we will designate 1815 as year 0, so 1825 will be year 1, 1835 will be year 2, etc. We do this to make the numbers a little easier to work with in our analysis. So, our data will be years from 0 to 16 and the corresponding population of the United States.

On the plot, the years are plotted on the x-axis and the population on the y-axis.

In this lesson we will use an exponential function to fit two related data sets. To do this, we will use statistics to find the exponential curve that best fits the data.

Each of the following sections should be done in the order presented. They will each have example problems that should be worked by the student. It is important that each section is understood, since they build upon each other. This will all lead to a data analysis problem that the student will perform on their own. If you need a refresher on exponentials and logarithms, it is highly recommended to view the REVIEW section and do the exercises.

Note: The following sections can be done independently of the ones above. They are meant as activities to demonstrate understanding of the concepts after completing the other sections. However, there are more detailed instructions for those who do not complete the other sections.


Exponential Fit Details

This VI uses the iterative general Least Square method and the Levenberg-Marquardt method to fit data to an exponential curve of the general form described by the following equation:

f(x) = a∙e^(b∙x) + c

where x is the input sequence X, a is amplitude, b is damping, and c is offset. This VI finds the values of a, b, and c that best fit the observations (X, Y).

The following equation specifically describes the exponential curve resulting from the exponential fit algorithm:

y[i] = a∙e^(b∙x[i]) + c

If the noise of Y is Gaussian distributed, use the Least Square method. The following illustration shows the exponential fit result using this method.

When you use the Least Square method, this VI finds the amplitude, damping, and offset of the exponential model by minimizing the residue according to the following equation:

residue = (1/N) Σ wi (fi − yi)²   (sum over i = 0, ..., N−1)

where N is the length of Y, wi is the i-th element of Weight, fi is the i-th element of Best Exponential Fit, and yi is the i-th element of Y.

The Least Absolute Residual and Bisquare methods are robust fitting methods. Use these methods if outliers in the observations exist. The following illustration compares the fit results of the Least Square, Least Absolute Residual, and Bisquare fitting methods. In most cases, the Bisquare method is less sensitive to outliers than the Least Absolute Residual method.

When you use the Least Absolute Residual method, this VI finds the amplitude, damping, and offset of the exponential model by minimizing the residue according to the following equation:

residue = (1/N) Σ wi |fi − yi|   (sum over i = 0, ..., N−1)

When you use the Bisquare method, this VI obtains the amplitude, damping, and offset using an iterative process, as shown in the following illustration, and calculates the residue using the same formula as the Least Square method.
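For readers without LabVIEW, here is a rough R sketch of fitting the same three-parameter model by ordinary least squares; the data, noise level, and starting values are all hypothetical.

```r
# Fit y = a*exp(b*x) + c0 by least squares on simulated data.
set.seed(42)
x <- seq(0, 10, by = 0.25)
y <- 5 * exp(-0.4 * x) + 2 + rnorm(length(x), sd = 0.1)

fit <- nls(y ~ a * exp(b * x) + c0,
           start = list(a = 4, b = -0.5, c0 = 1))  # rough initial guesses
coef(fit)  # estimates of amplitude a, damping b, and offset c0
```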


Note: This answer has been completely re-written from the original, which was flawed in several ways (thanks to the commenters for highlighting these). I hope this new answer is correct.

You need a model to fit to the data. Without knowing the full details of your model, let's say that this is an exponential growth model, which one could write as: y = a * e^(r*t)

Where y is your measured variable, t is the time at which it was measured, a is the value of y when t = 0 and r is the growth constant. We want to estimate a and r.

This is a non-linear problem because we want to estimate the exponent, r. However, in this case we can use some algebra and transform it into a linear equation by taking the log on both sides and solving (remember logarithmic rules), resulting in: log(y) = log(a) + r * t

We can visualise this with an example, by generating a curve from our model, assuming some values for a and r:

So, for this case, we could explore two possibilities:

  • Fit our non-linear model to the original data (for example using the nls() function)
  • Fit our "linearised" model to the log-transformed data (for example using the lm() function)

Which option to choose (and there's more options), depends on what we think (or assume) is the data-generating process behind our data.

Let's illustrate with some simulations that include added noise (sampled from a normal distribution), to mimic real data. Please look at this StackExchange post for the reasoning behind this simulation (pointed out by Alejo Bernardin's comment).

For the additive model, we could use nls(), because the error is constant across t. When using nls() we need to specify some starting values for the optimization algorithm (try to "guesstimate" what these are, because nls() often struggles to converge on a solution).

Using the coef() function we can get the estimates for the two parameters. This gives us OK estimates, close to what we simulated (a = 10 and r = 0.1).
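A minimal sketch of that workflow (the true values a = 10 and r = 0.1 come from the answer; the variable name y_add, the noise level, and the starting values are assumptions):

```r
# Additive-error case: y = a*exp(r*t) + noise, fit with nls().
set.seed(1)
t <- 1:30
y_add <- 10 * exp(0.1 * t) + rnorm(30, sd = 5)   # constant (additive) error

fit_nls <- nls(y_add ~ a * exp(r * t),
               start = list(a = 8, r = 0.12))    # "guesstimated" starting values
coef(fit_nls)   # should be close to a = 10, r = 0.1
```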

You could see that the error variance is reasonably constant across the range of the data, by plotting the residuals of the model:

For the multiplicative error case (our y_mult simulated values), we should use lm() on log-transformed data, because the error is constant on that scale instead.

To interpret this output, remember again that our linearised model is log(y) = log(a) + r*t, which is equivalent to a linear model of the form Y = β0 + β1 * X, where β0 is our intercept and β1 our slope.

Therefore, in this output (Intercept) is equivalent to log(a) of our model and t is the coefficient for the time variable, so equivalent to our r. To meaningfully interpret the (Intercept) we can take its exponential (exp(2.39448488)), giving us 10.96, which is quite close to our simulated value.
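The corresponding sketch for the multiplicative-error case (the name y_mult follows the answer; the noise level is an assumption):

```r
# Multiplicative-error case: error is constant on the log scale, so use lm().
set.seed(1)
t <- 1:30
y_mult <- 10 * exp(0.1 * t) * exp(rnorm(30, sd = 0.1))

fit_lm <- lm(log(y_mult) ~ t)
coef(fit_lm)          # (Intercept) estimates log(a); the t coefficient estimates r
exp(coef(fit_lm)[1])  # back-transform the intercept to recover a
```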

It's worth noting what would happen if we'd fit data where the error is multiplicative using the nls function instead:

Now we over-estimate a and under-estimate r (Mario Reutter highlighted this in his comment). We can visualise the consequence of using the wrong approach to fit our model:

We can see how the lm() fit to log-transformed data was substantially better than the nls() fit on the original data.

You can again plot the residuals of this model, to see that the variance is not constant across the range of the data (we can also see this in the graphs above, where the spread of the data increases for higher values of t):



In equation (1), there is only one exponential function, not two: $$M_z(t)=M_z(0) e^{-t/T_1} + M_0 (1-e^{-t/T_1}) \tag{1}$$ which can be rewritten as $$M_z(t)=(M_z(0)-M_0) e^{-t/T_1} + M_0.$$

Supposing that $y=ae^{bx}+c$ is a convenient model, the only difficulty is to find a good approximation for $b$.

I don't have Matlab at hand, so using another tool, I found: $b \simeq -0.075$

The change of variable $X=e^{-0.075 x}$ leads to the linear equation $y=aX+c$. Then, a usual linear regression will give you the approximations of $a$ and $c$.
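A small R sketch of this change-of-variable trick (the data here are hypothetical; the answer used another tool):

```r
# Once b is fixed (here b = -0.075), y = a*X + c is linear in a and c.
set.seed(7)
x <- seq(0, 40, by = 5)
y <- 3 * exp(-0.075 * x) + 1 + rnorm(length(x), sd = 0.05)  # hypothetical data

X <- exp(-0.075 * x)   # change of variable
fit <- lm(y ~ X)
coef(fit)              # (Intercept) approximates c; the X coefficient approximates a
```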

In order to answer the questions raised by Merin in the comments section, the procedure of regression (with integral equation) published in https://fr.scribd.com/doc/14674814/Regressions-et-equations-integrales is shown below.

Be careful, the notations of the parameters $a,b,c$ are not the same as above.

This method isn't very accurate if the number of points is too low, because the numerical integration for $S_k$ cannot be accurate with only a few points (this is the case with 9 points). But in any case, the result could be used as an excellent initial value for a further non-linear regression, if necessary.


Exponential Regression using a Linear Model

Sometimes linear regression can be used with relationships that are not inherently linear, but can be made to be linear after a transformation. In particular, we consider the following exponential model:

y = αe^βx

Taking the natural log (see Exponentials and Logs) of both sides of the equation, we have the following equivalent equation:

ln y = ln α + βx

This equation has the form of a linear regression model (where I have added an error term ε):

ln y = ln α + βx + ε

Observation: Since αe^(β(x+1)) = αe^(βx) · e^β, we note that an increase in x of 1 unit results in y being multiplied by e^β.

Observation: A model of the form ln y = βx + δ is referred to as a log-level regression model. Clearly, any such model can be expressed as an exponential regression model of the form y = αe^(βx) by setting α = e^δ.

Example 1: Determine whether the data on the left side of Figure 1 fits with an exponential model.

Figure 1 – Data for Example 1 and log transform

The table on the right side of Figure 1 shows ln y (the natural log of y) instead of y. We now use the Regression data analysis tool to model the relationship between ln y and x.

Figure 2 – Regression data analysis for x vs. ln y from Example 1

The table in Figure 2 shows that the model is a good fit and the relationship between ln y and x is given approximately by

ln y = 0.0159x + 2.643

Applying e to both sides of the equation yields

y = e^2.643 ∙ (e^0.0159)^x = 14.05∙(1.016)^x

We can also see the relationship between x and y by creating a scatter chart for the original data and choosing Layout > Analysis|Trendline in Excel and then selecting the Exponential Trendline option. We can also create a chart showing the relationship between x and ln y and use Linear Trendline to show the linear regression line (see Figure 3).

Figure 3 – Trend lines for Example 1

As usual we can use the formula y = 14.05∙(1.016)^x described above for prediction. Thus if we want the y value corresponding to x = 26, using the above model we get ŷ = 14.05∙(1.016)^26 = 21.35.

We can get the same result using Excel’s GROWTH function, as described below.

Excel Functions: Excel supplies two functions for exponential regression, namely GROWTH and LOGEST.

LOGEST is the exponential counterpart to the linear regression function LINEST described in Testing the Slope of the Regression Line. Once again you need to highlight a 5 × 2 area and enter the array function =LOGEST(R1, R2, TRUE, TRUE), where R1 = the array of observed values for y (not ln y) and R2 is the array of observed values for x, and then press Ctrl-Shift-Enter. LOGEST doesn't supply any labels and so you will need to enter these manually.

Essentially LOGEST is simply LINEST using the mapping described above for transforming an exponential model into a linear model. For Example 1 the output for LOGEST(B6:B16, A6:A16, TRUE, TRUE) is as in Figure 4.

Figure 4 – LOGEST output for data in Example 1

GROWTH is the exponential counterpart to the linear regression function TREND described in Method of Least Squares. For R1 = the array containing the y values of the observed data and R2 = the array containing the x values of the observed data, GROWTH(R1, R2, x) = EXP(a) * EXP(b)^x where EXP(a) and EXP(b) are as defined from the LOGEST output described above (or alternatively from the Regression data analysis). E.g., based on the data from Example 1, we have:

GROWTH(B6:B16, A6:A16, 26) = 21.35

which is the same result we obtained earlier using the Regression data analysis tool.

GROWTH can also be used to predict more than one value. In this case, GROWTH(R1, R2, R3) is an array function where R1 and R2 are as described above and R3 is an array of x values. The function returns an array of predicted y values for the x values in R3 based on the model determined by the values in R1 and R2.

Observation: Note that GROWTH(R1, R2, R3) = EXP(TREND(LN(R1), R2, R3))


C Program for Linear/Exponential Curve Fitting

In some engineering works or scientific experiments, a certain number of data points are available in a fixed interval, but they may not be sufficient. A data point lying within the interval may be required. In order to find such data, a function or curve needs to be fitted to the available data to obtain the required values in an easy and convenient way.

Such a technique of approximating given data by a curve, which may be linear, of higher degree, or exponential, is known as curve fitting. It is based on the principle of least squares.

A number of manipulations of the data are required in curve fitting problems, which take a long time to solve and are quite laborious. In order to simplify such calculations using a programming approach, here I have presented source code for linear and exponential curve fitting in C, with sample output.

The working procedure of C program for curve fitting (in general) as linear equation is as follows:

  • When the program is executed, it asks for value of number of data, n.
  • Then, the user has to input the values of x and corresponding y. In the program, x and y are defined as array. Therefore, x and y are input using for loop.
  • After that, the program calculates the sums of x, y, xy, x², etc.
  • Since the data are required to be approximated to a linear equation, i.e., y = ax + b, the values of a and b are to be calculated, which is performed by using the following formulas:

a = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²)
b = (Σy − aΣx) / n

The working principle of the curve fitting C program for an exponential equation is similar to the linear case, but the program first converts the exponential equation into a linear one by taking the log of both sides, as follows:

y = ae^(bx)  becomes  ln y = ln a + bx

which has the linear form Y = A + bx, with Y = ln y and A = ln a.
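As a sketch of the same least-squares bookkeeping in R rather than C (hypothetical data; the article's C source itself is not reproduced here):

```r
# Accumulate the sums, then apply the closed-form formulas for y = a*x + b.
x <- c(1, 2, 3, 4, 5)             # hypothetical data
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
n <- length(x)

sx <- sum(x); sy <- sum(y)
sxy <- sum(x * y); sxx <- sum(x^2)

a <- (n * sxy - sx * sy) / (n * sxx - sx^2)  # slope
b <- (sy - a * sx) / n                       # intercept
c(a = a, b = b)
# For the exponential fit, replace y with log(y) and back-transform.
```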



Linearization is not always an effective method, however. Sometimes the model equation is sufficiently complicated that no linearization trick exists. For example, the logistic model

\[P(t) = \frac{K}{1 + \left(\dfrac{K-P_0}{P_0}\right)e^{-rt}}\]

is highly non-linear in all three parameters \(P_0\), \(K\), and \(r\). There is no obvious way to use logarithms or algebraic manipulations to "linearize" the problem of doing a logistic fit.

In other cases, even though a linearization can be found, the fit found using the linearization is worse than one could obtain by fitting the model by even trial and error. To illustrate this point, consider again the power-function fit to the small data set that you did in Part 3. By fitting a straight line to the log-log plot of the data, you should have found the corresponding power function

which yielded the sum of squares of residuals S = 84.3 for the data.

One can easily find a much better fit. The power function

yields a sum of squares of residuals S = 7.20 for the same data. In Part 5, we will show that this power function does, in fact, yield the minimum value of S. In the figure below, the fits of these two power functions are compared. The fit from Part 3 is labeled "log-log fit" and the optimal fit is labeled "power fit".

Below, we show a comparison of the residual plots of these two models. The "log-log" model fits the data very poorly at the right end of the plot.

    Explain how it can be that the power-function fit we got by fitting the log-log data can be worse than some other power function fit. After all, we did do a least squares fit. The sum of squares S should be minimized, shouldn't it?

    Consider also the exponential function \(y = 0.921 e^{0.999t}\) that was simply chosen to pass through the first and last points of the small data set. Show residual plots of both exponential fits together in the same figure. What do you learn from the plot?


Some code:

This is a restriction of the SIR model, which models $R_0 = \frac{\beta}{\gamma}$ where $\frac{1}{\gamma}$ is the period for which somebody is sick (the time from Infected to Recovered), but that may not need to be the time that somebody is infectious. In addition, the compartment model is limited since the age of patients (how long one has been sick) is not taken into account, and each age should be considered as a separate compartment.

But in any case, if the numbers from Wikipedia are meaningful (they may be doubted), then only 2% of the active/infected recover daily, and thus the $\gamma$ parameter seems to be small (no matter what model you use).

You might be experiencing numerical issues due to the very large population size $N$, which will force the estimate of $\beta$ to be very close to zero. You could re-parameterise the model as

$$\begin{align} \frac{dS}{dt} &= -\beta \frac{S}{N} I \\[1.5ex] \frac{dI}{dt} &= \beta \frac{S}{N} I - \gamma I \\[1.5ex] \frac{dR}{dt} &= \gamma I \end{align}$$

This will make the estimate of $\beta$ larger, so hopefully you'll get something more sensible out of the optimisation.
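A minimal R sketch of simulating this re-parameterised model with the deSolve package; the population size, initial state, and parameter values below are hypothetical (the 2% daily recovery rate echoes the figure quoted above).

```r
# Simulate the re-parameterised SIR model, where beta multiplies S/N.
library(deSolve)

sir <- function(time, state, pars) {
  with(as.list(c(state, pars)), {
    dS <- -beta * (S / N) * I
    dI <-  beta * (S / N) * I - gamma * I
    dR <-  gamma * I
    list(c(dS, dI, dR))
  })
}

N    <- 1e6
init <- c(S = N - 10, I = 10, R = 0)
pars <- c(beta = 0.4, gamma = 0.02, N = N)   # gamma = 0.02: ~2% recover daily

out <- ode(y = init, times = 0:100, func = sir, parms = pars)
head(out)   # columns: time, S, I, R
```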

