# 2.1: Lines

## 1. Lines (definitions)

Everyone knows what a line is, but providing a rigorous definition proves to be a challenge.

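One workable description fixes a point \((a,b)\) and a slope \(m\): a point \((x,y)\) with \(x \neq a\) lies on the line when

\[\dfrac{y-b}{x-a}= m.\]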

## 2. The Slope-Intercept Form of the Equation of a Line

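Given a point \((x_1,y_1)\) and a slope \(m\), the equation of the line in point-slope form is

\[y-y_1=m(x-x_1).\]

Solving for \(y\) gives the slope-intercept form \(y = mx + b\), where \(b = y_1 - mx_1\) is the \(y\)-intercept.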
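As a small illustration, the conversion from a point and a slope to slope-intercept form can be sketched in Python. The function names and the sample point \((2,3)\) with slope \(4\) are made up for this example; exact fractions are used so that no rounding creeps in.

```python
from fractions import Fraction


def point_slope_to_slope_intercept(x1, y1, m):
    """Return (m, b) so that y = m*x + b passes through (x1, y1) with slope m."""
    m = Fraction(m)
    b = Fraction(y1) - m * Fraction(x1)  # b = y1 - m*x1
    return m, b


def evaluate_line(x1, y1, m, x):
    """Evaluate y = y1 + m*(x - x1), the point-slope form solved for y."""
    return Fraction(y1) + Fraction(m) * (Fraction(x) - Fraction(x1))


# Hypothetical example: the line through (2, 3) with slope 4.
m, b = point_slope_to_slope_intercept(2, 3, 4)
print(f"slope = {m}, intercept = {b}")
```

Both functions describe the same line, so evaluating either form at the same \(x\) should give the same \(y\).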

## 3. Piecewise Linear Functions

A function is piecewise linear if its graph is made up of parts of lines.

We graph such a function by sketching the appropriate part of each line on the same graph.
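To make the idea concrete, here is a minimal Python sketch of one piecewise linear function. The particular pieces and breakpoints are invented for illustration; they are chosen so that neighbouring pieces agree where they meet.

```python
def piecewise_linear(x):
    """A hypothetical piecewise linear function with breakpoints at x = 0 and x = 3.

    f(x) = -x + 1   for x < 0
    f(x) = 2x + 1   for 0 <= x < 3
    f(x) = 7        for x >= 3
    """
    if x < 0:
        return -x + 1
    if x < 3:
        return 2 * x + 1
    return 7


# Sample a few points; plotting these and connecting them within each
# piece reproduces the "parts of lines" picture described above.
for x in (-2, -1, 0, 1, 2, 3, 4):
    print(x, piecewise_linear(x))
```

Each branch is an ordinary line; the `if` conditions play the role of the intervals on which each line is sketched.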

## 4. Applications

\[y = -\dfrac{1}{5} x + 156.\]

Hint: We have the two points: \((0,32)\) and \((100,212)\).
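The two points in the hint are enough to pin down a line. The sketch below finds the slope and intercept of the line through any two points with different \(x\)-values; the function name is made up for this example. Reading \(x\) as degrees Celsius and \(y\) as degrees Fahrenheit is an assumption based on the familiar freezing and boiling points of water.

```python
from fractions import Fraction


def line_through(p, q):
    """Return (m, b) for the line y = m*x + b through integer points p and q."""
    (x1, y1), (x2, y2) = p, q
    if x1 == x2:
        raise ValueError("a vertical line has no slope")
    m = Fraction(y2 - y1, x2 - x1)  # rise over run
    b = y1 - m * x1                 # solve y1 = m*x1 + b for b
    return m, b


m, b = line_through((0, 32), (100, 212))
print(f"y = ({m}) x + {b}")
```

Once \(m\) and \(b\) are known, any other value follows by substitution, for example `m * 37 + b`.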

Suppose that your company earned $30,000 five years ago and $35,000 three years ago. Assuming a linear growth model, how much will it earn this year?
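One way to set up a problem like this is to measure time in years relative to today, so that "five years ago" is \(t=-5\) and "this year" is \(t=0\). The sketch below follows that convention, which is an assumption rather than part of the problem statement; the function name is also made up.

```python
def linear_model(t1, y1, t2, y2):
    """Return a function t -> y for the line through (t1, y1) and (t2, y2)."""
    slope = (y2 - y1) / (t2 - t1)  # change in earnings per year

    def predict(t):
        return y1 + slope * (t - t1)

    return predict


# Earnings of $30,000 at t = -5 and $35,000 at t = -3.
earnings = linear_model(-5, 30_000, -3, 35_000)
print(earnings(0))  # earnings predicted for this year under the linear model
```

The same helper works for any two data points, which makes it easy to check how sensitive the prediction is to the choice of time origin (it is not: only differences in \(t\) matter).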

My rental was bought for $204,000 three years ago. Depreciation is set so that the house depreciates linearly to zero in twenty years from the purchase of the house. If I plan to sell the house in twelve years for $250,000 and capital gains taxes are 28% of the difference between the purchase price and the depreciated value, what will my taxes be?

Wasabi restaurant must pay either a flat rate of $400 for rent or 5% of the revenue, whichever is larger. Come up with the equation of the function that relates rent as a function of revenue.

Larry Green (Lake Tahoe Community College)

## 2.1: Lines

Regression analysis involves the study of the form and direction of the relationship between two or more variables. The main purpose of regression analysis is to predict the value of a dependent or response variable based on values of the independent or explanatory variables. Simple linear regression analysis involves the study of the linear or straight-line relationship between two numerical variables: the dependent variable and one numerical explanatory variable.

Correlation analysis involves the study of the strength of the relationship between two variables. A supporting role of correlation analysis is to discover those explanatory variables that are strongly related to the response variable to improve the predictions made.

For example, suppose we want to predict the number of hours it would take to perform an audit on a client. One explanatory variable might be the dollar amount of client assets. Another explanatory variable might be the number of employees. After gathering and analyzing data we discover that the correlation between hours and assets is much higher than between hours and employees. In this case, we would be better off using assets as the explanatory variable.

This set of module notes introduces techniques for presenting and describing simple linear regressions and correlations. Module 2.2 Notes describe how we test linear regressions for statistical significance and practical utility, and how the linear regression model can be used for prediction.

A complete simple linear regression and correlation analysis proceeds in seven steps. This set of module notes will carry us through Steps 1 through 3. Module 2.2 Notes will cover Steps 4 through 7. In Module 3 we will expand this model to consider the relationship between the dependent variable and multiple independent variables, including nonlinear terms and categorical variables.

Step 1: Hypothesize the Regression Model Relating the Dependent and Independent Variables

The dependent or response variable, identified by the symbol Y, is the variable we wish to predict. The independent or explanatory variable, identified by the symbol X, is the predictor variable. In simple linear regression, we propose the following population straight-line model relating Y and X:

\[Y = B_0 + B_1 X + \text{error} \qquad \text{(Eq. 2.1.1)}\]

where \(B_0\) is the intercept and \(B_1\) = Slope = the amount of increase in Y (or decrease if \(B_1\) is negative) for a one-unit increase in X.

For a particular observation, for example the "ith" observation, this equation becomes:

\[Y_i = B_0 + B_1 X_i + \text{error}_i \qquad \text{(Eq. 2.1.2)}\]

This equation implies that each observation in a set of data has an actual Y value, an X value, a predicted Y value, and an error, which is the actual Y value minus the predicted Y value. In regression analysis, one of our objectives is to select those predictor variables that result in as little error as possible, recognizing there will always be some error in prediction.

This equation is often referred to as the probabilistic model relating Y to X. The deterministic model is just the straight-line or prediction part without the actual value of Y and its error:

\[\text{predicted } Y = B_0 + B_1 X \qquad \text{(Eq. 2.1.3)}\]

In step two, we will fit a straight-line model based on sample data to estimate the above simple linear regression equation. That's enough theory. Let's go to step two, look at some data, and create the scatter diagram.

Step 2: Gather Data and Describe the Form and Direction of the Relationship with a Scatter Diagram

The example to illustrate simple linear regression analysis is about an audit company - that is, a company that is in the business of performing financial audits.
This company maintains a very small internal workforce and thus relies on external auditors to perform client audits. The company would like a model to predict the number of external audit hours it would need to contract in order to do an audit. Such a model would be very helpful in budgeting and planning. Management believes that a good predictor variable would be client assets.

In order to build the model, a sample of data must be gathered. Worksheet 2.1.1 shows the result of the sample. The first column, Assets, contains values of the independent variable (this is the X variable) in thousands of dollars. The second column, ExtHours, contains values of the dependent variable (this is the Y variable) in hours. So, the first row of numbers represents an audit completed in the past for a client with assets of $3,200,000. The audit company had to contract for 700 external hours to perform the audit. Note that in regression analysis, every observation has two values, an X value and a Y value.

In the Assignment section of the Main Module 2 web page, you will see that the first item for Assignment 2 is entering the X and Y data in an Excel Spreadsheet. Hopefully, you can think of a good response variable from your work, service or home environment. Perhaps you would like to predict profit contribution, sales, or salary, or hours to complete a task. Once you determine what you would like to predict or understand, then pick a variable that you think explains or predicts your response variable. Perhaps labor cost is a good X (cost driver) to predict profit contribution (Y). Perhaps years of experience is a good X variable to predict salary (Y).

Once you select your X and Y variables, try to collect 50 observations. In regression and correlation analysis, an observation involves an X and a Y value. For example, sales in month 1 were 334 units. Here, 1 is the value for X and 334 is the value for Y for the first observation. As another example, an employee in the database earns $50,000 (the value of Y for this sample observation) and has worked for 22 years (the value of X for this sample observation). Fifty observations is more than the minimum required, so you can get by with fewer if you have to. The minimum required for a two-variable regression model is 20 observations (10 observations per variable).

The next task is to create the scatter diagram. In regression analysis, the scatter diagram is used to plot the independent variable on the X or horizontal axis, and the dependent variable on the Y or vertical axis. To produce a scatter diagram, highlight the X and Y data columns including the column titles. Then select the Chart Wizard on the Standard Toolbar, then XY Scatter, then respond to the dialog screen questions. It will take a couple of tries to get the hang of making scatter diagrams, but after some practice you should be able to replicate the scatter diagram shown in Worksheet 2.1.2.
In Assignment 2, the second item is for you to create a scatter diagram. Note that as I was going through the dialogue boxes, I used the opportunity to label the X and Y axes, as well as give the diagram a title.

This scatter diagram shows a positive form of relationship between X and Y, meaning that when X increases, Y increases. It appears that when X increases, Y increases at a constant rate, meaning that the form of the relationship is linear.

A comment on page presentation. If you click on File on the Standard Toolbar, then Print Preview, you can see where the scatter diagram will appear on the worksheet page. If you want to move it, just click on any part of the white area of the diagram and click and drag the chart. If you want to change the shape of the chart, click on the chart again and note the squares along the borders of the chart. If you click and drag on the middle squares you can make the chart wider, narrower, longer or shorter. Note finally that when you click on any chart, the word Data changes to Chart on the Standard Toolbar so you can switch between data functions and chart functions.

Let's summarize what we have learned thus far. Regression analysis includes the study of the form and direction of the relationship between dependent and independent variables. In this case, we have one dependent (Y) and one independent variable (X). The form of a relationship can be linear or curvilinear. The form in Worksheet 2.1.2 above happens to look like a linear relationship. Worksheet 2.1.3 illustrates a curvilinear relationship.

Note with the curvilinear relationship, as assets increased initially, external audit hours remained relatively constant up to clients with assets of approximately $5,000,000. Then it appears that external hours increase at a slightly increasing rate from $5,000,000 to $9,000,000. We will see in Module 3 that this is curvature: Y increases at an increasing rate as X increases.
Curvature also occurs when Y increases at a decreasing rate as X increases.

Before continuing with the example, let's summarize the direction component of the relationship. Our example in Worksheet 2.1.2 shows a positive direction. Worksheet 2.1.4 shows what a negative direction would look like.

In this worksheet, as assets increase, internal hours decrease: this describes a negative relationship between X and Y.

To describe the relationship between two variables, we look at the form (linear or curvilinear) and the direction (positive or negative) of the relationship. Linear form means that as X increases, Y increases or decreases at a constant rate. Positive direction means that Y increases when X increases and negative direction means that Y decreases when X increases.

The last component of the relationship between two variables is strength. We will talk about measuring strength in Step 3, as we need some numbers to do that.

Step 3: Determine the Simple Linear Regression Equation and Correlation Coefficient

Regression Coefficients
Our next step is to find values for \(b_0\) and \(b_1\) in the following simple linear regression equation:

\[y = b_0 + b_1 x \qquad \text{(Eq. 2.1.4)}\]

This equation, based on sample data, is used to estimate the hypothesized population Eq. 2.1.3. Note I have made all of the symbols lower case to distinguish the sample equation from the population equations shown as Eq. 2.1.1 - Eq. 2.1.3. Some texts put hats ( ^ ) on the symbols in Eq. 2.1.4 to distinguish the sample equation from the population model. Our task is to estimate numerical values for the intercept, \(b_0\), and the slope, \(b_1\). These are called the regression parameters in the simple linear regression equation (the equation is also known as the least squares regression equation or the trend equation or simply the regression).

If you were a careful artist, you could take a ruler and draw a straight line as close as possible to every point in Worksheet 2.1.2. Then, extend the left end of that line to the Y axis. The y value at the point where an extension of the line touches the Y axis is called the intercept, the value of y when x equals zero. Next, starting anywhere on the line, draw a horizontal line one unit long in the X direction. Now draw a vertical line from its end back to the regression line. The length of the vertical line divided by the length of the horizontal line represents the amount of change in Y for the unit change in X. This is called the slope of the line. Don't be alarmed - we will let the computer do the "line drawing" to estimate the slope and the intercept - I just wanted to go over the concept.

Actually, the computer uses mathematics to solve equations to determine the value of the slope and intercept. The technique is called the least squares method of regression. It essentially involves choosing the slope and intercept that minimize the sum of the squared errors (actual value of Y minus the predicted value y), \(\sum (Y - y)^2\).

To let Excel do the work, first make a copy of the scatter diagram to preserve the original. To copy the diagram, put the cursor anywhere in the white area of the scatter diagram chart. When you click the left mouse button, the chart becomes highlighted (small squares or handles appear around the border of the chart). Now select Edit on the Standard Toolbar and Copy from the pulldown menu. Now move the cursor, select a new cell of the worksheet, and select Edit on the Standard Toolbar and Paste from the pulldown menu. You should get another copy of the scatter diagram.
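For readers who want to see what the least squares method computes, here is a minimal Python sketch using the standard closed-form formulas for one predictor. The five data points are made up purely for illustration; they are not the audit data from Worksheet 2.1.1, and the function names are hypothetical.

```python
def least_squares(xs, ys):
    """Return (b0, b1) minimizing the sum of (Y - (b0 + b1*X))**2."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all X values are identical; the slope is undefined")
    b1 = sxy / sxx           # slope
    b0 = y_bar - b1 * x_bar  # intercept: the line passes through the means
    return b0, b1


def sum_squared_error(xs, ys, b0, b1):
    """Sum of squared errors, actual Y minus predicted y."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Made-up sample data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = least_squares(xs, ys)
print(b0, b1, sum_squared_error(xs, ys, b0, b1))
```

Nudging either coefficient away from the fitted values should never lower the sum of squared errors, which is one quick way to check the result.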

Now select (highlight) the copy of the scatter diagram by clicking anywhere on the white chart surface and select Chart on the top menu bar. Note that this menu bar has the word Data instead of the word Chart unless you have highlighted a chart, such as the scatter diagram. Next select Add Trendline from the pulldown menu and you will get a dialog box. The default Linear trend/regression is what we want. Before selecting OK, select the Options Tab. Then select Display Equation and Display R-Square. You should get Worksheet 2.1.5, as shown below.

The least squares regression equation, or simply, the linear regression equation, is shown as:

\[y = 440.05 + 0.1x \qquad \text{(Eq. 2.1.5)}\]

After we finish Steps 1-6, we will use this equation to make a prediction. To jump ahead, what if we want to predict the hours it will take to audit a company with $6,000,000 in assets? Looking at the Worksheet 2.1.5 regression line, if we go straight up from 6000 on the X axis, we touch the line at a y value a little over 1,000 hours. To be more accurate, we can substitute 6000 into Eq. 2.1.5 and get:

\[y = 440.05 + 0.1(6000) = 1040.05 \qquad \text{(Eq. 2.1.6)}\]

Note carefully that I substituted 6000 into Eq. 2.1.6 rather than 6000000 since the original data was entered in thousands. However, before we use the equation for prediction we have to test its practical and statistical utility (Steps 4 and 5).

For now, let's be sure we understand how to interpret the equation. The intercept is 440.05. This means that the value of y (External Hours) is 440.05 when x (assets) equals zero. Now this is really just a theoretical point helpful in placing the equation on the scatter diagram. It is theoretical without practical value because we did not have any x (asset) values equal to zero in the original data. Some suggest that the intercept is like a fixed value - what we need to get started without any value for x at all. But to know this, we would have had to include observations where x in fact equals zero. Otherwise, we are just guessing. In fact, an ethical caution in regression is not to interpret the results of regression models outside of the range of the original data.

Now let's look at the slope, which is 0.1. The slope is interpreted as follows: y (External Hours) is predicted to increase 0.1 when x (assets) increases by one. To make this a bit more practical, we can say that External Hours increase by 0.1 when Assets increase by $1,000 (since our data is in thousands of dollars, 1 unit of x is equal to $1,000).
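The substitution and the slope interpretation can be sketched in a few lines of Python. The intercept (440.05) and slope (0.1) are the values reported in these notes; the function name is made up, and the upper bound of 9,000 (in thousands of dollars) reflects the notes' remark that the sample contained no clients with assets above $9,000,000. The lower end of the sample is not stated, so this sketch checks only the upper bound.

```python
INTERCEPT = 440.05          # b0 from the notes, in hours
SLOPE = 0.1                 # b1 from the notes, hours per $1,000 of assets
MAX_OBSERVED_ASSETS = 9000  # largest assets in the sample, in thousands of dollars


def predict_external_hours(assets_thousands):
    """Predict external audit hours; assets are entered in thousands of dollars."""
    if assets_thousands > MAX_OBSERVED_ASSETS:
        # The notes caution against extrapolating beyond the range of the data.
        raise ValueError("assets exceed the range of the original data")
    return INTERCEPT + SLOPE * assets_thousands


hours = predict_external_hours(6000)     # a client with $6,000,000 in assets
extra_hours_per_million = SLOPE * 1000   # $1,000,000 is 1,000 units of x
print(round(hours, 2), round(extra_hours_per_million, 2))
```

Keeping the units in the parameter name (`assets_thousands`) is a small guard against the mistake the notes warn about, namely substituting 6000000 instead of 6000.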
Since the relationship is linear, the predicted change scales proportionally: external hours increase by 1 when assets increase by $10,000, or external hours increase by 10 when assets increase by $100,000, or external hours increase by 100 when assets increase by $1,000,000. Now we have something! The firm should plan on 100 more external hours for every increase of $1,000,000 in client assets. Caution as before: this interpretation only applies within the range of our data. We don't know what the slope is above $9,000,000 since we did not have any observations above $9,000,000. We do not extrapolate beyond the range of our data when making interpretations.

The intercept in the regression equation is the value of y when x equals zero. It has no practical interpretation unless the regression model was built on data where some of the values of x were zero. The slope of the regression equation indicates the predicted change in y (increase if the slope is positive, decrease if the slope is negative) for a one-unit increase in x.

Regression equations are the most widely used statistical tools in business since they can be used to predict the value of a response variable, such as sales, based on a predictor variable. We discussed form and direction as important aspects of the relationship between the two variables. The strength of the relationship between two variables is also an important aspect to know about in business.

Correlation Analysis

Recall earlier that we said correlation analysis is used to measure the strength of the linear relationship between two quantitative variables. To find the correlation coefficient, we begin with the coefficient of determination, \(R^2\). Look back at Worksheet 2.1.5 and note the \(R^2\) = 0.8173 or 0.82 on the scatter diagram. R-Square, or \(R^2\), is the symbol for the coefficient of determination. We will see its math later. For now, the interpretation of \(R^2\) is simply the amount of sample variation in Y that is explained by X.
For my example, we would say that client assets explain 82% of the sample variation in external hours. As you look at a scatter diagram, you notice that the value of Y changes or varies for different values of X. Strongly related variables are those in which changes in X result in predictable changes in Y. In other words, X explains a large percentage of the variation in Y. Weakly related variables, such as those with R² below 25%, are those in which changes in X do not result in predictable changes in Y. We will have more to say about R² when we get to Step 4 in Module 2.2 Notes. I'll close this brief introduction with the note that R² should be as close to 100% as possible for our models to be practically useful. A good general benchmark is that R² should be above 50%, although specific industries and service sectors may have their own traditional benchmarks for R².

The correlation coefficient, r, is the statistic commonly used to report the strength of a linear relationship between two variables. In fact, the word has crept into common English usage, as when we say something like, "there is a high correlation between how much I study and my GPA" (at least I hope we say something like that!). With a single predictor, the magnitude of the correlation coefficient is simply the square root of R². For this example, r = +0.904. This r of +0.904 represents a strong, positive, linear relationship between client assets and external hours.

How do I get the direction? By looking at the sign on the slope coefficient. If the sign is positive, r is positive, and vice versa. Worksheet 2.1.4 shows a relationship in which r would have a negative sign. How do I get the measure of strength?
That one is tougher, but here are some benchmarks that are common in general business and service sectors (you may find different benchmarks in medical practice, psychology, and specific industries and service sectors):

There are two cautions with using the correlation coefficient. First, we can say that X and Y are strongly related, which implies that changes in X result in predictable changes in Y. But unless we do an experiment, we are cautioned, from an ethical perspective, against saying that X causes Y. Think about examples of this. The r between consumption of the alcoholic beverage Scotch and donations to charitable organizations is very high, above +0.90. We would not say that such consumption causes donations to increase, or that reduced consumption causes donations to decrease, because the causal variable is probably disposable personal income (DPI). When DPI goes up, donations and consumption both go up. That being said, we can still rely on the value of r to select variables that help predict changes in Y, without having to do an experiment. That is, marketing executives in the Scotch industry can still pattern sales projections off projections of aggregate donations to charitable organizations. To make predictions, you do not have to prove causation.

The second caution is to remember that r measures the strength of linear relationships only. Look at the following example in Worksheet 2.1.6. The R² here is only 35%, meaning that client assets now explain only 35% of the sample variation in external hours. This gives an r of +0.59, which borders on a weak relationship. In actuality, the relationship between client assets and external hours is indeed strong, but the strength lies in the curvilinear relationship between the two variables, not the linear relationship. More on that in Module 3. For now, just recognize that many people misapply the correlation coefficient to models that have curvilinear rather than linear form.
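
The two calculations discussed above (taking r from R² with the sign of the slope, and the caution that r only measures linear strength) can be sketched in Python. This is an illustration under stated assumptions, not the worksheet computation: the function names are hypothetical, the slope value passed for Worksheet 2.1.6 is a placeholder that only supplies the positive sign, and the curved data set is synthetic.

```python
import math


def correlation_from_r_squared(r_squared: float, slope: float) -> float:
    """Return r for a one-predictor regression: sqrt(R^2) with the slope's sign."""
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("R-squared must be between 0 and 1")
    return math.copysign(math.sqrt(r_squared), slope)


def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from its definition."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Values quoted in the notes: R^2 = 0.8173 with slope 0.1, and R^2 = 0.35.
print(round(correlation_from_r_squared(0.8173, 0.1), 3))  # 0.904
print(round(correlation_from_r_squared(0.35, 1.0), 2))    # 0.59 (placeholder slope)

# Synthetic data: y is perfectly predictable from x, but the form is curved.
curve_x = [-3, -2, -1, 0, 1, 2, 3]
curve_y = [x ** 2 for x in curve_x]
print(pearson_r(curve_x, curve_y))  # 0.0 for this symmetric data
```

The last line shows the second caution in its most extreme form: a relationship that is exact but curved can produce an r of zero, so a low r by itself does not show that two variables are unrelated.
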
A closing comment on correlation analysis. Since r is dimensionless and varies between -1 and +1, it can be thought of as a standardized measure of the strength of the linear relationship between two variables. Related to the correlation coefficient is covariance, a non-standardized measure of the strength of the linear relationship between two variables. The covariance equals the correlation coefficient multiplied by the product of the standard deviations of the two variables, which defines the mathematical relationship between the two measures. While the correlation coefficient is the more commonly used measure of the strength of the linear relationship between two variables, financial models such as those used in portfolio theory incorporate covariance, so you may see that statistic in a finance class.

This closes Module 2.1 Notes. You should be able to get through Items 1 through 4 of Assignment 2 at this point.

## Outliers and Influential Observations

Before we go to Module Notes 2.2, let me illustrate one last caution in Steps 1-3 that you may run into as you prepare for Assignment 2. Recall that we relied on the histogram in Module 1 to identify outliers to the distribution under examination. We can also have outliers in regression analysis. Let's look at a modified scatter diagram in Worksheet 2.1.7.

This scatter diagram is similar to those in Worksheets 2.1.2 and 2.1.5 except that I changed the values of two of the observations. The observation with assets of just over $3,000,000 and external hours of 100 is well below the regression line. This would lead us to suspect that it is an outlier to the regression model. When we get to Module Notes 2.2, we will look at a way to determine precisely whether that observation is an outlier. We use the same rule as before: if an observation is more than 3 standard deviations from the regression line, it is an outlier.
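
The 3-standard-deviation rule can be sketched in Python as follows. This is only one plausible reading of the rule: it assumes that "standard deviations from the regression line" means the standard error of the estimate, sqrt(SSE / (n - 2)), and the precise method in Module Notes 2.2 may differ. The function names are hypothetical, and the data are synthetic (not the Worksheet 2.1.7 values), built around the line quoted in these notes with one point placed at assets of 3,100 (in thousands) and 100 external hours.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for one predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def flag_outliers(xs, ys, threshold=3.0):
    """Indices of points whose residual exceeds threshold * standard error."""
    intercept, slope = fit_line(xs, ys)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    # Assumed yardstick: standard error of the estimate, sqrt(SSE / (n - 2)).
    std_err = math.sqrt(sum(e ** 2 for e in residuals) / (len(xs) - 2))
    return [i for i, e in enumerate(residuals) if abs(e) > threshold * std_err]


# Synthetic data: 20 points scattered 15 hours above or below the line.
xs = [1000 + 400 * i for i in range(20)]
ys = [440.05 + 0.1 * x + (15 if i % 2 == 0 else -15) for i, x in enumerate(xs)]

# One suspect observation well below the line (index 20).
xs.append(3100)
ys.append(100)

print(flag_outliers(xs, ys))  # [20]
```

Because the suspect point is included when the line and the standard error are computed, an extreme value inflates the yardstick it is measured against. With this approach no residual can exceed sqrt(n - 2) standard errors, so the rule cannot flag anything in samples of 11 or fewer observations.
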

There is one other observation that appears apart from the data. It is the observation with fewer than 600 external hours and less than $1,000,000 in assets. While this observation is separated from the rest of the data, it is quite close to the regression line. Thus, it is not an outlier to the regression model. However, since the point is separated from the data, we call it an influential observation. As in our study of descriptive statistics for individual variables in Module 1, outliers and influential observations should be identified and removed from the data set prior to numerical analysis. As before, sometimes outliers and influential observations suggest a need to stratify the data before further analysis; sometimes they are just individual events (sometimes even input errors!) that should be removed before further analysis.

Anderson, D., Sweeney, D., & Williams, T. (2001). Contemporary Business Statistics with Microsoft Excel. Cincinnati, OH: South-Western, Chapter 3 (Section 3.1) and Chapter 12 (through Section 12.8).