OpenStax Algebra and Trigonometry, Chapter 4: Linear Functions

Lesson 4.3: Fitting Linear Models to Data

Section 1

📘 Fitting Linear Models to Data

New Concept

Explore how to represent data relationships using scatter plots. You'll learn to find the "line of best fit" through linear regression, creating a model to analyze trends and make predictions about real-world scenarios.

What's next

Up next, you'll work through interactive examples, drawing scatter plots and using a graphing utility to find the line of best fit for various datasets.

Section 2

Drawing and Interpreting Scatter Plots

Property

A scatter plot is a graph of plotted points that may show a relationship between two sets of data. If the relationship is from a linear model, or a model that is nearly linear, we can draw conclusions using our knowledge of linear functions.

Examples

  • Plotting a person's age against their hours of sleep per night might show a negative trend, where older people tend to sleep less.
  • A scatter plot of the number of hours spent studying for a test versus the score received would likely show a positive trend.
  • Plotting a person's shoe size against the number of books they read in a year would likely show no relationship, with points scattered randomly.

Explanation

Think of a scatter plot as a visual detective tool. It plots dots for two sets of data, like study time and grades, to help you see if they're related by revealing a pattern or trend, such as a line or curve.

Section 3

Finding a Line of Best Fit

Property

Once we recognize a need for a linear function to model data, one way to approximate our linear function is to sketch the line that seems to best fit the data. Then we can extend the line until we can verify the y-intercept. We can approximate the slope of the line by extending it until we can estimate the $\frac{\text{rise}}{\text{run}}$. This gives an equation of the form $y = mx + b$.

Examples

  • On a scatter plot of a seedling's height over time, you could draw a line that passes through (Day 2, 4 cm) and (Day 8, 16 cm). The slope is $m = \frac{16-4}{8-2} = \frac{12}{6} = 2$.
  • If the line from the previous example appears to cross the y-axis at 0, the y-intercept is $b = 0$. The equation for the seedling's height $h$ on day $d$ is $h(d) = 2d$.
  • Using the model $h(d) = 2d$, you can estimate that the height on Day 5 would be $h(5) = 2(5) = 10$ cm.
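The arithmetic in the seedling examples can be sketched in Python. This is a minimal illustration, not part of the lesson: the function names `line_through` and `predict` are made up here, and the two points are the ones read off the hand-drawn line.

```python
def line_through(p1, p2):
    """Return (slope, intercept) of the line through two points (x, y)."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("a vertical line has no slope-intercept form")
    m = (y2 - y1) / (x2 - x1)  # rise over run
    b = y1 - m * x1            # solve y1 = m*x1 + b for b
    return m, b


def predict(m, b, x):
    """Evaluate the linear model y = m*x + b."""
    return m * x + b


# Points read off the sketched line: (day, height in cm)
m, b = line_through((2, 4), (8, 16))
height_day_5 = predict(m, b, 5)
print(m, b, height_day_5)  # prints: 2.0 0.0 10.0
```

Computing the intercept from a point on the line, rather than reading it off the graph, gives the same $b = 0$ here because the line through (2, 4) with slope 2 passes through the origin.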

Explanation

Eyeballing a line of best fit means drawing a straight line that passes as close as possible to all the data points. This line acts as a simple summary of the data's trend, giving you a general equation to work with.

Section 4

Interpolation and Extrapolation

Property

Different methods of making predictions are used to analyze data.

  • The method of interpolation involves predicting a value inside the domain and/or range of the data.
  • The method of extrapolation involves predicting a value outside the domain and/or range of the data.
  • Model breakdown occurs at the point when the model no longer applies.

Examples

  • If you have data on a car's value from years 1 to 5, predicting its value at year 3 is interpolation.
  • Using the same car data, predicting its value at year 10 is extrapolation. The linear trend may not hold that long.
  • A model predicting a child's shoe size might work from ages 4 to 12, but it would fail if extrapolated to age 40. This is an example of model breakdown.

Explanation

Interpolation is predicting within the known range of your data, where the observed trend supports the estimate. Extrapolation is a riskier prediction made outside that range, where the trend might fail. The point at which the model stops applying is known as model breakdown.
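The distinction can be sketched as a simple range check. This is an illustrative simplification: the helper `classify_prediction` is a hypothetical name, and it looks only at the input values (the domain), whereas the definition above also allows a check against the range of the outputs.

```python
def classify_prediction(x_values, x_new):
    """Label a prediction at x_new relative to the observed inputs."""
    lo, hi = min(x_values), max(x_values)
    if lo <= x_new <= hi:
        return "interpolation"
    return "extrapolation"


# Years for which the car's value was recorded, as in the car example
years = [1, 2, 3, 4, 5]
print(classify_prediction(years, 3))   # prints: interpolation
print(classify_prediction(years, 10))  # prints: extrapolation
```

A check like this cannot detect model breakdown by itself; it only flags predictions that deserve extra caution.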

Section 5

Finding the Line of Best Fit Using a Graphing Utility

Property

While eyeballing a line works reasonably well, there are statistical techniques for fitting a line to data that minimize the differences between the line and data values. One such technique is called least squares regression. To find the best fit line using linear regression:

  1. Enter the input in List 1 (L1).
  2. Enter the output in List 2 (L2).
  3. On a graphing utility, select Linear Regression (LinReg).

Examples

  • To model the relationship between a pizza's diameter and its price, enter diameters in L1 and prices in L2 of a graphing utility.
  • After entering the data, selecting the LinReg function might give you the equation $P(d) = 1.5d - 2$, where $d$ is diameter and $P$ is price.
  • Using this model, you can predict that a 16-inch pizza would cost $P(16) = 1.5(16) - 2 = 22$ dollars.

Explanation

Linear regression uses a calculator or computer to find the single line that minimizes the sum of the squared vertical distances between the line and the data points. It is more consistent than drawing a line by hand, though how well the model predicts still depends on how close to linear the data really is.
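What a graphing utility's LinReg command computes can be sketched with the standard least squares formulas. The function name `least_squares_line` is an assumption for this sketch, and the pizza data below is made up for illustration; it was chosen to lie exactly on the line from the examples so the result is easy to check.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) minimizing the sum of squared vertical errors."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and cross-deviations from the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data, made up for illustration
diameters = [8, 10, 12, 14]  # plays the role of L1: input, in inches
prices = [10, 13, 16, 19]    # plays the role of L2: output, in dollars

m, b = least_squares_line(diameters, prices)
price_16 = m * 16 + b
print(round(m, 2), round(b, 2), round(price_16, 2))  # prints: 1.5 -2.0 22.0
```

Note that predicting the price of a 16-inch pizza from this data is an extrapolation, since the largest diameter in the list is 14 inches.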

Section 6

Correlation Coefficient

Property

The correlation coefficient is a value, $r$, between $-1$ and $1$.

  • $r > 0$ suggests a positive (increasing) relationship.
  • $r < 0$ suggests a negative (decreasing) relationship.
  • The closer the value is to $0$, the more scattered the data.
  • The closer the value is to $1$ or $-1$, the less scattered the data.

Examples

  • The relationship between the distance a car is driven and the amount of fuel used would have a strong positive correlation, with an $r$ value close to $1$, like $r = 0.98$.
  • Data comparing the number of hours a candle has been burning to its remaining height would show a strong negative correlation, with an $r$ value close to $-1$, like $r = -0.99$.
  • A scatter plot of a person's height versus the last digit of their phone number would have a correlation coefficient very close to 0.

Explanation

The correlation coefficient, $r$, is a score from $-1$ to $1$ that measures the strength and direction of a linear relationship. A score near $1$ or $-1$ means the data points fall almost exactly on a line, while a score near $0$ means there is little or no linear relationship.
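As a rough sketch of how $r$ is computed, the Pearson formula can be written out directly. The function name `correlation_coefficient` and the candle measurements are assumptions made up for this illustration.

```python
import math


def correlation_coefficient(xs, ys):
    """Return the Pearson correlation coefficient r of two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical candle data: hours burned vs. remaining height in cm
hours = [0, 1, 2, 3, 4]
heights = [20, 18.1, 15.9, 14.2, 12]

r = correlation_coefficient(hours, heights)
print(round(r, 3))  # close to -1: a strong negative linear relationship
```

The sign of $r$ comes from the cross-deviation sum `sxy`, which is negative when one variable tends to fall as the other rises.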

Section 7

Fitting a Regression Line to a Set of Data

Property

Once we determine that a set of data is linear using the correlation coefficient, we can use the regression line to make predictions. A regression line is a line that is closest to the data in the scatter plot, which means that only one such line is a best fit for the data.

Examples

  • A coffee shop's profit $P$ is modeled by $P(c) = 2.5c - 50$, where $c$ is the number of customers. They can predict that with 100 customers, the profit will be $P(100) = 2.5(100) - 50 = 200$ dollars.
  • A phone's remaining battery charge $B$, as a percentage, is modeled by $B(t) = -10t + 100$, where $t$ is hours of screen time. You can predict the battery will be at 50% after $t = 5$ hours, since $B(5) = -10(5) + 100 = 50$.
  • A regression model for home prices is $V(y) = 5000y - 9900000$, where $y$ is the year. To find when the value will reach 200,000 dollars, you solve $200000 = 5000y - 9900000$, which gives the year $y = 2020$.

Explanation

Once you have your regression line, you can use its equation as a predictive tool. This allows you to forecast future results or estimate missing values, turning your data into valuable insights about trends.
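Both uses of a regression line in the examples, predicting an output and solving for the input that gives a target output, can be sketched in a few lines. The helper names `predict` and `solve_for_input` are illustrative, and the coefficients are the ones given in the coffee shop and home price examples.

```python
def predict(m, b, x):
    """Evaluate the regression line y = m*x + b at x."""
    return m * x + b


def solve_for_input(m, b, target):
    """Solve target = m*x + b for x."""
    if m == 0:
        raise ValueError("a horizontal line never reaches a different target")
    return (target - b) / m


# Coffee shop: P(c) = 2.5c - 50, profit with 100 customers
profit = predict(2.5, -50, 100)

# Home prices: V(y) = 5000y - 9900000, year when the value reaches 200,000
year = solve_for_input(5000, -9_900_000, 200_000)

print(profit, year)  # prints: 200.0 2020.0
```

Solving for the input simply undoes the model: subtract the intercept, then divide by the slope.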
