Gradient Descent in LR

Gradient Descent is an optimization algorithm used to find the best values of slope (m) and intercept (b) in Linear Regression (LR). It helps minimize prediction error by repeatedly updating the model parameters, one small step at a time.

Instead of calculating the best-fit line directly using formulas, Gradient Descent gradually learns the optimal line through iterations.

Why Gradient Descent is Needed

Suppose a regression model predicts values poorly.

Example:

Actual Marks    Predicted Marks
50              30
60              35
70              40

The prediction errors (20, 25 and 30 marks) are high.

Gradient Descent helps:

  • Reduce prediction error

  • Improve model accuracy

  • Find optimal values of m and b

Main Idea of Gradient Descent

Gradient Descent works like this:

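1. Start with initial values of m and b (random values or zeros)
2. Calculate the prediction error
3. Update m and b in the direction that reduces the error
4. Repeat steps 2 and 3 until the error stops decreasing noticeably or a set number of iterations is reached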
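The four steps can be sketched in Python on a tiny dataset where Y = 2X. This is a minimal sketch, not a reference implementation. The function name `gradient_descent` and the `tol` and `max_iters` parameters are hypothetical. "Error becomes very small" is interpreted here as "MSE improves by less than `tol` between iterations".

```python
import numpy as np


def gradient_descent(X, Y, learning_rate=0.01, tol=1e-10, max_iters=100_000):
    """Hypothetical helper: returns (m, b, iterations_run)."""
    m, b = 0.0, 0.0                      # 1. starting values (zeros here)
    n = len(X)
    prev_mse = np.inf
    for i in range(max_iters):
        error = Y - (m * X + b)          # 2. prediction error
        mse = np.mean(error ** 2)
        # 4. stop once the MSE improves by less than tol (assumed criterion)
        if prev_mse - mse < tol:
            return m, b, i
        prev_mse = mse
        # 3. update m and b using the gradients of MSE
        dm = (-2 / n) * np.sum(X * error)
        db = (-2 / n) * np.sum(error)
        m -= learning_rate * dm
        b -= learning_rate * db
    return m, b, max_iters


X = np.array([1.0, 2.0, 3.0])
Y = np.array([2.0, 4.0, 6.0])
m, b, iters = gradient_descent(X, Y)
print(f"m = {m:.4f}, b = {b:.4f}, iterations = {iters}")
```

Stopping on "MSE no longer improving" rather than "MSE below a threshold" also works when the best possible MSE is not zero, as with noisy real data.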

Linear Regression Equation

y = mx + b

Where:

  • y → Predicted output

  • x → Input feature

  • m → Slope

  • b → Intercept

Cost Function

Gradient Descent minimizes the Cost Function.

The most common cost function is:

Mean Squared Error (MSE)

MSE Formula

MSE = Σ(actual_y - predicted_y)^2 / n

Goal:

Minimize MSE
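Applying the MSE formula to the marks example above gives a concrete number that Gradient Descent would try to reduce. Here is a short NumPy sketch:

```python
import numpy as np

actual = np.array([50, 60, 70])
predicted = np.array([30, 35, 40])

# MSE = sum of squared errors divided by the number of points
mse = np.sum((actual - predicted) ** 2) / len(actual)
print(mse)  # (20**2 + 25**2 + 30**2) / 3 = 1925 / 3, about 641.67
```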

Important Terms

1. Learning Rate

Learning Rate controls:

How big each step should be

Small Learning Rate

Slow learning
Needs more iterations

Large Learning Rate

May overshoot the minimum point
Unstable learning: the error can oscillate or even grow
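The effect of the learning rate can be checked on the dataset used in the mathematical example below (Y = 2X). In this sketch, the helper `mse_after` and the 50-epoch budget are illustrative assumptions. A very small rate barely reduces the error and moderate rates reduce it well. A rate of 0.2 is already large enough on this data that the updates overshoot and the error grows instead of shrinking.

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0])
Y = np.array([2.0, 4.0, 6.0])


def mse_after(learning_rate, epochs=50):
    """Run batch gradient descent from m = b = 0 and return the final MSE."""
    m, b = 0.0, 0.0
    n = len(X)
    for _ in range(epochs):
        error = Y - (m * X + b)
        m -= learning_rate * (-2 / n) * np.sum(X * error)
        b -= learning_rate * (-2 / n) * np.sum(error)
    return np.mean((Y - (m * X + b)) ** 2)


for lr in (0.001, 0.01, 0.1, 0.2):
    print(f"learning rate {lr}: MSE after 50 epochs = {mse_after(lr):.4g}")
```

The starting MSE here is about 18.67. The rate at which learning becomes unstable depends on the data: 0.2 diverges on this dataset but may be fine on another.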

2. Iterations

Iterations represent:

How many times parameters are updated

More iterations usually lower the error further, until m and b stop changing noticeably (convergence).

Mathematical Example

Dataset

X Y
1 2
2 4
3 6

Here:

  • X is the input value

  • Y is the actual output value

The goal is to find the best line for prediction.

Linear Regression Equation

predicted_y = mx + b

Where:

  • m = slope

  • b = intercept

  • x = input value

  • predicted_y = predicted output

Initial Values

Let us start with:

m = 0
b = 0
Learning Rate = 0.01
n = 3

Here:

  • m and b are initialized with zero

  • Learning Rate controls the size of each update step

  • n is the number of data points

Step 1: Calculate Predictions

Using:

predicted_y = mx + b

Since:

m = 0
b = 0

For X = 1:

predicted_y = (0 * 1) + 0 = 0

For X = 2:

predicted_y = (0 * 2) + 0 = 0

For X = 3:

predicted_y = (0 * 3) + 0 = 0

Step 2: Calculate Errors

Error = Actual Y - Predicted Y
X    Actual Y    Predicted Y    Error
1    2           0              2
2    4           0              4
3    6           0              6

At the starting point, the errors are large because the model has not learned yet.
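Steps 1 and 2 can be done for all data points at once with NumPy array arithmetic, assuming the same zero starting values:

```python
import numpy as np

X = np.array([1, 2, 3])
Y = np.array([2, 4, 6])
m, b = 0, 0

predicted_y = m * X + b   # Step 1: every prediction is 0
error = Y - predicted_y   # Step 2: errors are 2, 4, 6
print(predicted_y, error)
```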

Step 3: Formula for Updating m and b

Gradient Descent updates m and b using these formulas:

m = m - LearningRate * dm
b = b - LearningRate * db

Where:

  • dm = partial derivative of the cost function (MSE) with respect to m

  • db = partial derivative of the cost function (MSE) with respect to b
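The dm and db formulas used in Steps 4 and 5 come from differentiating MSE with the chain rule. Here ŷᵢ = m·xᵢ + b is predicted_y for the i-th point:

```latex
\text{MSE}(m, b) = \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - (m x_i + b) \bigr)^2

dm = \frac{\partial\,\text{MSE}}{\partial m}
   = \frac{1}{n} \sum_{i=1}^{n} 2 \bigl( y_i - (m x_i + b) \bigr) (-x_i)
   = -\frac{2}{n} \sum_{i=1}^{n} x_i \bigl( y_i - \hat{y}_i \bigr)

db = \frac{\partial\,\text{MSE}}{\partial b}
   = \frac{1}{n} \sum_{i=1}^{n} 2 \bigl( y_i - (m x_i + b) \bigr) (-1)
   = -\frac{2}{n} \sum_{i=1}^{n} \bigl( y_i - \hat{y}_i \bigr)
```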

Step 4: Calculate dm

Formula:

dm = (-2/n) * Σ[X * (Y - predicted_y)]

Substitute values:

dm = (-2/3) * [(1 * 2) + (2 * 4) + (3 * 6)]

Calculate inside the bracket:

(1 * 2) = 2
(2 * 4) = 8
(3 * 6) = 18

Now add them:

2 + 8 + 18 = 28

So:

dm = (-2/3) * 28
dm = -18.67

Step 5: Calculate db

Formula:

db = (-2/n) * Σ(Y - predicted_y)

Substitute values:

db = (-2/3) * (2 + 4 + 6)

Add the values:

2 + 4 + 6 = 12

So:

db = (-2/3) * 12
db = -8
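Here is the same gradient arithmetic for Steps 4 and 5 in NumPy, using the predictions from Step 1 (all zeros). The results match the hand calculation: dm is exactly -56/3, which rounds to -18.67.

```python
import numpy as np

X = np.array([1, 2, 3])
Y = np.array([2, 4, 6])
predicted_y = np.zeros(3)   # predictions from Step 1 (m = b = 0)
n = len(X)

dm = (-2 / n) * np.sum(X * (Y - predicted_y))  # (-2/3) * 28
db = (-2 / n) * np.sum(Y - predicted_y)        # (-2/3) * 12
print(round(dm, 2), round(db, 2))              # -18.67 -8.0
```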

Step 6: Update m

Formula:

m = m - LearningRate * dm

Substitute values:

m = 0 - (0.01 * -18.67)
m = 0 + 0.1867
m = 0.1867

Step 7: Update b

Formula:

b = b - LearningRate * db

Substitute values:

b = 0 - (0.01 * -8)
b = 0 + 0.08
b = 0.08
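Steps 6 and 7 can be written as plain Python. This sketch plugs in the exact gradients (-56/3 for dm and -8 for db) rather than the rounded -18.67:

```python
learning_rate = 0.01
m, b = 0.0, 0.0
dm, db = -56 / 3, -8.0   # exact gradients from Steps 4 and 5

# Subtracting a negative gradient increases m and b
m = m - learning_rate * dm
b = b - learning_rate * db
print(round(m, 4), round(b, 4))  # 0.1867 0.08
```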

Updated Values After First Iteration

After one iteration of Gradient Descent:

m = 0.1867
b = 0.08

So the new prediction equation becomes:

predicted_y = 0.1867x + 0.08

Step 8: Check New Predictions

For X = 1:

predicted_y = (0.1867 * 1) + 0.08
predicted_y = 0.2667

For X = 2:

predicted_y = (0.1867 * 2) + 0.08
predicted_y = 0.4534

For X = 3:

predicted_y = (0.1867 * 3) + 0.08
predicted_y = 0.6401

New Prediction Table

X    Actual Y    Old Prediction    New Prediction
1    2           0                 0.2667
2    4           0                 0.4534
3    6           0                 0.6401

The predictions are still not perfect, but they have improved slightly from the initial prediction of 0.
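One way to confirm the improvement is to compare the MSE before and after the first update. This sketch uses the rounded values m = 0.1867 and b = 0.08 from above. The helper `mse` is only for illustration.

```python
import numpy as np

X = np.array([1, 2, 3])
Y = np.array([2, 4, 6])


def mse(m, b):
    """Mean Squared Error of the line y = m*x + b on this dataset."""
    return np.mean((Y - (m * X + b)) ** 2)


old_mse = mse(0.0, 0.0)        # 56 / 3, about 18.67
new_mse = mse(0.1867, 0.08)    # rounded values after the first iteration
print(0.1867 * X + 0.08)       # new predictions, about 0.2667, 0.4534, 0.6401
print(old_mse, new_mse)
```

After one iteration, the MSE drops from about 18.67 to about 14.77.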

What Happens Next?

Gradient Descent repeats the same process many times:

1. Calculate predictions
2. Calculate errors
3. Calculate dm and db
4. Update m and b
5. Repeat

After many iterations, m and b move closer to the best values.

For this dataset, the ideal line is:

predicted_y = 2x + 0

So after enough iterations:

m ≈ 2
b ≈ 0
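As a cross-check, the ideal line can also be found directly, without iterating, by a least-squares fit. NumPy's `np.polyfit` with degree 1 does this and returns the coefficients highest power first, that is, [slope, intercept]:

```python
import numpy as np

X = np.array([1, 2, 3])
Y = np.array([2, 4, 6])

# Direct least-squares fit of a degree-1 polynomial (a straight line)
slope, intercept = np.polyfit(X, Y, 1)
print(slope, intercept)   # approximately 2.0 and 0.0, up to tiny floating-point error
```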

In the first iteration, Gradient Descent changed:

m = 0 → 0.1867
b = 0 → 0.08

This means the model started learning from the data.

With repeated iterations, the model continues improving until the prediction error reaches its minimum.

Visualization of Learning

Iteration 1 → High error
Iteration 10 → Lower error
Iteration 100 → Much lower error
More iterations → Error approaches its minimum
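This pattern can be measured by recording the MSE after selected iterations. The sketch assumes the same data and learning rate (0.01) as the Python example below. The checkpoints (1, 10, 100, 1000) are an arbitrary choice. With this small learning rate the error is still falling at iteration 100, so how many iterations are needed depends on the learning rate and the data.

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0])
Y = np.array([2.0, 4.0, 6.0])
m, b, L = 0.0, 0.0, 0.01
n = len(X)

history = {}
for i in range(1, 1001):
    error = Y - (m * X + b)
    m -= L * (-2 / n) * np.sum(X * error)
    b -= L * (-2 / n) * np.sum(error)
    if i in (1, 10, 100, 1000):
        # MSE of the line after update number i
        history[i] = np.mean((Y - (m * X + b)) ** 2)

for i, value in history.items():
    print(f"Iteration {i:>4}: MSE = {value:.6f}")
```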

Python Example — Gradient Descent

import numpy as np

# Dataset: Y = 2X exactly
X = np.array([1, 2, 3])
Y = np.array([2, 4, 6])

# Initial values of slope and intercept
m = 0.0
b = 0.0

# Learning rate
L = 0.01

# Number of iterations
epochs = 1000

n = len(X)

# Gradient Descent
for i in range(epochs):
    # Predictions with the current m and b
    Y_pred = m * X + b

    # Partial derivatives of MSE with respect to m and b
    dm = (-2 / n) * np.sum(X * (Y - Y_pred))
    db = (-2 / n) * np.sum(Y - Y_pred)

    # Move m and b against the gradient
    m = m - L * dm
    b = b - L * db

print("Slope:", m)
print("Intercept:", b)

Expected Output

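Slope ≈ 1.97 (close to 2)
Intercept ≈ 0.07 (close to 0)

With L = 0.01, 1000 epochs is not quite enough for full convergence. Increasing epochs moves the values closer to 2 and 0.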

Final Equation

y ≈ 2x (the exact best-fit line for this data is y = 2x)

What Gradient Descent Learned

The algorithm learned:

When X increases by 1,
Y increases by about 2 (Y is proportional to X).

Types of Gradient Descent

Type                           Description
Batch Gradient Descent         Uses the entire dataset for each update
Stochastic Gradient Descent    Uses one sample per update
Mini-Batch Gradient Descent    Uses a small batch of samples per update
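The three variants differ only in how many samples are used for each update. The sketch below covers all three with a single `batch_size` parameter. The function `minibatch_gd`, the fixed seed and the per-epoch shuffling are illustrative choices, not the only way to implement these variants.

```python
import numpy as np


def minibatch_gd(X, Y, batch_size, learning_rate=0.01, epochs=2000, seed=0):
    """batch_size = len(X): Batch GD; 1: Stochastic GD; in between: Mini-Batch GD."""
    rng = np.random.default_rng(seed)
    m, b = 0.0, 0.0
    n = len(X)
    for _ in range(epochs):
        order = rng.permutation(n)                 # shuffle once per epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            xb, yb = X[idx], Y[idx]
            error = yb - (m * xb + b)
            k = len(xb)                            # size of this batch
            m -= learning_rate * (-2 / k) * np.sum(xb * error)
            b -= learning_rate * (-2 / k) * np.sum(error)
    return m, b


X = np.array([1.0, 2.0, 3.0])
Y = np.array([2.0, 4.0, 6.0])
for size in (3, 1, 2):   # batch, stochastic, mini-batch
    print(size, minibatch_gd(X, Y, batch_size=size))
```

Because this dataset lies exactly on a line, all three variants settle near m = 2, b = 0.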

Advantages of Gradient Descent

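  • Works for large datasets, where computing the best-fit line directly with formulas can be expensive

  • Each update is simple: only the gradients dm and db are needed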

  • Widely used in Deep Learning

  • Helps minimize prediction error

Limitations

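  • Requires a well-chosen learning rate

  • Can be slow for complex problems

  • May get stuck in local minima on non-convex cost functions (the MSE cost of Linear Regression is convex, so any minimum it reaches is the global minimum)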

Important Points

1. Gradient Descent minimizes the cost function.

2. Learning Rate controls step size.

3. Gradient Descent updates slope and intercept iteratively.

4. MSE is commonly used as the cost function.

5. Gradient Descent is widely used in Machine Learning and Deep Learning.

Summary

Gradient Descent is an optimization algorithm used in Linear Regression to minimize prediction error by iteratively updating slope and intercept values. It helps models learn the best-fit line step by step through iterative optimization.


Check your knowledge

Quickly verify what you've learned from this tutorial.

Question 1

Which parameters are updated by Gradient Descent in Linear Regression?

Gradient Descent updates the slope (m) and intercept (b) to find the best-fit regression line.

Question 2

What does the Cost Function measure?

The Cost Function measures how far the model's predictions are from the actual values.

Question 3

Which Cost Function is most commonly used in Linear Regression?

Mean Squared Error (MSE), which calculates the average squared difference between actual and predicted values.

Question 4

What is the role of the Learning Rate?

The Learning Rate determines how large or small each update step will be during optimization.

Question 5

In the mathematical example, what were the initial values of m and b?

The example starts with both slope and intercept initialized to zero.
