Gradient Descent in LR

Gradient Descent is an optimization algorithm used to find the best values of the slope (m) and intercept (b) in Linear Regression (LR). It minimizes prediction error by updating the model parameters iteratively, one small step at a time.

Instead of calculating the best-fit line directly using formulas, Gradient Descent gradually learns the optimal line through iterations.

Why Gradient Descent is Needed

Suppose a regression model predicts values poorly.

Example:

Actual Marks	Predicted Marks
50	30
60	35
70	40

Every prediction is 20 to 30 marks below the actual value, so the prediction error is high.

Gradient Descent helps:

Reduce prediction error
Improve model accuracy
Find optimal values of m and b

Main Idea of Gradient Descent

Gradient Descent works like this:

1. Start with initial values of m and b (random values or zeros)
2. Calculate the prediction error
3. Update m and b to reduce the error
4. Repeat until the error stops decreasing noticeably

Linear Regression Equation

y = mx + b

Where:

y → Predicted output
x → Input feature
m → Slope
b → Intercept
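
The equation above can be sketched as a small Python function. The function name predict and the parameter values used here are hypothetical, chosen only for illustration.

```python
def predict(x, m, b):
    """Return the predicted output for input x on the line y = m*x + b."""
    return m * x + b

# Hypothetical slope and intercept, used only to show the arithmetic
print(predict(3, m=2, b=1))  # 2*3 + 1 = 7
```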

Cost Function

Gradient Descent minimizes the Cost Function.

The most common cost function is:

Mean Squared Error (MSE)

MSE Formula

MSE = (1/n) * Σ(actual_y - predicted_y)^2

Here n is the number of data points and Σ sums over all of them.

Goal:

Minimize MSE
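
As a rough illustration, the MSE formula can be applied to the marks table from the earlier example. The function name mean_squared_error is an assumption made for this sketch, not a library call.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    n = len(actual)
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n

actual_marks = [50, 60, 70]
predicted_marks = [30, 35, 40]

mse = mean_squared_error(actual_marks, predicted_marks)
print(round(mse, 2))  # (400 + 625 + 900) / 3 = 641.67
```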

Important Terms

1. Learning Rate

Learning Rate controls:

How big each step should be

Small Learning Rate

Slow learning
More iterations

Large Learning Rate

May skip minimum point
Unstable learning
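
The effect of the learning rate can be sketched by running the same updates with different step sizes. The dataset, the three learning rates, and the helper name run_gradient_descent are assumptions chosen for illustration.

```python
import numpy as np

# Small illustrative dataset that follows y = 2x
X = np.array([1.0, 2.0, 3.0])
Y = np.array([2.0, 4.0, 6.0])

def run_gradient_descent(learning_rate, iterations):
    """Run gradient descent from m = 0, b = 0 and return the final MSE."""
    m, b = 0.0, 0.0
    n = len(X)
    for _ in range(iterations):
        error = Y - (m * X + b)
        dm = (-2 / n) * np.sum(X * error)
        db = (-2 / n) * np.sum(error)
        m -= learning_rate * dm
        b -= learning_rate * db
    return np.mean((Y - (m * X + b)) ** 2)

initial_mse = run_gradient_descent(0.0, 0)  # MSE before any update
results = {lr: run_gradient_descent(lr, 50) for lr in (0.001, 0.01, 0.2)}
for lr, mse in results.items():
    print(f"learning rate {lr}: MSE after 50 iterations = {mse:.4f}")
```

On this dataset the largest rate makes the error grow instead of shrink, while the smallest rate reduces the error more slowly than 0.01 does.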

2. Iterations

Iterations represent:

How many times parameters are updated

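More iterations usually reduce the error, until the parameters settle near their best values. After that point, extra iterations change very little.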

Mathematical Example

Dataset

X	Y
1	2
2	4
3	6

Here:

X is the input value
Y is the actual output value

The goal is to find the best line for prediction.

Linear Regression Equation

predicted_y = mx + b

Where:

m = slope
b = intercept
x = input value
predicted_y = predicted output

Initial Values

Let us start with:

m = 0
b = 0
Learning Rate = 0.01
n = 3

Here:

m and b are initialized with zero
Learning Rate controls the size of each update step
n is the number of data points

Step 1: Calculate Predictions

Using:

predicted_y = mx + b

Since:

m = 0
b = 0

For X = 1:

predicted_y = (0 * 1) + 0 = 0

For X = 2:

predicted_y = (0 * 2) + 0 = 0

For X = 3:

predicted_y = (0 * 3) + 0 = 0

Step 2: Calculate Errors

Error = Actual Y - Predicted Y

X	Actual Y	Predicted Y	Error
1	2	0	2
2	4	0	4
3	6	0	6

At the starting point, the errors are large because the model has not learned yet.
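
Steps 1 and 2 can be reproduced in a few lines of NumPy. This is a minimal sketch using the dataset and initial values given above.

```python
import numpy as np

X = np.array([1, 2, 3])
Y = np.array([2, 4, 6])
m, b = 0, 0

predicted_y = m * X + b   # [0 0 0]
errors = Y - predicted_y  # [2 4 6]
print(predicted_y, errors)
```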

Step 3: Formula for Updating m and b

Gradient Descent updates m and b using these formulas:

m = m - LearningRate * dm

b = b - LearningRate * db

Where:

dm = partial derivative of the cost (MSE) with respect to m
db = partial derivative of the cost (MSE) with respect to b
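
For readers who want to see where dm and db come from, the following is a sketch of the derivation, obtained by applying the chain rule to the MSE. The indexed symbols x_i, y_i and \hat{y}_i are notation introduced here for X, Y and predicted_y.

```latex
\begin{aligned}
\mathrm{MSE}(m, b) &= \frac{1}{n} \sum_{i=1}^{n} \bigl(y_i - \hat{y}_i\bigr)^2,
  \qquad \hat{y}_i = m x_i + b \\[4pt]
dm = \frac{\partial\,\mathrm{MSE}}{\partial m}
  &= \frac{1}{n} \sum_{i=1}^{n} 2\bigl(y_i - \hat{y}_i\bigr)(-x_i)
   = -\frac{2}{n} \sum_{i=1}^{n} x_i \bigl(y_i - \hat{y}_i\bigr) \\[4pt]
db = \frac{\partial\,\mathrm{MSE}}{\partial b}
  &= \frac{1}{n} \sum_{i=1}^{n} 2\bigl(y_i - \hat{y}_i\bigr)(-1)
   = -\frac{2}{n} \sum_{i=1}^{n} \bigl(y_i - \hat{y}_i\bigr)
\end{aligned}
```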

Step 4: Calculate dm

Formula:

dm = (-2/n) * Σ[X * (Y - predicted_y)]

Substitute values:

dm = (-2/3) * [(1 * 2) + (2 * 4) + (3 * 6)]

Calculate inside the bracket:

(1 * 2) = 2
(2 * 4) = 8
(3 * 6) = 18

Now add them:

2 + 8 + 18 = 28

So:

dm = (-2/3) * 28

dm ≈ -18.67 (rounded to two decimal places)

Step 5: Calculate db

Formula:

db = (-2/n) * Σ(Y - predicted_y)

Substitute values:

db = (-2/3) * (2 + 4 + 6)

Add the values:

2 + 4 + 6 = 12

So:

db = (-2/3) * 12

db = -8
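
The hand calculations in Steps 4 and 5 can be checked with a short NumPy sketch that uses the same dataset and initial values.

```python
import numpy as np

X = np.array([1, 2, 3])
Y = np.array([2, 4, 6])
m, b = 0.0, 0.0
n = len(X)

error = Y - (m * X + b)
dm = (-2 / n) * np.sum(X * error)  # (-2/3) * 28
db = (-2 / n) * np.sum(error)      # (-2/3) * 12
print(round(dm, 2), round(db, 2))
```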

Step 6: Update m

Formula:

m = m - LearningRate * dm

Substitute values:

m = 0 - (0.01 * -18.67)

m = 0 + 0.1867

m = 0.1867

Step 7: Update b

Formula:

b = b - LearningRate * db

Substitute values:

b = 0 - (0.01 * -8)

b = 0 + 0.08

b = 0.08

Updated Values After First Iteration

After one iteration of Gradient Descent:

m = 0.1867
b = 0.08

So the new prediction equation becomes:

predicted_y = 0.1867x + 0.08
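
The two update steps can be written as a minimal sketch in Python. The gradient values are taken from Steps 4 and 5, with dm kept as the exact fraction -56/3 rather than the rounded -18.67.

```python
learning_rate = 0.01
m, b = 0.0, 0.0
dm, db = -56 / 3, -8.0  # gradients from Steps 4 and 5

m = m - learning_rate * dm
b = b - learning_rate * db
print(round(m, 4), round(b, 4))  # 0.1867 0.08
```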

Step 8: Check New Predictions

For X = 1:

predicted_y = (0.1867 * 1) + 0.08
predicted_y = 0.2667

For X = 2:

predicted_y = (0.1867 * 2) + 0.08
predicted_y = 0.4534

For X = 3:

predicted_y = (0.1867 * 3) + 0.08
predicted_y = 0.6401

New Prediction Table

X	Actual Y	New Prediction
1	2	0.2667
2	4	0.4534
3	6	0.6401

The predictions are still not perfect, but they have improved slightly from the initial prediction of 0.
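
One way to quantify the improvement is to compare the MSE before and after the first update. The helper name mse is an assumption made for this sketch.

```python
import numpy as np

X = np.array([1, 2, 3])
Y = np.array([2, 4, 6])

def mse(m, b):
    """Mean squared error of the line y = m*x + b on the dataset above."""
    return np.mean((Y - (m * X + b)) ** 2)

mse_before = mse(0.0, 0.0)     # initial line
mse_after = mse(0.1867, 0.08)  # line after the first update
print(round(mse_before, 2), round(mse_after, 2))
```

The MSE drops from roughly 18.67 to roughly 14.77, so a single step already reduces the error, though only slightly.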

What Happens Next?

Gradient Descent repeats the same process many times:

1. Calculate predictions
2. Calculate errors
3. Calculate dm and db
4. Update m and b
5. Repeat

After many iterations, m and b move closer to the best values.

For this dataset, the ideal line is:

predicted_y = 2x + 0

So after enough iterations:

m ≈ 2
b ≈ 0

In the first iteration, Gradient Descent changed:

m = 0 → 0.1867
b = 0 → 0.08

This means the model started learning from the data.

With repeated iterations, the model keeps improving until the prediction error reaches its minimum.

Visualization of Learning

Iteration 1 → High Error
Iteration 10 → Lower Error
Iteration 100 → Much Lower Error (approaching the minimum)
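
The trend can be made concrete by recording the MSE at a few iterations. This is a sketch using the dataset, initial values, and learning rate from the worked example. The name mse_history is an assumption.

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0])
Y = np.array([2.0, 4.0, 6.0])
m, b, learning_rate, n = 0.0, 0.0, 0.01, len(X)

mse_history = {}
for iteration in range(1, 101):
    error = Y - (m * X + b)
    # Both updates use the error computed before this iteration's update
    m -= learning_rate * (-2 / n) * np.sum(X * error)
    b -= learning_rate * (-2 / n) * np.sum(error)
    if iteration in (1, 10, 100):
        mse_history[iteration] = np.mean((Y - (m * X + b)) ** 2)

for iteration, value in mse_history.items():
    print(f"Iteration {iteration}: MSE = {value:.4f}")
```

The recorded MSE shrinks at each checkpoint and is well below 1 by iteration 100, although it is still not exactly zero.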

Python Example — Gradient Descent

import numpy as np

# Dataset
X = np.array([1, 2, 3])
Y = np.array([2, 4, 6])

# Initial values
m = 0
b = 0

# Learning rate
L = 0.01

# Iterations
epochs = 1000

n = len(X)

# Gradient Descent
for i in range(epochs):

    Y_pred = m * X + b

    # Derivatives
    dm = (-2/n) * np.sum(X * (Y - Y_pred))
    db = (-2/n) * np.sum(Y - Y_pred)

    # Update values
    m = m - L * dm
    b = b - L * db

print("Slope:", m)
print("Intercept:", b)

Expected Output

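Slope ≈ 2 (about 1.97 after 1000 iterations)
Intercept ≈ 0 (about 0.07 after 1000 iterations)

With a learning rate of 0.01, the values approach 2 and 0 gradually. More iterations bring them closer.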

Final Equation

y = 2x
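
As a cross-check, the same line can be computed directly with a least-squares fit instead of iterating. This sketch uses NumPy's polyfit with degree 1, which returns the slope first and the intercept second.

```python
import numpy as np

X = np.array([1, 2, 3])
Y = np.array([2, 4, 6])

# Degree-1 least-squares fit: coefficients are returned highest power first
slope, intercept = np.polyfit(X, Y, 1)
print(round(slope, 4), round(abs(intercept), 4))
```

Gradient Descent should move toward these values as the number of iterations grows.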

What Gradient Descent Learned

The algorithm learned:

When X increases by 1,
Y increases by about 2, so Y is proportional to X.

Types of Gradient Descent

Type	Description
Batch Gradient Descent	Uses the entire dataset for each update
Stochastic Gradient Descent	Uses one sample for each update
Mini-Batch Gradient Descent	Uses a small batch of samples for each update
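
The three types differ only in how many samples feed each update, so they can be sketched with one function and a batch size setting. The function name gradient_descent and the parameters batch_size and seed are hypothetical, introduced for this illustration.

```python
import numpy as np

def gradient_descent(X, Y, batch_size, learning_rate=0.01, epochs=1000, seed=0):
    """Fit y = m*x + b, updating on batch_size samples at a time."""
    rng = np.random.default_rng(seed)
    m, b = 0.0, 0.0
    n = len(X)
    for _ in range(epochs):
        order = rng.permutation(n)  # shuffle the samples each epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            error = Y[idx] - (m * X[idx] + b)
            # Gradients are averaged over the current batch only
            m -= learning_rate * (-2 / len(idx)) * np.sum(X[idx] * error)
            b -= learning_rate * (-2 / len(idx)) * np.sum(error)
    return m, b

X = np.array([1.0, 2.0, 3.0])
Y = np.array([2.0, 4.0, 6.0])

batch_m, batch_b = gradient_descent(X, Y, batch_size=len(X))  # Batch
sgd_m, sgd_b = gradient_descent(X, Y, batch_size=1)           # Stochastic
mini_m, mini_b = gradient_descent(X, Y, batch_size=2)         # Mini-Batch
print(batch_m, batch_b, sgd_m, sgd_b, mini_m, mini_b)
```

On this tiny, noise-free dataset all three settings move toward m = 2 and b = 0. On real data the smaller batch sizes give noisier but cheaper updates.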

Advantages of Gradient Descent

Scales to large datasets, especially with stochastic or mini-batch updates
Needs only the gradient of the cost function for each update
Widely used in Deep Learning
Helps minimize prediction error

Limitations

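Requires a well-chosen learning rate
Can be slow for complex problems
May get stuck in local minima on non-convex problems (the MSE cost in Linear Regression is convex, so it has a single minimum)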

Important Points

1. Gradient Descent minimizes the cost function.

2. Learning Rate controls step size.

3. Gradient Descent updates slope and intercept iteratively.

4. MSE is commonly used as the cost function.

5. Gradient Descent is widely used in Machine Learning and Deep Learning.

Summary

Gradient Descent is an optimization algorithm used in Linear Regression to minimize prediction error by iteratively updating the slope and intercept. Each update moves the parameters a small step in the direction that reduces the cost function, so the model learns the best-fit line step by step.