A linear regression model assumes a linear relationship between the input features and the output variable. Its general formula is:
$$ y = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_n x_n + ε $$
where y is the output variable, $x_1,x_2,...,x_n$ are the input features, $w_0,w_1,w_2,...,w_n$ are the coefficients, and ε is the error term. The goal of linear regression is to estimate the values of the coefficients $w_0,w_1,w_2,...,w_n$ that minimize the sum of squared errors between the predicted values and the actual values.
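To make the formula concrete, here is a minimal Python sketch that evaluates the linear model for a single example; the function name `predict` and the feature values and coefficients are illustrative assumptions, not taken from the text:

```python
# Minimal sketch: evaluate y = w0 + w1*x1 + ... + wn*xn for one example
# (the error term ε is omitted because it is unobserved at prediction time).

def predict(x, w0, w):
    """Linear-model prediction for one example.

    x  : feature values [x1, ..., xn]
    w0 : intercept coefficient
    w  : coefficients [w1, ..., wn]
    """
    y = w0
    for wi, xi in zip(w, x):
        y += wi * xi
    return y

# Two features with arbitrary coefficients: 0.5 + 3.0*1.0 + (-1.0)*2.0 = 1.5
print(predict([1.0, 2.0], w0=0.5, w=[3.0, -1.0]))
```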
The given formula can also be written as:
$$ y=W \cdot X+ε $$
where y is the output variable, W is the vector of coefficients ($W = [w_0, w_1, w_2, \dots, w_n]$), X is the vector of input features with a leading 1 for the intercept ($X = [1, x_1, x_2, \dots, x_n]$), and ε is the error term. The goal of linear regression is to estimate the coefficients W that minimize the sum of squared errors between the predicted values and the actual values.
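In code, the vectorized form reduces to a single dot product. The NumPy sketch below reuses the same illustrative coefficients as above, with the leading 1 in X pairing with $w_0$:

```python
import numpy as np

# Vectorized form y = W · X, with x0 = 1 absorbing the intercept.
W = np.array([0.5, 3.0, -1.0])   # [w0, w1, w2]
X = np.array([1.0, 1.0, 2.0])    # [1, x1, x2]

y_hat = np.dot(W, X)             # identical to the explicit sum: 1.5
print(y_hat)
```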
A cost function is used to evaluate the performance of a machine learning algorithm by measuring the difference between the predicted output and the actual output. The goal of the algorithm is to minimize the cost function by adjusting the model's parameters. By minimizing the cost function, the predicted values will become closer to the actual values, resulting in a more accurate model.
The squared-error cost function for a linear regression model is:
$$
J(\vec{w}, b) = \frac{1}{2m}\sum_{i=1}^{m}\left(f_{\vec{w}, b}(x^{(i)}) - y^{(i)}\right)^2
$$
Where $J(\vec{w}, b)$ is the cost function, $m$ is the number of training examples, $f_{\vec{w}, b}(x^{(i)})$ is the predicted output for the $i^{th}$ training example, $y^{(i)}$ is the actual output for the $i^{th}$ training example, and $(\vec{w}, b)$ are the model parameters.
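As an illustration, the cost function can be implemented in a few lines of NumPy. The helper name `compute_cost`, the single-feature model $f_{\vec{w}, b}(x) = wx + b$, and the toy data are assumptions made for this sketch:

```python
import numpy as np

def compute_cost(x, y, w, b):
    """Squared-error cost J(w, b) for single-feature linear regression.

    x, y : arrays holding the m training inputs and targets
    w, b : model parameters
    """
    m = x.shape[0]
    errors = (w * x + b) - y          # f_{w,b}(x^(i)) - y^(i) for every example
    return np.sum(errors ** 2) / (2 * m)

# Toy data lying exactly on y = 2x + 1, so the cost at (w=2, b=1) is 0.
x_train = np.array([1.0, 2.0, 3.0])
y_train = np.array([3.0, 5.0, 7.0])
print(compute_cost(x_train, y_train, w=2.0, b=1.0))
```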
Gradient descent is a popular optimization algorithm that is used to minimize the cost function in a machine learning model. The idea behind gradient descent is to iteratively adjust the model's parameters in the direction of the negative gradient of the cost function. By doing so, the algorithm can find the values of the parameters that minimize the cost function.
The general formula for gradient descent is:
$$ w_j = w_j - \alpha\frac{\partial J(\vec{w}, b)}{\partial w_j} $$
where $\alpha$ is the learning rate, which controls the size of each step. The bias term is updated in the same way, $b = b - \alpha\frac{\partial J(\vec{w}, b)}{\partial b}$, and all parameters are updated simultaneously at each iteration until the cost converges.
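Combining the update rule with the cost above, a minimal batch gradient-descent loop for single-feature linear regression could look like the sketch below; the learning rate, iteration count, and toy data are illustrative assumptions:

```python
import numpy as np

def gradient_descent(x, y, w, b, alpha, num_iters):
    """Batch gradient descent on the squared-error cost for f(x) = w*x + b."""
    m = x.shape[0]
    for _ in range(num_iters):
        errors = (w * x + b) - y         # f_{w,b}(x^(i)) - y^(i)
        dj_dw = np.sum(errors * x) / m   # ∂J/∂w
        dj_db = np.sum(errors) / m       # ∂J/∂b
        # Simultaneous update of both parameters.
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
    return w, b

# Toy data generated from y = 2x + 1; the fit should approach w ≈ 2, b ≈ 1.
x_train = np.array([1.0, 2.0, 3.0, 4.0])
y_train = np.array([3.0, 5.0, 7.0, 9.0])
print(gradient_descent(x_train, y_train, w=0.0, b=0.0, alpha=0.05, num_iters=5000))
```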