An in-depth analysis and derivation of extremum conditions based on gradient information

Abstract
This paper explores the necessary conditions for extrema in both constrained and unconstrained problems by extracting the fundamental principles behind constraint conditions. We provide a precise geometric understanding of the Lagrange multiplier, prioritizing analytical insight. Beginning with a geometric interpretation of the gradient, we use the expansion of functions and their images to understand extrema and detail the derivation of the Lagrangian. We expand the basis vectors of the constraint surface into those of the full space and use a transition matrix to assess the function's extrema. This demonstrates how the second-derivative matrix is transformed into its full-space representation to identify extrema in optimization problems. Additionally, we introduce incremental variables to simplify the second-order derivative matrix in the full space, providing a novel perspective on the necessary conditions for extrema.


Introduction
Since the introduction of Dantzig's simplex method in 1947, optimization has undergone significant development. Novel techniques continue to emerge, and practical applications are constantly evolving, particularly with the growth of internet software technology. Optimization problems now play a pivotal role in Machine Learning and Deep Learning. The primary goal of solving an optimization problem is to identify the optimal solution of the objective function within its associated constraints. In nonlinear programming, for example, we must examine the gradient of the objective function to locate extrema. If the optimization problem is constrained, we must identify the extrema of the objective function within the predefined constraints. Thus, finding the extrema of a function plays a critical role in solving optimization problems.
Despite the abundance of algorithms and their varied applications, many learners struggle with optimization problems because they do not grasp the essence of constraints and extrema. In addition, the necessary conditions for extrema and their derivation are often not fully understood. Our paper addresses these issues by examining the derivation of the Lagrange multiplier method as a case study and conducting a thorough examination of the conditions for extrema.
The Lagrange multiplier method determines local extrema of a multivariable function subject to constraints by transforming an optimization problem with d variables and k constraints into a system of equations in d + k variables. Our paper provides a comprehensive analysis and mathematical derivation of the geometric interpretation of gradients, the expansion of multivariate functions, extremum conditions, and constrained optimization problems. Approaching the problem from a differential perspective, we derive the necessary conditions for extrema of multivariate functions, drawing on quadratic forms and the Hessian matrix. Furthermore, through the derivation of the Lagrangian, we present a comparative analysis of the necessary conditions for extrema in unconstrained and constrained problems, advocating an understanding of the essence of the Lagrange multiplier from the perspectives of both the full space and the constraint surface.

Geometric significance of the gradient
The Hamilton operator $\nabla$, referred to as the nabla operator, is defined as an operator in vector calculus:

$$\nabla = \left(\frac{\partial}{\partial x_1}, \frac{\partial}{\partial x_2}, \cdots, \frac{\partial}{\partial x_d}\right)$$

It represents the vector aggregate of the partial derivatives of a physical quantity along the coordinate directions.
We call the vector $\nabla f = \left(\frac{\partial f}{\partial x_1}, \cdots, \frac{\partial f}{\partial x_d}\right)$ the gradient of the function $f$. We observe that

$$\mathrm{d}f = \nabla f \cdot \mathrm{d}\boldsymbol{x}, \qquad \mathrm{d}\boldsymbol{x} = (\mathrm{d}x_1, \mathrm{d}x_2, \cdots, \mathrm{d}x_d) \quad (3)$$

The gradient extends the concept of the derivative to functions of multiple variables. Just as the derivative measures the rate of change of a single-variable function, the gradient is a vector of the partial derivatives of a multivariable function, representing its rate of change in each variable. It provides a comprehensive description of how the function changes in every direction of its input space, analogous to the role of the derivative in single-variable calculus, which captures instantaneous change at a specific point.
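The relation $\mathrm{d}f = \nabla f \cdot \mathrm{d}\boldsymbol{x}$ can be checked numerically. The sketch below uses an illustrative function of our own choosing (not from the paper) and compares the actual change of $f$ under a small displacement with the first-order estimate given by the gradient:

```python
# Example function f(x, y) = x**2 + 3*y (an illustrative choice, not from the paper).
def f(x, y):
    return x**2 + 3*y

def grad_f(x, y):
    # Analytic gradient: (df/dx, df/dy) = (2x, 3)
    return (2*x, 3.0)

x0, y0 = 1.0, 2.0
dx, dy = 1e-6, -2e-6          # a small displacement vector dx
gx, gy = grad_f(x0, y0)

df_exact = f(x0 + dx, y0 + dy) - f(x0, y0)   # actual change of f
df_linear = gx*dx + gy*dy                    # first-order estimate: grad f . dx

print(df_exact, df_linear)
```

For displacements of this size the two values agree to many decimal places, which is exactly what Eq. (3) asserts to first order.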
By the Cauchy–Schwarz inequality,

$$|\nabla f \cdot \mathrm{d}\boldsymbol{x}| \le \|\nabla f\|\,\|\mathrm{d}\boldsymbol{x}\|$$

We define the cosine of the angle $\theta$ between $\nabla f$ and $\mathrm{d}\boldsymbol{x}$ to be

$$\cos\theta = \frac{\nabla f \cdot \mathrm{d}\boldsymbol{x}}{\|\nabla f\|\,\|\mathrm{d}\boldsymbol{x}\|} \quad (6)$$

Equation (6) reveals several properties of gradients. When the modulus $\|\mathrm{d}\boldsymbol{x}\| = \sqrt{\sum_i \mathrm{d}x_i^2}$ is held constant, a smaller angle between $\mathrm{d}\boldsymbol{x}$ and $\nabla f$ corresponds to a larger value of $\mathrm{d}f$. This means that the function increases most rapidly in the direction of the gradient and decreases most rapidly in the opposite direction. Moreover, the rate of change is zero in directions perpendicular to the gradient.
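The steepest-ascent property can be verified empirically. The sketch below (an illustrative example of ours, not from the paper) samples many unit directions and confirms that none increases the function faster than the normalized gradient direction:

```python
import math
import random

def f(x, y):
    return x**2 + y**2

def grad_f(x, y):
    return (2*x, 2*y)

x0, y0 = 1.0, 2.0
gx, gy = grad_f(x0, y0)
gnorm = math.hypot(gx, gy)
unit_grad = (gx/gnorm, gy/gnorm)   # direction of steepest ascent

eps = 1e-6
def directional_change(ux, uy):
    # Change of f along a small step in the unit direction (ux, uy)
    return f(x0 + eps*ux, y0 + eps*uy) - f(x0, y0)

best = directional_change(*unit_grad)
# Sample random unit directions: none should beat the gradient direction.
random.seed(0)
for _ in range(100):
    t = random.uniform(0, 2*math.pi)
    assert directional_change(math.cos(t), math.sin(t)) <= best + 1e-12
print("gradient direction gives the largest increase")
```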
For a multivariate function $f: \mathbb{R}^n \to \mathbb{R}$, if we want further information about the second-order behavior, we must differentiate the function twice. From this, we can construct the Hessian matrix by assembling the second-order partial derivatives with respect to each pair of independent variables:

$$H(f) = \left[\frac{\partial^2 f}{\partial x_i \partial x_j}\right]_{i,j=1}^{n}$$

The Hessian matrix contains details about how the gradient changes across each independent variable and serves as an extension of the gradient in terms of depth, i.e., higher-order differentiation.
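A Hessian can be approximated by central finite differences, which is handy for checking analytic second derivatives. A minimal sketch, using a test function of our own choosing:

```python
# Finite-difference Hessian for f(x, y) = x**3 + x*y + y**2 (illustrative).
def f(v):
    x, y = v
    return x**3 + x*y + y**2

def hessian(func, v, h=1e-4):
    """Approximate the Hessian of func at point v by central differences."""
    n = len(v)
    H = [[0.0]*n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            vpp = list(v); vpp[i] += h; vpp[j] += h
            vpm = list(v); vpm[i] += h; vpm[j] -= h
            vmp = list(v); vmp[i] -= h; vmp[j] += h
            vmm = list(v); vmm[i] -= h; vmm[j] -= h
            H[i][j] = (func(vpp) - func(vpm) - func(vmp) + func(vmm)) / (4*h*h)
    return H

H = hessian(f, [1.0, 2.0])
print(H)   # analytic Hessian at (1, 2): [[6, 1], [1, 2]]
```

Note the matrix is symmetric, as the mixed partials of a twice continuously differentiable function coincide.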
Assume $\boldsymbol{x}_0$ is a local extremum and there are no constraints. Then the first derivative test implies that the derivative of $f(\boldsymbol{x})$ at this point in any direction must be zero; in other words, the directional derivative of $f(\boldsymbol{x})$ at $\boldsymbol{x}_0$ in any direction $\boldsymbol{u}$ is zero. Mathematically,

$$\nabla f(\boldsymbol{x}_0) \cdot \boldsymbol{u} = 0 \ \text{ for all } \boldsymbol{u}, \qquad \text{hence} \quad \nabla f(\boldsymbol{x}_0) = \boldsymbol{0} \quad (8)$$

This condition indicates that the gradient is orthogonal to all vectors in the domain, so it must be the zero vector. If the gradient were not zero, there would exist a direction in which the function is increasing or decreasing, contradicting the assumption that $\boldsymbol{x}_0$ is a local extremum.
When $\nabla f(\boldsymbol{x}_0) = \boldsymbol{0}$, in order to determine whether the function actually attains an extremum there, we must conduct a second derivative test.
By Taylor expansion, for sufficiently small $\varepsilon > 0$ and any nonzero vector $\boldsymbol{u}$, we have

$$f(\boldsymbol{x}_0 + \varepsilon\boldsymbol{u}) = f(\boldsymbol{x}_0) + \varepsilon\,\nabla f(\boldsymbol{x}_0)^{\mathsf T}\boldsymbol{u} + \frac{\varepsilon^2}{2}\,\boldsymbol{u}^{\mathsf T} H(\boldsymbol{x}_0)\,\boldsymbol{u} + o(\varepsilon^2)$$

where $\boldsymbol{x}_0$ is the expansion point, $\nabla f(\boldsymbol{x}_0) = \boldsymbol{0}$ is the gradient, and $H(\boldsymbol{x}_0)$ is the Hessian matrix. The sign of $f(\boldsymbol{x}_0 + \varepsilon\boldsymbol{u}) - f(\boldsymbol{x}_0)$ is therefore governed by the quadratic form $\boldsymbol{u}^{\mathsf T} H(\boldsymbol{x}_0)\,\boldsymbol{u}$, i.e., by the definiteness of the Hessian.
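Since the definiteness of a symmetric matrix is determined by the signs of its eigenvalues, a critical point can be classified by inspecting them. A small sketch for the 2x2 case, where the eigenvalues have a closed form via trace and determinant (the function and point are illustrative, not from the paper):

```python
import math

# Classify the critical point of f(x, y) = x**2 + 4*y**2 at (0, 0) using the
# 2x2 Hessian's eigenvalues (closed form from trace and determinant).
H = [[2.0, 0.0], [0.0, 8.0]]   # Hessian of f at the critical point

tr = H[0][0] + H[1][1]
det = H[0][0]*H[1][1] - H[0][1]*H[1][0]
disc = math.sqrt(tr*tr - 4*det)        # discriminant of the characteristic polynomial
eig = ((tr - disc)/2, (tr + disc)/2)   # eigenvalues in ascending order

if eig[0] > 0:
    kind = "local minimum (positive definite)"
elif eig[1] < 0:
    kind = "local maximum (negative definite)"
elif eig[0] < 0 < eig[1]:
    kind = "saddle point (indefinite)"
else:
    kind = "inconclusive (semidefinite)"
print(eig, kind)
```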
If there are constraint conditions, the independent variable can only move within the constraint space. To attain an extremum under such constraints, it is necessary that the rate of change of the function be zero in every feasible direction of movement.
Provided the constraint conditions

$$g_i(x_1, x_2, \cdots, x_d) = 0, \qquad i = 1, 2, \cdots, m \quad (12)$$

and since the vectors perpendicular to an isosurface are its gradient vectors, the vectors perpendicular to each constraint surface in Eq. (12) are $\nabla g_i$. The final constraint space is the intersection of these $m$ constraint surfaces, and since the variables can only move within this space, i.e., perpendicular to every gradient $\nabla g_i$, at an extremum the gradient of $f$ must be a linear superposition of the gradients of the constraint surfaces:

$$\nabla f(\boldsymbol{x}) + \sum_{i=1}^{m} \lambda_i \nabla g_i(\boldsymbol{x}) = \boldsymbol{0}$$

This system of equations can then be reformulated as the stationarity conditions of the Lagrangian $L(\boldsymbol{x}, \boldsymbol{\lambda}) = f(\boldsymbol{x}) + \sum_{i=1}^{m} \lambda_i g_i(\boldsymbol{x})$:

$$\nabla_{\boldsymbol{x}} L = \boldsymbol{0}, \qquad \nabla_{\boldsymbol{\lambda}} L = \boldsymbol{0} \quad (16)$$
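For a concrete worked example (ours, not from the paper): minimizing $f(x, y) = x^2 + y^2$ subject to $g(x, y) = x + y - 1 = 0$, the stationarity conditions form a linear system that a small solver can handle directly:

```python
# Lagrange stationarity system for: minimize f(x, y) = x**2 + y**2
# subject to g(x, y) = x + y - 1 = 0 (illustrative example).
# Stationarity of L(x, y, l) = f + l*g gives: 2x + l = 0, 2y + l = 0, x + y = 1.

def solve(A, b):
    """Tiny Gaussian elimination with partial pivoting for small dense systems."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

A = [[2.0, 0.0, 1.0],   # 2x      + l = 0
     [0.0, 2.0, 1.0],   #     2y  + l = 0
     [1.0, 1.0, 0.0]]   #  x + y      = 1
b = [0.0, 0.0, 1.0]
x, y, lam = solve(A, b)
print(x, y, lam)   # expect 0.5, 0.5, -1.0
```

The solution $(x, y) = (1/2, 1/2)$ with multiplier $\lambda = -1$ is the point of the line $x + y = 1$ closest to the origin, as the geometry suggests.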

Novel perspective of the necessary conditions for second-order derivatives
We have previously discussed the distinctions between unconstrained and constrained conditions, as well as the second derivative tests conducted to examine the sufficiency of extremum conditions. Now let us address second-order derivative problems in the context of constrained conditions [3]. According to our understanding of equations (8) and (16), we need to perform a second-order derivative test. If the whole space were available, the necessary condition for the function to attain a minimum at a point where the gradient of the Lagrangian vanishes would be that the matrix

$$H = \nabla^2 f(\boldsymbol{x}) + \sum_{i=1}^{m} \lambda_i \nabla^2 g_i(\boldsymbol{x})$$

be positive semidefinite. However, given that the independent variable $\boldsymbol{x}$ can only move within the final constraint surface, it is sufficient to satisfy the positive definiteness or positive semidefiniteness of $H$ on that surface's tangent space. Based on the KKT conditions of constrained optimization problems and convex optimization theory, we can draw the following conclusion: if $\boldsymbol{x}^*$ is a local minimum point of a convex optimization problem, then there exists a multiplier vector $\boldsymbol{\lambda}^*$ such that $\nabla_{\boldsymbol{x}} L(\boldsymbol{x}^*, \boldsymbol{\lambda}^*) = \boldsymbol{0}$ [4]. Since the second-order derivative matrix in the constraint space is obtained from the full-space matrix $H$ by a transformation of the form $H' = P^{\mathsf T} H P$, where the columns of the transition matrix $P$ extend a basis of the constraint surface to a basis of the full space, the extremal case of the optimization problem can be determined through $H'$. Going one step further and introducing incremental variables, in the full space $H'$ can be brought into the block form [6]

$$H' = \begin{pmatrix} A & B \\ B^{\mathsf T} & C \end{pmatrix}$$

where $A$ is a symmetric matrix of order $(d-m) \times (d-m)$, $B$ is a matrix of order $(d-m) \times m$, and $C$ is a matrix of order $m \times m$. The necessary conditions for obtaining the extrema only require $A$ to be positive semidefinite or negative semidefinite.
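Continuing the earlier worked example (minimize $x^2 + y^2$ subject to $x + y = 1$, our illustration rather than the paper's), the tangent-space restriction of the Lagrangian's Hessian can be computed explicitly. Here $d = 2$, $m = 1$, so the reduced matrix is $1 \times 1$:

```python
import math

# Reduced (tangent-space) Hessian check for: min x**2 + y**2 s.t. x + y = 1.
# At the stationary point, the Hessian of the Lagrangian is
# H = grad^2 f + lam * grad^2 g = [[2, 0], [0, 2]]  (g is linear, so grad^2 g = 0).
H = [[2.0, 0.0], [0.0, 2.0]]

# The constraint gradient is grad g = (1, 1); a basis Z of the tangent space
# is any vector orthogonal to it, e.g. z = (1, -1)/sqrt(2).
z = (1/math.sqrt(2), -1/math.sqrt(2))

# Reduced Hessian H' = z^T H z (a 1x1 matrix here, since d - m = 1).
Hz = [H[0][0]*z[0] + H[0][1]*z[1],
      H[1][0]*z[0] + H[1][1]*z[1]]
reduced = z[0]*Hz[0] + z[1]*Hz[1]

print(reduced)   # positive -> positive definite on the tangent space -> constrained minimum
```

The reduced Hessian is positive, so the stationary point found earlier is indeed a constrained minimum, even though no full-space definiteness argument was needed.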

Conclusion
To recapitulate, the second-order necessary condition for the constrained optimization problem, were the whole space available, would be that the matrix $\nabla^2 f(\boldsymbol{x}) + \sum_{i=1}^{m} \lambda_i \nabla^2 g_i(\boldsymbol{x})$ be positive semidefinite at the critical point where the gradient of the Lagrangian is zero. However, because the variables are constrained to the final constraint surface, it is sufficient to have positive definiteness or positive semidefiniteness of $\nabla^2 f(\boldsymbol{x}) + \sum_{i=1}^{m} \lambda_i \nabla^2 g_i(\boldsymbol{x})$ within the linear subspace tangent to the final constraint surface. The Lagrange function is used to convert a constrained problem into an unconstrained problem, and constructing the Lagrange function can be viewed as one step of expanding the basis vectors of the constraint surface to the basis vectors of the full space. By introducing the transition matrix $P$, we can extend the basis vectors of the constraint surface to a complete set of basis vectors of the whole space and judge the extremum via the matrix $H'$.
In retrospect, whether dealing with constrained or unconstrained extremum problems, we can approach them from a geometric perspective. By establishing a linear space based on the given constraints and interpreting gradients in geometric terms, we can simplify complex extremum problems.