The Stochastic Gradient Method is the most commonly used method for training neural networks. This talk will provide the formal basis underlying the definition of the gradient as the direction of steepest ascent. Very general and natural properties of the resulting differential equations, the so-called gradient systems, will be outlined. In the context of learning, the gradient method is used to minimize a loss function. However, since learning is data-driven, a stochastic version of the gradient method has to be applied, which leads to the Stochastic Gradient Method, the main subject of this talk.
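
For orientation, a minimal sketch of the objects mentioned above; the notation is an assumption on my part, not the speaker's. For a differentiable loss $f:\mathbb{R}^n \to \mathbb{R}$, the associated gradient system is the ordinary differential equation

\[
\dot{x}(t) = -\nabla f(x(t)),
\]

along whose solutions $f$ decreases, since $\tfrac{d}{dt}\, f(x(t)) = -\|\nabla f(x(t))\|^2 \le 0$. When the loss is a data-driven average of per-sample terms, $f(x) = \tfrac{1}{N}\sum_{i=1}^{N} f_i(x)$, the Stochastic Gradient Method replaces the full gradient in each step by the gradient of a randomly sampled term:

\[
x_{k+1} = x_k - \eta_k \, \nabla f_{i_k}(x_k), \qquad i_k \sim \mathrm{Uniform}\{1,\dots,N\},
\]

with step sizes $\eta_k > 0$. The sampled gradient is an unbiased estimate of the full gradient, which is what makes the stochastic version a faithful surrogate for the deterministic method.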