A brief roundup of papers on the first-order machine-learning optimization methods ADMM, Coordinate Descent, and Gradient Descent.
ADMM
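For orientation, a minimal sketch of scaled-form ADMM applied to the lasso problem; the splitting, `rho`, and iteration count are illustrative choices of ours, not taken from any specific paper:

```python
import numpy as np

def admm_lasso(A, b, lam, rho=1.0, n_iter=100):
    """Scaled-form ADMM for min 0.5*||Ax - b||^2 + lam*||z||_1, s.t. x = z.

    rho and n_iter are illustrative defaults, not tuned values.
    """
    n = A.shape[1]
    x = np.zeros(n)
    z = np.zeros(n)
    u = np.zeros(n)  # scaled dual variable
    # The x-update solves a fixed ridge-like system, so cache it once.
    AtA = A.T @ A + rho * np.eye(n)
    Atb = A.T @ b
    for _ in range(n_iter):
        # x-update: least-squares solve with the quadratic penalty
        x = np.linalg.solve(AtA, Atb + rho * (z - u))
        # z-update: soft-thresholding, the prox of the l1 norm
        v = x + u
        z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)
        # dual update on the constraint residual x - z
        u = u + x - z
    return z
```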
Coordinate Descent
parallel
- Parallel Coordinate Descent for L1-Regularized Loss Minimization
- Parallel Dual Coordinate Descent Method for Large-scale Linear Classification in Multi-core Environments (parallel SVM)
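As background for the parallel variants above, a serial cyclic coordinate descent sketch for l1-regularized least squares; the function names and residual bookkeeping are ours:

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def cd_lasso(A, b, lam, n_epochs=50):
    """Cyclic coordinate descent for min 0.5*||Ax - b||^2 + lam*||x||_1.

    Each step minimizes the objective exactly in one coordinate while
    holding the others fixed; the parallel methods above update many
    coordinates at once.
    """
    n = A.shape[1]
    x = np.zeros(n)
    col_sq = (A ** 2).sum(axis=0)  # per-coordinate curvature ||A_j||^2
    r = b - A @ x                  # residual, maintained incrementally
    for _ in range(n_epochs):
        for j in range(n):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual
            # (residual with coordinate j's contribution added back).
            rho_j = A[:, j] @ r + col_sq[j] * x[j]
            x_new = soft_threshold(rho_j, lam) / col_sq[j]
            r += A[:, j] * (x[j] - x_new)  # keep residual in sync
            x[j] = x_new
    return x
```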
Gradient Descent
variants
- Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
- ADADELTA: An Adaptive Learning Rate Method
- Adam: A Method for Stochastic Optimization
- RMSprop (lecture slides)
- Unit Tests for Stochastic Optimization
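A minimal sketch of the Adam update rule; the hyperparameter defaults follow the Adam paper, and with `beta1 = 0` the step reduces to an RMSprop-style per-parameter learning rate. The toy usage at the end is purely illustrative:

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update (defaults from the Adam paper).

    m, v are running first/second moment estimates; t is the 1-based
    step count used for bias correction.
    """
    m = beta1 * m + (1 - beta1) * grad        # first moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Toy usage: minimize f(x) = ||x||^2, whose gradient is 2x.
x = np.array([1.0, -2.0])
m = np.zeros_like(x)
v = np.zeros_like(x)
for t in range(1, 501):
    x, m, v = adam_step(x, 2 * x, m, v, t, lr=0.05)
```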
parallel
- Hogwild!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent
- Delay-Tolerant Algorithms for Asynchronous Distributed Online Learning
- Large Scale Distributed Deep Networks (Downpour SGD)
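A toy illustration of the Hogwild! idea: worker threads apply SGD updates to a shared weight vector with no locking. Under CPython's GIL this only mimics the lock-free access pattern (the real algorithm targets sparse updates on multicore hardware), and all names here are ours:

```python
import threading
import numpy as np

def hogwild_sgd(X, y, n_threads=4, lr=0.01, n_epochs=5):
    """Toy Hogwild!-style SGD for least squares.

    Threads update the shared weight vector w in place with no
    synchronization, so their writes may race, which is the point
    of the lock-free scheme.
    """
    n, d = X.shape
    w = np.zeros(d)  # shared state, written without locks

    def worker(rows):
        for _ in range(n_epochs):
            for i in rows:
                # Plain SGD step on one example; may race with
                # concurrent updates from other threads.
                g = (X[i] @ w - y[i]) * X[i]
                w[:] = w - lr * g  # unsynchronized shared write

    chunks = np.array_split(np.arange(n), n_threads)
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return w
```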