A convergence theory for deep learning via over-parameterization Z Allen-Zhu, Y Li, Z Song International Conference on Machine Learning, 242-252, 2019 | 1180 | 2019 |
Learning and generalization in overparameterized neural networks, going beyond two layers Z Allen-Zhu, Y Li, Y Liang Advances in neural information processing systems 32, 2019 | 674 | 2019 |
Convergence analysis of two-layer neural networks with ReLU activation Y Li, Y Yuan Advances in neural information processing systems 30, 2017 | 637 | 2017 |
Learning overparameterized neural networks via stochastic gradient descent on structured data Y Li, Y Liang Advances in neural information processing systems 31, 2018 | 586 | 2018 |
A theoretical analysis of NDCG type ranking measures Y Wang, L Wang, Y Li, D He, TY Liu Conference on learning theory, 25-54, 2013 | 561 | 2013 |
A latent variable model approach to PMI-based word embeddings S Arora, Y Li, Y Liang, T Ma, A Risteski Transactions of the Association for Computational Linguistics 4, 385-399, 2016 | 513* | 2016 |
LoRA: Low-rank adaptation of large language models EJ Hu, Y Shen, P Wallis, Z Allen-Zhu, Y Li, S Wang, L Wang, W Chen arXiv preprint arXiv:2106.09685, 2021 | 370 | 2021 |
An alternative view: When does SGD escape local minima? B Kleinberg, Y Li, Y Yuan International conference on machine learning, 2698-2707, 2018 | 270 | 2018 |
Algorithmic regularization in over-parameterized matrix sensing and neural networks with quadratic activations Y Li, T Ma, H Zhang Conference on Learning Theory, 2-47, 2018 | 257 | 2018 |
Towards explaining the regularization effect of initial large learning rate in training neural networks Y Li, C Wei, T Ma Advances in Neural Information Processing Systems 32, 2019 | 226 | 2019 |
Linear algebraic structure of word senses, with applications to polysemy S Arora, Y Li, Y Liang, T Ma, A Risteski Transactions of the Association for Computational Linguistics 6, 483-495, 2018 | 217 | 2018 |
Towards understanding ensemble, knowledge distillation and self-distillation in deep learning Z Allen-Zhu, Y Li arXiv preprint arXiv:2012.09816, 2020 | 190 | 2020 |
Algorithmic framework for model-based deep reinforcement learning with theoretical guarantees Y Luo, H Xu, Y Li, Y Tian, T Darrell, T Ma arXiv preprint arXiv:1807.03858, 2018 | 189 | 2018 |
Sparks of artificial general intelligence: Early experiments with GPT-4 S Bubeck, V Chandrasekaran, R Eldan, J Gehrke, E Horvitz, E Kamar, ... arXiv preprint arXiv:2303.12712, 2023 | 185 | 2023 |
On the convergence rate of training recurrent neural networks Z Allen-Zhu, Y Li, Z Song Advances in neural information processing systems 32, 2019 | 161 | 2019 |
What can ResNet learn efficiently, going beyond kernels? Z Allen-Zhu, Y Li Advances in Neural Information Processing Systems 32, 2019 | 156 | 2019 |
Neon2: Finding local minima via first-order oracles Z Allen-Zhu, Y Li Advances in Neural Information Processing Systems 31, 2018 | 132 | 2018 |
LazySVD: Even faster SVD decomposition yet without agonizing pain Z Allen-Zhu, Y Li Advances in neural information processing systems 29, 2016 | 124 | 2016 |
Much faster algorithms for matrix scaling Z Allen-Zhu, Y Li, R Oliveira, A Wigderson 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS …, 2017 | 112 | 2017 |
Backward feature correction: How deep learning performs deep learning Z Allen-Zhu, Y Li arXiv preprint arXiv:2001.04413, 2020 | 104 | 2020 |