Deformable convolutional networks J Dai*, H Qi*, Y Xiong*, Y Li*, G Zhang*, H Hu, Y Wei (* indicates co-first author) International Conference on Computer Vision, 2017 | 6688 | 2017 |
Gemini: a family of highly capable multimodal models G Team, R Anil, S Borgeaud, JB Alayrac, J Yu, R Soricut, J Schalkwyk, ... arXiv preprint arXiv:2312.11805, 2023 | 1845 | 2023 |
Picking Winning Tickets Before Training by Preserving Gradient Flow C Wang, G Zhang, R Grosse International Conference on Learning Representations, 2020 | 671 | 2020 |
Benchmarking Model-Based Reinforcement Learning T Wang, X Bao, I Clavera, J Hoang, Y Wen, E Langlois, S Zhang, G Zhang, ... | 463 | 2019 |
Functional Variational Bayesian Neural Networks S Sun*, G Zhang*, J Shi*, R Grosse (* indicates co-first author) International Conference on Learning Representations, 2019 | 302 | 2019 |
Three Mechanisms of Weight Decay Regularization G Zhang, C Wang, B Xu, R Grosse International Conference on Learning Representations, 2019 | 294 | 2019 |
Noisy Natural Gradient as Variational Inference G Zhang*, S Sun*, D Duvenaud, R Grosse (* indicates co-first author) International Conference on Machine Learning, 2018 | 247 | 2018 |
Which algorithmic choices matter at which batch sizes? Insights from a noisy quadratic model G Zhang, L Li, Z Nado, J Martens, S Sachdeva, G Dahl, C Shallue, ... Advances in Neural Information Processing Systems, 2019 | 147 | 2019 |
Fast Convergence of Natural Gradient Descent for Overparameterized Neural Networks G Zhang, J Martens, R Grosse Advances in Neural Information Processing Systems, 2019 | 145 | 2019 |
EigenDamage: Structured Pruning in the Kronecker-Factored Eigenbasis C Wang, R Grosse, S Fidler, G Zhang International Conference on Machine Learning, 2019 | 129 | 2019 |
On Solving Minimax Optimization Locally: A Follow-the-Ridge Approach Y Wang*, G Zhang*, J Ba (* indicates co-first author) International Conference on Learning Representations, 2020 | 116 | 2020 |
Differentiable Compositional Kernel Learning for Gaussian Processes S Sun, G Zhang, C Wang, W Zeng, J Li, R Grosse International Conference on Machine Learning, 2018 | 90 | 2018 |
Near-optimal Local Convergence of Alternating Gradient Descent-Ascent for Minimax Optimization G Zhang, Y Wang, L Lessard, R Grosse International Conference on Artificial Intelligence and Statistics (AISTATS), 2022 | 61 | 2022 |
An empirical study of stochastic gradient descent with structured covariance noise Y Wen, K Luk, M Gazeau, G Zhang, H Chan, J Ba International Conference on Artificial Intelligence and Statistics, 3621-3631, 2020 | 55 | 2020 |
Differentiable Annealed Importance Sampling and the Perils of Gradient Noise G Zhang, K Hsu, J Li, C Finn, R Grosse Advances in Neural Information Processing Systems, 2021 | 34 | 2021 |
Deep transformers without shortcuts: Modifying self-attention for faithful signal propagation B He, J Martens, G Zhang, A Botev, A Brock, SL Smith, YW Teh arXiv preprint arXiv:2302.10322, 2023 | 32 | 2023 |
A Unified Analysis of First-Order Methods for Smooth Games via Integral Quadratic Constraints G Zhang, X Bao, L Lessard, R Grosse Journal of Machine Learning Research, 2021 | 32 | 2021 |
Deep Learning without Shortcuts: Shaping the Kernel with Tailored Rectifiers G Zhang, A Botev, J Martens International Conference on Learning Representations, 2022 | 29 | 2022 |
Eigenvalue Corrected Noisy Natural Gradient J Bae, G Zhang, R Grosse Neural Information Processing Systems (Bayesian Deep Learning Workshop), 2018 | 24 | 2018 |
On the suboptimality of negative momentum for minimax optimization G Zhang, Y Wang International Conference on Artificial Intelligence and Statistics, 2021 | 22 | 2021 |