GPipe: Efficient training of giant neural networks using pipeline parallelism Y Huang, Y Cheng, A Bapna, O Firat, D Chen, M Chen, HJ Lee, J Ngiam, ... Advances in neural information processing systems 32, 2019 | 1685 | 2019 |
Gemini: a family of highly capable multimodal models G Team, R Anil, S Borgeaud, Y Wu, JB Alayrac, J Yu, R Soricut, ... arXiv preprint arXiv:2312.11805, 2023 | 1579 | 2023 |
The best of both worlds: Combining recent advances in neural machine translation MX Chen, O Firat, A Bapna, M Johnson, W Macherey, G Foster, L Jones, ... arXiv preprint arXiv:1804.09849, 2018 | 535 | 2018 |
Massively multilingual neural machine translation in the wild: Findings and challenges N Arivazhagan, A Bapna, O Firat, D Lepikhin, M Johnson, M Krikun, ... arXiv preprint arXiv:1907.05019, 2019 | 405 | 2019 |
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context M Reid, N Savinov, D Teplyashin, D Lepikhin, T Lillicrap, J Alayrac, ... arXiv preprint arXiv:2403.05530, 2024 | 394 | 2024 |
Gmail Smart Compose: Real-Time Assisted Writing MX Chen, BN Lee, G Bansal, Y Cao, S Zhang, J Lu, J Tsay, Y Wang, ... Proceedings of the 25th ACM SIGKDD International Conference on Knowledge …, 2019 | 238 | 2019 |
Lingvo: a modular and scalable framework for sequence-to-sequence modeling J Shen, P Nguyen, Y Wu, Z Chen, MX Chen, Y Jia, A Kannan, T Sainath, ... arXiv preprint arXiv:1902.08295, 2019 | 209 | 2019 |
Training deeper neural machine translation models with transparent attention A Bapna, MX Chen, O Firat, Y Cao, Y Wu arXiv preprint arXiv:1808.07561, 2018 | 124 | 2018 |
Leveraging monolingual data with self-supervision for multilingual neural machine translation A Siddhant, A Bapna, Y Cao, O Firat, M Chen, S Kudugunta, ... arXiv preprint arXiv:2005.04816, 2020 | 80 | 2020 |
Building machine translation systems for the next thousand languages A Bapna, I Caswell, J Kreutzer, O Firat, D van Esch, A Siddhant, M Niu, ... arXiv preprint arXiv:2205.03983, 2022 | 71 | 2022 |
Unsupervised deep Haar scattering on graphs X Chen, X Cheng, S Mallat Advances in Neural Information Processing Systems 27, 2014 | 64 | 2014 |
Predicting a user's next cell with supervised learning based on channel states X Chen, F Mériaux, S Valentin 2013 IEEE 14th workshop on signal processing advances in wireless …, 2013 | 56 | 2013 |
Deep Haar scattering networks X Cheng, X Chen, S Mallat Information and Inference: A Journal of the IMA 5 (2), 105-133, 2016 | 43 | 2016 |
Towards the next 1000 languages in multilingual machine translation: Exploring the synergy between supervised and self-supervised learning A Siddhant, A Bapna, O Firat, Y Cao, MX Chen, I Caswell, X Garcia arXiv preprint arXiv:2201.03110, 2022 | 30 | 2022 |
Music genre classification using multiscale scattering and sparse representations X Chen, PJ Ramadge 2013 47th Annual Conference on Information Sciences and Systems (CISS), 1-6, 2013 | 29 | 2013 |
Towards end-to-end in-image neural machine translation E Mansimov, M Stern, M Chen, O Firat, J Uszkoreit, P Jain arXiv preprint arXiv:2010.10648, 2020 | 22 | 2020 |
Faster transformer decoding: N-gram masked self-attention C Chelba, M Chen, A Bapna, N Shazeer arXiv preprint arXiv:2001.04589, 2020 | 17 | 2020 |
Collaborative representation, sparsity or nonlinearity: What is key to dictionary based classification? X Chen, PJ Ramadge 2014 IEEE International Conference on Acoustics, Speech and Signal …, 2014 | 13 | 2014 |
Rapid domain adaptation for machine translation with monolingual data M Mahdieh, MX Chen, Y Cao, O Firat arXiv preprint arXiv:2010.12652, 2020 | 8 | 2020 |
GPipe: Easy Scaling with Micro-Batch Pipeline Parallelism Y Huang, Y Cheng, A Bapna, O Firat, MX Chen, D Chen, HJ Lee, J Ngiam, ... proceeding of Computer Science> Computer Vision and Pattern Recognition, 2019 | 8 | 2019 |