Samyam Rajbhandari
Microsoft Artificial Intelligence and Research, Ohio State University
Title · Cited by · Year
BLOOM: A 176B-parameter open-access multilingual language model
T Le Scao, A Fan, C Akiki, E Pavlick, S Ilić, D Hesslow, R Castagné, ...
Cited by 1130 · 2023
ZeRO: Memory optimizations toward training trillion parameter models
S Rajbhandari, J Rasley, O Ruwase, Y He
SC20: International Conference for High Performance Computing, Networking …, 2020
Cited by 759 · 2020
DeepSpeed: System optimizations enable training deep learning models with over 100 billion parameters
J Rasley, S Rajbhandari, O Ruwase, Y He
Proceedings of the 26th ACM SIGKDD International Conference on Knowledge …, 2020
Cited by 661 · 2020
Using DeepSpeed and Megatron to train Megatron-Turing NLG 530B, a large-scale generative language model
S Smith, M Patwary, B Norick, P LeGresley, S Rajbhandari, J Casper, ...
arXiv preprint arXiv:2201.11990, 2022
Cited by 487 · 2022
ZeRO-Offload: Democratizing billion-scale model training
J Ren, S Rajbhandari, RY Aminabadi, O Ruwase, S Yang, M Zhang, D Li, ...
2021 USENIX Annual Technical Conference (USENIX ATC 21), 551-564, 2021
Cited by 227 · 2021
ZeRO-Infinity: Breaking the GPU memory wall for extreme scale deep learning
S Rajbhandari, O Ruwase, J Rasley, S Smith, Y He
Proceedings of the international conference for high performance computing …, 2021
Cited by 202 · 2021
Learning intrinsic sparse structures within long short-term memory
W Wen, Y He, S Rajbhandari, M Zhang, W Wang, F Liu, B Hu, Y Chen, ...
arXiv preprint arXiv:1709.05027, 2017
Cited by 150 · 2017
DeepSpeed-MoE: Advancing mixture-of-experts inference and training to power next-generation AI scale
S Rajbhandari, C Li, Z Yao, M Zhang, RY Aminabadi, AA Awan, J Rasley, ...
International conference on machine learning, 18332-18346, 2022
Cited by 140 · 2022
DeepSpeed-Inference: Enabling efficient inference of transformer models at unprecedented scale
RY Aminabadi, S Rajbhandari, AA Awan, C Li, D Li, E Zheng, O Ruwase, ...
SC22: International Conference for High Performance Computing, Networking …, 2022
Cited by 121 · 2022
Using DeepSpeed and Megatron to train Megatron-Turing NLG 530B, a large-scale generative language model
S Smith, M Patwary, B Norick, P LeGresley, S Rajbhandari, J Casper, ..., R Child, RY Aminabadi, J Bernauer, X Song, M Shoeybi, Y He, M Houston, S Tiwary, B Catanzaro
arXiv preprint arXiv:2201.11990, 2022
Cited by 119 · 2022
DeepCPU: Serving RNN-based deep learning models 10x faster
M Zhang, S Rajbhandari, W Wang, Y He
2018 USENIX Annual Technical Conference (USENIX ATC 18), 951-965, 2018
Cited by 114 · 2018
1-bit Adam: Communication efficient large-scale training with Adam’s convergence speed
H Tang, S Gan, AA Awan, S Rajbhandari, C Li, X Lian, J Liu, C Zhang, ...
International Conference on Machine Learning, 10118-10129, 2021
Cited by 69 · 2021
Scalable and efficient MoE training for multitask multilingual models
YJ Kim, AA Awan, A Muzio, AFC Salinas, L Lu, A Hendy, S Rajbhandari, ...
arXiv preprint arXiv:2109.10465, 2021
Cited by 58 · 2021
Neural network training performance optimization framework
TA Chilimbi, O Ruwase, S Rajbhandari, M Carbin, Y He
US Patent App. 14/986,186, 2017
Cited by 39 · 2017
A communication-optimal framework for contracting distributed tensors
S Rajbhandari, A Nikam, PW Lai, K Stock, S Krishnamoorthy, ...
SC'14: Proceedings of the International Conference for High Performance …, 2014
Cited by 35 · 2014
Optimizing CNNs on multicores for scalability, performance and goodput
S Rajbhandari, Y He, O Ruwase, M Carbin, T Chilimbi
ACM SIGARCH Computer Architecture News 45 (1), 267-280, 2017
Cited by 30 · 2017
A framework for load balancing of tensor contraction expressions via dynamic task partitioning
PW Lai, K Stock, S Rajbhandari, S Krishnamoorthy, P Sadayappan
Proceedings of the International Conference on High Performance Computing …, 2013
Cited by 29 · 2013
On fusing recursive traversals of k-d trees
S Rajbhandari, J Kim, S Krishnamoorthy, LN Pouchet, F Rastello, ...
Proceedings of the 25th International Conference on Compiler Construction …, 2016
Cited by 28 · 2016
1-bit LAMB: Communication efficient large-scale large-batch training with LAMB’s convergence speed
C Li, AA Awan, H Tang, S Rajbhandari, Y He
2022 IEEE 29th International Conference on High Performance Computing, Data …, 2022
Cited by 26 · 2022
DeepSpeed-Chat: Easy, fast and affordable RLHF training of ChatGPT-like models at all scales
Z Yao, RY Aminabadi, O Ruwase, S Rajbhandari, X Wu, AA Awan, ...
arXiv preprint arXiv:2308.01320, 2023
Cited by 25 · 2023
Articles 1–20