He Yuxiong
Microsoft Research
Verified email at microsoft.com
Title
Cited by
Year
ZeRO: Memory optimizations toward training trillion parameter models
S Rajbhandari, J Rasley, O Ruwase, Y He
SC20: International Conference for High Performance Computing, Networking …, 2020
Cited by: 273 · Year: 2020
Graph query processing using plurality of engines
S Elnikety, Y He, S Sakr
US Patent 9,053,210, 2015
Cited by: 254 · Year: 2015
DeepSpeed: System optimizations enable training deep learning models with over 100 billion parameters
J Rasley, S Rajbhandari, O Ruwase, Y He
Proceedings of the 26th ACM SIGKDD International Conference on Knowledge …, 2020
Cited by: 175 · Year: 2020
Provably-efficient job scheduling for energy and fairness in geographically distributed data centers
S Ren, Y He, F Xu
2012 IEEE 32nd International Conference on Distributed Computing Systems, 22-31, 2012
Cited by: 147 · Year: 2012
Learning intrinsic sparse structures within long short-term memory
W Wen, Y He, S Rajbhandari, M Zhang, W Wang, F Liu, B Hu, Y Chen, ...
arXiv preprint arXiv:1709.05027, 2017
Cited by: 138 · Year: 2017
The Cilkview scalability analyzer
Y He, CE Leiserson, WM Leiserson
Proceedings of the twenty-second annual ACM symposium on Parallelism in …, 2010
Cited by: 135 · Year: 2010
Adaptive work-stealing with parallelism feedback
K Agrawal, CE Leiserson, Y He, WJ Hsu
ACM Transactions on Computer Systems (TOCS) 26 (3), 1-32, 2008
Cited by: 135 · Year: 2008
Few-to-many: Incremental parallelism for reducing tail latency in interactive services
ME Haque, YH Eom, Y He, S Elnikety, R Bianchini, KS McKinley
ACM SIGPLAN Notices 50 (4), 161-175, 2015
Cited by: 125 · Year: 2015
Predictive parallelization: Taming tail latencies in web search
M Jeon, S Kim, S Hwang, Y He, S Elnikety, AL Cox, S Rixner
Proceedings of the 37th international ACM SIGIR conference on Research …, 2014
Cited by: 110 · Year: 2014
Using DeepSpeed and Megatron to train Megatron-Turing NLG 530B, a large-scale generative language model
S Smith, M Patwary, B Norick, P LeGresley, S Rajbhandari, J Casper, ...
arXiv preprint arXiv:2201.11990, 2022
Cited by: 108 · Year: 2022
DeepCPU: Serving RNN-based Deep Learning Models 10x Faster
M Zhang, S Rajbhandari, W Wang, Y He
2018 USENIX Annual Technical Conference (USENIX ATC 18), 951-965, 2018
Cited by: 91 · Year: 2018
Swayam: Distributed autoscaling to meet SLAs of machine learning inference services with resource efficiency
A Gujarati, S Elnikety, Y He, KS McKinley, BB Brandenburg
Proceedings of the 18th ACM/IFIP/USENIX middleware conference, 109-120, 2017
Cited by: 90 · Year: 2017
Adaptive scheduling with parallelism feedback
K Agrawal, Y He, WJ Hsu, CE Leiserson
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice …, 2006
Cited by: 89 · Year: 2006
Performance modeling and scalability optimization of distributed deep learning systems
F Yan, O Ruwase, Y He, T Chilimbi
Proceedings of the 21th ACM SIGKDD International Conference on Knowledge …, 2015
Cited by: 87 · Year: 2015
Zeta: Scheduling interactive services with partial execution
Y He, S Elnikety, J Larus, C Yan
Proceedings of the Third ACM Symposium on Cloud Computing, 1-14, 2012
Cited by: 82 · Year: 2012
ZeRO-Infinity: Breaking the GPU memory wall for extreme scale deep learning
S Rajbhandari, O Ruwase, J Rasley, S Smith, Y He
Proceedings of the International Conference for High Performance Computing …, 2021
Cited by: 76 · Year: 2021
Mercury: A memory-constrained spatio-temporal real-time search on microblogs
A Magdy, MF Mokbel, S Elnikety, S Nath, Y He
2014 IEEE 30th International Conference on Data Engineering, 172-183, 2014
Cited by: 74 · Year: 2014
ZeRO-Offload: Democratizing Billion-Scale Model Training
J Ren, S Rajbhandari, RY Aminabadi, O Ruwase, S Yang, M Zhang, D Li, ...
2021 USENIX Annual Technical Conference (USENIX ATC 21), 551-564, 2021
Cited by: 72 · Year: 2021
G-SPARQL: a hybrid engine for querying large attributed graphs
S Sakr, S Elnikety, Y He
Proceedings of the 21st ACM international conference on Information and …, 2012
Cited by: 69 · Year: 2012
Adaptive parallelism for web search
M Jeon, Y He, S Elnikety, AL Cox, S Rixner
Proceedings of the 8th ACM European Conference on Computer Systems, 155-168, 2013
Cited by: 68 · Year: 2013