Yi Zhu
Microsoft Research Asia
Verified email at microsoft.com
Title
Cited by
Year
You only cache once: Decoder-decoder architectures for language models
Y Sun, L Dong, Y Zhu, S Huang, W Wang, S Ma, Q Zhang, J Wang, F Wei
arXiv preprint arXiv:2405.05254, 2024
Cited by: 45; Year: 2024
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
X Guan, LL Zhang, Y Liu, N Shang, Y Sun, Y Zhu, F Yang, M Yang
arXiv preprint arXiv:2501.04519, 2025
Cited by: 33; Year: 2025
Differential Transformer
T Ye, L Dong, Y Xia, Y Sun, Y Zhu, G Huang, F Wei
arXiv preprint arXiv:2410.05258, 2024
Cited by: 32; Year: 2024
nnScaler: Constraint-Guided Parallelization Plan Generation for Deep Learning Training
Z Lin, Y Miao, Q Zhang, F Yang, Y Zhu, C Li, S Maleki, X Cao, N Shang, ...
18th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2024
Cited by: 9; Year: 2024