Zhihang Yuan
Infini-AI
Verified email at infini-ai.com
Title
Cited by
Year
FPGA-based accelerator for long short-term memory recurrent neural networks
Y Guan, Z Yuan, G Sun, J Cong
2017 22nd Asia and South Pacific Design Automation Conference (ASP-DAC), 629-634, 2017
Cited by: 240 | Year: 2017
PTQ4ViT: Post-training quantization for vision transformers with twin uniform quantization
Z Yuan, C Xue, Y Chen, Q Wu, G Sun
European conference on computer vision, 191-207, 2022
Cited by: 150* | Year: 2022
Post-training quantization on diffusion models
Y Shang, Z Yuan, B Xie, B Wu, Y Yan
Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2023
Cited by: 119 | Year: 2023
RPTQ: Reorder-based post-training quantization for large language models
Z Yuan, L Niu, J Liu, W Liu, X Wang, Y Shang, G Sun, Q Wu, J Wu, B Wu
arXiv preprint arXiv:2304.01089, 2023
Cited by: 63 | Year: 2023
A survey on efficient inference for large language models
Z Zhou, X Ning, K Hong, T Fu, J Xu, S Li, Y Lou, L Wang, Z Yuan, X Li, ...
arXiv preprint arXiv:2404.14294, 2024
Cited by: 56 | Year: 2024
PD-Quant: Post-training quantization based on prediction difference metric
J Liu, L Niu, Z Yuan, D Yang, X Wang, W Liu
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023
Cited by: 56 | Year: 2023
LLM inference unveiled: Survey and roofline model insights
Z Yuan, Y Shang, Y Zhou, Z Dong, Z Zhou, C Xue, B Wu, Z Li, Q Gu, ...
arXiv preprint arXiv:2402.16363, 2024
Cited by: 43 | Year: 2024
PB-LLM: Partially binarized large language models
Y Shang, Z Yuan, Q Wu, Z Dong
arXiv preprint arXiv:2310.00034, 2023
Cited by: 40 | Year: 2023
S2DNAS: Transforming static CNN model for dynamic inference via neural architecture search
Z Yuan, B Wu, G Sun, Z Liang, S Zhao, W Bi
Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23 …, 2020
Cited by: 37 | Year: 2020
Reducing overfitting in deep convolutional neural networks using redundancy regularizer
B Wu, Z Liu, Z Yuan, G Sun, C Wu
Artificial Neural Networks and Machine Learning–ICANN 2017: 26th …, 2017
Cited by: 33 | Year: 2017
ASVD: Activation-aware singular value decomposition for compressing large language models
Z Yuan, Y Shang, Y Song, Q Wu, Y Yan, G Sun
arXiv preprint arXiv:2312.05821, 2023
Cited by: 26 | Year: 2023
NAS4RRAM: neural network architecture search for inference on RRAM-based accelerators
Z Yuan, J Liu, X Li, L Yan, H Chen, B Wu, Y Yang, G Sun
Science China Information Sciences 64 (6), 160407, 2021
Cited by: 23 | Year: 2021
Latency-aware spatial-wise dynamic networks
Y Han, Z Yuan, Y Pu, C Xue, S Song, G Sun, G Huang
Advances in Neural Information Processing Systems 35, 36845-36857, 2022
Cited by: 22 | Year: 2022
WKVQuant: Quantizing weight and key/value cache for large language models gains more
Y Yue, Z Yuan, H Duanmu, S Zhou, J Wu, L Nie
arXiv preprint arXiv:2402.12065, 2024
Cited by: 21 | Year: 2024
Latency-aware Unified Dynamic Networks for Efficient Image Recognition
Y Han, Z Liu, Z Yuan, Y Pu, C Wang, S Song, G Huang
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024
Cited by: 18 | Year: 2024
Using data compression for optimizing FPGA-based convolutional neural network accelerators
Y Guan, N Xu, C Zhang, Z Yuan, J Cong
International workshop on advanced parallel processing technologies, 14-26, 2017
Cited by: 13 | Year: 2017
QuEST: Low-bit diffusion model quantization via efficient selective finetuning
H Wang, Y Shang, Z Yuan, J Wu, J Yan, Y Yan
arXiv preprint arXiv:2402.03666, 2024
Cited by: 12 | Year: 2024
SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models
H Duanmu, Z Yuan, X Li, J Duan, X Zhang, D Lin
arXiv preprint arXiv:2405.06219, 2024
Cited by: 10 | Year: 2024
MIM4DD: Mutual information maximization for dataset distillation
Y Shang, Z Yuan, Y Yan
Advances in Neural Information Processing Systems 36, 2024
Cited by: 10 | Year: 2024
ENAS4D: Efficient multi-stage CNN architecture search for dynamic inference
Z Yuan, X Liu, B Wu, G Sun
arXiv preprint arXiv:2009.09182, 2020
Cited by: 7 | Year: 2020