Sehoon Kim
A survey of quantization methods for efficient neural network inference
A Gholami, S Kim, Z Dong, Z Yao, MW Mahoney, K Keutzer
Low-Power Computer Vision, 291-326, 2022
I-BERT: Integer-only BERT quantization
S Kim, A Gholami, Z Yao, MW Mahoney, K Keutzer
International Conference on Machine Learning, 5506-5518, 2021
Learned Token Pruning for Transformers
S Kim, S Shen, D Thorsley, A Gholami, W Kwon, J Hassoun, K Keutzer
Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and …, 2022
Hessian-aware pruning and optimal neural implant
S Yu, Z Yao, A Gholami, Z Dong, S Kim, MW Mahoney, K Keutzer
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer …, 2022
AI and Memory Wall
A Gholami, Z Yao, S Kim, M Mahoney, K Keutzer
RiseLab Blog Post, 2021
Applications and techniques for fast machine learning in science
AMC Deiana, N Tran, J Agar, M Blott, G Di Guglielmo, J Duarte, P Harris, ...
Frontiers in Big Data 5, 787421, 2022
Squeezeformer: An efficient transformer for automatic speech recognition
S Kim, A Gholami, A Shaw, N Lee, K Mangalam, J Malik, MW Mahoney, ...
Advances in Neural Information Processing Systems 35, 2022
A Fast Post-Training Pruning Framework for Transformers
W Kwon, S Kim, MW Mahoney, J Hassoun, K Keutzer, A Gholami
Advances in Neural Information Processing Systems 35, 2022
Integer-Only Zero-Shot Quantization for Efficient Speech Recognition
S Kim, A Gholami, Z Yao, N Lee, P Wang, A Nrusimha, B Zhai, T Gao, ...
ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and …, 2022
Full Stack Optimization of Transformer Inference: a Survey
S Kim, C Hooper, T Wattanawong, M Kang, R Yan, H Genc, G Dinh, ...
arXiv preprint arXiv:2302.14017, 2023
SqueezeLLM: Dense-and-Sparse Quantization
S Kim, C Hooper, A Gholami, Z Dong, X Li, S Shen, MW Mahoney, ...
arXiv preprint arXiv:2306.07629, 2023
Big little transformer decoder
S Kim, K Mangalam, J Malik, MW Mahoney, A Gholami, K Keutzer
arXiv preprint arXiv:2302.07863, 2023
WindTunnel: towards differentiable ML pipelines beyond a single model
GI Yu, S Amizadeh, S Kim, A Pagnoni, C Zhang, BG Chun, M Weimer, ...
Proceedings of the VLDB Endowment 15 (1), 11-20, 2021
Terra: Imperative-Symbolic Co-Execution of Imperative Deep Learning Programs
T Kim, E Jeong, GW Kim, Y Koo, S Kim, G Yu, BG Chun
Advances in Neural Information Processing Systems 34, 1468-1480, 2021
Memory-Efficient Hardware Performance Counters with Approximate-Counting Algorithms
J Xu, S Kim, B Nikolic, YS Shao
2021 IEEE International Symposium on Performance Analysis of Systems and …, 2021
SPEED: Speculative Pipelined Execution for Efficient Decoding
C Hooper, S Kim, H Mohammadzadeh, H Genc, K Keutzer, A Gholami, ...
arXiv preprint arXiv:2310.12072, 2023
Method and apparatus for executing deep learning programs
T bum Kim, BG Chun, GW Kim, YM Koo, GI YU, EJ Jeong, SH Kim
US Patent App. 17/954,345, 2023