Sehoon Kim

Cited by

	All	Since 2019
Citations	1971	1969
h-index	13	13
i10-index	15	15

780

390

195

585

202120222023202465 401 774 718

Public access

View all

5 articles

0 articles

available

not available

Based on funding mandates

Sehoon Kim

University of California, Berkeley

Verified email at berkeley.edu - Homepage

Efficient Deep Learning Machine Learning AI Systems


Title Sort by citations Sort by year Sort by title	Cited by Cited by	Year
A survey of quantization methods for efficient neural network inference A Gholami, S Kim, Z Dong, Z Yao, MW Mahoney, K Keutzer Low-Power Computer Vision, 291-326, 2022	1015	2022
I-BERT: Integer-only BERT quantization S Kim, A Gholami, Z Yao, MW Mahoney, K Keutzer International conference on machine learning, 5506-5518, 2021	292	2021
Learned Token Pruning for Transformers S Kim, S Shen, D Thorsley, A Gholami, W Kwon, J Hassoun, K Keutzer Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and …, 2022	101	2022
A Fast Post-Training Pruning Framework for Transformers W Kwon, S Kim, MW Mahoney, J Hassoun, K Keutzer, A Gholami Advances in Neural Information Processing Systems 35, 2022	87	2022
SqueezeLLM: Dense-and-Sparse Quantization S Kim, C Hooper, A Gholami, Z Dong, X Li, S Shen, MW Mahoney, ... arXiv preprint arXiv:2306.07629, 2023	84	2023
Squeezeformer: An efficient transformer for automatic speech recognition S Kim, A Gholami, A Shaw, N Lee, K Mangalam, J Malik, MW Mahoney, ... Advances in Neural Information Processing Systems 35, 2022	77	2022
Applications and techniques for fast machine learning in science AMC Deiana, N Tran, J Agar, M Blott, G Di Guglielmo, J Duarte, P Harris, ... Frontiers in big Data 5, 787421, 2022	57	2022
Speculative decoding with big little decoder S Kim, K Mangalam, S Moon, J Malik, MW Mahoney, A Gholami, ... Advances in Neural Information Processing Systems 36, 2024	53*	2024
Hessian-aware pruning and optimal neural implant S Yu, Z Yao, A Gholami, Z Dong, S Kim, MW Mahoney, K Keutzer Proceedings of the IEEE/CVF Winter Conference on Applications of Computer …, 2022	52	2022
Full Stack Optimization of Transformer Inference: a Survey S Kim, C Hooper, T Wattanawong, M Kang, R Yan, H Genc, G Dinh, ... arXiv preprint arXiv:2302.14017, 2023	51	2023
Integer-Only Zero-Shot Quantization for Efficient Speech Recognition S Kim, A Gholami, Z Yao, N Lee, P Wang, A Nrusimha, B Zhai, T Gao, ... ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and …, 2022	21	2022
AI and memory wall A Gholami, Z Yao, S Kim, C Hooper, MW Mahoney, K Keutzer IEEE Micro, 2024	17*	2024
An LLM Compiler for Parallel Function Calling S Kim, S Moon, R Tabrizi, N Lee, MW Mahoney, K Keutzer, A Gholami arXiv preprint arXiv:2312.04511, 2023	15	2023
WindTunnel: towards differentiable ML pipelines beyond a single model GI Yu, S Amizadeh, S Kim, A Pagnoni, C Zhang, BG Chun, M Weimer, ... Proceedings of the VLDB Endowment 15 (1), 11-20, 2021	13*	2021
SPEED: Speculative Pipelined Execution for Efficient Decoding C Hooper, S Kim, H Mohammadzadeh, H Genc, K Keutzer, A Gholami, ... arXiv preprint arXiv:2310.12072, 2023	11	2023
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization C Hooper, S Kim, H Mohammadzadeh, MW Mahoney, YS Shao, ... arXiv preprint arXiv:2401.18079, 2024	8	2024
LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement N Lee, T Wattanawong, S Kim, K Mangalam, S Shen, G Anumanchipali, ... arXiv preprint arXiv:2403.15042, 2024	7	2024
Terra: Imperative-Symbolic Co-Execution of Imperative Deep Learning Programs T Kim, E Jeong, GW Kim, Y Koo, S Kim, G Yu, BG Chun Advances in Neural Information Processing Systems 34, 1468-1480, 2021	5	2021
Memory-Efficient Hardware Performance Counters with Approximate-Counting Algorithms J Xu, S Kim, B Nikolic, YS Shao 2021 IEEE International Symposium on Performance Analysis of Systems and …, 2021	5	2021
Characterizing Prompt Compression Methods for Long Context Inference S Jha, LE Erdogan, S Kim, K Keutzer, A Gholami arXiv preprint arXiv:2407.08892, 2024		2024

The system can't perform the operation now. Try again later.

Articles 1–20

Citations per year

Duplicate citations

Merged citations

Add co-authorsCo-authors

Follow

Cited by