Eran Malach
Kempner Institute, Harvard
Verified email at fas.harvard.edu
Decoupling" when to update" from" how to update"
E Malach, S Shalev-Shwartz
Advances in neural information processing systems 30, 2017
Cited by 663
Proving the lottery ticket hypothesis: Pruning is all you need
E Malach, G Yehudai, S Shalev-Shwartz, O Shamir
International Conference on Machine Learning, 6682-6691, 2020
Cited by 310
SGD learns over-parameterized networks that provably generalize on linearly separable data
A Brutzkus, A Globerson, E Malach, S Shalev-Shwartz
arXiv preprint arXiv:1710.10174, 2017
Cited by 295
Hidden progress in deep learning: SGD learns parities near the computational limit
B Barak, B Edelman, S Goel, S Kakade, E Malach, C Zhang
Advances in Neural Information Processing Systems 35, 21750-21764, 2022
Cited by 129
Learning parities with neural networks
A Daniely, E Malach
Advances in Neural Information Processing Systems 33, 20356-20365, 2020
Cited by 94
Is deeper better only when shallow is good?
E Malach, S Shalev-Shwartz
Advances in Neural Information Processing Systems 32, 2019
Cited by 53
Quantifying the benefit of using differentiable learning over tangent kernels
E Malach, P Kamath, E Abbe, N Srebro
International Conference on Machine Learning, 7379-7389, 2021
Cited by 48
Repeat after me: Transformers are better than state space models at copying
S Jelassi, D Brandfonbrener, SM Kakade, E Malach
arXiv preprint arXiv:2402.01032, 2024
Cited by 41
A provably correct algorithm for deep learning that actually works
E Malach, S Shalev-Shwartz
arXiv preprint arXiv:1803.09522, 2018
Cited by 32
Decoupling gating from linearity
J Fiat, E Malach, S Shalev-Shwartz
arXiv preprint arXiv:1906.05032, 2019
Cited by 30
The evolution of statistical induction heads: In-context learning Markov chains
BL Edelman, E Edelman, S Goel, E Malach, N Tsilivis
arXiv preprint arXiv:2402.11004, 2024
Cited by 29
Auto-regressive next-token predictors are universal learners
E Malach
arXiv preprint arXiv:2309.06979, 2023
Cited by 22
On the power of differentiable learning versus PAC and SQ learning
E Abbe, P Kamath, E Malach, C Sandon, N Srebro
Advances in Neural Information Processing Systems 34, 24340-24351, 2021
Cited by 22
The connection between approximation, depth separation and learnability in neural networks
E Malach, G Yehudai, S Shalev-Shwartz, O Shamir
Conference on Learning Theory, 3265-3295, 2021
Cited by 22
ID3 learns juntas for smoothed product distributions
A Brutzkus, A Daniely, E Malach
Conference on Learning Theory, 902-915, 2020
Cited by 21
Computational separation between convolutional and fully-connected networks
E Malach, S Shalev-Shwartz
arXiv preprint arXiv:2010.01369, 2020
Cited by 19
Knowledge distillation: Bad models can be good role models
G Kaplun, E Malach, P Nakkiran, S Shalev-Shwartz
Advances in Neural Information Processing Systems 35, 28683-28694, 2022
Cited by 18
On the optimality of trees generated by ID3
A Brutzkus, A Daniely, E Malach
arXiv preprint arXiv:1907.05444, 2019
Cited by 13
When hardness of approximation meets hardness of learning
E Malach, S Shalev-Shwartz
Journal of Machine Learning Research 23 (91), 1-24, 2022
Cited by 12
The implications of local correlation on learning some deep functions
E Malach, S Shalev-Shwartz
Advances in Neural Information Processing Systems 33, 1322-1332, 2020
Cited by 11
Articles 1–20