Bo Wu
Complexity Analysis and Algorithm Design for Reorganizing Data to Minimize Non-Coalesced GPU Memory Accesses
B Wu, Z Zhao, E Zhang, Y Jiang, X Shen
ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2013
Enabling and Exploiting Flexible Task Assignment on GPU through SM-Centric Program Transformations
B Wu, G Chen, D Li, X Shen, J Vetter
The 29th International Conference on Supercomputing, 2015
Can PCM Benefit GPU? Reconciling Hybrid Memory Design with GPU Massive Parallelism for Energy Efficiency
B Wang, B Wu, D Li, X Shen, W Yu, Y Jiao, J Vetter
The 22nd International Conference on Parallel Architectures and Compilation …, 2013
PORPLE: An Extensible Optimizer for Portable Data Placement on GPU
G Chen, B Wu, D Li, X Shen
The 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014
FLEP: Enabling flexible and efficient preemption on GPUs
B Wu, X Liu, X Zhou, C Jiang
ACM SIGPLAN Notices 52 (4), 483-496, 2017
FinePar: Irregularity-aware fine-grained workload partitioning on integrated architectures
F Zhang, B Wu, J Zhai, B He, W Chen
2017 IEEE/ACM International Symposium on Code Generation and Optimization …, 2017
Automine: harmonizing high-level abstraction and high performance for graph mining
D Mawhirter, B Wu
Proceedings of the 27th ACM Symposium on Operating Systems Principles, 509-523, 2019
Graphie: Large-scale asynchronous graph traversals on just a GPU
W Han, D Mawhirter, B Wu, M Buland
2017 26th International Conference on Parallel Architectures and Compilation …, 2017
ScaAnalyzer: A Tool to Identify Memory Scalability Bottlenecks in Parallel Programs
X Liu, B Wu
The International Conference for High Performance Computing, Networking …, 2015
GRNN: Low-latency and scalable RNN inference on GPUs
C Holmes, D Mawhirter, Y He, F Yan, B Wu
Proceedings of the Fourteenth EuroSys Conference 2019, 1-16, 2019
Challenging the "embarrassingly sequential": parallelizing finite state machine-based computations through principled speculation
Z Zhao, B Wu, X Shen
ACM SIGARCH Computer Architecture News 42 (1), 543-558, 2014
Laius: Towards latency awareness and improved utilization of spatial multitasking accelerators in datacenters
W Zhang, W Cui, K Fu, Q Chen, DE Mawhirter, B Wu, C Li, M Guo
Proceedings of the ACM international conference on supercomputing, 58-68, 2019
Co-run scheduling with power cap on integrated CPU-GPU systems
Q Zhu, B Wu, X Shen, L Shen, Z Wang
2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2017
Enhancing Data Locality for Dynamic Simulations through Asynchronous Data Transformations and Adaptive Control
B Wu, EZ Zhang, X Shen
The Twentieth International Conference on Parallel Architectures and …, 2011
Graphzero: Breaking symmetry for efficient graph mining
D Mawhirter, S Reinehr, C Holmes, T Liu, B Wu
arXiv preprint arXiv:1911.12877, 2019
Automatic irregularity-aware fine-grained workload partitioning on integrated architectures
F Zhang, J Zhai, B Wu, B He, W Chen, X Du
IEEE Transactions on Knowledge and Data Engineering 33 (3), 867-881, 2019
Graphphi: efficient parallel graph processing on emerging throughput-oriented architectures
Z Peng, A Powell, B Wu, T Bicer, B Ren
Proceedings of the 27th International Conference on Parallel Architectures …, 2018
Simple profile rectifications go a long way
B Wu, M Zhou, X Shen, Y Gao, R Silvera, G Yiu
European Conference on Object-Oriented Programming, 654-678, 2013
Optimizing data placement on GPU memory: A portable approach
G Chen, X Shen, B Wu, D Li
IEEE Transactions on Computers 66 (3), 473-487, 2016
Understanding Co-Run Degradations on Integrated Heterogeneous Processors
Q Zhu, B Wu, X Shen, L Shen, Z Wang
The 27th International Workshop on Languages and Compilers for Parallel …, 2014