GPU Software Engineer Intern
- Designed GPU-accelerated, multithreaded algorithms to improve calculation speed.
- Converted single-threaded C++ CPU code to multithreaded GPU Python code using CUDA.
- Designed efficient multithreaded algorithms that avoided deadlocks, read/write conflicts, and other pitfalls to ensure seamless parallel execution.
- Ported CuPy code back to C++ with CUDA and targeted a 4x to 100x speedup depending on input size.