Preferred Qualifications:
Master's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
Engine & Framework Expertise: Working knowledge of state-of-the-art inference and training stacks such as SGLang, vLLM, TensorRT-LLM, DeepSpeed, or Megatron-LM. Deep understanding of optimization patterns, including PagedAttention, RadixAttention (prefix caching), continuous batching, and speculative decoding.
Operator & GEMM Optimization: Practical experience with CUTLASS, CuTe, or OpenAI Triton. Expertise in optimizing high-performance general matrix multiplication (GEMM) kernels, including tiling strategies, data layouts, and mixed-precision accumulation.
Distributed Systems: Proficiency in multi-GPU and multi-node scaling using NCCL and parallelism strategies (tensor, pipeline, and sequence parallelism).
Vibe Coding & AI-Native Velocity: An AI-native mindset, with expertise in using vibe coding tools to bypass boilerplate and accelerate the development lifecycle. The technical intuition to architect systems rapidly, moving from "vibe" to highly optimized production code with extreme velocity.