
Staff / Individual Contributor
AI estimate · 30k–50k
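GPU performance optimization roles are scarce and demand substantial experience. The position is highly competitive in the Shenzhen market, and AMD's pay level is strong.
The role covers developing and optimizing high-performance GPU kernels for AI/ML workloads, including cutting-edge models such as LLMs and generative AI.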
BS required. MS preferred with several years of relevant industry experience.
High-Performance Kernel Development: Design, implement, and optimize high-performance GPU kernels for AI/ML workloads to maximize hardware utilization.
Performance Optimization: Analyze and optimize kernel execution for latency and throughput, addressing bottlenecks in memory bandwidth, instruction latency, and thread divergence.
Workload Analysis: Evaluate the end-to-end performance impact of individual kernels on full-stack AI models, ensuring that micro-optimizations translate to application-level speedups.
Profiling & Tuning: Utilize advanced GPU profiling tools (e.g., ROCm Profiler, PyTorch Profiler) to identify performance cliffs, pipeline stalls, and memory hierarchy inefficiencies.
Architecture Adaptation: Tailor implementation strategies to leverage specific features of modern GPU architectures (e.g., Matrix Cores, HBM characteristics).
Framework Integration: Collaborate with software stack teams to expose optimized kernels within high-level frameworks and inference engines.
GPU Architecture Mastery: In-depth understanding of the underlying architecture of modern GPUs, including streaming multiprocessors and compute units (SMs/CUs), memory hierarchy (registers, shared memory, L1/L2 cache, HBM), and warp/wavefront execution models.
Kernel Programming Expertise: Strong proficiency in C++ and parallel computing, with extensive hands-on experience in NVIDIA CUDA or AMD HIP kernel programming.
Performance Engineering: Demonstrated ability to debug and profile complex GPU workloads, interpreting low-level metrics to drive architecture-aware optimizations.
Systems Knowledge: Familiarity with asynchronous execution, stream management, and host-device memory transfers.
Python DSLs & Triton: Experience implementing kernels using OpenAI Triton or other Python-based DSLs for agile kernel development and auto-tuning.
Inference Engine Experience: Hands-on experience integrating custom kernels into large-scale inference frameworks such as vLLM, SGLang, or TensorRT-LLM.
Deep Learning Frameworks: Familiarity with writing custom extensions or operators for PyTorch (C++/CUDA extensions).
Hardware Agnosticism: Experience porting kernels between NVIDIA and AMD architectures or working with cross-platform HPC libraries.
Pros
Cons / Challenges
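A core role at a major company with a cutting-edge AI technology stack and generous pay, but the workload is heavy and on-site work is required.
As a publicly listed industry giant, AMD offers competitive pay and comprehensive benefits; the actual compensation is not disclosed, and the estimate is on the high side.
The role focuses on cutting-edge AI technology, including LLMs and GPU architecture, so there is ample room for skill growth, and the company encourages innovation.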
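On-site only, with no mention of work-life balance; the chip industry typically has a heavy workload. The office is in the Shenzhen Science and Technology Park.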
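The AI chip industry is growing rapidly, AMD is investing heavily in AI, and the role is strongly technology-driven; its social impact is neutral.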