
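Staff / Individual Contributor
AI estimate · 50k–80k
GPU kernel experts are scarce, and AMD, as an AI chip giant, pays competitively; the estimate also reflects market levels for senior technical roles in Shanghai.
Join AMD to design and optimize GPU kernels, delivering high-performance software solutions for AI training and inference.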
THE PERSON: Success in this role requires expert knowledge of machine learning kernel operators (such as MHA, MLA, and MoE) written in programming languages such as Triton/DSL, CUDA/HIP, and PTX/ASM, and of development libraries (such as CUTLASS/CK), frameworks, distributed solutions, compilers, and performance optimization for inference or training, along with strong programming skills in C++ and Python. Candidates must also have hands-on experience with industry AI use cases and solutions, end-to-end pipelines, frameworks or SDKs, and parallel programming, as well as strong debugging and development skills.
KEY RESPONSIBILITIES:
PREFERRED EXPERIENCE: Ability to work independently, define project goals and scope, and lead your own development effort. Solid communication skills in both English and Mandarin. Excellence in the design (algorithms) and development (with Triton/DSL, CUTLASS/CK, CUDA/HIP, PTX/ASM, etc.) of GPU kernel primitives such as attention (FA, PA, MLA, linear attention, etc.), MoE, and TopK. Excellent programming skills in Python and C++, and strong software skills, including debugging and performance analysis. Experience with model inference optimization, such as GEMM/convolution tuning, graph optimization, and operator fusion. Experience with AI frameworks (e.g. vLLM, SGLang, Megatron-LM, DeepSpeed, TensorRT, TensorRT-LLM). Knowledge of compilers (Torch, Triton, LLVM, XLA HLO, graph) is a plus. Knowledge of the Linux ROCm/CUDA runtime and KMD/UMD drivers is a plus. Knowledge of AI distribution solutions (e.g. EP/SP/CP/TP/PP/DP, DeepEP, DualPipe, PD aggregation, and KV cache transfer and storage). Knowledge of distributed AI network communication, including multi-GPU and multi-node collective communication primitives (NCCL/RCCL), NIC/GPU drivers for RDMA/GDR, and high-speed networking. Knowledge of Linux OS/driver, CI, and toolchain (profiler/DCGM) development and debugging.
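Pros
Cons / Challenges
No notable challenges identified.
A core role at an AI chip giant working on cutting-edge GPU kernel technology; high growth and high reward, with average work-life balance (WLB).
As a publicly listed multinational giant, AMD offers competitive pay and benefits, but the JD does not disclose specific figures; by industry convention, compensation is likely above the market average.
The role involves the most advanced GPU kernel technology and AI acceleration, so there is substantial room for growth, but the JD does not explicitly mention a promotion path.
The role is on-site in Shanghai. The JD does not mention flexible work or WLB, so a conventional office arrangement is assumed, and commutes may be long.
AMD is committed to accelerating AI computing, and the role contributes directly to next-generation computing experiences, giving it a strong sense of mission, but the JD makes no explicit statement about social impact.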