GPU Architecture Mastery: In-depth understanding of modern GPU architectures, including streaming multiprocessors (SMs/CUs), the memory hierarchy (registers, shared memory, L1/L2 cache, HBM), and warp/wavefront execution models.
Kernel Programming Expertise: Strong proficiency in C++ and parallel computing, with extensive hands-on experience in NVIDIA CUDA or AMD HIP kernel programming.
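The kind of kernel work described above can be illustrated with a small block-level sum reduction that walks the memory hierarchy explicitly: values start in registers, warps combine them with shuffle intrinsics, partials are staged in shared memory, and one atomic per block touches global memory. This is a generic sketch, not code from any particular role; all names are illustrative.

```cuda
#include <cuda_runtime.h>

__global__ void block_sum(const float* in, float* out, int n) {
    __shared__ float warp_sums[32];            // one slot per warp in the block
    int tid  = blockIdx.x * blockDim.x + threadIdx.x;
    int lane = threadIdx.x % warpSize;
    int warp = threadIdx.x / warpSize;

    float v = (tid < n) ? in[tid] : 0.0f;      // value lives in a register

    // Reduce within the warp via shuffles (no shared-memory traffic needed).
    for (int offset = warpSize / 2; offset > 0; offset /= 2)
        v += __shfl_down_sync(0xffffffff, v, offset);

    if (lane == 0) warp_sums[warp] = v;        // stage per-warp partials in shared memory
    __syncthreads();

    // First warp reduces the per-warp partials.
    if (warp == 0) {
        v = (lane < blockDim.x / warpSize) ? warp_sums[lane] : 0.0f;
        for (int offset = warpSize / 2; offset > 0; offset /= 2)
            v += __shfl_down_sync(0xffffffff, v, offset);
        if (lane == 0) atomicAdd(out, v);      // one atomic per block reaches L2/HBM
    }
}
```

The same pattern ports to AMD HIP largely by renaming the runtime calls and accounting for the 64-wide wavefront.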
Performance Engineering: Demonstrated ability to debug and profile complex GPU workloads, interpreting low-level metrics to drive architecture-aware optimizations.
Systems Knowledge: Familiarity with asynchronous execution, stream management, and host-device memory transfers.
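The systems skills above typically show up as pipelines that overlap host-device transfers with compute. Below is a hedged sketch (the `process` kernel and all sizes are placeholders) using two CUDA streams, double-buffered device memory, and pinned host memory so the async copies can actually proceed asynchronously.

```cuda
#include <cuda_runtime.h>
#include <cstring>

__global__ void process(float* buf, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] *= 2.0f;                             // placeholder compute
}

void pipeline(float* h_data, int n, int chunks) {
    int chunk = n / chunks;                                // assume chunks divides n
    float *h_pinned, *d_buf[2];
    cudaMallocHost(&h_pinned, n * sizeof(float));          // pinned => true async copies
    std::memcpy(h_pinned, h_data, n * sizeof(float));
    cudaStream_t s[2];
    for (int i = 0; i < 2; ++i) {
        cudaStreamCreate(&s[i]);
        cudaMalloc(&d_buf[i], chunk * sizeof(float));
    }
    for (int c = 0; c < chunks; ++c) {
        int i = c % 2;                                     // double-buffer across streams
        float* src = h_pinned + c * chunk;
        cudaMemcpyAsync(d_buf[i], src, chunk * sizeof(float),
                        cudaMemcpyHostToDevice, s[i]);
        process<<<(chunk + 255) / 256, 256, 0, s[i]>>>(d_buf[i], chunk);
        cudaMemcpyAsync(src, d_buf[i], chunk * sizeof(float),
                        cudaMemcpyDeviceToHost, s[i]);
    }
    cudaDeviceSynchronize();                               // drain both streams
    std::memcpy(h_data, h_pinned, n * sizeof(float));
    cudaFreeHost(h_pinned);
    for (int i = 0; i < 2; ++i) { cudaFree(d_buf[i]); cudaStreamDestroy(s[i]); }
}
```

Work issued to the same stream stays ordered, while the two streams let chunk `c+1`'s upload overlap chunk `c`'s kernel and download.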
Python DSLs & Triton: Experience implementing kernels using OpenAI Triton or other Python-based DSLs for agile kernel development and auto-tuning.
Inference Engine Experience: Hands-on experience integrating custom kernels into large-scale inference frameworks such as vLLM, SGLang, or TensorRT-LLM.
Deep Learning Frameworks: Familiarity with writing custom extensions or operators for PyTorch (C++/CUDA extensions).
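A minimal example of the custom-operator work mentioned above: a single `.cu` file that PyTorch's `torch.utils.cpp_extension` machinery can compile into a Python-importable module. The op name (`my_relu`) and kernel are hypothetical, chosen only to keep the sketch short.

```cuda
#include <torch/extension.h>

__global__ void relu_kernel(const float* in, float* out, int64_t n) {
    int64_t i = blockIdx.x * (int64_t)blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] > 0.0f ? in[i] : 0.0f;
}

torch::Tensor my_relu(torch::Tensor x) {
    TORCH_CHECK(x.is_cuda() && x.dtype() == torch::kFloat32,
                "my_relu expects a float32 CUDA tensor");
    auto x_  = x.contiguous();
    auto out = torch::empty_like(x_);
    int64_t n = x_.numel();
    int threads = 256;
    int blocks  = (int)((n + threads - 1) / threads);
    relu_kernel<<<blocks, threads>>>(x_.data_ptr<float>(), out.data_ptr<float>(), n);
    return out;
}

PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
    m.def("my_relu", &my_relu, "Elementwise ReLU via a custom CUDA kernel");
}
```

From Python this would be built and called with something like `torch.utils.cpp_extension.load(name="my_relu_ext", sources=["my_relu.cu"])`, after which the operator behaves like any other function on CUDA tensors.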
Hardware Agnosticism: Experience porting kernels between NVIDIA and AMD architectures or working with cross-platform HPC libraries.