- Bachelor's degree in Computer Science, Data Science, Information Technology, or a related field
- + years of experience in data engineering, data operations, or a similar role, delivering production-grade data pipelines and services
- Strong ownership of deliverables, high standards for code review and peer mentorship, and a commitment to clear documentation, metadata and lineage publishing, and sustainable data engineering standards and practices
- Excellent analytical and problem-solving skills
- Understanding of requirements related to product supply and manufacturing in the pharmaceutical industry
- Good communication and collaboration abilities; able to work independently in a self-organized team
- Fluent in Chinese and English
- Willingness and ability to learn and integrate new tools and technologies to enhance work efficiency and effectiveness
- Understanding of Data Product concepts and lifecycle management, including key data quality dimensions and tools such as Great Expectations, dbt tests, or Elementary
- Hands-on experience building Retrieval-Augmented Generation (RAG) data pipelines, including automated document ingestion, preprocessing, chunking, embedding generation, and loading data into vector stores for efficient retrieval
- Familiarity with RAG architectures and components such as vector databases, embedding models, and LLMs; experience designing these systems is a plus
- Hands-on proficiency in SQL, Python, PySpark, or Spark to build scalable data pipelines using modern data processing frameworks such as Databricks or Snowflake
- Proven experience with ETL/ELT tools such as Kettle/Pentaho, DataWorks, or similar
- Ability to integrate data from source systems using interface patterns such as ODBC, GraphQL, OData, REST, or event streaming
- Experience with cost and performance optimization and FinOps practices for cloud data platforms such as Snowflake or Databricks, ensuring reliable production operations
- Understanding of modern lakehouse table formats such as Apache Iceberg and Delta Lake, and the ability to design, build, maintain, and document conceptual, logical, and physical data models
- Understanding of data architecture patterns such as data hub, data mesh, or data fabric
- Experience with GxP and computerized system validation is a plus