Description
WHAT YOU DO AT AMD CHANGES EVERYTHING
At AMD, our mission is to build great products that accelerate next-generation computing experiences—from AI and data centers, to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you'll discover the real differentiator is our culture. We push the limits of innovation to solve the world's most important challenges—striving for execution excellence, while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.
Principal Validation Engineer – AI/ML
THE ROLE
We are seeking an experienced System Software Design Engineer with deep expertise in OS internals, proficiency across multiple Linux distributions, and hands-on experience with test frameworks and C/C++/Python development. In this role, you will design robust test strategies, build comprehensive test plans, develop automation solutions, and collaborate closely with development teams to deliver high-quality software for GPU-based products.
THE PERSON
The ideal candidate is passionate about software engineering and possesses the leadership skills to drive complex technical challenges to resolution. Your curiosity fuels continuous learning and innovation—helping improve how we work as a team and organization every day. You will join a results-oriented, collaborative environment that supports your career growth and technical development.
KEY RESPONSIBILITIES
Test Strategy & Execution
- Design, develop, and execute comprehensive test plans, strategies, and test cases for complex system-level features
- Perform functional, integration, and system-level testing across multiple Linux distributions (Ubuntu, RHEL, SLES, etc.)
- Ensure thorough test coverage for GPU product features
- Review requirements and create traceable test cases
Debugging & Analysis
- Analyze and debug issues across the full software stack using deep knowledge of Linux internals, system services, kernel behavior, performance tools, and logs
- Investigate failures, perform root cause analysis, and provide detailed debug information to development teams
Automation & Development
- Develop and maintain automated tests using gtest, ctest, and other relevant test frameworks
- Write clean, maintainable C/C++ and Python code for test automation, validation tools, and testing infrastructure
- Drive continuous improvements in test processes, tooling, and coverage
Collaboration & Leadership
- Collaborate with cross-functional teams to ensure testability and influence design decisions that improve product quality
- Mentor junior engineers and contribute to building a high-quality engineering culture
- Independently drive tasks to completion as a proactive contributor
REQUIRED QUALIFICATIONS
Experience
- 8+ years of industry experience in software development, testing, validation, or system bring-up
- Product development or systems engineering background with hardware platforms and their software/firmware ecosystems
Technical Expertise
- Strong understanding of Linux internals: system architecture, boot flow, processes, memory management, networking, and kernel fundamentals
- Hands-on experience with multiple Linux distributions and package/configuration management
- Proficiency in C/C++ and scripting languages (Python/Shell preferred)
- Expert-level debugging skills with tools such as gdb, valgrind, strace, and perf
- Strong knowledge of GPU, CPU, SoC, or computer system architecture
Testing & Validation
- Good hands-on experience with test frameworks such as gtest, ctest, or similar
- Solid understanding of software testing processes, SDLC, and best practices
- Ability to design effective test plans and test cases for complex software components
Soft Skills
- Strong analytical and problem-solving skills
- Excellent communication and collaboration abilities
PREFERRED QUALIFICATIONS
Specialized Domain Knowledge
- Deep learning, high-performance computing, or GPU server-based computing
- GPU computing languages such as CUDA
- Parallel computing with MPI programming experience
- AI/ML concepts and applications
- General computer architecture concepts
Infrastructure & Tools
- CI/CD tools (Jenkins, GitHub Actions, GitLab CI)
- Container technologies (Docker, Podman, Kubernetes)
- Cloud, virtualization, and container environments
- Large-scale datacenter engineering experience
Testing Expertise
- Performance testing for hardware-software systems
- System-level, functional, and environmental stress testing
- Windows operating system testing experience
Leadership
- Strong interpersonal, organizational, and technical leadership skills
ACADEMIC CREDENTIALS
- Bachelor's or Master's degree in Computer Engineering or Electronics/Electrical Engineering
Benefits offered are described in AMD benefits at a glance.
AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants' needs under the respective laws throughout all stages of the recruitment and selection process.
AMD may use Artificial Intelligence to help screen, assess or select applicants for this position. AMD's “Responsible AI Policy” is available here.
This posting is for an existing vacancy.