Company: AMD
Location: Austin, TX
Career Level: Mid-Senior Level
Industries: Technology, Software, IT, Electronics

Description



WHAT YOU DO AT AMD CHANGES EVERYTHING 

At AMD, our mission is to build great products that accelerate next-generation computing experiences—from AI and data centers, to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you'll discover the real differentiator is our culture. We push the limits of innovation to solve the world's most important challenges—striving for execution excellence, while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond.  Together, we advance your career.  



THE ROLE:

We are seeking a collaborative and motivated Systems Test Architect to join our team. In this role, you will develop and execute methodologies and test content for system-level hardware, firmware, and software validation on machine learning systems built with AMD's technologies. You'll work closely with architecture, design, and post-silicon teams to identify and resolve complex system issues and to improve future debug and test features. This position involves developing automation, refining validation processes, executing test cases, analyzing data, and driving system-level quality across AMD's product portfolio. We welcome applicants from diverse backgrounds who bring curiosity, technical depth, and a commitment to innovation.

THE PERSON:

As a Systems Test Architect, you will deliver the next generation of system tests for our products. In this high-visibility position, your systems engineering expertise will be essential to finding and resolving system HW, FW, and SW issues; your SW background will help us expand our test coverage and improve our automation frameworks; and your experience with large-scale clusters will help us debug critical issues that only appear at scale.

KEY RESPONSIBILITIES:

  • Lead the definition of test procedures to validate the design of system components on next-generation machine learning architectures. This covers all server components, including CPU, GPU, memory, BIOS, BMC, I/O, storage, and networking.
  • Lead efforts to validate scale-up and scale-out architectures, including defining cluster-level test plans and writing code to support the validation.
  • Translate system specifications into a robust system integration test plan.
  • Develop complex and critical test content, as well as SW mechanisms for monitoring, debug, and root-cause analysis that other teams can use in their plans.
  • Establish methods to validate, monitor and root-cause errors at cluster-level.
  • Improve system-level integration test strategies, methodologies, and processes.
  • Collaborate with customers and multi-functional HW and SW teams to debug and resolve complex issues.
  • Investigate, profile, and enable test content for a wide variety of system domains, and integrate benchmarks and customer-workload proxies into our own test frameworks. These range from industry-standard benchmarks to state-of-the-art training and inference applications.
  • Develop and improve automation features according to requirements.
  • Monitor and analyze the execution of automated tests at scale, across hundreds or thousands of systems.

PREFERRED EXPERIENCE:

  • Prior experience working on HPC or machine learning HW systems for large data centers.
  • Several years of experience writing software in languages such as Python and/or C/C++ is a must.
  • Post-silicon system integration and system testing.
  • Debugging skills at the SoC (System on a Chip), system, and cluster levels.
  • Experience with computer architecture concepts and silicon features, particularly in machine learning systems (scale-up/scale-out experience is a plus).
  • Computer enthusiast with excellent knowledge of current machine learning technologies in the data center.
  • Effective communication skills, including the ability to influence and work across large multi-functional HW, SW, and architecture teams.

 

ACADEMIC CREDENTIALS: 

  • Bachelor's or Master's degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent 




Benefits offered are described in AMD benefits at a glance.

 

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law.   We encourage applications from all qualified candidates and will accommodate applicants' needs under the respective laws throughout all stages of the recruitment and selection process.

