Company: AMD

Location: Austin, TX

Career Level: Director

Industries: Technology, Software, IT, Electronics

Description

WHAT YOU DO AT AMD CHANGES EVERYTHING

At AMD, our mission is to build great products that accelerate next-generation computing experiences—from AI and data centers, to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you'll discover the real differentiator is our culture. We push the limits of innovation to solve the world's most important challenges—striving for execution excellence, while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

THE ROLE:

AMD is seeking a senior-level engineer to lead company-level innovation in GPU microarchitecture performance measurement, parallel programming optimization, and advanced software diagnostics. This role centers on deep technical leadership in performance attribution, hardware/software observability, and defect localization across GPU compute stacks. In this role, you will define and architect next-generation methodologies for microarchitectural analysis, counter design, instrumentation, and diagnostic tooling that enable precise performance understanding from silicon through the runtime and application layers.

This individual will work closely with GPU architecture, silicon design, firmware, drivers, compiler/runtime, and tools teams to ensure AMD platforms deliver measurable, explainable, and reproducible performance across generations.

THE PERSON:

You are a hands-on architect in areas such as GPU/accelerator or HPC performance engineering, microarchitecture analysis, compilers, runtime systems, and diagnostics.

KEY RESPONSIBILITIES:

Microarchitecture Performance Measurement & Attribution

Define AMD's methodology for cycle-accurate and counter-driven performance attribution across GPU generations.
Architect performance measurement frameworks that correlate workload behavior to microarchitectural structures (CUs/SIMDs, wavefront schedulers, issue pipelines, register files, memory hierarchy, cache systems, fabric/interconnect).
Drive counter architecture definition and validation to ensure observability of pipeline stalls, cache contention, memory divergence, synchronization overhead, and scheduling inefficiencies.
Establish rigorous approaches for bottleneck classification: compute-bound, memory-bound, latency-bound, fabric-bound, and occupancy-limited regimes.
Develop scalable performance modeling techniques linking pre-silicon simulation, emulation, and post-silicon telemetry.

Parallel Programming Performance Optimization

Architect end-to-end performance workflows: microbenchmarks, workload decomposition, instrumentation, trace capture, and guided optimization.
Lead development of profiling and visualization systems exposing pipeline stages, wave occupancy, cache behavior, memory bandwidth utilization, atomic/synchronization costs, and interconnect utilization.
Influence compiler and runtime optimizations including code generation, scheduling, register allocation, vectorization, tiling, kernel fusion, and launch configuration strategies.
Drive auto-tuning and kernel optimization frameworks for AI/HPC workloads (GEMM, convolution, attention, graph workloads) across GPU generations and heterogeneous system configurations.
Ensure strong correlation between synthetic benchmarks, application kernels, and real-world workloads.

Advanced Software Diagnostics & Defect Localization

Architect diagnostic frameworks capable of detecting, isolating, and reproducing defects across silicon, firmware, driver, runtime, and application layers.
Develop static and dynamic analysis tools tailored to GPU execution and memory consistency models.
Lead development of GPU-focused sanitizers, race detectors, memory checkers, hang analysis tools, and fuzzing frameworks.
Build automated triage systems integrating telemetry, crash signatures, counter anomalies, and workload traces to accelerate root cause identification.
Drive methodologies for deterministic reproduction, workload minimization, and differential testing across hardware steppings and driver/compiler revisions.
Collaborate with architecture and validation teams to improve design-for-observability and post-silicon debug capabilities.

Tooling & Observability Architecture

Influence design of profiling and performance counter infrastructure in collaboration with silicon teams.
Guide evolution of ROCm profiling tools, trace systems, and low-level instrumentation interfaces.
Ensure alignment between hardware counters, compiler instrumentation, runtime telemetry, and developer-facing tools.
Establish reproducible measurement standards across lab and production environments.

PREFERRED EXPERIENCE:

Hands-on experience in GPU/accelerator or HPC performance engineering, microarchitecture analysis, compilers, runtime systems, and diagnostics.
Deep expertise in GPU microarchitecture: SIMD/CU design, wavefront scheduling, issue pipelines, cache hierarchies, shared/local memory, and interconnect fabrics.
Proven experience designing or leveraging hardware performance counters for bottleneck attribution and workload characterization.
Strong background in profiling, trace analysis, and performance modeling (pre- and post-silicon).
Demonstrated experience building diagnostic tooling: sanitizers, race detection, memory analysis, fuzzing, crash triage systems.
Strong Linux systems knowledge including kernel, drivers, and multi-GPU/multi-node environments.
Proficiency in C/C++ and Python; familiarity with LLVM IR, GPU ISA, and compiler backends.
Track record of delivering measurable performance improvements in production silicon and software stacks.

ACADEMIC CREDENTIALS:

Bachelor's or Master's degree in electrical engineering, computer science, or computer engineering.

LOCATION:

Austin, TX (Flexible/Hybrid)

This role is not eligible for visa sponsorship.

Benefits offered are described in "AMD benefits at a glance."

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants' needs under the respective laws throughout all stages of the recruitment and selection process.

AMD may use Artificial Intelligence to help screen, assess or select applicants for this position. AMD's “Responsible AI Policy” is available here.

This posting is for an existing vacancy.

Principal GPU Performance and Diagnostic Software Architect Job Listing at AMD in Austin, TX (Job ID 79593-en-us)