Architecture Intern - Inference
Company: Etched
Location: San Jose
Posted on: April 2, 2026
Job Description:
About Etched
Etched is building the world’s first AI inference system purpose-built for transformers, delivering over 10x higher performance and dramatically lower cost and latency than a B200. With Etched ASICs, you can build products that would be impossible with GPUs, like real-time video generation models and extremely deep and parallel chain-of-thought reasoning agents. Backed by hundreds of millions of dollars from top-tier investors and staffed by leading engineers, Etched is redefining the infrastructure layer for the fastest-growing industry in history.

Job Summary
We are seeking a talented Architecture Intern to join our team and contribute to the design of next-generation AI accelerators. This role focuses on developing and optimizing compute architectures that deliver exceptional performance and efficiency for transformer workloads. Over the course of your internship, you will work on cutting-edge architectural problems and performance modeling.

Key responsibilities
- Support porting state-of-the-art models to our architecture.
- Help build programming abstractions and testing capabilities to rapidly iterate on model porting.
- Assist in building, enhancing, and scaling the runtime for Sohu (Etched’s transformer ASIC), including multi-node inference, intra-node execution, state management, and robust error handling.
- Contribute to optimizing routing and communication layers using Sohu’s collectives.
- Use performance profiling and debugging tools to identify bottlenecks and correctness issues.
- Develop and leverage a deep understanding of Sohu to co-design both hardware instructions and model architecture operations that maximize model performance.
- Implement high-performance software components for the Model Toolkit.

You may be a good fit if you have:
- Progress toward a Bachelor’s, Master’s, or PhD degree in computer science, computer engineering, applied mathematics, or a related field.
- Proficiency in Python and C++.
- An understanding of performance-sensitive or complex distributed software systems, such as Linux internals, accelerator architectures (e.g. GPUs, TPUs), compilers, or high-speed interconnects (e.g. NVLink, InfiniBand).
- Experience porting applications to non-standard accelerator hardware or hardware platforms.
- Deep knowledge of transformer model architectures and/or inference serving stacks (vLLM, SGLang, etc.).

Strong candidates may also have:
- Proficiency in Rust.
- Experience building low-latency, high-performance applications using both kernel-level and user-space networking stacks.
- A deep understanding of distributed systems concepts, algorithms, and challenges, including consensus protocols, consistency models, and communication patterns.
- A solid grasp of transformer architectures, particularly Mixture-of-Experts (MoE).
- Experience building applications with extensive SIMD (Single Instruction, Multiple Data) optimizations for performance-critical paths.
- Familiarity with PyTorch or JAX.
- Experience in math competitions (AIME, AMC, etc.).

We encourage you to apply even if you do not believe you meet every qualification.

Program details
- 12-week paid internship (June to August 2026)
- Generous housing support for those relocating
- Daily lunch and dinner in our office
- Based at our office in San Jose, CA
- Direct mentorship from industry leaders and world-class engineers
- Opportunity to work on one of the most important problems of our time

For any questions, contact internships@etched.com.

How we’re different
Etched believes in the Bitter Lesson. We think most of the progress in the AI field has come from using more FLOPs to train and run models, and the best way to get more FLOPs is to build model-specific hardware. Larger and larger training runs encourage companies to consolidate around fewer model architectures, which creates a market for single-model ASICs.

We are a fully in-person team in West San Jose and greatly value engineering skills. We do not have boundaries between engineering and research, and we expect all of our technical staff to contribute to both as needed.