GPU Engineer

Kog

Remote (Europe-compatible)

Salary Range (USD)

Negotiable

Location

Paris, France

Visa Support

Not mentioned

Funding Stage

Unknown

Job Responsibilities

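• Own low-level kernel work in CUDA/PTX or HIP/CDNA ISA
• Own the monokernel pipeline (the full decode runs as a single persistent GPU kernel)
• Own the profiling infrastructure inside that pipeline
• Scale to the frontier MoE models that run in production
• Build agents that optimize kernels and inference autonomously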

Required Skills

CUDA, PTX, HIP, CDNA ISA

Engineering Culture & Tech Stack

CUDA, PTX, HIP, CDNA ISA

ownership

autonomous optimization

Raw Post

Kog (https://kog.ai) | GPU Engineer | Paris, France | REMOTE within a Europe-compatible timezone, one week per month onsite in Paris

We are hiring a GPU Engineer to work on the fastest LLM inference engine on standard datacenter GPUs. You would own low-level kernel work in CUDA/PTX or HIP/CDNA ISA, the monokernel pipeline, profiling infrastructure inside it, scaling to the frontier MoE models that run in production, and building our own agents that optimize kernels and inference autonomously.

We generate 3,000 tokens/s per request on 8x AMD MI300X and 2,100 on 8x NVIDIA H200, at batch size 1, FP16, no speculative decoding. At batch size 1, the decode is GEMV, so it is memory bandwidth bound, and MBU is what counts. We rewrote the whole hot path ourselves, from the assembly on the chip up to the Transformer we designed around it, with the full decode running as a single persistent GPU kernel.

Try it at https://playground.kog.ai

Showing your code is part of the process. If you are outside a Europe-compatible timezone, relocation to one is required.

Apply: https://jobs.ashbyhq.com/kog/e3950334-a2a6-43cc-a744-df6c386...

Questions, email me at nicolas.constant@kog.ai
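
For readers unfamiliar with the MBU metric the post mentions, here is a minimal sketch of how Model Bandwidth Utilization is commonly estimated for batch-size-1 decode: each generated token has to stream the active weights from HBM once, so achieved bandwidth is roughly tokens/s times bytes read per token. The throughput figures come from the post. Everything else is an assumption for illustration: the active-parameter count is hypothetical (the post does not state a model size), the peak bandwidths are approximate public spec-sheet values, and the function names are invented here. This is not Kog's own methodology.

```python
# Sketch: estimating Model Bandwidth Utilization (MBU) for batch-size-1 decode.
# ASSUMPTION: HYPOTHETICAL_ACTIVE_PARAMS is made up for illustration; the post
# does not say which model or how many active parameters are being served.

FP16_BYTES = 2  # the post states FP16 weights

# Approximate per-GPU peak HBM bandwidth from public spec sheets (assumption).
PEAK_BYTES_PER_S = {
    "MI300X": 5.3e12,
    "H200": 4.8e12,
}

HYPOTHETICAL_ACTIVE_PARAMS = 4e9  # hypothetical, not from the post


def bytes_read_per_token(active_params, kv_cache_bytes=0):
    """Bytes streamed from memory to produce one token at batch size 1."""
    return active_params * FP16_BYTES + kv_cache_bytes


def mbu(tokens_per_s, bytes_per_token, peak_bytes_per_s):
    """Achieved memory bandwidth divided by peak memory bandwidth."""
    return tokens_per_s * bytes_per_token / peak_bytes_per_s


if __name__ == "__main__":
    per_token = bytes_read_per_token(HYPOTHETICAL_ACTIVE_PARAMS)
    # Throughput numbers quoted in the post; both setups use 8 GPUs.
    for gpu, tokens_per_s in (("MI300X", 3000), ("H200", 2100)):
        aggregate_peak = 8 * PEAK_BYTES_PER_S[gpu]
        print(f"8x {gpu}: MBU ~ {mbu(tokens_per_s, per_token, aggregate_peak):.2f}")
```

The resulting number is only as meaningful as the assumed model size. Its purpose is to show why, for a GEMV-dominated decode, tokens/s converts directly into bytes moved per second and can be compared against the hardware's bandwidth ceiling.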

AI Risk Insights

No major risk signals detected.

Recent News

No recent updates

Data Source

Content parsed by LLM from Hacker News raw data. Confidence: HIGH
