As an ML Engineer at Bold, you’ll build the AI inference layer that runs directly on customer endpoints. You will take small language models from prototype to efficient, reliable, production-ready implementations across hardware and software.
This role is for you if you want to be at the forefront of edge inference, turning ML into high-performance production systems that run under real-world constraints.
Key Responsibilities:
- Build and optimize the inference engine, using CPUs and GPUs to run small language models at the edge.
- Develop the engineering components around model execution, including preprocessing, post-processing, telemetry, and failure handling.
- Optimize runtime performance across latency, CPU usage, memory footprint, startup time, binary size, and reliability constraints.
- Apply strong systems engineering judgment when balancing detection quality, performance, explainability, operational complexity, and maintainability.
- Own ambiguous technical problems end to end, from design and implementation through production rollout.
Qualifications:
- 5+ years of engineering experience, with strong hands-on experience in a systems language like Rust, C, or C++.
- Experience building production ML, classification, detection, or decisioning systems.
- Strong understanding of model inference, classification pipelines, runtime constraints, and performance optimization.
- Solid understanding of systems design, operating systems, multi-threading, async programming, and resource management.
- Good understanding of ML fundamentals, including model architectures, feature engineering, classifiers, embeddings, thresholds, and evaluation tradeoffs.
- Strong ownership, collaboration, and communication skills, with the ability to work effectively across disciplines.