Software Engineer, Data Platform
About the Role
As a Software Engineer on the Data Platform team, you will focus on designing, building, and maintaining core data and machine learning infrastructure with a strong emphasis on software architecture and code quality. You will develop systems to ingest, process, and manage petabytes of telemetry and sensor data within a globally distributed data lake, enabling high-throughput, low-latency access for both model training and real-time inference. Your work will empower ML engineers and data scientists to iterate quickly and enhance system performance.
Responsibilities
- Build and maintain reliable data pipelines and core datasets to support simulation, analytics, machine learning workflows, and business needs
- Design and implement scalable database architectures for large, complex datasets, optimizing for performance, cost, and ease of use
- Collaborate closely with teams such as Simulation, Perception, Prediction, and Planning to understand data requirements and workflows
- Evaluate, integrate, and extend open-source tools (e.g., Apache Spark, Ray, Apache Beam, Argo Workflows) and internal systems
Requirements
- Strong proficiency in Python (required); C++ experience is a plus
- Proven experience writing high-quality, maintainable code and designing scalable, reliable systems
- Experience deploying and managing distributed systems with Kubernetes
- Practical experience with large-scale open-source data infrastructure (e.g., Kafka, Flink, Cassandra, Redis)
- Solid understanding of distributed systems and big data platforms, including experience managing petabyte-scale datasets
- Experience building and operating large-scale machine learning systems
- Familiarity with ML/AI workflows and machine learning pipelines
- Experience optimizing performance and resource usage in distributed environments
- Familiarity with data visualization and dashboarding tools (e.g., Grafana, Apache Superset)
- Experience with cloud infrastructure (e.g., AWS, GCP, Azure)