Job Description
As Head of Data Engineering, you will lead a team of software engineers responsible for the design, development, and operation of all back-end services. This includes data integration and ingestion, data processing, and the application of machine learning algorithms to large, complex biological data sets. Our product engineering teams use these back-end services to build and deliver cutting-edge genomic analysis to our customers. We have decided to modernize and re-architect our core data platform, moving from a batch-based processing system to a continuous, event-based streaming system. The core technologies of the platform will include AWS, Airflow, HDFS/Hadoop, Spark, Kafka, NoSQL (HBase/Cassandra), and ClickHouse. We primarily use Python.
The role reports to the CTO and offers the opportunity to make a significant impact on the company's success at a critical stage in our growth. It is an incredible opportunity to work on real-time data processing and the use of artificial intelligence at scale, and you will play a leading role in the design, development, and deployment of our next-generation platform.
We are looking for a strong engineering leader who is a strategic, innovative problem-solver with a passion for applying technology to solve real-world customer problems at scale. The ideal candidate is passionate about building high-performance teams focused on quality and innovation, and demonstrates excellent organizational and communication skills when working with engineers and leaders throughout the company.
The Role Is:
- Full-time
- Fully remote
- No agencies
- European working hours
- Salary: $140k–$200k USD per year; equity is also available
Responsibilities:
- Lead and develop a team of talented data/software engineers to design, plan, develop, and deploy improvements to back-end platform services related to data ingestion, data processing, and analytics.
- Create a culture of responsible, effective work with large-scale and sensitive data.
- Design the architecture and then lead the implementation of scalable data processing systems.
- Plan the development of a data platform as a SaaS product.
- Collaborate broadly across the organization and with senior leadership to drive team and individual performance focused on clear outcomes and team OKRs.
- Evaluate resource costs, determine the composition of the required team, define the top-level roadmap, and perform project risk assessments.
- Foster the adoption of engineering best practices across all aspects of software development to build, deploy, test, and release large-scale services with quality and agility, while maintaining our current platform to continue meeting customer commitments.
- Shape the overall technology strategy and quarterly and yearly goals, drive engineering best practices, and take ownership of delivering core outcomes.
Required Skills & Experience:
- 7+ years of experience with data technologies across streaming and batch-oriented realms, cutting across data acquisition, storage, processing, and consumption patterns in both operational and analytical domains, as well as expertise in cloud data services (AWS / Azure / GCP).
- 5+ years leading highly technical, high-performance engineering teams, with experience in people management (hiring and offboarding) and performance management (coaching and mentoring). Proven track record of architecting, designing, and delivering complex Big Data and cloud data solutions (AWS, Azure, GCP) that solve problems at scale, especially on distributed data platforms (Hadoop/Kafka).
- Expert in distributed data processing frameworks such as Spark, Storm, and Flink across batch and streaming realms, and in columnar storage formats such as Parquet; expert in programming languages, preferably Scala with Python secondary; and expert in distributed messaging/streaming frameworks such as Kafka, Pulsar, Google Pub/Sub, Azure Event Hubs, and AWS Kinesis.
- Experience with NoSQL databases (Cassandra/HBase/MongoDB/Elasticsearch/Neo4j) and scalable analytical data stores such as Snowflake, BigQuery, Redshift, and Teradata.
- Professional experience with workflow management (Nextflow, Snakemake, Airflow, etc.).
- Deep knowledge of scalable data models, queries, and operations that address various consumption patterns, including random access and sequential access, and the necessary optimizations such as bucketing, aggregating, and sharding.
- Experience in performance tuning, optimization, and scaling solutions from a storage and processing standpoint.
- Experience setting up data engineering practices across architecture, design, coding, quality assurance, and deployment, using industry-standard DevOps practices for CI/CD and leveraging tools such as Jenkins/Bamboo, Maven, JUnit, SonarQube, Terraform (one-click infrastructure setup), Kubernetes, and containerization.
- Solid understanding of Data Governance, Data Security, Data Cataloging, and Data Lineage concepts (experience with tools like Collibra in these areas is preferred).
- Passion for recruiting, developing, mentoring, and retaining a world-class engineering team.
- Lean-thinking mindset, comfortable with Agile planning and estimation rituals, flexible, and able to thrive in a fast-paced, innovative young company.
- Excellent written and verbal English-language communication skills, with the ability to adapt the level of detail to various audiences, and able to concisely explain technical concepts to business stakeholders.