As a Big Data Engineer, you will work closely with the team and our stakeholders to build and deliver our Hadoop/NoSQL-based solutions for a next generation Big Data Analytics platform.
You’ll be designing and producing high-performing, stable applications that perform complex processing of petabyte-scale data in an Apache NiFi/Hadoop/NoSQL-based environment.
Building real-time data streaming applications that integrate with the business systems to create value from analytical models and to drive rapid decision making.
Sourcing, ingesting, wrangling and validating data sets, and building pipelines to transform data and produce analytical records for machine learning applications.
You’ll have extensive experience with performance tuning, configuring Apache NiFi/Hadoop/NoSQL-based systems to maximise efficiency and performance.
You'll take ownership and increase the automation, security and scale of complex data solutions that drive use cases requested by our analytical project partners.
We are looking for you if you have:
BS degree or higher in a technology-related field (e.g. Computer Science, Math, Information Systems, Industrial Engineering or another quantitative field)
You have a software engineering mindset. You may even be a software engineer with a focus or passion for data-driven solutions.
Minimum 2 years' experience in designing, building and managing applications that process large amounts of data in a (Cloudera) Hadoop data platform
Programming proficiency in SQL, Python, Groovy or Scala/Java
You’ll build robust data streaming and batch pipelines that output very high data quality at scale, using a combination of Apache NiFi, Apache Spark, Spark Streaming, Apache Kafka and Apache Airflow.
Experience in designing, building and managing applications that process large amounts of structured and unstructured data in a Hadoop/NoSQL-based ecosystem
Experience working with Apache NiFi (Cloudera DataFlow)
Familiarity with Linux systems, including bash programming
Experience working with Apache Hive
Experience working with relational databases like Teradata, Oracle
Experience with distributed NoSQL data stores like Apache HBase, Apache Druid, Elasticsearch/OpenSearch and Apache Phoenix
Experience with other distributed technologies such as Cassandra, MongoDB or Apache Kudu is a plus
Experience with data pipeline and workflow management tools (Apache Airflow, etc.) is a plus
Familiarity with container technologies (Docker, Kubernetes) is a plus