Big Data Engineer (Spark)
Job Description
About this role
Spark is the workhorse of large-scale data processing, and writing Spark code well requires understanding the JVM, the cluster, and the laws of distributed computation all at once. As a Big Data Engineer (Spark) for AI training, you will help AI generate Spark code that doesn't just run but scales, recovers, and respects partitioning, shuffles, and skew.
Key Responsibilities
• Generate and evaluate Spark instruction-response pairs covering DataFrame, SQL, and Structured Streaming APIs.
• Review AI-generated code in Scala Spark, PySpark, and Spark SQL.
• Provide feedback on partitioning strategies, broadcast joins, and skew handling.
• Validate AI handling of Delta Lake, Iceberg, and Hudi table formats.
• Evaluate cluster sizing, dynamic allocation, and Spark-on-Kubernetes patterns.
• Identify subtle issues in shuffle behavior, serialization, and AQE-related regressions.
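As a flavor of the skew-handling work described above, the sketch below shows the "key salting" idea in plain, dependency-free Python (so it runs without a Spark cluster). All function names here are illustrative assumptions, not part of any Spark API; in real PySpark the same idea is typically expressed by adding a random salt column before the join.

```python
import random

NUM_SALTS = 4  # number of sub-keys a hot key is split into (illustrative)

def salt_key(key, hot_keys, rng=random):
    """Append a random salt to known-hot keys so their rows spread
    across NUM_SALTS shuffle partitions instead of piling onto one."""
    if key in hot_keys:
        return f"{key}#{rng.randrange(NUM_SALTS)}"
    return f"{key}#0"  # cold keys keep a fixed salt

def explode_small_side(key):
    """The small (dimension) side of the join must be replicated
    across every salt so the salted join still finds its match."""
    return [f"{key}#{s}" for s in range(NUM_SALTS)]

def partition_of(salted_key, num_partitions=4):
    """Stand-in for a hash partitioner deciding shuffle placement."""
    return hash(salted_key) % num_partitions
```

The key invariant is that every salted value of a hot key on the large side matches one of the replicated rows on the small side, trading a small amount of duplication for an even shuffle.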
Ideal Qualifications
• 6+ years in big data engineering, including 4+ years writing production Spark.
• Deep familiarity with both Scala Spark and PySpark.
• Strong grasp of Spark internals (Catalyst, Tungsten, AQE) and distributed-systems trade-offs.
• Experience with at least one modern table format (Delta, Iceberg, Hudi).
• Comfort with cloud data platforms (Databricks, EMR, Dataproc, Synapse).
• Familiarity with Kafka, Airflow, or dbt is a plus.
Project Timeline
• Start Date: Immediate
• Duration: Ongoing
• Commitment: Flexible, 10-25 hours/week
Contract & Payment Terms
• Independent contractor agreement
• Remote work — anywhere in eligible locations
• Weekly payment via Stripe or bank transfer
• Flexible hours
Scale AI's grasp of distributed data with Spark — apply now!