Streaming / Real-time Data Engineer
Job Description
About this role
Real-time data systems sit at a hard intersection: distributed systems, exactly-once delivery, watermarks, and ever-shifting state. As a Streaming Data Engineer for AI training, you will help AI generate streaming code that handles late events, schema evolution, and rebalances without losing or duplicating records.
Key Responsibilities
• Generate and evaluate streaming instruction-response pairs across Kafka, Flink, Kafka Streams, and ksqlDB.
• Review AI-generated code for correct windowing, watermarking, and event-time semantics.
• Provide feedback on consumer-group design, partition strategy, and rebalance handling.
• Validate AI handling of schema registries (Confluent, Apicurio) and Avro/Protobuf evolution.
• Evaluate exactly-once patterns, idempotent producers, and transactional consumers.
• Identify subtle issues in state-store sizing, checkpointing, and reprocessing.
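Much of the windowing and watermark review described above reduces to checking event-time logic. The sketch below is a framework-free model of a tumbling event-time window with a watermark and late-event dropping; the class name, window size, and allowed-lateness figures are illustrative assumptions, not any Flink or Kafka Streams API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Minimal model of event-time tumbling windows with a watermark.
// Each event is assigned to the window containing its event timestamp;
// events that arrive behind the current watermark are dropped as late.
public class TumblingWindowModel {
    private final long windowSizeMs;
    private final long allowedLatenessMs;
    private long watermarkMs = Long.MIN_VALUE; // max event time seen, minus lateness
    private final Map<Long, List<String>> windows = new TreeMap<>();
    private final List<String> lateEvents = new ArrayList<>();

    public TumblingWindowModel(long windowSizeMs, long allowedLatenessMs) {
        this.windowSizeMs = windowSizeMs;
        this.allowedLatenessMs = allowedLatenessMs;
    }

    // Returns the start of the window the event falls into, or -1 if dropped as late.
    public long onEvent(String value, long eventTimeMs) {
        // Advance the watermark monotonically.
        watermarkMs = Math.max(watermarkMs, eventTimeMs - allowedLatenessMs);
        if (eventTimeMs < watermarkMs) {   // behind the watermark: late
            lateEvents.add(value);
            return -1;
        }
        long windowStart = eventTimeMs - (eventTimeMs % windowSizeMs);
        windows.computeIfAbsent(windowStart, k -> new ArrayList<>()).add(value);
        return windowStart;
    }

    public Map<Long, List<String>> windows() { return windows; }
    public List<String> lateEvents() { return lateEvents; }

    public static void main(String[] args) {
        TumblingWindowModel model = new TumblingWindowModel(10_000, 2_000);
        model.onEvent("a", 1_000);   // lands in window [0, 10000)
        model.onEvent("b", 12_000);  // window [10000, 20000); watermark advances to 10000
        model.onEvent("c", 9_500);   // 9500 < watermark 10000: dropped as late
        System.out.println(model.windows());     // {0=[a], 10000=[b]}
        System.out.println(model.lateEvents());  // [c]
    }
}
```

Real frameworks differ in how watermarks are generated and propagated across partitions, but candidates reviewing AI-generated code would be checking exactly this kind of window-assignment and lateness logic.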
Ideal Qualifications
• 5+ years building streaming systems in production.
• Deep familiarity with Kafka and at least one stream-processing framework (Flink, Kafka Streams, or Spark Structured Streaming).
• Strong grasp of distributed-systems concepts (CAP, exactly-once, watermarks).
• Experience with schema management and event-driven architecture patterns.
• Comfort with Java/Scala on the JVM, plus operational tools (Cruise Control, Strimzi).
• Familiarity with Pulsar, Redpanda, or AWS MSK is a plus.
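The exactly-once familiarity listed above typically shows up as a small set of Kafka client settings. A representative (not exhaustive) producer/consumer pairing might look like the fragment below; the `transactional.id` value is illustrative.

```properties
# Producer: idempotent writes within a transactional session
enable.idempotence=true
acks=all
max.in.flight.requests.per.connection=5
transactional.id=orders-processor-1   # illustrative id, unique per producer instance

# Consumer: read only records from committed transactions
isolation.level=read_committed
enable.auto.commit=false
```

Reviewers in this role would verify that AI-generated code pairs these settings with the transactional produce/commit API rather than treating idempotence alone as exactly-once.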
Project Timeline
• Start Date: Immediate
• Duration: Ongoing
• Commitment: Flexible, 10-25 hours/week
Contract & Payment Terms
• Independent contractor agreement
• Remote work — anywhere in eligible locations
• Weekly payment via Stripe or bank transfer
• Flexible hours
Tune AI for the event-time mindset of streaming systems — apply now!