Edmund Xin

Apache Flink

A stream processing engine for low latency and high throughput.

A Flink job is a directed acyclic graph (DAG) of operators.
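To make the DAG idea concrete, here is a minimal sketch in plain Python (not the Flink API). The operator names are hypothetical, and the graph simply maps each operator to the operators it reads from. A topological sort yields an order in which every operator comes after its inputs, and it fails if the graph contains a cycle.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical pipeline: each operator maps to the set of operators it reads from.
job_graph = {
    "source": set(),
    "parse": {"source"},
    "key_by_user": {"parse"},
    "window_count": {"key_by_user"},
    "sink": {"window_count"},
}


def execution_order(graph):
    """Return the operators in an order where each one follows its inputs.

    Raises CycleError if the graph is not acyclic.
    """
    return list(TopologicalSorter(graph).static_order())


def is_valid_job_graph(graph):
    """Return True if the graph is a DAG, False if it contains a cycle."""
    try:
        execution_order(graph)
        return True
    except CycleError:
        return False


if __name__ == "__main__":
    print(execution_order(job_graph))
    print(is_valid_job_graph({"a": {"b"}, "b": {"a"}}))
```

The "acyclic" part matters because records only ever flow forward from sources to sinks, so an operator never waits on its own output.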

Components

JobManager

Each Flink job has a JobManager that coordinates execution of the DAG. It handles the concerns that span the whole job.

Main responsibilities:

  • Scheduling what runs where, based on parallelism and the available resources
  • Tracking the job state (running, restarting, failed) and handling failover
  • Coordinating checkpoints
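The scheduling responsibility can be illustrated with a simplified sketch in plain Python. It is not Flink's scheduler: the function name, the slot representation, and the placement rule are assumptions made for illustration. It models the common case where subtask i of every operator shares slot i, so a job needs as many slots as its highest operator parallelism.

```python
def schedule(parallelism, slots):
    """Assign operator subtasks to free slots, one pipeline slice per slot.

    parallelism: dict of operator name -> number of parallel subtasks
    slots: list of (task_manager, slot_index) pairs that are free
    Returns a dict of slot -> list of "operator[i]" subtasks placed in it.
    """
    needed = max(parallelism.values())
    if needed > len(slots):
        # Not enough resources: the job cannot be deployed as requested.
        raise RuntimeError(f"need {needed} slots, only {len(slots)} available")

    placement = {slot: [] for slot in slots[:needed]}
    for operator, p in parallelism.items():
        for i in range(p):
            # Subtask i of every operator lands in the i-th slot.
            placement[slots[i]].append(f"{operator}[{i}]")
    return placement


if __name__ == "__main__":
    free_slots = [("tm-1", 0), ("tm-1", 1), ("tm-2", 0)]
    plan = schedule({"source": 1, "map": 2, "sink": 2}, free_slots)
    for slot, subtasks in plan.items():
        print(slot, subtasks)
```

A real scheduler also weighs data locality, failures, and resource profiles. The sketch only shows how parallelism and available slots together decide what runs where.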

TaskManager

A worker process that runs the operators. Each TaskManager is its own JVM process.

Main responsibilities:

  • Executing the operators
  • Hosting the operators' state (for example, in RocksDB)
  • Running the network stack for shuffles
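The link between shuffles and locally hosted state can be sketched as follows. This is plain Python rather than Flink code, and it makes two loud assumptions: a CRC32 hash stands in for Flink's own key hashing, and an in-memory dict stands in for a state backend such as RocksDB. The function names are hypothetical.

```python
import zlib
from collections import defaultdict


def target_subtask(key, parallelism):
    """Pick the downstream subtask for a key using a stable hash.

    Assumption: CRC32 is a stand-in, not the hash Flink actually uses.
    """
    return zlib.crc32(key.encode("utf-8")) % parallelism


def keyed_count(records, parallelism):
    """Route each record by key, then count per key in subtask-local state."""
    # One dict per subtask, standing in for that subtask's local state.
    state = [defaultdict(int) for _ in range(parallelism)]
    for key in records:
        state[target_subtask(key, parallelism)][key] += 1
    return [dict(s) for s in state]


if __name__ == "__main__":
    for index, local_state in enumerate(keyed_count(["a", "b", "a", "c", "a"], 2)):
        print(f"subtask {index}: {local_state}")
```

Because every record with the same key is routed to the same subtask, each key's state lives in exactly one place and can be updated without cross-process coordination.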