Apache Flink
A stream processing engine for low latency and high throughput.
A Flink job is a Directed Acyclic Graph of operators
Components
JobManager
Each Flink task has a JobManager which runs the DAG. It coordinates aspects for the overall link.
Main responsibilities:
- Scheduling what runs where based on parallelism and the available resources
- Tracking the job state (running, restarting, failed) and handles failover
- Coordinates checkpoints
TaskManager
A worker process that actually runs the operators. Each corresponds to a JVM process.
Main responsibilities:
- Actually executes the operators
- Hosts the state for the operators (RocksDB state)
- Runs network stack for shuffles