spark-job-server provides a RESTful interface for submitting and managing Spark jobs, jars, and job contexts.

## Features

- *"Spark as a Service"*: Simple REST interface for all aspects of job and context management
- Supports sub-second low-latency jobs via long-running job contexts
- Start and stop job contexts for RDD sharing and low-latency jobs; change resources on restart
- Kill running jobs by stopping their context
- Separate jar uploading step for faster job startup
- Asynchronous and synchronous job API. The synchronous API is great for low-latency jobs!
- Works with Standalone Spark as well as Mesos
- Job and jar info is persisted via a pluggable DAO interface

## Architecture

The job server is intended to be run as one or more independent processes, separate from the Spark cluster (though it may well be colocated with, say, the Master).

At first glance, it seems many of these functions (e.g. job management) could be integrated into the Spark standalone master. While this is true, we believe there are significant reasons to keep it separate:

- We want the job server to work for Mesos and YARN as well
- Spark and Mesos masters are organized around "applications" or contexts, but the job server supports running many discrete "jobs" inside a single context
- We want it to support Shark functionality in the future
- Loose coupling allows for flexible HA arrangements (multiple job servers targeting the same standalone master, or possibly multiple Spark clusters per job server)

## API

### Jars

    GET /jars            - lists all the jars and the last upload timestamp
    POST /jars/          - uploads a new jar under

### Contexts

    GET /contexts        - lists all current contexts
    POST /contexts/      - creates a new context
    DELETE /contexts/    - stops a context and all jobs running in it

### Jobs

Jobs submitted to the job server must implement a `SparkJob` trait. It has a main `runJob` method which is passed a `SparkContext` and a Typesafe `Config` object. Results returned by the method are made available through the REST API.

    GET /jobs            - lists the last N jobs
    POST /jobs           - starts a new job; use ?sync=true to wait for results
    GET /jobs/           - gets the result or status of a specific job
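
The endpoints listed above can be called from any HTTP client. As a rough illustration, the Python sketch below builds (without sending) requests for a few of them using only the standard library. The base URL, the helper names (`build_request`, `list_jars`, `list_contexts`, `list_jobs`, `start_job`), and the idea of passing job parameters in the request body are assumptions made for this example, not something this README specifies.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: host and port of the job server; adjust to your deployment.
BASE_URL = "http://localhost:8090"


def build_request(method, path, params=None, body=None, base_url=BASE_URL):
    """Build, but do not send, a request for a job server endpoint."""
    url = base_url.rstrip("/") + path
    if params:
        url += "?" + urlencode(params)
    return Request(url, data=body, method=method)


def list_jars():
    # GET /jars: all jars and their last upload timestamp
    return build_request("GET", "/jars")


def list_contexts():
    # GET /contexts: all current contexts
    return build_request("GET", "/contexts")


def list_jobs():
    # GET /jobs: the last N jobs
    return build_request("GET", "/jobs")


def start_job(sync=False, body=None):
    # POST /jobs: start a new job; ?sync=true asks the server to wait for results.
    # Assumption: any job parameters travel in the request body.
    params = {"sync": "true"} if sync else None
    return build_request("POST", "/jobs", params=params, body=body)
```

To actually issue a request, pass the returned object to `urllib.request.urlopen`, for example `urlopen(start_job(sync=True))`, and read the response body. Keeping construction separate from sending makes it easy to inspect the URL and method before anything reaches the server.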