> How does the API server communicate with the driver?
I assume by "driver" you mean the SparkContext within which each job runs, right? That context is created by the job server itself. You can think of the workflow like this (we can post a diagram of the workflow we've been working on to make it clearer):
- The user does a POST /jobs to initiate a job. This is either an ad-hoc job (run in a temporary context) or one that runs in a pre-created context.
- The job server finds or creates the context.
- The job server loads the class for the job, which must implement a trait, and invokes a method on it, passing in the SparkContext instance (a sketch of this step follows below).
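To make that last step concrete, here is a purely illustrative sketch of how a job server might load and invoke a job class by reflection. The function and parameter names are assumptions, not the actual implementation:

```scala
import com.typesafe.config.Config
import org.apache.spark.SparkContext

// Illustrative only: resolve the user's job class by name, instantiate it,
// and call its runJob method with the managed SparkContext and the request's
// config. The real dispatch logic may differ.
def runJobByClassName(classPath: String, sc: SparkContext, config: Config): Any = {
  val clazz = Class.forName(classPath)
  val job = clazz.getDeclaredConstructor().newInstance()
  val runJob = clazz.getMethod("runJob", classOf[SparkContext], classOf[Config])
  runJob.invoke(job, sc, config)
}
```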
> Will it share the same context as the driver?
Yes. So, all jobs passed to the job server should implement a trait, and the trait has a method like this:
```scala
/**
 * This is the entry point for a Spark Job Server to execute Spark jobs.
 * This function should create or reuse RDDs and return the result at the end, which the
 * Job Server will cache or display.
 * @param sc a SparkContext for the job. May be reused across jobs.
 * @param config the Typesafe Config object passed into the job request
 * @return the job result
 */
def runJob(sc: SparkContext, config: Config): Any
```
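For illustration, a complete job might look like the sketch below. The trait name `SparkJob` and the config key `input.string` are assumptions; only the `runJob` signature above is given:

```scala
import com.typesafe.config.Config
import org.apache.spark.SparkContext

// Hypothetical trait name; only the runJob signature is specified above.
trait SparkJob {
  def runJob(sc: SparkContext, config: Config): Any
}

// A minimal word-count job: reads a string from the request config,
// counts word occurrences, and returns the result for the job server
// to cache or display.
object WordCountExample extends SparkJob {
  override def runJob(sc: SparkContext, config: Config): Any = {
    val words = config.getString("input.string").split("\\s+")
    sc.parallelize(words).countByValue()
  }
}
```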
The user can submit multiple jobs to the same context – for example, the first job can create a cached RDD, and the second one can query it.
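Here is a sketch of that pattern, reusing the hypothetical `SparkJob` trait from above. The shared registry is illustrative only; the job server may offer its own mechanism for naming and sharing RDDs:

```scala
import com.typesafe.config.Config
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

trait SparkJob { def runJob(sc: SparkContext, config: Config): Any } // as above

// Illustrative registry for handing a cached RDD from one job to the next.
// Both jobs run in the same SparkContext (and JVM), so this works.
object SharedRdds {
  @volatile var lines: RDD[String] = _
}

// First job: load, cache, and materialize an RDD, then register it.
object LoadJob extends SparkJob {
  override def runJob(sc: SparkContext, config: Config): Any = {
    val rdd = sc.textFile(config.getString("input.path")).cache()
    val count = rdd.count() // force the cache to be populated
    SharedRdds.lines = rdd
    count
  }
}

// Second job: query the RDD cached by the first job, with no reload.
object QueryJob extends SparkJob {
  override def runJob(sc: SparkContext, config: Config): Any =
    SharedRdds.lines.filter(_.contains(config.getString("term"))).count()
}
```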
Hope that answers your questions, and looking forward to more feedback.