SPARK-1136

Fix FaultToleranceTest for Docker 0.8.1


    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 1.0.0
    • Fix Version/s: None
    • Component/s: Build
    • Labels:


      Several changes were made between Docker 0.6 (when our spark-test docker files were created) and the current version of Docker, 0.8.1. Two of them are relevant to FaultToleranceTest and cause it to fail:

      1) A random host name is assigned to Docker containers. This host name, unlike the IP address, is not reachable from outside the container, but by default we'll try to use it as the Worker's Akka host. This fails when a newly-elected Master attempts to recover a Worker, since the Worker is not actually reachable at the host address it connected from.

      2) IP addresses are now reassigned immediately upon container recycling. This means that we can confuse "old" and "new" Workers or Masters that happen to be assigned the same IP address. The most obvious issue arises when a Worker takes on a previous Worker's IP address during Master recovery and gets an "attempted to re-register" exception.
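
      The address-reuse problem in (2) can be illustrated with a toy model. This is only a sketch under stated assumptions: ToyMaster, register, simulate, and the worker IDs and IP address are hypothetical names made up for illustration and are not Spark's Master code. It shows that a registry keyed by network address rejects a new Worker that inherits a recycled IP, while one keyed by a unique Worker ID does not.

```python
class ToyMaster:
    """Hypothetical stand-in for a master's worker table (not Spark's code)."""

    def __init__(self, key_by_address):
        # key_by_address=True mimics identifying workers by IP address alone.
        self.key_by_address = key_by_address
        self.workers = {}

    def register(self, worker_id, address):
        key = address if self.key_by_address else worker_id
        if key in self.workers:
            # A stale entry with the same key blocks the newcomer.
            raise ValueError(f"attempted to re-register worker at {address}")
        self.workers[key] = (worker_id, address)
        return key


def simulate(key_by_address):
    """Register an old worker, then a new one that reuses the same IP."""
    master = ToyMaster(key_by_address)
    recycled_ip = "172.17.0.2"  # assumed example address
    master.register("worker-old", recycled_ip)
    try:
        master.register("worker-new", recycled_ip)
        return "registered"
    except ValueError:
        return "rejected"


if __name__ == "__main__":
    print("keyed by address:", simulate(True))
    print("keyed by worker id:", simulate(False))
```

      In this model the stale entry for the old Worker is never cleaned up, which is the situation immediate IP reuse makes likely; a real fix would also need to decide when such stale entries may be dropped.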




            • Assignee:
              ilikerps Aaron Davidson
            • Votes:
              0


              • Created: