Apache Spark / SPARK-1136

Fix FaultToleranceTest for Docker 0.8.1

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 1.0.0
    • Fix Version/s: None
    • Component/s: Build
    • Labels: None

      Description

      Several changes were made between Docker 0.6 (when our spark-test Docker files were created) and the current version of Docker, 0.8.1. Two of them are relevant to FaultToleranceTest and cause it to fail:

      1) A random host name is assigned to each Docker container. This host name, unlike the IP address, is not reachable from outside the container, but by default we try to use it as the Worker's Akka host. This fails when a newly elected Master attempts to recover a Worker, since the Worker is not actually reachable at the host address it registered with.
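
The reachability problem in (1) can be illustrated with a toy model. This is only a sketch, not Spark or Docker code: the function names, the host name, and the IP address are all made up for illustration. It assumes that, from the Master's side, only the container's IP is routable.

```python
# Toy model of issue (1). Hypothetical names and values throughout;
# this is not Spark's or Docker's actual API.

def reachable_from_outside(address, routable_ips):
    """Outside the container only the IP is routable; the random host name is not."""
    return address in routable_ips


def choose_advertised_host(container_hostname, container_ip, prefer_ip=True):
    """Pick the address a Worker advertises to the Master."""
    return container_ip if prefer_ip else container_hostname


routable_ips = {"172.17.0.5"}   # made-up container IP
hostname = "3f2a9c1b7d4e"       # made-up random Docker host name

# Default behaviour described in the issue: advertise the host name.
default_choice = choose_advertised_host(hostname, "172.17.0.5", prefer_ip=False)
# Possible workaround (an assumption): advertise the IP explicitly.
fixed_choice = choose_advertised_host(hostname, "172.17.0.5", prefer_ip=True)

print(reachable_from_outside(default_choice, routable_ips))  # False
print(reachable_from_outside(fixed_choice, routable_ips))    # True
```

In this model a recovering Master can contact the Worker only when the Worker advertised its IP; whether the real fix should take that form is not stated in the issue.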

      2) IP addresses are now reassigned immediately upon container recycling. This means that we can confuse "old" and "new" Workers or Masters that happen to be assigned the same IP address. The most obvious symptom is a Worker getting an "attempted to re-register" exception when it takes on a previous Worker's IP address during Master recovery.
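
The confusion in (2) can be sketched the same way. The two registry classes below are hypothetical and do not reproduce the Master's real bookkeeping. They only contrast tracking Workers by address, which breaks when an IP is recycled, with tracking them by a unique ID, which is one possible way to tell old and new Workers apart.

```python
# Toy model of issue (2). Hypothetical classes; not Spark's Master code.

class AddressKeyedRegistry:
    """Tracks Workers by address, so a recycled IP looks like a duplicate."""

    def __init__(self):
        self.by_address = {}

    def register(self, worker_id, address):
        if address in self.by_address:
            return False  # analogous to "attempted to re-register"
        self.by_address[address] = worker_id
        return True


class IdKeyedRegistry:
    """Tracks Workers by unique ID, so address reuse is not a conflict."""

    def __init__(self):
        self.by_id = {}

    def register(self, worker_id, address):
        if worker_id in self.by_id:
            return False
        self.by_id[worker_id] = address
        return True


recycled = "172.17.0.5:8888"  # made-up address reused by a new container

by_addr = AddressKeyedRegistry()
old_ok_addr = by_addr.register("worker-old", recycled)
new_ok_addr = by_addr.register("worker-new", recycled)

by_id = IdKeyedRegistry()
old_ok_id = by_id.register("worker-old", recycled)
new_ok_id = by_id.register("worker-new", recycled)

print(old_ok_addr, new_ok_addr)  # True False
print(old_ok_id, new_ok_id)      # True True
```

Note that the ID-keyed variant still holds a stale entry for the old Worker, so some form of cleanup would be needed as well.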


              People

              • Assignee: Aaron Davidson (ilikerps)
              • Reporter: Aaron Davidson (ilikerps)
              • Votes: 0
              • Watchers: 1
