SPARK-1194: The same-RDD rule for cache replacement is not properly implemented


    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.0
    • Fix Version/s: 1.0.0
    • Component/s: Spark Core
    • Labels: None


      The same-RDD rule for cache replacement described in the original RDD paper prevents partitions of the same RDD from being cycled in and out of the cache. Commit 6098f7e was meant to implement this rule, but I believe it introduced some problems.

      In the current implementation, when selecting candidate blocks to swap out, eviction fails and aborts as soon as we encounter a block belonging to the same RDD as the block to be stored, even if evicting other blocks could still free enough space. Also, LRU eviction (as described in the paper) is not employed.
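      To make the problem concrete, here is a minimal Scala sketch of the flawed scan. The BlockId case class and all names below are illustrative assumptions, not Spark's actual BlockManager API:

{code:scala}
import scala.collection.mutable

// Hypothetical block metadata; illustrative only, not Spark's actual API.
case class BlockId(rddId: Option[Int], name: String)

// Sketch of the flawed scan: it aborts outright on the first block that
// belongs to the same RDD as the incoming block, even though skipping that
// block and evicting later ones might still free enough space.
def selectVictimsBuggy(
    entries: Iterable[(BlockId, Long)], // (blockId, size) pairs in LRU order
    incoming: BlockId,
    spaceNeeded: Long): Option[Seq[BlockId]] = {
  val victims = mutable.ArrayBuffer.empty[BlockId]
  var freed = 0L
  for ((id, size) <- entries) {
    if (id.rddId.isDefined && id.rddId == incoming.rddId) {
      return None // eviction fails and aborts as soon as a same-RDD block is seen
    }
    if (freed < spaceNeeded) {
      victims += id
      freed += size
    }
  }
  if (freed >= spaceNeeded) Some(victims.toSeq) else None
}
{code}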

      A possible eviction strategy: keep selecting blocks that do not belong to the RDD of the block being stored until either enough free space can be ensured (eviction succeeds) or all such blocks have been checked (eviction fails).
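      A sketch of that strategy, reusing the illustrative BlockId from above (again an assumption-laden toy, not a definitive implementation): skip same-RDD blocks instead of aborting, and fail only after every eligible block has been considered.

{code:scala}
// Proposed scan: honor the same-RDD rule by skipping, not aborting.
def selectVictims(
    entries: Iterable[(BlockId, Long)], // (blockId, size) pairs in LRU order
    incoming: BlockId,
    spaceNeeded: Long): Option[Seq[BlockId]] = {
  val victims = scala.collection.mutable.ArrayBuffer.empty[BlockId]
  var freed = 0L
  val it = entries.iterator
  while (freed < spaceNeeded && it.hasNext) {
    val (id, size) = it.next()
    // Same-RDD rule: never evict blocks of the RDD the incoming block belongs to.
    if (id.rddId.isEmpty || id.rddId != incoming.rddId) {
      victims += id
      freed += size
    }
  }
  if (freed >= spaceNeeded) Some(victims.toSeq) else None
}
{code}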

      LRU should also be employed, though not necessarily as part of this issue.

      Any thoughts? In particular, did I miss any obvious rationale behind the current implementation?

      Update: LRU is implemented with a LinkedHashMap by setting the constructor argument accessOrder to true.
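      For reference, this is the standard java.util.LinkedHashMap access-order mechanism the update refers to; the usage below is a toy illustration, not Spark's actual MemoryStore code:

{code:scala}
import java.util.{LinkedHashMap => JLinkedHashMap}

// With accessOrder = true, every get/put moves the touched entry to the end,
// so iteration visits entries from least- to most-recently used.
val lru = new JLinkedHashMap[String, Long](32, 0.75f, /* accessOrder = */ true)
lru.put("rdd_0_0", 100L)
lru.put("rdd_0_1", 200L)
lru.get("rdd_0_0") // touching rdd_0_0 makes it the most recently used

// Prints rdd_0_1 first (least recently used), then rdd_0_0.
val it = lru.entrySet().iterator()
while (it.hasNext) println(it.next().getKey)
{code}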




            • Assignee: Cheng Lian (liancheng)
            • Votes: 2
            • Watchers: 5


              • Created: