A. Verma: Deadline-based workload management for MapReduce environments
Key Tech: allocate a tailored number of map/reduce slots to a job, when the job's profiling info is available
New idea:
1. Makespan theorem: two bounds on the makespan of greedy task assignment
lower bound = n.avg/k
upper bound = (n-1).avg/k + max
where
n: number of tasks; k: number of slots
avg: average duration of the n tasks
max: maximum duration of the n tasks
* these bounds are particularly useful (tight) when max << n.avg/k
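The two bounds can be checked against a small simulation of greedy assignment (a sketch; `makespan_bounds` and `greedy_makespan` are illustrative names, not from the paper):

```python
import random

def makespan_bounds(durations, k):
    """Lower/upper bounds on the makespan of greedily assigning
    n tasks to k slots: n*avg/k and (n-1)*avg/k + max."""
    n = len(durations)
    avg = sum(durations) / n
    lower = n * avg / k
    upper = (n - 1) * avg / k + max(durations)
    return lower, upper

def greedy_makespan(durations, k):
    """Greedy assignment: each task goes to the earliest-free slot;
    the makespan is the latest slot finish time."""
    slots = [0.0] * k
    for d in durations:
        i = min(range(k), key=lambda j: slots[j])
        slots[i] += d
    return max(slots)

random.seed(0)
tasks = [random.uniform(1, 10) for _ in range(50)]
lo, hi = makespan_bounds(tasks, 4)
ms = greedy_makespan(tasks, 4)
assert lo <= ms <= hi  # the greedy makespan always lies between the bounds
```

Note that the lower bound holds for any schedule (it is total work divided by k), while the upper bound is specific to greedy (list) scheduling.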
2. allocate the minimal resource quota required for meeting the deadline constraint, while leaving the remaining spare resources to future arriving jobs.
a) lower bound on job completion time: T_low = Nm.Mavg/Sm + Nr.Ravg/Sr
b) upper bound: T_up = (Nm-1).Mavg/Sm + Mmax + (Nr-1).Ravg/Sr + Rmax
c) deadline equation: given deadline T and a choice of Sm, solve T = Nm.Mavg/Sm + Nr.Ravg/Sr for Sr
where N - num of tasks; S - num of slots; M/R - map/reduce task durations (avg/max); subscripts m, r denote the map and reduce phases
--Minimal combination of map/reduce slots (Sm, Sr)
Sm = min(Nm, available map slots)
Sr = solve equation c) for the chosen Sm
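A sketch of the minimal-quota computation, assuming the lower-bound completion-time model T = Nm*Mavg/Sm + Nr*Ravg/Sr (the paper solves its deadline equation analogously; `minimal_slots` and its parameters are illustrative names, with durations from the job profile):

```python
import math

def minimal_slots(Nm, Mavg, Nr, Ravg, T, max_map_slots):
    """Enumerate feasible map-slot allocations Sm and, for each,
    solve T = Nm*Mavg/Sm + Nr*Ravg/Sr for the smallest integer Sr;
    return the (Sm, Sr) pair that uses the fewest total slots."""
    best = None
    for Sm in range(1, min(Nm, max_map_slots) + 1):
        map_time = Nm * Mavg / Sm
        if map_time >= T:
            continue  # map phase alone already misses the deadline
        Sr = math.ceil(Nr * Ravg / (T - map_time))
        if best is None or Sm + Sr < best[0] + best[1]:
            best = (Sm, Sr)
    return best

# 20 map tasks of avg 2s, 10 reduce tasks of avg 3s, deadline 20s
Sm, Sr = minimal_slots(20, 2, 10, 3, 20, 10)
```

Any slots beyond this minimal pair stay free for jobs that arrive later, which is exactly the point of allocating the minimum.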
3. decide whether a new job should wait: let it wait if it can still be completed in time; otherwise, calculate how many slots should cancel processing their current tasks.
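A simplified sketch of this admission decision (a hypothetical helper; the paper additionally reasons about when running tasks will free their slots before cancelling anything):

```python
def admit(job_min_slots, free_slots):
    """Compare a job's minimal (Sm, Sr) requirement against the
    currently free slots: run it if it fits, otherwise report the
    per-phase shortfall that must be reclaimed from running tasks."""
    need_m, need_r = job_min_slots
    free_m, free_r = free_slots
    short_m = max(0, need_m - free_m)
    short_r = max(0, need_r - free_r)
    if short_m == 0 and short_r == 0:
        return "run now", (0, 0)
    return "reclaim", (short_m, short_r)

decision, shortfall = admit((4, 3), (2, 3))  # needs 2 more map slots
```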
4. job profiling: use the job's past executions, or execute it on a smaller data set
Experiment:
1. Twitter: data - an edge list of Twitter user ids; computation - counts the number of asymmetric links in the dataset.
2. profiling info: using StatAssist (a tool), identify the statistical distribution that best fits the plot, e.g., LogNormal, Gamma, Exponential, ...
3. Deadline/completion time uniformly distributed in the interval [T, 2T], where T is the completion time given all the cluster resources
4. disable speculation
Related work
1. FLEX (J. Wolf) -
pros: a speedup function that gives the job execution time as a function of the allocated slots
cons: how it handles different input dataset sizes is not clear
2. Flow shop model (B. Moseley)-
pros: formalizes scheduling as a generalized version of the classical two-stage flexible flow shop problem with identical machines;
minimizes the makespan of jobs, both offline and online
3. ParaTimer (K. Morton)-
pros: estimates the progress of parallel queries expressed as Pig scripts that translate into DAGs of MapReduce jobs
cons: assumes map/reduce tasks of the same job have the same duration