[Figure 9 bar charts: panels (a) 92 Users and (b) 92 News; y-axis 0 to 2.5; bars for Base, Shuffle, Front Load, Threading, Run Clustering, Compare, Compare w bad, Application, Application w bad, Dependency, and Dependency w bad.]
Figure 9: Learner evaluation, 92 traces. The evaluation results from the 92 traces. Note that none of the learners matches the best heuristic.
performs better. Because they both use non-optimal approaches, it is reasonable that one works better in
some cases and another in others.
However, the dependency learner does handle the addition of the bad heuristic better than the other
learners; this makes sense because the dependency learner is more aware of context. The bad heuristic is
bad in two ways: first, it adds constraints that perform poorly; second, it makes other constraints look bad,
causing the learner to discard constraints that could be useful. Because the dependency learner is careful
about context, it can avoid discarding useful constraints merely because they interact with constraints from
the bad heuristic.
It is also interesting that the bad heuristic sometimes actually improves performance for the dependency
learner. This may be because the learner is better at determining interdependencies, and can thus pick out
the occasional good suggestion among the bad. It is also possible that in some cases the spreading of data
that the bad heuristic performs is beneficial in concert with the other heuristics.
It is also interesting to note the difference between the Cello92 and Cello99 traces. The heuristics
perform fairly well on the Cello92 traces, giving reasonable gains. However, they perform poorly on the
Cello99 traces, usually degrading performance. This is probably because the Cello99 traces are much
larger, more intricate, and have a higher rate of change. The 99 Reference trace is particularly interesting,
because it is evident that the high rate of change in constraint worth shown in Figure 8 leads
[Figure 10 bar charts: panels (a) 99 Source, (b) 99 Reference, and (c) 99 Development; y-axis 0 to 2.5; same bar set as Figure 9.]
Figure 10: Learner evaluation, 99 traces. The results of the learner evaluations for the 99 traces. Note that none of the learners matches the best heuristic.
to poor performance of the learners.
Overall, the learners do not perform very well. They are never better than the best heuristic, although
they are sometimes close. Moreover, adding the bad heuristic usually degrades their performance
significantly.
9 Future Work
There are a large number of future directions one could take with this work. We have tried several ad hoc
approaches to the learner problem, but a variety of other methods could be applied, including genetic
algorithms and other statistical methods. Extending the dependency learner to re-sort the list after placing
each constraint would be one interesting option to explore. The fairly short lengths of conflicts suggest that
an extension to the greedy method of conflict resolution that looks forward a few steps before placing each
constraint may also be promising. In addition, approaching the problem from a block-centric viewpoint
may lead to alternative optimization methods.
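As a rough illustration of the lookahead idea, a greedy resolver could evaluate each candidate constraint by the best score reachable a few placements ahead before committing to it. The sketch below is not the system's actual implementation; `score`, `apply_constraint`, and the toy adjacency metric are invented stand-ins for the real performance analyzer and constraint model.

```python
def score(layout):
    """Hypothetical stand-in for the performance analyzer:
    here, simply reward adjacency of consecutive block ids."""
    return sum(1 for a, b in zip(layout, layout[1:]) if b == a + 1)

def apply_constraint(layout, constraint):
    """Hypothetical constraint model: a constraint is a tuple of
    blocks to append, skipping blocks already placed."""
    placed = set(layout)
    return layout + [b for b in constraint if b not in placed]

def lookahead_score(layout, remaining, depth):
    """Best score reachable within `depth` further placements."""
    if depth == 0 or not remaining:
        return score(layout)
    return max(
        lookahead_score(apply_constraint(layout, c),
                        remaining[:i] + remaining[i + 1:], depth - 1)
        for i, c in enumerate(remaining))

def greedy_lookahead(constraints, depth=2):
    """Greedy placement that looks `depth` steps ahead before
    committing each constraint, instead of scoring only the
    immediate placement."""
    layout, remaining = [], list(constraints)
    while remaining:
        best = None
        for i, c in enumerate(remaining):
            rest = remaining[:i] + remaining[i + 1:]
            s = lookahead_score(apply_constraint(layout, c),
                                rest, depth - 1)
            if best is None or s > best[0]:
                best = (s, i)
        layout = apply_constraint(layout, remaining.pop(best[1]))
    return layout
```

With depth 1 this degenerates to the plain greedy method; the short conflict lengths observed above suggest a small depth might capture most of the benefit while keeping the search tractable.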
As mentioned in the paper, exploration into the granularity of the optimization would also be interesting.
Doing reorganization at a file or other larger granularity may shrink the problem to a more manageable
size.
The heuristic list could also be expanded; the current list is far from exhaustive. It would be interesting
to see how the learner fares with more heuristics, including heuristics that use knowledge of on-disk data
structure, such as file system information. It could also be useful to further tune the existing heuristics to
improve their performance.
Additional exploration into the performance analyzer could also be useful. The current analyzer is very
computationally intensive, limiting the number of iterations that can be performed. A faster, less accurate
performance analyzer might yield better results due to its ability to examine more options. Some methods
of performance analysis may also allow efficient evaluation of individual constraint addition or removal,
allowing a more targeted approach to evaluating constraint value. Further study of the optimal disk size
would also be worthwhile.
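To illustrate the targeted-evaluation idea, a hypothetical analyzer interface might answer "what would adding or removing this one constraint change?" without a full re-evaluation. All names below, including `value_fn`, are invented for the sketch; a real analyzer would derive per-constraint contributions from trace replay rather than from a supplied function.

```python
class IncrementalAnalyzer:
    """Hypothetical analyzer exposing delta queries for single
    constraints, so the learner pays only for what it changes."""

    def __init__(self, constraints, value_fn):
        # value_fn(c, others) -> contribution of constraint c given
        # the rest of the active set (a stand-in for trace replay).
        self.constraints = set(constraints)
        self.value_fn = value_fn
        self.total = sum(value_fn(c, self.constraints - {c})
                         for c in self.constraints)

    def delta_remove(self, c):
        """Estimated score change from dropping constraint c."""
        return -self.value_fn(c, self.constraints - {c})

    def delta_add(self, c):
        """Estimated score change from adding constraint c."""
        return self.value_fn(c, self.constraints)
```

A learner with such an interface could rank candidate constraints by their marginal deltas instead of re-running the full analyzer per candidate, which is the main cost limiting iteration counts today.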
10 Conclusion
The concept of a two-tiered learning architecture provides the promise of a flexible and robust way of
combining optimization approaches. However, building a learning agent for such a system has proved
difficult. Unfortunately, constraints are not independent; significant dependency exists both in the value
of constraints and in conflicts between constraints. Our approaches have had some nominal success, but
better approaches will be necessary for a full solution. We have outlined a structure for a two-tiered learning
architecture for disk layout, analyzed the key challenges in building such a system, and suggested directions
for further research. This research also suggests a higher-level point: the problem of policy automation
is difficult in practice.
11 Acknowledgements
I would like to thank my advisor Greg Ganger. I would also like to thank the many people who have helped
me with ideas, discussions, and writing code. I especially thank Craig Soules, Eno Thereska, John Strunk,
James Hendricks and Shobha Venkataraman. I am funded by an NSF fellowship.