CMSC 441, Spring 1997
Project
Text Formatting by Dynamic Programming
Implement and experimentally evaluate a dynamic programming solution
to the "paragraphing problem" (Problem 16-2 on page 325)--the problem
of neatly formatting a paragraph of text with left- and right-justified
margins. Communicate your findings in writing in a thoughtful
technical report. This project will count for 12% of your
semester grade.
Although you are required to complete certain specific tasks, you
are also invited and encouraged to use this assignment as a starting
point for your own investigations into applications of dynamic programming
in text formatting.
Purpose
The purpose of this project is to give you some hands-on experience with
experimental analysis of algorithms. In addition, this project
will help you learn the important algorithm design strategy of
dynamic programming, and it will give you an opportunity to
practice communicating your thoughts in a technical report.
Paragraph Filling
An important task in text processing is filling a paragraph---that is,
neatly arranging the words in a paragraph so that the left and right
margins are flush. To fill a paragraph, one must decide where each
new line begins. The input to the paragraphing problem is a sequence
of words and a text width; the output is a setting of the paragraph
(your program should actually print the paragraph with the chosen line breaks). The order
of the words must remain the same, and no line may overflow the text
width.
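For reference, the objective of Problem 16-2 can be written as below. The notation is an assumption of this sketch: words have lengths l_1 through l_n, the text width is M, adjacent words on a line are separated by exactly one space, a line holding words i through j is allowed only if its extra space is nonnegative, and the last line costs nothing.

```latex
\[
  \mathrm{extras}(i,j) = M - (j - i) - \sum_{k=i}^{j} l_k
\]
\[
  \mathrm{lc}(i,j) =
  \begin{cases}
    \infty & \text{if } \mathrm{extras}(i,j) < 0,\\
    0 & \text{if } j = n \text{ and } \mathrm{extras}(i,j) \ge 0,\\
    \mathrm{extras}(i,j)^3 & \text{otherwise,}
  \end{cases}
\]
\[
  c(0) = 0, \qquad
  c(j) = \min_{1 \le i \le j} \bigl( c(i-1) + \mathrm{lc}(i,j) \bigr).
\]
```

The minimum total cost is c(n); remembering the minimizing i for each j recovers the line breaks.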
Two strategies are popular: greedy and dynamic programming. The
greedy approach, which is used, for example, by ClarisWorks (a popular
WYSIWYG text formatting program for the Mac), simply fills the current
line word-by-word until the line is full. The more sophisticated
dynamic programming approach, such as that used by TeX and LaTeX
(document preparation systems for mathematical text), attempts to find
a more beautiful formatting of the input paragraph by considering many
possible line breaks. Your task is to compare these two approaches.
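The greedy strategy described above can be sketched in C as follows. This is a minimal first-fit sketch of our own, not ClarisWorks' code. It assumes the words have already been reduced to an array of lengths and that adjacent words are separated by exactly one space; the function name `greedy_breaks` and its `brk` output convention are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Greedy (first-fit) line breaking for fixed-width text.
 * len[k] is the length of word k (0-indexed), n is the number of words,
 * width is the text width. On return, brk[k] is 1 if word k ends a line
 * and 0 otherwise. Returns the number of lines, or -1 if some word is
 * longer than width (brk is then only partially written). */
int greedy_breaks(const int *len, size_t n, int width, int *brk)
{
    assert(width > 0);
    int lines = 0;
    int used = -1;                  /* chars on current line; -1 = no line yet */
    for (size_t k = 0; k < n; k++) {
        if (len[k] > width)
            return -1;              /* no hyphenation: the word cannot fit */
        if (used >= 0 && used + 1 + len[k] <= width) {
            used += 1 + len[k];     /* append after one space */
        } else {
            if (used >= 0)
                brk[k - 1] = 1;     /* previous word ends the full line */
            used = len[k];          /* word k starts a new line */
            lines++;
        }
        brk[k] = 0;
    }
    if (n > 0)
        brk[n - 1] = 1;             /* the last word ends the last line */
    return lines;
}
```

Each word is examined once, so the sketch runs in O(n) time with O(1) space beyond the output array. For words of lengths 3, 2, 2, 5 and width 6, it fills the first line completely with the first two words and leaves the third word alone on the second line.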
For simplicity, limit your work to fixed-width fonts (each character
of the alphabet has the same width), and do not hyphenate any word.
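With fixed-width fonts a word's width is simply its character count, so a program can reduce the input text to an array of word lengths before breaking lines. A minimal C sketch follows; the name `word_lengths` is hypothetical, and treating any run of whitespace as a single word separator is our assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Split text into whitespace-separated words and record their lengths.
 * Stores at most max lengths in len; returns the total number of words
 * found, which may exceed max (then only the first max are stored). */
size_t word_lengths(const char *text, int *len, size_t max)
{
    assert(text != NULL);
    size_t n = 0;
    const char *p = text;
    while (*p != '\0') {
        while (*p != '\0' && isspace((unsigned char)*p))
            p++;                    /* skip whitespace between words */
        if (*p == '\0')
            break;
        const char *start = p;
        while (*p != '\0' && !isspace((unsigned char)*p))
            p++;                    /* scan one word */
        if (n < max)
            len[n] = (int)(p - start);
        n++;
    }
    return n;
}
```

Because no word is hyphenated, an input cannot be set at all if some word is longer than the text width, so a program should check for this before running either algorithm.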
Programming Tasks
Implement and experimentally evaluate two solutions to the
paragraphing problem: one solution based on the greedy approach, the
other based on dynamic programming. For each of these two strategies,
design and implement the fastest algorithm you can. Carefully ensure
that your program based on dynamic programming indeed correctly
computes the optimal line breaks (many students who try to solve
Problem 16-2 mistakenly give a greedy approach).
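A minimal bottom-up sketch of the dynamic programming recurrence for Problem 16-2 follows. It assumes the same array-of-lengths input, one space between words, a cubic cost on every line except the last, and a hypothetical function name `dp_breaks`. Unlike the greedy method, it tries every feasible first word for the line ending at each word, which is what makes it a genuine dynamic programming solution rather than a greedy one.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Dynamic-programming line breaking (cubic cost on all lines but the last).
 * len[0..n-1] are word lengths and width is the text width M.
 * c[j] = minimum cost of setting words 0..j-1;
 * p[j] = index of the first word on the last line of that setting.
 * On success, brk[k] = 1 iff word k ends a line, and the optimal cost is
 * returned. Returns -1 if some word is longer than width or malloc fails. */
long long dp_breaks(const int *len, size_t n, int width, int *brk)
{
    assert(width > 0);
    long long *c = malloc((n + 1) * sizeof *c);
    size_t *p = malloc((n + 1) * sizeof *p);
    if (c == NULL || p == NULL) {
        free(c);
        free(p);
        return -1;
    }
    c[0] = 0;
    for (size_t j = 1; j <= n; j++) {
        c[j] = LLONG_MAX;
        /* +1 offsets the space charged to the first word on the line */
        long long extras = (long long)width + 1;
        /* Try each candidate first word i of the line ending at word j-1,
         * moving left; once the line overflows, longer lines do too. */
        for (size_t i = j; i-- > 0; ) {
            extras -= len[i] + 1;
            if (extras < 0)
                break;
            long long lc = (j == n) ? 0 : extras * extras * extras;
            if (c[i] + lc < c[j]) {
                c[j] = c[i] + lc;
                p[j] = i;
            }
        }
        if (c[j] == LLONG_MAX) {    /* word j-1 alone is wider than width */
            free(c);
            free(p);
            return -1;
        }
    }
    /* Follow p[] back from the end to mark the last word of each line. */
    for (size_t k = 0; k < n; k++)
        brk[k] = 0;
    for (size_t j = n; j > 0; j = p[j])
        brk[j - 1] = 1;
    long long best = c[n];
    free(c);
    free(p);
    return best;
}
```

Because the inner loop stops as soon as a line overflows, it examines at most about M/2 candidate first words per line, for O(n min(n, M)) time and O(n) extra space. On word lengths 3, 2, 2, 5 with width 6, it returns cost 28 by breaking after the first word, whereas filling the first line greedily costs 64 under the same objective.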
Evaluate each algorithm experimentally in two
ways: measure the actual running time, and numerically score the
quality of results (as scored by an objective function). Be sure to
test each algorithm on a variety of input sizes and for several input
texts for each input size.
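To score the output of either algorithm with one common objective function, and to time it, a sketch like the following may help. It assumes a setting is stored as a 0/1 array `brk` in which `brk[k]` is 1 exactly when word `k` ends a line; `score_cubic` and `cpu_seconds` are hypothetical names. The resolution of the standard `clock()` function varies between systems, so timing many repetitions of a run and dividing is safer than timing a single run.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Cubic objective of Problem 16-2: the sum, over every line except the
 * last, of (extra spaces at the end of the line)^3. len[k] is the length
 * of word k; brk[k] is 1 iff word k ends a line.
 * Returns -1 if some line overflows the width. */
long long score_cubic(const int *len, size_t n, int width, const int *brk)
{
    assert(n == 0 || brk[n - 1] == 1);  /* last word must end the last line */
    long long total = 0;
    int used = -1;                      /* -1: no words on this line yet */
    for (size_t k = 0; k < n; k++) {
        used += 1 + len[k];             /* one space before each word but the first */
        if (brk[k]) {
            long long extras = width - used;
            if (extras < 0)
                return -1;              /* infeasible setting */
            if (k != n - 1)             /* the last line is free */
                total += extras * extras * extras;
            used = -1;
        }
    }
    return total;
}

/* Processor time used by the program so far, in seconds (via clock()). */
double cpu_seconds(void)
{
    return (double)clock() / CLOCKS_PER_SEC;
}
```

For example, record `cpu_seconds()` before and after a fixed number of calls to one line breaker on the same input, divide the difference by that number, and then score the resulting `brk` array with `score_cubic`.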
You are also invited, though not required, to
measure the actual space usage of each algorithm.
All implementations must be done in the C or C++ programming language.
Questions
In carrying out this project, you are required to address each of
the following questions:
- In terms of time, space, and quality of results, how do the
greedy and dynamic programming approaches to text formatting compare?
- Concretely show many sample runs (including the input, output,
running times and space usage). A large portion of your project grade
will depend upon your choice of sample runs. Your suite of sample runs
should be robust and should illustrate the performance of the
algorithms in a variety of situations. In particular, they should
include situations in which each algorithm produces relatively good
and relatively bad results.
- How is the performance of the dynamic programming solution
to Problem 16-2 affected by modifications to the objective function?
For example, what happens if the cubic objective function is changed
to a linear or quadratic function?
Can you suggest and test other interesting objective functions?
How well do these quantitative objective functions match with your
subjective evaluations of the beauty of the results?
- Why does ClarisWorks use a greedy approach to text formatting?
- Another problem in text formatting is page breaking--deciding
where each new page should begin. Discuss the advantages and
disadvantages of computing page breaks with a dynamic programming
strategy. Why do you think that TeX does not use a dynamic
programming approach to page breaking?
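One way to experiment with other objective functions is to pass the per-line penalty to the dynamic program as a function pointer. In this sketch the names `penalty_fn`, `linear`, `quadratic`, `cubic`, and `dp_breaks_with` are hypothetical; as in Problem 16-2, the last line is assumed to be free, and costs are kept as `double` so that non-polynomial penalties can also be tried.

```c
#include <assert.h>
#include <float.h>
#include <stdlib.h>

/* Per-line penalty as a function of the extra spaces at the end of a line. */
typedef double (*penalty_fn)(int extras);

static double linear(int e)    { return e; }
static double quadratic(int e) { return (double)e * e; }
static double cubic(int e)     { return (double)e * e * e; }

/* Line-breaking DP with a pluggable penalty; the last line is free.
 * brk[k] = 1 iff word k ends a line. Returns the optimal total penalty,
 * or -1.0 if some word is longer than width or malloc fails. */
double dp_breaks_with(const int *len, size_t n, int width,
                      penalty_fn pen, int *brk)
{
    assert(width > 0 && pen != NULL);
    double *c = malloc((n + 1) * sizeof *c);
    size_t *p = malloc((n + 1) * sizeof *p);
    if (c == NULL || p == NULL) {
        free(c);
        free(p);
        return -1.0;
    }
    c[0] = 0.0;
    for (size_t j = 1; j <= n; j++) {
        c[j] = DBL_MAX;
        int extras = width + 1;
        for (size_t i = j; i-- > 0; ) {   /* candidate first word of the line */
            extras -= len[i] + 1;
            if (extras < 0)
                break;                    /* longer lines overflow too */
            double lc = (j == n) ? 0.0 : pen(extras);
            if (c[i] + lc < c[j]) {       /* strict: keeps first minimum found */
                c[j] = c[i] + lc;
                p[j] = i;
            }
        }
        if (c[j] == DBL_MAX) {            /* a single word exceeds width */
            free(c);
            free(p);
            return -1.0;
        }
    }
    for (size_t k = 0; k < n; k++)
        brk[k] = 0;
    for (size_t j = n; j > 0; j = p[j])
        brk[j - 1] = 1;
    double best = c[n];
    free(c);
    free(p);
    return best;
}
```

On word lengths 3, 2, 2, 5 with width 6, the cubic and quadratic penalties both break after the first word (totals 28 and 10), while the linear penalty scores that setting and the greedy one equally at 4, so the tie-breaking rule, here keeping the first minimum examined, decides which setting is printed.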
In addition, you are expected to raise and answer at least one
additional question of your own choice relating to this project.
What to hand in
Each student must hand in a written project report in the format of a
computer science technical report. This report must explain what you
did, why you did it, how you did it, what you found, and what the
significance of your findings is. Do not simply list experimental
findings; be sure also to interpret them. In
addition, comment on the engineering aspects of your work: What
difficulties did you encounter, how did you resolve these
difficulties, and what were the consequences of your solutions to
these difficulties?
Whenever possible, summarize your important experimental findings
in appropriate graphs. As a separate appendix to your report, include
a well-documented copy of all source code. For more about technical
reports, see Alan Sherman's
Guide to writing technical reports.
Deadline
The project report is due at the beginning of class
on Thursday, May 8, 1997.
Grading
Your grade will be based on your written report. Quality, not
quantity, is important. Specifically, your report will be evaluated
on the basis of its scientific merit (correctness, significance,
novelty, nontriviality, scientific completeness, thorough testing),
effective presentation (clarity, organization, English usage), and
appropriateness to the assignment (following instructions).
Last Modified: 2/19/97
Alan T. Sherman, sherman@cs.umbc.edu