CMSC 441, Spring 1997

Project

Text Formatting by Dynamic Programming

Implement and experimentally evaluate a dynamic programming solution to the "paragraphing problem" (Problem 16-2 on page 325)--the problem of neatly formatting a paragraph of text with left- and right-justified margins. Communicate your findings in writing in a thoughtful technical report. This project will count for 12% of your semester grade.

Although you are required to complete certain specific tasks, you are also invited and encouraged to use this assignment as a starting point for your own investigations into applications of dynamic programming in text formatting.

Purpose

The purpose of this project is to give you some hands-on experience with experimental analysis of algorithms. In addition, this project will help you learn the important algorithm design strategy of dynamic programming, and it will give you an opportunity to practice communicating your thoughts in a technical report.

Paragraph Filling

An important task in text processing is filling a paragraph---that is, neatly arranging the words in a paragraph so that the left and right margins are flush. To fill a paragraph, one must decide where each new line begins. The input to the paragraphing problem is a sequence of words and a text width; the output is a setting of the paragraph (your program should actually print the paragraph with the chosen line breaks). The order of the words must remain the same, and no line may overflow the text width.
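To make the output format concrete, here is a minimal sketch of how a chosen set of line breaks might be rendered with flush margins. The function name render and the representation of a setting (starts[k] is the index of the first word on line k) are assumptions made for illustration, not part of the assignment. The sketch also assumes that the breaks are valid: starts begins at 0, is strictly increasing, and no line overflows the width.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Renders a setting: starts[k] is the index of the first word on line k.
// Every line but the last is padded between words so that it is exactly
// `width` characters wide; the last line (and any one-word line) is left ragged.
std::vector<std::string> render(const std::vector<std::string>& words,
                                const std::vector<std::size_t>& starts,
                                std::size_t width) {
    std::vector<std::string> lines;
    for (std::size_t k = 0; k < starts.size(); ++k) {
        const std::size_t first = starts[k];
        const std::size_t last =
            (k + 1 < starts.size()) ? starts[k + 1] : words.size();
        std::size_t letters = 0;  // characters in the words alone
        for (std::size_t j = first; j < last; ++j) letters += words[j].size();
        const std::size_t gaps = last - first - 1;  // spaces between words
        const bool justify = (k + 1 < starts.size()) && gaps > 0;
        const std::size_t blanks = justify ? width - letters : gaps;
        std::string line = words[first];
        for (std::size_t g = 0; g < gaps; ++g) {
            // Spread blanks evenly; the leftmost gaps absorb the remainder.
            const std::size_t pad = blanks / gaps + (g < blanks % gaps ? 1 : 0);
            line.append(pad, ' ');
            line += words[first + 1 + g];
        }
        lines.push_back(line);
    }
    return lines;
}
```

For the words aaa, bb, cc, ddddd at width 6 with lines starting at words 0, 1, and 3, the middle line comes out as "bb  cc", padded to exactly six characters.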

Two strategies are popular---greedy and dynamic programming. The greedy approach, which is used for example by ClarisWorks (a popular WYSIWYG text formatting program for the Mac), simply fills the current line word-by-word until the line is full. The more sophisticated dynamic programming approach, such as that used by TeX and LaTeX (document preparation systems for mathematical text), attempts to find a more beautiful formatting of the input paragraph by considering many possible line breaks. Your task is to compare these two approaches.

For simplicity, limit your work to fixed-width fonts (each character of the alphabet has the same width), and do not hyphenate any word.
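The greedy strategy described above can be sketched in a few lines. This is only one possible formulation, under the assumptions that words are separated by a single space when measuring a line, that every word fits within the text width on its own, and that a setting is reported as the index of the first word on each line. The name greedy_breaks is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Greedy filling: returns, for each line, the index of its first word.
// Assumes every word fits within `width` characters on its own.
std::vector<std::size_t> greedy_breaks(const std::vector<std::string>& words,
                                       std::size_t width) {
    std::vector<std::size_t> starts;
    std::size_t used = 0;  // characters used on the current line
    for (std::size_t i = 0; i < words.size(); ++i) {
        const std::size_t len = words[i].size();
        if (starts.empty() || used + 1 + len > width) {
            starts.push_back(i);  // word i does not fit: begin a new line
            used = len;
        } else {
            used += 1 + len;      // one separating space, then the word
        }
    }
    return starts;
}
```

For the words aaa, bb, cc, ddddd at width 6, this sketch starts lines at words 0, 2, and 3, giving the lines "aaa bb", "cc", and "ddddd". It makes a single pass over the input, so its running time is linear in the number of words.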

Programming Tasks

Implement and experimentally evaluate two solutions to the paragraphing problem: one solution based on the greedy approach, the other based on dynamic programming. For each of these two strategies, design and implement the fastest algorithm you can. Carefully ensure that your program based on dynamic programming does indeed compute the optimal line breaks (many students who try to solve Problem 16-2 mistakenly give a greedy approach).
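As a point of reference for checking optimality, the following is a sketch of one dynamic programming formulation, not necessarily the fastest one. It assumes the objective is the sum, over every line except the last, of the cube of the unused characters at the end of the line, with words separated by a single space. The names Layout and optimal_breaks are hypothetical.

```cpp
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

struct Layout {
    long long cost;                   // total penalty of the best setting
    std::vector<std::size_t> starts;  // index of the first word on each line
};

// best[i] = minimum penalty for setting words i..n-1 when word i starts a line.
// Each word is tried as a line start, and the inner loop stops once the line
// overflows, so the running time is O(n * width).
// If some word is longer than `width`, cost is LLONG_MAX and starts is empty.
Layout optimal_breaks(const std::vector<std::string>& words, std::size_t width) {
    const std::size_t n = words.size();
    const long long INF = std::numeric_limits<long long>::max();
    std::vector<long long> best(n + 1, INF);
    std::vector<std::size_t> next(n + 1, n);  // next[i] = first word of the following line
    best[n] = 0;
    for (std::size_t i = n; i-- > 0;) {
        std::size_t used = 0;
        for (std::size_t j = i; j < n; ++j) {  // candidate line holds words i..j
            used += words[j].size() + (j > i ? 1 : 0);
            if (used > width) break;
            if (best[j + 1] == INF) continue;  // the rest cannot be set
            const long long slack = static_cast<long long>(width - used);
            // The last line is free; other lines pay slack cubed.
            const long long penalty = (j + 1 == n) ? 0 : slack * slack * slack;
            if (penalty + best[j + 1] < best[i]) {
                best[i] = penalty + best[j + 1];
                next[i] = j + 1;
            }
        }
    }
    Layout out{best[0], {}};
    if (best[0] == INF) return out;
    for (std::size_t i = 0; i < n; i = next[i]) out.starts.push_back(i);
    return out;
}
```

For the words aaa, bb, cc, ddddd at width 6, this sketch chooses the lines "aaa", "bb cc", and "ddddd" at a cost of 27 + 1 = 28. Filling the first line as far as possible would instead give "aaa bb", "cc", and "ddddd" at a cost of 0 + 64 = 64, so the two strategies really do differ on this input.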

For each algorithm, experimentally evaluate the algorithm in two ways: measure the actual running time, and numerically score the quality of results (as scored by an objective function). Be sure to test each algorithm on a variety of input sizes and for several input texts for each input size.
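One way to measure running time is sketched below. It assumes a compiler with the C++11 <chrono> header; older toolchains would need clock() from <ctime> or a platform timer instead. The helper name mean_microseconds is hypothetical. Averaging over many repetitions helps when a single call is too fast for the clock's resolution.

```cpp
#include <chrono>

// Returns the average wall-clock time, in microseconds, of one call to f().
// `repetitions` must be positive.
template <typename F>
double mean_microseconds(F&& f, int repetitions) {
    using clock = std::chrono::steady_clock;  // monotonic, unaffected by clock changes
    const auto begin = clock::now();
    for (int r = 0; r < repetitions; ++r) f();
    const auto end = clock::now();
    const std::chrono::duration<double, std::micro> elapsed = end - begin;
    return elapsed.count() / repetitions;
}
```

When timing, make sure the result of each call is actually used (for example, accumulate it into a variable that is printed afterwards); otherwise an optimizing compiler may remove the work being measured.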

Although you are not required to do so, you are also invited to measure the actual space usage of each algorithm.

All implementations must be done in the C or C++ programming language.

Questions

In carrying out this project, you are required to address each of the following questions:
  1. In terms of time, space, and quality of results, how do the greedy and dynamic programming approaches to text formatting compare?

  2. Concretely show many sample runs (including the input, output, running times and space usage). A large portion of your project grade will depend upon your choice of sample runs. Your suite of sample runs should be robust and should illustrate the performance of the algorithms in a variety of situations. In particular, they should include situations where each algorithm produces relatively good and relatively bad performances.

  3. How is the performance of the dynamic programming solution to Problem 16-2 affected by modifications to the objective function? For example, what happens if the cubic objective function is changed to a linear or quadratic function? Can you suggest and test other interesting objective functions? How well do these quantitative objective functions match with your subjective evaluations of the beauty of the results?

  4. Why does ClarisWorks use a greedy approach to text formatting?

  5. Another problem in text formatting is page breaking--deciding where each new page should begin. Discuss the advantages and disadvantages of computing page breaks with a dynamic programming strategy. Why do you think that TeX does not use a dynamic programming approach to page breaking?

In addition, you are expected to raise and answer at least one additional question of your own choice relating to this project.
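For question 3, it may help to score any setting under several objective functions with one routine. The sketch below assumes the objective has the form "sum, over every line except the last, of the unused characters raised to a chosen power", so that exponent 1, 2, or 3 gives the linear, quadratic, or cubic variant. The name score and the representation of a setting (index of the first word on each line) are assumptions for illustration, and the setting is assumed not to overflow the width.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Sum, over all lines except the last, of (unused characters)^exponent.
// starts[k] is the index of the first word on line k.
long long score(const std::vector<std::string>& words,
                const std::vector<std::size_t>& starts,
                std::size_t width, int exponent) {
    long long total = 0;
    for (std::size_t k = 0; k + 1 < starts.size(); ++k) {  // skip the last line
        std::size_t used = 0;
        for (std::size_t j = starts[k]; j < starts[k + 1]; ++j)
            used += words[j].size() + (j > starts[k] ? 1 : 0);
        const long long slack =
            static_cast<long long>(width) - static_cast<long long>(used);
        long long term = 1;
        for (int e = 0; e < exponent; ++e) term *= slack;
        total += term;
    }
    return total;
}
```

For the words aaa, bb, cc, ddddd at width 6, the setting with lines starting at words 0, 2, 3 scores 4, 16, and 64 under the linear, quadratic, and cubic objectives, while the setting starting at words 0, 1, 3 scores 4, 10, and 28. The two settings tie under the linear objective and are separated only by the higher powers, which is the kind of difference this question asks you to investigate.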

What to hand in

Each student must hand in a written project report in the format of a computer science technical report. This report must explain what you did, why you did it, how you did it, what you found, and what is the significance of your findings. Do not simply list experimental findings; be sure also to interpret your findings. In addition, comment on the engineering aspects of your work: What difficulties did you encounter, how did you resolve these difficulties, and what were the consequences of your solutions to these difficulties?

Whenever possible, summarize your important experimental findings in appropriate graphs. As a separate appendix to your report, include a well-documented copy of all source code. For more about technical reports, see Alan Sherman's Guide to writing technical reports.

Deadline

The project report is due at the beginning of class on Thursday, May 8, 1997.

Grading

Your grade will be based on your written report. Quality, not quantity, is important. Specifically, your report will be evaluated on the basis of its scientific merit (correctness, significance, novelty, nontriviality, scientific completeness, thorough testing), effective presentation (clarity, organization, English usage), and appropriateness to the assignment (following instructions).

Last Modified: 2/19/97

Alan T. Sherman, sherman@cs.umbc.edu