CMSC-341, Spring 1999, Sections 101 and 201

Project 3


Assigned   8 March 1999            Due   2 April 1999


Background

Section 7.6 of the Heileman text describes an application that produces a rough index for a text document. In this project, you will build that application using the list classes from the last project. In addition, you will make some enhancements to the indexing application:


Description

Start with the classes Bnode, BinaryTree, and BST from the textbook web page and make the following modifications:

Create files WordRecord.H and WordRecord.C for the WordRecord class shown on page 211. Adapt the WordRecord class as follows:

In a file called Proj3_aux.C, write the Index() and ReadWord() functions. Modify the Index() function so that it has four parameters as follows:

When writing the functions in Proj3_aux.C, you may use the code on pages 212-213 as a guideline, but you do not have to duplicate the textbook's approach. Make improvements and corrections as needed.
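One possible starting point for ReadWord() is sketched below. It is not the textbook's version. The signature `bool ReadWord(std::istream&, std::string&)` and the helper `IsWordChar()` are hypothetical. The sketch makes three assumptions: letters and digits start a word; hyphens and apostrophes may appear inside one, so "object-oriented" stays whole; and any other punctuation, such as the comma in "structures,", is dropped. The page-break marker PgBk comes back as an ordinary word, so Index() would compare each word against "PgBk" and advance its page counter on a match.

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: true for characters allowed inside a word.
// Hyphens and apostrophes are kept so "object-oriented" is one word.
static bool IsWordChar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '-' || c == '\'';
}

// Sketch of ReadWord() (assumed signature): stores the next word from `in`
// in `word` and returns true, or returns false at end of input.
bool ReadWord(std::istream& in, std::string& word) {
    word.clear();
    char c;
    // Skip spaces and punctuation until a letter or digit starts a word.
    while (in.get(c)) {
        if (std::isalnum(static_cast<unsigned char>(c))) {
            word += c;
            break;
        }
    }
    if (word.empty()) return false;  // reached end of input
    // Collect the rest of the word; the first non-word character ends it.
    while (in.get(c) && IsWordChar(c)) {
        word += c;
    }
    // Trim trailing hyphens/apostrophes (e.g. a dash ending a clause).
    while (!word.empty() && (word.back() == '-' || word.back() == '\'')) {
        word.pop_back();
    }
    return true;
}
```

Reading character by character makes it easy to decide which punctuation belongs to a word. A plain `in >> word` would leave "structures," and "programming." with their trailing punctuation attached.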

In a file called Proj3.C, write a short main() that gets three file names and an integer from the command line, specified in the following order:

The command-line arguments should be processed with appropriate error handling, including informative error messages if the wrong number and/or type of values is supplied. Next, attempt to open the three files, again with appropriate error handling. If all three files open successfully, call the Index() function.
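The command-line checks described above might look like the following sketch. The names `Args`, `ParseArgs()`, and `OpenFiles()` are hypothetical. The argument order (input file, skip file, output file, threshold) comes from the sample command line under Sample Output. The sketch uses `std::strtol` so that a threshold such as `5x` is rejected rather than silently read as 5. Rejecting negative thresholds is an assumption.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>

// Hypothetical holder for the four command-line values.
struct Args {
    std::string inputName, skipName, outputName;
    int threshold = 0;
};

// Parses "Proj3 input skip output threshold". On a wrong argument count or a
// malformed threshold, writes an informative message to `err` and returns false.
bool ParseArgs(int argc, char* argv[], Args& args, std::ostream& err) {
    if (argc != 5) {
        err << "Usage: " << (argc > 0 ? argv[0] : "Proj3")
            << " inputFile skipFile outputFile threshold\n";
        return false;
    }
    char* end = nullptr;
    errno = 0;
    long value = std::strtol(argv[4], &end, 10);
    // Reject empty input, trailing junk ("5x"), overflow, and (by assumption) negatives.
    if (end == argv[4] || *end != '\0' || errno == ERANGE ||
        value < 0 || value > INT_MAX) {
        err << "Error: threshold must be a non-negative integer, got \""
            << argv[4] << "\"\n";
        return false;
    }
    args.inputName = argv[1];
    args.skipName = argv[2];
    args.outputName = argv[3];
    args.threshold = static_cast<int>(value);
    return true;
}

// Opens the three files in order and names the first one that fails.
bool OpenFiles(const Args& args, std::ifstream& input, std::ifstream& skip,
               std::ofstream& output, std::ostream& err) {
    input.open(args.inputName);
    if (!input) {
        err << "Error: cannot open input file " << args.inputName << "\n";
        return false;
    }
    skip.open(args.skipName);
    if (!skip) {
        err << "Error: cannot open skip file " << args.skipName << "\n";
        return false;
    }
    output.open(args.outputName);
    if (!output) {
        err << "Error: cannot open output file " << args.outputName << "\n";
        return false;
    }
    return true;
}
```

A main() built on this sketch would call ParseArgs() and then OpenFiles(), returning a nonzero status if either one fails. If both succeed, it would pass the three streams and the threshold to Index().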

The format of the output file should be one word per line, with each word followed by its list of page numbers on the same line. Words and page numbers should be separated by one or more spaces. See Sample Output below.
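To illustrate this output format, the sketch below stores word entries in a simple binary search tree and writes them with an in-order traversal, which visits the words in sorted order. `WordEntry`, `Node`, `Insert()`, and `WriteIndex()` are hypothetical stand-ins for the WordRecord and BST classes, and std::vector replaces ArrayList. Case-insensitive ordering is an assumption inferred from the sample output, where "ADT" falls between "abstract" and "also". Each page is recorded once per word, as in "data 1 2 3".

```cpp
#include <algorithm>
#include <cctype>
#include <memory>
#include <ostream>
#include <string>
#include <vector>

// Hypothetical stand-in for WordRecord: the word, its total number of
// occurrences, and the distinct pages it appears on.
struct WordEntry {
    std::string word;
    int count = 0;
    std::vector<int> pages;
};

struct Node {
    WordEntry entry;
    std::unique_ptr<Node> left, right;
};

// Case-insensitive "a < b" (assumed ordering, inferred from the sample).
bool LessNoCase(const std::string& a, const std::string& b) {
    return std::lexicographical_compare(
        a.begin(), a.end(), b.begin(), b.end(), [](char x, char y) {
            return std::tolower(static_cast<unsigned char>(x)) <
                   std::tolower(static_cast<unsigned char>(y));
        });
}

// Records one occurrence of `word` on `page`, adding a leaf if it is new.
void Insert(std::unique_ptr<Node>& node, const std::string& word, int page) {
    if (node && LessNoCase(word, node->entry.word)) {
        Insert(node->left, word, page);
    } else if (node && LessNoCase(node->entry.word, word)) {
        Insert(node->right, word, page);
    } else {
        if (!node) {  // first occurrence: create the node
            node = std::make_unique<Node>();
            node->entry.word = word;
        }
        WordEntry& e = node->entry;
        ++e.count;
        // Pages arrive in increasing order, so checking the last one suffices.
        if (e.pages.empty() || e.pages.back() != page) e.pages.push_back(page);
    }
}

// Writes one line: the word, then its page numbers separated by spaces.
void WriteEntry(std::ostream& out, const WordEntry& e) {
    out << e.word;
    for (int p : e.pages) out << ' ' << p;
    out << '\n';
}

// In-order traversal: left subtree, this node, right subtree.
void WriteIndex(std::ostream& out, const Node* node) {
    if (!node) return;
    WriteIndex(out, node->left.get());
    WriteEntry(out, node->entry);
    WriteIndex(out, node->right.get());
}
```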


Summary of Tasks

Here are your tasks:
  1. Make sure your ArrayList class works and that it has an << operator defined.
  2. Revise and test the Bnode, BinaryTree, and BST classes. Don't forget to add documentation.
  3. Adapt the WordRecord class shown on page 211.
  4. Write the Index() and other auxiliary functions in Proj3_aux.C.
  5. In Proj3.C, write a main() that processes the command line arguments and calls Index().
  6. Test, test, test! Be sure that you have successfully implemented all of the enhancements described in the Background section above.
  7. The name of your executable should be Proj3.
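Task 1 asks for an << operator on ArrayList. A minimal sketch follows. The `ArrayList` shown here is a hypothetical stand-in backed by std::vector. Its `Append()`, `Size()`, and `operator[]` members are assumptions about an interface that your class from the last project may not share.

```cpp
#include <cstddef>
#include <ostream>
#include <vector>

// Hypothetical minimal ArrayList; only the members the operator needs are shown.
template <typename T>
class ArrayList {
public:
    void Append(const T& item) { items_.push_back(item); }
    std::size_t Size() const { return items_.size(); }
    const T& operator[](std::size_t i) const { return items_[i]; }

private:
    std::vector<T> items_;
};

// Prints the elements separated by single spaces with no trailing space,
// so a list of page numbers prints as "1 2 3".
template <typename T>
std::ostream& operator<<(std::ostream& out, const ArrayList<T>& list) {
    for (std::size_t i = 0; i < list.Size(); ++i) {
        if (i > 0) out << ' ';
        out << list[i];
    }
    return out;
}
```

Because the operator returns the stream, calls can be chained. For example, `out << word << ' ' << pages << '\n';` writes one complete line of the index.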


Grading

Project grading is described on the Project Policy page.


Academic Integrity

Please re-read the Project Policy page for details on honesty in doing projects for this course.


Sample Output

Here is a very simple example to show the expected format of the output. You are responsible for developing your own test cases and input files. All projects will be tested with the same data files, but the files will not be disclosed in advance.

Command line: Proj3 input.txt skip.txt output.txt 5

Contents of input.txt file:

The concept of abstract data types pervades much of the theory of data structures, and also forms the central concept of the class in object-oriented programming.
PgBk
The binary search tree data structure is a binary tree in which the vertices must obey a specific order.
PgBk
The hash table is a data structure that is used to implement the Dictionary ADT.

Contents of skip.txt file:

a
and
is
to

Contents of output.txt file:

abstract 1
ADT 3
also 1
binary 2
central 1
class 1
concept 1
data 1 2 3
Dictionary 3
forms 1
hash 3
implement 3
in 1 2
much 1
must 2
obey 2
object-oriented 1
of 1
order 2
pervades 1
programming 1
search 2
specific 2
structure 2 3
structures 1
table 3
that 3
theory 1
tree 2
types 1
used 3
vertices 2
which 2

Notice that the word "the" does not appear in the output file because its number of occurrences exceeds the threshold value specified on the command line.
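The sample implies two separate filters. Words listed in the skip file never appear in the output, so they can be discarded as they are read. The threshold depends on a word's total count, so it can only be applied once the whole input has been indexed. The sketch below shows both checks. `LoadSkipWords()`, `IsSkipWord()`, and `ExceedsThreshold()` are hypothetical names. Comparing skip words case-insensitively is an assumption that the sample does not settle.

```cpp
#include <cctype>
#include <istream>
#include <set>
#include <string>

// Lower-cases a copy of `s` (assumption: skip words ignore case).
std::string ToLower(std::string s) {
    for (char& c : s) {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return s;
}

// Reads whitespace-separated skip words (e.g. "a and is to") into a set.
std::set<std::string> LoadSkipWords(std::istream& in) {
    std::set<std::string> skip;
    std::string w;
    while (in >> w) skip.insert(ToLower(w));
    return skip;
}

// True if `word` should be discarded as soon as it is read.
bool IsSkipWord(const std::string& word, const std::set<std::string>& skip) {
    return skip.count(ToLower(word)) > 0;
}

// "Exceeds" means strictly greater: a word occurring exactly `threshold`
// times is still listed; "the" (7 occurrences, threshold 5) is not.
bool ExceedsThreshold(int count, int threshold) {
    return count > threshold;
}
```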
Last modified on Friday March 12, 1999 (04:52:15 EST) by Alan Baumgarten
email: abaumg1@cs.umbc.edu
