- Analyzing strings
- Comparing Grading Schemes
- Classifying Students, Revisited
- Algorithms, Containers, and Iterators
- Details
Comparing Grading Schemes
Imagine that we have a grading scheme that bases students' final grades in part on their median homework scores. Devious students can exploit this scheme by deliberately not turning in all their homework assignments. After all, the bottom half of their homework grades has no effect on their final grade. If they've done enough homework to ensure a good grade, why not stop doing homework altogether?
In our experience, most students do not exploit this particular loophole. However, we did have an occasion to teach one class that gleefully and openly did so. We wondered whether the students who skipped homework had, on average, different final grades than those who did all the homework. While we were thinking about how to answer that question, we decided that it might be interesting to see what the answer would be if we used one of two alternative grading schemes:
Using the average instead of the median, and treating those assignments that the student failed to turn in as 0
Using the median of only the assignments that the student actually submitted
For each of these grading schemes, we wanted to compare the median grade of the students who turned in all their homework with the median grade of the students who missed one or more assignments. We wound up with a program that had to solve two distinct subproblems:
Read all the student records, separating the students who did all the homework from the others
Apply each of the grading schemes to all the students in each group, and report the median grade of each group
Working with Student Records
Our first subproblem is to read and classify the student records. Fortunately, we already have some code that we can use in solving this part of the problem: We can use the Student_info type and the associated read function (from previous book chapters) to read the student data records. What we don't have yet is a function that checks whether a student has done all the homework. Writing such a function is easy:
bool did_all_hw(const Student_info& s) { return ((find(s.homework.begin(), s.homework.end(), 0)) == s.homework.end()); }
This function looks in s.homework to see whether any of the values stored there are 0. Because we give at least partial credit for any assignment that is turned in, a 0 grade means that the assignment was not submitted. We compare the return from find with homework.end(). As usual, find returns its second argument if it fails to find the value that it seeks.
With these two functions, writing code to read and separate the student records is simplicity itself. We'll read each student record, check whether the student did all the homework, and append the record to one of two vectors, which, for want of a better idea, we'll name did and didnt. While we're at it, we'll check that neither vector is empty so that we'll know that our analysis will actually tell us something useful:
vector<Student_info> did, didnt; Student_info student; // read all the records, separating them based on whether all homework was done while (read(cin, student)) { if (did_all_hw(student)) did.push_back(student); else didnt.push_back(student); } // check that both groups contain data if (did.empty()) { cout << "No student did all the homework!" << endl; return 1; } if (didnt.empty()) { cout << "Every student did all the homework!" << endl; return 1; }
The only new idea here is the emptymember function, which yields true if the container is empty and false otherwise. It is a better idea to use this function to check for an empty container than it is to compare the size with 0 because, for some kinds of containers, it might be more efficient to check whether the container has any elements than to figure out exactly how many elements there are.
Analyzing the Grades
We now know how to read and classify student records into the did and didnt vectors. The next step is to analyze them, which means that we need to think a little about how to structure the analysis.
We know that we have three analyses to perform, and each analysis has two parts that analyze separately the students who did and who didn't do all the homework. Because we will do each analysis on two sets of data, we certainly want to make each analysis its own function. However, there are some operations, such as reporting in a common format, that we are going to want to do on pairs of analyses rather than on individual analyses. Evidently, we'll want to make writing the results of each pair of analyses into a function as well.
The tricky part is that we want to call the function that writes the analysis results three times, once for each kind of analysis. We want that function to call the appropriate analysis function twice, once each for the did and didnt objects. However, we want the function that generates the reports to call a different analysis function each time we call it. How do we arrange that?
The easiest solution is to define three analysis functions and pass each one as an argument to the reporting function. In this case, we want our output routine to take five arguments:
The stream on which to write the output
A string that represents the name of the analysis
The function to use for the analysis
Two arguments, each of which is one of the vectors that we want to analyze
For example, let's assume that the first analysis, which looks at the medians, is done by a function called median_analysis. Then we'd like to report the results for each group of students by executing:
write_analysis(cout, "median", median_analysis, did, didnt);
Before we define write_analysis, let's define median_analysis. We would like to give that function a vector of student records, and we would like it to compute the students' grades according to the normal grading scheme and to return the median of those grades. We can define that function as follows:
// this function doesn't quite work double median_analysis (const vector<Student_info>& students) { vector<double> grades; transform(students.begin(), students.end(), back_inserter(grades), grade); return median(grades); }
Although this function might appear difficult at first glance, it introduces only one new ideanamely, the transform function. This function takes three iterators and a function. The first two iterators specify a range of elements to transform; the third iterator is the destination into which to put the result of running the function.
When we call transform, we are responsible for ensuring that the destination has room for the values from the input sequence. In this case, there is no problem because we obtain the destination by calling back_inserter, thereby arranging that transform's results will be appended to grades, which will automatically grow as necessary to accommodate the results.
The fourth argument to transform is a function that transform applies to each element of the input sequence to obtain the corresponding element of the output sequence. Therefore, when we call transform in this example, the effect is to apply the grade function to each element of students and to append each grade to the vector named grades. When we have all these students' grades, we call median to compute their median.
There's only one problem: As the comment notes, this function doesn't quite work. One reason that it doesn't work is that there are several overloaded versions of the grade function. The compiler doesn't know which version to call because we haven't given grade any arguments. We need a way to tell the compiler to do so.
The other reason is that the grade function will throw an exception if any student did no homework at all, and the transform function does nothing about exceptions. If an exception occurs, the transform function will be stopped at the point of the exception, and control will return to median_analysis. Because median_analysis doesn't handle the exception, either, the exception will continue to propagate outward. The effect will be that this function will also exit prematurely, passing control to its caller, and so on, until control reaches an appropriate catch. If there is no such catch, as would be likely in this case, the program itself is terminated and the message that was thrown is printed (or not, depending on the implementation).
We can solve both problems by writing an auxiliary function that will try the grade function and handle the exception. Because we are calling the grade function explicitly rather than passing it as an argument, the compiler will be able to figure out which version we mean:
double grade_aux(const Student_info& s) { try { return grade(s); catch (domain_error) { return grade(s.midterm, s.final, 0); } }
This function will call the correct version of grade. If an exception occurs, we will catch it and call the version of grade that takes three doubles that represent the exam scores and overall homework grade. Thus, we'll assume that students who did no homework at all got a 0 grade on their homework, but their exams still count.
Now we can rewrite the analysis function to use grade_aux:
// this version works fine double median_analysis (const vector<Student_info>& students) { vector<double> grades; transform(students.begin(), students.end(), back_inserter(grades), grade_aux); return median(grades); }
Having seen what an analysis routine looks like, we are now in a position to define write_analysis, which uses an analysis routine to compare two sets of students:
void write_analysis(ostream& out, const string& name, double analysis(const vector<Student_info>&), const vector<Student_info>& did, const vector<Student_info>& didnt) { out << name << ": median(did) = " << analysis(did) << ", median(didnt) = " << analysis(didnt) << endl; }
Again, this function is surprisingly small, although it does introduce two new ideas. The first is how to define a parameter that represents a function. The parameter definition for analysis looks just like the function declaration that we wrote earlier in the book. (Actually, as we shall learn later, there is slightly more going on here than meets the eye. The additional detail doesn't affect the current discussion directly.) The other new idea is the return type, void. The built-in type void can be used only in a few restricted ways, one of which is to name a return type. When we say that a function "returns" a void, we're really saying that it has no return value. We can exit from such a function by executing a return statement with no value, such as
return;
or, as we do here, by falling off the end of the function. Ordinarily, we cannot just fall off the end of a function, but the language allows functions that return void to do so.
At this point, we can write the rest of our program:
int main() { // students who did and didn't do all their homework vector<Student_info> did, didnt; // read the student records and partition them Student_info student; while (read(cin, student)) { if (did_all_hw(student)) did.push_back(student); else didnt.push_back(student); } // verify that the analyses will show us something if (did.empty()) { cout << "No student did all the homework!" << endl; return 1; } if (didnt.empty()) { cout << "Every student did all the homework!" << endl; return 1; } // do the analyses write_analysis (cout, "median", median_analysis, did, didnt); write_analysis(cout, "average", average_analysis, did, didnt); write_analysis(cout, "median of homework turned in", optimistic_median_analysis, did, didnt); return 0; }
All that remains is to write average_analysis and optimistic_median_analysis.
Grading Based on Average Homework Grade
We would like the average_analysis function to compute the students' grades by using the average homework grade rather than the median. Therefore, the logical first step is to write a function to compute the average of a vector, with the aim of using it instead of median for grade computation:
double average(const vector<double>& v) { return accumulate(v.begin(), v.end(), 0.0) / v.size(); }
This function uses accumulate, which, unlike the other library algorithms we've used, is declared in <numeric>. As this header's name implies, it offers tools for numeric computation. The accumulate function adds the values in the range denoted by its first two arguments, starting the summation with the value given by its third argument.
The type of the sum is the type of the third argument, so it is crucially important for us to use 0.0, as we did here, instead of 0. Otherwise, the result would be an int, and any fractional part would be lost.
Having used accumulate to generate the sum of all the elements in the range, we divide that sum by v.size(), which is the number of elements in the range. The result of that division, of course, is the average, which we return to our caller.
Once we have the average function, we can use it to implement the average_grade function to reflect this alternative grading policy:
double average_grade(const Student_info& s) { return grade(s.midterm, s.final, average(s.homework)); }
This function uses the average function to compute an overall homework grade, which it then gives to the grade function to use in computing the final grade.
With this infrastructure in place, the average_analysis function is simplicity itself:
double average_analysis(const vector<Student_info>& students) { vector<double> grades; transform(students.begin(), students.end(), back_inserter(grades), average_grade); return median(grades); }
The only difference between this function and median_analysis is its name and its use of average_grade instead of grade_aux.
Median of the Completed Homework
The last analysis scheme, optimistic_median_analysis, gets its name from the optimistic assumption that the students' grades on the homework that they didn't turn in would have been the same as those on the homework that they did turn in. With that assumption, we would like to compute the median of just the homework that each student submitted. We'll call this computation an optimistic median, and we'll begin by writing a function to compute it. Of course, we have to contend with the possibility that a student did no homework at all, in which case we'll use 0 as the overall homework grade:
// median of the nonzero elements of s.homework, or 0 if no such elements exist double optimistic_median (const Student_info& s) { vector<double> nonzero; remove_copy(s.homework.begin(), s.homework.end(), back_inserter(nonzero), 0); if (nonzero.empty()) return grade(s.midterm, s.final, 0); else return grade(s.midterm, s.final, median(nonzero)); }
This function works by extracting the nonzero elements from the homework vector and putting them into a new vector, called nonzero. Once we have the nonzero homework grades, we call the correct version of grade to compute the final score based on the median of the homework assignments that were actually submitted.
The only new idea in this function is how we get values into nonzero, which we do by calling the remove_copy algorithm. To understand the call to remove_copy, you may find it useful to know that the library provides "copying" versions of many of the algorithms. So, for example, remove_copy does what remove does but copies its results to an indicated destination.
The remove function finds all values that match a given value and "removes" those values from the container. All the values in the input sequence that are not "removed" will be copied into the destination. We'll have more to say shortly about what "remove" means in this context.
The remove_copy function takes three iterators and a value. As with most algorithms, the first two iterators denote the input sequence. The third denotes the beginning of the destination for the copy. As with copy, the remove_copy algorithm assumes that there is enough space in the destination to hold all the elements that are copied. We call back_inserter to grow nonzero as needed.
We should now be able to see that the effect of the remove_copy call is to copy into nonzero all the nonzero elements in s.homework. We then check whether nonzero is empty, and, if not, we do the normal grade calculation based on the median of the nonzero grades. If nonzero is empty, then we use 0 as the homework grade.
Of course, to complete our analysis, we need to write an analysis function to call our optimistic_median function. We leave doing so as an exercise.