Real World Q&A
Accountability
The original schools were in Milwaukee—high poverty, high minority, high second language. Since then, the 90-90-90 techniques have been replicated all over the country with multiple language and ethnic groups. The school in California that had the highest achievement gains for two consecutive years, Mead Valley Elementary, was 100% poverty and 99% Latino and second language, and used 90-90-90 techniques. It's also been replicated in deep south rural schools, Native American schools, and many other schools with diverse ethnic and linguistic backgrounds.
What percentage of the population in the studies were ELL and special education learners?
We have worked with 100% ELL schools and with schools that have full inclusion. In some cases, more than half the students with IEPs, including students with autism, ED, LD, and developmental delays, met or exceeded state standards, with the only adaptation on the state test being time.
Are there effective resources/best practices in the area of writing to deal with a significantly large ELL population?
See the new books Reason to Write and the Reason to Write Student Handbook for more elaboration here. In general, the keys for ELL students include multiple opportunities for success, multiple ways of representing ideas (writing, oral, webs, pictures), and focused feedback. By "focused feedback" I mean don't try to do grammar, organization, word choice, simile, metaphor, irony, and spelling all at the same time. With my ELL students, I would start focusing on JUST organization - beginning, middle, and end. If we had that, it was worthy of celebration. Then I might add just ONE convention, such as capitalization or ending punctuation. Incremental steps, regular feedback, and opportunities for IMMEDIATE correction and success.
Assessment
There is an abundant body of research on the value of literacy and writing in particular. Frankly, it’s just common sense that students who do more nonfiction writing, along with editing and rewriting, will improve thinking and reasoning skills, and that will improve their abilities in science, social studies, mathematics, and everything else that they do in life. But if people need to see published resources, I would recommend the following:
- Reeves, D.B., “Standards are not enough: Essential transformations for successful schools,” National Association of Secondary School Principals Bulletin, December 2000.
- Darling-Hammond, L., The Right to Learn, Jossey Bass, 1997
- Reeves, D.B., The Leader’s Guide to Standards, Jossey Bass, 2002.
- Calkins, L., The Art of Teaching Writing, Heinemann.
Here are some ideas to compromise with teachers on the issue of maintaining their creativity while simultaneously maintaining a commitment to excellence:
- Allow teachers to choose the prompt so that the writing assignments fit into their subject. Everyone uses the same scoring rubric, of course, in order to maintain consistency of expectations. But the subject matter of the prompt can be selected by the teachers.
- Allow teachers outside of language arts to use an abbreviated scoring rubric, focusing primarily on organization and conventions. Some of the complexities of the traditional rubrics can be overwhelming for people outside of language arts.
- Allow team scoring, so that the same assignment receives credit in both science and language arts, for example, and teachers score them together.
- Give up time in faculty meetings for collaborative scoring so that teachers know that the administration is willing to give up its meeting so that teachers will have more time. The same can be done with perhaps half of the building and district professional development hours.
These are all reasonable compromises that show your good faith. The essential question that every teacher must address is this: Is what we have been doing in the past working? Are our students writing well enough to have opportunities beyond high school? Ask some local community college and university and technical school faculty to talk with you about this issue. They will uniformly report that even students with good work ethic and decent test scores are writing abysmally, and that this is hurting the career and academic opportunities for these students. The remedy for this is more writing, more editing, more feedback – and all of those things in more subjects. Do you have more 9th graders than 12th graders? In virtually every high school in the land, the answer is yes, and that is because students do not succeed in 9th and 10th grades, and then drop out. These students are not stupid, but they lack essential skills for success in school and in life, and our failure to intervene to give them those skills results in a lifetime of adverse consequences for them.
One final note: The thesis of the “we can’t be creative” argument is that because of the demands of standards, you just can’t be creative but only must teach to the test all day. If that thesis were true, then the evidence should indicate that teachers who do mindless test drills all day long have higher test scores than teachers whose classrooms are marked by creativity, thinking, engagement, analysis, rigor, communication, and, of course, writing. After all, their reasoning goes, you just don’t have time for all those good things if you are doing test drills, and the test drills are the only way to have high scores. In fact, the evidence is the opposite of that hypothesis. I’m advocating FOR creativity, and writing, thinking, engagement, and analysis are all parts of a creative classroom.
I'm an advocate of VERY short (10-12 items) assessments done very frequently. In my classes I would do weekly assessment, but many schools are doing bi-weekly assessment successfully. Certainly formative assessments must be no less frequent than quarterly.
The key is not, however, simply "doing" the assessment -- the key is how teachers use the results to make immediate improvements in teaching and learning. The longer the interval between the assessments, the less likely it is that we can make meaningful alterations in teaching strategies and curriculum.
With regard to the "not enough time for instruction" argument, I can only quote what I heard another Virginia educator say when confronted with the same challenge. "These assessments [they did them biweekly] SAVE us time -- it's the only way we know what to teach and who needs special assistance."
There is not a shred of evidence that covering the curriculum and checking off items on pacing charts is equivalent to student learning. In areas where students succeed -- from electronic games to music to athletics -- they receive very frequent feedback. Marzano is just one of many researchers who have established that of all the things that we do as teachers, feedback is the single greatest influence on student achievement, provided that the feedback is timely, accurate, and specific. End of semester finals do not meet any of those criteria.
I am very sympathetic to teachers who feel overwhelmed by the sheer quantity of standards and curriculum elements. The best response to that challenge, however, is not a failure to assess students, but rather a narrowing of the scope of the assessments. It is not necessary to address every element of the standards. We can save a great deal of time if we first identify "Power Standards" (see the book by Larry Ainsworth of that title), and then focus our assessment efforts on the most important standards.
I can see two sides of this. My very strong inclination is to agree with you, because I have seen some terrific writing assessments in music and art, and I would hate to have those efforts diluted. On the other hand, in many school systems the student:teacher ratio for music, art, and PE is MUCH higher than for other classes, and the use of common assessments can place an excessive burden on those teachers compared to their colleagues. Therefore, some team assessment might make sense -- study the Firebird Suite and artistic representations of it in music AND art, and submit written reflection for credit in both classes, with grading duties split between two teachers.
At the end of the day, we need to honor your desire to focus on your discipline and use your expertise to help students link literacy skills to your class, without drowning you in paperwork. There must be a reasonable compromise here somewhere -- perhaps one quarter doing discipline-specific assessments and the next quarter doing cross-disciplinary assessments.
First, I reject the premise that there is a formula for grading based on the use of the average. We make 5th graders learn that the arithmetic mean is not always the best representation of a data set; surely teachers and school leaders can learn the same lesson.
Second, I applaud the notion of making formative assessments "count" - but neither these assessments nor ANY single project, paper, assessment, or test should be allowed to so profoundly influence a student's grade. Each time we have the "killer project" we administer the academic death penalty, telling students that resilience doesn't matter, finishing strong doesn't count, and a single bad week or month can ruin an entire year. Rather than teaching resilience, we teach defeatism.
Therefore, let me offer some practical ways out of this dilemma.
I would concur with making the assessments "count" - but only as part of a menu of student projects, assignments, and assessments. If they blow the assessment, the consequence is not a failure, but rather the consequence is that students select something else from the menu. The same applies if they miss an assignment, blow a test, or fail to turn in a project (or, just as commonly, complete the project, but leave it at the bottom of a locker that resembles a toxic waste dump). Our responses should be neither sympathy nor judgment, but simply the rational, logical, encouraging, and firm response that students are responsible for their work, and when they miss important work, they don't fail, but rather they select something else from the menu and GET THE WORK DONE. This results in more work of higher quality, better grades, fewer failures, and appropriate respect for formative assessments without making formative assessments "make or break" tests.
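To make the point about the average concrete, here is a minimal sketch in Python. All of the scores are made up for illustration, and the 100-point scale and the replacement score of 84 are assumptions, not data from any school. It shows how a single zero pulls the arithmetic mean far below the student's typical work, and what happens when the missed item is replaced by another item from the menu.

```python
from statistics import mean, median

# Hypothetical scores on a 100-point scale: five solid results and one
# missing assignment recorded as a zero. Illustrative numbers only.
scores = [85, 90, 88, 92, 87, 0]

average_with_zero = mean(scores)   # dragged down by the single zero
middle_score = median(scores)      # much closer to the student's typical work

# Menu approach: the missed item is replaced by another item the student
# completes. The 84 is an assumed score on that replacement item.
menu_scores = [85, 90, 88, 92, 87, 84]
average_after_menu = mean(menu_scores)

print(f"mean with the zero:   {average_with_zero:.1f}")
print(f"median with the zero: {middle_score:.1f}")
print(f"mean after menu item: {average_after_menu:.1f}")
```

With these invented numbers, one zero moves the mean from the high 80s to the low 70s, while the median barely changes. The menu version reflects work the student actually completed.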
Data
Whether you use SPSS or Microsoft Excel or just a hand-written data set, here are the keys to bear in mind:
First, be sure to make a “same student to same student” comparison. In other words, it is of some value to compare last year’s third grade students to this year’s third grade students. Perhaps the differences in scores will be due to the differences in curriculum. But perhaps the differences in scores will be due to the fact that they are different children. It is therefore far more persuasive to have pre- and post-test data that will show – for the very same group of children – the gains that they make with treatment A (the previous curriculum) and then the gains that they make with treatment B (the new curriculum). You would expect both data sets to show gains – after all, the kids were in school. But your hypothesis is that the gains would be of a different magnitude.
Second, if you are examining effects across many different classrooms, then it is imperative that we acknowledge that there is more going on here than the “presence or absence” of a new curriculum. In every curriculum or teaching reform, the fundamental issue is DEGREE OF IMPLEMENTATION. I have seen new reading curricula implemented, for example, with some classrooms devoting 90 minutes each day, others devoting 120 minutes each day, and others devoting 180 minutes each day to the “same” curriculum. In other cases, I have seen many schools claiming to have the same new curriculum, but the actual use and implementation varied widely. When researchers tracked the actual degree of use of the new curriculum, they found that low levels of implementation were worse than no change at all, and only the highest levels of implementation were effective. This will help you avoid the “brand name fallacy” in which vendors attempt to claim that the brand name of the curriculum is the salient variable in student achievement, when you and I know that teaching, leadership, and daily implementation are far more important.
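Both keys can be sketched in a few lines of Python, and the same arithmetic works in Excel or by hand. The records, field names, and scores below are hypothetical and exist only to show the calculation. Each row holds one student's pre- and post-test scores under treatment A and treatment B, plus the daily minutes that student's classroom devoted to the new curriculum.

```python
from statistics import mean

# Hypothetical records, one per student. "pre_a"/"post_a" are scores under
# the previous curriculum, "pre_b"/"post_b" under the new one, and "minutes"
# is the daily time the classroom actually gave the new curriculum.
students = [
    {"id": "s1", "pre_a": 40, "post_a": 48, "pre_b": 48, "post_b": 54, "minutes": 90},
    {"id": "s2", "pre_a": 55, "post_a": 62, "pre_b": 62, "post_b": 69, "minutes": 90},
    {"id": "s3", "pre_a": 35, "post_a": 44, "pre_b": 44, "post_b": 54, "minutes": 120},
    {"id": "s4", "pre_a": 60, "post_a": 66, "pre_b": 66, "post_b": 76, "minutes": 120},
    {"id": "s5", "pre_a": 45, "post_a": 53, "pre_b": 53, "post_b": 69, "minutes": 180},
    {"id": "s6", "pre_a": 50, "post_a": 58, "pre_b": 58, "post_b": 72, "minutes": 180},
]

def gain(record, treatment):
    """Post-test minus pre-test for one student under treatment 'a' or 'b'."""
    return record[f"post_{treatment}"] - record[f"pre_{treatment}"]

def mean_gain(records, treatment):
    """Average same-student gain under one treatment."""
    return mean(gain(r, treatment) for r in records)

def mean_gain_by_minutes(records):
    """Average gain under the new curriculum, grouped by implementation level."""
    groups = {}
    for r in records:
        groups.setdefault(r["minutes"], []).append(gain(r, "b"))
    return {minutes: mean(gains) for minutes, gains in sorted(groups.items())}

print("mean gain, treatment A:", mean_gain(students, "a"))
print("mean gain, treatment B:", mean_gain(students, "b"))
print("treatment B gain by daily minutes:", mean_gain_by_minutes(students))
```

Because every gain is computed for the same child, differences between groups of children drop out of the comparison. Grouping by minutes keeps the degree of implementation visible instead of hiding it behind the name of the curriculum.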
Thanks very much for a very thoughtful note. I agree that MAP is not a great reflection of the success of your students. However, I would not exclude MAP from the equation. One common problem in gifted education is that teachers assume "they already know that" and fail to give students the grade-level instruction that they need in addition to enrichment opportunities. That's one reason that gifted kids drop out in high school - they have never experienced failure, and then all of a sudden a high school teacher expects them to have some basic skills that they were never taught because teachers in earlier grades assumed that they had map reading, number operations, and essay construction down pat.
The real question is, as you suggest, what additional measures beyond MAP can we use? First, I think we need to stick with same student to same student comparisons, not this year's group to last year's group. This implies, therefore, some sort of pre- and post-test information during the same academic year.
Second, we'll need a variety of instruments, including traditional ones. As David Perkins reminds us, students who are gifted in one area may not be gifted in others, and therefore the student with superior mathematical ability may nevertheless need traditional literacy tests, or vice versa.
Third, we'll need tests that are sensitive to exceptionally fast progress. That's why tests of a single grade level do not work with gifted students, because even if we give a 5th grade test to a 4th grade student, we have essentially created a "ceiling effect" of one year of progress above grade level. More nuanced tests, such as those that use Item Response Theory to give progressively more challenging questions each time a student answers correctly, are better suited for this task. But at the end of the day, you and your colleagues (and students) will complain of test fatigue if we just test the kids all the time.
Fourth, to address the test fatigue issue and also challenge students appropriately, I would favor having them create their own assessments. First, I'd have an honest talk about times when they felt that a test did not allow them to show all that they know. When a 5th grader can do algebra, scoring 100% on a multiplication and fractions test is not very satisfying. So let them think about different types of assessment - closed-end, open-end, performance, etc. - and have them construct an assessment each month that, if they were the teacher, would allow their students to demonstrate all that they have learned. The worst case is that some kids will sand-bag, using this as an opportunity to get out of work by dumbing down the task. I think that risk is minimal, particularly when other students are going to create some wonderfully creative and challenging tasks in a variety of different assessment formats.
Instruction
With regard to high school reading comprehension, I would be less inclined to use a scripted program than to use techniques that can be applied in many different high school subjects. Examples of activities that can be integrated into your current curriculum without the cost of purchasing a separate program include the following:
1) Summaries - In social studies and science classes, require students to write brief summaries of 1-2 page textbook passages. These summaries need not be formal essays. It is sufficient to have them write the “main idea” and then “three or more supporting details.” This is also an excellent way to get better value out of time devoted to SSR—Sustained Silent Reading— at the high school level.
2) Similarities and differences - this is particularly helpful when students are reading accounts of historical events or scientific observations.
3) Graphic organizers - to demonstrate student understanding of complex literary structures, including novels, plays, and poems, students can use graphic organizers (See Marzano & Pickering) to illustrate the patterns of events, settings, and characters.
4) Inferences - When students review a graph in a math, science, physical education, or social studies class, students should write a 3-4 sentence explanation that elaborates on the meaning of the graph and the relationship among the variables.
5) Compare and contrast - In art classes, students can compare and contrast different artistic styles or the characteristics of different artistic media.
6) Strategies and rules - In PE and technology classes, students can describe which strategies are most effective for various athletic and electronic games.
This is a start. The key is that when students do more writing, with editing and rewriting, their performance improves in every academic subject. A recent Education Week report indicated that 63% of high school students never complete a research paper requiring footnotes and bibliographies. This failure to challenge students to think, research, and write is directly associated with the failures of students in college and technical school.
For very recent information on the value of writing for ALL students, see Dr. Mel Levine's new book, The Myth of Laziness. I have also addressed this in my books, The Daily Disciplines of Leadership and The Leader's Guide to Standards.
Parents of teens may think that their days of influence are almost over. As the parent of a couple of teenagers, I know that I feel that way, too, on some days. Nevertheless, parents remain the most important influence in the lives of their children. By their daily lives and models, not just their words, parents are more important than peers, teachers, television, and culture.
When it comes to academic help for teens, parents can be particularly frustrated. “I can't help my kid write a term paper and I certainly can't help with chemistry or algebra!” That's probably true, and even if you could, you should not provide that sort of assistance. But there are two important things that parents can do. First, they can insist that their teens keep an assignment notebook and calendar, and parents can check this every day. Personal organization is one of the most important factors that distinguishes successful high school students. Second, parents can insist that their children engage in regular writing. Writing—particularly nonfiction writing with editing and rewriting—is strongly associated with improved student achievement in every other academic area, including math, science, and social studies. Here are three things parents can do to improve student writing:
1) Thank you notes. That's right—old-fashioned, handwritten thank you notes. Sending thank you notes is not just good manners, but will distinguish your children from the vast majority who never take the time to extend this simple courtesy.
2) Family history. The next time you visit grandparents or other older relatives, ask your teens to take some time to interview them, asking them about their parents and grandparents and the times they remember. The World War II generation is vanishing, and if you are lucky enough to have relatives who experienced that era, have your teens write about it. It may rekindle an interest in history and, at the very least, will renew relationships in your family. You probably have family members who remember the Civil Rights movement or the Vietnam era. Have your teens write about those memories. This will be more compelling than anything they have heard in class or read in a textbook.
3) Advice to new students. Ask your teens to write to a younger sibling or family friend of elementary school or middle school age. Give them advice on what is required for success in high school. Their definition of “success” should include not only academic success, but safety, social relationships, and other things that they wish they had known before they entered high school.
Finally, there are things that parents can do to help the entire school. If your school doesn't have a newspaper, help to start one. I volunteer in one of my children's schools every week and we publish a newspaper full of student writing, including news stories, sports news, editorials, cartoons, puzzles, advice columns, and anything else that interests the students. The students run the show, editing, writing, and distributing the newspaper. The newspaper staff includes learning-disabled students who have made important contributions and work with other colleagues on the newspaper staff to express their opinions and contribute to the paper in meaningful ways.
I get this question a lot and we've tried to gather a lot of research that is precisely on point. The three best sources are:
1) Darling-Hammond, Linda, The Right to Learn, Jossey-Bass, 1997— in the first 100 pages there are dozens of citations about this point precisely—linking authentic performance assessment, including writing, to higher multiple choice state test scores.
2) Reeves, D.B., "Standards are not enough" in the December 2000 NASSP Bulletin (lead article). This is the evidence that I've used in my keynotes, but some people need to see it in an academic journal. The issue editor for that one was Robert Marzano.
3) Marzano, Robert, and others. In Classroom Instruction That Works, almost all of the dependent variables were multiple choice test scores, yet none of the nine strategies were classical multiple choice test prep. Of those most related to our writing work, the note-taking, similarities and differences, and graphic organizers are all directly related to the writing processes that we advocate.
Of course, I always close the statement of evidence with this: Is the evidence perfect? Of course not—no research ever is, in education or, for that matter in evidence. But consider this—if we are wrong, and students do a lot of extra literacy and writing and it turned out that they didn't really need to do so, what is the risk—overly literate students? If we are right, and students do need this but we fail to provide it, what is the risk? Long-term deprivation of basic skills that have dramatic negative consequences in all disciplines. So—I think the evidence is pretty good. But even if it were not so good, the risk of our being wrong is very low, and the risk of our being right (and people failing to heed the evidence) is extremely high.
Hope this helps. I'm happy to participate in conference calls with clients who want to discuss this more. I also encourage them to conduct their own case studies and action research to test these hypotheses. See if the students who do a great deal more literacy in science, for example, wind up with lower science scores. See if students who write summaries of each sub-chapter in a social studies book are irreparably damaged. The real issue is whether these strategies are more dangerous for student learning than the strategies that they are using now. They do not need a set of perfect research to suggest that change is needed.
I agree that the culture of many communities requires letter grades, but at the very least we can make them accurate and effective, criteria which few report cards meet today (see the work of Guskey, Stiggins, O'Connor, and Marzano, as well as my own work, on this point). For example, one of the worst practices is the use of the arithmetic mean, or average, so simply translating "exemplary" into an A and "not meeting" into an F and then averaging the results pretty much defeats the purpose of effective grading practice. In one school, for example, there were six student assessments each quarter. An "assessment" might be a lab, paper, project, etc. Their system was:
4 "exemplary" and 2 "proficient" = A
4 "proficient" and 2 "progressing" = B
3 "proficient" = C
No D's allowed.
Anything less than C work resulted in the "IP" - "In Progress" - and students were given two weeks to achieve at least C level work; otherwise, they received a failing grade. Notice that the emphasis in this system is on quality of work, not speed of work. The consequence for turning in work that was below standard was not a zero or an F, but rather the requirement that the student respect teacher feedback and submit the work again. Conversely, the reward for getting work done well the first time might well be the completion of a class before the end of the semester. A growing number of schools are giving students incentives to "get it right the first time" by working hard, respecting teachers, and learning the material. The reward? No finals, early finishes to class, and what students crave -- freedom of choice.
First, I would not be bound by the "three" rule - I've had students revise work as many as nine times - they had to learn that I was serious about quality work and that I would not "give" them a D for work that was not satisfactory. In general, I won't accept work below "B" level work, which leaves me exposed to two complaints -- students work harder than in any other class, and I have grade inflation. My response is that when students are working harder for a higher grade, that's not grade inflation, but "performance inflation."
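The school's rule can be written as a short function, which also makes clear that nothing is being averaged. This is a sketch under stated assumptions, not the school's actual policy. The thresholds are read as minimums (for example, an A needs at least four "exemplary" results with the rest at least "proficient"), a C is read as at least three results at "proficient" or better, and the level names other than those quoted above are hypothetical.

```python
# Levels ordered from lowest to highest. The full set of level names is an
# assumption; the text quotes only some of them.
LEVELS = ["not meeting", "progressing", "proficient", "exemplary"]

def quarter_grade(results):
    """Map six assessment results to 'A', 'B', 'C', or 'IP' (In Progress).

    Thresholds are treated as minimums, which is an interpretation of the
    rule described in the text rather than a documented policy.
    """
    if len(results) != 6:
        raise ValueError("expected six assessments per quarter")
    ranks = [LEVELS.index(r) for r in results]  # unknown level -> ValueError
    exemplary = sum(1 for x in ranks if x == 3)
    proficient_or_better = sum(1 for x in ranks if x >= 2)
    progressing_or_better = sum(1 for x in ranks if x >= 1)

    if exemplary >= 4 and proficient_or_better == 6:
        return "A"
    if proficient_or_better >= 4 and progressing_or_better == 6:
        return "B"
    if proficient_or_better >= 3:
        return "C"
    return "IP"  # no D: the student revises and resubmits

print(quarter_grade(["exemplary"] * 4 + ["proficient"] * 2))
print(quarter_grade(["proficient"] * 2 + ["progressing"] * 4))
```

A result of "IP" is a request for more work, not a grade. The student resubmits, and the function is run again on the revised results.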
Neither retention nor social promotions are effective. The only reasonable response to poor student performance is IMMEDIATE intervention that is decisive and effective. This requires frequent assessment, RADICAL changes in schedule where appropriate (180 minutes of literacy per day is not unusual in schools that are making dramatic improvements), appropriate use of free time (home rooms, study halls, academic advisories, and lunch) to give students freedom of choice on how to use that time only when they have earned it. "The price of freedom is proficiency."
Thanks very much for your very thoughtful question. First, I want to express my complete agreement with your staff. We ALL agree that responsibility and citizenship are good ideas. Kids SHOULD have appropriate consequences for failing to do homework, failing to complete projects, and doing badly on tests. The only question is how BEST to encourage responsibility and citizenship and what the appropriate consequences should be.
This is very important - we are not starting with the perspective of "I'm right and the teachers are wrong." We're starting with the perspective that you, your staff, and I all love kids, care about them, and want them to grow up with a good sense of responsibility and citizenship.
Now that we are starting from common ground, let's ask some questions:
1) Are our present practices leading students to improve their rates of homework completion and classroom success? If so, then let's just check the data - what was the percentage of failures five years ago? Three years ago? Last year? If our strategies are effective, I would expect that the failures - particularly failures due to the failure to complete homework - are declining significantly. But that's not, in fact, what I see around the country. The typical grading practices - zeroes for missing work, refusing to take late work, refusing to allow students to resubmit work, use of the average - are not producing improved performance. In fact, teachers complain to me all the time that students are not completing work, that they are disengaged and non-responsive. In other words, if our goal is improved citizenship and responsibility, what we are doing now apparently is not working very well.
2) What alternatives have we tried? In almost every school, I find wide variation in teacher grading practices. There are some teachers who, quietly and almost anonymously, have been experimenting with different practices. Before you consider anything I have to say, conduct a "treasure hunt" by analyzing those classes where failure rates have declined and achievement has improved. Look in different departments around the district where success is high - drivers education, music, computer programming? What do those areas have in common that we can learn from? One thing that I know is true in all three is that when you make a mistake, it doesn't lead to failure, but rather to listening to teacher feedback, respecting teacher feedback, improving performance, and ultimately passing the assessment.
3) What will be our criteria for decision? Can we at least agree that even if people are skeptical, we'll let the evidence be our guide? I've worked in very remote parts of Africa where people did not believe that vaccinations were effective. They didn't want to see my studies or hear a lecture on western medicine. But they were willing to look at children who lived or avoided horrible life-long disabilities because they didn't get polio (it's still rampant in parts of the developing world). The evidence, not my beliefs or their beliefs, ultimately allowed for more vaccinations. So in our schools, can we agree that even if we're not sure, we'll at least try some experiments, and then let the evidence decide? I think that teachers are smart - they care about kids and love them. But they are skeptical because they don't like to see another "hot idea" come and go. So, let's take our time, try it out, but let's also have the intellectual integrity to let the evidence, and not personal feelings, decide.
4) Can we agree on some fundamental boundaries? Even if we disagree on policy, can we agree on values such as fairness? Can we agree that grading practices should not be based upon subjective appraisals that can be influenced by gender, race, economic status, or parent activism? Can we agree that the central purpose of feedback, including grades, is the improvement of student achievement?
5) What's in it for the teachers? Can we agree that if we can improve policies that will reduce our failure rate, that we would have happier, more engaged, and better behaved students? Can we agree that if we have fewer students repeating grades and courses, we'll have fewer angry and bored students?
6) What's in it for the school and community? Can we agree that if we have fewer students repeating math and English, that ultimately we'll have more opportunities for art, music, technology, service learning, and other things that both students and teachers find engaging and worthwhile?
Once we have settled these questions, let's try some experiments. I'm not saying I have all the answers, but perhaps different teachers would try different things. Some might just eliminate the zero. Some might stop the average. Others might try a "menu" system such as I use, where the consequence for missing work or blowing a test is selecting other items from the menu. Others might experiment with rewards for work that is on time or early rather than punishment for work that is late.
In other words, I'm not asking you to use MY system, but rather that you use your good judgment and the thoughtful good will of your colleagues to: a) admit that what we are doing now could be improved and b) experiment with different ideas that improve achievement and reduce failures and c) agree that the final school-wide decision will be based on evidence and not personal prejudices.
As a math teacher myself, I understand the quandary. We want students to know the math and also to understand the context of the problem. I used to tell my students, whether they were in elementary school or graduate school, “Mathematics is about describing the universe using numbers, symbols, and words, and we're going to use all three of those this year.” I tried to make it clear that this is about communication—not just about calculation—and it sounds as if your math teachers have similar feelings.
Now to the grading issue. I love your idea of distinguishing between academic content and “listening and following directions” but we would all acknowledge that on a test and in life, students must do BOTH—it's not an either/or proposition. Therefore, the report card is a great idea and the math teacher who insists that students label their problems has a good idea. I don't even mind if they take points off—BUT BUT BUT—those deductions should be in PENCIL, and the student should, prior to turning work in, conduct a self-assessment. In some cases, I have seen a checklist by the teacher's desk and in other cases (including all of my classes) there was a checklist on top of the test that the student had to complete. It included things such as “Name on paper” and could also include “Labels on each answer.” If a student still fails to label the answer correctly, the appropriate response is not a failing grade, but the requirement that the student do it again—that's why the deduction is in pencil. The lesson I want them to learn is not “I'm a failure in math” but rather “I'd better label my answers or it's a lot of extra work so I might as well do it right the first time.” When the student answers the problem correctly, the deduction can be removed. It's the BEHAVIOR of labeling the problem that I want to reinforce, and my experience suggests that in some cases, it takes more than one-trial learning for that to take place.
Pre- and post-tests should contain items that are parallel - that is, similar in format and difficulty - but not identical. In math, that's pretty straightforward - keep the item format nearly identical, but change the numbers. Writing is trickier, because as your colleague notes, student performance is a combination of writing and their knowledge of the prompt. One way to maximize the impact of writing (and minimize the impact of student knowledge of the prompt content) is to increase the number of prompts - perhaps giving students a choice of three prompts for both the pre- and post-test. When students have choice, they have better engagement. Moreover, when teachers are looking at essays with different prompts and content, they are more likely to focus exclusively on the matter at hand - writing - not the student's knowledge of content from the prompt. There are a number of resources on improved classroom assessment, and I would particularly consider the work of Tom Guskey, Jane Bailey, Bob Marzano (see especially his new book on Classroom Assessment and Grading That Work), and Rick Stiggins. One of my books, Making Standards Work (3rd edition) might be useful to you as well, particularly on the issue of creating assessments that are engaging for students. I'm also publishing a new book on assessment this fall that will include chapters by Marzano, Stiggins, Guskey, and several other leaders in the field.
I do want to caution that when we try to make classroom assessments too perfect, we make some unfortunate trade-offs. For example, you get higher statistical reliability with more items, but I've seen students and teachers get crushed by 80-item 2-hour tests because a professor told them they need to be "reliable and valid." But a psychometrically perfect assessment that alienates kids and teachers and is so long that the results are not delivered to the students in a timely way is simply a waste of time. Better to do mini-assessments - 12 or 15 items - that may not have the statistical perfection of 80 items, but where students get same-day or next-day feedback, and teachers use the data in real time to improve instruction. You might get more "perfect" assessments by purchasing them from test companies, but then you're simply paying for the professional development of the employees of test companies instead of using those resources to help teachers design and evaluate good assessments. When teachers create the assessments collaboratively and score them together in a fair and consistent manner, they not only gain the value of accurate and timely feedback for students, but they also are much more likely to have alignment between instruction, curriculum, standards, and assessment. Teachers will also be emotionally and professionally engaged in assessment rather than having assessment stacked on top of an already overflowing plate.
Thanks very much for your thoughtful and heart-felt message. We both agree on the need to instill discipline and work ethic in students. I also know that we both are seeking the most effective way to do this. I'm not an ivory-tower theoretician. I'm a teacher and parent genuinely seeking the best way to motivate and engage students to achieve higher levels of performance and personal responsibility. I wish that my 8th grade son, the youngest of four, had teachers as thoughtful and committed as you.
So, let's see if we can find some common ground. For example, we both agree that we should not "give credit for no work" - so how about the use of the zero on a four-point scale rather than on the 100-point scale. This would convey the same message about the penalty for no work that we both agree is important, while not inflating the value of the D - truly awful work - in comparison to no work.
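The arithmetic behind this suggestion can be sketched in a few lines of Python. Everything here is an illustrative assumption rather than data from a real gradebook: the student's scores, the 90/80/70/60 letter cutoffs on the 100-point scale, and the choice to round a four-point average to the nearest grade point.

```python
from statistics import mean

# Hypothetical student: four assignments at B level, one never submitted.
hundred_point = [85, 85, 85, 85, 0]  # missing work recorded as 0 out of 100
four_point = [3, 3, 3, 3, 0]         # same work: B = 3, missing work = 0 out of 4

def letter_from_100(score):
    """Convert a 100-point average to a letter (assumed cutoffs: 90/80/70/60)."""
    for cutoff, letter in [(90, "A"), (80, "B"), (70, "C"), (60, "D")]:
        if score >= cutoff:
            return letter
    return "F"

def letter_from_4(score):
    """Convert a four-point average to a letter by rounding to the nearest point."""
    return "FDCBA"[int(score + 0.5)]

avg_100 = mean(hundred_point)  # 340 / 5 = 68
avg_4 = mean(four_point)       # 12 / 5 = 2.4

print(avg_100, letter_from_100(avg_100))  # 68 D
print(avg_4, letter_from_4(avg_4))        # 2.4 C
```

In both versions the missing assignment costs the student something. On the 100-point scale, though, the zero sits 60 points below the lowest passing mark, so one missing assignment pulls a steady B student down to a D. On the four-point scale the zero is one step below a D, and the same student lands at a C.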
How about a menu system in which the penalty for not submitting work on time is simply more work? When they fail to submit work, the "punishment" is not a zero, but actually doing work - and not copying from their peers who submitted work on time, but selecting different work from the assignment menu.
How about a reward system for submitting work on time? In my classes, I used to provide 50 points for the first draft and 50 points for the final draft. But in the unlikely event that someone submitted a pluperfect essay on the first submission, I wrote "100 points - no rewrite necessary." Word quickly spread that the way to "get over" in my class was to do it right the first time.
I'm genuinely seeking solutions here, and I do not have the conceit that I have solved this very vexing dilemma. Thanks for taking the time to engage this very challenging issue.
Leadership
Real World Q&A
First, congratulations on your terrific initiative! You clearly have the right components in place. You are looking at student work and you are committed to collaboration. Moreover, you have created a safe way for teachers to share student work because you are making the submissions anonymous so that neither the identity of the student nor the teacher is disclosed.
You asked for some ways of adding structure to this process, so let me share what I do in these circumstances.
First, I hand out a single piece of student work (remember that all of the samples started in life as “proficient”) and ask each teacher to evaluate it ALONE, using the scoring rubric that was used for the assessment. I then track the percentage of faculty members who rated that piece of work Exemplary, Proficient, Progressing, or Not Meeting Standards. I note the percentage of agreement, which is typically fairly low.
Second, I give teachers the opportunity to collaboratively score the assessment in groups of two or three, and make any revisions they wish. I note the percentage of agreement again—typically a little bit better.
Third, we have a group meeting to revise the scoring rubric. The theme of this meeting is “the enemy is not each other; the enemy is ambiguity.” Where there is a disagreement it is not the fault of the teacher, but the fault of an ambiguous scoring rubric. We use a collaborative process to revise and improve the scoring rubric, making it more specific. Then we score the same piece of work AGAIN. The third time, the level of agreement is always higher than the first two times.
Finally, we look at this process from the student's point of view. What would they think of the first level of scoring? Pretty unfair, right? As we worked together, collaborated, and refined the rubric, our fairness improved. This process is not about some state mandate, but about our fundamental shared value of FAIRNESS.
In most schools, this takes ALL YEAR LONG - it's not a one-shot staff development meeting. In some districts, they open EVERY faculty meeting, including cabinet meetings at the central office, with collaborative scoring of student work. They learn that the faster they reach 80% consensus, the faster they move on to other agenda items.
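For anyone who wants to track the agreement figures from round to round, here is a minimal sketch in Python. It assumes that "percentage of agreement" means the share of teachers who chose the single most common rating. That definition, the function name, and the ratings from ten teachers are all hypothetical.

```python
from collections import Counter

def agreement(ratings):
    """Return the most common rating and the share of raters who chose it."""
    top_level, top_count = Counter(ratings).most_common(1)[0]
    return top_level, top_count / len(ratings)

# Hypothetical ratings of the SAME piece of student work by ten teachers.
round_1 = ["Proficient"] * 4 + ["Progressing"] * 3 + ["Exemplary"] * 2 + ["Not Meeting Standards"]
round_2 = ["Proficient"] * 6 + ["Progressing"] * 3 + ["Exemplary"]
round_3 = ["Proficient"] * 8 + ["Progressing"] * 2

rounds = [
    ("Scored alone", round_1),
    ("Scored in small groups", round_2),
    ("Scored with revised rubric", round_3),
]
for label, ratings in rounds:
    level, share = agreement(ratings)
    status = "consensus reached" if share >= 0.8 else "keep refining the rubric"
    print(f"{label}: {share:.0%} chose {level} ({status})")
```

The 80% threshold in the last step mirrors the consensus target mentioned above. A faculty could just as reasonably track agreement within one rubric level instead of exact matches.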
School Improvement
Real World Q&A
The original schools were in Milwaukee—high poverty, high minority, high second language. Since then, the 90-90-90 techniques have been replicated all over the country with multiple language and ethnic groups. The school in California that had the highest achievement gains for two consecutive years, Mead Valley Elementary, was 100% poverty and 99% Latino and second language, and used 90-90-90 techniques. It's also been replicated in deep south rural schools, Native American schools, and many other schools with diverse ethnic and linguistic backgrounds.
Standards
Real World Q&A
Thanks very much for your inquiry. We have provided some examples of Power Standards, but they are only for illustration purposes. There is no substitute for your OWN faculty members going through your standards and asking:
1) What endures?
2) What has leverage?
3) What is really necessary for success at the next grade level?
Then, of course, the job is not done. The next step is to agree on common assessments that all faculty members agree to use so that the focus on these Power Standards has meaning and consistency.
New York tried to do a report card including every standard, and it took them 12 pages for 4th graders. That has, thankfully, been abandoned. It just alienates parents and teachers alike. What is quite reasonable, however, is to have the report card reflect seven to ten standards that are absolutely vital for future success. In that way, even if a student gets (through the alchemy of the grading process) a C or a B, the parents will still know if the student is not succeeding in critical areas.
You can find some examples of language arts standards in the book Reason to Write (Simon & Schuster, 2002).
You asked for the 6th grade math Power Standards. They are:
1) Number operations with and without a calculator.
2) Tables, charts, graphs—create them given a data set, and draw inferences from them once they are completed.
3) Fraction and decimal operations—again with and without a calculator.
4) Measurement in English and metric units.
5) Given a story problem, convert it to an accurate picture.
6) Properties of a triangle and rectangle.
7) Two-dimensional scale drawing.
I'm fully aware that there are many other things in the 6th grade curriculum, but every 7th and 8th grade teacher I've ever interviewed has said they would be happier if students could do these seven things rather than be “checked off” on 80 other things that are in the standards.
Assessment
There is an abundant body of research on the value of literacy and writing in particular. Frankly, it’s just common sense that students who do more nonfiction writing, along with editing and rewriting, will improve thinking and reasoning skills, and that will improve their abilities in science, social studies, mathematics, and everything else that they do in life. But if people need to see published resources, I would recommend the following:
- Reeves, D.B., “Standards are not enough: Essential transformations for successful schools,” National Association of Secondary School Principals Bulletin, December 2000.
- Darling-Hammond, L., The Right to Learn, Jossey-Bass, 1997.
- Reeves, D.B., The Leader’s Guide to Standards, Jossey-Bass, 2002.
- Calkins, L., The Art of Teaching Writing, Heinemann.
Here are some ideas to compromise with teachers on the issue of maintaining their creativity while simultaneously maintaining a commitment to excellence:
- Allow teachers to choose the prompt so that the writing assignments fit into their subject. Everyone uses the same scoring rubric, of course, in order to maintain consistency of expectations. But the subject matter of the prompt can be selected by the teachers.
- Allow teachers outside of language arts to use an abbreviated scoring rubric, focusing primarily on organization and conventions. Some of the complexities of the traditional rubrics can be overwhelming for people outside of language arts.
- Allow team scoring, so that the same assignment receives credit in both science and language arts, for example, and teachers score them together.
- Give up time in faculty meetings for collaborative scoring so that teachers know that the administration is willing to give up its meeting so that teachers will have more time. The same can be done with perhaps half of the building and district professional development hours.
These are all reasonable compromises that show your good faith. The essential question that every teacher must address is this: Is what we have been doing in the past working? Are our students writing well enough to have opportunities beyond high school? Ask some local community college and university and technical school faculty to talk with you about this issue. They will uniformly report that even students with good work ethic and decent test scores are writing abysmally, and that this is hurting the career and academic opportunities for these students. The remedy for this is more writing, more editing, more feedback – and all of those things in more subjects. Do you have more 9th graders than 12th graders? In virtually every high school in the land, the answer is yes, and that is because students do not succeed in 9th and 10th grades, and then drop out. These students are not stupid, but they lack essential skills for success in school and in life, and our failure to intervene to give them those skills results in a lifetime of adverse consequences for them.
One final note: The thesis of the “we can’t be creative” argument is that because of the demands of standards, you just can’t be creative but only must teach to the test all day. If that thesis were true, then the evidence should indicate that teachers who do mindless test drills all day long have higher test scores than teachers whose classrooms are marked by creativity, thinking, engagement, analysis, rigor, communication, and, of course, writing. After all, their reasoning goes, you just don’t have time for all those good things if you are doing test drills, and the test drills are the only way to have high scores. In fact, the evidence is the opposite of that hypothesis. I’m advocating FOR creativity, and writing, thinking, engagement, and analysis are all parts of a creative classroom.
I'm an advocate of VERY short (10-12 items) assessments done very frequently. In my classes I would do weekly assessment, but many schools are doing bi-weekly assessment successfully. Certainly formative assessments must be no less frequent than quarterly.
The key is not, however, simply "doing" the assessment -- the key is how teachers use the results to make immediate improvements in teaching and learning. The longer the interval between the assessments, the less likely it is that we can make meaningful alterations in teaching strategies and curriculum.
With regard to the "not enough time for instruction" argument, I can only quote what I heard another Virginia educator say when confronted with the same challenge. "These assessments [they did them biweekly] SAVE us time -- it's the only way we know what to teach and who needs special assistance."
There is not a shred of evidence that covering the curriculum and checking off items on pacing charts is equivalent to student learning. In areas where students succeed -- from electronic games to music to athletics -- they receive very frequent feedback. Marzano is just one of many researchers who have established that of all the things that we do as teachers, feedback is the single greatest influence on student achievement, provided that the feedback is timely, accurate, and specific. End of semester finals do not meet any of those criteria.
I am very sympathetic to teachers who feel overwhelmed by the sheer quantity of standards and curriculum elements. The best response to that challenge, however, is not a failure to assess students, but rather a narrowing of the scope of the assessments. It is not necessary to address every element of the standards. We can save a great deal of time if we first identify "Power Standards" (see the book by Larry Ainsworth of that title), and then focus our assessment efforts on the most important standards.
I can see two sides of this. My very strong inclination is to agree with you, because I have seen some terrific writing assessments in music and art, and I would hate to have those efforts diluted. On the other hand, in many school systems the student:teacher ratio for music, art, and PE is MUCH higher than for other classes, and the use of common assessments can place an excessive burden on those teachers compared to their colleagues. Therefore, some team assessment might make sense -- study the Firebird Suite and artistic representations of it in music AND art, and submit written reflection for credit in both classes, with grading duties split between two teachers.
At the end of the day, we need to honor your desire to focus on your discipline and use your expertise to help students link literacy skills to your class, without drowning you in paperwork. There must be a reasonable compromise here somewhere -- perhaps one quarter doing discipline-specific assessments and the next quarter doing cross-disciplinary assessments.
First, I reject the premise that there is a formula for grading based on the use of the average. We make 5th graders learn that the arithmetic mean is not always the best representation of a data set; surely teachers and school leaders can learn the same lesson.
Second, I applaud the notion of making formative assessments "count" - but neither these assessments nor ANY single project, paper, assessment, or test should be allowed to so profoundly influence a student's grade. Each time we have the "killer project" we administer the academic death penalty, telling students that resilience doesn't matter, finishing strong doesn't count, and a single bad week or month can ruin an entire year. Rather than teaching resilience, we teach defeatism.
Therefore, let me offer some practical ways out of this dilemma.
I would concur with making the assessments "count" - but only as part of a menu of student projects, assignments, and assessments. If they blow the assessment, the consequence is not a failure, but rather the consequence is that students select something else from the menu. The same applies if they miss an assignment, blow a test, or fail to turn in a project (or, just as commonly, complete the project, but leave it at the bottom of a locker that resembles a toxic waste dump). Our responses should be neither sympathy nor judgment, but simply the rational, logical, encouraging, and firm response that students are responsible for their work, and when they miss important work, they don't fail, but rather they select something else from the menu and GET THE WORK DONE. This results in more work of higher quality, better grades, fewer failures, and appropriate respect for formative assessments without making formative assessments "make or break" tests.
Data
Whether you use SPSS or Microsoft Excel or just a hand-written data set, here are the keys to bear in mind:
First, be sure to make a “same student to same student” comparison. In other words, it is of some value to compare last year’s third grade students to this year’s third grade students. Perhaps the differences in scores will be due to the differences in curriculum. But perhaps the differences in scores will be due to the fact that they are different children. It is therefore far more persuasive to have pre- and post-test data that will show – for the very same group of children – the gains that they make with treatment A (the previous curriculum) and then the gains that they make with treatment B (the new curriculum). You would expect both data sets to show gains – after all, the kids were in school. But your hypothesis is that the gains would be of a different magnitude.
Second, if you are examining effects across many different classrooms, then it is imperative that we acknowledge that there is more going on here than the “presence or absence” of a new curriculum. In every curriculum or teaching reform, the fundamental issue is DEGREE OF IMPLEMENTATION. I have seen new reading curricula implemented, for example, with some classrooms devoting 90 minutes each day, others devoting 120 minutes each day, and others devoting 180 minutes each day to the “same” curriculum. In other cases, I have seen many schools claiming to have the same new curriculum, but the actual use and implementation varied widely. When researchers tracked the actual degree of use of the new curriculum, they found that low levels of implementation were worse than no change at all, and only the highest levels of implementation were effective. This will help you avoid the “brand name fallacy” in which vendors attempt to claim that the brand name of the curriculum is the salient variable in student achievement, when you and I know that teaching, leadership, and daily implementation are far more important.
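Both keys can be combined in one small analysis, sketched here in Python (the same steps work in Excel or SPSS). The student names, scores, and daily minutes are invented purely to show the shape of the calculation. They are not results from any study, and the pattern in them is there only because I built it in.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records for the SAME students under both curricula:
# (student, daily minutes of new curriculum, pre A, post A, pre B, post B)
records = [
    ("Ana", 180, 52, 58, 58, 71),
    ("Ben", 120, 60, 65, 65, 74),
    ("Cy",   90, 45, 53, 53, 60),
    ("Dee", 180, 70, 74, 74, 85),
    ("Eli",  90, 38, 45, 45, 51),
    ("Fay", 120, 55, 61, 61, 69),
]

# Group each student's EXTRA gain (gain under B minus gain under A)
# by the degree of implementation in that student's classroom.
extra_by_minutes = defaultdict(list)
for name, minutes, pre_a, post_a, pre_b, post_b in records:
    gain_a = post_a - pre_a  # growth under the previous curriculum
    gain_b = post_b - pre_b  # growth under the new curriculum
    extra_by_minutes[minutes].append(gain_b - gain_a)

summary = {m: mean(v) for m, v in sorted(extra_by_minutes.items())}
for minutes, extra in summary.items():
    print(f"{minutes} minutes/day: average extra gain of {extra} points")
```

Because each student serves as his or her own comparison, differences between this year's children and last year's drop out of the picture. Grouping by minutes per day keeps the degree of implementation visible instead of hiding it behind the name of the curriculum.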
Thanks very much for a very thoughtful note. I agree that MAP is not a great reflection of the success of your students. However, I would not exclude MAP from the equation. One common problem in gifted education is that teachers assume "they already know that" and fail to give students the grade-level instruction that they need in addition to enrichment opportunities. That's one reason that gifted kids drop out in high school - they have never experienced failure, and then all of a sudden a high school teacher expects them to have some basic skills that they were never taught because teachers in earlier grades assumed that they had map reading, number operations, and essay construction down pat.
The real question is, as you suggest, what additional measures beyond MAP can we use? First, I think we need to stick with same student to same student comparisons, not this year's group to last year's group. This implies, therefore, some sort of pre- and post-test information during the same academic year.
Second, we'll need a variety of instruments, including traditional ones. As David Perkins reminds us, students who are gifted in one area may not be gifted in others, and therefore the student with superior mathematical ability may nevertheless need traditional literacy tests, or vice versa.
Third, we'll need tests that are sensitive to exceptionally fast progress. That's why tests of a single grade level do not work with gifted students, because even if we give a 5th grade test to a 4th grade student, we have essentially created a "ceiling effect" of one year of progress above grade level. More nuanced tests, such as those that use Item Response Theory to give progressively more challenging questions each time a student answers correctly, are better suited for this task. But at the end of the day, you and your colleagues (and students) will complain of test fatigue if we just test the kids all the time.
Fourth, to address the test fatigue issue and also challenge students appropriately, I would favor having them create their own assessments. First, I'd have an honest talk about times when they felt that a test did not allow them to show all that they know. When a 5th grader can do algebra, scoring 100% on a multiplication and fractions test is not very satisfying. So let them think about different types of assessment - closed-end, open-end, performance, etc. - and have them construct an assessment each month that, if they were the teacher, would allow their students to demonstrate all that they have learned. The worst case is that some kids will sand-bag, using this as an opportunity to get out of work by dumbing down the task. I think that risk is minimal, particularly when other students are going to create some wonderfully creative and challenging tasks in a variety of different assessment formats.
Instruction
With regard to high school reading comprehension, I would be less inclined to use a scripted program than to use techniques that can be applied in many different high school subjects. Examples of activities that can be integrated into your current curriculum without the cost of purchasing a separate program include the following:
1) Summaries - In social studies and science classes, require students to write brief summaries of 1-2 page textbook passages. These summaries need not be formal essays. It is sufficient to have them write the “main idea” and then “three or more supporting details.” This is also an excellent way to get better value out of time devoted to SSR—Sustained Silent Reading— at the high school level.
2) Similarities and differences - this is particularly helpful when students are reading accounts of historical events or scientific observations.
3) Graphic organizers - to demonstrate student understanding of complex literary structures, including novels, plays, and poems, students can use graphic organizers (See Marzano & Pickering) to illustrate the patterns of events, settings, and characters.
4) Inferences - When students review a graph in a math, science, physical education, or social studies class, students should write a 3-4 sentence explanation that elaborates on the meaning of the graph and the relationship among the variables.
5) Compare and contrast - In art classes, students can compare and contrast different artistic styles or the characteristics of different artistic media.
6) Strategies and rules - In PE and technology classes, students can describe which strategies are most effective for various athletic and electronic games.
This is a start. The key is that when students do more writing, with editing and rewriting, their performance improves in every academic subject. A recent Education Week report indicated that 63% of high school students never complete a research paper requiring footnotes and bibliographies. This failure to challenge students to think, research, and write is directly associated with the failures of students in college and technical school.
For very recent information on the value of writing for ALL students, see Dr. Mel Levine's new book, The Myth of Laziness. I have also addressed this in my books, The Daily Disciplines of Leadership and The Leader's Guide to Standards.
Parents of teens may think that their days of influence are almost over. As the parent of a couple of teenagers, I know that I feel that way, too, on some days. Nevertheless, parents remain the most important influence in the lives of their children. By their daily lives and models, not just their words, parents are more important than peers, teachers, television, and culture.
When it comes to academic help for teens, parents can be particularly frustrated. “I can't help my kid write a term paper and I certainly can't help with chemistry or algebra!” That's probably true, and even if you could, you should not provide that sort of assistance. But there are two important things that parents can do. First, they can insist that their teens keep an assignment notebook and calendar, and parents can check this every day. Personal organization is one of the most important factors that distinguishes successful high school students. Second, parents can insist that their children engage in regular writing. Writing—particularly nonfiction writing with editing and rewriting—is strongly associated with improved student achievement in every other academic area, including math, science, and social studies. Here are three things parents can do to improve student writing:
1) Thank you notes. That's right—old-fashioned, handwritten thank you notes. Sending thank you notes is not just good manners, but will distinguish your children from the vast majority who never take the time to extend this simple courtesy.
2) Family history. The next time you visit grandparents or other older relatives, ask your teens to take some time to interview them, asking them about their parents and grandparents and the times they remember. The World War II generation is vanishing, and if you are lucky enough to have relatives who experienced that era, have your teens write about it. It may rekindle an interest in history and, at the very least, will renew relationships in your family. You probably have family members who remember the Civil Rights movement or the Vietnam era. Have your teens write about those memories. This will be more compelling than anything they have heard in class or read in a textbook.
3) Advice to new students. Ask your teens to write to a younger sibling or family friend of elementary school or middle school age. Give them advice on what is required for success in high school. Their definition of “success” should include not only academic success, but safety, social relationships, and other things that they wish they had known before they entered high school.
Finally, there are things that parents can do to help the entire school. If your school doesn't have a newspaper, help to start one. I volunteer in one of my children's schools every week and we publish a newspaper full of student writing, including news stories, sports news, editorials, cartoons, puzzles, advice columns, and anything else that interests the students. The students run the show, editing, writing, and distributing the newspaper. The newspaper staff includes learning-disabled students who have made important contributions and work with other colleagues on the newspaper staff to express their opinions and contribute to the paper in meaningful ways.
I get this question a lot and we've tried to gather a lot of research that is precisely on point. The three best sources are:
1) Darling-Hammond, Linda, The Right to Learn, Jossey-Bass, 1997— in the first 100 pages there are dozens of citations about this point precisely—linking authentic performance assessment, including writing, to higher multiple choice state test scores.
2) Reeves, D.B., "Standards are not enough" in the December 2000 NASSP Bulletin (lead article). This is the evidence that I've used in my keynotes, but some people need to see it in an academic journal. The issue editor for that one was Robert Marzano.
3) Marzano, Robert, and others. In Classroom Strategies that Work, almost all of the dependent variables were multiple choice test scores, yet none of the nine strategies were classical multiple choice test prep. Of those most related to our writing work, the note-taking, similarities and differences, and graphic organizers are all directly related to the writing processes that we advocate.
Of course, I always close the statement of evidence with this: Is the evidence perfect? Of course not - no research ever is, in education or, for that matter, in any other field. But consider this: if we are wrong, and students do a lot of extra literacy and writing and it turns out that they didn't really need to do so, what is the risk? Overly literate students? If we are right, and students do need this but we fail to provide it, what is the risk? Long-term deprivation of basic skills that has dramatic negative consequences in all disciplines. So I think the evidence is pretty good. But even if it were not so good, the risk of our being wrong is very low, and the risk of our being right (and people failing to heed the evidence) is extremely high.
Hope this helps. I'm happy to participate in conference calls with clients who want to discuss this more. I also encourage them to conduct their own case studies and action research to test these hypotheses. See if the students who do a great deal more literacy in science, for example, wind up with lower science scores. See if students who write summaries of each sub-chapter in a social studies book are irreparably damaged. The real issue is whether these strategies are more dangerous for student learning than the strategies that they are using now. They do not need perfect research to see that change is needed.
I agree that the culture of many communities requires letter grades, but at the very least we can make them accurate and effective, criteria which few report cards meet today (see the work of Guskey, Stiggins, O'Connor, and Marzano, as well as my own work, on this point). For example, one of the worst practices is the use of the arithmetic mean, or average, so simply translating "exemplary" into an A and "not meeting" into an F and then averaging the results pretty much defeats the purpose of effective grading practice. In one school, for example, there were six student assessments each quarter. An "assessment" might be a lab, paper, project, etc. Their system was:
4 "exemplary" and 2 "proficient" = A
4 "proficient" and 2 "progressing" = B
3 "proficient" = C
No D's allowed.
Anything less than C work resulted in the "IP" - "In Progress" - and students were given two weeks to achieve at least C level work; otherwise, they received a failing grade. Notice that the emphasis in this system is on quality of work, not speed of work. The consequence for turning in work that was below standard was not a zero or an F, but rather the requirement that the student respect teacher feedback and submit the work again. Conversely, the reward for getting work done well the first time might well be the completion of a class before the end of the semester. A growing number of schools are giving students incentives to "get it right the first time" by working hard, respecting teachers, and learning the material. The reward? No finals, early finishes to class, and what students crave -- freedom of choice.
First, I would not be bound by the "three" rule - I've had students revise work as many as nine times - they had to learn that I was serious about quality work and that I would not "give" them a D for work that was not satisfactory. In general, I won't accept work below "B" level work, which leaves me exposed to two complaints -- students work harder than in any other class, and I have grade inflation. My response is that when students are working harder for a higher grade, that's not grade inflation, but "performance inflation."
Neither retention nor social promotion is effective. The only reasonable response to poor student performance is IMMEDIATE intervention that is decisive and effective. This requires frequent assessment, RADICAL changes in schedule where appropriate (180 minutes of literacy per day is not unusual in schools that are making dramatic improvements), and appropriate use of free time (home rooms, study halls, academic advisories, and lunch), giving students freedom of choice on how to use that time only when they have earned it. "The price of freedom is proficiency."
Thanks very much for your very thoughtful question. First, I want to express my complete agreement with your staff. We ALL agree that responsibility and citizenship are good ideas. Kids SHOULD have appropriate consequences for failing to do homework, failing to complete projects, and doing badly on tests. The only question is how BEST to encourage responsibility and citizenship and what the appropriate consequences should be.
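For readers who want to try a rule like this on their own gradebook, here is a minimal Python sketch of the school's system. The text gives only the three example combinations, so the thresholds below are an assumed reading ("at least" rather than "exactly"), and the names LEVELS and quarter_grade are made up for illustration.

```python
# Rating levels from lowest to highest, as named in the text.
LEVELS = ["not meeting", "progressing", "proficient", "exemplary"]


def quarter_grade(ratings):
    """Return 'A', 'B', 'C', or 'IP' for one quarter's assessment ratings.

    Assumed reading of the rules (all thresholds are minimums):
      A  - at least 4 exemplary, and everything else proficient or better
      B  - at least 4 proficient or better, and everything else
           progressing or better
      C  - at least 3 proficient or better
      IP - anything less ("In Progress"; there is no D)
    """
    ranks = [LEVELS.index(r) for r in ratings]
    exemplary = sum(1 for r in ranks if r == 3)
    proficient_or_better = sum(1 for r in ranks if r >= 2)
    lowest = min(ranks)

    if exemplary >= 4 and lowest >= 2:
        return "A"
    if proficient_or_better >= 4 and lowest >= 1:
        return "B"
    if proficient_or_better >= 3:
        return "C"
    return "IP"


# The three combinations listed above, plus one that falls short of a C.
print(quarter_grade(["exemplary"] * 4 + ["proficient"] * 2))
print(quarter_grade(["proficient"] * 4 + ["progressing"] * 2))
print(quarter_grade(["proficient"] * 3 + ["not meeting"] * 3))
print(quarter_grade(["proficient"] * 2 + ["progressing"] * 4))
```

Notice that nothing in the rule is averaged. The grade depends on how many pieces of work reached each level, so a student who resubmits an "In Progress" piece and brings it up to proficient changes the count directly.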
This is very important - we are not starting with the perspective of "I'm right and the teachers are wrong." We're starting with the perspective that you, your staff, and I all love kids, care about them, and want them to grow up with a good sense of responsibility and citizenship.
Now that we are starting from common ground, let's ask some questions:
1) Are our present practices leading students to improve their rates of homework completion and classroom success? If so, then let's just check the data - what was the percentage of failures five years ago? Three years ago? Last year? If our strategies are effective, I would expect that the failures, particularly those due to the failure to complete homework, are declining significantly. But that's not, in fact, what I see around the country. The typical grading practices - zeroes for missing work, refusing to take late work, refusing to allow students to resubmit work, use of the average - are not providing improved performance. In fact, teachers complain to me all the time that students are not completing work, that they are disengaged and non-responsive. In other words, if our goal is improved citizenship and responsibility, what we are doing now apparently is not working very well.
2) What alternatives have we tried? In almost every school, I find wide variation in teacher grading practices. There are some teachers who, quietly and almost anonymously, have been experimenting with different practices. Before you consider anything I have to say, conduct a "treasure hunt" by analyzing those classes where failure rates have declined and achievement has improved. Look in different departments around the district where success is high - driver's education, music, computer programming? What do those areas have in common that we can learn from? One thing that I know is true in all three is that when you make a mistake, it doesn't lead to failure, but rather to listening to teacher feedback, respecting teacher feedback, improving performance, and ultimately passing the assessment.
3) What will be our criteria for decision? Can we at least agree that even if people are skeptical, we'll let the evidence be our guide? I've worked in very remote parts of Africa where people did not believe that vaccinations were effective. They didn't want to see my studies or hear a lecture on western medicine. But they were willing to look at children who lived or avoided horrible life-long disabilities because they didn't get polio (it's still rampant in parts of the developing world). The evidence, not my beliefs or their beliefs, ultimately allowed for more vaccinations. So in our schools, can we agree that even if we're not sure, we'll at least try some experiments, and then let the evidence decide? I think that teachers are smart - they care about kids and love them. But they are skeptical because they don't like to see another "hot idea" come and go. So, let's take our time, try it out, but let's also have the intellectual integrity to let the evidence, and not personal feelings, decide.
4) Can we agree on some fundamental boundaries? Even if we disagree on policy, can we agree on values such as fairness? Can we agree that grading practices should not be based upon subjective appraisals that can be influenced by gender, race, economic status, or parent activism? Can we agree that the central purpose of feedback, including grades, is the improvement of student achievement?
5) What's in it for the teachers? Can we agree that if we can improve policies in ways that reduce our failure rate, we would have happier, more engaged, and better behaved students? Can we agree that if we have fewer students repeating grades and courses, we'll have fewer angry and bored students?
6) What's in it for the school and community? Can we agree that if we have fewer students repeating math and English, that ultimately we'll have more opportunities for art, music, technology, service learning, and other things that both students and teachers find engaging and worthwhile?
Once we have settled these questions, let's try some experiments. I'm not saying I have all the answers, but perhaps different teachers would try different things. Some might just eliminate the zero. Some might stop the average. Others might try a "menu" system such as I use, where the consequence for missing work or blowing a test is selecting other items from the menu. Others might experiment with rewards for work that is on time or early rather than punishment for work that is late.
In other words, I'm not asking you to use MY system, but rather that you use your good judgment and the thoughtful good will of your colleagues to: a) admit that what we are doing now could be improved and b) experiment with different ideas that improve achievement and reduce failures and c) agree that the final school-wide decision will be based on evidence and not personal prejudices.
As a math teacher myself, I understand the quandary. We want students to know the math and also to understand the context of the problem. I used to tell my students, whether they were in elementary school or graduate school, “Mathematics is about describing the universe using numbers, symbols, and words, and we're going to use all three of those this year.” I tried to make it clear that this is about communication—not just about calculation—and it sounds as if your math teachers have similar feelings.
Now to the grading issue. I love your idea of distinguishing between academic content and “listening and following directions” but we would all acknowledge that on a test and in life, students must do BOTH—it's not an either/or proposition. Therefore, the report card is a great idea and the math teacher who insists that students label their problems has a good idea. I don't even mind if they take points off—BUT BUT BUT—those deductions should be in PENCIL, and the student should, prior to turning work in, conduct a self-assessment. In some cases, I have seen a checklist by the teacher's desk and in other cases (including all of my classes) there was a checklist on top of the test that the student had to complete. It included things such as “Name on paper” and could also include “Labels on each answer.” If a student still fails to label the answer correctly, the appropriate response is not a failing grade, but the requirement that the student do it again—that's why the deduction is in pencil. The lesson I want them to learn is not “I'm a failure in math” but rather “I'd better label my answers or it's a lot of extra work so I might as well do it right the first time.” When the student answers the problem correctly, the deduction can be removed. It's the BEHAVIOR of labeling the problem that I want to reinforce, and my experience suggests that in some cases, it takes more than one-trial learning for that to take place.
Pre- and post-tests should contain items that are parallel - that is, similar in format and difficulty - but not identical. In math, that's pretty straightforward - keep the item format nearly identical, but change the numbers. Writing is trickier, because as your colleague notes, student performance is a combination of writing and their knowledge of the prompt. One way to maximize the impact of writing (and minimize the impact of student knowledge of the prompt content) is to increase the number of prompts - perhaps giving students a choice of three prompts for both the pre- and post-test. When students have choice, they have better engagement. Moreover, when teachers are looking at essays with different prompts and content, they are more likely to focus exclusively on the matter at hand - writing - not the student's knowledge of content from the prompt. There are a number of resources on improved classroom assessment, and I would particularly consider the work of Tom Guskey, Jane Bailey, Bob Marzano (see especially his new book on Classroom Assessment and Grading That Work), and Rick Stiggins. One of my books, Making Standards Work (3rd edition) might be useful to you as well, particularly on the issue of creating assessments that are engaging for students. I'm also publishing a new book on assessment this fall that will include chapters by Marzano, Stiggins, Guskey, and several other leaders in the field.
I do want to caution that when we try to make classroom assessments too perfect, we make some unfortunate trade-offs. For example, you get higher statistical reliability with more items, but I've seen students and teachers get crushed by 80-item 2-hour tests because a professor told them they need to be "reliable and valid." But a psychometrically perfect assessment that alienates kids and teachers and is so long that the results are not delivered to the students in a timely way is simply a waste of time. Better to do mini-assessments - 12 or 15 items - that may not have the statistical perfection of 80 items, but where students get same-day or next-day feedback, and teachers use the data in real time to improve instruction. You might get more "perfect" assessments by purchasing them from test companies, but then you're simply paying for the professional development of the employees of test companies instead of using those resources to help teachers design and evaluate good assessments. When teachers create the assessments collaboratively and score them together in a fair and consistent manner, they not only gain the value of accurate and timely feedback for students, but they also are much more likely to have alignment between instruction, curriculum, standards, and assessment. Teachers will also be emotionally and professionally engaged in assessment rather than having assessment stacked on top of an already overflowing plate.
Thanks very much for your thoughtful and heart-felt message. We both agree on the need to instill discipline and work ethic in students. I also know that we both are seeking the most effective way to do this. I'm not an ivory-tower theoretician. I'm a teacher and parent genuinely seeking the best way to motivate and engage students to achieve higher levels of performance and personal responsibility. I wish that my 8th grade son, the youngest of four, had teachers as thoughtful and committed as you.
So, let's see if we can find some common ground. For example, we both agree that we should not "give credit for no work" - so how about the use of the zero on a four-point scale rather than on the 100-point scale? This would convey the same message about the penalty for no work that we both agree is important, while not inflating the value of the D - truly awful work - in comparison to no work.
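To see the arithmetic behind this suggestion, here is a short Python sketch. It is only an illustration: the 90/80/70/60 letter cutoffs, the sample scores, and the function names are all assumptions, not anyone's actual grading policy.

```python
def mean(values):
    """Plain arithmetic mean."""
    return sum(values) / len(values)


def letter_from_percent(score):
    """Letter grade on a 100-point scale, assuming 90/80/70/60 cutoffs."""
    for cutoff, letter in [(90, "A"), (80, "B"), (70, "C"), (60, "D")]:
        if score >= cutoff:
            return letter
    return "F"


def letter_from_points(points):
    """Letter grade on a four-point scale (4=A ... 0=F), nearest grade."""
    for cutoff, letter in [(3.5, "A"), (2.5, "B"), (1.5, "C"), (0.5, "D")]:
        if points >= cutoff:
            return letter
    return "F"


# Hypothetical student: three A-level assignments and one missing assignment.
percent_scores = [95, 95, 95, 0]   # zero recorded on the 100-point scale
point_scores = [4, 4, 4, 0]        # zero recorded on the four-point scale

print(mean(percent_scores), letter_from_percent(mean(percent_scores)))
print(mean(point_scores), letter_from_points(mean(point_scores)))

# Size of the step from a D down to missing work, compared with the step
# between neighboring letter grades, under the assumed cutoffs.
print("100-point scale: D to zero =", 60 - 0, "points; B to A =", 90 - 80)
print("Four-point scale: D to zero =", 1 - 0, "point; B to A =", 4 - 3)
```

Under these assumptions, the same missing assignment pulls three A's down to a C average on the 100-point scale but to a B average on the four-point scale. The zero still costs the student on either scale; on the four-point scale it simply costs one grade step instead of six.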
How about a menu system in which the penalty for not submitting work on time is simply more work? When they fail to submit work, the "punishment" is not a zero, but actually doing work - and not copying from their peers who submitted work on time, but selecting different work from the assignment menu.
How about a reward system for submitting work on time? In my classes, I used to provide 50 points for the first draft and 50 points for the final draft. But in the unlikely event that someone submitted a pluperfect essay on the first submission, I wrote "100 points - no rewrite necessary." Word quickly spread that the way to "get over" in my class was to do it right the first time.
I'm genuinely seeking solutions here, and I do not have the conceit that I have solved this very vexing dilemma. Thanks for taking the time to engage this very challenging issue.
Leadership
First, congratulations on your terrific initiative! You clearly have the right components in place. You are looking at student work and you are committed to collaboration. Moreover, you have created a safe way for teachers to share student work because you are making the submissions anonymous so that neither the identity of the student nor the teacher is disclosed.
You asked for some ways of adding structure to this process, so let me share what I do in these circumstances.
First, I hand out a single piece of student work - remember that all of the samples started in life as “proficient” - and ask each teacher to evaluate it ALONE, using the scoring rubric that was used for the assessment. I then track the percentage of faculty members who rated that piece of work Exemplary, Proficient, Progressing, or Not Meeting Standards. I note the percentage of agreement - typically fairly low.
Second, I give teachers the opportunity to collaboratively score the assessment in groups of two or three, and make any revisions they wish. I note the percentage of agreement again—typically a little bit better.
Third, we have a group meeting to revise the scoring rubric. The theme of these meetings is “the enemy is not each other; the enemy is ambiguity.” Where there is a disagreement it is not the fault of the teacher, but the fault of an ambiguous scoring rubric. We use a collaborative process to revise and improve the scoring rubric, making it more specific. Then we score the same piece of work AGAIN. The third time, the level of agreement is always higher than the first two times.
Finally, we look at this process from the student's point of view. What would they think of the first level of scoring? Pretty unfair, right? As we worked together, collaborated, and refined the rubric, our fairness improved. This process is not about some state mandate, but about our fundamental shared value of FAIRNESS.
In most schools, this takes ALL YEAR LONG - it's not a one-shot staff development meeting. In some districts, they open EVERY faculty meeting, including cabinet meetings at the central office, with collaborative scoring of student work. They learn that the faster they reach 80% consensus, the faster they move on to other agenda items.
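If you want to track the percentage of agreement from round to round, a tally like the following Python sketch is enough. It assumes that "agreement" means the share of teachers who chose the most common rating, which is one reasonable reading rather than a definition from the process above. The ratings for the ten teachers are made-up numbers for illustration, and the function names are hypothetical.

```python
from collections import Counter


def agreement(ratings):
    """Return (most common rating, share of raters who chose it)."""
    rating, count = Counter(ratings).most_common(1)[0]
    return rating, count / len(ratings)


def reached_consensus(ratings, threshold=0.80):
    """True when the most common rating meets the consensus threshold."""
    return agreement(ratings)[1] >= threshold


# Made-up ratings from ten teachers scoring the same piece of work.
round_1 = (["Proficient"] * 4 + ["Progressing"] * 3
           + ["Exemplary"] * 2 + ["Not Meeting Standards"])   # scored alone
round_2 = ["Proficient"] * 6 + ["Progressing"] * 3 + ["Exemplary"]  # in small groups
round_3 = ["Proficient"] * 9 + ["Progressing"]                # after revising the rubric

for label, ratings in [("Round 1", round_1), ("Round 2", round_2), ("Round 3", round_3)]:
    rating, share = agreement(ratings)
    print(label, rating, f"{share:.0%}", "consensus" if reached_consensus(ratings) else "keep working")
```

Keeping the three percentages side by side gives the faculty a simple record of how much the rubric revision improved consistency.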
School System Improvement
The original schools were in Milwaukee—high poverty, high minority, high second language. Since then, the 90-90-90 techniques have been replicated all over the country with multiple language and ethnic groups. The school in California that had the highest achievement gains for two consecutive years, Mead Valley Elementary, was 100% poverty and 99% Latino and second language, and used 90-90-90 techniques. It's also been replicated in deep south rural schools, Native American schools, and many other schools with diverse ethnic and linguistic backgrounds.
Standards
Thanks very much for your inquiry. We have provided some examples of Power Standards, but they are only for illustration purposes. There is no substitute for your OWN faculty members going through your standards and asking:
1) What endures?
2) What has leverage?
3) What is really necessary for success at the next grade level?
Then, of course, the job is not done. The next step is to agree on common assessments that all faculty members agree to use so that the focus on these Power Standards has meaning and consistency.
New York tried to do a report card including every standard, and it took them 12 pages for 4th graders. That has, thankfully, been abandoned. It just alienates parents and teachers alike. What is quite reasonable, however, is to have the report card reflect seven to ten standards that are absolutely vital for future success. In that way, even if a student gets (through the alchemy of the grading process) a C or a B, the parents will still know if the student is not succeeding in critical areas.
You can find some examples of language arts standards in the book Reason to Write (Simon & Schuster, 2002).
You asked for the 6th grade math Power Standards. They are:
1) Number operations with and without a calculator.
2) Tables, charts, graphs—create them given a data set, and draw inferences from them once they are completed.
3) Fraction and decimal operations—again with and without a calculator.
4) Measurement in English and metric units.
5) Given a story problem, convert it to an accurate picture.
6) Properties of a triangle and rectangle.
7) Two-dimensional scale drawing.
I'm fully aware that there are many other things in the 6th grade curriculum, but every 7th and 8th grade teacher I've ever interviewed has said they would be happier if students could do these seven things rather than be “checked off” on 80 other things that are in the standards.




