Writing assessment refers to an area of study that contains theories and practices that guide the evaluation of a writer's performance or potential through a writing task. Writing assessment can be considered a combination of scholarship from composition studies and measurement theory within educational assessment. Writing assessment can also refer to the technologies and practices used to evaluate student writing and learning.
Writing assessment began as a classroom practice during the first two decades of the 20th century, though high-stakes and standardized tests also emerged during this time. During the 1930s, the College Board shifted from direct to indirect writing assessment because indirect tests were more cost-effective and were believed to be more reliable. Starting in the 1950s, more students from diverse backgrounds were attending colleges and universities, so administrators used standardized testing to decide where to place these students, what and how to teach them, and how to measure whether they had learned what they needed to learn. The large-scale statewide writing assessments developed during this period combined direct writing assessment with multiple-choice items, a practice that remains dominant today in U.S. large-scale testing programs such as the SAT and GRE. These assessments usually take place outside the classroom, at the state and national levels. However, as more and more students were placed into courses based on their standardized test scores, writing teachers began to notice a conflict between what students were being tested on (grammar, usage, and vocabulary) and what teachers were actually teaching (writing process and revision). Because of this divide, educators began pushing for writing assessments designed and implemented at the local, programmatic, and classroom levels. As writing teachers began designing local assessments, assessment methods diversified to include timed essay tests, locally designed rubrics, and portfolios. Beyond the classroom and programmatic levels, writing assessment also strongly influences writing centers (through writing center assessment) and similar academic support centers.
Because writing assessment is used in multiple contexts, the history of writing assessment can be traced through examining specific concepts and situations that prompt major shifts in theories and practices. Writing assessment scholars do not always agree about the origin of writing assessment.
In "Looking Back as We Look Forward: Historicizing Writing Assessment as a Rhetorical Act," Kathleen Blake Yancey offers a history of writing assessment by tracing three major shifts in methods used in assessing writing. She describes the three major shifts through the metaphor of overlapping waves: "with one wave feeding into another but without completely displacing waves that came before". In other words, the theories and practices from each wave are still present in some current contexts, but each wave marks the prominent theories and practices of the time.
The first wave of writing assessment (1950-1970) sought objective tests with indirect measures of assessment. The second wave (1970-1986) focused on holistically scored tests where the students' actual writing began to be assessed. And the third wave (since 1986) shifted toward assessing a collection of student work (i.e. portfolio assessment) and programmatic assessment.
Bob Broad, in What We Really Value, points to the 1961 publication of Factors in Judgments of Writing Ability by Diederich, French, and Carlton as the birth of modern writing assessment. Diederich, French, and Carlton based much of their book on research conducted through the Educational Testing Service (ETS) during the previous decade. The book was an attempt to standardize the assessment of writing and, according to Broad, created a base of research in writing assessment.
In 1897, the American researcher J. M. Rice argued on the basis of his research that subjective and essay-type tests are not reliable, and objective-type tests emerged as a result.
Validity and reliability
Yancey traces the major shifts in writing assessment by pointing toward each wave's swing toward or away from the concepts of validity and reliability. Peggy O'Neill, Cindy Moore, and Brian Huot explain in A Guide to College Writing Assessment that reliability and validity are the most important terms in discussing best practices in writing assessment.
In the first wave of writing assessment, the emphasis was on reliability, which concerns the consistency of a test. In this wave, the central concern was to assess writing with the greatest predictability at the least cost and effort.
The shift to the second wave marked a move toward principles of validity. Validity concerns whether a test is appropriate and effective for its given purpose. Methods in this wave were more concerned with a test's construct validity: whether the material a test elicits is an appropriate measure of what the test purports to measure. Teachers began to see an incongruity between the material tests elicited to measure writing and the writing teachers were asking students to produce. Holistic scoring, championed by Edward M. White, emerged in this wave. It is a method of assessment in which students' actual writing is elicited and scored to measure their writing ability.
The third wave of writing assessment emerged with continued interest in the validity of assessment methods. This wave began to consider an expanded definition of validity that includes how portfolio assessment contributes to learning and teaching. In this wave, portfolio assessment emerged to emphasize theories and practices in Composition and Writing Studies such as revision, drafting, and process.
Direct and indirect assessment
Indirect writing assessments typically consist of multiple choice tests on grammar, usage, and vocabulary. Examples include high-stakes standardized tests such as the ACT, SAT, and GRE, which are most often used by colleges and universities for admissions purposes. Other indirect assessments, such as Compass and Accuplacer, are used to place students into remedial or mainstream writing courses. Direct writing assessments, like the timed essay test, require at least one sample of student writing and are viewed by many writing assessment scholars as more valid than indirect tests because they are assessing actual samples of writing. Portfolio assessment, which generally consists of several pieces of student writing written over the course of a semester, began to replace timed essays during the late 1980s and early 1990s. Portfolio assessment is viewed as being even more valid than timed essay tests because it focuses on multiple samples of student writing that have been composed in the authentic context of the classroom. Portfolios enable assessors to examine multiple samples of student writing and multiple drafts of a single essay.
Methods of writing assessment vary depending on the context and type of assessment. The following is an incomplete list of writing assessments frequently administered:
Portfolio assessment is typically used to assess what students have learned at the end of a course or over a period of several years. Course portfolios consist of multiple samples of student writing and a reflective letter or essay in which students describe their writing and work for the course. "Showcase portfolios" contain final drafts of student writing, and "process portfolios" contain multiple drafts of each piece of writing. Both print and electronic portfolios can be either showcase or process portfolios, though electronic portfolios typically contain hyperlinks from the reflective essay or letter to samples of student work and, sometimes, outside sources.
Timed essay tests were developed as an alternative to multiple-choice, indirect writing assessments. Timed essay tests are often used to place students into writing courses appropriate for their skill level. These tests are usually proctored, meaning that testing takes place in a specific location where students are given a prompt and must write a response within a set time limit. The SAT and GRE both contain timed essay portions.
A rubric is a tool used in writing assessment that can be used in several writing contexts. A rubric consists of a set of criteria or descriptions that guides a rater in scoring or grading a writer. The origins of rubrics can be traced to early 20th-century attempts in education to standardize and scale writing. In November 1912, Ernest C. Noyes argued for a shift toward more science-based assessment practices. One of the original scales used in education was developed by Milo B. Hillegas in A Scale for the Measurement of Quality in English Composition by Young People. This scale is commonly referred to as the Hillegas Scale. The Hillegas Scale and other scales used in education were used by administrators to compare the progress of schools.
In 1961, Diederich, French, and Carlton of the Educational Testing Service (ETS) published Factors in Judgments of Writing Ability, which presented a rubric compiled from the comments of a series of raters; these comments were categorized and condensed into a five-factor rubric:
- Ideas: relevance, clarity, quantity, development, persuasiveness
- Form: organization and analysis
- Flavor: style, interest, sincerity
- Mechanics: specific errors in punctuation, grammar, etc.
- Wording: choice and arrangement of words
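The five factors can be represented as a simple scoring structure. The Python sketch below assumes a 1-5 rating scale and equal weighting of the factors; both are illustrative assumptions rather than details taken from the 1961 study, and the function name `rubric_total` is hypothetical.

```python
# Factor names from the five-factor rubric described above.
FACTORS = ("Ideas", "Form", "Flavor", "Mechanics", "Wording")


def rubric_total(ratings: dict[str, int], low: int = 1, high: int = 5) -> int:
    """Sum a rater's factor ratings (assumed 1-5 scale, equal weights)."""
    missing = set(FACTORS) - set(ratings)
    unknown = set(ratings) - set(FACTORS)
    if missing or unknown:
        raise ValueError(f"missing={sorted(missing)}, unknown={sorted(unknown)}")
    for factor, value in ratings.items():
        if not low <= value <= high:
            raise ValueError(f"{factor} rating {value} is outside {low}-{high}")
    return sum(ratings.values())


essay = {"Ideas": 4, "Form": 3, "Flavor": 4, "Mechanics": 5, "Wording": 3}
print(rubric_total(essay))  # 19
```

Keeping the factors explicit makes it easy to report per-factor feedback alongside a total. A locally negotiated rubric could substitute its own factor names, scale, or weights.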
As rubrics began to be used in the classroom, teachers began to advocate negotiating criteria with students, so that students would have a stake in how they are assessed. Scholars such as Chris Gallagher and Eric Turley, Bob Broad, and Asao Inoue, among many others, have argued that effective use of rubrics comes from local, contextual, and negotiated criteria.
Multiple-choice tests contain questions about usage, grammar, and vocabulary. Standardized tests like the SAT, ACT, and GRE are typically used for college or graduate school admission. Other tests, such as Compass and Accuplacer, are typically used to place students into remedial or mainstream writing courses.
Automated essay scoring
Automated essay scoring (AES) is the use of non-human, computer-assisted assessment practices to rate, score, or grade writing tasks.
Some scholars in writing assessment focus their research on the influence of race on performance in writing assessments. Scholarship on race and writing assessment studies how categories of race and perceptions of race continue to shape writing assessment outcomes. Scholars in writing assessment recognize that racism in the 21st century is often no longer explicit, but they point out that writing assessment practices can still be silently racist. In "Challenging the Frameworks of Color-Blind Racism: Why We Need a Fourth Wave of Writing Assessment Scholarship," Nicholas Behm and Keith D. Miller advocate recognizing another wave beyond the three that Yancey describes: a wave in which the intersections of race and writing assessment are brought to the forefront of assessment practices. As the authors explain, racial inequalities in writing assessment are typically justified with non-racial reasons.
- Behizadeh, Nadia, and George Engelhard Jr. "Historical View of the Influences of Measurement and Writing Theories on the Practice of Writing Assessment in the United States." Assessing Writing 16 (2011): 189-211.
- Huot, Brian, and Michael Neal. "Writing Assessment: A Techno-History." Handbook of Writing Research. Ed. C. A. MacArthur, S. Graham, and J. Fitzgerald. New York: Guilford Press, 2006. 417-432.
- Yancey, Kathleen Blake. "Looking Back as We Look Forward: Historicizing Writing Assessment as a Rhetorical Act." College Composition and Communication 50.3 (1999): 483-503.
- Huot, Brian. (Re)Articulating Writing Assessment for Teaching and Learning. Logan, UT: Utah State University Press, 2002.
- Bell, James H. "When Hard Questions Are Asked: Evaluating Writing Centers." The Writing Center Journal 21.1 (2001): 7-28.
- Broad, Bob. What We Really Value: Beyond Rubrics in Teaching and Assessing Writing. Logan, UT: Utah State University Press, 2003.
- Diederich, P. G., J. W. French, and S. T. Carlton. Factors in Judgments of Writing Ability. Princeton, NJ: Educational Testing Service, 1961.
- O'Neill, Peggy, Cindy Moore, and Brian Huot. A Guide to College Writing Assessment. Logan, UT: Utah State University Press, 2009.
- White, Edward M. "Holisticism." College Composition and Communication 35 (December 1984): 400-409.
- Emmons, Kimberly. "Rethinking Genres of Reflection: Student Portfolio Cover Letters and the Narrative of Progress." Composition Studies 31.1 (2003): 43-62.
- Neal, Michael. Writing Assessment and the Revolution in Digital Texts and Technologies. New York: Teachers College Press, 2011.
- White, Edward. "The Scoring of Writing Portfolios: Phase 2." College Composition and Communication 56.4 (2005): 581-599.
- Yancey, Kathleen. "Postmodernism, Palimpsest, and Portfolios: Theoretical Issues in the Representation of Student Work." ePortfolio Performance Support Systems: Constructing, Presenting, and Assessing Portfolios. Ed. Katherine V. Wills and Rich Rice. Fort Collins, CO: WAC Clearinghouse.
- Turley, Eric D., and Chris Gallagher. "On the 'Uses' of Rubrics: Reframing the Great Rubric Debate." The English Journal 97.4 (Mar. 2008): 87-92.
- Inoue, Asao B. "Community-Based Assessment Pedagogy." Assessing Writing 9 (2005): 208-238.
- Bonilla-Silva, Eduardo. Racism Without Racists: Color-Blind Racism and the Persistence of Racial Inequality in the United States. Lanham, MD: Rowman & Littlefield, 2006.
- Behm, Nicholas, and Keith D. Miller. "Challenging the Frameworks of Color-Blind Racism: Why We Need a Fourth Wave of Writing Assessment Scholarship." Race and Writing Assessment. Ed. Asao B. Inoue and Mya Poe. New York: Peter Lang, 2012. 127-138.
Educational assessment
Educational assessment is the systematic process of documenting and using empirical data on knowledge, skills, attitudes, and beliefs to refine programs and improve student learning. Assessment data can be obtained by directly examining student work to assess the achievement of learning outcomes, or from data from which one can make inferences about learning. The term assessment is often used interchangeably with test, but assessment is not limited to tests. Assessment can focus on the individual learner, the learning community (a class, workshop, or other organized group of learners), a course, an academic program, the institution, or the educational system as a whole; this choice of level is also known as granularity. The word "assessment" came into use in an educational context after the Second World War.
As a continuous process, assessment involves establishing clear, measurable student learning outcomes; providing sufficient learning opportunities to achieve these outcomes; implementing a systematic way of gathering, analyzing, and interpreting evidence to determine how well student learning matches expectations; and using the collected information to inform improvement in student learning.
The final purpose of assessment practices in education depends on the theoretical framework of the practitioners and researchers, and on their assumptions and beliefs about the nature of the human mind, the origin of knowledge, and the process of learning.
The term assessment is generally used to refer to all activities teachers use to help students learn and to gauge student progress. For convenience, assessment can be divided according to the following categorizations:
- Initial, formative, summative and diagnostic assessment
- Objective and subjective
- Referencing (criterion-referenced, norm-referenced, and ipsative)
- Informal and formal
- Internal and external
Placement, formative, summative and diagnostic
Assessment is often divided into initial, formative, and summative categories for the purpose of considering different objectives for assessment practices.
- Placement assessment – Placement evaluation is used to place students, according to prior achievement or personal characteristics, at the most appropriate point in an instructional sequence, in a unique instructional strategy, or with a suitable teacher. It is conducted through placement testing, i.e. the tests that colleges and universities use to assess college readiness and place students into their initial classes. Placement evaluation, also referred to as pre-assessment or initial assessment, is conducted prior to instruction or intervention to establish a baseline from which individual student growth can be measured. This type of assessment shows the teacher the student's skill level in the subject, which helps the teacher explain the material more efficiently. These assessments are not graded.
- Formative assessment – Formative assessment is generally carried out throughout a course or project. Formative assessment, also referred to as "educative assessment," is used to aid learning. In an educational setting, formative assessment might involve a teacher, a peer, or the learner providing feedback on a student's work, and would not necessarily be used for grading purposes. Formative assessments can take the form of diagnostic or standardized tests, quizzes, oral questions, or draft work. Formative assessments are carried out concurrently with instruction, and their results may count toward a grade. They aim to check whether students understand the instruction before a summative assessment is given.
- Summative assessment – Summative assessment is generally carried out at the end of a course or project. In an educational setting, summative assessments are typically used to assign students a course grade. Summative assessments are evaluative. Summative assessments are made to summarize what the students have learned, to determine whether they understand the subject matter well. This type of assessment is typically graded (e.g. pass/fail, 0-100) and can take the form of tests, exams or projects. Summative assessments are often used to determine whether a student has passed or failed a class. A criticism of summative assessments is that they are reductive, and learners discover how well they have acquired knowledge too late for it to be of use.
- Diagnostic assessment – Diagnostic assessment addresses, at the end of a learning period, the full range of difficulties that occurred during the learning process.
Jay McTighe and Ken O'Connor proposed seven practices for effective learning. One is showing students the evaluation criteria before the test. Another is using pre-assessment to determine a student's skill level before giving instruction. Others include providing plenty of feedback and encouragement.
Educational researcher Robert Stake explains the difference between formative and summative assessment with the following analogy:
When the cook tastes the soup, that's formative. When the guests taste the soup, that's summative.
Summative and formative assessment are often referred to in a learning context as assessment of learning and assessment for learning respectively. Assessment of learning is generally summative in nature and intended to measure learning outcomes and report those outcomes to students, parents and administrators. Assessment of learning generally occurs at the conclusion of a class, course, semester or academic year. Assessment for learning is generally formative in nature and is used by teachers to consider approaches to teaching and next steps for individual learners and the class.
A common form of formative assessment is diagnostic assessment. Diagnostic assessment measures a student's current knowledge and skills for the purpose of identifying a suitable program of learning. Self-assessment is a form of diagnostic assessment which involves students assessing themselves. Forward-looking assessment asks those being assessed to consider themselves in hypothetical future situations.
Performance-based assessment is similar to summative assessment in that it focuses on achievement. It is often aligned with the standards-based education reform and outcomes-based education movements. Although performance-based assessments ideally differ significantly from traditional multiple-choice tests, they are most commonly associated with standards-based assessments, which use free-form responses to standard questions. Human scorers rate these responses on a standards-based scale as meeting, falling below, or exceeding a performance standard, rather than ranking them on a curve. A well-defined task is identified, and students are asked to create, produce, or do something, often in settings that involve real-world application of knowledge and skills. Proficiency is demonstrated by providing an extended response. Performance formats are further differentiated into products and performances: the performance may result in a product, such as a painting, portfolio, paper, or exhibition, or it may consist of a performance, such as a speech, athletic skill, musical recital, or reading.
Objective and subjective
Assessment (either summative or formative) is often categorized as either objective or subjective. Objective assessment is a form of questioning which has a single correct answer. Subjective assessment is a form of questioning which may have more than one correct answer (or more than one way of expressing the correct answer). There are various types of objective and subjective questions. Objective question types include true/false answers, multiple choice, multiple-response and matching questions. Subjective questions include extended-response questions and essays. Objective assessment is well suited to the increasingly popular computerized or online assessment format.
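Because each objective item has a single keyed answer, scoring reduces to comparing responses with an answer key. This is why objective formats suit computerized delivery. The Python sketch below uses hypothetical item IDs, answers, and a hypothetical `score_objective` function. Essays and other subjective responses cannot be scored this way.

```python
# Hypothetical answer key covering the objective item types named above.
ANSWER_KEY = {
    "q1": True,                      # true/false
    "q2": "c",                       # multiple choice
    "q3": frozenset({"a", "d"}),     # multiple response: exactly these options
    "q4": {"A": 2, "B": 1, "C": 3},  # matching: left item -> right item
}


def score_objective(responses: dict, key: dict) -> int:
    """Count items whose response exactly equals the keyed answer."""
    return sum(1 for item, correct in key.items() if responses.get(item) == correct)


# A set compares equal to a frozenset with the same elements, so order is irrelevant.
student = {"q1": True, "q2": "b", "q3": {"d", "a"}, "q4": {"A": 2, "B": 1, "C": 3}}
print(score_objective(student, ANSWER_KEY))  # 3
```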
Some have argued that the distinction between objective and subjective assessments is neither useful nor accurate because, in reality, there is no such thing as "objective" assessment. On this view, all assessments are created with inherent biases built into decisions about relevant subject matter and content, as well as cultural (class, ethnic, and gender) biases.
Basis of comparison
Test results can be compared against an established criterion, or against the performance of other students, or against previous performance:
- Criterion-referenced assessment, typically using a criterion-referenced test, as the name implies, occurs when candidates are measured against defined (and objective) criteria. Criterion-referenced assessment is often, but not always, used to establish a person's competence (whether they can do something). The best-known example of criterion-referenced assessment is the driving test, in which learner drivers are measured against a range of explicit criteria (such as "Not endangering other road users").
- Norm-referenced assessment (colloquially known as "grading on the curve"), typically using a norm-referenced test, is not measured against defined criteria. This type of assessment is relative to the student body undertaking the assessment. It is effectively a way of comparing students. The IQ test is the best-known example of norm-referenced assessment. Many entrance tests (to prestigious schools or universities) are norm-referenced, permitting a fixed proportion of students to pass ("passing" in this context means being accepted into the school or university rather than reaching an explicit level of ability). This means that standards may vary from year to year, depending on the quality of the cohort; criterion-referenced assessment does not vary from year to year (unless the criteria change).
- Ipsative assessment compares a student with themselves, either in the same domain over time or across different domains for the same student.
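The three bases of comparison can be contrasted on the same raw score. In this Python sketch, the cut score, the two cohorts, the student's earlier score, and the function names are invented for illustration. The percentile rank uses one common definition: the share of the cohort scoring strictly below the student.

```python
def criterion_referenced(score: float, cut_score: float) -> str:
    """Compare the score with a fixed, predefined standard."""
    return "meets standard" if score >= cut_score else "below standard"


def norm_referenced(score: float, cohort: list[float]) -> float:
    """Percentile rank: percentage of the cohort scoring strictly below `score`."""
    below = sum(1 for s in cohort if s < score)
    return 100 * below / len(cohort)


def ipsative(current: float, previous: float) -> float:
    """Compare the student with their own earlier performance."""
    return current - previous


score = 80
cohort_year1 = [55, 60, 70, 80, 90]  # hypothetical weaker cohort
cohort_year2 = [75, 80, 85, 90, 95]  # hypothetical stronger cohort

print(criterion_referenced(score, cut_score=70))  # meets standard (in either year)
print(norm_referenced(score, cohort_year1))       # 60.0
print(norm_referenced(score, cohort_year2))       # 20.0
print(ipsative(score, previous=72))               # 8
```

The criterion-referenced result is identical in both years. The norm-referenced result depends on who else sat the test, which is the cohort effect noted in the norm-referenced item.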
Informal and formal
Assessment can be either formal or informal. Formal assessment usually implies a written document, such as a test, quiz, or paper. A formal assessment is given a numerical score or grade based on student performance, whereas an informal assessment does not contribute to a student's final grade. An informal assessment usually occurs in a more casual manner and may include observation, inventories, checklists, rating scales, rubrics, performance and portfolio assessments, participation, peer and self-evaluation, and discussion.
Internal and external
Internal assessment is set and marked by the school (i.e. teachers). Students receive the mark and feedback on the assessment. External assessment is set by the governing body and marked by impartial personnel. Some external assessments give much more limited feedback in their marking. However, tests such as Australia's NAPLAN give detailed feedback on each criterion addressed by students, so that teachers can review and compare students' learning achievements and plan for the future.
Standards of quality
In general, high-quality assessments are considered those with a high level of reliability and validity. Approaches to reliability and validity vary, however.
Reliability relates to the consistency of an assessment. A reliable assessment is one that consistently achieves the same results with the same (or a similar) cohort of students. Various factors affect reliability, including ambiguous questions, too many options within a question paper, vague marking instructions, and poorly trained markers. Traditionally, the reliability of an assessment is based on the following:
- Temporal stability: Performance on a test is comparable on two or more separate occasions.
- Form equivalence: Performance among examinees is equivalent on different forms of a test based on the same content.
- Internal consistency: Responses on a test are consistent across questions. For example, in a survey that asks respondents to rate attitudes toward technology, consistency would be expected in responses to the following two questions. Because the questions are worded in opposite directions, a respondent who agrees with one would be expected to disagree with the other:
  - "I feel very negative about computers in general."
  - "I enjoy using computers."
The reliability of a measurement $x$ can also be defined quantitatively as $R_x = \frac{V_T}{V_X}$, where $R_x$ is the reliability of the observed (test) score $x$, and $V_T$ and $V_X$ are the variability in "true" (i.e., the candidate's innate performance) and measured test scores, respectively. $R_x$ can range from 0 (completely unreliable) to 1 (completely reliable).
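The reliability ratio (true-score variance divided by observed-score variance) can be illustrated with a small simulation. The simulation assumes, as in classical test theory, that each observed score is a true score plus independent random error. The normal distributions, the mean of 70, the standard deviations, and the function name are assumptions chosen for illustration. With equal true-score and error variance, the ratio should come out near 0.5.

```python
import random
from statistics import pvariance


def simulated_reliability(n: int, true_sd: float, error_sd: float, seed: int = 0) -> float:
    """Return var(true) / var(observed) for simulated observed = true + error."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(70, true_sd) for _ in range(n)]
    observed = [t + rng.gauss(0, error_sd) for t in true_scores]
    return pvariance(true_scores) / pvariance(observed)


print(simulated_reliability(20_000, true_sd=10, error_sd=10))  # roughly 0.5
print(simulated_reliability(20_000, true_sd=10, error_sd=0))   # 1.0: no measurement error
print(simulated_reliability(20_000, true_sd=10, error_sd=30))  # roughly 0.1
```

As the error variance grows relative to the true-score variance, the ratio falls toward 0.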
Valid assessment is one that measures what it is intended to measure. For example, it would not be valid to assess driving skills through a written test alone. A more valid way of assessing driving skills would be through a combination of tests that help determine what a driver knows, such as through a written test of driving knowledge, and what a driver is able to do, such as through a performance assessment of actual driving. Teachers frequently complain that some examinations do not properly assess the syllabus upon which the examination is based; they are, effectively, questioning the validity of the exam.
Validity of an assessment is generally gauged through examination of evidence in the following categories:
- Content – Does the content of the test measure stated objectives?
- Criterion – Do scores correlate with an outside reference? (e.g., Do high scores on a 4th grade reading test accurately predict reading skill in future grades?)
- Construct – Does the assessment correspond to other significant variables? (e.g., Do ESL students consistently perform differently on a writing exam than native English speakers?)
A good assessment has both validity and reliability, along with any other quality attributes required by its specific context and purpose. In practice, an assessment is rarely totally valid or totally reliable. A ruler that is marked incorrectly will always give the same (wrong) measurements: it is very reliable, but not very valid. Asking random individuals to tell the time without looking at a clock or watch is sometimes used as an example of an assessment that is valid but not reliable: the answers will vary between individuals, but the average answer is probably close to the actual time. In many fields, such as medical research, educational testing, and psychology, there is often a trade-off between reliability and validity. A history test written for high validity will have many essay and fill-in-the-blank questions; it will be a good measure of mastery of the subject but difficult to score completely accurately. A history test written for high reliability will be entirely multiple choice; it is not as good at measuring knowledge of history, but it can easily be scored with great precision. Generalizing from this example, the more reliable our estimate of what we purport to measure, the less certain we may be that we are actually measuring that aspect of attainment.
It is useful to distinguish between "subject-matter" validity and "predictive" validity. The former, used widely in education, predicts the score a student would get on a similar test with different questions. The latter, used widely in the workplace, predicts performance. Thus, a subject-matter-valid test of knowledge of driving rules is appropriate for assessing what a potential driver knows, while a predictively valid test would assess whether the potential driver could follow those rules.
In the field of evaluation, and in particular educational evaluation, the Joint Committee on Standards for Educational Evaluation has published three sets of standards for evaluations: The Personnel Evaluation Standards (1988), The Program Evaluation Standards (2nd edition, 1994), and The Student Evaluation Standards (2003).
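The ruler and time-guessing examples can be simulated to separate the two properties. Consistency shows up as a small spread of readings, and accuracy as a small average error (bias). The 2 cm ruler offset, the 30-minute guessing error, the true values, and the `summarize` helper in this Python sketch are invented for illustration.

```python
import random
from statistics import mean, pstdev

TRUE_LENGTH_CM = 30.0
TRUE_TIME_MIN = 14 * 60 + 20  # 14:20, expressed as minutes after midnight

rng = random.Random(1)

# Wrongly marked ruler: every reading is off by the same 2 cm (reliable, not valid).
ruler_readings = [TRUE_LENGTH_CM + 2.0 for _ in range(50)]

# Guessing the time: individual guesses vary widely but are centered on the truth
# (valid on average, not reliable).
time_guesses = [TRUE_TIME_MIN + rng.gauss(0, 30) for _ in range(1000)]


def summarize(readings: list[float], truth: float) -> tuple[float, float]:
    """Return (bias, spread): mean error and population standard deviation."""
    return mean(readings) - truth, pstdev(readings)


ruler_bias, ruler_spread = summarize(ruler_readings, TRUE_LENGTH_CM)
guess_bias, guess_spread = summarize(time_guesses, TRUE_TIME_MIN)
print(f"ruler: bias={ruler_bias:.1f} cm, spread={ruler_spread:.1f} cm")
print(f"guesses: bias={guess_bias:.1f} min, spread={guess_spread:.1f} min")
```

The ruler shows zero spread but a constant bias. The guesses show a large spread but a bias close to zero.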
Each publication presents and elaborates a set of standards for use in a variety of educational settings. The standards provide guidelines for designing, implementing, assessing and improving the identified form of evaluation. Each of the standards has been placed in one of four fundamental categories to promote educational evaluations that are proper, useful, feasible, and accurate. In these sets of standards, validity and reliability considerations are covered under the accuracy topic. For example, the student accuracy standards help ensure that student evaluations will provide sound, accurate, and credible information about student learning and performance.
Summary table of the main theoretical frameworks
The following table summarizes the main theoretical frameworks behind almost all the theoretical and research work, and the instructional practices in education (one of them being, of course, the practice of assessment). These different frameworks have given rise to interesting debates among scholars.
| Philosophical orientation | Hume: British empiricism | Kant, Descartes: Continental rationalism | Hegel, Marx: cultural dialectic |
|---|---|---|---|
| Metaphorical orientation | Mechanistic: operation of a machine or computer | Organismic: growth of a plant | Contextualist: examination of a historical event |
| Leading theorists | B. F. Skinner (behaviorism) / Herb Simon, John Anderson, Robert Gagné (cognitivism) | Jean Piaget / Robbie Case | Lev Vygotsky, Luria, Bruner / Alan Collins, Jim Greeno, Ann Brown, John Bransford |
| Nature of mind | Initially blank device that detects patterns in the world and operates on them. Qualitatively identical to lower animals, but quantitatively superior. | Organ that evolved to acquire knowledge by making sense of the world. Uniquely human, qualitatively different from lower animals. | Unique among species for developing language, tools, and education. |
| Nature of knowledge | Hierarchically organized associations that present an accurate but incomplete representation of the world. Assumes that the sum of the components of knowledge is the same as the whole. Because knowledge is accurately represented by components, one who demonstrates those components is presumed to know. | General and/or specific cognitive and conceptual structures, constructed by the mind according to rational criteria. Essentially, these are higher-level structures that are constructed to assimilate new information to existing structure and that develop as they accommodate more new information. Knowledge is represented by the ability to solve new problems. | Distributed across people, communities, and the physical environment. Represents the culture of a community that continues to create it. To know means to be attuned to the constraints and affordances of the systems in which activity occurs. Knowledge is represented in the regularities of successful activity. |
| Nature of learning (the process by which knowledge is increased or modified) | Forming and strengthening cognitive or S-R (stimulus-response) associations. Knowledge is generated by (1) exposure to a pattern, (2) efficiently recognizing and responding to the pattern, and (3) recognizing the pattern in other contexts. | Engaging in an active process of making sense of ("rationalizing") the environment. The mind applies existing structure to new experience to rationalize it. Learners do not really learn the components, only the structures needed to deal with those components later. | Increasing ability to participate in a particular community of practice. Initiation into the life of a group, strengthening the ability to participate by becoming attuned to constraints and affordances. |
| Features of authentic assessment | Assess knowledge components. Focus on mastery of many components and on fluency. Use psychometrics to standardize. | Assess extended performance on new problems. Credit varieties of excellence. | Assess participation in inquiry and social practices of learning (e.g., portfolios, observations). Students should participate in the assessment process. Assessments should be integrated into the larger environment. |
Concerns over how best to apply assessment practices across public school systems have largely focused on questions about the use of high-stakes testing and standardized tests, often used to gauge student progress, teacher quality, and school-, district-, or statewide educational success.
No Child Left Behind
For most researchers and practitioners, the question is not whether tests should be administered at all: there is general consensus that, when administered appropriately, tests can offer useful information about student progress and curriculum implementation, and can serve formative purposes for learners. The real issue is whether testing practices as currently implemented can provide these services to educators and students.
In the U.S., the No Child Left Behind Act of 2001 mandated standardized testing nationwide. These tests were aligned with state curricula and tied teacher, student, district, and state accountability to their results. Proponents of NCLB argue that it offered a tangible method of gauging educational success, holding teachers and schools accountable for failing scores, and closing the achievement gap across class and ethnicity.
Opponents of standardized testing dispute these claims, arguing that holding educators accountable for test results leads to the practice of "teaching to the test." Additionally, many argue that the focus on standardized testing encourages teachers to equip students with a narrow set of skills that enhance test performance without actually fostering a deeper understanding of subject matter or key principles within a knowledge domain.
High-stakes testing
The most controversial assessments in the U.S. are high school graduation examinations, which are used to deny diplomas to students who have attended high school for four years but cannot demonstrate on the exams that they have learned the required material. Opponents say that no student who has put in four years of seat time should be denied a high school diploma merely for repeatedly failing a test, or even for not knowing the required material.
High-stakes tests have been blamed for causing sickness and test anxiety in students and teachers, and for leading teachers to narrow the curriculum toward what they believe will be tested. In an exercise designed to make children comfortable about testing, a Spokane, Washington, newspaper published a picture of a monster that feeds on fear. The published image is purportedly the response of a student who was asked to draw a picture of what she thought of the state assessment.
Other critics, such as Washington State University's Don Orlich, question the use of test items pitched far beyond the cognitive level appropriate for students' age.
Compared to portfolio assessments, simple multiple-choice tests are much less expensive and less prone to disagreement between scorers, and they can be scored quickly enough to be returned before the end of the school year. Standardized tests (in which all students take the same test under the same conditions) often use multiple-choice items for these reasons. Orlich criticizes the use of expensive, holistically graded tests, rather than inexpensive multiple-choice "bubble tests", to measure the quality of both the system and individuals for very large numbers of students. Other prominent critics of high-stakes testing include FairTest and Alfie Kohn.
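The claim that multiple-choice tests are less prone to disagreement between scorers is usually checked by having two scorers grade the same responses and computing an agreement statistic. The source does not name a statistic; the sketch below swaps in Cohen's kappa, one common chance-corrected measure. All scores, the answer key, and the helper name `cohens_kappa` are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a: list[int], rater_b: list[int]) -> float:
    """Cohen's kappa: agreement between two scorers, corrected for chance.

    1.0 means perfect agreement; 0.0 means no better than chance.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both scorers must rate the same non-empty set of responses")
    n = len(rater_a)
    # Observed agreement: share of responses given the identical score.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if each scorer assigned scores at random
    # with their own observed score frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[s] * counts_b[s] for s in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # both scorers gave every response the same single score
    return (observed - expected) / (1.0 - expected)


# Hypothetical holistic essay scores (1 to 6) from two human scorers.
scorer_1 = [4, 3, 5, 2, 4, 6, 3, 4]
scorer_2 = [4, 2, 5, 3, 4, 5, 3, 3]
print(f"essay scorers: kappa = {cohens_kappa(scorer_1, scorer_2):.2f}")

# Hypothetical multiple-choice answers scored mechanically against a key:
# two independent scorings produce identical totals.
key = "BCADB"
answers = ["BCADB", "BCCDA", "ACADB"]
mc_scores = [sum(a == k for a, k in zip(ans, key)) for ans in answers]
print(f"multiple choice: kappa = {cohens_kappa(mc_scores, list(mc_scores)):.2f}")
```

Kappa is used rather than raw percent agreement because two scorers who both favor the middle of a scale will agree often by chance alone.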
The use of IQ tests for educational decisions has been banned in some states, and norm-referenced tests, which rank students from "best" to "worst", have been criticized for bias against minorities. Most education officials support criterion-referenced tests (in which each student's score depends solely on whether they answered the questions correctly, regardless of whether their peers did better or worse) for making high-stakes decisions.
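The difference between the two kinds of test can be made concrete. In the sketch below, a norm-referenced result is reported as a percentile rank within the cohort, and a criterion-referenced result as a pass/fail comparison with a fixed cut score. The cohorts, the cut score of 65, and the function names are hypothetical, and the percentile-rank formula used (percent of the cohort scoring strictly below) is only one of several conventions in use.

```python
def percentile_rank(score: float, cohort: list[float]) -> float:
    """Norm-referenced view: percent of the cohort scoring strictly below `score`."""
    if not cohort:
        raise ValueError("cohort must not be empty")
    below = sum(s < score for s in cohort)
    return 100.0 * below / len(cohort)


def meets_criterion(score: float, cut_score: float) -> bool:
    """Criterion-referenced view: compares the score only with a fixed standard."""
    return score >= cut_score


CUT_SCORE = 65  # hypothetical passing standard
weaker_cohort = [50, 55, 60, 70, 62]
stronger_cohort = [80, 85, 70, 90, 88]

# The same raw score of 70, placed in two different cohorts.
for cohort in (weaker_cohort, stronger_cohort):
    print(percentile_rank(70, cohort), meets_criterion(70, CUT_SCORE))
# prints:
# 80.0 True
# 0.0 True
```

The student's norm-referenced standing swings from near the top to the bottom depending on who else took the test, while the criterion-referenced result stays the same.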
21st century assessment
It has been widely noted that, with the emergence of social media and Web 2.0 technologies and mindsets, learning is increasingly collaborative and knowledge is increasingly distributed across many members of a learning community. Traditional assessment practices, however, focus largely on the individual and fail to account for knowledge-building and learning in context. As assessment researchers consider the cultural shifts arising from a more participatory culture, they will need to find new methods of assessing learners.
Assessment in a democratic school
Schools following the Sudbury model of democratic education do not perform or offer assessments, evaluations, transcripts, or recommendations. They assert that they do not rate people and that school is not a judge; comparing students to each other, or to some set standard, is for them a violation of the student's right to privacy and to self-determination. Students decide for themselves how to measure their progress as self-starting learners through a process of self-evaluation, which Sudbury schools argue is real lifelong learning and the proper educational assessment for the 21st century.
According to Sudbury schools, this policy does not harm their students as they move on to life outside the school. They admit that it makes the process more difficult, but argue that such hardship is part of students learning to make their own way, set their own standards, and meet their own goals.
The no-grading and no-rating policy helps to create an atmosphere free of competition among students or battles for adult approval, and encourages a positive cooperative environment amongst the student body.
The final stage of a Sudbury education, should the student choose to take it, is the graduation thesis. Each student writes on the topic of how they have prepared themselves for adulthood and for entering the community at large. This thesis is submitted to the Assembly, which reviews it. The final stage of the thesis process is an oral defense given by the student, who opens the floor to questions, challenges, and comments from all Assembly members. At the end, the Assembly votes by secret ballot on whether to award a diploma.
Assessing ELL students
A major concern with the use of educational assessments is their overall validity, accuracy, and fairness when assessing English language learners (ELL). The majority of assessments in the United States have normative standards based on English-speaking culture, which does not adequately represent ELL populations. Consequently, it would in many cases be inaccurate and inappropriate to draw conclusions from ELL students' normative scores. Research shows that the majority of schools do not appropriately modify assessments to accommodate students from diverse cultural backgrounds. This has resulted in the over-referral of ELL students to special education, so that they are disproportionately represented in special education programs. Although some may see this inappropriate placement as supportive and helpful, research has shown that inappropriately placed students actually regressed.
It is often necessary to use a translator to administer the assessment in an ELL student's native language; however, translating assessment items raises several issues. First, translations can often suggest a correct or expected response, changing the difficulty of the item. Second, translation can distort the original meaning of an item. Finally, many translators are not qualified or properly trained to work with ELL students in an assessment situation. All of these factors compromise the validity and fairness of assessments, making the results unreliable. Nonverbal assessments have been shown to be less discriminatory for ELL students; however, some still contain cultural biases within their items.
When considering an ELL student for special education, the assessment team should integrate and interpret all of the information collected in order to reach an unbiased conclusion. The decision should be based on multidimensional sources of data, including teacher and parent interviews and classroom observations. Decisions should take the student's unique cultural, linguistic, and experiential background into consideration and should not be based strictly on assessment results.
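One way to look for the difficulty shift described above is to compare each item's classical difficulty (its p-value, the proportion of students answering correctly) across the original and translated versions. This is a minimal sketch under strong assumptions: the response data, the 0.15 threshold, and the function names are hypothetical, and a raw p-value comparison is only a crude screen, since the two groups may differ in overall ability. Formal differential item functioning (DIF) analyses, such as the Mantel-Haenszel procedure, instead compare students matched on overall performance.

```python
def item_difficulty(responses: list[list[int]]) -> list[float]:
    """Classical item difficulty (p-value): proportion answering each item correctly.

    `responses` has one row per student and one 0/1 entry per item.
    """
    if not responses:
        raise ValueError("need at least one student")
    n_items = len(responses[0])
    if any(len(row) != n_items for row in responses):
        raise ValueError("every student needs a response to every item")
    return [sum(row[i] for row in responses) / len(responses) for i in range(n_items)]


def flag_shifted_items(original: list[list[int]], translated: list[list[int]],
                       threshold: float = 0.15) -> list[int]:
    """Indices of items whose p-value differs between versions by more than `threshold`."""
    p_orig = item_difficulty(original)
    p_trans = item_difficulty(translated)
    if len(p_orig) != len(p_trans):
        raise ValueError("both versions must contain the same items")
    return [i for i, (a, b) in enumerate(zip(p_orig, p_trans)) if abs(a - b) > threshold]


# Hypothetical 0/1 responses to the same three items in two language versions.
english_version = [[1, 0, 1], [1, 0, 0], [0, 1, 1], [1, 0, 1]]
translated_version = [[1, 1, 1], [1, 1, 0], [0, 1, 1], [1, 1, 0]]
print(flag_shifted_items(english_version, translated_version))  # prints [1, 2]
```

In this made-up data, item 1 becomes far easier in translation, the pattern expected when a translation hints at the correct answer; a flagged item is a candidate for expert review, not proof of bias.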
- Allen, M.J. (2004). Assessing Academic Programs in Higher Education. San Francisco: Jossey-Bass.
- Kuh, G.D.; Jankowski, N.; Ikenberry, S.O. (2014). Knowing What Students Know and Can Do: The Current State of Learning Outcomes Assessment in U.S. Colleges and Universities (PDF). Urbana: University of Illinois and Indiana University, National Institute for Learning Outcomes Assessment.
- National Council on Measurement in Education. Glossary. http://www.ncme.org/ncme/NCME/Resource_Center/Glossary/NCME/Resource_Center/Glossary1.aspx?hkey=4bb87415-44dc-4088-9ed9-e8515326a061#anchorA
- Nelson, Robert; Dawson, Phillip (2014). "A contribution to the history of assessment: how a conversation simulator redeems Socratic method". Assessment & Evaluation in Higher Education. 39 (2): 195–204. doi:10.1080/02602938.2013.798394.
- Suskie, Linda (2004). Assessing Student Learning. Bolton, MA: Anker.
- Black, Paul, & Wiliam, Dylan (October 1998). "Inside the Black Box: Raising Standards Through Classroom Assessment." Phi Delta Kappan. Available at PDKintl.org. Retrieved January 28, 2009.
- McTighe, Jay; O'Connor, Ken (November 2005). "Seven practices for effective learning". Educational Leadership. 63 (3): 10–17. Retrieved 3 March 2017.
- Scriven, M. (1991). Evaluation thesaurus. 4th ed. Newbury Park, CA: Sage Publications. ISBN 0-8039-4364-4.
- Earl, Lorna (2003). Assessment as Learning: Using Classroom Assessment to Maximise Student Learning. Thousand Oaks, CA: Corwin Press. ISBN 0-7619-4626-8.
- Reed, Daniel. "Diagnostic Assessment in Language Teaching and Learning." Center for Language Education and Research, available at Google.com. Retrieved January 28, 2009.
- Joint Information Systems Committee (JISC). "What Do We Mean by e-Assessment?" JISC InfoNet. Retrieved January 29, 2009 from http://tools.jiscinfonet.ac.uk/downloads/vle/eassessment-printable.pdf
- Educational Technologies at Virginia Tech. "Assessment Purposes." VirginiaTech DesignShop: Lessons in Effective Teaching, available at Edtech.vt.edu (archived 2009-02-26 at the Wayback Machine). Retrieved January 29, 2009.
- Valencia, Sheila W. "What Are the Different Forms of Authentic Assessment?" Understanding Authentic Classroom-Based Literacy Assessment (1997), available at Eduplace.com. Retrieved January 29, 2009.
- Yu, Chong Ho (2005). "Reliability and Validity." Educational Assessment. Available at Creative-wisdom.com. Retrieved January 29, 2009.
- Moskal, Barbara M., & Leydens, Jon A. (2000). "Scoring Rubric Development: Validity and Reliability." Practical Assessment, Research & Evaluation, 7(10). Retrieved January 30, 2009.
- Joint Committee on Standards for Educational Evaluation. (1988). The Personnel Evaluation Standards: How to Assess Systems for Evaluating Educators. Newbury Park, CA: Sage Publications.
- Joint Committee on Standards for Educational Evaluation. (1994). The Program Evaluation Standards, 2nd Edition. Newbury Park, CA: Sage Publications.
- Joint Committee on Standards for Educational Evaluation. (2003). The Student Evaluation Standards: How to Improve Evaluations of Students. Newbury Park, CA: Corwin Press.
- American Psychological Association. "Appropriate Use of High-Stakes Testing in Our Nation's Schools." APA Online, available at APA.org. Retrieved January 24, 2010.
- (n.d.) Reauthorization of NCLB. Department of Education. Retrieved January 29, 2009.
- (n.d.) What's Wrong With Standardized Testing? FairTest.org. Retrieved January 29, 2009.
- Dang, Nick (18 March 2003). "Reform education, not exit exams". Daily Bruin. [permanent dead link]
- Weinkopf, Chris (2002). "Blame the test: LAUSD denies responsibility for low scores". Daily News.
- "Blaming The Test". Investor's Business Daily. 11 May 2006. [permanent dead link]
- Bach, Deborah, & Blanchard, Jessica (April 19, 2005). "WASL worries stress kids, schools." Seattle Post-Intelligencer. Retrieved January 30, 2009 from Seattlepi.nwsource.com.
- Fadel, Charles, Honey, Margaret, & Pasnik, Shelley (May 18, 2007). "Assessment in the Age of Innovation." Education Week. Retrieved January 29, 2009 from http://www.edweek.org/ew/articles/2007/05/23/38fadel.h26.html
- Greenberg, D. (2000). 21st Century Schools, edited transcript of a talk delivered at the April 2000 International Conference on Learning in the 21st Century.
- Greenberg, D. (1987). Chapter 20, "Evaluation", Free at Last: The Sudbury Valley School.
- Graduation Thesis Procedure, Mountain Laurel Sudbury School.
- http://ehis.ebscohost.com.libdata.lib.ua.edu/eds/pdfviewer/pdfviewer?sid=221ae7c6-6895-4b02-bc69-759936218fba%40sessionmgr104&vid=12&hid=20 [dead link]
- "Archived copy" (PDF). Archived from the original (PDF) on 2012-05-29. Retrieved 2012-04-11.
- Carless, David. Excellence in University Assessment: Learning from Award-Winning Practice. London: Routledge, 2015.
- Phelps, Richard P., Ed. Correcting Fallacies about Educational and Psychological Testing. Washington, DC: American Psychological Association, 2008.
- Phelps, Richard P., Standardized Testing Primer. New York: Peter Lang, 2007.