Assessment Outline

Here’s an 8-minute video about my assignment outline on YouTube:

http://www.youtube.com/watch?v=NSBFx19312Q

You should also be able to access higher quality videos in the following formats through the University of Bath Learning Materials Filestore (LMF):

Assignment outline: Windows Media Format

Assignment outline: Flash

And… in case you have any trouble accessing the LMF, here’s a high-quality version generally available on the web courtesy of the University’s video streaming server:

http://wms.bath.ac.uk/live/LTEO/LJ_MA_ASS_WMV.wmv

Ranked assessment using a judgemental pairs method; some questions…

And, yes, I’m still thinking about this 2006 video from Teachers TV: http://www.teachers.tv/video/5431

The programme reported on a pilot of the use of PDAs for students to record the process of a design project – sketches, digital photos, notes – as they worked, effectively creating an electronic portfolio. The research was carried out by Tony Wheeler and Richard Kimbell from Goldsmiths’ Technology Education Research Unit, and you can find some more information about the project here.

The assessment of process, particularly the creative process, has been a major theme in our conversations on this unit. The use of PDAs and other mobile devices for students to contemporaneously capture their work has obvious benefits in terms of the immediacy, mobility and general convenience of the technology; it makes sense that one might receive a more valid or honest representation of the process than if the student took pictures and made drawings & notes, then sat down at a computer at some later point to put everything in order and link it together. According to Tony Wheeler, the contemporaneous recording of and reflection on students’ work also helps to guide what the students do next.

The interesting thing about this pilot was how the researchers went about assessing the students’ work; they ranked all the submissions using a judgemental pairs system.

The reason they gave for doing this was the difficulty of interpreting numerical marks, which according to Tony Wheeler are – despite the existence of clear criteria – often awarded according to a subconscious comparison with an imaginary standard. The researchers found that the judgemental pairs system of ranking – previously quite labour-intensive – becomes much faster and simpler when the work is electronic and easily shared.
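Just to make the mechanics concrete for myself, here’s a minimal sketch (in Python, with invented portfolio ids and judgements) of how a rank order might be derived from a set of judgemental-pairs decisions. The programme doesn’t describe the Goldsmiths team’s actual algorithm or tooling, so this is purely illustrative.

    # A minimal sketch of ranking from pairwise ("judgemental pairs") comparisons.
    # Each judgement is recorded as (winner_id, loser_id); the portfolio ids and
    # judgements below are invented for illustration.
    from collections import defaultdict

    def rank_by_wins(judgements):
        """Rank submissions by the proportion of pairwise comparisons they won."""
        wins = defaultdict(int)
        appearances = defaultdict(int)
        for winner, loser in judgements:
            wins[winner] += 1
            appearances[winner] += 1
            appearances[loser] += 1
        scores = {s: wins[s] / appearances[s] for s in appearances}
        return sorted(scores, key=scores.get, reverse=True)

    judgements = [
        ("portfolio_A", "portfolio_B"),  # the judge preferred A over B
        ("portfolio_C", "portfolio_A"),
        ("portfolio_C", "portfolio_B"),
        ("portfolio_A", "portfolio_B"),
    ]
    print(rank_by_wins(judgements))  # ['portfolio_C', 'portfolio_A', 'portfolio_B']

In practice each pair would be judged by several markers and a statistical model (e.g. Bradley–Terry) would be fitted to the judgements, but the simple win proportion above captures the basic idea of producing a rank order without awarding numerical marks.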

Although the researchers felt that the ranking system they’d used had benefits in terms of consistency (reliability?), they didn’t really go into much detail about the criteria that they’d based their judgements on – you got the impression that they’d looked at the students’ work as a whole and simply followed their instincts in a rather ‘organic’ way about the depth of process each student had gone through and reported. I don’t doubt that this method would result in a fair amount of consistency between markers, but it does leave me puzzled about the exact nature and content of the ‘summative judgements’ that the students would have received as the output of the assessment process, as I presume they didn’t merely receive a piece of paper with their rank order on!

As the researchers said themselves, employers want flexibility, initiative and the ability to collaborate, and we need to assess those capabilities. Were these examples of the type of criteria the researchers were basing their comparative judgements on? Did they make these explicit from the start – to the students and/or to each other? Were they referred to in the ‘summative judgements’ that resulted? Did they feel that this was primarily a norm-referenced assessment, or a criterion-referenced one? I think I’d really like to have a conversation with the Goldsmiths researchers about this – I have so many unanswered questions! However, they’re probably really busy people so I guess I’ll just have to read the project report.

mobile oral assessment & objective e-testing

Still thinking about this 2006 video from Teachers TV: http://www.teachers.tv/video/5431

Having already touched on the challenge of gaining stakeholders’ trust in the capability of e-assessment in my previous post, there were a few more issues raised in the programme that were highly relevant to the topics we’ve discussed so far in this unit, and also to my own experience:

One central theme of the programme was validity of assessment; the narrator begins by questioning the validity of assessing students in isolation by means of pen-and-paper examinations. As Hal Maclean from Ultralab states: “Society has moved on since this was a valid way to test children”. I would add adults to that as well!

One example given in the programme of how technology is enabling dramatic change in assessment practices is the use of oral assessment via mobile phones. The researchers felt that it might be more appropriate for young people to express what they’ve learned orally, through a medium that they’re familiar and at ease with (which makes sense). This form of assessment also offers the learner choice about when and where to complete the activity. As an aside, it was interesting that the researchers found that students had a preference for a female robotic voice – it makes sense that the impersonal effect created by a robotic voice might reduce performance anxiety.

We’ve certainly spoken a lot about the validity and fitness-for-purpose of timed, written tests throughout this unit. One can see how being able to recall information, explain things and justify one’s actions orally is important – and, as adults in the world of work, whenever we have to do this on the spot it tends to be orally. Of course, being able to show understanding and justify one’s actions in writing is also important, but one rarely has to do this rapidly, in isolation and without anything to refer to.

Other alternatives to the pen-and-paper medium include electronic content submitted via PDAs and other mobile devices, and computer-based tests.

Computer-based objective tests not only offer massive gains in terms of efficiency of marking and quality of statistical output, but they also allow tests to be personalised. Different questions to test the same skill mean that students cannot copy from each other, and adaptive functionality ensures that students are appropriately challenged and tested at the right level. There is also the potential here, as with most forms of e-assessment, for the student to choose when they want to be assessed. Computer-based testing also enables the provision of instant feedback. These are key ideas that we’ve discussed within the unit to date – reliability, differentiation, personalisation & instant feedback. I liked the example given in the programme about computer games; that children & young people play them because they are challenging, and they can immediately learn from their mistakes and correct them to progress quickly. It’s very interesting that computer games generally get quite a bad press among parents and teachers – perhaps some of this could be (subconsciously) fuelled by envy of the games’ ability to capture their child’s or pupils’ attention far more effectively than they are capable of doing…?!
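To make the idea of adaptive functionality and instant feedback a little more concrete, here’s a toy sketch in Python. The question bank, difficulty levels and step-up/step-down rule are all invented for illustration – real adaptive engines select items using properly calibrated statistics rather than a hand-made three-level bank.

    # A toy sketch of adaptive question selection with instant feedback.
    # The question bank, difficulty scale and adjustment rule are invented;
    # this only illustrates the step-up/step-down idea, not a real engine.
    import random

    question_bank = {
        1: [("2 + 2 = ?", "4"), ("5 - 3 = ?", "2")],          # easiest
        2: [("12 x 3 = ?", "36"), ("48 / 6 = ?", "8")],
        3: [("17 x 23 = ?", "391"), ("7 cubed = ?", "343")],   # hardest
    }

    def run_adaptive_quiz(num_questions=4, start_level=2):
        level = start_level
        for _ in range(num_questions):
            prompt, answer = random.choice(question_bank[level])
            response = input(f"[level {level}] {prompt} ")
            if response.strip() == answer:
                print("Correct!")                  # instant feedback
                level = min(level + 1, 3)          # step up the challenge
            else:
                print(f"Not quite - the answer is {answer}.")
                level = max(level - 1, 1)          # ease off

    run_adaptive_quiz()

Because each run draws different questions and adjusts the level to the individual, two students sitting ‘the same’ quiz see different items – which is the personalisation point made in the programme.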

We currently offer objective computer-based tests on the ICM programme for our students to test their understanding of key concepts within the subject area of their study modules. Some of them I would judge to be ‘better’ than others – i.e. the ones that have adaptive capability, and provide more detailed feedback with a prompt to try again. Perhaps if/when we get more human resource allocated to online tutoring we’ll be able to make wider use of these formative assessment opportunities. The personalised assessment tasks I mentioned in my previous post could also be designed as low-stakes formative assessment with instant feedback.

More later… still have some ideas to come about ranking over numerical marking, and the assessment of process through PDAs – more of the topics covered in the programme…

e-assessment and inertia

So we’re looking at e-assessment this week – hurrah!! 🙂

Our ‘homework’ was to watch this programme from Teachers TV (http://www.teachers.tv/video/5431), which gave several examples of innovative technology-enabled assessment practices.

Overall I felt that this gave a particularly balanced view of e-assessment; a clear picture of the benefits but also a word of warning about how challenging it is to gain credibility and trust for these new assessment practices within society – and I imagine that this, in addition to a lack of functional connections between educators and programmers/technologists, is one of the primary reasons why things have moved on so very little since this programme was first broadcast nearly three years ago. How exactly do we overcome the natural conservatism among generations who’ve been through paper-based examination systems, and accusations of ‘dumbing down’ and ‘cheating’?

I’m personally facing these very issues as I type, as I’m currently creating a VLE for an electrical engineering DL programme that’s holding on to very traditional assessment methods. As expected, the students are using the discussion forums to ask for and give help on the sample test questions – a system that seems to be working beautifully, and the immediate uptake with little staff encouragement has proved that the students felt there was definitely a need here. However, staff are concerned that, as the students are now ‘talking to each other’, they’ll be able to ‘cheat’ on the summative tests, which are effectively open-book pen-and-paper exams. In solving one big problem through technology – student isolation – we’ve created another. Whether this makes the old assessment method significantly less reliable or valid than it was before is uncertain, but if the programme team wish to keep using this assessment method, it may be worthwhile using a system that creates personalised tests, similar to the primary maths tests used as an example in the Teachers TV programme.

Mark Russell at the University of Hertfordshire has been using a similar system, which he developed several years ago, for frequent low-stakes formative assessment in higher-level Engineering courses. I invited Mark in recently to demo his system to our programme team, who were quite interested. However, I still feel that there is a fair amount of inertia to overcome, in that the team needs to see a working product before they consider giving it a go. Am I, with my natural & social sciences background, capable of setting up a personalised assessment programme for an M-level electrical engineering unit? I guess, working in the Engineering faculty, I’m bound to be able to find someone who can give me a hand..!
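As a note to myself on what a system that creates personalised tests might look like at its simplest, here’s a minimal sketch in Python: each student gets different numbers generated from the same question template, so the underlying skill tested is identical but answers can’t simply be copied. The student ids and the single Ohm’s-law template are invented – this isn’t a representation of Mark Russell’s system or the primary maths tests shown in the programme.

    # A minimal sketch of personalised question variants: the same template,
    # seeded per student so each student sees different numbers. The student
    # ids and the Ohm's-law template are invented for illustration.
    import random

    def personalised_question(student_id):
        rng = random.Random(student_id)      # reproducible per student
        r = rng.randint(10, 100)             # resistance in ohms
        v = rng.randint(5, 24)               # supply voltage in volts
        question = (f"A {r} ohm resistor is connected across a {v} V supply. "
                    "What current flows?")
        answer = round(v / r, 3)             # Ohm's law: I = V / R
        return question, answer

    for student in ("student_001", "student_002"):
        q, a = personalised_question(student)
        print(student, "->", q, "expected answer:", a, "A")

The same seed always regenerates the same variant, so the marking side can reconstruct each student’s expected answer without having to store it.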

There were plenty more issues raised in the programme that were relevant to what we’ve already covered in the assessment unit, and also to my own experience – see the next post!

Assessment of process

In preparation for last week’s session on Assessment & Creativity, we read a range of sources related to the assessment of process.

Jill Porter’s article on promoting self-assessment with pupils with severe learning difficulties highlighted how central self-assessment can be in the assessment of process; if students are evaluating their performance, reflecting on what they know and setting targets, they are documenting the process as they go. The author suggests a number of tools and technologies for aiding the reflective process – the use of mirrors and cameras to assist self-assessment; circle-time and plan-do-review boards that introduce an element of peer-assessment; and one-to-one sessions to enable teacher assessment. The ideas presented in this article, although in the context of the primary sector, are also relevant to higher education; giving students the opportunity to video face-to-face groupwork, or using mirrors to enable students to sit outside the group and perform a peer-assessment, would both be of benefit to the self- and peer-assessment of process.

Another source we were given to read was a student assignment on the contradictory nature of assessment of the arts. It challenged the viewpoint that objective assessment is a desirable goal, suggesting that any truly objective assessment of the arts can at best only measure ‘technical proficiency’. The ideas presented here reminded me of those of Elliot Eisner, evident in The Art of Educational Evaluation (discussed in an earlier post) as he challenges the setting of educational objectives prior to the learning process.

This piece of writing explained the concept of legitimation of exclusion in a way that I found much clearer than in the Broadfoot article I read previously, noting that, in today’s society, failure is as much of a necessity as success. The author of this piece emphasised, as others have, the weakness of the correlation between exam success at school and success at university and in general employment; however, the fact remains that selection has to occur, and therefore we need something to base that selection on. But it does seem logical that we are compromising pupil and student confidence and success in favour of objectivity and comparability.

I found the author’s reference to the McNamara fallacy thought-provoking. It struck a chord with me as I am currently trying to reconcile the assessment of group discussion within a framework for module assessment in the ICM programme. Not providing credit for these activities does appear to give some students the impression that they are less important than the credit-bearing activities – it’s interesting to look at the fallacy from the perspective of the person who is being measured!

The central conclusions of this text were that teaching students to reflect, evaluate and recognise creation will assist the creative process, and the assessment of that process too, but creativity itself cannot be ‘pre-ordered’. A combination of discussion, feedback, ipsative assessment and self-evaluation is required for the valid assessment of a creative process. The final assessment should draw heavily on the student’s own reflections and self-evaluation, which also helps to foster independence and self-discipline.

The third piece I read was a foreword to Anna Craft’s 2005 book, ‘Creativity in Schools’, by Tim Smit, co-founder of the Eden Project. This drew briefly on a number of ideas, such as the link between emotional development and observation and their contribution towards creativity, and the differing notions of creativity between cultures. I found particularly interesting the idea that creativity in problem-solving is evident in all cultures, but creativity in the arts is generally in inverse proportion to the power of the state and degree of atrophy within a culture. The author also touched on the concept of the ‘throwaway culture’ and suggested that this is a symptom of our constant quest for the new and innovative. Finally – I liked the definition of creativity given here as the interface between self-expression (a person’s ‘unique voice’) and the outside world. This surprised me as, to be honest, the wording is far ‘fluffier’ than what I would normally feel at ease with – I might have to explore what this definition means to me in more depth later on.

The final text I read was ‘Assessing the Creative Work of Groups’ by Cordelia Bryan, a chapter from the 2004 publication ‘Collaborative Creativity’ (Miell & Littleton, eds). The context of collaborative group work is highly relevant to the work I’m doing at the moment with the ICM programme, and I took some valuable ideas away from this reading. Bryan emphasises the need to prepare students for collaborative groupwork by building trust and understanding of group dynamics and techniques, and explains why this is best carried out within the context of a standard unit of study rather than a separate ‘study skills’ unit. The challenges of allocating credit for group tasks are also explored, and the importance of self- and peer-assessment – on both achievement of the task and facilitation of the process – is emphasised. Bryan advises that students should be given the opportunity to complete the entire cycle as a learning process, and to share their perceptions of assessing and being assessed on groupwork, before a credit-bearing assessed activity takes place. She also suggests that criteria and grading methods should be discussed and agreed upon after the students have experienced the initial groupwork activity.

One of Bryan’s ideas I found most interesting was the use of ‘peer observers’ – students who observe another group working and make notes in response to key questions, and then feed back to the group on completion of the group task.

Getting stuck in

In preparation for this week’s session with Dr Sue Martin on norm- and criterion-referenced assessment, we were asked to assess an assignment submitted by a previous student on this very topic (I assume it was written a few years ago). This was a really interesting task and I enjoyed it much more than I thought I was going to. I initially didn’t feel that I was qualified to pass judgement on it, but reflecting on the peer-assessment activities I ask the ICM students to take part in (and the frustration I feel when they express the exact same feelings as a reason for not participating) encouraged me to put those thoughts aside.

My feedback on the assignment was as follows:

This assignment demonstrates extensive knowledge of the issues pertinent to norm- and criterion-referenced assessment. The depth of analysis is sufficient in places and lacking in others; on future assignments you might like to consider relying less on bulleted lists. Such lists do not lend themselves easily to analysis or criticism of the ideas presented, or even reporting of the sources of those ideas (see Page 2 for an example). Citing your sources will give you the opportunity to demonstrate your ability to critically analyse the ideas and conclusions of others.

The content of the assignment is clearly relevant to the question and the structure leads to a well-reasoned conclusion. You have drawn on your own experience where required. The section justifying criterion-referencing could have benefited from comparisons and connections with the relevant literature, as you have done to good effect in the following section.

Where you have cited academic sources, you have used them well to support the arguments presented. I would actually have been interested to see more evidence of conflict and contradiction in the sources you used; for such a complex topic your argument and conclusion are both very ‘neat’. You may have benefited from making a deeper exploration of the uncertainties, assumptions and values underlying the conclusions made.

You have used a selection of relevant and recent literature and have relied mainly on books. In future assignments you may benefit from exploring a wider range of journal articles and recent papers; you may find that this enables you to unearth conflicting views, and conclusions that you feel more able or willing to challenge and question. This will enable you to demonstrate that you have explored and analysed a range of options, and provide the fuel for you to present ideas or responses that are truly original.

I was satisfied with the depth of the feedback I felt able to give, and felt that I’d framed it in a positive way with a formative emphasis. However, I felt entirely unable to allocate a percentage mark. I don’t feel that this was necessarily due to lack of marking experience or confidence in my judgement; I suspect there is some degree of norm-referencing hard-wiring at work, which makes us feel the need to compare a piece of work with another in order to make sense of the subjective statements made in the assessment criteria! Perhaps that’s what this exercise was designed to show…? If so, it was very effective!

This exercise gave me a fresh perspective on the subjectivity of assessment criteria, and how susceptible they might be to the values and priorities of different markers. For example, one of the first elements in the MA assessment criteria is depth of analysis. What constitutes deep analysis? How long is a piece of string? I have a sheet of paper above my desk that presents two lists – one list of questions as tools for critical thinking, and one list of questions as tools for reflective thinking. They come in handy when I need to dig deep, or even when I’m just trying to work out how I feel about something. They also came in handy when assessing this assignment, as I could pick out the questions that seemed relevant and look for evidence that the student had answered them – or even evidence that those questions had been pondered upon. In most cases there wasn’t much evidence of this, but common sense tells me that this is probably the case for the majority of M-level assignment submissions. I suspect that many students are thinking critically and reflectively, but find it difficult to provide evidence of this while trying to present a scholarly and ‘well-reasoned’ piece of work. The phrase ‘well-reasoned’ implies an argument where all the pieces fit together neatly in support of a conclusion. Incorporating doubt and conflict into this picture presents quite a challenge.

So – we have a situation where I felt that there wasn’t much evidence of deep analysis in this assignment. But on the other hand, if it had been placed alongside several other students’ assignments and looked rather good in comparison, would this justify awarding it a high mark? On yet another hand, I may be off-centre in my standards of judgement as to what constitutes ‘deep analysis’.

Something else I thought about when assessing this assignment was the degree of description that is necessary when writing an M-level assignment. What level of knowledge and understanding can or should one assume of the reader? Did the features of criterion- and norm-referenced assessment have to be listed in the (descriptive) way they were, or were those characteristics established and agreed upon in the literature to such an extent that they could have been tucked away in an appendix, leaving some more leeway in the word count for in-depth analysis and debate?

It’s funny – I’ve read so often about how powerful it is to get students to engage with the marking criteria through peer- and self-assessment exercises, and believed it, and even implemented the theory in the courses I design, but never actually engaged in it myself to this degree. It really is powerful. I feel much better equipped now to write my own assignments…!

Assessment & Social Justice

Futurelab’s literature review on Assessment & Social Justice was an additional reading suggested to us a couple of weeks ago – it was rather interesting so I came back to read it in more detail.

Although there was not much that was particularly new to me in terms of the conclusions made by this review, it really did make me think in more depth about the concept of ‘fairness’, and how it is intertwined with issues such as motivation, self-efficacy and access. Strategies such as low-stakes assessment, personalised learning and extended assessment tasks were highlighted as potentially having a positive impact on social justice in assessment.

I loved the idea of building in a period of feedback and reflection after GCSEs and A-levels – can we do this, please? 🙂

The role of technology in assisting with social justice in assessment was explored at length, and although the challenges of security and financial cost were acknowledged, the benefits of adaptive testing, automated marking, translation and speech recognition were shown to be significant. The use of e-portfolios is a great example of participatory social justice in assessment – as individuals make their own choices about how they are going to demonstrate their learning.

The potential in the future for technology to facilitate more personalised curricula was also explored. The drift in emphasis from understanding of subject-bound content to the demonstration of transferable skills (such as team working, self-management, reflective and creative thinking and problem-solving) is bound to bring new and exciting challenges to assessment processes across all levels of formal education. These are challenges that those of us who work with professional masters programmes have been grappling with over the last few years, so it’ll be great to be part of a growing body of knowledge and experience in how these skills can (and should) be assessed.

The impact of (virtual) classroom evaluation practices

Following on from my previous post on the impact of assessment, evaluation and testing on achievement and motivation, I’m going to narrow my focus to the impact of classroom evaluation practices in particular, drawing on Terry Crooks’ 1988 article – The Impact of Classroom Evaluation on Students – and thinking about how his recommendations for educational practice relate to my own context of online distance learning.

As summarised in my previous post, there is significant potential for classroom evaluation activities to impact positively on students’ achievement and motivation. Crooks emphasises a number of points that should be considered and acted on in order to maximise these positive impacts:

1. Encouraging deep learning strategies
Crooks points out that classroom evaluation tasks should be based on the demonstration of understanding, through the application of learning to new problems and other higher-order thinking skills.
In my context, working with masters-level students, it seems clear that formative evaluation of students’ ability to simply recall information is not something that we should be focusing on. I’ve always wondered about how useful the self-test quizzes are to our students. I feel that we should be concentrating our efforts on activities that require students to use factual knowledge to solve a problem or carry out a process – for example, the Module 3 role-play activity, the group discussions in Modules 4 and 6, and the case study exam practice in Module 5. Ideally the reflective discussions in Modules 1 and 2 (soon to be expanded into the other modules) could be varied over time to bring in a problem-solving flavour, perhaps by alternating introspective, reflective questions with problems or questions to be answered in a collaborative way.

2. Assisting learning through evaluation
Crooks reminds us of the risks of normative grading (see my previous post) and advocates minimal use of summative grading with classroom evaluation activities, and an increased focus on the identification of strengths and weaknesses through useful formative feedback. This is something that we are working hard to address with our online DL programmes, where the original model simply did not incorporate formative evaluation: tutors were employed to mark assignments and examination scripts. It’s simple enough for a learning designer such as myself to incorporate formative evaluation activities (discussions, workshops, etc.) into the modules, but in order for students to have the motivation and confidence to self- and peer-assess, a significant degree of evaluation from the tutor is required. It’s not enough for us simply to offer to pay the (part-time) tutors for this extra work; they have to actually have the spare time to do it, which is where optimism can often be our downfall! Let’s say it’s a process that is ongoing, and progress is definitely being made; more quickly in some areas than others.

3. Giving effective feedback
Crooks recommends that feedback should have an emphasis on personal progress – this is difficult when the majority of our ICM tutors only mark assignments for one unit, but is one of the many reasons behind getting them more involved in the provision of formative feedback. It’s nice to see it happening in Module 1, where the tutor is contracted to provide detailed feedback on a draft assignment submission. Crooks also picks up on the importance of timing – that feedback should be provided as soon as possible after the event, and, where appropriate, so that the student has an opportunity to correct any deficiencies that have been highlighted. Crooks’ third recommendation is that feedback should be as specific as possible – i.e. saying exactly what has been done well and what could have been done better, rather than generalising across the entire piece of work.

Phil Race‘s December 06 presentation on giving fast and effective feedback is full of guidelines and great ideas for making feedback more effective. One idea I’d like to try out is the provision of a formal pathway for students to respond to the feedback they receive on their assignments; an online text submission activity where they can say how they feel about the feedback generally, ask for clarification on anything they don’t understand, and tell us (okay, by ‘us’ I mean tutors) what to stop/start/continue doing when we give feedback. Slide 3 of the presentation has some great points for making sure the timing of feedback is optimised. I’ve always liked the idea of sending out an overall feedback sheet based on common errors and difficulties the day after students submit their assignments, and then following up with personalised comments afterwards, but I haven’t as yet been able to persuade my colleagues of the benefits…

4. Maximising the benefits of co-operation
Co-operative activities can facilitate learning and motivation, and help to develop interpersonal skills and relationships between students. On an international distance learning course such as ICM, this is ultra-important as it is so much easier for students to become isolated. Crooks recommends that co-operation is particularly appropriate for complex tasks where the different perspectives and skills of the group members can complement each other – such as the group role-play task in Module 3 (different skills) and the various group discussions (different perspectives). As I mentioned previously, I think students could benefit more from these different perspectives if more of the group discussions had a problem-solving flavour than a merely reflective flavour. The reflective element is important and I don’t doubt that students find it very interesting at first to hear about the way things are done in different countries and industry sectors, but I think that using these different perspectives to collaboratively solve problems or even to answer questions, as in Modules 4 and 6, would utilise more higher-order thinking and prevent ‘sharing fatigue’.

5. Setting Standards
Crooks concludes that student motivation is highest when evaluation standards are high but attainable. However, these are subjective qualities and therefore some differentiation may be required. As ICM is a masters-level programme, the standards for summative assessment are fairly inflexible, but we can support students who may struggle to reach these standards by setting attainable intermediate targets, and assist all students by providing detailed and specific criteria for all tasks. This prevents misdirection of effort and should decrease the anxiety associated with evaluation. A personal tutoring system would allow individual students’ progress to be monitored more closely across the programme, enabling the setting of personalised goals.

6. Frequency of evaluation
It’s fairly obvious that students will benefit from having regular opportunities to practise and use the skills and knowledge required to achieve the learning outcomes of the programme, and from receiving feedback on their performance. These opportunities encourage both active learning and the consolidation of learning. With part-time distance learners, one has to strike a balance between providing enough opportunity for practice, and ensuring those who simply don’t have time to take part in all the available activities are not fatally disadvantaged. I suppose the key is to ensure that the few activities that are offered have maximal benefit in terms of student learning, and are well-supported with tutor input and feedback.

7. Selection of evaluation tasks
…which follows on nicely from the point made above. Crooks advises that the nature and format of evaluation tasks should suit the goals that are being assessed. He also suggests offering a variety of tasks, potentially even giving the students a choice of tasks to complete, as this stimulates and takes advantage of intrinsic motivation.

8. What to evaluate?
To conclude his review, Crooks emphasises again that as educators we should be articulating and evaluating the skills, knowledge and attitudes that we perceive to be the most important. Even if these are hard to evaluate (e.g. reflective thinking springs to mind), we must find ways to assess them.

The impact of assessment on achievement & motivation

For Monday’s session, Eva suggested we read Chapter 3 of Gipps’ Beyond Testing: Towards a theory of educational assessment (1994), and reflect on the ways in which assessment, evaluation & testing impact on students’ achievement & motivation.

Reading this article caused me to think more deeply about the nature of achievement. It’s often the case that the abilities that are tested (i.e. the ones that are easiest and most economical to test) become the ones that are most taught; there is a bias against teaching those skills that are not, or cannot easily be, measured. Therefore, when a student appears to achieve well, one has to consider the scope of the test’s validity, and the constructs or skills that it is actually measuring.

Gipps highlights Heisenberg’s uncertainty principle – that one cannot measure things without affecting them – and describes the concept of construct-irrelevant variance, or ‘test score pollution’, where test scores may rise due to teaching being focused on the test items and formats themselves, rather than on the constructs or skills they are intended to measure. This is something else to consider; whether an apparent increase in achievement is due to a greater understanding of a construct or better grasp of a skill, or largely due to students being more highly skilled at taking tests.

It is difficult to imagine how the negative effects of teaching to the test can be minimised with the existence of high-stakes testing. As Gipps points out, teachers feel that they have a professional duty to give their students the best chance to pass those tests that will have a significant effect on their lives. Gipps asks whether even a positive impact such as that created with the move from ‘O’ levels to GCSE can become corrupted over the years.

Reflecting on these points helped me to see my own experiences from a fresh perspective. I gained nine ‘A’ grades at GCSE but received a ‘D’ grade in my English Literature examination. Initially I was surprised and disappointed, and felt that the test must be at fault, but over time I began to see the D grade as an absolute and valid measure of my ‘achievement’ in this area. Gipps’ description of test results as ‘a useful but fallible indicator of achievement’ is helpful. It encourages me to examine what exactly my English Literature examination was testing. One might assume that the aim of an English Literature assessment would be to assess a student’s appreciation of good literature. However, this is not an easy quality to measure. The ability to unpick a metaphor, on the other hand, is a much easier quality to measure, and I suspect that whoever wrote my English Literature paper was thinking (hoping?) that measuring someone’s ability to deconstruct metaphor was the same thing as measuring their ability to appreciate and enjoy literature – as Gipps phrases it, generalising to other measures of the same construct. Ah well – at least they picked a measure that was resistant to pre-test coaching (not that my English teacher didn’t try). I felt thoroughly prepared (i.e. coached) for the other nine exams, so I’m not suggesting that those results are any less fallible. Society, however, will still simply see me as a ‘high achiever’ in these subjects and a ‘low achiever’ in English Literature. Gipps summarises the conclusions made by Madaus (1988) – that we as educators need to lower the stakes of tests, and also to try and persuade everyone that test results are only one piece of the puzzle.

Taking the latter point – it seems clear to me that having more detailed reporting of results is crucial in order to discourage the making of simplistic assumptions about achievement – this relates to the issues of unidimensionality and universality discussed in my earlier post on Assessment Paradigms.

Looking now at the issue of motivation (which is inextricably linked to achievement), and how it can be affected by assessment/evaluation/testing, the way forward seems a little clearer, particularly with low-stakes assessment – what Crooks (1988) terms ‘classroom evaluation activities’ – as I’ll go on to describe below. With high-stakes assessment we have a difficult conflict between the external motivation experienced by students who believe they can succeed and therefore obtain the rewards, and the demotivating effect on those who know they cannot.

According to Crooks (1988), classroom evaluation activities serve to emphasise the skills, knowledge and attitudes that are valued, help to structure approaches to study and consolidate learning, and affect the development of enduring learning strategies. Crucially, when classroom evaluation is used within a framework of attainable sub-goals, each with clear criteria, it affects self-perceptions of competence (self-efficacy), which is shown to be closely related to the use of deep learning strategies and the ability to persist with challenging tasks. The term ‘classroom evaluation activities’ can also be applied to the input from the e-moderator or e-tutor in the online distance learning programmes I work with. A significant part of my role is to encourage the tutors to engage more deeply with the online activities, and I always see evidence of an instant and powerful impact on the students’ motivation when their tutor provides specific, well-timed feedback on an activity.

Norm-referencing can have a negative impact on motivation; not only can it discourage collaboration and threaten peer relations, but it also essentially attributes success and failure to ability rather than effort.

The GCSE was intended to emphasise positive achievement (and therefore enhance motivation) by allowing students to show what they could do rather than facing them with impossible tasks. This required the use of differentiated tasks and criteria for different levels of expected achievement, which could unfortunately have a demotivating effect on those who realised that, however hard they worked for their maths GCSE, for example, they would only be able to attain a maximum B grade. However, the changes in content and teaching brought about by the move from ‘O’ level to GCSE, particularly in subjects such as MFL, resulted in a massive increase in the number of students choosing to progress to ‘A’ levels.

I’m off to the gym now – but afterwards I’m going to look into classroom evaluation practices more deeply by reading Terry Crooks’ 1988 article on the impact of classroom evaluation practices and reflecting on how the key points he makes relate to my own context of online learning environments 🙂

Dimensions of assessment

To focus on how we assess, Eva’s suggested that we pick two assessment instruments we’re familiar with and analyse them in terms of a number of ‘dimensions’ of assessment; e.g. formative – summative, standardised – authentic, etc.

The two assessment instruments I’m going to analyse are as follows:

A. An online group role-play activity (case study at go.bath.ac.uk/M3roleplay)
B. Online peer review of assignment drafts (case study at go.bath.ac.uk/ICM_peer_review)

Dimension 1: Formative – Summative
B is clearly entirely formative, while A has a blend of summative and formative elements. The formative capacity of A is mainly within the breakdown of the mark given – as you can see from the mark scheme, the student should come out of the assessment process knowing where marks were gained and lost, and the strengths and areas for development they demonstrated during the task. The students do also receive a single ‘final mark’, but they are encouraged as professionals to interpret the outcome of the assessment as a basis from which to move forward and reflect on how they might improve their skills further, as well as how to apply what they’ve learned to the situations they encounter in their workplace.

Dimension 2: Formal – Informal
Both A and B are primarily formal assessment instances – however, the more authentic nature of A (and the fact that it is in itself an extended teaching/learning activity) means that informal self, peer and tutor assessment is likely to take place throughout.

Dimension 3: Product – Process
A is primarily about process and B is primarily about product. The mark scheme for A (linked to above) demonstrates that it is mainly process elements such as participation, facilitation and even attitude that are being assessed. The information we’ve been given about this claims that processes are rarely assessed explicitly, ‘probably because of their insubstantial, transient nature’. However, assessment activity A takes place online, and all communication is in written form – which makes the negotiation process and the involvement and attitudes of the group members more concrete and therefore easier to assess.
We are currently amending the peer review activity (B) so that the process element is taken more into account – not in terms of how their peers have gone about writing their assignments, but in terms of how the reviewers have carried out the review. In the most recent version, the two students reviewing an assignment get to compare their reviews with each other, and then the tutor reviews the review process and provides feedback to both the author of the assignment and the two reviewers.

Dimension 4: Continuous – Terminal
I’m not sure if this dimension is relevant to ‘B’ at all, as it seems to me neither continuous nor terminal…? The formal assessment element of ‘A’ could be continuous if it took place at all three stages of the task, rather than just at the end. As mentioned above, with ‘A’ there is informal self, peer and tutor feedback taking place continuously.

Dimension 5: Standardised – ‘Authentic’
‘A’ was initially conceived to be as authentic an assessment task as we could reasonably make it – authentic not so much to the learning and teaching conditions (the assessment task itself was actually seen as part of the learning/teaching activity), but to the situations that the students would be most likely to encounter in their workplace. The groupwork element of ‘A’ certainly raises challenges for its reliability, but I would hope that the validity of this form of assessment might balance that out. ‘B’ incorporates a standardised assessment task and a standard set of assessment criteria that students use to carry out the peer review. The students are given guidance on how to use the criteria, but the lack of any sort of quantitative output makes me question whether reliability is an issue for ‘B’…?

Dimension 6: Internal – External
Oh crumbs – there is so much peer and self-assessment going on here, I would say that both ‘A’ and ‘B’ range from internal to very, very internal 🙂

Dimension 7: Norm-referenced – Criterion-referenced
Both ‘A’ and ‘B’ are criterion-referenced. It should be noted that with ‘A’, there will probably be some discussion at an exam board about the spread of marks if it does not loosely conform to a normal distribution…!

In conclusion – that was quite an interesting analysis for me to carry out. I didn’t consciously have any preconceptions about how different the two assessment instruments would be in terms of the seven dimensions, but some of the analyses did surprise me a little. I also think that this analysis will come in handy when presenting my experiences of designing and using these instruments to others through case studies and workshops – great stuff 🙂