Worthwhile automated feedback for student writing

This morning got off to a great start when I found this blog post on journals in an English Language Arts class. The best part of this post is not that it encourages journaling – just about any ELA teacher would support that. Instead, what’s unique about this post is how its author walks through a worked example of iterative, incremental writing improvement through revision. I believe this process can be scaffolded by automated formative feedback with LightSide.

Journals are a great place for formative feedback in an English curriculum because they’re low-stakes. They’re the type of writing that, in current classrooms, isn’t graded for content mastery or expertise. They don’t go on a permanent record. Rather, they’re all about getting practice with writing. People get better at writing through practice, not through the stress of timed exams.

The limitation is that students don’t get feedback on journal writing – teachers are busy enough grading papers and tests. There’s little emphasis on revising journal entries. No one points students to the places where they might improve on a first draft in this context. So while journals get students writing, in many cases they only support that initial step, empowering students to recognize that they can write for pleasure. They don’t build up the entire process.

That’s the place in the classroom where automated essay scoring can make an immediate, big difference.

A Case Study

Let’s look at the three examples from the linked blog post. First, he gives a simple free-write journal entry:

Today I got up. I really didn’t want to get up, but I had to. I went to school thinking I wouldn’t survive it. English was horrid: Mr. Scott talked about journals and I just hate this thing. Math was okay: Mrs. Merck was in a good mood. Science was fun and social studies was pretty interesting, so all in all, it was a decent day.

Here’s the author’s proposed feedback: “Add some details: use literary devices, sensory details, and action verbs to add depth.” With this feedback in mind, he proposes that a second draft might look like the excerpt below.

Today I got up. Rather, I felt like only sheer will power hoisted me up. I really dreaded getting up, but I had to. I stumbled to school thinking I wouldn’t survive it. English was horrid: Mr. Scott jabbered on and on about journals and I just detest this thing. Math was okay: Mrs. Merck was in a good mood. Science was fun and social studies was pretty interesting, so all in all, it was a decent day.

The second piece of feedback that he gives is to “try adding some background details. Ask yourself, ‘Why?’; then answer that question.” He then shows a final, third draft of his journal.

Today I got up. Rather, I felt like only sheer will power hoisted me up. I really dreaded getting up because it’s Monday, and absolutely no one likes Monday. Even the cheerfulest, happiest people are grumpy on Monday. I got up, in short, because I had to. I stumbled to school thinking I wouldn’t survive it: rumor had it there was going to be a test in math, and I just knew English was going to be painful. Mr. Scott said yesterday that we’d be working on journals, and I hate them.

Much of the day was exactly like I anticipated: English was horrid: Mr. Scott jabbered on and on about journals and I just detest this thing. Math was okay: Mrs. Merck was in a good mood because the previous period had done really well on their test. Science was fun (we worked on rockets) and social studies was pretty interesting (we learned how laws are actually made), so all in all, it was a decent day.

What makes feedback a candidate for automation?

The two examples of feedback that this English teacher is proposing are a perfect fit for machine learning-based solutions. Here’s why.

Good automated feedback gets students writing more.

This is the most important thing that any scaffolding for writing can do. The best thing that writers can do to improve their writing is to write more, and to keep practicing. Many students, though, are going to be stuck at the first draft above – they won’t see how they can expand. Inexperienced writers need to be nudged in the right direction with advice like this teacher gives. By giving specific suggestions for how a writer can improve, this teacher is keeping students engaged in the writing process.

Good automated feedback must be about the writing, not context.

Imagine if the teacher had given the following advice: “Tell us about your other classes; add detail about the rest of your day.” There is no hope for machine learning to give this suggestion. Why? Look at the example writing above. We see words like “English,” “Math,” “Science,” and “social studies,” yes. However, the word “class” never shows up. The automated system isn’t going to group these together and know that they’re classes. That’s domain knowledge and expertise that’s outside of the range of machine learning.

Moreover, there’s more inference that needs to be done to say “the rest of your day.” Automated systems have no notion that “a school day is made up of a series of five to eight classes.” There’s no built-in knowledge component that’ll parse out that sort of real-world expertise that comes naturally to human readers. Fundamentally, trying to grasp at inferred context is a losing game for machine learning.

That’s not the feedback this teacher is recommending, though. He’s recognizing places where the student’s writing can improve. Advice to add detail is important, and it doesn’t require the machine learning system to know how schools work or why students might be bored in school. It requires the system to recognize that sentence structure is simplistic and that detail is lacking. It needs to know what strong, creative word choice looks like compared to the limited choices made by struggling writers. This type of feedback can be achieved today, with existing machine learning algorithms.
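To make that concrete, here’s a minimal Python sketch of the kind of surface features such a system might compute. The features, function names, and threshold are invented for illustration – this is not LightSide’s actual model:

```python
import re

def surface_features(text):
    """Compute simple surface features that correlate with under-detailed,
    simplistic writing. Illustrative only, not LightSide's feature set."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    return {
        # mean sentence length in words: very low values suggest choppy prose
        "avg_sentence_length": len(words) / max(len(sentences), 1),
        # type-token ratio: low values suggest repetitive word choice
        "type_token_ratio": len(set(words)) / max(len(words), 1),
    }

def needs_detail(text, min_avg_len=12.0):
    """Flag a draft whose sentences are uniformly short as a candidate for
    'add more detail' feedback. The threshold is an invented placeholder."""
    return surface_features(text)["avg_sentence_length"] < min_avg_len

first_draft = "Today I got up. I really didn't want to get up, but I had to."
print(needs_detail(first_draft))  # -> True (average sentence length is 7.5)
```

A real system would learn these signals from human-graded essays rather than hand-pick them, but even this toy version shows that “detail is lacking” can be read off the surface of the text.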

Many English teachers take this to heart already. In my interviews with composition teachers, one thing they’ve stressed about giving feedback to students is getting them to see what they’ve actually written. Often, amateur writers make assumptions about what their readers can infer. They skip out on context and detail because it’s obvious to them. But what’s obvious to the writer is not always obvious to their audience. Learning about audience and what you can assume about your reader’s background is a huge step for an author. Framed this way, the naïveté of automated feedback engines has surprising potential.

Good automated feedback can be selective.

Our teacher’s advice is useful to this first draft because he recognized what was missing in the student’s writing. However, I believe the teacher missed out on a chance here. Specifically, his advice didn’t point students to where, within their essay, they could make use of his advice. Advice as generic as “add more detail” could just as well be given by a parrot if it’s not targeted.

Machine learning can do better than this. First, we can recognize algorithmically whether a particular text is in need of this advice at all. Consider an equally generic piece of feedback, like “Add some structure; transition sentences and links from one paragraph to the next.” Clearly, in this draft, the generic advice about detail is more critical to this first step of revision than generic advice about organization would be.

Not every piece of advice is useful for every draft of every essay. It’s easy to dismiss basic advice, but that’s what students need in a first pass. It’s something that can be automated, and if we’ve built an automated tool that makes the correct choice about which basic advice to give, we’re moving forward.
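One way to picture that choice is as scoring each generic piece of advice against a draft’s surface features and giving only the most-needed one. The weights below are made up for illustration; in practice a trained model would learn them from graded essays:

```python
def advice_scores(features):
    """Map surface features to a 'need' score for each generic piece of
    advice. The weights are invented; a trained model would learn them."""
    return {
        "add detail": max(0.0, 1.0 - features["avg_sentence_length"] / 20.0),
        "add structure": max(0.0, 1.0 - features["transition_words"] / 5.0),
    }

def pick_advice(features):
    """Return the single most-needed piece of generic advice for a draft."""
    scores = advice_scores(features)
    return max(scores, key=scores.get)

# A first draft: short, choppy sentences, but with a few transitions already.
first_draft = {"avg_sentence_length": 7.5, "transition_words": 4}
print(pick_advice(first_draft))  # -> "add detail"
```

The point is the selection step, not the particular features: the tool withholds advice a draft doesn’t need.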

Good automated feedback can be targeted.

The teacher’s advice above is also easy to localize to a specific section of the text. Look at the first two sentences of his writing, in the first and second draft.

Today I got up. I really didn’t want to get up, but I had to.

Today I got up. Rather, I felt like only sheer will power hoisted me up.

This revision improved the writing. It’s also not obvious that students would have known to target that second sentence if they were given a generic prompt like “add more detail.” With automated tools, though, we can do this targeting. We don’t just need to assess which piece of advice to give an essay – we also need to decide where it goes. “Add more detail here” gets results that you can’t get from a blanket statement about a text as a whole.

Machine learning with LightSide can do this. Our algorithms use features that are localized to sentences and phrases and words. We know which sections of a text are pushing essays towards quality. The formative feedback that can be generated automatically is coming from exact points within a text. The second sentence of the first draft above has weak word choice and little content – it’s exactly the place that an automated system might recognize that detail is lacking. We can point the student in the right direction and get them thinking about revising the portions of their writing that need the most help.
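A toy version of that localization might score each sentence separately and surface the sparsest one. The content-word heuristic and stopword list here are invented stand-ins for the kind of trained, localized features described above:

```python
import re

def sentence_detail_score(sentence):
    """Score one sentence by its count of distinct content words; low scores
    suggest weak word choice and little content. Heuristic, for illustration."""
    stopwords = {"i", "to", "the", "a", "but", "had", "was", "it"}
    words = re.findall(r"[A-Za-z']+", sentence.lower())
    return len({w for w in words if w not in stopwords})

def weakest_sentence(text):
    """Return the sentence most in need of 'add more detail here' feedback."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    return min(sentences, key=sentence_detail_score)

draft = "Today I got up. I really didn't want to get up, but I had to."
print(weakest_sentence(draft))  # -> "Today I got up"
```

Because the score is computed per sentence, the feedback attaches to an exact point in the draft instead of the essay as a whole.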

How do we measure success from machine learning?

LightSide’s automated formative feedback will be a success every time a student chooses to write more because of the intervention from our automated tool. Machine learning scores a victory every time it intervenes in a case where a student would miss out on help because a teacher is too busy.

This is a start. Artificial intelligence is a long way from being useful in every aspect of the writing process. We’re woefully behind on detecting specific errors in grammar, usage, and mechanics – nowhere near human accuracy. External context and inference are hard. What machine learning excels at, though, is recognizing the quality of sentence structure, the organizational cues that make up well-structured writing, and the style that goes along with layered, complex writing. What I would encourage skeptics to do is go back and look at the feedback that they’re giving on these elements of writing. How much of it needs deeper, inferential meaning? In contrast, how much is local to the way a sentence, in isolation, has been built up and organized? My bet is that there are elements of feedback that fit into the latter category. That is where we can use machine learning effectively for automated feedback.

Get involved.

LightSide’s formative feedback platform is still in progress; this type of feedback isn’t ready for students yet. But you can still get involved in the process.

First, leave a comment or send me an email to tell me what I missed in this post. I’m not an English teacher – I guarantee there are crucial details I overlooked when writing it. The field can’t move forward without constructive dialogue between people like me – technical researchers – and the people who are helping students daily.

Next, sign up for our mailing list and keep checking back at this blog to see where our thinking is at on a week-to-week basis. If our dialogue with teachers is any use at all, hopefully what we’re offering will evolve to fit into real writing in the classroom by the time we open up our platform to new teachers.

Finally, and most importantly, talk about automated feedback with your friends and colleagues. Don’t dismiss the field out of hand – look for places where teachers are overworked today. Look for the places where students are slipping through the cracks. Ask whether automated feedback has any hope to work for those students.

Forget high-stakes exams – that’s an application of this technology, but it’s a dull genre of writing and dangerously easy to misuse without careful thought. Feedback to students throughout the writing process, though? That’s exciting.

September 11, 2013
  • sjgknight says:

    Hi Elijah,
    Seems very sensible to me.

    An idea off the top of my head, one thing that might be worth thinking about is the ‘success’ measure above: A student going back and writing more.
    That’s probably a good success measure in many cases, but a feedback loop to check the quality of the additions/changes might be important – what if the prompt actually makes the text worse, doesn’t improve it, or the alterations make some aspects worse and some better (i.e. mixed success)? Being able to distinguish between students ‘a’, ‘b’, and ‘c’ – one of whom pays lip service to the feedback, another who uses some feedback but not the rest, and another who addresses all aspects of the feedback (although perhaps with varying success) – also seems important to me. So one implication is that some prompts have the potential to make things worse (!), so having a penalty built into the machine learning/success measure is probably wise; but also, ‘feedback’ is iterative.

    With more of my ‘ex-teacher’ hat on, another thing that’s worth thinking about is whether this approach could be used to support students’ self-assessment. I want my students to understand the assessment criteria, so being able to spot areas of weakness in their own writing (and then receiving some automated or semi-automated feedback on that) would be a really useful tool to support that. I think the Pearson WriteToLearn (http://www.writetolearn.net/demonstration.php) tool has been used in that way before. Might also provide usable data to LightSide, I guess?

  • Laura Gibbs says:

    Hi Elijah! You know I’m going to disagree about some things here, but first off I have to say THANK YOU for conducting the discussion in a friendly space and inviting comments.

    So, my comments:

    1. There are lots of ways to get students writing more without automated feedback. What about peer feedback? What about self-assessment and revision? (I’m not big on rubrics for grading, but I am very big on rubrics for helping students learn to assess their own writing – and that can work with or without pricey tools from educational publishers like Pearson, as the previous commenter mentioned). So, regardless of the machine’s abilities (see next comment), there are many options besides a teacher or a machine to promote more student writing – a goal you and I both agree on, I know. Peer feedback in particular is hugely motivating for my students; I read their blogs at random, but I make sure I have mechanisms in place so that all the blog posts do get comments from other students… and I sure wish our course management system automated that basically mechanistic process, but it is something I have to manage manually. Ugh.

    2. I am dubious about the claims that machines can accurately detect simple sentences as a problem and lack of detail. Machines can detect surface features of a sentence – but simple is a matter of style; some forms of simplicity are great and very well-chosen, but others are not. The computer cannot know. Likewise for details: computers can recognize obscure vocabulary, but that just leads down the road to thesaurus-driven writing (I still occasionally see students with the high school habit of substituting words in their writing from a thesaurus… and it’s obvious, because it sounds so bad). Chasing obscure vocabulary surely cannot be a goal in and of itself, right? If anything, some students need to learn how to avoid obscure vocabulary, zombified abstractions, etc.

    Anyway, I’m in a rush so those are two thoughts for now, and thanks again for carrying on a discussion like this. I will read your future posts and other comments with interest.

    • Not a fan of machine feedback either, but some teachers may have no choice, so I’m thinking about how it could be used… damage control, as it were. Bottom line: can I come up with an idea that will pass the +Laura Gibbs test? Or my own, as I’m even suspicious of rubric overuse by humans as too formulaic and a step down the Taylorization road to perdition.

      What about coordinating machine feedback on informal writing with peer review? Plus, student writers get to review the feedback. I’d tolerate a simple Elbow-based rubric for that.

  • elijah says:

    Simon, I don’t think anyone really knows the effect of high-quality automated formative feedback on the revision process. Until we start seeing data from our writing platform where we can have humans look over the series of drafts that come out of an iterative feedback loop, we won’t know what the resulting impact on revision is. I agree, though, that sometimes a revision will make writing worse.

    Both of you make excellent points about capabilities that we haven’t focused on yet – scaffolds outside the individual revision-and-feedback loop, like self-reflection and peer review. Those are important, both in their own right and, most of all, because they bear on the issue of motivation. Students need to want to write with LightSide, or it’s not going to hold their attention. Motivation is something I studied at Carnegie Mellon in the context of collaborative group learning, but its application to the writing process specifically is not my expertise.

    Automated assessment is not the complete solution. If at any point I claim that writing education will be “solved” through purely automated feedback, I’ll have misspoken. What I want people to believe, if my message is successful, is that the feedback students can get from machine learning-based tools is positive, and that it’s a completely natural and beneficial thing to include in the set of tools that teachers are using to help students.

    As for Laura’s last comments, this is something that I will absolutely be digging into in another post, and it’s something that I’ve hammered on before. LightSide only learns to reproduce what it sees from human graders. LightSide will not reward obscure vocabulary items unless it sees human graders doing the same thing. I would argue, then, that if teachers are rewarding a particular attribute of writing, and LightSide chooses not to do the same thing, then it is the automated system at fault for not matching instructor expectations. If there is no evidence from teacher behavior that obscure vocabulary ought to be rewarded, then it is as simple as that – LightSide will not spontaneously decide to start rewarding a GRE lexicon.

    Laura, another thing that I’m excited to write about in the future is the role of genre, as an element of composition, in the feedback process. Over the course of the summer, my conversations with composition instructors have given me a vocabulary for the types of feedback that I believe machine learning can do well. Mechanics feedback is nearly a lost cause – but I believe that completely automated help in building a student’s genre and audience awareness is very attainable with current technology. Hopefully I can elaborate on this in future posts, far more frequently than the weeks and months that have separated the external blog posts I’ve written.

  • Laura Gibbs says:

    That is very intriguing stuff, Elijah – the students I work with (college students) actually do pretty well with genre and audience, and what they really are struggling with is mostly writing mechanics. But I am interested in genre and the categories you want to invoke – so please do keep us posted. I would be really curious what your genre and audience mapping looks like!

  • […] guest blogger and friend of e-Literate Elijah Mayfield has another great post up on using machine learning tools in the service of improving student writing over at his company […]

  • Mark Limbach says:

    Great post Elijah. The number 1 objective is to get students to practice writing. Whether by automated grading or peer review, tools that allow instructors to assign writing tasks without burdening themselves are essential today in classrooms, brick-and-mortar or online. And not enough is said regarding the benefit of iterative, scaffolded writing projects. Revising, resubmitting, and inviting new rounds of feedback don’t just build writing skills more rapidly, they are the real-world kinds of writing/collaboration projects these students will see in their future workplaces.
