
homework-grader

Rubric-driven AI homework grading system. Grade homework, score student submissions, evaluate assignments against rubrics, batch-process hundreds of papers, generate personalized feedback comments, calibrate AI scoring against teacher standards, export grades to Excel. Supports text, image, and mixed modality submissions. Built-in PDCA quality cycle with bias mitigation.

Author: ChantillyAn
License: MIT

Homework Grader

A course-agnostic, Rubric-driven evaluation engine for grading student homework with Claude. All course-specific knowledge lives in user-defined Rubric YAML files; this Skill provides the scoring methodology, quality control framework, and batch processing pipeline.


When to Activate

Activate this Skill when the user:

  • Asks to grade, score, or evaluate student homework or assignments
  • Wants to create a rubric or scoring criteria for coursework
  • Needs to batch-process a set of student submissions
  • Asks about calibrating AI scoring against teacher standards
  • Wants to export grades to Excel or generate grade reports
  • Mentions PDCA, quality control, or bias checking in grading context
  • References homework, assignments, submissions, or coursework evaluation

Keywords: grade homework, score assignments, rubric, evaluate student work, batch grading, calibrate scoring, export grades, feedback comments, PDCA cycle


Core Concepts

Rubric-Driven Design

Every scoring decision traces back to a Rubric YAML file that defines:

  • Criteria with weights, 1-5 anchors, and evidence types
  • Gates for pre-scoring validation (keyword, structure, length, custom)
  • Thresholds for accept/review/reject classification
  • Comment guidelines for feedback language, tone, and structure

The Skill never invents criteria. If the Rubric doesn't define it, it doesn't get scored.

Direct Scoring Method

Each submission is scored independently against absolute standards (not compared to peers). This is the correct method when objective criteria exist — which Rubrics provide by definition.

  • Scale: 1-5 Likert (integer scores per dimension)
  • Process: Evidence → Reasoning → Score (never reversed)
  • Aggregation: Weighted sum across dimensions
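A minimal Python sketch of the weighted-sum aggregation, assuming each submission's dimension scores arrive as a dict keyed by the Rubric's criterion IDs. The function name `weighted_score` and the dict shapes are illustrative, not part of the Skill.

```python
# Hypothetical helper (not from the Skill's codebase): combine integer 1-5
# dimension scores into the weighted sum described above.
def weighted_score(scores: dict[str, int], weights: dict[str, float]) -> float:
    """Return sum(weight * score) over the Rubric's criteria.

    Assumes `weights` already sums to 1.0 (checked during Rubric validation),
    so the result stays on the same 1-5 scale as each dimension.
    """
    missing = set(weights) - set(scores)
    extra = set(scores) - set(weights)
    if missing:
        raise ValueError(f"no score for criteria: {sorted(missing)}")
    if extra:
        # The Skill never scores criteria the Rubric does not define.
        raise ValueError(f"scores for undefined criteria: {sorted(extra)}")
    for cid, s in scores.items():
        # Direct scoring uses integer Likert points only.
        if type(s) is not int or not 1 <= s <= 5:
            raise ValueError(f"{cid}: expected an integer 1-5, got {s!r}")
    return sum(weights[cid] * scores[cid] for cid in weights)


weights = {"argument": 0.5, "evidence": 0.3, "style": 0.2}
print(round(weighted_score({"argument": 4, "evidence": 3, "style": 5}, weights), 2))  # 3.9
```

Rejecting scores for undefined criteria mirrors the rule that anything the Rubric does not define is not scored.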

PDCA Quality Cycle

Every grading batch follows Plan → Do → Check → Act:

  • Plan: Define/validate Rubric, prepare calibration samples
  • Do: Preprocess submissions, run AI scoring, generate comments
  • Check: Calibrate against teacher scores, check distributions, detect bias
  • Act: Human review of flagged items, refine Rubric for next round

Multimodal Support

Submissions are preprocessed into a unified Intermediate Representation (IR) before scoring. Supported modalities:

  • Text (P0): docx, pdf → Markdown text
  • Image (P1): jpg, png → Claude Vision structured descriptions
  • Video (V2): mp4 → keyframes + transcript (future)
  • Mixed: Combination of above
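A minimal sketch of how submission files might be routed to a modality before the IR is built. `SubmissionIR`, `detect_modality`, and `build_ir` are hypothetical names, and the actual conversion steps (docx/pdf to Markdown, Claude Vision descriptions) are deliberately left out.

```python
from dataclasses import dataclass, field
from pathlib import Path

# Extension -> modality, taken from the list above. Video (V2) is future work.
EXT_TO_MODALITY = {
    ".docx": "text", ".pdf": "text",
    ".jpg": "image", ".png": "image",
    ".mp4": "video",
}


@dataclass
class SubmissionIR:
    """Hypothetical IR envelope; content extraction would fill more fields."""
    student_id: str
    modality: str  # "text" | "image" | "video" | "mixed"
    files: list[str] = field(default_factory=list)


def detect_modality(paths: list[str]) -> str:
    modalities = set()
    for p in paths:
        ext = Path(p).suffix.lower()
        if ext not in EXT_TO_MODALITY:
            raise ValueError(f"unsupported format: {p}")
        modalities.add(EXT_TO_MODALITY[ext])
    if not modalities:
        raise ValueError("submission contains no files")
    if "video" in modalities:
        raise NotImplementedError("video submissions are planned for V2")
    # One modality -> that modality; several -> "mixed".
    return modalities.pop() if len(modalities) == 1 else "mixed"


def build_ir(student_id: str, paths: list[str]) -> SubmissionIR:
    return SubmissionIR(student_id, detect_modality(paths), [str(p) for p in paths])
```

For example, `build_ir("s001", ["essay.docx", "figure.png"])` yields an IR with `modality="mixed"`.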

PDCA Workflow

Phase 1: Plan

Goal: Establish scoring standards and validation baseline.

| Step | Action | Output | Exit Criterion |
|---|---|---|---|
| 1.1 | Define or load Rubric YAML | rubric.yaml | Passes schema validation |
| 1.2 | Validate Rubric | Validation report | Weights sum to 1.0, anchors complete, gates well-formed |
| 1.3 | Prepare calibration samples | 3-5 teacher-scored samples | Cover good/medium/poor range |
| 1.4 | Configure batch parameters | Processing config | Submission format, batch size, mode |

Exit: Rubric validated + calibration samples ready + teacher confirms.

Failure: Invalid Rubric → fix and re-validate. No calibration samples → teacher must provide at least 3 before proceeding to Do phase.

Phase 2: Do

Goal: Process all submissions and produce AI scores.

| Step | Action | Output | Exit Criterion |
|---|---|---|---|
| 2.1 | Collect submissions | workspace/raw/ | All files present and readable |
| 2.2 | Preprocess → IR | workspace/ir/ | Each submission has valid IR JSON |
| 2.3 | Run gate checks | Gate results in IR | All gates executed, failures recorded |
| 2.4 | Score each submission | workspace/scores/ | Each has dimension scores + comment |
| 2.5 | Generate comments | Comments in score records | 200-400 chars, three sections |
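A sketch of the step 2.5 exit check, assuming each comment is stored as a dict keyed by section name and checked against the `comment_guidelines` defaults (200-400 characters; strengths, weaknesses, suggestions). `validate_comment` and the dict layout are assumptions; the Skill's real score-record format is not shown here.

```python
import re


# Hypothetical comment format: a dict keyed by section name.
def validate_comment(
    comment: dict[str, str],
    length_range: tuple[int, int] = (200, 400),
    required_sections: tuple[str, ...] = ("strengths", "weaknesses", "suggestions"),
    prohibited_patterns: tuple[str, ...] = (),
) -> list[str]:
    """Return a list of problems; an empty list means the comment passes."""
    problems = []
    for section in required_sections:
        if not comment.get(section, "").strip():
            problems.append(f"missing section: {section}")
    # len() counts Unicode code points, roughly one per character in zh-CN prose.
    total = sum(len(comment.get(s, "")) for s in required_sections)
    lo, hi = length_range
    if not lo <= total <= hi:
        problems.append(f"length {total} outside [{lo}, {hi}]")
    # prohibited_patterns are treated as regular expressions (an assumption).
    for pattern in prohibited_patterns:
        for section, text in comment.items():
            if re.search(pattern, text):
                problems.append(f"prohibited pattern {pattern!r} in {section}")
    return problems
```

A comment that fails this check could be regenerated or flagged, in line with the parse-error rule below.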

Exit: All submissions scored (or failed items logged).

Failure: API errors → retry with exponential backoff (max 3). File corruption → log and skip. Parse errors → retry up to 2 times, then flag for manual.
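One way to implement the retry rule for API errors, sketched in Python. `TransientAPIError` is a hypothetical placeholder for whatever retryable errors the API client raises; reading "max 3" as three retries after the first call, and the random jitter, are assumptions rather than details from the source.

```python
import random
import time


class TransientAPIError(Exception):
    """Hypothetical stand-in for a retryable failure (rate limit, timeout, 5xx)."""


def call_with_backoff(fn, *, max_retries=3, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call fn(); on TransientAPIError, wait and retry up to max_retries times.

    Delays double each time (1 s, 2 s, 4 s with the defaults) and are scaled
    by random jitter so parallel workers do not retry in lockstep.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except TransientAPIError:
            if attempt == max_retries:
                raise  # out of retries: let the caller log the failure
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))
```

The same helper with `max_retries=2` and a parse-error exception type would cover the "retry up to 2 times, then flag" rule; the caller would catch the final exception and mark the item for manual review. The injectable `sleep` keeps the helper testable without real waits.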

Phase 3: Check

Goal: Validate AI scoring quality.

| Step | Action | Threshold | On Failure |
|---|---|---|---|
| 3.1 | Calibration: AI vs teacher on samples | κ ≥ 0.70, ρ ≥ 0.80 per dimension | → Back to Plan: adjust anchors |
| 3.2 | Distribution check | \|skewness\| < 1.0, no >40% concentration | → Spot-check extreme scores |
| 3.3 | Bias detection | Length-score \|ρ\| < 0.3, position-score \|ρ\| < 0.2 | → Adjust prompts, re-score |
| 3.4 | Confidence filtering | ≤20% mandatory review (conf < 0.6) | → Review flagged items |
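The source gives κ and ρ thresholds but does not say which kappa variant is used. This sketch assumes quadratic-weighted Cohen's kappa (a common choice for ordinal 1-5 scales; plain kappa is available via `weighting="none"`) and Spearman's ρ with average ranks for ties. All function names are hypothetical.

```python
def cohen_kappa(a, b, categories=(1, 2, 3, 4, 5), weighting="quadratic"):
    """Cohen's kappa between two raters (AI, teacher) on the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty score lists")
    k, n = len(categories), len(a)
    idx = {c: i for i, c in enumerate(categories)}
    observed = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[idx[x]][idx[y]] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def w(i, j):  # disagreement weight: 0 on the diagonal
        if weighting == "quadratic":
            return (i - j) ** 2 / (k - 1) ** 2
        return 0.0 if i == j else 1.0

    num = sum(w(i, j) * observed[i][j] for i in range(k) for j in range(k))
    den = sum(w(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if den == 0:
        raise ValueError("kappa undefined: both raters used the same single category")
    return 1 - num / den


def _ranks(xs):
    """1-based ranks, averaging over ties."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank vectors."""
    rx, ry = _ranks(x), _ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("rho undefined for constant scores")
    return sxy / (sxx * syy) ** 0.5


def calibration_passes(ai, teacher, kappa_min=0.70, rho_min=0.80):
    """Per-dimension pass/fail for step 3.1; inputs map dimension -> score list."""
    return {
        dim: cohen_kappa(ai[dim], teacher[dim]) >= kappa_min
        and spearman(ai[dim], teacher[dim]) >= rho_min
        for dim in teacher
    }
```

With only 3-5 calibration samples both statistics are noisy, so a failing dimension is a prompt to inspect its anchors rather than proof that they are wrong.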

Exit: All checks pass, or teacher accepts results after reviewing issues.

Failure: κ < 0.70 → return to Plan phase, revise Rubric anchors. Significant bias → adjust scoring prompts and re-run Do phase.
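A sketch of the distribution (3.2) and confidence (3.4) checks. It assumes the population skewness coefficient g1 (the source does not name an estimator), reads "no >40% concentration" as no single integer score value covering more than 40% of submissions, and expects each record to carry a hypothetical `confidence` field in [0, 1].

```python
from collections import Counter


def skewness(xs: list[float]) -> float:
    """Population (Fisher-Pearson) skewness g1 = m3 / m2**1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5


def distribution_check(scores: list[int], max_abs_skew=1.0, max_share=0.40) -> list[str]:
    """Step 3.2: flag heavy skew or one score value dominating the batch."""
    issues = []
    g1 = skewness(scores)
    if abs(g1) >= max_abs_skew:
        issues.append(f"skewness {g1:.2f} is not within ±{max_abs_skew}")
    value, count = Counter(scores).most_common(1)[0]
    share = count / len(scores)
    if share > max_share:
        issues.append(f"score {value} holds {share:.0%} of submissions")
    return issues


def confidence_filter(records: list[dict], conf_min=0.6, max_review_share=0.20):
    """Step 3.4: items needing mandatory review, and whether their share is within the limit."""
    flagged = [r for r in records if r["confidence"] < conf_min]
    return flagged, len(flagged) / len(records) <= max_review_share
```

Any returned issue would trigger the spot-check or review action listed in the table for that step.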

Phase 4: Act

Goal: Finalize grades and capture lessons.

| Step | Action | Output |
|---|---|---|
| 4.1 | Human review of flagged items | Corrected scores |
| 4.2 | Export to Excel | Grade spreadsheet |
| 4.3 | Record Rubric adjustments (if any) | Updated Rubric version |
| 4.4 | Log lessons learned | Improvement log for next cycle |

Exit: Final grades exported + Rubric version updated if changed.
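Step 4.2 targets Excel. Writing a native .xlsx file needs a third-party library such as openpyxl, so this stdlib-only sketch swaps in CSV, which Excel opens directly. The record fields (`scores`, `weighted_score`, `decision`, `comment`) and the function name are assumptions, not taken from the Skill.

```python
import csv
import io


def export_grades_csv(records: list[dict], criteria: list[str], stream) -> None:
    """Write one row per student: dimension scores, weighted score, decision, comment."""
    fieldnames = ["student_id", *criteria, "weighted_score", "decision", "comment"]
    writer = csv.DictWriter(stream, fieldnames=fieldnames)
    writer.writeheader()
    for rec in records:
        row = {
            "student_id": rec["student_id"],
            "weighted_score": f"{rec['weighted_score']:.2f}",
            "decision": rec["decision"],
            "comment": rec["comment"],
        }
        row.update({c: rec["scores"][c] for c in criteria})
        writer.writerow(row)


buf = io.StringIO()
export_grades_csv(
    [{"student_id": "s001", "scores": {"argument": 4, "style": 3},
      "weighted_score": 3.6, "decision": "accept", "comment": "Clear thesis"}],
    ["argument", "style"],
    buf,
)
print(buf.getvalue())
```

To write a file that Excel reads correctly with non-ASCII comments (for example zh-CN), open it with `open("grades.csv", "w", newline="", encoding="utf-8-sig")`; the BOM helps Excel detect UTF-8.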


Rubric Schema

A Rubric is a YAML file with the following structure. See templates/rubric.yaml.tmpl for a copy-paste template.

Required Fields

rubric:
  id: "course-assignment-v1.0"       # Unique identifier
  name: "Human-readable name"
  version: 1.0

  criteria:
    criterion_id:
      name: "Dimension Name"
      weight: 0.30                    # All weights MUST sum to 1.0
      scale: [1, 2, 3, 4, 5]
      description: "What this measures"
      scoring_guidance: "How to evaluate"
      anchors:
        5: "Excellent — observable criteria"
        4: "Good — observable criteria"
        3: "Adequate — observable criteria"
        2: "Below average — observable criteria"
        1: "Poor — observable criteria"
      evidence_type: quote            # quote | observation | metric

  thresholds:
    accept: 3.0
    reject: 1.5
    review: [1.5, 3.0]               # Must equal [reject, accept]
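The schema fixes the threshold values but not which class a boundary score belongs to. A minimal sketch of the accept/review/reject mapping, assuming a score equal to `accept` is accepted and a score equal to `reject` goes to review; `classify` is a hypothetical name.

```python
def classify(score: float, accept: float = 3.0, reject: float = 1.5) -> str:
    """Map a weighted 1-5 score to accept / review / reject.

    Boundary handling is an assumption: score >= accept -> "accept",
    score < reject -> "reject", everything in [reject, accept) -> "review".
    """
    if accept <= reject:
        raise ValueError("accept must be greater than reject")
    if score >= accept:
        return "accept"
    if score < reject:
        return "reject"
    return "review"
```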

Optional Fields

Tip: The templates/rubric.yaml.tmpl template includes additional optional fields (created, updated, author, course.code, course.semester, gate.description, notes) not listed here. They are informational metadata — the scoring engine ignores them, but they help with Rubric management.

  course:                             # Remove entirely if not needed
    name: "Course Name"
    submission_type: text             # text | image | video | mixed
    expected_formats: [docx, pdf]
    student_count: 100

  gates:                              # Pre-scoring checks
    - id: "G-001"
      name: "Gate Name"
      check_method: keyword           # keyword | structure | length | custom
      parameters: { keywords: [...], min_count: 1 }
      on_fail: flag                   # fail | flag | warn

  comment_guidelines:
    tone: "constructive, specific"
    language: "zh-CN"
    length_range: [200, 400]
    required_sections: [strengths, weaknesses, suggestions]
    prohibited_patterns: [...]

  history:
    - version: 1.0
      date: "2026-01-01"
      changes: ["Initial version"]
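A hypothetical gate runner for the `gates` field. Only the `keyword` and `length` methods are sketched; the `min_chars`/`max_chars` parameter names are invented for illustration (the schema shows only keyword parameters), `min_count` is read as the number of distinct keywords found, and the `on_fail` semantics are assumed: `fail` skips scoring, `flag` scores but queues for human review, `warn` only records a note.

```python
def run_gate(gate: dict, text: str) -> dict:
    method = gate["check_method"]
    params = gate.get("parameters", {})
    if method == "keyword":
        # Count distinct keywords present (case-insensitive), not total occurrences.
        lowered = text.lower()
        hits = sum(1 for kw in params["keywords"] if kw.lower() in lowered)
        passed = hits >= params.get("min_count", 1)
    elif method == "length":
        n = len(text)
        passed = params.get("min_chars", 0) <= n <= params.get("max_chars", float("inf"))
    else:
        raise NotImplementedError(f"check_method {method!r} is not sketched here")
    return {"id": gate["id"], "passed": passed, "action": None if passed else gate["on_fail"]}


def run_gates(gates: list[dict], text: str) -> tuple[list[dict], bool]:
    """Run every gate (failures are recorded, not short-circuited)."""
    results = [run_gate(g, text) for g in gates]
    should_score = not any(r["action"] == "fail" for r in results)
    return results, should_score
```

Running every gate before deciding matches the step 2.3 exit criterion: all gates executed, failures recorded.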

Validation Rules

| Rule | Check |
|---|---|
| Weights | sum(criteria.*.weight) = 1.0 (±0.001) |
| Anchors | Every value in scale has an anchor description |
| Thresholds | accept > reject; review = [reject, accept] |
| Gate IDs | Unique within the Rubric |
| Gate on_fail | One of: fail, flag, warn |
| evidence_type | One of: quote, observation, metric |
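A sketch of these validation rules as a Python function over the Rubric after YAML parsing (for example with PyYAML's `yaml.safe_load`, which reads anchor keys such as `5:` as integers). `validate_rubric` is a hypothetical name; it collects every violation instead of stopping at the first one.

```python
import math

VALID_ON_FAIL = {"fail", "flag", "warn"}
VALID_EVIDENCE = {"quote", "observation", "metric"}


def validate_rubric(doc: dict) -> list[str]:
    """Check a loaded Rubric against the validation rules; return error strings."""
    r = doc["rubric"]
    errors = []

    criteria = r.get("criteria", {})
    total = sum(c["weight"] for c in criteria.values())
    if not math.isclose(total, 1.0, abs_tol=0.001):
        errors.append(f"weights sum to {total:.3f}, expected 1.0 (±0.001)")

    for cid, c in criteria.items():
        anchors = c.get("anchors", {})
        missing = [v for v in c.get("scale", []) if v not in anchors]
        if missing:
            errors.append(f"{cid}: no anchor for scale values {missing}")
        if c.get("evidence_type") not in VALID_EVIDENCE:
            errors.append(f"{cid}: invalid evidence_type {c.get('evidence_type')!r}")

    t = r["thresholds"]
    if not t["accept"] > t["reject"]:
        errors.append("thresholds: accept must be greater than reject")
    if list(t["review"]) != [t["reject"], t["accept"]]:
        errors.append("thresholds: review must equal [reject, accept]")

    gates = r.get("gates", [])
    ids = [g["id"] for g in gates]
    dupes = sorted({i for i in ids if ids.count(i) > 1})
    if dupes:
        errors.append(f"duplicate gate IDs: {dupes}")
    for g in gates:
        if g.get("on_fail") not in VALID_ON_FAIL:
            errors.append(f"gate {g['id']}: invalid on_fail {g.get('on_fail')!r}")
    return errors
```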

Scoring Protocol

This is the complete protocol for scoring a single submission. Claude executes this directly — no external scrip

How to Add

/plugin marketplace add ChantillyAn/homework-grader

The exact command may vary by repository. Check the README on GitHub.
