
paper2excel


Batch-read PDF papers from a folder, summarize each paper into fixed fields, and export all summaries into one XLSX file. Use when the user gives a folder path and asks to batch summarize PDF papers and write the results to Excel, with title, publish+time, and keywords in English and the analytical summary fields in concise Chinese.

Author: WindZh03 · License: MIT

Paper2Excel

Summarize every PDF paper in one folder into a fixed schema, then export one single-sheet .xlsx file. Keep summaries short, comparable, and strictly based on information inside the PDF.

Workflow

  1. Confirm the input is one folder containing text-based .pdf files.
  2. Run python3 scripts/paper2excel.py check-deps before the first use.
  3. If dependencies are missing, run python3 scripts/paper2excel.py install-deps --target /tmp/paper2excel_deps, then invoke later commands with PYTHONPATH=/tmp/paper2excel_deps.
  4. Run python3 scripts/paper2excel.py extract <folder> --output <extracted.json> to collect file names and extracted text.
  5. Read the generated JSON and summarize papers one by one with the schema in this skill.
  6. Save the structured rows as JSON.
  7. Run python3 scripts/paper2excel.py write-xlsx <rows.json> --output <paper_summaries.xlsx> to generate the workbook.

Process only the current folder by default. Do not recurse into subdirectories unless the user explicitly asks for it.
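The non-recursive default can be sketched in a few lines of Python. This is only an illustration of the rule above, not the code in scripts/paper2excel.py; the function name list_pdfs and its recursive flag are hypothetical.

```python
from pathlib import Path


def list_pdfs(folder, recursive=False):
    """Return sorted .pdf paths in folder; descend into subfolders only on request.

    Hypothetical helper for illustration, not part of paper2excel.py.
    """
    root = Path(folder)
    if not root.is_dir():
        raise NotADirectoryError(f"not a folder: {root}")
    # iterdir() stays in the given folder; rglob("*") walks subdirectories too.
    candidates = root.rglob("*") if recursive else root.iterdir()
    # Compare the extension case-insensitively so "Paper.PDF" is not skipped.
    return sorted(p for p in candidates if p.is_file() and p.suffix.lower() == ".pdf")
```

Calling list_pdfs("/path/to/papers") would return only the PDFs directly inside that folder; passing recursive=True corresponds to the case where the user explicitly asks for subdirectories.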

Output Schema

Create one row per paper with exactly these fields:

  • title
  • publish+time
  • keywords
  • 研究现状
  • motivation
  • insight
  • method
  • 实验结论
  • limitation
  • other

You may also keep source_file in the intermediate JSON for traceability, but the final Excel should prioritize the fields above unless the user asks for extra columns.
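A minimal sketch of the schema as data may help keep rows uniform. The field names come from the list above; the helper names make_row and final_columns are hypothetical, and the actual script may represent rows differently.

```python
# Field names copied from the schema above, in the order listed.
FIELDS = [
    "title", "publish+time", "keywords", "研究现状", "motivation",
    "insight", "method", "实验结论", "limitation", "other",
]


def make_row(values, source_file=None):
    """Build one row with every schema field present, defaulting to "".

    Hypothetical helper: rejects keys that are not in the schema.
    """
    unknown = set(values) - set(FIELDS)
    if unknown:
        raise KeyError(f"unexpected fields: {sorted(unknown)}")
    row = {name: (values.get(name) or "").strip() for name in FIELDS}
    if source_file is not None:
        # Kept only in the intermediate JSON, for traceability.
        row["source_file"] = source_file
    return row


def final_columns(row):
    """Return the values that belong in the final sheet, in schema order."""
    return [row[name] for name in FIELDS]
```

Filling missing fields with empty strings keeps every row the same shape, and final_columns drops source_file so the workbook holds only the ten schema columns by default.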

Field Rules

  • title: Use the paper title in English from the PDF.
  • publish+time: Use only information stated in the PDF. Prefer Venue Year, for example AAAI 2024. If only the venue is known, write only the venue. If only the year is known, write only the year. If neither is known, leave it empty.
  • keywords: Write exactly 3 English keywords or short phrases, separated by semicolons.
  • 研究现状: About 30 Chinese characters.
  • motivation: About 30 Chinese characters.
  • insight: About 30 Chinese characters.
  • method: About 100 Chinese characters.
  • 实验结论: About 40 Chinese characters.
  • limitation: About 30 Chinese characters.
  • other: About 40 Chinese characters. Use it for one other interesting point that does not fit naturally into the other fields.
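The field rules above lend themselves to simple automated checks. The sketch below is one possible way to express them; all function names are hypothetical, and the 50% tolerance around each "about N characters" target is an assumption, since the rules do not define how strict "about" is.

```python
import re

# Target lengths in Chinese characters, taken from the field rules above.
TARGET_LENGTHS = {
    "研究现状": 30, "motivation": 30, "insight": 30, "method": 100,
    "实验结论": 40, "limitation": 30, "other": 40,
}


def format_publish_time(venue="", year=""):
    """Join venue and year, keeping whichever parts are known."""
    parts = (str(venue).strip(), str(year).strip())
    return " ".join(part for part in parts if part)


def check_keywords(text):
    """True when text holds exactly three non-empty, semicolon-separated items."""
    parts = [p.strip() for p in text.split(";")]
    return len(parts) == 3 and all(parts)


def count_cjk(text):
    """Count CJK Unified Ideographs (U+4E00 to U+9FFF) in text."""
    return len(re.findall(r"[\u4e00-\u9fff]", text))


def length_warnings(row, tolerance=0.5):
    """List fields whose Chinese character count strays far from its target.

    The tolerance is an assumed value. Empty fields are skipped.
    """
    warnings = []
    for name, target in TARGET_LENGTHS.items():
        value = row.get(name, "")
        if not value:
            continue
        n = count_cjk(value)
        if abs(n - target) > target * tolerance:
            warnings.append(f"{name}: {n} characters, target about {target}")
    return warnings
```

For example, format_publish_time("AAAI", 2024) yields "AAAI 2024", while a missing venue or year simply drops out of the result. The length check only counts Chinese characters, so embedded English terms or digits do not inflate the total.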

Summarization Rules

  • Write title, publish+time, and keywords in English.
  • Write all other fields in concise Chinese.
  • Base every field only on the PDF itself. Do not use web search or outside knowledge.
  • Prefer empty strings over guesses when information is missing.
  • Do not copy long sentences from the paper. Compress into short, high-density statements.
  • Keep each field self-contained and avoid repeating the same point across multiple fields.
  • Treat other as a supplementary highlight, not a duplicate of insight or 实验结论.
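Two of the rules above, the language split and the ban on repeating the same point, can be partially checked by script. The sketch below is a rough heuristic with hypothetical function names; it detects Chinese text by the CJK Unified Ideographs range and only catches verbatim repeats, not paraphrases.

```python
ENGLISH_FIELDS = ("title", "publish+time", "keywords")
CHINESE_FIELDS = (
    "研究现状", "motivation", "insight", "method",
    "实验结论", "limitation", "other",
)


def has_cjk(text):
    """True if text contains at least one CJK Unified Ideograph."""
    return any("\u4e00" <= ch <= "\u9fff" for ch in text)


def language_issues(row):
    """Flag English fields containing Chinese, and Chinese fields without any."""
    issues = []
    for name in ENGLISH_FIELDS:
        if has_cjk(row.get(name, "")):
            issues.append(f"{name}: expected English, found Chinese characters")
    for name in CHINESE_FIELDS:
        value = row.get(name, "")
        # Empty values are allowed: an empty string beats a guess.
        if value and not has_cjk(value):
            issues.append(f"{name}: expected Chinese, found none")
    return issues


def repeated_values(row):
    """Return (first_field, later_field) pairs whose text is identical."""
    seen = {}
    repeats = []
    for name in CHINESE_FIELDS:
        value = row.get(name, "").strip()
        if not value:
            continue
        if value in seen:
            repeats.append((seen[value], name))
        else:
            seen[value] = name
    return repeats
```

A check like this is a safety net only. Whether a summary is faithful to the PDF and avoids overlapping points still has to be judged when writing each row.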

Extraction Guidance

  • Use the extracted text JSON as the working source.
  • Prefer the PDF metadata title when it is clean; otherwise infer the title from the first strong title-like line on the first page.
  • Use the first page and conclusion-related sections to recover title, publish+time, and 实验结论.
  • Use abstract, introduction, related work, method, experiments, and limitation/future-work passages to fill the remaining fields.
  • If extraction quality is poor for one file, keep the row conservative instead of hallucinating.
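The title rule above (metadata first, then the first title-like line) can be sketched as follows. The guidance does not define what counts as "clean" or "title-like", so every threshold and prefix in this example is an assumption, and the function names are hypothetical.

```python
import re


def is_clean_title(text):
    """Heuristic: a usable metadata title is non-trivial and not a file name."""
    t = (text or "").strip()
    if len(t) < 10 or len(t) > 300:  # assumed bounds
        return False
    if re.search(r"\.(pdf|docx?|tex|dvi)$", t, re.IGNORECASE):
        return False
    if t.lower().startswith(("microsoft word", "untitled")):
        return False
    return any(ch.isalpha() for ch in t)


def first_title_like_line(first_page_text):
    """Return the first line that looks like a title, or "" if none does."""
    skip_prefixes = ("arxiv:", "http", "doi", "proceedings", "published")
    for line in first_page_text.splitlines():
        s = " ".join(line.split())  # collapse runs of whitespace
        if len(s.split()) < 3:      # too short to be a title (assumed)
            continue
        if s.lower().startswith(skip_prefixes) or "@" in s:
            continue
        return s
    return ""


def pick_title(metadata_title, first_page_text):
    """Prefer a clean metadata title; otherwise fall back to the first page."""
    if is_clean_title(metadata_title):
        return metadata_title.strip()
    return first_title_like_line(first_page_text)
```

When neither source yields a plausible title, pick_title returns an empty string, which matches the preference for conservative rows over guessed values.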

Scripts

  • scripts/paper2excel.py check-deps: Check whether pypdf and openpyxl are available.
  • scripts/paper2excel.py install-deps: Install missing packages into a target directory such as /tmp/paper2excel_deps.
  • scripts/paper2excel.py extract: Scan one folder, extract text from each PDF, and save JSON for downstream summarization.
  • scripts/paper2excel.py write-xlsx: Convert structured JSON rows into one single-sheet .xlsx file.
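As a rough idea of what a dependency check like check-deps involves, the sketch below tests whether pypdf and openpyxl are importable without actually importing them. It uses only the standard library and is not taken from scripts/paper2excel.py; the name missing_packages is hypothetical.

```python
import importlib.util

# The two packages named in the check-deps description above.
REQUIRED = ("pypdf", "openpyxl")


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("missing: " + ", ".join(missing))
    else:
        print("all dependencies available")
```

Because find_spec consults sys.path, packages installed into a target directory become visible to this check once that directory is on PYTHONPATH.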

Example

python3 scripts/paper2excel.py check-deps
python3 scripts/paper2excel.py install-deps --target /tmp/paper2excel_deps
PYTHONPATH=/tmp/paper2excel_deps python3 scripts/paper2excel.py extract /path/to/papers --output /tmp/papers.json

After summarizing into /tmp/paper_rows.json:

PYTHONPATH=/tmp/paper2excel_deps python3 scripts/paper2excel.py write-xlsx /tmp/paper_rows.json --output /tmp/paper_summaries.xlsx
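The layout of the rows file is not spelled out here. One plausible shape, assumed for this sketch, is a JSON list with one object per paper, keyed by the schema field names. The helpers save_rows and load_rows are hypothetical, and the example row holds placeholder values only.

```python
import json
from pathlib import Path

# Placeholder row: values are illustrative, not taken from a real paper.
EXAMPLE_ROW = {
    "title": "Example Paper Title",
    "publish+time": "AAAI 2024",
    "keywords": "keyword one; keyword two; keyword three",
    "研究现状": "", "motivation": "", "insight": "", "method": "",
    "实验结论": "", "limitation": "", "other": "",
    "source_file": "example.pdf",
}


def save_rows(rows, path):
    """Write rows as UTF-8 JSON, keeping Chinese text readable in the file."""
    # ensure_ascii=False stores Chinese characters as-is instead of \uXXXX escapes.
    text = json.dumps(rows, ensure_ascii=False, indent=2)
    Path(path).write_text(text, encoding="utf-8")


def load_rows(path):
    """Read rows back and verify the assumed list-of-objects layout."""
    rows = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(rows, list) or not all(isinstance(r, dict) for r in rows):
        raise ValueError("expected a JSON list of objects")
    return rows
```

Writing with an explicit UTF-8 encoding avoids platform-dependent defaults that could corrupt the Chinese field names and summaries. Check the script's own help output or the README for the layout write-xlsx actually expects.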

How to add

/plugin marketplace add WindZh03/Paper2Excel-Skill

The exact command may vary by repository. Check the README on GitHub.
