Import Web Markdown With Gather
Purpose
Use gather as the default local tool for converting a URL into readable markdown.
Recommended Defaults
Run gather with these settings unless the user asks otherwise:
gather --metadata-yaml --inline-links --no-paragraph-links "<url>"
Rationale:
- `--metadata-yaml`: Adds title/date/source in front matter for downstream indexing.
- `--inline-links`: Keeps links close to text for RAG/chunk readability.
- `--no-paragraph-links`: Avoids repeated reference blocks after each paragraph.
Required Workflow
- Validate input:
  - Accept only `http://` or `https://` URLs.
  - If the input is not a URL, ask for one.
- Run gather:
  - Primary command: `gather --metadata-yaml --inline-links --no-paragraph-links "<url>"`
  - On failure, retry with the fallback modes in order:
    - First fallback: `gather --metadata-yaml --inline-links --no-paragraph-links --no-readability "<url>"`
    - If the page still fails and raw HTML is available, pass the HTML directly on stdin: `printf "%s" "$HTML" | gather --html --stdin --metadata-yaml --inline-links --no-paragraph-links`
- Return the markdown text as the main result.
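The workflow steps can be sketched as a small set of bash functions. This is a minimal sketch, not part of gather: the function names (`is_http_url`, `try_gather`, `import_url`) and the globals `MARKDOWN` and `USED_FALLBACK` are hypothetical, and it assumes a failed extraction shows up as a nonzero exit status or empty output. Raw HTML for the last fallback is read from `$HTML`, matching the command in the workflow.

```bash
# Sketch of the Required Workflow (bash). Assumes gather is on PATH and that a
# failed extraction means a nonzero exit status or empty output.

# Recommended defaults, shared by every attempt.
GATHER_FLAGS=(--metadata-yaml --inline-links --no-paragraph-links)

# Step 1: accept only http:// or https:// URLs.
is_http_url() {
  local re='^https?://[^[:space:]]+$'
  [[ "$1" =~ $re ]]
}

# Run gather with the given arguments (stdin is passed through).
# Succeeds only if gather exits 0 and prints something; output goes to MARKDOWN.
try_gather() {
  MARKDOWN="$(gather "$@")" && [[ -n "$MARKDOWN" ]]
}

# Steps 2-3: primary command, then the fallbacks in order.
# Sets MARKDOWN and USED_FALLBACK (true/false); returns 2 for a bad URL.
import_url() {
  local url="$1"
  if ! is_http_url "$url"; then
    echo "error: expected an http:// or https:// URL, got: $url" >&2
    return 2
  fi

  USED_FALLBACK=false
  if try_gather "${GATHER_FLAGS[@]}" "$url"; then
    return 0
  fi

  USED_FALLBACK=true
  # First fallback: skip readability extraction.
  if try_gather "${GATHER_FLAGS[@]}" --no-readability "$url"; then
    return 0
  fi

  # Second fallback: raw HTML supplied by the caller in $HTML, fed on stdin.
  if [[ -n "${HTML:-}" ]] &&
     try_gather --html --stdin "${GATHER_FLAGS[@]}" < <(printf '%s' "$HTML"); then
    return 0
  fi

  echo "error: gather could not extract $url" >&2
  return 1
}
```

For example, `import_url "https://example.com/article" && printf '%s\n' "$MARKDOWN"` prints the result, and `$USED_FALLBACK` records whether either fallback path was taken.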
Output Contract
When successful, return:
- `url`: original URL
- `title`: extracted title, when available
- `markdown`: full markdown body
- `used_fallback`: `true` if the `--no-readability` or `--html` path was used
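One way to assemble the contract is to read the title back out of the markdown that `--metadata-yaml` produces. This sketch assumes the front matter sits between the first two `---` lines and carries a `title:` key (verify against real gather output); the helper names and the YAML layout of the record are hypothetical, since the contract does not fix a serialization format.

```bash
# Sketch: build the output contract from gather's markdown (bash + awk + sed).
# Assumes YAML front matter between the first two "---" lines with a "title:" key.

# Print the front-matter title, if any (surrounding double quotes removed).
front_matter_title() {
  awk '
    NR == 1 && $0 != "---" { exit }   # no front matter
    NR > 1 && $0 == "---"  { exit }   # end of front matter, no title found
    NR > 1 && /^title:/ {
      sub(/^title:[[:space:]]*/, "")
      gsub(/^"|"$/, "")
      print
      exit
    }
  '
}

# Quote a single-line value as a YAML double-quoted scalar.
yaml_quote() {
  printf '"%s"' "$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')"
}

# Print the contract as YAML. Arguments: url, markdown, used_fallback (true/false).
emit_result() {
  local url="$1" markdown="$2" used_fallback="$3" title
  title="$(printf '%s\n' "$markdown" | front_matter_title)"
  printf 'url: %s\n' "$(yaml_quote "$url")"
  # The title is optional: omit the field when none was found.
  if [[ -n "$title" ]]; then
    printf 'title: %s\n' "$(yaml_quote "$title")"
  fi
  printf 'used_fallback: %s\n' "$used_fallback"
  # Literal block scalar: every markdown line indented by two spaces.
  printf 'markdown: |\n'
  printf '%s\n' "$markdown" | sed 's/^/  /'
}
```

For example, `emit_result "$url" "$markdown" true` prints the record for an extraction that needed a fallback.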
Safety And Limits
- Do not execute JavaScript from pages.
- Do not follow login-only pages automatically.
- Preserve the original URL in output metadata.
- If output is empty or too short, report a partial extraction warning.
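A minimal sketch of the length check follows. What counts as "too short" is not defined here, so the 50-word floor (`MIN_BODY_WORDS`) is a hypothetical default, and words are counted only after stripping YAML front matter so that metadata alone cannot pass. The function names are also hypothetical.

```bash
# Sketch of the partial-extraction check (bash + awk + wc).
# MIN_BODY_WORDS is a hypothetical threshold; tune it for your sources.
MIN_BODY_WORDS="${MIN_BODY_WORDS:-50}"

# Drop YAML front matter (a leading block fenced by "---" lines) from stdin.
strip_front_matter() {
  awk '
    NR == 1 && $0 == "---" { in_fm = 1; next }
    in_fm && $0 == "---"   { in_fm = 0; next }
    !in_fm
  '
}

# Read markdown on stdin; warn on stderr and return 1 if the body is empty
# or shorter than MIN_BODY_WORDS words.
check_extraction() {
  local words
  words="$(strip_front_matter | wc -w | tr -d '[:space:]')"
  if (( words == 0 )); then
    echo "warning: partial extraction (empty body)" >&2
    return 1
  elif (( words < MIN_BODY_WORDS )); then
    echo "warning: partial extraction ($words words, expected at least $MIN_BODY_WORDS)" >&2
    return 1
  fi
  return 0
}
```

Pipe the markdown into `check_extraction`; a nonzero status means the warning should be reported alongside whatever was extracted.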
Examples
Basic import:
gather --metadata-yaml --inline-links --no-paragraph-links "https://example.com/article"
Fallback when readability extraction fails:
gather --metadata-yaml --inline-links --no-paragraph-links --no-readability "https://example.com/article"
Optional Variants
- Output only the page title: `gather --title-only "<url>"`
- Plain body without source/title injection: `gather --no-include-source --no-include-title "<url>"`