Merge augustohp/up into main #1

Merged

marvin merged 14 commits from augustohp/up into main

2026-03-23 01:06:12 +00:00

marvin commented

2026-03-23 01:06:06 +00:00


No description provided.

marvin added 14 commits

2026-03-23 01:06:06 +00:00

Documents scraper contract and conventions for AI agents 609279cf61

Adds AGENTS.md describing this repo’s structure, the required 3-function
scraper API, available Lua modules, and project conventions (ordering,
indexing, file layout).
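
As a rough illustration of that contract, the checker below validates the shape of what each scraper function returns. It is a hypothetical sketch, not code from this repo. The function names follow mangal's Lua scraper API. The required keys and the "page indices run 1..n" rule are assumptions standing in for the conventions AGENTS.md actually documents.

```python
# Hypothetical shape checker for scraper results (not part of this repo).
# Assumed contract, modelled on mangal's Lua API:
#   SearchManga(query)        -> list of {"name", "url"}
#   MangaChapters(manga_url)  -> list of {"name", "url"}
#   ChapterPages(chapter_url) -> list of {"url", "index"}
REQUIRED_KEYS = {
    "SearchManga": {"name", "url"},
    "MangaChapters": {"name", "url"},
    "ChapterPages": {"url", "index"},
}


def contract_violations(function_name, items):
    """Return human-readable problems found in one function's result list."""
    problems = []
    required = REQUIRED_KEYS[function_name]
    for position, item in enumerate(items, start=1):
        missing = required.difference(item)  # iterating a dict yields its keys
        if missing:
            problems.append(f"item {position}: missing {sorted(missing)}")
    if function_name == "ChapterPages":
        # Assumed convention: page indices are 1-based and contiguous.
        indices = [item.get("index") for item in items]
        if indices != list(range(1, len(items) + 1)):
            problems.append(f"page indices {indices} are not 1..{len(items)}")
    return problems
```

An agent-generated scraper can then be spot-checked by feeding its decoded output through `contract_violations` before a human review.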

This helps AI agents (and humans) generate and review scrapers that match
mangal’s expectations without relying on upstream code spelunking.

Removes dead scrapers and documents how to prune sources 4f1e6e44bf

Adds a maintainer instructions document for checking scraper health, so we
can regularly retire providers that were shut down, repurposed, or sold.

Removes five scrapers whose target sites no longer serve readable
content, avoiding broken UX and noisy failures during runs.

Updates AsuraScans to follow the asurascans.com → asuracomic.net
redirect, keeping the provider working without changing its API.

Considered keeping deprecated scrapers as disabled stubs, but deletion
keeps the provider set truthful and reduces ongoing maintenance.

Removes FlameScans scraper after site shutdown 08e112a891

FlameScans no longer serves readable content, so remove the scraper to
avoid broken UX and noisy failures during runs.

Drops the README entry so we stop advertising a dead provider.
Considered keeping a disabled stub, but deletion keeps the set truthful
and reduces ongoing maintenance.

Restores AsuraScans scraping after Next.js redesign 85b71c9f4a

AsuraScans moved from the old WordPress theme to a Next.js SSR app, which
removed the previous selectors and made the headless path unreliable in
devcontainers.

Switch the scraper to pure HTTP fetching. Search now queries the series
listing and extracts titles from series links. Chapter discovery parses
the scrollable chapter list and normalizes relative URLs. Page URLs are
recovered from the Next.js RSC flight payload via regex, then sorted and
deduplicated. This avoids requiring Chromium while keeping results stable.
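
The page and chapter steps above can be sketched as follows. This is an illustration, not the Lua scraper itself, and it rests on several assumptions. Image URLs are assumed to appear verbatim (possibly inside escaped JSON strings) in the flight payload. The regex and the `/storage/media/...` paths in the usage data are invented. "Sorted" is interpreted here as ordering by the first number in the file name.

```python
import re
from urllib.parse import urljoin

# Assumed pattern: page images on gg.asuracomic.net ending in a common image
# extension; stops at quotes, backslashes (escaped JSON) and whitespace.
IMAGE_URL = re.compile(r'https://gg\.asuracomic\.net/[^"\\\s]+?\.(?:webp|jpe?g|png)')


def page_number(url):
    """First number in the file name, or 0 if there is none (assumption)."""
    digits = re.findall(r"\d+", url.rsplit("/", 1)[-1])
    return int(digits[0]) if digits else 0


def extract_page_urls(flight_payload):
    """Regex the RSC flight payload, deduplicate, then sort by page number."""
    unique = dict.fromkeys(IMAGE_URL.findall(flight_payload))  # keeps first-seen order
    return sorted(unique, key=lambda url: (page_number(url), url))


def normalize_chapter_url(page_url, href):
    """Resolve a possibly relative chapter link against the page it came from."""
    return urljoin(page_url, href)
```

Deduplicating before sorting matters because the flight payload can mention the same image more than once.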

Testing and expectations

Clear cached data first.

    ./mangal clear --cache && ./mangal clear --anilist

Basic search should return exit code 0 and JSON with the manga name and
URL.

    ./mangal inline --source AsuraScans \
        --query "Greatest Estate Developer" --json

Full pipeline should populate chapters and pages.

    ./mangal inline --source AsuraScans \
        --query "Greatest Estate Developer" \
        --manga first --chapters first \
        --json --populate-pages

The expected result is one match named "The Greatest Estate Developer",
with about 222 chapters, about 19-20 pages per chapter, and image URLs
served from gg.asuracomic.net.

Removes Mangatoto scraper and docs entry 27c5ed07ce

Mangatoto (mangatoto.com) is no longer a working target, so the scraper was
dead code and a recurring source of confusion and failures.

This deletes `scrapers/Mangatoto.lua` and removes it from `README.md` to keep
the supported sources list accurate.

Defines TestQueries for automated live scraper smoke tests be245a083e

Adds TestQueries() to selected Lua scrapers with a stable query and expected
title, so the mangal scraper integration tests can exercise real pages using
known-good inputs.

This is backwards compatible with older mangal versions: the extra TestQueries()
function is unused unless the scrapers-tagged test harness calls it.
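
The harness side of this can be sketched as below. The sketch is hypothetical: the real harness lives in mangal's Go test suite. Here a scraper is modelled as a dict of callables. The `query` and `expected_title` field names for TestQueries() entries are assumptions made for illustration. The sketch also shows why the change is backwards compatible: a scraper without TestQueries() is simply skipped.

```python
def run_smoke_tests(scraper):
    """Run a scraper's TestQueries() against its live SearchManga().

    Returns None when the scraper defines no TestQueries (older scrapers),
    otherwise a list of failure messages (empty means all queries passed).
    Field names "query" and "expected_title" are illustrative assumptions.
    """
    test_queries = scraper.get("TestQueries")
    if test_queries is None:
        return None  # nothing to run; the extra function is optional
    failures = []
    for case in test_queries():
        titles = [manga["name"] for manga in scraper["SearchManga"](case["query"])]
        if case["expected_title"] not in titles:
            failures.append(
                f"{case['query']!r}: expected {case['expected_title']!r}, got {titles!r}"
            )
    return failures
```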

Also introduces make targets that run all scrapers, or a single scraper,
through the mangal test harness while pointing it at this repository's
scrapers directory.

Restores Toonily scraping after site requires JavaScript rendering 0369c333d5

Search, chapter listing, and page image extraction now run through the
headless browser flow instead of the direct HTTP client.

The old approach stopped returning reliable results after the site moved
more of the UI behind client-side rendering. This keeps the scraper
functional at the cost of requiring a headless runtime; when unavailable,
the scraper returns empty results instead of failing mid-run.
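
The degraded-mode behaviour described above amounts to a guard before any rendering happens. A minimal sketch follows, in which `headless_available` and `render_and_parse` are hypothetical stand-ins for the scraper's runtime check and its headless search flow.

```python
def search_toonily(query, headless_available, render_and_parse):
    """Search via the headless flow, or return [] when no runtime exists.

    Both callables are hypothetical placeholders: one reports whether a
    headless browser runtime is present, the other renders and parses
    the search page.
    """
    if not headless_available():
        # Degrade to "no results" so a batch run keeps going
        # instead of failing mid-run.
        return []
    return render_and_parse(query)
```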

Re-enables go vet for scraper integration tests 6b5974b511

The `go vet` failures that forced vet to be disabled have been fixed
upstream, so the Makefile no longer passes -vet=off to the scraper test
targets.

This restores the default safety checks without changing the test
selection, tags, or timeouts.

Plan to create new scrapers based on reference sources 29c4d08ee9

Removes scrapers for sites that block scraping 1db209be92

Toonily can be reached via proxy for some requests, but chapter and page
image downloads are not reliable due to Cloudflare protection.
ManhuaUs is very fragile: it works through the proxy, but its tests break
all the time.

Removing these scrapers avoids shipping broken sources and makes the
failure mode explicit. We can re-add them once a headless/proxy download
flow is stable.

Adds 9 scrapers and scaffold tooling for template-based maintenance 443851306c

Scrapers remain self-contained, all-in-one .lua files to stay compatible
with the existing scraper loader. Many sites share the same CMS, so copying
the scraping logic by hand would duplicate identical code across scrapers.

Templates
---------

Templates live under ./lib and are inlined into ./scrapers/*.lua at
generation time via scripts/scaffold.sh. Scrapers carry no runtime
dependency on lib/.
`make rebuild` re-inlines a template across all scrapers using it while
preserving per-site customizations (URL, author, TestQueries,
SourceOptions) tracked inside [SITE_CUSTOMIZATIONS_*] markers.
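
The rebuild step can be pictured as "take the fresh template, but put back each site's customization blocks". The sketch below is a Python illustration, not scripts/scaffold.sh. The `-- [SITE_CUSTOMIZATIONS_BEGIN]` / `-- [SITE_CUSTOMIZATIONS_END]` marker names are invented for illustration, because the commit only describes them as `[SITE_CUSTOMIZATIONS_*]`. Blocks are matched by position.

```python
import re

# Hypothetical marker names; the real ones are only described as
# [SITE_CUSTOMIZATIONS_*]. Each block spans BEGIN..END inclusive.
BLOCK = re.compile(
    r"-- \[SITE_CUSTOMIZATIONS_BEGIN\]\n.*?-- \[SITE_CUSTOMIZATIONS_END\]\n",
    re.DOTALL,
)


def rebuild(template, existing_scraper):
    """Re-inline `template`, carrying over customization blocks in order."""
    saved = BLOCK.findall(existing_scraper)
    slots = BLOCK.findall(template)
    if len(saved) != len(slots):
        raise ValueError(f"template has {len(slots)} blocks, scraper has {len(saved)}")
    replacements = iter(saved)
    # A function replacement avoids regex escape processing of the Lua text.
    return BLOCK.sub(lambda _match: next(replacements), template)
```

Refusing to rebuild when the block counts differ keeps a template change from silently dropping a site's URL, author, TestQueries or SourceOptions.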

New scrapers
------------

- TCBScans: Custom HTML (Tailwind selectors) for One Piece chapters
- WeebCentral: Custom HTMX endpoints for search + chapter list
- Comix: JSON REST API for comic data
- HiveScans: Iken JSON API (modern scanlation platform)
- MangaReadOrg: Madara WordPress template
- MangaTx: MangaThemesia template
- MangaBat: MangaBox template with CDN image delivery
- MangaSect: Liliana template with client-side search filtering
- ToonilyMe: MadTheme template (requires BrightData proxy)

Removes LuminousScans (not working) a5c3ec1ce1

Removes deprecated-source cleanup instructions 71aee1a2f8

Automated tests now surface broken/deprecated scrapers directly, so the
manual, LLM-assisted debugging workflow is redundant.

This drops .github/instructions/remove-deprecated-sources.instructions.md
to keep the repo focused on the test-driven maintenance path.

Documents WebtoonXYZ blocker and fixes MangaTx search query 737fc27b78

Record that WebtoonXYZ is blocked by Cloudflare Turnstile and mark the
Phase 2 task as blocked instead of pending implementation.
The source now remains deferred so effort stays on implementable scrapers.

Capture the failed approaches we tested: direct HTTP, BrightData proxy,
headless browser waits, and abandoned external solver libraries.
These options do not satisfy Turnstile fingerprint validation in this
HTTP-only scraper architecture.

Fix MangaTx search by using the site's `search=` parameter instead of
`title=` and align TestQueries with a stable expected result.
This preserves live smoke test intent while reflecting current behavior.
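
The MangaTx fix only changes which query parameter carries the search term. A minimal sketch follows. `search_page_url` is a hypothetical argument standing in for whatever listing URL the scraper actually queries, so no real path is assumed.

```python
from urllib.parse import urlencode


def mangatx_search_url(search_page_url, query):
    """Build the search request URL using the `search=` parameter.

    The old scraper sent `title=`; the site expects `search=`.
    `search_page_url` is a placeholder for the scraper's real listing URL.
    """
    return f"{search_page_url}?{urlencode({'search': query})}"
```

Using `urlencode` rather than string concatenation keeps spaces and punctuation in titles correctly escaped.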