Merge augustohp/up into main #1

Merged
marvin merged 14 commits from augustohp/up into main 2026-03-23 01:06:12 +00:00
No description provided.
Adds AGENTS.md describing this repo’s structure, the required 3-function
scraper API, available Lua modules, and project conventions (ordering,
indexing, file layout).

This helps AI agents (and humans) generate and review scrapers that match
mangal’s expectations without relying on upstream code spelunking.

Adds a maintainer instructions doc to check scraper health so we can
regularly retire providers that were shut down, repurposed, or sold.

Removes five scrapers whose target sites no longer serve readable
content, avoiding broken UX and noisy failures during runs.

Updates AsuraScans to follow the asurascans.com → asuracomic.net
redirect, keeping the provider working without changing its API.

Considered keeping deprecated scrapers as disabled stubs, but deletion
keeps the provider set truthful and reduces ongoing maintenance.

FlameScans no longer serves readable content, so remove the scraper to
avoid broken UX and noisy failures during runs.

Drops the README entry so we stop advertising a dead provider.
Considered keeping a disabled stub, but deletion keeps the set truthful
and reduces ongoing maintenance.

AsuraScans moved from the old WordPress theme to a Next.js SSR app, which
removed the previous selectors and made the headless path unreliable in
devcontainers.

Switch the scraper to pure HTTP fetching. Search now queries the series
listing and extracts titles from series links. Chapter discovery parses
the scrollable chapter list and normalizes relative URLs. Page URLs are
recovered from the Next.js RSC flight payload via regex, then sorted and
deduplicated. This avoids requiring Chromium while keeping results stable.
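
The chapter-link normalization and page-URL recovery described above can be
sketched in Python as follows. This models the logic only; it is not the Lua
scraper. The regex, the accepted image extensions, and the
`https://asuracomic.net/series/` base are assumptions. Only the
gg.asuracomic.net image host comes from the test notes in this PR. Sorting uses
a numeric-aware key so `2.webp` sorts before `10.webp`. The commit does not say
which order the scraper actually uses.

```python
import re
from urllib.parse import urljoin

# Assumed base for resolving relative chapter links (illustrative only).
SERIES_BASE = "https://asuracomic.net/series/"

# Assumed shape of page URLs inside the RSC flight payload. The character
# class stops at quotes, backslashes (escaped quotes in the payload), and
# whitespace.
PAGE_URL_RE = re.compile(r'https://gg\.asuracomic\.net/[^"\\\s]+?\.(?:webp|jpe?g|png)')


def normalize_chapter_urls(hrefs):
    """Resolve relative chapter hrefs against the series base, dropping repeats."""
    seen = set()
    result = []
    for href in hrefs:
        url = urljoin(SERIES_BASE, href)
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result


def natural_key(text):
    """Compare digit runs numerically: odd positions of re.split are digit runs."""
    parts = re.split(r"(\d+)", text)
    return [int(part) if i % 2 else part for i, part in enumerate(parts)]


def page_urls_from_flight(payload):
    """Extract image URLs from a flight payload, deduplicated and sorted."""
    return sorted(set(PAGE_URL_RE.findall(payload)), key=natural_key)
```

A regex over the raw payload avoids parsing the flight format. The trade-off is
that it breaks silently if the CDN host or URL shape changes, which is what the
live smoke tests are meant to catch.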

Testing and expectations

Clear cached data first.

    ./mangal clear --cache && ./mangal clear --anilist

Basic search should return exit code 0 and JSON with the manga name and
URL.

    ./mangal inline --source AsuraScans \
        --query "Greatest Estate Developer" --json

Full pipeline should populate chapters and pages.

    ./mangal inline --source AsuraScans \
        --query "Greatest Estate Developer" \
        --manga first --chapters first \
        --json --populate-pages

Expected result is one match named "The Greatest Estate Developer", about
222 chapters, about 19-20 pages per chapter, and image URLs served from
gg.asuracomic.net.

Mangatoto (mangatoto.com) is no longer a working target, so the scraper was
dead code and a recurring source of confusion and failures.

This deletes `scrapers/Mangatoto.lua` and removes it from `README.md` to keep
the supported sources list accurate.

Adds TestQueries() to selected Lua scrapers with a stable query and expected
title, so the mangal scraper integration tests can exercise real pages using
known-good inputs.

This is backwards compatible with older mangal versions: the extra TestQueries()
function is unused unless the scrapers-tagged test harness calls it.
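
One way a harness could consume TestQueries() is sketched below in Python. The
pair shape (query, expected title), the `run_test_queries` helper, and the
`search` callable are hypothetical. The real integration tests live in mangal
and call the Lua function directly.

```python
from typing import Callable, Iterable

# Hypothetical shape of one TestQueries() entry: a search query and the
# title expected among the results.
TestQuery = tuple[str, str]


def run_test_queries(
    test_queries: Iterable[TestQuery],
    search: Callable[[str], list[dict]],
) -> list[str]:
    """Return one failure message per query whose expected title is missing."""
    failures = []
    for query, expected in test_queries:
        names = [manga.get("name") for manga in search(query)]
        if expected not in names:
            failures.append(f"{query!r}: expected {expected!r}, got {names!r}")
    return failures
```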

Introduces make targets to run all scrapers or a single scraper through the
mangal test harness while pointing it at this scrapers directory.

Search, chapter listing, and page image extraction now run through the
headless browser flow instead of the direct HTTP client.

The old approach stopped returning reliable results after the site moved
more of the UI behind client-side rendering. This keeps the scraper
functional at the cost of requiring a headless runtime; when unavailable,
the scraper returns empty results instead of failing mid-run.
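
The fallback described above returns empty results when no headless runtime is
available, instead of aborting the run. It can be sketched in Python like this.
`HeadlessUnavailable`, `fetch_rendered`, and `parse` are hypothetical stand-ins,
because the commit does not name the scraper's actual helpers.

```python
from typing import Callable


class HeadlessUnavailable(Exception):
    """Hypothetical error raised when no headless browser runtime can start."""


def search_with_headless(
    query: str,
    fetch_rendered: Callable[[str], str],
    parse: Callable[[str], list[dict]],
) -> list[dict]:
    """Render the search page headlessly; degrade to [] if the runtime is missing."""
    try:
        html = fetch_rendered(query)
    except HeadlessUnavailable:
        # Empty results let the rest of the run continue instead of failing.
        return []
    return parse(html)
```

Only the "runtime missing" error is swallowed. Any other exception still
propagates, so a genuinely broken selector is not hidden behind an empty list.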
The vet failures that required disabling vet have been fixed upstream,
so the Makefile no longer forces -vet=off for scraper test targets.

This restores the default safety checks without changing the test
selection, tags, or timeouts.

Toonily can be reached via proxy for some requests, but chapter/page image
downloads are not reliable due to Cloudflare protection.
ManhuaUs is very fragile: it works through the proxy, but its tests break
frequently.

Removing the scraper avoids shipping a broken source and makes the failure
mode explicit. We can re-add it once a headless/proxy download flow is
stable.

Scrapers remain self-contained, all-in-one .lua files to stay compatible
with the existing scraper loader. Many sites share the same CMS, so copying
the scraping logic into each scraper by hand would duplicate identical code.

Templates
---------

Templates live under ./lib and are inlined into ./scrapers/*.lua at
generation time via scripts/scaffold.sh. Scrapers carry no runtime
dependency on lib/.
`make rebuild` re-inlines a template across all scrapers using it while
preserving per-site customizations (URL, author, TestQueries,
SourceOptions) tracked inside [SITE_CUSTOMIZATIONS_*] markers.
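
The rebuild step can be pictured as follows. Keep the text between the
customization markers in the existing scraper, and splice it into a freshly
inlined template. This is a Python sketch, not scripts/scaffold.sh. The marker
spelling (`-- [SITE_CUSTOMIZATIONS_START]` / `-- [SITE_CUSTOMIZATIONS_END]`)
and the single-block assumption are hypothetical.

```python
import re

# Hypothetical marker pair; the real [SITE_CUSTOMIZATIONS_*] names may differ.
BLOCK_RE = re.compile(
    r"(-- \[SITE_CUSTOMIZATIONS_START\]\n)(.*?)(-- \[SITE_CUSTOMIZATIONS_END\])",
    re.DOTALL,
)


def rebuild(template: str, existing: str) -> str:
    """Return the template with the existing scraper's customization block kept."""
    found = BLOCK_RE.search(existing)
    if found is None:
        raise ValueError("existing scraper has no customization block")
    custom = found.group(2)
    # A function replacement keeps backslashes in `custom` from being
    # interpreted as regex escapes.
    result, count = BLOCK_RE.subn(
        lambda m: m.group(1) + custom + m.group(3), template, count=1
    )
    if count != 1:
        raise ValueError("template has no customization block")
    return result
```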

New scrapers
------------

- TCBScans: Custom HTML (Tailwind selectors) for One Piece chapters
- WeebCentral: Custom HTMX endpoints for search + chapter list
- Comix: JSON REST API for comic data
- HiveScans: Iken JSON API (modern scanlation platform)
- MangaReadOrg: Madara WordPress template
- MangaTx: MangaThemesia template
- MangaBat: MangaBox template with CDN image delivery
- MangaSect: Liliana template with client-side search filtering
- ToonilyMe: MadTheme template (requires BrightData proxy)
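
For MangaSect, "client-side search filtering" presumably means fetching a
listing and matching titles locally instead of using a server search endpoint.
A hypothetical Python sketch of that matching step follows. The case-insensitive,
all-words rule is an assumption, not the scraper's documented behavior.

```python
def filter_by_query(entries: list[dict], query: str) -> list[dict]:
    """Keep entries whose name contains every word of the query, ignoring case."""
    words = query.casefold().split()
    return [e for e in entries if all(w in e["name"].casefold() for w in words)]
```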
Automated tests now surface broken/deprecated scrapers directly, so the
manual, LLM-assisted debugging workflow is redundant.

This drops .github/instructions/remove-deprecated-sources.instructions.md
to keep the repo focused on the test-driven maintenance path.

Record that WebtoonXYZ is blocked by Cloudflare Turnstile and mark the
Phase 2 task as blocked instead of pending implementation.
The source now remains deferred so effort stays on implementable scrapers.

Capture the failed approaches we tested: direct HTTP, BrightData proxy,
headless browser waits, and abandoned external solver libraries.
These options do not satisfy Turnstile fingerprint validation in this
HTTP-only scraper architecture.

Fix MangaTx search by using the site's `search=` parameter instead of
`title=` and align TestQueries with a stable expected result.
This preserves live smoke test intent while reflecting current behavior.
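
The fix changes the query parameter name used to build the search URL. A Python
sketch follows. The base URL `https://mangatx.example/` and the root path are
placeholders, because the commit names only the `search=` and `title=` parameters.

```python
from urllib.parse import urlencode

BASE_URL = "https://mangatx.example/"  # placeholder, not the real site URL


def search_url(query: str) -> str:
    """Build a search URL with `search=` (previously `title=`), percent-encoded."""
    return BASE_URL + "?" + urlencode({"search": query})
```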
marvin merged commit ff1528d558 into main 2026-03-23 01:06:12 +00:00
Reference
mangal2/mangal-scrapers!1