Someone with a little Linux knowledge could use “wget” or “linkchecker”.
Seth
On Apr 26, 2017, at 3:55 PM, James Saxe via Organizers
<organizers(a)lists.sharedweight.net> wrote:
I have a question for any of you who are involved with website
administration (e.g., for your local dance organization):
Does anyone know of good tools that will automatically go
through a site searching for stale links?
I've been prompted to make this query in part by the recent
messages about the NEFFA LinkFest, which is apparently no longer
maintained because it got to be too much work relative to the
perceived benefit (particularly relative to perceived benefit
for the volunteer maintainer(s)). However, I've noticed that
many local dance groups' websites include lists of external
links, and that while these lists are typically much smaller than
the LinkFest (containing perhaps a dozen to a hundred links vs.
almost 3000 in the LinkFest), they also often are not assiduously
maintained and so often include at least a few links that no
longer point to the expected content.
I recognize that the automatic detection of stale links may not
be a trivial problem, since there can be a variety of different
symptoms. I've just gone looking through lists of links on
several different sites. Among the things I've found are:
* Links that fail with Error 404 (Not Found)
* Links that fail with Error 403 (Forbidden: You don't have
permission to access ...)
* Links that fail because the browser can't find the server.
* Links that get to the site that apparently used to contain
the desired page but now show a message like "We're sorry,
but we were unable to locate the page you requested."
(Perhaps attempts to follow these links actually produce a
404 code even if the text displayed to the user doesn't
include the string "404".)
* Links that go to sites offering to sell you the (expired)
domain name that included the target page.
* Links to pages that appear to have been taken over by new
owners and no longer display the original content. Among
other things, I've found pages full of text in Chinese
or Cyrillic characters; pages that used to have dance
information and seem to have been taken over by real estate
agents or financial service organizations; and pages that
say "WARNING!! THIS SITE CONTAINS ADULT MATERIALS ..."
(I did not click on the "Continue" button after the warning
to see whether it got me to traditional-dance-related
content).
* Links to pages that admit to being no longer maintained.
* Links to pages that in turn offer a (possibly good) link
to a new page with the desired content.
* Links that go to what seem to be the original target
pages, but that also seem not to have been updated in
several years. (For example a site might refer to
"upcoming" events that are now several years in the past.
One might then wonder whether there's a new, actively
maintained, site somewhere with similar, but current,
information.)
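
The more mechanical symptoms in the list above (HTTP error codes, servers that can't be found, and error text served on an otherwise normal page) could be detected roughly as follows. This is only a sketch in Python using the standard library; the names `classify` and `check` and the two phrase lists are made up for illustration, and it does nothing about pages that have been taken over by new owners or have simply gone out of date.

```python
import urllib.error
import urllib.request

# Hypothetical phrases suggesting a "soft 404" or a parked domain.
# A real checker would need a longer list tuned to the sites involved.
SOFT_404_PHRASES = ("unable to locate the page", "page not found")
PARKED_PHRASES = ("domain is for sale", "buy this domain")


def classify(status, body="", error=None):
    """Map one fetch result to a rough symptom label."""
    if error is not None:
        return "server not found or unreachable"
    if status == 404:
        return "not found (404)"
    if status == 403:
        return "forbidden (403)"
    if status >= 400:
        return f"other HTTP error ({status})"
    text = body.lower()
    if any(phrase in text for phrase in SOFT_404_PHRASES):
        return "soft 404 (error text on a normal page)"
    if any(phrase in text for phrase in PARKED_PHRASES):
        return "possibly parked or expired domain"
    return "ok (content not verified)"


def check(url, timeout=10):
    """Fetch url with urllib and classify the outcome."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # Read only the first 64 KiB; enough to spot an error message.
            body = resp.read(65536).decode("utf-8", errors="replace")
            return classify(resp.status, body)
    except urllib.error.HTTPError as exc:
        # 4xx and 5xx responses arrive as exceptions carrying the code.
        return classify(exc.code)
    except OSError as exc:
        # DNS failures, refused connections, and timeouts end up here.
        return classify(None, error=exc)
```

Calling `check(url)` for each link on a page and printing anything that does not start with "ok" would give a first-pass list to review by hand.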
If someone creates a link that points to a recent blog entry,
someone who follows the link several years later might get to
the now-most-recent page of the blog and only be able to reach
the originally referenced content by clicking something like
"older posts" a large--and unknown--number of times. Similarly,
links to online newspaper or magazine content might now point
to pages that are evidently full of newer content. The old
content might or might not be available somewhere, but there
may be no visible advice about how to find it.
When people migrate their websites, it seems to be common that,
if links into the old site get redirected at all, they end up
redirecting to some place like the home page of the new site
instead of to the exact page (if one exists) corresponding to the
old target of the link.
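
One way a tool might flag this case is to compare the URL in the link with the URL the request finally lands on after any redirects. The sketch below is in Python; the function name `redirected_to_home` is invented for illustration, and treating "the path is empty or just /" as "home page" is an assumption that will miss sites whose home page lives elsewhere.

```python
from urllib.parse import urlsplit


def redirected_to_home(original_url, final_url):
    """Return True if a link to a specific page ended up at a site root."""
    orig = urlsplit(original_url)
    final = urlsplit(final_url)
    # The link originally pointed somewhere below the root of its site.
    orig_is_deep = orig.path not in ("", "/") or bool(orig.query)
    # The request ended at a bare root URL with no query string.
    final_is_root = final.path in ("", "/") and not final.query
    return orig_is_deep and final_is_root
```

The final URL can be read from the response object that `urllib.request.urlopen` returns, via its `geturl()` method, since `urlopen` follows redirects by default.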
Despite these comments about how links can go stale in various
ways that aren't immediately obvious, it seems to me that stale
link detection would be a useful feature for a wide variety of
website administrators--and not just for dance organizers. So
perhaps someone somewhere has put some effort into doing a good
job of it. I'd be interested if anyone knows of examples.
Thanks.
--Jim
_______________________________________________
Organizers mailing list
Organizers(a)lists.sharedweight.net
http://lists.sharedweight.net/listinfo.cgi/organizers-sharedweight.net