[Organizers] Query: Tools to search for stale links (was Re: is NEFFA LinkFest still active)

Seth Seeger via Organizers organizers at lists.sharedweight.net
Wed Apr 26 15:07:51 PDT 2017


Someone with a little Linux knowledge could use “wget” or “linkchecker”.
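
For anyone who would rather script it than learn those tools, here is a rough Python sketch of the same idea. It only covers the mechanical symptoms from the list quoted below (404, 403, a server that can't be found, and a redirect that lands on a home page or a different host). The function names, the labels, and the "stale-link-sketch" User-Agent string are all made up for illustration, and it assumes plain http/https links. It cannot tell whether a page that loads normally has been taken over or abandoned; a person still has to look at those.

```python
import sys
import urllib.error
import urllib.request
from urllib.parse import urlsplit


def classify(status, requested_url, final_url):
    """Map an HTTP status and the URL we ended up at to a rough label."""
    if status in (404, 410):
        return "gone"
    if status == 403:
        return "forbidden"
    if status >= 400:
        return "error"
    asked, got = urlsplit(requested_url), urlsplit(final_url)
    # Note: this also flags harmless moves such as example.org -> www.example.org.
    if asked.netloc.lower() != got.netloc.lower():
        return "moved to another host"
    # We asked for a specific page but were redirected to the site root.
    if asked.path not in ("", "/") and got.path in ("", "/"):
        return "redirected to home page"
    return "ok"


def check(url, timeout=10):
    """Fetch one URL (following redirects) and return a label for it."""
    request = urllib.request.Request(url, headers={"User-Agent": "stale-link-sketch"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return classify(response.status, url, response.geturl())
    except urllib.error.HTTPError as err:
        # The server answered, but with a 4xx/5xx status.
        return classify(err.code, url, url)
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and so on.
        return "unreachable"


if __name__ == "__main__":
    # Usage: python check_links.py URL [URL ...]
    for link in sys.argv[1:]:
        print(check(link), link)
```

Anything not reported as "ok" is only a candidate for a manual look, not proof that the link is stale.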

Seth



> On Apr 26, 2017, at 3:55 PM, James Saxe via Organizers <organizers at lists.sharedweight.net> wrote:
> 
> I have a question for any of you who are involved with website
> administration (e.g., for your local dance organization):
> 
>     Does anyone know of good tools that will automatically go
>     through a site searching for stale links?
> 
> I've been prompted to make this query in part by the recent
> messages about the NEFFA LinkFest, which is apparently no longer
> maintained because it got to be too much work relative to the
> perceived benefit (particularly relative to perceived benefit
> for the volunteer maintainer(s)).  However, I've noticed that
> many local dance groups' websites include lists of external
> links, and that while these lists are typically much smaller than
> the LinkFest (containing perhaps a dozen to a hundred links vs.
> almost 3000 in the LinkFest), they also often are not assiduously
> maintained and so often include at least a few links that no
> longer point to the expected content.
> 
> I recognize that the automatic detection of stale links may not
> be a trivial problem, since there can be a variety of different
> symptoms.  I've just gone looking through lists of links on
> several different sites.  Among the things I've found are:
> 
>    * Links that fail with Error 404 (Not Found)
> 
>    * Links that fail with Error 403 (Forbidden: You don't have
>      permission to access ...)
> 
>    * Links that fail because the browser can't find the server.
> 
>    * Links that get to the site that apparently used to contain
>      the desired page but now show a message like "We're sorry,
>      but we were unable to locate the page you requested."
>      (Perhaps attempts to follow these links actually produce a
>      404 code even if the text displayed to the user doesn't
>      include the string "404".)
> 
>    * Links that go to sites offering to sell you the (expired)
>      domain name that included the target page.
> 
>    * Links to pages that appear to have been taken over by new
>      owners and no longer display the original content.  Among
>      other things, I've found pages full of text in Chinese
>      or Cyrillic characters; pages that used to have dance
>      information and seem to have been taken over by real estate
>      agents or financial service organizations; and pages that
>      say "WARNING!! THIS SITE CONTAINS ADULT MATERIALS ..."
>      (I did not click on the "Continue" button after the warning
>      to see whether it got me to traditional-dance-related
>      content).
> 
>    * Links to pages that admit to being no longer maintained.
> 
>    * Links to pages that in turn offer a (possibly good) link
>      to a new page with the desired content.
> 
>    * Links that go to what seem to be the original target
>      pages, but that also seem not to have been updated in
>      several years.  (For example, a site might refer to
>      "upcoming" events that are now several years in the past.
>      One might then wonder whether there's a new, actively
>      maintained, site somewhere with similar, but current,
>      information.)
> 
> If someone creates a link that points to a recent blog entry,
> someone who follows the link several years later might get to
> the now-most-recent page of the blog and only be able to reach
> the originally referenced content by clicking something like
> "older posts" a large--and unknown--number of times. Similarly,
> links to online newspaper or magazine content might now point
> to pages that are evidently full of newer content.  The old
> content might or might not be available somewhere, but there
> may be no visible advice about how to find it.
> 
> When people migrate their websites, it seems to be common that,
> if links into the old site get redirected at all, they end up
> redirecting to some place like the home page of the new site
> instead of to the exact page (if one exists) corresponding to the
> old target of the link. 
> 
> Despite these comments about how links can go stale in various
> ways that aren't immediately obvious, it seems to me that stale
> link detection would be a useful feature for a wide variety of
> website administrators--and not just for dance organizers. So
> perhaps someone somewhere has put some effort into doing a good
> job of it.  I'd be interested if anyone knows of examples.
> 
> Thanks.
> 
> --Jim
> 
> _______________________________________________
> Organizers mailing list
> Organizers at lists.sharedweight.net
> http://lists.sharedweight.net/listinfo.cgi/organizers-sharedweight.net


