Revision 760623 of "ವಿಕಿಪೀಡಿಯ:Link rot" (Wikipedia:Link rot) on knwiki (Kannada Wikipedia)

{{pp-semi-indef}}
{{about|(primarily) link rot in [[WP:External links|external links]]|broken section links within Wikipedia|Wikipedia:Database reports/Broken section anchors|internal links which point to deleted or non-existent articles|WP:REDLINKS|other uses|Wikipedia:Citing sources#Preventing and repairing dead links}}
{{redirect|WP:LR|Lua requests|Wikipedia:Lua requests}}
{{Wikipedia how to|WP:LR|WP:404|WP:ROT|WP:BADLINK|WP:LINKROT|WP:DEADLINK}}
{{nutshell|'''Link rot''' can kill poorly sourced citations, but steps may be taken to reduce or repair its effect. Do not merely delete cited information ''solely'' because the URL to the source does not work any longer.}}
Like most large [[website]]s, Wikipedia suffers from the phenomenon known as '''link rot''', where external links, often used as references and citations, gradually become irrelevant or broken (also called a '''[[WP:DEADREF|dead link]]'''), as the linked websites disappear, change their content, or move. This presents a significant threat to Wikipedia's [[WP:RS|reliability]] policy and its [[Wikipedia:Citing sources|source citation]] guideline.

The effort required to prevent [[link rot]] is significantly less than the effort required to repair or mitigate a rotten link. Therefore, prevention of link rot '''strengthens''' the encyclopedia. This guide provides strategies for preventing link rot before it happens. These include the use of [[web archiving]] services and the judicious use of [[Wikipedia:Citation templates|citation templates]].

Editors are encouraged to add an archive link as a part of each citation, or at least submit the referenced URL for archiving,<ref name="ia_form" group="note" /> at the same time that a citation is created or updated.

However, link rot cannot always be prevented, so this guide also explains how to mitigate link rot by finding previously archived links and other sources. These strategies should be implemented in accordance with [[Wikipedia:Citing sources#Preventing and repairing dead links]], which describes the steps to take when a link cannot be repaired.

Except for URLs in the [[Wikipedia:External links|External links]] section that have ''not'' been used to support any article content, '''do not delete''' cited information ''solely'' because the URL to the source does not work any longer.  Recovery and repair options and tools are available. ''[[WP:Verifiability|Verifiability]] does not require that all information be supported by a working link, nor does it require the source to be published online.''

==Preventing link rot==
{{shortcut|WP:PLRT}}
As you [[Wikipedia:Article development|write articles]], you can help prevent link rot in several ways. The first way to prevent link rot is to '''avoid [[Wikipedia:Bare URLs|bare URLs]]''' by recording as much of the exact '''title''', '''author''', '''publisher''' and '''date''' of the source as possible. Optionally, also add the '''accessdate'''. If the link goes bad, this added information can help a future Wikipedian, either editor or reader, locate a new source for the original text, either online or a print copy. This may be impossible with only an isolated, bare [[URL]] that no longer works. Local and school libraries are a good resource for locating such offline sources. Many local libraries have in-house subscriptions to digital databases or inter-library loan agreements, making it easier to retrieve hard-to-find sources.

''As you edit, if an article has bare URLs in its citations, fix them or at least tag the References section with {{tl|linkrot}} as a reminder to complete citation details as above, and to categorize the article as needing cleanup.''

===Web archive services===
{{see also|Wikipedia:List of web archives on Wikipedia|Wikipedia:Citing sources/Further considerations#Pre-emptive archiving}}
A second way to prevent link rot is to use a [[web archiving]] service. The two most popular services are the [[Wikipedia:Using the Wayback Machine|Wayback Machine]], which crawls and archives many web pages and also has a form for suggesting a URL to be archived,<ref name="ia_form" group="note">Using the web form at https://archive.org, enter a URL and click "browse history". Depending on the URL, this will redirect to the latest previously archived copy, present a box near the bottom of the page with a link inviting the user to "save this URL in the Wayback Machine", display a calendar showing the extent of previously archived content for that URL, or show an error message explaining why the URL cannot be archived. If archiving is attempted and succeeds, the archived copy usually becomes available within minutes.<br/>
Alternatively, you can use the bookmarklets listed at [[Wikipedia:Citing sources/Further considerations#Archiving bookmarklets]], which archive the page you are viewing with a single click. A new tab opens to show the progress of the archiving, without disturbing the tab in which you are viewing the page. Bookmarklets are available for both Archive.org (the Wayback Machine) and WebCite.</ref> and [[Wikipedia:Using WebCite|WebCite]], which provides on-demand web archiving. These services collect and preserve web pages for future use even if the original web page is moved, changed, deleted, or placed behind a [[pay wall]]. Web archiving is especially important when [[WP:Citing sources|citing]] web pages that are unstable or prone to change, such as time-sensitive [[news]] articles or pages hosted by financially distressed organizations. Once you have the URL for the archived version of the web page, add it with the <code>archiveurl=</code> and <code>archivedate=</code> parameters of the [[Wikipedia:Citation templates|citation template]] you are using. The template automatically incorporates the archived link into the reference. For example, compare these two citations of the same article, without and with an archived copy:
*{{cite web |url=http://www.freakonomics.com/2008/01/24/wall-street-journal-paywall-sturdier-than-suspected/ |title=Wall Street Journal Paywall Sturdier Than Suspected |last=Dubner |first=Stephen J. |publisher=The New York Times Company |date=January 24, 2008 |accessdate=2009-10-28}}
*{{cite web |url=http://www.freakonomics.com/2008/01/24/wall-street-journal-paywall-sturdier-than-suspected/ |title=Wall Street Journal Paywall Sturdier Than Suspected |last=Dubner |first=Stephen J. |publisher=The New York Times Company |date=January 24, 2008 |archiveurl=https://web.archive.org/web/20110815224523/http://www.freakonomics.com/2008/01/24/wall-street-journal-paywall-sturdier-than-suspected/ |archivedate=2011-08-15 }}
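The archive parameters can also be derived mechanically from a snapshot address. The following JavaScript sketch assumes the common Wayback Machine form <code>https://web.archive.org/web/YYYYMMDDhhmmss/ORIGINAL_URL</code> with a 14-digit timestamp; the function name <code>waybackCitationParams</code> is hypothetical, and snapshot addresses in any other form are simply rejected.

```javascript
// Hypothetical helper: derive citation-template archive parameters from a
// Wayback Machine snapshot URL. Assumes the form
// https://web.archive.org/web/YYYYMMDDhhmmss/ORIGINAL_URL (14-digit timestamp).
function waybackCitationParams(snapshotUrl) {
  const match = /^https?:\/\/web\.archive\.org\/web\/(\d{14})\/(.+)$/.exec(snapshotUrl);
  if (match === null) {
    return null; // not in the assumed snapshot form
  }
  const timestamp = match[1];
  return {
    url: match[2], // the original address, for |url=
    archiveurl: snapshotUrl, // for |archiveurl=
    // first eight digits of the timestamp, as an ISO date for |archivedate=
    archivedate: `${timestamp.slice(0, 4)}-${timestamp.slice(4, 6)}-${timestamp.slice(6, 8)}`,
  };
}

// Usage with the archived Freakonomics citation above.
const params = waybackCitationParams(
  "https://web.archive.org/web/20110815224523/http://www.freakonomics.com/2008/01/24/wall-street-journal-paywall-sturdier-than-suspected/"
);
console.log(`|archiveurl=${params.archiveurl} |archivedate=${params.archivedate}`);
```

For the second citation above, this yields the same {{para|archivedate|2011-08-15}} that the citation uses.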
However, not every web page can be archived. Webmasters and publishers may use the [[Robots exclusion standard|robots exclusion standard]] (robots.txt) on their domain to disallow archiving, or rely on complicated [[JavaScript]], [[Adobe Flash|Flash]], or other code that is not easily copied. In these cases, alternative methods of preserving the data may be available.

====Robots.txt====
Because of the way the Wayback Machine operates, archived copies of sites sometimes become unavailable. One example is the [https://en.wikipedia.org/wiki/Freakonomics#Freakonomics_blog Freakonomics blog], previously hosted at <code>freakonomics.blogs.nytimes.com</code>. Those URLs were [https://web.archive.org/web/20130928002127/http://www.freakonomics.blogs.nytimes.com/robots.txt later] excluded from archiving by the New York Times' robots.txt file, which also made the previously archived content unavailable. However, a later change to robots.txt can unhide what an earlier change hid, so do not delete an archive URL solely because the archived content is currently unavailable. Fortunately, in this case, not only can the content be found on a new site that is still open to archiving, but the site's robots.txt was later [https://web.archive.org/web/20131217041219/http://freakonomics.com/robots.txt changed] to allow archiving again, so the old archives are now unhidden ([https://web.archive.org/web/20070623110922/http://freakonomics.blogs.nytimes.com/ example]).
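Whether a crawler is excluded depends on the rules in the site's robots.txt. The JavaScript sketch below is a deliberately simplified reading of the robots exclusion standard: it handles only User-agent and Disallow lines with plain prefix matching, and ignores Allow rules and wildcards. The function name is hypothetical, and the sample agent name <code>ia_archiver</code> is an assumption used for illustration, not a statement of which agent name the Wayback Machine honors today.

```javascript
// Hypothetical, simplified robots.txt check: is `path` disallowed for `agent`?
// Reads only User-agent and Disallow lines and uses plain prefix matching;
// Allow rules, wildcards and other extensions are deliberately ignored.
function isDisallowed(robotsTxt, agent, path) {
  const groups = []; // each group: { agents: [...], disallow: [...] }
  let current = null;
  let previousWasAgent = false;
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // drop comments
    const colon = line.indexOf(":");
    if (colon === -1) continue; // blank or malformed line
    const field = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();
    if (field === "user-agent") {
      // consecutive User-agent lines belong to the same group
      if (!previousWasAgent) {
        current = { agents: [], disallow: [] };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      previousWasAgent = true;
    } else {
      // an empty Disallow value allows everything, so it adds no rule
      if (field === "disallow" && current !== null && value !== "") {
        current.disallow.push(value);
      }
      previousWasAgent = false;
    }
  }
  const name = agent.toLowerCase();
  // a group naming the agent takes precedence over the catch-all "*" group
  const named = groups.filter((g) => g.agents.includes(name));
  const applicable = named.length > 0 ? named : groups.filter((g) => g.agents.includes("*"));
  return applicable.some((g) => g.disallow.some((prefix) => path.startsWith(prefix)));
}

// Usage: a site that blocks one archiving crawler entirely.
const robots = [
  "User-agent: ia_archiver",
  "Disallow: /",
  "",
  "User-agent: *",
  "Disallow: /private/",
].join("\n");
console.log(isDisallowed(robots, "ia_archiver", "/2008/01/24/some-post/")); // true
console.log(isDisallowed(robots, "OtherBot", "/2008/01/24/some-post/")); // false
```

As the New York Times case shows, the rules are read again later, so a page that is blocked today may become visible in the archive once the robots.txt changes.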

===Alternative methods===
Most [[Wikipedia:Citation templates|citation templates]] have a <code>quote=</code> parameter that can be used to store a limited amount of text quoted from the source material within the citation template. This is especially useful for sources that cannot be archived with web archiving services, and it also provides insurance against failure of the chosen web archiving service.

*{{cite web |url=http://freakonomics.blogs.nytimes.com/2008/01/24/wall-street-journal-paywall-sturdier-than-suspected/ |title=Wall Street Journal Paywall Sturdier Than Suspected |last=Dubner |first=Stephen J. |publisher=The New York Times Company|date=January 24, 2008 |archiveurl=https://web.archive.org/web/20080430085418/http://freakonomics.blogs.nytimes.com/2008/01/24/wall-street-journal-paywall-sturdier-than-suspected/ |archivedate=2008-04-30 |quote=''...the Wall Street Journal will not, as has been widely speculated, tear down its paywall entirely...''}}

When using the quote parameter, choose the most succinct and relevant material possible that preserves the context of the reference. Storing the entire text of the source is not appropriate under [[Wikipedia:Non-free content|fair use policies]], so choose only the most important portions of the text that most support the assertions in the Wikipedia article.

A quote also helps when searching for other online versions of the source in the event that the original is discontinued.

Where applicable, [[Wikipedia:Public domain|public domain]] materials can be copied to [[Wikisource]].

==Repairing a dead link==
{{shortcut|WP:DEADLINK}}
{{Redirect|WP:DEADLINK|the related guideline|WP:DEADREF}}

There are several ways to try to repair a dead link, detailed below:

===Searching===
If the dead link includes enough information (article title, names, etc.) it is often possible to use it to find the Web page at a different location, either on the same site or elsewhere.

;Search the site
Often web pages have simply moved, either in connection with a migration to a new server, or through general site maintenance. A site index or site-specific search feature is a useful place to locate the moved page. If these tools are not available, many Internet search engines allow a search on a specified site.

;Search the Internet
A search engine query using the title of the page, possibly with a search restriction to the same site, might find the page. Using the examples from above, a web search (such as [[Google]], [[Yahoo]], etc.) might look like one of these:
:<code><nowiki>site:freakonomics.blogs.nytimes.com/ "Wall Street Journal Paywall Sturdier Than Suspected"</nowiki></code>
:<code><nowiki>site:nytimes.com/ "Wall Street Journal Paywall Sturdier Than Suspected"</nowiki></code>
:<code><nowiki>"Wall Street Journal Paywall Sturdier Than Suspected"</nowiki></code>
Also, a search for some components of the dead link with punctuation removed is often fruitful; e.g. a search through Google for
:<code>[https://google.com/search?q=groups.csail.mit.edu+sFFT+paper+pdf groups.csail.mit.edu sFFT paper pdf]</code>
leads to a page that enabled {{diff|page=Algorithmic_efficiency|diff=prev|oldid=516174744|label=this fix}}. Searching for an unusual or unique-looking part of the URL, such as just the filename at the end, can also be productive.
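The search strategies above can be combined into a short JavaScript sketch that generates candidate queries from a dead URL and, when known, the page title. The function names are hypothetical, and the sample URL <code>http://www.example.org/reports/annual-report_2008.pdf</code> is invented for illustration.

```javascript
// Hypothetical helper: candidate search-engine queries for a dead link.
// `title` is optional; include it whenever the citation records one.
function deadLinkQueries(deadUrl, title) {
  const u = new URL(deadUrl);
  const queries = [];
  if (title) {
    queries.push(`site:${u.hostname} "${title}"`); // exact title, same site only
    queries.push(`"${title}"`); // exact title, anywhere
  }
  // URL components with punctuation removed: keep the host, split the path
  const pathWords = u.pathname.split(/[^A-Za-z0-9]+/).filter((w) => w.length > 0);
  queries.push([u.hostname, ...pathWords].join(" "));
  // the last path segment (often a filename) can be distinctive on its own
  const segments = u.pathname.split("/").filter((s) => s.length > 0);
  if (segments.length > 0) {
    queries.push(segments[segments.length - 1]);
  }
  return queries;
}

// Build a Google search address for one query.
function searchUrl(query) {
  return "https://google.com/search?q=" + encodeURIComponent(query);
}

// Usage with an invented dead link (not taken from any article).
for (const q of deadLinkQueries("http://www.example.org/reports/annual-report_2008.pdf")) {
  console.log(searchUrl(q));
}
```

The sketch only proposes queries; each result still has to be checked by hand to confirm it is the same document that was originally cited.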

===Internet archives===
Check for archived versions of the page in the [[Web archiving|archiving services]]. If you find an archived version of the dead link, double-check that the citation still supports the article text. It is also a good idea to consult the access date of the citation (if it was specified; otherwise, search the page history for when the citation was added) to see how close in time the archived version is to the version that was cited.

The following archiving services are considered to be reliable:
*[[Wayback Machine]] at https://archive.org/web/
*[[WebCite]] at http://www.webcitation.org/query
*[[UK Government Web Archive]] at http://webarchive.nationalarchives.gov.uk/

The [http://www.webarchive.org.uk/mementos/search Mementos interface] allows you to search multiple [[Web archiving|archiving services]] for archived versions of some pages with a single request, using the [[Memento Project|Memento]] protocol. Unfortunately, the Mementos web interface removes any [[Query string|parameters]] included in the URL, so a URL containing a "?" is unlikely to work properly when entered manually without changes. The most common [[Percent-encoding|change needed]] is to replace "?" with "%3F"; this alone is not sufficient in all cases, but it works most of the time. The bookmarklet in the table below properly [[Percent-encoding|encodes]] URLs so that searches work.

Mementos can be very convenient, but if it finds no archives, it should not be the only site checked: it sometimes returns no results even when archives exist at sites it normally includes. An example is a search for archives of [[Battle of the Atlantic]]. {{As of|April 2014}}, Archive.org reports 63 or 64 archives ([https://web.archive.org/web/*/https://en.wikipedia.org/wiki/Battle_of_the_Atlantic https], [https://web.archive.org/web/query?type=urlquery&url=http%3A%2F%2Fen.wikipedia.org%2Fwiki%2FBattle_of_the_Atlantic&Submit=BROWSE+HISTORY http]), while Mementos reports ''0 archives'' ([http://www.webarchive.org.uk/mementos/search/https://en.wikipedia.org/wiki/Battle_of_the_Atlantic https], [http://www.webarchive.org.uk/mementos/search/http://en.wikipedia.org/wiki/Battle_of_the_Atlantic http]). Mementos usually finds archives at Archive.org, but not always, so if you try Mementos first, do not assume that there are no archives just because Mementos reports none.
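The difference between the manual "?" fix and full percent-encoding can be seen in a short JavaScript sketch. The search prefix is taken from the Mementos bookmarklet in the table below (ignoring the <code>?referrer=</code> part that the bookmarklet also appends); the function names are hypothetical, and the code only builds the search address without contacting the service.

```javascript
// Search prefix used by the Mementos bookmarklet in the table below.
const MEMENTOS_SEARCH = "http://www.webarchive.org.uk/mementos/search/";

// Full percent-encoding of the page address, as the bookmarklet does.
function mementosSearchUrl(pageUrl) {
  return MEMENTOS_SEARCH + encodeURIComponent(pageUrl);
}

// The minimal manual fix: only "?" is replaced with "%3F".
function mementosManualUrl(pageUrl) {
  return MEMENTOS_SEARCH + pageUrl.replace(/\?/g, "%3F");
}

const page = "http://example.com/page?id=1";
console.log(mementosManualUrl(page)); // http://www.webarchive.org.uk/mementos/search/http://example.com/page%3Fid=1
console.log(mementosSearchUrl(page)); // http://www.webarchive.org.uk/mementos/search/http%3A%2F%2Fexample.com%2Fpage%3Fid%3D1
```

The manual version leaves characters such as ":", "/" and "=" as they are, whereas <code>encodeURIComponent()</code> encodes them as well.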

There are [[List of Web archiving initiatives|many Internet archive projects]] in existence.

When multiple archive dates are available, try to use the one most likely to match the page as it was seen by the editor who added the reference, on the date given in {{para|accessdate}}. If that parameter is not specified, a [http://wikipedia.ramselehof.de/wikiblame.php?lang=en search of the article's revision history] can determine when the link was added to the article.

View the archive to verify that it contains valid page information. Sometimes an archived copy only records that the link was already dead, or that the archiving failed. If this is the case, try an archive from a different date. Archives from dates close to when the link was added to the Wikipedia page, or earlier, are usually more likely to show valid information. Different archiving sites should also be tried.
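The date-selection advice above can be expressed as a ranking of candidate snapshots. This JavaScript sketch assumes Wayback-style 14-digit timestamps (YYYYMMDDhhmmss) and an ISO {{para|accessdate}}; it prefers the newest snapshot taken on or before the access date and then falls back to later snapshots, oldest first. The function name is hypothetical, and the ordering is a simplification: each candidate still has to be viewed to confirm that it shows the real page rather than an error.

```javascript
// Hypothetical helper: order archive snapshots by how likely they are to show
// the page as the citing editor saw it. Timestamps are 14-digit strings
// (YYYYMMDDhhmmss); accessDate is an ISO date such as "2009-10-28".
function rankSnapshots(timestamps, accessDate) {
  // Fixed-width digit strings compare chronologically as plain strings.
  const cutoff = accessDate.replace(/-/g, "") + "235959"; // end of the access day
  const sorted = [...timestamps].sort();
  const onOrBefore = sorted.filter((ts) => ts <= cutoff).reverse(); // newest first
  const after = sorted.filter((ts) => ts > cutoff); // oldest first
  return [...onOrBefore, ...after];
}

// Usage with invented snapshot times: try them in the returned order,
// moving on if a snapshot turns out to record an error page.
console.log(rankSnapshots(
  ["20120301000000", "20090615000000", "20091102000000", "20050101000000"],
  "2009-10-28"
));
```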

If an archived version is found for a page about which the dead link supplied little information, the additional information may be enough, with a little extra work, to find a live copy. For example, the archived version of a dead bare link may provide the title and author, allowing a live version to be found. An [https://en.wikipedia.org/w/index.php?title=Japonaiserie_%28Van_Gogh%29&type=revision&diff=759338148&oldid=759320052 actual example]: the dead link <code><nowiki>[http://www.vangoghmuseum.nl/vgm/index.jsp?page=2122&lang=en Van Gogh Museum, Amsterdam]</nowiki></code> leads to the archived copy <nowiki>http://wayback.archive.org/web/20140323172316/http://www.vangoghmuseum.nl/vgm/index.jsp?page=2122&lang=en</nowiki>, which gives the title ''The Courtesan (after Eisen), 1887''; a search on the www.vangoghmuseum.nl site then finds a live link.

For most citation templates, archives are entered using the {{para|archiveurl}} and {{para|archivedate}} parameters (both required when adding an archive) and the optional {{para|deadurl}} parameter (archive-url, archive-date, and dead-url are synonyms). The primary link is automatically switched to the archive unless {{para|deadurl|no}} is set; for a link that is already dead, {{para|deadurl}} can simply be omitted (or set to yes, y, or true). To pre-emptively supply an archived version of a URL that may later go dead, set {{para|deadurl|no}}: this changes the display order, so that the title retains the original link and the archive is linked at the end. When the original URL has been usurped for the purposes of spam or advertising, or is otherwise unsuitable, setting {{para|deadurl|unfit}} or {{para|deadurl|usurped}} suppresses display of the original URL (but {{para|url}} is still required).
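The following JavaScript sketch is a simplified model of these display rules, in which {{para|deadurl|no}} keeps the original URL on the title and other values (or none) make the archive the primary link. It is written only to illustrate the rules; it is not the citation templates' actual implementation, and the function and field names are hypothetical.

```javascript
// Simplified, hypothetical model of the |deadurl= display rules; not the
// citation templates' real code. Returns which URL the title links to and
// which URL (if any) is linked at the end of the citation.
function citationLinks({ url, archiveurl, deadurl }) {
  if (!archiveurl) {
    return { titleLink: url, endLink: null }; // no archive supplied
  }
  const status = (deadurl || "yes").toLowerCase(); // omitted behaves like yes
  if (status === "no") {
    // pre-emptive archive: title keeps the original, archive linked at the end
    return { titleLink: url, endLink: archiveurl };
  }
  if (status === "unfit" || status === "usurped") {
    // original URL is not displayed, although |url= must still be given
    return { titleLink: archiveurl, endLink: null };
  }
  // yes, y, true (or omitted): the archive becomes the primary link
  return { titleLink: archiveurl, endLink: url };
}

const original = "http://www.example.org/article";
const archived = "https://web.archive.org/web/20110815224523/http://www.example.org/article";
console.log(citationLinks({ url: original, archiveurl: archived }).titleLink === archived); // true
console.log(citationLinks({ url: original, archiveurl: archived, deadurl: "no" }).titleLink === original); // true
```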

<div style="overflow:auto;">
{| class="wikitable" style="margin: 1em auto 1em auto;"
|+ Bookmarklets to check common archive sites for archives of the current page<span style="font-weight:normal;"><br/>(all open in a new tab or window)</span>
|-
! Archive site !! Bookmarklet
|-
| Archive.org || <source lang="javascript">javascript:void(window.open('https://web.archive.org/web/*/'+location.href))</source>
|-
| UKGWA || <source lang="javascript">javascript:void(window.open('http://webarchive.nationalarchives.gov.uk/*/'+location.href))</source>
|-
| WebCite || <source lang="javascript">javascript:void(window.open('http://www.webcitation.org/query.php?url='+encodeURIComponent(location.href)))</source>
|-
| Wikiwix || <source lang="javascript">javascript:void(window.open('http://archive.wikiwix.com/cache/?url='+encodeURIComponent(location.href)))</source>
|-
| Mementos interface || <source lang="javascript">javascript:void(window.open('http://www.webarchive.org.uk/mementos/search/'+encodeURIComponent(location.href)+'?referrer='+encodeURIComponent(document.referrer)))</source>
|}</div>

The following archiving service was formerly not permitted on the English Wikipedia:
* archive.is: see [[WP:Archive.is RFC]] and [[WP:Archive.is RFC 3]] for more information. It was removed from the blacklist following [[WP:Archive.is RFC 4]].

==Mitigating a dead link==
{{shortcut|WP:MDLI}}
At times, all attempts to repair the link will be unsuccessful. In that event, consider finding an alternative source so that the loss of the original does not harm the verifiability of the article. Alternative sources about broad topics are usually easy to find. A simple search engine query might locate an appropriate alternative, but be extremely careful to avoid citing [[Wikipedia:Mirrors and forks|mirrors and forks of Wikipedia]] itself, which would violate [[Wikipedia:Verifiability]].

Sometimes, finding an appropriate source is not possible, or would require more extensive research techniques, such as a visit to a library or the use of a subscription-based database. If that is the case, consider consulting with Wikipedia editors at [[Wikipedia:WikiProject Resource Exchange]], the [[Wikipedia:Village pump]], or [[Wikipedia:Help desk]]. Also, consider contacting experts or other interested editors at a relevant [[Wikipedia:WikiProject Council/Directory|WikiProject]].

==Keeping dead links==
{{shortcut|WP:KDL}}
A dead, unarchived source URL may still be useful. Such a link indicates that information was (probably) verifiable in the past, and the link might give another user with greater resources or expertise enough information to find the reference. The link could also return from the dead. With a dead link, it is possible to determine whether it has been cited elsewhere, or to contact the person originally responsible for the source. For example, one could contact the Yale Computer Science department if http://www.cs.yale.edu/~EliYale/Defense-in-Depth-PhD-thesis.pdf{{dead link}} were dead. Place {{tl|dead link}} after the dead URL and just before the <code><nowiki></ref></nowiki></code> tag if applicable, leaving the original link intact.
Placing {{tl|dead link}} automatically adds the article to the [[:Category:Articles with dead external links|Articles with dead external links]] project category, and to a monthly subcategory based on the {{para|date}} parameter. '''Do not delete''' a URL just because it has been tagged with {{tl|dead link}} for a long time.
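Tagging can be sketched in JavaScript as follows. The function name is hypothetical; the sketch assumes the reference is a single <code><nowiki><ref>...</ref></nowiki></code> element and uses a "Month YYYY" value for {{para|date}}, which is the format assumed here for the monthly categories.

```javascript
const MONTH_NAMES = ["January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December"];

// Hypothetical helper: add {{dead link|date=Month YYYY}} after the dead URL,
// just before the closing </ref>, leaving the original link intact.
function tagDeadLink(refWikitext, when = new Date()) {
  if (/\{\{\s*dead link/i.test(refWikitext)) {
    return refWikitext; // already tagged: do not add a second tag
  }
  const date = `${MONTH_NAMES[when.getUTCMonth()]} ${when.getUTCFullYear()}`;
  const tag = `{{dead link|date=${date}}}`;
  const close = refWikitext.lastIndexOf("</ref>");
  if (close === -1) {
    return refWikitext + tag; // no </ref>: assume the text ends with the dead URL
  }
  return refWikitext.slice(0, close) + tag + refWikitext.slice(close);
}

console.log(tagDeadLink(
  "<ref>[http://www.example.org/thesis.pdf Example thesis]</ref>",
  new Date(Date.UTC(2017, 3, 15))
));
// <ref>[http://www.example.org/thesis.pdf Example thesis]{{dead link|date=April 2017}}</ref>
```

The check for an existing tag keeps the function safe to run twice on the same reference.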

==Automated tools==
There have been at least six bots over the years that proactively and automatically archive external URLs. As of April 2017, two bots are operational. The primary bot is [[User:InternetArchiveBot|InternetArchiveBot]] (operator [[user:Cyberpower678|Cyberpower678]]), which anyone can also run on individual pages: click the page's "History" tab and find the "Fix dead links" link. The other bot is [[WP:WAYBACKMEDIC|WaybackMedic]] (operator [[User:Green Cardamom|Green Cardamom]]), which primarily checks for link rot among the archive links themselves, and also fixes various other archive-related problems.

[http://wummel.github.io/linkchecker/ LinkChecker] is an open-source tool that can scan for broken links on any website, including Wikipedia.

==Link rot on non-Wikimedia sites==
{{shortcut|WP:EXTERNALROT}}
Non-Wikimedia sites are also susceptible to link rot. Following a [[Wikipedia:Moving a page|page move]] or [[Wikipedia:Deletion process|page deletion]], links to Wikipedia pages from other websites may break. In most page moves, a [[Wikipedia:Redirect|redirect]] will remain at the old page—this won't cause a problem. But if a page is completely deleted or [[Wikipedia:Moving a page#Usurping a page title|usurped]] (i.e. replaced with other content) then link rot will have been caused on any external websites that link to it.

Replacement of page content with a [[Wikipedia:Disambiguation|disambiguation page]] may still cause link rot, but is less harmful because a disambiguation page is essentially a type of [[Wikipedia:Soft redirect|soft redirect]] that will lead the reader to the required content. If a page is usurped with content for another subject that shares its name, a [[Wikipedia:Hatnote|hatnote]] may be placed at the top that directs readers to the original content on its new page—this again is a type of soft redirect, but less obvious. In these cases, readers arriving from an external rotten link should be able to find what they're looking for, but the situation is best avoided as they would have to get there via an additional page, potentially giving a poor impression of both Wikipedia and the linking website.

Because the Wikipedia software does not store [[HTTP referer|<code>Referer</code> information]], it is impossible to tell how many external web pages will be affected by a move or deletion, but the risk of link rot is probably greatest for older and higher-profile pages. In truth, not much can be done: maintaining non-Wikimedia websites is neither within the scope of being a Wikimedian nor, in most cases, within our capability (although if such links ''can'' be fixed, it would be helpful to do so). However, it may be good practice to consider the potential impact on other sites when deleting or moving Wikipedia pages, especially if no redirect or hatnote will remain. If a move or deletion is expected to cause significant damage, this might be a factor to consider in [[WP:RM]], [[WP:AFD]] and [[WP:RFD]] discussions, although other factors may carry more weight.

==See also==
*[[List of HTTP status codes]]
*[[Wikipedia:Citing sources#Preventing and repairing dead links]]
*[[Wikipedia:External links#Longevity of links]]—prescribes removal of dead URLs from the "External links" section.
*[[Wikipedia:Offline sources]]—essay.
*[[Wikipedia:Using the Wayback Machine]]—how-to guide.
*[[Wikipedia:Using WebCite]]—how-to guide.
*[[Wikipedia:Citing sources/Further considerations#Pre-emptive archiving]]—brief guide on how to use various archiving services.
*[[Wikipedia:WikiProject External links|WikiProject External links]]—dedicated to cleaning up overly long lists of external links and having articles conform to Wikipedia's external links guidelines.
*[[:Category:Articles with bare URLs for citations]]—the backlog of articles containing bare URLs at risk of link rot, sub-categorised by month.
*[[:Category:Articles with dead external links]]—the backlog of articles containing dead links, sub-categorised by month.
*[[Special:LinkSearch]]—to find all the pages that contain a particular URL.

===Bots===
*[[WP:STiki/Dead_links]]—page reporting newly added dead links, a component of the [[WP:STiki|STiki project]].
<!--*[[User:RileyBot]]—approved to update, migrate and replace broken links with new functioning links.  Submit requests for link updates at [[User:RileyBot/Requests]].-->
*[[User:MerlLinkBot]]—purpose is to change links in articles which are outdated and can be successfully replaced by a new one. Submit requests for link updates to the bot's [[User talk:MerlLinkBot|talk page]].
<!--*[[User:DeadLinkBOT]]—(inactive since 2009) purpose is to update dead links caused by link rot. Submit any updatable links found (old + new locations) to the bot's [[User talk:DeadLinkBOT|talk page]]. After human verification, the bot automatically updates affected articles.
*[[User:WebCiteBOT]]—(inactive since 2009) purpose is to combat link rot by automatically [[WebCite|WebCiting]] newly added URLs.-->
*[[User:Legobot]]—can mass tag links with {{tlx|dead link}}. Requests can be made at [[User talk:Legoktm]].
<!--*[[User:Dispenser/Reflinks]]-can automatically or semi-automatically add information to references using data present in the web page.-->
*[[User:InternetArchiveBot]]—automatically fixes dead links whenever possible, and tags them when it isn't.

=={{Anchor|Tools}} External links==
<!-- This Anchor tag serves to provide a permanent target for incoming section links. Please do not move it out of the section heading, even though it disrupts edit summary generation (you can manually fix the edit summary before you save your changes). Please do not modify it, even if you modify the section title. It is always best to anchor an old section header that has been changed so that links to it won't be broken. See [[Template:Anchor]] for details. (This text: [[Template:Anchor comment]]) -->
*[[mw:Manual:Pywikipediabot/weblinkchecker.py|weblinkchecker.py]]—script from the [[mw:Manual:Pywikipediabot/Basic_use|Python Wikipedia Bot]] collection which finds broken external links.
<!--*[[tools:~dispenser/view/Checklinks|tools:Checklinks]]—an external link checker [[tools:~dispenser/view/Main_Page|tool]] for Wikimedia Foundation projects, which lists dead links and allows recovery using archiving services.-->
*[http://undeadlinks.org/ UndeadLinks.org]—allows you to search for a broken link's new address.
*[https://addons.mozilla.org/en-US/firefox/addon/resurrect-pages/ Resurrect Pages]—add-on for Firefox, provides links to seven cache/archive websites upon coming across a dead link.
*[https://addons.mozilla.org/en-us/firefox/addon/404-error/ 404-Error?]—add-on for Firefox, automatically brings you to the archive.org version upon coming across a dead link.
*[http://antelle.net/safari/ PageHistory]—add-on for Safari.
*[https://addons.opera.com/en/extensions/details/webcache/?display=en Webcache]—add-on for Opera.
*[https://chrome.google.com/webstore/detail/web-cache/coblegoildgpecccijneplifmeghcgip Web Cache]—add-on for Chrome.
*[https://www.archive.org Internet Archive]<ref name="ia_form" group="note" />
*[https://twitter.com/BrokenWikiLinks BrokenWikiLinks Twitter bot]—tweets about pages with broken links

==Notes==
{{Reflist|group="note"}}

{{Wikipedia essays|building}}

[[Category:Wikipedia maintenance|{{PAGENAME}}]]
[[Category:Wikipedia essays on building the encyclopedia]]
All content in the above text box is licensed under the Creative Commons Attribution-ShareAlike license Version 4 and was originally sourced from https://kn.wikipedia.org/w/index.php?oldid=760623.