Broken Links Part 2: Tips and Techniques For Repairing Your Website

Repair broken links

In Broken Links Part 1: Checking Tools for Your Website, we took a look at a few tools to help you identify links that no longer work on your website. In Part 2, we will go over some tips and techniques to help you repair these links.

So, you are having trouble figuring out how to fix some broken links on your website. Can't remember what the broken link used to point to? Not sure why the broken link you fixed keeps showing up in your broken link report? You've come to the right place.

The following information assumes you are using Xenu Link Sleuth to scan your website, but most of it should be applicable regardless of the tool you use.

Always Empty Your Website and Web Browser Cache

If you are using a web cache of any kind, be sure to clear it before you check your website for broken links. This may even involve restarting your web server software. I've seen more than one website owner try to troubleshoot a broken link issue only to discover that they had resolved the problem an hour earlier but the web server was still serving the old version of the page.

Similarly, if the problem just doesn't seem to go away while you are troubleshooting a broken link, clear your web browser's cache, close the web browser and start it up again.
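If you re-check a single page from a script rather than a browser, you can also ask any caches along the way not to hand back a stored copy. The following is only a minimal sketch using Python's standard library. The function names are made up for this example, and a server-side cache in your CMS may ignore these request headers, so you still need to clear that cache yourself.

```python
import urllib.error
import urllib.request


def build_uncached_request(url):
    """Build a GET request that asks caches along the way to revalidate."""
    return urllib.request.Request(
        url,
        headers={
            "Cache-Control": "no-cache",  # understood by HTTP/1.1 caches
            "Pragma": "no-cache",         # understood by older HTTP/1.0 caches
        },
    )


def fetch_status(url, timeout=30):
    """Return the HTTP status code for url, e.g. 200 or 404."""
    try:
        with urllib.request.urlopen(build_uncached_request(url), timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urlopen raises for 4xx/5xx responses; the code is still what we want.
        return error.code
```

Calling fetch_status() on a page you just fixed gives you a quick second opinion before you run a full scan again.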

The Internet Archive

Google Cache and the Internet Archive's Wayback Machine - These two tools can be extremely valuable in helping you see the original page pointed to by the broken link so that you can figure out where it was moved to. Xenu conveniently allows you to right-click on a URL (in the program itself, not in the HTML report) to get instant access to these archives.
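If you are not using Xenu, you can build the Wayback Machine addresses yourself. This is a small sketch, assuming the web.archive.org address pattern and the archive.org availability lookup as they are publicly documented. The function names are my own and are only for illustration.

```python
from urllib.parse import urlencode


def wayback_url(broken_url, timestamp="*"):
    """Address of archived copies of broken_url.

    timestamp is a date such as "2015" or "20150630"; the default "*"
    shows the calendar of every capture the archive holds.
    """
    return f"https://web.archive.org/web/{timestamp}/{broken_url}"


def availability_api_url(broken_url):
    """Address of the JSON lookup that reports the closest archived copy."""
    return "https://archive.org/wayback/available?" + urlencode({"url": broken_url})


print(wayback_url("http://www.chamberofcommerce.com/"))
print(availability_api_url("http://www.chamberofcommerce.com/"))
```

Paste the first address into your browser to browse the captures by date.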

Broken Links Appearing to Come From Other Sites

At times, Xenu will report broken links as coming from other sites. This happens when the link on your site leads to two or more redirections on the target site followed by a broken link (e.g. a page that doesn't exist). When this happens, Xenu reports the broken link as coming from that site rather than from yours. In this case, only your knowledge of your own website will help you locate the link on it. You may be able to find some help by identifying the content of the original destination page using Google Cache or the Wayback Machine.
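To see where a chain of redirections ends up, it can help to follow it one hop at a time. The sketch below is an illustration only. trace_redirects and the fetch function you pass to it are hypothetical names, and fetch is assumed to return the status code and the Location header for a single request without following redirects itself.

```python
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}


def trace_redirects(start_url, fetch, max_hops=10):
    """Follow redirects one hop at a time.

    fetch(url) must return (status_code, location_or_None).
    Returns a list of (url, status_code), ending at the first
    response that is not a redirect (or after max_hops).
    """
    chain = []
    url = start_url
    for _ in range(max_hops):
        status, location = fetch(url)
        chain.append((url, status))
        if status not in REDIRECT_CODES or not location:
            break
        url = urljoin(url, location)  # the Location header may be relative
    return chain


# Canned responses standing in for a real site, for demonstration only.
fake_site = {
    "http://other.example/a": (301, "/b"),
    "http://other.example/b": (302, "http://other.example/c"),
    "http://other.example/c": (404, None),
}
for hop in trace_redirects("http://other.example/a", fake_site.get):
    print(hop)
```

The first address in the chain is the one to search for on your own pages; the last one is the address that shows up in the report.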

Links That are Not Visible

In rare cases, a link may not be visible. This is most common when a link has not been completely removed from a page, which can happen if you delete all the text that formed a link but leave the link itself in place. To find these hidden links, right-click on a blank area of the webpage and select View Source from the context menu. Depending on the web browser that you are using, the source of the page may open in a text editor/viewer or as a tab in your browser. You will then be able to search for the link that causes the error. You can then go back to the source code in the CMS and fix or remove the link.
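If a page is long, searching the source by eye gets tedious. Here is a rough sketch that lists links with no visible text, using Python's built-in HTML parser. The class and function names are my own, and it assumes that a link containing only whitespace and no image is the kind of leftover described above.

```python
from html.parser import HTMLParser


class EmptyLinkFinder(HTMLParser):
    """Collect <a href=...> elements that contain no text and no image."""

    def __init__(self):
        super().__init__()
        self.empty_links = []  # (href, line_number) pairs
        self._current = None   # [href, line, has_content] while inside <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._current = [href, self.getpos()[0], False]
        elif tag == "img" and self._current:
            self._current[2] = True  # an image counts as visible content

    def handle_data(self, data):
        if self._current and data.strip():
            self._current[2] = True

    def handle_endtag(self, tag):
        if tag == "a" and self._current:
            href, line, has_content = self._current
            if not has_content:
                self.empty_links.append((href, line))
            self._current = None


def find_empty_links(html):
    """Return (href, line_number) for every link with nothing to click on."""
    finder = EmptyLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.empty_links
```

Save the page source from View Source to a file, read it in, and pass the text to find_empty_links() to get the address and line number of each invisible link.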

Links That Are Not Created Correctly

Occasionally you may find a broken link in which the address of another website appears tacked onto the end of an address on your own website.

This happens when you forget to add the http:// to the front of the link. The http://yourdomain.com portion of a link is only optional when the link is pointing to the same domain as the page is on. If it's on a different domain, you MUST add the http:// in front of it. For files on an FTP site, use ftp:// instead. For email, it's mailto: (no forward slashes!).

Another common situation results in links that look like:

http://yourdomain.com/regions/ontario/images/map.jpg

If you know that map.jpg is actually in http://yourdomain.com/images/map.jpg, chances are pretty good that your link to the image was specified as images/map.jpg instead of /images/map.jpg. The first forward slash tells the web browser to look for the images folder in the root of your website instead of in the current folder of the web page. The only time you might not want to use the first forward slash is when the images folder is relative to the location of a stylesheet (.css) file. Just remember, links are always relative to the file in which they appear when you don't include the domain or at least an initial forward slash.
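You can reproduce how a browser resolves these links with Python's urljoin function. This is just an illustration. The page address is borrowed from the examples in this article, and the link values are the kind of mistakes described above.

```python
from urllib.parse import urljoin

# The page on which the links appear.
page = "http://yourdomain.com/regions/ontario/niagara.aspx"

# Missing http://: treated as a file in the current folder of the page.
print(urljoin(page, "www.chamberofcommerce.com"))
# http://yourdomain.com/regions/ontario/www.chamberofcommerce.com

# No leading slash: resolved relative to the current folder of the page.
print(urljoin(page, "images/map.jpg"))
# http://yourdomain.com/regions/ontario/images/map.jpg

# Leading slash: resolved from the root of the website.
print(urljoin(page, "/images/map.jpg"))
# http://yourdomain.com/images/map.jpg

# Full address including http://: used exactly as written.
print(urljoin(page, "http://www.chamberofcommerce.com/"))
# http://www.chamberofcommerce.com/
```

If the address in your broken link report matches the first or second result, the fix is to add the missing http:// or the missing leading slash.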

Broken Websites

Changing or removing the link on your website isn't always the solution. First, wait a few minutes, maybe even a day before assuming that a website is gone forever. The website may just be temporarily down for a few minutes or hours depending on the severity of the problem.
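If you check links from a script, the same patience can be built in. The sketch below only illustrates the idea. is_still_broken, the check function you pass in, and the default of three attempts ten minutes apart are all assumptions of mine, not settings from any particular tool.

```python
import time


def is_still_broken(url, check, attempts=3, delay_seconds=600, sleep=time.sleep):
    """Return True only if the link fails on every attempt.

    check(url) must return True when the link works.
    The function waits delay_seconds between attempts.
    """
    for attempt in range(attempts):
        if check(url):
            return False  # the site came back, nothing to fix
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return True
```

For a site that may be down for a day, run the check again tomorrow rather than making the delay longer.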

If it doesn't come back, contact the owner of the website to let them know that their website isn't working. Not only will they appreciate it, they will likely fix it or let you know a) where it was moved to or b) that it won't be back. You can often find their contact information using Google, Google Cache or the Wayback machine. A quick telephone call just might be all it takes.

Do you know how often your website goes down for short periods of time?

Understanding the Xenu Link Sleuth Error Messages

The standard error reports contain one or more blocks of information. A block lists the link which creates the error, followed by the page(s) that contain the bad link. There may be more than one page with the bad link; if so, the pages are listed together in the block. e.g.

http://www.chamberofcommerce.com/

error code: 12007 (no such host), linked from page(s):

  • http://yourdomain.com/regions/ontario/niagara.aspx
  • http://m.yourdomain.com/regions/ontario/niagara.aspx

In the above example, two pages contained a link to http://www.chamberofcommerce.com/. The two pages are:

  • http://yourdomain.com/regions/ontario/niagara.aspx
  • http://m.yourdomain.com/regions/ontario/niagara.aspx

You will have to go to the page with the bad link, e.g. http://yourdomain.com/regions/ontario/niagara.aspx, and look for the bad link. Hover your mouse over links on the page, and observe the actual address in your web browser’s status bar until you locate it. If you don't see the status bar, you'll need to turn it on. If you don't find the link, you will need to search for it in the source code.
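When hovering over every link is not practical, a short script can tell you where in the source the bad address appears. This is a sketch under a few assumptions: the names are invented for the example, only href and src attributes are checked, and you have already saved the page source to a string.

```python
from html.parser import HTMLParser


class LinkLocator(HTMLParser):
    """Record where a given address appears in href or src attributes."""

    def __init__(self, target):
        super().__init__()
        self.target = target
        self.matches = []  # (line_number, tag_name, attribute_value)

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value and self.target in value:
                line, _column = self.getpos()
                self.matches.append((line, tag, value))


def locate_link(html, target):
    """Return (line_number, tag_name, address) for each matching link."""
    locator = LinkLocator(target)
    locator.feed(html)
    locator.close()
    return locator.matches
```

For the example above, you would call locate_link(page_source, "www.chamberofcommerce.com") and jump to the reported line numbers in View Source.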

Common Error Codes that Accompany Bad Links

87 (parameter incorrect)

This usually implies an incorrect URL type, specified by the start of the address. e.g. http:/// instead of http://.
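A quick way to spot this kind of mistake is to split the address into its parts and confirm that the start is well formed. The function below is a minimal sketch using Python's urlsplit. Its name and the list of accepted address types (http, https, ftp and mailto) are my own choices for the example.

```python
from urllib.parse import urlsplit

NETWORK_SCHEMES = {"http", "https", "ftp"}


def is_well_formed_absolute_url(url):
    """Check that a full address starts with a usable type and host."""
    parts = urlsplit(url)
    if parts.scheme == "mailto":
        # mailto: takes no forward slashes, so the address lands in the path.
        return bool(parts.path) and not parts.path.startswith("/")
    # http://, https:// and ftp:// must be followed directly by a host name.
    return parts.scheme in NETWORK_SCHEMES and bool(parts.netloc)


for address in ("http://yourdomain.com/", "http:///yourdomain.com/", "mailto://me@yourdomain.com"):
    print(address, is_well_formed_absolute_url(address))
```

This only applies to full addresses; links without a domain, such as /images/map.jpg, are valid but will be reported as not well formed by this check.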

300 (ambiguous)

The address is for a page or resource that cannot be uniquely defined. In practice, this means that you have the correct root, but the rest of the path is not correct.

301 (object permanently moved)

The site owner probably rebuilt part or all of the website and the item is no longer at this address. It may exist elsewhere on the site; remove everything after the root, and you will probably see the site’s home page.

302 (object temporarily moved)

See 301 redirections. Troubleshooting is done in the same way.

400 (no object data)

This link no longer works. It may exist elsewhere on the site; remove everything after the root, and you will probably see the site’s home page.

401 (auth required)

This link requires a login to access. When running the link checker, the request to see this page or asset was likely cancelled if you were prompted. If you don't know the user name and password, then others won’t either, and this item should not normally appear on a publicly-available website.

403 (forbidden request)

Click on the bad link. If it works, ignore the error message. It merely means that the link can be accessed in normal browsing, but cannot be accessed by automated programs such as link checkers.

Or, the site may truly be restricted to people coming from a specific place as might be the case with a corporate intranet accessible only to branches around the world. You may need to try this link from inside the business if this is the case.

Occasionally, a website will designate its ‘page not found’ page to be forbidden, in which case this error is misleading. A 404 error would have been more appropriate.

Sometimes a broken website will return a forbidden response as a result of a programming error in the website, in which case this error is again misleading. A 500 error would have been more appropriate.

404 (not found)

This link no longer works. It may exist elsewhere on the site; remove everything after the root, and you will probably see the site’s home page.

500 (server error)

Most likely the site being accessed is having troubles - it is overloaded or down. But it may also be a badly-formatted URI on that website. Try later to see if the problem is resolved.

502 (error response received from gateway)

Somewhere along the route between your computer and the website, there was an error in the information being transferred. It is usually the result of a server not being properly configured. The error is not necessarily at the website’s server, but more likely at a gateway server along the way. With luck, the error will “go away.” If it persists, see the webmaster.

12002 (timeout)

There was no response from the site within a reasonable timeframe (typically one minute). The site may be overloaded, or experiencing problems. Try later to see if the problem is resolved.

12007 (no such host)

The site does not exist. You can confirm that by trying the root address. For instance, the website in the example above, http://www.chamberofcommerce.com/, no longer exists. In rare cases, this may be rectified in a day or two, especially if a site’s registration has lapsed and needs to be renewed.

12017 (cancelled / timeout)

Most likely the site being accessed is having troubles - it is overloaded or down. Try later to see if the problem is resolved.

12030 (connection aborted)

Most likely the site being accessed is having troubles - it is overloaded or down. Try later to see if the problem is resolved.
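If you work through long reports, the codes above can be boiled down to a small lookup of what to try first. The sketch below simply restates the advice in this article as a Python table. The names and the wording of each step are mine, and codes not covered here fall through to a default message.

```python
# Codes that are usually temporary: the advice is to try again later.
RETRY_LATER = {500, 502, 12002, 12017, 12030}

# Codes that usually need a change on your side, with the first thing to try.
FIRST_STEP = {
    87: "fix the start of the address (e.g. http:/// instead of http://)",
    300: "correct the path after the root of the address",
    301: "find the item's new address, starting at the site's home page",
    302: "find the item's new address, starting at the site's home page",
    400: "look for the item elsewhere on the site, starting at its home page",
    401: "remove the link unless your visitors are expected to have a login",
    403: "click the link in a browser; if it works, ignore the error",
    404: "look for the item elsewhere on the site, starting at its home page",
    12007: "try the root address; the site may no longer exist",
}


def first_step(error_code):
    """Return the first thing to try for a Xenu error code."""
    if error_code in RETRY_LATER:
        return "try again later; the site is probably overloaded or down"
    return FIRST_STEP.get(error_code, "not covered in this article")


print(first_step(12007))
```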

Follow Michael Milette:

Moodle LMS Consultant

Michael Milette enjoys sharing information and uses his skills as an LMS developer, leader and business coach to deliver sustainable solutions and keep people moving forward in their business life.

4 Responses

  1. Franchesca

    It’s difficult to find knowledgeable people on this subject, but you seem like you know what you’re talking about! Thanks

    • Michael Milette

      Hi Lucas,
      Thanks for taking the time to ask. That’s actually a really good question.
      This usually means that Xenu is unable to reach your website for whatever reason, such as having entered http://www.example.com instead of https://example.com – notice the difference in http vs https and the difference in the domain name. The first test you should do is to try to access your website using Internet Explorer from the same computer on which you are trying to run Xenu. If it doesn’t work, make sure the DNS is correctly configured to resolve the domain name. If you don’t have Internet Explorer installed, that could be the problem.
      While I can’t say for sure in your case, this is very often the result of your website trying to limit the number of concurrent connections in order to avoid denial of service attacks. In Xenu, try reducing the number of parallel threads (under More Options) to 4 or less. In fact, start with just one and see how it goes. If it works reliably, increase by 1 until it stops working and then reduce it back to the point where it worked.
      If your website or your network is slow, you can play with the timeouts a little. See http://home.snafu.de/tilman/xenulink.html#timeout for details.
      Another reason might be that your corporate network is using a proxy server and does not allow direct connections out to the Internet. If that is the case, you will need to configure this in Windows’ Internet Options.
      A firewall can also be causing the problem. If your computer, organization or your website has a firewall or proxy server in front of it which places a limit on the number of connections you can make to the outside or to the web server (like CloudFlare), that could be the cause. Some local firewall/internet protection software can also have this effect (see http://home.snafu.de/tilman/xenulink.html). It may also be trying to control the type of web client that can access the site. This is not a good practice as there are people out there using web clients that your server may not be aware of.
      If your site requires users to log in to access the site, that could be the problem too, as Xenu doesn’t log in as a user. For a potential solution, see http://home.snafu.de/tilman/xenulink.html#form
      By the way, if your site uses JavaScript links instead of normal HTML links, Xenu won’t be able to follow them. In fact, your site will likely have accessibility issues that you might want to investigate. This would not usually result in timeouts though. Xenu would simply not see the links.
      Hope something in all of this helps.
      Best regards,
      Michael Milette
