From http to https in an SEO friendly way

Sven March 6, 2015

https Everywhere

Google recently declared that the web must become a safer place. ‘Secure by default’ is the motto of the campaign to have all communication with websites go through HTTPS. In a 45-minute but instructive video, Google argues why this is important and explains step by step how a website can be converted from HTTP to HTTPS.

To help you scan through the video more quickly, we will give you a full rundown, summing up all arguments, questions and practical guidelines from the video in the following chapters:

  1. About HTTPS
  2. How do I convert my website from HTTP to HTTPS?
  3. How to make the site HTTPS friendly: for the user as well as the search engine
  4. Finally

1. About HTTPS

Why is HTTPS important?

SEO and HTTPS Everywhere: Because security is a top priority for Google, the use of HTTPS is encouraged and used as a ranking signal. (At this time it is a lightweight signal that influences less than 1% of all search queries.)

The necessity of HTTPS secured communication for financial transactions is evident. But why is it also important for every other instance of web-based communication? Google argues that apparently innocent meta-data can tell a lot about somebody. Added up, they form a picture that’s highly compromising for the privacy of the internet user. HTTPS shields both the user and the website from active and passive attacks.

What security does HTTPS ensure?

As a web user you may ask yourself three things:

  • Are the people I’m talking to really who they say they are?
  • Has anyone tampered with the data?
  • Can anyone see my conversation?

HTTPS means safe web communication through:

  • Authentication
    Active attackers can’t pose as the people you want to talk to.
  • Data integrity
    Active attackers can’t tamper with the data.
  • Encryption
    Active and passive attackers can’t listen in on your conversation.
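
To make these three guarantees a bit more tangible, here is a minimal sketch using Python's standard `ssl` module. It only inspects the default client settings and makes no network connection. Reading "authentication" as "certificate verification plus hostname check" is our interpretation, not something taken from the video.

```python
import ssl

# Build a client-side TLS context with Python's recommended defaults.
context = ssl.create_default_context()

# Authentication: the server certificate must chain to a trusted CA
# and must match the hostname the client asked for.
authentication_on = (
    context.verify_mode == ssl.CERT_REQUIRED and context.check_hostname
)

# Integrity and encryption come from the cipher suite that is negotiated
# during the handshake; they cannot be switched off separately here.
print("certificate verification required:", authentication_on)
```

A client that uses this context refuses to talk to a server whose certificate cannot be verified.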

2. How do I convert my website from HTTP to HTTPS?

The prerequisite for an HTTPS website is a TLS (Transport Layer Security) configuration. Many website owners are wary of implementing TLS. Google is aware of the uncertainty and fears that come with altering websites in such a profound way, and gives thorough advice and tips on how to make websites HTTPS friendly without a hiccup. Here we’ll delve into the questions and answers one at a time.

HTTPS and TLS:

HTTPS (Hypertext Transfer Protocol Secure) is a combination of two protocols: HTTP carried over SSL or TLS. This makes otherwise insecure HTTP communication safe, because it is carried out over an encrypted layer. TLS is the improved successor of SSL.

FAQ

Implementing and speeding up TLS

  1. Isn’t a TLS certificate expensive?
  2. Doesn’t it slow down my server and website?
  3. What are the best practices for the configuration?

Making the website HTTPS friendly

  1. How do I migrate existing content?
  2. How do I make it search engine friendly?
  3. How to avoid common mistakes?

Implementing TLS

Google provides the following checklist, and advises not to skip a single step.

Sys-admin checklist (see 9:40 in video)

  1. Get a 2048-bit certificate
  2. Configure TLS on your web-server(s)
  3. Check the TLS configuration on your server(s)
  4. Limit the influence of TLS on the performance
  5. Server hardware and special settings
  6. Use SPDY & HTTP/2

1. Get a 2048-bit TLS certificate

It is often thought that a certificate is expensive, but depending on the website the price ranges from free to a few dozen euros for the most common use cases.

Depending on use the following possibilities exist:

  • Free certificate for non-commercial use from StartSSL
  • Free certificate for open-source projects from GlobalSign
  • Commercial multi-domain certificate starting at 30 €
  • Wildcard certificate starting at 100 €

If you are still using a 1024-bit certificate, you have to replace it with a 2048-bit certificate, because a 1024-bit key is no longer considered strong enough for current security demands.

2. Configure TLS on the web server(s)

There are plenty of sources on the internet that (often partially) explain how to configure TLS on your web-server. To prevent mistakes, Google advises not to piece together the configuration from different sources, but to use one coherent source, namely Mozilla Server Side TLS. There the configurations and best practices are described clearly and fully for the most popular web-servers such as Apache, Nginx and others.

[For Apache: mod_ssl must be installed. mod_ssl supports SSL and TLS through the OpenSSL library. TLS 1.1 and TLS 1.2 are only available from OpenSSL 1.0.1 onward. Important: use OpenSSL 1.0.1g or later, because versions 1.0.1 through 1.0.1f suffer from the infamous Heartbleed bug.]
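
As a quick illustration of that version rule, the sketch below checks whether a plain OpenSSL release string (for example the one printed by `openssl version`) falls in the Heartbleed range. The function name is our own, and the check is deliberately simple: it only understands release strings such as `1.0.1f` and rejects anything else, including beta releases.

```python
import re


def is_heartbleed_affected(version: str) -> bool:
    """Return True for OpenSSL 1.0.1 through 1.0.1f (fixed in 1.0.1g)."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)([a-z]?)", version.strip())
    if match is None:
        raise ValueError(f"unrecognised OpenSSL version: {version!r}")
    major, minor, patch, letter = match.groups()
    if (int(major), int(minor), int(patch)) != (1, 0, 1):
        return False
    # "" (plain 1.0.1) and the letters a to f all sort before "g".
    return letter < "g"


for candidate in ("1.0.1", "1.0.1f", "1.0.1g"):
    print(candidate, "affected" if is_heartbleed_affected(candidate) else "ok")
```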

3. Check the TLS configuration on your server(s)

Once the configuration is done and TLS has been activated, you can have the implementation checked by an external tool. Qualys SSL Labs analyses your site and provides instant feedback. It is worth running this tool every time you modify the TLS configuration. The test provides a score and practical instructions for improvements.

4. Limit the influence of TLS on performance

Setting up and using a TLS tunnel consists of 2 steps that influence CPU usage:

  1. Asymmetric cryptography:
    1. Verifying the certificate
    2. Exchanging of the key for symmetric cryptography
  2. Symmetric cryptography:
    1. Exchanging of the application data (actual data)

Only the asymmetric cryptography uses a lot of CPU. This CPU usage can be decreased by:

  • Activating keep-alive: The same connection is used for several requests. Consequently the TLS tunnel doesn’t have to be set up each time, and the asymmetric cryptography only has to be carried out once for all requests on that connection.
  • Using session resumption: When the visitor has previously set up a session with the server, the TLS parameters can be re-used for setting up the new session.
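
A back-of-the-envelope sketch of why keep-alive helps: the function below counts how many full handshakes are needed for a given number of requests. It is a simplified model of our own (it ignores session resumption and parallel connections), and the numbers are purely illustrative.

```python
import math


def full_handshakes(requests: int, requests_per_connection: int = 1) -> int:
    """Handshakes needed when every connection serves a fixed number of requests."""
    if requests < 0 or requests_per_connection < 1:
        raise ValueError("requests must be >= 0 and requests_per_connection >= 1")
    return math.ceil(requests / requests_per_connection)


# A page with 30 resources: one handshake per request without keep-alive,
# versus connections that are each reused for 10 requests.
print(full_handshakes(30))
print(full_handshakes(30, requests_per_connection=10))
```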

⇒ Dedicated hardware for handshakes? No longer necessary. Even sites with high traffic like Twitter, Google and Facebook no longer use special hardware, opting instead for an ordinary software implementation of TLS.

5. Server hardware and special settings

Google states that many administrators still worry about performance and -partially influenced by trade literature and hardware salesmen- think specific hardware is needed to efficiently use HTTPS.

Modern CPUs are designed in such a way that they can swiftly handle these calculations. Big players like Twitter, Facebook and Google no longer use special hardware for HTTPS these days.

6. Use SPDY & HTTP/2

SPDY is a protocol to speed up the web. It was developed to solve performance issues in HTTP/1. SPDY is already supported by the most commonly used browsers (Chrome, Opera, Firefox and Internet Explorer) and web-servers. Technically you need TLS to activate SPDY, but once you have set up TLS it is simply a matter of activation on the web-server.

Google has released some numbers on page load time using SPDY, compared to an HTTPS connection without SPDY. The figures show the change in page load time. Because fewer resources are used on the web-server, performance is gained for both the visitor and the web-server.

                                     Google News   Google Sites   Google Drive   Google Maps
Median                               -43%          -27%           -23%           -24%
5th percentile (fast connections)    -32%          -30%           -15%           -20%
95th percentile (slow connections)   -44%          -33%           -36%           -28%

According to Google, setting up TLS with SPDY will even decrease resource usage.

At IsTLSfastyet.com you’ll find a number of links to sites with information on configuring servers.

Summary

  • Certificates are less expensive than generally assumed
  • Great online tools exist to verify your configuration
  • By making clever use of TLS connections CPU usage can be decreased
  • Modern CPUs are equipped to handle cryptographic operations
  • When TLS has been set up, SPDY is easily activated, gaining performance as a result

3. How to make the website HTTPS friendly: for the user as well as the search engine

The correct way to migrate and avoid mistakes.

Websites that run completely through HTTPS are evaluated by Google exactly like other websites. They are indexed in the same manner and the ranking for keywords is no different from sites that use HTTP. (Note: on this page Google states there’s a slight advantage). But to be sure your site will be fully indexed as a safe site, all indexation signals must be consistent.

For this Google provides the following checklist.

Webmaster checklist (see 20:50 in the video)

  1. Configure HTTPS on your webserver
  2. All parts of a webpage must be requested through HTTPS
  3. Direct internal links to HTTPS pages, redirect external links
  4. Verify robots.txt, rel=canonical and other signals
  5. Check Webmaster Tools report

1. Configuration of HTTPS on the server

Aside from the configuration described above, other things may go wrong:

  • The server hostname does not equal the name on the certificate
  • The certificate does not have a full certificate chain
  • The certificate has expired

The most prevalent of these mistakes are shown in Google Webmaster Tools (GWT), and the Qualys SSL Labs tool will also find these problems.
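
Expiry in particular is easy to keep an eye on yourself. The sketch below turns a certificate's `notAfter` timestamp into the number of days left, using `ssl.cert_time_to_seconds` from Python's standard library. The timestamp format is the one Python reports for a peer certificate. The function name and the sample dates are our own.

```python
import ssl

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after: str, now: float) -> float:
    """Days left before the certificate expires (negative once it has expired)."""
    expires_at = ssl.cert_time_to_seconds(not_after)
    return (expires_at - now) / SECONDS_PER_DAY


# Illustrative values: a certificate that expires ten days after "now".
not_after = "Jan  5 09:34:43 2018 GMT"
now = ssl.cert_time_to_seconds("Dec 26 09:34:43 2017 GMT")
print(days_until_expiry(not_after, now))
```

In a real check, `now` would come from `time.time()` and `not_after` from the certificate your server actually presents.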

2. All parts of a page must be requested through HTTPS

Avoid mixing the HTTP and HTTPS protocols on the same site. Some browsers will not load the parts of an HTTPS page that are requested through HTTP. If they are loaded anyway, HTTP will be used and the site will not be 100% secured. The combination of HTTP and HTTPS is one of the signals that may hinder the site from being optimally indexed by Google.
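
To get a first impression of how much mixed content a page contains, you could scan its HTML for resources that are hard-coded to http://. The sketch below does this with Python's built-in `html.parser`. The class and function names are our own, and the check is deliberately coarse: it looks at `src` attributes and at `href` on `<link>` tags only.

```python
from html.parser import HTMLParser


class MixedContentFinder(HTMLParser):
    """Collect resource URLs that are hard-coded to http://."""

    def __init__(self):
        super().__init__()
        self.insecure_urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # src covers images, scripts and iframes; href on <link> covers stylesheets.
            is_resource = name == "src" or (tag == "link" and name == "href")
            if is_resource and value and value.lower().startswith("http://"):
                self.insecure_urls.append(value)


def find_mixed_content(html: str) -> list:
    finder = MixedContentFinder()
    finder.feed(html)
    return finder.insecure_urls


page = (
    "<img src='http://www.bar.com/link-to-image' alt=''>"
    "<img src='//www.bar.com/link-to-image' alt=''>"
    "<a href='http://www.bar.com/'>an ordinary link</a>"
)
print(find_mixed_content(page))
```

Ordinary links (`<a href>`) are left out on purpose: they navigate to another page and do not load a resource into the current one.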

A combination of HTTP and HTTPS mostly occurs because hardcoded URLs are used in the webpages. This can be avoided by using protocol relative URLs. When a part is requested through a protocol relative URL, it will be requested with the same protocol as the webpage itself.

How do you write a protocol relative URL? It’s simple: remove http: from the URL. The resource will then be requested with the same protocol as the page the URL is on.

So an image with this source:

<img src='http://www.bar.com/link-to-image' alt=''>

uses an absolute protocol. It is better to use this protocol relative form:

<img src='//www.bar.com/link-to-image' alt=''>
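
If you want to see how a browser resolves such a reference, Python's `urllib.parse.urljoin` follows the same URL resolution rules. In this small sketch the page address `www.foo.com` is a made-up example.

```python
from urllib.parse import urljoin

relative = "//www.bar.com/link-to-image"

# The same reference resolved from an HTTPS page and from an HTTP page.
on_https = urljoin("https://www.foo.com/page", relative)
on_http = urljoin("http://www.foo.com/page", relative)

print(on_https)  # https://www.bar.com/link-to-image
print(on_http)   # http://www.bar.com/link-to-image
```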

3. Direct internal links to HTTPS pages, redirect external links

All internal links must direct to the HTTPS page, that’s obvious. External requests (backlinks) for HTTP pages must be redirected to the HTTPS page through a Permanent redirect response (HTTP 301).

This is important because visitors who come in through remaining backlinks to the HTTP page will then still be presented with the HTTPS version of the page.

At all times: use only 1 protocol to ensure no 2 versions of the page exist. Or, as explained in the video: no redirects is an anti-pattern!

On the HTTPS page, place a canonical to itself (a self-referencing canonical). This emphasizes the fact that the site uses only HTTPS URLs.

<!-- on the page https://examplesite.com/page-a -->
<head>
...
<link rel='canonical' href='http://examplesite.com/page-a'>
...
</head>
<!-- this is better: -->
<head>
<link rel='canonical' href='https://examplesite.com/page-a'>
</head>
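
Checking this by hand on every page gets tedious. As a rough sketch, the Python snippet below pulls the canonical out of a page's HTML and tells you whether it points to HTTPS. The class and function names are hypothetical, and we assume a plain `rel='canonical'` attribute as in the example above.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the last <link rel='canonical'> seen in the page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "link" and (attributes.get("rel") or "").lower() == "canonical":
            self.canonical = attributes.get("href")


def canonical_is_https(html: str) -> bool:
    finder = CanonicalFinder()
    finder.feed(html)
    href = finder.canonical
    return href is not None and href.lower().startswith("https://")


good = "<head><link rel='canonical' href='https://examplesite.com/page-a'></head>"
bad = "<head><link rel='canonical' href='http://examplesite.com/page-a'></head>"
print(canonical_is_https(good), canonical_is_https(bad))
```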

Avoid a chain of redirects by having the first redirect lead immediately to the correct URL. This prevents latency that especially mobile users will suffer from.

Wrong:

request http://examplesite.com/page-a
directs to http://www.examplesite.com/page-a
directs to https://www.examplesite.com/page-a

Correct:

request http://examplesite.com/page-a
directs to https://www.examplesite.com/page-a
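
The single-hop rule can be sketched as one small function that maps every variant of a URL straight to its final HTTPS address. This is an illustration in Python and not a web-server configuration. The function name is our own, and we assume that `www.examplesite.com` is the one host the site should be served from.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.examplesite.com"  # assumption: the preferred host name


def redirect_for(url: str):
    """Return (301, target) when the URL needs a redirect, otherwise None."""
    parts = urlsplit(url)
    if parts.scheme == "https" and parts.netloc.lower() == CANONICAL_HOST:
        return None  # already on the final URL, no redirect needed
    target = urlunsplit(("https", CANONICAL_HOST, parts.path or "/", parts.query, ""))
    return (301, target)


print(redirect_for("http://examplesite.com/page-a"))
print(redirect_for("http://www.examplesite.com/page-a"))
print(redirect_for("https://www.examplesite.com/page-a"))
```

Because protocol and host name are corrected in the same step, no request ever needs more than one redirect.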

Latency can also be avoided by using HSTS (HTTP Strict Transport Security). HSTS is a security policy that instructs the browser to communicate with the site only through HTTPS. The policy is set with the HTTP header Strict-Transport-Security. In the policy you indicate how long the rule is valid and whether it also applies to sub-domains.

Once the browser has received the HSTS header, redirects from HTTP to HTTPS are no longer needed. When the user requests a page through HTTP, the browser alters this request itself and requests the page through HTTPS instead. This saves a round-trip to the server and also makes surfing the site more secure: it decreases the likelihood of downgrade attacks. In the future, the use of HSTS may be a signal for the search engine that the site truly focuses on safe traffic.
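
To show what such a policy looks like on the wire, here is a minimal Python sketch that builds the header. `max-age` (in seconds) and `includeSubDomains` are the directive names of the HSTS header itself. The function name and the one-year validity are only illustrative choices.

```python
def hsts_header(max_age_seconds: int, include_subdomains: bool = False):
    """Build the (name, value) pair for a Strict-Transport-Security header."""
    if max_age_seconds < 0:
        raise ValueError("max_age_seconds must not be negative")
    value = f"max-age={max_age_seconds}"
    if include_subdomains:
        value += "; includeSubDomains"
    return ("Strict-Transport-Security", value)


one_year = 365 * 24 * 60 * 60
name, value = hsts_header(one_year, include_subdomains=True)
print(f"{name}: {value}")
```

Start with a short validity while testing: a browser that has stored the policy will refuse plain HTTP for the whole period.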

4. Verify robots.txt, rel=canonical and other signals

robots.txt

The previous part handled modifications that were also noticeable for the visitor. This part is about modifications that are not perceivable by the user, but purely serve the bots (e.g. Googlebot).

Avoid blocking access to HTTP or HTTPS pages by setting rules in the robots.txt file.

If HTTPS pages cannot be crawled, they cannot be indexed.

If crawling URLs with HTTP is blocked, the bot will not be able to discover links from HTTP to HTTPS. You can easily verify if HTTP or HTTPS prohibits crawling with the GWT fetch tool.
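
Besides the GWT fetch tool, you can do a rough local check with Python's built-in `urllib.robotparser`. The sketch below tests a robots.txt text against a URL. The helper name and the sample rules are made up, and Python's parser does not necessarily interpret every rule exactly the way Googlebot does, so treat the outcome as a first indication only.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Check whether the given robots.txt text lets the agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


blocking_rules = "User-agent: *\nDisallow: /\n"
open_rules = "User-agent: *\nDisallow: /admin/\n"

print(can_crawl(blocking_rules, "https://www.examplesite.com/page-a"))
print(can_crawl(open_rules, "https://www.examplesite.com/page-a"))
```

Remember that the HTTP and the HTTPS version of the site each serve their own robots.txt, so both need to be checked.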

noindex

Also verify that there are no noindex directives on the HTTPS pages, because then the pages will not be indexed by the search engine either. A noindex can be placed on the page itself or sent as an HTTP header. If any exist, they must be removed. Use GWT to verify whether noindex directives are used.

<!-- remove from html ! -->
<meta name='robots' content='noindex'>
<meta name='googlebot' content='noindex'>

<!-- remove from HTTP header -->
X-Robots-Tag: noindex
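
Both places can be checked in one go. The following Python sketch looks for a noindex in the robots meta tags of a page and in its response headers. The names are our own, and the headers are assumed to be available as a simple dictionary.

```python
from html.parser import HTMLParser


class NoindexFinder(HTMLParser):
    """Detect <meta name='robots'> or <meta name='googlebot'> tags with noindex."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = (attributes.get("content") or "").lower()
        if tag == "meta" and name in ("robots", "googlebot") and "noindex" in content:
            self.noindex = True


def has_noindex(html: str, headers: dict) -> bool:
    finder = NoindexFinder()
    finder.feed(html)
    # Header names are case-insensitive, so compare them in lower case.
    header_hit = any(
        key.lower() == "x-robots-tag" and "noindex" in value.lower()
        for key, value in headers.items()
    )
    return finder.noindex or header_hit


print(has_noindex("<meta name='robots' content='noindex'>", {}))
print(has_noindex("<p>Hello</p>", {"X-Robots-Tag": "noindex"}))
print(has_noindex("<p>Hello</p>", {"Content-Type": "text/html"}))
```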

canonicals

If canonicals are applied inconsistently (some pointing to the HTTP page, some to the HTTPS page), this may have a negative influence as a ranking signal. If applied correctly, the influence can be positive. Use an absolute URL, including the protocol, as canonical to emphasize the use of HTTPS.

5. Check Google Webmaster Tools report

Google Webmaster Tools contains dozens of reports that help you discover any mistakes that may exist. Use this checklist when converting your site from HTTP to HTTPS:

  1. Verify all variants of your site
  2. Verify index status
  3. Verify crawl errors

1. Verify all variants of your site

Depending on the use of HTTP, HTTPS, domain.com and www.domain.com, Google views every URL as a different site. So:

  • http://examplesite.com
  • http://www.examplesite.com
  • https://examplesite.com
  • https://www.examplesite.com

are 4 different sites to Google. When you use a separate domain for your mobile site, this is also an extra site to Google (e.g. https://m.examplesite.com).
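
If you manage several domains, a tiny helper can list the variants that need checking. This Python sketch (the function name is our own) simply combines both protocols with and without www.

```python
from itertools import product


def site_variants(domain: str) -> list:
    """All protocol and www combinations for one domain."""
    return [
        f"{scheme}://{prefix}{domain}"
        for scheme, prefix in product(("http", "https"), ("", "www."))
    ]


for variant in site_variants("examplesite.com"):
    print(variant)
```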

Normally all requests (through the different URLs: http, https, with or without www) land on https://www.examplesite.com, but it is definitely recommended to verify in Google Webmaster Tools that this is implemented correctly. Since the different versions of the URLs are different sites to Google, they are also different profiles in GWT.

2. Verify index status: (see 35:30 in video)

The number of indexed HTTP pages must dwindle to 0 while the number of indexed HTTPS pages must rise. If this doesn’t happen, there is still a flaw in the configuration (e.g. a robots.txt rule that tells the bot not to crawl the HTTPS version).

3. Verify crawl errors

Google Webmaster Tools reports errors found while crawling that would otherwise go unnoticed, both errors that are site-wide and errors that exist on specific pages. It also indicates whether the error occurs when visiting the page with a mobile or a desktop browser. Thoroughly check these reports, especially after the transition to HTTPS, to ensure it hasn’t caused any problems.

Finally

So much for Google’s story. It is obvious that Google highly values safety on the internet, and it seems likely that HTTPS will at some point weigh more heavily in the ranking of websites.

For a flawless implementation of HTTPS, Google and other sources offer a number of helpful tools, mentioned above.

With this article we wanted to inform you as thoroughly as possible about ‘Secure by default’: the procedures, the consequences and the best practices. We can imagine that this information doesn’t just answer questions, but also raises them. We’ll be happy to answer those questions; just leave a comment or contact us.