Search Engines and Privacy

Have you ever wondered how search engines actually make their money from advertising?

Have you ever had privacy concerns over the search terms you use?

Have you ever been freaked out by how well targeted modern advertising is?

If you can say yes to any of these – then read on! Otherwise, still read on for an eye-opener.

Search engines only exist because there is a financial model behind them which is there, naturally, to generate profits. So, how do search engines make their profit? Search engines primarily make their money from advertising (as their search and associated services are free to end-users), and they achieve this through three primary techniques:-

  1. Provide space for adverts on the site and rent them out.
  2. Extend scope of adverts through syndication schemes, embedded content and ‘like’ buttons.
  3. Sell your data (search terms and the IP addresses they come from, along with unique browser IDs held in cookies, etc.) to 3rd parties.

Through the many different tracking technologies available, it is easy to identify an Internet user and build a profile of them. This data is collected in the form of search engine logs (on their servers), which can then be analysed either in real time or at a later date. This statistical analysis provides deep insight into what other products and services might interest you, enabling targeted marketing; however, there can be a far more sinister use for this data too.

This data can be used to profile a person in order to find out things such as:-

  • Your name and any aliases (such as ‘internet’ names and previous names)
  • Birthday
  • Location
  • Address
  • Telephone Numbers
  • Email Addresses
  • Your interests
  • Tastes in music, clothes, and also food and drink
  • Your faith and beliefs
  • Your friends and associations
  • Your spending habits
  • Where you like to go
  • Times when you are not at home
  • Your car make, model, and registration
  • Where you work and what you do for a living
  • Your thoughts and feelings
  • Pictures of you in places and with people

All of these are used for profiling you in order to place you within a demographic classification which can then be identified and targeted for a number of uses including advertising.

How are these stats collected? Normally the user willingly hands them over without a second thought, through sources such as your favourite search engine, along with Facebook, Twitter, Flickr, and other social media. This data is then tied together into continuous sessions using tracking cookies.

Social media tools such as Facebook’s facial recognition allow people to be accurately associated with others. This is another social engineering danger, which usually starts with “do you know ‘so-and-so’?”

Many web pages register your presence with syndication partners merely because you view the page. For example, Google and Facebook are informed every time you visit a page carrying some (but not all) of their ‘like’ buttons, so in many cases you may be tracked even if you don’t click the button. The same goes for YouTube videos embedded on sites other than YouTube itself: YouTube (and thus Google) will know you’ve visited even if you don’t watch the video, because the page loads the ‘player’ for that shared content from YouTube. In another example, Amazon’s affiliate advertising syndication could be used to track users across pages which show Amazon affiliate adverts.
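To make the ‘like’ button point concrete, here is a toy model in Python of what a third-party server can log. It is a sketch under stated assumptions, not how any particular company does it: the `ToyTracker` and `EmbedRequest` names are hypothetical, and it assumes the browser sends the embedding page’s full address in the Referer header (many modern browsers now send only the site’s origin by default, which still reveals which site you were on).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class EmbedRequest:
    """One request the browser makes to a third party while rendering a page (hypothetical)."""
    cookie_id: str  # value of the third party's own cookie, the same on every site
    referer: str    # address of the page that embedded the button, player or advert


class ToyTracker:
    """Hypothetical third-party server that logs every embed request it receives."""

    def __init__(self) -> None:
        self.history: dict[str, list[str]] = defaultdict(list)

    def handle(self, req: EmbedRequest) -> None:
        # Nothing was clicked: merely loading the embed delivered the cookie and the Referer.
        self.history[req.cookie_id].append(req.referer)

    def profile(self, cookie_id: str) -> list[str]:
        return list(self.history.get(cookie_id, []))


if __name__ == "__main__":
    tracker = ToyTracker()
    for page in ("https://news.example/politics",
                 "https://shop.example/running-shoes",
                 "https://clinic.example/appointments"):
        tracker.handle(EmbedRequest(cookie_id="abc123", referer=page))
    print(tracker.profile("abc123"))
```

Three unrelated sites, one cookie ID, one browsing history: that is the whole trick, and the user never clicked anything.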

An interesting aspect of social tracking is that if you use a friend’s wireless network with your own equipment, your tracking session can be seen moving onto an IP address that is already associated with other known tracking sessions. This makes it possible to track your physical movements and to correlate you with others, because you are simultaneously sharing the same IP. Your iPhone or Android phone goes with you everywhere, and wherever there is a wireless network it has been configured for (with the correct password), it will use it and tell on you. To be fair, the telephone companies can do this far more easily through 3G phone networks, but being in the same proximity does not always positively prove a relationship, unlike sharing an IP.
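The shared-IP correlation described above can be sketched in a few lines of Python. This is a toy illustration assuming a hypothetical log of (tracking ID, public IP, hour) rows; the function name `shared_ip_pairs` and its `min_hits` threshold are my own inventions. It counts how often two tracking IDs appear behind the same IP in the same hour, and repeated co-occurrence suggests the two devices (and their owners) are together.

```python
from collections import defaultdict
from itertools import combinations


def shared_ip_pairs(observations, min_hits=2):
    """Pairs of tracking IDs seen behind the same public IP in the same hour at least min_hits times.

    observations: iterable of (tracking_id, public_ip, hour) tuples (hypothetical log format).
    """
    by_slot = defaultdict(set)  # (ip, hour) -> tracking IDs seen there
    for tid, ip, hour in observations:
        by_slot[(ip, hour)].add(tid)

    counts = defaultdict(int)
    for ids in by_slot.values():
        for a, b in combinations(sorted(ids), 2):
            counts[(a, b)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_hits}


if __name__ == "__main__":
    log = [
        ("alice-phone", "203.0.113.7", 18),   # Alice's phone joins her friend's wireless
        ("bob-laptop", "203.0.113.7", 18),
        ("alice-phone", "203.0.113.7", 20),
        ("bob-laptop", "203.0.113.7", 20),
        ("carol-pc", "198.51.100.2", 18),
    ]
    print(shared_ip_pairs(log))  # {('alice-phone', 'bob-laptop'): 2}
```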

It is also possible for ISPs to proxy your web traffic for the purpose of caching content in order to deliver a faster network – this can also be used as a source of data.

Many people simply ‘like’ products, creating an association that allows others to gauge their persona: when you ‘like’ a product on Facebook, it tells all of your friends (or at least your Facebook ‘friends’). This could then be used in social engineering attacks against you. Google tracks you simply for viewing a page with a Google+ button on it.

Google is so pervasive that you need their “opt-out” plugin to avoid them, which, as discussed before, comes with an auto-update program that still ‘phones home’.

So, by using social, open-source, and purchased data from many different sources such as Facebook, Twitter, MySpace, Google, Bing, etc., it is easily possible to build an accurate profile of a person, their relationships, and their surfing habits, revealing insight that can be used to target them for one purpose or another. This is exactly what advertisers are after, and it is why this data has value.
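Tying data from different sources together is essentially record linkage: records that share any identifier (an email address, a phone number, a user handle) are assumed to belong to the same person, and matches are followed transitively. Below is a minimal Python sketch using a union-find structure; the field names and sample records are hypothetical, and this sketch only matches identifiers exactly.

```python
def link_records(records, keys=("email", "phone", "handle")):
    """Group records that share any identifier value, directly or through a chain of records."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    first_seen = {}  # (key, value) -> index of the first record carrying it
    for idx, rec in enumerate(records):
        for key in keys:
            value = rec.get(key)
            if value is None:
                continue
            if (key, value) in first_seen:
                union(idx, first_seen[(key, value)])
            else:
                first_seen[(key, value)] = idx

    # Merge each group into one profile, keeping every value seen for each field.
    profiles = {}
    for idx, rec in enumerate(records):
        merged = profiles.setdefault(find(idx), {})
        for field, value in rec.items():
            merged.setdefault(field, set()).add(value)
    return list(profiles.values())


if __name__ == "__main__":
    records = [
        {"source": "social", "handle": "@jdoe", "email": "jd@example.com", "likes": "jazz"},
        {"source": "search", "email": "jd@example.com", "city": "Leeds"},
        {"source": "shop", "phone": "0113 496 0000", "handle": "@jdoe", "car": "hatchback"},
        {"source": "forum", "handle": "@someoneelse"},
    ]
    for profile in link_records(records):
        print(profile)
```

The search record never mentions a handle and the shop record never mentions an email, yet both end up in the same profile because each shares one identifier with the social record.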

While this data has value, it is also private data about individuals, and that includes your data too. A few traces will reveal little about you, but long-term traces can reveal far more than you realise. How often do you clear your cookies? And do you have ‘Do Not Track’ set in your web browser?

A recent row has broken out between online advertisers and Microsoft, who have taken the bold step of enabling ‘Do Not Track’ by default in their latest web browser. If you’re not aware, ‘Do Not Track’ makes the browser send a “DNT: 1” header with every request, asking sites not to track you; it is a request that sites may choose to honour, rather than a block on tracking cookies. The advertisers are up in arms, and this tells you a lot about the value of the data.
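To show what ‘Do Not Track’ amounts to on the wire, here is a small Python sketch. The client half uses the standard library’s `urllib.request.Request` to build (without sending) a request carrying the `DNT: 1` header; the server half, `may_track`, is a hypothetical function for a site that chooses to honour the signal. The point is that the header is only a request: whether tracking stops is entirely up to the site receiving it.

```python
import urllib.request


def build_request(url, do_not_track=True):
    """Build (but do not send) a request that carries the Do Not Track signal."""
    headers = {"DNT": "1"} if do_not_track else {}
    # urllib stores header names capitalised ("Dnt"); HTTP header names are case-insensitive.
    return urllib.request.Request(url, headers=headers)


def may_track(request_headers):
    """Hypothetical server-side policy for a site that CHOOSES to honour DNT."""
    normalised = {k.lower(): v for k, v in request_headers.items()}
    return normalised.get("dnt") != "1"


if __name__ == "__main__":
    req = build_request("https://www.example.com/")
    print(req.header_items())                    # [('Dnt', '1')]
    print(may_track(dict(req.header_items())))   # False
```

Nothing forces a site to run anything like `may_track`; a site that ignores the header simply keeps tracking.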

To protect yourself from this form of personal data leakage, you should choose your web browser and search engines wisely. I am currently using SRWare Iron as my web browser because it has all the power and prowess of Google Chrome with the Google ‘phone-home’ features taken out and a few safety features enabled by default. I use duckduckgo.com as my search engine because it supports encrypted connections through https, it does not record your IP in its logs, and it discards your results after 2 days. DuckDuckGo also has a search portal within Tor and is a strong advocate of internet privacy.

See duckduckgo’s privacy statement here:-

https://www.duckduckgo.com/privacy.html

Another search engine worth considering is ixquick, whose privacy policy can be seen here:-

https://ixquick.com/eng/privacy-policy.html

For comparison, here’s Google’s privacy statement:-

https://www.google.co.uk/intl/en/policies/privacy/

Wow! Need I say more?

It is worth noting that all searches made over standard http can be recorded ‘on the wire’ by anyone who is monitoring the traffic. This includes the searches themselves, the returned content, and any modifications to those searches, often character by character where auto-fill offers search suggestions. For this reason, it is always worth using a search engine that supports https, as this will prevent some (but not all) snooping on the wire.
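Here is roughly what an eavesdropper sees when a search goes over plain http. This Python sketch builds the request lines a browser would send; the host `search.example` and the `/search` and `/suggest` paths are hypothetical stand-ins, not any real engine’s API. Over https the path and query string are encrypted, although the server’s IP address (and usually its hostname) remain visible, which is part of why https stops some but not all snooping.

```python
from urllib.parse import urlencode


def http_get_head(host, path, params):
    """The start of a plain-http GET exactly as it crosses the wire: nothing is hidden."""
    return f"GET {path}?{urlencode(params)} HTTP/1.1\r\nHost: {host}\r\n"


def autocomplete_requests(host, typed):
    """One suggestion request per keystroke, as a search box with auto-fill might send."""
    return [http_get_head(host, "/suggest", {"q": typed[:i]})
            for i in range(1, len(typed) + 1)]


if __name__ == "__main__":
    print(http_get_head("search.example", "/search", {"q": "lump in neck"}).splitlines()[0])
    # GET /search?q=lump+in+neck HTTP/1.1
    for req in autocomplete_requests("search.example", "flu"):
        print(req.splitlines()[0])
    # GET /suggest?q=f HTTP/1.1
    # GET /suggest?q=fl HTTP/1.1
    # GET /suggest?q=flu HTTP/1.1
```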

Another point worth noting is that you often lose this protection the second you leave the https-encrypted search engine page, because you then give away the site the search engine led you to. Many search engines also track this information, adding to their knowledge base not only what you searched for but which links you clicked on. Once you click through to your intended site, you leave the protection of the encrypted search and reveal the next site you visit, so using an encrypted search does not protect you beyond the initial search itself.
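One way an engine can learn which result you clicked is to wrap each result link in a redirect through its own server. The Python sketch below models that idea; `search.example/click`, its `q` and `u` parameters, and the `CLICK_LOG` list are all hypothetical, and not every engine works this way. Note also that if the destination is a plain http address, the browser’s next request after the redirect travels unencrypted.

```python
from urllib.parse import parse_qs, urlencode, urlparse

CLICK_LOG = []  # hypothetical server-side log of (query, destination) pairs


def wrap_result(query, destination):
    """Rewrite a result link so the click passes through the engine first."""
    return "https://search.example/click?" + urlencode({"q": query, "u": destination})


def handle_click(wrapped_url):
    """Engine side: record the click, then return the address to redirect the browser to."""
    params = parse_qs(urlparse(wrapped_url).query)
    query, destination = params["q"][0], params["u"][0]
    CLICK_LOG.append((query, destination))
    return destination


if __name__ == "__main__":
    link = wrap_result("cheap flights", "http://travel.example/deals")
    print(handle_click(link))  # http://travel.example/deals (plain http from here on)
    print(CLICK_LOG)           # [('cheap flights', 'http://travel.example/deals')]
```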

So, now that you know the extent of browser tracking, I encourage you to keep it in mind when surfing the Internet and to take measures to protect your privacy.

Who do you trust? Why Certificate Authorities are a CARTEL!

Well, this might come as a surprise to many, but the current (and any future) group of certificate authorities (CAs) who control and distribute root SSL certificates are complicit, along with browser vendors, in creating a cartel in security (similar to OPEC for oil) that rests on a false assurance.

The way CAs currently operate is that a select group (which you must go to great lengths to join) issue their root certificates in conjunction with browser vendors. This in turn means that only certificates issued by this select few can be used without the end-user receiving some kind of nag screen warning that the certificate is “untrusted”. This is the key issue: a certificate that validates, no matter which of these CAs signed it, works without any notification.
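To illustrate why ‘no matter who signs it’ is the crux, here is a deliberately simplified model of certificate validation in Python. It ignores signatures, expiry dates and every other real check; the `Cert` class, the `validates` function and the CA names are hypothetical. What it keeps is the structural point: a certificate is accepted if its issuer chain ends at any root in the trust store, and nothing records which root vouched for the site.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    subject: str         # hostname for a site certificate, CA name for a CA certificate
    issuer: str          # name of the CA that signed it
    is_ca: bool = False


def validates(leaf, intermediates, trusted_roots, hostname):
    """Toy model: accept if the name matches and the issuer chain reaches ANY trusted root."""
    if leaf.subject != hostname:
        return False
    by_name = {c.subject: c for c in intermediates if c.is_ca}
    issuer, seen = leaf.issuer, set()
    while issuer not in trusted_roots:
        if issuer in seen or issuer not in by_name:
            return False  # broken or looping chain
        seen.add(issuer)
        issuer = by_name[issuer].issuer
    return True  # note: nothing here asks WHICH root vouched for the site


if __name__ == "__main__":
    store = {"Big Trusted CA", "Small Forgotten CA"}
    genuine = Cert("bank.example", issuer="Big Trusted CA")
    forged = Cert("bank.example", issuer="Small Forgotten CA")  # issued by a compromised CA
    print(validates(genuine, [], store, "bank.example"))  # True
    print(validates(forged, [], store, "bank.example"))   # True: same padlock, no warning
```

Removing "Small Forgotten CA" from the store is the only thing in this model that makes the forged certificate fail.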

This means that no matter how secure your solution is, unless you buy a certificate from a “trusted” CA your users will be led to believe that the site may not be secure. The end-user is given false information, and with it a false sense of security.

So... who decides whether a device or CA is trustworthy? You? Me? Your software vendor? And who do you trust to make that decision on your behalf?

Now, as has been proven in recent years, CAs probably don’t have as much security as they would like to think, and it has been possible to obtain “fake” certificates for well-known sites and domains. Examples include the compromises of DigiNotar and Comodo, and the Flame virus’s use of a fake Microsoft certificate. This in turn allows malicious hackers to fake popular sites in order to steal information, including passwords, for further use. If a hacker has the power to subvert traffic (perhaps through a trojan or a compromised DNS server, or because he or she works for an ISP, a government agency, or as a company network engineer), then it is possible for them to present a fake site with a valid certificate, and the end user would never know!

So, because an end-user can be diverted to a fake site using a fake certificate without the browser warning them, what trust value does a certificate from a root CA actually have? A friend noted that a serious amount of proof of identity is required to purchase a certificate these days, and that is true for most CAs, but it only takes one weak system or compromised CA to allow a certificate to be signed in the name of any hostname on the Internet. Also, because a valid certificate simply says “yes”, nobody looks at it so long as it validates, so who cares if the certificate was registered by someone else? The user wouldn’t know and wouldn’t care.

Extended validation (EV) certificates are a bit of a con too: they only make the browser bar go green! They still suffer the same weaknesses as the other CA products, for all the same reasons, and I reckon most users wouldn’t notice until it was too late.

It is for this reason that no certificate from any CA is inherently trustworthy just because it is signed by a “trusted” CA, and any assurance of assumed trust could be false and misleading.

So... what do we do with this information? And how do we do away with untrusted root CAs?

The answer is simple – and it is this – delete all of your root certificates for CAs that you don’t trust.
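Browsers and operating systems each have their own certificate manager for removing roots, and the steps differ between them, so I won’t attempt them here. For a program you write yourself, though, the same idea can be sketched with Python’s standard `ssl` module: a client context created with `ssl.PROTOCOL_TLS_CLIENT` starts with an empty trust store, and you load only the root certificates you have chosen to trust. The function name `make_client_context` and its PEM argument are my own, for illustration.

```python
import ssl
from typing import Optional


def make_client_context(trusted_roots_pem: Optional[str] = None) -> ssl.SSLContext:
    """A TLS client context that trusts ONLY the roots you hand it, not the system store."""
    # PROTOCOL_TLS_CLIENT starts with an empty trust store and already requires
    # a valid certificate chain (CERT_REQUIRED) and a matching hostname.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    if trusted_roots_pem:
        # Hypothetical input: PEM text of the roots you have personally decided to trust.
        ctx.load_verify_locations(cadata=trusted_roots_pem)
    return ctx


if __name__ == "__main__":
    ctx = make_client_context()
    print(ctx.cert_store_stats())                                    # no CA certificates loaded
    print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)  # True True
```

With no roots loaded, every connection made with this context fails verification until you add a root (or pin a certificate) that you have decided to trust, which is exactly the behaviour argued for here.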

If you don’t use any root certificates, then the first time you connect to an unknown site or service you have to decide whether to trust the certificate it presents. This can be tedious at first, as many sites use multiple hosts, but it is the most secure method. For lack of any other knowledge, you will often have to just “assume” that the first certificate is genuine in order to access the service, so it is worth checking the certificate (for example, comparing its fingerprint against one obtained through another channel) before accepting it as an exception.

This is a personal choice and must be based upon your knowledge and trust of the organisation you are connecting to. Many organisations now run their own certificate services, which can provide a root certificate covering all services from that organisation (but be aware that an organisation’s own CA may be less secure than a public one).

Once you have chosen to trust the certificate, you can note it as an ‘exception’: that is, you recognise it because you have seen it before and, despite the lack of root CA signing, are still willing to trust it. From then on, you have the level of trust you require.

If the certificate changes, the user will be presented with the new certificate and has to make the decision again: do I trust this, or has someone compromised the service I have used previously?
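The accept-once, warn-on-change approach described above is often called trust on first use (TOFU), and it is essentially what SSH does with host keys. A minimal Python sketch follows; the `PinStore` class is hypothetical, and it fingerprints the DER-encoded certificate with SHA-256, which is one common choice. Checking the fingerprint against one obtained through another channel, before calling `accept`, remains the user’s job.

```python
import hashlib


class PinStore:
    """Hypothetical trust-on-first-use store: remember each host's certificate, flag changes."""

    def __init__(self) -> None:
        self._pins: dict[str, str] = {}

    @staticmethod
    def fingerprint(der_cert: bytes) -> str:
        # SHA-256 over the DER-encoded certificate, shown as colon-separated hex pairs
        digest = hashlib.sha256(der_cert).hexdigest().upper()
        return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))

    def check(self, host: str, der_cert: bytes) -> str:
        pinned = self._pins.get(host)
        if pinned is None:
            return "new"  # first contact: the user must decide whether to trust it
        return "match" if pinned == self.fingerprint(der_cert) else "changed"

    def accept(self, host: str, der_cert: bytes) -> None:
        """Record the user's decision to trust this certificate (the 'exception')."""
        self._pins[host] = self.fingerprint(der_cert)


if __name__ == "__main__":
    pins = PinStore()
    first, later = b"certificate bytes A", b"certificate bytes B"  # stand-ins for real DER data
    print(pins.check("mail.example", first))   # new
    pins.accept("mail.example", first)
    print(pins.check("mail.example", first))   # match
    print(pins.check("mail.example", later))   # changed
```

To try it against a live server, `ssl.get_server_certificate((host, 443))` fetches a site’s certificate as PEM text, and `ssl.PEM_cert_to_DER_cert` converts that to the bytes `check` expects.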

This is the only way of assuring trust – and it does not require a root CA to do so – it requires a user to understand what is meant by trust and for that user to act in accordance with their own assessment.

Ok – so the enterprise people say “we can’t have nag screens for all our users!” – the answer again is simple – sign all of your own services using your own CA chain and then issue your root CA certificate to all of your clients so that they have a trust relationship with your certificates – simple!

Some might say that this last statement counters my whole argument. I disagree: the issue with root CAs is not that they exist, but that their assurance is no more concrete than yours.

These CAs sell trust to anyone who is willing to pay for a certificate, so who do you trust? And is a willingness to pay for a certificate enough of a basis for trust?