Search Engines and Privacy

Have you ever wondered how search engines actually make their money from advertising?

Have you ever had privacy concerns over the search terms you use?

Have you ever been freaked out by how well targeted modern advertising is?

If you can say yes to any of these – then read on! Otherwise, still read on for an eye-opener.

Search engines only exist because there is a financial model behind them which is there, naturally, to generate profits. So, how do search engines make their profit? Search engines primarily make their money from advertising (as their search and associated services are free to end-users) and they achieve this by three primary techniques:-

  1. Provide space for adverts on the site and rent them out.
  2. Extend scope of adverts through syndication schemes, embedded content and ‘like’ buttons.
  3. Sell your data (search terms and the IP addresses that they come from along with browser unique IDs held in cookies etc) to 3rd parties.

Through the many different tracking technologies available- it is easy to identify and build a profile of an Internet user. This data is collected in the form of search engine logs (on their servers) which can then be analysed either in real-time or at a later date, This statistical analysis provides deep insight into what other products and services might be of interest to you in order to elicit targeted marketing, however there can be a far more sinister use for this data too.

This data can be used to profile a person in order to find out things such as:-

  • Your name and any aliases (such as ‘internet’ names and previous names)
  • Birthday
  • Location
  • Address
  • Telephone Numbers
  • Email Addresses
  • Your interests
  • Tastes in music, clothes, and also food and drink
  • Your faith and beliefs
  • Your friends and associations
  • Your spending habits
  • Where you like to go
  • Times when you are not at home
  • Your car make, model, and registration
  • Where you work and what you do for a living
  • Your thoughts and feelings
  • Pictures of you in places and with people

All of these are used for profiling you in order to place you within a demographic classification which can then be identified and targeted for a number of uses including advertising.

How are these stats collected? The user normally wilfully gives them without a second thought. Sources such as your favourite search engine, along with Facebook, Twitter, Flickr, and other social media, which is then tied together into continuous sessions using tracking cookies.

Many social media tools like facial recognition on Facebook enable people to be accurately associated with others, another social engineering danger, which usually starts with “do you know ‘so-and-so’?”

Many web pages register your presence with syndication partners merely by viewing the page, for example, Google and Facebook get informed of your visit every time you visit a page with some of their ‘like’ buttons (but not all), so in many cases – even if you don’t click it, you may get tracked. Same for YouTube videos on sites other than YouTube itself – YouTube (and thus Google) will know you’ve visited even if you don’t watch any movies because it will have linked back to YouTube in order to provide the ‘player’ for that shared content. In another example, Amazon’s affiliate advertising syndication could be used to track users across pages which show Amazon affiliate adverts.

An interesting concept on social tracking is that if you use your friend’s wireless with your own equipment then there will be a chance to trace your tracking session transferring to a source of other known tracking sessions, thus it is able to also track your physical movements and correlate you to others through simultaneously sharing the same IP. Your iPhone or Android will go with you everywhere, and wherever there is wireless configured (with the correct password), it will use it and tell on you. Being fair on the matter, the telephone companies can do this far easier through 3G phone networks but being in the same proximity does not always positively prove a relationship – unlike sharing an IP.

It is also possible for ISPs to proxy your web traffic for the purpose of caching content in order to deliver a faster network – this can also be used as a source of data.

Many people simply ‘like’ products, creating for themselves an association with which allows others to gauge your persona because when you ‘like’ a product on Facebook, it tells all of your friends (or at least ‘friends’ on Facebook). This could then be used in social engineering attacks against you. Google track you simply for viewing a page with a Google+ button on it.

Google is so elusive – you need their “opt-out” plugin to avoid them – which as discussed before comes with an auto-update program which still ‘phones-home’.

So, by using social, open-source, and purchased data from many different sources such as Facebook, Twitter, MySpace, Google, Bing! Etc it is easily possible to build an accurate profile of a person, their relationships, and their surfing habits which will reveal insight into that person in order to target them for one purpose or another. This indeed is what the advertisers are after hence this is the reason why this data has value.

While this data has value, this data is also private data about individuals, and it is about your data too, which while a few traces will reveal little about you, long term traces can reveal lots more than you realise. How often do you clear your cookie cache? And do you have ‘Do Not Trace’ set on your web browser?

A recent row has broken out between online advertisers and Microsoft, who have taken the bold step to enable ‘Do Not Trace’ on their latest web browsers, which if you’re not aware is the default action of actively blocking tracking cookies. The advertisers are up in arms – and this tells you a lot about the value of the data.

To protect yourself from this form of personal data leakage, you should choose your web browser and search engines wisely. I am currently using SRW Iron for a web browser because it has all the power and prowess of Google Chrome with all of the Google-phone-home stuff taken out and a few safety features made default, and I use duckduckgo.com for a search engine because it supports encrypted connections through https, it does not record your IP in its logs, and they discard your results after 2 days. Duckduckgo.com also has a search portal within Tor and are strong advocates of internet privacy.

See duckduckgo’s privacy statement here:-

https://www.duckduckgo.com/privacy.html

Another search engine worth considering is ixquick, who’s privacy policy can be seen here:-

https://ixquick.com/eng/privacy-policy.html

For comparison, here’s Google’s privacy statement:-

https://www.google.co.uk/intl/en/policies/privacy/

Wow!, need I say more?

It is worth noting that all searches done using standard http can be recorded ‘on-the-wire’ by anyone who is monitoring the traffic. This includes all searches, returned content, and modifications to those searches, often, character-by-character where auto-fill offers search suggestions. For this reason, it is always worth using a search engine which can support https as this will stop a degree (but not all) snooping on the wire.

Another point worth noting is that you often lose protection the second you leave the https encrypted search engine page because you then give away the site which the search engine led you too. In the cases of many search engines, they too track this information, adding to their knowledge base of not only what you searched, but which links you clicked on. Once you click your intended site, you leave the protection of the encrypted search thus revealing your next site, so therefore using an encrypted search does not protect you beyond the initial search you undertake.

So, now knowing the extend of browser tracking, I encourage you to consider this when surfing the Internet, and take measures to protect your privacy.

Product Endorsement on Facebook

It always makes me wonder why people click “like” for commercial products and allow themselves to be used as free advertising!

Millions every day give away habits such as gambling and playing online games through their “likes”.

Many also attach themselves to high value products in the vain hope of improving their own self image.

These commercial endorsements serve only to benefit the vendors but have the unexpected consequence of providing very accurate demographics on yourself.

Post Codes

What does a post code say about you? well, it can affect your insurance premiums, and may categorise you demographically. Again, unless it is a service which sends you important mail or one which has a legal obligation or one which could require formal identification, a bogus address and post code will suffice. Most postal districts have their first post code as being <prefix>1 1AB and so on, so it is easy to make up a location central to a given postal district like n1 1ab is probably or at least was at one time the sorting office northampton and b1 1ab the sorting office for birmingham for example, so it is logical to assume a post code of n1 1ac may exist. take a look on google maps to find a postcode of your choice. This evasion is best used for privacy when making cash payments at shops which ask your postcode on the way out as you pay and those which ask you to complete some kind of survey.

Date Of Birth

Never publish your date of birth online, this allows you to be located by using tools such as the electoral role when combined with knowledge of your locality which is often given away by geo-location tags on social media such as facebook. Unless you think you would require the use of a birth certificate or other formal identity document to gain or re-gain access to accounts or services you only have to prove you are so-old so make a different birthday, maybe 1/1/1978? something easy to remember.

Don’t forget – a surname, date of birth and an electoral role can be a direct match!