BIND DNS query log shipping into a MySQL database

Yay! I’ve been wanting to do this for a while! Here it goes:-

Documented herein is a method for shipping BIND DNS query logs into a MySQL database and then reporting upon them!

Note: SSH keys are used for all password-less log-ons to avoid prompt issues

BIND logging configuration

The BIND named.conf logging directive should be set up for simple (non-versioned) query logging:-

logging {

  # Your other log directives here

  channel query_log {
    file "/var/log/query.log";
    severity info;
    print-time yes;
    print-severity yes;
    print-category yes;
  };

  category queries {
    query_log;
  };
};

A simple (non-versioned) log is needed because the built-in BIND log rotation only allows a granularity of one day when based on time; an external log rotation method is therefore required for granularity of under 24 hours.
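For comparison, BIND’s built-in rotation is driven by the versions and size options on the logging channel; a hypothetical versioned channel would look like this, rotating only when the size limit is hit:-

  channel query_log {
    file "/var/log/query.log" versions 3 size 20m;
    severity info;
  };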

BIND query log rotation

My external BIND log rotation script is scheduled from within cron and it looks like this:-

#!/bin/bash
# Rotate the BIND query log by copy-and-truncate.
# Note: the path includes the chroot prefix, matching the
# "/var/log/query.log" channel inside the chroot.
QLOG=/var/named/chroot/var/log/query.log
LOCK_FILE=/var/run/${0##*/}.lock

# Bail out if a previous instance is still running.
if [ -e "$LOCK_FILE" ]; then
  OLD_PID=`cat $LOCK_FILE`
  if ps -p "$OLD_PID" > /dev/null 2>&1; then
    exit 0
  fi
fi
echo $$ > "$LOCK_FILE"

# Copy the live log to a timestamped file, then truncate the original.
cat $QLOG > $QLOG.`date '+%Y%m%d%H%M%S'`
if [ $? -eq 0 ]; then
  > $QLOG
fi
# Reload so that named reopens its log file cleanly.
service named reload

rm -f "$LOCK_FILE"

Place this in the crontab, running at an interval of between one and six hours; ensure it is not run on the hour, nor at the same time as other instances of this job on associated servers.
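A minimal crontab sketch, assuming the script above is saved as /usr/local/sbin/rotate_query_log.sh (the path and minute offset are arbitrary choices):-

# Rotate the BIND query log every two hours, at 17 minutes past
17 */2 * * * /usr/local/sbin/rotate_query_log.sh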

Make sure /var/named/chroot/var/log/old exists; it is used to archive rotated files by the data pump script later on.

From here, I create a MySQL table called dnslogs with the following structure; q_timestamp holds the combined date and time which the data pump script below builds with STR_TO_DATE:-

create table dnslogs (
  q_server    VARCHAR(255),
  q_timestamp DATETIME,    -- use DATETIME(6) on MySQL 5.6.4+ to keep the microseconds
  q_client    VARCHAR(15),
  q_view      VARCHAR(64),
  q_text      VARCHAR(255),
  q_class     VARCHAR(8),
  q_type      VARCHAR(8),
  q_modifier  VARCHAR(8)
);

You can either define a database user with a password and configure it accordingly in the scripts, or you can configure a database user which can only connect and insert into the dnslogs table.
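Here is a minimal sketch of the restricted variant, assuming the scripts connect as db_user via the local socket (the user name, host, and password are placeholders):-

-- Create a user which can do nothing but insert into the log table
GRANT INSERT ON your_db.dnslogs TO 'db_user'@'localhost' IDENTIFIED BY 'choose_a_password';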

Then I use the following shell script to pump the rotated log data into the MySQL database:-

#!/bin/bash
PATH=/path/to/specific/mysql/bin:$PATH export PATH
DB_NAME=your_db
DB_USER=db_user
DB_PASS=i_know_it_is_a_bad_idea_storing_the_pass_here
DB_SOCK=/var/lib/mysql/mysql.sock
DEST_TABLE=dnslogs
SSH_USER=someone
LOG_DIR=/var/named/chroot/var/log
LOG_REGEX='query.log.*'
NAME_SERVERS="your name server list here"
ERROR_LOG=/tmp/${0##*/}.err

LOCK_FILE=/var/run/${0##*/}.lock

# Bail out if a previous instance is still running.
if [ -e "$LOCK_FILE" ]; then
  OLD_PID=`cat $LOCK_FILE`
  if ps -p "$OLD_PID" > /dev/null 2>&1; then
    exit 0
  fi
fi
echo $$ > "$LOCK_FILE"

for host in $NAME_SERVERS; do
  # List the rotated logs on each name server, oldest first.
  REMOTE_LOGS=`ssh -l $SSH_USER $host "find $LOG_DIR -maxdepth 1 -name '$LOG_REGEX'" | sort -n`
  test -n "$REMOTE_LOGS" && for f in $REMOTE_LOGS; do
    # The sed splits the seconds from the milliseconds, strips the client
    # port, colons, slashes and quotes; the awk maps the resulting fields
    # onto insert statements for the destination table.
    ssh -C -l $SSH_USER $host "cat $f" | \
      sed 's/\./ /; s/#[0-9]*://; s/: / /g; s/\///g; s/'\''//g;' | \
      awk -v h=$host '{
        printf("insert into '$DEST_TABLE' values ('\''%s'\'', STR_TO_DATE('\''%s %s.%06s'\'', '\''%s'\''), '\''%s'\'', '\''%s'\'', '\''%s'\'', '\''%s'\'', '\''%s'\'', '\''%s'\'');\n",
          h, $1, $2, $3 * 1000, "%d-%b-%Y %H:%i:%S.%f",
          $7, $9, $11, $12, $13, $14);
      }' | mysql -A -S $DB_SOCK -u $DB_USER --password=$DB_PASS $DB_NAME 2> $ERROR_LOG
    RETVAL=$?
    if [ $RETVAL -ne 0 ]; then
      echo "Import of $f returned non-zero return code $RETVAL"
      test -s $ERROR_LOG && cat $ERROR_LOG
      continue
    fi
    # Archive the processed log out of the pick-up directory.
    ssh -l $SSH_USER $host mv $f ${f%/*}/old/
  done
done
rm -f "$LOCK_FILE" $ERROR_LOG

Put this script into a file and schedule it from within crontab, running some time after the rotate job, sufficient to allow it to complete, but before the next rotate job.
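A minimal sketch, assuming the rotate job runs at 17 minutes past every second hour and this script is saved as /usr/local/sbin/pump_query_log.sh (both hypothetical):-

# Ship the rotated logs half an hour after each rotation
47 */2 * * * /usr/local/sbin/pump_query_log.sh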

Note that the last operation of the script is to move the processed log file into $LOG_DIR/old/.

This will take each file matching /var/named/chroot/var/log/query.log.* and ship it into the dnslogs table as frequently as is defined in the crontab.

From here, it is possible to report from the database with a simple query script such as:-

#!/bin/bash
PATH=/path/to/specific/mysql/bin:$PATH export PATH
DB_NAME=your_db
DB_USER=db_user
DB_PASS=i_know_it_is_a_bad_idea_storing_the_pass_here
DB_SOCK=/var/lib/mysql/mysql.sock
SQL_REGEX='%your-search-term-here%'

LOCK_FILE=/var/run/${0##*/}.lock

# Bail out if a previous instance is still running.
if [ -e "$LOCK_FILE" ]; then
  OLD_PID=`cat $LOCK_FILE`
  if ps -p "$OLD_PID" > /dev/null 2>&1; then
    exit 0
  fi
fi
echo $$ > "$LOCK_FILE"

echo "select * from dnslogs where q_text like '$SQL_REGEX';" | \
  mysql -A -S $DB_SOCK -u $DB_USER --password=$DB_PASS $DB_NAME

rm -f "$LOCK_FILE"

And there it is! SQL reporting from DNS query logs! You can turn this into whatever report you like.
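For example, here is a hypothetical report listing the ten most-queried names over the past day, using the table created above:-

select q_text, count(*) as hits
from dnslogs
where q_timestamp > now() - interval 1 day
group by q_text
order by hits desc
limit 10;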

From there, you may wish to script solutions to partition the database and age the data.

Database partitioning should be done upon the q_timestamp value, dividing the table into periods which align with the expected depth of reporting. As a minimum, I would recommend keeping at least four days of data in partitions of between one and 24 hours, depending upon the reporting expectations. If reports cover the previous day’s data only, then one partition per day will do, while reports which are only interested in the past hour or so will benefit from hourly partitions. In MySQL, sub-partitions are not worthwhile because they give you nothing more than partitions do, while adding a layer of complexity to what is otherwise a linear data set.
Once partitioning is established, it should be possible to fulfill reports by querying only the relevant partitions to cover the time span of interest.
Partitioning also has another benefit: data ageing. Instead of deleting old records, it is possible to drop entire partitions covering selected periods of time, without having to create a huge temporary table to hold the difference as a delete operation would require. This becomes an extremely useful feature when the table is larger than the free disk space available.
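As a minimal sketch (daily partitions, with hypothetical partition names and cut-off dates), the partitioning and ageing described above might look like this:-

alter table dnslogs
  partition by range (to_days(q_timestamp)) (
    partition p20120801 values less than (to_days('2012-08-02')),
    partition p20120802 values less than (to_days('2012-08-03')),
    partition pmax      values less than maxvalue
  );

-- Ageing out a day of data is then a cheap metadata operation:
alter table dnslogs drop partition p20120801;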

Script updates for add and drop partition to follow….

Search Engines and Privacy

Have you ever wondered how search engines actually make their money from advertising?

Have you ever had privacy concerns over the search terms you use?

Have you ever been freaked out by how well targeted modern advertising is?

If you can say yes to any of these – then read on! Otherwise, still read on for an eye-opener.

Search engines only exist because there is a financial model behind them which is there, naturally, to generate profit. So, how do search engines make their profit? They primarily make their money from advertising (as their search and associated services are free to end-users), and they achieve this by three primary techniques:-

  1. Provide space for adverts on the site and rent it out.
  2. Extend the scope of adverts through syndication schemes, embedded content, and ‘like’ buttons.
  3. Sell your data (search terms and the IP addresses they come from, along with browser-unique IDs held in cookies, etc.) to 3rd parties.

Through the many different tracking technologies available, it is easy to identify and build a profile of an Internet user. This data is collected in the form of search engine logs (on their servers) which can then be analysed either in real-time or at a later date. This statistical analysis provides deep insight into what other products and services might be of interest to you in order to elicit targeted marketing; however, there can be a far more sinister use for this data too.

This data can be used to profile a person in order to find out things such as:-

  • Your name and any aliases (such as ‘internet’ names and previous names)
  • Birthday
  • Location
  • Address
  • Telephone Numbers
  • Email Addresses
  • Your interests
  • Tastes in music, clothes, and also food and drink
  • Your faith and beliefs
  • Your friends and associations
  • Your spending habits
  • Where you like to go
  • Times when you are not at home
  • Your car make, model, and registration
  • Where you work and what you do for a living
  • Your thoughts and feelings
  • Pictures of you in places and with people

All of these are used for profiling you in order to place you within a demographic classification which can then be identified and targeted for a number of uses including advertising.

How are these stats collected? The user normally wilfully gives them away without a second thought, through sources such as your favourite search engine, along with Facebook, Twitter, Flickr, and other social media, and they are then tied together into continuous sessions using tracking cookies.

Social media tools such as facial recognition on Facebook enable people to be accurately associated with others, which is another social engineering danger, and one which usually starts with “do you know ‘so-and-so’?”

Many web pages register your presence with syndication partners merely by your viewing the page; for example, Google and Facebook get informed of your visit every time you visit a page with some of their ‘like’ buttons (but not all), so in many cases, even if you don’t click the button, you may get tracked. The same goes for YouTube videos on sites other than YouTube itself: YouTube (and thus Google) will know you’ve visited even if you don’t watch any videos, because the page will have linked back to YouTube in order to provide the ‘player’ for that shared content. In another example, Amazon’s affiliate advertising syndication could be used to track users across pages which show Amazon affiliate adverts.

An interesting concept in social tracking is that if you use your friend’s wireless with your own equipment, your tracking session can be seen to transfer to a source of other known tracking sessions, making it possible to track your physical movements and correlate you to others through simultaneously sharing the same IP. Your iPhone or Android will go with you everywhere, and wherever there is wireless configured (with the correct password), it will use it and tell on you. To be fair on the matter, the telephone companies can do this far more easily through 3G phone networks, but being in the same proximity does not always positively prove a relationship – unlike sharing an IP.

It is also possible for ISPs to proxy your web traffic for the purpose of caching content in order to deliver a faster network – this can also be used as a source of data.

Many people simply ‘like’ products, creating for themselves an association which allows others to gauge their persona, because when you ‘like’ a product on Facebook, it tells all of your friends (or at least ‘friends’ on Facebook). This could then be used in social engineering attacks against you. Google tracks you simply for viewing a page with a Google+ button on it.

Google is so pervasive that you need their “opt-out” plugin to avoid them – which, as discussed before, comes with an auto-update program which still ‘phones-home’.

So, by using social, open-source, and purchased data from many different sources such as Facebook, Twitter, MySpace, Google, Bing, etc., it is easily possible to build an accurate profile of a person, their relationships, and their surfing habits, which will reveal insight into that person in order to target them for one purpose or another. This indeed is what the advertisers are after, and hence why this data has value.

While this data has value, it is also private data about individuals, and it is about your data too; while a few traces will reveal little about you, long-term traces can reveal far more than you realise. How often do you clear your cookie cache? And do you have ‘Do Not Track’ set on your web browser?

A recent row has broken out between online advertisers and Microsoft, who have taken the bold step of enabling ‘Do Not Track’ by default on their latest web browser – which, if you’re not aware, is a setting that tells every site you visit that you do not wish to be tracked. The advertisers are up in arms – and this tells you a lot about the value of the data.

To protect yourself from this form of personal data leakage, you should choose your web browser and search engines wisely. I am currently using SRW Iron as a web browser because it has all the power and prowess of Google Chrome with all of the Google-phone-home stuff taken out and a few safety features made default, and I use duckduckgo.com as a search engine because it supports encrypted connections through https, it does not record your IP in its logs, and it discards your searches after 2 days. Duckduckgo.com also has a search portal within Tor and is a strong advocate of internet privacy.

See duckduckgo’s privacy statement here:-

https://www.duckduckgo.com/privacy.html

Another search engine worth considering is ixquick, whose privacy policy can be seen here:-

https://ixquick.com/eng/privacy-policy.html

For comparison, here’s Google’s privacy statement:-

https://www.google.co.uk/intl/en/policies/privacy/

Wow! Need I say more?

It is worth noting that all searches done using standard http can be recorded ‘on-the-wire’ by anyone who is monitoring the traffic. This includes all searches, returned content, and modifications to those searches, often character by character where auto-fill offers search suggestions. For this reason, it is always worth using a search engine which supports https, as this will stop a degree of (but not all) snooping on the wire.

Another point worth noting is that you often lose protection the second you leave the https-encrypted search engine page, because you then give away the site which the search engine led you to. Many search engines track this information too, adding to their knowledge base of not only what you searched, but which links you clicked on. So, once you click through to your intended site, you leave the protection of the encrypted search, and using an encrypted search therefore does not protect you beyond the initial search you undertake.

So, now knowing the extent of browser tracking, I encourage you to consider this when surfing the Internet, and to take measures to protect your privacy.

Cached Passwords

Cached passwords are the holy grail of hackers and a principal target for information thieves. For this reason, you should never save your password on your workstation and never use the same password across systems.

When prompted to save a persistent authentication token such as a password, you should always say no or never. The only place for a ‘stored’ password is an encrypted and password-protected database (for which the password should not be written down).

SRW Iron

Who does your browser consult when deciding which sites to go to? Well, many browsers come with ‘smart’ technologies which ‘phone-home’ to various vendors and organisations of sorts, providing a potential leakage of web request URLs through services such as ‘safe-search’ and browser search plugins.

SRW Iron is Google Chrome with all the Google-phone-home stuff taken out, and while, yes, I know you can tweak Chrome to be just as quiet if you want, Iron comes secured out of the box and has all functionality where data privacy is a concern turned off permanently. Get it at:-

http://www.srware.net/en/software_srware_iron_download.php