Welcome to richkit’s documentation!

dat is the Domain Analysis Toolkit

See the README for a general introduction.

Modules

The functionality is organised in the following modules.

Analysis

Analysis and computations on domain names.

This module provides functions that can be applied to a domain name. Like richkit.lookup, and in contrast to richkit.retrieve, these functions work without disclosing the domain name to third parties, so confidentiality is not breached.

Note

For this entire module, we adopt the notion of effective Top-Level Domains (eTLD), effective Second-Level Domains (e2LD), etc. “Effective” refers to the practice where the public suffix is considered the effective TLD and counted as one label. The list of public suffixes, maintained by Mozilla, is used as the definitive truth on which public suffixes exist.

richkit.analyse.depth(domain)

Returns the effective depth of the domain.

The depth is the number of labels in the domain.

Example: google.co.uk is effectively a 2LD. google is one label. The public suffix co.uk is effectively considered one label. With effectively two labels, the effective depth is two.
Parameters: domain – Domain (string)
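
The counting rule can be illustrated with a minimal sketch that does not depend on richkit. It assumes a tiny hard-coded set of public suffixes (`SAMPLE_SUFFIXES`, hypothetical) in place of Mozilla's full list, and the helper name `effective_depth` is made up for this example; richkit's own implementation may differ.

```python
# Hypothetical, hard-coded stand-in for Mozilla's Public Suffix List.
SAMPLE_SUFFIXES = {"com", "uk", "co.uk", "dk"}


def effective_depth(domain):
    """Count labels, treating the longest matching public suffix as one label."""
    labels = domain.lower().strip(".").split(".")
    # Try the longest candidate suffix first, so "co.uk" wins over "uk".
    for start in range(len(labels)):
        if ".".join(labels[start:]) in SAMPLE_SUFFIXES:
            # 'start' labels precede the suffix; the suffix counts as one.
            return start + 1
    # No known suffix: fall back to the plain label count (an assumption).
    return len(labels)


print(effective_depth("google.co.uk"))      # 2
print(effective_depth("www.google.co.uk"))  # 3
print(effective_depth("www.google.com"))    # 3
```
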
richkit.analyse.entropy(s)

Returns the entropy of characters in s.

Parameters: s – Domain (string)
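
As a rough illustration of what a character entropy measures, the sketch below computes the Shannon entropy of a string in bits per character. The base-2 logarithm and the helper name `char_entropy` are assumptions made for this example; the text above does not state which variant richkit uses.

```python
import math
from collections import Counter


def char_entropy(s):
    """Shannon entropy (bits per character) of the characters in s."""
    if not s:
        return 0.0
    total = len(s)
    # Sum -p * log2(p) over the relative frequency p of each character.
    return sum(
        -(count / total) * math.log2(count / total)
        for count in Counter(s).values()
    )


print(char_entropy("abab"))  # 1.0
print(char_entropy("abcd"))  # 2.0
```

A string of repeated characters scores low, while a string in which every character is different scores high.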
richkit.analyse.language(domain)

Returns the best guess for the language of the domain.

Parameters: domain – Domain (string)
richkit.analyse.length(domain)

Returns the sum of the character counts of all labels.

Parameters: domain – Domain (string)
richkit.analyse.n_grams_alexa(domain)

Returns the similarity to the distribution of N-grams in the Alexa Top 1M.

Parameters: domain – Domain (string)
richkit.analyse.n_grams_dict(domain)

Returns the similarity to the distribution of N-grams in an English dictionary.

Parameters: domain – Domain (string)
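
To make the idea of N-gram similarity concrete, here is a minimal sketch that compares the character bigrams of a name with those of a small reference corpus. The similarity measure richkit uses is not stated above; cosine similarity, the bigram size, the helper names, and the four-word `REFERENCE` list are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngram_counts(text, n=2):
    """Count the overlapping character n-grams in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical reference corpus; a real one would be far larger.
REFERENCE = ["google", "github", "wikipedia", "amazon"]
reference_counts = Counter()
for word in REFERENCE:
    reference_counts.update(ngram_counts(word))

# A name sharing bigrams with the corpus scores above zero;
# a name sharing none scores exactly zero.
print(cosine_similarity(ngram_counts("googlemaps"), reference_counts))
print(cosine_similarity(ngram_counts("xkqzjv"), reference_counts))
```
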
richkit.analyse.n_label(domain, n)

Returns the Effective N’th-level label.

Parameters:
  • domain – Domain (string)
  • n – N’th-Level (int)
richkit.analyse.nld(domain, n)

Returns the Effective N’th-Level Domain (eNLD).

Parameters:
  • domain – Domain (string)
  • n – N’th-Level (int)

Usage:

from richkit.analyse import nld

# Returns the effective second-level domain
print(nld("www.google.com", 2))

# Returns the effective top-level domain
print(nld("www.google.com", 1))
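
The difference between an effective N'th-level domain and an effective N'th-level label can be sketched without richkit as follows. The suffix set `SAMPLE_SUFFIXES` is a hypothetical stand-in for Mozilla's list, the helper names are made up, and raising `ValueError` for an out-of-range n is an assumption rather than documented richkit behaviour.

```python
# Hypothetical, hard-coded stand-in for Mozilla's Public Suffix List.
SAMPLE_SUFFIXES = {"com", "uk", "co.uk"}


def split_effective_labels(domain):
    """Split a domain into labels, with the public suffix as the last one."""
    labels = domain.lower().strip(".").split(".")
    for start in range(len(labels)):
        suffix = ".".join(labels[start:])
        if suffix in SAMPLE_SUFFIXES:
            return labels[:start] + [suffix]
    return labels  # no known suffix: plain labels (an assumption)


def effective_nld(domain, n):
    """Join the last n effective labels, e.g. n=2 gives the e2LD."""
    labels = split_effective_labels(domain)
    if not 1 <= n <= len(labels):
        raise ValueError("n must be between 1 and the effective depth")
    return ".".join(labels[-n:])


def effective_n_label(domain, n):
    """Return only the n'th effective label, counted from the right."""
    labels = split_effective_labels(domain)
    if not 1 <= n <= len(labels):
        raise ValueError("n must be between 1 and the effective depth")
    return labels[-n]


print(effective_nld("www.google.co.uk", 2))      # google.co.uk
print(effective_nld("www.google.com", 1))        # com
print(effective_n_label("www.google.co.uk", 2))  # google
```
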

richkit.analyse.number_consonants(s)

Returns the number of consonants in s.

Parameters: s – Domain (string)
richkit.analyse.number_numerics(s)

Returns the number of numeric characters in s.

Parameters: s – Domain (string)
richkit.analyse.number_specials(s)

Returns the number of special characters in s. The default special character list is “~`!@#$%^&*()_={}[]:>;’,</?*-+”.

Parameters: s – Domain (string)
richkit.analyse.number_vowels(s)

Returns the number of vowels in s.

Parameters: s – Domain (string)
richkit.analyse.number_words(s)

Returns the number of English words found in s.

Parameters: s – Domain (string)
richkit.analyse.ratio_consonants(s)

Returns the ratio of consonants to all characters in s.

Parameters: s – Domain (string)
richkit.analyse.ratio_numerics(s)

Returns the ratio of numeric characters to all characters in s.

Parameters: s – Domain (string)
richkit.analyse.ratio_specials(s)

Returns the ratio of special characters to all characters in s. The default special character list is “~`!@#$%^&*()_={}[]:>;’,</?*-+”.

Parameters: s – Domain (string)
richkit.analyse.ratio_vowels(s)

Returns the ratio of vowels to all characters in s.

Parameters: s – Domain (string)
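
The counting and ratio functions above follow one pattern: count the characters of a class, then optionally divide by the length of the string. The sketch below shows that pattern for vowels, consonants, and digits. It is not richkit's code: the helper names are made up, and treating only a, e, i, o, u as vowels, restricting consonants to ASCII letters, and returning 0.0 for an empty string are assumptions.

```python
VOWELS = set("aeiou")


def count_vowels(s):
    """Number of ASCII vowels in s (case-insensitive)."""
    return sum(1 for ch in s.lower() if ch in VOWELS)


def count_consonants(s):
    """Number of ASCII letters in s that are not vowels."""
    return sum(1 for ch in s.lower() if "a" <= ch <= "z" and ch not in VOWELS)


def count_numerics(s):
    """Number of ASCII digits in s."""
    return sum(1 for ch in s if "0" <= ch <= "9")


def ratio(count, s):
    """Share of s made up by 'count' characters; 0.0 for an empty string."""
    return count / len(s) if s else 0.0


s = "google123"
print(count_vowels(s))      # 3
print(count_consonants(s))  # 3
print(count_numerics(s))    # 3
print(ratio(count_vowels(s), s))
```
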
richkit.analyse.sl_label(domain)

Returns the effective second-level label.

Parameters: domain – Domain (string)
richkit.analyse.sld(domain)

Returns the Effective Second-Level Domain (2LD) (aka Apex Domain).

The 2LD, aka the Apex Domain, is extracted from the domain using the list of public suffixes maintained by Mozilla.

Parameters: domain – Domain (string)
richkit.analyse.tld(domain)

Returns the Effective Top-Level Domain (eTLD) (aka Public Suffix).

The eTLD is extracted from the domain using the list of public suffixes maintained by Mozilla.

Parameters: domain – Domain (string)

Lookup

Retrieve

Retrieval of data on domain names.

This module provides the ability to retrieve data on domain names of any sort. It comes without the “confidentiality contract” of richkit.lookup.

richkit.retrieve.dns_a(domain)

Returns the A records of a given domain.

Parameters: domain – Domain (string)
Returns: IP addresses (list)

richkit.retrieve.dns_ptr(ip_address)

Returns the PTR records of a given IP address.

Parameters: ip_address – IP address (string)
Returns: Domains (list)

richkit.retrieve.symantec_category(domain)

Returns the category from Symantec’s BlueCoat service.

Parameters: domain – Domain (string)

richkit.retrieve.dns.get_a_record(domain)

Returns the list of A records of a given domain.

Parameters: domain – Domain (string)
Returns: IP addresses (list)

richkit.retrieve.dns.get_ptr_record(ip_address)

Returns the PTR records of a given IP address.

Parameters: ip_address – IP address (string)
Returns: Domains (list)
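
For readers unfamiliar with A and PTR lookups, the following sketch shows the same two kinds of query using only the Python standard library. It is not richkit's implementation, which may use a different resolver; the helper names are made up. `lookup_a` and `lookup_ptr` need network access, so only the offline helper `ptr_query_name` is called here.

```python
import ipaddress
import socket


def ptr_query_name(ip_address):
    """Build the reverse-lookup name that a PTR query asks for."""
    return ipaddress.ip_address(ip_address).reverse_pointer


def lookup_a(domain):
    """Resolve a domain to its IPv4 addresses with the system resolver."""
    _name, _aliases, addresses = socket.gethostbyname_ex(domain)
    return addresses


def lookup_ptr(ip_address):
    """Resolve an IP address back to host names with the system resolver."""
    name, aliases, _addresses = socket.gethostbyaddr(ip_address)
    return [name] + aliases


# A PTR lookup reverses the octets and appends the in-addr.arpa zone.
print(ptr_query_name("192.0.2.10"))  # 10.2.0.192.in-addr.arpa
```

Both lookups send the name or address to a DNS resolver, which is why they belong in the module without the confidentiality contract.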

Symantec Web Service

This module retrieves the categories of given URLs. Normally, it fetches the category from the Symantec web service and then saves it to a local file called categorized_urls under richkit/retrieve/data/.

How to use:

>>> # Import the necessary functions and call them as demonstrated below
>>> from richkit.retrieve.symantec import fetch_from_internet
>>> from richkit.retrieve.symantec import LocalCategoryDB
>>>
>>> urls = ["www.aau.dk","www.github.com","www.google.com"]
>>>
>>> local_db = LocalCategoryDB()
>>> for url in urls:
...     url_category = local_db.get_category(url)
...     if url_category == '':
...         url_category = fetch_from_internet(url)
...     print(url_category)
Education
Technology/Internet
Search Engines/Portals
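
The look-up-locally-then-fetch pattern used above can be sketched in isolation as follows. Everything in this block is hypothetical: the class `LocalCategoryCache`, the CSV file layout, and the stand-in `fake_fetch` function are assumptions made so that the example runs offline, and they do not describe how richkit stores its categorized_urls file.

```python
import csv
import os
import tempfile


class LocalCategoryCache:
    """Hypothetical file-backed cache of 'url,category' rows."""

    def __init__(self, path):
        self.path = path

    def get_category(self, url):
        """Return the cached category, or '' when the URL is unknown."""
        if not os.path.exists(self.path):
            return ""
        with open(self.path, newline="") as handle:
            for row in csv.reader(handle):
                if len(row) == 2 and row[0] == url:
                    return row[1]
        return ""

    def save(self, url, category):
        """Append one 'url,category' row to the cache file."""
        with open(self.path, "a", newline="") as handle:
            csv.writer(handle).writerow([url, category])


def categorize(url, cache, fetch):
    """Look in the cache first; call fetch(url) only on a miss."""
    category = cache.get_category(url)
    if category == "":
        category = fetch(url)
        cache.save(url, category)
    return category


calls = []  # records every simulated web-service call


def fake_fetch(url):
    """Stand-in for a real web-service call."""
    calls.append(url)
    return "Education"


cache_path = os.path.join(tempfile.mkdtemp(), "categorized_urls")
cache = LocalCategoryCache(cache_path)
print(categorize("www.aau.dk", cache, fake_fetch))  # Education
print(categorize("www.aau.dk", cache, fake_fetch))  # Education
print(len(calls))  # 1
```

The second call is answered from the local file, so the domain is sent to the external service only once.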