Annual sales estimation of a darknet market

May 25, 2017 R finance darknet

The darknet is home to a multitude of Amazon-like marketplaces where one can purchase all manner of items, licit and illicit. One of the most famous of these sites is the Silk Road, a black market that sold illegal drugs until it was shut down by the United States FBI in October of 2013. One of the first of its kind, the Silk Road was structured as a Tor hidden service and allowed anyone with a specialized web browser to purchase illegal drugs on the internet. One study estimated that the site’s annual revenue, based on data collected in 2011 and 2012, was $14.4 million (USD). The “Sealed Complaint 13 MAG 2328: United States of America v. Ross William Ulbricht” suggests that annual revenues were even higher, at approximately $80 million.

Since the Silk Road’s demise, imitators have proliferated. Are they as successful as the original? What is the annual revenue of similar sites in 2017? These questions are hard to answer because sales data is not easy to obtain. However, one of these black markets, which will be referred to as DNM, publishes the quantity sold for each of its listings. Scraping this data from the site creates a data set that can be used to estimate annual revenue, as well as to identify the items that generate the majority of those sales.

Data collection and processing

All active, non-auction product listings from DNM were scraped in chunks between April 17 and May 13, 2017. The scraping was done with a Google Chrome extension developed by Martins Balodis. It can be found here. All of the resulting CSV files are available in the GitHub repository for this project. The JSON sitemap used for scraping is available there as well (DNM_sitemap.json).

Once collected, the csv files can be processed into a single, raw data set.

Load required libraries.

library(tidyverse)
library(stringr)
library(lubridate)
library(ggthemes)
library(scales)
library(knitr)

Read scraped data from csv files and combine all records into a single table.

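#This function reads one csv file, extracts the scrape date (an 8-digit YYYYMMDD string) from the filename,
#and adds the value to a new column. basename() keeps digits elsewhere in the directory path from matching.
read_func <- function(x){
    read_csv(x) %>% mutate(scrape_date = ymd(str_extract(basename(x), '[0-9]{8}')))
}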

#Download data files from https://github.com/davidldenton/dnm/tree/master/data into local directory
#Replace '~/[email protected]/code/R/projects/DNM/data' with your local directory
csv_list <- list.files('~/[email protected]/code/R/projects/DNM/data', full.names = TRUE, pattern = '\\.csv$')

raw_dat <- map_df(csv_list, read_func)
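
The date extraction inside read_func can be checked in isolation. The following is a minimal sketch in base R (rather than stringr and lubridate), and the file names are hypothetical; it only assumes, as the code above does, that each file name contains the scrape date as eight digits in year-month-day order.

```r
# Hypothetical file names; the real files in the repository may be named differently.
files <- c("data/dnm_drugs_20170417.csv", "data/dnm_fraud_20170513.csv")

# Take the first run of eight digits in the file name (not the full path)
# and parse it as a year-month-day date.
file_names <- basename(files)
date_str <- regmatches(file_names, regexpr("[0-9]{8}", file_names))
scrape_date <- as.Date(date_str, format = "%Y%m%d")

print(scrape_date)
```

Note that regmatches() silently drops names without a match, so a file lacking a date would shorten the result rather than produce an NA.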

Most of the variables of interest (e.g. price, quantity sold, days for sale) are contained in text fields in the raw data. Regular expressions are used to extract these values from the strings. In most cases, the variables of interest appear only once. Price, however, is found in two separate fields - in the listing on the overview page, and on the detail page for the specific product. Occasionally, the two price values do not match. In these cases, the price on the detail page is used.
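
The price rule described above can be illustrated on a few made-up values. This is a simplified sketch that assumes the prices have already been converted to numbers; the vectors are hypothetical and are not taken from the scraped data.

```r
# Hypothetical, already-numeric prices for four listings.
detail_price  <- c(25.00, 0, NA, 12.50)
listing_price <- c(24.00, 9.99, 40.00, NA)

# Prefer the detail-page price; fall back to the overview-page price
# only when the detail price is missing or zero.
use_listing <- is.na(detail_price) | detail_price == 0
price <- ifelse(use_listing, listing_price, detail_price)

print(price)
```

In the first listing the two prices disagree and the detail-page value wins; in the second and third the detail price is unusable, so the overview price is taken instead.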

The following code extracts the relevant data from the raw text, creates new sales variables based on quantity sold and time for sale, determines the most accurate price in cases where a discrepancy exists, and subsets the table to include only the variables of analytical interest. A few of the new fields, those with a ‘max’ prefix, are explained in greater detail in the ‘Analysis’ section.

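#Choose the price for each listing: use the detail-page price (x) unless it is missing or zero,
#in which case fall back to the overview-page price (y).
price_func <- function(x,y){
    detail_price <- as.numeric(gsub(',', '', str_extract(x, '[0-9]+.*$')))
    #Strip the 'USD' label, thousands separators, and whitespace before converting to a number
    listing_price <- as.numeric(gsub('USD|,|\\s', '', str_extract(y, 'USD\\s[[:graph:]]+\\s')))
    listing_price <- ifelse(is.na(listing_price), 0, listing_price)
    ifelse(detail_price == 0 | is.na(detail_price), listing_price, detail_price)
}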

dat <- raw_dat %>%
    #Backticks are required because the scraper's column name contains a hyphen
    select(category_id, subcategory_id, subcategory2_id, `listing_link-href`, detail_name, 
           detail_qty_sold, listing_price, detail_price, listing_vendor, scrape_date) %>%
    as_tibble() %>%
    mutate(qty_sold = str_extract(detail_qty_sold, '[0-9]{1,5} sold since.*20[0-9]{2}'),
           quantity = as.numeric(str_extract(qty_sold, '^[0-9]{1,5}')),
           date_str = str_extract(qty_sold, '[A-Z].*20[0-9]{2}'),
           listing_date = mdy(date_str),
           price = price_func(detail_price, listing_price),
           days_for_sale = as.numeric(difftime(scrape_date, listing_date, units = c("days"))+1),
           max_quantity = replace(quantity, quantity == 0, 1),
           sales_in_period = price*quantity,
           max_sales_in_period = price*max_quantity,
           avg_daily_sales = sales_in_period/days_for_sale,
           max_avg_daily_sales = max_sales_in_period/days_for_sale) %>%
    select(product_category = category_id, product_subcategory = subcategory_id, 
           product_name = subcategory2_id, product_listing = detail_name, listing_date,
           scrape_date, quantity, max_quantity, price, days_for_sale, sales_in_period, 
           max_sales_in_period, avg_daily_sales, max_avg_daily_sales)

For reference, the category hierarchy is as follows: product_category > product_subcategory > product_name
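
To make the text extraction concrete, here is a small base R sketch of how the quantity sold, the listing date, and the days for sale could be recovered from the scraped text. The sample strings are hypothetical; their wording is only inferred from the regular expressions used above, and the month name is looked up in R's built-in month.abb so that the result does not depend on the system locale.

```r
# Hypothetical examples of the scraped text; the exact wording on the
# site is an assumption based on the regular expressions above.
txt <- c("Sold by vendor_a - 37 sold since Mar 2, 2017",
         "Sold by vendor_b - 0 sold since Dec 15, 2016")

# Isolate the "<n> sold since <date>" fragment.
frag <- regmatches(txt, regexpr("[0-9]{1,5} sold since.*20[0-9]{2}", txt))

# The leading digits are the quantity sold.
quantity <- as.numeric(sub(" sold since.*$", "", frag))

# Everything after "sold since " is the listing date, e.g. "Mar 2, 2017".
date_txt <- sub("^.*sold since ", "", frag)
pieces <- strsplit(gsub(",", "", date_txt), " ")
listing_date <- as.Date(sapply(pieces, function(p) {
  sprintf("%s-%02d-%02d", p[3], match(p[1], month.abb), as.integer(p[2]))
}))

# Days on sale, counting the listing day itself (hence the + 1).
scrape_date <- as.Date("2017-04-17")
days_for_sale <- as.numeric(scrape_date - listing_date) + 1

print(data.frame(quantity, listing_date, days_for_sale))
```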

Analysis

A total of 35,485 listings were scraped. These are all active, non-auction product listings. The site reports a much higher number, typically between 200K and 300K, but most of these are inactive. Some listings were missed by the scraper. The listings on DNM were scraped by each individual product name within each product subcategory, and the site displays only the first 50 pages of listings for each of these products. In the few cases where all product listings cannot fit on 50 pages, some items were missed. This occurred mostly within the ‘Drugs & Chemicals’ category. ‘Psychedelics - LSD’ and ‘Ecstasy - MDMA’, for example, exceed the 50-page limit. Fortunately, this seems to have had little impact on the data. The listings are ordered by popularity, so the products that appear on the first 50 pages are the most actively sold. Using extremely precise search criteria to find some of the orphan product listings reveals mostly garbage listings composed of unreadable content, or products that have been listed for years without a sale.

To estimate annual sales, three variables are required: price (P), quantity sold (QTY), and the number of days the listing has been active (DAYS).

Avg_daily_sales = (P * QTY)/DAYS

This value, multiplied by 365, yields an estimate of total annual sales for a given listing.
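
As a worked example of the formula, consider a single made-up listing. The numbers are purely illustrative, and the extrapolation assumes that the listing keeps selling at the same average rate for a full year.

```r
# Hypothetical listing: $50 per unit, 30 units sold over 100 days.
price <- 50
quantity <- 30
days_for_sale <- 100

avg_daily_sales <- (price * quantity) / days_for_sale  # 1500 / 100 = 15 dollars per day
annual_sales <- avg_daily_sales * 365                  # 15 * 365 = 5475 dollars per year

print(c(avg_daily_sales = avg_daily_sales, annual_sales = annual_sales))
```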

While most of the product listings on DNM are listed for repeat sales, some are unique items that disappear once a sale is made. These listings would display a quantity sold of 0 until a sale is made, at which point the listing would be rendered inactive. To account for this possibility, two annual sales estimates are made. The first uses the actual quantity sold. The second, using the ‘max_’ variables created during data processing, presumes a quantity sold of 1 for every listing that currently displays 0.
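
The difference between the two estimates can be seen on a toy data set. This sketch uses three hypothetical listings, two of which show no sales yet; it mirrors the replace() step from the data processing code but is otherwise self-contained.

```r
# Three hypothetical listings; the last two have not recorded a sale.
listings <- data.frame(
  price = c(20, 500, 75),
  quantity = c(10, 0, 0),
  days_for_sale = c(40, 50, 25)
)

# Upper-bound quantity: treat every zero as a single sale.
listings$max_quantity <- replace(listings$quantity, listings$quantity == 0, 1)

# Annualized totals under each assumption.
low  <- sum(listings$price * listings$quantity / listings$days_for_sale) * 365
high <- sum(listings$price * listings$max_quantity / listings$days_for_sale) * 365

print(c(low = low, high = high))
```

Only the zero-quantity listings move between the two totals, so the gap between them is driven entirely by how many unsold listings exist and how expensive they are.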

sum_dat <- dat %>%
    summarise(total_avg_daily_sales = sum(avg_daily_sales),
              total_annual_sales = total_avg_daily_sales*365,
              max_total_avg_daily_sales = sum(max_avg_daily_sales),
              max_total_annual_sales = max_total_avg_daily_sales*365)

Table 1: Summary sales statistics

Statistic                    Value
total_avg_daily_sales        $982,447
total_annual_sales           $358,593,136
max_total_avg_daily_sales    $1,152,337
max_total_annual_sales       $420,603,084

Actual annual revenues are likely close to the average of the two estimates above, which puts DNM on pace to generate approximately $390 million in 2017. The average commission percentage for the darknet markets listed at deepdotweb is 3.6%. Assuming this figure is accurate, the operators of DNM stand to earn approximately $14 million. These sales figures are significantly higher than the most generous estimates of the Silk Road’s revenues. Given this dramatic increase and the overall proliferation of darknet markets, it seems clear that online black markets have experienced tremendous growth in the years since the Silk Road was taken down by the FBI.
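
The arithmetic behind these two figures is short enough to spell out. The sketch below takes the two annual totals from Table 1 and the 3.6% commission rate quoted above; the variable names are illustrative only.

```r
# Annual totals from Table 1 and the commission rate quoted in the text.
low_estimate    <- 358593136
high_estimate   <- 420603084
commission_rate <- 0.036

# Midpoint of the two estimates and the operators' implied cut.
midpoint <- mean(c(low_estimate, high_estimate))
operator_revenue <- midpoint * commission_rate

print(round(midpoint / 1e6))          # in millions of dollars
print(round(operator_revenue / 1e6))  # in millions of dollars
```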

Appendix: Charts

A few simple charts can help identify which products are most commonly purchased at DNM. The following bar chart illustrates how much each product category contributes to the site’s overall annual revenues.

The Drugs & Chemicals category clearly generates the vast majority of the sales at DNM. Overall, 80% of annual revenue comes from Drugs & Chemicals. Breaking this category down into its constituent subcategories indicates which types of drugs are most often purchased at DNM.
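
A category share like the one quoted above could be computed along the following lines. This is a sketch on made-up numbers, not the code used to produce the charts: the sales values are invented, and every category name other than Drugs & Chemicals is a placeholder.

```r
# Hypothetical annualized sales per listing; values and the non-drug
# category names are placeholders for illustration.
sales <- data.frame(
  product_category = c("Drugs & Chemicals", "Drugs & Chemicals",
                       "Category B", "Category C"),
  annual_sales = c(500, 200, 200, 100)
)

# Total sales per category, then each category's share of the overall total.
by_category <- tapply(sales$annual_sales, sales$product_category, sum)
share <- sort(100 * by_category / sum(by_category), decreasing = TRUE)

print(round(share, 1))
```

The same grouping applied to product_subcategory or product_name, restricted to one category, would give the finer breakdowns.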

Despite what appears to be a few miscategorized products in the CVV & Cards subcategory, the majority of sales in the Drugs & Chemicals category come from Psychedelics, Cannabis & Hashish, and Stimulants. Together, these subcategories constitute 69.3% of revenues in the Drugs & Chemicals product category.

Lastly, the top 10 products by revenue, regardless of the product category, are as follows.

By itself, the site’s top product, LSD, generates 22% of all revenues.