In 2017, the Division on Addictions (DOA) and bwin Interactive Entertainment collected data on the gambling activity of users who registered for the service between February 1st and 28th, 2005. This data mart summarizes the activity of these users from February 1st until September 30th, 2005. To examine only the gambling patterns of users playing with their own money, we removed all events in which users were playing with promotional money. Below are a few summary statistics on these users, followed by an explanation of the logic we used to summarize each user's gambling behavior.

User Demographics

Gender and Age

One statistic we found interesting is that females make up less than one-tenth of the new gamblers observed between February 1st and September 30th, 2005. This may reflect men's greater interest in high-stakes games such as poker.

Another interesting statistic is that about half of the users were in their twenties. This is plausible, since people in their twenties often have fewer responsibilities, such as raising children or paying a mortgage. It could also reflect that they are still seeking thrills, which adrenaline-raising activities such as gambling can provide in the short term.
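As a minimal sketch of how shares like these can be computed, the base R code below assumes a demographics table with one row per user and hypothetical `Gender` ("F"/"M") and `Age` columns. The values are made up for illustration and are not taken from the bwin data.

```r
# Toy demographics table: made-up values, hypothetical column names
demo <- data.frame(
  UserID = 1:10,
  Gender = c("M", "M", "F", "M", "M", "M", "M", "M", "M", "M"),
  Age    = c(22, 25, 31, 45, 28, 19, 27, 36, 24, 52)
)

# Share of female users: mean() of a logical vector is the proportion of TRUE values
female_share <- mean(demo$Gender == "F")

# Bucket ages into decades; right = FALSE gives bands [10,20), [20,30), ...
demo$AgeBand <- cut(demo$Age, breaks = seq(10, 70, by = 10), right = FALSE)
age_shares <- prop.table(table(demo$AgeBand))

print(female_share)                         # 0.1
print(as.numeric(age_shares["[20,30)"]))    # 0.5
```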

Country

Another interesting finding is that Germany is by far the best-represented country among all countries. Given that bwin is based in Austria, the user base may be skewed toward nearby, German-speaking markets such as Germany.

Acquisition Metrics

Here are the top acquisition platforms, limited to those that brought in more than 100 users. BETANDWIN.DE is clearly the most successful referrer, bringing in over 22 thousand of the users who registered between February 1st and February 28th, 2005. Its ‘.de’ domain suggests that the website is German. Since Germany is by far the best-represented nationality, it is logical that the best-performing acquisition website is German.

Platform            Users acquired
BETANDWIN.DE                 22044
BETANDWIN.COM                14502
BETEUROPE.COM                 2360
BETOTO.COM                    2144
BETANDWIN POKER                483
BETANDWIN CASINO               456
PLAYIT.COM                     327
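A minimal base R sketch of how such a list can be produced, assuming one row per user and a hypothetical `ApplicationDescription` column naming the acquisition platform. The counts below are toy values, not the real ones.

```r
# Toy demographics table: one row per user, hypothetical ApplicationDescription column
demo <- data.frame(
  UserID = 1:400,
  ApplicationDescription = c(rep("BETANDWIN.DE", 250), rep("BETANDWIN.COM", 120),
                             rep("PLAYIT.COM", 30))
)

# Count users per acquisition platform
counts <- table(demo$ApplicationDescription)

# Keep platforms that brought in more than 100 users, largest first
top_platforms <- sort(counts[counts > 100], decreasing = TRUE)
print(top_platforms)
```

In this toy data, PLAYIT.COM (30 users) falls below the threshold and is dropped, and the remaining platforms are listed from most to fewest users.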

Type of Gambling Activity

Users who signed up between February 1st and 28th seemed most interested in sports betting, which accounts for most of the measured activity.

Gambling product           Number of users
Sports book fixed odds               40572
Sports book live action              25202
Poker                                 2353
Casino BossMedia                      1862
Supertoto                              833
Games VS                              2530
Games bwin                            1827
Casino Chartwell                      4867
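Because a user who bets on several products is counted once for each of them, the counts above add up to more than the number of users acquired. A minimal base R sketch of counting distinct users per product, assuming a daily activity table with `UserID` and `ProductDescription` columns (toy values and shortened product labels):

```r
# Toy daily activity: one row per user, product and day (made-up values)
daily <- data.frame(
  UserID = c(1, 1, 2, 3, 3, 3),
  ProductDescription = c("Fixed odds", "Fixed odds", "Fixed odds",
                         "Live action", "Fixed odds", "Live action")
)

# Number of distinct users per product; a user active on several days counts once
users_per_product <- tapply(daily$UserID, daily$ProductDescription,
                            function(ids) length(unique(ids)))
print(users_per_product)

# Summing across products double-counts users who play more than one product
print(sum(users_per_product))          # 4
print(length(unique(daily$UserID)))    # 3
```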

Maximum and Minimum Returns per Gambling Product

As explained below, to aggregate user activity we calculated, for each user, the largest single-day profit and the largest single-day loss between February 1st and September 30th, 2005. Here, we averaged these two variables over all users of each product to see how each product is performing. Users make the highest average maximum profit at Casino BossMedia, but they also incur their largest average maximum loss there. Supertoto users, on the other hand, essentially never make a profit: even their average best day is a small loss. It would be interesting to test whether the means of the profit and loss variables differ significantly across products, in order to determine which product earns the highest margin.

Note: Because poker data was collected separately from the other products, we created different variables for it. The variables most similar to profit and loss are the maximum poker chip buy and sell amounts, as they also represent money flowing to and from the company. Poker statistics are therefore summarized in their own table.

Gambling product          Average max. daily profit   Average max. daily loss
Sports book fixed odds                        84.72                    -63.79
Sports book live action                       34.19                    -49.98
Casino BossMedia                              91.66                   -148.06
Supertoto                                     -0.27                     -1.31
Games VS                                      10.65                    -61.50
Games bwin                                    10.42                    -28.99
Casino Chartwell                              59.08                   -128.49

Poker chip action                Average maximum amount among users
Maximum poker chip sell amount                               121.12
Maximum poker chip buy amount                                157.28
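A minimal base R sketch of the averaging step, plus one way to start on the significance question raised above. It assumes a per-user summary table with `Product`, `max_profit` and `max_loss` columns (toy values). The Welch two-sample t-test (`t.test`) is our suggestion, not something the analysis above ran.

```r
# Toy per-user summary (made-up values): one row per user and product, holding the
# largest single-day profit and the largest single-day loss
user_summary <- data.frame(
  Product    = c("Casino", "Casino", "Casino", "Sports", "Sports", "Sports"),
  max_profit = c(120, 40, 115, 90, 60, 30),
  max_loss   = c(-200, -80, -164, -70, -50, -45)
)

# Average maximum profit and average maximum loss per product
product_avgs <- aggregate(cbind(max_profit, max_loss) ~ Product,
                          data = user_summary, FUN = mean)
print(product_avgs)

# Welch two-sample t-test: do the two products differ in mean maximum daily profit?
profit_test <- t.test(max_profit ~ Product, data = user_summary)
print(profit_test$p.value)
```

With more than two products, `pairwise.t.test()` from the same stats package, with a multiple-comparison adjustment, would be a natural extension.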

Metrics Calculated

Before beginning our analysis, we removed all recorded activity in which the user played with promotional money, since such play may not reflect how the user gambles with their own money. To do this, we filtered out all play activity recorded before the user's first pay date (i.e., the date on which the user first deposited money to play). We used the following code:

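library(dplyr)

# Attach each user's first deposit date (FirstPay) to their daily activity records
raw_user_daily_agg <- merge(x = raw_user_daily_agg, y = raw_demo[, c("UserID", "FirstPay")], by = "UserID", all.x = TRUE)

# Flag activity dated before the first deposit, i.e. play with promotional money
raw_user_daily_agg$Promotional <- raw_user_daily_agg$FirstPay > raw_user_daily_agg$Date

# Keep only activity on or after the first deposit (rows with an NA flag are dropped as well)
raw_user_daily_agg <- filter(raw_user_daily_agg, !Promotional)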
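To see which rows this filter removes, here is the same logic on a toy example in base R. The dates are made up, and `daily` and `first_pay` are illustrative stand-ins for `raw_user_daily_agg` and the `FirstPay` column of `raw_demo`.

```r
# Toy daily activity and first-deposit dates (made-up values)
daily <- data.frame(
  UserID = c(1, 1, 1, 2, 2),
  Date   = as.Date(c("2005-02-01", "2005-02-03", "2005-02-10", "2005-02-05", "2005-02-06"))
)
first_pay <- data.frame(
  UserID   = c(1, 2),
  FirstPay = as.Date(c("2005-02-03", "2005-02-05"))
)

# Attach each user's first deposit date, then flag days before it as promotional play
daily <- merge(daily, first_pay, by = "UserID", all.x = TRUE)
daily$Promotional <- daily$FirstPay > daily$Date

# which() keeps only rows whose flag is FALSE, dropping TRUE and NA alike (as dplyr::filter() does)
own_money <- daily[which(!daily$Promotional), ]
print(nrow(own_money))   # 4: only user 1's activity on 2005-02-01 is removed
```

Because activity is recorded per day, any play on the deposit day itself is kept, even if part of it happened before the deposit.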

Next, to analyze user behavior, we created various metrics. For all products except poker, we calculated user loyalty, consumption, addiction, and engagement. We defined loyalty as the length of the relationship in days (LOR_days_active), consumption as the total amount staked (stakes_total), addiction as the betting frequency, i.e. total bets divided by the length of the relationship (bet_freq), and engagement as the number of days active (Nbr_days_active). Note that engagement differs from loyalty: loyalty is the number of days between the first and the last play, while engagement is the number of days on which the user actually played.

We also calculated summary statistics (the minimum and maximum daily values and the overall total) for the number of bets, the winnings, and the stakes. Finally, we created two variables for the largest amount lost in a single day and the largest profit made in a single day.
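To make the difference between loyalty and engagement concrete, here is a worked example for one hypothetical user, following the definitions above in base R. The dates and amounts are made up.

```r
# One hypothetical user's activity for a single product (made-up values)
dates       <- as.Date(c("2005-02-03", "2005-02-05", "2005-02-10"))
bets        <- c(4, 2, 8)       # number of bets per active day
profit_loss <- c(-10, 25, -40)  # daily profit (+) or loss (-)

# Loyalty: days between the first and last active day
lor_days <- as.numeric(difftime(max(dates), min(dates), units = "days"))   # 7
# Engagement: number of distinct active days
days_active <- length(unique(dates))                                       # 3
# Addiction: bets per day of the relationship, 0 when first and last day coincide
bet_freq <- if (lor_days == 0) 0 else sum(bets) / lor_days                 # 14 / 7 = 2
# Largest single-day profit and loss
max_profit <- max(profit_loss)                                             # 25
max_loss   <- min(profit_loss)                                             # -40
```

Note that under this definition a user who is active on a single day has a relationship length of 0 and is assigned a betting frequency of 0.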

To create these metrics, we used the following code:

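library(dplyr)

# Hypothetical wrapper: the code summarizes one product at a time, named by `x`.
# Assumes Date is of class Date and that a daily Profit_Loss column already exists.
summarize_product <- function(x) {
  raw_user_daily_agg %>%
    filter(ProductDescription == x) %>%
    group_by(UserID) %>%
    summarize(
      first_active_date = min(Date),
      last_active_date = max(Date),
      # Loyalty: days between the first and last active day
      LOR_days_active = difftime(max(Date), min(Date), units = "days"),
      Days_of_Inactivity = difftime(as.Date("2005-09-30"), max(Date), units = "days"),
      # Engagement: number of distinct active days
      Nbr_days_active = n_distinct(Date),
      bets_total = sum(Bets),
      max_bet = max(Bets),
      min_bet = min(Bets),
      # Addiction: bets per day of the relationship (0 for single-day users)
      bet_freq = ifelse(LOR_days_active == 0, 0, bets_total / as.double(LOR_days_active, units = "days")),
      win_total = sum(Winnings),
      max_win = max(Winnings),
      min_win = min(Winnings),
      # Consumption: total amount staked
      stakes_total = sum(Stakes),
      max_stake = max(Stakes),
      min_stake = min(Stakes),
      # Largest single-day loss and largest single-day profit
      max_loss = min(Profit_Loss),
      max_profit = max(Profit_Loss)
    )
}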
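The code above summarizes one product at a time. One way to build a single row per user across all products, sketched here in base R on toy data, is to run a per-product summary for every product, append the product name to each metric column, and merge the results by UserID. The helper `summarize_product_wide()` and the column suffixes are hypothetical, and only three of the metrics are reproduced.

```r
# Toy daily activity (made-up values)
daily <- data.frame(
  UserID = c(1, 1, 2, 2, 3),
  ProductDescription = c("fixed_odds", "fixed_odds", "fixed_odds", "live_action", "live_action"),
  Date = as.Date(c("2005-02-01", "2005-02-04", "2005-02-02", "2005-02-02", "2005-03-01")),
  Bets = c(3, 5, 1, 2, 6)
)

# Hypothetical helper: per-user metrics for one product, suffixed with the product name
summarize_product_wide <- function(df, product) {
  d <- df[df$ProductDescription == product, ]
  out <- do.call(rbind, lapply(split(d, d$UserID), function(u) data.frame(
    UserID          = u$UserID[1],
    LOR_days_active = as.numeric(max(u$Date) - min(u$Date)),
    Nbr_days_active = length(unique(u$Date)),
    bets_total      = sum(u$Bets)
  )))
  names(out)[-1] <- paste(names(out)[-1], product, sep = "_")
  out
}

# One summary per product, then a full outer merge so every user keeps one row
per_product <- lapply(unique(daily$ProductDescription),
                      function(p) summarize_product_wide(daily, p))
user_metrics <- Reduce(function(a, b) merge(a, b, by = "UserID", all = TRUE), per_product)
print(user_metrics)
```

Users who never played a given product get NA for that product's columns, which keeps "did not play" distinct from "played with zero activity".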

Because the poker information came from a different database, the summary statistics we calculated for poker differ slightly but follow the same logic. For poker, we split the data into poker chip purchases (buys) and sales (sells). Again, we calculated user loyalty (LOR_days_active), consumption (total_amount), addiction (avg_amount_per_day), and engagement (Nbr_days_active).

library(dplyr)

# Summary of poker chip purchases; the same code with TransType == "sell" summarizes sales.
# Assumes TransDate is of class Date (one value per calendar day).
raw_poker_chip %>%
  filter(TransType == "buy") %>%
  group_by(UserID) %>%
  summarize(
    first_active_date = min(TransDate),
    last_active_date = max(TransDate),
    LOR_days_active = difftime(max(TransDate), min(TransDate), units = "days"),
    Days_of_Inactivity = difftime(as.Date("2005-09-30"), max(TransDate), units = "days"),
    Nbr_days_active = n_distinct(TransDate),
    total_amount = sum(TransAmount),
    max_amount = max(TransAmount),
    min_amount = min(TransAmount),
    avg_amount = mean(TransAmount),
    # Average amount per active day: total amount divided by the number of active days
    avg_amount_per_day = total_amount / Nbr_days_active
  )
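Buys and sells can also be summarized in one pass by grouping on the transaction type instead of filtering twice. A minimal base R sketch on toy data, computing `avg_amount_per_day` as the total amount divided by the number of active days (our reading of the intended formula):

```r
# Toy poker chip transactions (made-up values)
poker <- data.frame(
  UserID      = c(1, 1, 1, 2, 2),
  TransType   = c("buy", "buy", "sell", "buy", "sell"),
  TransDate   = as.Date(c("2005-03-01", "2005-03-04", "2005-03-04", "2005-03-02", "2005-03-02")),
  TransAmount = c(50, 30, 70, 20, 10)
)

# Summarize buys and sells together by splitting on user and transaction type
groups <- split(poker, list(poker$UserID, poker$TransType), drop = TRUE)
poker_summary <- do.call(rbind, lapply(groups, function(g) {
  days <- length(unique(g$TransDate))
  data.frame(
    UserID             = g$UserID[1],
    TransType          = g$TransType[1],
    total_amount       = sum(g$TransAmount),
    Nbr_days_active    = days,
    avg_amount_per_day = sum(g$TransAmount) / days
  )
}))
rownames(poker_summary) <- NULL
print(poker_summary)
```

Here user 1 bought chips on two days for a total of 80, so their average buy amount per active day is 40.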

Once we had created all of the variables above for every product, we merged them with the relevant demographic variables (Country, Language, Registration Date, Application/Acquisition, and Gender) from the demographics database. We also added the age variable from the analytics database, as we did not have access to the users' birth dates.
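The merge step can be sketched in base R as a chain of left joins on UserID. This sketch starts from the demographics table so that every registered user keeps a row. Whether the actual data mart keeps users with no real-money activity is an assumption on our part, and all columns other than UserID are illustrative.

```r
# Toy per-user metrics, demographics and age tables (made-up values, hypothetical columns)
metrics <- data.frame(UserID = c(1, 2, 3), stakes_total = c(120, 45, 300))
demo    <- data.frame(UserID  = c(1, 2, 3, 4),
                      Country = c("Germany", "Austria", "Germany", "Spain"),
                      Gender  = c("M", "M", "F", "M"))
age     <- data.frame(UserID = c(1, 2, 3, 4), Age = c(24, 31, 27, 45))

# Start from the demographics so every registered user keeps a row; users with
# no real-money activity get NA metrics instead of being dropped
data_mart <- merge(demo, age, by = "UserID", all.x = TRUE)
data_mart <- merge(data_mart, metrics, by = "UserID", all.x = TRUE)
print(data_mart)
```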

To conclude, the data mart created contains all of the demographics and metrics described above for each of the users in the databases.