← Mobile Data Services

User Activity SQL Queries

Note Some of these queries might not run perfectly with the postgres indexes we have setup already. So please don’t spam these queries, they’ve been written in the midst of fighting fires and are to be used for doing the same in the future.

How many users ran during some period of time?

Query

select count(distinct user_id)
from aggregation_status
where request_type in (0, 1, 5, 7, 8)
and (run_time > timestamp '2015-05-04 11:35:00.000-05'
     and run_time < timestamp '2015-05-04 11:37:00.000-05')
;

Example Output

  count
-------
    77
(1 row)

How many users ’timed out'?

  • ’timed out’ means that the overall aggregation took at least some fixed interval of time. Typically we’ve set this to 5 minutes, and when that time elapses and we’re still waiting for something from a service we just end the aggregation and reset the user’s status back so they can try again. Timeouts can happen for any number of reasons (repeated slow queries, services dying and not shutting down cleanly, really slow banking websites, etc.)

Query

-- You can also find out how long they took with something like
--select distinct O.user_id, A.institution_id, A.run_time - O.date

-- Otherwise, here's the count
select count(distinct O.user_id)
from online_aggregations as O
inner join aggregation_status as A
on o.user_id = a.user_id
where (O.date > (now() - interval '3 hours')) and
      (A.run_time > (now() - interval '3 hours')) and
      (A.run_time - O.date > interval '5 minutes') and
      (A.institution_id <> '5e4dcedb-567f-417b-b428-2231d4713b63') -- ignore artemis
;

Example Output

  count
-------
   789
(1 row)

Net online aggregations in the past week, broken down by hour.

It’s been helpful to know when our peaks are for throughput (aggregation wise), and this provides a pretty good count of the aggregation counts flowing through.

Query

-- net online aggs count in the past 7 days, broken down by day
select count(*), date_trunc('day', date), lpad('', (count(*)/1000)::int, '#') as histogram
from online_aggregations
where date > now() - interval '7d'
group by date_trunc('day', date)
order by date_trunc('day', date) desc
;

-- stats on ^^ limited to 7AM to 7PM
with stats as (
  select count(*) as online_aggs, date_trunc('hour', date), lpad('', (count(*)/10)::int, '#') as histogram
  from online_aggregations
  where date > now() - interval '7d'
  and (extract(hour from date) > 7 and extract(hour from date) < 19)
  group by date_trunc('hour', date)
  order by date_trunc('hour', date) desc
)
select max(online_aggs), min(online_aggs), avg(online_aggs), stddev(online_aggs)
from stats
;

Example Output

What are some of the most frequent users, and how much are they aggregating?

We’ve sometimes had users who try and aggregate a bunch against our systems. This can cause problems if they have a lot of accounts (we’ve seen users with 40+ accounts aggregate 10+ times a day).

Query

-- most frequent users, avg / stddev of user syncs per day, for the last week
select distinct oa.user_id, u.primary_institution_username, i.name, count(oa.user_id), date_trunc('day', date), lpad('', (count(*)/5)::int, '#') as histogram
from online_aggregations as oa
inner join users as u
on u.user_id = oa.user_id
inner join institution as i
on u.primary_institution_id = i.institution_id
where date > now() - interval '7d'
group by date_trunc('day', date), oa.user_id, u.primary_institution_username, i.name
order by count(oa.user_id) desc
limit 25
;

-- math on ^^
with stats as (
  select distinct oa.user_id as uid, u.primary_institution_username, i.name, count(oa.user_id) as aggs, date_trunc('day', date), lpad('', (count(*)/5)::int, '#') as histogram
  from online_aggregations as oa
  inner join users as u
  on u.user_id = oa.user_id
  inner join institution as i
  on u.primary_institution_id = i.institution_id
  where date > now() - interval '7d'
  group by date_trunc('day', date), oa.user_id, u.primary_institution_username, i.name
  order by count(oa.user_id) desc
),
under_two as (
  select count(*) as users_under_two, stats.uid
  from stats
  group by stats.uid
  having count(aggs) <= 2
)
select max(aggs), avg(aggs), stddev(aggs), count(users_under_two) as users_under_two, count(distinct stats.uid) as net_users_with_sync
from stats
left outer join under_two
on stats.uid = under_two.uid
;

Find users with the most accounts running within a period of time.

This is another query to go along with looking for users that have a lot of accounts but this time it’s limited to an interval of time, not just the last week.

Query

with top_logins_with_accounts as
(select distinct count(account_id), user_id
 from accounts
 where deleted_at is null
 group by user_id
 order by count(account_id) desc
 limit 200)
select distinct top.user_id, count
from top_logins_with_accounts as top
inner join login_accounts as la
on la.user_id = top.user_id
where (la.last_status_change > timestamp '2014-11-14 7:00:00.000-05' and
       la.last_status_change < timestamp '2014-11-14 8:45:00.000-05')
and la.active = true
order by count desc
;