
In the first two parts of this series, we looked at modelling out pageview events to include accurate time spent and scroll depths. Now we’ll roll up sessions.
A session or visit to a web site is when a user comes to your site, does some stuff, then left. The key feature of a session is that the user came from somewhere, did some stuff, then left. The “some stuff” is a bunch of pageviews and events. The “somewhere” here is a traffic source, for example if they came from a Google search.
Create a lookup table on domain_sessionid
This model pulls out the various parameters from the first referrer, uses the Campaign attribution enrichment fields and also has a monster CASE statement classifying traffic sources on a number of dimensions but mostly using first_refr_urlhost
.
The giant CASE statement defining channel is probably the piece you want to most dig into and play with for your purposes. You’ll notice that I classify a first_refr_urlhost of NULL
as Unknown in contrast to Director Typed/Bookmarked. This is a very conscious decision as Direct and Typed/Bookmarked give the invalid impression that there is some certainty about the source when in fact we just don’t know where the session came from.
DROP TABLE IF EXISTS lookup_sessions;
CREATE TABLE lookup_sessions
DISTKEY(1)
SORTKEY(1)
AS (
WITH sessions AS
(SELECT DISTINCT
collector_tstamp,
domain_sessionid,
refr_urlhost,
page_referrer,
mkt_campaign,
refr_urlscheme,
refr_urlhost,
refr_urlport,
refr_urlpath,
refr_urlquery,
refr_urlfragment,
refr_medium,
refr_source,
refr_term,
mkt_content,
mkt_medium,
mkt_source,
page_urlpath,
FIRST_VALUE(refr_urlhost)
OVER (
PARTITION BY domain_sessionid
ORDER BY domain_sessionid, collector_tstamp
rows between unbounded preceding and unbounded following ) AS first_refr_urlhost,
FIRST_VALUE(page_referrer)
OVER (
PARTITION BY domain_sessionid
ORDER BY domain_sessionid, collector_tstamp
rows between unbounded preceding and unbounded following ) AS first_page_referrer,
FIRST_VALUE(refr_urlscheme)
OVER (
PARTITION BY domain_sessionid
ORDER BY domain_sessionid, collector_tstamp
rows between unbounded preceding and unbounded following ) AS first_refr_urlscheme,
FIRST_VALUE(refr_urlport)
OVER (
PARTITION BY domain_sessionid
ORDER BY domain_sessionid, collector_tstamp
rows between unbounded preceding and unbounded following ) AS first_refr_urlport,
FIRST_VALUE(refr_urlpath)
OVER (
PARTITION BY domain_sessionid
ORDER BY domain_sessionid, collector_tstamp
rows between unbounded preceding and unbounded following ) AS first_refr_urlpath,
FIRST_VALUE(refr_urlquery)
OVER (
PARTITION BY domain_sessionid
ORDER BY domain_sessionid, collector_tstamp
rows between unbounded preceding and unbounded following ) AS first_refr_urlquery,
FIRST_VALUE(refr_urlfragment)
OVER (
PARTITION BY domain_sessionid
ORDER BY domain_sessionid, collector_tstamp
rows between unbounded preceding and unbounded following ) AS first_refr_urlfragment,
FIRST_VALUE(refr_medium)
OVER (
PARTITION BY domain_sessionid
ORDER BY domain_sessionid, collector_tstamp
rows between unbounded preceding and unbounded following ) AS first_refr_medium,
FIRST_VALUE(refr_source)
OVER (
PARTITION BY domain_sessionid
ORDER BY domain_sessionid, collector_tstamp
rows between unbounded preceding and unbounded following ) AS first_refr_source,
FIRST_VALUE(refr_term)
OVER (
PARTITION BY domain_sessionid
ORDER BY domain_sessionid, collector_tstamp
rows between unbounded preceding and unbounded following ) AS first_refr_term,
FIRST_VALUE(mkt_campaign)
OVER (
PARTITION BY domain_sessionid
ORDER BY domain_sessionid, collector_tstamp
rows between unbounded preceding and unbounded following ) AS first_mkt_campaign,
FIRST_VALUE(mkt_content)
OVER (
PARTITION BY domain_sessionid
ORDER BY domain_sessionid, collector_tstamp
rows between unbounded preceding and unbounded following ) AS first_mkt_content,
FIRST_VALUE(mkt_medium)
OVER (
PARTITION BY domain_sessionid
ORDER BY domain_sessionid, collector_tstamp
rows between unbounded preceding and unbounded following ) AS first_mkt_medium,
FIRST_VALUE(mkt_source)
OVER (
PARTITION BY domain_sessionid
ORDER BY domain_sessionid, collector_tstamp
rows between unbounded preceding and unbounded following ) AS first_mkt_source,
FIRST_VALUE(page_urlpath)
OVER (
PARTITION BY domain_sessionid
ORDER BY domain_sessionid, collector_tstamp
rows between unbounded preceding and unbounded following ) AS first_page_urlpath,
LAST_VALUE(page_urlpath)
OVER (
PARTITION BY domain_sessionid
ORDER BY domain_sessionid, collector_tstamp
rows between unbounded preceding and unbounded following ) AS last_page_urlpath
FROM atomic.events
WHERE app_id IN ('poplindata.com')
ORDER BY
1 DESC)
SELECT DISTINCT
domain_sessionid,
first_refr_urlhost,
first_page_referrer,
first_refr_urlscheme,
first_refr_urlport,
first_refr_urlpath,
first_refr_urlquery,
first_refr_urlfragment,
first_refr_medium,
first_refr_source,
first_refr_term,
first_mkt_campaign,
first_mkt_content,
first_mkt_medium,
first_mkt_source,
first_page_urlpath,
last_page_urlpath,
CASE
WHEN first_refr_urlhost LIKE '%poplindata.com'
THEN 'internal'
WHEN first_refr_urlhost = 'localhost'
THEN 'localhost'
WHEN first_refr_urlhost = '127.0.0.1'
THEN 'localhost'
-- Google
WHEN first_mkt_source = 'adwords'
THEN 'Adwords'
WHEN first_refr_urlhost LIKE 'www.google.%'
THEN 'Google'
WHEN first_refr_urlhost = 'com.google.android.googlequicksearchbox'
THEN 'Google'
WHEN first_refr_urlhost = 'encrypted.google.com'
THEN 'Google'
WHEN first_refr_urlhost LIKE '%doubleclick.net'
THEN 'Google Doubleclick'
WHEN first_refr_urlhost = 'plus.google.com'
THEN 'Google Plus'
WHEN first_refr_urlhost = 'plus.url.google.com'
THEN 'Google Plus'
WHEN first_refr_urlhost = 'www.googleadservices.com'
THEN 'Google Ad Services'
WHEN first_refr_urlhost = 'cse.google.com'
THEN 'Google Custom Search'
-- Email platforms
WHEN first_refr_urlhost = 'outlook.live.com'
THEN 'Email'
WHEN first_refr_urlhost = 'mailchi.mp'
THEN 'Email'
WHEN first_refr_urlhost = 'mail.google.com'
THEN 'Email'
WHEN first_refr_urlhost = 'com.google.android.gm'
THEN 'Email'
WHEN first_refr_urlhost LIKE '%ecosia.org'
THEN 'Ecosia Search'
WHEN first_refr_urlhost LIKE '%search.yahoo.com'
THEN 'Yahoo Search'
WHEN first_refr_urlhost = 't.co'
THEN 'Twitter'
WHEN first_refr_urlhost LIKE '%bing.com'
THEN 'Bing'
WHEN first_refr_urlhost = 'discourse.snowplowanalytics.com'
THEN 'Discourse'
WHEN first_refr_urlhost LIKE '%snowplowanalytics.com'
THEN 'Snowplow'
WHEN first_refr_urlhost = 'yandex.ru'
THEN 'Yandex'
WHEN first_refr_urlhost = 'www.linkedin.com'
THEN 'LinkedIn'
WHEN first_refr_urlhost = 'com.linkedin.android'
THEN 'LinkedIn'
WHEN first_refr_urlhost = 'lnkd.in'
THEN 'LinkedIn'
WHEN first_refr_urlhost = 'chrome.google.com'
THEN 'Chrome'
WHEN first_refr_urlhost = 'duckduckgo.com'
THEN 'DuckDuckGo'
WHEN first_refr_urlhost LIKE '%measurecamp.org'
THEN 'MeasureCamp'
WHEN first_refr_urlhost LIKE '%meetup.com'
THEN 'Meetup'
WHEN first_refr_urlhost LIKE '%wawsydney.com'
THEN 'WAW Sydney'
WHEN first_page_referrer LIKE '%meetup.com/Web-Analytics-Wednesday-Sydney%'
THEN 'WAW Sydney'
WHEN first_refr_urlhost LIKE '%wawmelbourne.com'
THEN 'WAW Melbourne'
WHEN first_page_referrer LIKE '%meetup.com/Web-Analytics-Wednesdays-Melbourne%'
THEN 'WAW Melbourne'
WHEN first_refr_urlhost LIKE '%meetup.com'
THEN 'Meetup'
WHEN first_refr_urlhost = 'com.Slack'
THEN 'Slack'
WHEN first_refr_urlhost LIKE '%reddit.com'
THEN 'Reddit'
WHEN first_refr_urlhost LIKE '%facebook.com'
THEN 'Facebook'
WHEN first_refr_urlhost LIKE '%baidu.com'
THEN 'Baidu'
WHEN first_refr_urlhost LIKE '%github.com'
THEN 'Github'
WHEN first_refr_urlhost LIKE '%getpocket.com'
THEN 'Pocket'
WHEN first_refr_urlhost LIKE '%evernote.com'
THEN 'Evernote'
WHEN first_refr_urlhost IS NULL
THEN 'Unknown'
ELSE 'other'
END AS channel
FROM sessions
ORDER BY channel
)
Reporting
To build the report shown above, we’re just looking at the channel categorizations and the associated sessions. Your app_id
would be different.
-- Landing pages
SELECT se.channel, COUNT(DISTINCT ev.domain_sessionid) AS sessions
FROM atomic.events ev, lookup_sessions se
WHERE CONVERT_TIMEZONE('Australia/Sydney', ev.derived_tstamp) > CONVERT_TIMEZONE('Australia/Sydney', SYSDATE) - INTERVAL '7 days'
AND EVENT = 'page_view'
AND ev.domain_sessionid= se.domain_sessionid
AND ev.app_id = 'poplindata.com'
GROUP BY 1
ORDER BY 2 DESC
A more advanced report breaks out traffic sources by landing pages, showing source then the first page in the session. Your app_id
would be different.
SELECT se.channel, se.first_page_urlpath, COUNT(DISTINCT ev.domain_sessionid) AS sessions, COUNT(*) AS pv
FROM atomic.events ev, lookup_sessions se
WHERE CONVERT_TIMEZONE('Australia/Sydney', ev.derived_tstamp) > CONVERT_TIMEZONE('Australia/Sydney', SYSDATE) - INTERVAL '7 days'
AND EVENT = 'page_view'
AND ev.domain_sessionid= se.domain_sessionid
AND ev.app_id = 'poplindata.com'
GROUP BY 1, 2
ORDER BY 3 DESC
Next steps
In the next installation we’ll look at user data. The most interesting component of that model is the ability to identify users retrospectively and across devices.