Snowplow already provides a few tools for identifying events from non-human devices (“crawlers”, “bots”, etc.). However, most of these work from metadata collected with events, such as the User Agent string and IP address (e.g., the IAB Bots and Spiders enrichment); Google Analytics uses the same signals to flag bot traffic.
As User Agents are progressively frozen to include less descriptive information and bots become more sophisticated, these methods are becoming less effective over time.
Breaking existing CAPTCHAs is now well understood and inexpensive, and more exotic methods, such as generating convincing mouse trajectories to fool newer CAPTCHAs, are starting to enter the wild.
Unlike more traditional CAPTCHAs, which may require the user to interactively identify text, traffic lights or boats, or decode The Da Vinci Code, reCAPTCHA v3 passively captures background signals from the user’s browser, including typing speed and mouse movements, to determine just how human a user is. Because reCAPTCHA is so widely deployed (several million sites), Google can combine these signals with machine learning to return a score from 0.0 to 1.0 that can now be attached to your Snowplow events.
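The score itself is revealed when the token is verified server-side against Google’s `siteverify` endpoint. As a rough sketch: the endpoint URL and the `success`/`score` response fields come from Google’s reCAPTCHA documentation, while the function names and the 0.5 threshold here are purely illustrative choices of ours:

```javascript
// Sketch only: verify a reCAPTCHA v3 token server-side (Node 18+, global fetch).
// The endpoint and response fields are documented by Google; SECRET_KEY and the
// threshold below are placeholders, not values from this post.
async function verifyToken(token, secretKey) {
  const res = await fetch("https://www.google.com/recaptcha/api/siteverify", {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: new URLSearchParams({ secret: secretKey, response: token }),
  });
  // Response includes e.g. { success: true, score: 0.9, action: "page_view", ... }
  return res.json();
}

// Pure helper: interpret a verification response at a chosen score threshold.
function isLikelyHuman(verification, threshold = 0.5) {
  return verification.success === true && verification.score >= threshold;
}
```

Where you set the threshold is a business decision: a lower cut-off lets more borderline traffic through, a higher one risks discarding real users.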
From a technical perspective, reCAPTCHA v3 runs in the background and returns a short-lived (two-minute) token, which is then attached as an entity to a Snowplow event. The entity can be attached to all events, or only to a subset if required.
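In the browser this flow can be sketched as follows. The `grecaptcha.ready`/`grecaptcha.execute` calls are Google’s documented reCAPTCHA v3 API, and passing an array of self-describing JSON entities to the Snowplow JavaScript tracker is its standard custom-context mechanism; the Iglu schema URI, `SITE_KEY`, and the action name here are placeholders for illustration:

```javascript
// Sketch only: wrap a reCAPTCHA v3 token in a self-describing JSON entity.
// The schema URI is a placeholder, not a published Iglu schema.
function recaptchaEntity(token) {
  return {
    schema: "iglu:com.example/recaptcha_token/jsonschema/1-0-0", // placeholder
    data: { token: token },
  };
}

// Browser usage (assumes the reCAPTCHA and Snowplow JS snippets are loaded):
// grecaptcha.ready(function () {
//   grecaptcha.execute(SITE_KEY, { action: "page_view" }).then(function (token) {
//     // Attach the entity to a page view; any tracked event could carry it.
//     window.snowplow("trackPageView", null, [recaptchaEntity(token)]);
//   });
// });
```

Because the token expires after two minutes, it should be requested close to the event it accompanies rather than cached for the session.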
As with many detection methods, this approach is probabilistic, so there is always the possibility of false positives and false negatives; we therefore recommend combining this technique with additional detection methods such as intrusion detection systems and UI honeypots.