Understanding web behaviors with data science

 

Over the past year, I’ve worked with my data science team to develop logistic regression models to improve the usability of our product and unearth interesting behavioral insights about web traffic for our clients.

In a typical web session, the pages visitors navigate are designed to achieve a certain end goal. Our work focused on making the design of these pages easier. Website experience design is not an exact science, and there is a lot of debate about how to structure a page for the best outcome. For example, does a single-page checkout flow with all the steps on one page (cart overview, shipping details, payment details, confirmation notice) perform best? Or is it better to break these steps into multiple pages for the user to click through? These are the types of questions our product answers, though the process was excruciatingly manual until we used data science to draw out logic from the chaos!

We successfully used clustering analysis to group site visitor behaviors and predict session outcomes. Our model calculates the similarity of behavior patterns and then predicts an outcome.


One of the most compelling findings of this project was the model’s ability to predict whether a session would convert from only a limited number of initial behaviors.


Model 1 applies logistic regression to the first 10 events of a web session and predicts whether that session will convert down the line. For context, the sessions we used to train and test the model were on average 1,000 events long, meaning the model accurately predicts outcomes from only about 1% of the user behavior data! We then varied the number of events we exposed to the model and saw accuracy increase with more events, though not by very much (see the table below).
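To make the idea concrete, here is a minimal, dependency-free sketch of this kind of model. It is not our production pipeline: the event names, the toy sessions, the count-based features and the hand-written gradient descent are all assumptions chosen for illustration. It counts how often each event type appears in the first N events of a session and fits a logistic regression on those counts.

```python
import math
from collections import Counter


def featurize(events, n_first, vocab):
    """Count how often each known event type occurs in the first n_first events."""
    counts = Counter(events[:n_first])
    return [float(counts.get(name, 0)) for name in vocab]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_regression(X, y, lr=0.5, epochs=1000):
    """Fit weights and a bias by batch gradient descent on the log loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this session: predicted probability minus label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict_proba(events, n_first, vocab, w, b):
    """Estimated probability that a session converts, given its first events."""
    x = featurize(events, n_first, vocab)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical toy data: (event sequence, 1 if the session converted else 0).
sessions = [
    (["view_home", "view_product", "add_to_cart", "view_cart", "checkout"], 1),
    (["view_product", "add_to_cart", "add_to_cart", "view_cart"], 1),
    (["view_home", "search", "view_product", "add_to_cart"], 1),
    (["view_home", "search", "search", "exit"], 0),
    (["view_home", "view_product", "exit"], 0),
    (["search", "view_product", "view_product", "exit"], 0),
]

N_FIRST = 4  # only the first few events of each session are shown to the model
vocab = sorted({e for events, _ in sessions for e in events[:N_FIRST]})
X = [featurize(events, N_FIRST, vocab) for events, _ in sessions]
y = [label for _, label in sessions]
w, b = train_logistic_regression(X, y)

p_cart = predict_proba(
    ["view_home", "view_product", "add_to_cart", "view_cart"], N_FIRST, vocab, w, b
)
p_exit = predict_proba(["view_home", "search", "exit"], N_FIRST, vocab, w, b)
```

In this toy setup, a session that starts by adding an item to the cart gets a higher predicted probability than one that starts with a search and an exit. Changing N_FIRST is the equivalent of choosing how many initial events to expose to the model.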

 
Events used   Distinct events   Accuracy   AUC
First 5       134               99.697%    0.99863
First 10      189               99.776%    0.99943
First 20      255               99.957%    0.99997
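For readers less familiar with the two metrics in the table, the sketch below shows how each can be computed from true outcomes and predicted probabilities. The labels and scores are made up for illustration and are unrelated to our data. Accuracy depends on the cutoff used to turn a probability into a yes/no prediction (0.5 is assumed here), while AUC measures how often a converting session is scored above a non-converting one, regardless of cutoff.

```python
def accuracy(y_true, y_score, threshold=0.5):
    """Share of sessions whose thresholded prediction matches the true outcome."""
    preds = [1 if s >= threshold else 0 for s in y_score]
    return sum(p == t for p, t in zip(preds, y_true)) / len(y_true)


def roc_auc(y_true, y_score):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, t in zip(y_score, y_true) if t == 1]
    neg = [s for s, t in zip(y_score, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes (1 = converted) and predicted conversion probabilities.
y_true = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]

acc = accuracy(y_true, y_score)  # 4 of 6 predictions are correct
auc = roc_auc(y_true, y_score)   # 8 of 9 positive/negative pairs are ranked correctly
```

When most sessions share the same outcome, accuracy can look very high even for a weak model, so it helps to read it alongside AUC, as the table does.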

In conclusion, we learned that we could effectively predict whether a session would convert based on a small number of initial behaviors. This allowed our clients not only to make improvements to their pages, but also to customize these pages in real time to improve potential outcomes. Furthermore, we learned that increasing the number of events used in the model did not lead to notable increases in accuracy. This insight saved us significant cost, because we did not need to spin up new machines to analyze extra events.