## What is A/B Testing

A/B testing, also known as split testing, refers to a randomized experimentation process wherein two or more versions of a variable (web page, advertisement, etc.) are shown to different segments of users or customers at the same time to determine which version leaves the maximum impact and drives business metrics.

This article covers the three types of A/B Testing.

- Classical A/B Testing
- Sequential A/B Testing
- Machine Learning for A/B Testing

## Overview and Objective

SmartAd is a mobile-first advertising agency that designs intuitive, touch-enabled ads. It provides brands with an automated advertising experience via machine learning and creative excellence. The company is built on the principle of voluntary participation, which is proven to increase brand engagement and memorability up to ten times more than static alternatives.

For a client, SmartAd is running an online advertising campaign to raise brand awareness. SmartAd earns money by charging the client based on user engagement with the ads it creates and delivers through various channels. To boost its market competitiveness, SmartAd also offers a service that measures the growth in brand recognition produced by the ads it shows to online users. The main goal of this project is to determine whether the company's campaign significantly increased brand awareness.

To assess the influence of the creative (the advertisement it designs) on upper-funnel KPIs such as memorability and brand sentiment, SmartAd offers an additional service called Brand Impact Optimiser (BIO): a brief questionnaire delivered with every campaign.

The BIO data for this project is the "Yes" or "No" response of online users to the following question:

**Q: Do you know the brand Lux?** O Yes. O No.

## About the data

The data has the following columns:

- **auction_id**: the unique id of the online user who has been presented the BIO. In standard terminology, this is called an impression id. The user may see the BIO questionnaire but choose not to respond; in that case, both the **yes** and **no** columns are zero.
- **experiment**: which group the user belongs to (control or exposed).
- **date**: the date in YYYY-MM-DD format.
- **hour**: the hour of the day in HH format.
- **device_make**: the make of the user's device, e.g. Samsung.
- **platform_os**: the id of the user's operating system.
- **browser**: the name of the browser the user uses to see the BIO questionnaire.
- **yes**: 1 if the user chose the "Yes" radio button in the BIO questionnaire.
- **no**: 1 if the user chose the "No" radio button in the BIO questionnaire.

To determine whether the recent advertising campaign significantly increased brand recognition, we need a solid hypothesis-testing procedure. Using the survey questionnaire, we collected responses from people who viewed either a dummy or a real advertisement. We then apply regression methods to predict how well the creative raises brand recognition for our target variable.

## Methodologies/Tools

**MLFlow**: an open-source platform for **managing the end-to-end machine learning lifecycle**. Its primary component used here is Tracking, which lets you record experiments and compare their parameters and results.

- It is easy to set up a model tracking mechanism in MLflow.
- It offers very intuitive APIs for serving.
- It covers the workflow from data collection and data preparation through model training to taking the model to production.

**Data Version Control (DVC)**: an open-source version control system for data science and machine learning projects.

- Along with data versioning, DVC also allows model and pipeline tracking.
- With DVC, you don't need to rebuild previous models or data modeling techniques to reproduce a past state of results.

**Continuous Machine Learning (CML)**: CI/CD for machine learning projects. It brings a GitFlow-style workflow to data science and auto-generates reports for ML experiments.

## Classical A/B Testing

In our classical A/B testing we performed the following steps:

- Define the baseline conversion rate and minimum detectable effect (MDE).
- Calculate the required sample size from the desired statistical power and significance level, together with the metrics above.
- Drive traffic to the variations until the target sample size is reached for each variation.
- Finally, evaluate the results of the A/B test.

If the difference in performance between variations reaches or exceeds the MDE, the experiment's hypothesis is confirmed; otherwise, the test must be restarted from scratch.
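As a minimal sketch of the sample-size step, assuming a two-proportion z-test and using illustrative baseline and MDE values (not the campaign's actual rates), the calculation looks like this:

```python
import math
from statistics import NormalDist

def sample_size_per_group(baseline, mde, alpha=0.05, power=0.8):
    """Approximate per-group sample size for a two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power
    p1, p2 = baseline, baseline + mde
    variance = p1 * (1 - p1) + p2 * (1 - p2)       # sum of Bernoulli variances
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde ** 2)

# Illustrative inputs: 45% baseline awareness, 2-point minimum detectable effect
n = sample_size_per_group(baseline=0.45, mde=0.02)
print(n)  # users needed in each variation
```

Larger power or a smaller MDE both push the required sample size up quickly, which is why the MDE choice dominates how long a classical test must run.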

**Results Obtained from Classical A/B Testing**

We found that our sample size should be 11,661 to achieve 80% power. Statistical power is the probability that the test correctly rejects the null hypothesis when it is false.

The significance level was higher than the p-value, but the achieved statistical power was too low, which means there is a risk of a Type II error (failing to reject the null hypothesis when it is false).
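The evaluation step itself can be sketched as a pooled two-proportion z-test on the yes-rates of the control and exposed groups; the counts below are hypothetical placeholders, not the study's actual numbers:

```python
import math
from statistics import NormalDist

def two_proportion_ztest(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns the z statistic and two-sided p-value."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                  # pooled success rate under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))    # two-sided tail probability
    return z, p_value

# Hypothetical yes-counts / totals for control vs. exposed
z, p = two_proportion_ztest(308, 572, 349, 657)
print(z, p)
```

If `p` falls below the chosen significance level, the difference in brand-awareness rates between the groups is declared statistically significant.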

**Limitations and Challenges of Classical A/B Testing**

- Can take lots of time and resources to collect the desired sample size.
- Assumes an unchanging worldview and doesn’t take into account changes in trends and consumer behavior and the impact of seasonal events, for example. In reality, the winner may change over time depending on influencing factors such as the ones mentioned above.
- Can not handle multiple variable complex systems.

## Sequential A/B Testing

Sequential A/B testing allows experimenters to analyze data while the test is still running, in order to determine whether an early decision can be made. Sequential sampling works in a non-traditional way: instead of fixing the sample size in advance, you take one observation (or a few) at a time and then test your hypothesis. For our dataset we use the sequential probability ratio test (SPRT), which is based on the likelihood-ratio statistic.

General steps of conditional SPRT:

- Calculate the critical upper and lower decision boundaries.
- Compute the cumulative sum of the observations.
- Calculate the test statistic (likelihood ratio) for each observation.
- Calculate the upper and lower limits for the exposed group.
- Apply the stopping rule.
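The steps above can be sketched as a plain Wald SPRT for Bernoulli outcomes; the hypothesized rates `p0` and `p1` and the error levels here are illustrative assumptions, not the project's fitted values:

```python
import math

def sprt(observations, p0, p1, alpha=0.05, beta=0.2):
    """Wald's sequential probability ratio test for 0/1 observations.

    H0: success rate is p0; H1: success rate is p1 (with p1 > p0).
    Returns (decision, number of observations consumed).
    """
    upper = math.log((1 - beta) / alpha)  # cross above: accept H1
    lower = math.log(beta / (1 - alpha))  # cross below: accept H0
    llr = 0.0                             # cumulative log-likelihood ratio
    for i, x in enumerate(observations, start=1):
        llr += x * math.log(p1 / p0) + (1 - x) * math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "accept H1", i
        if llr <= lower:
            return "accept H0", i
    return "continue sampling", len(observations)

print(sprt([1, 1, 0, 1, 1, 1, 1, 1, 1], p0=0.4, p1=0.6))
```

The key property is the early exit: as soon as the cumulative log-likelihood ratio crosses a boundary, the test stops, which is what lets sequential testing use fewer samples than a fixed-horizon test.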

**Results Obtained from Sequential A/B Testing**

We used the conditional SPRT algorithm to calculate the critical upper and lower limits and computed a cumulative sum of the observations, plotted in the figure below. The resulting statistics clearly show that more samples are needed.

**Limitations and Challenges of Sequential A/B Testing**

Sequential A/B testing is incredibly useful for clustering audiences into segments, grouping audience data along a range of dimensions so that focused A/B tests can be run on more granular audience groupings. For a simple dataset, however, it can overcomplicate the test.

## Significance Testing Using Machine Learning (ML-based A/B Testing)

When using machine learning for A/B testing, it is not just a matter of comparing the exposed and control variants to pick a winner. Instead, the goal is to figure out which parameter (variable) in the data has the highest significance for predicting the outcome.

To conduct this experiment, we assembled three machine learning models of our choosing and evaluated each model's accuracy score and confusion matrix.

**Five-fold cross-validation**

Five-fold cross-validation (CV) is a process in which all the data is randomly split into k folds (in our case k = 5); the model is then trained on k − 1 folds, while the remaining fold is used to test the model.

This procedure is repeated k times. In this work, however, all the data is first split into training and testing datasets, and only the training dataset is used for cross-validation.
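A minimal sketch of this split-then-cross-validate setup, using scikit-learn on synthetic data as a stand-in for the BIO dataset:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the BIO data (binary yes/no target)
X, y = make_classification(n_samples=600, n_features=6, random_state=42)

# Hold out a test set first; cross-validate on the training portion only
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
scores = cross_val_score(LogisticRegression(max_iter=1000), X_train, y_train, cv=5)
print(scores, scores.mean())
```

Each of the five scores is the accuracy on one held-out fold, so their mean is a less noisy estimate of generalization than a single train/test split.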

**1. Logistic Regression**

Logistic Regression is a “Supervised machine learning” algorithm that can be used to model the probability of a certain class or event. It is used when the data is linearly separable and the outcome is binary or dichotomous in nature.
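A sketch of how a metrics dictionary of the shape reported in this section can be produced, again on synthetic data rather than the BIO dataset (`entropy` is taken here to mean log loss, which is an assumption):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, log_loss,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
pred = clf.predict(X_test)
tn, fp, fn, tp = confusion_matrix(y_test, pred).ravel()

metrics = {
    'accuracy': accuracy_score(y_test, pred),
    'precision': precision_score(y_test, pred),
    'recall': recall_score(y_test, pred),
    'entropy': log_loss(y_test, clf.predict_proba(X_test)),  # assumed meaning
    'true_pos': int(tp), 'true_neg': int(tn),
    'false_pos': int(fp), 'false_neg': int(fn),
}
print(metrics)
```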

**Without applying 5 folds**

`{'accuracy': 0.58, 'precision': 0.51, 'recall': 0.36, 'entropy': 0.68, 'true_pos': 160, 'true_neg': 57, 'false_pos': 54, 'false_neg': 102}`

Feature Importance

Confusion Matrix

**Applying 5 folds**

`{'accuracy': 0.5553319919517102, 'validation': 0.5542168674698795}`

Accuracy for the 5 folds

**2. Random Forest Classifier**

A random forest is a meta estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.
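The feature-importance figures in this section come from the fitted forest's `feature_importances_` attribute; a sketch on synthetic data, with hypothetical feature names:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=600, n_features=6, random_state=1)
rf = RandomForestClassifier(n_estimators=100, random_state=1).fit(X, y)

# One impurity-based importance score per feature; the scores sum to 1
feature_names = [f"f{i}" for i in range(X.shape[1])]  # placeholder names
for name, imp in zip(feature_names, rf.feature_importances_):
    print(name, round(imp, 3))
```

Ranking these scores is what lets the ML-based approach point at the variable most predictive of the yes/no outcome.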

**Without applying 5 folds**

`{'accuracy': 0.55, 'precision': 0.47, 'recall': 0.52, 'entropy': 0.96, 'true_pos': 122, 'true_neg': 82, 'false_pos': 92, 'false_neg': 77}`

Feature Importance

Confusion Matrix

**Applying 5 folds**

`{'accuracy': 0.8571428571428571, 'validation': 0.4979919678714859}`

Accuracy for the 5 folds

**3. XGBoost**

XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable.
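Since the `xgboost` package may not be installed everywhere, this sketch uses scikit-learn's `GradientBoostingClassifier`, which implements the same gradient-boosted-tree idea; with xgboost available, `xgboost.XGBClassifier` is a drop-in alternative with a compatible `fit`/`predict` interface:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=6, random_state=2)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=2
)

# Boosted trees: each new tree corrects the residual errors of the ensemble
gbc = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                 random_state=2)
gbc.fit(X_train, y_train)
print(gbc.score(X_test, y_test))  # held-out accuracy
```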

**Without applying 5 folds**

`{'accuracy': 0.54, 'precision': 0.46, 'recall': 0.52, 'entropy': 0.86, 'true_pos': 119, 'true_neg': 82, 'false_pos': 95, 'false_neg': 77}`

Feature Importance

Confusion Matrix

**Applying 5 folds**

`{'accuracy': 0.8249496981891348, 'validation': 0.5060240963855421}`

Accuracy for the 5 folds

## Conclusion

We struggled to draw a definitive conclusion because more data was needed to train the models; even though five-fold cross-validation can be applied here, the sample size remains small.

The small sample is also the reason the accuracy of the logistic regression and XGBoost models is almost identical.

Likewise, the confusion matrices are far from perfect because of the small sample size.

