
Multi-Armed Bandits Applied to Order Allocation among Execution Algorithms

By Lars ter Braak · March 5, 2023


Finding the right balance between exploitation and exploration

Confused robot observing three one-armed slot machines in Picasso style. Source: DALL-E 2.

Making decisions under uncertainty is a common challenge faced by professionals in various fields, including data science and asset management. Asset managers face this problem when selecting among multiple execution algorithms to carry out their trades. The allocation of orders among algorithms resembles the multi-armed bandit problem that gamblers face when deciding which slot machines to play, as they must determine the number of times to play each machine, the order in which to play them, and whether to continue with the current machine or switch to another. In this article, we describe how an asset manager can best distribute orders among available algorithms based on realized execution cost.

Dummy example

For each order, we take an action a that allocates the order to one of K algorithms:

Eq. 1: Set of possible actions to allocate an order to one of K algorithms.

The value of action a is the expected execution cost of the corresponding algorithm:

Eq. 2: (Unobserved) expected execution cost for action a, i.e. choosing a certain algorithm.
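The equations in the original post were embedded as images. A plausible rendering of Eqs. 1 and 2, writing q(a) for the expected execution cost of action a and Cₜ for the realized cost of the order at time t, is:

```latex
a \in \{1, 2, \dots, K\}                                % Eq. 1: possible actions
q(a) \doteq \mathbb{E}\left[\, C_t \mid A_t = a \,\right]  % Eq. 2: expected execution cost of action a
```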

Suppose that K = 3 and the expected execution costs for the algorithms are:

Eq. 3: (Unobserved) expected execution cost for three algorithms.

If you knew the action values a priori, the problem would be trivial to solve: you would always select the algorithm with the lowest expected execution cost. Suppose now that we start allocating orders among the three algorithms as shown in Figure 1.

Figure 1: Example of order allocation among three algorithms and associated execution cost. Source: Author.

We still do not know the action values with certainty, but we do have estimates after some time t:

Eq. 4: (Observed) expected execution cost for action a conditional on the information up until time t.

We can for instance construct the empirical distribution of the execution cost¹ for each algorithm, as shown in Figure 2.

Figure 2: Empirical distribution of execution cost per algorithm after some time t. Source: Author.
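As an illustrative sketch (the data below is made up, not taken from the article), the estimates in Eq. 4 can be maintained as running sample means of the realized execution costs per algorithm:

```python
import numpy as np

# Hypothetical realized execution costs (e.g. in basis points) per algorithm so far.
realized_costs = {
    "algo_A": [1.2, 0.8, 1.1],
    "algo_B": [0.9, 1.4],
    "algo_C": [1.0],
}

# Sample-mean estimate of each action value (Eq. 4) and usage counts N_t(a).
cost_estimates = {a: float(np.mean(c)) for a, c in realized_costs.items()}
counts = {a: len(c) for a, c in realized_costs.items()}
print(cost_estimates, counts)
```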

Allocating all orders to the algorithm with the lowest expected execution cost may appear to be the best approach. However, doing so would prevent us from gathering information on the performance of the other algorithms. This illustrates the classical multi-armed bandit dilemma:

  • Exploit the information that has already been learned
  • Explore to learn which actions give the best outcomes

The objective is to minimize the average execution cost after allocating N orders:

Eq. 5: Objective function for order allocation problem.
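A plausible form of Eq. 5, assuming Cₙ denotes the realized execution cost of the n-th order, is:

```latex
\min_{a_1, \dots, a_N} \; \frac{1}{N} \sum_{n=1}^{N} \mathbb{E}\left[\, C_n \mid A_n = a_n \,\right]   % Eq. 5: average expected execution cost over N orders
```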

Solving the problem using policies

To solve the problem, we need an action selection policy that tells us how to allocate each order based on current information S. We can define a policy as a map from S to a:

Eq. 6: Definition of an action selection policy.
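In the code sketches that follow, a policy is simply a function with the signature below; the names are illustrative, and the state S is summarized by the cost estimates, the usage counts N_t(a), and the order index t:

```python
from typing import Dict

def policy(cost_estimates: Dict[str, float], counts: Dict[str, int], t: int) -> str:
    """Map the current information S to an action a, i.e. the algorithm receiving the next order."""
    raise NotImplementedError
```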

We discuss the best-known policies² for the multi-armed bandit problem, which can be classified into the following categories:

  • Semi-uniform strategies: Greedy & ε-greedy
  • Optimistic strategies: Upper-Confidence-Bound
  • Probability matching strategies: Thompson sampling

Greedy

The greedy approach allocates all orders to the action with the lowest estimated value. This policy always exploits current knowledge to minimize the immediate expected execution cost:

Eq. 7: Action selection policy for greedy approach.
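A minimal sketch of the greedy policy under the interface above:

```python
def greedy(cost_estimates, counts, t):
    # Pure exploitation: always allocate to the algorithm with the lowest estimated cost.
    return min(cost_estimates, key=cost_estimates.get)
```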

ϵ-Greedy

The ε-greedy approach behaves greedily most of the time, but with probability ε it explores by selecting an action uniformly at random:

Eq. 8: Action selection policy for ϵ-greedy approach.

An advantage of this policy is that every action keeps being sampled, so the value estimates converge and, in the limit, the optimal action is selected with probability of at least 1 − ε.
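A sketch of the ε-greedy policy; epsilon is a tuning parameter, and 0.1 is an arbitrary default:

```python
import random

def epsilon_greedy(cost_estimates, counts, t, epsilon=0.1):
    # With probability epsilon, explore by picking an algorithm uniformly at random;
    # otherwise exploit the current lowest-cost estimate.
    if random.random() < epsilon:
        return random.choice(list(cost_estimates))
    return min(cost_estimates, key=cost_estimates.get)
```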

Upper-Confidence-Bound

The Upper-Confidence-Bound (UCB) approach selects the action that minimizes the estimated action value minus an uncertainty term that shrinks as the trading algorithm is used more often, i.e. with Nt(a). The approach thus also gives non-greedy actions a chance, according to their potential for actually being optimal and the uncertainty in their estimates:

Eq. 9: Action selection policy for Upper-Confidence-Bound (UCB) approach.
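A sketch of a cost-minimization variant of UCB, assuming every algorithm has been used at least once so that counts[a] > 0; the exploration coefficient c is a free parameter:

```python
import math

def ucb(cost_estimates, counts, t, c=2.0):
    # Lower confidence bound for costs: estimated cost minus an uncertainty
    # bonus that shrinks as the algorithm is used more often (counts[a] = N_t(a)).
    def lower_bound(a):
        return cost_estimates[a] - c * math.sqrt(math.log(t) / counts[a])
    return min(cost_estimates, key=lower_bound)
```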

Thompson Sampling

The Thompson Sampling approach, as proposed by Thompson (1933), assumes a prior distribution over the action values and updates this distribution after each order allocation³. The approach selects actions according to their posterior probability of being the best action:

Eq. 10: Action selection policy for Thompson sampling approach.
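A sketch of Thompson sampling under a Gaussian model, assuming a flat prior and a known observation noise sigma (the value 0.3 is arbitrary); with these assumptions the posterior over an algorithm's expected cost is normal with the sample mean and variance sigma²/N_t(a):

```python
import numpy as np

def thompson_sampling(cost_estimates, counts, t, sigma=0.3, rng=None):
    # Draw one sample per algorithm from its posterior over the expected cost
    # and allocate the order to the algorithm with the lowest draw.
    rng = rng or np.random.default_rng()
    samples = {
        a: rng.normal(cost_estimates[a], sigma / np.sqrt(counts[a]))
        for a in cost_estimates
    }
    return min(samples, key=samples.get)
```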

Evaluating policies

In practice, policies are commonly evaluated on regret, i.e. the deviation from the optimal solution:

Eq. 11: Definition of regret as a function of a sequence of actions.

where μ* is the minimal expected execution cost, i.e. the cost of always choosing the best algorithm:

Eq. 12: Expected execution cost for choosing the optimal action.

Actions are a direct consequence of the policy, and we can therefore also define regret as a function of the chosen policy:

Eq. 13: Definition of regret as a function of an action selection policy π.
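A sketch of Eq. 11 in code, given the true (unobserved) expected costs and the sequence of chosen algorithms:

```python
def regret(chosen_actions, true_expected_costs):
    # Total regret after N orders: expected cost of the chosen algorithms
    # minus the expected cost of always choosing the best algorithm (mu*).
    mu_star = min(true_expected_costs.values())
    return sum(true_expected_costs[a] - mu_star for a in chosen_actions)
```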

In Figure 3, we simulate the regret for the aforementioned policies in the dummy example. We observe that the Upper-Confidence-Bound approach and Thompson sampling approach perform best.

Figure 3: Simulated regret for different action selection policies for dummy order allocation problem. Source: Author.
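For illustration, a minimal simulation along these lines can reuse the sketched policies and regret function from above; the cost distributions here are invented for the example and do not reproduce the article's figures:

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = {"algo_A": 1.0, "algo_B": 1.2, "algo_C": 1.5}   # hypothetical values
noise = 0.3                                                  # hypothetical cost noise

# Try each algorithm once so that all estimates and counts are defined.
costs = {a: [rng.normal(m, noise)] for a, m in true_means.items()}
chosen = list(true_means)

for t in range(len(chosen) + 1, 1001):
    est = {a: float(np.mean(c)) for a, c in costs.items()}
    cnt = {a: len(c) for a, c in costs.items()}
    a = epsilon_greedy(est, cnt, t)      # swap in greedy / ucb / thompson_sampling to compare
    costs[a].append(rng.normal(true_means[a], noise))
    chosen.append(a)

print(f"Simulated regret after {len(chosen)} orders: {regret(chosen, true_means):.2f}")
```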

Allocating orders? Embrace uncertainty!

The dummy example simulation results strongly indicate that relying solely on a greedy approach may not yield optimal outcomes. It is, therefore, crucial to incorporate and measure the uncertainty in the execution cost estimates when developing an order allocation strategy.

Footnotes

¹ To ensure comparability of the empirical distribution of the execution cost, we need to either allocate similar orders or use order-agnostic cost metrics for evaluation.

² In situations where an algorithm's execution cost depends on the order characteristics, contextual bandits are a more suitable option. For an introduction to this approach, we recommend Section 2.9 of Sutton & Barto (2018).

³ We strongly suggest Russo et al. (2018) as an outstanding resource to learn about Thompson sampling.

Additional resources

The following tutorials and lectures were particularly helpful for my own understanding of multi-armed bandit problems.

Industry

  1. Research Scientist Robert Schapire @ Microsoft
  2. Research Scientist Hado van Hasselt @ DeepMind

Academia

  1. Assistant Professor Christina Lee Yu @ Cornell
  2. Professor Emma Brunskill @ Stanford

References

[1] Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction. MIT press.

[2] Russo, D. J., Van Roy, B., Kazerouni, A., Osband, I., & Wen, Z. (2018). A tutorial on Thompson sampling. Foundations and Trends® in Machine Learning, 11(1), 1–96.

[3] Thompson, W. R. (1933). On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25(3/4), 285–294.

[4] Thompson, W. R. (1935). On the theory of apportionment. American Journal of Mathematics, 57(2), 450–456.

[5] Eckles, D., & Kaptein, M. (2014). Thompson sampling with the online bootstrap. arXiv preprint arXiv:1410.4009.


