You can test the OWL method by simulating data. Because you design the reward function yourself, you know the optimal treatment for every patient, so after training an OWL classifier on the simulated data you can see how closely it recovers the optimal treatment rule.
For example:
I sampled all the features from a U([-1,1]) distribution and assigned each patient one of three treatments {1, 2, 3} uniformly at random.
The response is sampled from a normal distribution N(μ, σ²) (σ = 0.1 in the code below), where μ = (X₁ + X₂)·I(T=1) + (X₁ − X₂)·I(T=2) + (X₂ − X₁)·I(T=3).
# This code block creates the data for the simulation
import numpy as np
n_train = 500 # I purposely chose a small training set to simulate a medical trial
n_col = 50 # This is the number of features
n_test = 1000
X_train = np.random.uniform(low = -1, high = 1, size = (n_train, n_col))
T = np.random.randint(3, size = n_train) # Treatments assigned uniformly at random, coded 0, 1, 2 (for treatments 1, 2, 3)
R_mean = (X_train[:,0]+X_train[:,1])*(T==0) + (X_train[:,0]-X_train[:,1])*(T==1) + (X_train[:,1]-X_train[:,0])*(T==2)
R = np.random.normal(loc = R_mean, scale = .1) # The standard deviation can be tweaked
X_test = np.random.uniform(low = -1 , high = 1, size = (n_test, n_col))
# The optimal classifier can be deduced from the design of R
optimal_classifier = (1-(X_test[:,0] >0)*(X_test[:,1]>0))*((X_test[:,0] > X_test[:,1]) + 2*(X_test[:,1] > X_test[:,0]))
It is not difficult to see that if both X₁ and X₂ are positive, the optimal choice is treatment 1; otherwise treatment 2 is optimal when X₁ > X₂, and treatment 3 when X₂ > X₁.
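If you want to double-check this rule, here is a quick brute-force sanity check (a sketch that assumes the X_test and optimal_classifier arrays created above): it computes the mean reward of each treatment at every test point and takes the argmax, which should agree with the closed-form rule except on boundary ties of probability zero.
# Sanity check: brute-force argmax over the three treatment mean rewards
means = np.stack([X_test[:,0] + X_test[:,1],   # mean reward under treatment 1 (coded 0)
                  X_test[:,0] - X_test[:,1],   # mean reward under treatment 2 (coded 1)
                  X_test[:,1] - X_test[:,0]],  # mean reward under treatment 3 (coded 2)
                 axis = 1)
brute_force_optimal = means.argmax(axis = 1)
print((brute_force_optimal == optimal_classifier).all())  # should print True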
You can also show this as an image. Below, the optimal treatment regions are plotted over the range of X₁ and X₂.
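Here is a minimal sketch of such a plot, coloring the test points by their optimal treatment (it assumes the X_test and optimal_classifier arrays from the simulation code above):
# Plot the optimal treatment regions over X1 and X2
import seaborn as sns
sns.scatterplot(x = X_test[:,0], y = X_test[:,1], hue = optimal_classifier)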
I sampled 500 training points with the 50 features and the reward function described above, fit an OWL classifier with a Gaussian ('rbf') kernel, and visualized the resulting classification against the values of X₁ and X₂.
# Code for the plot
import seaborn as sns
kernel = 'rbf'
gamma = 1/X_train.shape[1]
# gamma is a hyperparameter that has to be found by cross validation but this is a good place to start
D = owl_classifier(X_train, T, R, kernel, gamma)
prediction = D.predict(X_test)
sns.scatterplot(x = X_test[:,0], y = X_test[:,1], hue = prediction) # Color test points by predicted treatment
In case you missed what happened here: the data consisted of 2 features that influence the response and 48 pure-noise features, and the model learned the effect of the two important features without us specifying that relationship in any way!
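Beyond eyeballing the plot, you can quantify how well the classifier recovered the optimal rule. A minimal sketch, assuming the prediction and optimal_classifier arrays from the code above:
# Fraction of test points where the predicted treatment matches the optimal one
agreement = np.mean(prediction == optimal_classifier)
print(f"Agreement with the optimal rule: {agreement:.1%}")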
This is just one simple example; to make it easy to understand and visualize, I made the reward function depend only on X₁ and X₂. However, you are free to construct other examples and try different classifiers.