Counting Achievable Labelings: The Standard Shattering Coefficient
Defining the shattering coefficient verbally seems simple at first glance.
Given a hypothesis class H, its nᵗʰ shattering coefficient, denoted Sₙ(H), represents the maximum number of labelings classifiers in H can achieve on a sample of n feature vectors.
But what is a “labeling”? And what makes one “achievable”? Answering these questions will help us lay the groundwork we need to pursue a more formal definition.
In the context of binary classification, a labeling of a sample of feature vectors is simply any one of the ways in which values from the set { -1, 1 } can be assigned to those vectors. As a very simple example, consider two one-dimensional feature vectors (i.e., points on a number line), x₁ = 1 and x₂ = 2.
A possible labeling is any combination of classification values that could be assigned to the individual feature vectors, independently of one another. We can represent each labeling as a vector, where the first and second coordinates hold the values assigned to x₁ and x₂, respectively. The set of possible labelings is thus { (-1, -1), (-1, 1), (1, -1), (1, 1) }. Note that a sample of size 2 yields 2² = 4 possible labelings; we’ll soon see how this generalizes to samples of arbitrary size.
We say a labeling is achievable by a hypothesis class H if there exists a classifier h ∈ H that can give rise to that labeling. Continuing with our simple example, suppose we are limited to classifiers of the form x ≥ k, k ∈ ℝ, i.e., one-dimensional thresholds that classify everything at or to the right of the threshold as positive. The labeling (1, -1) is not achievable for this hypothesis class: x₂ being greater than x₁ means that any threshold that classifies x₁ as positive must do the same for x₂. The set of achievable labelings is therefore { (-1, -1), (-1, 1), (1, 1) }.
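To make the toy example concrete, here is a minimal Python sketch (the helper names and the particular candidate thresholds are my own illustrative choices, not from the article) that enumerates which labelings one-dimensional threshold classifiers can achieve on x₁ = 1, x₂ = 2:

```python
def threshold_classifier(k):
    # Classify x as positive (1) iff it lies at or to the right of the threshold k.
    return lambda x: 1 if x >= k else -1

def achievable_labelings(sample, thresholds):
    # Collect every labeling that some threshold classifier can produce on the sample.
    labelings = set()
    for k in thresholds:
        h = threshold_classifier(k)
        labelings.add(tuple(h(x) for x in sample))
    return labelings

sample = [1, 2]
# Three candidate thresholds suffice to realize every achievable labeling here.
print(sorted(achievable_labelings(sample, thresholds=[0.5, 1.5, 2.5])))
# -> [(-1, -1), (-1, 1), (1, 1)]   ((1, -1) is missing, as argued above)
```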
With the basic terminology under our belt, we can begin to develop the notation we need to formally express the elements of the verbal definition we started with.
We stick to representing labelings as vectors, as we did in our simple example, with each coordinate holding the classification value assigned to the corresponding feature vector. There are 2ⁿ possible labelings in total: for each feature vector there are two possible choices, and a labeling can be thought of as a set of n such choices, each made independently of the rest. If the classifiers in a hypothesis class H can achieve all possible labelings of a sample 𝒞ₙ, in other words, if the number of achievable labelings of 𝒞ₙ equals 2ⁿ, we say that H shatters 𝒞ₙ.
Finally, we use the notation above to converge on the following, more rigorous definition of Sₙ(H):
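In standard notation (my own rendering, consistent with the verbal definition above), the nᵗʰ shattering coefficient is the largest number of distinct labelings the classifiers in H can produce over any choice of n feature vectors:

Sₙ(H) = max over x₁, …, xₙ of |{ (h(x₁), …, h(xₙ)) : h ∈ H }|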
As per our explanation of shattering, Sₙ(H) = 2ⁿ means that there exists a sample of size n that is shattered by H.
Estimating Hypothesis Class Expressiveness: The Standard VC Dimension
The Vapnik–Chervonenkis (VC) dimension is a way to measure the expressiveness of a hypothesis class. It builds on the concept of shattering we just defined, and it plays an important role in determining which hypothesis classes are PAC learnable and which are not.
Let’s start with an intuitive verbal definition of the standard VC dimension.
Given a hypothesis class H, its VC dimension, denoted VCdim(H), is defined as the greatest natural number n for which there exists a sample of size n that is shattered by H.
Using Sₙ(H) lets us express this much more cleanly and concisely:
VCdim(H) = max{ n ∈ ℕ : Sₙ(H) = 2ⁿ }
However, this definition is not quite precise. Note that the set of numbers n for which the shattering coefficient equals 2ⁿ may be infinite (in which case VCdim(H) = ∞), and an infinite set of natural numbers has no well-defined maximum. We solve this problem by taking the supremum instead:
VCdim(H) = sup{ n ∈ ℕ : Sₙ(H) = 2ⁿ }
This rigorous and concise definition is the one we will use going forward.
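As an illustration, and referring back to the threshold example, here is a brute-force sketch (the candidate grid and helper names are assumptions of mine, not the article’s) that estimates Sₙ(H) for small n and lets us read off VCdim(H) = 1 for one-dimensional thresholds:

```python
from itertools import product

def labelings_of(sample, thresholds):
    # Labelings achievable on the sample by classifiers of the form x >= k.
    return {tuple(1 if x >= k else -1 for x in sample) for k in thresholds}

def shattering_coefficient(n, grid):
    # Brute-force S_n(H): maximize the number of achievable labelings over samples.
    thresholds = sorted(set(grid)) + [max(grid) + 1]  # enough thresholds for any sample drawn from the grid
    return max(len(labelings_of(sample, thresholds)) for sample in product(grid, repeat=n))

grid = [0, 1, 2, 3]
for n in (1, 2):
    print(n, shattering_coefficient(n, grid))
# -> 1 2   (S_1 = 2^1: a single point is shattered)
# -> 2 3   (S_2 < 2^2: no sample of size 2 is shattered, hence VCdim = 1)
```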
Adding Preferences to the Mix: The Strategic Shattering Coefficient
Generalizing the standard concepts we just covered so that they work in a strategic setting is quite simple. Redefining the shattering coefficient in terms of the data point best response we defined in the previous article is really all we need to do.
Given a hypothesis class H, a preference set R, and a cost function c, the nᵗʰ shattering coefficient of the strategic classification instance ⟨H, R, c⟩, denoted σₙ(H, R, c), represents the maximum number of labelings classifiers in H can achieve on a set of n potentially manipulated feature vectors, that is, n data point best responses.
As a reminder, here’s how we defined the data point best response:
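(The precise notation below is my reconstruction from the reward and cost terms discussed in this section; the previous article may use slightly different symbols.) A data point with feature vector x and preference r responds to a classifier h by choosing the manipulated feature vector that maximizes its realized reward minus the cost of manipulation:

Δ(x, r; h) = argmax over z of [ 𝕀(h(z) = 1) ⋅ r − c(z; x) ]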
To make this more formal, we can adapt the notation we used in our discussion of the standard shattering coefficient.
The biggest difference is that each x in the sample now comes paired with its own preference r ∈ R. Other than that, substituting the data point best response for the plain x of the standard case works smoothly.
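Putting the pieces together (again, with the exact notation being my rendering rather than the article’s), the strategic shattering coefficient maximizes over preference-carrying data points and counts the labelings induced on their best responses:

σₙ(H, R, c) = max over (x₁, r₁), …, (xₙ, rₙ), rᵢ ∈ R of |{ (h(Δ(x₁, r₁; h)), …, h(Δ(xₙ, rₙ; h))) : h ∈ H }|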
As a quick sanity check, let’s consider what happens when R = { 0 }. The realized reward term 𝕀(h(z) = 1) ⋅ r will be 0 for every data point, so maximizing utility becomes synonymous with minimizing cost. The best way for a data point to minimize the cost it incurs is simple: not manipulating its feature vector at all.
The data point best response Δ(x, r; h) therefore always ends up being just x itself, placing us firmly back within the realm of standard classification. It follows that σₙ(H, { 0 }, c) = Sₙ(H) for any H and c. This is consistent with our observation that the impartial preference class represented by R = { 0 } is equivalent to standard binary classification.
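A tiny numerical sketch of this sanity check (the quadratic cost, the candidate grid, and the reward value 5 are illustrative choices of mine, not taken from the article):

```python
def best_response(x, r, h, candidates, cost):
    # The data point picks the z that maximizes realized reward minus manipulation cost.
    return max(candidates, key=lambda z: (r if h(z) == 1 else 0) - cost(x, z))

h = lambda z: 1 if z >= 2 else -1             # threshold classifier: positive iff z >= 2
cost = lambda x, z: (z - x) ** 2              # illustrative quadratic manipulation cost
candidates = [i / 10 for i in range(41)]      # coarse grid of feature values in [0, 4]

print(best_response(1.0, 0, h, candidates, cost))  # r = 0: no incentive, stays at 1.0
print(best_response(1.0, 5, h, candidates, cost))  # r = 5: moves to 2.0 to be classified as positive
```

With r = 0, the best response coincides with the original feature vector, which is exactly what makes σₙ(H, { 0 }, c) collapse to Sₙ(H).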
Measuring Expressiveness with Preferences: The Strategic VC Dimension (SVC)
Having defined the nᵗʰ strategic shattering coefficient, we can simply substitute σₙ(H, R, c) for Sₙ(H) in the formal definition of the VC dimension:
SVC(H, R, c) = sup{ n ∈ ℕ : σₙ(H, R, c) = 2ⁿ }
Based on the example we considered above, SVC(H, { 0 }, c) = VCdim(H) for any H and c. Indeed, SVC is to VCdim as the strategic shattering coefficient is to its standard counterpart: both are elegant generalizations of non-strategic concepts.
From SVC to Strategic PAC Learnability: The Fundamental Theorem of Strategic Learning
We can now use SVC to state the fundamental theorem of strategic learning, which ties the complexity of a strategic classification problem to its (agnostic) PAC learnability.
A strategic classification instance ⟨H, R, c⟩ is agnostic PAC learnable if SVC(H, R, c) is finite, with sample complexity m(δ, ε) ≤ Cε⁻² ⋅ (SVC(H, R, c) + log(1/δ)), where C is a constant.
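To get a feel for what the bound says, here is a quick back-of-the-envelope calculation with purely illustrative numbers: for SVC(H, R, c) = 3, confidence parameter δ = 0.05, and accuracy parameter ε = 0.1, the bound gives m(δ, ε) ≤ C ⋅ 0.1⁻² ⋅ (3 + log(1/0.05)) ≈ C ⋅ 100 ⋅ 6 = 600C samples. The required sample size grows linearly in SVC and only logarithmically as we demand higher confidence.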
I won’t go into too much detail about how this can be proven. Suffice it to say that it boils down to a clever reduction to the (well-documented) Fundamental Theorem of Statistical Learning, which is essentially the non-strategic version of this result. If you’re mathematically inclined and interested in the nuts and bolts of the proof, you can find it in Appendix B of the paper.
This theorem essentially completes our generalization of classical PAC learning to a strategic classification setting. It shows that the way we defined SVC doesn’t just make sense in our heads; it actually works as a generalization of VCdim where it matters most. Armed with the fundamental theorem of strategic learning, we are as well-equipped to analyze strategic classification problems as we are classic binary classification ones. In my opinion, being able to determine whether a strategic problem is theoretically learnable is pretty amazing!