Every data scientist should have SVM in their toolbox. Learn how to master this versatile model with a hands-on introduction.
SVM is a powerful and versatile algorithm which, at its core, finds optimal hyperplanes in a high-dimensional space, effectively separating the different classes of a dataset. But it doesn’t stop there! Its effectiveness is not limited to classification: SVM is also well suited to regression and outlier detection.
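As a quick taste of that versatility, here is a minimal sketch (assuming scikit-learn is installed) of the three tasks mentioned above, using a toy dataset; the estimator names `SVC`, `SVR`, and `OneClassSVM` are scikit-learn’s, everything else is illustrative:

```python
# Sketch: the three SVM flavours — classification, regression,
# and unsupervised outlier detection — on a toy 2D dataset.
from sklearn.svm import SVC, SVR, OneClassSVM
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=100, centers=2, random_state=42)

clf = SVC(kernel="linear").fit(X, y)       # classification
reg = SVR().fit(X, y.astype(float))        # regression (same features, numeric target)
det = OneClassSVM(nu=0.1).fit(X)           # outlier detection (no labels needed)

print(clf.predict(X[:3]))                  # predicted class labels
print(det.predict(X[:3]))                  # +1 = inlier, -1 = outlier
```

Each estimator shares the same `fit`/`predict` interface, which is why switching between the three tasks is mostly a one-line change.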
One feature makes the SVM approach particularly effective. Instead of processing the entire dataset, as KNN does, SVM focuses only on the subset of data points located near the decision boundary. These points are called support vectors, and the mathematics behind this idea will be explained in simple terms in the upcoming sections.
By doing so, Support Vector Machines stay computationally frugal, making them ideal for tasks involving medium or even medium-large datasets.
At its core, SVM classification has the elegant simplicity of linear algebra. Imagine a dataset in two-dimensional space with two distinct classes to be separated. A linear SVM tries to separate the two classes with the best possible straight line.
What does “best” mean in this context? SVM searches for the optimal separating line: one that not only separates the classes, but does so at the maximum possible distance from the closest training instances of each class. That distance is called the margin. The data points that lie on the margin’s edge are the support vectors.
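The margin and the support vectors can be inspected directly. The sketch below (again assuming scikit-learn; the toy blobs are illustrative) fits a linear SVM on a 2D dataset and reads the separating line’s normal vector, from which the margin width follows as 2/‖w‖:

```python
# Sketch: fit a linear SVM on two 2D blobs and inspect the
# margin-defining support vectors and the margin width.
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=60, centers=2, random_state=0)

svm = SVC(kernel="linear", C=1.0).fit(X, y)

w = svm.coef_[0]                     # normal vector of the separating line
b = svm.intercept_[0]
margin = 2 / np.linalg.norm(w)       # total width of the margin band

print("support vectors:", len(svm.support_vectors_))
print("margin width:", margin)
```

Note that only `svm.support_vectors_` (a small subset of the 60 points) determines the line; moving any other training point slightly would leave the fitted boundary unchanged.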