This post builds on this article from Towards Data Science: https://towardsdatascience.com/explaining-feature-importance-by-example-of-a-random-forest-d9166011959e. Here I talk about the drop-column method, which measures a feature's importance by retraining the classifier without that feature and comparing the score against a benchmark trained on all features. For some reason, the article above computes the benchmark score on the same dataset the model was trained on, and it does not use cross-validation, so there is no estimate of the margin of error.
I have adapted the code from the article to compute all the scores with cross-validation and then draw a boxplot that shows how much each score varies across the folds:
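Here is a minimal sketch of that approach. The wine dataset, the `RandomForestClassifier` settings, the 5-fold split, and the helper names `drop_column_cv_scores` and `plot_importances` are assumptions chosen for illustration; they are not taken from the original article.

```python
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score


def drop_column_cv_scores(model, X, y, cv):
    """Per-fold drop-column importance: benchmark score minus score without the column."""
    # Benchmark: cross-validated accuracy with all features.
    baseline = cross_val_score(model, X, y, cv=cv)
    importances = {}
    for col in X.columns:
        # Retrain and score on the same folds with one feature removed.
        dropped = cross_val_score(model, X.drop(columns=col), y, cv=cv)
        # Positive values mean the score got worse without the feature.
        importances[col] = baseline - dropped
    # One row per fold, one column per feature.
    return pd.DataFrame(importances)


def plot_importances(importances):
    """Boxplot of the per-fold importances, sorted by median."""
    import matplotlib.pyplot as plt

    order = importances.median().sort_values().index
    fig, ax = plt.subplots(figsize=(10, 5))
    ax.boxplot(importances[order].to_numpy())
    ax.set_xticks(range(1, len(order) + 1))
    ax.set_xticklabels(order, rotation=90)
    ax.axhline(0, color="grey", linestyle="--")
    ax.set_ylabel("Drop in accuracy when the feature is removed")
    fig.tight_layout()
    plt.show()


# Illustrative data and model (assumptions, not the article's setup).
X, y = load_wine(return_X_y=True, as_frame=True)
model = RandomForestClassifier(n_estimators=50, random_state=0)
# A fixed random_state gives identical folds on every call,
# so the benchmark and drop-column scores are compared fold by fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

importances = drop_column_cv_scores(model, X, y, cv)
print(importances.describe().T[["mean", "std"]])
```

Calling `plot_importances(importances)` draws the boxplot. Boxes that straddle zero suggest the feature's importance is within the noise of the cross-validation folds.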
In the end you should see a graph similar to the following: