5 useful Python packages from Kaggle’s kernels you didn’t know existed (Part 2)

By Piotr Gabrys

Kaggle’s kernels are great: they give everybody the computing power needed to take part in Data Science competitions. Do you use them? If so, are you sure you are using their full potential?

In this Part 2 article, I will show you 5 interesting but little-known Python packages that are available in Kaggle’s kernels (as of Sep 11th, 2018).

You may also be interested in my first article about Python packages, available here.

pyahocorasick: a Python module (C extension and plain Python) implementing the Aho-Corasick algorithm, by Wojciech Muła.

Category: Natural Language Processing

Why bother? Fast multi-pattern string search.

Cool stuff example: I searched for all occurrences of US city names in the scikit-learn 20newsgroups articles:

The Aho-Corasick algorithm performed the search over 500x faster than the naive implementation.

NOTE: The automaton trie has to be built first, which takes some time for a long list of patterns. The good news is that it can be pickled and saved for future use.
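A minimal sketch of how such a multi-pattern search might look, next to a naive baseline. The city list, the sample sentence and both function names are made up for illustration; they are not the code behind the benchmark above. The `ahocorasick` import sits inside the function so that the naive baseline runs even where the package is not installed.

```python
def find_naive(text, patterns):
    """Baseline: scan the whole text once per pattern with str.find."""
    hits = []
    for pattern in patterns:
        start = text.find(pattern)
        while start != -1:
            hits.append((start, pattern))
            start = text.find(pattern, start + 1)
    return sorted(hits)


def find_with_automaton(text, patterns):
    """Scan the text once, matching all patterns at the same time."""
    import ahocorasick  # provided by the pyahocorasick package

    automaton = ahocorasick.Automaton()
    for pattern in patterns:
        # add_word(key, value): the value is handed back on every match
        automaton.add_word(pattern, pattern)
    automaton.make_automaton()  # has to be called before iter()

    hits = []
    for end_index, pattern in automaton.iter(text):
        # iter() reports the index of the LAST character of each match
        hits.append((end_index - len(pattern) + 1, pattern))
    return sorted(hits)


cities = ["Boston", "New York", "York"]  # hypothetical toy list
text = "From Boston to New York by train."
print(find_naive(text, cities))
```

Both functions return `(start_index, pattern)` pairs, so they can be compared directly. Overlapping matches such as "New York" and "York" are reported separately.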

repo: https://github.com/WojciechMula/pyahocorasick

docs: https://pyahocorasick.readthedocs.io/en/latest/

funcy: a collection of fancy functional tools focused on practicality, by Suor.

Category: Standard Library Enhancements

Why bother? Big collection of useful functions which save time and add clarity to your code.

Cool stuff example: Merging dictionaries in one line of code:

And a pairs generator:

There are plenty of functions like the ones above. I strongly recommend checking the docs.
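The two helpers mentioned above could be used roughly like this. This is a sketch that assumes the pairs generator is `funcy.pairwise`; the dictionaries, the step names and the `demo` wrapper are hypothetical.

```python
def demo():
    from funcy import merge, pairwise  # imported here so the module loads without funcy

    defaults = {"lr": 0.1, "epochs": 10}
    overrides = {"epochs": 20}
    # merge() combines the dicts; on duplicate keys the later dict wins
    settings = merge(defaults, overrides)

    steps = ["load", "clean", "train"]
    # pairwise() yields each element together with its successor
    transitions = list(pairwise(steps))
    return settings, transitions
```

Calling `demo()` returns the merged settings and the list of adjacent step pairs.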

repo: https://github.com/Suor/funcy

docs: https://funcy.readthedocs.io/en/stable/

fancyimpute: a variety of matrix completion and imputation algorithms implemented in Python, by iskandr.

Category: Data Preparation

Why bother? Easy access to 8 imputation algorithms.

Cool stuff example: Imputation with MICE (Multiple Imputation by Chained Equations):
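A hedged sketch of chained-equations imputation. As far as I know, later fancyimpute releases no longer ship a separate MICE class and point to scikit-learn's `IterativeImputer` instead, so the sketch calls scikit-learn directly. That is a swap made for this example, not the original code; the function name `impute_chained` and its parameters are made up.

```python
def impute_chained(rows, max_iter=10, seed=0):
    """Fill NaNs by modelling each column from the others, round-robin."""
    import numpy as np
    # IterativeImputer is experimental and must be enabled explicitly
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401
    from sklearn.impute import IterativeImputer

    X = np.asarray(rows, dtype=float)
    imputer = IterativeImputer(max_iter=max_iter, random_state=seed)
    # observed values are kept; only the missing cells are replaced
    return imputer.fit_transform(X)
```

For example, `impute_chained([[1.0, 2.0], [2.0, float("nan")], [3.0, 6.0]])` returns an array of the same shape with the gap filled in.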

repo: https://github.com/iskandr/fancyimpute

missingno: a missing data visualization module for Python, by ResidentMario.

Category: Data Exploration

Why bother? Fast missing values visualization.

Cool stuff example: Quick data completeness overview:
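A small sketch of such an overview, assuming a pandas DataFrame as input. The two helper names are hypothetical; `msno.matrix` is the missingno call that draws the nullity matrix.

```python
def completeness_share(df):
    """Numeric overview: share of non-missing values per column."""
    return df.notna().mean()


def plot_completeness(df):
    """Visual overview: one line per row, gaps mark missing values."""
    import missingno as msno  # imported here so the numeric helper works without it

    return msno.matrix(df)  # returns the matplotlib axes it drew on
```

In a kernel you would call `plot_completeness(df)` on your own DataFrame; `msno.bar(df)` gives a per-column bar chart of the same information.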

Everything should be made as simple as possible, but not simpler. Albert Einstein

repo: https://github.com/ResidentMario/missingno

Have you already used the packages? Are you planning to? What packages should be covered in Part 3? Let me know in the comments or just clap if you liked the article!



Kaggle’s kernels are great: they give everybody the computing power needed to take part in Data Science competitions. Do you use them? If so, are you sure you are using their full potential?

In this article, I will show you 5 interesting but little-known Python packages that are available in Kaggle’s kernels (as of Aug 21st, 2018). Let’s start the show!

langdetect: a port of Google’s language-detection library to Python, by Mimino666.

Category: Natural Language Processing

Why bother? Simple and fast detection of a text’s language.

Cool stuff example: Language detection with probability:
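A minimal sketch of detection with probabilities. The wrapper function and its `seed` parameter are made up for illustration; `detect_langs` and `DetectorFactory.seed` come from langdetect itself.

```python
def detect_with_probabilities(text, seed=0):
    """Return (language code, probability) pairs, most likely first."""
    from langdetect import DetectorFactory, detect_langs

    # detection is non-deterministic by default; a fixed seed makes it repeatable
    DetectorFactory.seed = seed
    return [(candidate.lang, candidate.prob) for candidate in detect_langs(text)]
```

For short or mixed-language texts the list may contain several candidates, so it is worth looking at the probabilities and not only at the top language.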

repo: https://github.com/Mimino666/langdetect

Bottleneck is a collection of fast NumPy array functions written in C, by kwgoodman.

Category: Data Manipulation

Why bother? You can simply use Bottleneck functions instead of NumPy to obtain faster computation.

Cool stuff example: I cannot find anything cooler than what the author did. You just run the benchmark that ships with the package, `bn.bench()`, and get all the speed comparisons against NumPy.
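A short sketch of using a Bottleneck function in place of its NumPy counterpart. The helper name and the sample values are hypothetical; `bn.nanmean` and `bn.bench` are part of Bottleneck.

```python
def compare_nanmean(values):
    """Compute the NaN-aware mean with NumPy and with Bottleneck."""
    import numpy as np
    import bottleneck as bn

    arr = np.asarray(values, dtype=float)
    # same call shape, same result; Bottleneck aims to be faster
    return np.nanmean(arr), bn.nanmean(arr)

# The package's own benchmark prints a table of speed ratios vs NumPy:
#   import bottleneck as bn
#   bn.bench()
```

The speed-up depends on array size, dtype and axis, so it is worth running the benchmark on the machine you actually use.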

repo: https://github.com/kwgoodman/bottleneck

docs: https://kwgoodman.github.io/bottleneck-doc/

ELI5 is a Python package by TeamHG-Memex that helps to debug machine learning classifiers and explain their predictions.

Category: Visualization

Why bother? The package helps you to visualize the way your model works. It may help you debug your models more efficiently.

Cool stuff example: Visualization of a programming language classifier inference on one code sample.
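A rough sketch of how a text classifier's prediction might be explained with ELI5. The choice of a TF-IDF vectorizer with logistic regression and the function name are assumptions for illustration, not the classifier from the original example.

```python
def explain_text_prediction(train_texts, train_labels, sample):
    """Train a tiny text classifier and explain one prediction as text."""
    import eli5
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    vec = TfidfVectorizer()
    clf = LogisticRegression()
    clf.fit(vec.fit_transform(train_texts), train_labels)

    # passing the vectorizer lets ELI5 map weights back to tokens
    explanation = eli5.explain_prediction(clf, sample, vec=vec)
    return eli5.format_as_text(explanation)
```

Inside a notebook, `eli5.show_prediction(clf, sample, vec=vec)` renders the same explanation as highlighted HTML.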

repo: https://github.com/TeamHG-Memex/eli5

docs: https://eli5.readthedocs.io/en/latest/

This is the end of the list. Do you know any other useful packages that barely anybody knows about? Let me know in the comment section!

EDIT: There is Part 2 of the article here.