Although it takes many forms, text wrangling is essentially the pre-processing work done to get raw text data ready for training. Simply put, it’s the process of cleaning your data so that your program can read it, and then formatting it accordingly.
Many of you may already be wrangling text without realizing it. In this tutorial, I will teach you how to clean up your text in Python. I will show you how to perform the most common forms of text wrangling: sentence splitting, tokenization, stemming, lemmatization, and stop word removal.
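To give a feel for what these steps look like, here is a minimal sketch that uses only Python's standard library. Everything in it is an assumption made for illustration: the function names (`split_sentences`, `tokenize`, `remove_stop_words`, `crude_stem`, `wrangle`), the tiny stop word list, and the suffix-stripping rule are hypothetical stand-ins, not the tools a real pipeline would use. Lemmatization is left out because it needs a dictionary of word forms.

```python
import re

# Tiny illustrative stop word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "is", "are", "it", "to", "of", "and", "in"}


def split_sentences(text):
    """Split on whitespace that follows '.', '!' or '?' (naive: breaks on abbreviations)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Lowercase the sentence and keep runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def crude_stem(token):
    """Strip one common suffix; a rough stand-in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "s"):
        # Only strip if at least three characters remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def wrangle(text):
    """Run every step and return one list of stems per sentence."""
    return [
        [crude_stem(t) for t in remove_stop_words(tokenize(s))]
        for s in split_sentences(text)
    ]


if __name__ == "__main__":
    print(wrangle("The cats are sleeping. It rained in the morning!"))
```

Note that the crude stemmer turns "morning" into "morn", which is exactly the kind of over-stemming that makes a proper stemmer or lemmatizer worth using.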