Master Natural Language Processing with Python. Learn text preprocessing, sentiment analysis, and named entity recognition (NER), and work with modern transformer models.
Natural Language Processing (NLP) is a branch of AI that helps computers understand, interpret, and manipulate human language. It combines computational linguistics with machine learning and deep learning.
# Common NLP Tasks:
- Text Classification (Sentiment Analysis)
- Named Entity Recognition (NER)
- Machine Translation
- Question Answering
- Text Summarization
- Chatbots and Conversational AI
Key applications of NLP
Install essential NLP libraries including NLTK, spaCy, and transformers for working with text data.
# Install NLP libraries
pip install nltk spacy transformers
# Install spaCy language model
python -m spacy download en_core_web_sm
# Install additional tools
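pip install textblob wordcloud
# Needed by later examples: scikit-learn (text classification), a deep learning
# backend for transformers pipelines, and the medium spaCy model (word vectors)
pip install scikit-learn torch
python -m spacy download en_core_web_md
Install essential NLP libraries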
import nltk
import spacy
from transformers import pipeline
# Download NLTK data
nltk.download('punkt')
nltk.download('punkt_tab')  # tokenizer data that newer NLTK releases look for
nltk.download('stopwords')
nltk.download('wordnet')
# Load spaCy model
nlp = spacy.load('en_core_web_sm')
Import and initialize NLP libraries
Text preprocessing is crucial for NLP tasks. It involves cleaning and transforming raw text into a format suitable for analysis.
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
text = "Natural Language Processing is amazing! It's transforming AI."
# Tokenization
tokens = word_tokenize(text.lower())
print(f"Tokens: {tokens}")
# Remove stopwords
stop_words = set(stopwords.words('english'))
filtered_tokens = [w for w in tokens if w.isalnum() and w not in stop_words]
print(f"Filtered: {filtered_tokens}")
# Lemmatization
lemmatizer = WordNetLemmatizer()
lemmatized = [lemmatizer.lemmatize(w) for w in filtered_tokens]
print(f"Lemmatized: {lemmatized}")
Complete text preprocessing pipeline
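When downloading NLTK data is not an option, the same three ideas (lowercasing, tokenizing, removing stopwords) can be roughly approximated with the standard library alone. The sketch below is only an illustration: `TOY_STOPWORDS` and `simple_preprocess` are hypothetical names introduced here, the stopword list is deliberately tiny, and the regex tokenizer is much cruder than `word_tokenize`. It also skips lemmatization.

```python
import re

# Hypothetical, deliberately tiny stopword list for illustration only;
# NLTK's English list is much longer.
TOY_STOPWORDS = {"a", "an", "the", "is", "it", "s", "and", "of", "to", "in"}


def simple_preprocess(text, stopwords=TOY_STOPWORDS):
    """Lowercase, split into alphanumeric tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [tok for tok in tokens if tok not in stopwords]


sample = "Natural Language Processing is amazing! It's transforming AI."
print(simple_preprocess(sample))
```

Splitting on non-alphanumeric characters turns "It's" into "it" and "s", which is why "s" appears in the toy stopword list.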
Sentiment analysis determines the emotional tone of text. It's widely used for analyzing customer reviews, social media, and feedback.
from textblob import TextBlob
# Analyze sentiment
text = "I love this product! It's absolutely fantastic."
blob = TextBlob(text)
# Get polarity (-1 to 1) and subjectivity (0 to 1)
sentiment = blob.sentiment
print(f"Polarity: {sentiment.polarity}")
print(f"Subjectivity: {sentiment.subjectivity}")
if sentiment.polarity > 0:
    print("Positive sentiment")
elif sentiment.polarity < 0:
    print("Negative sentiment")
else:
    print("Neutral sentiment")
Simple sentiment analysis with TextBlob
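To give a feel for where a polarity score can come from, here is a minimal sketch of lexicon-based scoring. TextBlob's default analyzer is also lexicon-based, but its real lexicon and its handling of negation and intensifiers are far more elaborate than this. The lexicon values, the function names, and the `neutral_band` threshold are all assumptions made for illustration.

```python
import re

# Hypothetical toy lexicon: word -> polarity in [-1, 1]. Values are made up.
TOY_LEXICON = {"love": 0.5, "fantastic": 0.4, "hate": -0.8, "terrible": -1.0}


def toy_polarity(text, lexicon=TOY_LEXICON):
    """Average the polarity of the words found in the lexicon (0.0 if none)."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [lexicon[w] for w in words if w in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


def label(polarity, neutral_band=0.05):
    """Map a polarity score to a label, treating values near zero as neutral."""
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"


text = "I love this product! It's absolutely fantastic."
score = toy_polarity(text)
print(f"Polarity: {score:.2f} -> {label(score)}")
```

A small neutral band avoids calling a text positive or negative on the strength of a score that is barely different from zero.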
from transformers import pipeline
# Use pre-trained transformer model
sentiment_analyzer = pipeline('sentiment-analysis')
texts = [
    "This movie was absolutely wonderful!",
    "I hated the service at this restaurant.",
    "The product is okay, nothing special."
]
results = sentiment_analyzer(texts)
for text, result in zip(texts, results):
    print(f"Text: {text}")
    print(f"Sentiment: {result['label']}, Score: {result['score']:.4f}\n")
Advanced sentiment analysis with transformers
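A classifier that only knows a fixed set of labels has to assign one of them even to lukewarm text such as the third sentence. One possible way to handle this, sketched below, is to treat low-confidence predictions as uncertain. The helper name, the `UNCERTAIN` label, and the 0.9 threshold are assumptions, and the scores are made up so the example runs without downloading a model; real output will differ.

```python
def with_uncertain_label(results, min_score=0.9):
    """Relabel low-confidence predictions as 'UNCERTAIN'.

    `results` is a list of dicts with 'label' and 'score' keys, the same
    shape the sentiment pipeline returns.
    """
    relabeled = []
    for r in results:
        label = r["label"] if r["score"] >= min_score else "UNCERTAIN"
        relabeled.append({"label": label, "score": r["score"]})
    return relabeled


# Made-up scores for illustration; real model output will differ.
fake_results = [
    {"label": "POSITIVE", "score": 0.99},
    {"label": "NEGATIVE", "score": 0.98},
    {"label": "NEGATIVE", "score": 0.62},
]
for r in with_uncertain_label(fake_results):
    print(r["label"], f"{r['score']:.2f}")
```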
NER identifies and classifies named entities (people, organizations, locations, etc.) in text.
import spacy
# Load spaCy model
nlp = spacy.load('en_core_web_sm')
text = """
Apple Inc. was founded by Steve Jobs in Cupertino, California.
The company released the iPhone in 2007.
"""
# Process text
doc = nlp(text)
# Extract named entities
print("Named Entities:")
for ent in doc.ents:
    print(f"{ent.text:20} {ent.label_:15} {spacy.explain(ent.label_)}")
Extract named entities using spaCy
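Downstream code often wants entities grouped by type rather than printed one by one. The sketch below works on plain `(text, label)` pairs, the shape produced by `[(ent.text, ent.label_) for ent in doc.ents]`. The pairs listed are illustrative, not actual model output, and `group_by_label` is a hypothetical helper.

```python
from collections import defaultdict


def group_by_label(entities):
    """Group (text, label) pairs into {label: [texts]}, preserving order."""
    grouped = defaultdict(list)
    for text, label in entities:
        if text not in grouped[label]:  # skip repeated mentions
            grouped[label].append(text)
    return dict(grouped)


# Illustrative pairs; the labels a real model assigns may differ.
entities = [
    ("Apple Inc.", "ORG"),
    ("Steve Jobs", "PERSON"),
    ("Cupertino", "GPE"),
    ("California", "GPE"),
    ("2007", "DATE"),
]
print(group_by_label(entities))
```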
Text classification assigns predefined categories to text documents. Common applications include spam detection and topic categorization.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
# Sample data
texts = [
    "Free money now! Click here!",
    "Meeting scheduled for tomorrow",
    "Win a free iPhone today!",
    "Project deadline is next week",
    "Congratulations! You won the lottery!"
]
labels = ['spam', 'ham', 'spam', 'ham', 'spam']
# Vectorize text
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)
# Train classifier
clf = MultinomialNB()
clf.fit(X, labels)
# Predict
new_text = ["Important meeting reminder"]
X_new = vectorizer.transform(new_text)
prediction = clf.predict(X_new)
print(f"Prediction: {prediction[0]}")
Text classification with TF-IDF and Naive Bayes
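To make the TF-IDF step less of a black box, here is a pure-Python sketch of the weighting. It uses the smoothed IDF formula, ln((1 + n) / (1 + df)) + 1, followed by L2 normalization, which matches scikit-learn's documented defaults. Tokenization is simplified to whitespace splitting, so the numbers will not match `TfidfVectorizer` on text containing punctuation or one-letter words. The function name `tfidf` is introduced here for illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document (smoothed IDF, L2-normalized)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights = {
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        }
        # Fall back to 1.0 so an empty document does not divide by zero.
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({term: w / norm for term, w in weights.items()})
    return vectors


docs = ["free money now", "meeting tomorrow", "free iphone today"]
for vec in tfidf(docs):
    print({term: round(w, 3) for term, w in vec.items()})
```

Because "free" occurs in two of the three documents, it receives a lower weight than words that occur in only one.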
Word embeddings represent words as dense vectors that capture semantic meaning. Similar words have similar vector representations.
import spacy
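# Load a model with word vectors (en_core_web_sm ships without them);
# install it first with: python -m spacy download en_core_web_md
nlp = spacy.load('en_core_web_md')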
# Get word vectors
word1 = nlp("king")
word2 = nlp("queen")
word3 = nlp("car")
# Calculate similarity
similarity = word1.similarity(word2)
print(f"Similarity (king, queen): {similarity:.4f}")
similarity = word1.similarity(word3)
print(f"Similarity (king, car): {similarity:.4f}")
Working with word embeddings in spaCy
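By default, spaCy's `similarity` is the cosine similarity between the two vectors. The sketch below computes that quantity by hand on made-up three-dimensional vectors. The vectors and the zero-vector convention are assumptions for illustration; real embeddings have hundreds of dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


# Made-up 3-dimensional "embeddings" for illustration only.
king = [0.9, 0.8, 0.1]
queen = [0.8, 0.9, 0.2]
car = [0.1, 0.2, 0.9]

print(f"king vs queen: {cosine_similarity(king, queen):.4f}")
print(f"king vs car:   {cosine_similarity(king, car):.4f}")
```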
Modern transformer models such as GPT can generate human-like text. The Hugging Face transformers library makes it easy to use pre-trained models.
from transformers import pipeline
# Create text generation pipeline
generator = pipeline('text-generation', model='gpt2')
# Generate text
prompt = "Artificial intelligence is"
result = generator(
    prompt,
    max_length=50,
    num_return_sequences=1,
    do_sample=True,  # temperature only has an effect when sampling
    temperature=0.7
)
print(result[0]['generated_text'])
Generate text using GPT-2
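The `temperature` argument rescales the model's next-token scores before sampling: values below 1 concentrate probability on the top candidates, and values above 1 flatten the distribution. The following sketch shows the arithmetic on made-up scores. It is an assumption-laden illustration of the general technique, not the library's internal code, and `softmax_with_temperature` is a name chosen here.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw scores to probabilities, rescaled by a temperature > 0."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three candidate next tokens.
logits = [2.0, 1.0, 0.1]
for t in (0.7, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```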
Question answering systems can extract answers from context passages. This is useful for building chatbots and search systems.
from transformers import pipeline
# Create QA pipeline
qa_pipeline = pipeline('question-answering')
context = """
Python is a high-level programming language. It was created by
Guido van Rossum and first released in 1991. Python is known for
its simple syntax and readability.
"""
question = "Who created Python?"
result = qa_pipeline(question=question, context=context)
print(f"Question: {question}")
print(f"Answer: {result['answer']}")
print(f"Confidence: {result['score']:.4f}")
Question answering with transformers
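Extractive QA models commonly produce a start score and an end score for every token in the context, and the answer is the span whose combined score is highest. The sketch below shows that selection step on made-up scores. The pipeline's real post-processing differs in detail, and `best_span`, its `max_answer_len` parameter, and all the numbers are assumptions for illustration.

```python
def best_span(start_scores, end_scores, max_answer_len=15):
    """Return (start, end, score) maximizing start_scores[s] + end_scores[e]
    subject to s <= e < s + max_answer_len."""
    best = None
    for s, s_score in enumerate(start_scores):
        for e in range(s, min(s + max_answer_len, len(end_scores))):
            score = s_score + end_scores[e]
            if best is None or score > best[2]:
                best = (s, e, score)
    return best


tokens = ["It", "was", "created", "by", "Guido", "van", "Rossum", "in", "1991"]
# Made-up scores, one per token; a real model produces these as logits.
start_scores = [0.1, 0.0, 0.2, 0.3, 5.0, 1.0, 0.5, 0.0, 0.4]
end_scores = [0.0, 0.1, 0.1, 0.2, 0.8, 1.5, 4.5, 0.1, 0.9]

start, end, score = best_span(start_scores, end_scores)
print(" ".join(tokens[start:end + 1]))
```

Restricting the search to spans where the end is not before the start, and capping the span length, rules out nonsensical answers even when individual scores are noisy.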
- nltk: text processing toolkit
- spacy: industrial NLP
- transformers: pre-trained models
- textblob: simple text processing