Module 10: Deep Learning with Keras, Grounded in scikit-learn

Deep learning often feels like a separate world, but the basic machine learning discipline remains the same: split data, preprocess inputs, train on known answers, evaluate honestly, and compare against baselines. Keras extends that workflow into deeper and more flexible neural networks.

Deep learning should not feel like abandoning the core machine learning workflow. It still needs the same habits. We still define a target. We still separate training and test data. We still compare to a baseline. We still evaluate with metrics that match the decision. The tools change, but the discipline stays.

The companion script is module_10_keras_bridge.py, on the main branch of the aduwillie/ML-Blog repository.

It creates a small synthetic product-review dataset. It first trains a scikit-learn text classifier baseline. Then, if TensorFlow/Keras is installed, it trains a compact Keras neural network on the same task.

Standalone orientation

You can read this article without reading the neural-network modules first. The minimum idea is that text must be converted into numbers before a model can learn from it. The article compares two ways to do that: a scikit-learn TF-IDF baseline and a Keras model that learns token embeddings.

If you are reading the whole series, this module bridges from scikit-learn’s consistent estimator workflow into Keras. If you are reading it alone, focus on continuity rather than novelty: X is still input data, y is still the label, training still adjusts weights, validation still protects against overfitting, and metrics still decide whether the model is useful.

How to read the examples: text `X`, label `y`, vectorizers, and layers

In this module, X is not a numeric dataframe. It is an array of product-review strings. Each item is a sentence-like review such as “great battery life and easy setup” or “poor battery life and confusing setup.”

y is the sentiment label for each review. A value of 1 means positive. A value of 0 means negative. This is a classification task because the model predicts a category.

The scikit-learn baseline uses two components:

Component	Role
`TfidfVectorizer`	Converts raw text in `X` into numeric features based on word and phrase importance.
`LogisticRegression`	Learns a classification boundary from those text features to `y`.

The Keras model uses a different set of components:

Component	Role
`TextVectorization`	Learns a vocabulary and converts strings to integer token sequences.
`Embedding`	Converts token IDs into dense learned vectors.
`GlobalAveragePooling1D`	Summarizes token embeddings into one review-level representation.
`Dense`	Learns nonlinear combinations of the representation.
`Dropout`	Randomly drops some activations during training to reduce overfitting.
final `Dense(1, sigmoid)`	Outputs a probability for the positive class.

The same X_train, X_test, y_train, and y_test structure remains. The difference is the representation. scikit-learn uses TF-IDF features. Keras learns token embeddings. Both are trying to map review text to sentiment labels.
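
To make the X_train, X_test, y_train, y_test structure concrete, here is a minimal sketch in plain Python. The reviews and the `split_train_test` helper are hypothetical illustrations, not the companion script's data or code; in practice scikit-learn's `train_test_split` does this job.

```python
import random

# Hypothetical toy reviews; the companion script generates its own synthetic data.
reviews = [
    ("great battery life and easy setup", 1),
    ("poor battery life and confusing setup", 0),
    ("easy setup and great screen", 1),
    ("confusing menu and poor screen", 0),
    ("great value and easy returns", 1),
    ("poor value and confusing returns", 0),
    ("great sound and easy pairing", 1),
    ("poor sound and confusing pairing", 0),
]

def split_train_test(pairs, test_fraction=0.25, seed=42):
    """Shuffle (text, label) pairs and hold out a fraction for testing."""
    shuffled = pairs[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    test, train = shuffled[:n_test], shuffled[n_test:]
    X_train = [text for text, _ in train]
    y_train = [label for _, label in train]
    X_test = [text for text, _ in test]
    y_test = [label for _, label in test]
    return X_train, X_test, y_train, y_test

X_train, X_test, y_train, y_test = split_train_test(reviews)
print(len(X_train), "training reviews,", len(X_test), "held-out reviews")
```

Whichever model comes next, it only ever learns from X_train and y_train; X_test and y_test are kept aside for the final evaluation.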

Why Keras?

scikit-learn’s MLP models are useful for learning neural-network concepts, especially on tabular data. Keras is designed for deeper and more flexible neural-network architectures. It is commonly used for text, images, audio, sequence modeling, and custom deep-learning workflows.

Keras code often looks different from scikit-learn code, but the conceptual pieces are familiar:

features -> model -> prediction
loss -> optimization -> learned weights
validation -> generalization estimate
metrics -> decision evidence

The biggest difference is that Keras gives more control over layers, activations, embeddings, optimizers, callbacks, and training loops.

Text classification as a bridge example

Imagine an online shop wants to classify product reviews as positive or negative. The raw input is text:

"The battery lasted all weekend and the setup was easy."
"The zipper broke after two uses."

Text cannot go directly into most models. We need to convert words into numbers. In scikit-learn, a classic baseline uses TF-IDF:

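from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

baseline = Pipeline(steps=[
    # Unigram and bigram TF-IDF features; ignore terms seen in fewer than 2 documents.
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=2)),
    # Linear classifier; max_iter raised so the solver has room to converge.
    ("classifier", LogisticRegression(max_iter=1000)),
])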

This baseline is important. Before training a neural network, ask whether a simpler model already performs well. Many text classification problems have strong linear baselines.
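
To make the TF-IDF step less opaque, the following is a minimal pure-Python sketch of the idea. It is an illustration, not scikit-learn's implementation: the `tfidf_vectors` helper and the three reviews are hypothetical, tokenization is a plain whitespace split, and only single words are used (no bigrams). The weighting is modeled on the smoothed formula scikit-learn documents for its default settings, idf = ln((1 + n) / (1 + df)) + 1, followed by L2 normalization.

```python
import math
from collections import Counter

def tfidf_vectors(documents):
    """Return (vocabulary, list of L2-normalized TF-IDF vectors)."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each token appear?
    doc_freq = Counter(token for tokens in tokenized for token in set(tokens))
    # Smoothed inverse document frequency: rarer tokens get larger weights.
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocabulary}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        raw = [counts[t] * idf[t] for t in vocabulary]
        norm = math.sqrt(sum(v * v for v in raw)) or 1.0
        vectors.append([v / norm for v in raw])
    return vocabulary, vectors

docs = [
    "great battery and easy setup",
    "poor battery and confusing setup",
    "great screen and easy returns",
]
vocab, vectors = tfidf_vectors(docs)
first = dict(zip(vocab, vectors[0]))
# "and" appears in every review, "great" in only two, so "great" weighs more.
print(round(first["and"], 3), round(first["great"], 3))
```

A word that appears in every review carries little information about sentiment, so it receives the smallest weight. The logistic regression then learns one coefficient per vocabulary entry on top of these weights.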

A Keras model sees tokenized sequences

Keras commonly represents text as integer token sequences. A text vectorization layer learns a vocabulary and maps strings to fixed-length sequences:

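from tensorflow import keras
from tensorflow.keras import layers

vectorizer = layers.TextVectorization(
    max_tokens=5000,             # vocabulary size cap
    output_sequence_length=80,   # pad or truncate every review to 80 tokens
)
# Learn the vocabulary from the training text only.
vectorizer.adapt(X_train)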

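What the vectorization step does can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the Keras implementation: the helper names are hypothetical, tokenization is a whitespace split, and the reserved ids mirror the Keras convention of 0 for padding and 1 for out-of-vocabulary tokens.

```python
from collections import Counter

PAD_ID, OOV_ID = 0, 1  # reserved ids: padding and out-of-vocabulary

def build_vocabulary(train_texts, max_tokens):
    """Map the most frequent training tokens to integer ids starting at 2."""
    counts = Counter(token for text in train_texts for token in text.lower().split())
    most_common = [token for token, _ in counts.most_common(max_tokens - 2)]
    return {token: index + 2 for index, token in enumerate(most_common)}

def to_sequence(text, vocabulary, sequence_length):
    """Convert one string to a fixed-length list of token ids."""
    ids = [vocabulary.get(token, OOV_ID) for token in text.lower().split()]
    ids = ids[:sequence_length]                           # truncate long reviews
    return ids + [PAD_ID] * (sequence_length - len(ids))  # pad short reviews

train_texts = ["great battery life", "poor battery life", "great setup"]
vocabulary = build_vocabulary(train_texts, max_tokens=10)
sequence = to_sequence("great zipper", vocabulary, sequence_length=4)
print(sequence)
```

The word "zipper" never appeared in the training text, so it maps to the out-of-vocabulary id, and the short review is padded to the fixed length. Every review therefore becomes a list of integers of the same size, which is what the next layer expects.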

After vectorization, an embedding layer maps token IDs to dense vectors:

layers.Embedding(input_dim=5000, output_dim=32)

The model can then pool or process those embeddings and output a prediction:

model = keras.Sequential([
    vectorizer,
    layers.Embedding(input_dim=5000, output_dim=32),
    layers.GlobalAveragePooling1D(),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])

The final sigmoid output estimates the probability of the positive class.
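
The path from token ids to a probability can be traced by hand. The sketch below is a simplified forward pass in plain Python, assuming a tiny vocabulary and random, untrained parameters; it skips the hidden `Dense(16)` layer for brevity, and the function name is hypothetical. It also averages over every position, padding included, which is how a plain average behaves when no mask is supplied.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_positive_probability(token_ids, embeddings, weights, bias):
    """Embedding lookup -> average pooling -> one sigmoid output unit."""
    vectors = [embeddings[token_id] for token_id in token_ids]  # embedding lookup
    dim = len(weights)
    # Average each embedding dimension across all positions in the sequence.
    pooled = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    z = sum(w * x for w, x in zip(weights, pooled)) + bias      # single dense unit
    return sigmoid(z)

rng = random.Random(0)
vocab_size, embedding_dim = 6, 4
# Untrained, randomly initialized parameters; training would adjust these.
embeddings = [[rng.uniform(-0.5, 0.5) for _ in range(embedding_dim)]
              for _ in range(vocab_size)]
weights = [rng.uniform(-0.5, 0.5) for _ in range(embedding_dim)]
bias = 0.0

probability = predict_positive_probability([2, 1, 0, 0], embeddings, weights, bias)
print(f"P(positive) = {probability:.3f}")
```

Before training, the printed probability is arbitrary. Training changes the embedding rows and the dense weights so that positive reviews are pushed toward 1 and negative reviews toward 0.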

Training still needs validation

Keras training uses compile and fit:

model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=["accuracy"],
)
model.fit(
    X_train,
    y_train,
    validation_split=0.2,
    epochs=10,
    batch_size=32,
)

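The `loss="binary_crossentropy"` setting in the call above can be unpacked with a few lines of plain Python. This is a minimal sketch of the standard formula, with hypothetical labels and probabilities; the clipping constant `epsilon` is an illustrative choice to avoid taking the logarithm of zero.

```python
import math

def binary_crossentropy(y_true, y_prob, epsilon=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for label, prob in zip(y_true, y_prob):
        prob = min(max(prob, epsilon), 1.0 - epsilon)  # avoid log(0)
        total += -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))
    return total / len(y_true)

labels = [1, 0, 1, 0]
confident_and_right = [0.9, 0.1, 0.8, 0.2]
confident_and_wrong = [0.1, 0.9, 0.2, 0.8]
loss_right = binary_crossentropy(labels, confident_and_right)
loss_wrong = binary_crossentropy(labels, confident_and_wrong)
print(round(loss_right, 3), round(loss_wrong, 3))
```

Confident correct predictions give a small loss, and confident wrong predictions give a large one. That asymmetry is the signal the optimizer follows when it adjusts the weights.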

The loss guides optimization. The validation split gives feedback on generalization during training. For serious work, callbacks such as early stopping are common:

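early_stopping = keras.callbacks.EarlyStopping(
    monitor="val_loss",           # watch the validation loss
    patience=2,                   # stop after 2 epochs without improvement
    restore_best_weights=True,    # roll back to the best epoch's weights
)
# Pass it to training with: model.fit(..., callbacks=[early_stopping])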

This connects directly to Module 7. Deep learning is still optimization under uncertainty. Learning rate, regularization, architecture, batch size, epochs, and early stopping all matter.
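
The patience rule behind early stopping is simple enough to sketch without Keras. The function below is a hypothetical illustration of the logic only: it ignores options such as a minimum improvement threshold, and the validation losses are made up for the example.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return (stop_epoch, best_epoch) for a sequence of per-epoch validation losses."""
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                # Stop here; the weights from best_epoch are the ones to keep.
                return epoch, best_epoch
    # Training ran out of epochs before the patience limit was reached.
    return len(val_losses) - 1, best_epoch

# Hypothetical validation losses: improvement stalls after epoch 3 (0-indexed).
history = [0.69, 0.58, 0.51, 0.47, 0.49, 0.50, 0.52]
stop_epoch, best_epoch = early_stopping_epoch(history, patience=2)
print(stop_epoch, best_epoch)
```

With these numbers training halts two epochs after the best validation loss, and the weights from that best epoch are the ones worth restoring. The remaining epochs would only have fitted the training data more closely.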

RNNs and sequence models

Some text models process sequences in order. Recurrent neural networks, including LSTMs and GRUs, were designed for sequential data. They can model order-sensitive patterns such as negation:

"not good" is different from "good"

Modern deep-learning text systems often use transformers, but RNNs remain useful for learning sequence-modeling ideas. The important concept is that text has structure over time or position, and sequence models are designed to use that structure.

For this module’s sample, we use a compact embedding-plus-pooling model to keep dependencies and runtime reasonable. The conceptual bridge to RNNs is clear: once text is tokenized and embedded, different neural architectures can process the sequence.
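
The difference between pooling and a sequence model shows up even in a toy example. The sketch below uses hand-picked one-dimensional "embeddings" and hand-picked recurrent weights, all hypothetical and untrained, to show that averaging ignores word order while a simple recurrent update does not.

```python
import math

# Hypothetical 1-dimensional "embeddings" chosen by hand for illustration.
embedding = {"not": -1.0, "good": 0.5, "bad": -0.5}

def average_pool(tokens):
    """Order-insensitive summary: the mean of the token values."""
    return sum(embedding[t] for t in tokens) / len(tokens)

def recurrent_summary(tokens, w_input=1.0, w_state=-2.0):
    """Order-sensitive summary: each step mixes the new token with the running state."""
    state = 0.0
    for token in tokens:
        state = math.tanh(w_input * embedding[token] + w_state * state)
    return state

a, b = ["not", "good"], ["good", "not"]
print(average_pool(a) == average_pool(b))          # pooling cannot tell them apart
print(recurrent_summary(a), recurrent_summary(b))  # the recurrence can
```

The pooled values are identical because a mean does not care about position. The recurrent state differs because each step depends on what came before, which is the property LSTMs and GRUs build on with learned weights.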

What to notice when running the sample

The scikit-learn baseline comes first on purpose. TF-IDF plus logistic regression is a strong text classification baseline because it turns word and phrase patterns into sparse numeric features and learns a linear boundary. If that baseline works well, a neural network must justify its additional complexity.

The Keras model uses a different representation. TextVectorization learns a vocabulary from X_train. Embedding learns dense vectors for tokens. Pooling turns the sequence of token vectors into one review representation. Dense layers then learn the classification function. This is the deep-learning version of the same old workflow: represent inputs, learn from labels, evaluate on held-out data.

If TensorFlow is not installed, the script skips the Keras section and still runs the baseline. That design keeps the sample useful in lightweight environments while showing exactly what dependency is needed for the deep-learning portion.
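
One common way to get this behavior is to check for the package before importing it. The sketch below is an assumption about how such a guard could look, not the companion script's actual code; `run_baseline`, `run_keras_section`, and `main` are hypothetical placeholders.

```python
import importlib.util

def tensorflow_available():
    """Return True if the tensorflow package can be found, without importing it."""
    return importlib.util.find_spec("tensorflow") is not None

def run_baseline():
    # Placeholder for the scikit-learn baseline, which has no TensorFlow dependency.
    return "baseline finished"

def run_keras_section():
    # Placeholder for the Keras model; only called when TensorFlow is present.
    return "keras finished"

def main():
    results = [run_baseline()]
    if tensorflow_available():
        results.append(run_keras_section())
    else:
        print("TensorFlow not found; skipping the Keras section. "
              "Install it with: pip install tensorflow")
    return results

results = main()
print(results)
```

Checking with `find_spec` avoids paying TensorFlow's import cost just to discover whether it exists, and the baseline result is produced either way.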

Common deep-learning traps

The first trap is skipping the baseline. Deep learning is powerful, but a simple model may be cheaper, faster, easier to debug, and good enough. The second trap is fitting text preprocessing on the full dataset before splitting. The vocabulary should be learned from the training data only, so that the held-out reviews stay genuinely unseen.

The third trap is focusing only on accuracy. Text models can learn shortcuts from repeated phrases, product names, author patterns, or collection artifacts. A model that performs well on a synthetic or narrow dataset may fail when reviews become more varied. Evaluation should include realistic examples, error inspection, and monitoring after deployment.
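
A small worked example shows why accuracy alone can mislead. The sketch below uses a hypothetical imbalanced test set and a hypothetical model that always predicts "negative"; the `positive_class_metrics` helper is illustrative and computes the standard definitions of accuracy, precision, and recall.

```python
def positive_class_metrics(y_true, y_pred):
    """Return accuracy plus precision and recall for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when nothing is predicted positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Hypothetical imbalanced test set: 8 negative reviews, 2 positive reviews.
y_true = [0] * 8 + [1] * 2
# A degenerate model that always predicts "negative".
y_pred = [0] * 10
metrics = positive_class_metrics(y_true, y_pred)
print(metrics)
```

The accuracy looks respectable even though the model never finds a single positive review. Looking at recall, precision, and the individual misclassified reviews exposes failures that a single accuracy number hides.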

The module in one journey

Keras expands the neural-network toolbox, but it does not replace the machine learning workflow. Start with a baseline. Split data carefully. Vectorize text consistently. Train with validation. Use callbacks to control overfitting. Evaluate on held-out data. Choose metrics that match the decision.

Run the sample:

python module_10_keras_bridge.py

If TensorFlow is not installed, the script will still run the scikit-learn baseline and print an explicit message explaining how to enable the Keras portion.
