Module 06: Neural Networks with scikit-learn


Linear models are powerful because they are simple, stable, and interpretable. But some patterns do not want to be straight lines.

Imagine Riverbend Roasters begins selling seasonal drinks. Demand no longer changes smoothly with temperature. Cold brew rises with heat, but only after a certain point. Pumpkin drinks spike in autumn. Rain hurts walk-in traffic on weekdays but sometimes helps weekend cafe seating. The relationship between features and outcomes bends, interacts, and changes direction.

This is where neural networks enter the story. A neural network is a model that learns layers of transformations. Instead of asking one linear equation to solve the whole problem, a neural network stacks many simple computations so the model can represent more flexible patterns.

The companion script is module_06_neural_networks.py in the aduwillie/ML-Blog repository (main branch).

It creates a synthetic customer-response dataset and trains scikit-learn multilayer perceptrons for classification and regression.

Standalone orientation

You can read this neural-network article without first reading the linear-model modules. The minimum idea is that a model receives input features X and learns to predict an answer y. Neural networks are one flexible kind of model for learning that relationship.

If you are reading the whole series, this module extends the same scikit-learn workflow into layered models. If you are reading it alone, focus on three ideas: hidden layers learn intermediate representations, activation functions let the model learn nonlinear patterns, and scaling helps the training process behave reliably.

How to read the examples: `X`, `y`, layers, and MLP components

This module has two examples. In the classification example, X contains two numeric signal columns created by make_moons. The target y contains two class labels. The data is intentionally curved so a simple straight-line model struggles, while a neural network can learn a nonlinear boundary.

In the regression example, X contains several generated numeric features. y is a numeric target with both linear and nonlinear structure. The MLP regressor tries to learn a flexible function from those features to that number.
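The companion script's exact generator settings are not reproduced here. The following is a minimal sketch under stated assumptions: the sample sizes, the noise level, and the regression target formula are illustrative choices, not values taken from module_06_neural_networks.py. It shows the shape of the two kinds of data the examples describe.

```python
import numpy as np
from sklearn.datasets import make_moons

# Classification data: two curved, interleaving "moons" with some noise.
X_clf, y_clf = make_moons(n_samples=500, noise=0.2, random_state=42)

# Regression data: a hypothetical target mixing linear and nonlinear terms.
rng = np.random.default_rng(42)
X_reg = rng.uniform(-3, 3, size=(500, 4))
y_reg = (
    2.0 * X_reg[:, 0]                 # linear part
    + np.sin(X_reg[:, 1])             # smooth nonlinear part
    + X_reg[:, 2] * X_reg[:, 3]       # interaction between two features
    + rng.normal(0, 0.1, size=500)    # noise
)

print(X_clf.shape, np.unique(y_clf))  # (500, 2) [0 1]
print(X_reg.shape, y_reg.shape)       # (500, 4) (500,)
```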

The important components are:

Component	Meaning
`StandardScaler`	Scales features so neural-network optimization behaves more reliably.
`MLPClassifier`	A multilayer perceptron for class labels.
`MLPRegressor`	A multilayer perceptron for numeric targets.
`hidden_layer_sizes`	The number of hidden layers and units in each layer.
`activation`	The nonlinear function that lets the network learn curved relationships.
`alpha`	L2 regularization strength, used to discourage overly large weights.
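`max_iter`	The maximum number of training iterations. With the default adam solver (and with sgd), each iteration is one epoch, a full pass over the training data.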
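To make alpha less abstract, the sketch below fits the same small network twice on make_moons data, once with scikit-learn's default alpha of 0.0001 and once with a deliberately large alpha of 10, and compares the total squared size of the learned weights. The dataset, the single hidden layer of 16 units, and the helper name total_squared_weight are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)

def total_squared_weight(alpha):
    """Hypothetical helper: fit a scaled MLP and return the sum of squared weights."""
    model = Pipeline(steps=[
        ("scaler", StandardScaler()),
        ("mlp", MLPClassifier(hidden_layer_sizes=(16,), alpha=alpha,
                              max_iter=2000, random_state=0)),
    ])
    model.fit(X, y)
    coefs = model.named_steps["mlp"].coefs_  # one weight matrix per layer transition
    return sum(float(np.sum(W ** 2)) for W in coefs)

weak = total_squared_weight(1e-4)    # scikit-learn's default alpha
strong = total_squared_weight(10.0)  # a deliberately strong L2 penalty
print(f"alpha=1e-4: {weak:.2f}  alpha=10: {strong:.2f}")
```

With the stronger penalty you should expect a noticeably smaller total, which is the "discourage overly large weights" effect described in the table.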

When the script calls fit(X_train, y_train), the network adjusts its internal weights and biases. These are the learned parameters; after fitting, scikit-learn stores them in the model's coefs_ and intercepts_ attributes. The hidden layers are not columns from the original dataframe; they are learned intermediate representations created inside the model.

From one line to many learned transformations

A linear model combines features directly:

prediction = weighted sum of features + intercept

A neural network inserts hidden layers between the input and the output:

features -> hidden layer -> hidden layer -> prediction

Each hidden layer creates intermediate representations. These representations are not hand-designed columns like day_of_week or rain_inches. They are learned transformations. One hidden unit might respond to “warm weekend with high event activity.” Another might respond to “high service score and low delay.” We usually do not name these internal signals, but they help the network model nonlinear relationships.
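The flow features -> hidden layer -> prediction can be written out by hand. In this minimal NumPy sketch the weights and biases are hypothetical, hand-picked numbers; in a real MLP, fit learns them from data. It uses the same row-times-matrix convention (X @ W + b) that scikit-learn uses for its stored weight matrices.

```python
import numpy as np

def relu(z):
    # ReLU(z) = max(0, z), applied element-wise
    return np.maximum(0, z)

# Hypothetical weights for a tiny network: 3 input features -> 2 hidden units -> 1 output.
W1 = np.array([[ 1.0, -0.5],
               [ 0.5,  1.0],
               [-1.0,  0.5]])    # shape (3, 2): inputs -> hidden
b1 = np.array([0.0, -0.2])
W2 = np.array([[ 2.0],
               [-1.0]])          # shape (2, 1): hidden -> output
b2 = np.array([0.1])

x = np.array([[0.8, 0.3, 0.1]])  # one row with 3 (already scaled) features

hidden = relu(x @ W1 + b1)       # the intermediate representation: [[0.85, 0.0]]
prediction = hidden @ W2 + b2    # the output layer combines hidden units linearly: [[1.8]]
print(hidden, prediction)
```

The second hidden unit is switched off by ReLU for this input, so only the first one contributes to the prediction; a different input could activate a different mix of units.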

In scikit-learn, neural networks are available through MLPClassifier and MLPRegressor. MLP means multilayer perceptron.

			
from sklearn.neural_network import MLPClassifier
model = MLPClassifier(
    hidden_layer_sizes=(32, 16),
    activation="relu",
    max_iter=500,
    random_state=42,
)

		

The tuple (32, 16) means the network has two hidden layers: the first with 32 units and the second with 16 units.
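One way to confirm what the tuple means is to fit the model and inspect the learned parameters. The sketch below assumes a small make_moons dataset with two input features and a binary target; coefs_, intercepts_, and n_layers_ are attributes scikit-learn sets on a fitted MLP.

```python
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=200, noise=0.2, random_state=42)
model = MLPClassifier(hidden_layer_sizes=(32, 16), activation="relu",
                      max_iter=500, random_state=42)
model.fit(X, y)

# coefs_[i] is the weight matrix between layer i and layer i + 1.
print([W.shape for W in model.coefs_])       # [(2, 32), (32, 16), (16, 1)]
print([b.shape for b in model.intercepts_])  # [(32,), (16,), (1,)]
print(model.n_layers_)                       # 4: input, two hidden, output
```

For a binary target, scikit-learn uses a single output unit, which is why the last weight matrix has one column.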

Activation functions: adding bends to the story

If a neural network only stacked linear transformations, the entire stack would collapse into another linear transformation. Activation functions prevent that. They introduce nonlinearity.
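The collapse claim is easy to check numerically. This sketch uses random, hypothetical weights for two linear layers with no activation between them and shows that one combined weight matrix and bias reproduce the same outputs (up to floating-point rounding).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))  # 5 rows, 3 features

# Two stacked linear layers with no activation in between.
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 1)), rng.normal(size=1)
two_layers = (X @ W1 + b1) @ W2 + b2

# The same mapping expressed as a single linear layer.
W_combined = W1 @ W2
b_combined = b1 @ W2 + b2
one_layer = X @ W_combined + b_combined

print(np.allclose(two_layers, one_layer))  # True
```

Algebraically, (X W1 + b1) W2 + b2 = X (W1 W2) + (b1 W2 + b2), so without a nonlinearity the extra layer adds no expressive power.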

ReLU is the most common activation in modern networks, and it is the default activation for scikit-learn's MLPClassifier and MLPRegressor:

ReLU(x) = max(0, x)

ReLU keeps positive values and turns negative values into zero. That simple bend allows networks to approximate much richer functions than a single line or plane.
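A small NumPy sketch makes the "bend" concrete. The relu helper is defined here for illustration. The second part shows that two ReLU units with opposite input weights, added together, trace the V-shaped function |x|, which a single straight line cannot represent.

```python
import numpy as np

def relu(z):
    # ReLU(z) = max(0, z), applied element-wise
    return np.maximum(0, z)

x = np.linspace(-2, 2, 5)  # [-2, -1, 0, 1, 2]
print(relu(x))             # [0. 0. 0. 1. 2.]

# Two ReLU "hidden units" with opposite input weights, summed by the output layer,
# produce a V shape (|x|), which no single straight line can match.
v_shape = relu(x) + relu(-x)
print(v_shape)             # [2. 1. 0. 1. 2.]
```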

In scikit-learn:

MLPClassifier(activation="relu")

Activation functions are part of the model’s expressive power. They help neural networks represent thresholds, interactions, and curved relationships.

Scaling is not optional

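Neural networks are sensitive to feature scale. If one feature ranges from 0 to 10000 and another ranges from 1 to 5, the gradients for the first feature's weights tend to be far larger than those for the second's, so training can become slow or unstable. In practice, standardizing each feature to zero mean and unit variance should be treated as a required step.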
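The sketch below uses made-up values shaped like that example, one column in the thousands and one on a 1 to 5 scale, and shows StandardScaler rescaling both columns to mean 0 and standard deviation 1.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature columns: one in the thousands, one on a 1-5 rating scale.
X = np.array([
    [ 200.0, 1.0],
    [2500.0, 3.0],
    [6000.0, 4.0],
    [9800.0, 5.0],
])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # learns per-column mean and std, then rescales

print(scaler.mean_)                 # column means learned from the data
print(X_scaled.mean(axis=0))        # approximately [0, 0]
print(X_scaled.std(axis=0))         # approximately [1, 1]
```

After scaling, both columns live on comparable ranges, so neither dominates the early weight updates simply because of its units.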

That is why the companion script wraps the MLP inside a pipeline:

			
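from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
# Scale the features first, then pass the scaled values to the MLP.
model = Pipeline(steps=[
    ("scaler", StandardScaler()),
    ("mlp", MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500)),
])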

		

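This reinforces an important machine learning principle: the model is not just the final estimator. It is the full sequence of transformations and estimation. Keeping the scaler inside the pipeline also means it learns its means and standard deviations from the training data only, then reuses those same statistics whenever the model predicts.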
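A minimal sketch of the full workflow, assuming make_moons data and a 75/25 train/test split (the companion script's exact split may differ). Because the scaler lives inside the pipeline, fit learns the scaling statistics from the training rows only, and score reuses them on the test rows.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_moons(n_samples=500, noise=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

model = Pipeline(steps=[
    ("scaler", StandardScaler()),
    ("mlp", MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=1000, random_state=42)),
])

# fit() learns the scaler's means/std-devs from X_train only, then trains the MLP.
model.fit(X_train, y_train)

# score() reuses those training statistics to scale X_test before the MLP sees it.
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```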

Classification and regression with MLPs

For classification, the neural network outputs class probabilities or class labels. For regression, it outputs numeric values.
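On the classification side, a fitted model offers both kinds of output: predict_proba returns one probability per class, and predict returns a label. This sketch assumes make_moons data like the classification example describes, with illustrative hyperparameters.

```python
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_moons(n_samples=500, noise=0.2, random_state=42)
clf = Pipeline(steps=[
    ("scaler", StandardScaler()),
    ("mlp", MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=1000, random_state=42)),
])
clf.fit(X, y)

probabilities = clf.predict_proba(X[:3])  # one row per sample, one column per class
labels = clf.predict(X[:3])               # the class with the highest probability
print(probabilities.round(3))
print(labels)
```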

			
from sklearn.neural_network import MLPRegressor
regressor = MLPRegressor(
    hidden_layer_sizes=(64, 32),
    activation="relu",
    max_iter=700,
    random_state=42,
)

		

The same concept appears in both cases. The network learns internal representations that help map inputs to the target.
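A matching regression sketch, assuming a hypothetical target built from one linear term and one sine term plus a little noise. The MLPRegressor settings mirror the snippet above; the data, the split, and the added scaler are illustrative assumptions. score reports R^2 on held-out rows.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical regression data with a linear term and a nonlinear (sine) term.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(800, 3))
y = 1.5 * X[:, 0] + 2.0 * np.sin(X[:, 1]) + rng.normal(0, 0.1, size=800)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

regressor = Pipeline(steps=[
    ("scaler", StandardScaler()),
    ("mlp", MLPRegressor(hidden_layer_sizes=(64, 32), activation="relu",
                         max_iter=700, random_state=42)),
])
regressor.fit(X_train, y_train)

predictions = regressor.predict(X_test)  # numeric values, not class labels
r2 = regressor.score(X_test, y_test)     # R^2 on held-out data
print(predictions[:3], f"R^2 = {r2:.3f}")
```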

The beginner lesson is that neural networks are flexible function approximators. The expert lesson is that flexibility increases responsibility. Neural networks can overfit, require careful preprocessing, depend on hyperparameters, and may be harder to interpret than simpler models.

When should you reach for a neural network?

For tabular data, start with strong baselines: linear models, regularized models, tree-based methods when available, and simple classifiers. A neural network is useful when relationships are nonlinear, interactions are complex, and you have enough data to support the flexibility.

Neural networks are especially important for images, audio, natural language, and representation learning. scikit-learn's MLPs are not the full deep-learning ecosystem: they offer no GPU support and none of the specialized layer types, such as convolutional layers, that those domains rely on. They are still excellent for learning the concepts without leaving the scikit-learn workflow.

What to notice when running the sample

The classification demo uses a curved two-dimensional dataset. This makes the purpose of hidden layers visible. A linear model would try to separate the classes with a straight boundary. The MLP can bend the boundary because hidden layers and activation functions create nonlinear transformations.
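To see the straight-boundary limitation directly, this sketch trains a scaled LogisticRegression and a scaled MLPClassifier on the same make_moons split. The sample size, noise level, and split are assumptions, and exact accuracies will vary with them; the point is that the linear model is restricted to a straight decision boundary on the two original features while the MLP is not.

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_moons(n_samples=1000, noise=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, stratify=y)

# Linear baseline: can only draw a straight line through the two features.
linear = Pipeline(steps=[("scaler", StandardScaler()), ("clf", LogisticRegression())])
# MLP: hidden layers + ReLU let the boundary bend around the moons.
mlp = Pipeline(steps=[
    ("scaler", StandardScaler()),
    ("clf", MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=1000, random_state=42)),
])

for name, model in [("logistic regression", linear), ("MLP", mlp)]:
    model.fit(X_train, y_train)
    print(f"{name}: test accuracy = {model.score(X_test, y_test):.3f}")
```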

The regression demo adds nonlinear structure to generated numeric data. The MLP regressor is asked to learn a target that is not purely linear. This does not mean neural networks always beat simpler models; it means they have the capacity to learn patterns that simple linear forms cannot represent directly.

Notice the repeated use of StandardScaler. Neural networks are trained through iterative optimization, and optimization behaves better when feature scales are comparable. If you remove scaling, the model may train more slowly, converge poorly, or produce unstable results.

Common neural-network traps

The first trap is assuming that a neural network is automatically better because it is more flexible. Flexibility can help when there is enough signal and data, but it can also memorize noise. The second trap is ignoring convergence warnings. A model that has not converged may need more iterations, better scaling, different learning settings, or a simpler architecture.
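Convergence warnings are easy to miss in a long log. This sketch deliberately sets max_iter=5 so the optimizer stops before converging, records warnings with Python's warnings module, and checks for scikit-learn's ConvergenceWarning. The dataset and settings are illustrative assumptions.

```python
import warnings

from sklearn.datasets import make_moons
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)

# Deliberately too few iterations, so the optimizer stops before converging.
model = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=5, random_state=0)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure every warning is recorded
    model.fit(X, y)

did_not_converge = any(issubclass(w.category, ConvergenceWarning) for w in caught)
print("ConvergenceWarning raised:", did_not_converge)
print("Iterations run:", model.n_iter_)  # equals max_iter when it hits the iteration cap
```

In real work, n_iter_ equal to max_iter together with this warning is the signal to revisit iterations, scaling, learning settings, or architecture.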

The third trap is skipping baselines. Before using an MLP on tabular data, compare it to regularized linear models and other simpler approaches. A simpler model that performs similarly is often easier to explain, tune, and maintain.
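A baseline comparison can be as small as this sketch, which cross-validates a scaled Ridge model and a scaled MLPRegressor on hypothetical data whose target is mostly linear. The data, hyperparameters, and 5-fold setup are assumptions chosen for illustration; on data like this, the linear model is the natural benchmark the MLP has to beat.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical tabular data where the target is mostly linear.
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 5))
y = X @ np.array([3.0, -2.0, 1.0, 0.0, 0.5]) + rng.normal(0, 0.5, size=400)

candidates = {
    "ridge baseline": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    "MLP": make_pipeline(StandardScaler(),
                         MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000,
                                      random_state=0)),
}

results = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")  # one R^2 per fold
    results[name] = scores.mean()
    print(f"{name}: mean R^2 = {scores.mean():.3f} (+/- {scores.std():.3f})")
```

If the MLP only matches the Ridge score, the simpler model is usually the better choice to ship.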

The module in one journey

Module 6 extends the idea of a model from one direct mapping to a layered mapping. Hidden layers learn intermediate representations. Activation functions introduce nonlinear behavior. Scaling supports stable training. MLP classifiers and regressors bring neural-network thinking into the same fit/predict pattern we have used all along.

Run the sample:

python module_06_neural_networks.py
