About This Project

This web application predicts molecular toxicity on the Tox21 benchmark using trained graph neural networks. It combines a Flask inference backend, RDKit chemistry tooling, PyTorch Geometric models, and an interactive web interface for single-molecule and batch analysis.

Technology Stack

Python
Flask
PyTorch Geometric
PyTorch
RDKit
3Dmol.js
HTML5
CSS3 + Bootstrap
NumPy + scikit-learn
Pandas

Website Capabilities

The website supports practical end-to-end toxicity analysis workflows:

1. Single SMILES inference with 2D molecule rendering and interactive 3D structure.

2. Batch CSV inference for multiple compounds (the file must contain a SMILES column).

3. Model selection between GINE, GCN, and GATv2, including all-model comparison mode.

4. Endpoint-level toxicity output for all 12 Tox21 assays with probability scores and labels.

5. CSV export of the latest prediction results from the backend.
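
As an illustration of the batch workflow in item 2, the sketch below reads an uploaded CSV and pulls out the SMILES column using only the Python standard library. The function name `read_smiles_column`, the case-insensitive header match, and the skipping of blank cells are assumptions made for this example; the site's actual upload handler is not shown in this document.

```python
import csv
import io


def read_smiles_column(csv_text, column="SMILES"):
    """Return the non-empty SMILES strings found in CSV text.

    Hypothetical helper: the header match is case-insensitive and blank
    cells are skipped, which is an assumption of this sketch.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    # Map lower-cased header names back to the header as written.
    lookup = {name.strip().lower(): name for name in fieldnames}
    key = lookup.get(column.lower())
    if key is None:
        raise ValueError(f"CSV must contain a '{column}' column")

    smiles = []
    for row in reader:
        value = (row.get(key) or "").strip()
        if value:  # skip blank cells
            smiles.append(value)
    return smiles


example = "name,SMILES\nethanol,CCO\nblank,\nbenzene,c1ccccc1\n"
print(read_smiles_column(example))
```

Rejecting the file early when the column is missing keeps the error message specific, rather than failing later during graph construction.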

Dataset And Split Strategy

Training notebook summary (start-to-end):

1. Dataset: MoleculeNet Tox21 (7,831 molecules, 12 toxicity tasks).

2. Split: Scaffold-aware 80/10/10 split using Morgan fingerprint key (radius=2, 2048 bits).

3. Features: Node features normalized globally; edge features normalized when available.

4. Stability controls: feature clamping to [-10, 10], NaN/inf replacement, gradient clipping.

Tox21 | 12 tasks | Scaffold split | Train/Val/Test 80/10/10
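
The idea behind the scaffold-aware split in step 2 can be sketched as a group-aware split: molecules that share a structural key are kept together in one subset, so the validation and test sets contain structures the model has not seen. The sketch below assumes the keys have already been computed (for example with RDKit) and are passed in as plain strings. The function name `group_split` and the largest-group-first ordering are assumptions; the notebook's exact assignment rule is not given here.

```python
from collections import defaultdict


def group_split(keys, frac_train=0.8, frac_valid=0.1):
    """Split indices so that all items sharing a key land in one subset."""
    groups = defaultdict(list)
    for index, key in enumerate(keys):
        groups[key].append(index)

    # Largest groups first; ties broken by first index for determinism.
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))

    n = len(keys)
    train_cap = frac_train * n
    valid_cap = (frac_train + frac_valid) * n

    train, valid, test = [], [], []
    for group in ordered:
        if len(train) + len(group) <= train_cap:
            train.extend(group)
        elif len(train) + len(valid) + len(group) <= valid_cap:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test


# Hypothetical keys standing in for per-molecule structural keys.
keys = ["a"] * 5 + ["b"] * 3 + ["c", "d"]
train, valid, test = group_split(keys)
print(len(train), len(valid), len(test))
```

Because whole groups are assigned at once, the realized proportions only approximate 80/10/10 when group sizes are uneven.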

Where This Project Can Be Used

  • Early-stage toxicity screening in drug discovery pipelines before expensive wet-lab assays.
  • Academic and industry research for structure-toxicity relationship exploration on new compounds.
  • Educational demonstrations for graph neural networks in cheminformatics and computational toxicology.
  • Pre-filtering of chemical libraries to prioritize safer candidates for synthesis and testing.
  • Model comparison studies (GINE vs GCN vs GATv2) under scaffold-aware generalization settings.

System Overview

1. Frontend collects SMILES / CSV and model choice.

2. Flask backend converts molecules to graph data.

3. Loaded checkpoints generate 12-endpoint toxicity probabilities.

4. Results are rendered with 2D/3D molecular visualization and downloadable CSV.

Model Architectures And Training

Model | Graph Layers | Pooling Head | Dropout | Optimizer / LR | Schedulers
GINE | 4x GINEConv + BatchNorm | Mean + Sum pool -> MLP | 0.25 | Adam, 5e-4 | CosineAnnealingWarmRestarts + ReduceLROnPlateau
GCN | 4x GCNConv + BatchNorm | Mean + Sum pool -> MLP | 0.25 | Adam, 5e-4 | CosineAnnealingWarmRestarts + ReduceLROnPlateau
GATv2 | 4x GATv2Conv (4 heads) + BatchNorm | Mean + Sum pool -> MLP | 0.30 | Adam, 3e-4 | CosineAnnealingWarmRestarts + ReduceLROnPlateau
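
To show what the cosine warm-restart schedule in the table does to the learning rate, here is a small standard-library sketch of the formula used by PyTorch's `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts`. The base rate of 5e-4 comes from the table. The cycle length `t_0`, the multiplier `t_mult`, and the floor `eta_min` are assumptions, since the notebook's values are not listed, and the interaction with ReduceLROnPlateau is not modeled.

```python
import math


def cosine_warm_restart_lr(epoch, base_lr=5e-4, eta_min=0.0, t_0=10, t_mult=1):
    """Learning rate at an integer epoch under cosine annealing with warm restarts.

    t_0, t_mult and eta_min are illustrative assumptions, not notebook values.
    """
    t_i = t_0        # length of the current cycle
    t_cur = epoch    # epochs elapsed inside the current cycle
    while t_cur >= t_i:
        t_cur -= t_i
        t_i *= t_mult
    cosine = math.cos(math.pi * t_cur / t_i)
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + cosine)


# The rate decays within a cycle and jumps back to base_lr at each restart.
for epoch in (0, 5, 9, 10):
    print(epoch, cosine_warm_restart_lr(epoch))
```

The periodic jump back to the base rate is what distinguishes this schedule from plain cosine decay.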

Reported Notebook Results

From the attached training notebook summary:

Experiment | Macro ROC-AUC | Status
GCN (scaffold split) | 0.8300 | Best in notebook run
GINE (scaffold split) | 0.8076 | Strong baseline
GATv2 (scaffold split) | 0.7982 | Competitive
Random split reference | 0.7928 | Notebook comparison value
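
For readers unfamiliar with the metric, macro ROC-AUC is the unweighted mean of the per-task ROC-AUC values. The sketch below computes it in plain Python, treating `None` as a missing label (Tox21 does not label every molecule for every assay) and skipping tasks where only one class is present. How the notebook handles these cases is an assumption here, and the pairwise loop is written for clarity; `sklearn.metrics.roc_auc_score` is the usual choice in practice.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the chance that a positive outranks a negative (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        return None  # undefined when only one class is present
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def macro_roc_auc(label_rows, score_rows):
    """Mean per-task ROC-AUC, ignoring missing labels (None) and undefined tasks."""
    n_tasks = len(label_rows[0])
    aucs = []
    for t in range(n_tasks):
        pairs = [(y[t], s[t]) for y, s in zip(label_rows, score_rows)
                 if y[t] is not None]
        auc = roc_auc([y for y, _ in pairs], [s for _, s in pairs])
        if auc is not None:
            aucs.append(auc)
    if not aucs:
        raise ValueError("no task has both classes present")
    return sum(aucs) / len(aucs)


# Toy data: 4 molecules, 2 tasks, one missing label.
labels = [[1, 0], [0, None], [1, 1], [0, 0]]
scores = [[0.9, 0.2], [0.3, 0.5], [0.4, 0.7], [0.6, 0.1]]
print(macro_roc_auc(labels, scores))
```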

Production Inference Stack

Backend endpoint behavior in this website:

1. Loads pretrained checkpoints for GINE, GCN, and GATv2 at startup.

2. Rebuilds graph features from SMILES with RDKit + PyG Data objects.

3. Applies training-aligned normalization and sigmoid-based endpoint probabilities.

4. Generates 2D image and 3D SDF for interactive molecular visualization.

5. Stores latest prediction rows for backend CSV export.
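
Step 3 can be illustrated with a small standard-library sketch: a feature value is standardized with statistics saved from training and clamped to [-10, 10] as described in the training summary, and raw model outputs are passed through a sigmoid to give per-endpoint probabilities. The endpoint names are the 12 Tox21 assays. Their order in the checkpoint output, the 0.5 decision threshold, the replacement value of 0.0 for non-finite inputs, and the helper names are all assumptions of this example.

```python
import math

# The 12 Tox21 assays; the output order used by the checkpoints is assumed.
TOX21_TASKS = [
    "NR-AR", "NR-AR-LBD", "NR-AhR", "NR-Aromatase", "NR-ER", "NR-ER-LBD",
    "NR-PPAR-gamma", "SR-ARE", "SR-ATAD5", "SR-HSE", "SR-MMP", "SR-p53",
]


def normalize_feature(value, mean, std, limit=10.0):
    """Standardize with training statistics, then clamp to [-limit, limit]."""
    if not math.isfinite(value):
        return 0.0  # assumed replacement for NaN/inf
    scaled = (value - mean) / (std if std > 0 else 1.0)
    return max(-limit, min(limit, scaled))


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def label_endpoints(logits, threshold=0.5):
    """Turn 12 raw model outputs into endpoint rows with probability and label."""
    if len(logits) != len(TOX21_TASKS):
        raise ValueError("expected one logit per Tox21 endpoint")
    rows = []
    for name, logit in zip(TOX21_TASKS, logits):
        probability = sigmoid(logit)
        label = "toxic" if probability >= threshold else "non-toxic"
        rows.append({"endpoint": name, "probability": probability, "label": label})
    return rows


rows = label_endpoints([2.0] + [-2.0] * 11)
print(rows[0])
```

Each task gets its own sigmoid because the 12 endpoints are independent binary assays, not mutually exclusive classes.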

Team Members

Team members who contributed to this project.

Saptarshi Ghosh
Team Lead & Research Coordinator
Led project direction, coordinated research milestones, and aligned model experimentation with final deployment goals.
Sumit Chaira
UI/Deployment & Visualization Developer
Built and refined the web interface, deployment flow, and 2D/3D molecular visualization experience for end users.
Mangaldip Dhua
Data Engineer & Preprocessing Specialist
Handled Tox21 data preparation, scaffold split strategy, normalization steps, and preprocessing robustness checks.
Uday Shankar Dey
GNN Model Developer
Developed and trained GINE, GCN, and GATv2 model architectures for multi-task toxicity prediction.
Arnab Subhra Ghosh
Model Evaluation & Optimization Engineer
Evaluated task-wise and macro ROC-AUC metrics, compared split/model performance, and tuned optimization settings.