About This Project

This web application predicts molecular toxicity for the Tox21 benchmark using trained graph neural networks. It combines a Flask inference backend, RDKit chemistry tooling, PyTorch Geometric models, and a modern interactive interface for single-molecule and batch analysis.

Technology Stack

Python

Flask

PyTorch Geometric

PyTorch

RDKit

3Dmol.js

HTML5

CSS3 + Bootstrap

NumPy + scikit-learn

Pandas

Website Capabilities

The website supports practical end-to-end toxicity analysis workflows:

1. Single SMILES inference with 2D molecule rendering and interactive 3D structure.

2. Batch CSV inference for multiple compounds (the file must contain a SMILES column).

3. Model selection between GINE, GCN, and GATv2, including all-model comparison mode.

4. Endpoint-level toxicity output for all 12 Tox21 assays with probability scores and labels.

5. CSV export of the latest prediction results from the backend.
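The batch input and export steps in the list above can be sketched with the standard library alone. This is a minimal illustration, not the site's actual backend code: the helper names `read_smiles_column` and `rows_to_csv` are hypothetical, and matching the SMILES header case-insensitively is an assumption about how lenient the upload check is.

```python
import csv
import io


def read_smiles_column(csv_text, column="SMILES"):
    """Return the non-empty SMILES strings from CSV text (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Assumption: the header is matched case-insensitively.
    lookup = {name.strip().lower(): name for name in (reader.fieldnames or [])}
    key = lookup.get(column.lower())
    if key is None:
        raise ValueError(f"CSV must contain a '{column}' column")
    # Skip rows whose SMILES cell is missing or blank.
    return [row[key].strip() for row in reader if row[key] and row[key].strip()]


def rows_to_csv(rows, fieldnames):
    """Serialize prediction rows (a list of dicts) to CSV text for download."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Rejecting the upload early when the column is missing keeps the error close to the user's input instead of surfacing later during graph construction.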

Dataset And Split Strategy

Training notebook summary (start-to-end):

1. Dataset: MoleculeNet Tox21 (~7,831 molecules, 12 toxicity tasks).

2. Split: Scaffold-aware 80/10/10 train/validation/test split, with molecules grouped by a Morgan fingerprint key (radius=2, 2048 bits).

3. Features: Node features normalized globally; edge features normalized when available.

4. Stability controls: feature clamping to [-10, 10], NaN/inf replacement, gradient clipping.

Tox21 | 12 tasks | Scaffold split | Train/Val/Test 80/10/10
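The idea behind a group-aware 80/10/10 split can be sketched as follows. This is an illustrative version under stated assumptions, not the notebook's code: `grouped_split` is a hypothetical name, the grouping key for each molecule (scaffold or fingerprint-derived) is taken as a precomputed input, and the largest-group-first fill order is a common convention that the notebook may or may not follow.

```python
from collections import defaultdict


def grouped_split(keys, frac_train=0.8, frac_valid=0.1):
    """Split indices so molecules sharing a key never straddle subsets.

    keys[i] is the grouping key of molecule i; how that key is computed
    is outside this sketch.
    """
    groups = defaultdict(list)
    for index, key in enumerate(keys):
        groups[key].append(index)

    # Largest groups first (ties broken by first index for determinism).
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))

    n = len(keys)
    train_cap = frac_train * n
    valid_cap = (frac_train + frac_valid) * n
    train, valid, test = [], [], []
    for group in ordered:
        if len(train) + len(group) <= train_cap:
            train.extend(group)
        elif len(train) + len(valid) + len(group) <= valid_cap:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test
```

Because whole groups are assigned at once, the realized fractions only approximate 80/10/10, but no structural family appears in both training and evaluation data.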

Where This Project Can Be Used

Early-stage toxicity screening in drug discovery pipelines before expensive wet-lab assays.
Academic and industry research for structure-toxicity relationship exploration on new compounds.
Educational demonstrations for graph neural networks in cheminformatics and computational toxicology.
Pre-filtering of chemical libraries to prioritize safer candidates for synthesis and testing.
Model comparison studies (GINE vs GCN vs GATv2) under scaffold-aware generalization settings.

System Overview

1. Frontend collects a SMILES string or CSV file and the model choice.

2. Flask backend converts molecules to graph data.

3. Loaded checkpoints generate 12-endpoint toxicity probabilities.

4. Results are rendered with 2D/3D molecular visualization and downloadable CSV.

Model Architectures And Training

Model	Graph Layers	Pooling Head	Dropout	Optimizer / LR	Schedulers
GINE	4x GINEConv + BatchNorm	Mean + Sum pool -> MLP	0.25	Adam, 5e-4	CosineAnnealingWarmRestarts + ReduceLROnPlateau
GCN	4x GCNConv + BatchNorm	Mean + Sum pool -> MLP	0.25	Adam, 5e-4	CosineAnnealingWarmRestarts + ReduceLROnPlateau
GATv2	4x GATv2Conv (4 heads) + BatchNorm	Mean + Sum pool -> MLP	0.30	Adam, 3e-4	CosineAnnealingWarmRestarts + ReduceLROnPlateau
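The "Mean + Sum pool" readout shared by all three models can be illustrated in plain NumPy. This sketch assumes the two pooled vectors are concatenated before the MLP head, which the table does not state explicitly; `mean_sum_readout` is a hypothetical name. In PyTorch Geometric the same two reductions are available as `global_mean_pool` and `global_add_pool`.

```python
import numpy as np


def mean_sum_readout(node_embeddings, batch, num_graphs):
    """Concatenate per-graph mean and sum of node embeddings.

    node_embeddings: (num_nodes, hidden) array
    batch: (num_nodes,) integer array mapping each node to its graph
    """
    node_embeddings = np.asarray(node_embeddings, dtype=np.float64)
    batch = np.asarray(batch, dtype=np.int64)
    hidden = node_embeddings.shape[1]

    summed = np.zeros((num_graphs, hidden))
    np.add.at(summed, batch, node_embeddings)  # scatter-add rows by graph id

    counts = np.bincount(batch, minlength=num_graphs).reshape(-1, 1)
    mean = summed / np.maximum(counts, 1)  # guard against empty graphs

    # Assumption: mean and sum are concatenated, giving 2 * hidden features.
    return np.concatenate([mean, summed], axis=1)
```

Combining both reductions lets the head see size-independent averages alongside totals that grow with the number of atoms.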

Reported Notebook Results

From the attached training notebook summary:

Experiment	Macro ROC-AUC	Status
GCN (scaffold split)	0.8300	Best in notebook run
GINE (scaffold split)	0.8076	Strong baseline
GATv2 (scaffold split)	0.7982	Competitive
Random split reference	0.7928	Notebook comparison value
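For readers unfamiliar with the metric, macro ROC-AUC averages the per-task ROC-AUC over the 12 assays. The sketch below is a dependency-free illustration, not the notebook's evaluation code: it assumes missing labels are encoded as `None` and are skipped per task, and that tasks with only one class present are left out of the average. In practice, scikit-learn's `roc_auc_score` would normally compute the per-task value.

```python
def roc_auc(labels, scores):
    """Pairwise ROC-AUC: chance that a positive outscores a negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        return None  # undefined when only one class is present
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


def macro_roc_auc(label_rows, score_rows):
    """Mean of per-task ROC-AUC, ignoring labels that are None."""
    num_tasks = len(label_rows[0])
    per_task = []
    for t in range(num_tasks):
        pairs = [(y[t], s[t]) for y, s in zip(label_rows, score_rows)
                 if y[t] is not None]
        auc = roc_auc([y for y, _ in pairs], [s for _, s in pairs])
        if auc is not None:
            per_task.append(auc)
    return sum(per_task) / len(per_task) if per_task else None
```

The pairwise loop is quadratic in the number of molecules, which is fine for illustration but slower than a rank-based implementation on full test sets.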

Production Inference Stack

Backend endpoint behavior on this website:

1. Loads pretrained checkpoints for GINE, GCN, and GATv2 at startup.

2. Rebuilds graph features from SMILES with RDKit + PyG Data objects.

3. Applies training-aligned normalization and sigmoid-based endpoint probabilities.

4. Generates 2D image and 3D SDF for interactive molecular visualization.

5. Stores latest prediction rows for backend CSV export.
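Step 3 above, together with the stability controls used in training, can be sketched in NumPy. This is a minimal illustration under assumptions: the function names are hypothetical, the training-set mean and standard deviation are taken as given arrays, and the 0.5 decision threshold for turning probabilities into labels is assumed rather than taken from the source.

```python
import numpy as np


def normalize_features(x, mean, std, clamp=10.0):
    """Standardize with training-set statistics, then sanitize and clamp."""
    x = np.asarray(x, dtype=np.float64)
    mean = np.asarray(mean, dtype=np.float64)
    std = np.asarray(std, dtype=np.float64)
    safe_std = np.where(std > 0, std, 1.0)  # avoid division by zero
    z = (x - mean) / safe_std
    # Replace NaN with 0 and infinities with the clamp bounds.
    z = np.nan_to_num(z, nan=0.0, posinf=clamp, neginf=-clamp)
    return np.clip(z, -clamp, clamp)


def logits_to_predictions(logits, threshold=0.5):
    """Map per-endpoint logits to probabilities and binary labels."""
    logits = np.asarray(logits, dtype=np.float64)
    # Clipping the logits avoids overflow warnings in exp for extreme values.
    probs = 1.0 / (1.0 + np.exp(-np.clip(logits, -50.0, 50.0)))
    labels = (probs >= threshold).astype(int)  # threshold is an assumption
    return probs, labels
```

Reusing the statistics computed on the training set, rather than recomputing them per request, is what keeps inference inputs on the same scale the checkpoints were trained on.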

Team Members

Team members who contributed to this project.

Saptarshi Ghosh

Team Lead & Research Coordinator

Led project direction, coordinated research milestones, and aligned model experimentation with final deployment goals.

Sumit Chaira

UI/Deployment & Visualization Developer

Built and refined the web interface, deployment flow, and 2D/3D molecular visualization experience for end users.

Mangaldip Dhua

Data Engineer & Preprocessing Specialist

Handled Tox21 data preparation, scaffold split strategy, normalization steps, and preprocessing robustness checks.

Uday Shankar Dey

GNN Model Developer

Developed and trained GINE, GCN, and GATv2 model architectures for multi-task toxicity prediction.

Arnab Subhra Ghosh

Model Evaluation & Optimization Engineer

Evaluated task-wise and macro ROC-AUC metrics, compared split/model performance, and tuned optimization settings.