PDC Spring 2026 — Semester Project

Parallel Detection
of Malicious Activity

MPI-powered log analysis system detecting Backdoors, DoS & Reconnaissance attacks across 2.5M+ network records using distributed parallel computing.

257,673 Records Analyzed
32,669 Attacks Detected
6 MPI Processes
Checksum Verified: OK
Question 1

Parallel Malicious Activity Detection

Q1: Complete
Detect Backdoor, DoS & Reconnaissance
Scan the UNSW-NB15 training set (82,332 records) for Backdoor, DoS, and Reconnaissance attack patterns. Rank 0 reads the full dataset and distributes chunks to each MPI process via point-to-point send/receive. Each process scans its chunk independently, then MPI_Reduce aggregates the global totals to rank 0.
MPI_Bcast MPI_Send MPI_Recv MPI_Reduce MPI_Barrier
→ Assigned to Lubna
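
/// Sketch: Q1 pattern (illustrative)

A minimal sketch of the Q1 flow under two simplifying assumptions not taken from the project code: fixed 512-byte record slots and naive substring matching on the attack label. Rank 0 reads everything, ships chunks with MPI_Send/MPI_Recv, each rank scans locally, and MPI_Reduce sums the counts.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LINE 512   /* fixed-width record slot -- an assumption, not the real layout */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, np;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &np);

    long n = 0;
    char *all = NULL;
    if (rank == 0) {
        /* Rank 0 reads the full dataset into fixed-width slots. */
        FILE *f = fopen("training-set.csv", "r");
        if (!f) MPI_Abort(MPI_COMM_WORLD, 1);
        size_t cap = 1 << 17;
        all = malloc(cap * LINE);
        while (fgets(all + n * LINE, LINE, f))
            if ((size_t)++n == cap) { cap *= 2; all = realloc(all, cap * LINE); }
        fclose(f);
    }

    /* Everyone learns the record count; rank 0 then ships each rank its chunk. */
    MPI_Bcast(&n, 1, MPI_LONG, 0, MPI_COMM_WORLD);
    long base = n / np, rem = n % np;
    long mine = base + (rank < rem ? 1 : 0);
    char *chunk = malloc(mine * LINE);
    if (rank == 0) {
        memcpy(chunk, all, mine * LINE);          /* rank 0 keeps the first chunk */
        long off = mine;
        for (int r = 1; r < np; r++) {
            long cnt = base + (r < rem ? 1 : 0);
            MPI_Send(all + off * LINE, (int)(cnt * LINE), MPI_CHAR, r, 0,
                     MPI_COMM_WORLD);
            off += cnt;
        }
        free(all);
    } else {
        MPI_Recv(chunk, (int)(mine * LINE), MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }

    /* Local scan: naive substring match on the attack label (simplified). */
    long local[3] = {0, 0, 0}, global[3];
    for (long i = 0; i < mine; i++) {
        const char *rec = chunk + i * LINE;
        if      (strstr(rec, "Backdoor"))       local[0]++;
        else if (strstr(rec, "DoS"))            local[1]++;
        else if (strstr(rec, "Reconnaissance")) local[2]++;
    }

    /* Aggregate the per-rank counts on rank 0. */
    MPI_Reduce(local, global, 3, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
    MPI_Barrier(MPI_COMM_WORLD);
    if (rank == 0)
        printf("Backdoor=%ld DoS=%ld Reconnaissance=%ld\n",
               global[0], global[1], global[2]);

    free(chunk);
    MPI_Finalize();
    return 0;
}
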
/// Results — training-set.csv
82,332 records
Backdoor 583
DoS 4,089
Reconnaissance 3,496

8,168 total malicious records — 9.9% of training set.

Rank 1 reports 0 detections: its slice (rows ~20k–41k) falls in a stretch that is entirely Normal/Generic in the dataset's natural ordering. Global totals were verified against a Python cross-check.

Question 2

Parallel Statistical Analysis

Q2: Complete
Distributed Attack Detection & IP Cross-Checking
Each of the 4 MPI processes reads its own UNSW-NB15 CSV file and tracks suspicious IPs locally. MPI_Scatter distributes work counts, MPI_Reduce aggregates global statistics, MPI_Allreduce shares attack flags, MPI_Gatherv collects per-process IP lists, and MPI_Bcast broadcasts the final deduplicated suspicious-IP list to all processes.
MPI_Scatter MPI_Reduce MPI_Allreduce MPI_Gather MPI_Gatherv MPI_Bcast
→ Assigned to Insharah
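
/// Sketch: Q2 collective pattern (illustrative)

A minimal sketch of the collective choreography only: the per-file scan is stubbed out with fake data, IPs are fixed 16-byte strings, and the work-count MPI_Scatter and rank-0 deduplication are elided for brevity. None of the names here are the project's actual functions or buffers.

#include <mpi.h>
#include <stdio.h>

#define IPLEN 16   /* fixed-width IP string slot -- an assumption */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, np;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &np);

    /* Stand-in for each rank's scan of its own CSV segment. */
    char local_ips[4][IPLEN];
    int local_n = 1 + rank % 2;                 /* pretend 1-2 flagged IPs */
    for (int i = 0; i < local_n; i++)
        snprintf(local_ips[i], IPLEN, "59.166.0.%d", rank * 4 + i);

    /* Every rank learns whether ANY rank saw an attack. */
    int local_flag = local_n > 0, global_flag;
    MPI_Allreduce(&local_flag, &global_flag, 1, MPI_INT, MPI_LOR,
                  MPI_COMM_WORLD);

    /* Gatherv needs per-rank byte counts and displacements on the root. */
    int counts[np], displs[np], bytes = local_n * IPLEN, total = 0;
    MPI_Gather(&bytes, 1, MPI_INT, counts, 1, MPI_INT, 0, MPI_COMM_WORLD);
    if (rank == 0)
        for (int r = 0; r < np; r++) { displs[r] = total; total += counts[r]; }
    MPI_Bcast(&total, 1, MPI_INT, 0, MPI_COMM_WORLD);

    /* Collect the variable-length IP lists on rank 0, then broadcast the
     * merged list back to everyone (dedup on rank 0 is elided here). */
    char merged[total > 0 ? total : 1];
    MPI_Gatherv(local_ips, bytes, MPI_CHAR,
                merged, counts, displs, MPI_CHAR, 0, MPI_COMM_WORLD);
    MPI_Bcast(merged, total, MPI_CHAR, 0, MPI_COMM_WORLD);

    if (rank == 0 && global_flag)
        printf("ATTACK DETECTED: %d suspicious-IP entries gathered\n",
               total / IPLEN);
    MPI_Finalize();
    return 0;
}
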
/// Run Results (np=4, UNSW-NB15)
ATTACK DETECTED
33 Unique Suspicious IPs
116,922 Failed Logins
46,153 Port Scans
84,910 Connections
→ Validation: PASSED — all 4 processes handled distinct log segments
→ Verdict: Potential DDoS / Port-Scanning Attack Detected
→ Top flagged: 59.166.0.x subnet (~11,700 failed logins each), 175.45.176.x (high connections)
Question 3

Serial vs. Parallel.
Where does MPI actually help?

Q3: Complete
Performance Analysis & Benchmarking
Benchmark serial vs. parallel execution on the full UNSW-NB15 combined dataset (257,673 records). Measure speedup, efficiency, and communication overhead as MPI process count scales from 1 to 6. Use MPI_Wtime at each phase to identify bottlenecks — specifically whether MPI_Scatter or computation dominates total time.
MPI_Scatter MPI_Gather MPI_Reduce MPI_Allreduce MPI_Bcast MPI_Wtime
→ Assigned to Haseeb
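
/// Sketch: phase timing with MPI_Wtime (illustrative)

A minimal sketch of the timing harness the description implies: bracket each phase with MPI_Wtime, with barriers so per-phase timestamps are comparable across ranks. The phase bodies are placeholders, not the project's code.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    /* ... MPI_Scatter of record chunks goes here ... */
    MPI_Barrier(MPI_COMM_WORLD);
    double t1 = MPI_Wtime();
    /* ... local scan over this rank's chunk goes here ... */
    MPI_Barrier(MPI_COMM_WORLD);
    double t2 = MPI_Wtime();
    /* ... MPI_Reduce / MPI_Gather of results goes here ... */
    MPI_Barrier(MPI_COMM_WORLD);
    double t3 = MPI_Wtime();

    if (rank == 0)
        printf("scatter=%.6fs compute=%.6fs reduce=%.6fs total=%.6fs\n",
               t1 - t0, t2 - t1, t3 - t2, t3 - t0);
    MPI_Finalize();
    return 0;
}
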
/// Key Finding

Communication-bound workload. MPI_Scatter moves ~125MB (257K lines × 512 bytes) from rank 0 out across the processes, dominating total parallel time.

Computation scales linearly. Local compute drops from 50ms (np=1) to 8.9ms (np=6) — near-perfect computational parallelism.

Amdahl's Law in action. Serial bottleneck (file I/O + scatter) limits theoretical max speedup regardless of process count. Optimization path: MPI-IO parallel file reads.

• Backdoor (Critical, 583): unauthorized access attempts detected
• Denial of Service (High, 4,089): DoS attack patterns identified
• Reconnaissance (Medium, 3,496): scanning & probing activities
/// Benchmark Results — Speedup vs Processes
82,332 records
Processes (np)   T_serial (s)   T_parallel (s)   Speedup   Efficiency   Comm Overhead   Compute Time
1                0.016969       0.044402         0.38x     38.2%         63.8%          0.016030s
2                0.033313       0.068503         0.49x     24.3%        127.0%          0.015799s
3                0.020483       0.062837         0.33x     10.9%        142.7%          0.015415s
4                0.033230       0.044674         0.74x     18.6%        181.4%          0.013329s
5                0.036810       0.044293         0.83x     16.6%        224.4%          0.005903s
6                0.034533       0.044630         0.77x     12.9%        257.8%          0.002977s
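
For reference, the derived columns follow the standard definitions

    S(n) = T_serial / T_parallel(n),    E(n) = S(n) / n

e.g. np=1: 0.016969 / 0.044402 = 0.38x, and 0.38 / 1 = 38.2%. The comm-overhead column appears to aggregate communication time across ranks relative to parallel wall time, which is why it can exceed 100%: the np=4 breakdown below gives (65.6 + 4.0 + 11.3) / 44.7 ≈ 181%.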
Execution Time Comparison (parallel wall time; serial baseline in the table above): np=1 0.044s · np=2 0.069s · np=3 0.063s · np=4 0.045s · np=5 0.044s · np=6 0.045s
MPI Communication Breakdown (np=4): Scatter 65.6ms · Reduce 4.0ms · Allreduce 11.3ms · Gather n/a · Bcast n/a · Compute 13.3ms
Efficiency 17% (np=5)
Best Speedup 0.83x (np=5)
Comm Overhead 224% (np=5)
Key Findings

Communication-bound workload: MPI_Scatter dominates (~70% of parallel time) because 82K lines × 512 bytes = ~40MB must be shipped from rank 0 across the processes.

Computation scales well: Local compute time drops linearly from 16ms (np=1) → 3.0ms (np=6), showing near-perfect computational parallelism.

Amdahl's Law in action: The serial fraction (file I/O + scatter) limits theoretical max speedup regardless of process count.
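
As a rough worked example, take the np=1 row of the table above: compute is ~0.016s of 0.044s parallel time, so the parallelizable fraction is p ≈ 0.36 and

    S(n) = 1 / ((1 - p) + p/n),    S_max = 1 / (1 - p) ≈ 1 / 0.64 ≈ 1.6x

meaning even perfect compute scaling caps speedup near 1.6x until the I/O-plus-scatter fraction shrinks.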

Optimization path: Using MPI file I/O (MPI_File_read) instead of rank-0-reads-then-scatters would significantly reduce overhead.
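
/// Sketch: MPI-IO alternative (illustrative)

A minimal sketch of that optimization, again assuming fixed 512-byte records so byte offsets map directly to record indices. Each rank opens the file collectively and reads only its own byte range with MPI_File_read_at_all, removing the rank-0 read-then-scatter step entirely.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define LINE 512   /* fixed-width record slot -- an assumption */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, np;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &np);

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "training-set.csv",
                  MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);

    MPI_Offset size;
    MPI_File_get_size(fh, &size);
    long nrec = size / LINE;                  /* fixed-width assumption */
    long base = nrec / np, rem = nrec % np;
    long mine  = base + (rank < rem ? 1 : 0);
    long first = rank * base + (rank < rem ? rank : rem);

    char *buf = malloc(mine * LINE);
    /* Collective read: all ranks hit the file at once, each at its own offset. */
    MPI_File_read_at_all(fh, (MPI_Offset)first * LINE, buf,
                         (int)(mine * LINE), MPI_CHAR, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    printf("rank %d read records [%ld, %ld)\n", rank, first, first + mine);
    free(buf);
    MPI_Finalize();
    return 0;
}
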

The Dataset

UNSW-NB15

257,673 Combined Records
82,332 Training Records
49 Features per Record
10 Categories (9 attack + Normal)
Attack Categories — Training Set
Normal 37,000
Generic 18,871
Exploits 11,132
Fuzzers 6,062
DoS 4,089
Reconnaissance 3,496
Analysis 677
Backdoor 583
Shellcode 378
Worms 44
Attack Distribution — Combined Set
• Normal 93,000
• Generic 58,871
• Exploits 44,525
• Fuzzers 24,246
• DoS 16,353
• Reconnaissance 13,987
• Analysis 2,677
• Backdoor 2,329
• Shellcode 1,511
• Worms 174
Dataset Files
File              Rows   Size
UNSW-NB15_1.csv   700K   162MB
UNSW-NB15_2.csv   700K   158MB
UNSW-NB15_3.csv   700K   148MB
UNSW-NB15_4.csv   440K   94MB
Training set      82K    15MB
Testing set       175K   31MB
VM Specs
IP      139.84.171.89
CPU     6 vCPU
RAM     16 GB
Disk    300 GB SSD
OS      Ubuntu 22.04
Region  Delhi
MPI     OpenMPI 4.1.2
The Team

Built by three.

Lubna
Q1 — Parallel Detection
Parallel malicious activity detection using MPI_Send/Recv distribution and MPI_Reduce aggregation on 82,332 records.
Complete
Insharah
Q2 — Statistical Analysis
Distributed attack detection and suspicious-IP cross-checking using MPI_Scatter, MPI_Allreduce, MPI_Gatherv, and MPI_Bcast across 4 processes.
Complete
Haseeb
Q3 — Performance Analysis
Serial vs. parallel benchmarking, speedup metrics, communication overhead analysis, dashboard and VM infrastructure.
Complete
Live Terminal

Run it yourself.

/// Live Execution — haseeb@pdc-project:~
pdc@project $ echo "Ready. Click a command above to execute."
Ready. Click a command above to execute.
Commands execute on the live VM (139.84.171.89); results appear here in real time.
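
For an offline run outside the dashboard, the invocation has this shape (the binary name ./q1_detect is hypothetical):

mpirun -np 6 ./q1_detect training-set.csv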