Table 1 Adjusted Rand index for different clusterings, varying the number of input attributes considered (best-performing clusterings italicized)

From: Optimized combined-clustering methods for finding replicated criminal websites

| Scam websites | Dynamic cut height (Test) | Dynamic cut height (Train) | Optimized cut height (Test) | Optimized cut height (Train) |
|---|---|---|---|---|
| Fake escrow services | | | | |
| Sentences | 0.107 | 0.289 | 0.982 | 0.924 |
| DOM tags | 0.678 | 0.648 | 0.979 | 0.919 |
| File names | 0.094 | 0.235 | 0.972 | 0.869 |
| Images | 0.068 | 0.206 | 0.325 | 0.314 |
| S and D | 0.942 | 0.584 | 0.982 | 0.925 |
| S and F | 0.120 | 0.245 | 0.980 | 0.895 |
| S and I | 0.072 | 0.257 | 0.962 | 0.564 |
| D and F | 0.558 | 0.561 | 0.979 | 0.892 |
| D and I | 0.652 | 0.614 | 0.599 | 0.385 |
| F and I | 0.100 | 0.224 | 0.518 | 0.510 |
| S and D and F | 0.913 | 0.561 | 0.980 | 0.895 |
| S and D and I | 0.883 | 0.536 | 0.971 | 0.673 |
| S and F and I | 0.100 | 0.214 | 0.975 | 0.892 |
| D and F and I | 0.642 | 0.536 | 0.831 | 0.772 |
| S and D and F and I | 0.941 | 0.536 | 0.971 | 0.683 |
| High-yield investment programs | | | | |
| Sentences | 0.713 | 0.650 | 0.738 | 0.867 |
| DOM tags | 0.381 | 0.399 | 0.512 | 0.580 |
| File names | 0.261 | 0.299 | 0.254 | 0.337 |
| Images | 0.289 | 0.354 | 0.434 | 0.471 |
| S and D | 0.393 | 0.369 | 0.600 | 0.671 |
| S and F | 0.291 | 0.310 | 0.266 | 0.344 |
| S and I | 0.290 | 0.362 | 0.437 | 0.471 |
| D and F | 0.309 | 0.358 | 0.314 | 0.326 |
| D and I | 0.302 | 0.340 | 0.456 | 0.510 |
| F and I | 0.296 | 0.289 | 0.397 | 0.336 |
| S and D and F | 0.333 | 0.362 | 0.319 | 0.326 |
| S and D and I | 0.319 | 0.350 | 0.459 | 0.510 |
| S and F and I | 0.303 | 0.289 | 0.398 | 0.336 |
| D and F and I | 0.320 | 0.337 | 0.404 | 0.405 |
| S and D and F and I | 0.320 | 0.337 | 0.404 | 0.405 |
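
The scores in this table are adjusted Rand index values, which compare a candidate clustering against a reference labeling and correct for agreement expected by chance. As a minimal sketch of how such a score can be computed, the function below applies the standard pair-counting formula to two flat label lists. The function name `adjusted_rand_index` and the toy labels are illustrative assumptions, not the authors' code or data.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two flat clusterings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)

    # Contingency counts: how many items fall in each (true, predicted) pair.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of item pairs grouped together jointly, and in each clustering.
    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())
    total_pairs = comb(n, 2)

    if total_pairs == 0:
        # Fewer than two items: the clusterings agree trivially.
        return 1.0

    expected = sum_true * sum_pred / total_pairs
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        # Degenerate case: both clusterings are one big cluster or all singletons.
        return 1.0
    return (sum_pairs - expected) / (max_index - expected)


# Hypothetical toy labels: cluster names differ but the grouping is identical.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

A score of 1 means the two clusterings group the items identically, values near 0 indicate agreement no better than chance, and negative values indicate less agreement than chance.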