Table 1 PDF structural path consolidation rules

From: Hidost: a static machine-learning-based detector of malicious files

 

No.	Search regular expression	Substitute regular expression
1.	/Resources/(ExtGState|ColorSpace|Pattern|Shading|XObject|Font|Properties|Para)/[^/]+	/Resources/\1/Name
2.	^Pages/(Kids/|Parent/)*(Kids$|Kids/|Parent/|Parent$)	Pages/
3.	/(Kids/|Parent/)*(Kids$|Kids/|Parent/|Parent$)	/
4.	(Prev/|Next/|First/|Last/)+	<empty string>
5.	^Names/(Dests|AP|JavaScript|Pages|Templates|IDS|URLS|EmbeddedFiles|AlternatePresentations|Renditions)/(Kids/|Parent/)*Names	Names/\1/Names
6.	^StructTreeRoot/IDTree/(Kids/)*Names	StructTreeRoot/IDTree/Names
7.	^(StructTreeRoot/ParentTree|PageLabels)/(Kids/|Parent/)+(Nums|Limits)	\1/\3
8.	^StructTreeRoot/ParentTree/Nums/(K/|P/)+	StructTreeRoot/ParentTree/Nums/
9.	^(StructTreeRoot|Outlines/SE)/(RoleMap|ClassMap)/[^/]+	\1/\2/Name
10.	^(StructTreeRoot|Outlines/SE)/(K/|P/)*	\1/
11.	^(Extensions|Dests)/[^/]+	\1/Name
12.	Font/([^/]+)/CharProcs/[^/]+	Font/\1/CharProcs/Name
13.	^(AcroForm/(Fields/|C0/)?DR/)(ExtGState|ColorSpace|Pattern|Shading|XObject|Font|Properties)/[^/]+	\1\3/Name
14.	/AP/(D|N)/[^/]+	/AP/\1/Name
15.	Threads/F/(V/|N/)*	Threads/F
16.	^(StructTreeRoot|Outlines/SE)/Info/[^/]+	\1/Info/Name
17.	ColorSpace/([^/]+)/Colorants/[^/]+	ColorSpace/\1/Colorants/Name
18.	ColorSpace/Colorants/[^/]+	ColorSpace/Colorants/Name
19.	Collection/Schema/[^/]+	Collection/Schema/Name
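
To illustrate how rules of this kind act on a structural path, the following is a minimal Python sketch, not the Hidost implementation. It transcribes only rules 1, 2, 3, 4 and 14 from Table 1, reading the leading anchor as ^ and the group numbers in the substitutes as backreferences (\1). Several things are assumptions made for illustration: paths are taken to be "/"-separated strings with no leading slash, each rule is applied once in table order, every match of a rule is replaced, and the function name consolidate and the example paths are hypothetical.

```python
import re

# Rules 1, 2, 3, 4 and 14 from Table 1 as (search, substitute) pairs,
# kept in table order. The remaining rules are omitted to keep this short.
RULES = [
    (r"/Resources/(ExtGState|ColorSpace|Pattern|Shading|XObject|Font|Properties|Para)/[^/]+",
     r"/Resources/\1/Name"),
    (r"^Pages/(Kids/|Parent/)*(Kids$|Kids/|Parent/|Parent$)", r"Pages/"),
    (r"/(Kids/|Parent/)*(Kids$|Kids/|Parent/|Parent$)", r"/"),
    (r"(Prev/|Next/|First/|Last/)+", r""),
    (r"/AP/(D|N)/[^/]+", r"/AP/\1/Name"),
]
COMPILED = [(re.compile(search), substitute) for search, substitute in RULES]


def consolidate(path: str) -> str:
    """Apply every rule once, in order, replacing all of its matches.

    Assumption: a single sequential pass; the table itself does not say
    how many passes are made or whether only the first match is replaced.
    """
    for pattern, substitute in COMPILED:
        path = pattern.sub(substitute, path)
    return path


if __name__ == "__main__":
    # Hypothetical example paths, chosen only to exercise the rules above.
    examples = [
        "Pages/Kids/Kids/Kids/Resources/Font/F1/BaseFont",
        "Outlines/First/Next/Next/Title",
        "Pages/Kids/Annots/AP/N/On/Length",
    ]
    for example in examples:
        print(example, "->", consolidate(example))
```

Under these assumptions, paths that differ only in page-tree depth, in position within a linked list, or in a user-chosen resource name collapse to the same consolidated path. For instance, the first example becomes Pages/Resources/Font/Name/BaseFont regardless of how many Kids levels precede the Resources entry.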