EmirXK/TR-PDF-PageClassifier

TR-PDF-PageClassifier

A deep learning model that identifies low-value pages in professional theses written in Turkish and distributed as PDF files.

Low-value pages are defined as the following: Table of Contents, Table of Figures, Table of Tables, References, and Appendices.

The model detects these pages and outputs a list containing the page number of every page it has labeled as low-value.
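As an illustration of that output format, here is a minimal sketch of turning per-page predictions into a list of page numbers. The label strings, the `LOW_VALUE_LABELS` set, and the `low_value_page_numbers` helper are hypothetical names chosen for this example; the notebook may represent predictions differently.

```python
from typing import List, Sequence

# Hypothetical label names, mirroring the low-value categories listed above.
LOW_VALUE_LABELS = {
    "table_of_contents",
    "table_of_figures",
    "table_of_tables",
    "references",
    "appendices",
}


def low_value_page_numbers(page_labels: Sequence[str], first_page: int = 1) -> List[int]:
    """Return the page numbers whose predicted label is a low-value category.

    page_labels holds one predicted label per page, in document order.
    Pages are numbered starting from first_page (1 by default).
    """
    return [
        first_page + index
        for index, label in enumerate(page_labels)
        if label in LOW_VALUE_LABELS
    ]


if __name__ == "__main__":
    predictions = ["body", "table_of_contents", "body", "references"]
    print(low_value_page_numbers(predictions))
```

Pages are numbered from 1 here to match how PDF viewers count them; pass `first_page=0` if zero-based indices are more convenient.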

Features

  • Reads input from CSV files, PDF files, and URLs
  • High accuracy (~98%)
  • Fast algorithm
  • Works with any type of Turkish-language text
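
One way to support the three input types listed above is to decide the source kind before loading anything. The sketch below uses only the Python standard library; the `detect_source_kind` function and its return values are assumptions made for illustration, not the notebook's actual interface.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def detect_source_kind(source: str) -> str:
    """Classify an input string as 'url', 'pdf', or 'csv' (hypothetical helper).

    HTTP(S) addresses are treated as URLs regardless of their file extension;
    everything else is classified by its extension, case-insensitively.
    """
    if urlparse(source).scheme in ("http", "https"):
        return "url"

    suffix = PurePosixPath(source).suffix.lower()
    if suffix == ".pdf":
        return "pdf"
    if suffix == ".csv":
        return "csv"

    raise ValueError(f"Unsupported input source: {source!r}")


if __name__ == "__main__":
    for candidate in ("thesis.pdf", "pages.csv", "https://example.com/thesis.pdf"):
        print(candidate, "->", detect_source_kind(candidate))
```

Each kind would then be handed to its own loader (for example, a PDF text extractor or a CSV reader) before the pages reach the model.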

Usage

Open the Last_version.ipynb notebook and run it.

Acknowledgements

Authors

License

MIT
