An n-gram Based Approach to the Classification of Web Pages by Genre
Presenter: Jane E. Mason (Dalhousie University)
The extraordinary growth in both the size and popularity of the World Wide Web has generated interest not only in identifying Web page genres, but also in using those genres to classify Web pages. This thesis hypothesizes that an n-gram representation of a Web page can be used to classify that page by genre automatically, and it presents a new model for the classification of Web pages.
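To make the idea of an n-gram representation concrete, here is a minimal sketch in Python. It is not the model from the thesis. The abstract does not say whether the n-grams are characters, bytes, or words, nor how pages are compared. This sketch therefore assumes character n-grams, a profile made of the most frequent n-grams with their relative frequencies, and a simple squared-difference distance to the nearest genre profile. The function names, the toy training text, and the genre labels are all hypothetical.

```python
from collections import Counter


def ngram_profile(text, n=3, top_k=50):
    """Map the top_k most frequent character n-grams to relative frequencies.

    Assumption: character n-grams; the source does not specify the unit.
    """
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.most_common(top_k)}


def profile_distance(p, q):
    """Sum of squared frequency differences over the union of n-grams.

    An n-gram missing from a profile counts as frequency 0.
    Assumption: this distance is illustrative, not taken from the thesis.
    """
    keys = set(p) | set(q)
    return sum((p.get(k, 0.0) - q.get(k, 0.0)) ** 2 for k in keys)


def classify(page_text, genre_profiles, n=3, top_k=50):
    """Return the genre whose profile is closest to the page's profile."""
    page = ngram_profile(page_text, n, top_k)
    return min(genre_profiles,
               key=lambda g: profile_distance(page, genre_profiles[g]))


# Hypothetical toy training text, one short sample per genre.
training = {
    "faq": "Q: How do I reset my password? A: Click the reset link. "
           "Q: How do I contact support? A: Email us.",
    "shop": "Add to cart. Price: $19.99. Buy now with free shipping. "
            "Add to cart. Price: $5.00.",
}
genre_profiles = {genre: ngram_profile(text) for genre, text in training.items()}

if __name__ == "__main__":
    print(classify("Q: How do I change my email? A: Click settings.",
                   genre_profiles))
```

A real system would build each genre profile from many labelled pages and would need to decide how to handle HTML markup; both choices are outside what the abstract states.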
Valerie Fenwick wrote the blog post for this session.