An n-gram Based Approach to the Classification of Web Pages by Genre

From Anita Borg Institute Wiki

Presenter: Jane E. Mason (Dalhousie University)

The extraordinary growth in both the size and the popularity of the World Wide Web has sparked interest not only in identifying Web page genres but also in using those genres to classify Web pages. This thesis hypothesizes that an n-gram representation of a Web page can be used to classify that page automatically by genre, and it presents a new model for the classification of Web pages.
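The abstract does not describe the model itself. To show roughly what an "n-gram representation" of a page can look like, here is a minimal Python sketch. It assumes character trigrams, a top-k relative-frequency profile per genre, and a nearest-profile rule using a CNG-style dissimilarity. The function names, the defaults `n=3` and `top_k=500`, and the distance measure are all illustrative assumptions, not the model presented in the thesis.

```python
from collections import Counter

# Hypothetical illustration only: a generic character n-gram profile
# classifier, not the model from the thesis.


def ngram_profile(text: str, n: int = 3, top_k: int = 500) -> dict[str, float]:
    """Return the top_k most frequent character n-grams of text,
    mapped to their relative frequencies among all n-grams in text."""
    text = " ".join(text.lower().split())  # lowercase, collapse whitespace
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values()) or 1  # avoid division by zero on short text
    return {g: c / total for g, c in grams.most_common(top_k)}


def profile_distance(a: dict[str, float], b: dict[str, float]) -> float:
    """Relative-difference dissimilarity over the union of both profiles
    (in the spirit of the common n-gram, or CNG, distance)."""
    dist = 0.0
    for g in set(a) | set(b):
        fa, fb = a.get(g, 0.0), b.get(g, 0.0)
        # fa + fb > 0 because g appears in at least one profile
        dist += ((fa - fb) / ((fa + fb) / 2)) ** 2
    return dist


def train(examples: dict[str, list[str]], n: int = 3,
          top_k: int = 500) -> dict[str, dict[str, float]]:
    """Build one profile per genre from that genre's concatenated pages."""
    return {genre: ngram_profile(" ".join(pages), n, top_k)
            for genre, pages in examples.items()}


def classify(page: str, profiles: dict[str, dict[str, float]],
             n: int = 3, top_k: int = 500) -> str:
    """Assign page to the genre whose profile is closest to its own."""
    target = ngram_profile(page, n, top_k)
    return min(profiles, key=lambda genre: profile_distance(target, profiles[genre]))


if __name__ == "__main__":
    # Toy training data, invented purely for demonstration.
    examples = {
        "faq": ["Q: How do I reset my password? A: Click the reset link.",
                "Q: Can I change my username? A: Yes, in settings."],
        "shop": ["Add to cart. Price: $19.99. Free shipping on orders over $50.",
                 "Buy now! Only $5. Add to cart and check out."],
    }
    profiles = train(examples)
    print(classify("Q: Where is my order? A: Check your email.", profiles))
```

Character n-grams are used here because they need no tokenizer and tolerate markup noise reasonably well; a real system for Web pages would also have to decide whether to keep or strip HTML, and how to choose `n` and `top_k`.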

Valerie Fenwick wrote the blog post for this session.