Aug 31

Html Agility Pack

Posted In: HTML, LINQ, XML by DotNut

Html Agility Pack (HAP) is an open-source HTML parser for .NET that builds a read/write DOM and supports plain XPath or XSLT.  The library lets you parse HTML files as they are actually found "out on the web".  The parser is very tolerant of malformed "real world" HTML.  The object model is very similar to the one in System.Xml, but for HTML documents (or streams).
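To make the read/write DOM a little more concrete, here is a minimal sketch, assuming the HtmlAgilityPack assembly is referenced in your project. The class name DomSketch, the method TagFirstParagraph, the sample markup and the "intro" class value are all made up for illustration; only HtmlDocument, LoadHtml, SelectSingleNode, SetAttributeValue and OuterHtml come from the library.

```csharp
using System;
using HtmlAgilityPack;

public static class DomSketch
{
    // Hypothetical helper: parses (possibly malformed) HTML, tags the first
    // <p> element with class="intro", and returns the serialized document.
    public static string TagFirstParagraph(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // tolerant parse; unclosed tags are not fatal

        // SelectSingleNode returns null when nothing matches the XPath.
        HtmlNode firstParagraph = doc.DocumentNode.SelectSingleNode("//p");
        if (firstParagraph != null)
        {
            firstParagraph.SetAttributeValue("class", "intro");
        }

        // Serialize the modified DOM back to markup.
        return doc.DocumentNode.OuterHtml;
    }

    public static void Main()
    {
        // Illustrative input with unclosed <p> and <b> tags.
        string messy = "<html><body><p>First paragraph<p>Second <b>bold</body>";
        Console.WriteLine(TagFirstParagraph(messy));
    }
}
```

The exact markup written back depends on how the HAP version you use repairs the unclosed tags, so treat the output as something to inspect rather than rely on.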

Html Agility Pack now also supports LINQ to Objects (via a LINQ to XML-like interface).  Sample applications include:

  • Page fixing or generation.  You can fix a page the way you want: modify the DOM, add nodes, copy nodes, etc.
  • Web scanners.  You can easily get to img/src or a/href values with XPath queries.
  • Web scrapers.  You can easily scrape an existing web page into an RSS feed, for example, with just an XSLT file serving as the binding.
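The web scanner case from the list above could be sketched roughly like this, again assuming the HtmlAgilityPack assembly is referenced. LinkScanner, CollectByXPath, CollectImageSources and the sample markup are hypothetical names and data for illustration; SelectNodes, Descendants and GetAttributeValue are the library calls. The first helper uses an XPath query, the second uses the LINQ-style Descendants method.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class LinkScanner
{
    // Hypothetical helper: returns the given attribute from every node
    // matched by the XPath expression, e.g. "//a[@href]" and "href".
    public static List<string> CollectByXPath(string html, string xpath, string attribute)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) on no match.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null)
        {
            return new List<string>();
        }
        return nodes.Select(n => n.GetAttributeValue(attribute, string.Empty)).ToList();
    }

    // Same idea for img/src, using Descendants and LINQ to Objects.
    public static List<string> CollectImageSources(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc.DocumentNode
                  .Descendants("img")
                  .Select(img => img.GetAttributeValue("src", string.Empty))
                  .Where(src => src.Length > 0)
                  .ToList();
    }

    public static void Main()
    {
        // Illustrative markup; the last <a> and the <p> are never closed.
        string html = "<body><p>Unclosed paragraph <a href=\"/about\">About</a>"
                    + "<img src=\"logo.png\"><a href='http://example.com/'>Example";

        foreach (string href in CollectByXPath(html, "//a[@href]", "href"))
        {
            Console.WriteLine("link:  " + href);
        }
        foreach (string src in CollectImageSources(html))
        {
            Console.WriteLine("image: " + src);
        }
    }
}
```

The null check matters because SelectNodes returns null rather than an empty collection when the query matches nothing, which is an easy thing to trip over when scanning pages that happen to have no links.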

Html Agility Pack on CodePlex


Original content Copyright © 2008-2010 Tiwebb Ltd. All rights reserved.