Web Content
The Windows Runtime makes it easy to download and process web content. To access web pages, you use the HttpClient class, which is similar to the WebClient class that Silverlight developers may be familiar with. HttpClient sends requests and receives responses over HTTP, and it supports any standard request method, including GET, PUT, POST, and DELETE. A request returns an instance of HttpResponseMessage that carries the status code and headers of the response. If the operation was successful, the Content property of that message holds the contents of the web page that was retrieved.
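As a rough illustration of the request and response cycle described above, the following sketch sends a GET request, checks the status code, and reads the content. The class name PageFetcher and the method name FetchPageAsync are hypothetical and are not part of the sample application.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class PageFetcher
{
    // Hypothetical helper: returns the page text, or null when the
    // server answers with a non-success status code.
    public static async Task<string> FetchPageAsync(HttpClient client, Uri pageUri)
    {
        // Send a GET request and wait for the response message.
        HttpResponseMessage response = await client.GetAsync(pageUri);

        // StatusCode and Headers are available on the response;
        // IsSuccessStatusCode is true for codes in the 200-299 range.
        if (!response.IsSuccessStatusCode)
        {
            return null;
        }

        // Content holds the body of the page that was retrieved.
        return await response.Content.ReadAsStringAsync();
    }
}
```

Returning null on failure is only one possible design; calling response.EnsureSuccessStatusCode() instead would raise an exception for a failed request.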
The BlogDataSource class contains a helper method that provides an instance of HttpClient. The method sets a buffer size to allow for large pages to be loaded and provides a user agent for the request to use. User agents are most often used to identify the browser making the web request. In the case of programmatic access, you can pass an agent that provides information about the application and expected compatibility. Passing an agent that is compatible with mobile devices may result in the web server returning a page that is optimized for mobile browsing.
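A helper along these lines might look like the following minimal sketch. The buffer size and the user agent string are assumptions chosen for illustration, and the class name BlogClientFactory is hypothetical; the values used by the actual BlogDataSource class may differ.

```csharp
using System.Net.Http;

public static class BlogClientFactory
{
    // Assumed value: allow responses of up to 1 MB to be buffered.
    private const long MaxBufferSize = 1024 * 1024;

    // Assumed value: a desktop Internet Explorer 10 style user agent.
    private const string UserAgent =
        "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; WOW64; Trident/6.0)";

    public static HttpClient GetClient()
    {
        // Raise the response buffer limit so large pages can be loaded.
        var client = new HttpClient { MaxResponseContentBufferSize = MaxBufferSize };

        // Send the user agent with every request made by this client.
        client.DefaultRequestHeaders.Add("User-Agent", UserAgent);
        return client;
    }
}
```

Swapping the assumed user agent for one that identifies a mobile browser is how a mobile-optimized page could be requested, provided the server honors it.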
Fetching a page asynchronously and processing the results takes very little code. The following two lines obtain the client and retrieve the page:
var client = GetClient();
var page = await client.GetStringAsync(item.PageUri);
Images are not always embedded within the RSS feed, so the code retrieves the target page for the entry and then parses it for images. This is done using regular expressions. The syntax for a regular expression provides a concise way to match patterns in strings of text. This makes it ideal for parsing tokens like HTML tags out of the source document.
The first expression parses all image tags from the source for the web page:
public const string IMAGE_TAG = @"<(img)\b[^>]*>";
private static readonly Regex Tags = new Regex(IMAGE_TAG, RegexOptions.IgnoreCase | RegexOptions.Multiline);
var matches = Tags.Matches(content);
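To see what this expression matches, the sketch below wraps it in a small method and collects each matched tag. The class name ImageTagFinder, the method name FindImageTags, and the sample markup in the usage note are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ImageTagFinder
{
    public const string IMAGE_TAG = @"<(img)\b[^>]*>";

    private static readonly Regex Tags =
        new Regex(IMAGE_TAG, RegexOptions.IgnoreCase | RegexOptions.Multiline);

    // Hypothetical helper: returns the full text of every img tag found.
    public static List<string> FindImageTags(string content)
    {
        var tags = new List<string>();
        foreach (Match match in Tags.Matches(content))
        {
            // match.Value is the whole tag, from "<img" through ">".
            tags.Add(match.Value);
        }
        return tags;
    }
}
```

Given the markup `<p>Hi</p><IMG src="a.png" alt="A"><imgx>`, the method returns a single entry, `<IMG src="a.png" alt="A">`. The IgnoreCase option lets the uppercase tag match, and the `\b` word boundary keeps `<imgx>` from being mistaken for an image tag.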
Each tag is then parsed to pull the location of the image from the src attribute. This is used to construct an instance of a Uri that is added to the ImageUriList property of the blog post. This property is implemented as an ObservableCollection to provide notification when new images are added. A random image is displayed for each post. The image is hosted on the Internet, but Windows 8 will use a cached copy of the image when the user is offline if it has been downloaded previously.
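The step that extracts the src attribute could be sketched as follows. The expression used here is an assumption, not the one from the sample application, and the names ImageSourceParser and AddImageUri are hypothetical. The sketch also resolves relative paths against the address of the page, which is one reasonable way to handle them.

```csharp
using System;
using System.Collections.ObjectModel;
using System.Text.RegularExpressions;

public static class ImageSourceParser
{
    // Assumed pattern: captures the quoted value of a src attribute.
    private static readonly Regex Source =
        new Regex(@"\bsrc\s*=\s*[""']([^""']+)[""']", RegexOptions.IgnoreCase);

    // Hypothetical helper: adds the image location found in imageTag,
    // if any, to the supplied collection.
    public static void AddImageUri(
        string imageTag, Uri pageUri, ObservableCollection<Uri> imageUriList)
    {
        Match match = Source.Match(imageTag);
        if (!match.Success)
        {
            return; // The tag has no quoted src attribute.
        }

        // Resolve the value against the page address so that relative
        // paths become absolute; absolute values are kept unchanged.
        Uri imageUri;
        if (Uri.TryCreate(pageUri, match.Groups[1].Value, out imageUri))
        {
            // Adding to the ObservableCollection raises CollectionChanged.
            imageUriList.Add(imageUri);
        }
    }
}
```

For a page at http://example.com/blog/post.html, the tag `<img src="images/a.png">` yields http://example.com/blog/images/a.png. A simple pattern like this one will not cope with unquoted attribute values or with every variation found in real-world HTML.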