Vampire Bots: Quick Overview
As mentioned earlier, a vampire bot is a program that "drains" content from other web sites. Of course, we're speaking figuratively when we say drains, as no actual information is lost from the source site; rather, the vampire bot stores the source information on the destination computer. The program typically takes as input the address of a web page where the bot should start draining content and a folder where the bot should store the content.
Suppose you know of a web page with public domain images of planets; for example, at http://www.professorf.com/planets.html (see Figure 1).
Figure 1 The sample web page that our vampire bot will drain.
Now suppose that you want to store those images in a folder on your laptop named d:\botpics\ and that you have a vampire bot named vbot. Before executing vbot, the botpics folder is empty (see Figure 2).
Figure 2 Folder before executing the vampire bot.
To run the simple vampire bot, you open a DOS window, execute vbot, and enter the web page address and folder location (see Figure 3).
Figure 3 Executing the vampire bot.
After the vampire bot finishes executing, it fills the folder with the images found on the web page, as shown in Figure 4.
Figure 4 Folder after executing the vampire bot.
Cutting right to the chase, Listing 1 shows the code for the simple vampire bot.
Listing 1 Vampire Bot Archetype
using System;
using System.Net;
using System.IO;
using System.Collections;

class vampirebot
{
    string base_url, folder;

    // Keep everything up to the last slash as the base for relative image paths.
    vampirebot(string url, string dir)
    {
        int slash_loc;
        slash_loc = url.LastIndexOf("/");
        base_url = url.Substring(0, slash_loc+1);
        folder = dir;
    }

    // Download the page at URL and return its contents as a string.
    public string URLtoRawHTML(string URL)
    {
        WebRequest req;
        WebResponse res;
        Stream str;
        string RawHTML;
        int ch;
        req = WebRequest.Create(URL);
        res = req.GetResponse();
        str = res.GetResponseStream();
        RawHTML = "";
        while ((ch=str.ReadByte())!=-1)
            RawHTML=RawHTML+Convert.ToChar(ch);
        str.Close();
        res.Close();
        return RawHTML;
    }

    // Collect every quoted value that contains ".gif".
    public ArrayList RawHTMLtoImageList(string raw_html)
    {
        string patt, spat, epat;
        int ploc, sloc, eloc;
        string file;
        ArrayList list;
        patt=".gif";
        spat="\"";
        epat="\"";
        list = new ArrayList();
        ploc=raw_html.IndexOf(patt, 0);
        while (ploc>=0)
        {
            sloc=raw_html.LastIndexOf(spat, ploc)+1;   // first character after the opening quote
            eloc=raw_html.IndexOf(epat, sloc)-1;       // last character before the closing quote
            file=raw_html.Substring(sloc, eloc-sloc+1);
            ploc=raw_html.IndexOf(patt, eloc);
            list.Add(file);
        }
        return list;
    }

    // Download each listed image, one byte at a time, into the destination folder.
    public void ImageListtoFiles(ArrayList file_list)
    {
        int i;
        string filename;
        FileStream fs;
        WebRequest req;
        WebResponse res;
        Stream str;
        int ch;
        for (i=0; i < file_list.Count; i++)
        {
            filename=Convert.ToString(file_list[i]);
            filename=filename.Replace("/", "_");   // flatten subfolders into the file name
            filename=folder+"/"+filename;
            fs=new FileStream(filename, FileMode.Create);
            req = WebRequest.Create(base_url+file_list[i]);
            res = req.GetResponse();
            str = res.GetResponseStream();
            while ((ch=str.ReadByte())!=-1)
                fs.WriteByte(Convert.ToByte(ch));
            str.Close();
            res.Close();
            fs.Close();
        }
    }

    public static void Main()
    {
        string url, dir;
        vampirebot vbot;
        string rawHTML;
        ArrayList alist;
        Console.Write("Enter starting URL: ");
        url=Console.ReadLine();
        Console.Write("Destination folder? ");
        dir=Console.ReadLine();
        vbot = new vampirebot(url,dir);
        rawHTML = vbot.URLtoRawHTML(url);
        alist = vbot.RawHTMLtoImageList(rawHTML);
        vbot.ImageListtoFiles(alist);
    }
}
Remember that this code is just an archetype; you have to modify it for your specific task, which requires an understanding of how the code works. The next section examines the design of a vampire bot from a code-improvisation perspective. You'll find that the code contains many useful motifs (for string manipulation, file handling, and web streaming, to name a few) that you can use in a wide variety of other programs.
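As one illustration of the string-manipulation motif, the following minimal sketch isolates the quote-scanning idea from RawHTMLtoImageList and runs it against a hard-coded HTML fragment, so it needs no network access. The class name ScanDemo, the helper ExtractQuoted, and the sample markup are hypothetical and are not part of Listing 1. The sketch also departs from the listing in two ways: it uses List<string> instead of ArrayList, and it stops if a match has no surrounding quotes.

```csharp
using System;
using System.Collections.Generic;

class ScanDemo
{
    // Hypothetical helper: returns every double-quoted value that contains
    // the given extension, scanning outward from each match to the nearest
    // quotes. Like the listing, it does not parse HTML; it only looks at quotes.
    public static List<string> ExtractQuoted(string rawHtml, string ext)
    {
        var found = new List<string>();
        int ploc = rawHtml.IndexOf(ext, StringComparison.Ordinal);
        while (ploc >= 0)
        {
            int sloc = rawHtml.LastIndexOf('"', ploc) + 1; // first char after the opening quote
            int eloc = rawHtml.IndexOf('"', ploc);         // position of the closing quote
            if (sloc == 0 || eloc < 0)
                break;                                     // no quote on one side; stop scanning
            found.Add(rawHtml.Substring(sloc, eloc - sloc));
            ploc = rawHtml.IndexOf(ext, eloc, StringComparison.Ordinal);
        }
        return found;
    }

    static void Main()
    {
        // Made-up markup standing in for a downloaded page.
        string html = "<img src=\"mars.gif\"><img src=\"pics/venus.gif\">";
        foreach (string file in ExtractQuoted(html, ".gif"))
            Console.WriteLine(file);
    }
}
```

Passing the extension as a parameter is one simple way to adapt the archetype to other file types, such as ".jpg". The scan still assumes that each match sits inside a double-quoted attribute value, so pages that use single quotes or unquoted attributes would need a different approach.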