URLs
URL matching is a complicated task—or rather, it can be complicated depending on how flexible the matching needs to be. At a minimum, URL matching should match the protocol (probably http and https), a hostname, an optional port, and a path.
http://www.forta.com/blog
https://www.forta.com:80/blog/index.cfm
http://www.forta.com
http://ben:password@www.forta.com/
http://localhost/index.php?ab=1&c=2
http://localhost:8500/
https?://[-\w.]+(:\d+)?(/([\w/_.]*)?)?
http://www.forta.com/blog
https://www.forta.com:80/blog/index.cfm
http://www.forta.com
http://ben
http://localhost/index.php
http://localhost:8500/
https?:// matches http:// or https:// (the ? makes the s optional). [-\w.]+ matches the hostname (the leading - is a literal hyphen, not a range). (:\d+)? matches an optional port (as seen in the second and sixth lines in the example). (/([\w/_.]*)?)? matches the path: the outer subexpression matches the / if one exists, and the inner subexpression matches the path itself. As you can see, this pattern cannot handle query strings (the fifth line matches only up to index.php, stopping at the ?), and it misreads embedded username:password pairs (the fourth line matches only http://ben, because the : after ben is not followed by digits). However, for most URLs it will work adequately (matching hostnames, ports, and paths).
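To try the pattern outside a regex tester, here is a minimal sketch in Python (our choice of host language; the pattern itself is engine-neutral). It runs the pattern over the six sample URLs and, for comparison, a hypothetical extended pattern (EXTENDED_PATTERN, our own illustration and not part of the original example) that also accepts an optional user:password@ prefix and a query string. The extension is still only a sketch: it ignores fragments, IPv6 hosts, and percent-encoding in the path.

```python
import re

# The pattern from this section, as a raw string so each backslash
# reaches the regex engine unchanged.
URL_PATTERN = re.compile(r"https?://[-\w.]+(:\d+)?(/([\w/_.]*)?)?")

# Hypothetical extension (not from the original example): an optional
# "user:password@" prefix and an optional "?query" suffix.
EXTENDED_PATTERN = re.compile(
    r"https?://"
    r"(?:[-\w.]+(?::[-\w.]*)?@)?"  # optional user[:password]@
    r"[-\w.]+"                     # hostname
    r"(?::\d+)?"                   # optional port
    r"(?:/[\w/_.]*)?"              # optional path
    r"(?:\?[-\w=&.%]*)?"           # optional query string
)

SAMPLE_TEXT = """\
http://www.forta.com/blog
https://www.forta.com:80/blog/index.cfm
http://www.forta.com
http://ben:password@www.forta.com/
http://localhost/index.php?ab=1&c=2
http://localhost:8500/"""


def find_urls(pattern: re.Pattern, text: str) -> list[str]:
    """Return the full text of every match, in order.

    re.findall would return the captured groups rather than the whole
    match when the pattern has capturing subexpressions, so this uses
    finditer and group(0) instead.
    """
    return [m.group(0) for m in pattern.finditer(text)]


if __name__ == "__main__":
    print("Original pattern:")
    for url in find_urls(URL_PATTERN, SAMPLE_TEXT):
        print("  ", url)
    print("Extended pattern:")
    for url in find_urls(EXTENDED_PATTERN, SAMPLE_TEXT):
        print("  ", url)
```

With the original pattern, the fourth and fifth matches come out as http://ben and http://localhost/index.php, reproducing the limitations described above; the extended pattern matches all six sample lines in full.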