Posted on October 8, 2009, 5:28 pm, by Michael Zhang, under
Uncategorized.
Suppose you’re automatically parsing a webpage, and you come across the following kind of thing:
blah blah
some starting text
some useful content
some ending text
blah blah
We want to parse out the useful content from among the non-useful stuff, and we know there’s some starting text and some ending text that wraps the useful content.
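One way to sketch this in Ruby, assuming the starting and ending text are known literal strings. The helper name `extract_between` and the sample page are made up for illustration; this is a minimal sketch, not necessarily the approach the rest of this post takes.

```ruby
# Hypothetical helper: returns the text between the first occurrence of
# start_marker and the next occurrence of end_marker after it,
# or nil if either marker is missing.
def extract_between(text, start_marker, end_marker)
  start_idx = text.index(start_marker)
  return nil if start_idx.nil?

  # The useful content begins right after the starting text.
  content_start = start_idx + start_marker.length
  end_idx = text.index(end_marker, content_start)
  return nil if end_idx.nil?

  # Trim the surrounding whitespace and newlines.
  text[content_start...end_idx].strip
end

# Sample input mirroring the layout described above.
page = <<~PAGE
  blah blah
  some starting text
  some useful content
  some ending text
  blah blah
PAGE

puts extract_between(page, "some starting text", "some ending text")
```

Plain string searching is enough when the markers are fixed; a regular expression would only be needed if the wrapping text varies from page to page.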
A better example:
I like chicken
<div [...]
Posted on August 5, 2009, 3:11 pm, by Michael Zhang, under
Uncategorized.
I was parsing an RSS file in Ruby on Rails today and found that RSS::Parser.parse seems to throw away the actual content of the RSS items, leaving only the description.
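require 'rss'
require 'open-uri'
source = "http://www.website.com/rss.xml"
# URI.open is needed on Ruby 3.0+, where a bare open() no longer fetches URLs
content = URI.open(source) { |s| s.read }
# second argument: validate the feed while parsing
rss = RSS::Parser.parse(content, true)
rss.items.each do |item|
  puts item.description
end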
This didn’t do me [...]
Posted on July 1, 2009, 12:25 pm, by Michael Zhang, under
Uncategorized.
Here’s a little code snippet that grabs the contents of a page’s title tag, given its URL, in PHP:
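$url = "http://www.folksonomy.org";
// Fetch the whole page as a single string (false on failure).
$page = file_get_contents($url);
// Non-greedy match captures only the first title; the s flag lets it span lines.
if ($page !== false && preg_match("/<title[^>]*>(.+?)<\/title>/is", $page, $t))
    print "$url has the title: " . trim($t[1]);
else
    print "No title was found";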