Thursday, February 12, 2009

Yahoo BOSS exposes structured data in RDF

This could be a big step toward the "web of data" vision of the Semantic Web.

Yahoo announced, in "Accessing Structured Data using BOSS", that its BOSS (Build your Own Search Service) will now support structured data, including RDF.
"Yahoo! Search BOSS provides access to structured data acquired through SearchMonkey. Currently, we are only exposing data that has been semantically marked up and subsequently acquired by the Yahoo! Web Crawler. In the near future, we will also expose structured data shared with us in SearchMonkey data feeds. In both cases, we will respect site owner requests to opt-out of structured data sharing through BOSS."
[Image: Yahoo's BOSS to support RDF data]
Here's how it works:
  • Sites use microformats or RDF (encoded using RDFa or eRDF) to add structured data to their pages
  • Yahoo's web crawler encounters embedded markup and indexes the structured data along with the unstructured text
  • A BOSS developer specifies "view=searchmonkey_rdf" or "view=searchmonkey_feed" in API requests
  • BOSS's response returns the structured data via either XML or JSON
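
To make the third and fourth steps concrete, here is a minimal sketch of how a developer might assemble such a request. Only the two "view" values come from Yahoo's announcement. The base URL and the "appid" and "format" parameter names are my assumptions about the BOSS web search API, and build_boss_url is a hypothetical helper, so check Yahoo's documentation before relying on any of it.

```python
from urllib.parse import quote, urlencode

# ASSUMPTION: base URL of the BOSS web search endpoint; verify against Yahoo's docs.
BOSS_WEB_SEARCH = "http://boss.yahooapis.com/ysearch/web/v1/"

# The two view values named in the announcement.
VALID_VIEWS = {"searchmonkey_rdf", "searchmonkey_feed"}
# The two response encodings named in the announcement.
VALID_FORMATS = {"xml", "json"}


def build_boss_url(query, app_id, view="searchmonkey_rdf", fmt="json"):
    """Build a request URL asking BOSS for structured data (hypothetical helper)."""
    if view not in VALID_VIEWS:
        raise ValueError("unknown view: %r" % view)
    if fmt not in VALID_FORMATS:
        raise ValueError("unknown format: %r" % fmt)
    # ASSUMPTION: 'appid' and 'format' are the parameter names the API expects.
    params = urlencode({"appid": app_id, "format": fmt, "view": view})
    # The query goes in the URL path, so escape every reserved character.
    return BOSS_WEB_SEARCH + quote(query, safe="") + "?" + params


if __name__ == "__main__":
    # YOUR_APP_ID is a placeholder for a real BOSS application ID.
    print(build_boss_url("barack obama", "YOUR_APP_ID"))
```

The sketch only builds the URL and does not send it, so it runs without network access or an application ID.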
Yahoo's SearchMonkey only acquires structured data using certain microformats or RDF vocabularies. The microformats supported are hAtom, hCalendar, hCard, hReview, XFN, Geo, rel-tag and adr. RDF vocabularies handled include Dublin Core, FOAF, SIOC, and "other supported vocabularies". See the appendix on vocabularies in Yahoo's SearchMonkey Guide for a full list and more information.
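
For readers who have not seen a microformat, here is a rough sketch of what hCard markup looks like and how a crawler could pick properties out of it. The sample page fragment, the HCardExtractor class, and the three-property subset are all made up for illustration. This is not how Yahoo's crawler works, and a real parser would also check for the enclosing "vcard" element and handle nesting.

```python
from html.parser import HTMLParser

# Hypothetical page fragment marked up with the hCard microformat.
SAMPLE_HTML = """
<div class="vcard">
  <span class="fn">Jane Doe</span>
  <span class="org">Example Corp</span>
  <span class="locality">Springfield</span>
</div>
"""

# A small subset of hCard property names; the full format defines more.
HCARD_PROPERTIES = {"fn", "org", "locality"}


class HCardExtractor(HTMLParser):
    """Collects the text of elements whose class names an hCard property."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None  # property whose text we are waiting for

    def handle_starttag(self, tag, attrs):
        # An element may carry several class names; look for a known property.
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in HCARD_PROPERTIES:
                self._current = name

    def handle_data(self, data):
        # Store the first non-blank text that follows a matching start tag.
        if self._current and data.strip():
            self.fields[self._current] = data.strip()
            self._current = None

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the pending property.
        self._current = None


def extract_hcard(html):
    """Return a dict mapping hCard property names to their text values."""
    parser = HCardExtractor()
    parser.feed(html)
    return parser.fields


if __name__ == "__main__":
    print(extract_hcard(SAMPLE_HTML))
```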

A post on the Yahoo search blog talks about this and other changes to the BOSS service and includes a nice example of the use of structured data encoded using microformats from President Obama’s LinkedIn page.

[Image: microformatted data on President Obama's LinkedIn page]
