Monday, November 15, 2010

Reading Notes- Week 11 (reposted)

Reading Notes- Week 11: I'm reposting the notes so my blog is somewhat in order.

1) David Hawking, Web Search Engines: Part 1 and Part 2, IEEE Computer, June 2006.

I found this article on search engines informative. I hadn’t really thought about the vast amount of space a search engine needs in order to be efficient. I also found it interesting how many different parts have to work together to make a search engine function. For example, a politeness delay keeps a crawler from sending too many requests to the same web server at one time. When the article discussed duplicates, it mentioned that “sophisticated” methods were needed in some cases. Given how many duplicates turn up in a typical search, I wonder whether these methods are not being used or are just not available yet. The second part of the article was more confusing to me. I didn’t completely understand how the search engine knows which documents to skip, or how it numbers different documents.
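To make the politeness delay more concrete for myself, here is a minimal sketch in Python. It is not taken from the article: the class name `PolitenessScheduler`, the two-second delay, and the example URLs are all my own assumptions, and a real crawler would be far more involved. The idea is only that the crawler remembers, per host, the earliest time it may send its next request.

```python
from urllib.parse import urlparse


class PolitenessScheduler:
    """Hypothetical helper: tracks when each host may next be contacted."""

    def __init__(self, delay_seconds=2.0):
        # The delay value is an assumption chosen for illustration.
        self.delay_seconds = delay_seconds
        self._next_allowed = {}  # host -> earliest time of the next request

    def seconds_to_wait(self, url, now):
        """Return how long the crawler should wait before fetching url."""
        host = urlparse(url).netloc
        ready_at = self._next_allowed.get(host, now)
        wait = max(0.0, ready_at - now)
        # Reserve this slot so the following request to the same host
        # is pushed back by one more delay.
        self._next_allowed[host] = max(ready_at, now) + self.delay_seconds
        return wait


scheduler = PolitenessScheduler(delay_seconds=2.0)
first = scheduler.seconds_to_wait("http://example.com/a", now=0.0)
second = scheduler.seconds_to_wait("http://example.com/b", now=0.5)
other = scheduler.seconds_to_wait("http://example.org/a", now=0.5)
print(first, second, other)
```

Requests to the same host get spaced out, while a request to a different host can go ahead right away, which seems to be the point of the delay.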

2) Shreeves, S. L., Habing, T. O., Hagedorn, K., & Young, J. A. (2005). Current developments and future trends for the OAI protocol for metadata harvesting. Library Trends, 53(4), 576-589.

This article discussed the Open Archives Initiative (OAI), which works toward metadata standards that can be used universally. The creators of the initiative had hoped that people would use the standards and also implement others alongside them. The Open Language Archives Community is one such community: it has extended the standards it uses beyond OAI. The article discusses current developments, issues, and future developments for the OAI community. The issue of metadata formats made a lot of sense, since the more formats there are, the less any one of them works as a standard.
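As a rough illustration of what harvesting looks like, here is a small Python sketch that builds an OAI-PMH `ListRecords` request. The `verb` and `metadataPrefix` parameters and the `oai_dc` (Dublin Core) prefix come from the protocol itself; the function name `list_records_url` and the repository address are made up for the example, and nothing is actually sent over the network.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for a (hypothetical) repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        # Optional: restrict the harvest to one set of records.
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# The base URL below is a placeholder, not a real repository.
url = list_records_url("http://repository.example.org/oai")
print(url)
```

A repository that supports more than one metadata format would accept a different `metadataPrefix` here, which is where the problem of multiple formats shows up for the harvester.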

3) Michael K. Bergman, “The Deep Web: Surfacing Hidden Value”

The deep web consists of all the web pages that cannot be accessed by “traditional” search engines. I was surprised by the statistic: “Eighty-five percent of Web users use search engines to find needed information.” What do the other 15 percent use to find information? I would have thought that everyone used search engines. I was also surprised that many deep websites are visited more often than some surface websites. I would think that the larger amount of traffic to the site would make it a surface site.

5 comments:

  1. I liked the OAI article because we've been discussing in this and other classes the problems with creating a universal metadata standard. It's interesting reading about the progress that is being made in this area, as well as noting the services that can be built around common metadata harvesting, like search and retrieval.

  2. It seems like the idea of metadata standards keeps coming up in many of our classes. There are so many groups trying to solve this problem; does that make the problem even greater? I kind of think that it does...

  3. You brought up an interesting point about deep websites being visited more often than some surface websites. I, too, would think such high traffic would cause these sites to make their way up the relevance chain and be at the top of the search engine's results!

  4. Standards are an issue long overdue, and I wonder how much resistance is left over from the early days of secrets and proprietary guardianship?

  5. Brittany,
    I also wonder what the other 15% of people do to find information. On a side note, the article also mentioned Northern Light as a search engine, and I remember using it a long time ago, but I don't remember hearing anything about it recently. According to Wikipedia, it was a search engine from 1996-2002, but it has since become more involved with software. It's so strange how things just come and go so quickly.
