December 2005


This article in Google’s newsletter for librarians gives a good explanation of how Google collects and ranks articles. You may think you know PageRank well, but I promise you will find something new in it. The part I like best is this exercise (for more details, refer to the article):

Now tape those documents on a wall, step back a few feet, and squint your eyes. If you didn’t know what the rest of a page said, and could only judge by the colored words, which document do you think would be most relevant? Is there anything that would make a document look more relevant to you?

This shows how important a good title is. Take a look at your page source. What’s in your title and h1 elements? Are they relevant to your article?

Most websites put a static site name in the title of every page. Just think about how many times you have had to edit the description field when tagging a page in an online bookmarking service. This shortcoming in page design not only annoys taggers, but also makes the page look less important to Google.
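If you want to check your own pages, here is a minimal sketch that pulls the title and h1 text out of a page’s source using Python’s standard `html.parser` module. The sample page, the site name "My Blog", and the helper names `title_and_h1` and `has_static_title` are made up for illustration; this says nothing about how Google itself reads a page.

```python
from html.parser import HTMLParser


class TitleH1Parser(HTMLParser):
    """Collect the text found inside <title> and <h1> elements."""

    def __init__(self):
        super().__init__()
        self._current = None  # "title" or "h1" while we are inside one
        self.found = {"title": [], "h1": []}

    def handle_starttag(self, tag, attrs):
        if tag in self.found:
            self._current = tag
            self.found[tag].append("")

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        # Text of nested tags (e.g. <em> inside <h1>) is kept as well.
        if self._current is not None:
            self.found[self._current][-1] += data


def title_and_h1(html):
    """Return {"title": [...], "h1": [...]} with whitespace normalized."""
    parser = TitleH1Parser()
    parser.feed(html)
    parser.close()
    return {
        tag: [" ".join(text.split()) for text in texts]
        for tag, texts in parser.found.items()
    }


def has_static_title(title, site_name):
    """True when the title is nothing more than the site name."""
    return title.strip().lower() == site_name.strip().lower()


# Hypothetical page: the title is only the site name, the h1 is descriptive.
page = """
<html><head><title>My Blog</title></head>
<body><h1>How Google <em>ranks</em> articles</h1></body></html>
"""

result = title_and_h1(page)
print(result)
print(has_static_title(result["title"][0], "My Blog"))
```

A page that passes `has_static_title` is exactly the kind that makes taggers retype the description by hand.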

So I have something to do with my site now. Hehe!

Now you can search in Google Scholar and get direct access to some online resources in the National Library of China. Here is how to do it:

  1. Go to Google Scholar and open the search preferences.
  2. In the section for adding a library, search for China.
  3. There is only one result: the National Library of China.

You are done. I tried a few searches but could not trigger any results from the National Library of China. Maybe the resources they provide are not in my field.

I have enabled FeedFlare in my blog feeds. FeedFlare is a service from FeedBurner that adds user actions to feed items, such as email this, tag this, Technorati links, etc. If you are reading this in your aggregator, you have already seen them at the bottom of each post.

I owe you an apology: enabling this service marked all the posts as unread. This is a one-time pain.

See the press release and the FeedBurner blog for more details about this service.

This morning, I learned that Yahoo has acquired Del.icio.us. Since I use both MyWeb and Del.icio.us, I am looking forward to seeing how Del.icio.us will integrate with MyWeb.

The Yahoo Search blog said:

Finally, don’t be surprised if you see My Web and del.icio.us borrow a few ideas from each other in the future.

Joshua answered on the mailing list:

[They will be] Separate. Obviously there will be some cross-pollination.

So let’s wait and see. Currently 50% of my links go into del.icio.us, and 50% go into MyWeb. Why do I use two services at the same time? Here is a comparison:

  • Del.icio.us has good social features. Links are easy to share. With the latest Firefox integration, it is user friendly. And its many early adopters make an excellent user base. So what locks me in is not the system, but the users.
  • MyWeb has good search features. With Yahoo’s resources, its service is more reliable. And it saves a copy of the web page. (IMO, del.icio.us could never get that kind of resources if it stayed independent.) But it’s not very easy to share links with non-Yahoo users, and there’s no way to use it as a public link blog.

Given these pros and cons, I have to split my links between two places. Here is how I use the two services: dynamic and index pages go into del.icio.us; static articles, especially news stories that will expire, go into MyWeb.

How much value does people-powered search provide?

I would like to take personalization as an example. Findory and TailRank are two of my favorite personalized search engines. The big difference between them is that TailRank is people-powered, while Findory is not. TailRank cleverly takes both the number of shares and the number of inbound links into account to determine how important a story is.
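To make the idea concrete, here is a toy sketch of ranking stories by combining share counts with inbound links. TailRank’s real formula is not described here, so the weighted sum, the weights, the `Story` fields, and the sample numbers are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Story:
    title: str
    shares: int         # how many people shared the story
    inbound_links: int  # how many pages link to the story


def importance(story, share_weight=1.0, link_weight=1.0):
    """Assumed scoring rule: a plain weighted sum of the two signals."""
    return share_weight * story.shares + link_weight * story.inbound_links


def rank(stories, **weights):
    """Return the stories ordered from most to least important."""
    return sorted(stories, key=lambda s: importance(s, **weights), reverse=True)


# Hypothetical data, not taken from any real service.
stories = [
    Story("A", shares=10, inbound_links=2),
    Story("B", shares=3, inbound_links=12),
    Story("C", shares=1, inbound_links=1),
]

print([s.title for s in rank(stories)])
print([s.title for s in rank(stories, link_weight=0.0)])
```

Setting `link_weight=0.0` turns this into a purely people-powered ranking, which is a quick way to see how much the sharing signal alone changes the order.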

Currently, I cannot tell which service is better for me, so I just use them both. In my experience, people-powered search does help, but not by much. We will have to wait at least several months to see any benefit from this acquisition.

And don’t forget there are other players on the field. As this Red Herring article pointed out:

Many so-called “web 2.0” companies, including Wink and Jookster, are also exploring improving search by compiling users’ recommendations. Blog search engine Technorati makes heavy use of tags to help connect users to what they’re looking for. Digg uses user input to point to interesting technology news and web sites in real time.

Update: Greg Yardley pointed out what Yahoo actually bought:

It’s the del.icio.us community that can’t be duplicated. Yahoo didn’t buy del.icio.us’ technology; it bought our bookmarks and tags - and for quite a price. Assume 300,000 people make use of del.icio.us in a meaningful, regular way (a wild-ass guess, but for the sake of argument) and assume the purchase price is around $30 million. That means my personal bookmarks just got sold for a hundred bucks. Which is considerably more [should be less] than I thought they were worth.
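The arithmetic in the quote is easy to check. Both inputs are Greg’s own guesses (300,000 regular users and a price of around $30 million), not confirmed figures.

```python
# Both numbers are the guesses from the quote above, not confirmed figures.
purchase_price_usd = 30_000_000
active_users = 300_000

per_user = purchase_price_usd / active_users
print(per_user)  # 100.0
```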

Thanks to Arnet for mentioning this on the del.icio.us list.

Update: David Beisel talked about monetization of bookmark services. He said, “With the Yahoo Delicious acquisition, there is one less acquirer out there available to shop for a competitive service. This situation leaves the remainder of the field ‘validated’ by the acquisition, but challenged to truly build a business model around their offering.” See The Search for Delicious Bookmarking Revenue.

I was playing with Yahoo Answers this morning. Yahoo Answers is a product that uses a social community to answer questions that a search engine cannot answer easily. This is another machine vs. human comparison.

For anyone who still doesn’t get what Yahoo Answers is doing, here is a good question: What is so special about yahoo answers? The answer is good enough that I will quote it here:

It’s just like a message board but instead of meaningless rambling it is all questions… And quite possibly the largest forum in the world eventually.

There are two things I like about Yahoo Answers right now:

  • Just like any other meme engine, YA uses a big font size to show important headlines, although what counts as important here is not so clear.
  • Even if you don’t ask or answer any questions, you can learn a lot just by browsing, and earn points by rating. So it is a good way to kill time if you are bored.

So where are you going today?

This Monday, I attended a talk by Biing-Hwang (Fred) Juang of the Georgia Institute of Technology. You can find the news on our school’s website. The title of the talk was Prospects of Speech & Multimedia Research For The New Digital Era. Here are some notes from his talk.

Speech Research

Fred said there is a paradigm shift in speech synthesis. Storage has become cheaper, which makes speech synthesis unnecessary, because we can simply store all the words we need in advance.

Then he gave a demo of text-to-speech with an email-reading program. He first played a traditional version, which sounded quite machine-like, and then a much better version that sounded close to a real human. The challenge is to get the computer to understand the document.

Going on, Fred raised the question: Is speech recognition a language problem, or just a math or signal processing problem?

The mathematical model has reached its limits. In human vs. machine comparisons, humans often do a better job. He mentioned two approaches: 1) a biological approach; 2) more interaction. Interaction means giving the machine more steps in which to tweak the result.

On the future of speech research, Fred said that speech technology is about humans, and it should go beyond simple conversion between sound and text.

Media Research

Fred first introduced some new telecom paradigms: ubiquitous computing; converged digital services; right content, right person, right context.

Then he quickly reviewed some of the research projects at GIT.

  • Spatialization & the perceptual dimension. He gave a demo of a multi-channel system transmitting the voices of a male and a female speaker at the same time.
  • Content processing. They are using regions of interest to deal with low resolution on mobile devices.
  • Embedded and layered coding. They are using multi-stream coding to deal with high bit error rates.

In summary, these are the things at the center of future telecom research.
