As anyone who used our search functionality on the old site in the past year knows, one of the major problems Technorati has been battling is how to provide good, useful search results without them being poisoned by spam, scrapers, free article reposters, and the like. We obviously failed at cleaning the data after it got into our search set, as we had numerous spam blogs with authorities in the hundreds.
Our solution, as briefly described in our Indexing FAQ and Blog Claiming FAQ, is to work from a known clean data set. It probably won't ever be perfectly clean, but try some popular keyword searches on the new site and you'll see the tremendous improvement. Sites that we have added to the clean index are crawled, have detailed authority calculated, display recent posts (one at the moment) on our site, and contribute to other sites' authority. We are continuously adding more sites to this index and are working on ways to do so faster, but as you can imagine, the volume of sites to qualify is enormous.
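As a rough illustration of why working from a clean set helps, here is a minimal sketch of authority computed only from links whose source is in the clean index. It is not Technorati's actual implementation: the function name, the `(source, target)` link format, and the assumption that authority is simply the number of distinct linking sites are all hypothetical simplifications.

```python
from collections import defaultdict


def authority_from_clean_links(links, clean_index):
    """Count distinct clean-index sites linking to each target.

    links: iterable of (source_site, target_site) pairs (assumed format).
    clean_index: set of sites already admitted to the clean data set.
    """
    linkers = defaultdict(set)
    for source, target in links:
        # Only sources in the clean index contribute authority;
        # self-links are ignored.
        if source in clean_index and source != target:
            linkers[target].add(source)
    return {target: len(sources) for target, sources in linkers.items()}


# Hypothetical sample data: spam.example is not in the clean index.
clean_index = {"a.example", "b.example", "c.example"}
links = [
    ("a.example", "b.example"),
    ("c.example", "b.example"),
    ("a.example", "b.example"),      # duplicate link, counted once
    ("spam.example", "b.example"),   # ignored: source is not in the clean index
    ("b.example", "a.example"),
    ("spam.example", "spam2.example"),
]

print(authority_from_clean_links(links, clean_index))
```

In this toy version, a spam blog can link to whatever it likes without inflating anyone's score, because links only count once the linking site has been admitted to the clean set.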
Despite some of the speculation & hyperbole you may have seen, the only judgment call that can be inferred here is that, if a site is in the index (has higher authority and displays recent posts), then Technorati is pretty certain that it is a true blog (or news site; we have those too). If you find any sites getting that treatment that do not belong, please feel free to let us know in the Site Details page comments on Technorati.com, or over at GetSatisfaction/Technorati. But as far as reading our minds goes, that's it. If a site is NOT in our clean set, it doesn't mean anything other than that, today, that site is not in our index. It does not mean we have any opinion about the site, and it is not an assessment of the site's quality, worth, or usefulness. It's a safe bet that we just haven't had a chance to look at it yet.
As I said, we are continuously adding new sites, and are investigating ways to speed up & optimize that process. We hope to have a way for Technorati community members to nominate or sponsor additional sites as well, but that will be further down the road.