Saturday, May 30, 2009

Comments By Google

A couple of posts back I wrote about some techniques you can use to fight comment spam on your web site. Just recently, I ran into another technique, one that is all the rage in so many other areas of business: outsource it.

It turns out that Google has an interesting little service called Friend Connect. The features include comments, ratings, authentication and moderation tools, to name a few. I haven't set it up yet, nor read all there is to read, but I'd be willing to bet they also throw some of the great spam filtering technology built for Gmail at the comments. An API exists so that developers can make use of the information Google tracks to provide a more interactive experience. Think privileges determined by karma and such. And with a reasonably trustworthy third party writing a large chunk of the code, developers can focus instead on the aspects of the site that deliver content.

The biggest downside to this approach, and it can be a biggie in some cases, is that all of that social data is going to be housed in a remote database. That's probably OK for a great number of sites out there, but for some applications, that may just be asking too much.

Another downside is that the look and feel is going to be limited unless you code all of your own controls against their API. That probably still saves some time, and more importantly, it means you don't need to worry about the more demanding aspects of authentication. But once again, it means that outsourcing may not yield all the time-saving benefits so many people think it will.

Tuesday, May 26, 2009

Middle Mouse Button Broken

I had a frustrating problem with my mouse tonight. Basically, the middle mouse button stopped working on my Razer Lachesis. I tried searching for a resolution and all I found were suggestions on how to clean the mouse. I'm too lazy for that.

I found that a simple way to test the buttons is to remap the button(s) in question to a keystroke; mouse button 3 to the number 3 in this case. I opened up Notepad, clicked, and lo and behold, a 3 appeared every time I clicked. What gives?

I started to switch the mouse button function back to 'windows button 3' and... what gives again? The option is gone. As I switched through the other mouse profiles, though, I noticed that button 3 was almost always set to 'universal scroll'. "Let's try that," I thought to myself. I applied the new setting (don't forget to apply the new setting!) and, look, the mouse is as good as new.

As near as I can figure it, I actually set the mouse button to 'windows button 3' at some point in some previous Windows version or application. The interesting thing is that the profiles are stored in the mouse, not the driver software. That's my guess, at least, as I removed the driver software and restarted the computer at one point, and my middle mouse button was still pumping out 3's. Combine that with the fact that the signal is evidently a bit different between Windows 7 and whatever program I originally set it in, plus a little profile-changing butterfingers on my part, and voila: an apparently broken button.

So, it turns out that my only real complaint about my nice, year-old Lachesis is that it is too customizable for my clumsy self. Aside from that, this is one heck of a mouse.

Figured I'd share my experience in case someone else runs into this problem.

MOSS Search Access Denied

So I have been running into an issue on Office SharePoint where the search service ends with an access denied error when it runs.

"Access is denied. Check that the Default Content Access Account has access to this content, or add a crawl rule to crawl this content."

The fix in our situation turned out to come from Microsoft KB 896861. A lot of solutions focus on making sure account permissions are set up correctly. However, this lesser-known issue is caused by a security feature in IIS that prevents reflection attacks. The feature gets in the way of the shared services provider when it tries to crawl the site in a single-server environment. The recommended solution is to map a specific host name to the loopback address. Check out the KB for the details.
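For reference, the KB boils down to a registry change. Here is a sketch of both methods using reg.exe; the host name is a placeholder for whatever URL the crawler hits, and the usual back-up-your-registry caveats apply:

```
REM Method 1 (the one the KB recommends): let specific host names past
REM the loopback check. "portal.example.com" is a placeholder for your
REM site's host header. Restart the IISAdmin service afterwards.
reg add "HKLM\SYSTEM\CurrentControlSet\Control\Lsa\MSV1_0" ^
  /v BackConnectionHostNames /t REG_MULTI_SZ /d "portal.example.com"

REM Method 2 (quicker, but less secure): disable the loopback check
REM entirely.
reg add "HKLM\SYSTEM\CurrentControlSet\Control\Lsa" ^
  /v DisableLoopbackCheck /t REG_DWORD /d 1
```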

I originally found this article on SharePoint Blogs. Hopefully, promoting this solution will help someone else out there.

Tuesday, May 19, 2009

Techniques to Fight Comment Spam

This post is a list of techniques I have run across that attempt to deal with the problem of comment spam. It is a followup to my last post, "Preventing Comment Spam."

Requiring authentication is generally seen as a fairly effective approach to preventing comment spam. However, the disadvantages are frequently enough to dissuade many sites from implementing it. One problem is that authentication is a feature requiring resources and expertise that not every development shop has. Another is that authentication puts a barrier in front of leaving comments that casual visitors probably will not bother overcoming. In addition, the challenge is not that great for technically adept spammers if the payoff is access to a large user base.

Building upon authentication is the idea of karma. Forcing users to build karma through quality participation before they can take certain actions is a hurdle too high for most spammers to bother with. Unfortunately, depending on your user base, it can be an equally high hurdle to legitimate participation.

Moderation is another technique that comes up often, and it is generally regarded as the only foolproof approach. Simply put, every post made to the site is screened by a human being. The downside, of course, is that if your site is heavily trafficked by spammers, weeding quickly becomes a task that takes up all of your time.

Filtering is another approach, best described as automated moderation. As an example, I found ReverseDOS. This is an easy-to-set-up ASP.NET HttpModule that reads all of the content of a request and decides whether the request is a spam attempt based on rules that you define. The rules can check all or only a portion of the request against a set of regular expressions, and they can be turned on or off for each directory within a site.
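To make the idea concrete, here is a minimal sketch of such a filtering module. To be clear, this is not ReverseDOS itself; the patterns, the 403 response and all of the names are mine, invented for illustration:

```csharp
using System;
using System.Text.RegularExpressions;
using System.Web;

// A bare-bones request filter in the spirit of ReverseDOS. Real rules
// would come from configuration; these patterns are just examples.
public class SpamFilterModule : IHttpModule
{
    private static readonly Regex[] Rules =
    {
        new Regex(@"cheap\s+(viagra|pills)", RegexOptions.IgnoreCase | RegexOptions.Compiled),
        new Regex(@"\[url=", RegexOptions.IgnoreCase | RegexOptions.Compiled)
    };

    public void Init(HttpApplication application)
    {
        application.BeginRequest += OnBeginRequest;
    }

    private static void OnBeginRequest(object sender, EventArgs e)
    {
        HttpContext context = ((HttpApplication)sender).Context;
        if (context.Request.HttpMethod != "POST")
            return;

        // Scan every posted form field against every rule.
        foreach (string key in context.Request.Form)
        {
            string value = context.Request.Form[key];
            if (value == null)
                continue;

            foreach (Regex rule in Rules)
            {
                if (rule.IsMatch(value))
                {
                    // Reject the request before any page code runs.
                    context.Response.StatusCode = 403;
                    context.ApplicationInstance.CompleteRequest();
                    return;
                }
            }
        }
    }

    public void Dispose() { }
}
```

Register it with an <add> entry under <httpModules> in web.config and every POST gets screened before the page ever sees it.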

Another suggestion along these lines was to create a central repository for tracking spam. Sites could query the repository, which would try to determine whether the submitted content was spam based on past submissions, user feedback and a bit of good-natured artificial intelligence. Regardless of the technique, the idea of filtering is to cut the number of spam comments down to an amount manageable by other means.

Reverse Turing tests like CAPTCHA can sometimes be used to increase the difficulty of posting spam. The problem is that the effectiveness of the most common implementation, retyping words presented as an image, wanes as image recognition tools get better and better. The images must get more warped in order to defeat automated scanning, but that makes them more difficult for legitimate users as well. Google's CAPTCHA for new email accounts, for example, is so difficult to read at times that I only get one out of four correct.
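For the curious, generating the basic image variety takes only a few lines of .NET. A bare-bones sketch with arbitrary sizes and no distortion at all, which, per the above, is exactly why a production version needs more work:

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;

// A minimal image CAPTCHA: render a random code onto a bitmap and return
// the PNG bytes. Font, sizes and alphabet are arbitrary choices.
public static class SimpleCaptcha
{
    private static readonly Random Rng = new Random();

    public static byte[] Render(out string code)
    {
        // Skip easily confused characters like O/0 and I/1.
        const string alphabet = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789";
        char[] chars = new char[5];
        for (int i = 0; i < chars.Length; i++)
            chars[i] = alphabet[Rng.Next(alphabet.Length)];
        code = new string(chars); // stash in session, compare on postback

        using (Bitmap bitmap = new Bitmap(120, 40))
        using (Graphics g = Graphics.FromImage(bitmap))
        using (Font font = new Font("Arial", 20, FontStyle.Bold))
        using (MemoryStream stream = new MemoryStream())
        {
            g.Clear(Color.White);
            g.DrawString(code, font, Brushes.DarkSlateGray, 5f, 5f);
            bitmap.Save(stream, ImageFormat.Png);
            return stream.ToArray();
        }
    }
}
```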

Throttling can be used to prevent any one user from posting too many times. Limits can be set on the number of items created over a span of time, or on posting multiple comments back to back in a single thread. The challenge here lies in identifying users. If no authentication is used, relying on IP address is inconsistent at best and runs the risk of blocking legitimate users.
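The mechanics behind throttling are usually a sliding-window counter. A minimal in-memory sketch; the limits and names are mine, and a real version would need locking and eviction:

```csharp
using System;
using System.Collections.Generic;

// Allows at most maxPosts comments per key (user name or IP address)
// within a sliding time window.
public class CommentThrottle
{
    private readonly int _maxPosts;
    private readonly TimeSpan _window;
    private readonly Dictionary<string, Queue<DateTime>> _history =
        new Dictionary<string, Queue<DateTime>>();

    public CommentThrottle(int maxPosts, TimeSpan window)
    {
        _maxPosts = maxPosts;
        _window = window;
    }

    public bool TryPost(string key)
    {
        Queue<DateTime> posts;
        if (!_history.TryGetValue(key, out posts))
            _history[key] = posts = new Queue<DateTime>();

        // Drop timestamps that have aged out of the window.
        DateTime cutoff = DateTime.UtcNow - _window;
        while (posts.Count > 0 && posts.Peek() < cutoff)
            posts.Dequeue();

        if (posts.Count >= _maxPosts)
            return false; // over the limit; reject or hold for moderation

        posts.Enqueue(DateTime.UtcNow);
        return true;
    }
}

// Usage: var throttle = new CommentThrottle(5, TimeSpan.FromMinutes(10));
// if (!throttle.TryPost(userNameOrIp)) { /* reject the comment */ }
```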

In the end, probably no single approach is good enough to stop spam. The pet project I am currently working on has been built with a mix of most of the techniques above. I combined a bunch of existing frameworks with a little bit of custom code, so it wasn't too much work. At times I worry that I may have spent too much time on this aspect of the site. Then again, the whole site was started as a learning endeavor. If nothing else, I gained some knowledge and will have the tools in place to respond quickly if spammers begin to target the site.

Preventing Comment Spam

I have taken up the coding challenge of dealing with comment spam.  As with most topics that I write about, I am by no means an expert.  But I have done a lot of reading recently, and here are some of my observations.

There is no majority consensus on the single best approach to preventing comment spam. Everyone agrees something must be done, but few people agree on which method is the most effective.

The one point most people do agree on is that multiple techniques are necessary in order to achieve the desired levels of spam reduction, ease of maintenance and usability for visitors. The business of spam is based on the idea that by getting a lot of content in front of a lot of users, enough people will respond to make a profit. Countering spam is the process of making it difficult enough for spammers to post to your site that their time is better spent elsewhere. The challenge lies in creating a system that is easy enough for your users to participate in, yet complex or smart enough to discourage spammers.

The most effective combination of tools varies depending on the site being targeted. Your breadth of content and comment topics, user quantity and quality, and a host of other variables will determine which tools achieve the best results fighting spam. The larger and wider-ranging each of those dimensions is, the smarter your techniques will need to become. At some point, the easiest approach to implement may be to screen submissions by hand.

There seems to be a general feeling that if enough sites take steps to reduce spam, the web can be made a better place for everyone. Spam will probably never go away; if it ever does, it will likely be because the infrastructure of the web changed for the worse in some way. But the idea is to make spamming difficult enough that spammers would make more money performing constructive services instead of annoying ones.

For a bit more on techniques used to prevent comment spam, check out this followup post.

Friday, May 15, 2009

Example Code, Patterns and OOD

So, I've titled this post a couple of times now and each time I reverse the order of the concepts. I apologize if it turns out to be backwards in the final draft and it throws the more inflexible of you for a loop.  Moving on...

I've noticed a few things about developers I've worked with during my career. When creating solutions to problems, there are three common plans of attack: they grab a piece of someone else's code from somewhere and shoehorn it in, they find a design pattern that more or less works for the problem at hand, or they think about the problem from an OOD perspective and plan out the code to come.

Example code sometimes gets a job done. But it was originally written for someone else's job. If this is always the course of action taken, there is a good chance the code will not mesh with the architecture or the surrounding code already in place.

Design patterns are better. They force you to think about the problem abstractly. Then you can write code around them that both solves the problem and fits the architecture of your application. However, at the core, they are still based on a design that is meant to be implemented in a particular way. Sure, there are enough design patterns out there to satisfy any need, but do you really want to memorize them all?

Knowing your object oriented design concepts is definitely the way to go: encapsulation, inheritance and polymorphism. There is no design pattern I have seen that cannot be boiled down to a combination of these concepts in different amounts. If you know how to use each of them, there is no problem you can't solve, and your code will fit into any architecture and follow whatever conventions you need it to.
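As a tiny illustration of that claim, here is the Strategy pattern stripped back to those ingredients: an interface encapsulating a behavior, and a polymorphic call doing the rest. All of the names are invented for the example:

```csharp
// Strategy is really just encapsulation plus polymorphism: the caller
// depends on an abstraction and the concrete behavior is swapped at
// runtime. Every name here is made up for the illustration.
public interface IDiscountStrategy
{
    decimal Apply(decimal price);
}

public class NoDiscount : IDiscountStrategy
{
    public decimal Apply(decimal price) { return price; }
}

public class PercentOff : IDiscountStrategy
{
    private readonly decimal _fraction;
    public PercentOff(decimal fraction) { _fraction = fraction; }
    public decimal Apply(decimal price) { return price * (1m - _fraction); }
}

public class Checkout
{
    private readonly IDiscountStrategy _discount; // encapsulated dependency

    public Checkout(IDiscountStrategy discount) { _discount = discount; }

    public decimal Total(decimal price)
    {
        // Polymorphic dispatch: Checkout neither knows nor cares which
        // discount implementation is in play.
        return _discount.Apply(price);
    }
}
```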

Admittedly, sometimes you just need to know how to add a particular CSS class to the fifth paragraph tag on a web page using JavaScript. For that, a code example will definitely get you going in the right direction fastest. And design patterns are excellent for instruction and communication. How better to learn when to use the different design principles than by example? And naming a properly chosen design pattern can save a lot of time when conveying the solution to a complex problem.

But I feel that by considering a problem from an object oriented perspective first, you will arrive at the best solution most often. Sometimes the best solution will turn out to be based on one of the other two tactics. But this way, you know you arrived at the right one.

Fake Communities

I don't like fake community sites.  Let me qualify that a bit.  I've been finding a bunch of sites recently that create their content by screen scraping other community sites that I legitimately belong to.  I find them slimy for a couple of different reasons.  Yes, 'slimy' is the technical term.  Honest.  

First, they are trying to pass off someone else's work as their own.  They didn't go to the trouble of promoting themselves.  The site probably doesn't have any fresh ideas; definitely not in content and if they copied the content, they probably copied the features as well.  Why would I want to go to the site at all?

Second, they pollute search results.  When I go looking for information, I want the definitive source, not a copy, nothing inaccurate.  There are tons of sites like this that aren't community based.  I don't like those either.  But there's another reason that makes these community sites worse.

When I first started finding these sites, I happened upon them because I was trying to keep track of information about myself and my company.  I was trying to pay attention to what, if anything, the public might be saying about us so that we could respond and be good members of the community.  Anyhow, I found a site that had some information about me on it.  It was mostly outdated and some was wildly inaccurate.

I thought to myself, "I should probably fix that so there won't be any misunderstandings." And then I realized that I'd been suckered. Well, almost suckered, as I didn't actually take any action, but the point is...

There is a subtle tactic these sites use to make people join. Once people see their information there, a strong sense of personal identity urges them to take charge of that data to make sure they will not be misrepresented. Maybe I'm just paranoid. But I can't believe that they accidentally got the wrong information when it's all publicly available from LinkedIn.

I felt that if I logged into that site, it would validate all the questionable tactics they used to bring me there: the regurgitating of information from other sites, and the preying on people's sense of identity to get them to create an account and fix the content. And the site is apparently trying to compete with the very sites it steals the information from in the first place. If I created an account, I would be just another number they could hold up to investors to 'prove' how much traffic their site was getting.

Maybe I've been reading too much Seth Godin and his honest, up-front marketing tactics are rubbing off on me. But it doesn't change the fact that these fake community sites are more or less stealing other organizations' work and data and holding it up as their own in order to trick the public into using their sites. And that just feels slimy to me.

Sunday, May 10, 2009

Another ELMAH Convert

I just tried out ELMAH and I am yet another convert.

This project is getting some attention all of a sudden, and it deserves it. ELMAH is a great, easy-to-use .NET error logging library. I spent the last two or three hours incorporating it into a pet project of mine, and it is exactly what I was looking for. I was able to add database logging to my web site and send error emails to Gmail. From there I retrieve them into a FogBugz account, but that's another story.

The basic ELMAH setup is simple. That article may look long, but that's all there is to it. For the most part. There are a few other coding gems out there that I also took advantage of.
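For the impatient, the heart of that setup is just web.config registration. This is roughly what mine looks like, trimmed down; the connection string name, addresses and credentials are placeholders:

```xml
<!-- Trimmed web.config sketch; names, addresses and credentials are
     placeholders. See the ELMAH sample config for the full version. -->
<configuration>
  <configSections>
    <sectionGroup name="elmah">
      <section name="errorLog" requirePermission="false"
               type="Elmah.ErrorLogSectionHandler, Elmah" />
      <section name="errorMail" requirePermission="false"
               type="Elmah.ErrorMailSectionHandler, Elmah" />
    </sectionGroup>
  </configSections>
  <elmah>
    <!-- Log to SQL Server; ELMAH ships a script that creates the table. -->
    <errorLog type="Elmah.SqlErrorLog, Elmah"
              connectionStringName="ElmahDb" />
    <errorMail from="errors@example.com" to="me@example.com"
               smtpServer="smtp.gmail.com" smtpPort="587"
               userName="me@example.com" password="..." />
  </elmah>
  <system.web>
    <httpModules>
      <add name="ErrorLog" type="Elmah.ErrorLogModule, Elmah" />
      <add name="ErrorMail" type="Elmah.ErrorMailModule, Elmah" />
    </httpModules>
    <httpHandlers>
      <add verb="POST,GET,HEAD" path="elmah.axd"
           type="Elmah.ErrorLogPageFactory, Elmah" />
    </httpHandlers>
  </system.web>
</configuration>
```

Gmail also wants SSL on that connection, which is where one of the articles below comes in.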

This is a wiki page I came across about how to secure ELMAH for remote use.

Here is a great article on how to make ELMAH play nicely with SMTP servers that require SSL. Namely smtp.gmail.com, but quite possibly Yahoo and others as well. [Edit: Turns out this was a known issue and there is a fix in the current trunk of the project. Here's some info.]

And last but not least, this wonderful web page explains how to create your own ASP.NET MVC error handler attribute that makes ELMAH appear as if the two frameworks were designed for each other.
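The gist of that attribute, as I understand it, is a HandleErrorAttribute subclass that raises the handled exception through ELMAH's signaling API. A stripped-down sketch; the version in the article handles a few more edge cases:

```csharp
using System.Web.Mvc;
using Elmah;

// Let MVC's HandleError render the friendly error view as usual, then
// hand the exception to ELMAH so it still gets logged even though MVC
// considers it handled.
public class HandleErrorWithElmahAttribute : HandleErrorAttribute
{
    public override void OnException(ExceptionContext filterContext)
    {
        base.OnException(filterContext);

        // Only signal errors the base attribute actually handled;
        // unhandled ones reach ELMAH through its module anyway.
        if (filterContext.ExceptionHandled)
        {
            ErrorSignal.FromCurrentContext().Raise(filterContext.Exception);
        }
    }
}
```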

There's plenty more to learn about ELMAH, such as signaling and the great features of the elmah.axd report tool, but the above resources will get you going pretty darn quickly.

Great job on the project, Atif.