I found these blocked sites by starting with a combination of URL lists and ad hoc spidering, and running as many sites as possible through the Saudi filters to catch the ones that were blocked. Some of the sites were blocked for reasons that were easy to guess -- for example, http://www.bighornbasinsfw.org/, the home page of the Big Horn Basin, Wyoming chapter of Sportsmen for Fish & Wildlife, was almost certainly blocked because of the slang term "nsfw" in their URL. http://www.AgainstPornography.org and http://www.SearchingForMySpermDonorFather.org were presumably blocked because of the presence of the words "porn" and "sperm".
On the other hand, there appears to be no rational reason why the Filipino American Women's Network, the Tuscon Jazz Institute, or the Sacramento Police Activities League would have been blocked by Smartfilter, even by accident. A partial list of the blocked sites that I found is in the blog post I wrote for Citizen Lab, an Internet censorship research center at the University of Toronto.
Articles about sites that are erroneously blocked by Internet censorship software, have a storied history. The first widely read piece was the article "Keys to the Kingdom" written by Brock Meeks and Declan McCullagh in 1996, calling out Cyber Patrol for blocking EnviroLink.org and the University of Newcastle Computer Science Department, and CYBERsitter for blocking the National Organization for Women. I made a minor name for myself and the Peacefire.org site in the late 1990's by writing more pages about sites blocked by other products, including some (like X-Stop and SurfWatch) which no longer exist, and others that are still around, including Smartfilter. I was also one of six people comprising the Censorware Project, a loosely organized group of volunteers that published a few more reports.
By the early 2000's, however, it became clear that anyone whose mind was likely to be changed by information about what kinds of sites were blocked by blocking software, would have changed their mind already (or would, if they came across the research that had already been done up to that point). So the further reports on Internet blocking software errors, by me and other people, slowed to a trickle. I wrote a report in January 2002 on the latest list of sites blocked by Cyber Patrol, a product that most people today have forgotten. In 2006 I worked with the ACLU of Washington to publish a report on sites erroneously blocked by FortiGuard, a program used on computers in some libraries in central Washington, as part of the ACLU's suit to challenge the constitutionality of the program's use on public library terminals. (The Washington State Supreme Court rejected the lawsuit on the grounds that, regardless of what sites were blocked on the computers, it didn't matter because an adult library patron could request for the filter to be turned off.) In 2007 I wrote an article for Slashdot titled "From Bess to Worse" listing some sites that were blocked by an Internet filtering program called Bess (which was later bought out by Smartfilter and discontinued).
Most people's awareness of this debate, if they had heard about it at all, was limited to the perception that "breast cancer sites" and sites about "chicken breast recipes" were sometimes filtered by Internet blocking programs. Or they heard that "Beaver College" actually had to change its name to avoid being censored by web filters. As I tried to explain in a FAQ (written, according to the Wayback Machine, in 1999, but which still broadly holds true today), these examples are true, but they miss the point. These examples make it sound as if blocking software companies are doing the best job they can under the circumstances, and that the errors are unavoidable due to limitations on machine intelligence. In reality, any software algorithm that blocks the American Board of Vocational Experts, the Hopewell United Methodist Church, and the Patriot Guard Riders of Mississippi, as "pornography" (as Smartfilter currently does), is probably not the best algorithm the company could have come up with -- but there's no incentive for them to try harder, because few people will ever look that deep.
And yet, people continue to remember the "breast cancer site" examples. This sounds to me like an example of the narrative fallacy -- people remember that breast cancer sites were blocked, because there's a tidy explanation. There is no tidy explanation for most other examples of blocked sites, so the meme never spreads very far. Conveniently for the blocking companies, the blocked-site errors which make the company look most sloppy (the Kennels at Simpson Creek Farms, the St. Francis Institute of Milwaukee, etc.) are precisely the ones that, due to the narrative fallacy, most people won't remember or hear about.
One company, CYBERsitter, did manage to make a few blocking decisions in the 1990s that were egregious enough that their antics did make the news, and did finally raise some people's awareness that the controversy over private Internet filtering extended beyond "breast cancer sites". After TIME Magazine's website published an article (no longer online) that criticized CYBERsitter's blocking policies, CYBERsitter responded by blocking TIME Magazine's pathfinder.com domain. A few months earlier, CYBERsitter had blacklisted the monthly e-Zine "The Ethical Spectacle, after the Spectacle's founder, Jonathan Wallace, published an article criticizing CYBERsitter for blocking my own Peacefire.org website. And Peacefire.org had been blocked, in turn, because of a page I wrote (now very much out of date) listing some of the sites that CYBERsitter blocked, including the International Gay and Lesbian Human Rights Commission and Mother Jones. (Nowadays, of course, nobody would be surprised that filtering companies block Peacefire.org, since the site publishes ample instructions on how to get around Internet blockers. But at the time, the site's first and only article was the list of sites blocked by CYBERsitter, which is why CYBERsitter received so much criticism for blocking the domain in retaliation.) CYBERsitter also threatened to have Meeks and McCullagh criminally prosecuted for writing "Keys to the Kingdom" and threatened to sue me over the page that I had made.)
The moral, it seems, is that if you want an example of a censored web site to stick in people's minds, it either has to be a forgivable error, or an insane vindictive dick move -- because in either of those cases, people will understand why it happened. The vast swaths of censored websites on the spectrum in between, the ones for which there is no rational explanation for the blocking, go ignored.
These days, though, American and Canadian "censorware" makers have also come under fire for selling censoring software to foreign governments which use them for country-wide censorship. Most of the criticism focuses, naturally, not on the kinds of sites that are accidentally blocked by the blocking software, but on the immorality of these companies enabling statewide foreign censorship in the first place. Netsweeper, Blue Coat, and McAfee have all made the claim that "Once we sell their product to them, we have no control over what they do with it" -- which, as I wrote previously in Slashdot, is nonsense, because for the product to be effective, it has to rely on updates to the blocked-site list, which are provided at regular intervals by the manufacturer. Cut off the updates, and the product will not work, at least not as well.
So the fact that McAfee has classified the Boy Scout Troop 87 of North Andover, the Pan-Iranist Party of Iran, and Reptile Conservation International as "Pornography" is (rightly) overshadowed by the fact that McAfee is selling to government censors in Saudi Arabia and the UAE in the first place. However, as long as the filters are installed, these blocked sites are at least part of the problem for users in those countries, just as much as they are for students or cubicle workers in the U.S. whose network administrators happen to use Smartfilter. And, of course, I sampled only a miniscule fraction of the Web to find these examples of blocked sites, so the true number of stupid blocks affecting Saudi and UAE users is likely to be much larger. For each individual example, you might reasonably ask, "Is it really a big deal if Saudis are blocked from accessing Boy Scout Troop 87 of North Andover?" But it adds up.