Google Goof – Perils of Case-Sensitivity in Censorship…

Edit/Synopsis: A bug exists in the code that Google uses to enforce censorship on its recently announced Chinese service. The bug is trivial: search strings containing Capitalised Words return uncensored output. It could provide a (presumably temporary) mechanism for Chinese nationals to bypass their government's censorship.

Simon Phipps just messaged me: (window grabs added infix for illustration)

Try this:
http://images.google.cn/images?q=Tiananmen
versus:
http://images.google.cn/images?q=tiananmen

So this is what censorship looks like in China:

And this is what the rest of the world sees:

…and so it looks like Google’s [2006-era] pro-China censorship keyword list is in all-lowercase, and the text matching is case-sensitive.
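
To illustrate the kind of bug being guessed at here, the following is a minimal sketch and not Google's actual code, which is not public. The blocklist contents and both function names are assumptions made up for this example. It shows how an all-lowercase keyword list combined with an exact, case-sensitive comparison lets a capitalised query through, and how case-folding the query first closes that gap.

```python
# Hypothetical blocklist, stored in all-lowercase as the post speculates.
BLOCKLIST = {"tiananmen"}


def is_blocked_case_sensitive(query: str) -> bool:
    """Buggy check: compares each query word to the blocklist exactly as typed."""
    return any(word in BLOCKLIST for word in query.split())


def is_blocked_case_insensitive(query: str) -> bool:
    """Fixed check: case-folds each query word before the lookup."""
    return any(word.casefold() in BLOCKLIST for word in query.split())


if __name__ == "__main__":
    for q in ("tiananmen", "Tiananmen"):
        print(q, is_blocked_case_sensitive(q), is_blocked_case_insensitive(q))
```

With the buggy check, "tiananmen" is caught but "Tiananmen" is not; the case-folding version catches both. Neither version would catch a variant spelling, since both rely on exact word matches against the list.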

Oopsie. Enjoy the liberation whilst you can, citizens!

Comments

16 responses to “Google Goof – Perils of Case-Sensitivity in Censorship…”

  1. alecm
    re: Google Goof

    I regret that I am completely incapable of dealing with the various Chinese character sets, and so not able to check the possibility of native-language censorship bypass methods…

  2. alecm
    re: Google Goof – Perils of Case-Sensitivity in Censorship…

    Isn’t it tragic how the number of hits drops from 20,200 down to 414 between the two searches…

  3. Simon Phipps
    re: Google Goof – Perils of Case-Sensitivity in Censorship…

    One comment on Dave Farber’s IP list[1] (where I was tipped off to this problem by a comment someone made) is even more enlightening.

    Susan Jacobson said: “Certainly the Chinese images are misleading because they leave out June 4, 1989. But the Western images are misleading because they foreground June 4, 1989. Tiananmen Square has a long history in China. It was a place for protest and public gathering long before June 4, 1989, and yet we don’t see these images at the top of the Western search. Tiananmen Square is also an icon of patriotism, where families visit and have their photos taken in front of Chairman Mao, where many national holidays are celebrated. Where are these images in the Western search?”

    S.

    [1] http://www.interesting-people.org/archives/interesting-people/

  4. alecm
    re: Google Goof – Perils of Case-Sensitivity in Censorship…

    It goes without saying that http://www.google.com provides the same image search results irrespective of keyword case. As regards Susan’s comment above: given the nature of the search engine, and its self-reinforcing reflection of what cosseted foreigners are interested in, it is unsurprising that we see nothing but tanks in the “typical” search.

    If you add an extra word and search for “Tiananmen Tourist” – the selection changes completely. Similarly “Tiananmen Celebration”.

  5. Clive
    Google “Goof”?

    If I worked at Google, whether programmer or manager, I would do everything I possibly could to maximise the number of “mistakes” that appeared in this exceptional censorship mechanism, the implementation of which is obviously very difficult, so anathematic it is to the main code base.

    And obviously correcting any faults that were discovered would have to be a low priority compared with issues that affected a greater proportion of Google’s users.

    (-8<

  6. alecm
    re: Google ‘Goof’

    I can see that point of view, though I suspect from a corporate viewpoint it would not be tolerated…

  7. bartb
    Result Of A Capitalised Word Google Search In China

    Well, it makes sense… the first shows the “Result Of A Capitalist Word Google Search In China”, the second does not.

    Bart

  8. alecm
    re: Result Of A Capitalised Word Google Search In China

    That… is a really, really bad pun.

  9. bartb
    re: Result Of A Capitalised Word Google Search In China

    Thanks, I’ll be here all week. Try the veal, and remember to tip your waitress.

  10. flipa
    re: Google Goof – Perils of Case-Sensitivity in Censorship…

    They seem to have fixed it now. Got 414 results with both searches.

  11. alecm
    re: Google Goof – Perils of Case-Sensitivity in Censorship…

    Gotta pay them their dues, they keep atop this sort of thing. That’s a short window from exploit to fix. Now, we just have to find the next weakness in the censoring filter. 😎

    “…interprets censorship as damage, and routes around it…” – how true that is.

  12. alecm
    It’s 24 hours later…

    And I for one am still getting the tanks, and I don’t think I’m hitting a cache?

  13. Rich Sharples
    re: Google Goof – Perils of Case-Sensitivity in Censorship…

    *Some* of this is merely a ‘feature’ of Google, i.e. that it ranks your own domain higher. I noticed this feature years ago. For example, try searching for “football” on google.com – it comes up with something called NFL as #1, which only appears at the bottom when you search on google.co.uk (and if you only search for the .co.uk domain, NFL is a long way down the list).

    I’m not saying that there isn’t censorship happening – I’m merely saying that there is an implicit censorship caused by what people are interested in – and that differs by country.

    – Rich

  14. dave
    re: It’s 24 hours later…

    hmmm….

    http://images.google.cn/images?q=tbn:lZQdztFM5Ny-yM:jml.prof.free.fr/doc/TiananMen-photo.jpg

    suggests all isn’t quite well…

  15. Ben
    re: Google Goof – Perils of Case-Sensitivity in Censorship…

    Also, simple misspellings get past the google filters: e.g., tienanmen

  16. 2010 UPDATE: links fixed and layout updated to put both pictures up-front.
