No browser supporting socks5 authentication?

If you're trying to use a SOCKS server with Internet Explorer, Firefox, Opera or Safari, everything will work just fine, except for authentication.

From my point of view this is a big problem. Who in the world would leave such a proxy server unprotected? Of course you can always limit access to a proxy server by IP address, but in some cases (clients behind NAT, for example) this is just not going to work.

Internet Explorer supports only the SOCKS4 protocol, which doesn't even support full password authentication (only a username, and it defaults to the currently logged-in user).

Firefox supports SOCKS5 but has no authentication mechanism, so its SOCKS5 support is pretty much useless here. I think I saw a ticket in Bugzilla about this, but no one has managed to commit a fix yet.

Opera doesn't even support the SOCKS protocol, but I thought I should mention all major browsers 🙂

Safari supports SOCKS5 and even allows you to set a username and password to access the SOCKS server but it does not use them.

I tried Konqueror, but I was unable to specify the SOCKS server. I guess this is because it was not compiled with a SOCKS library. Has anyone had any success with Konqueror and SOCKS?

Solaris in qemu

For some time I have wanted to try Solaris, mainly because of all the buzz around it going open source and then the buzz around ZFS, but I never got the chance until today, when I had to make a Perl script work on Solaris 9 (an outdated release, but that's what the client had installed).

The perl script was using the Net-SSH-Perl module and the module had some requirements that had to be installed.

The problem is that the system where this script had to run had no C compiler, so I had to install Solaris 9 in qemu, compile the requirements and send them all in one package.

Getting Solaris 9

So I went to the Solaris 9 download page and downloaded the DVD version (split over 5 zip files suffixed a-e), decompressed each file, then used cat to join the pieces into one big file, respecting the alphabetical order of the names, and got my big .iso file.
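The joining step can be sketched roughly as below. This is only an illustration: the piece names (sol9-dvd-a to sol9-dvd-e) and the join_parts helper are hypothetical, and the demo uses tiny dummy files instead of the real downloads.

```shell
#!/bin/sh
# Sketch: rebuild one ISO from pieces that were split in alphabetical order.
# The file names below are hypothetical stand-ins for the real downloads.

join_parts() {
    # $1 = output file, remaining arguments = pieces in the order to join
    out=$1
    shift
    cat "$@" > "$out"
}

# Demo with tiny dummy pieces in a scratch directory.
demo_dir=$(mktemp -d)
for suffix in a b c d e; do
    printf 'piece-%s\n' "$suffix" > "$demo_dir/sol9-dvd-$suffix"
done

# The shell expands the glob in sorted order, so a..e are joined alphabetically.
join_parts "$demo_dir/solaris9.iso" "$demo_dir"/sol9-dvd-[a-e]
```

Relying on the glob keeps the order right without typing every name, as long as the suffixes sort the way the pieces were split.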

Qemu Scripts

When I use qemu ( and I use it a lot ) I have these 3-4 scripts that I use everywhere:

1) The start script: start.sh. I use this script to boot the guest system.

  1.  

I run this as "./start.sh", or "./start.sh d" if I want to boot from the CD instead of the disk. DU_sol9.iso contains some drivers I will discuss later in this post. macaddr is optional, but I set it so that my DHCP server knows which IP to hand out to this system.

2) restart_dhcp.sh: this script restarts my DHCP server so that it starts answering requests coming in from the tap devices created by qemu. This way the guest systems can get their IPs from my DHCP server.

  1.  

It sleeps a bit before restarting the DHCP server, because qemu does not create the interface right at the start and we have to make sure the interface (tap11) exists before we restart the DHCP server.
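Instead of a fixed sleep, the same idea can be written as a small polling loop. This is a sketch and not the original restart_dhcp.sh: the wait_for_iface helper, the SYS_NET variable and the /etc/init.d/dhcpd path are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch of a "wait for the tap device, then restart DHCP" helper.
# SYS_NET and the restart command are assumptions for illustration.

SYS_NET=${SYS_NET:-/sys/class/net}   # where Linux lists network interfaces

wait_for_iface() {
    # $1 = interface name, $2 = maximum number of seconds to wait
    iface=$1
    tries=$2
    while [ "$tries" -gt 0 ]; do
        [ -e "$SYS_NET/$iface" ] && return 0
        sleep 1
        tries=$((tries - 1))
    done
    # One last check; the return status tells the caller if it showed up.
    [ -e "$SYS_NET/$iface" ]
}

# Intended use (not run here, it needs root and a DHCP init script):
#   wait_for_iface tap11 30 && /etc/init.d/dhcpd restart
```

Polling means the DHCP server is restarted as soon as the interface appears, and not at all if qemu never creates it.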

3) qemu-ifup.sh, and if I want two interfaces in the guest system I will also have qemu-if2up.sh. This script just sets the IPs on the host side of the tunnel between the host and the guest system.

  1.  

I can't remember exactly why I wanted to use ip tools, I think ifconfig $1 172.16.10.2 would work just as well.

Installing Solaris 9

Installing Solaris went pretty smoothly in qemu. I just modified the start.sh script to use solaris9.iso for -cdrom and ran "./start.sh d"; a few F2 key presses and a reboot later, the system was up and running. Everything seemed to be fine, but I had no network access to the guest OS and CDE (Common Desktop Environment) locked up from time to time.

I was unable to determine the problem with CDE, so I just killed the X server and ran /usr/openwin/bin/xdm, which gave me an X session in a minimal window manager (twm). That was enough for me to install Sun Studio and get a C compiler.

There was no network access because Solaris 9 does not have the drivers required for the network cards that qemu emulates (NE2000, RTL8139 and a few others). I found this page providing community network drivers for those cards. The page also contains instructions on how to add the drivers, and at the official Sun documentation site you can find out more about configuring network interfaces in Solaris.

When I tried to install Sun Studio, qemu crashed at about 30% with "qemu: fatal: Trying to execute code outside RAM or ROM". Luckily the C compiler was in the first 30% of Studio's files and I managed to use it to compile the requirements (modules) for my Perl script. But when I was finished, the client thought he should mention that his server was SPARC, not x86 (which is what I had tried) 🙂. Why didn't I think of that? I guess most servers out there running Solaris are SPARC...

Next I'm going to install the SPARC version of solaris 9 in qemu and I'll let you know how it goes in another post...

Repair a MySQL table

I'm running MySQL 5.0.23 on a FreeBSD server. I have several databases there and a few phpbb forums.

I noticed that the tables used for searching the forums (phpbb_search_wordlist and phpbb_search_wordmatch) have been crashing quite a lot lately, for various reasons but mainly because of hardware problems (like lack of power 🙂). Nothing unusual here so far. When I notice this I go into mysql and do a repair like:

  1.  

But this time I got this answer: "Table is already up to date". So the MySQL client tells me the table is fine, but when I try to search for something in phpbb I get this error message: "SQL Error : 145 Table './simscripts_phpbb/phpbb_search_wordlist' is marked as crashed and should be repaired". I tried to read the table from the mysql client and got a similar message.

Going through the MySQL documentation I found some extra parameters I can pass to REPAIR TABLE, and USE_FRM seems to be the one that fixes the problem. This parameter recreates the indexes by looking at the .frm file (the structure definition of the table).

  1.  

did the job and search in phpbb is back online.

Unfortunately it seems the table was so badly damaged that no rows could be recovered, so all the repair did was recreate the table structure. Good thing I had a backup!

OR maybe it was just because my table was created on an older mysql version and as the documentation says:

Caution

Do not use USE_FRM if your table was created by a different version of the MySQL server than the one you are currently running. Doing so risks the loss of all rows in the table.

I don't know if this is a bug only in the version I run, but I think MySQL should really look at the indexes and, if they need to be recreated, either do it automatically or at least tell you the table is not OK instead of lying like that.

Update:

If you have shell access to your server with root or mysql user permissions, you can go to the MySQL data directory (usually /var/lib/mysql, or /var/db/mysql on FreeBSD), go into the directory of the broken database and use myisamchk to repair the table without the risk of losing all the rows in it:

  1.  
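The command listings in this post did not survive, so here is a rough sketch of the two repair routes it describes. It is not the exact text of the original commands: the run wrapper, the DRY_RUN switch and the function names are made up for illustration, and by default nothing is executed, the commands are only printed.

```shell
#!/bin/sh
# Sketch: the two repair routes from this post, wrapped so they can be previewed.
# Helper names and the DRY_RUN switch are assumptions, not part of MySQL.

DRY_RUN=${DRY_RUN:-1}   # 1 = only print the command, 0 = really run it

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

repair_with_sql() {
    # $1 = database, $2 = table
    # USE_FRM carries the row-loss risk from the caution quoted above.
    run mysql -e "REPAIR TABLE $2 USE_FRM" "$1"
}

repair_with_myisamchk() {
    # $1 = data directory, $2 = database, $3 = table
    # Assumes the server is stopped or not touching the table while this runs.
    run myisamchk --recover "$1/$2/$3.MYI"
}

repair_with_sql simscripts_phpbb phpbb_search_wordlist
repair_with_myisamchk /var/db/mysql simscripts_phpbb phpbb_search_wordlist
```

Setting DRY_RUN=0 runs the commands for real; taking a copy of the table files first is a cheap safety net either way.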

How to write about Linux for Digg?

I can't say I really know the answer to this question, as none of my articles reached the front page, and I don't think they ever will, mainly because Digg's audience doesn't care much about the type of content I write. But check out this site: www.venturecake.com.

The site has only 11 articles and 6 of them reached the front page on digg.com. Venturecake.com is a blog about technology, mainly open source, Linux, Unix, Apple, and some others. The last post ( Who copied who? ) was published yesterday and it got over 600 diggs in one day.

The posts that made it to Digg's front page are about common buzzwords like Apple, Web 2.0 (Web 2.0 is built on Open Source), Open Source (yes, this is still a buzzword), Ubuntu and virtualization (15 minutes to using your existing Windows install & apps in Ubuntu, 10 minutes to run every Windows app on your Ubuntu desktop), but also some unique tips like 10 Linux shell tricks you don’t already know. Really, we swear.

Speedlinking Unix 20-07-2007

It seems Solaris features (mostly ZFS) are making a lot of buzz these days. I wonder when (if ever) Linux will include ZFS in the kernel. I know there is a ZFS implementation for Linux in FUSE, but that's never going to be used in production unless it's ported to the kernel.

I think the developers should leave the licensing mumbo jumbo aside, really look at what ZFS has to offer, and decide on that basis whether it's worth implementing. Isn't there anyone in the Linux world who wants ZFS?

More optimization for comment relish plugin

In my previous post about optimizing the comment relish plugin I managed to lower the load generated by the comment relish plugin on the database server by adding an index on a column in the cr_emailed table and by optimizing a query ( basically removing a "useless?" join ). This improved the load time a lot but some users still reported slow load times on blogs with a lot of comments.

Looking further through the source code, I realized that the function that tries to find new commentators was executed on every page. This function executed a MySQL query that joined two tables, wp_comments and wp_cr_emailed. I think you can imagine the result of this on blogs with a large number of comments.

The solution was to execute this function only when a new comment is posted. This comes with two benefits:

  1. Because the function is called only when a new comment is posted (or approved), the rest of the blog remains as fast as it was before the plugin was installed.
  2. Because of the way we call the function (as an action/hook associated with the code that processes the comments), we have more information about the comment, so we don't have to run the query that joins two tables. We still do one query to get the whole comment data and one to see if this is a new commentator (not emailed yet), but these use indexes and are really fast.

Get the new optimized plugin here and let me know how it works for you.

Update:

The plugin linked above tries to send email even when the blog receives a pingback or trackback, as reported by Rhys. I have uploaded another version that corrects this problem.

Extracting fields in shell

A lot of shell scripts need to process some kind of data structured in fields or columns separated by special characters (space, comma, semicolon, etc.).

This is a short tutorial that shows how you can extract the fields from a stream of data. There are several ways of doing this and each has its advantages and disadvantages.

Here is what I use:

  1. Using cut

    The 'cut' program allows you to extract fields separated by a single character. You can specify which field to extract and what the field separator is.
    Example: echo "a:b:c" | cut -f2 -d':' will output b
    The cut program has the advantage that it is simple to use, almost all Unix flavors include it in the base distribution, and it is relatively lightweight (~33 KB with no library dependency other than libc on my Gentoo Linux).
    The problem with cut is that the field separator can only be a single character.

  2. Using awk

    awk is a pattern scanning and processing language somewhat similar to Perl. Actually, Perl was inspired by languages like awk, sed, C and some others. awk is a lot more flexible than cut and can do a lot more. You can even specify a regular expression as the field separator.
    Here is an example for extracting fields separated by one or more spaces:
    echo "a b c"|awk '{print $2}' - this will print the second field. As you can see, I have not specified any separator, because awk uses <space> as the default separator. <space> means any run of spaces or tabs here.
    You can specify a different field separator with the -F parameter.

  3. Using a shell function

    This may be the simplest and fastest solution, but it only works if the field separator consists of spaces or tabs. As you may know, the parameters passed to a shell function are separated by spaces, so you can just write a function whose sole purpose is to return the field (parameter) you want.
    If I want to get the third field from a line, I would write a function like this:

    1.  

    getfield a b ccc ddd would display 'ccc'. This is more useful in a script where you need to get a field value from a variable containing some text, but not so much with whole files.
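The three approaches can be put side by side in a short sketch. The getfield body is a guess at what the lost listing looked like, and the second_with_cut and second_with_awk wrappers are hypothetical names added only so the examples are easy to call.

```shell
#!/bin/sh
# Sketch: three ways to pull one field out of a line, as discussed above.

# 1. cut: the separator must be a single character.
second_with_cut() {
    echo "$1" | cut -d':' -f2
}

# 2. awk: -F takes a regular expression, so the separator can be longer.
second_with_awk() {
    # Fields here are separated by one or more ':' or ';' characters.
    echo "$1" | awk -F'[:;]+' '{print $2}'
}

# 3. Shell function: the shell has already split the arguments on whitespace.
getfield() {
    echo "$3"
}

second_with_cut "a:b:c"     # prints b
second_with_awk "a::b;c"    # prints b
getfield a b ccc ddd        # prints ccc

# With a variable, leave it unquoted so the shell splits it into fields.
line="a b ccc ddd"
getfield $line              # prints ccc
```

The unquoted $line is deliberate here; quoting it would pass the whole line as a single argument and $3 would be empty.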

Do you know any other or better methods? Feel free to share them in the comments.

Comment relish optimization

Comment Relish is a WordPress plugin that sends an email the first time someone comments on your blog. The plugin allows you to define the message in wp-admin, and you can embed tags like the author's name, email, website and others in the message.

This can be used to send a welcome message to the first time commentators and maybe invite them to subscribe to your rss feed.

The Problem

John Chow tried to use it on his blog, but the plugin and his huge number of comments (43000?) took the site down for about 2 hours. He said the plugin misbehaved and that he would not recommend it if you have a lot of comments.

I did a little simulation on a WordPress blog with about 25000 comments and I can see how the plugin misbehaves :). Selecting the email addresses that should be emailed ran for more than two minutes before I decided to stop it, because that's just unacceptable, and I think it would have taken a lot longer.

I managed to optimize it easily by just modifying a field in the table used by this plugin. If you don't care how I did it and just want the optimized plugin skip to the end of the post. If not then read on...

How it works

The plugin uses a table (cr_emailed) to remember which addresses received an email, so that it does not send a message more than once. Basically the plugin just inserts a new record in this table every time someone new comments on the blog. It tries to see if someone is new by doing a LEFT JOIN select on two tables, comments and cr_emailed, and then keeps only the results where cr_emailed.email IS NULL. The problem with this is that the email column has no index, so if you have 43000 comments the query will have a huge result set and MySQL will have to process each row to find out whether cr_emailed.email is null (the address has not received a message yet).

The solution

This was really simple, just convert the email field to a varchar ( needed in order to be able to set a fixed length key on this field ) and then add an index on it.

After doing this, the query that took more than two minutes now takes a little less than 1 second.
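As a rough sketch, the change and the kind of query it speeds up could look like the following. VARCHAR(255), the index name cr_email_idx and the database name are assumptions, and the SELECT is only an illustration of the LEFT JOIN / IS NULL pattern, not the plugin's actual query.

```shell
#!/bin/sh
# Sketch: emit the SQL for the index fix so it can be reviewed, then piped to mysql.
# VARCHAR(255) and the index name are assumptions; the post gives neither.

cr_index_sql() {
    cat <<'SQL'
ALTER TABLE wp_cr_emailed
    MODIFY email VARCHAR(255),
    ADD INDEX cr_email_idx (email);
SQL
}

cr_new_commenters_sql() {
    # Anti-join: keep commenters that have no matching row in wp_cr_emailed.
    cat <<'SQL'
SELECT DISTINCT c.comment_author_email
FROM wp_comments AS c
LEFT JOIN wp_cr_emailed AS e ON e.email = c.comment_author_email
WHERE e.email IS NULL;
SQL
}

cr_index_sql
# To apply it for real (the database name is hypothetical):
#   cr_index_sql | mysql wordpress
```

Prefixing the SELECT with EXPLAIN is a quick way to check whether the new index on email is actually being used.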

Someone on John Chow's blog suggested adding an index to wp_comments.comment_author_email. I tried that but noticed no improvement, and running EXPLAIN on the SQL query shows the index on comment_author_email is not used, so that's useless.

Looking more over the code I noticed another weird thing. The query executed to find the email addresses looks like this:

  1.  

I wonder why the author used the posts table in this query? The results from the posts table are not used anywhere in the code. I really couldn't see a good reason for keeping the posts table in this query so I removed the posts table from this query, making it even faster.

Get the source code

Here is a diff file between my version and the original version and here is my new version. If you already have this plugin installed, you have to uninstall it, remove the wp_cr_emailed table, and then copy the new version and activate it in wp-admin.

I'm using the new version on this blog and it is working, but I have few comments here. I have only tested the modifications on this blog and on a fictional blog with 25000 automatically generated comments, so the usual disclaimer applies... you know, all that "WARNING NO WARRANTIES" stuff...

If you have a blog with many comments and you want to try this, let me know how it works in a real environment.

Google set to kill link ads

Google now has a way of reporting paid links. They say buying links is an attempt to game their PageRank algorithm, and they want you to report sites that sell or buy links.

They agree links are a good way of advertising and are not against it, but they want those who display text links to put the rel="nofollow" attribute on the links. Using the "nofollow" attribute means that Googlebot will not follow the link, and thus will not use it when computing the PageRank of the destination URL.

I think the only reason you would want text links on a site is because of that, to get a higher page rank and relevance, so by requiring webmasters to use nofollow, they are just killing text link advertising networks like Text Link Ads that work especially because they sell link ads that are followed and transfer page rank.

Google says this violates their guidelines. How can you violate a guideline? You can violate a rule, but if it's just a guideline, that means you shouldn't be penalized for not following it.
And there are other problems with this policy. Links are supposed to mean that the owner of the site thinks some other site is relevant, and that is why he links to it. Paid or not, it can be relevant. PageRank is about relevancy, right?
If I want my site in Google search ads, I pay Google for it. Does that mean my ads are not relevant? Google says it shows contextual ads because they are relevant to the content the user is seeing. It seems to me it's relevant only if you pay Google for the ad.

And here's another problem: how can Google tell whether the person who reports such a violation is lying? If I want to get my competitor out of the Google index or push him to a lower PageRank, I could just report him for buying text links. A lot of web sites have text links pointing to them, paid or not. It's hard to tell. Some disclose them, others don't. This may push the ones that do disclose them to stop disclosing the links. Why add to the risk of being reported?

CSS class names in IE 6

I just realized today that in IE 6 class names must start with a letter ([a-zA-Z]). So don't you dare name your classes something like _class1, because it will not work in IE6.

It works well in Firefox 2.0, IE7, Opera and Safari, and that just makes the problem harder to discover.

I know this seems like a lame mistake and something any designer should know, but well I'm not really a designer and some things you never know unless they hit you.