Hashing of files

Something that every so often comes up on the (now absent) forum is why DC++ re-hashes some files. Most of the people asking are using network drives.

There are two reasons why DC++ would (re-)hash a file.
(1) The path to the file has changed. (The file name is included here.)
(2) The file content has changed.

People don’t realize why (1) is important. They think that DC++ could just look at the file name and see “that it’s the same file”. However, this would obviously not work well if you have multiple files named the same (“example.png”) sprinkled through your share.

(2) is obvious if you’re indeed changing the content of a file intentionally. However, there’s some software that “automatically” does this for you. You might experience this the most with MP3 files and documents. Certain media players like to change the ID3 tag of MP3s, and various document editors like to leave their own footprint on the files.

People with network shares may see these things regularly. Detaching and re-attaching the network drive may update the files’ timestamps, which may cause DC++ to re-hash the files [2]. Sometimes, the path may also change, causing (1) to happen.

Don’t forget that you can make topic suggestions for blog posts in our “Blog Topic Suggestion Box!”

Protocol chat

A key element in DC is the ability to chat. Basic chat is very easily implemented in a client.

As is noted by the previous ADC and NMDC run downs, commands are sent differently, which is why we need to know about each.

Let me create two categories; NMDC chat and ADC chat.

Both categories have chat in the main window. Further, NMDC has “private” messaging, which essentially means that chat is sent to one particular user. ADC also has “private” messaging. However, while the chat functionality in ADC allows a single user to receive the message, the private message is intended for a ‘group’ of users. This group can be anything, really. It could be only those who use DC++, or those who natively support user commands. What is interesting is that sending a message in ADC to a user is the same as sending to a group; only some info is replaced, and the length stays the same.

Let us continue, and start with a basic main chat message.

Main chat is rather simple in NMDC, and it requires no really difficult parsing. In this example, my nick is going to be “ullner” and the message “hello everybody”. This is how that will look;
<ullner> hello everybody|
Which is pretty straightforward. The nick in angle brackets, followed by the message and an ending pipe. This amounts to 25 characters or bytes.

Moving on to main chat in ADC… The difficulty to parse is raised, although not very much. I’m going to use the same message. However, due to ADC’s user -> SID mapping, I won’t be using a nick, but a SID. The SID is “MD3Z”. This is how that will look;
BMSG MD3Z hello\severybody\n
Which is also pretty straightforward, although not as much as the NMDC example. The \s replaces the space and the last \n is one character (ending the command). This amounts to 27 characters. Not that much difference, really. At least not in our example.

Note that changing nick won’t change the SID. So, NMDC will consume slightly less bandwidth if the nick is 7 characters or fewer, at 8 characters the two are dead even, and above 8 characters ADC will scale a bit better.

Let us continue to “private” messaging.

I’m going to now use the same message, and with the other user being “arne”. arne’s SID is in this case “6DKN”.

Starting with NMDC; $To: arne From: ullner $<ullner> hello everybody|
So, essentially it is the same as for a normal main chat message except the beginning. This amounts to 49 characters or bytes.

In ADC, we will have the following; DMSG MD3Z 6DKN hello\severybody PMMD3Z\n
In this, we changed the initial letter from a B to a D (works with E, too), added arne’s SID before the message and our own SID after (in the PM parameter). This amounts to 39 characters or bytes.

If the two users’ nicks were only one character long, the NMDC message would end up at 36 characters. (Though, most users don’t have that, naturally.)

In conclusion; main chat messages are somewhat smaller in NMDC, but only up to a point (a nick of 8 characters), after which ADC will be better, and ADC will in almost every case beat NMDC in private messaging.
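To make the counting above easy to repeat with other nicks, here is a minimal sketch that builds the four messages and measures them. The function names are my own and only the space escape of ADC is handled, since that is all the example message needs.

```python
def nmdc_main_chat(nick, text):
    # <nick> message|
    return f"<{nick}> {text}|"


def nmdc_private(sender, target, text):
    # $To: target From: sender $<sender> message|
    return f"$To: {target} From: {sender} $<{sender}> {text}|"


def adc_escape_spaces(text):
    # Only the space escape is applied here; a full escaper would also
    # handle the newline and the backslash.
    return text.replace(" ", "\\s")


def adc_main_chat(sid, text):
    return f"BMSG {sid} {adc_escape_spaces(text)}\n"


def adc_private(sid, target_sid, text):
    return f"DMSG {sid} {target_sid} {adc_escape_spaces(text)} PM{sid}\n"


def wire_size(message):
    # The example is plain ASCII, so characters and bytes are the same.
    return len(message.encode("utf-8"))


sizes = {
    "nmdc_main": wire_size(nmdc_main_chat("ullner", "hello everybody")),
    "adc_main": wire_size(adc_main_chat("MD3Z", "hello everybody")),
    "nmdc_private": wire_size(nmdc_private("ullner", "arne", "hello everybody")),
    "adc_private": wire_size(adc_private("MD3Z", "6DKN", "hello everybody")),
}
```

The NMDC sizes grow with the nick lengths (the sender’s nick even appears twice in a private message), while the ADC sizes stay put because a SID is always four characters.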

Identifying ADC

Something very important in ADC is the different client identification schemes. I already noted something about them all, but I thought I’d dedicate an entire post to them.

Session ID
The session ID (SID) is the unique ID that is used per hub. When a client connects to a hub, the hub will assign a particular SID to that user. The SID is calculated by taking 20 arbitrary random bits and then encoding them with Base32 (to form a 4 byte string). There is only one reserved value that the hub must not assign to a user; it is “AAAA”. As the hub isn’t considered a “client”, it does not have a SID. However, to simplify client implementation, the client can (artificially) assign the hub the SID AAAA, since the client knows no one else can have that. (Elise and DC++ do this, at least.) During one session, that is, while a user is logged in, the SID for a user mustn’t change. The user must log out and log in again to get re-assigned a SID. The SID is assigned before any real information from the client has been sent. This in turn means that the hub doesn’t care about what kind of information the client sends. If the client’s nick changes (can happen) during the session, the client won’t get a new SID. Note that the SID is *per hub*. This means that a user with SID “6DKN” on HubA isn’t necessarily the same user as “6DKN” on HubB (it is possible, however). This ID is what is to be used when sending commands.
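A hub-side sketch of the SID assignment could look like the following. It assumes the usual Base32 alphabet (A to Z followed by 2 to 7); the function names and the retry loop are mine and not taken from any actual hub.

```python
import secrets

BASE32_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
HUB_SID = "AAAA"  # reserved; must never be handed to a user


def encode_sid(value):
    """Encode a 20-bit integer as four Base32 characters (5 bits each)."""
    if not 0 <= value < 2 ** 20:
        raise ValueError("a SID is built from exactly 20 bits")
    return "".join(
        BASE32_ALPHABET[(value >> shift) & 0x1F] for shift in (15, 10, 5, 0)
    )


def assign_sid(in_use):
    """Pick a random SID that is neither reserved nor already taken on this hub."""
    while True:
        sid = encode_sid(secrets.randbits(20))
        if sid != HUB_SID and sid not in in_use:
            in_use.add(sid)
            return sid


hub_sids = set()          # one such set per hub, since SIDs are per hub
new_sid = assign_sid(hub_sids)
```

Note that twenty zero bits encode to exactly “AAAA”, which is why the loop has to skip that value.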

Private ID
This ID is the unique ID that is used to verify your CID (see below). The PID must not be given to another client. Doing so will allow others to claim they’re you. If you’re an operator in a hub, I’m sure you don’t want others to know how to potentially get in and gain that operator status. Of course, there’s always the possibility of rogue hub operators, but I guess you’ll have to trust them in the end. According to the ADC draft, PIDs should be “generated by hashing the MAC address of the generating client followed by the current time using the Tiger hash algorithm.” Personally, I had a couple of issues with that when I wrote the PID generation for Elise. (1) The MAC address isn’t always that simple to get. If you can’t get to it, just use an arbitrary string that you know will be (at least) semi-unique. (2) The “current time” phrasing is a little fuzzy, since we have no idea of what format that “current time” should be in. Seconds since 2000? Time of day? It doesn’t really matter here, either. Make sure you are using something that is (at least) semi-unique. While waiting for a potential re-phrasing, use strings that make the final hash likely to be unique in the end. Elise does, at least. Note that it is the Tiger hash algorithm. Not Tiger Tree. (I have no idea if it actually really matters here since the info isn’t very large, but it is worth noting anyway…) The final PID should be 192 bits and encoded with Base32 (to form a 39 byte string). You can actually change your PID to something you’d like in the Experts Only page… Note that this means that the CID will also change.

Client ID
While having the most ambiguous ID name, it is also the most important. The CID is the ID that will (should) uniquely identify you across the entire DC network. The CID is constructed by taking the unencoded PID (the raw hash), hashing it again, and then encoding the result with Base32 (also 39 bytes). This ID is what identifies you to your friends, to hubs for the appropriate status (registering per CID is a lot better than per nick since people can change nick all the time) and to other people who use you as a source.
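The PID to CID chain can be sketched as below. Big caveat: Tiger is not among the algorithms Python’s hashlib guarantees, so this sketch uses a stand-in (SHA-256 cut down to 192 bits) purely to make the lengths and the two-step hashing visible. The resulting IDs are NOT valid ADC IDs; a real client would pass in a proper Tiger implementation. The seed string is also just an example of “something semi-unique”.

```python
import base64
import hashlib


def base32_no_padding(raw):
    """Base32-encode raw bytes and drop the '=' padding."""
    return base64.b32encode(raw).decode("ascii").rstrip("=")


def stand_in_hash(data):
    """NOT Tiger. SHA-256 truncated to 24 bytes (192 bits), so the sketch runs."""
    return hashlib.sha256(data).digest()[:24]


def make_ids(seed, hash192=stand_in_hash):
    """Return (PID, CID) as Base32 strings; hash192 should be Tiger in real use."""
    pid_raw = hash192(seed)      # hash of e.g. MAC address + current time
    cid_raw = hash192(pid_raw)   # the CID is the hash of the unencoded PID
    return base32_no_padding(pid_raw), base32_no_padding(cid_raw)


# Hypothetical seed: a MAC address followed by a timestamp.
pid, cid = make_ids(b"00:11:22:33:44:55" + b"1199145600")
```

Since the CID is derived from the PID, changing the PID always changes the CID, while the hub can re-compute the CID from the PID it was given to check that the two match.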

Changing CID and PID is potentially possible during a session, though there’s a rather large chance that the hub will kick you. (ADCH++ will.)

ADC: The run down

I previously wrote an extensive post about NMDC. Since we’re moving away from NMDC to ADC, I guess a post about ADC is in order. If I don’t explicitly say that something is different from NMDC, assume that the ADC way is the same as the NMDC way.

While a final version (that is, “1.0”) doesn’t exist, the current draft is mostly what will be in the finalized version. The first draft of ADC saw daylight on the 3rd of December 2003. ADC was spawned from ideas from another replacement draft; DCTNG. The draft mentions that ADC stands for “Advanced DC”, though it isn’t official. (I always thought of ADC as a recursive acronym; ADC Direct Connect, but maybe that’s just me.)

As a network, ADC works the same as NMDC does; with hubs and clients, where the hub is a central part. Everything is routed through the hub, except the actual file transfers. However, a client could claim (to the hub) that it wants to download from another client, the hub allows it, and instead of trying to get a file, the client will start sending other messages (such as chat). Truly private chat.

Contrary to NMDC, the one that does the connecting speaks first. That is, e.g. when connecting to a hub, the hub will wait, after the connection has been established, for the client to say “hello. I want to come in”.

In ADC, there are two key characters. The first is a space, used as a delimiter inside commands. The second is a “newline” character, used to denote the end of a command. There’s no starting character. Whatever comes after the newline character is considered to be a totally new command.

Commands are constructed in multiple ways. In all of these ways, an initial four characters (well, five with a space) are required. These characters say (1) how the message should be routed or used (“type”) and (2) what the message is about (“action”). When a client receives a command, it shouldn’t actually even look at the type to determine what it should do. As I said, there are multiple ways to create commands, but you’ll need some more info on ADC first.

In ADC, when the client connects to a hub, the hub will assign the client a unique ID for that particular hub. This unique ID is very important since the client will need it to interact with the hub. (This is called a ‘SID’.)

Also, beyond a unique ID per hub, ADC requires that all users in DC have a unique ID for the entire network. That is, I should be able to say “hey, that user is the same user as that one”. This unique ID is broadcast to everyone in the hubs a user frequents (well, it doesn’t have to be, but it most likely will be in most hubs). (This is the ‘CID’, which you can see in DC++…) Further, so that users can’t spoof someone else’s CID, they need to provide another special unique ID (‘PID’) to hubs. The hub will then verify that there’s a match, and let the client continue. You can spot a security issue here; users need to trust hubs not to give out the PID to others.

Let us continue. Each action has a set of parameters that are allowed and/or have to be used. These parameters can either be mandatory or voluntary. If the parameter is voluntary, it must be preceded by a two-character identifier. If the parameter is mandatory, there shouldn’t be an identifier.

Moving on… There are three types of commands. Since the initial bit is always mandatory, I’ll leave it out from these examples. (1) Only the parameters of the action are present. (2) The SID of whom it is from, followed by the parameters. (3) The SID of whom it is from, followed by the SID of whom it is for (“send this to person x only”), followed by the parameters.
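The three shapes can be told apart from the type character alone. Below is a rough sketch of a splitter. It only knows B as shape (2) and D/E as shape (3), as used in the chat examples on this blog; treating every other type as shape (1) is my own simplification and not something the draft promises. The “HSUP ADBASE” line is just a sample of a command without SIDs.

```python
ONE_SID_TYPES = {"B"}        # sender SID, then parameters
TWO_SID_TYPES = {"D", "E"}   # sender SID, target SID, then parameters


def parse_command(line):
    """Split one ADC line into type, action, SIDs and (still escaped) parameters."""
    fields = line.rstrip("\n").split(" ")
    header, rest = fields[0], fields[1:]
    if len(header) != 4:
        raise ValueError("expected a 4-character type+action header")
    msg_type, action = header[0], header[1:]

    if msg_type in TWO_SID_TYPES:
        needed = 2
    elif msg_type in ONE_SID_TYPES:
        needed = 1
    else:
        needed = 0  # simplification: everything else is parameters only
    if len(rest) < needed:
        raise ValueError("missing SID(s)")

    sids = rest[:needed] + [None] * (2 - needed)
    return {
        "type": msg_type,
        "action": action,
        "source": sids[0],
        "target": sids[1],
        "params": rest[needed:],
    }


broadcast = parse_command("BMSG MD3Z hello\\severybody\n")
direct = parse_command("DMSG MD3Z 6DKN hello\\severybody PMMD3Z\n")
no_sid = parse_command("HSUP ADBASE\n")
```

Splitting on the plain space is safe precisely because spaces inside parameters are escaped.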

In ADC, all commands are uppercase characters and case-sensitive. Voluntary parameters have no particular order; one can send them however they want.

Something else that is interesting in ADC, which NMDC doesn’t do, is that if a parameter needs to have a space in it (like a description for a user), the space is replaced by “\s”. Likewise, “\n” is used for a real new line and “\\” for the character \.
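A small sketch of the three escapes, in both directions. The function names are mine, and rejecting unknown escape sequences is a choice I made for the sketch, not something the post specifies.

```python
def adc_escape(text):
    """Escape backslash, space and newline for use inside an ADC parameter."""
    # The backslash must be handled first, or the other escapes get doubled.
    return text.replace("\\", "\\\\").replace(" ", "\\s").replace("\n", "\\n")


def adc_unescape(text):
    """Reverse adc_escape; raises ValueError on an unknown or dangling escape."""
    mapping = {"s": " ", "n": "\n", "\\": "\\"}
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            if i + 1 >= len(text) or text[i + 1] not in mapping:
                raise ValueError("bad escape sequence")
            out.append(mapping[text[i + 1]])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


escaped = adc_escape("hello everybody")
restored = adc_unescape(escaped)
```

The unescape walks the string character by character instead of chaining replace() calls, since chained replaces would misread an escaped backslash followed by a literal “s”.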

One of the most interesting aspects of ADC for developers is the ability to create extensions, without trouble. If a client or hub doesn’t understand something, it just ignores it (well, there’s always a possibility of kicking/disconnecting).

Let us get away from this somewhat boring info… In NMDC, a hub is assumed to be running on port 411 and file transfers on 412. However, this assumption is not allowed in ADC; addresses must state the port explicitly.

Contrary to NMDC, chat is rather easy in ADC. In ADC, all chat is assumed to be in UTF-8, meaning that everyone should be able to see everything. Also, there’s no such thing as “highest number wins” in transfers.

As NMDC has a protocol specifier (dchub://), ADC has one, too. It is “adc://”. In the future, you may see “adcs://” to denote that the hub is using TLS.

Rather obvious but… ADC natively requires the usage of TTH…

I’ve spoken much about ADC, so make sure you read all of the other posts on it as well.

Detecting your client

A client detection mod (CDM) is a client that is run by an operator in a hub. The CDM will gather information about users and try to enforce rules set by the operator. The CDM has various ways of gathering information and using it, some obvious and some not so obvious. We have all seen them; an ‘operator’ is mass-kicking users in a hub because of cheating, slot ratio or some other stuff. CDMs are sometimes (jokingly) referred to as ‘spreading cancer’ because of their nature; they use pure logic and assumptions. For a CDM, you are either good or bad. No grey area. And of course, innocent users will always get caught in the middle…

CDMs can be set to use a ‘white list’ or a ‘black list’. Clients on the white list are the only clients that are allowed in the hub, with no exceptions. If the CDM discovers a client that is not part of the white list family, it will be kicked (or banned). Clients on the black list are the only clients that are restricted from the hub. This means that if the CDM discovers your client, and it’s not on the black list, it will be allowed in. From a security point of view, the white list is better. However, from a network point of view, the black list is better since it will allow new clients in, so they have a possibility to grow.
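The difference between the two list types boils down to what happens to a client nobody has heard of. A tiny sketch (the client names and the function are made up for illustration):

```python
def is_allowed(client, mode, listed):
    """Decide whether a client may stay, given a 'white' or 'black' list."""
    if mode == "white":
        return client in listed       # only listed clients get in
    if mode == "black":
        return client not in listed   # everyone except listed clients gets in
    raise ValueError("mode must be 'white' or 'black'")


white = {"DC++ 0.698"}
black = {"SomeFakerClient 1.0"}

# A brand new client that is on neither list:
new_client_on_white_list_hub = is_allowed("BrandNewClient 0.1", "white", white)
new_client_on_black_list_hub = is_allowed("BrandNewClient 0.1", "black", black)
```

The unknown client is rejected under the white list and accepted under the black list, which is exactly the security versus growth trade-off.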

There are various things a CDM checks to determine the client’s status;

  • Commands
  • Share
  • Tag

The first is essentially that the CDM will monitor traffic from your client, and if the traffic matches (or fails to match) the list of (un)approved clients, the CDM will act on it. E.g., you can use your fresh copy of DC++ to detect other DC++ clients; connect to them, and their icon should become blue. This is because DC++ has a set of specific commands it sends, thus increasing the possibility for someone to know which client you’re using.

The second, share, can be divided into a few sub-categories.

  • Number of broadcast bytes
  • File list
  • Normal files

The number of broadcast bytes is a classic. Essentially, the one thing checked is the number of bytes your client claims you share. If the value is too common, or the digits of the number follow some obvious pattern, the CDM will know about it. Most CDMs will e.g. kick if they see someone broadcasting “444444444” bytes, with the message “Too many similar numbers” or something like that. This is only the first line of defense, and will most likely flush out the most common and crappy cheaters. (Of course, some normal users may be kicked, though it’s probably rather rare.)

Going on to file lists, they are the second line of defense and most often the last stop for CDMs regarding share. What the CDM does is that it downloads your file list and (1) looks at the number of broadcast bytes and compares it with the number of shared bytes in the file list. If they differ too much, you’ll (most likely) be kicked. (2) The CDM will also go through the share, and look at file names and hashes. If one of the files is the same file as a known fake or illegal (as in not allowed in that particular hub) file, you’ll (most likely) be kicked. (3) Also, besides checking for file name and hash, most hubs enforce a “maximum file size” rule, and the CDM will look for that, too.
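To make the share checks a bit more concrete, here is a rough sketch of both steps. Every threshold in it (how many equal digits are “too many”, how much the sizes may differ) is invented for the example; real CDMs have their own rules. The file list is represented as plain (name, size, hash) tuples rather than a real file list format.

```python
from collections import Counter


def has_suspicious_digits(share_bytes, min_digits=6, max_fraction=0.8):
    """Flag share sizes where one digit dominates, e.g. 444444444."""
    digits = str(share_bytes)
    if len(digits) < min_digits:
        return False
    most_common_count = Counter(digits).most_common(1)[0][1]
    return most_common_count / len(digits) >= max_fraction


def check_file_list(broadcast_bytes, files, banned_hashes, max_file_size,
                    tolerance=0.05):
    """Return the reasons a CDM might kick for; files is [(name, size, hash)]."""
    reasons = []
    listed_bytes = sum(size for _, size, _ in files)
    # (1) broadcast share size versus what the file list adds up to
    allowed_gap = tolerance * max(broadcast_bytes, listed_bytes, 1)
    if abs(broadcast_bytes - listed_bytes) > allowed_gap:
        reasons.append("share size mismatch")
    for name, size, file_hash in files:
        # (2) known fake or disallowed files, matched by hash
        if file_hash in banned_hashes:
            reasons.append(f"banned file: {name}")
        # (3) the hub's maximum file size rule
        if size > max_file_size:
            reasons.append(f"file too large: {name}")
    return reasons


files = [("a.iso", 700, "TTHA"), ("b.bin", 300, "TTHB")]
honest = check_file_list(1000, files, set(), max_file_size=1000)
cheater = check_file_list(5000, files, {"TTHB"}, max_file_size=500)
```

A heuristic like has_suspicious_digits will also hit the odd honest user whose share happens to be a very round number, which matches the “some normal users may be kicked” remark above.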

The last part is verifying normal files, which to my knowledge, very few CDMs actually do. This means that the CDM will download the file list, and then attempt to download a random file. If the CDM can download the file without trouble, no action is taken. However, if there’s a constant error, like TTH inconsistency (wrong leaves) or ‘no slots available’ etc, the CDM will conclude that the user is faking somehow. This is non-trivial for the CDM because it requires more logic on behalf of the CDM to download a ‘random’ file and then delete it when the download is complete. To successfully pass a CDM of that skill, the client needs to create a correct leaf database for each of the shared files, which is non-trivial.

The third part a CDM will look at is the tag. This usually contains (1) client and version, (2) slots, and (3) number of hubs. Most CDMs use a white list and the CDM will look at (1) as a means of seeing if that’s an allowed client and version. Sometimes, users are kicked by CDMs because they use a brand new version of the client (has happened to me several times). The CDM will also look at (2) to see whether the number of slots is acceptable in the hub. The CDM may also run a search, and check the ‘search window’ and see how many slots appear there. (The CDM can search, see that there are plenty of slots available, and try to download a file, but be unable to because of a ‘no slots available’. The CDM can then conclude that the client in question has locked its slots.) And lastly, (3) is used to enforce a “maximum hubs” rule. This rule most often concerns the number of ‘normal’ hubs you’re in, and not those where you’re registered and/or operator. And of course slot ratio is enforced; the number of slots you have to have open per number of hubs you’re in.
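As a sketch of the tag checks: the post doesn’t spell out the tag layout, so the format below (“<++ V:0.698,M:A,H:3/1/0,S:2>”, with H being normal/registered/operator hub counts and S the slots) is my recollection of what DC++ puts in the NMDC description and should be treated as an assumption. The rule values are also made up, and counting the slot ratio against normal hubs only is my own choice.

```python
import re

TAG_PATTERN = re.compile(r"<(?P<client>\S+) (?P<fields>[^>]*)>")


def parse_tag(tag):
    """Parse an (assumed) DC++ style tag into client, version, slots and hubs."""
    match = TAG_PATTERN.fullmatch(tag)
    if not match:
        raise ValueError("not a tag")
    fields = dict(item.split(":", 1) for item in match.group("fields").split(","))
    normal, registered, operator = (int(n) for n in fields["H"].split("/"))
    return {
        "client": match.group("client"),
        "version": fields["V"],
        "slots": int(fields["S"]),
        "normal_hubs": normal,
        "registered_hubs": registered,
        "operator_hubs": operator,
    }


def tag_violations(info, allowed, max_normal_hubs, slots_per_hub):
    """Apply white list, maximum hubs and slot ratio rules to a parsed tag."""
    reasons = []
    if (info["client"], info["version"]) not in allowed:
        reasons.append("client/version not on white list")
    if info["normal_hubs"] > max_normal_hubs:
        reasons.append("too many hubs")
    if info["slots"] < slots_per_hub * info["normal_hubs"]:
        reasons.append("slot ratio")
    return reasons


info = parse_tag("<++ V:0.698,M:A,H:3/1/0,S:2>")
strict = tag_violations(info, {("++", "0.698")}, max_normal_hubs=2, slots_per_hub=1)
```

Because the white list is keyed on both client and version, a brand new version fails the first check until the operator updates the list, which is how users of fresh releases end up kicked.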

NMDC: The run down

I’ve been advocating ADC because of NMDC’s shortcomings. However, I haven’t gone through how NMDC is built up. I thought I’d do that today.

Neo-Modus Direct Connect (NMDC) was created in November 1999 by Jonathan Hess, while he was still in high school. Hess never released information about the protocol, or the source code for his client or hub, which forced others to reverse-engineer each of those things. Thus, an official specification has never arisen.

NMDC is a text-based protocol, meaning that you don’t need to run commands through an algorithm to be able to read them. This also means that commands have certain delimiters which indicate when a command ends, when a new one starts, etc. $ (dollar sign) is used to denote that there’s a new command. | (pipe) is used to denote the end of a command. As you might imagine, if you intend for either character to be included in the actual message, you will encounter trouble since they’re… well, delimiters. ADC does this nicely by using escapes if you intend to use specific characters. The workaround for DC++ was to replace those characters with their HTML equivalent. Also, ‘ ‘ (space) is a delimiter for parsing a command. If you intend to include a space in e.g. a search, the space is converted to a ? (question mark).
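A sketch of the delimiter handling. The entities used are the standard HTML numeric ones for the two characters (&#36; for $ and &#124; for |). Escaping every & as &amp; is my own simplification to keep the round trip unambiguous; I am not claiming this is exactly how DC++ treats ampersands.

```python
def nmdc_escape(text):
    """Make text safe to put inside an NMDC command."""
    # '&' goes first so that the entities we add are not escaped again.
    return text.replace("&", "&amp;").replace("$", "&#36;").replace("|", "&#124;")


def nmdc_unescape(text):
    """Reverse nmdc_escape."""
    return text.replace("&#36;", "$").replace("&#124;", "|").replace("&amp;", "&")


def split_commands(buffer):
    """Split received data on '|'; the last piece may be an unfinished command."""
    *complete, remainder = buffer.split("|")
    return complete, remainder


safe = nmdc_escape("price: $5 | cheap")
commands, leftover = split_commands("$Hello ullner|<ullner> hi|$Par")
```

Since the pipe only ever ends a command, whatever is left after the last pipe has to be kept and prepended to the next chunk read from the socket.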

While NMDC has no actual specification, it has been agreed upon (by developers) that commands are to be sent case-sensitive. That is, Hello and hellO for the command name aren’t the same. Though there’s no official naming convention, most commands follow CamelCase. Also, there’s no minimum or maximum limit for the length of a command. As I said, there’s no official spec, so if certain hubs / clients object to what you’re sending, they aren’t wrong or right. (And you aren’t either.)

Commands are constructed in two parts; the name and the actual information. The name is always first and prepended with a $ (of course, since it is used to denote a new command).

No command in NMDC is extensible without changing the code for the recipient, which is annoying to say the least. This is also why $Supports and the hi-jacking of the description exist.

In NMDC, there are two major parties that interact. Hubs and clients. Each client connects to a hub. The client then proceeds, through the hub, to talk and interact with the other clients connected to that same hub. All communication, except strict client to client transfers, is viewable by the hub administrator(s). There’s no native support for security, although extensions (have a look at Valknut) have been made to include e.g. SSL, though they have seen no wide-spread usage. Communication in a hub is fully broadcast by and routed through the hub, making the hub a central point in communication between clients; take down the hub and the clients can’t do anything. (Except download from each other if there’s already a session started.) Clients are identified by their nick (yes, not good).

Each hub is totally separated from other hubs; a user in HubA cannot be identified across the entire network of hubs. The nick is a dubious identifier since anyone can change nick. The separation also means that each hub can enforce its own rules, and they usually do.

The original NMDC hub didn’t have a configurable port to choose, instead a default existed; port 411. This port is also today considered as the default port. If the port is already in use, the hub would try the next port, 412, and so on. The original NMDC client also didn’t have a configurable port to choose for file transfers; default was port 412. If that one is in use, port 413 is used, and so on.

In the NMDC protocol, it is always the “other party” that speaks first. That is, if you’re a client and connect to a hub, you connect to the socket and the hub says “oh, you want to come in? give me info”. (This is different in ADC where the connecting party speaks first.) When connecting to a hub or to another user (to download), there are “mandatory” commands; this means that unless you provide some of these commands, the hub/client will ignore/kick/disconnect you.

In the early days, when third party clients started arising (such as DC++), Hess changed some of the hub/client response to include a specific lock/key variant to keep out other clients. (Other than his own, that is.) This was in the end also broken, and we still have it today.

File transfers in NMDC are somewhat weird. When you e.g. want to download, you say this to the other party. Unfortunately, a race condition can occur where the other party also wants to download from you. In the event of this, the clients each pick a random number and tell each other what they picked. The one with the highest number is allowed to download. In DC++, this randomness should of course be somewhat fair. Though, there have been modifications of DC++ where there was very little “randomness”.
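The “highest number wins” rule is simple enough to sketch. The number range is an assumption of mine, and the post doesn’t say what happens on a tie, so the sketch just reports it.

```python
import secrets

MAX_NUMBER = 0x7FFF  # assumed upper bound, picked for the example


def pick_number():
    """Pick a fair random number between 1 and MAX_NUMBER."""
    return secrets.randbelow(MAX_NUMBER) + 1


def resolve_direction(mine, theirs):
    """Decide what we get to do when both sides want to download."""
    if mine > theirs:
        return "download"
    if mine < theirs:
        return "upload"
    return "tie"  # not covered by the description above


my_number = pick_number()
outcome = resolve_direction(my_number, pick_number())
```

A modified client that always “picks” MAX_NUMBER would win practically every race, which is the unfairness hinted at above.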

There are three types of users in a hub; a normal user, a registered user and an operator. An operator is usually someone in power, in one way or another. Though, most people might have you believe a person with a key is exclusively an operator; this is wrong. A person with a key is just a registered user who acquired a key for some reason. The registered users are just that; their nicknames are registered in the hub and they must supply a password to get in. Normal users are those who didn’t need to input a password and aren’t operators (they have no power essentially).

NMDC allows for active and passive users. There’s no proxy implemented anywhere in the protocol (or any known hub or client). This means that a passive user cannot connect to another passive user. So, if you’re passive, you can only connect to those who are active, whereas if you’re active, you can connect to everyone.
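The active/passive rule fits in one line of code; the little table built below just enumerates the four combinations (names are mine):

```python
def can_connect(a_is_active, b_is_active):
    """Two users can set up a connection if at least one of them is active."""
    return a_is_active or b_is_active


combinations = {
    (a, b): can_connect(a == "active", b == "active")
    for a in ("active", "passive")
    for b in ("active", "passive")
}
```

Only the passive/passive pair comes out as unable to connect.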

There are essentially three important things in NMDC that users use; chat, transfers and searching. If you didn’t know it before, the most “difficult” thing in NMDC is the actual chat. I’ll get to that in a bit… The underlying protocol for transfers is TCP; you may have noticed that you need to configure a TCP port to enable transfers. The underlying protocol for search is UDP; you may have noticed that you need to configure a UDP port to enable searching. A crash course in TCP and UDP; TCP is used for transfers because it will error-check packets (data) when transferring. That is, you truly want everything and don’t want to miss anything. UDP on the other hand is used for search because there’s no error-checking for packets. This means that you’ll get the packets (those you do receive) faster, and it doesn’t really matter if one or two get lost. In layman’s terms; think about if you were to “transfer” ten eggs between you and a friend, standing 10 m apart. If you use TCP, you’d take it very carefully and plan the project. It will go ‘slow’, but you’d get what you intended to send. If you use UDP, you’d throw the eggs and not care if all ten eggs made it there safely, as long as “a few of them did”.

So. Chat. Horrible. Horrible, I say. When entering a hub, you and the other clients use a particular encoding for your chat. This encoding is (usually) what your computer is set to. Meaning, if you have (only) installed a Swedish Windows XP, you won’t be able to see e.g. Chinese characters if someone else in the hub has a Chinese Windows XP. This all means that when you send text, your client thinks “oh, I hope other people can read this…” Usually, clients, if they don’t know what the character is, will replace the character with an underscore or a question mark or something completely nonsensical. Encodings are basically a way for clients to display text. If your and your friends’ clients don’t use the same encoding, they won’t be able to display ‘proper’ texts. (Think of this as if you went to a country where you didn’t speak the language. Other people wouldn’t be able to understand you, and you wouldn’t understand them. But if you both speak the same language, say English, you all will understand each other. This is what ADC does.)

Hubs in NMDC can use an identifier; dchub://. This is basically what “http://” is for web sites. So, hub addresses can be written in the form of dchub://example.com:411 where 411 is the port. Though, like I said, 411 is already the default so you don’t need to specify it if the hub is on that port.
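A sketch of how a client could read hub addresses, using Python’s urlsplit. It fills in 411 when a dchub:// address has no port, and (following the rule mentioned in the ADC run down, that ADC addresses must be explicit about the port) refuses adc:// and adcs:// addresses without one. The function name and error messages are mine.

```python
from urllib.parse import urlsplit

NMDC_DEFAULT_PORT = 411


def parse_hub_address(address):
    """Return (host, port) for a dchub://, adc:// or adcs:// address."""
    parts = urlsplit(address)
    if parts.hostname is None:
        raise ValueError("no host in address")
    if parts.scheme == "dchub":
        # NMDC: the port is optional and falls back to 411
        return parts.hostname, parts.port or NMDC_DEFAULT_PORT
    if parts.scheme in ("adc", "adcs"):
        # ADC: no default port may be assumed
        if parts.port is None:
            raise ValueError("ADC addresses must state the port")
        return parts.hostname, parts.port
    raise ValueError("unknown protocol specifier")


default_port = parse_hub_address("dchub://example.com")
explicit_port = parse_hub_address("dchub://example.com:4111")
adc_hub = parse_hub_address("adc://example.com:1511")
```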

Wow. This became a rather long post. I had intended for it to make it into the Wikipedia page, but as soon as I had started, I wasn’t too sure about what to keep and what to discard. Hopefully someone reading this will get enough inspiration.

Web service necromancy

This is a follow-up to my earlier post. I had to discontinue the dcpp.net domain on Feb 21st, due to the aforementioned DDoS. The hosting was provided by a dear friend, and I retained all the files and database backups. We maintain some web presence, as I’ve reinstated http://dcplusplus.sourceforge.net as the project homepage, and have restored the static content on it (and even added some). I’ve also moved the blog to the hosting service offered by WordPress. The dcpp.net domain isn’t gone, but it may be a while before the forums, bug/feature tracker, and wiki make a return. Until then, spread the word about our new location.

Share and Enjoy!

GargoyleMT
Man of many hats

Resurrecting DSC

We’ve all seen the “<operator> operator kicked user because: reason” in NMDC hubs. One can of course also trigger it without being operator but that isn’t what today’s post is about.

The above text is purely that; text. It isn’t a command in NMDC that tells the hub to get rid of someone. Instead, there is $Kick. DC++ just sends “$Kick user|” and the kick message above… As the wiki page says, the command is sent to the hub and the hub in turn removes the user.

All this is rather basic and easy to understand. Don’t like a user in a protocol? Invent a command to get rid of them.

As things progressed over the years, ADC came and the command DSC was invented. DSC does the same thing as $Kick, namely notifying the hub that it should remove a user, and it also gives some more info in that one command. And oh, DSC is also used to redirect people.

Meanwhile, an extension to ADC, user commands, was thought out. The extension is basically what we see today; the hub sends some hub-specific commands that (basically) end up as right-click commands. (You of course have to have “Accept custom user commands from hub” enabled.)

At this point, you may wonder what DSC and user commands have to do with each other, and begin to recall that 0.698 features a change; that DSC has been removed.

Yes, the changelog for DC++ is not lying. DSC has been removed, though the ADC draft hasn’t been updated (yet). ‘Why?’ you might wonder. Well, it has to do with an assumption (though it will in most cases be true, it won’t always). In essence, ADC hubs and clients are assumed to support user commands, where the actual “kick/redirect feature” will exist.

This has several benefits; hubs are in control over who has the kick/redirect feature (otherwise built into DC++ if you’re an operator [NMDC at least]), hubs are also in control over where the user can see the kick/redirect feature, clients don’t need to specifically add the kick/redirect feature (less code, fewer bugs, less clutter) and more importantly; client and hub developers need to pay less attention when writing a basic hub. The lack of DSC ‘significantly’ decreases code (mostly for hub developers, but still).

However, this of course has several downsides; hub developers have to agree on what the kick/redirect text is supposed to look like (otherwise, users would probably be confused if they saw 15 different ways of kicking in 15 hubs) and it increases the level of difficulty if one wants a basic hub that also has kicking/redirecting capabilities (of course, one can just do the “+kick” approach). (Don’t be afraid, though, since we can always resurrect DSC.)

Debug NMDC!

In a previous post, I talked about how you enable ADC debug messages (and allow them to basically flood you in a busy hub). Since there aren’t many ADC hubs around and you might have found a problem in NMDC hubs, that option won’t do you any good. Unfortunately, there is no “easy” option to use in NMDC hubs, like the ADC option.

However, there is something you can do. First, compile DC++ in Debug mode. Then open up NmdcHub.cpp and locate the function “NmdcHub::onLine(const string& aLine)”. At the beginning of it, place the line “dcdebug("Debug line: %s\n", aLine.c_str());” (dcdebug takes a printf-style format string, so the %s is needed for the line to actually show up). This way, you will display every line that is sent to you, in raw NMDC. And of course, when you then run DC++ in debug mode, look in Visual Studio for the output.

The beauty of this is that you aren’t confined to NmdcHub.cpp or AdcHub.cpp. You can use this anywhere you want in the DC++ source. (Look for the #define to see how it’s implemented.) Just replace “aLine” with the appropriate string you want to see debugged.

The parts of a hub list

In DC today, a hub list is a vital part of the infrastructure. We all experienced it very well when Hublist.org (one of DC++’s default hub lists) went down.

A hub list consists primarily of two things. The actual hub list, a file, and a hub list bot (or crawler).

The former is what you insert in DC++, that DC++ download and display to you. The latter is something you as a user (of a client) never see. You might see the latter if you are running a hub.

The file is the actual hub list. It contains all the information you need; hub addresses, hub names, requirements, rating, etc. The file can be distributed in two ways. (1) The file is sent in the clear and DC++ (and other clients) start to parse it immediately. (2) The file is sent compressed. This technique is also used for the file lists. This means that the file being sent is smaller than the actual hub list. You have probably experienced it in other places, e.g. .zip or .rar. Having the hub list compressed means that both whoever distributes the file and you will have to upload/download less information. If you see .bz2, the hub list is compressed. When DC++ sees this, it decompresses the file and continues parsing the hub list, as in (1).

The actual hub list is of either DcLst style or XML. The former has been deemed deprecated by DC++ (as of 0.696). The file extensions for the two styles are .config and .xml, respectively. The difference between the two is very simple: you cannot change the formatting of a DcLst. It means that if someone wants to add a field, e.g. rating, it is pointless since the clients can’t parse it. The XML format allows hub list administrators to add and remove fields at will, and client implementors can choose to display them or not. According to the DC++ wiki, the only available information in a DcLst is name, address, description and user count. If you look at a hub list supplied in DC++, you will probably notice more fields.
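A sketch of what a client does with a downloaded hub list: decompress if the name ends in .bz2, then parse the XML. The XML layout in the sample (a Hublist element with Hub elements carrying Name, Address, Description, Users and Rating attributes) is an assumption for illustration; actual hub lists may use other or additional fields, which is the whole point of the XML style.

```python
import bz2
import xml.etree.ElementTree as ET

# Hypothetical sample data, not taken from a real hub list.
SAMPLE = (
    b'<?xml version="1.0" encoding="utf-8"?>'
    b'<Hublist Name="Example list"><Hubs>'
    b'<Hub Name="Test hub" Address="dchub://example.com:411" '
    b'Description="demo" Users="12" Rating="4"/>'
    b"</Hubs></Hublist>"
)


def load_hub_list(filename, payload):
    """Return one dict of attributes per hub, whatever fields the list has."""
    if filename.endswith(".bz2"):
        payload = bz2.decompress(payload)   # case (2): compressed list
    root = ET.fromstring(payload)           # case (1): parse right away
    return [dict(hub.attrib) for hub in root.iter("Hub")]


hubs_plain = load_hub_list("hublist.xml", SAMPLE)
hubs_packed = load_hub_list("hublist.xml.bz2", bz2.compress(SAMPLE))
```

Because every attribute is kept as-is, an extra field such as Rating simply shows up in the dict, and a client is free to display or ignore it.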

The second major thing a hub list needs is a bot. This is essentially a normal client (like DC++) that is specialized in hub lists. The client, or bot, will connect to a hub and say “hello, I am a hub list bot. I just want to get some hub information. Bye.” The bot then proceeds to process the information and (presumably) generate an appropriate hub list file. The file is then distributed by whatever means.

As a user of a hub list, you will most certainly never see this. However, most hub owners might. Most hub software out there today allows a hub owner to register with the hub list distributor and say “Hello. Can you please add my hub to your list”, and the crawling begins. (One might argue that registration is the third vital part of hub lists.)
