The evolution of Direct Connect

Direct Connect is quite old. The community has lasted a little over six years, and I doubt DC will die in the next few years. During this time, we have seen various things pop up, like hashing of files, segmented downloads and ADC. And that’s just to name a few.

As DC continues to age, I think we need to start thinking about the next evolutionary step. We’ve gone from identifying files by file name and size to using a hash based on file content. Then we went from single-source downloads to multiple sources. Then from the NMDC protocol to the ADC protocol. And so on and so on. What we need now is the next implementation or idea on how to improve Direct Connect. By saying this, I don’t mean that creating the next step is or needs to be easy; I mean to push people to think about the future.

During the past years, I think there has been a growing dependency on Jacek (on the client and ADC side) and PPK, Yoshi and Nev on the hub side, to create the “next thing”. Other people have of course contributed to the development, but the people I mention are those who have the largest market share and thus the best ability to change things. I believe this should stop. We should stop depending on these people, and try to enforce standards in a different way: by forcing people (those above as well) to use a particular feature or scheme simply because it’d be too difficult to resist.

Mind you, this post is only intended as a preface for a series of posts, so if you want to comment, comment on the post that is related.

Are there other things we can do to improve Direct Connect?

PS. No, this isn’t an April Fools’ joke.

Don’t forget that you can make topic suggestions for blog posts in our “Blog Topic Suggestion Box!”

ADC as an open messaging protocol

Szabolcs Molnar’s recent post advocated hub-side command filtering, specifically on BMSG PMs. I believe this, and obvious generalisations, to be mistakes. They sap the openness of a protocol capable of routing arbitrary messages between users, subject to bandwidth limits on involved hubs and clients.

NMDC, as that post observes, implicitly provides such limitations. One cannot send a broadcast message without it being interpreted as a mainchat message for display by the receiving parties. Similarly, private messages under NMDC, as widely interpreted, exclusively contain user-visible messages, leaving those who attempted protocol innovation to either seek quirks of buggy parsing or overload such messages as $SR to achieve their ends. These limitations don’t ultimately help those using a protocol, instead pushing it towards a choice of ugly kludges or stagnation.

ADC, among other goals, includes the means to obviate the need for those workarounds and instead to directly implement unanticipated protocol features. To shut this down, as the previous blog post suggests, would merely invite the same harmful cycles seen in NMDC. Instead, an ADC hub should function essentially to authenticate identity, ensuring registered users are who they claim and that messages sent between users contain a correct source CID.

That stated, the motivation behind desiring hub-side filtering of BMSG PMs is real, and a rejection of centralised limitations on them should include a response to that impetus. Rather than specifically targeting BMSG PMs, a both freer and more robust system allocates each user a certain portion of the hub’s bandwidth and, under conditions of stress, either blocks traffic beyond that allowance or gives it low priority.
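
A minimal sketch of such a per-user allowance, written as a token bucket. This is only an illustration under assumptions: the class name `UserBudget`, the rate and burst figures, and the three outcomes (`normal`, `low`, `drop`) are all hypothetical, and no existing hub is claimed to work this way.

```python
from dataclasses import dataclass


@dataclass
class UserBudget:
    """Token bucket tracking how many bytes one user may still relay."""
    rate: float          # bytes per second refilled (hypothetical share of hub upload)
    burst: float         # maximum number of bytes that can be saved up
    tokens: float = 0.0  # bytes currently available
    last: float = 0.0    # time of the previous refill, in seconds

    def classify(self, nbytes: int, now: float, hub_stressed: bool,
                 drop_excess: bool = False) -> str:
        """Return 'normal', 'low' or 'drop' for a message of nbytes."""
        # Refill according to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return "normal"
        if not hub_stressed:
            # Over the allowance, but the hub has capacity to spare.
            return "normal"
        # Over the allowance while the hub is stressed: block or deprioritise.
        return "drop" if drop_excess else "low"
```

The message type never enters the decision, which is the point: a flood of BMSG PMs is limited in the same way as any other heavy traffic.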

Clients, meanwhile, can simply ignore BMSG PMs if they so desire; someone in control of a hub who desires equivalent functionality can use DMSG PMs instead. This allows her to retain a more general bandwidth allocation regime whilst simultaneously allowing free use of the ADC protocol with the ability for individual clients to choose to ignore BMSG PMs. Such a system, of course, represents a compromise in itself (why should a hub have to lie about BMSGs being DMSGs just so those who control it can get their mass messages displayed?), but unlike the alternatives doesn’t collapse under the smallest amount of gaming.

Legitimate uses of hub-side filtering do exist, primarily where those administering a hub have unique knowledge of a pattern of abuse undetectable via static structural analysis. For example, URL spam tends to be both more dynamic and harder to detect a priori than the BMSG PMs, and therefore more worthy of hub filtering. The general principle involved I’d identify is that when a DC client can do something autonomously with negligible loss of functionality over what an ADC hub could do, the hub should refrain from performing that functionality.

Summary: don’t stunt ADC by reducing it to NMDC’s capabilities when alternatives exist.

Thoughts about an ADC-hubsoftware

As you might already know, on the client side, ADC does work, more or less. You can chat, you can download, you can search, you can browse other users’ file lists. Sure it has bugs, but it works.

But what about ADC hubs? There are several, but none of them is mature yet; they lack services which NMDC hub owners have already got used to. And there are differences which the developers need to think about. Well, I collected some of them. Not to be wise or whatever, just because I think someone should start :)

Ok.. Well, let’s see:

  1. Hubsoftware must ensure that the CID matches the PID, and must not let users enter the hub if they can’t provide a valid CID for their PID.
  2. Hubsoftware must not store, broadcast or make available anyone’s PID to anyone else, including hub owners and scripts. This would weaken the security of the system. To protect their operators and users, people should not use or install hubsoftware which does this.
  3. It’s a good option to register users by their CIDs, but the hub should warn users that their registration will be lost if they change or lose their PID/CID. Moreover, it’s a good idea to store the last nick for every registration, so that other users can’t connect and talk in someone else’s name while that user is offline. This protects the users’ reputation.
  4. Filtering commands is the hub’s job, not the client’s. So hubs must ensure that regular users are not allowed to send mass messages to other users, for example by adding a PM flag to their BMSG.
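
Point 3 can be sketched roughly as follows. This is a hedged illustration only: the class `NickRegistry` and its methods are hypothetical names, the CIDs are placeholder strings, and the CID/PID check from point 1 is assumed to have happened before `login` is called.

```python
class NickRegistry:
    """Registrations keyed by CID, remembering the last nick each one used."""

    def __init__(self):
        self._last_nick = {}  # CID -> last nick used by that registration
        self._owner = {}      # nick -> CID of the registration holding it

    def register(self, cid: str, nick: str) -> None:
        """Create a registration for cid and reserve its current nick."""
        self._last_nick[cid] = nick
        self._owner[nick] = cid

    def login(self, cid: str, nick: str) -> bool:
        """Return True if cid may enter the hub under nick."""
        owner = self._owner.get(nick)
        if owner is not None and owner != cid:
            # The nick is reserved by another registration, online or not.
            return False
        if cid in self._last_nick:
            # A registered user changed nick: release the old one, keep the new.
            self._owner.pop(self._last_nick[cid], None)
            self._last_nick[cid] = nick
            self._owner[nick] = cid
        return True
```

Only the most recent nick is reserved here, which matches “store the last nick”; a hub could just as well keep a longer history.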

Sure there is a lot more, but I think it’s enough for now. Feel free to comment.

Bug reporting

As we don’t have a Bugzilla at our disposal, the only means of reporting bugs in DC++ is by sending an e-mail or contacting someone at DCDev Public.

Well, now, you can comment here in the blog concerning bugs you encounter.

Continue reading here.

UPnP file required for compiling

At one point in time, DC++ added support for UPnP, which gave DC++ automagic configuration for active mode.

To accomplish this, a specific file in the source code was required. This file is called ‘natupnp.h’. I’m sure people have noticed it; the compile information mentions this file, and if you’ve attempted to compile DC++ without the file, the compiler would complain.

(Visual Studio 2005 has this file built in, but if you use Visual Studio 2003, you need this file in your includes.)

The compile information notes that you can get the file in three ways: (1) download the .NET SDK, (2) get it through our Bugzilla install (which is down, yes) or (3) contact one of us and we’d give it to you.

Unfortunately, this presents us with a problem. We might actually be in a pickle if we provide it for you directly, because we don’t have permission from Microsoft to distribute it.

So to save us all some time and trouble, download the SDK provided by Microsoft and you’re all set.

(I don’t know if this file, or another one, is required for UPnP compilation with compilers other than Microsoft’s.)

Look around for more space to escape

If people look at the previous post on escapes, they probably notice that we’re essentially “increasing” what is being sent in ADC to represent certain things. And of course, the NMDC advocates say that this is one of the reasons why ADC is worse and that NMDC is better.

Well, they’re correct in one part. We are increasing the bandwidth when we want to use certain characters. We will not get away from this. This is a fact, and I can’t dispute it.

I checked the previous post for spaces. There were 329 of them. This means that in NMDC, it would have taken 329 characters to represent the spaces, and 658 characters to represent them in ADC. However, while messages do contribute to overall bandwidth, there are far worse culprits.
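
The arithmetic can be sketched as a small helper. The function name `space_cost` is made up for illustration, and it looks only at spaces, ignoring the other escaped characters.

```python
def space_cost(message: str) -> tuple:
    """Return (NMDC characters, ADC characters) spent on the spaces in message."""
    spaces = message.count(" ")
    # NMDC sends a space as-is; ADC sends it as the two characters '\' and 's'.
    return spaces, 2 * spaces


# A stand-in for the previous post: only the number of spaces matters here.
print(space_cost(" " * 329))
```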

On a different note, what would happen if we used a different character to represent the visual character of a space? Well, then we’d “get rid of” the escape sequence for a space. E.g., what if the client replaced every normal space with a non-breaking space before sending the message? The character would look like a normal space, and depending on how clients and hubs interpret characters, it wouldn’t be treated as a delimiter.

I tried this with DC++ and ADCH++; it worked like a charm. Visually, the character appears as a normal space, and DC++ doesn’t attempt to encode it, nor does ADCH++ choke. (The “annoying” thing is that if you enable AdcDebug, the output might throw you off.)
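
A rough sketch of the substitution, for anyone who wants to measure it. The helper names are hypothetical, and the comparison assumes the message goes out as UTF-8.

```python
NBSP = "\u00a0"  # non-breaking space, renders like an ordinary space


def spaces_to_nbsp(message: str) -> str:
    """Replace every ordinary space so that nothing needs the '\\s' escape."""
    return message.replace(" ", NBSP)


def escape_spaces(message: str) -> str:
    """The regular ADC treatment, for comparison (spaces only)."""
    return message.replace(" ", "\\s")


msg = "hello big world"
for variant in (spaces_to_nbsp(msg), escape_spaces(msg)):
    # Compare character count with encoded size on the wire.
    print(len(variant), len(variant.encode("utf-8")))
```

One caveat, under that UTF-8 assumption: U+00A0 encodes to two bytes, so the saving shows up when counting characters rather than bytes. It is worth checking both before relying on the trick.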

Also, there are probably other characters that are visually a space, but are not actually one. If developers are afraid of using too much bandwidth, you know what to do. (Yes, I can agree that this is a weird solution, but I believe it to be the only one. We should NOT introduce other things in ADC that would change command design.)

The characters are escaping!

One of the things I’ve noticed in the main chat of the public DC development hub is that people seem to have trouble understanding the escape sequences and the command-ending character in ADC.

Certain characters in ADC are used for command termination and command delimitation. This means that these characters need to be escaped wherever they might come up in the message one side is trying to relay. Normal messaging is the most obvious case affected by these escapes.

The following characters are, in effect, that kind of character:
A space (“ ”) – In ADC: “\s”
A new line – In ADC: “\n”
A backslash (“\”) – In ADC: “\\”

The character that has been chosen for escaping characters is the backslash character, as one might see above. No other escape sequences are permitted.
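
The three escapes can be sketched as a pair of helpers. The names `adc_escape` and `adc_unescape` are made up for this example, and raising an error on any other escape is one possible reading of the rule above, not necessarily what every implementation does.

```python
def adc_escape(text: str) -> str:
    """Escape a parameter value before putting it into an ADC command."""
    # Backslashes go first, so the backslashes added below are not doubled.
    return (text.replace("\\", "\\\\")
                .replace(" ", "\\s")
                .replace("\n", "\\n"))


_UNESCAPES = {"s": " ", "n": "\n", "\\": "\\"}


def adc_unescape(text: str) -> str:
    """Reverse adc_escape, rejecting any escape other than the three above."""
    out = []
    i = 0
    while i < len(text):
        if text[i] != "\\":
            out.append(text[i])
            i += 1
            continue
        code = text[i + 1:i + 2]  # empty when the backslash is the last character
        if code not in _UNESCAPES:
            raise ValueError(f"invalid escape at position {i}")
        out.append(_UNESCAPES[code])
        i += 2
    return "".join(out)
```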

The character for ending a command is ‘\n’. And the answer is ‘no’ to your first question; “But that’s the same as for a plain new line?!”.

The new line representation is not one character, but two: the backslash and an ‘n’. The termination character is the character we see as a new line; ‘\n’. This character can be written differently in different languages. The ADC draft says ‘0x0a’. (Try it in your own language. It’s the same as if you wrote ‘\n’.)

When you’re writing your client, hub, bot or whatnot, your language probably has a restriction where you need to escape backslashes in your code. This means that when coding, you’re supposed to look for the string “\\s” to find the escape for a space. And “\\” becomes “\\\\”. (The first backslash escapes the second, and the third escapes the fourth.)
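
In Python, for instance, the difference looks like this. The helper `split_commands` and the `BMSG AAAA` example lines are hypothetical; the sketch only shows that splitting on the terminator leaves an escaped new line untouched.

```python
NEWLINE_ESCAPE = "\\n"  # two characters on the wire: a backslash and an 'n'
TERMINATOR = "\n"       # one character on the wire: 0x0a
SPACE_ESCAPE = "\\s"    # what you type in code to match backslash + 's'


def split_commands(buffer: str):
    """Split received text into complete commands plus any unfinished tail."""
    *complete, rest = buffer.split(TERMINATOR)
    return complete, rest


wire = "BMSG AAAA line\\sone\\nline\\stwo\nBMSG AAAA unfin"
commands, pending = split_commands(wire)
print(commands)  # one complete command, still containing its escapes
print(pending)   # the tail waits for more data and its 0x0a
```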

So, if you’re ever going to learn anything from this post, learn this; “\n” is a sequence of two characters that represents a new line in a visual manner in ADC, while ‘\n’ is one character that represents a new line in a command termination manner.

Command and bandwidth estimations in NMDC

As more and more hubs try to reach higher and higher user counts, hub developers need to know how much bandwidth usage they must be able to cope with in order to create a good and sustainable hub.

Daniel Muller (also known as Verliba), the author of Verlihub, postulated the following formula for estimating how much bandwidth a hub needs, given the number of users one desires.
It is: (Number of users/20)^2 kbit/s. So, if you want 1000 users, you will need around 2.5 Mbit/s of upload bandwidth.
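
As a quick sketch, the formula and its inverse look like this. The function names are made up, and the result is only as good as the rule of thumb itself.

```python
import math


def estimated_upload_kbit(users: int) -> float:
    """Upload bandwidth in kbit/s suggested by (users / 20) ** 2."""
    return (users / 20) ** 2


def users_for_upload(kbit: float) -> float:
    """Inverse of the formula: how many users a given upload might carry."""
    return 20 * math.sqrt(kbit)


print(estimated_upload_kbit(1000))  # 2500.0 kbit/s, i.e. 2.5 Mbit/s
print(users_for_upload(2500))       # 1000.0
```

Because the estimate grows with the square of the user count, doubling the users quadruples the bandwidth needed.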

Unfortunately, this formula applies only to NMDC, as no one has, to my knowledge, profiled the performance of an ADC hub on a long-term basis.

Another very interesting estimate comes from statistics that Fredrik Stenberg (you may know him as fusbar) compiled. They record how many times the different commands in the protocol were sent. The stats are based on an NMDC hub whose user count varied between 800 and 2200 over a six-day period.

Command                                  Count
Main chat                               34,870
$To (private chat)                      13,668
$Search (active)                     1,326,908
$Search (passive)                       48,650
$SR (search result)                    567,589
$MyINFO (information)                   95,105
$NickList (information)                 93,429
$ConnectToMe (download, active)      6,013,652
$RevConnectToMe (download, passive)    173,232

The only major difference between this list and one from an ADC hub is that the latter wouldn’t have “$NickList”.

The conclusion that can be drawn from the table above is that searching and downloading are the major players in hub and network performance.
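
To see how lopsided the numbers are, the share of each command can be computed from the counts. This is a small sketch using the figures from the table; the function name `shares` is made up.

```python
COUNTS = {
    "Main chat": 34_870,
    "$To": 13_668,
    "$Search (active)": 1_326_908,
    "$Search (passive)": 48_650,
    "$SR": 567_589,
    "$MyINFO": 95_105,
    "$NickList": 93_429,
    "$ConnectToMe": 6_013_652,
    "$RevConnectToMe": 173_232,
}


def shares(counts: dict) -> dict:
    """Return each command's fraction of all commands counted."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


# Print the commands from most to least frequent.
for name, share in sorted(shares(COUNTS).items(), key=lambda kv: -kv[1]):
    print(f"{name:<20} {share:6.1%}")
```

Note that this counts messages, not bytes; a command that is sent rarely but is large could still matter for bandwidth.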

Listing nicks in ADC and NMDC

When a user logs in to a hub, the user is told who’s in the hub, who’s registered and who’s an operator.

ADC and NMDC do this differently, but both have the same end effect.

ADC’s way is the cleanest and simplest. When logging in, one receives everyone’s INF. One of the beauties of the INF is that showing that a certain user is an operator or a registered user only takes the addition of a parameter. This is done with the parameters OP and RG, respectively. This means that when a client logs in, it doesn’t need to receive redundant information, as these parameters are omitted when the user is neither an operator nor a registered user.
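
Reading those flags out of an INF could look roughly like this. It is a sketch under assumptions: the example lines, the SID `ABCD` and the flag value `1` are made up for illustration, and the helper does not unescape parameter values.

```python
def parse_inf_flags(line: str) -> dict:
    """Pull the nick and the OP/RG flags out of a broadcast INF line."""
    fields = line.rstrip("\n").split(" ")
    # fields[0] is the command, fields[1] the sender's SID; the rest are
    # named parameters made of a two-letter name followed by the value.
    params = {f[:2]: f[2:] for f in fields[2:] if len(f) >= 2}
    return {
        "nick": params.get("NI"),
        "operator": params.get("OP") == "1",    # assumed flag value
        "registered": params.get("RG") == "1",  # assumed flag value
    }


print(parse_inf_flags("BINF ABCD NIalice OP1\n"))
print(parse_inf_flags("BINF EFGH NIbob\n"))
```

A plain user’s INF simply lacks both parameters, so nothing extra is sent for them.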

In NMDC, it’s a different matter. When a client enters the hub, it receives the $MyINFO of every user. In this, certain information is relayed about the user. Among the things that aren’t relayed are whether the user is logged in to the hub and whether the user is an operator. In addition to the $MyINFO of every user, another command is used to signal who is online. This is done with $NickList. This command is very simple; it only lists the users who are “online”, by nickname. The operators of the hub are flagged separately, in the command $OpList. This list is also composed of nicknames.
(By the way, as I understand it, to receive the list of users who are online, one must first send $GetNickList.)
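
For comparison, a client has to combine the two lists itself. The sketch below assumes nicks in both commands are separated by `$$` and that each command ends with `|`; the helper name and the example nicks are made up.

```python
def parse_nick_list(command: str, name: str = "$NickList") -> list:
    """Return the nicks carried by a $NickList or $OpList command."""
    body = command.rstrip("|")
    prefix = name + " "
    if not body.startswith(prefix):
        raise ValueError(f"not a {name} command")
    # Assumed layout: nick1$$nick2$$ ... so empty trailing pieces are dropped.
    return [nick for nick in body[len(prefix):].split("$$") if nick]


online = parse_nick_list("$NickList alice$$bob$$carol$$|")
ops = set(parse_nick_list("$OpList bob$$|", name="$OpList"))
for nick in online:
    print(nick, "operator" if nick in ops else "user")
```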

I haven’t been able to see why $GetNickList and $NickList are required. Does anyone know why? I mean, can’t the client just use the $MyINFOs it gets when connecting? (I can see why $OpList is required, as there’s no other way to say “these people are operators” in NMDC, as far as I can see.)

Search replying in NMDC and ADC

I previously wrote about some search statistics in NMDC and ADC. Today’s post is about the actual search replies. As before, I’ll start with active and then move on to passive searching.

Search replies where the person doing the search is in active mode.
NMDC: $SR mynick motd.txt<0x05>1000 1/1<0x05>TTH:MIC2BCFKXGQKPFN5WHEKGK7ANAAYUHURCWVRRRY (x.x.x.x:x)|
This adds up to 80 characters. This may look like a lot, but don’t be frightened. Each <0x05> is a single character. The “x.x.x.x:x” is the hub address and port (where the port can be omitted if it’s 411). I used single digits in both to come up with 80. The nick of the person replying (“mynick” above) was 1 character in my example. If we’d been in a non-TTH world, the entire TTH bit would be changed to the name of the hub, but that’s unusual today.

ADC: DRES ZB4H P6L5 SI1000 SL1 FN/motd.txt TRMIC2BCFKXGQKPFN5WHEKGK7ANAAYUHURCWVRRRY TOsometoken\n
This is 84 characters with one character as token. Notice that the token is an exact copy of the token sent in the search. TTHs are required in ADC so one can’t avoid them. Of course, we have additional characters in the INF to denote the IP and port of the active user. But they’re only sent once.
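
The count can be reproduced with a small builder. The function name `build_res` is hypothetical, the field order simply follows the example line above, and the values are assumed to need no escaping.

```python
TTH = "MIC2BCFKXGQKPFN5WHEKGK7ANAAYUHURCWVRRRY"


def build_res(my_sid: str, target_sid: str, size: int, slots: int,
              path: str, tth: str, token: str) -> str:
    """Assemble a direct RES like the example, terminated by 0x0a."""
    return (f"DRES {my_sid} {target_sid} SI{size} SL{slots} "
            f"FN{path} TR{tth} TO{token}\n")


one_char = build_res("ZB4H", "P6L5", 1000, 1, "/motd.txt", TTH, "x")
longer = build_res("ZB4H", "P6L5", 1000, 1, "/motd.txt", TTH, "sometoken")
print(len(one_char))               # 84 with a one-character token
print(len(longer) - len(one_char)) # grows only with the token length
```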

In active searching, NMDC has a couple of factors to count in. In ADC, we only have one factor, which in itself probably is rather limited. In any case, it’s quite even between ADC and NMDC.

Now searching where the searcher is in passive mode…
NMDC: $SR mynick motd.txt<0x05>1000 1/1<0x05>TTH:MIC2BCFKXGQKPFN5WHEKGK7ANAAYUHURCWVRRRY (x.x.x.x:x)<0x05>thenickofwhosentthesearch|
This adds up to 82 characters. Same rules as above. The nick of who sent the search was one character in my example, too.
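
Both NMDC counts can be checked the same way. This sketch assumes the fields are separated by the single byte 0x05, which is consistent with the totals of 80 and 82 given in the text; the function name `build_sr` and the one-character nicks are made up.

```python
SEP = "\x05"  # assumed field separator, a single character
TTH = "MIC2BCFKXGQKPFN5WHEKGK7ANAAYUHURCWVRRRY"


def build_sr(nick: str, path: str, size: int, free: int, total: int,
             tth: str, hub_addr: str, target_nick: str = None) -> str:
    """Assemble an $SR; target_nick is only added for a passive searcher."""
    msg = f"$SR {nick} {path}{SEP}{size} {free}/{total}{SEP}TTH:{tth} ({hub_addr})"
    if target_nick is not None:
        msg += SEP + target_nick
    return msg + "|"


active = build_sr("m", "motd.txt", 1000, 1, 1, TTH, "x.x.x.x:x")
passive = build_sr("m", "motd.txt", 1000, 1, 1, TTH, "x.x.x.x:x", target_nick="n")
print(len(active), len(passive))  # 80 and 82 with one-character nicks
```

Longer nicks or a longer hub address add to both totals character for character.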

ADC: DRES ZB4H P6L5 SI1000 SL1 FN/motd.txt TRMIC2BCFKXGQKPFN5WHEKGK7ANAAYUHURCWVRRRY TOsometoken\n
Yes, this is the exact same string. One can send the search reply in other ways, too, but this is what a current DC++ version will send. (Or at least what I could gather from Wireshark.)

This shows that there are even more factors in passive NMDC searching. The token factor in ADC remains (obviously) the same.

In NMDC, the best-case scenario is when users have very short nicks, the hub IP is “short” and the port is 411. In ADC, the best-case scenario is when the token the searcher used is short.

In conclusion, NMDC and ADC are quite close when it comes to search replies. However, as ADC has fewer variables, it has the better potential to scale.
