How to generate CID taken?

Or rather, how not to generate CID taken?

I guess most of you already know that on ADC, clients are identified by their CID, which means that two clients with the same CID cannot be in the same hub at the same time.

A lot of hub owners, website maintainers, etc. provide clients with some basic settings for download, to ease the start for their users. That means you start a clean new client, set up some common things (like a default nick, coloring, timestamps, some favorite hubs, etc.), then rar/zip the whole thing and put it on your website.

What happens if you do that? Since the client your users are going to download includes the PID setting too, all of them will have the same CID. They can’t enter a hub while another of those users is already in it; they are going to get “CID taken”, and I’m not sure that an inexperienced user can figure out what to change in the settings to get rid of that.

So I ask anyone who distributes clients with some settings preset in them: edit DCPlusPlus.xml before publishing and remove the <CID>…</CID> from the xml. Or better yet, distribute the clients without the xml file.

A footnote: you may notice that the settings xml contains a CID tag instead of a PID tag. That’s historical and hasn’t changed yet, but the value is the PID. So please don’t distribute it, otherwise everyone’s CID is going to be taken.
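If you’d rather not edit the file by hand, the removal can be scripted. This is only a sketch: it assumes nothing more than that the PID lives in an element named CID somewhere in the settings xml, and the sample document and the `strip_cid` helper are made up for illustration (a real DCPlusPlus.xml holds many more entries).

```python
import xml.etree.ElementTree as ET


def strip_cid(xml_text: str) -> str:
    """Return the settings XML with every <CID> element removed."""
    root = ET.fromstring(xml_text)
    # ElementTree has no parent pointers, so visit every element as a
    # potential parent. list(...) freezes the walk before we modify the tree.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == "CID":
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


# Hypothetical, heavily trimmed settings file (not a real PID).
SAMPLE = (
    "<DCPlusPlus><Settings>"
    '<Nick type="string">newuser</Nick>'
    '<CID type="string">EXAMPLEONLYNOTAREALPID</CID>'
    "</Settings></DCPlusPlus>"
)

cleaned = strip_cid(SAMPLE)
print(cleaned)
```

The other presets (the nick here) survive, and each downloaded client then has no preset PID and so does not share one with everyone else.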

Comments and suggestions for ADC

If you have a comment or suggestion for ADC, please visit the ADC draft commenting page. As it mentions, don’t forget to look at the reported information that’s going to change.

Of course, you can always visit the development hub, located at adc://dcdev.no-ip.org:16591, and flesh out your ideas.

TLS disabled, failed to generate certificate

“TLS disabled, failed to generate certificate:” followed by “Failed to load certificate file” is probably something you’ve seen. Most people, almost frantically, think that DC++ is somehow broken. Well, it’s not.

The short answer to all this is: ignore the message. If you don’t know what it means, just ignore it.

The medium-length answer to all this is: ignore the message. The message concerns TLS, which is encryption on DC. I’ve written about it in the past (though I’m not sure how outdated that post is). TLS will only work on ADC hubs, of which there are few, and you’re most likely not on one of them. (Unfortunately, I might add.) So you aren’t likely to need this anyway. Just ignore it.

The long answer to all this is: ignore the message, unless you know what you’re doing. The first message comes when DC++ checks whether you have any certificate files entered in the settings. It appears if you haven’t entered anything in the “Private key file” box or the “Own certificate file” box in the “Security Certificates” window (or if the files aren’t in their default DC++ folder, which they likely aren’t when you’ve just installed DC++). The second message comes when you haven’t entered anything in the “Own certificate file” box, or when an error occurs while using what was entered. (There are a couple of things that could go wrong; I’m not confident enough to talk about any of them.) (The different messages are there because they mean different things.)

Don’t forget that you can make topic suggestions for blog posts in our “Blog Topic Suggestion Box!”

Edit: I feel that I need to edit this post. The message has no effect on your ability to download or search. Consult the FAQ for such inquiries.

File information in searches

A frequently requested feature in DC++ is the ability to search for files, and also specify (and get back) certain information about the actual file. (Normally called “metadata”.)

This information is, e.g., the ID3 tag in MP3 files or the author of a Microsoft Word document. That is, information about a file that isn’t really the content of the file.

The request is quite fine and valid, but it’s going to be quite difficult to implement in a good way.

In NMDC, you can’t send “ohh, by the way, the ID3 tag is this and that” when you’re about to send a search reply. If you wanted to do that, you’d have to write a totally new command, since probably no other client would understand an extended search reply.

In ADC, you could add a new parameter to the search, but it’s going to be ugly. If you add a new parameter for ID3 tags, you’re opening yourself up to the query “how about authors in Microsoft Word documents, then?” So, you’ll have to add a parameter for that. And for that other file information. Etc. Etc. Basically, ending up in a huge sea of annoying parameters.

Yes, you could add a new command or parameter that says “send me everything about the file”. That’s going to be a huge overhead in traffic. Hubs are going to get so annoyed by the vast amount of bandwidth required to broadcast it that they’re simply going to kick the clients supporting this. (Yeah, yeah. That’s up to the hubs, sure. But let us face it: most hubs think of user count, and this is going to have a severe impact on that.)

Yes, it’s quite a nifty feature. Yes, I’d like for it to happen with DC(++), too. But likely, the implementation is going to be quite ugly.

(Note: I have no idea how other P2P-networks do it, but I’m sure it’s not so damaging there. Why? Because they’re most likely not centralized, as DC is.)


CTM tokens in ADC (or why the NMDC protocol is terrible, part 2)

ADC’s CTM command contains a token parameter. This solves NMDC’s problem of certain client-client connections being only loosely connected with any hub connection, which in turn solves several secondary issues in NMDC: inhibited multiple-share support; easy nick-faking in client-client connections; and a potentially incorrect nick being returned by the client receiving the C-C connection if nicks alone aren’t unambiguous. I’ll describe these individually in future blog posts, but this one will just focus on the mechanics of the token and how it authenticates a C-C connection to the hub through which it originated.

The potentially easy case, and the one which the NMDC protocol is in any event competent to handle, even if DC++ isn’t necessarily: one can determine which hub a client-client connection spawned from if a client has received a $ConnectToMe/CTM and thus initiates the C-C connection. In such an event, the bookkeeping is entirely local and therefore not in principle dependent on which protocol one uses. If the client trusts that the hub isn’t spoofing the other user’s messages, then it can record directly that local socket N was created in response to a CTM from hub H. Any further communication on socket N can therefore be tied unambiguously to hub H.

The remote end of socket N has a more difficult problem, since it must match just a nick (not a hub, nor any other information about a user that NMDC reliably transmits) with a hub. Even in theory, this cannot be done without, at best, heuristics. DC++’s heuristic works tolerably, but it’s still guessing. A couple of issues exist. First, nicks are public information and therefore trivially spoofable and guessable, so there’s no reliable guarantee that the client on the other end is even on one of the same hubs as one’s own client, which undermines the exclusive nature of hubs. Second, and commonly, a given nick appears on multiple remote hubs. Whether those nicks represent the same user cannot be unambiguously determined [*], so one cannot just assume that any choice of user associated with that nick is as good as any other. Instead, the remote end, which sent the $ConnectToMe, must guess. This historically seems to have stemmed from the original Neo-Modus client not supporting multiple hubs, so that the protocol behind it didn’t need to distinguish these conditions at the time, but it isn’t a reliable way to run a P2P network.

ADC’s token solves this problem, and thus the secondary issues it triggers, with minimal computation and bandwidth overhead. Only the two clients in question and the hub are involved in transferring the token, which is therefore private and valid only for a single C-C connection. All parties involved can use it to authenticate the origin of the connection: both the hub and the specific user.

In order to do so, though, this token should be generated in a manner such that a third party would find it difficult to forge; otherwise several of the previous flaws would reappear. A good random number generator, ideally suited for cryptographic applications and with at least 64 independent output bits, should suffice.
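As a rough illustration of that requirement, here is a minimal sketch of generating such a token. The function name and the hex encoding are my own choices for the example, not something the protocol prescribes; the only point is that the randomness comes from a cryptographic source and has at least 64 bits.

```python
import secrets

TOKEN_BITS = 64  # the minimum suggested above; more bits do no harm


def new_ctm_token() -> str:
    """Return a hard-to-guess token as a hex string (hypothetical helper)."""
    # secrets draws from the operating system's cryptographic random source.
    return secrets.token_hex(TOKEN_BITS // 8)


token = new_ctm_token()
print(token)
```

A timestamp, a counter, or a seeded general-purpose generator would be guessable by a third party, which is exactly what the token is supposed to prevent.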

If this precondition is established, then only the two clients between which the client-client connection will be created, and the hub, know the token. This precludes, barring a hub’s potential untrustworthiness [**], both a rogue user unaffiliated with any hub spoofing a DC client into interacting further with it and a DC client’s being forced to guess which hub a user is from.
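The bookkeeping on the side that hands out the token can be sketched roughly as follows. Everything named here (the class, its methods, the hub URL and the user identifier) is hypothetical and only shows the idea: remember which hub and user each token was issued for, and accept an incoming C-C connection only if it presents a token that is still outstanding.

```python
import secrets
from typing import Dict, Optional, Tuple


class PendingConnections:
    """Tokens we have sent out, each tied to the hub and user it was sent for."""

    def __init__(self) -> None:
        self._pending: Dict[str, Tuple[str, str]] = {}

    def expect(self, hub: str, user: str) -> str:
        """Create a token for a connection we are about to request via `hub`."""
        token = secrets.token_hex(8)  # 64 random bits
        self._pending[token] = (hub, user)
        return token

    def claim(self, token: str) -> Optional[Tuple[str, str]]:
        """Return (hub, user) for a presented token, or None if unknown."""
        # pop() makes each token valid for a single C-C connection only.
        return self._pending.pop(token, None)


pending = PendingConnections()
token = pending.expect("adc://hub.example:1511", "joe")

first = pending.claim(token)                # the genuine connection
replay = pending.claim(token)               # same token presented again
forged = pending.claim("0000000000000000")  # a guess by a third party
print(first, replay, forged)
```

No nick matching is involved at any point: the token alone says which hub and which user the connection belongs to.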

[*] Yes, there are multiple ways of attempting to narrow this down. They’re all ugly and unreliable.

[**] Such hubs create more significant problems. SSL certificates avoid them.


Continue on with your break

A really nifty feature in DC++ is the ability to change the priority of files in the download queue with the keyboard. It’s quite simple: you just press ‘+’ or ‘-’ on the numeric keypad. When you select multiple files, all of them change priority.

At one point, however, selecting multiple files and changing the priority on them didn’t really change the priority for some files. This appeared (at first) to happen randomly, but after a couple of minutes, the reason became clearer.

When you selected several files that had different priorities, and pressed ‘+’ or ‘-’ until one of the files reached “Highest” or “Paused”, the other files completely ceased to update. You could try pressing 4, or 400, times more, but nothing would happen.

The buggy code was similar to what I’ve written below.
1. loop until the entire list of files has been looked at
2. if the selected file is in “highest” or “paused” mode, break to 4 and continue on
3. otherwise, change the priority and jump back to 1 and continue on
4. “do some other stuff that isn’t related to this…”

If you look closely, the above “code” shows exactly what the bug report said. The solution was quite simple, too: “change the 4 to a 1” in step 2.

Now you know what you’ve always (secretly) wanted to know about that priority bug.

(The reason “break” and “continue” are used here is that the actual fix was to literally replace the code “break” with “continue”.)
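For the curious, here is the same bug as a small runnable sketch. This is not the actual DC++ code; the priority names, the list representation and the `step` parameter (+1 for ‘+’, -1 for ‘-’) are assumptions made for the example.

```python
PRIORITIES = ["Paused", "Lowest", "Low", "Normal", "High", "Highest"]


def bump_buggy(selected, step):
    """Change every selected priority by `step`, with the original bug."""
    limit = PRIORITIES[-1] if step > 0 else PRIORITIES[0]
    result = list(selected)
    for i, prio in enumerate(result):
        if prio == limit:
            break  # bug: leaves the loop, so later files are never touched
        result[i] = PRIORITIES[PRIORITIES.index(prio) + step]
    return result


def bump_fixed(selected, step):
    """Same as above, with "break" replaced by "continue"."""
    limit = PRIORITIES[-1] if step > 0 else PRIORITIES[0]
    result = list(selected)
    for i, prio in enumerate(result):
        if prio == limit:
            continue  # fix: skip only this file and keep going
        result[i] = PRIORITIES[PRIORITIES.index(prio) + step]
    return result


selection = ["Normal", "Highest", "Low"]
print(bump_buggy(selection, +1))  # ['High', 'Highest', 'Low']
print(bump_fixed(selection, +1))  # ['High', 'Highest', 'Normal']
```

In the buggy version the third file stays at “Low” no matter how many times you press ‘+’, because the loop stops at the file that is already “Highest”.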


TTH isn’t required in automated searches

One of the major benefits in ADC is that each client must hash the user’s files. It’s been pretty well documented so I won’t go any further on that fact.

However, while ADC requires TTHs for each and every file, it does not require the client to use them when doing automated searches. Nothing in the protocol restricts this, and there really shouldn’t be anything that does.

There’s a very good reason not to do automated searches via TTH: the search will only result in exact matches. Yes, getting your exact file is the entire point. But there’s nothing that says that the entire file’s TTH needs to be the same. If you read through a previous post about the Tiger algorithm, you will see that blocks have a hash, too.

E.g., if two files consist of three blocks each, and the first two blocks of each file are exactly the same, those two blocks will have the exact same (“part”) hash. As the third block differs, the root hash (the TTH you see) will differ. But the point here is that the client may queue the first file, and if there are, e.g., no slots with that user, the client can download the second file up to where the blocks differ. So when you’re about to start downloading the first (and desired) file, you can simply resume two thirds in.
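The three-block example can be played through in a few lines. Note the assumptions: Python’s standard library has no Tiger hash, so SHA-256 stands in for it, the block size is a toy 4 bytes, and the helper names are invented. Only the tree structure matters here, not the actual hash values.

```python
import hashlib

BLOCK = 4  # bytes per block; tiny on purpose, real block sizes are far larger


def leaf_hashes(data: bytes):
    """Hash each block on its own (SHA-256 as a stand-in for Tiger)."""
    return [hashlib.sha256(b"\x00" + data[i:i + BLOCK]).digest()
            for i in range(0, len(data), BLOCK)]


def root_hash(leaves):
    """Combine block hashes pairwise until a single root remains."""
    level = list(leaves)  # assumes at least one block
    while len(level) > 1:
        nxt = [hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest()
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # an odd block out moves up unchanged
        level = nxt
    return level[0]


def shared_prefix_blocks(a, b):
    """Count how many leading blocks two files have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


wanted = b"AAAABBBBCCCC"   # the file we queued
similar = b"AAAABBBBDDDD"  # a near-identical file from another user

wanted_leaves = leaf_hashes(wanted)
similar_leaves = leaf_hashes(similar)
reusable = shared_prefix_blocks(wanted_leaves, similar_leaves)

print(root_hash(wanted_leaves) == root_hash(similar_leaves))  # False
print(reusable)  # 2
```

A search on the root hash would never find the second file, yet two of its three blocks are usable for the file you actually want.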

With automated TTH searches, you cannot do this, since the two files (that have different names, but only slightly) have different TTHs. If you search for “sensible” information in the file name, you may get more results. With the results you get, you can download the trees for those files and see whether there are any matches.

(Yes, this approach will use up considerably more bandwidth.)

(You can use this approach in a different way; say, only when the user selects “match files’ hashes”.)

(Yes, you could do this approach in NMDC, as well.)


Don’t hijack me

Or, well, rather /me.

In most IRC-clients, you can use a command usually called /me. You use it when you want to say something in third person. Like “/me like apples”, and it will appear “* joe like apples” if your nick is “joe”. (Other variants exist, but this is probably the most common.)

As this is a chat feature, it also caught the attention of DC client and hub developers.

In NMDC, you can’t say “this is in third person” and then “this isn’t in third person”, because everything is in the latter category. Unless you invent a completely new command to do this, of course, which would be rather awful. However, some clients and hubs offer this functionality in a different way.
Hubs can allow this by creating a new command. Like “+me like apples”, which the hub would then send out as “* joe like apples” (as the hub can do pretty much anything it wants). This is rather simple, but the problem is that not all hubs allow this. (Either because their software doesn’t let them, or because they dislike the feature, or because they just plain don’t know they can do it.)
The clients do it in a different way, which is really ugly. When you write “/me like apples” in your client, the client will send it out exactly like that, as if it were a standard message. However, when a client sees “/me ” at the beginning of someone’s message, it displays “* joe like apples” instead. Basically, they’re hijacking the /me; the original user may have intended that as part of their actual message, and not for the “/me” functionality.

In ADC, you don’t need to hijack the message. Instead, you can simply add a parameter to the message that’s being sent out, and the other client needs to show it as “* joe like apples”. (You could even have different types of “/me”s in ADC, but I think they’re “restricted” at the moment.) (Yes, I’m aware the protocol states that it should be “*joe like apples”; note the lack of a space… Whatever…) You can even do this in a private chat, which I doubt any NMDC hubs that support +me also support.
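The difference between the two approaches fits in a few lines. This is a sketch only: the function names are invented, and the ADC parameter is modelled as a plain boolean rather than as its on-the-wire form.

```python
def render_hijacked(nick: str, text: str) -> str:
    """NMDC-style client hack: guess the intent from the message content."""
    if text.startswith("/me "):
        return f"* {nick} {text[4:]}"
    return f"<{nick}> {text}"


def render_flagged(nick: str, text: str, third_person: bool) -> str:
    """ADC-style: the sender states the intent in a separate parameter."""
    if third_person:
        return f"* {nick} {text}"
    return f"<{nick}> {text}"


# joe really wants to talk about the command itself.
literal = "/me is a command in IRC"

print(render_hijacked("joe", literal))        # * joe is a command in IRC
print(render_flagged("joe", literal, False))  # <joe> /me is a command in IRC
print(render_flagged("joe", "like apples", True))  # * joe like apples
```

With the separate parameter the text is never inspected, so a message that merely starts with “/me ” is shown exactly as the user typed it.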

So please, if you’re writing an ADC client and you’ve been damaged enough by NMDC; don’t hijack me.


Hub Bandwidth management

Balancing hub bandwidth between users requires triage of messages such that no individual user can monopolize hub bandwidth. Repeatedly I have observed a temptation to implement such an idea by limiting different messages differently. Whilst such an idea fails somewhat for NMDC, ADC is an open-ended protocol which separates message types from message contents, leading to more significant design failure.

This post addresses hubs for which the limiting resource is upload bandwidth. Those hubs limited by CPU or RAM have separate issues and those limited by their own download bandwidth are effectively under DoS attack. Within this constraint, the cost to a hub of a user’s message is the uploaded data triggered for the hub by that message. Therefore, broadcast-type messages should generally dominate hub bandwidth; empirical data bears this out. To a first approximation, then, one can ignore non-broadcast messages, as well as INF, which must be specially handled by a hub regardless.

Both merely large numbers of users and a smaller required quota of actively hostile users can strain a hub’s upload bandwidth. Because the former is subsumed by the latter and a hub should be able to withstand the latter whilst retaining service, the rest of this post will focus on an attack model of actively malicious users only. If a hub can maintain usability for non-malicious users proportional to the bandwidth available per user given that a certain portion of users do maintain that hostile stance, then the hub will also be able to handle merely large numbers of non-malicious users whilst rationing resources such that each user will have access to a fair amount of bandwidth.

One tactic for selecting messages to forward towards such an end depends on treating different broadcast messages differently. However, any scheme which does this ends up merely requiring an attacker to maximize his damage via usage of multiple messages, preferentially those messages relatively least accounted for. For example, if MSG is preferred over RES over active SCH over passive SCH, an attacker must merely concentrate his attacks as much through MSG as other constraints allow, then via RES, then, finally, through the SCH variants in order. The net result isn’t necessarily less hub bandwidth usage, just bandwidth usage with different content.

Some messages do occur in different temporal distributions and a competent hub bandwidth management system should be able to handle those. Such a case plausibly (I don’t have data on this) occurs with TTH searches versus filename searches, wherein the former might tend to be more uniformly distributed than the latter due to the former’s occurring through auto-search. In such circumstances, a hub can instead calculate which messages to drop based on a moving average of bandwidth used over time.

Only when such a distribution fails to smooth out to less than a hub’s total available upload bandwidth must a hub pull back from merely delaying or queuing some messages, amortizing over an overall low average bandwidth, to outright dropping messages. Importantly, precisely these same considerations and arguments apply to any message, SCH or otherwise, due to the assumption of a hostile user seeking the most efficient exploit mechanism.

SCH might still appear special due to its often automatically triggering RES messages. Rather than specially counting RESes, one may simply account for them to the user which actually sends them, rather than attempting to do so via the user which sent the search to which they’ll often respond. Again, SCH and RES are less unique than they might appear: not only could another such pair of messages appear in a non-DC++ client, but RESes don’t actually have to come in response to any SCH, even given the search token in ADC. A hub cannot keep track of all searches in progress, including some that clients might take a while to respond to and that thus lie in the somewhat distant past, unless it maintains a greater history than might be desirable; and even were it to attempt to do so, it might miss searches it’s not involved in forwarding from one user to another. In principle, it cannot reliably associate searches with search responses, and therefore should attribute search responses to those users sending them. Otherwise, once more assuming a hostile adversary, users could just switch to spamming with RESes.

This message-dependent system, which has been proposed at least three separate times by three separate hub developers, contains conceptual flaws that merely promote its being gamed. Certainly a hub developer or hub owner can respond in an arms-race fashion and adjust the relevant heuristics, but this is a suboptimal, unstable outcome.

Instead, a hub should merely account for how much bandwidth any given user’s message, regardless of content but dependent on type (broadcast or non-broadcast, as well, in ADC, as which features it specifies), will consume on broadcast, and charge that amount of bandwidth to that user. Each user then has a specified amount of bandwidth available to him, dependent on the number of users on the hub at that time. Whether or not a message is forwarded, queued, or blocked will then depend purely on non-gameable factors: if the dominant cost is upload bandwidth (see initial assumption), and the hub actually does decide whether to forward a message based on upload bandwidth, the heuristic matches the actual cost and so cannot be gamed, regardless of hostility of users.
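A minimal sketch of such message-agnostic accounting follows. It makes several simplifying assumptions that are mine rather than part of any hub: a fixed accounting window that is reset from outside, an equal share per online user, and a plain yes/no decision instead of the forward/queue/block distinction. All names are hypothetical.

```python
class BandwidthAccountant:
    """Charge each user for the upload bandwidth their messages trigger."""

    def __init__(self, upload_bytes_per_window: float) -> None:
        self.capacity = upload_bytes_per_window
        self.spent = {}  # user -> bytes charged in the current window

    @staticmethod
    def cost(message: bytes, recipients: int) -> int:
        # The hub uploads one copy per recipient; the content is never inspected.
        return len(message) * recipients

    def allow(self, user: str, message: bytes, recipients: int,
              users_online: int) -> bool:
        """Forward the message only if the sender's share can pay for it."""
        share = self.capacity / max(users_online, 1)
        charge = self.cost(message, recipients)
        used = self.spent.get(user, 0)
        if used + charge > share:
            return False
        self.spent[user] = used + charge
        return True

    def new_window(self) -> None:
        """Start a fresh accounting window (e.g. once per second)."""
        self.spent.clear()


hub = BandwidthAccountant(upload_bytes_per_window=10_000)
chat = b"M" * 50    # stands in for a 50-byte chat broadcast
search = b"S" * 50  # stands in for a 50-byte search broadcast

# 10 users online, so each user's share is 1000 bytes per window.
a = hub.allow("mallory", chat, recipients=10, users_online=10)    # costs 500
b = hub.allow("mallory", search, recipients=10, users_online=10)  # costs 500
c = hub.allow("mallory", chat, recipients=10, users_online=10)    # over budget
d = hub.allow("alice", chat, recipients=1, users_online=10)       # private, 50
print(a, b, c, d)  # True True False True
```

Switching from chat to searches gains the hostile user nothing, since both cost the same, and other users’ budgets are untouched.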

Therefore, instead of the flawed message-dependent bandwidth shaping, hubs should aim for a message-agnostic bandwidth management system. Note that this allows as well for unknown messages in ADC, for which my previous, linked blog post argues. The result is a more effective, more robust file-sharing system.


SourceForge updates SVN address

SourceForge seems to have updated the address for their SVN service. It used to be “svn.sourceforge.net”; now it is “projectname.svn.sourceforge.net”. So you need to update your bookmarks and the SVN program you use to grab the various open source projects for DC.

