AdulauWikiDiary: Rating Protocol

Keep in mind that this is only a very early draft, open for discussion.

The protocol works in two steps. The initial step obtains a uid, either because the extension has none or because the user wants to regenerate one. When a uid is requested (confidentiality should be assured), the server returns the uid together with a shared secret linked to that uid. The second step submits ratings, authenticated with that shared secret.

Closed question

How do you deal with replay attacks on ratings ?

Given the HMAC approach, the only replay we face is the resubmission of an existing rating for the same website: without the shared secret, an attacker cannot forge a new rating and can only replay one that was already sent. The problem is neutralized when the scoring is aggregated ("lifted") per GUID.

How do you deal with people/bot generating a lot of uid's ?

There are two approaches: a direct one and an indirect one.

The direct approach will add an additional security layer for generating uid's, e.g. by requiring the user to enter a CAPTCHA while requesting a new uid, or by requiring the user to register.

The indirect approach will blacklist IP addresses that generate a lot of uid's over time and whose uid's are not used to submit ratings. The indirect approach (which fixes the symptom but not the problem) has several disadvantages though: we still waste uid's unnecessarily until the fake requester is blacklisted, and we end up in an arms race with the abusers.

A hybrid approach could be an automated challenge/response step (without a need for user intervention), which is quite heavy on the computational side so that it will hinder attacks of UID abusers.
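A minimal sketch of what such an automated challenge/response step could look like, assuming a hashcash-style proof of work: the server issues a random challenge, and the client must find a nonce whose SHA-256 hash has a given number of leading zero bits before a uid is handed out. The function names, the use of SHA-256 and the difficulty value are assumptions made for illustration; the draft does not specify any of them.

```python
import hashlib
import os

# Assumed difficulty: number of leading zero bits required in the hash.
# Each extra bit doubles the average work for the requester.
DIFFICULTY_BITS = 16


def make_challenge() -> str:
    """Server side: issue a random challenge (hypothetical helper)."""
    return os.urandom(16).hex()


def _meets_difficulty(challenge: str, nonce: int, bits: int) -> bool:
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode("ascii")).digest()
    # True when the top `bits` bits of the 256-bit digest are all zero.
    return int.from_bytes(digest, "big") >> (256 - bits) == 0


def solve(challenge: str, bits: int = DIFFICULTY_BITS) -> int:
    """Client side: brute-force a nonce (expensive on purpose)."""
    nonce = 0
    while not _meets_difficulty(challenge, nonce, bits):
        nonce += 1
    return nonce


def verify(challenge: str, nonce: int, bits: int = DIFFICULTY_BITS) -> bool:
    """Server side: checking a solution costs a single hash."""
    return _meets_difficulty(challenge, nonce, bits)
```

The asymmetry is the point of the design: solving takes many hash computations on average, while verifying takes one, so legitimate users pay a small one-time cost and mass uid requesters pay it for every uid.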

How do you deal with large submissions of ratings from the same uid ?

First, our basic assumption is that human users will report web pages to the central service, and not automated applications. In particular, we don't want automated applications to file user reports.

Thus it's reasonable to assume that a human user cannot submit more than X ratings per Y time period (just consider the time needed to download a web page, view and identify it, and click the button to rate it). We can use this as a starting point to identify rating bots* and to identify patterns of incorrect or irregular user behavior, e.g. mis-clicks such as a user wanting to rate a web page as A but clicking the rate-as-B button, and then immediately clicking the rate-as-A button to "correct" the previous error.

*Of course, intelligent rating bots could simply be patient and submit ratings slowly to the service. These subtle bots are a much bigger problem just like small "incorrections" are much harder to identify on Wikipedia than big vandalism acts.
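The "X ratings per Y time period" check can be sketched as a sliding-window counter per uid. This is only an illustration: the class name, the threshold values and the in-memory storage are assumptions, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import defaultdict, deque

# Placeholder values for "X" and "Y"; the draft does not fix them.
MAX_RATINGS = 30
WINDOW_SECONDS = 600


class RateMonitor:
    """Tracks recent submission times per uid (hypothetical helper)."""

    def __init__(self, max_ratings=MAX_RATINGS, window=WINDOW_SECONDS):
        self.max_ratings = max_ratings
        self.window = window
        self._seen = defaultdict(deque)

    def record(self, uid: str, timestamp: float) -> bool:
        """Record one submission; return True while the uid stays within the limit."""
        times = self._seen[uid]
        times.append(timestamp)
        # Forget submissions that have fallen out of the window.
        while times and times[0] <= timestamp - self.window:
            times.popleft()
        return len(times) <= self.max_ratings
```

A uid for which `record` returns False is only a candidate rating bot; as noted above, a patient bot that submits slowly would not be caught by this check.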

How do you deal with multiple submissions of ratings from the same uid for the same url in a small amount of time ?

See the question "How do you deal with large submissions of ratings from the same uid ?" above. In this case, we will assume that a mis-click happened and therefore count only the last rating, dropping all previous ratings of the URL from the same UID in the set of recent rating submissions.

Example: We receive four rating submissions for the same URL x by UID y in one minute. We keep only the last rating submission for URL x and discard the previous three. All previous ratings already stored in the service database will not be affected (at least at this point).
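The rule in the example above can be sketched as follows. The function name and the dictionary layout of a submission are assumptions; the input is assumed to be the set of recent submissions, ordered oldest first.

```python
def keep_last_ratings(recent):
    """Keep only the last submission per (uid, url) pair.

    `recent` is a list of dicts with at least "uid", "url" and "vote"
    keys, ordered oldest first (assumed layout, not part of the draft).
    """
    latest = {}
    for submission in recent:
        # A later submission for the same uid and url overwrites the earlier one.
        latest[(submission["uid"], submission["url"])] = submission
    return list(latest.values())
```

Ratings already stored in the service database are not touched by this step; it only filters the batch of recent submissions.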

ID request

Important: use SSL/TLS for the ID request.

Both POST and GET are valid.

./uid.pl?action=create

The response is returned as text/plain:

uid=B0470602-A64B-11DA-8632-93EBF1C0E05A;
key=itMzPcvEJyLk5ZDfA3Ce2Tknsske6z4rsxy1axZmof0=

The interface is available at http://gutenberg.freearchive.org/safer-internet/pb/uid.pl; use action=create (GET or POST, as you like) to get a uid and a key.
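A client could parse the text/plain response shown above roughly as follows. This is a sketch: the function name is hypothetical, and it assumes the fields are name=value pairs separated by ";" and/or line breaks, as in the sample response.

```python
def parse_uid_response(body: str) -> dict:
    """Parse 'uid=...;' / 'key=...' pairs from the text/plain response."""
    fields = {}
    # Treat ';' and newlines alike as field separators.
    for part in body.replace(";", "\n").splitlines():
        part = part.strip()
        if not part:
            continue
        # Split on the first '=' only: the Base64 key may end with '=' padding.
        name, sep, value = part.partition("=")
        if sep:
            fields[name] = value
    return fields
```

Splitting on the first "=" only matters here, because the key in the sample response ends with Base64 padding.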

add rating

Protocol v1.0

(shown here as a GET request, but POST is used in practice)

Version string: "1.0"

HMAC: the protocol for adding a rating uses an auth parameter carrying an HMAC value.
Encryption: the submitted information is not encrypted; encryption (for increased privacy) is planned for the next iteration.

./add-rating.pl?uid=B0470602-A64B-11DA-8632-93EBF1C0E05A&url=urlsafeBase64(url)&class=foobar&vote=p
  &auth=HMAC(uid+urlsafeBase64(url)+class+vote)&protocol=1.0&client=firefox-extension-1.0

(the client parameter is recommended but optional)
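The request above could be assembled on the client side roughly like this. Several details are assumptions, because the draft does not state them: the key is taken to be Base64-encoded and is decoded before use, the HMAC uses SHA-256, the auth value is sent as a hex string, and "+" in the formula is read as plain string concatenation. The function name is hypothetical.

```python
import base64
import hashlib
import hmac
from urllib.parse import urlencode


def build_rating_params(uid, key_b64, url, rating_class, vote, client=None):
    """Return the add-rating parameters, including the auth HMAC."""
    url_b64 = base64.urlsafe_b64encode(url.encode("utf-8")).decode("ascii")
    # auth = HMAC(uid + urlsafeBase64(url) + class + vote)
    message = (uid + url_b64 + rating_class + vote).encode("utf-8")
    key = base64.b64decode(key_b64)  # assumption: key is Base64-encoded
    auth = hmac.new(key, message, hashlib.sha256).hexdigest()  # assumption: SHA-256, hex
    params = {
        "uid": uid,
        "url": url_b64,
        "class": rating_class,
        "vote": vote,
        "auth": auth,
        "protocol": "1.0",
    }
    if client:  # recommended but optional
        params["client"] = client
    return params


def encode_body(params) -> str:
    """Form-encode the parameters for use as a POST body."""
    return urlencode(params)
```

Because the HMAC covers uid, url, class and vote, changing any of them in transit invalidates the auth value; the client and protocol parameters are not covered by it in this version.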

pre-Rating Storage

We use a basic SQL table to store the "verified" ratings. The table format is as follows (not yet space-efficient):

CREATE TABLE rating (
    class         TEXT,
    vote          TEXT,
    url           TEXT,
    uid           TEXT,
    ip            TEXT,
    client        TEXT,
    protocol      INTEGER,
    referer       TEXT,
    datesubmit    TIMESTAMP,
    source        TEXT,
    state         INTEGER
);

The pre-Rating storage is a temporary store used to feed the "final" storage. One process queries entries from the temporary storage and sets their proc field to a specific value. Another process scans the table for records whose proc field holds a specific value and deletes them. Nothing more.
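The two processes could be sketched with SQLite as follows. This is an illustration under stated assumptions: the "proc field" is taken to be the state column of the table above (the draft does not confirm this), the state values 0 and 1 are hypothetical, and SQLite's implicit rowid is used to identify records.

```python
import sqlite3

STATE_NEW, STATE_PROCESSED = 0, 1  # hypothetical state values

SCHEMA = """
CREATE TABLE rating (
    class TEXT, vote TEXT, url TEXT, uid TEXT, ip TEXT, client TEXT,
    protocol INTEGER, referer TEXT, datesubmit TIMESTAMP, source TEXT,
    state INTEGER
)
"""


def create_table(conn):
    conn.execute(SCHEMA)


def mark_processed(conn):
    """First process: fetch new entries (to feed the final storage) and flag them."""
    rows = conn.execute(
        "SELECT rowid, uid, url, class, vote FROM rating WHERE state = ?",
        (STATE_NEW,),
    ).fetchall()
    conn.executemany(
        "UPDATE rating SET state = ? WHERE rowid = ?",
        [(STATE_PROCESSED, row[0]) for row in rows],
    )
    conn.commit()
    return rows


def purge_processed(conn) -> int:
    """Second process: delete flagged records; returns the number deleted."""
    cur = conn.execute("DELETE FROM rating WHERE state = ?", (STATE_PROCESSED,))
    conn.commit()
    return cur.rowcount
```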

pre-Rating process

Checking the existence of all required parameters, in particular the uid (GUID)
Input validation for each value
Verification of the supplied HMAC

→ if all the tests are successful, we allow the data to be pre-stored.
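The three checks above could be sketched as a single function. The details are assumptions rather than the actual add-rating.pl behaviour: the uid pattern is inferred from the sample uid, the HMAC is assumed to be SHA-256 sent as hex, and lookup_key is a hypothetical callback returning the raw secret for a uid (or None if the uid is unknown).

```python
import base64
import binascii
import hashlib
import hmac
import re

REQUIRED = ("uid", "url", "class", "vote", "auth", "protocol")
UID_RE = re.compile(
    r"^[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}$", re.I
)


def check_submission(params, lookup_key):
    """Return (ok, reason) for one add-rating request."""
    # 1. Existence of all required parameters.
    missing = [name for name in REQUIRED if not params.get(name)]
    if missing:
        return False, "missing: " + ",".join(missing)
    # 2. Input validation (deliberately loose in this sketch).
    if not UID_RE.match(params["uid"]):
        return False, "bad uid"
    try:
        base64.urlsafe_b64decode(params["url"].encode("ascii"))
    except (binascii.Error, ValueError):
        return False, "bad url"
    if params["protocol"] != "1.0":
        return False, "bad protocol"
    # 3. Verification of the supplied HMAC.
    key = lookup_key(params["uid"])
    if key is None:
        return False, "unknown uid"
    message = (
        params["uid"] + params["url"] + params["class"] + params["vote"]
    ).encode("utf-8")
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(expected.encode("ascii"), params["auth"].encode("utf-8")):
        return False, "bad auth"
    return True, "ok"
```

Only a request for which this returns (True, "ok") would be allowed into the pre-Rating storage.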

Development notes (outside the scope of the Rating Protocol)

URLs on gutenberg

http://gutenberg.freearchive.org/safer-internet/add-rating.pl (FALSE add-rating only used for debugging the plug-in)
http://gutenberg.freearchive.org/safer-internet/pb/uid.pl (REAL uid creation interface)
http://gutenberg.freearchive.org/safer-internet/pb/add-rating.pl (REAL add-rating interface)

ChangeLog

(12/03/2006-adulau) Update on false add-rating to display the MIME(safe) decoded too
(12/03/2006-adulau) Creation of the real add-rating.pl - input validation checking only

TODO

Build a common library (before Phase I ?) of recurring subfunctions
General configuration file read by CGIs in order to centralize config - PII
Move DB interface of UID lookup to something cleaner (no direct access to the file) - PII
