Rhaptos2.Repo - Documentation

Welcome to the docs for Rhaptos2.repo.

Some documentation is hand-written (below), and some is generated API documentation (further below).

Testing

We have a WSGI app to be tested.

I am using 3 approaches, which more or less marry up.

  1. Doctests, and examples.
  2. nose tests using webtest
  3. config passing for testing...
  • standalone - should run with just python/rhaptos packages
  • with network
  • with selenium

To test locally

  1. We do not have SQLite working [#]_ as a backend, so we must have networked postgres working.

  2. Apart from that, run the tests as follows.

  3. doctests: each file individually should support testmod; doctests can run in their own suite and should not require network access. This is not always true.

  4. functional testing of wsgi app:

    nosetests --tc-file=../../testing.ini runtests.py

This will use webtest to send requests in-process to the app - no HTTP calls are made. The app however does not know this and proceeds as if running in a WSGI server.
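For orientation, a minimal sketch of such an in-process webtest check follows; the factory call, config keys and expected status codes are assumptions, not the actual test suite.

# Hypothetical sketch only: drive the WSGI app in-process with webtest.
from webtest import TestApp
from rhaptos2.repo.run import paste_app_factory


def make_test_app():
    # The config keys passed here are illustrative, not the real testing.ini contents.
    wsgi_app = paste_app_factory({}, pghost="127.0.0.1", pgdbname="dbtest")
    return TestApp(wsgi_app)


def test_workspace_without_session():
    app = make_test_app()
    # No cnxsessionid cookie is set, so we expect a redirect or an auth error, not JSON.
    resp = app.get("/workspace/", expect_errors=True)
    assert resp.status_int in (302, 401, 403)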

  5. functional HTTP testing of wsgi app:

    nosetests --tc-file=../../pbrian.ini  --tc=HTTPPROXY:http://localhost:8000/ runtests.py
    python run.py --config=../../pbrian.ini
  6. example.txt - a demo / example of how the various bits fit together. It is a doctest suite but needs an integrated system.

config passing

nose

Also

Use interlude. During development, copy the line below into a doctest:

>>> import interlude; interlude.interact(locals())

Now run the doctest and it drops you into a shell at that line. Just use the shell to develop, check, test etc. It is using locals() from the doctest. It is really helpful.
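A minimal sketch of a throwaway doctest module using it (the file name and contents are made up):

# scratch_interlude.py - throwaway module used only while developing.
def scratch():
    """
    >>> from rhaptos2.repo import sessioncache
    >>> import interlude; interlude.interact(locals())
    """

if __name__ == "__main__":
    import doctest
    doctest.testmod()

Running "python scratch_interlude.py" pauses at the interact() line and gives an interactive prompt with the doctest's locals.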

biblio:

http://ivory.idyll.org/articles/nose-intro.html

footnotes

Config

After the great configuration debates I am reluctant to touch this too much but nosetests has rather forced our hands.

nosetests is a useful piece of kit but has one major problem - it is really awkward to pass configuration into a test. Doug Hellman has a generally accepted solution - nose-testconfig.

This effectively takes the config file given on the command line, parses it into a python dict, and presents it as a variable named "config":

from testconfig import config
main_server = config['foo']['bar']

The current Configuration setup uses the convention of moving all keys under the [app] section into the top level of the dict and then everything else lives under “globals”. It returns a Mapping object that acts like a dict. Pretty quickly this is read off into the app.config object which also acts like a dict.

To satisfy this I muck around with the nosetest ini file. I find this unsatisfactory but survivable for now - ultimately I think just producing a dict out of an ini file and being done with it will work. God knows how I arrived at a different conclusion at any time.

Ultimately it is pretty hacky, so there may well be a break somewhere down the line.

Future notes

nose-testconfig supports YAML and python files so it is not particularly limiting, and we can change this again - I did not particularly want to, as it seemed churlish rehashing old stuff, but it has become a real pain.

Session Management

Summary

Session management in the repo has been poor for a long time, and it has made testing of the various functionalities more awkward, requiring more brain use by developer and test suite than needed.

There is a branch (session-cookie-approach) that hopefully fixes this.

We shall provide sensible, secure session management (to decide if a browser has previously authenticated) and, linked to that, flow-control that makes decisions based on the session management, such as presenting a login screen, a registration screen or whatever we need.

What is a session?

A session here is a defined period of time in which a CNX server will accept a random number string as a password-replacement/proxy for each HTTP request received.

Overview

The session shall be activated after a registered user logs in to the repo (via openid) and the session will be assigned a UUID, which shall be stored on the user's browser as a cookie, and also cached in a postgres database where the user's details will be the value to the UUID key.

Every time the user presents their cookie we shall look up the user details in the cache, take any appropriate flow-control action (session timed out? re-login?) and then store the user details in the request for later use.
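A rough sketch of that per-request flow is below; the real logic lives in rhaptos2.repo.auth.handle_user_authentication() and sessioncache, so treat the wiring here as illustrative only.

# Illustrative wiring only - the production flow-control is handle_user_authentication().
from flask import g, redirect, request

from rhaptos2.repo.auth import lookup_session


def on_before_request():
    sessionid = request.cookies.get("cnxsessionid")
    userd = lookup_session(sessionid) if sessionid else None  # postgres-backed cache lookup
    if userd is None:
        # No session, or the session timed out - flow control decides what to present.
        return redirect("/login")
    g.userd = userd  # stash the user details on the request for later use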

Issues such as deleting sessions, retrieving the correct dicts etc are also handled, with some known issues (see below).

Sessions

Please see sessioncache for details.

Authentication flow and sessions.

Sessions are just a convenient way of allowing a user to sign on once only. The flow of authentication is a little more convoluted;

please see rhaptos2.repo.auth.handle_user_authentication()

Difference between session sign on and Single Sign On

The repo manages its own login (openid) and sessions (sessioncache). The user will sign in once and then be given a sessionid. We will look up the user from the id as long as the session is valid.

If another service (transformations) receives a request from the user how should we validate the user request - should the session cookie be tested against the repo session cache locally?

What if the repo calls the transformation service to act on behalf of the user - what should we send from repo to transformations - the sessionid? Another API token created alongside the session? Where does the lookup occur?

There are two main phases:

  • validate already set-up sessions and proceed correctly
  • Create and destroy sessions, existing or none (ie login and out)

Known issues

  • I am not handling the situation of a user signing in twice.
  • I am not handling registration (co-ordinate with Michael)
  • I am not setting cookie expires ...
  • I am setting httponly
  • SSL / port 443

What is wrong with current setup?

  1. No session cache, which was to be redis but never came in. We are storing the user's OpenID identifier. This is a massive security hole.
  2. Reliance on the Flask secure-session implementation. There are a number of reasons to be dissatisfied with this; the first is that the secret key is a single phrase, in config.
  3. No clear migration away from Flask.
  4. The awful temptation to put more and more stuff in session cookies for "ease" and "scalability".

Primarily I am frustrated in testing ACLs, and in creating /resources/ - which would again be reliant on a broken session implementation.

What about API Tokens?

Did we not discuss these at the Sprint? Yes. "Single Sign On" is better described as "Once Only Sign On, many systems". A session is a once-only sign on for a single (local) system. We shall need an alternative API token approach for other systems that want to use the same sign on as authentication. Examples wanted.

Testing issues

  • Creation of a "fake-login-API". During testing only (ie a flag set) we can visit an API page, and get a valid session cookie for one of a number of pre-defined users.

    This now exists as /autosession; a usage sketch follows below.
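Roughly, a test can use it like this (the base URL and response handling are assumptions):

# Sketch: obtain a pre-defined test session from /autosession, then reuse the cookie.
import requests

BASE = "http://localhost:8000"          # assumed dev server address

s = requests.Session()
s.get(BASE + "/autosession")            # sets a cnxsessionid cookie for a known test user
resp = s.get(BASE + "/workspace/")      # subsequent requests carry the cookie automatically
print(resp.status_code, resp.text[:80])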

Documenting JSON flow of repo API

The below is the output of restrest.py. It documents HTTP conversations as they occur through the python requests module.
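For orientation, the first conversation below corresponds to a call along these lines (a sketch only, with the payload trimmed; not the actual test code):

# Sketch of the kind of call restrest.py records.
import json
import requests

cookies = {"cnxsessionid": "00000000-0000-0000-0000-000000000000"}
payload = {
    "id_": "cnxmodule:d3911c28-2a9e-4153-9546-f71d83e41126",
    "title": "Introduction",
    "body": "<h1>In CONGRESS, July 4, 1776.</h1>",
}
resp = requests.post("http://127.0.0.1:8000/module/",
                     data=json.dumps(payload),
                     headers={"Content-Type": "application/json"},
                     cookies=cookies)
print(resp.status_code, resp.json()["title"])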

POST /module/

Cookie: cnxsessionid=00000000-0000-0000-0000-000000000000
Host: 127.0.0.1:8000
Content-Length: 826
Content-Type: application/json; charset=utf-8

Body:

{
    "authors": [
        "cnxuser:f9647df6-cc6e-4885-9b53-254aa55a3383"
    ],
    "body": "<h1>In CONGRESS, July 4, 1776.</h1>\n<p>The unanimous Declaration ...
    "copyrightHolders": [
        "cnxuser:f9647df6-cc6e-4885-9b53-254aa55a3383"
    ],
    "id_": "cnxmodule:d3911c28-2a9e-4153-9546-f71d83e41126",
    "maintainers": [
        "cnxuser:f9647df6-cc6e-4885-9b53-254aa55a3383"
    ],
    "title": "Introduction"
}

Response:

Content-Type: application/json; charset=utf-8
Content-Length: 1131
{"body": "<h1>In CONGRESS, July 4, 1776.</h1>\n<p>The u...

PUT /module/cnxmodule:d3911c28-2a9e-4153-9546-f71d83e41126

NB - we are using session 000 for the initial PUT

Cookie: cnxsessionid=00000000-0000-0000-0000-000000000000
Host: 127.0.0.1:8000
Content-Length: 384
Content-Type: application/json; charset=utf-8

Body:

{
    "acl": [
        "cnxuser:75e06194-baee-4395-8e1a-566b656f6921"
    ],

        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        This is the useruri for a registered user, (ross)
        who we happen to know has a sessionID of 0001

    "authors": [
        "cnxuser:f9647df6-cc6e-4885-9b53-254aa55a3383"
    ],
    "body": "<p> Shortened body in test_put_module",
    "copyrightHolders": [
        "cnxuser:f9647df6-cc6e-4885-9b53-254aa55a3383"
    ],
    "id_": "cnxmodule:d3911c28-2a9e-4153-9546-f71d83e41126",
    "maintainers": [
        "cnxuser:f9647df6-cc6e-4885-9b53-254aa55a3383"
    ],
    "title": "Introduction"
}

Response:

Content-Type: application/json; charset=utf-8
Content-Length: 676
{"body": "<p> Shortened body in test_put_module", "id_"...

PUT /module/cnxmodule:d3911c28-2a9e-4153-9546-f71d83e41126

Here we are using a different user, 001 (ross), that we added to the ACL in the previous PUT.

Cookie: cnxsessionid=00000000-0000-0000-0000-000000000001
Host: 127.0.0.1:8000
Content-Length: 382
Content-Type: application/json; charset=utf-8

Body:

{
    "acl": [
        "cnxuser:75e06194-baee-4395-8e1a-566b656f6921"
    ],
    "authors": [
        "cnxuser:f9647df6-cc6e-4885-9b53-254aa55a3383"
    ],
    "body": "<p> OTHERUSERSESSIONID has set this",
    "copyrightHolders": [
        "cnxuser:f9647df6-cc6e-4885-9b53-254aa55a3383"
    ],
    "id_": "cnxmodule:d3911c28-2a9e-4153-9546-f71d83e41126",
    "maintainers": [
        "cnxuser:f9647df6-cc6e-4885-9b53-254aa55a3383"
    ],
    "title": "Introduction"
}

Response:

Content-Type: application/json; charset=utf-8
Content-Length: 674
{"body": "<p> OTHERUSERSESSIONID has set this", "id_": ...

POST /folder/

Cookie: cnxsessionid=00000000-0000-0000-0000-000000000000
Host: 127.0.0.1:8000
Content-Length: 398
Content-Type: application/json; charset=utf-8

Body:

{
    "body": [
        "cnxmodule:d3911c28-2a9e-4153-9546-f71d83e41126",
        "cnxmodule:350f7859-e6e7-11e1-928f-2c768ae4951b",
        "cnxmodule:4ba18842-1bf8-485b-a6c3-f6e15dd762f6",
        "cnxmodule:77a45e48-6e91-4814-9cca-0f28348a4aae",
        "cnxmodule:e0c3cfeb-f2f2-41a0-8c3b-665d79b09389",
        "cnxmodule:c0b149ec-8dd3-4978-9913-ac87c2770de8"
    ],
    "id_": "cnxfolder:c192bcaf-669a-44c5-b799-96ae00ef4707",
    "title": "Declaration Folder"
}

Response:

Content-Type: application/json; charset=utf-8
Content-Length: 445
{"body": [{"mediaType": "application/vnd.org.cnx.module...

PUT /folder/cnxfolder:c192bcaf-669a-44c5-b799-96ae00ef4707

Cookie: cnxsessionid=00000000-0000-0000-0000-000000000000
Host: 127.0.0.1:8000
Content-Length: 247
Content-Type: application/json; charset=utf-8

Body:

{
    "acl": [
        "00000000-0000-0000-0000-000000000001"
    ],
    "body": [
        "cnxmodule:d3911c28-2a9e-4153-9546-f71d83e41126",
        "cnxmodule:d3911c28-2a9e-4153-9546-f71d83e41127"
    ],
    "id_": "cnxfolder:c192bcaf-669a-44c5-b799-96ae00ef4707",
    "title": "Declaration Folder"
}

Response:

Content-Type: application/json; charset=utf-8
Content-Length: 481
{"body": [{"mediaType": "application/vnd.org.cnx.module...

GET /folder/cnxfolder:c192bcaf-669a-44c5-b799-96ae00ef4707

Cookie: cnxsessionid=00000000-0000-0000-0000-000000000000
Host: 127.0.0.1:8000

Response:

Content-Type: application/json; charset=utf-8
Content-Length: 481
Access-Control-Allow-Origin: *
{"body": [{"mediaType": "application/vnd.org.cnx.module...

POST /collection/

Cookie: cnxsessionid=00000000-0000-0000-0000-000000000000
Host: 127.0.0.1:8000
Content-Length: 956
Content-Type: application/json; charset=utf-8

Body:

{
    "authors": [
        "cnxuser:f9647df6-cc6e-4885-9b53-254aa55a3383"
    ],
    "body": "<ul><li><a href=\"cnxmodule:d3911c28-2a9e-4153-9546-f71d83e41126\"...
    "copyrightHolders": [
        "cnxuser:f9647df6-cc6e-4885-9b53-254aa55a3383"
    ],
    "id_": "cnxcollection:be7790d1-9ee4-4b25-be84-30b7208f5db7",
    "keywords": [
        "Life",
        "Liberty",
        "Happiness"
    ],
    "language": "en",
    "maintainers": [
        "cnxuser:f9647df6-cc6e-4885-9b53-254aa55a3383"
    ],
    "subType": "Other Report",
    "subjects": [
        "Social Sciences"
    ],
    "summary": "No.",
    "title": "United States Declaration Of Independance"
}

Response:

Content-Type: application/json; charset=utf-8
Content-Length: 1181
{"body": "<ul><li><a href=\"cnxmodule:d3911c28-2a9e-415...

PUT /collection/cnxcollection:be7790d1-9ee4-4b25-be84-30b7208f5db7

Cookie: cnxsessionid=00000000-0000-0000-0000-000000000000
Host: 127.0.0.1:8000
Content-Length: 683
Content-Type: application/json; charset=utf-8

Body:

{
    "acl": [
        "00000000-0000-0000-0000-000000000001"
    ],
    "authors": [
        "cnxuser:f9647df6-cc6e-4885-9b53-254aa55a3383"
    ],
    "body": "<ul><li><a href=\"cnxmodule:d3911c28-2a9e-4153-9546-f71d83e41126\"...
    "copyrightHolders": [
        "cnxuser:f9647df6-cc6e-4885-9b53-254aa55a3383"
    ],
    "id_": "cnxcollection:be7790d1-9ee4-4b25-be84-30b7208f5db7",
    "keywords": [
        "Life",
        "Liberty",
        "Happiness"
    ],
    "language": "en",
    "maintainers": [
        "cnxuser:f9647df6-cc6e-4885-9b53-254aa55a3383"
    ],
    "subType": "Other Report",
    "subjects": [
        "Social Sciences"
    ],
    "summary": "No.",
    "title": "United States Declaration Of Independance"
}

Response:

Content-Type: application/json; charset=utf-8
Content-Length: 895
{"body": "<ul><li><a href=\"cnxmodule:d3911c28-2a9e-415...

GET /collection/cnxcollection:be7790d1-9ee4-4b25-be84-30b7208f5db7

Cookie: cnxsessionid=00000000-0000-0000-0000-000000000000
Host: 127.0.0.1:8000

Response:

Content-Type: application/json; charset=utf-8
Content-Length: 895
{"body": "<ul><li><a href=\"cnxmodule:d3911c28-2a9e-415...

PUT /collection/cnxcollection:be7790d1-9ee4-4b25-be84-30b7208f5db7

Cookie: cnxsessionid=00000000-0000-0000-0000-000000000002
Host: 127.0.0.1:8000
Content-Length: 494
Content-Type: application/json; charset=utf-8

Body:

{
    "authors": [
        "cnxuser:f9647df6-cc6e-4885-9b53-254aa55a3383"
    ],
    "body": [
        "cnxmodule:SHOULDNEVERHITDB0"
    ],
    "copyrightHolders": [
        "cnxuser:f9647df6-cc6e-4885-9b53-254aa55a3383"
    ],
    "id_": "cnxcollection:be7790d1-9ee4-4b25-be84-30b7208f5db7",
    "keywords": [
        "Life",
        "Liberty",
        "Happiness"
    ],
    "language": "en",
    "maintainers": [
        "cnxuser:f9647df6-cc6e-4885-9b53-254aa55a3383"
    ],
    "subType": "Other Report",
    "subjects": [
        "Social Sciences"
    ],
    "summary": "No.",
    "title": "United States Declaration Of Independance"
}

Response:

Content-Type: text/html
Content-Length: 227
null

PUT /module/cnxmodule:d3911c28-2a9e-4153-9546-f71d83e41126

Cookie: cnxsessionid=00000000-0000-0000-0000-000000000000
Host: 127.0.0.1:8000
Content-Length: 368
Content-Type: application/json; charset=utf-8

Body:

{
    "acl": [
        "cnxuser:75e06194-baee-4395-8e1a-566b656f6921"
    ],
    "authors": [
        "cnxuser:f9647df6-cc6e-4885-9b53-254aa55a3383"
    ],
    "body": "Declaration test text",
    "copyrightHolders": [
        "cnxuser:f9647df6-cc6e-4885-9b53-254aa55a3383"
    ],
    "id_": "cnxmodule:d3911c28-2a9e-4153-9546-f71d83e41126",
    "maintainers": [
        "cnxuser:f9647df6-cc6e-4885-9b53-254aa55a3383"
    ],
    "title": "Introduction"
}

Response:

Content-Type: application/json; charset=utf-8
Content-Length: 660
{"body": "Declaration test text", "id_": "cnxmodule:d39...

PUT /module/cnxmodule:d3911c28-2a9e-4153-9546-f71d83e41126

Cookie: cnxsessionid=00000000-0000-0000-0000-000000000002
Host: 127.0.0.1:8000
Content-Length: 359
Content-Type: application/json; charset=utf-8

Body:

{
    "acl": [
        "cnxuser:75e06194-baee-4395-8e1a-566b656f6921"
    ],
    "authors": [
        "cnxuser:f9647df6-cc6e-4885-9b53-254aa55a3383"
    ],
    "body": "NEVER HIT DB",
    "copyrightHolders": [
        "cnxuser:f9647df6-cc6e-4885-9b53-254aa55a3383"
    ],
    "id_": "cnxmodule:d3911c28-2a9e-4153-9546-f71d83e41126",
    "maintainers": [
        "cnxuser:f9647df6-cc6e-4885-9b53-254aa55a3383"
    ],
    "title": "Introduction"
}

Response:

Content-Type: text/html
Content-Length: 223
null

PUT /folder/cnxfolder:c192bcaf-669a-44c5-b799-96ae00ef4707

Cookie: cnxsessionid=00000000-0000-0000-0000-000000000002
Host: 127.0.0.1:8000
Content-Length: 163
Content-Type: application/json; charset=utf-8

Body:

{
    "acl": [
        "00000000-0000-0000-0000-000000000001"
    ],
    "body": [
        "THIS IS TEST"
    ],
    "id_": "cnxfolder:c192bcaf-669a-44c5-b799-96ae00ef4707",
    "title": "Declaration Folder"
}

Response:

Content-Type: text/html
Content-Length: 223
null

GET /module/cnxmodule:d3911c28-2a9e-4153-9546-f71d83e41126

Cookie: cnxsessionid=00000000-0000-0000-0000-000000000001
Host: 127.0.0.1:8000

Response:

Content-Type: application/json; charset=utf-8
Content-Length: 660
{"body": "Declaration test text", "id_": "cnxmodule:d39...

GET /folder/cnxfolder:c192bcaf-669a-44c5-b799-96ae00ef4707

Cookie: cnxsessionid=00000000-0000-0000-0000-000000000000
Host: 127.0.0.1:8000

Response:

Content-Type: application/json; charset=utf-8
Content-Length: 481
Access-Control-Allow-Origin: *
{"body": [{"mediaType": "application/vnd.org.cnx.module...

GET /module/cnxmodule:d3911c28-2a9e-4153-9546-f71d83e41126

Cookie: cnxsessionid=00000000-0000-0000-0000-000000000002
Host: 127.0.0.1:8000

Response:

Content-Type: text/html
Content-Length: 223
null

GET /workspace/

Cookie: cnxsessionid=00000000-0000-0000-0000-000000000000
Host: 127.0.0.1:8000

Response:

Content-Type: application/json; charset=utf-8
Content-Length: 433
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true
[{"mediaType": "application/vnd.org.cnx.module", "id": ...

DELETE /module/cnxmodule:d3911c28-2a9e-4153-9546-f71d83e41126

Cookie: cnxsessionid=00000000-0000-0000-0000-000000000002
Host: 127.0.0.1:8000

Response:

Content-Type: text/html
Content-Length: 223
null

DELETE /module/cnxmodule:d3911c28-2a9e-4153-9546-f71d83e41126

Cookie: cnxsessionid=00000000-0000-0000-0000-000000000000
Host: 127.0.0.1:8000

Response:

Content-Type: application/json; charset=utf-8
Content-Length: 57
cnxmodule:d3911c28-2a9e-4153-9546-f71d83e41126 is no mo...

DELETE /collection/cnxcollection:be7790d1-9ee4-4b25-be84-30b7208f5db7

Cookie: cnxsessionid=00000000-0000-0000-0000-000000000002
Host: 127.0.0.1:8000

Response:

Content-Type: text/html
Content-Length: 227
null

DELETE /collection/cnxcollection:be7790d1-9ee4-4b25-be84-30b7208f5db7

Cookie: cnxsessionid=00000000-0000-0000-0000-000000000000
Host: 127.0.0.1:8000

Response:

Content-Type: application/json; charset=utf-8
Content-Length: 61
cnxcollection:be7790d1-9ee4-4b25-be84-30b7208f5db7 is n...

DELETE /folder/cnxfolder:c192bcaf-669a-44c5-b799-96ae00ef4707

Cookie: cnxsessionid=00000000-0000-0000-0000-000000000002
Host: 127.0.0.1:8000

Response:

Content-Type: text/html
Content-Length: 223
null

DELETE /folder/cnxfolder:c192bcaf-669a-44c5-b799-96ae00ef4707

Cookie: cnxsessionid=00000000-0000-0000-0000-000000000000
Host: 127.0.0.1:8000

Response:

Content-Type: application/json; charset=utf-8
Content-Length: 57
cnxfolder:c192bcaf-669a-44c5-b799-96ae00ef4707 is no mo...

Proposals

This is an experimental area of the docs - it contains written use cases, specs and just plain old discussions about the work this branch is trying to perform. It seems useful to have one location to put this stuff. This location may or may not be best.

Add Google Analytics support

The author(s) of modules and collections should be able to add their own tracking codes to a module or collection or both, and we should offer a facility to do this.

storyref:https://trello.com/card/repo-add-api-support-for-new-metadata-fields-4-pts/5181197901c3b1290b001951/86

Spec

We shall provide a single text field in both modules and collections named googleTrackingID and this will allow arbitrary tracking code to be installed.

The backend repo only needs to support accepting a new field from the json doc and handling it correctly. The ATC client will need to do more, see story https://trello.com/card/atc-add-missing-ui-fields-for-metadata-6-pts/5181197901c3b1290b001951/85

Security issue: It may be better to implement this as a google-only field, and capture only a string corresponding to the google tracking code (ie AM-1234ABB), with us filling in the script boilerplate around it. This will prevent arbitrary script being written into the modules. A Skype discussion indicated we would sanitise all HTML inputs during the publication process.
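A minimal sketch of what that google-only validation might look like (the regex and helper name are assumptions; the repo's own quick check is simple_xss_validation(), documented further below):

# Hypothetical tracking-code check: accept a bare code, refuse anything with markup.
import re

TRACKING_CODE_RE = re.compile(r"^[A-Z]{2}-[0-9A-Z-]{4,16}$")


def looks_like_tracking_code(value):
    return bool(TRACKING_CODE_RE.match(value))


assert looks_like_tracking_code("US-12345678-1")
assert not looks_like_tracking_code("<script>Evil</script>")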

Tests

  • Can we inject an arbitrary string?
  • Can we see same string returned with no HTML mangling?

Logging

Good idea. Move to getLogger(__name__)

hooks

I think we could really do with a simple hooks process where we run different functions based on the point in the request process.
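Something as small as this would do (the names here are invented, not existing repo code):

# Minimal hook registry sketch: run registered functions at named points in the request cycle.
HOOKS = {"before_request": [], "after_response": []}


def register_hook(point, func):
    HOOKS[point].append(func)


def run_hooks(point, *args, **kwargs):
    for func in HOOKS[point]:
        func(*args, **kwargs)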

Logging

We want to use the getLogger(__name__) form. We want to create a context, put it in g / environ, and log with that. We want to clean up logging and redirect to syslog. We want to direct syslog elsewhere...
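A sketch of that shape, assuming syslog via the standard library (the handler address and format string are placeholders):

# Module-level logger plus a syslog handler; illustrative wiring only.
import logging
from logging.handlers import SysLogHandler

lgr = logging.getLogger(__name__)


def configure_logging(syslog_address="/dev/log"):
    handler = SysLogHandler(address=syslog_address)
    handler.setFormatter(logging.Formatter("rhaptos2.repo %(name)s %(levelname)s %(message)s"))
    root = logging.getLogger("rhaptos2")
    root.addHandler(handler)
    root.setLevel(logging.INFO)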

API

These documents come directly from source code as API docs (ala epydoc), but are divided up and commented in reasonable order below.

API for Views and models

Summary

If we move to greenlets as well we need to test that ability. As such I have not introduced a pool for psycopg2 work yet (see sessioncache).

API

rhaptos2.repo.views

views.py - View code for the repository application.

Structure: We have four main view areas.

  1. the models (Folder, Collection, Module)
  2. the helper views (workspace)
  3. binary uploads.
  4. openid and persona

Protocols

I try to stick to these

  1. Every action (GET POST PUT DELETE) must have a useruri passed in to authorise
  2. views receive back either a model.<> object or a json-encodeable version of that

json-encoding

todo: convert to a factory-based app entirely
todo: remove view / as that is now JS
todo: remove apply_cors and apply internally. Or just use it?
todo: remove crash and burn

rhaptos2.repo.views.accept_resource_upload()[source]

Handler for resource file uploads

rhaptos2.repo.views.apply_cors(resp_as_pytype)[source]

A callable function (not a decorator) to take the output of an app_end and convert it to a Flask response with appropriate JSON-ified wrappings.

rhaptos2.repo.views.auto_session()[source]

Strictly for testing purposes: I want to fake three sessions with known ids, and also generate a "real" session with a known user. FIXME - there has to be a better way.

rhaptos2.repo.views.bootstrap()[source]

At this point there is either a valid session (so redirect to atc), or there is a need to let the visitor choose: either get an anonymous session, or, if they are registered, choose to log in again.

There is a logic choice that might improve things - if they have previously visited us, redirect to /login.

rhaptos2.repo.views.content_router(uid)[source]

We now serve everything from api/content

uid = content/1234-1234-12334
^^^ uuid

The router logic is subtly different:

  1. if we are GET, DELETE or HEAD there is no payload and there is a uid: do not collect the payload, do collect the uid, and route
  2. POST: payload, no uid
  3. PUT: payload and uid

(Ignore OPTIONS etc)
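Sketched out, the dispatch described above looks roughly like this (the handler names are invented; obtain_payload() is the helper documented below):

# Illustrative dispatch only - mirrors the three cases above, not the actual view code.
from flask import request


def route_content(uid=None):
    if request.method in ("GET", "DELETE", "HEAD"):
        return handle_without_payload(uid)        # uid present, no payload collected
    payload = obtain_payload(request)             # see obtain_payload() below
    if request.method == "POST":
        return handle_create(payload)             # payload, no uid
    if request.method == "PUT":
        return handle_update(uid, payload)        # payload and uid
    # OPTIONS etc. ignored here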

rhaptos2.repo.views.folder_router(folderuri)[source]
rhaptos2.repo.views.get_resource(hash)[source]

Respond with the resource file.

rhaptos2.repo.views.index()[source]

Serves up the index.html file. This will be removed.

rhaptos2.repo.views.keywords()[source]

Returns a list of keywords for the authenticated user.

rhaptos2.repo.views.obtain_payload(werkzeug_request_obj)[source]
rhaptos2.repo.views.requestid()[source]

before_request is supplied with this to run before each __call__

rhaptos2.repo.views.simple_xss_validation(html_fragment)[source]
>>> simple_xss_validation("US-12345678-1")
True
>>> simple_xss_validation("<script>Evil</script>")
False

This is very quick and dirty, and we need some consideration over XSS escaping. FIXME

rhaptos2.repo.views.temp_session()[source]

When a user wants to edit anonymously, they need to hit this first. This is to avoid the logic problems in knowing if a user should be redirected if they have one but not two cookies etc.

Here we generate a temporary userid (that is not linked to cnx-user) then set up a session based on that userid. All work will be lost at the end of the session.

rhaptos2.repo.views.validate_googleTrackingID(payload)[source]

Given a (json) formatted payload, return whether the google tracking ID is valid

rhaptos2.repo.views.validate_mediaType(payload)[source]

Given a (json) formatted payload, find out if it is a module, collection or folder and return the appropriate mediatype.

Possible enhancements include using an Accept header to determine the mediatype. Returns mediatype - seems odd...

rhaptos2.repo.views.verify_schema(model_dict, mediatype)[source]

Given a json object, verify it matches the claimed mediaType schema

model_dict: dict of the model as out of json - MUST be pure mediaType, not SOFT form
mediatype: what we think the dict conforms to

FixMe: we do not have versioning of schemas
FixMe: we don't have a jsonschema verifier...

rhaptos2.repo.views.versionGET()[source]
rhaptos2.repo.views.whoamiGET()[source]

Returns either 401 if OpenID is not available, or a JSON document of the form:

{"openid_url": "https://www.google.com/accounts/o8/id?id=AItOawlWRa8JTK7NyaAvAC4KrGaZik80gsKfe2U",  # noqa
 "email": "Not Implemented", "name": "Not Implemented"}
rhaptos2.repo.views.workspaceGET()[source]

Authentication API

Summary

We make use of session cookies, a local cache and some logic flow to ensure users can work easily with minimal interruptions

rhaptos2.repo.auth

How does Authentication, Authorisation and Sessions work?

We are operating a Single-Sign-On service, using Valruse to do the authentication.

Workflow

A user will hit the repo home page and :func:`handle_user_authentication` will be fired; based on the cookies stored in the user's browser we will know one of the following things about the user.

  • Never seen before
  • Known user, not in session
  • Known user, in session
  • Edge cases

Usually they will choose to log in and will be directed to the login page on cnx-user. Here, cnx-user will authenticate them in some fashion (OpenID) and then redirect the user's browser back to the repo, with a token.

The repo then looks up this token against the cnx-user service. And hey presto, if the token matches, the repo knows it can trust the browser (assuming SSL all the way).

The redirect hits the /valid endpoint in the repo, which will check the token against cnx-user, and then call:

user_uuid_to_user_details
 given an authenticated user ID (OpenID), look up the user details
 on cnx-user
create_session()

At this point, we now have a user who logged in against cnx-user, has then proven to cnx-repo that they did log in, and now has a session cookie set in their browser that the repo can trust as password-replacement for a set period.

Temporary sessions

Temporary sessions, or anonymous editing, are where a user does not log in but uses the repo anonymously, perhaps creating test modules.

This is supported by hitting the endpoint /tempsession, which will trigger the view temp_session. This in turn calls set_temp_session(). Here we create both a random sessionID (as per usual for sessions) and a random user_id. This user_id is used exactly as if it were a real registered user, but it is never sent back to cnx-user; instead it is used solely in ACLs on the unpublished repo.

This way, at the end of a temp session, the user effectively loses all their edits. Avoiding this may be desirable; it is possible but not yet implemented.

known issues
rhaptos2.repo.auth.create_session(userdata)[source]

A closure function that is stored and called at the end of the response, allowing us to set a cookie with the correct uuid before the response obj has been created (before the request is processed!)

Param: userdata - a userdict format.
Returns: sessionid

cookie settings:

  • cnxsessionid - a fixed key string that is constant
  • expires - we want a cookie that will live even if the user shuts down the browser. However it should not live forever ...?
  • httponly - prevent easy CSRF, however allow AJAX to request the browser to send the cookie.
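A sketch of those settings applied to a Flask/Werkzeug response (the lifetime value is a placeholder, not the repo's actual policy):

# Illustrative only: attach the session cookie with the settings listed above.
import datetime


def attach_session_cookie(response, sessionid):
    response.set_cookie(
        "cnxsessionid",            # fixed, constant key string
        sessionid,
        # survives a browser shutdown, but does not live forever
        expires=datetime.datetime.utcnow() + datetime.timedelta(days=7),
        httponly=True,             # keep the cookie away from page scripts; AJAX requests still send it
    )
    return response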

rhaptos2.repo.auth.delete_session(sessionid)[source]

Request the browser to remove the cookie from the client; remove from the session-cache database.

rhaptos2.repo.auth.handle_user_authentication(flask_request)[source]

Correctly perform all authentication workflows

We have 16 options for different user states such as IsLoggedIn, NotRegistered. The states are listed below.

This function is where eventually all 16 will be handled. For the moment only a limited number are.

Parameters: flask_request – request object of pocoo flavour.
Returns: No return is good, because it allows the onward processing of requests.

Otherwise we return a login page.

This gets called on before_request (which is after processing of HTTP headers but before __call__ on wsgi.)

Note

All the functions in sessioncache, and auth, should be called from here (possibly in a chain) and raise errors or other signals to allow this function to take action, not to presume on some action (like a redirect) themselves. (todo-later: such late decisions are well suited for deferred callbacks)

Auth  Reg  InSession  ProfileCookie  Next Action / RoleType             Handled Here
Y     Y    Y          Y              Go                                 Y
Y     Y    Y          N              set_profile_cookie                 Y
Y     Y    N          Y              set_session                        Y
Y     Y    N          N              FirstTimeOK
Y     N    Y          Y              ErrorA
Y     N    Y          N              ErrorB
Y     N    N          Y              ErrorC
Y     N    N          N              NeedToRegister
N     N    Y          Y              AnonymousGo
N     N    Y          N              set_profile_cookie
N     N    N          Y              LongTimeNoSee
N     N    N          N              FreshMeat
N     Y    Y          Y              Conflict with anonymous and reg?
N     Y    Y          N              Err-SetProfile-AskForLogin
N     Y    N          Y              NotArrivedYet
N     Y    N          N              CouldBeAnyone

All the final 4 are problematic because if the user has not authorised how do we know they are registered? Trust the profile cookie?

We examine the request, find the session cookie, register any logged-in user, or redirect to the login pages.

rhaptos2.repo.auth.login()[source]

Redirect to cnx-user login.

rhaptos2.repo.auth.logout()[source]

kill the session in cache, remove the cookie from client

rhaptos2.repo.auth.lookup_session(sessid)[source]

As this will be called on every request and is a network lookup, we should strongly look into redis-style local disk caching, and performance monitoring of the request life cycle?

Returns a python dict of user_details format,
or None if no session ID is in the cache, or an Error if the lookup failed for some other reason.
rhaptos2.repo.auth.session_to_user(flask_request_cookiedict, flask_request_environ)[source]

Given a request environment and cookie, return the user data.

>>> cookies = {"cnxsessionid": "00000000-0000-0000-0000-000000000000",}
>>> env = {}
>>> userd = session_to_user(cookies, env)
>>> userd["fullname"]
'pbrian'
Params flask_request_cookiedict: the cookiejar sent over as a dict(-like obj).
Params flask_request_environ: a dict-like object representing the WSGI environ.
Returns: Err if the lookup fails, a userdict if not.
rhaptos2.repo.auth.set_autosession()[source]

This is a convenience function for development. It should fail in production.

rhaptos2.repo.auth.set_temp_session()[source]

A temporary session is not yet fully implemented. A temporary session is to allow an unregistered and unauthorised user to visit the site, acquire a temporary userid and a normal session.

Then they will be able to work as normal, the workspace and acls set to the just invented temporary id.

However work saved will be irrecoverable after session expires...

NB - we have "made up" a user_id and uri. It is not registered in cnx-user. This may cause problems with distributed caching unless we share session-caches.

rhaptos2.repo.auth.store_userdata_in_request(user_details, sessionid)[source]

Given a userdict, keep it in the request cycle for later reference. Best practice here will depend on the web framework.

rhaptos2.repo.auth.user_uuid_to_user_details(ai)[source]

Given a user_id from cnx-user create a user_detail dict.

Parameters: ai – authenticated identifier. This used to be the OpenID URL; now we directly get back the common user ID from the user service.

user_details no longer holds any user metadata apart from the user UUID.

rhaptos2.repo.auth.user_uuid_to_valid_session(uid)[source]

Given a single UUID set up a session and return a user_details dict

Several different functions need this series of steps so it is encapsulated here.

rhaptos2.repo.auth.valid()[source]

cnx-user /valid view for capturing valid authentication requests.

rhaptos2.repo.auth.whoami()[source]

Based on the session cookie, returns a userd dict of user details, equivalent to the mediatype from the service / session.

rhaptos2.repo.sessioncache

sessioncache is a standalone module providing the ability to control persistent-session client cookies and profile-cookies.

sessioncache.py is a “low-level” piece, and is expected to be used in conjunction with lower-level authentication systems such as OpenID and with “higher-level” authorisation systems such as the flow-control in auth.py

persistent-session
This is the period of time during which a web server will accept an id-number presented as part of an HTTP request as a replacement for an actual valid form of authentication. (We remember that someone authenticated a while ago, and assume no-one is able to impersonate them in the intervening time period.)
persistent-session cookie
This is a cookie set on a client browser that stores an id number pertaining to a persistent-session. It will last beyond a browser shutdown, and is expected to be sent as an HTTP header as part of each request to the server.

Why? Because I was getting confused by the lack of fine control over sessions, and because the Flask implementation relied heavily on encryption, which seems to be the wrong direction. So we needed a server-side session cookie implementation with fairly fine control.

I intend to replace the existing SqlAlchemy based services with pure psycopg2 implementations, but for now I will be content not adding another feature to SA.

Session Cache

The session cache needs to be a fast, distributed lookup system for matching a random ID to a dict of user details.

We shall store the user details in the table session_cache.

Discussion

Caches are hard. They need to be very very fast, and in this case distributable. Distributed caches are very very hard because we need to ensure they are synched.

I feel redis makes an excellent cache choice in many circumstances - it is blazingly fast for key-value lookups, it is simple, it is threadsafe (as in threads in the main app do not maintain any pooling or thread issues other than opening a socket or keeping it open) and it has decent synching options.

However the synching is a serious concern, and as such using a centralised, fast database will allow us to move to production with a secure solution, without the immediate reliance on cache-invalidation strategies.

Overview

We have one single table, session_cache. This stores a json string (as a string, not the 9.3 JSON type) as the value in a key-value pair. The key is a UUID-formatted string, passed in from the application. It is expected we will never see a collision.
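A plausible shape for that table is sketched below; the real DDL is created by initdb(), so the column names here are assumptions.

# Sketch only - the actual table is created by sessioncache.initdb().
SESSION_CACHE_DDL = """
CREATE TABLE session_cache (
    sessionid   text PRIMARY KEY,       -- UUID-formatted string supplied by the application
    userdict    text NOT NULL,          -- user details stored as a JSON string, not a JSON column
    session_end timestamptz NOT NULL    -- expiry; the "in date" check runs on the database side
);
"""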

We have three commands: set_session, get_session and delete_session.

With these we can test the whole lifecycle as below.

Example Usage

We firstly pass in a badly formed id.:

>>> sid = "Dr. Evil"
>>> get_session(sid)
Traceback (most recent call last):
...

Rhaptos2Error: Incorrect UUID format for sessionid...

OK, now let's use a properly formatted (but unlikely) UUID:

>>> sid = "00000000-0000-0000-0000-000000000001"
>>> set_session(sid, {"name":"Paul"})
True
>>> userd = get_session(sid)
>>> print userd[0]
00000000-0000-0000-0000-000000000001
>>> delete_session(userd[0])
To do
  • greenlets & conn pooling
  • wrap returned recordset in dict.
  • pg’s UUID type?
Standalone usage
minimalconfd = {"app": {'pghost':'127.0.0.1',
                        'pgusername':'repo',
                        'pgpassword':'CHANGEME',
                        'pgdbname':'dbtest'}
               }

import sessioncache
sessioncache.set_config(minimalconfd)
sessioncache.initdb()
sessioncache._fakesessionusers()
sessioncache.get_session("00000000-0000-0000-0000-000000000000")
{u'interests': None, u'user_id': u'cnxuser:75e06194-baee-4395-8e1a-566b656f6920', ...}
>>>
rhaptos2.repo.sessioncache.connection_refresh(conn)[source]

Connections should be pooled and returned here.

rhaptos2.repo.sessioncache.delete_session(sessionid)[source]

Remove from session_cache an existing but no longer wanted session(id), for whatever reason we want to end a session.

Parameters:sessionid – Sessionid from cookie

:returns nothing if success.

rhaptos2.repo.sessioncache.exec_stmt(insql, params)[source]

Trivial ability to run a DML query outside SQLAlchemy.

Parameters:
  • insql – a correctly parameterised SQL stmt ready for the psycopg driver.
  • params – iterable of parameters to be inserted into insql
Returns: a dbapi recordset (list of tuples).

rhaptos2.repo.sessioncache.get_session(sessionid)[source]
Given a sessionid, if it exists and is "in date", return the userdict (opposite of set_session).

Otherwise return None (we do not error out on id not found).

NB this depends heavily on co-ordinating the incoming TZ of the DB and the python app server - I am solely running the check on the database, which avoids that but does make it less portable.

rhaptos2.repo.sessioncache.getconn()[source]

returns a connection object based on global confd.

This is, at the moment, not a pooled connection getter.

We do not want the ThreadedPool here, as it is designed for “real” threads, and listens to their states, which will be ‘awkward’ in moving to greenlets.

We want a pool that will relinquish control back using gevent calls

https://bitbucket.org/denis/gevent/src/5f6169fc65c9/examples/psycopg2_pool.py http://initd.org/psycopg/docs/pool.html

Returns: a psycopg2 connection obj, or a psycopg2.Error.
rhaptos2.repo.sessioncache.initdb()[source]

A helper function for creating the session table

rhaptos2.repo.sessioncache.maintenance_batch()[source]

A holding location for ways to clean up the session cache over time. These will need improvement and testing.

rhaptos2.repo.sessioncache.run_query(insql, params)[source]

trivial ability to run a query outside SQLAlchemy.

Parameters:
  • insql – a correctly parameterised SQL stmt ready for the psycopg driver.
  • params – iterable of parameters to be inserted into insql
Returns: a dbapi recordset (list of tuples).

run_query(conn, “SELECT * FROM tbl where id = %s;”, (15,))

issues: lots.

  • No fetch_iterator.
  • connection per query (see above)
  • We should at least return a dict per row with fields as keys.
rhaptos2.repo.sessioncache.set_session(sessionid, userd)[source]

Given a sessionid (generated according to the cnxsessionid spec elsewhere) and a userdict, store in the session cache with appropriate timeouts.

Parameters:
  • sessionid – a UUID, that is to be the new sessionid
  • userd – python dict of format cnx-user-dict.
Returns: True on successful setting.

Can raise Rhaptos2Errors.

TIMESTAMPS. We are comparing the time now with the expiry time of the cookie in the database. This reduces the portability.

This beats the previous solution of passing in python formatted UTC and then comparing on database.

FIXME: bring the comparison into python for portability across cache stores.

rhaptos2.repo.sessioncache.validate_uuid_format(uuidstr)[source]

Given a string, try to ensure it is of type UUID.

>>> validate_uuid_format("75e06194-baee-4395-8e1a-566b656f6920")
True
>>> validate_uuid_format("FooBar")
False

rhaptos2.repo.weblogging

author:paul@mikadosoftware.com <Paul Brian>

This is initially a simple URL to be listened to:

/logging

The logging endpoint will take either of the two forms of message below and apply it to the local syslog, which we expect will be configured to centralise over rsyslogd, or it will take a triple of the below form and convert it into a statsd call to be stored in the graphite database.

logging

This endpoint will capture a JSON encoded POST sent to /logging and will process one of three message types.

Firstly, just a block of text expected to be a traceback or other log-ready message. We would assume the client would not insert user data. There is no expectation of capturing session details here. Why would we want the log to be SSL protected? Might be an idea?

{"message-type": "log",
 "log-message": "Traceback ...",
 "metric-label": null,
 "metric-value": null,
 "metric-type": null}

The common metric of simply adding one to a global counter is shown here. We are capturing the number of times anyone types in the word penguin.

{"message-type": "metric",
 "log-message": null,
 "metric-label": "org.cnx.writes.penguin",
 "metric-value": null,
 "metric-type": "incr"}

Here is a third type of message. We can capture a metric that is a specific value; this would be useful in aggregate reporting. It might be the amount of time to perform an action - here it is wpm:

{'message-type':'metric',
 'log-message':null,

 'metric-label': 'org.cnx.wordsperminute',
 'metric-value': 48,
 'metric-type': 'timing',
 }

NB The above message is not yet supported.
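For the supported "incr" case, the forwarding could look something like this (the client shown is the common python statsd package, which is an assumption about the actual implementation):

# Sketch of forwarding a metric message to statsd; not the repo's actual code.
import statsd


def forward_metric(msg, host="localhost", port=8125):
    client = statsd.StatsClient(host, port)
    if msg["metric-type"] == "incr":
        client.incr(msg["metric-label"])            # e.g. org.cnx.writes.penguin
    elif msg["metric-type"] == "timing":
        client.timing(msg["metric-label"], msg["metric-value"])  # not yet supported upstream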

Improvements

Run this as a WSGI middleware, so it is simple to import into the chain.

Security considerations

Fundamentally no different from any web service. I expect we shall need to use some form of long-running token and keep the conversations in SSL to prevent simplistic DDOS attacks.

Simple testing

>>> from weblogging import *
>>> confd = {'globals':
... {'syslogaddress':"/dev/log",
... 'statsd_host':'log.frozone.mikadosoftware.com',
... 'statsd_port':8125,
... }}
>>> testmsg = '''{"message-type":"log",
...            "log-message":"This is log msg",
...            "metric-label": null,
...            "metric-value": null,
...            "metric-type": null
...           }'''
>>> configure_weblogging(confd)
>>> logging_router(testmsg)

### FIXME - there is no really decent way to snaffle syslogs in a unit test...

rhaptos2.repo.weblogging.log_endpoint(payload)[source]

given a dict, log it to syslog

rhaptos2.repo.weblogging.logging_router(json_formatted_payload)[source]

pass in a json message, this will check it, then action the message.

We have several types of incoming message, corresponding to an atc log message, an atc metric message (ie graphite). We want to correctly handle each so this acts as a router/dispatcher

rhaptos2.repo.weblogging.metric_endpoint(payload)[source]

given a dict, fire off to statsd

rhaptos2.repo.weblogging.validate_msg_return_dict(json_formatted_payload)[source]
>>>
>>> payload_good = '''{"message-type":"log",
...            "log-message":"This is log msg",
...            "metric-label": null,
...            "metric-value": null,
...            "metric-type": null
...           }'''
>>> x = validate_msg_return_dict(payload_good)
>>> x
({u'metric-type': None, u'metric-value': None, u'metric-label': None, u'message-type': u'log', u'log-message': u'This is log msg'}, True)

Future Developments

  • registration on user service
  • API Tokens and user service
  • reliance by other services on user service logged in (single Sign on)

Common functionality

For various reasons the common functions like errors and config are held not in a separate repo but here.

rhaptos2.repo.backend

rhaptos2.repo.configuration

Contains a common configuration parsing class and various utilities for dealing with configuration.

class rhaptos2.repo.configuration.Configuration(settings={}, **sections)[source]

A configuration settings object. This is primarily used to read configuration from a file.

classmethod from_file(file, app_name='app')[source]

Initialize the class from an INI file. The app_name (defaults to DEFAULT_APP_NAME) is used to signify the main application section in the configuration INI. The application section is put into the top-level mapping. All other sections are put in the mapping as a keyed section name and a sub-dictionary containing the section's key-value pairs.

>>> ini = '''[app]
... appkey=appval
...
... [test]
... foo=1
...
... [test2]
... bar=1
... '''
>>> f = "/tmp/foo.ini"
>>> open(f, "w").write(ini)
>>> C = Configuration.from_file(f)
>>> expected = {'test': {'foo': '1'},
...            'test2': {'bar': '1'},
...            "appkey":"appval"}
>>> assert C == expected
>>> assert C.test == {'foo': '1'}
>>> assert C.appkey == "appval"
>>> assert C.test["foo"] == '1'

rhaptos2.repo.log

run

How to run the repo, with different options.

rhaptos2.repo.run

Commandline utilities

Contains commandline utilities for initializing the database (initialize_database) and an application for use with PasteDeploy.

rhaptos2.repo.run.initialize_database(argv=None)[source]

Initialize the database tables.

rhaptos2.repo.run.paste_app_factory(global_config, **local_config)[source]

Makes a WSGI application (in Flask this is app.wsgi_app) and wraps it to serve the static web files.

Misc.

Here are misc notes that need to be better incorporated into the body of the docs.

Glossary

user_detail dict
 {user_id, user_uri}
  1. Concerns over use of <li> in storing data.

    We are using textual representations of HTML5 to store a module. This means we store the HTML5 of a module as part of a document that represents that doc and its associated metadata.

    This seems to work well.

    We are also storing a collection using HTML5 in the body of the document - that is, the tree structure of a collection is represented in one document as a series of <li> nodes.

    Using <li> as nodes is of minor consequence, but there is a consequence to storing the whole tree in one document. Let us take for example a collection three levels deep - let's choose the article on penguins in the Encyclopaedia Britannica. The collection looks like:

     Britannica
     |
      - P-O
      |
       - Penguin
    
    Now if Britannica is a collection (of all the volumes), and stores the whole
    tree within itself, and the P-O is another collection and stores the whole
    tree, we have two trees pointing to Penguin - and they need to be kept in
    synch.
    
    We basically cannot nest collections and store the whole tree within each one.