Rhaptos2.Repo - Documentation¶
Welcome to the docs for Rhaptos2.repo.
Some of this documentation is hand-written (immediately below), and some is generated from the API docstrings (further below).
Testing¶
We have a WSGI app to be tested.
I am using three approaches, which more or less marry up:
- Doctests, and examples.
- nose testing with WebTest
- config passing for testing...
- standalone - should run with just python/rhaptos packages
- with network
- with selenium
To test locally¶
We do not have SQLite working as a backend, so we must have networked PostgreSQL working; apart from that, run the tests as follows.
doctests: each file should individually support testmod, and the doctests can run in their own suite - they should not require network access (this is not always true yet).
functional testing of wsgi app:
nosetests --tc-file=../../testing.ini runtests.py
This will use WebTest to send requests in-process to the app - no HTTP calls are made. The app however does not know this and proceeds as if running in a WSGI server.
functional HTTP testing of wsgi app:
nosetests --tc-file=../../pbrian.ini --tc=HTTPPROXY:http://localhost:8000/ runtests.py
(with the app already running via: python run.py --config=../../pbrian.in)
- example.txt - a demo / example of how the various bits fit together. It is a doctest suite but needs an integrated system.
config passing¶
nose
Also¶
Use interlude. During development, copy the line below into a doctest:
>>> import interlude; interlude.interact(locals())
Now run the doctest and it drops you into a shell at that line. Just use the shell to develop, check and test; it is using locals() from the doctest. It is really helpful.
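For example, a doctest under construction might temporarily look like this (function_under_test is a placeholder for whatever you are actually poking at):
>>> result = function_under_test()
>>> import interlude; interlude.interact(locals())
>>> result is not None
True
Delete the interact line again before committing.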
Config¶
After the great configuration debates I am reluctant to touch this too much, but nosetests has rather forced our hand.
nosetests is a useful piece of kit but has one major problem - it is really awkward to pass configuration into a test. Doug Hellmann has a generally accepted solution - nose-testconfig.
This effectively takes the config file given on the command line, parses it into a Python dict, and presents it as a variable named "config":
from testconfig import config
main_server = config['foo']['bar']
The current Configuration setup uses the convention of moving all keys under the [app] section into the top level of the dict, while everything else lives under "globals". It returns a Mapping object that acts like a dict; pretty quickly this is read off into the app.config object, which also acts like a dict.
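As a sketch of that convention (the key names below are invented for illustration; the authoritative example is the Configuration.from_file doctest later in this document):
# A hypothetical testing.ini:
#
#   [app]
#   pghost = 127.0.0.1
#
#   [globals]
#   loglevel = DEBUG
#
# would be seen by the app roughly as:
expected = {
    "pghost": "127.0.0.1",               # [app] keys lifted to the top level
    "globals": {"loglevel": "DEBUG"},    # every other section keyed by its name
}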
To satisfy this I muck around with the nosetests ini file. I find this unsatisfactory but survivable for now - ultimately I think just producing a dict out of an ini file and being done with it will work. Goodness knows how I arrived at a different conclusion at any point.
Ultimately it is pretty hacky, so there may well be a break somewhere down the line.
Future notes¶
nose-testconfig supports YAML and Python files, so it is not particularly limiting and we can change this again - I did not particularly want to, as it seemed churlish rehashing old stuff, but it has become a RRPIA.
Session Management¶
Summary¶
Session management in the repo has been poor for a long time, and it has made testing the various pieces of functionality more awkward, demanding more effort from developer and test suite than needed.
There is a branch (session-cookie-approach) that hopefully fixes this.
We shall provide sensible, secure session management (to decide whether a browser has previously authenticated) and, linked to that, flow control that makes decisions based on the session state, such as presenting a login screen, a registration screen or whatever we need.
What is a session?¶
A session here is a defined period of time in which a CNX server will accept a random number string as a password-replacement/proxy for each HTTP request received.
Overview¶
The session shall be activated after a registered user logs in to the repo (via OpenID). The session will be assigned a UUID, which shall be stored in the user's browser as a cookie and also cached in a PostgreSQL database, where the user's details are the value to the UUID key.
Every time the user re-presents their cookie we shall look up the user details in the cache, take any appropriate flow-control action (session timed out? re-login?) and then store the user details in the request for later use.
Issues such as deleting sessions, retrieving the correct dicts, etc. are also handled, with some known issues (see below).
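A minimal sketch of that per-request flow, assuming a Flask-style request object (the real logic lives in auth.handle_user_authentication and sessioncache; the "show-login-page" return value is just an illustrative placeholder):
from rhaptos2.repo import sessioncache

def check_session_sketch(request):
    """Cookie -> cache lookup -> user details stored on the request."""
    sessionid = request.cookies.get("cnxsessionid")
    if sessionid is None:
        return "show-login-page"                   # no session cookie presented
    userdict = sessioncache.get_session(sessionid)
    if userdict is None:
        return "show-login-page"                   # expired or unknown session
    request.environ["user_details"] = userdict     # keep for later authorisation checks
    return None                                    # fall through and handle the request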
Sessions¶
Please see sessioncache for details.
Authentication flow and sessions.¶
Sessions are just a convenient way of allowing a user to sign on only once. The flow of authentication is a little more convoluted.
Difference between session sign on and Single Sign On¶
The repo manages its own login (OpenID) and sessions (sessioncache). The user will sign in once and then be given a sessionid. We will look up the user from that id as long as the session is valid.
If another service (transformations) receives a request from the user, how should we validate the user request - should the session cookie be tested against the repo session cache locally?
What if the repo calls the transformation service to act on behalf of the user - what should we send from repo to transformations: the sessionid? Another API token created alongside the session? Where does the lookup occur?
There are two main phases:
- validate already set-up sessions and proceed correctly
- Create and destroy sessions, existing or none (ie login and out)
Known issues¶
- I am not handling the situation of a user signing in twice.
- I am not handling registration (co-ordinate with Michael).
- I am not setting cookie expires ...
- I am setting httponly
- ‘SSL’ * 443
What is wrong with current setup?¶
- No session cache (it was to be redis but never came in); we are storing the user's OpenID identifier. This is a massive security hole.
- Reliance on the Flask secure session implementation. There are a number of reasons to be dissatisfied with this; the first is that the secret key is a single phrase, in config.
- No clear migration away from Flask.
- The awful temptation to put more and more stuff into session cookies for "ease" and "scalability".
Primarily I am frustrated in testing ACLs, and in creating /resources/ - which would again be reliant on a broken session implementation.
What about API Tokens?¶
Did we not discuss these at the Sprint? Yes. "Single Sign On" is better described as "Once Only Sign On, many systems"; a session is a once-only sign-on for a single (local) system. We shall need an alternative API-token approach for other systems that want to use the same sign-on as authentication. Examples wanted.
Testing issues¶
Creation of a "fake-login API": during testing only (i.e. with a flag set) we can visit an API page and get a valid session cookie for one of a number of pre-defined users.
This now exists as /autosession.
Why do you not encrypt the session ID in the cookie?¶
Mostly because I know bupkiss about encryption. No really, I can do AES with OpenSSH just fine, but did I do it right? Did I rotate my encryption keys for each user? Did I use cyclic or block-level encryption? Which one is which again? Am I handing out an oracle? (The answer to the last one is yes.)
Here is a simple argument: I contend that, to correctly and securely encrypt anything sent client-side, when any one client could be an attacker, one should have a salt/key unique to each user.
This simple and reasonable requirement destroys the main argument for sticking session details like isAdmin and UserName into an encrypted cookie - that it simplifies distributed architecture (the client can connect to any web server and the session state is still in the cookie, with no need for a database lookup).
Well, the minute we need to get a unique salt for a user, we are back to database lookups, just as frequently as session lookups.
Anyway, enough going round the houses: I do not know enough about securing encrypted services, with part of the service under the complete control of the attacker, to be sure I have not screwed it up. So I will not do it until I do, and even then all we should store is the session ID.
A neat trick¶
Sometimes it is desirable to set a cookie in your browser by hand - Chrome enables us to do this as follows:
- navigate to the domain & path desired (i.e. “/” in most cases)
- enter javascript:document.cookie="name=value" in the address bar & return
- you should then revisit the domain, and hey presto, you have a cookie
Thanks to http://blog.nategood.com/quickly-add-and-edit-cookies-in-chrome
Documenting JSON flow of repo API¶
The below is the output of restrest.py. It documents HTTP conversations as they occur through the Python requests module.
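For orientation, the first conversation below could be reproduced with a short script along these lines (a sketch only: the host, session id and trimmed payload are copied from the captured conversation, and restrest.py drives the real calls in practice):
import json
import requests

payload = {
    "id_": "cnxmodule:d3911c28-2a9e-4153-9546-f71d83e41126",
    "title": "Introduction",
    "body": "<h1>In CONGRESS, July 4, 1776.</h1>",
    "authors": ["cnxuser:f9647df6-cc6e-4885-9b53-254aa55a3383"],
    "maintainers": ["cnxuser:f9647df6-cc6e-4885-9b53-254aa55a3383"],
    "copyrightHolders": ["cnxuser:f9647df6-cc6e-4885-9b53-254aa55a3383"],
}
resp = requests.post(
    "http://127.0.0.1:8000/module/",
    data=json.dumps(payload),
    headers={"Content-Type": "application/json; charset=utf-8"},
    cookies={"cnxsessionid": "00000000-0000-0000-0000-000000000000"},
)
print(resp.status_code)
print(resp.content)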
POST /module/¶
Cookie: cnxsessionid=00000000-0000-0000-0000-000000000000
Host: 127.0.0.1:8000
Content-Length: 826
Content-Type: application/json; charset=utf-8
Body:
{
"authors": [
"cnxuser:f9647df6-cc6e-4885-9b53-254aa55a3383"
],
"body": "<h1>In CONGRESS, July 4, 1776.</h1>\n<p>The unanimous Declaration ...
"copyrightHolders": [
"cnxuser:f9647df6-cc6e-4885-9b53-254aa55a3383"
],
"id_": "cnxmodule:d3911c28-2a9e-4153-9546-f71d83e41126",
"maintainers": [
"cnxuser:f9647df6-cc6e-4885-9b53-254aa55a3383"
],
"title": "Introduction"
}
Response:
Content-Type: application/json; charset=utf-8
Content-Length: 1131
{"body": "<h1>In CONGRESS, July 4, 1776.</h1>\n<p>The u...
PUT /module/cnxmodule:d3911c28-2a9e-4153-9546-f71d83e41126¶
NB - we are using session 000 for the initial PUT
Cookie: cnxsessionid=00000000-0000-0000-0000-000000000000
Host: 127.0.0.1:8000
Content-Length: 384
Content-Type: application/json; charset=utf-8
Body:
{
"acl": [
"cnxuser:75e06194-baee-4395-8e1a-566b656f6921"
],
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is the useruri for a registered user (ross),
who we happen to know has a sessionID of 0001
"authors": [
"cnxuser:f9647df6-cc6e-4885-9b53-254aa55a3383"
],
"body": "<p> Shortened body in test_put_module",
"copyrightHolders": [
"cnxuser:f9647df6-cc6e-4885-9b53-254aa55a3383"
],
"id_": "cnxmodule:d3911c28-2a9e-4153-9546-f71d83e41126",
"maintainers": [
"cnxuser:f9647df6-cc6e-4885-9b53-254aa55a3383"
],
"title": "Introduction"
}
Response:
Content-Type: application/json; charset=utf-8
Content-Length: 676
{"body": "<p> Shortened body in test_put_module", "id_"...
PUT /module/cnxmodule:d3911c28-2a9e-4153-9546-f71d83e41126¶
Here we are using a different user, 001 (ross), whom we added to the ACL in the previous PUT.
Cookie: cnxsessionid=00000000-0000-0000-0000-000000000001
Host: 127.0.0.1:8000
Content-Length: 382
Content-Type: application/json; charset=utf-8
Body:
{
"acl": [
"cnxuser:75e06194-baee-4395-8e1a-566b656f6921"
],
"authors": [
"cnxuser:f9647df6-cc6e-4885-9b53-254aa55a3383"
],
"body": "<p> OTHERUSERSESSIONID has set this",
"copyrightHolders": [
"cnxuser:f9647df6-cc6e-4885-9b53-254aa55a3383"
],
"id_": "cnxmodule:d3911c28-2a9e-4153-9546-f71d83e41126",
"maintainers": [
"cnxuser:f9647df6-cc6e-4885-9b53-254aa55a3383"
],
"title": "Introduction"
}
Response:
Content-Type: application/json; charset=utf-8
Content-Length: 674
{"body": "<p> OTHERUSERSESSIONID has set this", "id_": ...
POST /folder/¶
Cookie: cnxsessionid=00000000-0000-0000-0000-000000000000
Host: 127.0.0.1:8000
Content-Length: 398
Content-Type: application/json; charset=utf-8
Body:
{
"body": [
"cnxmodule:d3911c28-2a9e-4153-9546-f71d83e41126",
"cnxmodule:350f7859-e6e7-11e1-928f-2c768ae4951b",
"cnxmodule:4ba18842-1bf8-485b-a6c3-f6e15dd762f6",
"cnxmodule:77a45e48-6e91-4814-9cca-0f28348a4aae",
"cnxmodule:e0c3cfeb-f2f2-41a0-8c3b-665d79b09389",
"cnxmodule:c0b149ec-8dd3-4978-9913-ac87c2770de8"
],
"id_": "cnxfolder:c192bcaf-669a-44c5-b799-96ae00ef4707",
"title": "Declaration Folder"
}
Response:
Content-Type: application/json; charset=utf-8
Content-Length: 445
{"body": [{"mediaType": "application/vnd.org.cnx.module...
PUT /folder/cnxfolder:c192bcaf-669a-44c5-b799-96ae00ef4707¶
Cookie: cnxsessionid=00000000-0000-0000-0000-000000000000
Host: 127.0.0.1:8000
Content-Length: 247
Content-Type: application/json; charset=utf-8
Body:
{
"acl": [
"00000000-0000-0000-0000-000000000001"
],
"body": [
"cnxmodule:d3911c28-2a9e-4153-9546-f71d83e41126",
"cnxmodule:d3911c28-2a9e-4153-9546-f71d83e41127"
],
"id_": "cnxfolder:c192bcaf-669a-44c5-b799-96ae00ef4707",
"title": "Declaration Folder"
}
Response:
Content-Type: application/json; charset=utf-8
Content-Length: 481
{"body": [{"mediaType": "application/vnd.org.cnx.module...
GET /folder/cnxfolder:c192bcaf-669a-44c5-b799-96ae00ef4707¶
Cookie: cnxsessionid=00000000-0000-0000-0000-000000000000
Host: 127.0.0.1:8000
Response:
Content-Type: application/json; charset=utf-8
Content-Length: 481
Access-Control-Allow-Origin: *
{"body": [{"mediaType": "application/vnd.org.cnx.module...
POST /collection/¶
Cookie: cnxsessionid=00000000-0000-0000-0000-000000000000
Host: 127.0.0.1:8000
Content-Length: 956
Content-Type: application/json; charset=utf-8
Body:
{
"authors": [
"cnxuser:f9647df6-cc6e-4885-9b53-254aa55a3383"
],
"body": "<ul><li><a href=\"cnxmodule:d3911c28-2a9e-4153-9546-f71d83e41126\"...
"copyrightHolders": [
"cnxuser:f9647df6-cc6e-4885-9b53-254aa55a3383"
],
"id_": "cnxcollection:be7790d1-9ee4-4b25-be84-30b7208f5db7",
"keywords": [
"Life",
"Liberty",
"Happiness"
],
"language": "en",
"maintainers": [
"cnxuser:f9647df6-cc6e-4885-9b53-254aa55a3383"
],
"subType": "Other Report",
"subjects": [
"Social Sciences"
],
"summary": "No.",
"title": "United States Declaration Of Independance"
}
Response:
Content-Type: application/json; charset=utf-8
Content-Length: 1181
{"body": "<ul><li><a href=\"cnxmodule:d3911c28-2a9e-415...
PUT /collection/cnxcollection:be7790d1-9ee4-4b25-be84-30b7208f5db7¶
Cookie: cnxsessionid=00000000-0000-0000-0000-000000000000
Host: 127.0.0.1:8000
Content-Length: 683
Content-Type: application/json; charset=utf-8
Body:
{
"acl": [
"00000000-0000-0000-0000-000000000001"
],
"authors": [
"cnxuser:f9647df6-cc6e-4885-9b53-254aa55a3383"
],
"body": "<ul><li><a href=\"cnxmodule:d3911c28-2a9e-4153-9546-f71d83e41126\"...
"copyrightHolders": [
"cnxuser:f9647df6-cc6e-4885-9b53-254aa55a3383"
],
"id_": "cnxcollection:be7790d1-9ee4-4b25-be84-30b7208f5db7",
"keywords": [
"Life",
"Liberty",
"Happiness"
],
"language": "en",
"maintainers": [
"cnxuser:f9647df6-cc6e-4885-9b53-254aa55a3383"
],
"subType": "Other Report",
"subjects": [
"Social Sciences"
],
"summary": "No.",
"title": "United States Declaration Of Independance"
}
Response:
Content-Type: application/json; charset=utf-8
Content-Length: 895
{"body": "<ul><li><a href=\"cnxmodule:d3911c28-2a9e-415...
GET /collection/cnxcollection:be7790d1-9ee4-4b25-be84-30b7208f5db7¶
Cookie: cnxsessionid=00000000-0000-0000-0000-000000000000
Host: 127.0.0.1:8000
Response:
Content-Type: application/json; charset=utf-8
Content-Length: 895
{"body": "<ul><li><a href=\"cnxmodule:d3911c28-2a9e-415...
PUT /collection/cnxcollection:be7790d1-9ee4-4b25-be84-30b7208f5db7¶
Cookie: cnxsessionid=00000000-0000-0000-0000-000000000002
Host: 127.0.0.1:8000
Content-Length: 494
Content-Type: application/json; charset=utf-8
Body:
{
"authors": [
"cnxuser:f9647df6-cc6e-4885-9b53-254aa55a3383"
],
"body": [
"cnxmodule:SHOULDNEVERHITDB0"
],
"copyrightHolders": [
"cnxuser:f9647df6-cc6e-4885-9b53-254aa55a3383"
],
"id_": "cnxcollection:be7790d1-9ee4-4b25-be84-30b7208f5db7",
"keywords": [
"Life",
"Liberty",
"Happiness"
],
"language": "en",
"maintainers": [
"cnxuser:f9647df6-cc6e-4885-9b53-254aa55a3383"
],
"subType": "Other Report",
"subjects": [
"Social Sciences"
],
"summary": "No.",
"title": "United States Declaration Of Independance"
}
Response:
Content-Type: text/html
Content-Length: 227
null
PUT /module/cnxmodule:d3911c28-2a9e-4153-9546-f71d83e41126¶
Cookie: cnxsessionid=00000000-0000-0000-0000-000000000000
Host: 127.0.0.1:8000
Content-Length: 368
Content-Type: application/json; charset=utf-8
Body:
{
"acl": [
"cnxuser:75e06194-baee-4395-8e1a-566b656f6921"
],
"authors": [
"cnxuser:f9647df6-cc6e-4885-9b53-254aa55a3383"
],
"body": "Declaration test text",
"copyrightHolders": [
"cnxuser:f9647df6-cc6e-4885-9b53-254aa55a3383"
],
"id_": "cnxmodule:d3911c28-2a9e-4153-9546-f71d83e41126",
"maintainers": [
"cnxuser:f9647df6-cc6e-4885-9b53-254aa55a3383"
],
"title": "Introduction"
}
Response:
Content-Type: application/json; charset=utf-8
Content-Length: 660
{"body": "Declaration test text", "id_": "cnxmodule:d39...
PUT /module/cnxmodule:d3911c28-2a9e-4153-9546-f71d83e41126¶
Cookie: cnxsessionid=00000000-0000-0000-0000-000000000002
Host: 127.0.0.1:8000
Content-Length: 359
Content-Type: application/json; charset=utf-8
Body:
{
"acl": [
"cnxuser:75e06194-baee-4395-8e1a-566b656f6921"
],
"authors": [
"cnxuser:f9647df6-cc6e-4885-9b53-254aa55a3383"
],
"body": "NEVER HIT DB",
"copyrightHolders": [
"cnxuser:f9647df6-cc6e-4885-9b53-254aa55a3383"
],
"id_": "cnxmodule:d3911c28-2a9e-4153-9546-f71d83e41126",
"maintainers": [
"cnxuser:f9647df6-cc6e-4885-9b53-254aa55a3383"
],
"title": "Introduction"
}
Response:
Content-Type: text/html
Content-Length: 223
null
PUT /folder/cnxfolder:c192bcaf-669a-44c5-b799-96ae00ef4707¶
Cookie: cnxsessionid=00000000-0000-0000-0000-000000000002
Host: 127.0.0.1:8000
Content-Length: 163
Content-Type: application/json; charset=utf-8
Body:
{
"acl": [
"00000000-0000-0000-0000-000000000001"
],
"body": [
"THIS IS TEST"
],
"id_": "cnxfolder:c192bcaf-669a-44c5-b799-96ae00ef4707",
"title": "Declaration Folder"
}
Response:
Content-Type: text/html
Content-Length: 223
null
GET /module/cnxmodule:d3911c28-2a9e-4153-9546-f71d83e41126¶
Cookie: cnxsessionid=00000000-0000-0000-0000-000000000001
Host: 127.0.0.1:8000
Response:
Content-Type: application/json; charset=utf-8
Content-Length: 660
{"body": "Declaration test text", "id_": "cnxmodule:d39...
GET /folder/cnxfolder:c192bcaf-669a-44c5-b799-96ae00ef4707¶
Cookie: cnxsessionid=00000000-0000-0000-0000-000000000000
Host: 127.0.0.1:8000
Response:
Content-Type: application/json; charset=utf-8
Content-Length: 481
Access-Control-Allow-Origin: *
{"body": [{"mediaType": "application/vnd.org.cnx.module...
GET /module/cnxmodule:d3911c28-2a9e-4153-9546-f71d83e41126¶
Cookie: cnxsessionid=00000000-0000-0000-0000-000000000002
Host: 127.0.0.1:8000
Response:
Content-Type: text/html
Content-Length: 223
null
GET /workspace/¶
Cookie: cnxsessionid=00000000-0000-0000-0000-000000000000
Host: 127.0.0.1:8000
Response:
Content-Type: application/json; charset=utf-8
Content-Length: 433
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true
[{"mediaType": "application/vnd.org.cnx.module", "id": ...
DELETE /module/cnxmodule:d3911c28-2a9e-4153-9546-f71d83e41126¶
Cookie: cnxsessionid=00000000-0000-0000-0000-000000000002
Host: 127.0.0.1:8000
Response:
Content-Type: text/html
Content-Length: 223
null
DELETE /module/cnxmodule:d3911c28-2a9e-4153-9546-f71d83e41126¶
Cookie: cnxsessionid=00000000-0000-0000-0000-000000000000
Host: 127.0.0.1:8000
Response:
Content-Type: application/json; charset=utf-8
Content-Length: 57
cnxmodule:d3911c28-2a9e-4153-9546-f71d83e41126 is no mo...
DELETE /collection/cnxcollection:be7790d1-9ee4-4b25-be84-30b7208f5db7¶
Cookie: cnxsessionid=00000000-0000-0000-0000-000000000002
Host: 127.0.0.1:8000
Response:
Content-Type: text/html
Content-Length: 227
null
DELETE /collection/cnxcollection:be7790d1-9ee4-4b25-be84-30b7208f5db7¶
Cookie: cnxsessionid=00000000-0000-0000-0000-000000000000
Host: 127.0.0.1:8000
Response:
Content-Type: application/json; charset=utf-8
Content-Length: 61
cnxcollection:be7790d1-9ee4-4b25-be84-30b7208f5db7 is n...
DELETE /folder/cnxfolder:c192bcaf-669a-44c5-b799-96ae00ef4707¶
Cookie: cnxsessionid=00000000-0000-0000-0000-000000000002
Host: 127.0.0.1:8000
Response:
Content-Type: text/html
Content-Length: 223
null
DELETE /folder/cnxfolder:c192bcaf-669a-44c5-b799-96ae00ef4707¶
Cookie: cnxsessionid=00000000-0000-0000-0000-000000000000
Host: 127.0.0.1:8000
Response:
Content-Type: application/json; charset=utf-8
Content-Length: 57
cnxfolder:c192bcaf-669a-44c5-b799-96ae00ef4707 is no mo...
Simple spec links¶
This is a brief discussion and a linkage to more details of the API spec for CNX rewrite.
URIs¶
I have made a mistake here and wish to correct it soon. A URI is a URN, so I used the urn: format to define a URI. I also tried to avoid any unusual encoding around slashes and CGI escaping. This is confusing and simply not simple.
The current string that represents a pure single identifier for a resource:
cnxuser:75e06194-baee-4395-8e1a-566b656f6920
The new, better format:
/user/75e06194-baee-4395-8e1a-566b656f6920
or possibly
/cnxuser/75e06194-baee-4395-8e1a-566b656f6920
If we are versioning the textual changes of say a module:
/module/75e06194-baee-4395-8e1a-566b656f6920@aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d
(sha1hash)
User¶
The current dict-format for a user:
{u'affiliationinstitution': None,
u'affiliationinstitution_url': None,
u'biography': None,
u'email': None,
u'firstname': None,
u'fullname': u'Paul Brian',
u'homepage': None,
u'identifiers': [{u'identifierstring': u'https://paulbrian.myopenid.com',
u'identifiertype': u'openid',
u'user_id': u'cnxuser:75e06194-baee-4395-8e1a-566b656f6920'},
{u'identifierstring': u'https://paulbrian.myopenid.com/',
u'identifiertype': u'openid',
u'user_id': u'cnxuser:75e06194-baee-4395-8e1a-566b656f6920'}],
u'imageurl': None,
u'interests': None,
u'lastname': None,
u'location': None,
u'middlename': None,
u'otherlangs': None,
u'preferredlang': None,
u'recommendations': None,
u'suffix': None,
u'title': None,
u'user_id': u'cnxuser:75e06194-baee-4395-8e1a-566b656f6920',
u'version': None}
Resources¶
Module¶
Collection¶
Folder¶
Proposals¶
This is an experimental area of the docs - it contains written use cases, specs and plain old discussions about the work this branch is trying to perform. It seems useful to have one location for this material; this location may or may not be the best.
Add Google Analytics support¶
The author(s) of modules and collections should be able to add their own tracking codes to a module or collection or both, and we should offer a facility to do this.
storyref: https://trello.com/card/repo-add-api-support-for-new-metadata-fields-4-pts/5181197901c3b1290b001951/86
Spec¶
We shall provide a single text field, named googleTrackingID, in both modules and collections; this will allow an arbitrary tracking code to be installed.
The backend repo only needs to accept the new field from the JSON doc and handle it correctly. The ATC client will need to do more, see story https://trello.com/card/atc-add-missing-ui-fields-for-metadata-6-pts/5181197901c3b1290b001951/85
Security issue: it may be better to implement this as a Google-only field, capturing only a string corresponding to the Google tracking code (i.e. AM-1234ABB), with us filling in the script boilerplate around it. This will prevent arbitrary script being written into the modules. A Skype discussion indicated we would sanitise all HTML inputs during the publication process.
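A sketch of what a Google-only validation might look like (the accepted pattern here is a guess, not the shipped check; the real code paths are views.validate_googleTrackingID and simple_xss_validation):
import re

# Hypothetical pattern: letters, digits and dashes only, so nothing can break
# out into markup or script.
_TRACKING_ID_RE = re.compile(r'^[A-Za-z0-9-]{1,32}$')

def looks_like_tracking_id(candidate):
    """Return True if the string is plausibly a bare tracking code."""
    return bool(_TRACKING_ID_RE.match(candidate))

# looks_like_tracking_id("US-12345678-1")          -> True
# looks_like_tracking_id("<script>Evil</script>")  -> False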
Tests¶
- Can we inject an arbitrary string?
- Can we see the same string returned with no HTML mangling?
Logging¶
Good idea. Move to getLogger(__name__)
hooks¶
I think we could really do with a simple hooks process, where we run different functions based on the point in the request process.
Logging¶
We want to use the getLogger(__name__) form. We want to create a context, put it in g / environ and log with that. We want to clean up logging and redirect it to syslog. We want to direct syslog elsewhere...
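A sketch of the getLogger(__name__) form with syslog as the destination (the handler wiring here is illustrative; the real setup belongs in rhaptos2.repo.log and the config-driven syslogaddress):
import logging
import logging.handlers

lgr = logging.getLogger(__name__)   # one logger per module, named after the module

def configure_logging(syslog_address="/dev/log"):
    """Attach a syslog handler once, at application start-up."""
    handler = logging.handlers.SysLogHandler(address=syslog_address)
    handler.setFormatter(logging.Formatter("%(name)s %(levelname)s %(message)s"))
    lgr.addHandler(handler)
    lgr.setLevel(logging.INFO)

# elsewhere, e.g. inside a view:
# lgr.info("handled request %s", requestid)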
API¶
These documents come directly from the source code as API docs (a la epydoc), but are divided up and commented in a reasonable order below.
API for Views and models¶
Summary¶
If we move to greenlets as well, we need to test that ability. As such I have not yet introduced a pool for psycopg2 work (see sessioncache).
API¶
rhaptos2.repo.views¶
views.py - View code for the repository application.
Structure: we have these main view areas:
- the models (Folder, Collection, Module)
- the helper views (workspace)
- binary uploads.
- openid and persona
Protocols¶
I try to stick to these protocols:
- Every action (GET, POST, PUT, DELETE) must have a useruri passed in to authorise it
- views receive back either a model.<> object or a json-encodeable version of that
json-encoding¶
todo: convert to a factory-based app entirely
todo: remove view / as that is now JS
todo: remove apply_cors and apply it internally. Or just use it?
todo: remove crash and burn
- rhaptos2.repo.views.apply_cors(resp_as_pytype)[source]¶
A callable function (not a decorator) that takes the output of an app_end and converts it to a Flask response with the appropriate JSON-ified wrappings.
- rhaptos2.repo.views.auto_session()[source]¶
Strictly for testing purposes: I want to fake three sessions with known ids, and also generate a "real" session with a known user. FIXME - there has to be a better way.
- rhaptos2.repo.views.bootstrap()[source]¶
At this point there is either a valid session (so redirect to atc), or there is a need to let the visitor choose: either get an anonymous session, or, if they are registered, choose to log in again.
There is a logic choice that might improve things - if they have previously visited us, redirect to /login.
- rhaptos2.repo.views.content_router(uid)[source]¶
We now serve everything from api/content
- uid = content/1234-1234-12334 (the trailing portion is the uuid)
The router logic is subtly different for each verb (a sketch follows this list):
- GET, DELETE, HEAD: no payload but a uid - do not collect a payload, do collect the uid, and route
- POST: a payload but no uid
- PUT: both a payload and a uid
(Ignore OPTIONS etc.)
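A sketch of that dispatch, assuming a Flask-style request object (the returned tuple is illustrative; the real content_router hands off to the model code):
import json

def route_sketch(request, uid=None):
    """Return (verb, uid, payload) according to the rules above."""
    if request.method in ("GET", "DELETE", "HEAD"):
        return (request.method, uid, None)                # uid from the URL, no payload
    if request.method == "POST":
        return ("POST", None, json.loads(request.data))   # payload only, server mints the id
    if request.method == "PUT":
        return ("PUT", uid, json.loads(request.data))     # both uid and payload
    return (request.method, None, None)                   # OPTIONS etc. ignored here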
- rhaptos2.repo.views.requestid()[source]¶
before_request is supplied with this, to run before each __call__.
- rhaptos2.repo.views.simple_xss_validation(html_fragment)[source]¶
>>> simple_xss_validation("US-12345678-1")
True
>>> simple_xss_validation("<script>Evil</script>")
False
This is very quick and dirty, and we need some consideration over XSS escaping. FIXME
- rhaptos2.repo.views.temp_session()[source]¶
When a user wants to edit anonymously, they need to hit this first. This is to avoid the logic problems of knowing whether a user should be redirected if they have one but not two cookies, etc.
Here we generate a temporary userid (that is not linked to cnx-user), then set up a session based on that userid. All work will be lost at the end of the session.
- rhaptos2.repo.views.validate_googleTrackingID(payload)[source]¶
Given a (JSON-formatted) payload, return whether the Google tracking ID is valid.
- rhaptos2.repo.views.validate_mediaType(payload)[source]¶
Given a (JSON-formatted) payload, work out whether it is a module, collection or folder and return the appropriate mediatype.
Possible enhancements include using an Accept header to determine the mediatype. Returning a mediatype seems odd...
- rhaptos2.repo.views.verify_schema(model_dict, mediatype)[source]¶
Given a json object, verify it matches the claimed mediaType schema
model_dict: dict of the model as it came out of JSON - MUST be pure mediaType, not the SOFT form. mediatype: what we think the dict conforms to.
FixMe: we do not have versioning of schemas. FixMe: we do not have a jsonschema verifier...
- rhaptos2.repo.views.whoamiGET()[source]¶
Returns either 401 if OpenID is not available, or a JSON document of the form:
{"openid_url": "https://www.google.com/accounts/o8/id?id=AItOawlWRa8JTK7NyaAvAC4KrGaZik80gsKfe2U",  # noqa
 "email": "Not Implemented",
 "name": "Not Implemented"}
Authentication API¶
Summary¶
We make use of session cookies, a local cache and some flow logic to ensure users can work easily with minimal interruptions.
rhaptos2.repo.auth¶
How does Authentication, Authorisation and Sessions work?¶
We are operating a Single-Sign-On service, using Valruse to do the authentication.
Workflow¶
A user will hit the repo home page and :func:`handle_user_authentication` will be fired; based on the cookies stored in the user's browser, we will know one of three things about the user.
- Never seen before
- Known user, not in session
- Known user, in session
- Edge cases
Usually they will choose to log in and will be directed to the login page on cnx-user. There, cnx-user will authenticate them in some fashion (OpenID) and then redirect the user's browser back to the repo, with a token.
The repo then looks up this token against the cnx-user service, and hey presto, if the token matches, the repo knows it can trust the browser (assuming SSL all the way).
The redirect hits the /valid endpoint in the repo, which will check the token against cnx-user, and then:
user_uuid_to_user_details
given an authenticated user ID (OpenID), look up the user details
on cnx-user
create_session()
At this point we have a user who logged in against cnx-user, has then proven to cnx-repo that they did log in, and now has a session cookie set in their browser that the repo can trust as a password-replacement for a set period.
Temporary sessions¶
Temporary sessions, or anonymous editing, is where a user does not log in but uses the repo anonymously, perhaps creating test modules.
This is supported by hitting the endpoint /tempsession, which triggers the view temp_session. This in turn calls set_temp_session(). Here we create both a random sessionID (as per usual for sessions) and a random user_id. This user_id is used exactly as if it were a real registered user, but it is never sent back to cnx-user; instead it is solely used in ACLs on the unpublished repo.
This way, at the end of a temp session, the user effectively loses all their edits. Avoiding this may be wanted; it is possible but not yet implemented.
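A minimal sketch of that idea, assuming uuid4 identifiers for both the throwaway user and the session (the real implementation is set_temp_session, documented below):
import uuid
from rhaptos2.repo import sessioncache

def temp_session_sketch():
    """Invent an anonymous user and give them an ordinary session."""
    temp_user_id = "cnxuser:%s" % uuid.uuid4()      # never registered with cnx-user
    sessionid = str(uuid.uuid4())
    userdict = {"user_id": temp_user_id, "fullname": "Anonymous"}
    sessioncache.set_session(sessionid, userdict)   # same cache as real sessions
    return sessionid, userdict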
known issues¶
- requesting_user_id is passed around a lot. This is suboptimal, and I think it should be replaced with passing around the environ dict as a means of linking functions to the request calling them.
- I am still passing the user around in g. This is fairly silly but seems consistent for Flask. It will need a rethink.
- secure (https) - desired future toggle
- further notes at http://executableopinions.mikadosoftware.com/en/latest/labs/webtest-cookie/cookie_testing.html
- rhaptos2.repo.auth.create_session(userdata)[source]¶
A closure function that is stored and called at the end of the response, allowing us to set a cookie, with the correct uuid, before the response object has been created (before the request is processed!).
Param: userdata - a dict in userdict format. Returns: sessionid. Cookie settings:
- cnxsessionid - a fixed key string that is constant
- expires - we want a cookie that will live even if the user shuts down the browser; however it should not live forever...?
- httponly - prevent easy CSRF, but still allow AJAX to ask the browser to send the cookie.
- rhaptos2.repo.auth.delete_session(sessionid)[source]¶
Request the browser to remove the cookie from the client, and remove it from the session-cache database.
- rhaptos2.repo.auth.handle_user_authentication(flask_request)[source]¶
Correctly perform all authentication workflows
We have 16 options for different user states, such as IsLoggedIn and NotRegistered. The states are listed below.
This function is where eventually all 16 will be handled. For the moment only a limited number are.
Parameters: flask_request – request object of Pocoo flavour. Returns: no return is good, because it allows the onward processing of the request; otherwise we return a login page.
This gets called on before_request (which is after processing of the HTTP headers but before __call__ on the WSGI app).
Note
All the functions in sessioncache and auth should be called from here (possibly in a chain) and should raise errors or other signals to allow this function to take action, not presume on some action (like a redirect) themselves. (todo-later: such late decisions are well suited to deferred callbacks)
Auth  Reg  InSession  ProfileCookie  Next Action / RoleType             Handled Here
Y     Y    Y          Y              Go                                 Y
Y     Y    Y          N              set_profile_cookie                 Y
Y     Y    N          Y              set_session                        Y
Y     Y    N          N              FirstTimeOK
Y     N    Y          Y              ErrorA
Y     N    Y          N              ErrorB
Y     N    N          Y              ErrorC
Y     N    N          N              NeedToRegister
N     N    Y          Y              AnonymousGo
N     N    Y          N              set_profile_cookie
N     N    N          Y              LongTimeNoSee
N     N    N          N              FreshMeat
N     Y    Y          Y              Conflict with anonymous and reg?
N     Y    Y          N              Err-SetProfile-AskForLogin
N     Y    N          Y              NotArrivedYet
N     Y    N          N              CouldBeAnyone
All the final 4 rows are problematic because, if the user has not authorised, how do we know they are registered? Trust the profile cookie?
We examine the request, find the session cookie, register any logged-in user, or redirect to the login pages.
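As a sketch, the four flags map onto the "Next Action" column roughly like this (purely an illustration of the table above; the error rows and the not-authenticated-but-registered rows are collapsed, and this is not the real dispatch code):
def next_action(auth, reg, in_session, profile_cookie):
    """Map the four flags to the Next Action column of the table above."""
    if auth and reg:
        if in_session:
            return "Go" if profile_cookie else "set_profile_cookie"
        return "set_session" if profile_cookie else "FirstTimeOK"
    if auth and not reg:
        # ErrorA / ErrorB / ErrorC collapsed; only the no-cookie case is distinct
        return "NeedToRegister" if not (in_session or profile_cookie) else "Error"
    if not auth and not reg:
        if in_session:
            return "AnonymousGo" if profile_cookie else "set_profile_cookie"
        return "LongTimeNoSee" if profile_cookie else "FreshMeat"
    # not authenticated but apparently registered: the problematic final four rows
    return "CouldBeAnyone"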
- rhaptos2.repo.auth.lookup_session(sessid)[source]¶
As this will be called on every request and is a network lookup, we should strongly look into redis-style local disk caching, and into performance monitoring of the request life cycle.
- Returns a python dict in user_details format,
- or None if there is no session ID in the cache, or an Error if the lookup failed for some other reason.
- rhaptos2.repo.auth.session_to_user(flask_request_cookiedict, flask_request_environ)[source]¶
Given a request environment and cookie, return the user data.
>>> cookies = {"cnxsessionid": "00000000-0000-0000-0000-000000000000"}
>>> env = {}
>>> userd = session_to_user(cookies, env)
>>> userd["fullname"]
'pbrian'
Params flask_request_cookiedict: the cookie jar sent over as a dict(-like) object. Params flask_request_environ: a dict-like object representing the WSGI environ. Returns: Err if the lookup fails, otherwise a userdict.
- rhaptos2.repo.auth.set_autosession()[source]¶
This is a convenience function for development. It should fail in production.
- rhaptos2.repo.auth.set_temp_session()[source]¶
A temporary session is not yet fully implemented. A temporary session allows an unregistered and unauthorised user to visit the site and acquire a temporary userid and a normal session.
Then they will be able to work as normal, with the workspace and ACLs set to the just-invented temporary id.
However, work saved will be irrecoverable after the session expires...
NB - we have "made up" a user_id and uri. It is not registered in cnx-user. This may cause problems with distributed caching unless we share session caches.
- rhaptos2.repo.auth.store_userdata_in_request(user_details, sessionid)[source]¶
Given a userdict, keep it in the request cycle for later reference. Best practice here will depend on the web framework.
- rhaptos2.repo.auth.user_uuid_to_user_details(ai)[source]¶
Given a user_id from cnx-user, create a user_detail dict.
Parameters: ai – authenticated identifier. This used to be the OpenID URL; now we directly get back the common user ID from the user service. user_details no longer holds any user metadata apart from the user UUID.
- rhaptos2.repo.auth.user_uuid_to_valid_session(uid)[source]¶
Given a single UUID set up a session and return a user_details dict
Several different functions need this series of steps so it is encapsulated here.
rhaptos2.repo.sessioncache¶
sessioncache is a standalone module providing the ability to control persistent-session client cookies and profile-cookies.
sessioncache.py is a “low-level” piece, and is expected to be used in conjunction with lower-level authentication systems such as OpenID and with “higher-level” authorisation systems such as the flow-control in auth.py
- persistent-session
- This is the period of time during which a web server will accept an id number, presented as part of an HTTP request, as a replacement for an actual valid form of authentication (we remember that someone authenticated a while ago, and assume no one has been able to impersonate them in the intervening time period).
- persistent-session cookie
- This is a cookie set on a client browser that stores an id number pertaining to a persistent-session. It will last beyond a browser shutdown and is expected to be sent as an HTTP header as part of each request to the server.
Why? Because I was getting confused by the lack of fine control over sessions, and because the Flask implementation relied heavily on encryption, which seems to be the wrong direction. So we needed a server-side session cookie implementation with fairly fine control.
I intend to replace the existing SQLAlchemy-based services with pure psycopg2 implementations, but for now I will be content not to add another feature to SA.
Session Cache¶
The session cache needs to be a fast, distributed lookup system for matching a random ID to a dict of user details.
We shall store the user details in the table session_cache.
Discussion¶
Caches are hard. They need to be very, very fast, and in this case distributable. Distributed caches are very, very hard because we need to ensure they stay in sync.
I feel redis makes an excellent cache choice in many circumstances - it is blazingly fast for key-value lookups, it is simple, it is threadsafe (as in, threads in the main app do not have to maintain any pooling or thread issues other than opening a socket or keeping it open) and it has decent synching options.
However the synching is a serious concern, and as such using a centralised, fast database will allow us to move to production with a secure solution, without an immediate reliance on cache-invalidation strategies.
Overview¶
We have one single table, session_cache. This stores a json string (as a string, not the 9.3 JSON type) as the value in a key-value pair. The key is a UUID-formatted string passed in from the application. It is expected we will never see a collision.
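A hypothetical shape for that table, written as the SQL an initdb step might run (the column names and types here are an assumption, not the shipped schema):
CREATE_SESSION_CACHE = """
CREATE TABLE IF NOT EXISTS session_cache (
    sessionid   VARCHAR(36) PRIMARY KEY,           -- UUID-formatted string key
    userdict    TEXT NOT NULL,                     -- user details as a json string
    session_end TIMESTAMP WITH TIME ZONE NOT NULL  -- expiry, checked on the database
);
"""

def initdb_sketch(conn):
    """Create the cache table; conn is a psycopg2 connection as from getconn()."""
    cur = conn.cursor()
    cur.execute(CREATE_SESSION_CACHE)
    conn.commit()
    cur.close()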
We have three commands - set_session, get_session and delete_session.
With these we can test the whole lifecycle as below.
Example Usage¶
We firstly pass in a badly formed id:
>>> sid = "Dr. Evil"
>>> get_session(sid)
Traceback (most recent call last):
...
Rhaptos2Error: Incorrect UUID format for sessionid...
OK, now let's use a properly formatted (but unlikely) UUID:
>>> sid = "00000000-0000-0000-0000-000000000001"
>>> set_session(sid, {"name":"Paul"})
True
>>> userd = get_session(sid)
>>> print userd[0]
00000000-0000-0000-0000-000000000001
>>> delete_session(userd[0])
To do¶
- greenlets & conn pooling
- wrap returned recordset in dict.
- pg’s UUID type?
Standalone usage¶
minimalconfd = {"app": {'pghost':'127.0.0.1',
'pgusername':'repo',
'pgpassword':'CHANGEME',
'pgdbname':'dbtest'}
}
import sessioncache
sessioncache.set_config(minimalconfd)
sessioncache.initdb()
sessioncache._fakesessionusers()
sessioncache.get_session("00000000-0000-0000-0000-000000000000")
{u'interests': None, u'user_id': u'cnxuser:75e06194-baee-4395-8e1a-566b656f6920', ...}
- rhaptos2.repo.sessioncache.connection_refresh(conn)[source]¶
Connections should be pooled and returned here.
- rhaptos2.repo.sessioncache.delete_session(sessionid)[source]¶
Remove from session_cache an existing but no-longer-wanted session(id), for whatever reason we want to end the session.
Parameters: sessionid – sessionid from the cookie. Returns: nothing on success.
- rhaptos2.repo.sessioncache.exec_stmt(insql, params)[source]¶
Trivial ability to run a DML statement outside SQLAlchemy.
Parameters: - insql – A correctly parameterised SQL stmt ready for psycopg driver.
- params – iterable of parameters to be inserted into insql
Return a dbapi recordset: (list of tuples)
- rhaptos2.repo.sessioncache.get_session(sessionid)[source]¶
Given a sessionid, if it exists and is "in date", return the userdict (opposite of set_session). Otherwise return None (we do not error out on an id not being found).
NB this depends heavily on co-ordinating the incoming TZ of the DB and the Python app server - I am solely running the check on the database, which avoids that but does make it less portable.
- rhaptos2.repo.sessioncache.getconn()[source]¶
returns a connection object based on global confd.
This is, at the moment, not a pooled connection getter.
We do not want the ThreadedPool here, as it is designed for "real" threads and listens to their states, which will be 'awkward' when moving to greenlets.
We want a pool that will relinquish control back using gevent calls:
https://bitbucket.org/denis/gevent/src/5f6169fc65c9/examples/psycopg2_pool.py http://initd.org/psycopg/docs/pool.html
Returns: a psycopg2 connection object, or a psycopg2.Error.
- rhaptos2.repo.sessioncache.maintenance_batch()[source]¶
A holding location for ways to clean up the session cache over time. These will need improvement and testing.
- rhaptos2.repo.sessioncache.run_query(insql, params)[source]¶
Trivial ability to run a query outside SQLAlchemy.
Parameters: - insql – A correctly parameterised SQL stmt ready for psycopg driver.
- params – iterable of parameters to be inserted into insql
Return a dbapi recordset: (list of tuples)
run_query(conn, “SELECT * FROM tbl where id = %s;”, (15,))
issues: lots.
- No fetch_iterator.
- connection per query(see above)
- We should at least return a dict per row with fields as keys.
- rhaptos2.repo.sessioncache.set_session(sessionid, userd)[source]¶
Given a sessionid (generated according to the cnxsessionid spec elsewhere) and a userdict, store them in the session cache with appropriate timeouts.
Parameters: - sessionid – a UUID, that is to be the new sessionid
- userd – python dict of format cnx-user-dict.
Returns: True on successful setting.
Can raise Rhaptos2Errors.
TIMESTAMPS: we are comparing the time now with the expiry time of the cookie in the database. This reduces portability.
This beats the previous solution of passing in Python-formatted UTC and then comparing on the database.
FIXME: bring the comparison into Python for portability across cache stores.
rhaptos2.repo.weblogging¶
author: Paul Brian <paul@mikadosoftware.com>
This is initially a simple URL to be listened to:
/logging
The logging endpoint will take either of the two forms of message below and apply it to the local syslog (which we expect will be configured to centralise over rsyslogd), or it will take a triple of the form below and convert it into a statsd call to be stored in the graphite database.
logging¶
This endpoint will capture a JSON encoded POST sent to /logging and will process one of three message types.
Firstly, just a block of text expected to be a traceback or other log-ready message. We would assume the client does not insert user data, and there is no expectation of capturing session details here. Why would we want the log to be SSL protected? It might be an idea.
{'message-type': 'log',
 'log-message': 'Traceback ...',
 'metric-label': null,
 'metric-value': null,
 'metric-type': null
}
The common metric of simply adding one to a global counter is shown here. We are capturing the number of times anyone types in the word penguin.
{'message-type': 'metric',
 'log-message': null,
 'metric-label': 'org.cnx.writes.penguin',
 'metric-value': null,
 'metric-type': 'incr'
}
Here is a third type of message. We can capture a metric that is a specific value; this would be useful in aggregate reporting. It might be the amount of time to perform an action - here it is wpm:
{'message-type':'metric',
'log-message':null,
'metric-label': 'org.cnx.wordsperminute',
'metric-value': 48,
'metric-type': 'timing'
}
NB The above message is not yet supported.
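For reference, a client could push the first (supported) message type with something like this (a sketch; the endpoint path and message shape follow the examples above, and the host is the development server used elsewhere in these docs):
import json
import requests

msg = {
    "message-type": "log",
    "log-message": "Traceback ...",
    "metric-label": None,
    "metric-value": None,
    "metric-type": None,
}
resp = requests.post(
    "http://127.0.0.1:8000/logging",
    data=json.dumps(msg),
    headers={"Content-Type": "application/json; charset=utf-8"},
)
print(resp.status_code)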
Improvements¶
Run this as a WSGI middleware, so it is simple to import into the chain.
Security considerations¶
Fundamentally no different from any web service. I expect we shall need to use some form of long-running token and keep the conversations in SSL to prevent simplistic DDOS attacks.
Simple testing:
>>> from weblogging import *
>>> confd = {'globals':
... {'syslogaddress':"/dev/log",
... 'statsd_host':'log.frozone.mikadosoftware.com',
... 'statsd_port':8125,
... }}
>>> testmsg = '''{"message-type":"log",
... "log-message":"This is log msg",
... "metric-label": null,
... "metric-value": null,
... "metric-type": null
... }'''
>>> configure_weblogging(confd)
>>> logging_router(testmsg)
### FIXME - there is no really decent way to snaffle syslogs in a unit test...
- rhaptos2.repo.weblogging.logging_router(json_formatted_payload)[source]¶
pass in a json message, this will check it, then action the message.
We have several types of incoming message, corresponding to an atc log message, an atc metric message (ie graphite). We want to correctly handle each so this acts as a router/dispatcher
- rhaptos2.repo.weblogging.validate_msg_return_dict(json_formatted_payload)[source]¶
>>> payload_good = '''{"message-type":"log",
... "log-message":"This is log msg",
... "metric-label": null,
... "metric-value": null,
... "metric-type": null
... }'''
>>> x = validate_msg_return_dict(payload_good)
>>> x
({u'metric-type': None, u'metric-value': None, u'metric-label': None, u'message-type': u'log', u'log-message': u'This is log msg'}, True)
Future Developments¶
- registration on user service
- API Tokens and user service
- reliance by other services on user service logged in (single Sign on)
Common functionality¶
For various reasons the common functions, like errors and config, are held not in a separate repo but here.
rhaptos2.repo.backend¶
rhaptos2.repo.configuration¶
Contains a common configuration parsing class and various utilities for dealing with configuration.
- class rhaptos2.repo.configuration.Configuration(settings={}, **sections)[source]¶
A configuration settings object. This is primarily used to read configuration from file.
- classmethod from_file(file, app_name='app')[source]¶
Initialize the class from an INI file. The app_name (defaults to DEFAULT_APP_NAME) is used to signify the main application section in the configuration INI. The application section is put into the top-level mapping. All other sections are put into the mapping as a keyed section name with a sub-dictionary containing that section's key-value pairs.
>>> ini = '''[app]
... appkey=appval
...
... [test]
... foo=1
...
... [test2]
... bar=1
... '''
>>> f = "/tmp/foo.ini"
>>> open(f, "w").write(ini)
>>> C = Configuration.from_file(f)
>>> expected = {'test': {'foo': '1'},
...             'test2': {'bar': '1'},
...             "appkey": "appval"}
>>> assert C == expected
>>> assert C.test == {'foo': '1'}
>>> assert C.appkey == "appval"
>>> assert C.test["foo"] == '1'
rhaptos2.repo.log¶
run¶
How to run the repo, with different options.
rhaptos2.repo.run¶
Commandline utilities
Contains commandline utilities for initializing the database (initialize_database) and an application for use with PasteDeploy.
Misc.¶
Here are misc notes that need to be better incorporated into the body of the docs.
Glossary¶
- user_detail dict - {user_id, user_uri}
Concerns over the use of <li> in storing data.
We are using textual representations of HTML5 to store a module. This means we store the HTML5 of a module as part of a document that represents that module and its associated metadata.
This seems to work well.
We are also storing a collection using HTML5 in the body of the document - that is, the tree structure of a collection is represented in one document as a series of <li> nodes.
Using <li> as nodes is of minor consequence, but there are consequences to storing the whole tree in one document. Let us take for example a collection three levels deep - say the article on penguins in the Encyclopaedia Britannica. The collection looks like:
Britannica
  - P-O
    - Penguin
Now, if Britannica is a collection (of all the volumes) and stores the whole tree within itself, and P-O is another collection and also stores the whole tree, we have two trees pointing to Penguin - and they need to be kept in sync. We basically cannot nest collections and store the whole tree within each.