Where should audit data be persisted?

jewzaam
Administrator
Where should audit data be stored?
* MongoDB (via lightblue)
* RDBMS (e.g. MySQL)
* columnar storage (e.g. AWS Redshift)

Some points to consider:
* we don't really know how audit data will be used:
    * trending data
    * aggregate data
    * specific data at a point in time
* tools out there for using audit data know how to talk SQL
* columnar storage is really good at storing this data as well, and BI tools can talk to it fine as well
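
To make the three usage patterns above concrete, here is a minimal sketch of how they look as SQL queries. It uses Python's built-in sqlite3 purely as a stand-in for whichever SQL-speaking store is chosen; the `audit` table, its columns, and the sample rows are all hypothetical, not an agreed schema.

```python
import sqlite3

# Hypothetical schema: one row per changed field. All names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE audit (
           entity_id  TEXT,
           field      TEXT,
           new_value  TEXT,
           changed_by TEXT,
           changed_at TEXT  -- ISO-8601 UTC, so string order matches time order
       )"""
)
conn.executemany(
    "INSERT INTO audit VALUES (?, ?, ?, ?, ?)",
    [
        ("user-1", "email", "a@example.com", "svc-a", "2014-08-01T10:00:00Z"),
        ("user-1", "email", "b@example.com", "svc-b", "2014-08-05T09:30:00Z"),
        ("user-1", "status", "active", "svc-a", "2014-08-01T10:00:00Z"),
        ("user-2", "status", "active", "svc-a", "2014-08-05T11:00:00Z"),
    ],
)


def changes_per_day(conn):
    """Trend/aggregate: number of field changes per calendar day."""
    return conn.execute(
        "SELECT substr(changed_at, 1, 10) AS day, COUNT(*) "
        "FROM audit GROUP BY day ORDER BY day"
    ).fetchall()


def state_at(conn, entity_id, as_of):
    """Point in time: latest value of each field at or before as_of."""
    rows = conn.execute(
        "SELECT a.field, a.new_value FROM audit a "
        "WHERE a.entity_id = ? AND a.changed_at = ("
        "  SELECT MAX(b.changed_at) FROM audit b "
        "  WHERE b.entity_id = a.entity_id AND b.field = a.field "
        "    AND b.changed_at <= ?)",
        (entity_id, as_of),
    ).fetchall()
    return dict(rows)
```

Both questions are plain SELECT statements over a single table, which is the kind of query SQL-based reporting tools generate.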

I think this means looking at non-document-oriented storage.  We'd like to hit these requirements:
* capture when every field is changed
* capture who executed the change (authenticated entity)
* audit all changes (never know when you'll need that history)
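
As a rough illustration of the first two requirements, the sketch below turns a before/after pair of documents into one audit record per changed field, stamped with who made the change and when. The function name `field_changes` and the record keys are assumptions made for this example; it only compares top-level fields and is not how lightblue implements audit.

```python
from datetime import datetime, timezone


def field_changes(before, after, changed_by, changed_at=None):
    """Return one audit record per top-level field that differs.

    A field that is absent is treated the same as a field set to None,
    which is a simplification made for this sketch.
    """
    changed_at = changed_at or datetime.now(timezone.utc)
    records = []
    for field in sorted(set(before) | set(after)):
        old, new = before.get(field), after.get(field)
        if old != new:
            records.append({
                "field": field,
                "old": old,
                "new": new,
                "changed_by": changed_by,  # the authenticated entity
                "changed_at": changed_at.isoformat(),
            })
    return records
```

Unchanged fields produce no records, so the volume of audit data grows with the number of changes rather than with the size of the entity.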

I have a POC for audit in MongoDB that supports getting the data for a given entity at a point in time using the aggregate functionality.  But this doesn't support trends or other aggregation.  For security reasons, we do not want to give tools direct access to the MongoDB database.  But given how PII works, it's unlikely that we can capture audit data in a way that leaves it non-sensitive.

Sorry for the rambling, but I feel that recently we've taken a step back with audit given recommendations/discussions around non-document storage.

Re: Where should audit data be persisted?

lcestari
I think AWS Redshift should be the first choice if we don't have time for further analysis and development around audit. Otherwise, I think we could look at other column-oriented databases (like Cassandra or HBase) ( https://en.wikipedia.org/wiki/List_of_column-oriented_DBMSes ) so that we would not be locked in to AWS.

I also thought that Apache Gora (used in Nutch) could help us: we could choose a database and it would provide the integration (using Avro) ( https://gora.apache.org/current/tutorial.html ). We could even extend it if we think it is missing a database. It recently added a MongoDB module, but there isn't any documentation for it.

Re: Where should audit data be persisted?

jewzaam
Administrator

Any idea how easy it will be to unit test if we pick Redshift?

(In reply to lcestari's message of Aug 12, 2014, 8:39 AM.)

Re: Where should audit data be persisted?

lcestari
I don't know of anything that could help other than mocking it (using Mockito or another mock library). I also know some people who worked with it from Ruby and mocked it.
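
A rough sketch of what the mocking approach could look like, written in Python with the standard library's unittest.mock standing in for Mockito. The `AuditWriter` class, its `record` method, and the `audit` table are hypothetical names invented for this example; the point is only that the code under test depends on a connection object that a test can replace.

```python
from unittest import mock


class AuditWriter:
    """Hypothetical writer that depends only on a DB-API style connection."""

    def __init__(self, connection):
        self._connection = connection

    def record(self, entity_id, field, new_value, changed_by):
        cursor = self._connection.cursor()
        cursor.execute(
            "INSERT INTO audit (entity_id, field, new_value, changed_by) "
            "VALUES (%s, %s, %s, %s)",
            (entity_id, field, new_value, changed_by),
        )
        self._connection.commit()


def test_record_issues_one_insert_and_commits():
    # The mock replaces the real database connection entirely.
    connection = mock.MagicMock()
    AuditWriter(connection).record("user-1", "email", "a@example.com", "svc-a")

    cursor = connection.cursor.return_value
    cursor.execute.assert_called_once()
    sql, params = cursor.execute.call_args[0]
    assert sql.startswith("INSERT INTO audit")
    assert params == ("user-1", "email", "a@example.com", "svc-a")
    connection.commit.assert_called_once_with()
```

A test like this verifies the statements the code sends, not how the database behaves, so SQL dialect differences would still need an integration test against a real instance.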