Rethinking RDBMS - overview and one scenario

bserdar
I don't like the idea of generating SQL operations based on
field/column mappings. Such mappings are not expressive enough for
edge cases. Here are some of those edge cases:

 - A 1-n relation with no PK on the n-side of the relationship:
   PARENT_TABLE ( _id, some_column )
   CHILD_TABLE  ( parent_id, some_other_column )

   The PK of the parent table is mapped to the document id, but there
   is no document field that naturally maps to the FK column of the
   child table. There needs to be a way to describe this relationship.

 - Lookup scenarios: let's say the parent table has child_id, a
   foreign key referencing the primary key of a child table. The JSON
   document maps only a field from that child table. Something like this:

   document: { _id, countryName }
   document table: ( _id, country_id )
   country table: ( country_id, country_name )

   There should be a way to describe such lookups. This case gets
   uglier if the operation is a lookup-and-insert-if-not-found (a SQL
   sketch follows this list).

 - n-n relationships where the association table is not mapped
   anywhere. The association itself still has to be defined somewhere
   in metadata. Also, if the association table has more columns than
   just the IDs of the two tables it associates, things get even more
   difficult; with Hibernate, you have to define the association table
   as a separate entity to handle such cases (see the sketch after
   this list).
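
For illustration only (the table and sequence names here are hypothetical, not part of any proposal), the lookup-and-insert case and an n-n association table with extra data could look like this in SQL:

   -- lookup-and-insert-if-not-found for the country case;
   -- country_seq is a hypothetical sequence for generating IDs
   SELECT country_id FROM country WHERE country_name = ?;
   -- if no row came back:
   INSERT INTO country (country_id, country_name)
     VALUES (country_seq.NEXTVAL, ?);

   -- an association table carrying more than the two IDs it associates
   CREATE TABLE ORDER_ITEM (
     ORDER_ID   NUMBER(19) NOT NULL,
     PRODUCT_ID NUMBER(19) NOT NULL,
     QUANTITY   NUMBER(10),          -- extra data beyond the two FKs
     FOREIGN KEY (ORDER_ID) REFERENCES ORDERS(ID),
     FOREIGN KEY (PRODUCT_ID) REFERENCES PRODUCT(ID)
   );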

So what I propose is a more procedural way of describing OR
mappings. Instead of defining the relationships between tables and
trying to infer the operations that need to be performed on those
tables using metadata, we let the metadata maintainer define the
operations that implement our CRUD APIs. There will be default
procedures that handle trivial cases, so it will be possible to define
an OR mapping without any procedural definitions. But the option to
override those defaults can make this approach more powerful than
generated SQL.

It should be sufficient to define procedures for the insert, update,
delete, and find APIs. The save operation can be implemented as an
update of all fields.

A special case about update: not all update operations can be
implemented as bulk updates, especially if the update touches more
than one table. So, at least for now, update can be implemented as an
iterative update over a search result, where each iteration updates
only one document.

An overview of what I propose is this: let's define some primitives,
operations like insert_row, update_row, etc., that can be invoked with
a set of arguments. These primitives perform a SQL operation using the
field/column mappings defined in metadata, as well as mappings defined
within the primitive arguments. We should also have primitives to
execute tests, loop over collections, etc. Each operation should
be defined as a sequence of primitives. If no definition is given for
an operation, we try to infer how the operation should be performed
using the metadata mappings. This should handle trivial mappings, but
whenever we encounter a non-trivial mapping, we can provide the SQL
operations to implement it without resorting to "mapping
hacks". This also allows us to call stored procedures whenever
necessary, generate unique IDs from sequences, etc.

What I have below are simply ideas, not a formal spec. I haven't worked
out the syntax and semantics completely. I'll define some primitives
and try to implement the scenarios using those primitives.

Let's start with operation definitions that can be used in metadata:

Insert a row into a table:

{ $insert_row : { table : <tableName>,
                  fields: [ <fields> ],
                  columns: {
                     column : field,
                     ...
                  }
                }
}

This procedure inserts a row into a table using the current document.
  - table: The table to insert the row into
  - fields: The array of field names to insert. The column names for these
    fields are obtained from the field mappings. If not given, all
    non-null fields are assumed.
      $non-null-fields can be used to represent all non-null fields
      $all-fields can be used to represent all fields
      [ $non-null-fields, x ] can be specified; this means all non-null fields, plus 'x'
      A field can be explicitly assigned a value:
           [ $non-null-fields, { field: updated, value: 1 } ]
      A field can be inserted with a SQL expression:
           [ $non-null-fields, { field: somefield, sql: "toupper(?)" } ]
           [ $non-null-fields, { field: somefield, sql: "<sql clause>", bindings: [ binding1, binding2, ... ] } ]
              where bindings can be
                 - { field: <fieldName> }
                 - { value: <value> }
  - columns: Any additional columns that are not mapped. For each column,
    the column name and field mapping is given. Here, temporary variables
    can be used.
       column : field
       column : { value : <value> }
       column : { sql : <sql clause>, bindings: [ binding1, binding2, ... ] }
             where bindings can be:
                - { field: <fieldName> }
                - { value: <value> }
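
As a non-normative sketch of how this might expand (using the BASE table from the scenario below, plus a hypothetical unmapped UPDATED_BY column), an invocation like

   { $insert_row : { table: BASE,
                     fields: [ $non-null-fields ],
                     columns: { updated_by: { value: "api" } } } }

could generate SQL roughly like

   INSERT INTO BASE (ID, A_FIELD, UPDATED_BY) VALUES (?, ?, 'api');

with the ? placeholders bound from the document's mapped fields.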
               


Update a row:

{ $update_row : { table: <tableName>,
                  fields: [ <fields> ],
                  columns : { ... },
                  where : { where clause }
                }
}

  - table: The table to update a row in
  - fields: The fields to update. $all-modified-fields can be used.
  - columns: Same as above.
  - where: Defaults to a WHERE clause built from the identifiers of the entity for tableName.
    A WHERE clause (without the "WHERE" keyword) can be specified, with bindings:

       ..., where: { sql: "id=? and active=?", bindings: [ { field: id }, { value: true } ] }
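
As a sketch, the where example above (active is a hypothetical column, not part of the scenario tables below) could expand to roughly:

   UPDATE BASE SET A_FIELD=? WHERE id=? AND active=?

with A_FIELD bound from the modified field, id bound from the document, and active bound to the literal true.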


Delete a row:

{ $delete_row : { table: <tableName>,
                  where: { where clause }
                }
}
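
A sketch, using the child table from the scenario below:

   { $delete_row : { table: ARRAY_STRING_WITHOUT_PK,
                     where: { sql: "base_id=?", bindings: [ { field: _id } ] } } }

which could expand to roughly:

   DELETE FROM ARRAY_STRING_WITHOUT_PK WHERE base_id=?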


Iteration:

{ $foreach : { field: <array field name>,
               elem: <iteration temp variable name>,
               do:  <operation to perform> } }

This should be used to iterate through the elements of an embedded array.

  field: The array field whose elements will be iterated
  elem : The iteration variable; at each iteration, elem points to the next array element.
  do : The operations to perform at each iteration

For instance:

{ $foreach : { field: arr, elem:x, do : { $insert_row : { table : mytable } } } }

This will insert a row into 'mytable' for every element of 'arr'. At each iteration, the variable 'x'
will contain an element of 'arr', and all non-null fields of 'x' will be inserted.
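
As a sketch, if 'arr' has three elements this expands to three inserts, with the column list coming from the field mappings of the array elements (the (...) parts are elided here):

   INSERT INTO mytable (...) VALUES (...);  -- bound from arr[0] via x
   INSERT INTO mytable (...) VALUES (...);  -- bound from arr[1] via x
   INSERT INTO mytable (...) VALUES (...);  -- bound from arr[2] via x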


Scenario: array with no PK

A document with an array field; the array elements are strings.

Create Table BASE (
  ID NUMBER(19) NOT NULL,
  A_FIELD VARCHAR2(256),
  PRIMARY KEY (ID)
);

Create Table ARRAY_STRING_WITHOUT_PK (
  BASE_ID NUMBER(19) NOT NULL,
  S_FIELD VARCHAR2(256),
  FOREIGN KEY (BASE_ID) REFERENCES BASE(ID)
);

insert:

{
 "id": "123456",
 "a": "b",
 "arrayStringWithoutPk": [
   "one",
   "two",
   "three"
   ]
 }


 [ { $insert_row : { table: BASE, fields: [ $non-null-fields ] } },
   { $foreach : { field: arrayStringWithoutPk, elem: x, do: {
            $insert_row : { table: ARRAY_STRING_WITHOUT_PK,
                            fields: [ x ],
                            columns: { base_id: $parent._id }
                          }
            } } }
 ]
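
For the sample document above, and assuming id maps to ID, a maps to A_FIELD, and the array elements map to S_FIELD, this could expand to roughly the following (literals shown for readability; the real statements would use bindings):

   INSERT INTO BASE (ID, A_FIELD) VALUES ('123456', 'b');
   INSERT INTO ARRAY_STRING_WITHOUT_PK (BASE_ID, S_FIELD) VALUES ('123456', 'one');
   INSERT INTO ARRAY_STRING_WITHOUT_PK (BASE_ID, S_FIELD) VALUES ('123456', 'two');
   INSERT INTO ARRAY_STRING_WITHOUT_PK (BASE_ID, S_FIELD) VALUES ('123456', 'three');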



update:
 "update": { "$append": { "arrayStringWithoutPk": [ "four"  ] }


New primitive:

{ $collection_update : { field: <collectionField>,
                         table: <tableName>,
                         retrieval: <sql script that retrieves the collection>,
                         inserted_rows: <Script that will be called for each inserted row>,
                         updated_rows: <Script that will be called for each updated row>,
                         deleted_rows: <Script that will be called for each deleted row>
                          } }

This does the following:
  - Using the 'retrieval' criteria, retrieves a collection of rows from table 'tableName'
  - Computes the lists of inserted, updated, and deleted rows by comparing the loaded collection with 'collectionField'
  - Inserts/updates/deletes rows using the corresponding scripts
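
For the $append above, the computed diff would look like this:

   loaded rows:        [ one, two, three ]
   updated collection: [ one, two, three, four ]
   -> inserted_rows runs for: [ four ]
   -> updated_rows runs for:  [ ]
   -> deleted_rows runs for:  [ ]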


So the update procedure should look like this:

[  { $update_row : { table: BASE, fields: [ $all-modified-fields ] } },
   { $collection_update : { field : arrayStringWithoutPk,
                            table: ARRAY_STRING_WITHOUT_PK,
                            retrieval: { $select : { from : ARRAY_STRING_WITHOUT_PK,
                                                     columns : [ $mapped_columns, { column: base_id, field: $parent_id } ],
                                                     where : { q: "base_id=?", bindings: [ $parent_id ] } } },
                            inserted_rows: { $insert_row : {...} },
                            updated_rows: { $update_row: {...} },
                            deleted_rows: { $delete_row: {...} } } }
]
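
Again as a sketch, for the $append request this could expand to roughly:

   SELECT S_FIELD, BASE_ID FROM ARRAY_STRING_WITHOUT_PK WHERE base_id=?;
   INSERT INTO ARRAY_STRING_WITHOUT_PK (BASE_ID, S_FIELD) VALUES (?, 'four');

No mapped BASE column changed, so the $update_row on BASE is presumably a no-op, and the diff produces no updated or deleted child rows.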





Re: Rethinking RDBMS - overview and one scenario

jewzaam
Administrator
Insert with a sql statement such as to_upper will mean maintaining that data transformation in many places.

Composite documents (representing many tables) will require many operations defined for each; just noting that as something that'll be required.

Confused about fields vs columns in the metadata.  Is 'fields' representing what is expected to be inserted from metadata, and 'columns' is to populate columns that are not mapped to metadata?

Where clauses for update, save, and delete should be driven by the client's request, not by something in metadata.  What if metadata has where: { sql:"id=?" ... } and my delete request is against something else?  You're generating the where clause from the request in that case, and should for all requests.  If there's a subset of data that needs to be retrieved, such as in your example of requiring "active" to be true, we could look at using updatable views (https://docs.oracle.com/cd/E17952_01/refman-5.1-en/view-updatability.html) rather than re-implementing that in lightblue; a sketch of that idea follows.
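
For example, assuming a hypothetical ACTIVE column on BASE, something like:

   CREATE VIEW ACTIVE_BASE AS
     SELECT ID, A_FIELD FROM BASE WHERE ACTIVE = 1;

   -- updates and deletes issued against the view touch only active
   -- rows, so the restriction doesn't have to live in metadata:
   DELETE FROM ACTIVE_BASE WHERE ID = ?;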

While I understand what you're proposing, it doesn't look like metadata I would want to maintain.  I see value in the ability to call out to PL/SQL, a scenario I didn't dig into yet, but I don't see it being the main use case.  The problems you outline initially are the key issues with what I have proposed so far.  Additional data could be added in metadata to describe these gaps in defining relationships.  The N-M relationship with additional data is an interesting case to dig into more.

Re: Rethinking RDBMS - overview and one scenario

bserdar
Insertion with to_upper: that can be maintained in the field mapping
only. That example is to illustrate what can be supported.

Composite docs: for trivial mappings, the complete procedure can be
generated. This kind of mapping should be required only for
non-trivial mappings.

Fields vs columns: 'fields' is from the doc; 'columns' is for columns
that are not mapped. Better ways of representing that can be found.

Where clauses: there should be sensible defaults for building where
clauses. This type of manual where clause building should only be
necessary if you can't write a where clause from the information
mapped in metadata.

As I said before: when possible, the insert/update/delete procedures
can be auto-generated using metadata mappings. This kind of metadata
specification should be necessary only if the procedures cannot be
auto-generated, or if there are things like pl/sql calls. However,
even if we auto-generate the mappings, these are the procedures we'd
need. So I consider this a two-level problem:
  - A procedural description of how to perform a certain function.
  - A generation scheme that generates procedures from metadata definitions.

What I have in the document is #1.



Re: Rethinking RDBMS - overview and one scenario

jewzaam
Administrator
Thanks for clarifying, I missed the point about generation being the norm for some reason.

I don't really agree with the where clause definition.  If it's not required then it's fine; I don't see needing to define something beyond what the client requests for most cases.  The exception would be around pl/sql, which will not necessarily have a where clause unless it's a proc/function used as part of another operation (select, insert, etc.).
