Should we use embedding for association (aka joins)?


jewzaam
Administrator
One thing to consider with MongoDB is that "joins" are done in software. We know this: there are no joins, and we need to keep that in mind when designing models. Nothing new. But what about embedding data in the model instead of only doing aggregation (aka "association" now)? Basically, I'm wondering if there's a way to keep a subset of data from a reference embedded in documents instead of having to load the referenced document every time. Maybe it could be populated as requested, and we could keep track of how volatile the referenced document is.

So if I have a subscription that references product, I would embed the data for product that is requested frequently by lightblue clients. In the product model we keep track of how often the document is updated. Frequency of updates could be converted to some "volatility index". The reference and embedded data in subscription could use the volatility index to decide how often to refresh data in subscription. This isn't a refresh in a background job; it simply means that at some point we don't trust the embedded data and grab it from the source before sending it back to the client. I assume this would also mean updating it in the subscription document.
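To make the "volatility index" idea a bit more concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: `UpdateStats`, `volatility_index`, `max_trust_age`, the 10% fraction and the one-day cap are made-up names and numbers, not anything that exists in lightblue.

```python
from dataclasses import dataclass


@dataclass
class UpdateStats:
    """Hypothetical bookkeeping kept alongside a product document."""
    update_count: int       # updates observed during the window
    window_seconds: float   # length of the observation window


def volatility_index(stats: UpdateStats) -> float:
    """Average updates per second over the window (0.0 = never updated)."""
    if stats.window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return stats.update_count / stats.window_seconds


def max_trust_age(index: float, fraction: float = 0.1,
                  cap_seconds: float = 86400.0) -> float:
    """Trust an embedded copy for a fraction of the mean time between updates.

    The fraction and the cap are arbitrary illustrative defaults.
    """
    if index <= 0:
        return cap_seconds
    return min(cap_seconds, fraction / index)


def is_stale(embedded_at: float, now: float, index: float) -> bool:
    """True once the embedded copy is older than we are willing to trust."""
    return (now - embedded_at) > max_trust_age(index)
```

With these made-up defaults, a product updated about once an hour would have its embedded copy trusted for roughly six minutes before the source is consulted again.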
 
Some steps to help clarify what I mean as I suspect I'm rambling a bit:
1. Create product X.
2. Create subscription Y with a reference to product X.
    1. No data is embedded in subscription Y.
3. Client A requests subscription Y with a subset of product X's data (sku, created date).
    1. Subscription Y is loaded from MongoDB.
    2. Product X data is not populated in subscription Y, so it must be fetched from the product entity.
    3. Update subscription Y to embed the data requested by the client and write it to the database.
    4. Return response to client A.
4. Client B requests subscription Y with a different subset of product X's data (sku, description).
    1. Subscription Y is loaded from MongoDB.
    2. Product X data is partly embedded but not complete, so it must be fetched from the product entity.
    3. Update subscription Y to embed the data requested by the client along with any data that was already embedded, and write it to the database.
    4. Return response to client B.
5. Client C requests subscription Y with a subset of product X's data (sku, created date, description).
    1. Subscription Y is loaded from MongoDB.
    2. Product X data is embedded and not stale (assuming this).
    3. Return response to client C.
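
The steps above can be sketched roughly as follows. This is only an illustration using in-memory dicts as stand-ins for the MongoDB collections; `get_subscription`, `load_product_fields`, the field names and the fixed `max_age` are all hypothetical and not lightblue code.

```python
import time

# In-memory stand-ins for the product and subscription collections.
products = {"X": {"sku": "SKU-1", "createdDate": "2014-08-12",
                  "description": "Widget"}}
subscriptions = {"Y": {"productRef": "X", "embedded": {}, "embeddedAt": None}}
load_count = {"product": 0}  # how many times we had to hit the product entity


def load_product_fields(product_id, fields):
    """Fetch the requested fields from the source product document."""
    load_count["product"] += 1
    doc = products[product_id]
    return {f: doc[f] for f in fields}


def get_subscription(sub_id, product_fields, max_age=360.0, now=None):
    """Return the subscription with the requested product fields embedded."""
    now = time.time() if now is None else now
    sub = subscriptions[sub_id]
    embedded = sub["embedded"]
    stale = sub["embeddedAt"] is None or now - sub["embeddedAt"] > max_age
    missing = [f for f in product_fields if f not in embedded]
    if stale or missing:
        # Refresh what the client asked for plus whatever was already
        # embedded, then write the result back to the subscription.
        wanted = sorted(set(product_fields) | set(embedded))
        sub["embedded"] = load_product_fields(sub["productRef"], wanted)
        sub["embeddedAt"] = now
    return {"_id": sub_id,
            "product": {f: sub["embedded"][f] for f in product_fields}}
```

Running clients A, B and C in order against this sketch hits the product entity twice; the third request is served entirely from the embedded copy as long as it is within `max_age`.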
Re: Should we use embedding for association (aka joins)?

bserdar
We can't embed mutable parts of an entity into another one because we
don't know when that embedded part becomes stale. Also, when the data
is stale, refreshing it becomes even more of a problem, because the
amount of data that needs refreshing is potentially huge. Also, users
probably won't tolerate a long refresh time for something critical in
the embedded content.

We will not load referenced entities unless they are projected anyway.
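
A minimal sketch of that last point, assuming a projection is just a list of field paths and that `product.<field>` means "a field of the referenced product". The function name, the dotted-path convention and the `load_product` callback are hypothetical, not the actual lightblue projection syntax.

```python
from typing import Any, Callable, Dict, List


def find_subscription(sub: Dict[str, Any],
                      projection: List[str],
                      load_product: Callable[[str], Dict[str, Any]]) -> Dict[str, Any]:
    """Resolve the product reference only if the projection asks for it."""
    # Plain (non-dotted) fields come straight from the subscription document.
    result = {f: sub[f] for f in projection if "." not in f and f in sub}
    product_fields = [f.split(".", 1)[1] for f in projection
                      if f.startswith("product.")]
    if product_fields:
        # The referenced entity is loaded only on this branch.
        product = load_product(sub["productRef"])
        result["product"] = {f: product[f] for f in product_fields}
    return result
```

A request that projects only subscription fields never touches the product entity, so the cost of the software "join" is paid only by clients that ask for it.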


On Tue, Aug 12, 2014 at 8:27 AM, jewzaam [via lightblue-dev]
<[hidden email]> wrote:

Re: Should we use embedding for association (aka joins)?

jewzaam
Administrator
Agree. After more discussions with folks, it doesn't make sense to embed. It complicates things a lot, and it gets into knowing when individual fields become stale. I can't remember all the points from the discussions, but the summary is: don't embed. If it becomes a performance problem later we could cache at a higher level, outside of lightblue, but it shouldn't be a concern of the core functionality.
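
For what "cache at a higher level" might look like, here is a small client-side sketch: a time-based cache wrapped around whatever call fetches a product. `TTLCache`, the `fetch` callback and the TTL value are assumptions for illustration and live entirely outside lightblue.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Cache results of fetch(key) for ttl_seconds; sits in the client."""

    def __init__(self, fetch: Callable[[Hashable], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get(self, key: Hashable) -> Any:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]               # still fresh: no call to the source
        value = self._fetch(key)        # expired or absent: go to the source
        self._entries[key] = (now, value)
        return value
```

The trade-off is the same staleness question as with embedding, but it is confined to clients that opt in and choose their own TTL, and the stored documents stay normalized.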