He said that the NoSQL movement helped the database community realize two things. First, not every application needs ACID, and relaxing ACID enables scaling to Internet scale. Second, tabular data organization is still good for much data but not for all datasets. Over time, though, the strong SQL/NoSQL distinction will disappear, and DBMS customers will benefit from more choices.
Entity-relationship (ER) modeling techniques have long been used for SQL databases, but they don't work the same way for NoSQL databases. In the workshop, Hills discussed the Concept and Object Modeling Notation (COMN, pronounced "common"). COMN is used to represent the new data structures supported by different NoSQL databases.
He talked about the promise of COMN for representing the new multi-model NoSQL databases. It can be used by data modelers as well as by programmers, who can model their software in COMN right along with their data. Hills also discussed how to model schema-less databases.
InfoQ spoke with him about data modeling in NoSQL databases and about COMN.
InfoQ: Can you please define Concept and Object Modeling Notation (COMN)?
Ted Hills: The Concept and Object Modeling Notation (COMN) is a data modeling notation that enables the expression of requirements, graphs and ontological predicates, logical data, software class structure, and NoSQL and SQL physical implementations in a familiar graphical notation (boxes and lines) that enables the modeling of the non-trivial mappings that exist between these layers in non-traditional implementations.
InfoQ: Can you talk about Concept and Object Modeling Notation (COMN) in the context of NoSQL databases and how data modeling is different from relational database modeling?
Hills: Entity-relationship (E-R) and other notations assume that data will ultimately be stored in tables. With the advent of NoSQL databases we now can store data in graphs and documents, as well as other tabular structures such as wide-column tables, column-oriented tables, and key/value pairs. We can no longer assume that the mapping from a logical data design to a physical implementation is close to 1:1. Furthermore, modeling physical implementations, including modeling non-tabular structures, and even modeling queries, becomes more important than before. COMN enables the expression of this full variety of physical structures, and the non-trivial mapping to the data that they represent.
InfoQ: Is the data modeling approach different for each NoSQL database type, e.g. a wide-column database like Cassandra vs. a graph database like Neo4j?
Hills: Yes, the focus of data modeling is different for most of the NoSQL database types. Property graph data models focus on relationships but then annotate both nodes and relationships with data attributes. Knowledge graph data models also focus on relationships but add sub/super-type relationships. Document (XML and JSON) data models put hierarchical relationships at the forefront. So, although the focus of a physical data model changes with each NoSQL database type, COMN is just as effective in modeling each one. Further, it can represent all these non-traditional data structures along with tabular (which hasn’t gone away), and relate the physical model back to the logical data model that is, ideally, unaffected by physical representation choices.
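To make the contrast between these physical models concrete, here is a minimal Python sketch, not taken from the interview and not COMN itself. It assumes a tiny, made-up logical model (people, companies, employment facts) and shows one possible property-graph layout and one possible document layout for it. All names (LOGICAL, to_property_graph, to_documents, WORKS_AT) are hypothetical. The two helper functions at the end recover the same logical facts from either layout, which is the kind of physical-to-logical mapping Hills describes.

```python
import json

# Logical model: plain facts, independent of any storage layout.
# The data and field names are illustrative assumptions.
LOGICAL = {
    "people": {"p1": {"name": "Alice"}, "p2": {"name": "Bob"}},
    "companies": {"c1": {"name": "Acme"}},
    "employment": [("p1", "c1", 2015), ("p2", "c1", 2019)],
}


def to_property_graph(logical):
    """Relationship-first layout: nodes and edges both carry attributes."""
    nodes = []
    for pid, attrs in logical["people"].items():
        nodes.append({"id": pid, "label": "Person", "props": dict(attrs)})
    for cid, attrs in logical["companies"].items():
        nodes.append({"id": cid, "label": "Company", "props": dict(attrs)})
    edges = [
        {"from": pid, "to": cid, "type": "WORKS_AT", "props": {"since": since}}
        for pid, cid, since in logical["employment"]
    ]
    return {"nodes": nodes, "edges": edges}


def to_documents(logical):
    """Hierarchy-first layout: one document per company, employees nested."""
    docs = []
    for cid, attrs in logical["companies"].items():
        employees = [
            {"id": pid, "name": logical["people"][pid]["name"], "since": since}
            for pid, company, since in logical["employment"]
            if company == cid
        ]
        docs.append({"id": cid, "name": attrs["name"], "employees": employees})
    return docs


def employment_from_graph(graph):
    """Map the graph layout back to logical employment facts."""
    return sorted(
        (e["from"], e["to"], e["props"]["since"]) for e in graph["edges"]
    )


def employment_from_documents(docs):
    """Map the document layout back to the same logical facts."""
    return sorted(
        (emp["id"], doc["id"], emp["since"])
        for doc in docs
        for emp in doc["employees"]
    )


graph = to_property_graph(LOGICAL)
docs = to_documents(LOGICAL)
print(json.dumps(docs, indent=2))
```

The graph layout keeps the employment relationship as a first-class edge with its own attribute, while the document layout folds it into the nesting. Both can be traced back to the same logical model.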
InfoQ: Can you discuss the multi-model NoSQL databases and how they can help with data management of different data structures?
Hills: In the NoSQL world, you have to choose a physical representation for your data that is optimal for your application. Do you need random writes or writes only at the end of a log? Do you need your data organized around hierarchical document structures, or organized around relationships? Many NoSQL DBMSs offer you exactly one way to organize your data. If you need to change your data organization, or need more than one way to organize it, then you will have to change out your entire DBMS. This involves dealing with a different vendor, different support requirements, different languages and APIs, etc. It's a non-trivial undertaking. If instead you use a hybrid DBMS that can support multiple data organizations, it becomes much easier to use multiple approaches to organizing your data, and it becomes much easier to change your mind.
InfoQ: How can microservices help with data modeling in general?
Hills: I wouldn’t say that microservices help with the data modeling task per se, but they do have a significant positive impact on data architecture. A microservice must be designed to be self-sufficient: it must always have all of the data it needs locally. This involves two types of data: data that the microservice creates and maintains, and data that the microservice must obtain from sources outside itself. The physical model of how data is stored outside the microservice doesn't matter to the microservice, but the model of how it arrives at the microservice does matter. That is probably as an XML or JSON document. A data model needs to represent that document structure as well as how the microservice will store the data, and needs to show the mapping between them, which might be non-trivial. COMN can express both models and their mapping.
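As an illustration of the two models and the mapping between them, the following Python sketch assumes a hypothetical order document arriving as JSON and a hypothetical local store made of two SQLite tables. The document shape, table names, and the store_order function are assumptions made for this example; they do not come from the interview. The mapping is not 1:1, because the nested customer object is flattened into columns and the line-item array becomes rows in a second table.

```python
import json
import sqlite3

# Hypothetical incoming document; all field names are illustrative.
INCOMING = json.dumps({
    "orderId": "o-100",
    "customer": {"id": "c-7", "name": "Alice"},
    "lines": [
        {"sku": "A1", "qty": 2},
        {"sku": "B4", "qty": 1},
    ],
})


def open_local_store():
    """Create the microservice's own storage model (in-memory for the demo)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "order_id TEXT PRIMARY KEY, customer_id TEXT, customer_name TEXT)"
    )
    conn.execute(
        "CREATE TABLE order_lines ("
        "order_id TEXT, line_no INTEGER, sku TEXT, qty INTEGER)"
    )
    return conn


def store_order(conn, raw_document):
    """Map the arriving document structure onto the local tables."""
    doc = json.loads(raw_document)
    with conn:  # commit both inserts together, or roll back on error
        conn.execute(
            "INSERT INTO orders VALUES (?, ?, ?)",
            (doc["orderId"], doc["customer"]["id"], doc["customer"]["name"]),
        )
        conn.executemany(
            "INSERT INTO order_lines VALUES (?, ?, ?, ?)",
            [
                (doc["orderId"], line_no, line["sku"], line["qty"])
                for line_no, line in enumerate(doc["lines"], start=1)
            ],
        )


conn = open_local_store()
store_order(conn, INCOMING)
rows = conn.execute(
    "SELECT line_no, sku, qty FROM order_lines ORDER BY line_no"
).fetchall()
print(rows)
```

In this sketch the mapping lives only in the code of store_order. A data model that shows both structures and the mapping would make that design visible outside the code.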
InfoQ: You talked about states and representations in your conference presentation. Can you discuss how these concepts are modeled in the databases?
Hills: Every DBMS, whether NoSQL or SQL, ultimately represents data by mapping meaningless physical states (high-voltage and low-voltage, or on and off) to meaning. We call that mapping a physical representation. At one level higher, we use structures such as tables, graphs, and documents to represent relationships. It is key to understand that a logical data model should completely ignore such physical mapping issues. A logical data model should be focused purely on what the data means and how it logically represents the data in the problem domain. But when moving from the logical model to the physical model, physical representation design becomes paramount, as well as preserving that mapping from the physical model back to the logical model.
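A small Python sketch can illustrate the idea that physical states carry no meaning until a representation is chosen. It is an illustration under stated assumptions, not something Hills presented: the same four bytes are read once as an unsigned 32-bit integer and once as a 32-bit IEEE 754 float, both little-endian, using the standard struct module.

```python
import struct

# Four bytes of raw physical state; on their own they mean nothing.
raw = bytes([0x00, 0x00, 0x80, 0x3F])

# Representation 1: little-endian unsigned 32-bit integer.
(as_uint32,) = struct.unpack("<I", raw)

# Representation 2: little-endian 32-bit IEEE 754 float.
(as_float32,) = struct.unpack("<f", raw)

print(as_uint32)   # 1065353216
print(as_float32)  # 1.0
```

The bytes are identical in both cases. Only the chosen mapping from states to meaning differs, which is why a physical model has to record that mapping while a logical model can ignore it.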
InfoQ: What are the emerging trends in the NoSQL database landscape?
Hills: The main trend is for the differences between NoSQL and SQL to become less and less. For starters, the term "NoSQL" started out meaning "no SQL", that is, no support for the standard Structured Query Language for tabular databases. Now, however, it means "not only SQL", meaning that more and more "NoSQL" DBMSs are supporting SQL. In the early days, NoSQL did not provide ACID-strength transactions, which are essential for financial applications. Now, many NoSQL DBMSs implement ACID. Simultaneously, some SQL DBMSs are allowing the relaxation of ACID, enabling them to scale to nearly the same level as some NoSQL DBMSs. Some hybrid DBMSs support tabular and non-tabular data organizations. What ought to happen eventually is that every DBMS supports a variety of physical data organizations, plus ACID and non-ACID ("BASE"), all selectable by the user. The fact that SQL was born in the tabular era, and no replacement has arrived yet, will hold back this complete transformation a bit. But COMN can cross all these data organizations.
Hills also said that traditional modeling tool vendors have a limiting view of data models, based on three layers and one application at a time, and that NoSQL modeling tools focus on physical modeling to the exclusion of logical data models and real-world models. A notation like COMN can help with data modeling, and its promise lies in representing the new multi-model world of data management.
Further information on COMN, including a full specification, white papers, and Visio stencils, can be freely obtained from the DATAVERSITY website.