Research Document

Database research for fARm

Author: Kyle van Raaij

Preface

In our project we need a way to keep track of all the scannable cards in the game and to allow new cards to be added. To fulfill this requirement we are going to use an external database. There are many different databases available, so to make sure we pick one that is right for our project we first conduct research. This document describes the different steps and results of that research.

Different kinds of Databases

There are different kinds of databases that could be used, and it is important that the project group decides which type of database to use. The main kinds of databases are:

Centralised database

The data is stored at a centralized location, and users from different locations can access it. This type of database contains application procedures that help users access the data from a remote location. “Various kinds of authentication procedures are applied for the verification and validation of end users, likewise, a registration number is provided by the application procedures which keep a track and record of data usage” (Boyini, 2018b).

Distributed database

As the opposite of the centralized database concept, a distributed database combines contributions from the common database with information captured by local computers. The data is not kept in one place but is distributed across various sites of an organization. These sites are connected by communication links, which let them access the distributed data easily. In other words, a distributed database is one in which various portions of the database are stored in multiple physical locations, along with application procedures that are replicated and distributed among various points in a network. “There are two kinds of distributed database, homogeneous and heterogeneous. The databases which have same underlying hardware and run over same operating systems and application procedures are known as homogeneous DDB. Whereas, in a heterogeneous DDB, the operating systems, underlying hardware as well as application procedures can be different at various sites” (Sam, 2018a).

Personal database

Data is collected and stored on personal computers; such a database is small and easily manageable. The data is generally used by a single department of an organization and is accessed by a small group of people (Boyini, 2018g).

End-user database

The end user is usually not concerned with the transactions or operations performed at various levels and is only aware of the product, which may be a piece of software or an application. An end-user database is therefore a shared database designed specifically for end users, such as managers at different levels. It collects a summary of all the information (Sam, 2018b).

Commercial database

These are paid versions of large databases that are designed, developed and maintained by a commercial entity. Access to such commercial databases is provided through commercial links (Boyini, 2018d).

NoSQL database

These are used for large sets of distributed data. Some big data performance issues are not handled effectively by relational databases, and NoSQL databases manage these issues easily. “They are very efficient in analyzing large size unstructured data that may be stored at multiple virtual servers of the cloud” (Sam, 2018c).

Operational database 

Information related to the operations of an enterprise is stored in this database. These databases are used to update data in real time and allow users to do more than simply view archived data. Functional lines such as marketing, employee relations and customer service require this kind of database (Boyini, 2018f)(Wikipedia contributors, 2019h).

Relational database

These databases are organized as a set of tables in which data fits into predefined categories. Each table consists of rows and columns: each column holds data for a specific category, and each row contains an instance of that data. Structured Query Language (SQL) is the standard user and application program interface for a relational database. “There are various simple operations that can be applied over the table which makes these databases easier to extend, join two databases with a common relation and modify all existing applications” (Sam, 2018e).
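To make the table, row and column idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `cards` table, its columns and the card names are hypothetical, loosely modelled on fARm's scannable cards; they are not a schema the project has agreed on.

```python
import sqlite3

# In-memory database for the example; a real deployment would use a file or a server.
conn = sqlite3.connect(":memory:")

# Hypothetical table: each column is a fixed category of data.
conn.execute(
    """
    CREATE TABLE cards (
        id     INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        rarity TEXT NOT NULL
    )
    """
)

# Each row is one instance: here, one (hypothetical) scannable card.
conn.executemany(
    "INSERT INTO cards (name, rarity) VALUES (?, ?)",
    [("Tractor", "common"), ("Golden Cow", "rare"), ("Scarecrow", "common")],
)
conn.commit()

# SQL query: filter on one column and sort the result.
rows = conn.execute(
    "SELECT name FROM cards WHERE rarity = ? ORDER BY name", ("common",)
).fetchall()
common_cards = [name for (name,) in rows]
print(common_cards)  # ['Scarecrow', 'Tractor']
```

If a new data category is needed later, it can be added with one statement such as `ALTER TABLE cards ADD COLUMN season TEXT` (the `season` column is hypothetical), and the query above keeps working unchanged.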

Cloud database 

Nowadays, data is increasingly stored in the cloud, a virtual environment that can be a hybrid, public or private cloud. A cloud database is a database that has been optimized or built for such a virtualized environment. “There are various benefits of a cloud database, some of which are the ability to pay for storage capacity and bandwidth on a per-user basis, and they provide scalability on demand, along with high availability” (Boyini, 2018c). A cloud database also gives enterprises the opportunity to support business applications in a software-as-a-service deployment.

Object-oriented database 

An object-oriented database combines object-oriented programming with relational database concepts. Many items created with object-oriented programming languages such as C++ and Java cannot be stored in relational databases, but object-oriented databases are well suited to them. “An object-oriented database is organized around objects rather than actions, and data rather than logic. For example, a multimedia record in a relational database can be a definable data object, as opposed to an alphanumeric value” (Sam, 2018d).

Graph database

A graph is a collection of nodes and edges, where each node represents an entity and each edge describes the relationship between two entities. “A graph-oriented database, or graph database, is a type of NoSQL database that uses graph theory to store, map and query relationships” (Boyini, 2018e). Graph databases are mainly used for analyzing interconnections. For example, companies might use a graph database to mine data about customers from social media.
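The node and edge idea can be sketched in plain Python with an adjacency list and a breadth-first search. This assumes hypothetical customers linked by an undirected "knows" relationship; a graph database stores and traverses such relationships natively, so this only illustrates the kind of question (who is within two hops of whom) that graph traversal answers.

```python
from collections import deque

# Hypothetical customers (nodes) and "knows" relationships (edges).
edges = [("Alice", "Bob"), ("Bob", "Carol"), ("Carol", "Dave"), ("Alice", "Eve")]

# Adjacency list: each node points straight at its neighbours, no join needed.
graph: dict[str, set[str]] = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def within_hops(start: str, max_hops: int) -> set[str]:
    """Breadth-first search: all nodes reachable from start in at most max_hops edges."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, set()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    seen.discard(start)
    return seen


print(sorted(within_hops("Alice", 2)))  # ['Bob', 'Carol', 'Eve']
```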

Result

Only a few of these database types are really suitable for the project. Since we need a database that can hold a lot of records that all our users can access, we need to decide between relational databases and NoSQL databases. Next, we research the advantages and disadvantages of each type.

Relational databases

The main advantage of relational databases is that they enable users to easily categorize and store data that can later be queried and filtered to extract specific information for reports. Relational databases are easy to extend and are not reliant on the physical organization of the data: after the database is created, a new data category can be added without altering all existing applications.

Data integrity: Data integrity is an essential feature of the relational model. Strong data typing and validity checks ensure that data falls within acceptable ranges and that required data is present (Lee, 2018).
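The validity checks can be sketched with constraints in SQLite (Python's built-in `sqlite3` module): `NOT NULL` enforces that required data is present and `CHECK` keeps values within an acceptable range. The table and the 0 to 10 `power` range are hypothetical. SQLite uses flexible typing by default, so this sketch shows validity checks rather than strict data typing.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")

# Hypothetical table: NOT NULL = required data, CHECK = acceptable range.
conn.execute(
    """
    CREATE TABLE cards (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        power INTEGER NOT NULL CHECK (power BETWEEN 0 AND 10)
    )
    """
)


def try_insert(name: Optional[str], power: int) -> bool:
    """Insert a card; return False if a constraint rejects the row."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO cards (name, power) VALUES (?, ?)", (name, power))
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert("Tractor", 3))      # True: valid row
print(try_insert(None, 3))           # False: required name is missing
print(try_insert("Golden Cow", 42))  # False: power outside 0..10
```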

Users: Multiple users can access the same database, and complex queries are easy for users to carry out.

Flexibility: The relational data model is naturally scalable and extensible, providing a flexible structure to meet changing requirements and increasing amounts of data (Lee, 2018).

Scale: Relational databases do not scale out horizontally very well because of concurrency and data size. They can be scaled vertically, but scaling horizontally requires sharding (GridGain, 2018).

NoSQL databases

NoSQL databases were created in response to the limitations of traditional relational database technology. Compared with relational databases, NoSQL databases are generally more scalable and can provide better performance for large, distributed data sets. Their data model addresses several shortcomings of the relational model.

Large data: Large volumes of structured, semi-structured and unstructured data can be handled in a NoSQL database (mongoDB, 2019).

Efficient: NoSQL databases use an efficient, scale-out architecture instead of an expensive, monolithic one (mongoDB, 2019).

Key Differences

Description                                 NoSQL   Relational
Column-oriented databases                     X
Row-oriented databases                                  X
Crossover databases                           X         X
Data structure should be known in advance               X
More features                                           X
Speed is faster                               X
Faster with complex queries                             X
Scalability is better                         X

Database Type Choice

We chose a NoSQL database for our project because we need a database that scales highly and easily. The most important aspect of the database we choose is how it handles huge amounts of data. NoSQL databases are designed to scale horizontally: instead of requiring bigger and more expensive servers, they are scaled by adding more machines to the pool of resources. Maintaining a NoSQL server is also less expensive because there is less to maintain (Farias, 2018).
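Horizontal scaling can be sketched with simple hash-based sharding: each record's key is hashed to pick one of N machines, so adding machines spreads the same data over more servers. This is a minimal sketch, not how any particular NoSQL product distributes data; the `card-<n>` keys and the function names are hypothetical.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard number using a stable hash (MD5, for determinism)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def distribute(keys: list[str], num_shards: int) -> dict[int, list[str]]:
    """Group keys by the shard (machine) that would store them."""
    shards: dict[int, list[str]] = {i: [] for i in range(num_shards)}
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards


# Hypothetical record keys for 1,000 cards.
card_ids = [f"card-{i}" for i in range(1000)]

# Scaling out: the same data spread over 2 machines, then over 4.
before = distribute(card_ids, 2)
after = distribute(card_ids, 4)
print({shard: len(keys) for shard, keys in after.items()})

# Keys that would have to move to a different machine after scaling out.
moved = sum(1 for k in card_ids if shard_for(k, 2) != shard_for(k, 4))
print(f"{moved} of {len(card_ids)} keys move")
```

With plain modulo hashing, going from 2 to 4 shards moves roughly half of the keys to a different machine. This is one reason distributed databases such as Cassandra use consistent hashing, which limits how much data moves when a node is added.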

NoSQL Databases 

There are many different NoSQL databases available. To help us decide, we made a list of the highest-rated NoSQL databases with their advantages and disadvantages (i’m programmer, 2019)(Loifman, 2019).


Cassandra

Cassandra was developed at Facebook for inbox search. It is a free and open-source distributed data storage system for handling very large amounts of structured data across many commodity servers, providing high availability with no single point of failure. It is very scalable and resilient. Cassandra is easy to master and simple to configure, providing neat solutions for quite complex problems. Cassandra is written in Java and uses Cassandra Query Language (CQL), a SQL-like language for querying Cassandra databases. Cassandra is used by some of the biggest companies, such as Facebook, Twitter, Cisco, Rackspace, eBay and Netflix (Fedak, 2018)(Wikipedia contributors, 2019).

Features

  • Linearly scalable
  • Maintains a quick response time
  • Atomic, isolated and durable writes with tunable consistency, rather than full ACID transactions
  • Supports MapReduce with Apache Hadoop
  • Maximal flexibility to distribute the data
  • Peer-to-peer architecture
Pros

  • Highly scalable
  • No single point of failure
  • Multi-DC replication
  • Integrates tightly with other JVM-based applications
  • Well suited to multiple data-center deployments, redundancy, failover and disaster recovery

Cons

  • Limited support for aggregations
  • Unpredictable performance
  • Doesn’t support ad-hoc queries
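Cassandra's peer-to-peer design places data on a ring: the partition key is hashed to a token, the node owning that part of the ring stores the row, and with SimpleStrategy further replicas go to the next nodes clockwise. The sketch below is a simplified model of that idea (one token per node, no virtual nodes, MD5 instead of Cassandra's default Murmur3 partitioner). The `TokenRing` class and node names are hypothetical and are not Cassandra's API.

```python
import bisect
import hashlib


class TokenRing:
    """Simplified consistent-hash ring: one token per node, no virtual nodes."""

    def __init__(self, nodes: list[str], replication_factor: int = 2) -> None:
        self.replication_factor = replication_factor
        # Sorted (token, node) pairs form the ring.
        self._ring = sorted((self._token(n), n) for n in nodes)

    @staticmethod
    def _token(value: str) -> int:
        # Cassandra's default partitioner uses Murmur3; MD5 is used here only
        # because it is in the Python standard library.
        return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")

    def add_node(self, node: str) -> None:
        """Scale out: the new node takes over part of one neighbour's range."""
        bisect.insort(self._ring, (self._token(node), node))

    def replicas(self, partition_key: str) -> list[str]:
        """Owner = first node clockwise from the key's token; replicas = the next nodes."""
        tokens = [token for token, _ in self._ring]
        start = bisect.bisect_left(tokens, self._token(partition_key)) % len(self._ring)
        count = min(self.replication_factor, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


# Hypothetical three-node cluster storing each row on two nodes.
ring = TokenRing(["node-a", "node-b", "node-c"], replication_factor=2)
print(ring.replicas("card-42"))
```

Because each node owns only the range up to its own token, adding a node changes the owner of just the keys in the range it takes over; every other key stays where it was.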

Redis 

Redis (Remote Dictionary Server) is the most famous key-value store. It is an in-memory data structure project implementing a distributed, in-memory key-value database with optional durability. Redis supports different kinds of abstract data structures, such as strings, lists, maps, sets, sorted sets, HyperLogLogs, bitmaps, streams and spatial indexes. Redis is written in C and has support for C++, PHP, Ruby, Python, Perl and Scala. It is licensed under the BSD license. Redis can handle up to 2³² keys and has been tested in practice to handle at least 250 million keys per instance. It is an in-memory database that can also persist data on disk (Wikipedia contributors, 2019b).

Features

  • Automatic failover
  • Holds its database entirely in the memory
  • Transactions
  • Lua scripting
  • Replicate data to any number of slaves
  • Keys with a limited time-to-live
  • LRU eviction of keys
  • Supports Publish/Subscribe
Pros

  • Supports a huge variety of data types
  • Easy to install
  • Very fast (about 110,000 SETs and about 81,000 GETs per second)
  • Operations are atomic
  • Multi-utility tool (used in a number of use cases)
  • Redis Sentinel, a feature provided by Redis, manages replication and failover in a distributed system

Cons

  • Doesn’t support joins
  • Knowledge of Lua is required for stored procedures
  • The dataset has to fit comfortably in memory
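Two of the listed features, keys with a limited time-to-live and LRU eviction, can be illustrated with a toy in-memory store. `TinyKeyValueStore` is a hypothetical class, not Redis or its client API; real Redis sets a TTL with a command such as `SET key value EX seconds` and uses an approximated LRU when a `maxmemory` limit is reached.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional


class TinyKeyValueStore:
    """Hypothetical toy store with per-key TTL and LRU eviction (not Redis itself)."""

    def __init__(self, max_keys: int, clock: Callable[[], float] = time.monotonic) -> None:
        self.max_keys = max_keys
        self.clock = clock
        # key -> (value, expiry time or None); ordered from least to most recently used.
        self._data: OrderedDict = OrderedDict()

    def set(self, key: str, value: str, ttl: Optional[float] = None) -> None:
        expires_at = self.clock() + ttl if ttl is not None else None
        self._data[key] = (value, expires_at)
        self._data.move_to_end(key)  # writing a key counts as using it
        while len(self._data) > self.max_keys:
            self._data.popitem(last=False)  # evict the least recently used key

    def get(self, key: str) -> Optional[str]:
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if expires_at is not None and self.clock() >= expires_at:
            del self._data[key]  # expired: removed lazily when accessed
            return None
        self._data.move_to_end(key)  # reading a key also counts as using it
        return value


# Usage with a fake clock so the expiry is deterministic.
now = [0.0]
store = TinyKeyValueStore(max_keys=2, clock=lambda: now[0])
store.set("session:42", "alice", ttl=5)
store.set("card:1", "Tractor")
now[0] = 6.0
print(store.get("session:42"))  # None, the key has expired
```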

MongoDB 

MongoDB is one of the best-known NoSQL databases. It is an open-source, cross-platform, document-oriented NoSQL database. MongoDB is written in C++ and uses JSON-like documents to store data. MongoDB can also be used as a file system, and JavaScript can be used in queries. MongoDB offers high performance and scales horizontally by using sharding. It is available in both a Community and an Enterprise Edition (Wikipedia contributors, 2019b).

Features

  • Provides high performance
  • Auto-sharding
  • Runs over multiple servers
  • Supports master-slave replication
  • Data is stored in the form of JSON-style documents
  • Can index any field in a document
  • Automatic load balancing, because data is placed in shards
  • Supports regular expression searches
  • Easy to administer in the case of failures
Pros

  • Easy to set up
  • MongoDB Inc. provides professional support to its clients
  • Supports ad-hoc queries
  • High-speed database
  • Schema-less database
  • Horizontally scalable database
  • Very high performance

Cons

  • Doesn’t support joins
  • High data size
  • Nesting of documents is limited
  • Increases unnecessary memory usage
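The document model can be sketched in plain Python, assuming hypothetical card documents: two documents in the same collection can have different fields, and an ad-hoc query is just a filter on field values, including dot notation for nested fields. The `find` helper below only mimics equality matching; with the real PyMongo driver a comparable call would be `collection.find({"rarity": "rare"})`.

```python
from typing import Any

# Hypothetical card documents in one collection; they do not share a fixed schema.
cards: list[dict[str, Any]] = [
    {"_id": 1, "name": "Tractor", "rarity": "common", "stats": {"power": 3}},
    {"_id": 2, "name": "Golden Cow", "rarity": "rare", "stats": {"power": 7},
     "tags": ["animal", "event"]},
]


def get_path(doc: dict[str, Any], dotted: str) -> Any:
    """Follow a dot-notation path such as 'stats.power'; None if a part is missing."""
    current: Any = doc
    for part in dotted.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(collection: list[dict[str, Any]], query: dict[str, Any]) -> list[dict[str, Any]]:
    """Ad-hoc query: keep documents where every queried field equals the given value."""
    return [doc for doc in collection
            if all(get_path(doc, path) == value for path, value in query.items())]


print([doc["name"] for doc in find(cards, {"stats.power": 7})])  # ['Golden Cow']
```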

Couchbase 

Couchbase Server is an open-source, distributed, multi-model NoSQL document-oriented database for interactive web applications. It has a flexible data model, is easily scalable and provides consistently high performance. Couchbase focuses on ease of use and on embracing the web. Couchbase Server is designed to provide easy-to-scale key-value or JSON document access with low latency and high sustained throughput (Wikipedia contributors, 2019b).

Features

  • Auto-Failover
  • Index partitioning
  • Supports JSON data natively via N1QL queries
  • Data Compression
  • Couchbase Eventing Service
Pros

  • Aggregate optimization
  • Reduces the cost of network, memory, and storage
  • Great admin panel that provides tons of insights into how your cluster is performing

Cons

  • Couchbase is not open source

Amazon DynamoDB 

DynamoDB uses a NoSQL database model, which is nonrelational, allowing documents, graphs and columnar data among its data models. DynamoDB uses synchronous replication across multiple data centers for high durability and availability. Each item is uniquely identified by a user-defined primary key, and each query is executed against that key. DynamoDB also relieves the customer of the burden of operating and scaling a distributed database: hardware provisioning, setup, configuration, replication, software patching, cluster scaling, etc. are managed by Amazon (Wikipedia contributors, 2019a).

Features

  • Highly scalable
  • Hash-Range for indexing a range of values
  • Stores data in partitions
  • Utilizes JSON as a transport protocol, not as a storage format
Pros

  • Easy to set up
  • Provides a low-level AWS DynamoDB API
  • Auto-scaling
  • Reduces the complexity of managing high availability and scaling for peak usage times
  • Encryption at rest
  • Security is governed by AWS Identity and Access Management (IAM)

Cons

  • Doesn’t back up your tables for free
  • Size limit
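The "hash-range" key can be sketched as follows: items are grouped by their partition (hash) key and kept sorted by their sort (range) key, so a query names one partition and a range of sort keys. `HashRangeTable`, its methods and the player/date keys are hypothetical stand-ins, loosely modelled on DynamoDB's PutItem and Query operations rather than its actual API.

```python
import bisect
from collections import defaultdict


class HashRangeTable:
    """Hypothetical toy table keyed by (partition key, sort key), loosely DynamoDB-like."""

    def __init__(self) -> None:
        # partition key -> list of (sort key, item), kept sorted by sort key
        self._partitions: defaultdict = defaultdict(list)

    def put_item(self, pk: str, sk: str, item: dict) -> None:
        rows = self._partitions[pk]
        sort_keys = [k for k, _ in rows]
        i = bisect.bisect_left(sort_keys, sk)
        if i < len(rows) and rows[i][0] == sk:
            rows[i] = (sk, item)  # same full primary key: overwrite the item
        else:
            rows.insert(i, (sk, item))  # keep the partition sorted by sort key

    def query(self, pk: str, sk_from: str, sk_to: str) -> list[dict]:
        """Return items in one partition whose sort key lies in [sk_from, sk_to]."""
        rows = self._partitions.get(pk, [])
        sort_keys = [k for k, _ in rows]
        lo = bisect.bisect_left(sort_keys, sk_from)
        hi = bisect.bisect_right(sort_keys, sk_to)
        return [item for _, item in rows[lo:hi]]


# Hypothetical data: cards scanned per player, sorted by scan date.
table = HashRangeTable()
table.put_item("player-1", "2019-05-01", {"card": "Tractor"})
table.put_item("player-1", "2019-05-03", {"card": "Golden Cow"})
table.put_item("player-2", "2019-05-02", {"card": "Scarecrow"})
print(table.query("player-1", "2019-05-01", "2019-05-02"))  # [{'card': 'Tractor'}]
```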

HBase

HBase is an open-source, distributed, non-relational database modeled after Google's Bigtable. It provides a fault-tolerant way of storing large quantities of sparse data. One of the main goals of HBase is to host very large tables with billions of rows and millions of columns. Servers can be added at any time to increase capacity. HBase is written in Java and is licensed under the Apache License (Wikipedia contributors, 2019a).

Features

  • Supports automatic failover
  • Linearly scalable
  • Provides data replication
  • Integrates with Hadoop, both as a source and a destination
Pros

  • Provides fast lookups for large tables
  • Provides low-latency access to single rows from billions of records
  • Easy Java API for clients
  • Auto-sharding
  • License-free
  • Handles large datasets on top of HDFS file storage
  • Flexible schema design
  • High speed

Cons

  • Doesn’t support transactions
  • No permissions or built-in authentication
  • Indexed and sorted only on key
  • Single point of failure (when only one HMaster is used)
  • Doesn’t support SQL structure
  • Memory issues on the cluster

Neo4j 

Neo4j is referred to as an ACID-compliant transactional database with native graph storage and processing because it effectively implements the property graph model down to the storage level. This means that the data is stored exactly as you whiteboard it, and the database uses pointers to navigate and traverse the graph. Neo4j has both a Community and Enterprise Edition. (Wikipedia contributors, 2019a)

Features

  • It supports UNIQUE constraints
  • Neo4j supports full ACID (Atomicity, Consistency, Isolation, and Durability) rules
  • Java API: Cypher API and Native Java API
  • Indexes by using Apache Lucene
  • Easy query language: Neo4j CQL (Cypher Query Language)
  • Contains a UI to execute CQL Commands: Neo4j Data Browser
Pros

  • Easy to retrieve adjacent nodes or relationship details without joins or indexes
  • Easy-to-learn Neo4j CQL query language commands
  • Doesn’t require complex joins to retrieve data
  • Represents semi-structured data very easily
  • High availability for large enterprise real-time applications
  • Simplified tuning

Cons

  • Doesn’t support sharding

Oracle NoSQL 

Oracle NoSQL Database implements a map from user-defined keys to opaque data items. It provides transactional semantics for data manipulation, horizontal scalability, and simple administration and monitoring. It is designed for the cloud, can be hosted on a single server or on multiple servers, and enables the management of databases holding billions of records. Features of the latest version include a grid framework and the use of both physical and logical structures. Oracle Database provides customers with a high-performance, reliable and secure platform to easily and cost-effectively modernize their transactional and analytical workloads, either in the cloud, on-premises or in a hybrid cloud configuration (Wikipedia contributors, 2019a).

Features

  • Handles big data
  • Supports SQL and can be accessed from Oracle relational databases
  • Uses Java and C APIs to read and write data
  • Distributed database
  • Provides access to the data through the node for the requested key.
Pros

  • Based on the PL/SQL programming construct
  • Peer-to-peer communities help to solve problems
  • Oracle Database is secure and, through prompt updates, ensures that user data is not tampered with

Cons

  • High cost for small organizations
  • Requires significant resources for installation
  • Hardware upgrades may be required just to implement Oracle
  • Takes up a lot of space

Memcached

Memcached is an open-source, high-performance, distributed memory caching system intended to speed up dynamic web applications by reducing the database load. It is an in-memory key-value dictionary of strings, objects, etc. resulting from database calls, API calls or page rendering. It is used by Netlog, Facebook, Flickr, Wikipedia, Twitter and YouTube (Wikipedia contributors, 2019).

Features

  • Client-server application over TCP or UDP
  • Reduces the database load
  • Memcached server is a big hash table
  • Efficient for websites with high database load
  • Distributed under Berkeley Software Distribution license
  • Combine memory caches into a logical pool
Pros

  • Installation is fast
  • Widely documented, with a huge community

Cons

  • Only supported on Linux and on BSD-like operating systems
  • Doesn’t support data redundancy
  • Doesn’t support locks or read-through
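Memcached is typically used in a cache-aside pattern: the application looks in the cache first, only queries the database on a miss, and then stores the result so the next read is served from memory. The sketch below uses a plain dict in place of a Memcached server, and `fake_db` is a hypothetical stand-in for a real database call.

```python
from typing import Callable


class CacheAside:
    """Cache-aside lookup; a dict stands in for a Memcached server (a big hash table)."""

    def __init__(self, load_from_db: Callable[[str], str]) -> None:
        self.cache: dict[str, str] = {}
        self.load_from_db = load_from_db  # hypothetical database loader

    def get(self, key: str) -> str:
        if key in self.cache:
            return self.cache[key]  # cache hit: the database is not touched
        value = self.load_from_db(key)  # cache miss: one database call
        self.cache[key] = value  # store it so the next read is a hit
        return value

    def invalidate(self, key: str) -> None:
        """Drop a cached value, e.g. after the underlying record changed."""
        self.cache.pop(key, None)


db_calls: list[str] = []


def fake_db(key: str) -> str:
    db_calls.append(key)  # count how often the "database" is queried
    return f"data for {key}"


cached = CacheAside(fake_db)
for _ in range(3):
    cached.get("card-7")
print(len(db_calls))  # 1
```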

CouchDB

CouchDB is an open-source NoSQL database that uses JSON to store data and JavaScript as its query language. CouchDB implements a form of multiversion concurrency control (MVCC), so it does not lock the database file during writes. Conflicts are left to the application to resolve. It is licensed under the Apache License and was ranked first in the Best NoSQL Database 2016 list for popularity (Wikipedia contributors, 2019g).

Features

  • Map/Reduce List and Show
  • Provides database-level security
  • Authentication via a session cookie, like a web application
  • JSONP for free
  • Document storage
  • Supports ACID properties
  • Provides the simplest form of replication
  • Browser-based GUI to handle your data, permission, and configuration
Pros

  • Map/Reduce: querying data is somewhat separated from the data itself
  • Stores any JSON data

Cons

  • Arbitrary queries are expensive
  • A bit of extra space overhead
  • Doesn’t support XML
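CouchDB's MVCC can be sketched with revision identifiers: every document carries a `_rev`, and an update is accepted only if it names the current revision; otherwise it is rejected as a conflict (CouchDB answers with HTTP 409) and the application must resolve it. `RevisionedStore` is a hypothetical in-memory model, and its revision strings only imitate CouchDB's `N-hash` look rather than reproducing how CouchDB computes revisions.

```python
import hashlib
import json
from typing import Optional


class ConflictError(Exception):
    """Update based on an outdated revision (CouchDB would answer 409 Conflict)."""


class RevisionedStore:
    """Hypothetical in-memory model of CouchDB-style document revisions."""

    def __init__(self) -> None:
        self._docs: dict = {}

    @staticmethod
    def _next_rev(old_rev: Optional[str], body: dict) -> str:
        # Imitates the "N-hash" look of CouchDB revisions; not CouchDB's real algorithm.
        n = int(old_rev.split("-")[0]) + 1 if old_rev else 1
        digest = hashlib.md5(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()
        return f"{n}-{digest}"

    def put(self, doc_id: str, body: dict, rev: Optional[str] = None) -> str:
        current = self._docs.get(doc_id)
        current_rev = current["_rev"] if current else None
        # No lock is taken: the write succeeds only if it was based on the latest revision.
        if rev != current_rev:
            raise ConflictError(f"{doc_id}: current rev is {current_rev}, got {rev}")
        new_rev = self._next_rev(current_rev, body)
        self._docs[doc_id] = {**body, "_id": doc_id, "_rev": new_rev}
        return new_rev

    def get(self, doc_id: str) -> dict:
        return dict(self._docs[doc_id])


db = RevisionedStore()
rev1 = db.put("card-1", {"name": "Tractor"})
rev2 = db.put("card-1", {"name": "Tractor", "power": 3}, rev=rev1)
try:
    db.put("card-1", {"name": "Stale edit"}, rev=rev1)  # based on an old revision
except ConflictError as err:
    print("conflict:", err)
```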

Sources
