Social Networking Timeline

Weekly Cloud Computing Assignment, Spring 2019

A timeline page built on a Reddit dataset that shows trending posts from a user's followees, pulling user details from MySQL, friends from Neo4j, and posts from MongoDB.

Overview

Heterogeneous Backends:

In Database-as-a-Service (DBaaS), cloud providers offer packaged database services that users can configure and deploy, while the provider performs the traditional database administration and maintenance functions. The database can scale with load. DBaaS also provides better monitoring and failure resiliency than typical off-the-shelf database installations, and can hence improve application availability and manageability.

Neo4j - Graph Database:

An open-source, NoSQL, native graph database that provides an ACID-compliant transactional backend which efficiently implements the property graph model down to the storage level. The data is stored exactly as we draw a graph on a whiteboard, and the database uses pointers to navigate and traverse the graph. I used Neo4j to store the friends data from the Reddit dataset.

MongoDB - Document Store:

Document stores are schema-free; most columns or fields are optional and can be added or removed at any time. They are a subtype of key-value stores, but whereas the value may be opaque to a key-value store, fields in a document store can be indexed. I used MongoDB to store the posts data from the Reddit dataset.

MySQL - RDBMS:

Traditional RDBMS, used to store the login data from the Reddit dataset.

Architecture Overview:

Technologies used:

  • Java
  • Servlet
  • MySQL
  • MongoDB
  • Neo4j
  • MySQL Connector/J
  • JavaScript
  • Cypher Query Language
  • Maven
  • Terraform
  • GCP

Login with SQL

Cloud SQL is a fully managed database service on Google Cloud Platform that provides database infrastructure for applications running anywhere. It supports engines such as MySQL and PostgreSQL. Because authentication involves highly structured data and is normally handled by a relational database, I used MySQL for this part. When a user logs in to the website with a username and password, the backend server checks whether the pair matches a row in the SQL database table.
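The lookup described above could be sketched with plain JDBC as follows. This is a minimal illustration, not the project's actual code: the `users` table, its `username` and `password` columns, and the `LoginLookup` class are hypothetical names chosen for the example.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class LoginLookup {
    // Hypothetical table and column names; the real schema is not given in this README.
    static final String LOGIN_SQL =
        "SELECT 1 FROM users WHERE username = ? AND password = ? LIMIT 1";

    /** Returns true if some row matches the given username/password pair. */
    static boolean isValidLogin(Connection conn, String username, String password)
            throws SQLException {
        // Values are bound as parameters instead of being concatenated into the SQL text.
        try (PreparedStatement stmt = conn.prepareStatement(LOGIN_SQL)) {
            stmt.setString(1, username);
            stmt.setString(2, password);
            try (ResultSet rs = stmt.executeQuery()) {
                return rs.next(); // at least one row means the pair exists
            }
        }
    }
}
```

With MySQL Connector/J on the classpath, a `Connection` for this method would typically come from `DriverManager.getConnection` with the JDBC URL of the Cloud SQL instance.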

Social Graph using Neo4j

Designed to treat the relationships between data as equally important to the data itself, a graph database stores data much like a drawing that shows how each entity connects with or relates to others. Because relationships are stored as direct links between nodes, the execution time of a query is proportional only to the size of the part of the graph traversed, rather than the size of the overall graph.
Graphs are naturally additive, which means we can add new nodes, new labels, and new relationships to an existing structure without disturbing existing queries and application functionality.

I took advantage of Neo4j and used the property graph model to organize the data as nodes, relationships, and properties.
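To make the property graph model concrete, here is a small in-memory sketch in Java. It is not Neo4j's API and not the project's schema; the `User` label, the `FOLLOWS` relationship type, and the `username` property are assumptions for illustration. Each node holds its own outgoing relationships, so finding a user's followees touches only that user's edges.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PropertyGraphSketch {
    /** A node with a label and arbitrary key/value properties. */
    static class Node {
        final String label;
        final Map<String, Object> properties = new HashMap<>();
        // Outgoing relationships live on the node, so a traversal follows
        // references instead of scanning a global table.
        final List<Relationship> outgoing = new ArrayList<>();
        Node(String label) { this.label = label; }
    }

    /** A typed, directed relationship pointing at its end node. */
    static class Relationship {
        final String type;
        final Node end;
        Relationship(String type, Node end) { this.type = type; this.end = end; }
    }

    /** Creates a node labeled "User" (hypothetical label and property name). */
    static Node user(String username) {
        Node n = new Node("User");
        n.properties.put("username", username);
        return n;
    }

    static void follow(Node follower, Node followee) {
        follower.outgoing.add(new Relationship("FOLLOWS", followee));
    }

    /** Usernames the given user follows; the cost depends only on that user's edges. */
    static List<String> followees(Node user) {
        List<String> names = new ArrayList<>();
        for (Relationship r : user.outgoing) {
            if (r.type.equals("FOLLOWS")) {
                names.add((String) r.end.properties.get("username"));
            }
        }
        return names;
    }
}
```

Adding a new relationship type later (for example a hypothetical `BLOCKS`) would not affect `followees`, which mirrors the additive property described above.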

Homepage using MongoDB

MongoDB stores sparse and unstructured data in BSON format (a binary serialization of JSON-like documents) and thus supports storage of complex data types. The document-oriented model makes MongoDB easy to scale out across distributed servers.

I took advantage of MongoDB to build a database system to store all the comments on the social network.

I chose BSON for the following three reasons:

  • Space efficiency: BSON is designed to keep spatial overhead low, although for some documents it can take slightly more space than the equivalent plain JSON.
  • Traversability: BSON adds a small amount of overhead to the stored data to make it easier to scan. For example, a length prefix lets a reader skip over a document or string without searching for a terminating character.
  • Performance: BSON encoding and decoding are fast in most programming languages.
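The size header mentioned above can be seen in a hand-rolled encoder for the simplest case, a document with a single string field. This sketch follows the layout in the BSON specification and uses only the Java standard library; it is for illustration and is not how the MongoDB driver is used in practice. The class and method names are made up for the example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

class BsonSketch {
    /** Encodes {key: value} where value is a string, using the BSON byte layout. */
    static byte[] encodeSingleString(String key, String value) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        byte[] v = value.getBytes(StandardCharsets.UTF_8);
        // int32 document size + type byte + key + NUL
        // + int32 string length + value + NUL + document terminator
        int total = 4 + 1 + k.length + 1 + 4 + v.length + 1 + 1;
        ByteBuffer buf = ByteBuffer.allocate(total).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(total);          // size header for the whole document
        buf.put((byte) 0x02);       // element type 0x02 = UTF-8 string
        buf.put(k).put((byte) 0);   // field name as a NUL-terminated cstring
        buf.putInt(v.length + 1);   // string length, counting the trailing NUL
        buf.put(v).put((byte) 0);
        buf.put((byte) 0);          // end of document
        return buf.array();
    }

    /** Reads the size header, which lets a reader skip the document without parsing it. */
    static int documentSize(byte[] bson) {
        return ByteBuffer.wrap(bson).order(ByteOrder.LITTLE_ENDIAN).getInt(0);
    }
}
```

For the key `hello` and the value `world`, the encoded document is 22 bytes long and its first four bytes store the number 22 in little-endian order.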

Putting Everything Together

The timeline query takes a single username and returns that user's personal information together with the 30 most popular comments from the user's followees. The response contains the user's UserName, Profile Image URL, and list of Followers, along with the 30 most popular comments from the user's followees and the parent and grandparent comments of each of them.
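The shape of this fan-out can be sketched with in-memory collections standing in for the three databases. This is only an illustration of the selection logic, not the actual implementation: the `Comment` fields (`id`, `author`, `parentId`, `score`) and the use of a score to mean popularity are assumptions.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

class TimelineSketch {
    /** Hypothetical comment shape; field names are assumptions, not the dataset's. */
    static class Comment {
        final String id;
        final String author;
        final String parentId; // null for a top-level comment
        final int score;
        Comment(String id, String author, String parentId, int score) {
            this.id = id;
            this.author = author;
            this.parentId = parentId;
            this.score = score;
        }
    }

    /** The `limit` highest-scoring comments written by any of the followees. */
    static List<Comment> topComments(Collection<Comment> all, Set<String> followees, int limit) {
        return all.stream()
            .filter(c -> followees.contains(c.author))
            .sorted(Comparator.comparingInt((Comment c) -> c.score).reversed())
            .limit(limit)
            .collect(Collectors.toList());
    }

    /** Returns [comment, parent, grandparent], stopping early if an ancestor is missing. */
    static List<Comment> withAncestors(Comment comment, Map<String, Comment> byId) {
        List<Comment> chain = new ArrayList<>();
        Comment current = comment;
        for (int depth = 0; depth < 3 && current != null; depth++) {
            chain.add(current);
            current = current.parentId == null ? null : byId.get(current.parentId);
        }
        return chain;
    }
}
```

In the real service, the followee set would come from the graph database, the comments and their ancestors from the document store, and the profile fields from the relational database; calling `topComments` with a limit of 30 corresponds to the query described above.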

Learnings

Compared the advantages and disadvantages of SQL and NoSQL databases and their suitable application domains.

Configured, populated, and deployed several heterogeneous SQL and NoSQL databases in a social network web service context.

Designed effective database schema based on the requirements of an application.

Practiced writing efficient database queries using the Java API.

Integrated SQL and NoSQL databases to build a complex social networking web application, and practiced writing complex fan-out queries that span multiple databases.

Explored whether a raw dataset requires an extraction, transformation, and loading (ETL) step before being loaded into a database.

Practiced using Terraform to orchestrate and manage a complex cloud architecture with heterogeneous database backends.

More Details

Academic integrity rules prevent me from going into detail about the implementation and my approach to the project. If you would like to learn more, please contact me.
I'm always happy to chat.