Vj

Series 1 Part 6 - Big Data Formats: Structured, Unstructured, and Semi-structured

Updated: Jan 25, 2020

#BigData comes from sources in multiple formats, including #Structured, #SemiStructured, #QuasiStructured, and #UnStructured data.


Note: If you have not read the previous post in this series, here is the link: Series 1 Part 5 https://www.abigdatablog.com/post/series-1-part-5-big-data-the-v-s-volume-velocity-variety












An introduction to source data formats in Big Data:

In the #BigData world, data comes in many formats, and Big Data platforms can handle more than traditional structured data. Sources such as text, streaming, audio, video, and #IoT have changed the nature of the data that businesses collect today.

The following are the main source data formats in a Big Data environment:


Structured Data:

Data that can be stored and processed in a fixed format is called structured data. Data stored in a relational database management system (RDBMS) is one example. Structured data is easy to process because it has a fixed schema, and Structured Query Language (SQL) is often used to manage it.
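To make the idea of a fixed schema concrete, here is a minimal sketch using Python's built-in sqlite3 module. The customers table, its columns, and the sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database; the "customers" table and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
    [(1, "Asha", "Pune"), (2, "Ben", "Austin"), (3, "Chen", "Pune")],
)


def customers_in(city):
    """Return the names of customers in the given city, ordered by id."""
    rows = conn.execute(
        "SELECT name FROM customers WHERE city = ? ORDER BY id", (city,)
    )
    return [name for (name,) in rows]


print(customers_in("Pune"))
```

Because every row has the same columns and types, a single SQL query can filter and sort the data without inspecting each record first.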

Semi-Structured Data:

Semi-structured data does not conform to a formal data model such as a table definition in a relational #DBMS. Nevertheless, it has some organizational properties, such as tags and other markers that separate semantic elements, which make it easier to analyze. #XML files and #JSON documents are examples of semi-structured data.
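As a rough illustration of semi-structured data, the sketch below parses a small JSON document with Python's built-in json module. The records and field names are made up; the point is that the keys act as markers, yet each record may carry a different set of fields.

```python
import json

# Hypothetical records: every record is tagged with keys,
# but no two records share exactly the same set of fields.
raw = """
[
  {"id": 1, "name": "Asha", "email": "asha@example.com"},
  {"id": 2, "name": "Ben", "phones": ["555-0100", "555-0101"]},
  {"id": 3, "name": "Chen", "address": {"city": "Pune"}}
]
"""

records = json.loads(raw)


def field_names(recs):
    """Collect every key used across the records, sorted alphabetically."""
    return sorted({key for rec in recs for key in rec})


def city_of(rec):
    """Return the nested city if present, otherwise None."""
    return rec.get("address", {}).get("city")


print(field_names(records))
print([city_of(rec) for rec in records])
```

Unlike a table row, a record here can omit a field or nest another object, so the code has to tolerate missing keys rather than rely on a fixed schema.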

Unstructured Data:

Data that has no predefined form, does not fit naturally into #RDBMS tables, and usually has to be transformed into a structured format before it can be analyzed is called unstructured data. Text files and multimedia content such as images, audio, and video are examples. Unstructured data is growing faster than the other formats; a commonly cited estimate is that about 80 percent of the data in an organization is unstructured.
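The transformation step can be sketched with a small example: pulling structured values out of free text using Python's re module and collections.Counter. The sample note and the date pattern are assumptions for illustration; real pipelines for text, audio, or video are far more involved.

```python
import re
from collections import Counter

# A hypothetical free-text support note with no schema at all.
note = (
    "Customer called on 2020-01-25 about a late delivery. "
    "Follow-up promised by 2020-01-28. Late again!"
)


def extract_dates(text):
    """Find ISO-style dates (YYYY-MM-DD) in free text."""
    return re.findall(r"\b\d{4}-\d{2}-\d{2}\b", text)


def word_counts(text):
    """Count lowercase alphabetic words in the text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


print(extract_dates(note))
print(word_counts(note).most_common(3))
```

Once dates and word counts are extracted, they can be stored in columns and queried like any other structured data.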

Conclusion:

Big Data sources deliver data that is partially structured and, more often, unstructured, because the sources are XML, JSON, text, audio, video, and only sometimes an RDBMS. The nature of these sources makes Big Data challenging to collect, process, and store.
