Series 1 Part 6 - Big Data; Formats Structured, Unstructured, and Semi-structured

Vj
Jan 18, 2020

Updated: Jan 25, 2020

#BigData arrives in multiple source formats, including #Structured, #SemiStructured, #QuasiStructured, and #UnStructured data.

Note: If you have not read the previous post in this series, click the link: Series 1 Part 5 https://www.abigdatablog.com/post/series-1-part-5-big-data-the-v-s-volume-velocity-variety

The introduction to source data formats in Big Data:

In the #BigData world, data comes in different formats, and Big Data platforms can handle more than the traditional structured formats. Data sources such as text, streaming, audio, video, and #IoT have changed the nature of the data collected by today's businesses.

The following are the main data source formats in a Big Data environment:

Structured Data:

Data that can be stored and processed in a fixed format is called structured data. Data stored in a relational database management system (RDBMS) is one example of structured data. Structured data is easy to process because it has a fixed schema. Structured Query Language (SQL) is often used to manage this kind of data.
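To make the fixed-schema idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `orders` table, its columns, and the sample rows are hypothetical and chosen only for illustration; any relational database would behave similarly.

```python
import sqlite3


def total_by_customer(rows):
    """Load (customer, amount) rows into a fixed-schema table and aggregate with SQL."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        # The schema is declared up front: every row must fit these two columns.
        conn.execute(
            "CREATE TABLE orders (customer TEXT NOT NULL, amount REAL NOT NULL)"
        )
        conn.executemany(
            "INSERT INTO orders (customer, amount) VALUES (?, ?)", rows
        )
        cur = conn.execute(
            "SELECT customer, SUM(amount) FROM orders "
            "GROUP BY customer ORDER BY customer"
        )
        return cur.fetchall()
    finally:
        conn.close()


# Hypothetical sample data
sample_orders = [("alice", 10.0), ("bob", 5.5), ("alice", 2.5)]
print(total_by_customer(sample_orders))
```

Because every row shares the same columns, a single SQL query can group and sum the data without inspecting individual records first.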

Semi-Structured Data:

Semi-structured data does not follow the formal structure of a data model, such as a table definition in a relational #DBMS. Nevertheless, it has organizational properties, such as tags and other markers that separate semantic elements, which make it easier to analyze. #XML files and #JSON documents are examples of semi-structured data.
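As a rough illustration of how keys act as markers without a table definition, the sketch below parses a small JSON document with Python's standard json module. The sensor records and the `flatten` helper are hypothetical; note that the two records do not share the same fields.

```python
import json

# Hypothetical JSON payload: each record carries its own keys, no fixed schema.
raw = """
[
  {"id": 1, "name": "sensor-a", "tags": ["iot", "temp"]},
  {"id": 2, "name": "sensor-b", "location": {"city": "Austin"}}
]
"""


def flatten(record, prefix=""):
    """Flatten nested dicts into dotted keys; lists are kept as plain values."""
    flat = {}
    for key, value in record.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{path}."))
        else:
            flat[path] = value
    return flat


records = [flatten(r) for r in json.loads(raw)]

# The set of fields is discovered from the data instead of declared up front.
all_fields = sorted({key for r in records for key in r})
print(all_fields)
```

The keys make each value self-describing, so the data can be analyzed, but the reader has to cope with fields that appear in some records and not in others.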

Unstructured Data:

Data that has no predefined form, does not fit naturally into #RDBMS tables, and is hard to analyze until it is transformed into a structured format is called unstructured data. Text files and multimedia content such as images, audio, and video are examples of unstructured data. Unstructured data is growing faster than the other formats; a commonly cited estimate is that 80 percent of the data in an organization is unstructured.
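One simple way to picture the transformation into a structured format is counting terms in free text. The sketch below is only an assumption about what such a step might look like, using a naive tokenizer and a hypothetical customer note; real pipelines typically use far richer text processing.

```python
import re
from collections import Counter


def term_frequencies(text):
    """Turn free text into a structured term -> count mapping."""
    # Naive tokenizer: lowercase runs of letters, digits, and apostrophes.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(tokens)


# Hypothetical free-text note with no fields or markers
note = "Customer called about a late delivery. Delivery was late twice this month."
freq = term_frequencies(note)
print(freq.most_common(3))
```

The resulting counts form rows of (term, count) that could be loaded into a table and queried, which the raw text alone does not allow.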

Conclusion:

Big Data sources deliver data that is partially structured and usually unstructured, because the sources are XML, JSON, text, audio, video, and sometimes an RDBMS. The nature of these sources makes Big Data challenging to collect, process, and store.
