#BigData comes in multiple source formats, including #Structured, #SemiStructured, #QuasiStructured, and #UnStructured data.
Note: If you have not read the previous post in this series, click the link: Series 1 Part 5 https://www.abigdatablog.com/post/series-1-part-5-big-data-the-v-s-volume-velocity-variety
An introduction to source data formats in Big Data:
The following are the main source data formats found in a Big Data environment:
Structured Data:
Data that can be stored and processed in a fixed format is called structured data. Data stored in a relational database management system (RDBMS) is one example of structured data. Structured data is easy to process because it has a fixed schema. Structured Query Language (SQL) is often used to manage this kind of data.
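As a minimal sketch of what a fixed schema looks like in practice, the snippet below uses Python's built-in sqlite3 module with an in-memory database. The customers table, its columns, and the sample rows are made up purely for illustration.

```python
import sqlite3

# In-memory database; the table, columns, and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")

# The schema is fixed up front: every row must have these three columns.
conn.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
    [(1, "Asha", "Pune"), (2, "Ben", "Austin"), (3, "Chen", "Pune")],
)

# Because the structure is known, SQL can filter and sort directly.
rows = conn.execute(
    "SELECT name FROM customers WHERE city = ? ORDER BY name", ("Pune",)
).fetchall()
names = [row[0] for row in rows]
print(names)

conn.close()
```

Since every row follows the same schema, the query does not need to check whether a column exists before using it.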
Semi-Structured Data:
Semi-structured data is a type of data that does not follow the formal structure of a data model, e.g., a table definition in a relational #DBMS. Nevertheless, it has some organizational properties, such as tags and other markers that separate semantic elements, which make it easier to analyze. #XML files and #JSON documents are examples of semi-structured data.
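To illustrate, here is a small sketch using Python's built-in json module. The two records are invented for this example; note that they are labeled by keys but do not share the same set of fields, which a relational table definition would not allow.

```python
import json

# Hypothetical JSON document: each record is tagged with keys,
# but the records do not share an identical set of fields.
raw = """
[
  {"id": 1, "name": "Asha", "email": "asha@example.com"},
  {"id": 2, "name": "Ben", "phones": ["555-0100", "555-0101"]}
]
"""

records = json.loads(raw)

# The keys act as markers that separate semantic elements.
all_fields = sorted({key for record in records for key in record})

# A missing field is handled per record rather than enforced by a schema.
emails = [record.get("email") for record in records]

print(all_fields)
print(emails)
```

The keys make each value easy to locate, but the code still has to allow for fields that are absent from some records.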
Unstructured Data:
Data that has an unknown form, cannot easily be stored in an #RDBMS, and cannot be analyzed unless it is transformed into a structured format is called unstructured data. Text files and multimedia content such as images, audio, and video are examples of unstructured data. Unstructured data is growing faster than the other types; experts say that 80 percent of the data in an organization is unstructured.
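As a rough sketch of what "transforming into a structured format" can mean for text, the snippet below pulls order numbers and dates out of a free-form note with regular expressions. The note and the patterns are assumptions made for illustration; real unstructured data usually needs far more involved processing.

```python
import re

# Hypothetical free-form note with no schema, tags, or keys.
text = (
    "Order 1042 shipped to Pune on 2021-03-05. "
    "Customer called twice; order 1043 is delayed until 2021-03-09."
)

# Illustrative patterns: an order number after the word "order",
# and dates written as YYYY-MM-DD.
order_pattern = re.compile(r"[Oo]rder (\d+)")
date_pattern = re.compile(r"\d{4}-\d{2}-\d{2}")

# The extracted fields form a structured record that could be
# loaded into a table and queried.
extracted = {
    "order_ids": [int(match) for match in order_pattern.findall(text)],
    "dates": date_pattern.findall(text),
}
print(extracted)
```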
Conclusion:
Big Data is sourced from data that is partially structured and, more often, unstructured, because the sources include XML, JSON, text, audio, video, and sometimes an RDBMS. The nature of these sources makes Big Data challenging to collect, process, and store.