To consider what semi-structured data is, let's start with an analogy -- interviewing.
Let's say you're conducting a semi-structured interview. This, as the name implies, falls somewhere in between a structured and an unstructured interview.
For context, a structured interview is one in which the questions being asked, as well as the order in which they are asked, are pre-determined by your HR team and consistent for each candidate. An unstructured interview, on the other hand, is one in which the questions, and the order in which they are asked, are left to the discretion of the interviewer -- and could be entirely different for each candidate.
When you consider these two extremes, you can begin to see the benefits of semi-structured interviews, which are fairly consistent and quantitative (like a structured interview), but still give the interviewer room to build rapport and ask follow-up questions.
Semi-structured data is similar in nature to a semi-structured interview -- it's not as messy and uncontrolled as unstructured data, but not as rigid and readily quantifiable as structured data.
What Is Semi-Structured Data?
Semi-structured data is information that does not reside in a relational database or any other data table, but nonetheless has some organizational properties, such as semantic tags, that make it easier to analyze. A good example of semi-structured data is HTML code, which doesn't restrict the amount of information you want to collect in a document, but still enforces hierarchy via semantic elements.
Here, we're going to explore the difference between structured, semi-structured, and unstructured data to ensure you have a good understanding of the terms.
Structured, Semi-Structured, and Unstructured Data
Structured data, also known as quantitative data, consists of objective facts and numbers that analytics software can collect. This type of data is easy to export, store, and organize in a spreadsheet such as Excel or in a SQL database. Structured data is valuable because you can gain insights into overarching trends by running it through data analysis methods, such as regression analysis and pivot tables.
Here's an example of structured data in an Excel sheet:
In contrast, semi-structured data does not fit neatly into the rows and columns of a spreadsheet or relational database, but nonetheless contains some level of organization through semantic elements like tags. For instance, consider HTML, which does not restrict the amount of information you can collect in a document, but enforces a certain hierarchy:
This is a good example of semi-structured data. As you can see, HTML is organized through tags, but its content is not easily extracted into a database table, and you can't apply traditional data analytics methods to it directly to gain insights.
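To make that hierarchy concrete, here is a minimal sketch that uses Python's standard-library `html.parser` to print the nesting of tags in a small document. The sample markup and the `OutlineParser` and `outline_of` names are invented for illustration, and the sketch assumes well-formed HTML in which every opening tag has a matching closing tag.

```python
from html.parser import HTMLParser

# Hypothetical HTML snippet, invented for illustration.
SAMPLE_HTML = """
<html>
  <body>
    <h1>Quarterly Report</h1>
    <p>Revenue grew this quarter.</p>
    <ul>
      <li>North</li>
      <li>South</li>
    </ul>
  </body>
</html>
"""


class OutlineParser(HTMLParser):
    """Records each opening tag together with its nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.outline = []  # list of (depth, tag) pairs

    def handle_starttag(self, tag, attrs):
        self.outline.append((self.depth, tag))
        self.depth += 1

    def handle_endtag(self, tag):
        # Assumes every opening tag is explicitly closed.
        self.depth -= 1


def outline_of(html):
    """Return the (depth, tag) outline of an HTML string."""
    parser = OutlineParser()
    parser.feed(html)
    return parser.outline


if __name__ == "__main__":
    # Print the tag hierarchy, indented by nesting depth.
    for depth, tag in outline_of(SAMPLE_HTML):
        print("  " * depth + tag)
```

The tags give the document a shape a program can walk, but nothing limits how much text sits inside each element or how many elements appear, which is why the content does not map directly onto fixed rows and columns.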
Finally, there's unstructured data, otherwise known as qualitative data. When it comes to marketing, unstructured data is any opinion or comment you might collect about your brand. While what your consumers are saying is undeniably important, you can't easily extract meaningful analytical data from those messages.
Examples of unstructured data include email responses, like this one:
Take a look at Unstructured Data Vs. Structured Data: A 3-Minute Rundown for more clarification on structured vs. unstructured data.
Semi-Structured Data Examples
- CSV, XML and JSON documents
- NoSQL databases
- HTML
- Electronic data interchange (EDI)
- RDF
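As a rough illustration of why formats like JSON count as semi-structured, the sketch below loads a few records whose fields differ from one record to the next, then pads them into uniform rows. The records and the `to_rows` helper are hypothetical, invented for this example. This is one simple way to flatten such data, not a standard method.

```python
import json

# Hypothetical customer records, invented for illustration.
# Each record is tagged with field names, but no two share the same fields.
RAW = """
[
  {"name": "Ana", "email": "ana@example.com"},
  {"name": "Ben", "phone": "555-0100", "tags": ["vip", "newsletter"]},
  {"name": "Cy", "email": "cy@example.com", "address": {"city": "Boston"}}
]
"""


def to_rows(records):
    """Pad records with differing keys into rows that share one column list."""
    # Collect every top-level key seen in any record, in first-seen order.
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    # Fill missing fields with None so every row has the same shape.
    rows = [[record.get(col) for col in columns] for record in records]
    return columns, rows


records = json.loads(RAW)
columns, rows = to_rows(records)

if __name__ == "__main__":
    print(columns)
    for row in rows:
        print(row)
```

The field names act as the semantic tags, so the data is easier to work with than free text. Even so, the nested `address` object and the `tags` list still don't fit a single table cell cleanly, which is the gap between semi-structured and structured data.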