Let me take a swipe at it: a semantic layer helps express queries and their results in terms that end consumers care about and prefer to reason in, instead of whatever extremely correct and efficient atrocities the database nerds came up with.
Did I get that right?
> A semantic layer, also known as a metrics layer, lies between business users and the database, and lets those users compose queries in the concepts that they understand. It also governs access to the data, manages data transformations, and can tune the database by defining materializations.
There's also now a paper: https://arxiv.org/pdf/2406.00251
> There's a lot of information out there, including from myself, about the history and rise [1] (2022), comparing it to an MVC-like approach [2], or explaining its capabilities [3]. That's why in this article I focus on the why and showcase how to use it in a practical example in the next chapter.
[1] https://www.ssp.sh/blog/rise-of-semantic-layer-metrics/
[2] https://cube.dev/blog/exploring-the-semantic-layer-through-t...
[3] https://cube.dev/blog/universal-semantic-layer-capabilities-...
My one-line definition that I use atm:
> A semantic layer acts as an intermediary, translating complex data into understandable user business concepts. It bridges the gap between raw data in databases (such as sales data with various attributes) and actionable insights (such as revenue per store or popular brands). This layer helps business users access and interpret data using familiar terms without needing deep technical knowledge. https://www.ssp.sh/brain/semantic-layer#semantic-layer-defin...
Edit: I'm the OP.
So I took a step back and tried to think about why one "feels" to a reader more like a definition than the other. I think it comes down to phrasing more than informational content. The definition you provide in your comment comes off, for lack of a better term, too much like a sales pitch.
Less is more when it comes to definitions, at least for defining terms in articles/blog posts like these.
Here's my attempt at a better (for this use case) definition:
A semantic layer is an interface to data stores that is designed to be queryable in terms relevant and familiar to those with knowledge of the business domain.
> A semantic layer is an interface to data stores that is designed to be queryable in terms relevant and familiar to those with knowledge of the business domain.
Sounds good to me, but I think it's too simplified. A semantic layer, IMO, does more. See Julian Hyde's definition, which is similar to mine but more involved:
> A semantic layer, also known as a metrics layer, lies between business users and the database, and lets those users compose queries in the concepts that they understand. It also governs access to the data, manages data transformations, and can tune the database by defining materializations.
> Like many new ideas, the semantic layer is a distillation and evolution of many old ideas, such as query languages, multidimensional OLAP, and query federation.
I appreciated your feedback. Will think a little more about it.
Defining a car as "a vehicular conveyance that helps people get from A to B" is similarly technically correct, but provides little help to the reader in determining if the thing they're looking at is a car or not.
Pivoting a decent-sized BI shop toward using one instead of splashing the same SQL all over the place is *tough*. It's one of those: "the analyst could have been building important report for director and you want them to create re-usable logic??? we'll do that later, get report done now. Just copy/paste that SQL over here"
This is how you end up with the 1000-model, "the numbers don't match up", hot-mess situations that gain momentum and are hard to slow down.
Right now a lot of semantic tools introduce a big discontinuity in both workflows that keeps the two worlds separate.
We took the same approach when we started https://www.definite.app/.
Limiting support to DuckDB only would make some really useful features trivial to implement. For example, DuckDB has a `json_serialize_sql` function, which parses a SQL statement and returns its syntax tree as JSON; that would handle a lot of the tedious parts of building a semantic layer.
It's purely meant to run SQL transformations in DuckDB in a reliable way, with data lineage.
A semantic layer is about decomposing views into dimensions and aggregates, then letting downstream apps/users compose their own views on top without having to redefine or recalculate business-level metrics.
This makes data analysis more flexible than SQL views, which are hard-coded to particular groupings.
  create view active_cx as
  select *
  from customer
  join audit_events using (...)
  join ...
  where -- active condition

  -- use active_cx wherever
  select ...
  from orders
  join active_cx using (...)
  where ts > start_of_month()
  group by active_cx.id
A semantic layer needs proper language and tooling support, which Malloy provides.
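The difference between a view fixed to one grouping and reusable dimension/measure definitions can be sketched with Python's standard-library `sqlite3` module. This is a toy illustration, not how Malloy or any other semantic-layer product works: the `orders`/`customer` schema, the `active` flag standing in for the audit-event condition, and the `compile_query` helper are all hypothetical. Dimensions and measures are defined once, and each request chooses its own grouping, so the same `revenue` definition serves both a per-region and a per-month report.

```python
import sqlite3

# Hypothetical semantic model: every table, column and metric name is illustrative.
# The `active` flag stands in for whatever "active customer" logic a real model uses.
BASE = "FROM orders o JOIN customer c ON c.id = o.customer_id"
DIMENSIONS = {
    "region": "c.region",
    "month": "strftime('%Y-%m', o.ts)",
}
MEASURES = {
    "revenue": "SUM(o.amount)",
    "order_count": "COUNT(*)",
    # Distinct active customers among those with orders in the group.
    "active_customers": "COUNT(DISTINCT CASE WHEN c.active = 1 THEN c.id END)",
}


def compile_query(dimensions, measures):
    """Generate SQL for any grouping from the shared definitions."""
    select = [f"{DIMENSIONS[d]} AS {d}" for d in dimensions]
    select += [f"{MEASURES[m]} AS {m}" for m in measures]
    sql = f"SELECT {', '.join(select)} {BASE}"
    if dimensions:
        sql += " GROUP BY " + ", ".join(DIMENSIONS[d] for d in dimensions)
        sql += " ORDER BY " + ", ".join(dimensions)
    return sql


# Tiny in-memory sample data so the example runs end to end.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, region TEXT, active INTEGER);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, ts TEXT, amount REAL);
    INSERT INTO customer VALUES (1, 'EU', 1), (2, 'EU', 0), (3, 'US', 1);
    INSERT INTO orders VALUES
        (1, 1, '2024-01-05', 10.0), (2, 2, '2024-01-20', 5.0),
        (3, 3, '2024-02-03', 7.5), (4, 1, '2024-02-10', 2.5);
""")

# Same `revenue` definition, two different groupings, no view per grouping.
by_region = conn.execute(compile_query(["region"], ["revenue", "active_customers"])).fetchall()
by_month = conn.execute(compile_query(["month"], ["revenue", "order_count"])).fetchall()
print(by_region)  # [('EU', 17.5, 1), ('US', 7.5, 1)]
print(by_month)   # [('2024-01', 15.0, 2), ('2024-02', 10.0, 2)]
```

Only the definitions live in one place here; a real semantic layer would also have to handle join paths, access control and materializations, which this sketch ignores.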
I curate more of these here, in case you're interested: https://www.ssp.sh/brain/data-modeling-languages
Shameless plug for the list, though: I work on https://github.com/trilogy-data/pytrilogy, a semantic layer embedded directly in otherwise (mostly) SQL syntax.
I'll do an equivalent example on the taxi dataset when I have some time.
As one of the consumers of a "semantic layer" for many years now, I am firmly convinced that a "single source of truth" must either be useless or a lie.
Ok, the DBA has produced some joins that I can count up to decide how many "customers" we have. We immediately have the issue that a "customer count" from the semantic layer cannot always be the meaningful or relevant figure. In my experience, outside of the explicit context it was written in, it cannot be the correct figure. So I have my single-source-of-truth customer count, but my revenue per customer needs to use a different count that's slightly off. Another analyst needs to report on customer calls to our call center, and that uses a slightly different definition. And so on, until the semantic layer is just a special database for predefined executive KPI dashboards and nothing more.
This is a good write-up that doesn’t require DuckDB, as it isn’t specific to a particular database.
Feature stores explored here: https://www.xorq.dev/blog/featurestore-to-featurehouse
I think my key takeaway from building this is that we need better expression systems, and Ibis is a great foundation to build yours on. Maybe you want to build a language for some other domain, etc.
PS: I am one of the authors of bsl and co-founder of Xorq.
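A toy sketch of what an "expression system" means here, written in plain Python rather than Ibis: operators on column objects build a deferred expression tree, and a separate compile step turns that tree into SQL. Ibis works on a similar deferred-expression principle with far more depth, but this is not its API; `Expr`, `Column`, `BinOp`, `Agg`, `to_select` and the `orders` columns are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


class Expr:
    """Base node: Python operators build a tree instead of computing a value."""

    def __add__(self, other):
        return BinOp("+", self, lit(other))

    def __mul__(self, other):
        return BinOp("*", self, lit(other))

    def __gt__(self, other):
        return BinOp(">", self, lit(other))

    def sum(self):
        return Agg("SUM", self)

    def to_sql(self) -> str:
        raise NotImplementedError


def lit(value):
    """Wrap plain Python values so they can appear inside the tree."""
    return value if isinstance(value, Expr) else Literal(value)


@dataclass(eq=False)
class Column(Expr):
    col: str

    def to_sql(self):
        return self.col


@dataclass(eq=False)
class Literal(Expr):
    value: object  # only str, int and float are handled in this sketch

    def to_sql(self):
        if isinstance(self.value, str):
            return "'" + self.value.replace("'", "''") + "'"
        return repr(self.value)


@dataclass(eq=False)
class BinOp(Expr):
    op: str
    left: Expr
    right: Expr

    def to_sql(self):
        return f"({self.left.to_sql()} {self.op} {self.right.to_sql()})"


@dataclass(eq=False)
class Agg(Expr):
    func: str
    arg: Expr

    def to_sql(self):
        return f"{self.func}({self.arg.to_sql()})"


def to_select(table, metrics, by, where=None):
    """Compile named metric expressions, grouped by columns, into one SELECT."""
    cols = [c.to_sql() for c in by] + [f"{e.to_sql()} AS {name}" for name, e in metrics.items()]
    sql = f"SELECT {', '.join(cols)} FROM {table}"
    if where is not None:
        sql += f" WHERE {where.to_sql()}"
    if by:
        sql += " GROUP BY " + ", ".join(c.to_sql() for c in by)
    return sql


price, qty, region = Column("price"), Column("qty"), Column("region")
revenue = (price * qty).sum()  # a metric is just a reusable value
print(to_select("orders", {"revenue": revenue}, by=[region], where=qty > 0))
# SELECT region, SUM((price * qty)) AS revenue FROM orders WHERE (qty > 0) GROUP BY region
```

Because `revenue` is an ordinary Python object, it can be passed to any other query or grouping without rewriting its SQL, which is the property a semantic layer needs from its expression system.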
I am one of the authors of bsl and founder of xorq.