I’m quite looking forward to a future where we’ve finally accepted that all this stuff is just part of the domain and shouldn’t be treated like an ugly stepchild, and we’ve merged OLTP and OLAP with great performance for both, and the wolf also shall dwell with the lamb, and we’ll all get lots of work done.
Wide events are good, but watch out that they don't become "god events": the event that every service needs to ingest, so that whenever a service needs new data, we just add it onto the god event because, conveniently, it's already being ingested. Before too long, the query that generates the wide event is so complex it's setting the DB on fire. Like anything, there are trade-offs; there are practical limits to how wide an event should reasonably become.
Maybe I’m missing something, but this doesn’t seem like what the article is talking about at all. These events are just telemetry — they’re downstream from everything, and no service is ingesting them or relying on them for actual operational data.
I wonder if there are any semi-automated approaches to finding outliers or “things worth investigating” in these traces, or is it just eyeballs all the way down?
This is possible via semi-automatic detection of anomalies over time for some preset of fields used for grouping the events (aka dimensions) and another preset of fields used in stats calculations (aka metrics). In the general case this is a hard task, since it is impossible to check for anomalies across all the possible combinations of dimensions and metrics for wide events with hundreds of fields.
This is also complicated by the possibility of applying various filters to the events before and after the stats calculations.
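As a rough sketch of the dimensions/metrics idea in Go: build a per-dimension baseline from past events, then flag current events whose metric deviates strongly from that baseline. The field names ("service" as the dimension, "duration_ms" as the metric) are hypothetical stand-ins for whatever presets you choose; real systems would use rolling time windows and more robust statistics.

```go
// Toy "anomalies over time" scan over wide events: compute mean and
// standard deviation of one metric per dimension value from past events,
// then flag current events that are more than 3 sigmas from the baseline.
package main

import (
	"fmt"
	"math"
)

type Event map[string]any

type baseline struct{ mean, sd float64 }

// buildBaselines groups the metric by the dimension and computes the mean
// and (population) standard deviation for each group.
func buildBaselines(events []Event, dim, metric string) map[string]baseline {
	vals := map[string][]float64{}
	for _, e := range events {
		key := e[dim].(string)
		vals[key] = append(vals[key], e[metric].(float64))
	}
	out := map[string]baseline{}
	for k, vs := range vals {
		var mean, varSum float64
		for _, v := range vs {
			mean += v
		}
		mean /= float64(len(vs))
		for _, v := range vs {
			varSum += (v - mean) * (v - mean)
		}
		out[k] = baseline{mean, math.Sqrt(varSum / float64(len(vs)))}
	}
	return out
}

func main() {
	past := []Event{
		{"service": "api", "duration_ms": 12.0},
		{"service": "api", "duration_ms": 15.0},
		{"service": "billing", "duration_ms": 14.0},
		{"service": "billing", "duration_ms": 16.0},
		{"service": "billing", "duration_ms": 15.0},
		{"service": "billing", "duration_ms": 13.0},
	}
	current := []Event{
		{"service": "api", "duration_ms": 14.0},
		{"service": "billing", "duration_ms": 950.0}, // should be flagged
	}

	base := buildBaselines(past, "service", "duration_ms")
	for _, e := range current {
		b := base[e["service"].(string)]
		v := e["duration_ms"].(float64)
		if b.sd > 0 && math.Abs(v-b.mean)/b.sd > 3 {
			fmt.Printf("anomaly: service=%v duration_ms=%.0f (baseline mean %.1f)\n",
				e["service"], v, b.mean)
		}
	}
}
```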
That seems like a good use case for AI: it's trivial to have it suggest some queries and test whether they give interesting results.
Wide events are a great concept for the observability space! They are a superset of structured logs and traces. Wide events are basically structured logs where every log entry contains hundreds of fields with various properties of the logged event. This allows slicing and dicing the collected events by arbitrary subsets of their fields, which opens infinite possibilities for obtaining useful analytics from the collected events.
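For a concrete picture, a minimal sketch of what a single wide event might look like when emitted as one JSON line; every field name here is hypothetical, and a real event would carry far more of them:

```go
// One hypothetical wide event: a single log entry carrying everything
// known about one unit of work. Real events often have hundreds of fields.
package main

import (
	"encoding/json"
	"os"
)

func main() {
	event := map[string]any{
		"timestamp":       "2024-06-01T12:34:56.789Z",
		"trace_id":        "4bf92f3577b34da6",
		"service":         "checkout",
		"http.method":     "POST",
		"http.path":       "/cart/confirm",
		"http.status":     200,
		"duration_ms":     187,
		"user.id":         "u-3921",
		"user.plan":       "pro",
		"db.query_count":  7,
		"cache.hit_ratio": 0.83,
		"region":          "eu-west-1",
		// ...and as many more fields as you can cheaply attach.
	}
	// Emit the event as a single JSON line, ready for slicing and dicing.
	json.NewEncoder(os.Stdout).Encode(event)
}
```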
Wide events can be stored in traditional databases. But this approach has a few drawbacks:
- Every wide event can have a different set of fields. Such fields cannot be mapped to classical relational table columns, since the full set of potential fields that can appear in wide events isn't known beforehand.
- The number of fields in wide events is usually quite big - from tens to a few hundred. If we store them in a traditional relational table, it ends up with hundreds of columns, and such tables aren't processed efficiently by traditional databases.
- Typical queries over wide events usually refer to only a few fields out of the hundreds available. Traditional databases usually store every row in a table as a contiguous chunk of data containing the values for all the fields of the row (aka row-based storage). Such a scheme is very inefficient when a query needs to process only a few fields, since the database has to read all the hundreds of fields for each row and then extract the needed few.
It is much better to use analytical databases such as ClickHouse for storing and processing big volumes of wide events. Such databases store the values of each field in contiguous data chunks (aka column-oriented storage). This allows reading and processing only the few fields mentioned in the query, while skipping the rest of the hundreds of fields. It also allows efficiently compressing field values, which reduces storage space usage and improves performance for queries limited by disk read speed.
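As a toy illustration of that read-path difference (not how any real database is implemented): summing one field over row-wise storage means materializing whole records, while column-wise storage touches only the one slice the query needs.

```go
// Toy contrast of row-based vs column-oriented layouts for the same events.
package main

import "fmt"

func main() {
	// Row-based: each event is a full record with all fields together.
	rows := []map[string]float64{
		{"duration_ms": 12, "db_queries": 3, "bytes_out": 512 /* ...hundreds more */},
		{"duration_ms": 15, "db_queries": 1, "bytes_out": 204},
	}

	// Column-oriented: one contiguous slice per field.
	cols := map[string][]float64{
		"duration_ms": {12, 15},
		"db_queries":  {3, 1},
		"bytes_out":   {512, 204},
	}

	// Row store: the whole record is materialized just to sum one field.
	var sumRows float64
	for _, r := range rows {
		sumRows += r["duration_ms"]
	}

	// Column store: only the needed slice is read; other fields are skipped.
	var sumCols float64
	for _, v := range cols["duration_ms"] {
		sumCols += v
	}
	fmt.Println(sumRows, sumCols) // 27 27
}
```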
Analytical databases don't resolve the first issue mentioned above, since they usually require creating a table with pre-defined columns before wide events can be stored in it. This means that you cannot store wide events with arbitrary sets of fields that are unknown before the table is created.
I'm working on a specialized open-source database for wide events, which resolves all the issues mentioned above. It doesn't require creating any table schemas before ingesting wide events with arbitrary sets of fields (i.e. it is schemaless); it automatically creates the needed columns for all the fields it sees during data ingestion. It uses column-oriented storage, so it provides query performance comparable to analytical databases. The name of this database is VictoriaLogs. A strange name for a database specialized in efficient processing of wide events :) That's because it was initially designed for storing logs - both plaintext and structured. Later it turned out that its architecture is an ideal fit for wide events. Check it out - https://docs.victoriametrics.com/victorialogs/
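To make the schemaless-columnar idea concrete, here is a toy sketch: columns are created the first time a field is seen, with gaps backfilled for earlier events. This is purely a conceptual illustration, not VictoriaLogs' actual implementation.

```go
// Toy schemaless columnar store: no schema up front; columns appear on
// first sight of a field and all columns are padded to the same length.
package main

import "fmt"

type Store struct {
	rows int
	cols map[string][]any // one slice per field
}

func NewStore() *Store { return &Store{cols: map[string][]any{}} }

func (s *Store) Ingest(event map[string]any) {
	for field, val := range event {
		col, ok := s.cols[field]
		if !ok {
			// New field: create the column and backfill nils for all
			// previously ingested events.
			col = make([]any, s.rows)
		}
		s.cols[field] = append(col, val)
	}
	s.rows++
	// Pad columns absent from this event so all stay the same length.
	for field, col := range s.cols {
		if len(col) < s.rows {
			s.cols[field] = append(col, nil)
		}
	}
}

func main() {
	s := NewStore()
	s.Ingest(map[string]any{"service": "api", "duration_ms": 12})
	s.Ingest(map[string]any{"service": "billing", "error": "timeout"})
	fmt.Println(s.cols["error"]) // [<nil> timeout]
}
```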
Thoughts on stuff like ClickHouse with JSON column support? Less upfront knowledge of columns needed.
It is a great step, but in my testing with the new JSON type, if you go beyond 255 unique JSON locations/types (the 255 max_dynamic_types in their config) you fall back to much worse performance for certain queries and aggregations. This is quite easy to hit with some of the suggestions in this blog post, especially if you are designing for multi-tenant use.
For this ClickHouse wide-event lib I'm working on (not worth anyone's time atm) I am still using this schema https://www.val.town/v/maxm/wideLib#L34-39 (which is from a Boris Tane talk: https://youtu.be/00gW8txIP5g?t=801) for good multi-tenant performance.
I hope ClickHouse performance here can still be vastly improved, but I think it is a little awkward to get optimal performance with wide events today.
A small question on the schema: I noticed that you have only “_now” as the ORDER BY (so it will just use that for the primary key). Do you expect any cross-tenant queries?
Just my feeling, but I’d add the tenant ID before the timestamp, as it should filter the parts more effectively.
Yes, I think you are correct. In the video Boris/Baselime uses (_tenantId, _traceId, _timestamp). Will update that :)
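For the record, a minimal sketch of what such a table definition might look like, created through the clickhouse-go driver via database/sql. The schema below is illustrative only (it is not the actual schema from the talk), and the JSON column is the beta type discussed in the sibling comments:

```go
// Hypothetical ClickHouse schema for multi-tenant wide events, ordered so
// tenant filters prune parts before the timestamp range is applied.
package main

import (
	"database/sql"
	"log"

	_ "github.com/ClickHouse/clickhouse-go/v2" // registers the "clickhouse" driver
)

func main() {
	db, err := sql.Open("clickhouse", "clickhouse://localhost:9000/default")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	_, err = db.Exec(`
		CREATE TABLE IF NOT EXISTS wide_events (
			_tenantId  LowCardinality(String),
			_traceId   String,
			_timestamp DateTime64(3),
			attributes JSON -- beta type; may need to be enabled on the server
		)
		ENGINE = MergeTree
		ORDER BY (_tenantId, _traceId, _timestamp)
	`)
	if err != nil {
		log.Fatal(err)
	}
}
```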
ClickHouse's revised JSON type is still quite new (currently in beta), but I'm hopeful for it. Their first attempt fell apart if the schema changed.
[1] https://clickhouse.com/blog/a-new-powerful-json-data-type-fo...
The JSON column type in ClickHouse [1] looks promising, since it allows storing wide events with arbitrary sets of fields. This feature is still in beta; let's see how it evolves.
[1] https://clickhouse.com/docs/en/sql-reference/data-types/newj...
ClickHouse is open core too. If you care about that.
How is that a "superset"? From what I gather, it's... just a "JSON-formatted log"? They just decided to put as much data in it as they could and decided it should be called a "wide event", but it makes no sense... it's just a regular JSON-formatted log with all the data inside. Nothing new?
tl;dr: just use the slog package (structured logs) to log everything and then visualize.
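For what that looks like in practice, a minimal sketch using Go's log/slog: one structured record per unit of work, with everything attached as attributes (the field names are hypothetical):

```go
// The slog take on a wide event: emit exactly one JSON record per request,
// as wide as you can make it.
package main

import (
	"log/slog"
	"os"
)

func main() {
	logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))

	logger.Info("request_handled",
		slog.String("trace_id", "4bf92f3577b34da6"),
		slog.String("http.method", "POST"),
		slog.String("http.path", "/cart/confirm"),
		slog.Int("http.status", 200),
		slog.Int64("duration_ms", 187),
		slog.String("user.id", "u-3921"),
		slog.Int("db.query_count", 7),
	)
}
```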
This works only for the Go language, which provides the slog package (https://go.dev/blog/slog). What about other programming languages?
Practitioner of what? What is a "wide event"? In what context is this concept relevant? It took several sentences before I was even confident that this is something to do with programming.
They link to three separate articles right at the start that cover all of this. Not every article needs to start from first principles. You wouldn't expect an article about a new Postgres version to start with what databases are and why someone would need them.
>Not every article needs to start from first principles.
Sure, but it would be nice if title submissions made it feasible to predict the topic category of the article for people who are not already in the relevant niche.
Wide events are a very well known approach, especially if you do any work with observability, and articles about it have been on the HN front page too. You not knowing about something does not automatically make it a narrow niche.
From my point of view, "it has something to do with web dev" already makes it a niche. And as a rule of thumb, if you're using letter-number-letter abbreviations like "o11y" and assuming everyone knows what you're talking about, you're in a niche. (E.g.: I could parse "i18n" and "l10n" already, but I wouldn't expect random HN readers to. When I first saw "k8s" and looked it up I thought "man, really?".)
None of this is web dev specific. It applies most strongly to distributed systems, of which web systems are a subset, but in principle it can apply to any system with non-trivial requirements around logging and metrics.
I felt like I got the gist after the first two:
> Adopting Wide Event-style instrumentation has been one of the highest-leverage changes I’ve made in my engineering career. The feedback loop on all my changes tightened and debugging systems became so much easier.
That doesn’t really give an objective definition of what wide events are, just an opinion and an example from this one person’s life.
I had to look up wide events in the middle of the article, and I can’t say I can viscerally see and feel the benefits the OP was espousing. It just felt like an adderall-fueled dump of information being thrown at me.
>I felt like I got the gist after the first two:
What I get is: here's a thing that made a big improvement to how I debug systems.
Except, it turns out that the systems in question are very specific ones.
> The tl;dr is that for each unit-of-work in your system (usually, but not always an HTTP request / response) you emit one “event” with all of the information you can collect about that work.
Okay, but... as opposed to what? And why is it better this way?
>“Event” is an over-loaded term in telemetry so replace that with “log line” or “span” if you like. They are all effectively the same thing.
In the programming I do, "event" doesn't mean anything to do with logging or telemetry.
It’s about observability and strongly related to Honeycomb’s o11y 2.0 vision.
Okay, so a web search and some looking around gives me https://www.honeycomb.io/frontend-observability. I guess this is something to do with tools for sending telemetry back from web applications and then doing statistics on them and giving the user some nice reports.
"Observability" seems like a weird term for that to me, but okay.
But I don't understand why not just give the appropriate context in the submission, rather than keeping a title that only makes sense to a very specific niche audience and then not saying up front what the niche is.
The concept of an "event" is coherent in many other programming contexts, so the possibility that one could be coherently "wide" is at least plausibly interesting. But then I get there and find myself completely disoriented, and eventually figure out that it's not actually relevant to anything I do. And anyway it looks like a lot of this jargon is really just not necessary to convey the core ideas... ?
If the entire contents of the article were in the title, you’d still have to read all the words.
If the title had said something like "A guide to using Wide Events in website telemetry for [insert objective here]", I wouldn't have had the original objection.
Wide events aren't limited to website analytics. They are useful for observability of any application type - databases, services, microservices, web servers, application servers, mobile apps, industrial apps, IoT, etc.
"[An Observability] Practitioner's Guide to Wide Events"
That's how I would have titled it.
Okay, and why would people who aren't already in the field have any idea about your specific jargon meaning of "observability"? My browser's spellcheck underlines that. My understanding of ordinary English turns it into "the fact, of something which can be observed, that it can be observed" which is... supremely unenlightening.
I get that HN isn't appealing to the general population, but the world of programmers etc. is still quite broad.
It seems to be the primary meaning in software: https://en.wikipedia.org/wiki/Observability_(software)
You’ve made a lot of critical comments here.
You are obviously the one who is not understanding, or perhaps misunderstanding, something.
Observability is a pretty standard term in software development.
Events have nothing per se to do with logging or tracing, but you can visualize/trace events with logs/spans.
From my perspective, you seem to misunderstand a lot in the article; I am not judging you for that, just observing it.
I suggest you try to understand the gist of the article instead of scolding the language used.
You're missing the point. My complaint is not about the article content. My complaint is about the fact that the submission title does not adequately prepare anyone to understand what the article will be about.
That’s a recurring theme on HN. The site prefers the original title, and not every blog post has a title that adequately prepares one for the contents, especially since many blogs have a recurring theme.
I had a very fine idea about what the article would be about from reading the title.
You’re being unreasonable about this IMO.
You just read an advertisement article, and some people don't like you pointing that out. Hence the downvotes, I assume.
While the article is written by an observability vendor, it contains excellent information about wide events, without annoying advertising for the vendor.