The string to specify the column types is not a terrible idea. Does it have other configuration options, like whether or not to assume the first row is the headers, or specifying the separator character?
Lil's readcsv[] takes three arguments: a data string, an optional typecode-string (which can also skip columns with "_"), and an optional delimiter character. First row is always assumed to be headers; I find it easy enough to concatenate on a header row before calling the function if I'm ever dealing with a headerless CSV file.
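To make the shape of that interface concrete, here is a rough Python analogue (not Lil, and not Lil's actual implementation). The function name `read_typed_csv` and the typecodes `s`, `i`, and `f` are assumptions made up for this sketch; only the "_" skip marker, the optional delimiter, and the headers-in-first-row rule come from the description above.

```python
import csv
import io

# Hypothetical typecodes for this sketch only; they are not Lil's real codes.
CONVERTERS = {"s": str, "i": int, "f": float}

def read_typed_csv(text, typecodes=None, delimiter=","):
    """Parse CSV text whose first row holds headers into a list of dicts.

    typecodes has one character per column; "_" drops that column.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    if typecodes is None:
        typecodes = "s" * len(header)  # default: keep every column as a string
    if len(typecodes) != len(header):
        raise ValueError("need exactly one typecode per column")
    # Remember the index, name, and converter of each column we keep.
    kept = [
        (i, name, CONVERTERS[code])
        for i, (name, code) in enumerate(zip(header, typecodes))
        if code != "_"
    ]
    return [{name: conv(row[i]) for i, name, conv in kept} for row in body]

# A headerless, semicolon-delimited file: prepend a header row before parsing.
headerless = "apple;3;0.50\npear;5;0.25\n"
table = read_typed_csv("name;qty;price\n" + headerless, "si_", ";")
print(table)
```

The last three lines show the workaround for headerless input: the header row is concatenated onto the data before the call, so the reader itself never needs a "has headers" switch.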
The typecode-string approach in Lil is very similar to how Q handles it with dyadic 0:.
In this specific example I could do without the typecode-string since arithmetic operators like sum, -, and * will coerce string columns into numbers, but I think this way is cleaner.
I see. Kap tries to be as generic as possible, so assuming that the table has headers doesn't feel right. If the table doesn't have headers and the reader assumes it does, you'll potentially silently lose the first row of data.
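The failure mode is easy to reproduce outside of Kap or Lil. As a small illustration using Python's standard `csv` module (the sample data and column names are invented for this sketch), a reader that assumes headers quietly turns the first record into column names:

```python
import csv
import io

headerless = "apple,3\npear,5\n"

# Reader that assumes a header row: the first record becomes the column names.
assumed = list(csv.DictReader(io.StringIO(headerless)))

# Reader given the column names up front: every line is treated as data.
explicit = list(
    csv.DictReader(io.StringIO(headerless), fieldnames=["name", "qty"])
)

print(len(assumed))   # 1 row: "apple,3" was consumed as the header
print(len(explicit))  # 2 rows: nothing lost
```

No error is raised in the first case; the only symptom is a row count that is off by one and column names that look like data.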
You have to make the decision somewhere in your code, unless you're willing to lean on a heuristic; all of the examples in R and Lil make assumptions about the names of the columns in the file on disk, just as they make assumptions about the delimiter and the presence of headers.
If I knew the CSV file didn't have built-in headers, I'd write the Lil script like this:
Thanks, that makes sense. I guess most CSV data you see in the real world does have headers. Perhaps I was thinking too much about the default CSV export format from Excel, focusing on making sure it can always be parsed. And Excel doesn't have column headers.
In Lil, the readcsv[] function takes an optional string specifying a type for each column to decode:
Summing a column:
To create a summary, we need to reduce each group to a single row:
Discounting:
Lil doesn't have a "median" primitive. Decks can contain multiple modules, but we happen to know this one is alone. Your path will vary:
Calculating the median within each group is merely a matter of reordering clauses: