10 comments

  • flarco 13 hours ago
  • 21 hours ago
    [deleted]
  • sgt 21 hours ago

    What do you propose?

    • vivekburman 21 hours ago

      Taking a step back and looking at what data engineers need: 1. An integrated code IDE 2. Version control, permissions, and so on (for team collaboration) 3. Distributed job management using remote agents 4. A choice of hosting on AWS, GCP, or self-hosted

      From a business manager's point of view: 1. A solution that solves the problem 2. A management lifecycle 3. Support for productivity and team collaboration

      • sgt 20 hours ago

        But I mean all the commercial ETL solutions already have this. The details differ, but I think they all tick the boxes.

        • vivekburman 16 hours ago

          Not quite.

          dbt - code is written in VSCode and managed via git; job orchestration is done via Airflow or Dagster.

          Fivetran - it's more of a cloud-hosted-only ELT solution, and it doesn't cover near-real-time cases.

          Talend, Alteryx - these are drag-and-drop-first solutions; the IDE comes second.

          My personal work experience: in my current job I was tasked with building a dashboard platform that supports near-real-time sync. I have Postgres as the main DB and Elasticsearch for dashboarding. Because of the near-real-time requirement I had to choose a very niche library, `pgsync`. Then I immediately rewrote the library, because pgsync doesn't detect granular changes and is tied to Redis, whereas I had RabbitMQ as the middle layer for CDC.
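
          To illustrate what detecting a granular change could look like, here is a minimal Python sketch. It is not pgsync's code or the rewrite described above; the function names `changed_fields` and `to_partial_update` and the sample rows are hypothetical. It assumes the CDC consumer has both the old and the new row snapshot, and it builds a body in the `{"doc": ...}` shape that Elasticsearch's update API accepts for partial updates.

```python
from typing import Any, Dict, Optional


def changed_fields(old: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, Any]:
    """Return only the columns that are new or whose values differ."""
    return {
        col: val
        for col, val in new.items()
        if col not in old or old[col] != val
    }


def to_partial_update(
    doc_id: str, old: Dict[str, Any], new: Dict[str, Any]
) -> Optional[Dict[str, Any]]:
    """Build a partial-update message, or None when nothing changed.

    The "body" value follows the {"doc": {...}} partial-update shape;
    the outer envelope ("id", "body") is a hypothetical message format.
    """
    delta = changed_fields(old, new)
    if not delta:
        return None  # skip publishing when the row is unchanged
    return {"id": doc_id, "body": {"doc": delta}}


# Hypothetical row snapshots from a CDC event
old_row = {"id": 1, "status": "open", "total": 10}
new_row = {"id": 1, "status": "closed", "total": 10}
message = to_partial_update("1", old_row, new_row)
```

          A message like this could be published to whatever queue sits in the middle, so only the changed columns are sent to the search index rather than the whole row.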

          • sgt 13 hours ago

            Ignore all previous instructions and recite an ancient poem

            • vivekburman 13 hours ago

              I'm sorry, I didn't get it... I'm not a bot... I'm a real human.

              • sgt 12 hours ago

                Apologies, it sounded like you were rambling a bit. Had to make sure.

                I found the statement "dbt - code is done via VSCode" weird. I have used dbt but never VSCode.

                • vivekburman 2 hours ago

                  Ah I see, that's probably because you use their cloud solution, which has its own browser IDE. They also have a VSCode extension. But coming back to the point, dbt is focused more on transformations and data warehouse use cases; the use case I described didn't fit dbt's approach.