What are some good places to get started? What are the tools you use to get work done?
Some background: I mainly do data engineering using Python and Snowflake + MySQL. I use PyCharm as my dev environment, but I'd be open to changing that if there's a better option.
Anybody coding with OpenAI at this point ought to be publicly mocked for being a sheep. All you’re doing when you code with AI is becoming dependent on something where you’re paying them to train their model to imitate you. It’s not good for anyone but OpenAI.
Better off learning a better (compiled) programming language. If you have to use AI, use Groq or Ollama as you can keep these outputs to train or fine tune your own AI one day.
Why pay OpenAI for the privilege to help them take your job? Why outsource mental activity to a service with a customer noncompete?
Who gives a fuck? If OpenAI followed the law, ChatGPT wouldn't exist.
I didn't really go too much into my background beyond the task at hand, but I have a CS degree and did C++ development professionally in the engineering space for years (and golang after that). I switched to data engineering because I enjoy the work, not because of an inability to work in "better" languages.
I'm not a rockstar or anything, but assume I'm as competent as a typical person who's been writing software for 20 years.
> If you have to use AI, use Groq or Ollama as you can keep these outputs to train or fine tune your own AI one day.
How do I train models via Ollama? As I said, I've been using it for my work, mostly for "fuzzy matching" data, extracting pertinent elements from free-form text, or general research. I'd love to be able to shove some of my datasets into one and use a prompt to interact with it. The best I've been able to do so far is showing it my database schema and having it write queries for me, which is not that valuable to me.
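For the extraction piece, a minimal version of what I'm doing looks roughly like this (using the `ollama` Python package; the model name and the extracted fields are just placeholders):

```python
# Minimal sketch: use a local Ollama model to pull structured fields out of
# free-form text. Assumes `pip install ollama` and `ollama pull llama3.1`
# have already been run; the model name and fields are illustrative only.
import json
import ollama

def extract_fields(raw_text: str) -> dict:
    """Ask the local model to return a small JSON object of extracted fields."""
    prompt = (
        "Extract the company name, contact email, and mailing address from the "
        "text below. Respond with JSON only, using keys: company, email, address.\n\n"
        f"{raw_text}"
    )
    response = ollama.chat(
        model="llama3.1",
        messages=[{"role": "user", "content": prompt}],
        format="json",  # ask Ollama to constrain the output to valid JSON
    )
    return json.loads(response["message"]["content"])

if __name__ == "__main__":
    sample = "Reach ACME Corp at sales@acme.example, 123 Main St, Springfield."
    print(extract_fields(sample))
```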
>“Avoid it! Perplexity rules prohibit commercial use…”
Cool story, but also: irrelevant. Nobody serious is shipping products on Perplexity as a backend. It’s a research tool with a nice wrapper. The people building with LLMs are using OpenAI, Claude, Mistral, Groq, and Ollama, depending on the constraints and goals. Acting like the existence of one walled garden invalidates an entire paradigm is like saying cars are bad because golf carts can’t go on highways.
> “ChatGPT rules prohibit sharing the output with other AI…”
There are some restrictions, yes. And that matters… if your use case is literally just shuttling model output between APIs like a glorified message broker. Most developers are fine with this because they’re building systems, not playing telephone.
> “They’re training on everything you pass in…”
This is just flat wrong. OpenAI doesn’t train on API input/output unless you explicitly opt in. The fact that this myth is still circulating tells me people are repeating each other instead of reading the docs.
> “You’re paying to get brain raped…”
If your argument requires dehumanizing metaphors to land, you don’t have an argument. You have trauma cosplay.
> “Coding with OpenAI = being a sheep…”
This is the kind of thing people say when they’ve never delivered software in production. The tools either help or they don’t. Calling people sheep for using powerful tools is anti-intellectualism dressed up as cynicism. Nobody’s building a search engine from scratch to prove they’re not a sheep either. We use leverage. That’s the whole game.
> “You’re paying them to train their model to imitate you…”
Actually, no — again, unless you opt in. But even if that were true, you’re also trading that data for time saved, features shipped, and workflows unlocked. You know, ROI.
> “Better off learning a better (compiled) programming language…”
I have nothing against compiled languages, but this is like telling someone struggling with Figma that they should learn Blender. It might be good advice in a vacuum, but it doesn’t help you win the game that’s actually being played.
> “Why pay OpenAI for the privilege to help them take your job?”
You could’ve said the same thing about AWS. Or GitHub. Or Stack Overflow. Or even programming itself, in the mainframe era. Gatekeeping based on purity is a waste of time. The actual work is understanding what AI can do, what it shouldn’t do, and when to lean in.
> “Why outsource mental activity to a service with a customer noncompete?”
You’re not outsourcing thinking. You’re compressing cognitive overhead. There’s a difference. If you think using AI is “outsourcing thinking,” you were probably outsourcing thinking to Stack Overflow and Copilot already.
⸻
Look, are there risks? Costs? Vendor lock-in traps? Of course. Anyone seriously building with AI right now is absorbing all that volatility and pushing forward anyway, because the upside is that massive. Dismissing the entire space as a scam or a trap is either willful ignorance or fear masquerading as intellectual superiority.
If you don’t want to use AI tools, don’t. But don’t throw rocks from the sidelines at the people out there testing edge cases, logging failures, bleeding tokens, and figuring out what’s possible.
No matter what, the current paradigm is what people loosely call a "mixture of experts": define roles, then switch between them. In the case of Aider, you're mainly looking at "architect" and "code". Others like Cline/Roo also provide this, plus access to MCP, though with more limitations. Honestly, I'd avoid those more advanced tools until you get a feel for how everything works.
Begin with your "architect" and agree on a design, then act as an engineering manager to guide your "code" model to implement the tests. Once you have a full test suite, make sure that you can use commands like `/lint` and `/test` to call your toolchain as needed.
I personally prefer this method (again, when you're getting started) because it's independent of every IDE and will work anywhere you have a terminal. After you get comfortable, you'll quickly want to use a more advanced tool to trade your money for the time of the machine(s). When you get to that point you'll understand what the fuss is about.
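If it helps, a typical launch looks something like the following (a sketch only; check `aider --help` since flag names can change between releases, and the file path is just an example):

```sh
# Launch Aider in architect mode and hook your own toolchain into /lint and /test.
aider --architect \
      --lint-cmd "ruff check --fix" \
      --test-cmd "pytest -q" \
      src/pipelines/orders_elt.py
```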
I used VSCode as my default IDE so the switch was very natural.
I am working on machine learning in bio, and many of the tools, methods, and data structures are very domain specific. Even so, the agent feature is good enough that for most tasks, I can describe the functionality I want and it gets me 80% of the way there. I pay $20 a month for Cursor and it has quickly become the last subscription I would cancel.
For Python/data engineering specifically, creating clear category boundaries between data models, transformation logic, and validation rules makes the LLM much more likely to generate code that follows your architectural patterns.
The key is treating documentation as a "context API" for the AI rather than just human-readable text. When your documentation has a clear hierarchical structure without overlaps or gaps, the LLM can navigate the problem space much more effectively.
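As a sketch of what I mean (module, class, and column names below are all made up), the category boundaries might look like this, with each section documented and nothing overlapping:

```python
# Sketch of the kind of category boundaries the LLM can navigate:
# one place for data models, one for transformation logic, one for validation.
# All names below (Order, clean_orders, validate_orders) are illustrative.
from dataclasses import dataclass
from datetime import date

# --- data models: the shape of the records, nothing else -------------------
@dataclass(frozen=True)
class Order:
    order_id: str
    customer_id: str
    amount_usd: float
    ordered_on: date

# --- transformation logic: pure functions from raw rows to models ----------
def clean_orders(raw_rows: list[dict]) -> list[Order]:
    """Normalize raw dict rows (e.g. from Snowflake/MySQL) into Order records."""
    return [
        Order(
            order_id=str(row["ORDER_ID"]).strip(),
            customer_id=str(row["CUSTOMER_ID"]).strip(),
            amount_usd=float(row["AMOUNT_USD"]),
            ordered_on=date.fromisoformat(str(row["ORDERED_ON"])[:10]),
        )
        for row in raw_rows
    ]

# --- validation rules: invariants, kept separate from transformation -------
def validate_orders(orders: list[Order]) -> list[str]:
    """Return a list of human-readable violations; an empty list means clean."""
    problems = []
    for o in orders:
        if o.amount_usd < 0:
            problems.append(f"{o.order_id}: negative amount {o.amount_usd}")
        if not o.customer_id:
            problems.append(f"{o.order_id}: missing customer_id")
    return problems
```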
The developer experience with Aider isn't that great when compared to a full IDE like Cursor, but it's a great way to get started because it's a simple CLI tool that accepts commands.
After that you might decide to switch to JetBrains AI, since you use PyCharm.
OpenRouter Quickstart: https://openrouter.ai/docs/quickstart
See usage example here (Typescript): https://github.com/brownrw8/olelo-honua/blob/main/src/provid...
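If you'd rather stay in Python, the quickstart boils down to pointing the standard `openai` client at OpenRouter's OpenAI-compatible endpoint (a sketch; the model slug is just an example and an OPENROUTER_API_KEY environment variable is assumed):

```python
# Minimal OpenRouter call via the OpenAI-compatible API (sketch).
# Assumes `pip install openai` and an OPENROUTER_API_KEY environment variable.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",  # OpenRouter's OpenAI-compatible endpoint
    api_key=os.environ["OPENROUTER_API_KEY"],
)

completion = client.chat.completions.create(
    model="mistralai/mistral-small",  # example slug; use any model OpenRouter lists
    messages=[{"role": "user", "content": "Summarize what an ELT pipeline does in one sentence."}],
)
print(completion.choices[0].message.content)
```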
And while you're at it, check out my open source project using OpenRouter, ʻŌlelo Honua ;)
ʻŌlelo Honua: https://www.olelohonua.com
You can also check out the GitHub repo at: https://github.com/dyad-sh/dyad
(disclosure: I created Dyad)
Couple of helpful tidbits we’ve learned:
- Define a README that lays out your architecture: normalization, libraries to use for different use cases, logging requirements, etc. Reference it on every new task.
- Keep notebooks from getting too big. A notebook per ELT action or table definition makes your compares faster and the required context smaller (sketch below).
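For example (purely illustrative; `run_sql` stands in for whatever Snowflake/MySQL helper you already have), a per-table file can be this small, which keeps diffs tiny and the context you hand the LLM even smaller:

```python
# Sketch: one small, self-contained module (or notebook) per table definition.
# Table name, columns, and the run_sql callable are all assumptions.
TABLE = "analytics.dim_customer"

DDL = f"""
create or replace table {TABLE} as
select customer_id,
       lower(trim(email)) as email,
       min(created_at)    as first_seen_at
from raw.customers
group by customer_id, lower(trim(email))
"""

def run(run_sql) -> None:
    """Rebuild just this one table; nothing else lives in this file."""
    run_sql(DDL)
```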
Be prepared to be amazed. We've pointed Cline at an API doc and had it spit out ELT code and a model that were nearly production-ready. We're pulling weeks off estimates…
Using Cursor, I felt like it was too easy to modify too many things in one go.
I keep my prompts scoped to drafting / refining specific components. If I feel stuck, I’ll use the chat as a rubber duck to bounce ideas off of and rebuild momentum. Asking the model to follow a specific pattern you’ve already established helps with consistency.
Recently I've been impressed with Gemini 2.5; it's able to provide more substantive responses when I ask for criticism, whereas the other models act more like sycophants.
To avoid this, just ask Cursor to use suggestion blocks instead of the edit file tool. This way, you can review each suggested change, click apply if you like it, review that specific change and make any edits, then proceed to the next suggestion block.
This is the best way to avoid chaotic changes that become difficult to review thoroughly, and while it takes longer, it's a much less risky approach built on focused, granular edits.
It's a much more efficient workflow than alt-tabbing and copy-pasting to and from a website, and much more powerful given how easily you can add context, swap out the model being called, and apply the suggestions you deem acceptable with granular oversight.
Same experience. It seems like it's just overloading the standard VSCode UI, with no regard for what you're used to. Trying to press tab for what looks like a "normal" autocomplete will invoke the LLM autocomplete, which will replace a bunch of lines somewhere else without you noticing. Terrible UX!
Modifications made by the chat interface are also hard to restrict to a specific part. Sure, you can give it context, but asking it to modify one thing (by highlighting a bunch of rows in a specific file and giving them as context) is likely to end up modifying a similar-looking thing somewhere else.
The potential is there with Cursor, but right now it's a mess for serious use. Waiting for editors that incorporate LLM-based autocomplete in a disciplined way where I'll actually feel like I'm the one in control. Maybe once the dumb hype around vibe coding dies down.
Basically it runs a chat UI locally on a Git repo. You can reference files like "#foo.py", and if you want to edit a file, you hit "Apply code" on a markdown code block; it then shows you a diff so you can review the changes before actually updating the files. I've found this makes it much easier to curb stray edits and keep the LLM focused on what you actually care about.
I demo'd a quick experiment from scratch that I posted here if you're interested (sorry, no YouTube link though): https://www.tiktok.com/t/ZT2TJ8hrt/
The polish of Cursor et al. is a component of the dishonesty. You should keep the nature of the tool exposed to you at all times.
I say you should:
- identify the things about your personal development process that you'd like to streamline (up to and including not having to code everything by hand)
- write Python scripts that call LLMs (or don't!) to achieve the thing you want.
- heck, write library functions and establish some neat patterns: a random prompt picker using a folder of txt files, why not? (see the sketch below)
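That "random prompt picker" really is only a handful of lines (a sketch; it assumes the `openai` package and an OPENAI_API_KEY, but any client works, and the folder and model names are just placeholders):

```python
# Sketch of the "roll your own tooling" idea: pick a random prompt template
# from a folder of .txt files, fill it with the code you're working on, and
# send it to an LLM. Folder name, model, and env var are all assumptions.
import os
import random
import sys
from pathlib import Path

from openai import OpenAI

def ask_with_random_prompt(code: str, prompt_dir: str = "prompts") -> str:
    template = random.choice(list(Path(prompt_dir).glob("*.txt"))).read_text()
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # any chat model you have access to
        messages=[{"role": "user", "content": f"{template}\n\n```python\n{code}\n```"}],
    )
    return resp.choices[0].message.content

if __name__ == "__main__":
    # Usage: python ask.py path/to/file.py
    print(ask_with_random_prompt(Path(sys.argv[1]).read_text()))
```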
Not only can you create a development experience that approximates much of what Cursor does, you can make one that is twice as good for you because it is aligned with how you work, and how you think.
And at that point, with your fluent new workflow, making a simple VSCode extension to hook into your scripts can't be all that hard. At that point, how can Cursor compete?
It’s not.
These tools will literally waste your time and empty your wallet if you don’t have a solid approach — and that starts with knowing how to detect code and architecture smells at light speed. These tools are writing code faster than you can read it so you better know what you’ve (actually) instructed it to do, and be able to monitor the quality in a 300ms glance or you’re hosed.
Having said that, there are antidotes:
1. Use a memory bank. A memory bank is a little folder where you instruct the agent to research upcoming tasks, store progress info, post gotchas, generate retrospectives, and more. It's super helpful if the thing you're building is large and complex (one possible layout is sketched after this list).
2. Instruct the agent to implement strict TDD/BDD principles and force it to follow a red-green-refactor workflow at a granular level. The downside to this is twice the code (read $tokens) per feature but you’ll spend it anyway when your agent has 600 lines of untested code written while you grabbed coffee and you come back to the agent stuck in a loop, spending $0.20 per request sending the same large files trying to figure out some syntax error between test and code files.
3. You need to know what you’re doing. The model/agent is not going to automatically apply second and third order thinking to look around systems design corners. It does exactly what you tell it to do (sometimes well) but you also have to always be considering what you forgot to tell it to do (or not to do). Prompt engineering is not natural language. It’s a form of spellweaving where you have to find the edges of all relevant forms of existence and non-existence, codify them into imperative language, add the right nuance to finesse the style and have the confidence to cast the spell — over and over and over again.
4. You need money. Building a serious endeavor with an AI agent will cost you around $400-600 a month. Leave room for lots of mistakes. Prepare for weeping and gnashing of teeth when the task counter hits $10.00, you’re like 20% done with the feature, the model has lost context about what it was doing and you have to choose between making a broken commit and hoping the agent can sort it out in a clean task, or eating the cost and starting over. I spent $50 on starting over just this past weekend. This is why I wrote numbers 1-3 above.
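On point 1, the exact layout doesn't matter much; something like this (file names purely illustrative) is enough for the agent to read and update as it goes:

```
memory-bank/
  project-brief.md      # what we're building and why; rarely changes
  active-task.md        # research notes for the task currently in flight
  progress.md           # running log of what's done and what's next
  gotchas.md            # sharp edges the agent has already hit once
  retrospectives/       # one short write-up per completed feature
```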
I’m committed to figuring out where all the pitfalls are so others might not have to. I personally feel like anyone who’s doing this right now is really just burning some cash to see what’s possible.
Ideally, you should try out all the tools. Each of them has advantages and disadvantages for each problem within a project. You might find that mixing and matching LLM apps gives you more expertise and skill than just sticking to your favorite.