Show HN: Loclean – Local semantic data cleaning with LLMs and Pydantic
Hi HN, I’m the author of Loclean.

I built this because I work with sensitive data that I can't send to OpenAI, but traditional tools like regex were too brittle for the messiness of real-world inputs (like address typos or inconsistent date formats).

Loclean is a Python library that:
- Runs entirely locally (CPU-friendly) using quantized models via llama-cpp-python.
- Uses Pydantic to enforce strict schemas (no more invalid JSON or output that doesn't match your schema).
- Works with Pandas, Polars, and PyArrow workflows.
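To make the schema-enforcement idea concrete, here is a minimal sketch of the general technique using Pydantic v2 directly. It is not Loclean's API: `CleanedRecord`, `clean_with_schema`, and the `generate` callable are hypothetical names, and retrying on validation failure is just one assumed way to handle bad model output. In a real setup, `generate` would wrap a local model call (for example through llama-cpp-python); here a fake model keeps the sketch runnable.

```python
from datetime import date
from typing import Callable, Optional

from pydantic import BaseModel, ValidationError


class CleanedRecord(BaseModel):
    """Hypothetical target schema: every model reply must parse into this shape."""
    city: str
    postal_code: str
    signup_date: date  # Pydantic parses ISO-8601 strings such as "2023-03-05"


def clean_with_schema(
    raw: str,
    generate: Callable[[str], str],
    max_attempts: int = 3,
) -> CleanedRecord:
    """Ask a (hypothetical) model for JSON and retry until it validates."""
    prompt = (
        "Normalize this record and reply with JSON matching the schema "
        f"{CleanedRecord.model_json_schema()}:\n{raw}"
    )
    last_error: Optional[ValidationError] = None
    for _ in range(max_attempts):
        reply = generate(prompt)
        try:
            # Rejects malformed JSON, missing fields, and values of the wrong type.
            return CleanedRecord.model_validate_json(reply)
        except ValidationError as err:
            last_error = err
    raise ValueError(f"no valid output after {max_attempts} attempts") from last_error


# Usage with a fake model: the first reply is not JSON, the second is valid.
replies = iter([
    "Sure! Here is the cleaned record.",
    '{"city": "Hanoi", "postal_code": "100000", "signup_date": "2023-03-05"}',
])
record = clean_with_schema("hanio 100000, signed up 5 Mar 2023", lambda _p: next(replies))
print(record.city, record.signup_date)  # prints: Hanoi 2023-03-05
```

Validation guarantees the shape and types of the output; it cannot on its own prove that a well-formed value (say, a corrected city name) is actually right.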

It's designed to be a "middle ground" between rigid regex and expensive, privacy-risky cloud LLMs.

Repo link: https://github.com/nxank4/loclean

I’d love to hear your feedback on the API design or use cases you might have for local data scrubbing. Thank you!
