Organizations

@html-extract @next-LI

brandonrobertz/README.md

Yes Hello

I'm Brandon Roberts. I'm an investigative journalist specializing in open source and bringing computational techniques to journalism projects. You can read more on my site: bxroberts.org

Pinned

  1. SparseLSH (Public)

    A Locality Sensitive Hashing (LSH) library with an emphasis on large, highly-dimensional datasets.

    Python, 149 stars, 27 forks

  2. propublica/django-collaborative (Public)

    ProPublica's collaborative tip-gathering framework. Import and manage CSV, Google Sheets and Screendoor data with ease.

    Python, 100 stars, 18 forks

  3. autoscrape-py (Public)

    An automated, programming-free web scraper for interactive sites

    HTML, 111 stars, 20 forks

  4. chatgpt-document-extraction (Public archive)

    A proof of concept tool for using ChatGPT to transform messy text documents into structured JSON

    Python, 122 stars, 11 forks

  5. html-extract/hext.js (Public)

    Use Hext in a browser or with node. Hext is a domain-specific language for extracting structured data from HTML documents.

    C++, 6 stars, 1 fork

  6. llm-document-extraction (Public)

    A proof of concept tool for using local LLMs to transform messy text documents into structured JSON

    Python, 25 stars, 1 fork