This project extracts text from PDF textbooks, generates hierarchical embeddings using Sentence-BERT, and stores them in a PostgreSQL database for retrieval and analysis.
Ensure you have the following installed:
- Python 3.8 or later
- PostgreSQL with the `vector` extension
- Required Python packages (see below)
```bash
git clone --branch knowledgeBase https://github.com/zarouz/Hr_interview_preperationAgent.git
pip install -r requirements.txt
```

Dependencies include:

- `psycopg2`
- `sentence-transformers`
- `nltk`
- `pdfplumber`
- `numpy`
- `python-dotenv`
Ensure PostgreSQL is running and create the necessary schema:
```bash
psql -U <your-username> -d KnowledgeBase -f schema.sql
```

Alternatively, run `setup_database()` in the script to create the tables automatically.
To parse a textbook directory and store embeddings, execute:
```bash
python main.py /path/to/textbook/folder
```

To ensure proper processing, store textbooks in the following format:
- Place all textbooks inside the `textbooks/` folder.
- Each textbook should have its own subfolder named after the book title.
- Chapters should be stored as separate PDF files with a consistent naming convention:

```
textbooks/
├── Book_Title/
│   ├── Introduction_1.pdf
│   ├── Basics_2.pdf
│   ├── Advanced_Topics_3.pdf
│   └── ...
```

- Use a consistent naming format (`name_number.pdf`).
- Avoid spaces in filenames; use underscores (`_`) instead.
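As a minimal sketch, the `name_number.pdf` convention above can be discovered and ordered with a few lines of standard-library Python. The function below is illustrative only and not part of the project's actual code:

```python
import re
from pathlib import Path

# Matches the name_number.pdf convention, e.g. Advanced_Topics_3.pdf
CHAPTER_PATTERN = re.compile(r"^(?P<name>.+)_(?P<number>\d+)\.pdf$")

def ordered_chapters(book_dir):
    """Return (chapter_number, chapter_name, path) tuples sorted by chapter number."""
    chapters = []
    for pdf in Path(book_dir).glob("*.pdf"):
        match = CHAPTER_PATTERN.match(pdf.name)
        if match is None:
            continue  # skip files that do not follow the naming convention
        chapters.append((int(match.group("number")), match.group("name"), pdf))
    return sorted(chapters)
```

Because the greedy `.+` stops at the last underscore, multi-word names such as `Advanced_Topics_3.pdf` parse correctly.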
The `bookmaker.py` utility helps prepare textbooks by splitting PDFs into chapters based on user input. It uses PyPDF2 to divide PDFs into meaningful sections.
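The exact interface of `bookmaker.py` is not shown here, but the core of such a splitter is turning user-supplied chapter start pages into page ranges; each range would then be written out with PyPDF2's `PdfReader`/`PdfWriter`. The helper below is a hypothetical illustration, not the actual `bookmaker.py` code:

```python
def chapter_page_ranges(start_pages, total_pages):
    """Convert 1-based chapter start pages into (start, end) ranges, end exclusive.

    For example, start_pages=[1, 12, 30] for a 45-page book yields
    [(1, 12), (12, 30), (30, 46)]: each chapter runs up to the next
    chapter's start, and the last chapter runs to the end of the book.
    Each range could then be copied to its own PDF with PyPDF2.
    """
    ranges = []
    for i, start in enumerate(start_pages):
        end = start_pages[i + 1] if i + 1 < len(start_pages) else total_pages + 1
        ranges.append((start, end))
    return ranges
```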
The database containing embeddings of the OS book is available for download at the following Google Drive link: Download Database
To restore the database from the provided SQL dump file, use the following command:
```bash
psql -U <your-username> -d <your-database-name> -f knowledgeBase.sql
```

Example:

```bash
psql -U karthikyadav -d KnowledgeBase -f knowledgeBase.sql
```

If you wish to upload the database after making changes, export it using:

```bash
pg_dump -U <your-username> -d <your-database-name> -f knowledgeBase.sql
```

Example:

```bash
pg_dump -U karthikyadav -d KnowledgeBase -f knowledgeBase.sql
```

This will create a backup file that can be shared or uploaded for others to use.
- Parsing PDFs: The script expects textbooks in the structured folder layout described above.
- Storing Embeddings: The Sentence-BERT model generates embeddings for hierarchical text units.
- Retrieving Data: Query the database using vector similarity search on `chunks.embedding`.
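As a sketch of the retrieval step, a nearest-neighbour query using pgvector's cosine-distance operator `<=>` might look like the following. Only `chunks.embedding` is confirmed by this README; the other column names and the distance operator choice are assumptions:

```sql
-- Hypothetical query: "id" and "text" column names are assumptions.
SELECT id, text
FROM chunks
ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector  -- query embedding from Sentence-BERT
LIMIT 5;
```

pgvector also provides `<->` (Euclidean distance) and `<#>` (negative inner product) if a different similarity measure suits the embeddings better.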
- If `vector` extension errors occur, install it in PostgreSQL:

```bash
psql -U <your-username> -d KnowledgeBase -c "CREATE EXTENSION IF NOT EXISTS vector;"
```

- Ensure your `.env` file is correctly set up.
- Verify that the `nltk_data` path matches your system setup.
For issues or improvements, submit a pull request or open an issue in the repository.