ChatCSEC: Cybersecurity Expert AI
ChatCSEC is a cybersecurity chatbot built on OpenAI's GPT models. It is designed to give accurate, contextual answers to cybersecurity queries, and it can draw on embeddings of private or internal documents to improve those answers. The project was developed collaboratively by Domenic Lo Iacono, Kyri Lea, Brian McNulty, and Rich Kleinhenz.
Project Overview
The ChatCSEC application includes modular components such as a user interface, AI models, embedding systems, a vector database, and a web scraper. Together, these components create a robust platform for cybersecurity knowledge sharing and query resolution.
Application Components
- User Interface: The primary interface for interacting with ChatCSEC.
- Model: Uses OpenAI's `gpt-4-0125-preview` model to generate responses. Requires an OpenAI API key.
- Embedding Model: Embeds documents with OpenAI's `text-embedding-3-small` model to extend the chatbot's knowledge base. Requires an OpenAI API key.
- Vector Database: Qdrant stores the embeddings for efficient retrieval and persists them, so embedded documents do not need to be processed again.
- Web Scraper: Crawls websites to gather content and embed new data, keeping the chatbot up to date with recent information.
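As a rough illustration of how the embedding model and vector database fit together, the sketch below embeds a few documents, stores the vectors, and retrieves the closest match for a question before building a prompt. It is not ChatCSEC's actual code: `toy_embed` is a hashed bag-of-words stand-in for `text-embedding-3-small`, `InMemoryVectorStore` is a stand-in for Qdrant, and `build_prompt` is a hypothetical helper. All three names are assumptions made so the example runs offline.

```python
import math
import re
import zlib
from dataclasses import dataclass, field


def toy_embed(text: str, dims: int = 1024) -> list[float]:
    """Hypothetical stand-in for an embedding model: hashed bag of words."""
    vec = [0.0] * dims
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # crc32 is stable across runs, unlike Python's built-in hash().
        vec[zlib.crc32(token.encode("utf-8")) % dims] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


@dataclass
class InMemoryVectorStore:
    """Hypothetical stand-in for the vector database."""
    vectors: list[list[float]] = field(default_factory=list)
    texts: list[str] = field(default_factory=list)

    def add(self, text: str) -> None:
        self.vectors.append(toy_embed(text))
        self.texts.append(text)

    def search(self, query: str, top_k: int = 1) -> list[tuple[float, str]]:
        q = toy_embed(query)
        # Vectors are unit length, so the dot product is the cosine similarity.
        scored = [
            (sum(a * b for a, b in zip(q, v)), t)
            for v, t in zip(self.vectors, self.texts)
        ]
        return sorted(scored, reverse=True)[:top_k]


def build_prompt(question: str, store: InMemoryVectorStore, top_k: int = 1) -> str:
    """Prepend the most similar stored documents to the user's question."""
    context = "\n".join(text for _, text in store.search(question, top_k))
    return f"Context:\n{context}\n\nQuestion: {question}"


store = InMemoryVectorStore()
for doc in [
    "Phishing emails trick users into revealing credentials.",
    "A firewall filters network traffic between zones.",
    "Ransomware encrypts files and demands payment.",
]:
    store.add(doc)

prompt = build_prompt("How does a firewall filter traffic?", store)
print(prompt)
```

In a real deployment the prompt would be sent to the chat model, and both stand-ins would be replaced by calls to the OpenAI and Qdrant clients.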
Key Features
- Cybersecurity Expertise: Provides reliable and accurate responses tailored to cybersecurity topics.
- Customizable Knowledge Base: Makes private documents available to the chatbot through embeddings, without retraining the underlying model.
- Automated Updates: Includes a web scraper for dynamic updates, ensuring information remains current.
- Modular Design: Flexible architecture for substituting components such as models, databases, or scrapers.
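One common way to get this kind of substitutability in Python is to have the application depend on a small interface rather than on a concrete class. The sketch below is a minimal illustration under that assumption. `ChatModel`, `EchoModel`, `UppercaseModel`, and `Chatbot` are hypothetical names, not ChatCSEC's actual classes.

```python
from typing import Protocol


class ChatModel(Protocol):
    """Anything with a complete() method can serve as the model component."""

    def complete(self, prompt: str) -> str: ...


class EchoModel:
    """Hypothetical offline model that returns the prompt, handy in tests."""

    def complete(self, prompt: str) -> str:
        return f"[echo] {prompt}"


class UppercaseModel:
    """A second hypothetical model, to show that components are swappable."""

    def complete(self, prompt: str) -> str:
        return prompt.upper()


class Chatbot:
    """Depends only on the ChatModel interface, not on a specific provider."""

    def __init__(self, model: ChatModel) -> None:
        self.model = model

    def ask(self, question: str) -> str:
        return self.model.complete(question)


print(Chatbot(EchoModel()).ask("What is XSS?"))
print(Chatbot(UppercaseModel()).ask("What is XSS?"))
```

The same pattern could apply to the vector database and the scraper: define the few methods the application needs, then supply any implementation that provides them.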
Installation
To set up ChatCSEC, developers should:
- Set up an OpenAI account and obtain an API key with access to the models listed above.
- Install Qdrant as the vector database.
- Run the application locally and configure component connections as needed.
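The connection details from these steps could be collected in one place, for example by reading them from environment variables. The sketch below assumes that approach and is not ChatCSEC's actual configuration code. `OPENAI_API_KEY` is the variable the official OpenAI Python client reads by default and 6333 is Qdrant's default HTTP port. `QDRANT_URL`, `CHAT_MODEL`, `EMBEDDING_MODEL`, `Settings`, and `load_settings` are hypothetical names chosen for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    qdrant_url: str
    chat_model: str
    embedding_model: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read component settings from the environment, failing early if the key is missing."""
    api_key = env.get("OPENAI_API_KEY", "")
    if not api_key:
        raise RuntimeError("Set OPENAI_API_KEY before starting the application.")
    return Settings(
        openai_api_key=api_key,
        # Hypothetical variable names; defaults match the components listed above.
        qdrant_url=env.get("QDRANT_URL", "http://localhost:6333"),
        chat_model=env.get("CHAT_MODEL", "gpt-4-0125-preview"),
        embedding_model=env.get("EMBEDDING_MODEL", "text-embedding-3-small"),
    )


# Example with an explicit mapping, so the sketch runs without a real key.
settings = load_settings({"OPENAI_API_KEY": "sk-placeholder"})
print(settings.qdrant_url, settings.chat_model, settings.embedding_model)
```

Failing at startup when the key is absent tends to be easier to diagnose than an authentication error on the first query.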
Usage
ChatCSEC lets users interact with a cybersecurity chatbot and extend its knowledge base with internal or external documents. Project documentation is generated with Sphinx. To build the documentation:
- Run `sphinx-apidoc -o docs ChatCSEC` to create or update the rst files.
- Use the Makefile in the `docs` directory to generate HTML, LaTeX, or other formats (for example, `make html`). The generated files will appear in `docs/_build/`.
Future Work
- Expand compatibility with additional vector databases and embedding models.
- Improve the web scraper to handle more complex websites and file formats.
- Introduce a more user-friendly interface for non-technical users.
- Enable advanced query analytics to identify trends in cybersecurity queries.