Cuckoo Watchtower: Malware Forensics Assistant
Project Overview
Cuckoo Watchtower (CW) assists malware forensics investigations using GPT-4, a large language model (LLM) from OpenAI. Running entirely within a Google Colab notebook, it uses a Retrieval Augmented Generation (RAG) pipeline to turn large Cuckoo Sandbox JSON reports into chatbot-style answers. Analysts can ask CW questions about malware behavior, pivot to new lines of investigation, and generate tailored reports.
Features
- Data Integration: Processes Cuckoo Sandbox JSON outputs into organized text files covering general, network, signature, and static analysis.
- Retrieval Augmented Generation (RAG): Retrieves the report passages most relevant to each query and passes them to the LLM as context, so responses are grounded in the sandbox data.
- Vector Database: Uses Weaviate to store vector representations of the processed report text and retrieve them by similarity.
- Chatbot Interaction: Provides dynamic, question-based analysis for streamlined investigations.
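
The Data Integration step above can be sketched as follows. This is a minimal illustration, not CW's actual code: the mapping from the four categories to top-level report keys (`info`, `target`, `network`, `signatures`, `static`) is an assumption about the Cuckoo report layout, and `flatten` and `split_report` are hypothetical helper names.

```python
# Assumed mapping from output category to top-level keys of a Cuckoo report.
# The key names are assumptions; adjust them to match your own report.json.
SECTION_KEYS = {
    "general": ["info", "target"],
    "network": ["network"],
    "signature": ["signatures"],
    "static": ["static"],
}


def flatten(value, prefix=""):
    """Yield 'path: value' lines for every leaf in nested dicts and lists."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{prefix}.{key}" if prefix else str(key))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from flatten(child, f"{prefix}[{index}]")
    else:
        yield f"{prefix}: {value}"


def split_report(report):
    """Return {category: text} for the categories present in the report."""
    sections = {}
    for category, keys in SECTION_KEYS.items():
        lines = []
        for key in keys:
            if key in report:
                lines.extend(flatten(report[key], key))
        if lines:
            sections[category] = "\n".join(lines)
    return sections


# Tiny made-up report, used only to demonstrate the functions.
sample_report = {
    "info": {"id": 1, "score": 8.2},
    "network": {"dns": [{"request": "example.com"}]},
    "signatures": [{"name": "sample_signature", "severity": 3}],
}
sections = split_report(sample_report)
for category, text in sections.items():
    print(f"--- {category} ---")
    print(text)
```

In practice the report would be loaded with `json.load` and each category's text written to its own file. Flattening to one `path: value` line per leaf keeps every line meaningful on its own, which helps when the text is later split and retrieved.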
Use Cases
- Comprehensive Reports: Generates detailed, audience-specific malware analysis reports.
- Interactive Queries: Enables users to ask follow-up questions or clarify malware behaviors.
- Pivot Assistance: Retrieves relevant report data quickly so analysts can pivot between lines of investigation.
Limitations
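- Data Size Constraints: Limited by the size of the Cuckoo JSON report and by the token limits of the OpenAI API, which cap how much report context fits in a single request.
- Privacy Concerns: Sending sandbox data to a third-party API may be unacceptable for sensitive cases; operational environments may require a privately hosted LLM (e.g., LLaMA) instead.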
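
One common way to work within token limits is to split report text into chunks that each fit a budget. The sketch below is an illustration under stated assumptions rather than CW's implementation: `chunk_lines` is a hypothetical helper, and tokens are estimated at roughly four characters each, which is a rough heuristic and not the tokenizer the OpenAI API uses.

```python
def chunk_lines(text, max_tokens=200, chars_per_token=4):
    """Group lines into chunks whose estimated size stays within max_tokens.

    Size is estimated as characters / chars_per_token (an assumption).
    A single line longer than the budget becomes its own oversized chunk.
    """
    budget = max_tokens * chars_per_token  # budget expressed in characters
    chunks, current, used = [], [], 0
    for line in text.splitlines():
        cost = len(line) + 1  # +1 for the newline that joins lines
        if current and used + cost > budget:
            # Adding this line would exceed the budget: close the chunk.
            chunks.append("\n".join(current))
            current, used = [], 0
        current.append(line)
        used += cost
    if current:
        chunks.append("\n".join(current))
    return chunks


# Made-up report lines, used only to demonstrate the function.
text = "\n".join(f"network.dns[{i}].request: host{i}.example" for i in range(10))
chunks = chunk_lines(text, max_tokens=20)
print(len(chunks), "chunks")
```

Splitting on line boundaries keeps each `path: value` entry intact. For exact budgeting, the character estimate could be replaced with a real tokenizer.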
Future Work
- Integrating additional data sources, such as memory snapshots or BSON files.
- Adding the ability to run Cuckoo scans directly within the tool for streamlined workflows.
Technologies Used
- Cuckoo Sandbox: Malware analysis tool generating JSON reports.
- Python: Primary programming language for text processing and integration.
- Weaviate: Vector database for efficient context storage and retrieval.
- Google Colab: Notebook environment for tool deployment and user interaction.
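
To illustrate the store-and-retrieve role these components play in a RAG pipeline, here is a self-contained sketch. It deliberately does not call Weaviate or the OpenAI API: a toy bag-of-words vector and cosine similarity stand in for a real embedding model and vector database, and `embed`, `cosine`, `retrieve`, and `build_prompt` are hypothetical names chosen for this example.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, top_k=2):
    """Return the top_k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(query, context_chunks):
    """Assemble the text that would be sent to the LLM."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer using only the sandbox report excerpts below.\n\n"
        f"Excerpts:\n{context}\n\nQuestion: {query}"
    )


# Made-up report chunks, used only to demonstrate the functions.
chunks = [
    "network.dns[0].request: example.com",
    "signatures[0].name: sample_signature",
    "static.file_type: PE32 executable",
]
query = "Which dns request did the sample make?"
top = retrieve(query, chunks, top_k=1)
prompt = build_prompt(query, top)
print(prompt)
```

In a real deployment the vector database would handle storage and similarity search, and the assembled prompt would be sent to the LLM. Only the retrieved excerpts are sent, which is what keeps requests within the model's context limits.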