Cuckoo Watchtower: Malware Forensics Assistant

Project Overview

Cuckoo Watchtower (CW) is a tool designed to assist malware forensics investigations by leveraging the power of GPT-4, an advanced LLM by OpenAI. It processes large Cuckoo Sandbox JSON reports into accessible, chatbot-style insights via a Retrieval Augmented Generation (RAG) pipeline, all within a Google Colab notebook. CW aids analysts by dynamically responding to queries about malware behavior, pivoting investigation paths, and generating tailored reports.

Features

  • Data Integration: Processes Cuckoo Sandbox JSON outputs into organized text files covering general, network, signature, and static analysis.
  • Retrieval Augmented Generation (RAG): Enhances context understanding for the LLM to generate highly accurate and relevant responses.
  • Vector Database: Employs Weaviate to store and retrieve analysis vectors efficiently.
  • Chatbot Interaction: Provides dynamic, question-based analysis for streamlined investigations.

Use Cases

  • Comprehensive Reports: Generates detailed, audience-specific malware analysis reports.
  • Interactive Queries: Enables users to ask follow-up questions or clarify malware behaviors.
  • Pivot Assistance: Facilitates rapid data retrieval for efficient investigative pivoting.

Limitations

  • Data Size Constraints: Limited by the JSON file size and OpenAI's API token constraints.
  • Privacy Concerns: May require alternative, private LLMs (e.g., LLaMA) for sensitive data processing in operational environments.

Future Work

  • Integrating additional data sources, such as memory snapshots or BSON files.
  • Adding the ability to run Cuckoo scans directly within the tool for streamlined workflows.

Technologies Used

  • Cuckoo Sandbox: Malware analysis tool generating JSON reports.
  • Python: Primary programming language for text processing and integration.
  • Weaviate: Vector database for efficient context storage and retrieval.
  • Google Colab: Notebook environment for tool deployment and user interaction.