In our production environment we use an Apache Spark/Cassandra cluster with the following hardware components:
1 master node: DELL PowerEdge R740, 2× Intel Xeon 2.60 GHz (14 cores), 320 GB RAM, 3× 4 TB HDD (RAID 5), 2× 512 GB SSD (RAID 0)
8 worker nodes: Supermicro 815TQC-R501WB, 2× Intel Xeon 1.7 GHz (6 cores), 256 GB RAM, 4 TB HDD, 3× 2 TB SSD
All nodes are connected via 10 Gbit Ethernet interfaces and a corresponding switch. With this setup, computing the address and entity graphs for Bitcoin with the transformation component currently takes about 13 hours.
Our previous setup used just four worker nodes, which can be seen as the minimum requirement for running the transformation pipeline on larger blockchains like Bitcoin.
GraphSense is not a single piece of software but a highly modular analytics platform comprising several components. Each component provides a detailed README file, explaining how to set it up.
If you want to set up the entire GraphSense platform, please follow this order:
Set up and run graphsense-blocksci to ingest raw blockchain transaction data and exchange rates into a dedicated Cassandra keyspace.
After having ingested transaction data, exchange rates, and TagPacks into a raw Cassandra keyspace, you can run the graphsense-transformation pipeline, which computes statistics and various types of graph abstractions and stores them in a so-called transformed keyspace. Running the transformation pipeline is a resource-intensive task and can, depending on your hardware infrastructure, take some time.
With both a raw and a transformed Cassandra keyspace in place, you can run graphsense-REST against these keyspaces. This exposes the GraphSense REST API, which you can use in your client app.
Finally, when the REST interface is up and running, you can set up the graphsense-dashboard, which relies on the REST interface and provides an interactive analytics interface.
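The setup order above can be read as a small dependency graph. The following Python sketch is only an illustration of that reading, not part of GraphSense: the `DEPENDS_ON` mapping and the `setup_order` function are hypothetical names, and the dependency edges are our interpretation of the steps listed above. It uses the standard-library `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical mapping: each component -> components that must be set up first.
# The edges are an illustrative reading of the setup order described above.
DEPENDS_ON = {
    "graphsense-blocksci": set(),
    "graphsense-transformation": {"graphsense-blocksci"},
    "graphsense-REST": {"graphsense-blocksci", "graphsense-transformation"},
    "graphsense-dashboard": {"graphsense-REST"},
}


def setup_order(depends_on=DEPENDS_ON):
    """Return the components in an order that respects every dependency."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    # Print the components as numbered setup steps.
    for step, component in enumerate(setup_order(), start=1):
        print(f"{step}. {component}")
```

Because each component here depends on the one before it, only one valid order exists. The README file in each repository remains the authoritative source for the actual setup instructions.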
Q: Why doesn’t GraphSense run on my computer?
GraphSense processes hundreds of millions of transactions, which requires a large amount of RAM and disk space. Your computer most likely doesn’t have enough hardware resources for processing and storing such vast amounts of transactions and derived statistics. You can, however, run each component in development mode, just as we do. Please check the README files in each repository.
Q: Can I run the GraphSense Dashboard without setting up an Apache Spark / Cassandra cluster?
The GraphSense Dashboard is a web app that runs entirely on the client side (in your browser). It only needs a GraphSense REST API endpoint for retrieving data. You can operate your own endpoint or reuse one provided by others. For playing around, you can also use our semi-public GraphSense Demo.
Q: How can I get access to the GraphSense demo?
Just drop an email to firstname.lastname@example.org and briefly explain who you are and why you want access to our demo. You will receive access credentials and we will also sign you up for our public GraphSense users mailing list.
Q: Who is behind GraphSense and who is driving development?
GraphSense has a strong research background, and development is mainly driven by the GraphSense core team (see About). Members of this team are mostly scientists and engineers working for AIT’s Data Science and Artificial Intelligence Research Group.
Q: Who is funding GraphSense?
Q: Why don’t you just use some other existing cryptocurrency/blockchain analytics tool?
GraphSense development is driven by the needs of our project partners and by our own research needs. We found that no existing commercial tool fulfills the most important requirement for more advanced cryptocurrency analysis: full control over the collected data and the ability to run customized analytics jobs. GraphSense is designed for data-driven cryptocurrency analytics.
We do, however, make use of an existing open source tool: we wrapped and integrated BlockSci in our pipeline.