Deciphering a Financial Network with Gephi
Author: Brian Tsz Ho WONG (t.h.wong-2@sms.ed.ac.uk)
This tutorial uses the ‘Enigma Network’ in contemporary Hong Kong to explain how to prepare a Gephi dataset in Microsoft Excel and OpenRefine, and how to interpret a Gephi network.
If you are new to Gephi, it is recommended that you complete another CDCS tutorial, ‘Mapping a Family Network’, first.
Completing this tutorial will enable you to:
- Understand the uses of Gephi beyond social network analysis
- Design your own Gephi dataset in Excel and OpenRefine using real-life data
- Analyse Gephi networks (centrality and network level)
Things to Do First
Download Gephi 0.10.1 and OpenRefine to your own device. You can choose to download the Windows or Mac version.
You will also need a computer mouse to navigate Gephi graphs.
What is ‘Enigma Network’?
On 17 May 2017, David Webb, a retired investment banker and activist investor, published an article on his website entitled "The Enigma Network: 50 stocks not to own". Webb argued that investors should avoid 50 Hong Kong-listed companies, which he said held shares in one another. According to Webb, this cross-ownership allowed the ‘big bosses’ behind the ‘Enigma Network’ to manipulate the prices of these stocks easily, and ultimately to damage investors’ interests. Webb’s article shocked the market, and the Securities and Futures Commission began to investigate his allegations. As a result, several executives of the companies in Webb’s network were arrested and prosecuted, although most were eventually acquitted.
Webb’s website provided comprehensive data on the ‘Enigma Network’, especially the daily holdings data from the Central Clearing and Automated Settlement System (CCASS). This tutorial utilises the CCASS data from 16 May 2017, the day before Webb’s article was published, to reconstruct the ‘Enigma Network’ and to demonstrate Gephi’s ability to analyse real-life financial data.
Prepare a Gephi Dataset
In his article on the ‘Enigma Network’, Webb provides links to data on the 50 companies in the network:
- Click on a company hyperlink
- Go to ‘Key Data’
- Click on the ‘CCASS’ button on the page
- Select 16 May 2017 from the date picker
- Copy the CCASS data table to Excel. Only include shareholders who own more than 0.01 per cent of that company’s shares
- Do the same for all 50 companies in the ‘Enigma Network’
Cleaning CCASS Data in OpenRefine
- Download ‘Enigma Network_CCASS Records.xlsx’
- Open OpenRefine, upload ‘Enigma Network_CCASS Records.xlsx’, and then click ‘Next’
- Uncheck the box next to ‘Parse next’, check the box next to ‘Ignore first’, then enter ‘3’ between ‘Ignore first’ and ‘line(s) at beginning of file’
- Click on ‘Create project’
- Click on the button next to ‘All’, select ‘Edit columns’, and then choose ‘Re-order/ remove columns’
- Remove columns 1, 2, 4, 5 and 7. Move column 8 to be before column 6. Then click ‘OK’
- Click on the button next to ‘All’, select ‘Edit all columns’, and then click ‘To titlecase’. Only select columns 3 and 8
- Export it as an .xlsx file
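The column selection and title-casing steps above can also be sketched in Python. This is a minimal illustration using only the standard library, not a replacement for OpenRefine. The sample rows and the field names `shareholder`, `company` and `stake_pct` are hypothetical stand-ins for columns 3, 8 and 6; the real export's headers may differ. The `to_titlecase` helper is an assumption chosen because it reproduces spellings seen later in this tutorial, such as ‘Bank Of China (hong Kong) Ltd’; OpenRefine's own behaviour may differ in edge cases.

```python
# Hypothetical sample rows; field names are assumptions, not the real headers.
raw_rows = [
    {"shareholder": "BANK OF CHINA (HONG KONG) LTD",
     "stake_pct": 2.5,
     "company": "EXAMPLE HOLDINGS LTD",
     "unused": "dropped"},
    {"shareholder": "EXAMPLE SECURITIES LTD",
     "stake_pct": 10.0,
     "company": "EXAMPLE HOLDINGS LTD",
     "unused": "dropped"},
]


def to_titlecase(text):
    """Upper-case the first character of each space-separated word."""
    return " ".join(word[:1].upper() + word[1:].lower() for word in text.split(" "))


def clean(rows):
    """Keep shareholder, company and stake (in that order) and title-case the names."""
    return [
        {
            "shareholder": to_titlecase(row["shareholder"]),
            "company": to_titlecase(row["company"]),
            "stake_pct": row["stake_pct"],
        }
        for row in rows
    ]


cleaned = clean(raw_rows)
for row in cleaned:
    print(row)
```

Because a word such as "(HONG" begins with a bracket, only the bracket is "upper-cased" and the letters after it become lower case, which is why the bank's name keeps a lower-case "hong".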
Preparing a Gephi Dataset in Excel
- Open the file exported from OpenRefine in Excel (refer to the ‘OpenRefine Results’ spreadsheet in ‘Enigma Network_Gephi Dataset Guide.xlsx’)
- Copy all the entities in column 3 to a new spreadsheet, then remove the duplicates
- Do the same for column 8
- Copy the filtered list of column 8 entities under that of column 3, and then remove the duplicates
- Add one column to the left of the filtered entities and one row above them. Assign an ID to each entity. Rename the spreadsheet ‘Nodes’ (nodes table) (refer to the ‘Nodes’ spreadsheet in ‘Enigma Network_Gephi Dataset Guide.xlsx’)
- Go back to the first spreadsheet. Add a column next to columns 3 and 8, then copy the ‘nodes table’ to this spreadsheet (refer to the ‘xlookup’ spreadsheet in ‘Enigma Network_Gephi Dataset Guide.xlsx’)
- Use the ‘xlookup’ function to match the entities to their IDs.
- Open a new spreadsheet. Copy the IDs of the entities from column 3 to the first column. Rename this column ‘Source’.
- Copy the IDs of the entities from column 8 to the second column. Rename this column ‘Target’. In this case, the entities in column 3 are the shareholders of the companies in the ‘Enigma Network’. The entities in column 8 represent the 50 companies in the network.
- Copy column 6 to the third column. Rename this column ‘Weight’. These numbers represent the percentage of shares held by the shareholders. A higher percentage would give an edge more weight.
- Rename the spreadsheet ‘Edges’ (edges table) (refer to the ‘Edges’ spreadsheet in ‘Enigma Network_Gephi Dataset Guide.xlsx’)
- Copy the ‘Nodes’ and ‘Edges’ spreadsheets to a new Excel file and rename it ‘Enigma Network_Gephi’.
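The logic of the Excel steps above (deduplicate the entities, assign IDs, then look up each ID to build ‘Source’, ‘Target’ and ‘Weight’) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the records are invented, the header names ‘Id’ and ‘Label’ are assumptions rather than the headers used in the tutorial's files, and the output is CSV text rather than an .xlsx workbook.

```python
import csv
import io

# Hypothetical cleaned records: (shareholder, company, percentage held).
records = [
    ("Broker A", "Company X", 12.5),
    ("Broker B", "Company X", 3.0),
    ("Broker A", "Company Y", 0.75),
    ("Company X", "Company Y", 5.0),
]


def build_tables(records):
    """Return (nodes, edges) in the shape of Gephi's nodes and edges tables."""
    names = []
    # Shareholders first, then companies, skipping names already listed.
    for shareholder, _company, _pct in records:
        if shareholder not in names:
            names.append(shareholder)
    for _shareholder, company, _pct in records:
        if company not in names:
            names.append(company)
    # Assign a numeric ID to each unique entity (the role XLOOKUP plays in Excel).
    ids = {name: number for number, name in enumerate(names, start=1)}
    nodes = [(ids[name], name) for name in names]
    edges = [(ids[s], ids[c], pct) for s, c, pct in records]
    return nodes, edges


def to_csv(header, rows):
    """Serialise a header and rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


nodes, edges = build_tables(records)
print(to_csv(["Id", "Label"], nodes))
print(to_csv(["Source", "Target", "Weight"], edges))
```

Note that ‘Company X’ appears both as a shareholder and as a company but receives a single ID, which is what the second ‘remove the duplicates’ step achieves in Excel.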
Can we use generative AI to create a Gephi dataset?
Yes, to some extent:
- I tried several generative AI models: ChatGPT-5, Claude Sonnet 4, Gemini 2.5 Pro and Grok 4. Grok identified fewer than 200 of the 490 nodes. Claude and Gemini failed to generate the nodes and edges tables for Gephi, probably because of the complexity and volume of the data.
- ChatGPT-5 Thinking produced the best results. After receiving several prompts, it successfully transformed ‘Enigma Network_CCASS Records.xlsx’ into a Gephi dataset. While the dataset created by ChatGPT correctly identified all the nodes, it swapped the values in the ‘Source’ and ‘Target’ columns. This can easily be resolved.
- In short, generative AI can accelerate the process of preparing a Gephi dataset, but the results may vary. Users should therefore understand the logic and structure of a Gephi dataset before using AI to generate one.
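If an AI-generated edges table has its ‘Source’ and ‘Target’ values swapped, as described above, the fix is a simple column swap. The sketch below shows one way to do it in Python; the sample edge rows are invented for illustration.

```python
# Hypothetical edges with Source and Target the wrong way round:
# here the company (ID 3) is listed as the source of the shareholding.
swapped_edges = [
    {"Source": 3, "Target": 1, "Weight": 12.5},
    {"Source": 3, "Target": 2, "Weight": 3.0},
]


def swap_source_target(edges):
    """Return new edge rows with the Source and Target values exchanged."""
    return [
        {**edge, "Source": edge["Target"], "Target": edge["Source"]}
        for edge in edges
    ]


fixed_edges = swap_source_target(swapped_edges)
for edge in fixed_edges:
    print(edge)
```

After the swap, shareholders are the sources and companies are the targets, matching the structure used in the rest of this tutorial.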
Visualise and Analyse the ‘Enigma Network’ in Gephi
- Download the ‘Enigma Network_Gephi.xlsx’ and open Gephi.
Do not open the spreadsheets in Excel. If you have already opened them, close them before opening Gephi.
- Click on ‘New Project’ and then go to ‘Data Laboratory’
- Click on ‘Import Spreadsheet’, select ‘Enigma Network_Gephi.xlsx’ and press ‘open’.
- Import ‘Nodes’. Select ‘Directed’ for ‘Graph Type’
- Go to ‘Import spreadsheet’ and import ‘Edges’
- Select ‘BigDecimal’ for the ‘Weight’ column, and then choose ‘Append to existing workspace’
- Go to ‘Overview’ and select ‘Force Atlas’ for the layout. Then set the ‘Repulsion strength’ to ‘10000’
- Click ‘Run’
Measuring the Network
- Go to ‘Statistics’.
- Run ‘Avg. Weighted Degree’
- Go to ‘Data Laboratory’, and you will find statistics for several types of centrality measures, including ‘Weighted In-Degree’, ‘Weighted Out-Degree’, and ‘Weighted Degree’
We will focus on ‘Weighted Out-Degree’ centrality, which is the sum of the weights of the edges leaving each node, rather than a simple count of those edges.
In our case, it measures the total percentage stake that a shareholder, such as a broker, holds across the 50 companies in the Enigma Network.
In other words, the nodes with a high weighted out-degree are probably the ‘big bosses’ behind the network.
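To make the measure concrete, the following sketch computes a weighted out-degree by hand for a small, invented edge list. It is an illustration of the definition (sum of outgoing edge weights per node), not a description of how Gephi implements the statistic internally.

```python
from collections import defaultdict

# Hypothetical edges: (source ID, target ID, percentage held).
edges = [
    (1, 3, 12.5),
    (2, 3, 3.0),
    (1, 4, 0.75),
    (3, 4, 5.0),
]


def weighted_out_degree(edges):
    """Sum the weights of the outgoing edges for every node."""
    totals = defaultdict(float)
    for source, target, weight in edges:
        totals[source] += weight
        totals[target] += 0.0  # make sure nodes with no outgoing edges appear
    return dict(totals)


scores = weighted_out_degree(edges)
# Rank nodes from the largest to the smallest weighted out-degree.
ranking = sorted(scores.items(), key=lambda item: item[1], reverse=True)
for node_id, score in ranking:
    print(node_id, score)
```

Node 1 holds 12.5 per cent of one company and 0.75 per cent of another, so its weighted out-degree is 13.25, while node 4 only receives investment and scores 0.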
Visualising the Network
Before we analyse the results of the centrality measures, we can refine the network visualisation.
- Go to ‘Overview’ and select ‘Filters’
- Select ‘Edge Weight’ under ‘Edges’
- Change the ‘Edge Weight’ range from ‘0.01 to 75.04’ to ‘1 to 75.04’.
- Click on ‘Select’ and then on ‘Filter’
- Click on the ‘Attributes’ button and select ‘company’. Then click on the bold ‘T’ next to the ‘Attributes’ button to make the names of the entities pop up.
- Go to ‘Appearance’ and click on the ‘Size’ button
- Select ‘Ranking’ and choose ‘Weighted Out-Degree’. Then, change the ‘Min size’ to ‘10’ and the ‘Max size’ to ‘80’
- Click on the ‘Label Size’ button
- Select ‘Ranking’ and choose ‘Weighted Out-Degree’. Then, change the ‘Min size’ to ‘0.5’ and the ‘Max size’ to ‘4’
- Select ‘Label Adjust’ under ‘Layout’, then click ‘Run’
You can export the visualisation as a graph file or take a screenshot using the ‘Take screenshot’ button.
Remember to adjust the width and height in the ‘Screenshot settings’ menu to improve the picture’s resolution.
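The two refinements used in this section, the edge-weight filter and the size ranking, can be sketched in Python to show what they do to the data. The edge list is invented, and the linear scaling between ‘Min size’ and ‘Max size’ is an assumption about how the ranking maps values to sizes, not a statement about Gephi's internals.

```python
# Hypothetical edges: (source ID, target ID, percentage held).
edges = [
    (1, 3, 12.5),
    (1, 4, 0.75),
    (2, 3, 3.0),
]


def filter_edges(edges, low=1.0, high=75.04):
    """Keep only edges whose weight lies within [low, high]."""
    return [edge for edge in edges if low <= edge[2] <= high]


def scale_size(value, value_min, value_max, min_size=10.0, max_size=80.0):
    """Map a value linearly onto the range [min_size, max_size]."""
    if value_max == value_min:
        return min_size  # avoid dividing by zero when all values are equal
    fraction = (value - value_min) / (value_max - value_min)
    return min_size + fraction * (max_size - min_size)


visible_edges = filter_edges(edges)
print(visible_edges)
print(scale_size(5.0, 0.0, 10.0))
```

Raising the lower bound from 0.01 to 1 hides the many very small holdings, which makes the larger stakes easier to read in the graph.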
Analysing the Network
- Go to ‘Data Laboratory’
- Sort the ‘Weighted Out-Degree’ column from largest to smallest
You can then identify the securities companies that masterminded the ‘Enigma Network’. Interestingly, some of the nodes with a high weighted out-degree are major banks, such as HSBC and the Bank of China.
These banks were certainly not involved in the conspiracy, other than holding shares in the 50 companies for various reasons. It is therefore useful to remove these banks from the dataset.
Refine the Dataset in OpenRefine
- Upload the ‘OpenRefine Results’ spreadsheet from ‘Enigma Network_Gephi Dataset Guide.xlsx’ to OpenRefine.
- Select only the ‘OpenRefine Results’ spreadsheet, and then check the box next to ‘Parse next’ and enter ‘1’ between ‘Parse next’ and ‘line(s) as column’
- Click on ‘Create project’
- Click on the button next to column 3, select ‘Edit cells’, and then choose ‘Replace’
- Type ‘The Hongkong And Shanghai Banking’ into the ‘Find’ box, then leave the ‘Replace with’ box blank. Then click ‘OK’
- Do the same for ‘Bank Of China (hong Kong) Ltd’
- Click on the button next to ‘All’, select ‘Facet’, and then click ‘Facet by blank (null or empty string)’
- Go to ‘Facet/ Filter’, then click on the box under ‘Blank rows’
- Change the General Refine Expression Language (GREL) expression to 'isBlank(cells["Column 3"].value)'
- Select ‘true’
- Click on the button next to ‘All’, select ‘Edit rows’, and then click ‘Remove all matching rows’
You can do the same for other banks, such as Citibank and UBS. Save the file as .xlsx and then repeat the above steps to prepare a Gephi dataset and network.
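The effect of the steps above, dropping every row whose shareholder is one of the named banks, can be sketched in Python. This sketch takes a shortcut: it removes matching rows directly instead of blanking the cell first and then deleting blank rows. The rows and the `shareholder` field name are hypothetical; the two bank names are the search strings given in this section.

```python
# Search strings taken from the steps above.
BANK_NAMES = (
    "The Hongkong And Shanghai Banking",
    "Bank Of China (hong Kong) Ltd",
)

# Hypothetical rows in the shape of the 'OpenRefine Results' spreadsheet.
rows = [
    {"shareholder": "The Hongkong And Shanghai Banking", "company": "Company X", "stake_pct": 4.0},
    {"shareholder": "Broker A", "company": "Company X", "stake_pct": 12.5},
    {"shareholder": "Bank Of China (hong Kong) Ltd", "company": "Company Y", "stake_pct": 2.0},
]


def drop_banks(rows, bank_names=BANK_NAMES):
    """Remove rows whose shareholder name contains any of the bank names."""
    return [
        row for row in rows
        if not any(name in row["shareholder"] for name in bank_names)
    ]


remaining = drop_banks(rows)
for row in remaining:
    print(row)
```

To exclude further banks, add their names as they appear in column 3 to `BANK_NAMES`.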
Credits
About the author
Brian Tsz Ho Wong is a PhD student at the University of Edinburgh.
His thesis examines the economic and financial mobilisation of the Japanese Empire during WWII.
Nascent findings have recently been published in the Financial History Review.
Outside of his PhD project, he is a Training Fellow at the University of Edinburgh’s Centre for Data, Culture and Society, where he has delivered courses on applying network analysis to humanities research.
He is also a regular contributor to the Digital Orientalist and is a member of the Cold War Archival Research Institute.
Suggested citation: Tsz Ho Wong, “Deciphering a Financial Network with Gephi,” CDCS Tutorials, 2025
This resource’s data was extracted from Webb-site, a free database.
This resource is covered by a CC-BY-NC 4.0 license
Please help us keep our tutorials up to date. If you find something that is not working, email us at cdcs@ed.ac.uk