Taking advantage of the Python and R scripts that I developed in a previous post, I wanted to learn about the current state of the companies in Dublin according to Crunchbase (API v2). It’s true that Crunchbase, like other similar websites, can give an incomplete picture of the companies in a specific city. The information can also be somewhat biased because, logically enough, the companies themselves decide what to show to potential investors and what to leave out. In any case, it’s a good starting point for a glimpse of the technological potential of this city, which, by the way, hosts many of the largest software companies in the world. In particular, I wanted to answer some questions about Dublin’s “business ecosystem”, such as: Which companies have received the most investment in recent years? Who are the main investors? What is the order of magnitude of the investments? Which are the most important business areas (“categories”) in the city?
One aspect worth highlighting in Crunchbase is that companies can list one or many categories (or fields) for their business, so there isn’t a unique “tag” that describes a company. For example, some companies don’t choose the category “startup” even though they describe themselves as “startups”. Searching for Dublin city, I gathered 1227 companies, of which only 186 included information about their investments, i.e. they mentioned investors and funding, although an “undisclosed amount” was sometimes counted as zero EUR. Furthermore, some companies, like Mongodb-Inc, should perhaps be considered “outliers” or unrepresentative: MongoDB’s headquarters is in New York City, and only its EMEA headquarters is in Dublin. The search would therefore need more accurate filters to avoid this situation. Unfortunately, GPS coordinates are “missing in action” in the system, so they must be generated from the office addresses. I have a pending Python script that uses geopy to generate a new column with lat/long coordinates.
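As a rough idea of what that pending geocoding step could look like, here is a minimal sketch. It is not the actual script: the record layout (an `address` key per company), the output keys `lat` and `lng`, and the helper names `add_coordinates` and `make_nominatim_geocoder` are all assumptions made for illustration. The geocoder is passed in as a plain callable, so the logic can be tried without network access.

```python
from typing import Callable, Iterable, Optional


def add_coordinates(companies: Iterable[dict],
                    geocode: Callable[[str], Optional[object]]) -> list:
    """Return copies of the company records with 'lat' and 'lng' keys added.

    `geocode` is any callable that takes an address string and returns an
    object with `.latitude` and `.longitude`, or None when nothing is found.
    """
    cache = {}  # avoid geocoding the same address twice
    enriched = []
    for company in companies:
        address = (company.get("address") or "").strip()
        if address and address not in cache:
            cache[address] = geocode(address)
        location = cache.get(address)  # None for empty or unknown addresses
        row = dict(company)            # copy, so the input is left untouched
        row["lat"] = location.latitude if location is not None else None
        row["lng"] = location.longitude if location is not None else None
        enriched.append(row)
    return enriched


def make_nominatim_geocoder(user_agent: str = "dublin-crunchbase-demo"):
    """Build a rate-limited geocoder with geopy (imported lazily).

    The user agent string is a placeholder; Nominatim expects one that
    identifies your own application.
    """
    from geopy.geocoders import Nominatim
    from geopy.extra.rate_limiter import RateLimiter

    geolocator = Nominatim(user_agent=user_agent)
    # Nominatim's usage policy asks for at most one request per second.
    return RateLimiter(geolocator.geocode, min_delay_seconds=1)
```

With geopy installed, `add_coordinates(companies, make_nominatim_geocoder())` would fill in the two new columns. Addresses that cannot be resolved keep `None`, so they can be reviewed by hand later.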
a) Total Funds Raised per Year
b) Top 10 Companies
(*) Mongodb-Inc can be considered an “outlier”.
c) Top 10 Investors
d) Main Categories
By using igraph for R, it’s possible to see the different connected components of the whole graph. This is an example:
Two charts that relate funding to graph metrics such as degree and PageRank.
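To make the graph metrics mentioned above a bit more concrete, here is a small, dependency-free Python sketch of connected components, degree and PageRank on an investor/company graph. It is only an illustration and not the igraph-for-R code behind the charts: the edge list is made up, the function names are my own, and PageRank is computed with a plain power iteration using the common damping factor of 0.85.

```python
from collections import defaultdict, deque


def build_graph(investments):
    """Undirected adjacency sets from (investor, company) pairs."""
    adj = defaultdict(set)
    for investor, company in investments:
        adj[investor].add(company)
        adj[company].add(investor)
    return dict(adj)


def connected_components(adj):
    """List of components (each a set of nodes), found by breadth-first search."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        component, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


def pagerank(adj, damping=0.85, iterations=100):
    """PageRank by power iteration; assumes every node has at least one edge."""
    n = len(adj)
    rank = {node: 1.0 / n for node in adj}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in adj}
        for node, neighbours in adj.items():
            # Each node splits its damped rank evenly among its neighbours.
            share = damping * rank[node] / len(neighbours)
            for neighbour in neighbours:
                new_rank[neighbour] += share
        rank = new_rank
    return rank


# Hypothetical (investor, company) pairs, for illustration only.
investments = [
    ("Investor A", "Company X"),
    ("Investor A", "Company Y"),
    ("Investor B", "Company Y"),
    ("Investor C", "Company Z"),
]
graph = build_graph(investments)
degree = {node: len(neighbours) for node, neighbours in graph.items()}
components = connected_components(graph)
ranks = pagerank(graph)
```

In this toy graph there are two components: one linking Investors A and B through Company Y, and a separate pair formed by Investor C and Company Z. Joining `degree` and `ranks` with the funding totals per company gives the kind of table that charts like the ones above can be drawn from.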