The State of Open Source ML Frameworks
TensorFlow vs PyTorch and the top data engineers to watch.
When it comes to deep learning, TensorFlow and PyTorch are without a doubt the leading frameworks in both industry and academia. Horace He, a researcher from Cornell University, has analyzed references to TensorFlow and PyTorch in public sources over the past year. Horace found that while the popularity of PyTorch is growing rapidly within the research community, TensorFlow is still the favorite for production deployments in industry. Horace’s findings can be found at The Gradient, where they were first published - for the VC in a rush, we have extracted the key findings from the article and added our own recent study on TensorFlow production adoption, with an emphasis on Fortune 500 companies.
Key findings from the article
For the research community, the author surveyed abstracts submitted to five top AI conferences in 2018 and found an average increase of 275 percent in the number of researchers using PyTorch, against an average decrease of roughly 0.5 percent for TensorFlow over the year. For industry, the author analyzed 3,000 job listings; the results show that vacancies requiring TensorFlow experience outnumber those asking for PyTorch experience. The author also surveyed LinkedIn and found 2,030 more articles in favor of TensorFlow than of PyTorch.
The graph above shows the ratio between PyTorch papers and papers that use either TensorFlow or PyTorch at each of the top research conferences (listed below) over time. All the lines slope upwards, and every major conference in 2019 has had a majority of papers implemented in PyTorch.
CVPR, ICCV, ECCV - computer vision conferences
NAACL, ACL, EMNLP - NLP conferences
ICML, ICLR, NeurIPS - general ML conferences
The graph above suggests that in 2018, PyTorch’s adoption among researchers was very small. In 2019, however, PyTorch became the most widely adopted framework, with 69% of CVPR papers using PyTorch, 75+% of both NAACL and ACL papers, and 50+% of ICLR and ICML papers. While PyTorch’s dominance is strongest at vision and language conferences (outnumbering TensorFlow by 2:1 and 3:1 respectively), PyTorch is also more popular than TensorFlow at general machine learning conferences such as ICLR and ICML.
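For readers who want to relate the 2:1 and 3:1 figures to the percentages plotted in the graph, here is a minimal Python sketch of the ratio as described above: PyTorch papers divided by papers using either framework. The function name `pytorch_share` and the paper counts are hypothetical and chosen only to illustrate the arithmetic; the sketch also assumes each paper is counted under exactly one framework, which the source does not confirm.

```python
def pytorch_share(pytorch_papers: int, tensorflow_papers: int) -> float:
    """Fraction of framework-mentioning papers that use PyTorch.

    Assumes (for illustration) that every paper is counted under exactly
    one of the two frameworks, so the denominator is a simple sum.
    """
    total = pytorch_papers + tensorflow_papers
    if total == 0:
        raise ValueError("no papers mention either framework")
    return pytorch_papers / total


# Hypothetical counts, chosen only to reproduce the 2:1 and 3:1 ratios.
vision_share = pytorch_share(pytorch_papers=200, tensorflow_papers=100)
language_share = pytorch_share(pytorch_papers=300, tensorflow_papers=100)

print(f"2:1 ratio -> PyTorch share of {vision_share:.0%}")
print(f"3:1 ratio -> PyTorch share of {language_share:.0%}")
```

Under these assumptions, a 2:1 lead corresponds to a share of roughly two thirds and a 3:1 lead to three quarters, which is in line with the 69% and 75+% figures quoted for the vision and NLP conferences.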
Why should VCs care about all this?
Firstly, although PyTorch is dominant in research, the data shows that TensorFlow is still the dominant framework in industry. The research also shows that from 2018 to 2019, TensorFlow had 1,541 new job listings vs. 1,437 for PyTorch on public job boards. This suggests to us that neither framework is yet widely adopted by the enterprise community; otherwise we would have seen a far larger number of job listings. Other interesting stats include 3,230 new TensorFlow Medium articles vs. 1,200 for PyTorch, and 13.7k new GitHub stars for TensorFlow vs. 7.2k for PyTorch.
In addition, recent research analysis by Kähler’s team regarding TensorFlow production deployments suggests that only 12% of Fortune 500 companies have actually deployed TensorFlow in a production environment. This helps explain why the number of new TensorFlow job listings is small compared to more widely adopted technologies.
Consequently, there is a great market opportunity for AI/ML dev tool startups to help Fortune 500 companies successfully deploy TensorFlow in production at scale. Moreover, our research analysis estimates that 60% of the top 100 machine learning-powered startups in the world are currently using TensorFlow in production. This suggests that companies not using TensorFlow in production are probably at a disadvantage when it comes to realizing the benefits of deep learning.
Finally, we estimate that around 55% of TensorFlow production deployments at Fortune 500 companies and the top 100 machine learning-powered startups run on AWS. This suggests that AWS is well ahead of other cloud players, such as Google Cloud Platform (GCP), in terms of AI/ML workloads.
Top Data Engineers to watch
We’ve been tracking US- and Europe-based data engineers who contribute to top open-source projects and also work at big tech companies or notable VC-backed startups. Today, we’re delighted to share the profiles of the following TensorFlow and PyTorch contributors - we’ve also computed the probability of each of them co-founding a new company! As an early-stage VC, you’ll ideally want to connect and establish a relationship with these engineers well before they start a company. This increases the chances of your firm being one of their lead investors, comfortably before other VCs join the party!
Yong Tang
Director of Engineering at MobileIron | GitHub | LinkedIn
What is the chance of Yong co-founding a new company?
49.9%
Adam Paszke
Student at the University of Warsaw, co-creator of PyTorch | GitHub | LinkedIn
What is the chance of Adam co-founding a new company?
88.2%
Please subscribe to our paid newsletter plan below to access the profiles of other great engineers with high chances of starting new companies.
We are also offering a “Limited Trial” to select VC firms and members of the media - please fill out the request below so we can follow up with you.
About Us
Kähler VC.X uses Big Data & AI to provide market intelligence for the venture capital sector by collecting & analyzing millions of data points from various sources/formats to help VCs identify new trends and hidden high-value opportunities before the competition.
The Kähler VC.X platform is powered by a novel quasi Topological Data Analysis (TDA) framework that is specifically deployed to help find hidden relationships that exist in data sets from various sources/formats. These hidden relationships are synthesized into actionable insights that are meaningful to VCs with respect to the deal sourcing process, M&A strategies, and market trends.