Exploratory Data Analysis (v0.1.0)

Version 0.1.0 of the lexical-cloud-docs project introduces product tiers to support the models and components of a product. This is the first release with analysis of the content at lexical-cloud-data. The full notebooks, including output and commentary, are available in lexical-cloud-analysis. This page summarizes that content.

Taxonomy

Full analysis of taxonomy data can be found in the notebook on GitHub. Let’s review some findings below.

How many of each taxonomy entity are there?

entity    count
category    110
domain       27
feature      40
label         1
provider      4
service      16

Takeaways

  1. Feature coverage needs improvement.
  2. There should be far more features than categories, yet there are currently 40 features against 110 categories.
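
As a quick check of the second takeaway, the ratio of features to categories can be computed directly from the counts in the table above. This is a minimal sketch in Python; the variable and function names are illustrative and are not taken from the notebooks.

```python
# Entity counts copied from the taxonomy table above.
taxonomy_counts = {
    "category": 110,
    "domain": 27,
    "feature": 40,
    "label": 1,
    "provider": 4,
    "service": 16,
}


def ratio(counts, numerator, denominator):
    """Return counts[numerator] / counts[denominator] as a float."""
    return counts[numerator] / counts[denominator]


features_per_category = ratio(taxonomy_counts, "feature", "category")
print(f"features per category: {features_per_category:.2f}")  # 0.36
```

A value below 1 means there are fewer features than categories, which is the opposite of what the takeaway calls for.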

Products

Full analysis of product data can be found in the notebook on GitHub. Let’s review some findings below.

How is the taxonomy represented across product tiers?

tier       count  providers  services  domains  categories  features  labels
component      3          3         3        3           3         1       0
model         13         13        13       13          13         3       0
product      261        261       261      261         261        62       3

Takeaways

  1. Components and models are still a new concept.
  2. A deeper dive into existing products is necessary.
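
A table like the one above can be produced by counting, for each tier, how many records have a non-empty value in each taxonomy field. The sketch below assumes that is how the columns are defined. The record layout, field names, and sample values are hypothetical and do not come from lexical-cloud-data.

```python
from collections import defaultdict

# Hypothetical records: every name and value here is illustrative only.
records = [
    {"tier": "product", "provider": "aws", "service": "compute", "features": ["autoscaling"]},
    {"tier": "product", "provider": "gcp", "service": "storage", "features": []},
    {"tier": "model", "provider": "aws", "service": "compute", "features": None},
]

FIELDS = ("provider", "service", "features")


def count_populated(rows, fields=FIELDS):
    """Per tier, count the rows and the rows with a non-empty value per field."""
    counts = defaultdict(lambda: {"count": 0, **{f: 0 for f in fields}})
    for row in rows:
        entry = counts[row["tier"]]
        entry["count"] += 1
        for field in fields:
            if row.get(field):  # None, "" and [] all count as missing
                entry[field] += 1
    return dict(counts)


tier_counts = count_populated(records)
for tier, values in sorted(tier_counts.items()):
    print(tier, values)
```

With counts in this shape, a field whose value trails the tier's `count` (as `features` does here) points at records that still need attention.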

How is the taxonomy represented across providers?

provider  count  services  domains  categories  features  labels
aws         108       108      108         108        19       2
azure        89        89       89          89        25       0
gcp          62        62       62          62        17       1
github        2         2        2           2         1       0

Takeaways

  1. Services, domains and categories are populated for every product of every provider, so their counts match the product count.
  2. Features should far exceed the product count, yet they fall below it for every provider.

In retrospect, it would be useful to also review the unique taxonomy entries by provider. Next time!
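
The follow-up suggested above, counting unique taxonomy entries per provider instead of populated records, could look roughly like the sketch below. The records, field names, and category values are hypothetical, and the function name is an assumption made for illustration.

```python
from collections import defaultdict

# Hypothetical product records; values are illustrative only.
products = [
    {"provider": "aws", "service": "compute", "category": "virtual machines"},
    {"provider": "aws", "service": "compute", "category": "containers"},
    {"provider": "aws", "service": "storage", "category": "object storage"},
    {"provider": "azure", "service": "compute", "category": "virtual machines"},
]


def unique_by_provider(rows, fields=("service", "category")):
    """Return {provider: {field: number of distinct values}}."""
    seen = defaultdict(lambda: defaultdict(set))
    for row in rows:
        for field in fields:
            value = row.get(field)
            if value is not None:
                seen[row["provider"]][field].add(value)
    return {
        provider: {field: len(seen[provider][field]) for field in fields}
        for provider in seen
    }


unique_counts = unique_by_provider(products)
print(unique_counts)
```

Using sets keeps repeated values, such as two aws products under the same service, from being counted twice.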

How is each provider represented across services?

service          aws  azure  gcp  github
ai                 4      7    7       0
analytics          3      4    3       0
compute           11      8    8       0
database           8      9    5       0
developer tools   12      9    2       2
framework          3      2    2       0
governance        14     12    5       0
hybrid             5      2    1       0
identity           3      2    2       0
integration        6      8    7       0
iot                0      5    0       0
migration         12      1    4       0
monitor            9      4    6       0
network           13     14    9       0
security          10      5    2       0
storage            6      5    3       0

Takeaways

  1. Some providers have received more attention on certain services: aws has 12 migration entries against 1 for azure, and only azure has any iot entries.
  2. Full consideration has yet to be given to any provider or service.
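
One way to put a number on the first takeaway is to compute, for each service, the gap between the most and least represented provider. The sketch below copies the aws, azure and gcp counts from the table above and leaves github out, since it only appears under developer tools. The gap metric and the function name are assumptions for illustration and are not something the notebooks define.

```python
# (aws, azure, gcp) counts per service, copied from the table above.
service_counts = {
    "ai": (4, 7, 7),
    "analytics": (3, 4, 3),
    "compute": (11, 8, 8),
    "database": (8, 9, 5),
    "developer tools": (12, 9, 2),
    "framework": (3, 2, 2),
    "governance": (14, 12, 5),
    "hybrid": (5, 2, 1),
    "identity": (3, 2, 2),
    "integration": (6, 8, 7),
    "iot": (0, 5, 0),
    "migration": (12, 1, 4),
    "monitor": (9, 4, 6),
    "network": (13, 14, 9),
    "security": (10, 5, 2),
    "storage": (6, 5, 3),
}


def provider_gap(counts):
    """Return {service: max count - min count across providers}."""
    return {service: max(values) - min(values) for service, values in counts.items()}


gaps = provider_gap(service_counts)
widest = max(gaps, key=gaps.get)
print(widest, gaps[widest])  # migration 11
```

Services with a large gap are candidates for the equitable-attention work, while a small gap only shows that providers are treated evenly, not that coverage is complete.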

Review

Consistency is the common theme of the takeaways above. These improvements need prioritization:

  1. Features should exist for every existing product.
  2. Components and models should be added where applicable to existing products.
  3. Equitable attention should be given to providers across services.