Democratizing AI with ANAI.
Did you know:
- 63% of companies are still experimenting with AI
- AI/ML projects take quarters, even years, before they are operationalized
Do you want to know why?
It happens because teams struggle with the different stages required to build good models; ironically, those stages don’t even involve creating the ML models themselves. Instead, they come before and after model creation.
These problem-causing stages include ingesting data into the system, pre-processing the data, and feature engineering; then, after model training, tracking model experiments, deploying the model, and managing the deployed model.
So, you must be wondering whether there’s a tool where all of this can be integrated with very few lines of code. Yes, my friend, one tool covers it all. The tool’s name is….. ANAI
noun [ C ]
– A simple, scalable, integrated, all-in-one ML platform
– From data ingestion to data pre-processing to feature engineering to model training to model explanations to model deployment to model management, ANAI has it all covered
– Aims to deliver strong performance for any AI-based system
I’ve been using ML for the last 4–5 years. When I was a beginner (though in many ways I still am), it was challenging to learn all the libraries required for the various steps that come before even training a model.
It takes an average of 20 lines of code just to load the data, get a data description, encode the data, scale the data, and split it into train and test sets. I always found that boilerplate tedious, even though it was necessary. But that’s not the case with ANAI.
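For context, here is what that conventional multi-library workflow typically looks like with pandas and scikit-learn (a sketch on synthetic data standing in for a real CSV):

```python
# The conventional multi-step workflow: load, describe, encode, scale, split.
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split

# 1. Load the data (synthetic stand-in for pd.read_csv("...")).
df = pd.DataFrame({
    "age": [34, 51, 28, 60, 45, 39],
    "gender": ["Male", "Female", "Female", "Male", "Male", "Female"],
    "stroke": [0, 1, 0, 1, 0, 1],
})

# 2. Get a data description.
print(df.describe())

# 3. Encode the categorical column.
df["gender"] = LabelEncoder().fit_transform(df["gender"])

# 4. Scale the numeric features.
X = df.drop(columns="stroke")
y = df["stroke"]
X_scaled = StandardScaler().fit_transform(X)

# 5. Split into train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.25, random_state=42
)
```

Each of those five steps pulls in its own API; this is exactly the ceremony ANAI collapses into one-liners.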
These days, “Less is More”.
Therefore, with ANAI we can ingest the data in one line, summarize it in one line, scale/encode/split/SMOTE in one line, and even train a model in one line. Exciting, isn’t it?
1. Installing ANAI
pip install anai-opensource
2. Importing ANAI
import anai
from anai.preprocessing import Preprocessor
3. Loading Data
This step is optional and only needed when you want to use the data anywhere other than training.
df = anai.load(df_filepath = '../input/stroke-prediction-dataset/healthcare-dataset-stroke-data.csv')
ANAI uses Modin[dask] as its backbone for data handling, so you can work with big datasets without worrying about memory.
4. Data Pre-Processing
prep = Preprocessor(dataset = df, target = 'stroke')
a) Summary of Data
ANAI provides a single-pane summary view of the dataset. This summary includes the number of records, variables, cells, missing values, and duplicate values, and even reports how many anomalies are present.
summary = prep.summary()
b) Column Level summary
This function gives a column-level report of the dataset.
column_summary = prep.column_summary()
5. Model Training (Best Part)
We are going to train 5 ML models with ANAI.
Remember, I said loading the data is an optional step, because anai.run() can load the dataset automatically from a file.
Here, the target variable of this dataset is the “stroke” column.
ai = anai.run(filepath = '../input/stroke-prediction-dataset/healthcare-dataset-stroke-data.csv', target = 'stroke', predictor = ['rfc', 'cat', 'xgb', 'lgbm', 'ext'], except_columns = ['id'])
Did you notice that in anai.run() we didn’t specify the task? That’s because ANAI determines the task automatically.
ANAI took 144.39 seconds to train the models, along with ensembling all of them.
6. Model Explanations
Here, we are going to explain the best model using two methods:
a) Permutation Explanations
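To illustrate what permutation explanations measure, here is a generic sketch of the technique with scikit-learn on synthetic data (this is the general method, not ANAI’s own API):

```python
# Permutation importance: shuffle each feature in turn and measure how much
# the model's score drops. A big drop means the model relied on that feature.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = make_classification(
    n_samples=300, n_features=5, n_informative=2, random_state=0
)
model = RandomForestClassifier(random_state=0).fit(X, y)

result = permutation_importance(model, X, y, n_repeats=10, random_state=0)

# Print features from most to least important.
for i in np.argsort(result.importances_mean)[::-1]:
    print(f"feature_{i}: {result.importances_mean[i]:.3f}")
```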
b) SHAP Explanations
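SHAP explains a single prediction by assigning each feature its Shapley value: the feature’s average marginal contribution over all orderings in which features are “switched on”. A minimal from-scratch sketch for a toy model (illustrative only; the toy weights and feature names are invented here, and the shap library does this efficiently at scale):

```python
# Exact Shapley values for one prediction of a tiny toy model.
from itertools import permutations

def predict(features):
    # Toy model: a weighted sum of the input features (weights are made up).
    weights = {"age": 0.5, "bmi": 0.3, "glucose": 0.2}
    return sum(weights[f] * v for f, v in features.items())

def shapley_values(x, baseline):
    names = list(x)
    values = {n: 0.0 for n in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)            # start from the baseline input
        prev = predict(current)
        for name in order:                  # flip features to x one by one
            current[name] = x[name]
            now = predict(current)
            values[name] += now - prev      # marginal contribution
            prev = now
    return {n: v / len(orderings) for n, v in values.items()}

x = {"age": 70, "bmi": 30, "glucose": 150}          # instance to explain
baseline = {"age": 40, "bmi": 25, "glucose": 100}   # reference input
phi = shapley_values(x, baseline)
print(phi)  # the contributions sum to predict(x) - predict(baseline)
```

The key property on display: the per-feature contributions always add up exactly to the difference between the explained prediction and the baseline prediction.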
Finally, we can see the leaderboard of the trained models like this.
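If you want to reproduce a leaderboard like this outside ANAI, a generic sketch with scikit-learn and pandas (the model lineup and scoring here are illustrative, not ANAI’s internals):

```python
# Score several models with cross-validation and rank them best-first.
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

models = {
    "Random Forest": RandomForestClassifier(random_state=0),
    "Extra Trees": ExtraTreesClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

leaderboard = pd.DataFrame(
    [(name, cross_val_score(m, X, y, cv=5).mean()) for name, m in models.items()],
    columns=["Model", "Accuracy"],
).sort_values("Accuracy", ascending=False, ignore_index=True)

print(leaderboard)
```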