Our client's global operations are spread across ten offices in five countries, and they use a closed enterprise suite, called Brandwatch, to analyze digital marketing and social networks. It provides them with good insights and metrics for the English language, by periodically crawling several social networks, blogs, and other websites in search of keywords.
However, Brandwatch does not perform at the same level of accuracy when switching to other languages such as Spanish, Portuguese, or Italian. From a large set of Brandwatch features, our customer remains focused on its NLP (Natural Language Processing) capabilities and insights discovery. Since they provide global services to their multinational customers, they wanted to find a way to support this type of analysis in a multi-language setting.
Even though the client is not a small company (they have more than 500 employees and their own IT department), they were struggling to develop a custom solution to generate insights beyond the existing Brandwatch limitations.
After analyzing the requirements and agreeing on a proposed implementation, 3XM Group quickly assembled a team composed of a Data Architect, a Data Engineer, and a Data Scientist to implement a Natural Language Processing & Understanding (NLP & NLU) platform for multiple languages. This platform has the capacity to reveal a more accurate set of insights on Social Network Analysis, based on extracted data from Brandwatch plus other data sources such as Social Networks, blog sites, forums, newspapers, etc.
KPIs and metrics stored in a database are useless for an analyst without a flexible Dashboard for rendering infocharts, so our team also implemented and deployed a custom dashboard based on the open-source BI tool Superset. The multi-user and auto-scalable UI was a containerized solution on AWS ECS.
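As a rough illustration of how such a containerized Superset instance might be wired to its backing services, here is a minimal `superset_config.py` sketch. It assumes the MySQL metadata database and the Redis cache listed in the technology stack; the environment variable names, default values, ports, and the `redis_cache` helper are hypothetical and not taken from the actual deployment.

```python
import os

# Hypothetical environment variables; in a container they would be injected
# by the task definition. The defaults are placeholders, not real values.
DB_USER = os.environ.get("SUPERSET_DB_USER", "superset")
DB_PASSWORD = os.environ.get("SUPERSET_DB_PASSWORD", "change-me")
DB_HOST = os.environ.get("SUPERSET_DB_HOST", "localhost")
DB_NAME = os.environ.get("SUPERSET_DB_NAME", "superset")
REDIS_HOST = os.environ.get("SUPERSET_REDIS_HOST", "localhost")

# Superset metadata database (dashboards, users, roles) on MySQL.
# The "mysql://" scheme assumes the mysqlclient driver is installed.
SQLALCHEMY_DATABASE_URI = (
    f"mysql://{DB_USER}:{DB_PASSWORD}@{DB_HOST}:3306/{DB_NAME}"
)


def redis_cache(prefix, db, timeout_seconds=300):
    """Build a Flask-Caching style Redis configuration (illustrative helper)."""
    return {
        "CACHE_TYPE": "RedisCache",
        "CACHE_DEFAULT_TIMEOUT": timeout_seconds,
        "CACHE_KEY_PREFIX": prefix,
        "CACHE_REDIS_URL": f"redis://{REDIS_HOST}:6379/{db}",
    }


# Separate caches for Superset metadata and for chart query results.
CACHE_CONFIG = redis_cache("superset_meta_", 0)
DATA_CACHE_CONFIG = redis_cache("superset_data_", 1)
```

Keeping the state in MySQL and Redis rather than in the container is what lets several identical Superset containers run behind a load balancer.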
Below is a diagram of the overall solution architecture:
- Scalable, Extensible, and fully customizable Natural Multi-Language Processing pipeline
- Comprehensive Multi-User Dashboard & BI User Interface based on Apache Superset
- Technical coordination of multicultural & globally distributed teams under Scrum Agile methodology
- Source Code + Infrastructure as code for deploying the entire platform in AWS cloud
- Complete documentation for deployment, configuration, and usage of this platform
The data flows over a comprehensive NLP pipeline, with every step following strict data life cycle practices: labeling and storing data and resulting KPIs in aggregated and raw formats. This enables traceability, which allows our customer to rebase the historical data with different business rules and threshold configurations, for even more accurate insights.
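The idea of rebasing historical data can be sketched in a few lines of Python. The example below is only an illustration under stated assumptions: the `RawMention` record, the `label` and `sentiment_kpis` functions, the campaign name, and the threshold values are all hypothetical, and the real platform runs as distributed jobs rather than in-memory code. The point is that because the raw scores are stored untouched, the aggregated KPIs can be recomputed at any time under a different business rule.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class RawMention:
    """One stored raw record (hypothetical schema)."""
    campaign: str
    polarity: float  # raw sentiment score in [-1.0, 1.0], never overwritten


def label(polarity, threshold):
    """Apply a business rule: scores within +/- threshold count as neutral."""
    if polarity >= threshold:
        return "positive"
    if polarity <= -threshold:
        return "negative"
    return "neutral"


def sentiment_kpis(mentions, threshold):
    """Aggregate raw mentions into per-campaign sentiment counts."""
    kpis = {}
    for mention in mentions:
        counts = kpis.setdefault(mention.campaign, Counter())
        counts[label(mention.polarity, threshold)] += 1
    return kpis


history = [
    RawMention("launch", 0.15),
    RawMention("launch", 0.60),
    RawMention("launch", -0.05),
    RawMention("launch", -0.40),
]

# The same raw history, aggregated under two different threshold settings.
strict = sentiment_kpis(history, threshold=0.5)
loose = sentiment_kpis(history, threshold=0.1)
print(dict(strict["launch"]))
print(dict(loose["launch"]))
```

With the stricter threshold most mentions fall into the neutral bucket, while the looser one reclassifies some of them as positive or negative, without touching the stored raw data.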
Deployed in our customer's AWS account, the platform is composed of two main blocks: the NLP pipeline and the pluggable Dashboard.
The NLP pipeline is orchestrated by a custom DAG workflow, managed by AWS Step Functions. It automatically triggers distributed jobs once new files reach the platform. The workflow is also error-aware: it handles automatic retries, sends alert notifications, and manages the AWS EMR life cycle in the event of an error, to optimize costs.
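To make the retry, notification, and cluster life cycle behavior more concrete, the following Python sketch builds a small Amazon States Language definition as a dictionary. It is an assumption-laden illustration, not the project's actual workflow: the state names, the retry values, the input shape (`ClusterId` and `JobArgs`), the `build_definition` function, and the placeholder topic ARNs are all hypothetical. Only the service integration resource strings and the `Retry`/`Catch` fields follow the documented Step Functions format.

```python
import json

# Retry the EMR step up to three times with exponential backoff (assumed values).
RETRY_POLICY = [{
    "ErrorEquals": ["States.ALL"],
    "IntervalSeconds": 30,
    "MaxAttempts": 3,
    "BackoffRate": 2.0,
}]


def build_definition(success_topic_arn, error_topic_arn):
    """Return a state machine definition for one EMR job (illustrative)."""

    def terminate(next_state):
        # Shut the cluster down on both paths so it never idles and bills.
        return {
            "Type": "Task",
            "Resource": "arn:aws:states:::elasticmapreduce:terminateCluster",
            "Parameters": {"ClusterId.$": "$.ClusterId"},
            "ResultPath": None,  # discard the result, keep the state input
            "Next": next_state,
        }

    def notify(topic_arn, message, **transition):
        state = {
            "Type": "Task",
            "Resource": "arn:aws:states:::sns:publish",
            "Parameters": {"TopicArn": topic_arn, "Message": message},
            "ResultPath": None,
        }
        state.update(transition)
        return state

    return {
        "StartAt": "RunNlpJob",
        "States": {
            "RunNlpJob": {
                "Type": "Task",
                "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
                "Parameters": {
                    "ClusterId.$": "$.ClusterId",
                    "Step": {
                        "Name": "nlp-job",
                        "ActionOnFailure": "CONTINUE",
                        "HadoopJarStep": {
                            "Jar": "command-runner.jar",
                            "Args.$": "$.JobArgs",
                        },
                    },
                },
                "ResultPath": "$.StepResult",
                "Retry": RETRY_POLICY,
                # Once retries are exhausted, fall through to the error path.
                "Catch": [{
                    "ErrorEquals": ["States.ALL"],
                    "ResultPath": "$.Error",
                    "Next": "TerminateAfterError",
                }],
                "Next": "TerminateAfterSuccess",
            },
            "TerminateAfterSuccess": terminate("NotifySuccess"),
            "NotifySuccess": notify(success_topic_arn, "NLP job finished", End=True),
            "TerminateAfterError": terminate("NotifyFailure"),
            "NotifyFailure": notify(error_topic_arn, "NLP job failed", Next="PipelineFailed"),
            "PipelineFailed": {"Type": "Fail", "Error": "NlpJobFailed"},
        },
    }


# Placeholder ARNs for illustration only.
definition = build_definition(
    "arn:aws:sns:us-east-1:123456789012:nlp-success",
    "arn:aws:sns:us-east-1:123456789012:nlp-errors",
)
print(json.dumps(definition, indent=2))
```

In this sketch the cluster is terminated on both the success and the error path, which is one simple way to express the cost optimization described above.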
The entire platform can be deployed in one-click mode, thanks to IaC Terraform scripts. After the deployment process, more than 90 AWS resources are properly configured, guaranteeing the security, reliability, and scalability our SLAs require.
The implemented solution allows the client to interpret and understand the impact of each of their campaigns on the LATAM Spanish-speaking market, making use of Artificial Intelligence for Natural Language Understanding.
The solution was also designed to easily support new languages, making it readily extensible.
➔ Apache Superset
➔ Brandwatch API
➔ Social Media APIs (e.g., Twitter API)
➔ Redis Cache (Superset)
➔ NLP Models
- spaCy - TF-IDF
- Spark ML - LDA
- spaCy - NER
- spaCy - Lemmatizer
- spaCy - Autoencoders
- PyTorch - Translate
- TextBlob - Polarity & Subjectivity
- AWS EMR
- S3 Buckets
- Step Functions
- Lambda Functions
- AWS Athena
- AWS Glue
- IAM Roles and Policies
- AWS SageMaker Jupyter Notebooks
- Application Load Balancer (ALB)
- SNS topics for Success and Errors
- Elastic Container Service Cluster (Superset)
- AWS RDS MySQL (Superset)