Clustering Large-Scale Text Data Using BERT Embeddings and Spectral Methods

Authors

  • Prof. Jonathan Sharma

Abstract

Text clustering is a fundamental task in natural language processing, but it is often hampered by high dimensionality and semantic ambiguity. This paper explores combining BERT embeddings with spectral clustering for large-scale text data. BERT’s contextualized representations capture semantic nuance, while spectral clustering provides robust partitioning of the embedded documents. Experiments on news-article and customer-review datasets show improved clustering coherence and scalability, making the method suitable for applications such as sentiment analysis and topic discovery.
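
The abstract gives no implementation details, so the following is only a minimal sketch of how an embed-then-spectral-cluster pipeline of this kind might look. Everything specific in it is an assumption rather than the paper's method: the cosine-similarity affinity, the symmetric normalized Laplacian, the small deterministic k-means step, and all function names. Small synthetic vectors stand in for BERT sentence embeddings so that the example runs with NumPy alone; in practice the rows of `embeddings` would come from a BERT encoder.

```python
import numpy as np


def cosine_affinity(embeddings):
    """Nonnegative cosine-similarity affinity matrix with a zero diagonal."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    affinity = np.clip(unit @ unit.T, 0.0, None)
    np.fill_diagonal(affinity, 0.0)
    return affinity


def spectral_embedding(affinity, k):
    """Row-normalized eigenvectors for the k smallest eigenvalues of the
    symmetric normalized Laplacian I - D^{-1/2} A D^{-1/2}."""
    degree = affinity.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(np.clip(degree, 1e-12, None))
    laplacian = np.eye(len(affinity)) - inv_sqrt[:, None] * affinity * inv_sqrt[None, :]
    _, vectors = np.linalg.eigh(laplacian)  # eigenvalues come back in ascending order
    rows = vectors[:, :k]
    return rows / np.clip(np.linalg.norm(rows, axis=1, keepdims=True), 1e-12, None)


def kmeans(points, k, iters=50):
    """Lloyd's algorithm with deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        sq_dist = ((points[:, None, :] - centers[None]) ** 2).sum(axis=-1)
        labels = np.argmin(sq_dist, axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def spectral_cluster(embeddings, k):
    """Cluster embedding rows: affinity -> spectral embedding -> k-means."""
    return kmeans(spectral_embedding(cosine_affinity(embeddings), k), k)


# Synthetic stand-in for BERT embeddings (assumption): two groups of five
# 16-dimensional vectors, each group scattered around its own direction.
rng = np.random.default_rng(0)
direction_a = np.zeros(16)
direction_a[0] = 1.0
direction_b = np.zeros(16)
direction_b[1] = 1.0
embeddings = np.vstack([
    direction_a + 0.05 * rng.standard_normal((5, 16)),
    direction_b + 0.05 * rng.standard_normal((5, 16)),
])

labels = spectral_cluster(embeddings, k=2)
print(labels)
```

The dense affinity matrix and full eigendecomposition here grow quadratically and cubically with the number of documents, so a large-scale setting like the one the abstract targets would presumably need a sparse nearest-neighbor graph and an iterative eigensolver instead; that choice is likewise not specified in the source.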

Published

2022-01-30

Section

Articles

How to Cite

Clustering Large-Scale Text Data Using BERT Embeddings and Spectral Methods. (2022). International Journal of Data Science and Analytics (INN-DS&A), 3(3). https://internationaljournals.glawards.org/index.php/INNDSA/article/view/36