ARTICLE
TITLE

Web Scraping with HTML DOM Method for Website News API creation

SUMMARY

Information is a critical resource in this era, and news is one kind of information that appears every day. The volume of daily news becomes a problem when news websites do not provide an API (Application Programming Interface) for retrieving it. This is an obstacle for researchers who want to analyze news topics, because copying and pasting articles by hand from news websites every day takes a long time. In this research, web scraping with the HTML (Hypertext Markup Language) DOM (Document Object Model) method is used to retrieve data from news sites. The scraped results form datasets that are stored in a database and exposed through an API. The API is tested in two ways: with black box testing, and with a data conformity test that compares the data obtained at scraping time against the data on the news website at the time of testing. The black box tests show that the API's filters work as intended, and the conformity test shows a high percentage of matching data: 99.2% for Tribunnews.com, 97.9% for Detik.com, and 98.6% for Liputan6.com.
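
The pipeline described above (scrape, store, filter, check conformity) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the article: the sample markup, the `news` table, and the function names are hypothetical, and the conformity formula (share of scraped records still found on the live site) is one plausible reading of the test. The standard library's event-based `html.parser` stands in for a DOM library so that the example has no dependencies, and an in-memory SQLite query stands in for an API filter.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical markup; real news sites use their own tags and class names.
SAMPLE_HTML = """
<html><body>
  <article class="news"><h2><a href="/news/1">Flood hits city center</a></h2></article>
  <article class="news"><h2><a href="/news/2">Election results announced</a></h2></article>
  <div class="ad"><a href="/promo">Buy now</a></div>
</body></html>
"""


class NewsLinkParser(HTMLParser):
    """Collects (title, url) pairs from <a> elements nested inside <article>."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._article_depth = 0  # > 0 while inside an <article>
        self._href = None        # href of the link currently being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self._article_depth += 1
        elif tag == "a" and self._article_depth > 0:
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "article" and self._article_depth > 0:
            self._article_depth -= 1
        elif tag == "a" and self._href is not None:
            self.items.append(("".join(self._text).strip(), self._href))
            self._href = None


def scrape(html):
    """Parse an HTML string and return the list of (title, url) pairs."""
    parser = NewsLinkParser()
    parser.feed(html)
    return parser.items


def store(conn, items):
    """Insert scraped records into a hypothetical `news` table."""
    conn.execute("CREATE TABLE IF NOT EXISTS news (title TEXT, url TEXT)")
    conn.executemany("INSERT INTO news (title, url) VALUES (?, ?)", items)
    conn.commit()


def search(conn, keyword):
    """Stand-in for an API filter: records whose title contains the keyword."""
    rows = conn.execute(
        "SELECT title, url FROM news WHERE title LIKE ? ORDER BY url",
        (f"%{keyword}%",),
    )
    return [{"title": title, "url": url} for title, url in rows]


def conformity_rate(scraped, live):
    """Percentage of scraped records that still match the live records."""
    if not scraped:
        return 0.0
    live_set = set(live)
    matches = sum(1 for item in scraped if item in live_set)
    return 100.0 * matches / len(scraped)


items = scrape(SAMPLE_HTML)
conn = sqlite3.connect(":memory:")
store(conn, items)
results = search(conn, "Election")
```

In a real deployment the HTML would come from an HTTP request to the news site, and `search` would sit behind a web endpoint; both are left out here so the sketch runs offline.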


Journal: Lontar Komputer