Documentation

Installation and training for My Web Intelligence tools


0/4 - General presentation

This video introduces a series of very short training modules for getting started with the My Web Intelligence tools. My Web Intelligence is a project by Amar LAKEL, a researcher in Information and Communication Sciences (SIC) within the E3D team of the MICA Laboratory. It aims to equip researchers in the humanities and social sciences (SHS) and professionals in information, communication, and marketing with a suite of OPEN SOURCE and REALLY FREE tools, developed through public research, for building large web corpora. This suite is an essential prerequisite for any controversy analysis or influencer mapping project.

1/4 - Software Installation

20 minutes to understand how to install the My Web Intelligence software suite. Be sure to watch the presentation video above first.

2/4 - Build a list of URLs for web browsing

In the step-by-step, or ‘snowball’, survey methodology, we rely on the network of links to explore and recruit a corpus. The sub-networks are not all interconnected, so it is important to have as many relevant starting links as possible, because each one can be an entry point into an otherwise invisible sub-network. Here is a small tool developed by Sciences Po’s Media Lab, along with one or two tips for making Google “spit out” URLs, which it does not like to give away.
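
To make the point about starting links concrete, here is a minimal sketch of snowball exploration over a toy link graph. It is not the My Web Intelligence implementation: the `snowball` function, the `LINK_GRAPH` dictionary, and the `*.example` hosts are all hypothetical, and a real crawl would fetch pages instead of reading links from a dictionary. It only illustrates why a single seed can miss a whole sub-network.

```python
from collections import deque


def snowball(link_graph, seeds, max_depth=2):
    """Breadth-first exploration of a link graph, starting from seed URLs.

    link_graph is a hypothetical in-memory mapping: page -> pages it links to.
    """
    seen = set(seeds)
    queue = deque((url, 0) for url in seeds)
    while queue:
        url, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for target in link_graph.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return seen


# Toy graph (assumption): two sub-networks that never link to each other.
LINK_GRAPH = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "x.example": ["y.example"],
}

one_seed = snowball(LINK_GRAPH, ["a.example"])
two_seeds = snowball(LINK_GRAPH, ["a.example", "x.example"])

print(sorted(one_seed))
print(sorted(two_seeds))
```

With one seed, the exploration never reaches `x.example` or `y.example`; adding a second seed in the other sub-network is the only way to bring them into the corpus.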

3/4 - Crawling the web and building your digital corpus

Let’s get to the core of My Web Intelligence with the first software brick, MyWebIntelligencePython, and learn how to crawl the web.
5 command lines, nothing could be easier!

4/4 - Cleaning and annotation of the web corpus

Any survey, and a fortiori any automated crawl of a complex ocean of data, requires a phase of data cleaning and enrichment. The MyWebClient software brick is a web interface that lets you explore your corpus, clean it, and enrich it before the analysis phase.

Problems?

Make a ZOOM appointment?

About the author

Amar LAKEL
