You surely
aware about it, Power Query is an Excel add-in published by Microsoft via its Power
BI new technology, dedicated to the importation of data from multiple sources.
If we are
numerous to see in Power Query a tool with an exceptional potential, there is a
field where it is still rather weak: the importation of data displayed on a Web
page. Finally it is not really Power Query which is weak, it is rather than
after 2500 lines of Javascript, the final HTML of the sites is often completely
not exploitable…
To surmount
the problem there is a fantastic tool called kimono. With kimono, you will
generate in a completely graphic way, an API from a Web site directly from of
your browser.
To
understand the subject we can take an example, data since MetaCritic Web Site will
be used, the Web site which incorporates the notes given in the press (paper or
Web) about the video games.
We will
start with trying to connect directly to MetaCritic via Power Query, to note
the inutilisability of the thing:
1 – I launch
Excel, Power Query Menu, option Import from Web, then I write the address of
the site source:
2 – Oups,
the browser of Power Query does not leave himself there all alone, we will Edit
it:
3 – And
boom, good luck! If somebody found a method, I am taking, me I give up there in
general…
So let’s
connect to kimono (after the subscribtion, we add the bookmarklet to its bar of
shortcut) and it will create the API with a graphic way, let’s start!
1 – I open
my browser, I go on the site source and I use kimonify (the bookmarklet). The
tool bar Kimono appears at the top of the page, and I start by importing the
names of the games right while clicking on them in the page of MetaCritic. The
engine of Kimono recognizes attribute HTML then and identifies the 88 names, I
rename the field “Name of the game”:
2 – I press
on + in the bar Kimono, I select the first note (92 for DS2), the second (88
for Hearthstone), and again Kimono identifies the 88 following values. I rename
the field “Notes”:
3 – I
repeat the manipulation with the Score User, this time the engine of Kimono
hesitates a little, it proposes several series to me and by select other scores
and to refuse other attributes (as well as the label “To use Score”), it finds
me 85 (indeed 3 games do not have this score, but that does not break the
recognition):
4 – I have
enough information for the moment, I validate (Done in the bar Kimono), I give a name to my API, and a period of
refreshing (real time for the demonstration, but one can reduce the load on the
source in refreshing the data set only periodically):
5 – And
Kimono returns me to the dashboard of my API:
6 – Via the
menu “How To Use” I find the necessary elements to reach the API, including the
URL (endpoints) which I will be able to transmit to Power Query in JSON, CS or
RSS:
7 – We can
test EndPoint CSV (yeah I am oldschool) immediately:
8 – But it’s
better to call it directly from Power Query (From Web always):
9 – And
after some small handling (we delete the first line, we separate the columns by
delimiter, we use the first line like a name of column, we clean User Score, we
filter the lines of waste), we obtain the good dataset. Note: if you choose the
JSON and It’s not working, it is that the API is badly formed, best is to break
it and start again (Kimono is still in beta).
10 – to
obtain the expected data in Excel:
Personally
I find that just enormous! And it’s all in live, Query in Excel and the API
side Kimono, therefore it is enough to refresh to get the updated data from the
source.
No comments:
Post a Comment