How to get the APIs from a Single Page Application?
I'm browsing a web page that is a React SPA. There's some data I want to scrape, but I was wondering whether I can get it directly from the site's API instead.
My question is: is there any way to find the URL(s) of the API(s) that a Single Page Application calls while I'm browsing it?
reactjs api web-scraping
2
Well, you can definitely try using the browser console to see the network calls, but it's unlikely that any good website with a basic amount of security will allow you to just make random calls against their server. That's called Cross-Site Request Forgery, and even most basic APIs prevent it.
– Claies
Nov 24 '18 at 3:32
I've tried the network tab but with no success. I was wondering if it's an open API, but maybe you're right. Thank you.
– timarcosdias
Nov 24 '18 at 3:39
@Claies - CSRF tokens don't prevent you from making requests; they prevent other sites from doing so.
– pguardiario
Nov 24 '18 at 4:44
@pguardiario isn’t that exactly what this question is asking how to do?
– Claies
Nov 24 '18 at 4:46
1
If it isn't coming from an XHR (which you would see in the Network tab), then it's probably in a big JSON object in the HTML.
– pguardiario
Nov 24 '18 at 4:48
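To illustrate the "big JSON object in the HTML" case from the comment above, here is a minimal sketch that pulls JSON out of script tags using only the Python standard library. It assumes the page embeds its data in a `<script type="application/json">` tag; the sample HTML, the `initial-state` id, and the function and class names are all hypothetical, and real sites may instead assign the object to a JavaScript variable, which would need different handling.

```python
import json
from html.parser import HTMLParser


class JsonScriptExtractor(HTMLParser):
    """Collect the parsed bodies of <script type="application/json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_json_script = False
        self._chunks = []
        self.payloads = []

    def handle_starttag(self, tag, attrs):
        # Only start collecting for script tags explicitly typed as JSON.
        if tag == "script" and dict(attrs).get("type") == "application/json":
            self._in_json_script = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_json_script:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_script:
            self._in_json_script = False
            self.payloads.append(json.loads("".join(self._chunks)))


def extract_embedded_json(html):
    """Return a list with one parsed object per JSON script tag in the HTML."""
    parser = JsonScriptExtractor()
    parser.feed(html)
    parser.close()
    return parser.payloads


# Hypothetical page source, for illustration only.
SAMPLE_HTML = """
<html><body>
<div id="root"></div>
<script type="application/json" id="initial-state">{"items": [{"id": 1, "name": "example"}]}</script>
<script src="/static/js/main.js"></script>
</body></html>
"""

if __name__ == "__main__":
    print(extract_embedded_json(SAMPLE_HTML))
```

In practice you would feed it the HTML you downloaded for the page rather than the sample string.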
asked Nov 24 '18 at 3:25
timarcosdias
1 Answer
There is no straightforward answer to this question, but you can try one of these two methods:
Inspect the network requests in the browser dev tools to find the API endpoints.
If the site has docs, check whether they include documentation for its API.
Even so, you'll mostly have problems accessing the API if the site does not intend to expose it:
CORS restrictions. They might not let other origins make calls to their API; only their own site will be allowed to make API calls.
Tokens. If the tokens expire quickly, you will have to find a way to obtain new tokens often.
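Once an endpoint shows up in the dev tools, replaying it from a script could look roughly like the sketch below. Everything specific here is an assumption for illustration: the `https://example.com/api/items` URL, the function names, and the use of a `Bearer` token in the `Authorization` header (copy whatever headers the real request actually sends). It uses only the Python standard library.

```python
import json
import urllib.request


def build_api_request(url, token=None):
    """Build a GET request that mimics what the SPA sends.

    The Bearer scheme is an assumption; check the request headers
    shown in the Network tab and copy those instead.
    """
    headers = {"Accept": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_json(request, timeout=10):
    """Send the request and decode the JSON body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical endpoint and token placeholder, for illustration only.
    req = build_api_request("https://example.com/api/items?page=1", token="PASTE_TOKEN_HERE")
    print(req.get_method(), req.full_url)
    print(req.header_items())
    # data = fetch_json(req)  # uncomment once the URL and token are real
```

If the token expires quickly, the token passed to `build_api_request` has to be refreshed before each batch of calls, which is the second problem listed above.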
CORS is a browser security feature, so it almost never applies to scraping and definitely doesn't here.
– pguardiario
Nov 24 '18 at 6:42
@pguardiario The OP is not asking about scraping. The OP wants to get the data directly from the API instead of scraping. Also, CORS is a server configuration. Browsers block cross-origin requests by default unless the server explicitly tells them to allow CORS in the response headers.
– Dinesh Pandiyan
Nov 24 '18 at 6:46
You can read more about CORS here - developer.mozilla.org/en-US/docs/Web/HTTP/CORS
– Dinesh Pandiyan
Nov 24 '18 at 6:47
1
No, he specifically says "scrape". If he is making the request from a Python script (most common), there is no CORS preflight. And if he is making it from the browser context, it will pass. Also, CORS can be easily disabled.
– pguardiario
Nov 24 '18 at 6:57
"Also, CORS can be easily disabled." Wait what? Even if the server is set to not allow CORS requests?
– Dinesh Pandiyan
Nov 24 '18 at 7:00
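To make the disagreement in these comments concrete, here is a deliberately simplified model of the check a browser performs on a cross-origin response. It is a sketch, not the real algorithm: it ignores preflight requests and credentialed requests, and the function name and example origins are hypothetical. The point it illustrates is that the server only sends a header; the decision to block is made by the browser, so a script that never runs this check is unaffected.

```python
def browser_exposes_response(page_origin, api_origin, response_headers):
    """Simplified model of the browser-side CORS decision.

    Ignores preflight and credentials; real browsers apply more rules.
    """
    # Same-origin requests are not subject to CORS at all.
    if page_origin == api_origin:
        return True
    allowed = response_headers.get("Access-Control-Allow-Origin")
    return allowed in ("*", page_origin)


if __name__ == "__main__":
    site = "https://app.example.com"   # hypothetical SPA origin
    other = "https://other.example.org"  # hypothetical third-party page

    # The SPA calling its own origin: no CORS check applies.
    print(browser_exposes_response(site, site, {}))
    # Another site's page calling the API with no CORS header: blocked by the browser.
    print(browser_exposes_response(other, site, {}))
    # Server explicitly allows that origin: the browser exposes the response.
    print(browser_exposes_response(other, site, {"Access-Control-Allow-Origin": other}))
```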
edited Nov 24 '18 at 7:51
answered Nov 24 '18 at 6:26
Dinesh Pandiyan