HARPA.AI
LIBRARYAPIGUIDESAI COMMANDSBLOG

🧬  Instagram Data Extraction v1

Extracts Instagram follower and following profiles, saving it as JSON with names and profile links. #instagram #json

Created by Bohdan Kovtunenko
Updated on Nov 11, 2024 08:11
Installed 161 times
RUNS JS CODESENDS HTTP REQUESTS

How to Use

IMPORT COMMAND

Content

- type: calc
  func: delete
  param: g.json
  label: DEL GLOBAL JSON
- type: ask
  param: input
  message: >-
    Send the Instagram profile as **@username** or **username**, the link as
    **https://www.instagram.com/username/**, or start processing this tab.
  options:
    - label: ✅ USE THIS TAB
      value: tab
    - $custom
  default: ''
  vision:
    enabled: false
    mode: area
    send: true
    hint: ''
  label: ASK FOR INPUT
  optionsInvalid: false
- type: group
  steps:
    - type: js
      args: url
      code: |-
        let regex = /instagram\.com\/[\w\.]+(?:\/.*)?$/;

        let isValid = regex.test(url);
        return isValid;
      param: linkCheck
      timeout: 15000
      silent: true
    - type: say
      message: >-
        ⛔️ The webpage is not suitable for processing. Please start on a new
        page or provide me with a username or link.
      condition: '{{linkCheck}} = false'
    - type: jump
      to: ASK FOR INPUT
      condition: '{{linkCheck}} = false'
    - code: |-
        let urlPattern = /^https:\/\/www\.instagram\.com\/([\w\.]+)/;

        let result = url.match(urlPattern)[1]; 

        return result;
      param: username
      type: js
      args: url
      timeout: 15000
      silent: true
    - to: EXTRACT STATS
      type: jump
  label: CHECK CURRENT TAB
  condition: '{{input}} = tab'
- steps:
    - code: |-
        let urlPattern = /^https:\/\/www\.instagram\.com\/([\w\.]+)/;
        let result = input; // Default to the original value

        if (input.startsWith('@')) {
            result = input.slice(1); // Remove '@' symbol if it exists
        } else if (urlPattern.test(result)) {
            result = input.match(urlPattern)[1]; // Extract username from the URL
        }

        console.log(result);
        return result;
      type: js
      args: input
      param: username
      timeout: 15000
      silent: true
  label: EXTRACT USERNAME
  condition: '{{input-option}} = $custom'
  type: group
- steps:
    - message: ⏳ Navigating to **@{{username}}**'s profile.
      type: say
    - type: navigate
      url: https://www.instagram.com/{{username}}/
      waitForIdle: false
      silent: true
    - type: wait
      for: 2s
      silent: true
    - code: |-
        let text = document.querySelector('header section ul').innerText;
        let formattedText = text.replace(/\n/g, ', '); 
        return formattedText;
      param: stats
      type: js
      args: ''
      timeout: 15000
      silent: true
      label: EXTRACT STATS
  label: NAVIGATE
  type: group
- param: count
  message: >-
    **@{{username}}** has {{stats}}. 


    How many followers/following profiles should be extracted? Pick an option or
    enter a number.
  options:
    - label: ♻️ ALL
      value: 99999999999
    - label: 50
      value: 50
    - label: 100
      value: 100
    - label: 250
      value: 250
    - label: 500
      value: 500
    - label: 1k
      value: 1000
    - label: 5k
      value: 5000
    - label: 10k
      value: 10000
    - $custom
  vision:
    enabled: false
    mode: area
    hint: ''
    send: true
  type: ask
  default: ''
  optionsInvalid: false
- param: option
  options:
    - label: ✅ EXTRACT BOTH
      value: both
    - label: ➡️ ONLY FOLLOWERS
      value: followers
    - label: ⬅️ ONLY FOLLOWING
      value: following
  vision:
    enabled: false
    mode: area
    hint: ''
    send: true
  type: ask
  message: ''
  default: ''
  optionsInvalid: false
- steps:
    - steps:
        - message: ⏳ Extracting followers' profiles, wait a bit.
          type: say
        - url: https://www.instagram.com/{{username}}/followers/
          type: navigate
          waitForIdle: false
          silent: true
        - type: control
          action: show
        - for: idle
          timeout: 3500
          type: wait
          silent: true
        - code: |-
            let linkOpened = !!document.querySelector('div[role="dialog"]');

            return linkOpened;
          param: linkOpened
          label: CHECK IF OPENED
          type: js
          args: ''
          timeout: 15000
          silent: true
        - code: >-
            document.querySelector('header section ul li:nth-child(2)
            a').click();
          condition: '{{linkOpened}} = false'
          label: CLICK IF FAILED
          type: js
          args: linkOpened
          param: ''
          timeout: 15000
          silent: true
        - type: wait
          for: 2s
          silent: true
          condition: '{{linkOpened}} = false'
        - code: |-
            let openData = !!document.querySelector('[placeholder="Search"]');

            return openData;
          param: openData
          label: CHECK IF HIDDEN
          type: js
          args: ''
          timeout: 15000
          silent: true
        - message: 💡 Followers info hidden. Extracting what's accessible...
          condition: '{{openData}} = false'
          type: say
        - code: |-
            async function extractDataWithScroll(count) {
              // Find the dialog element
              const dialog = document.querySelector('div[role="dialog"]');
              if (!dialog) {
                console.error('Dialog not found');
                return null;
              }

              const results = [];
              const uniqueUsernames = new Set();
              let retries = 0;
              const maxRetries = 10; // Limit the number of scrolling attempts

              // Function to scroll and wait for new content to load
              async function scrollAndWait() {
                const scrollableElement = document.querySelector('div[role="dialog"] div:has(> div > input) ~ div:last-child');
                if (scrollableElement) {
                  console.log('Scrolling to load more users...');
                  scrollableElement.scrollBy(0, 1000); // Scroll down by 1000 pixels
                  await new Promise(resolve => setTimeout(resolve, 3000)); // Wait for 1 second for new elements to load
                } else {
                  console.error('Error: Unable to find the scrollable element.');
                }
              }

              // Function to extract user data from the current view
              async function extractUsers() {
                const userElements = dialog.querySelectorAll('div[role="dialog"] div[style*="flex-direction: column"] > div > div > div:nth-child(1)');
                let newUsersFound = false;

                for (const userElement of userElements) {
                  // Find the link element containing the username
                  const linkElement = userElement.querySelector('a[href^="/"][role="link"]');
                  if (!linkElement) continue;

                  // Extract username from the link
                  const link = linkElement.getAttribute('href');
                  const username = link.slice(1).replace('/', '');

                  // Skip if this username has already been processed
                  if (uniqueUsernames.has(username)) continue;

                  // Try to find the full name
                  let fullNameElement = linkElement.nextElementSibling;
                  let fullName = fullNameElement && fullNameElement.tagName === 'SPAN' 
                    ? fullNameElement.textContent.trim() 
                    : '';

                  // If full name not found, search in other span elements
                  if (!fullName) {
                    const spans = userElement.querySelectorAll('span');
                    for (const span of spans) {
                      const text = span.textContent.trim();
                      if (text && text !== username && text !== 'Search') {
                        fullName = text;
                        break;
                      }
                    }
                  }

                  // If both username and full name are found, add to results
                  if (username && fullName) {
                    uniqueUsernames.add(username);
                    results.push({
                      username: username,
                      fullName: fullName,
                      link: `https://www.instagram.com/${username}`
                    });
                    newUsersFound = true;
                  }
                }

                return newUsersFound;
              }

              // Main loop: extract users and scroll until target count is reached or max retries hit
              while (results.length < count && retries < maxRetries) {
                const newUsersFound = await extractUsers();
                console.log(`Collected ${results.length} users so far.`);

                if (results.length < count) {
                  await scrollAndWait();
                  retries++;
                }

                if (!newUsersFound) {
                  retries++;
                } else {
                  retries = 0; // Reset the retry counter if new users were found
                }
              }

              console.log(`Total users found: ${results.length}`);
              
              // Trim the results array to the requested count
              if (results.length > count) {
                console.log(`Trimming results from ${results.length} to ${count}`);
                results.splice(count);
              }
              
              // Return the results as a JSON string, or undefined if no results
              return results.length > 0 ? JSON.stringify(results, null, 2) : undefined;
            }

            // Function call and result output
            extractDataWithScroll(count).then(data => {
              console.log(data ? data : "No data found");
            });

            // Return the promise from the function call
            return extractDataWithScroll(count);
          timeout: 150000
          type: js
          args: count
          param: followers
          silent: true
        - func: extract-json
          index: all
          type: calc
          to: followers
          param: followers
        - func: list-add
          list: instagram.followers
          type: calc
          index: ''
          item: followers
        - message: ✅ Extracted {{followers.length}} followers' profiles.
          type: say
      label: FOLLOWERS
      condition: '{{option}} != following'
      type: group
    - steps:
        - message: ⏳ Extracting following profiles, wait a bit.
          type: say
        - url: https://www.instagram.com/{{username}}/following/
          type: navigate
          waitForIdle: false
          silent: true
        - type: control
          action: show
        - type: wait
          for: idle
          silent: true
          timeout: 3500
        - type: js
          args: ''
          code: |-
            let linkOpened = !!document.querySelector('div[role="dialog"]');

            return linkOpened;
          param: linkOpened
          timeout: 15000
          label: CHECK IF OPENED
          silent: true
        - code: >-
            document.querySelector('header section ul li:nth-child(3)
            a').click();
          type: js
          args: linkOpened
          param: ''
          timeout: 15000
          condition: '{{linkOpened}} = false'
          silent: true
          label: CLICK IF FAILED
        - type: wait
          for: 2s
          silent: true
          condition: '{{linkOpened}} = false'
        - type: js
          args: ''
          code: |-
            let openData = !!document.querySelector('[placeholder="Search"]');

            return openData;
          param: openData
          timeout: 15000
          label: CHECK IF HIDDEN
          silent: true
        - message: 💡 Following profiles hidden. Extracting what's accessible...
          type: say
          condition: '{{openData}} = false'
        - type: js
          args: count
          code: |-
            async function extractDataWithScroll(count) {
              // Find the dialog element
              const dialog = document.querySelector('div[role="dialog"]');
              if (!dialog) {
                console.error('Dialog not found');
                return null;
              }

              const results = [];
              const uniqueUsernames = new Set();
              let retries = 0;
              const maxRetries = 10; // Limit the number of scrolling attempts

              // Function to scroll and wait for new content to load
              async function scrollAndWait() {
                const scrollableElement = document.querySelector('div[role="dialog"] div:has(> div > input) ~ div:last-child');
                if (scrollableElement) {
                  console.log('Scrolling to load more users...');
                  scrollableElement.scrollBy(0, 1000); // Scroll down by 1000 pixels
                  await new Promise(resolve => setTimeout(resolve, 3000)); // Wait for 1 second for new elements to load
                } else {
                  console.error('Error: Unable to find the scrollable element.');
                }
              }

              // Function to extract user data from the current view
              async function extractUsers() {
                const userElements = dialog.querySelectorAll('div[role="dialog"] div[style*="flex-direction: column"] > div > div > div:nth-child(1)');
                let newUsersFound = false;

                for (const userElement of userElements) {
                  // Find the link element containing the username
                  const linkElement = userElement.querySelector('a[href^="/"][role="link"]');
                  if (!linkElement) continue;

                  // Extract username from the link
                  const link = linkElement.getAttribute('href');
                  const username = link.slice(1).replace('/', '');

                  // Skip if this username has already been processed
                  if (uniqueUsernames.has(username)) continue;

                  // Try to find the full name
                  let fullNameElement = linkElement.nextElementSibling;
                  let fullName = fullNameElement && fullNameElement.tagName === 'SPAN' 
                    ? fullNameElement.textContent.trim() 
                    : '';

                  // If full name not found, search in other span elements
                  if (!fullName) {
                    const spans = userElement.querySelectorAll('span');
                    for (const span of spans) {
                      const text = span.textContent.trim();
                      if (text && text !== username && text !== 'Search') {
                        fullName = text;
                        break;
                      }
                    }
                  }

                  // If both username and full name are found, add to results
                  if (username && fullName) {
                    uniqueUsernames.add(username);
                    results.push({
                      username: username,
                      fullName: fullName,
                      link: `https://www.instagram.com/${username}`
                    });
                    newUsersFound = true;
                  }
                }

                return newUsersFound;
              }

              // Main loop: extract users and scroll until target count is reached or max retries hit
              while (results.length < count && retries < maxRetries) {
                const newUsersFound = await extractUsers();
                console.log(`Collected ${results.length} users so far.`);

                if (results.length < count) {
                  await scrollAndWait();
                  retries++;
                }

                if (!newUsersFound) {
                  retries++;
                } else {
                  retries = 0; // Reset the retry counter if new users were found
                }
              }

              console.log(`Total users found: ${results.length}`);
              
              // Trim the results array to the requested count
              if (results.length > count) {
                console.log(`Trimming results from ${results.length} to ${count}`);
                results.splice(count);
              }
              
              // Return the results as a JSON string, or undefined if no results
              return results.length > 0 ? JSON.stringify(results, null, 2) : undefined;
            }

            // Function call and result output
            extractDataWithScroll(count).then(data => {
              console.log(data ? data : "No data found");
            });

            // Return the promise from the function call
            return extractDataWithScroll(count);
          param: following
          timeout: 150000
          silent: true
        - type: calc
          func: extract-json
          to: following
          param: following
          index: all
        - list: instagram.following
          type: calc
          func: list-add
          index: ''
          item: following
        - message: ✅ Extracted {{following.length}} following profiles.
          type: say
      label: FOLLOWING
      condition: '{{option}} != followers'
      type: group
  label: DATA EXTRACTION
  type: group
- func: clone
  from: instagram
  type: calc
  to: g.json
- type: navigate
  url: https://www.instagram.com/{{username}}/
  waitForIdle: false
  silent: true
- type: control
  action: show
- message: JSON is stored in **{{g.json}}** and can be used in other commands.
  type: say
  interpolate: false
- param: final
  options:
    - label: 📦 EXPORT
      value: export
    - label: 🔗 SEND VIA WEBHOOK
      value: webhook
    - label: ✅ DONE
      value: done
  vision:
    enabled: false
    mode: area
    hint: ''
    send: true
  type: ask
  message: ''
  default: ''
  optionsInvalid: false
- what: param
  condition: '{{final}} = export'
  type: export
  param: instagram
- steps:
    - message: 'Please provide the Webhook URL:'
      options: null
      vision:
        enabled: false
        mode: area
        hint: ''
        send: true
      type: ask
      param: webhook
      default: ''
    - type: request
      url: '{{webhook}}'
      method: auto
      body: '{{instagram}}'
      auth:
        enabled: false
        username: ''
        password: ''
      headers: null
      param: ''
      bodyInvalid: true
  label: WEBHOOK
  condition: '{{final}} = webhook'
  type: group
Notice: Please read before using

This automation command is created by a community member. HARPA AI team does not audit community commands.

Please review the command carefully and only install if you trust the creator.

Contact us
HomeUse CasesGuidesPrivacy PolicyTerms of Service
CAN WE STORE COOKIES?
Our website uses cookies for the purposes of accessibility and security. They also allow us to gather statistics in order to improve the website for you. More info: Privacy Policy