📦 Post & Comments Extraction

Extracts discussions from Reddit, Facebook, Twitter, Telegram, WhatsApp, or Discord, as well as comments from YouTube, into a JSON file to send via a webhook or use in HARPA.

Created by HARPA AI
Updated on Nov 11, 2024 09:04
Installed 89 times
RUNS JS CODE, SENDS HTTP REQUESTS

Content

- type: calc
  func: delete
  param: g.thread
- type: group
  steps:
    - type: say
      message: >-
        📌 It seems that the content on the current page is not supported by
        this command. Please go to a page where I can extract user messages,
        e.g.:


        ```text

        https://www.facebook.com/...

        https://youtube.com/watch?v=dQ11w22gX33

        https://x.com/elonmusk/status/1234567890

        https://www.reddit.com/r/programming/comments/abc123

        https://web.telegram.org/a/#123456789

        https://discord.com/channels/123456789/987654321

        https://web.whatsapp.com/

        ```
    - type: ask
      param: option
      message: ''
      options:
        - label: ✅ OK
          value: done
        - label: 📦 BULK DATA EXTRACTION
          value: bulk
      vision:
        enabled: false
        mode: area
        send: true
        hint: ''
      default: ''
      optionsInvalid: false
    - type: stop
      condition: '{{option}} = done'
    - type: jump
      to: ASK URLS BULK DATA EXTRACTION
      condition: '{{option}} = bulk'
  condition: '{{thread exists}} ='
  label: INCORRECT LINK
- param: targetMessageCount
  message: I'll extract up to 100 messages. Specify a different number if needed.
  options:
    - label: ✅ EXTRACT UP TO 100 MESSAGES
      value: 100
    - label: 📦 BULK DATA EXTRACTION
      value: bulk
    - $custom
  vision:
    enabled: false
    mode: area
    send: true
    hint: ''
  condition: '{{thread exists}}'
  type: ask
  default: ''
  optionsInvalid: false
- steps:
    - param: urls
      message: >-
        Please paste a list of links to specific posts, tweets, dialogues, or
        discussions, separated by commas or spaces, for example:



        ```text

        https://youtube.com/watch?v=dQ11w22gX33

        https://x.com/elonmusk/status/1234567890

        https://www.reddit.com/r/programming/comments/abc123

        https://web.telegram.org/a/#123456789

        https://discord.com/channels/123456789/987654321

        ```
      options: null
      vision:
        enabled: false
        mode: area
        hint: ''
        send: true
      type: ask
      default: ''
      label: ASK URLS BULK DATA EXTRACTION
    - type: js
      code: |-
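        // Collect every http(s) link from the pasted text, de-duplicated.
        let links = [];
        const linkRegex = /(https?:\/\/\S+?)(?=[,;\n\s\]]|$)/gi;

        try {
          // Coerce whatever was passed in to a string.
          const urlString = String(urls);
          // Split on commas, semicolons, and whitespace.
          const urlParts = urlString.split(/[,;\n\s]+/);
          urlParts.forEach(part => {
            const matchedLinks = part.match(linkRegex) || [];
            links = links.concat(matchedLinks);
          });

          // Strip a stray double quote left over from quoted input.
          links = links.map(link => link.replace(/^"|"$/g, ''));

          // Drop empty entries and duplicates, keeping first-seen order.
          links = [...new Set(links.filter(Boolean))];

        } catch (error) {
          // Return an empty list so the loop step has nothing to iterate over.
          return [];
        }

        return links;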
      param: links
      timeout: 15000
      args: urls
      silent: true
    - options:
        - label: ✅ START
          value: 100
        - $custom
      vision:
        enabled: false
        mode: area
        hint: ''
        send: true
      type: ask
      param: targetMessageCount
      message: I'll extract up to 100 messages. Specify a different number if needed.
      default: ''
      optionsInvalid: false
    - type: loop
      steps:
        - message: |-
            ✴️ **Invalid URL format:** {{item}}

            **Skipped.**
          condition: >-
            {{item}} =~
            ^(?!https:\/\/(?:(?:www\.)?facebook\.com\/|(?:www\.)?youtube\.com\/watch\?v=|x\.com\/[^\/]+\/status\/|www\.reddit\.com\/r\/[^\/]+\/comments\/|web\.telegram\.org\/a\/#|discord\.com\/channels\/|web\.whatsapp\.com\/))https:\/\/.*$
          type: say
        - to: TASK COMPLETED
          type: jump
          condition: >-
            {{item}} =~
            ^(?!https:\/\/(?:(?:www\.)?facebook\.com\/|(?:www\.)?youtube\.com\/watch\?v=|x\.com\/[^\/]+\/status\/|www\.reddit\.com\/r\/[^\/]+\/comments\/|web\.telegram\.org\/a\/#|discord\.com\/channels\/|web\.whatsapp\.com\/))https:\/\/.*$
        - type: navigate
          url: '{{item}}'
          condition:
            - '{{item}} =~ ^https:\/\/(?:www\.)?facebook\.com\/.*'
            - '{{item}} =~ ^https:\/\/(?:www\.)?youtube\.com\/watch\?v=.*'
            - '{{item}} =~ ^https:\/\/x\.com\/[^\/]+\/status\/.*'
            - '{{item}} =~ ^https:\/\/www\.reddit\.com\/r\/[^\/]+\/comments\/.*'
            - '{{item}} =~ ^https:\/\/web\.telegram\.org\/a\/#.*$'
            - '{{item}} =~ ^https:\/\/discord\.com\/channels\/.*'
            - '{{item}} =~ ^https:\/\/web\.whatsapp\.com\/.*'
          waitForIdle: false
          silent: false
        - param: data
          type: calc
          func: delete
        - type: control
          action: show
        - type: wait
          for: idle
          timeout: 10000
          silent: true
        - func: set
          value: '{{thread {{targetMessageCount}}}}'
          type: calc
          param: data
          format: ''
        - func: extract-json
          index: all
          type: calc
          to: data
          param: data
        - func: list-add
          index: last
          type: calc
          list: g.thread
          item: data
        - steps: []
          type: group
          label: TASK COMPLETED
      list: links
    - type: control
      action: show
    - to: SAY RESULT
      type: jump
  condition: '{{targetMessageCount}} = bulk'
  label: BULK DATA EXTRACTION
  type: group
- format: auto
  type: calc
  func: set
  param: g.thread
  value: '{{thread {{targetMessageCount}}}}'
- type: calc
  func: extract-json
  param: g.thread
  to: g.thread
  index: all
- message: '{{g.thread}}'
  type: say
  label: SAY RESULT
- message: >-
    ☑️ All content and messages extracted and saved in **{{g.thread}}** for use
    in other commands.
  type: say
  interpolate: false
- param: final
  options:
    - label: 📦 EXPORT
      value: export
    - label: 🔗 SEND VIA WEBHOOK
      value: webhook
    - label: ✅ DONE
      value: done
  vision:
    enabled: false
    mode: area
    hint: ''
    send: true
  type: ask
  message: ''
  default: ''
  optionsInvalid: false
- what: param
  condition: '{{final}} = export'
  type: export
  param: g.thread
  filename: ''
- condition: '{{final}} = done'
  type: stop
- steps:
    - message: 'Please provide the Webhook URL:'
      vision:
        enabled: false
        mode: area
        hint: ''
        send: true
      type: ask
      param: webhook
      options: null
      default: ''
    - type: request
      url: '{{webhook}}'
      auth:
        enabled: false
        username: ''
        password: ''
      method: auto
      headers: null
      body: '{{g.thread}}'
      param: ''
      bodyInvalid: true
  condition: '{{final}} = webhook'
  label: WEBHOOK
  type: group
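
The bulk branch of the command first pulls links out of the pasted text and then checks each one against a list of supported URL shapes before navigating to it. The sketch below reproduces that flow outside HARPA so it can be tried in Node.js. The regular expressions are copied from the command's own `js` step and `navigate` conditions; the function names `extractLinks`, `isSupported`, and `partitionLinks` are hypothetical and exist only for this illustration. How HARPA evaluates the `=~` conditions internally is not shown in the command, so treat this as an approximation rather than the actual implementation.

```javascript
// URL shapes accepted by the command's navigate step (copied from its conditions).
const SUPPORTED_PATTERNS = [
  /^https:\/\/(?:www\.)?facebook\.com\/.*/,
  /^https:\/\/(?:www\.)?youtube\.com\/watch\?v=.*/,
  /^https:\/\/x\.com\/[^\/]+\/status\/.*/,
  /^https:\/\/www\.reddit\.com\/r\/[^\/]+\/comments\/.*/,
  /^https:\/\/web\.telegram\.org\/a\/#.*$/,
  /^https:\/\/discord\.com\/channels\/.*/,
  /^https:\/\/web\.whatsapp\.com\/.*/,
];

// Hypothetical helper: same extraction logic as the command's js step.
function extractLinks(input) {
  const linkRegex = /(https?:\/\/\S+?)(?=[,;\n\s\]]|$)/gi;
  // Split on commas, semicolons, and whitespace, then match links in each part.
  const parts = String(input).split(/[,;\n\s]+/);
  const found = parts.flatMap(part => part.match(linkRegex) || []);
  // Strip a stray double quote, then drop empty entries and duplicates.
  const cleaned = found.map(link => link.replace(/^"|"$/g, ""));
  return [...new Set(cleaned.filter(Boolean))];
}

// Hypothetical helper: true when the URL matches any supported shape.
function isSupported(url) {
  return SUPPORTED_PATTERNS.some(pattern => pattern.test(url));
}

// Hypothetical helper: split the pasted links into those the loop would
// process and those it would report as invalid and skip.
function partitionLinks(input) {
  const supported = [];
  const skipped = [];
  for (const link of extractLinks(input)) {
    (isSupported(link) ? supported : skipped).push(link);
  }
  return { supported, skipped };
}

const sample =
  "https://x.com/elonmusk/status/1234567890, https://example.com/page " +
  "https://x.com/elonmusk/status/1234567890";
console.log(partitionLinks(sample));
```

In this sketch any link that is not on the allow-list, including a plain `http://` link, lands in `skipped`. The Reddit pattern requires the `www.` prefix, while the Facebook and YouTube patterns accept the host with or without it.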