HARPA.AI
LIBRARYAPIGUIDESAI COMMANDSBLOG

🧩  Facebook Post & Comments Extraction

Extracts the Facebook post and all its comments. Use this command on a post page that can be accessed by clicking the comments count button. #extraction

Created by Adrian Larsson
Updated on Nov 9, 2024 04:32
Installed 78 times
RUNS JS CODE

How to Use

IMPORT COMMAND

Content

- type: say
  message: >-
    📌 Extracts the Facebook post and all its comments. Use this command on a
    post page that can be accessed by clicking the comments count button.


    You can use this command or JS code as a base for creating other commands or
    automations.
- type: ask
  param: targetMessageCount
  message: How many comments would you like to extract?
  options:
    - label: 50 comments
      value: 50
    - label: 100 comments
      value: 100
    - label: 200 comments
      value: 200
    - $custom
  default: ''
  vision:
    enabled: false
    mode: area
    send: true
    hint: ''
  optionsInvalid: false
- type: js
  code: |
    async function scrollAndCollectMessages(targetMessageCount) {
      // Configuration object containing all DOM selectors
      const config = {
        postContent: '[role="dialog"] [data-ad-comet-preview="message"]',
        postAuthor: '[role="dialog"] [data-ad-rendering-role="profile_name"]',
        comment: '[role="dialog"] [role="article"]',
        commentContent: '[role="dialog"] [role="article"] div[style="text-align: start;"]',
        commentAuthor: '[role="dialog"] [role="article"] a[href*="/user/"] span',
        commentTime: '[role="dialog"] [role="article"] li a[role="link"]'
      }

      // Main function to collect comments with scrolling functionality
      async function scrollAndCollectComments() {
        // Initialize storage arrays and counters
        const comments = []
        const uniqueComments = new Set()
        let retries = 0
        const maxRetries = 5

        // Function to scroll down and wait for new content to load
        async function scrollAndWait() {
          const dialog = document.querySelector('[role=dialog]:not([aria-label])')
          const children = [...dialog.querySelectorAll('*')]
          const scroll = children.find(c => getComputedStyle(c).overflow === 'hidden auto')
          scroll.scrollTop = scroll.scrollHeight
          await new Promise(resolve => setTimeout(resolve, 1500))
        }

        // Function to scroll back to top
        function scrollToTop() {
          const dialog = document.querySelector('[role=dialog]:not([aria-label])')
          const children = [...dialog.querySelectorAll('*')]
          const scroll = children.find(c => getComputedStyle(c).overflow === 'hidden auto')
          scroll.scrollTop = 0
        }

        // Function to extract comments from current view
        function extractCurrentComments() {
          const commentElements = document.querySelectorAll(config.comment)
          let newCommentsFound = false

          // Process each comment element
          commentElements.forEach((comment) => {
            const authorElement = comment.querySelector('a[href*="/user/"] span')
            const timeElement = comment.querySelector('li a[role="link"]')
            const contentElements = comment.querySelectorAll('div[style="text-align: start;"]')
            
            if (authorElement && contentElements.length > 0) {
              const author = authorElement.innerText || 'Unknown'
              const time = timeElement?.innerText || ''
              const content = Array.from(contentElements)
                .map(el => el.innerText)
                .join(' ')

              const commentId = `${time}-${author}-${content}`

              if (!uniqueComments.has(commentId) && author && content) {
                const formattedComment = `${time}. ${author}: ${content}`
                uniqueComments.add(commentId)
                comments.push(formattedComment)
                newCommentsFound = true
              }
            }
          })

          return newCommentsFound
        }

        // Main collection loop - continue until target reached or max retries exceeded
        while (comments.length < targetMessageCount && retries < maxRetries) {
          const newCommentsFound = extractCurrentComments()

          if (comments.length < targetMessageCount) {
            await scrollAndWait()
            if (!newCommentsFound) {
              retries++
            } else {
              retries = 0
            }
          }
        }

        // Trim excess messages if needed
        if (comments.length > targetMessageCount) {
          comments.splice(targetMessageCount)
        }

        // Scroll back to top after collection
        scrollToTop()

        return comments
      }

      try {
        const result = {
          author: document.querySelector(config.postAuthor)?.innerText || '',
          originalPost: document.querySelector(config.postContent)?.innerText || '',
          comments: await scrollAndCollectComments()
        }

        return result

      } catch (error) {
        console.error('Error during data collection:', error)
        return null
      }
    }

    return scrollAndCollectMessages(targetMessageCount)
  param: array
  timeout: 15000
  args: targetMessageCount
  silent: true
- type: say
  message: |-
    **Data Array:**

    {{array}}
Notice: Please read before using

This automation command is created by a community member. HARPA AI team does not audit community commands.

Please review the command carefully and only install if you trust the creator.

Contact us
HomeUse CasesGuidesPrivacy PolicyTerms of Service
CAN WE STORE COOKIES?
Our website uses cookies for the purposes of accessibility and security. They also allow us to gather statistics in order to improve the website for you. More info: Privacy Policy